
Complete GenAI use case for creating a chat bot for an employee handbook


BlueprintTechnologies/blueprint-Databricks-genAI-chatbot-accelerator


The Use Case

Generative AI unlocks incredible value from data. The key is starting with a solid business use case to truly realize that value.

The use case we will apply this accelerator to is from our HR department. In fact, the exact ask was "Imagine if we flipped our HR inbox/etc. into something similar like a bot…trained it with our handbook, IT policies, benefits, engagement protocol, etc…it would cut down on so many of the asks!"

To demonstrate this use case, this accelerator starts with raw data in the form of a publicly available employee handbook. We use Valve's New Employee Handbook, which, aside from being a fun read, is publicly available with a direct link to the PDF.

Some of the technical benefits of this accelerator (a code sketch of these steps follows the list):

  • Create a new catalog in Unity Catalog with bronze and silver tables and a managed volume
  • Extract and load PDF content into Unity Catalog using AutoLoader
  • Leverage Databricks Vector Search to create and store document embeddings
  • Leverage Databricks Vector Search and Delta Sync to create and sync a vector index with a Delta table
  • Leverage the Llama 2 70B Chat model through a Databricks Foundation Model endpoint
  • Create a chat bot within a Databricks notebook
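
For orientation, here is a minimal sketch of the first few steps, intended to run in a Databricks notebook (where `spark` is predefined). The catalog, schema, volume, endpoint, and index names are placeholders rather than the accelerator's exact code, and the managed-embedding model (`databricks-bge-large-en`) is an assumption:

```python
from databricks.vector_search.client import VectorSearchClient

# Placeholder names -- substitute your own.
CATALOG, SCHEMA, VOLUME = "handbook_demo", "chatbot", "raw_pdfs"

# 1. Create the Unity Catalog objects: catalog, schema, and a managed volume.
spark.sql(f"CREATE CATALOG IF NOT EXISTS {CATALOG}")
spark.sql(f"CREATE SCHEMA IF NOT EXISTS {CATALOG}.{SCHEMA}")
spark.sql(f"CREATE VOLUME IF NOT EXISTS {CATALOG}.{SCHEMA}.{VOLUME}")

# 2. Load the handbook PDF from the managed volume into a bronze table
#    with AutoLoader (cloudFiles) reading binary files.
(
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "binaryFile")
    .load(f"/Volumes/{CATALOG}/{SCHEMA}/{VOLUME}/docs")
    .writeStream
    .option("checkpointLocation", f"/Volumes/{CATALOG}/{SCHEMA}/{VOLUME}/checkpoints/bronze")
    .trigger(availableNow=True)
    .toTable(f"{CATALOG}.{SCHEMA}.handbook_bronze")
)

# 3. Once the PDF text has been parsed and chunked into a silver table
#    (a parsing sketch appears under Medallion Architecture below), create a
#    Delta Sync vector index over it with Databricks Vector Search.
vsc = VectorSearchClient()
vsc.create_endpoint(name="handbook_vs_endpoint", endpoint_type="STANDARD")
vsc.create_delta_sync_index(
    endpoint_name="handbook_vs_endpoint",
    index_name=f"{CATALOG}.{SCHEMA}.handbook_index",
    source_table_name=f"{CATALOG}.{SCHEMA}.handbook_silver_chunks",
    pipeline_type="TRIGGERED",
    primary_key="chunk_id",
    embedding_source_column="chunk_text",
    embedding_model_endpoint_name="databricks-bge-large-en",
)
```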

A look at the chatbot from this accelerator

About This Accelerator

This accelerator is self-contained across five notebooks.

  1. No third party services are used. Everything, including your data, stays within your Databricks workspace; the chat itself calls a Databricks-hosted Foundation Model endpoint (see the sketch after this list).
  2. Although your workspace must be attached to a Unity Catalog metastore, this accelerator generates the catalog, schema, tables, and volumes for you.
  3. This accelerator leverages a managed volume in Unity Catalog. External storage does not have to be defined.
  4. No cluster init scripts are used.
  5. All code is displayed in the notebooks.
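
Because the LLM is served from a Databricks Foundation Model endpoint, the chat step in the final notebook amounts to a similarity search against the vector index followed by a chat-completion request. A minimal sketch, reusing the placeholder names from the earlier snippet and assuming the pay-per-token `databricks-llama-2-70b-chat` endpoint:

```python
import mlflow.deployments
from databricks.vector_search.client import VectorSearchClient

vsc = VectorSearchClient()
index = vsc.get_index(
    endpoint_name="handbook_vs_endpoint",
    index_name="handbook_demo.chatbot.handbook_index",
)

def answer(question: str) -> str:
    # Retrieve the most relevant handbook chunks from the vector index.
    hits = index.similarity_search(
        query_text=question,
        columns=["chunk_text"],
        num_results=3,
    )
    context = "\n\n".join(row[0] for row in hits["result"]["data_array"])

    # Ask the Foundation Model endpoint to answer using only that context.
    client = mlflow.deployments.get_deploy_client("databricks")
    response = client.predict(
        endpoint="databricks-llama-2-70b-chat",
        inputs={
            "messages": [
                {"role": "system",
                 "content": "Answer employee questions using only the handbook excerpts provided."},
                {"role": "user",
                 "content": f"Handbook excerpts:\n{context}\n\nQuestion: {question}"},
            ],
            "max_tokens": 400,
        },
    )
    return response["choices"][0]["message"]["content"]

print(answer("How much vacation do new employees get?"))
```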

Databricks Prerequisites

Before You Get Started

Medallion Architecture

Although not required, it is helpful to be familiar with the medallion architecture used in a lakehouse. The layers in the architecture are referred to as bronze, silver, and gold, though your organization may use different terminology. You can learn more about the medallion architecture in the Databricks documentation.
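
In this accelerator, the bronze layer would hold the raw PDF bytes landed by AutoLoader, and the silver layer the parsed, chunked handbook text that the vector index syncs from. Here is a sketch of that bronze-to-silver step; the pypdf-based parsing, the fixed-size chunking, and the table names are assumptions for illustration, not necessarily what the notebooks use:

```python
import io
from pypdf import PdfReader
from pyspark.sql import functions as F
from pyspark.sql.types import ArrayType, StringType

CHUNK_SIZE = 1500  # characters per chunk (illustrative)

@F.udf(returnType=ArrayType(StringType()))
def pdf_to_chunks(content):
    # Extract text from the raw PDF bytes and split into fixed-size chunks.
    text = " ".join(page.extract_text() or ""
                    for page in PdfReader(io.BytesIO(content)).pages)
    return [text[i:i + CHUNK_SIZE] for i in range(0, len(text), CHUNK_SIZE)]

bronze = spark.table("handbook_demo.chatbot.handbook_bronze")
silver = (
    bronze
    .withColumn("chunk_text", F.explode(pdf_to_chunks("content")))
    .withColumn("chunk_id", F.monotonically_increasing_id())
    .select("chunk_id", "chunk_text")
)
silver.write.mode("overwrite").saveAsTable("handbook_demo.chatbot.handbook_silver_chunks")

# Delta Sync vector indexes require Change Data Feed on the source table.
spark.sql("""ALTER TABLE handbook_demo.chatbot.handbook_silver_chunks
             SET TBLPROPERTIES (delta.enableChangeDataFeed = true)""")
```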

Recommended Compute

The Databricks workspace used to test this accelerator is in the West US 2 region. The recommended cluster configuration is below (a Databricks SDK equivalent is sketched after the list).

  • Compute policy: unrestricted
  • Databricks Runtime: 14.3 LTS ML
  • Worker type: Standard_DS3_v2
  • Min workers: 1
  • Max workers: 2
  • Autoscaling: yes
  • Photon acceleration: no
  • Termination period: 10 minutes
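
If you prefer to create this cluster programmatically rather than through the UI, the same settings can be expressed with the Databricks Python SDK. This is a sketch under the assumption that the 14.3 LTS ML runtime key in your workspace is `14.3.x-cpu-ml-scala2.12`; check `w.clusters.spark_versions()` for the exact value:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import compute

w = WorkspaceClient()

# Mirrors the recommended configuration above: DS3_v2 workers,
# autoscaling 1-2, 10-minute auto-termination, Photon off.
cluster = w.clusters.create(
    cluster_name="handbook-chatbot",
    spark_version="14.3.x-cpu-ml-scala2.12",  # 14.3 LTS ML (assumed key)
    node_type_id="Standard_DS3_v2",
    autoscale=compute.AutoScale(min_workers=1, max_workers=2),
    autotermination_minutes=10,
    runtime_engine=compute.RuntimeEngine.STANDARD,
).result()
```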

Optimization

While building this accelerator, we made heavy use of the Lakehouse Optimizer (LHO) to analyze our compute performance and orchestration.

Early on, based on back-of-the-napkin estimates, we had configured the cluster for a minimum of 4 and a maximum of 8 workers. After evaluating the CPU Process Load and Process Memory Load KPIs in LHO, however, we scaled our compute down without a noticeable impact on performance.
