Red Hat Developers Hands-On Day 2023, Darmstadt

Event

Room: Helium, Darmstadtium
Time: November 29th, 2023, 15:30 - 18:00 CET

Instructors:

Introduction

Welcome to our hands-on LLM application development workshop! Today you'll learn how to develop a simple chatbot (or "GPT") that can answer questions based on your own documents, and how to deploy it with Podman and on OpenShift.

Software Stack

We will use a bunch of cool open source software to build our bot:

For local experimentation you should already have a Python 3.11 virtual env set up. We will also hand out access to an OpenShift cluster with GPUs and Red Hat OpenShift Data Science (RHODS), where you can use a Jupyter notebook, deploy your Streamlit app, or connect to a hosted Ollama service.

Basic Concepts

By now you should be familiar with commercial offerings like ChatGPT, Bing, Bard or Claude. If not, don't worry ;)

We're not going to cover GPT architecture today, remember it's going to be hands-on. If you're interested in theory we've got you covered in the Further Readings section below.

Instead, let's have a look at our Linuxbot example Streamlit app and go through its functionality step by step (a minimal sketch follows the list):

  • Streamlit and how to easily build interactive apps
  • Connecting to Ollama
  • What is Ollama and llama.cpp?
  • Models - Strange creatures and where to find them
  • Quantization? How to be GPU poor and local Llama rich
  • Prompting: System prompt vs user prompt
  • Tokenization, see OpenAI tokenizer
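To make the first few points concrete, here is a minimal sketch of a Streamlit chat app talking to Ollama. This is not the actual Linuxbot code; the model name, the default Ollama port, and the system prompt are assumptions for illustration.

```python
# minimal_linuxbot.py -- a minimal sketch, not the actual workshop app.
# Assumes an Ollama server is reachable at OLLAMA_URL (default port) and
# that the "mistral" model has already been pulled.
import requests
import streamlit as st

OLLAMA_URL = "http://localhost:11434"  # assumption: local Ollama on its default port
SYSTEM_PROMPT = "You are Linuxbot, a helpful assistant for Linux questions."  # assumed prompt

st.title("Linuxbot 🐧")

# Keep the chat history across Streamlit reruns.
if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay the conversation so far.
for msg in st.session_state.messages:
    st.chat_message(msg["role"]).write(msg["content"])

if prompt := st.chat_input("Ask me anything about Linux"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    st.chat_message("user").write(prompt)

    # Ollama's /api/chat endpoint takes the whole message history;
    # the system prompt goes first so the model stays in character.
    response = requests.post(
        f"{OLLAMA_URL}/api/chat",
        json={
            "model": "mistral",
            "messages": [{"role": "system", "content": SYSTEM_PROMPT}]
            + st.session_state.messages,
            "stream": False,
        },
        timeout=120,
    )
    answer = response.json()["message"]["content"]
    st.session_state.messages.append({"role": "assistant", "content": answer})
    st.chat_message("assistant").write(answer)
```

Run it with `streamlit run minimal_linuxbot.py` after pulling a model with `ollama pull mistral`.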

The basic architecture of our RAG bot will look like this:

(Figure: Basic architecture of a RAG bot)

See LlamaIndex High-level Concepts for what's needed to query our documents (a short sketch follows the list):

  • Embeddings & Vector Store: see this nice interactive Solara demo of embeddings with retrieval
  • Context
  • Retrieval augmented generation (RAG)
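As a rough illustration of those concepts, the following sketch builds an in-memory vector store over a local `data/` directory and queries it. It assumes a recent LlamaIndex package layout (llama-index-core plus the Ollama and HuggingFace integration packages); the embedding model and LLM names are placeholders.

```python
# rag_query.py -- a minimal RAG sketch with LlamaIndex and a local Ollama model.
# Assumes llama-index-core, llama-index-llms-ollama and
# llama-index-embeddings-huggingface are installed and ./data holds your documents.
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

# Local embedding model for the vector store, local LLM served by Ollama.
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
Settings.llm = Ollama(model="mistral", request_timeout=120.0)

# 1. Load documents, 2. chunk and embed them into an in-memory vector store.
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# 3. Retrieve the top-k most similar chunks and hand them to the LLM as context.
query_engine = index.as_query_engine(similarity_top_k=3)
print(query_engine.query("What do my documents say about kernel modules?"))
```

Under the hood this covers all three bullets above: it embeds the document chunks, stores the vectors, and at query time retrieves the most similar chunks as context for the LLM.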

Coincidentally, a few days ago LlamaIndex released RAGs, which is exactly what we are going to build today: Introducing RAGs: Your Personalized ChatGPT Experience Over Your Data

Since it was released shortly after this example app, you should have a look at what could be improved here. One thing not yet implemented in RAGs is local models. 🦙

A very good introduction to RAG can be found in RAG 101 for Enterprise.


It's time to build

Now that we've covered the very basics, it's time for learning by doing! First, modify our example bot: give it some custom data, try a different prompt, or try out different models. Some of them can have quite the personality.

Have a look at the included notebooks for examples of text summarization and natural-language SQL queries.
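For flavor, here is a tiny summarization sketch against Ollama's /api/generate endpoint (not one of the included notebooks); the model name and prompt wording are just assumptions.

```python
# summarize.py -- a tiny summarization sketch against Ollama's /api/generate
# endpoint; the model name and prompt wording are assumptions.
import requests

def summarize(text: str, model: str = "mistral") -> str:
    """Ask the local Ollama server for a three-sentence summary of `text`."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": model,
            "prompt": f"Summarize the following text in three sentences:\n\n{text}",
            "stream": False,
        },
        timeout=120,
    )
    return resp.json()["response"]

if __name__ == "__main__":
    with open("sample.txt") as f:  # assumption: any plain-text file to summarize
        print(summarize(f.read()))
```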

Some ideas for experimentation & improvement

We've collected a few ideas and tried to cluster them according to their required skill level:

Beginners:

  • Create your own bot; some inspiration for GPTs:
  • Build a simple prompt injection game where the user must guess a secret the GPT tries to hide
  • Don't generate embeddings in the Streamlit app (bad practice); instead, use a real vector database like Chroma, Weaviate, Qdrant, Milvus, or Pinecone. Even SQLite or Postgres can be used as a vector DB. See the sketch below.
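Here is a rough sketch of the vector-database idea with Chroma and LlamaIndex; the persistence path and collection name are assumptions, and it additionally needs the llama-index-vector-stores-chroma package. In a real app the ingestion step would live in a notebook or a separate job rather than in the Streamlit app itself.

```python
# chroma_store.py -- a sketch of keeping embeddings in Chroma instead of
# recomputing them inside the Streamlit app; path and collection name are assumptions.
import chromadb
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

# Persistent client writes the collection to ./chroma_db on disk.
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("linuxbot-docs")

vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Ingestion: build the index once, e.g. in a notebook or a separate job.
# (Embedding model configured via Settings, as in the earlier RAG sketch.)
documents = SimpleDirectoryReader("data").load_data()
VectorStoreIndex.from_documents(documents, storage_context=storage_context)

# In the app: attach to the already-populated collection instead of re-embedding.
index = VectorStoreIndex.from_vector_store(vector_store)
query_engine = index.as_query_engine()
```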

Intermediate:

  • Make complex texts and concepts, e.g. legislation, accessible to everyone. See this example.
  • Replace Ollama with a LiteLLM proxy (see the sketch after this list)
  • Port the Streamlit app to Solara
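For the LiteLLM idea, the following sketch shows how the app could talk to a LiteLLM proxy through the standard OpenAI client, since the proxy exposes an OpenAI-compatible API. The proxy URL, port, and model alias are assumptions that depend on how the proxy is configured.

```python
# litellm_client.py -- a sketch of calling a LiteLLM proxy instead of Ollama
# directly; the proxy URL, port and model alias are assumptions and depend on
# how the proxy is configured.
from openai import OpenAI

# LiteLLM exposes an OpenAI-compatible API, so the stock OpenAI client works.
client = OpenAI(base_url="http://localhost:4000", api_key="not-needed-locally")

response = client.chat.completions.create(
    model="ollama/mistral",  # model alias as configured in the LiteLLM proxy
    messages=[
        {"role": "system", "content": "You are Linuxbot."},
        {"role": "user", "content": "Explain cgroups in one paragraph."},
    ],
)
print(response.choices[0].message.content)
```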

Expert:

It's best if you find some other people interested in the same idea and change the table setup accordingly. The instructors will go from team to team, first to set up the infrastructure and then to help you implement your bot. Don't forget to ask your favorite (local) Llama.

Further Readings

General Introductions

Tokenization

Embeddings & Vector Databases

RAG

Treasure troves: