Room: Helium, Darmstadtium Time: November 29th, 2023 15:30 - 18:00 CET
Instructors:
- Rodeina Mohamed [email protected]
- Steffen Röcker [email protected]
Welcome to our hands-on LLM application development workshop! Today you'll learn how to develop a simple chatbot (or "GPT") that can answer based on your own documents and how to deploy it with Podman and on OpenShift.
We will use a bunch of cool open source software to build our bot:
For local experimentation you should have a Python 3.11 virtual env set up already. We will also hand out access to an OpenShift cluster with GPUs and Red Hat OpenShift Data Science (RHODS) where you can use a Jupyter notebook and deploy your streamlit app or connect to a hosted Ollama service.
By now you should be familiar with commercial offerings like ChatGPT, Bing, Bard or Claude. If not, don't worry ;)
We're not going to cover GPT architecture today, remember it's going to be hands-on. If you're interested in theory we've got you covered in the Further Readings section below.
Instead let's have a look at our Linuxbot example streamlit app and go through its functionality step by step:
- Streamlit and how to easily build interactive apps
- Connecting to Ollama
- What is Ollama and llama.cpp?
- Models - Strange creatures and where to find them
- Quantization? How to be GPU poor and local Llama rich
- Prompting: System prompt vs user prompt
- Tokenization, see OpenAI tokenizer
The basic architecture of our RAG bot will look like this:
See LlamaIndex High-level Concepts for what's needed to query our documents:
- Embeddings & Vector Store: see this nice interactive Solara demo of embeddings with retrieval
- Context
- Retrieval augmented generation (RAG)
Coincidentally few days ago LlamaIndex released RAGs, exactly what we are going to build today: Introducing RAGs: Your Personalized ChatGPT Experience Over Your Data
Since it was released shortly after this example app you should have a look what can improved here. One thing not implemented yet in RAGs are local models. 🦙
A very good introduction to RAG can be found in RAG 101 for Enterprise.
Now that we covered the very basics it's time for learning by doing! First we should modify our example bot and give it some custom data, a different prompt or try out different models. Some of them can have quite the personality.
Have a look at the included notebooks to see examples for text summary and natural SQL query.
We've collected a few ideas and tried to cluster them according to their required skill level:
Beginners:
- Create your own bot, some inspiration for GPTs:
- Build a simple prompt injection game where the user must guess a secret the GPT tries to hide
- Don't generate embeddings in the streamlit app (bad practice) and utilize a real vector database like Chroma, Weaviate, Qdrant, Milvus, Pinecone. Even SQLite or Postgres can be used as vector DB.
Intermediate:
- Make complex texts and concepts, e.g legislature, accessible for everyone. See this example.
- Replace Ollama with a LiteLLM proxy
- Port the streamlit app to Solara
Expert:
- Create a multimodal bot that can understand images (LLaVA or BakLLaVA) and speech (Whisper)
- Create a loader for Kiwix ZIM files (e.g Wikipedia)
- Add Ollama embedding REST API support to LlamaIndex
It's best if you find some other people interested in the same idea and change the table setup accordingly. The instructors will go from team to team. First to setup the infrastructure and then to help you implement your bot. Don't forget to ask your favorite (local) LlaMA.
- Andrej Karpathy - The busy person's intro to LLMs (slides)
- Matthias Plappert - Understanding LLMs An Introduction to Modern Language Modeling
- The Illustrated Transfomer
- Jeremy Howard - A Hacker's Guide to Language Models
- Cohere - Word and Sentence Embeddings
- Weaviate - A Gentle Introduction to Vector Databases
- Weaviate - Vector Embeddings Explained
- RAG 101 for Enterprise
- Prompt Engineering Techniques - RAG (with history)
- WandB - A Gentle Introduction to Retrieval Augmented Generation (RAG)
- Arize - Introduction to retrieval augmented generation
Treasure troves: