- Chatbot - Add LLM connection closures for non-streaming ad-hoc calls (e.g. CoT calls). This has removed the resource warning as identified in Issue #12. Improved debug messages.
- Chatbot - Update Chain of Thought (CoT) to check request before routing all prompts through the CoT process. Using
/think always
will force CoT for all requests. Additionally, CoT prompts updated for better responses.
- Chatbot - Add Chain of Thought (CoT) thinking option using the
/think on
or/think off
toggles to the UI. When activated, queries will be passed through an out-of-band CoT loop to allow the LLM to thoughtfully explore answer and then provide a conclusion summary to the user. Set environmental variable "THINKING" to "true" to default all conversations to CoT mode.
- Chatbot - Fix error handling bug used to auto-detect max content length of LLM. Updated user input UI rendering to better handle indention.
- News Bot Script - Added logic to verify news summary from LLM to help prevent hallucinations.
- DocMan - Add basic authentication and secure connection options to Weaviate.
- Chatbot - Add support for HEIC file type and resize all images to max dimensions of 1024. Handle image pasting into input field. Remove previous images from context thread.
- Chatbot - Clean up logging: non-critical logs are moved to DEBUG level.
- Chatbot - Allows user to drag and drop images into the context window for multi-modal vision LLMs.
- DocMan - Updated to use progressive loading to help with larger document and chunk lists. Performance and bug fixes.
- Chatbot - Updated /rag commands to allow turning auto-RAG on and off, setting the collection and result number.
- DocMan - Switch to async and socket communication to more responsive UI. Bug fixes.
- Chatbot and DocMan: Provide control for WEAVIATE_HOST and WEAVIATE_GRPC_HOST (and PORTs) settings separately via environmental variables.
- DocMan: Bug fixes
- DocMan: Fix some bugs and add features to process more document types (file or URL).
- Chatbot: Using Document class for RAG functions.
- DocMan: New web based UI for managing documents in the Weaviate vector database. Allows user to upload and embed content from URLs and uploaded files. Provides optional chunking and management of embedded documents.
- Chatbot: Fix a bug that was counting null tokens.
- Chatbot: Add toxic filter option (uses environmental variable
TOXIC_THRESHOLD
) to analyze and filter out bad prompts. Uses LLM to evaluate and score prompt. Set variable between 0 and 1 or 99 to disable (default). - Chatbot: Add
EXTRA_BODY
variable (JSON string) to customize chat completion calls.
- Chatbot: Add logic to detect OpenAI URL and disable non-OpenAI stop_token_ids.
- Chatbot: Fix issue where DOM was being corrupted by popup. New logic creates separate div for conversation debug.
- Chatbot: Add
Debug Session
link to footer to display conversation thread.
- Chatbot: Update some RAG to remove duplicate documents.
- Update TemplateResponse arguments to current format as reported in #7.
Chatbot
- Expand
/news/
RAG command to include reference URL links in news article headlines. - Add response statistics (number of tokens and tokens per second) to footer.
- Serve up local copy of socket.io.js library to help with air-gap installations.
- Add logic to chatbot to support OpenAI API servers that do not support the
/v1/models
API. This allows the Chatbot to work with Ollama provided the user specifies the LLM_MODEL, example docker run script:
docker run \
-d \
-p 5000:5000 \
-e PORT=5000 \
-e OPENAI_API_KEY="Asimov-3-Laws" \
-e OPENAI_API_BASE="http://localhost:11434/v1" \
-e LLM_MODEL="llama3" \
-e USE_SYSTEM="false" \
-e MAXTOKENS=4096 \
-e TZ="America/Los_Angeles" \
-v $PWD/.tinyllm:/app/.tinyllm \
--name chatbot \
--restart unless-stopped \
jasonacox/chatbot
- Add chatbot workaround for Meta Llama-3 support via stop token addition.
- Add logic to better handle model maximum context length errors with automated downsizing.
- Error handling and auto-retry for model changes on LLM.
- Add intuitive UI control at top of user input area to allow user to resize text input box.
- Add error checking and help for
/stock {company}
command. - Allow user input textarea to be resized vertically.
- Fixed bug with baseprompt updates to respond to saved Settings or new sessions.
- Updated baseprompt to include date and guidance for complex and open-ended questions.
- Add
TZ
local timezone environmental variable to ensure correct date in baseprompt.
- Added ability to change LLM Temperature and MaxTokens in settings.
- Added optional prompt settings read-only options to allow viewing but prevent changes (
PROMPT_RO=true
).
- Moved from Qdrant to Weaviate - This externalizes the sentence transformation work and lets the chatbot run as a smaller service. Activate by setting
WEAVIATE_HOST
to the address of the DB. - Added "References" text to output from
/rag
queries. - Added
ONESHOT
environmental variable that ifTrue
will remove conversation threading allowing each query to be answered as a standalone sessions. - Added
RAG_ONLY
environmental variable that ifTrue
will assume all queries should be directed to the default RAG database as set byWEAVIATE_LIBRARY
. - See #5
docker run \
-d \
-p 5000:5000 \
-e PORT=5000 \
-e OPENAI_API_BASE="http://localhost:8000/v1" \
-e ONESHOT="true" \
-e RAG_ONLY="false" \
-e WEAVIATE_HOST="localhost" \
-e WEAVIATE_LIBRARY="tinyllm" \
-v $PWD/.tinyllm:/app/.tinyllm \
--name chatbot \
--restart unless-stopped \
jasonacox/chatbot
- Add CUDA support for sentence transformers.
- Improve web page import function
extract_text_from_html()
for better RAG formatting. - Add RAG instructions for Weaviate Vector DB
-
Added logic to poll LLM for model list. If only one model is available, use that. Otherwise verify the user requested model is available.
-
Chatbot UI now shows model name and adds responsive elements to better display on mobile devices.
-
Add encoding user prompts to correctly display html code in Chatbot.
-
Fix
chat.py
CLI chatbot to handle user/assistant prompts for vLLM.
- Bug fix for
handle_url_prompt()
to extract text from URL.
- Speed up command functions using async, using
aiohttp
. - Fix prompt_expand for rag command.
- Added topic option to
/news
command.
- Speed up user prompt echo. Immediately send to chat windows instead of waiting for LLM stream to start.
- Optimize message handling dispatching using async.
- Use AsyncOpenAI for non-streamed queries.
- Ported Chatbot to the async FastAPI and Uvicorn ASGI high speed web server implementation (#3).
- Added /stats page to display configuration settings and current stats (optional
?format=json
) - UI updated to help enforce focus on text entry box.
- Moved
prompts.json
and Sentence Transformer model location to a./.tinyllm
for Docker support.
- Add
/stats
URL to Chatbot for settings and current status information. - Update Chatbot HTML to set focus on user textbox.
- Move
prompts.json
and Sentence Transformer models into.tinyllm
directory.
- Improve Chatbot for Docker
- Added admin alert broadcast feature (
POST /alert
)
- Add multi-line entry to prompt input using Shift-Enter.
- Fix HTML and CSS to support windows resize for settings dialogue box.
- Bug fix and Simplify RAG commands using slash prompts.
Commands: /reset /version /sessions /rag /news /weather /stock
- vLLM provides a faster inference engine capable of handling multiple session simultaneously. It also runs well in Nvidia Docker containers. The llama-cpp-python implementation suffers from being single threaded and being fragile in containers (segment faults and core dumps). TODO: vLLM does not support older Nvidia cards by default. TODO: Provide instructions on modifying vLLM to run on Pascal based GPUs (e.g. Nvidia GTX 1060, Quadro P6000 or Tesla P100).
- Chatbot: System prompts are not needed by vLLM as it does the translation based on the model being used. Using system prompts is now a configuration toggle in chatbot.
- Updated default prompts.
- Minor formatting updates
- Settings button allows user to update base and query prompts for the chatbot.
- LLMserver: Added chat format parameters to llama-cpp-python startup to ensure correct chat prompts are given to LLM based on model. See https://github.com/abetlen/llama-cpp-python/blob/main/llama_cpp/llama_chat_format.py and consolidated list: https://github.com/jasonacox/TinyLLM/blob/main/llmserver/models/services/chatformats
- LLMserver: Updated tinyllm startup script to include
restart
command. - Chatbot: Added
/news
RAG command to chatbot which will cause it to attempt to fetch the latest news and have the LLM summarize it for you.
- Chatbot: Added
:
commands that will run a classifier on the prompt to determine RAG method to inform the LLM with current data to provide the response.
- Chatbot: Added "Copy code" button to code excerpts in LLM response.
- Chatbot: Added
@
and!
commands to pull prompt data documents from vector databse for RAG responses.