A flexible and customizable template for building chat applications with multiple LLM backends. This template provides a clean, responsive user interface and a server-side implementation that streams responses from different language models.

## Features
- Multi-model support: Use local LLMs through LM Studio or Google's Gemini model
- Real-time streaming: See responses as they're generated
- Session management: Maintain conversation context across messages
- Responsive design: Works on desktop and mobile devices
- Simple and clean UI: Bootstrap-based interface with minimal dependencies
## Requirements

- Python 3.8+
- LM Studio (for local LLM support)
- Google API key (for Gemini model)
## Installation

1. Clone this repository:

   ```bash
   git clone https://github.com/jero98772/LLM-Chat-template.git
   cd LLM-Chat-template
   ```

2. Create a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Set up your environment:

   - Start LM Studio with the server running on `http://localhost:1234/v1`
   - Set your Google API key in the `app.py` file (see the sketch after these steps)

5. Start the server:

   ```bash
   python app.py
   ```

6. Open your browser and navigate to `http://localhost:8000`.
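Steps 4 and 5 assume that `app.py` holds the Gemini API key and launches the server itself. A minimal sketch of how that is usually wired; the variable names and host settings below are assumptions, not the template's exact code:

```python
# Illustrative excerpt of an app.py entry point (names are assumptions).
import uvicorn
from fastapi import FastAPI

app = FastAPI()

# Hypothetical placeholder: replace with your real key, or load it from an
# environment variable instead of hard-coding it.
GOOGLE_API_KEY = "YOUR_GOOGLE_API_KEY"

if __name__ == "__main__":
    # `python app.py` starts Uvicorn on the port opened in step 6.
    uvicorn.run(app, host="0.0.0.0", port=8000)
```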
## Project Structure

```
llm-chat-template/
├── app.py              # FastAPI application
├── requirements.txt    # Python dependencies
├── static/             # Static files
│   ├── css/
│   │   └── style.css   # Custom CSS
│   └── js/
│       └── script.js   # Frontend JavaScript
└── templates/          # HTML templates
    └── index.html      # Main chat interface
```
## Model Configuration

The application supports two LLM backends:
1. LM Studio / Local LLM
   - Uses the OpenAI-compatible API
   - Default model: `TheBloke/dolphin-2.2.1-mistral-7B-GGUF`
   - Configure in `app.py` by changing the `base_url` and `model` parameters (see the sketch after this list)
2. Google Gemini
   - Uses the Google GenerativeAI API
   - Default model: `gemini-2.0-flash`
   - Configure in `app.py` by changing the API key and model name
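For orientation, here is a minimal sketch of what the two `chat_answer_*` streaming functions in `app.py` typically look like. The function names, the `temperature`/`max_tokens` values, and the placeholder API keys are assumptions for illustration; only the base URL and default model names come from the settings above:

```python
from openai import OpenAI
import google.generativeai as genai

# LM Studio exposes an OpenAI-compatible server; point the client at it.
# The api_key value is ignored by LM Studio but required by the client.
lm_client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

def chat_answer_local(messages):
    """Stream tokens from the local LM Studio model."""
    stream = lm_client.chat.completions.create(
        model="TheBloke/dolphin-2.2.1-mistral-7B-GGUF",
        messages=messages,
        temperature=0.7,   # assumed value; see Customization below
        max_tokens=1024,   # assumed value
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            yield delta

# Gemini backend
genai.configure(api_key="YOUR_GOOGLE_API_KEY")  # set in app.py per the installation steps

def chat_answer_gemini(prompt):
    """Stream tokens from the Gemini model."""
    model = genai.GenerativeModel("gemini-2.0-flash")
    for chunk in model.generate_content(prompt, stream=True):
        if chunk.text:
            yield chunk.text
```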
## Customization

You can customize various aspects of the application:

- UI: Modify the HTML and CSS files in the `templates` and `static` directories
- Model Parameters: Update `temperature`, `max_tokens`, etc. in the `chat_answer_*` functions
- Streaming Behavior: Adjust the streaming implementation in both backend and frontend code (a backend sketch follows)
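On the backend, streaming usually amounts to returning a `StreamingResponse` that wraps a token generator; the frontend in `script.js` then reads the response body incrementally. The sketch below is a stand-alone illustration only: the endpoint path, payload shape, and echo generator are assumptions, and in the template the generator would delegate to one of the `chat_answer_*` functions:

```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.post("/chat")  # hypothetical endpoint path; check script.js for the real one
async def chat(payload: dict):
    messages = payload.get("messages", [])

    def token_stream():
        # Placeholder: echo the last user message word by word.
        # In the template this would iterate over a chat_answer_* generator instead.
        last = messages[-1]["content"] if messages else ""
        for word in last.split():
            yield word + " "

    # The frontend reads this response incrementally and appends each chunk to the chat.
    return StreamingResponse(token_stream(), media_type="text/plain")
```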
## Adding a New LLM Provider

To add a new LLM provider:

1. Create a new `chat_answer_*` function in `app.py`
2. Add the model to the dropdown in `index.html`
3. Update the `modelDetails` object in `script.js`
4. Modify the `generate_stream` function to handle the new model (see the sketch after this list)
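The backend half of these steps follows a simple dispatch pattern. The sketch below is illustrative only: `chat_answer_myprovider`, the dropdown value, and the exact shape of `generate_stream` are assumptions, not the template's literal code:

```python
def chat_answer_myprovider(messages):
    """Step 1: a streaming generator for the hypothetical new provider."""
    # Replace this stub with calls to the provider's SDK, yielding text chunks.
    for chunk in ("This ", "is ", "a ", "placeholder ", "stream."):
        yield chunk

def generate_stream(model_name, messages):
    """Step 4: route the model chosen in the index.html dropdown (and described
    in the modelDetails object in script.js) to its chat_answer_* function."""
    handlers = {
        "my-provider-model": chat_answer_myprovider,  # hypothetical dropdown value
        # The existing backends would be registered here as well, e.g.:
        # "gemini-2.0-flash": chat_answer_gemini,
        # "TheBloke/dolphin-2.2.1-mistral-7B-GGUF": chat_answer_local,
    }
    handler = handlers.get(model_name, chat_answer_myprovider)
    return handler(messages)
```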
## Persistent Storage

The current implementation uses in-memory storage. To add persistent storage:
- Add a database connection (SQLite, PostgreSQL, etc.), as sketched below
- Modify the chat history functions to use the database
- Add user authentication if needed
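A minimal sketch of the first two steps using the standard-library `sqlite3` module; the table layout and function names are assumptions, not part of the template:

```python
import sqlite3

# One shared connection; check_same_thread=False allows use from FastAPI workers.
conn = sqlite3.connect("chat_history.db", check_same_thread=False)
conn.execute(
    """CREATE TABLE IF NOT EXISTS messages (
           session_id TEXT,
           role       TEXT,      -- "user" or "assistant"
           content    TEXT,
           created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
       )"""
)

def save_message(session_id: str, role: str, content: str) -> None:
    """Persist one chat turn for a session."""
    conn.execute(
        "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
        (session_id, role, content),
    )
    conn.commit()

def load_history(session_id: str):
    """Return the stored conversation in insertion order."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE session_id = ? ORDER BY rowid",
        (session_id,),
    ).fetchall()
    return [{"role": role, "content": content} for role, content in rows]
```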
## Dependencies

- FastAPI: Web framework
- Uvicorn: ASGI server
- Jinja2: Template engine
- OpenAI: For LM Studio API compatibility
- Google GenerativeAI: For Gemini API
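The repository already ships its own `requirements.txt`; an unpinned version covering just the packages above would look like this (actual version pins may differ):

```text
fastapi
uvicorn
jinja2
openai
google-generativeai
```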
## License

This project is licensed under the GPLv3 License - see the LICENSE file for details.