Rag with Memory is a project that uses the Llama 2 7B chat model to perform RAG (Retrieval-Augmented Generation) on uploaded documents. It operates in a chat-based setting with short-term memory: the previous K conversation turns are summarized into a standalone question so the context carries forward. Inspired by LangChain - https://python.langchain.com/docs/use_cases/question_answering/#adding-memory
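As a rough illustration, here is a minimal Python sketch of that condense-question pattern (an assumed shape, not the repo's exact code; the `generate` callable stands in for any Llama 2 completion function):

```python
from typing import Callable, List, Tuple

# Rewrites the recent chat history plus a follow-up into one standalone question.
CONDENSE_PROMPT = """Given the following conversation and a follow-up question, \
rephrase the follow-up question to be a standalone question.

Chat history:
{history}

Follow-up question: {question}
Standalone question:"""

def condense_question(
    history: List[Tuple[str, str]],   # (user, assistant) turns
    question: str,
    generate: Callable[[str], str],   # any LLM text-completion function
    k: int = 3,                       # only the last K turns are kept
) -> str:
    recent = history[-k:]
    turns = [f"User: {u}\nAssistant: {a}" for u, a in recent]
    prompt = CONDENSE_PROMPT.format(history="\n".join(turns), question=question)
    return generate(prompt).strip()
```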
The vector database comes from the amazing Vlite repository, a lightweight store built on NumPy - https://github.com/sdan/vlite
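To make the idea concrete, here is a tiny NumPy-backed vector store (purely illustrative; Vlite's actual API differs, see its repo):

```python
import numpy as np

class TinyVectorStore:
    """Embeddings live in one matrix; retrieval is a cosine-similarity argsort."""

    def __init__(self, dim: int):
        self.vectors = np.empty((0, dim), dtype=np.float32)
        self.texts = []

    def add(self, text: str, embedding: np.ndarray) -> None:
        vec = embedding / np.linalg.norm(embedding)   # normalize for cosine
        self.vectors = np.vstack([self.vectors, vec.astype(np.float32)])
        self.texts.append(text)

    def search(self, query_embedding: np.ndarray, top_k: int = 3) -> list:
        q = query_embedding / np.linalg.norm(query_embedding)
        scores = self.vectors @ q                     # cosine similarities
        best = np.argsort(scores)[::-1][:top_k]
        return [self.texts[i] for i in best]
```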
The UI lets you change the current prompt and experiment with the most important text generation parameters, and a counter in the left corner tracks the tokens generated over the session.
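A hedged Streamlit sketch of what such controls might look like (widget names and defaults are illustrative, not the repo's exact code):

```python
import streamlit as st

# Sidebar controls for the prompt and the main text generation parameters.
system_prompt = st.sidebar.text_area("Prompt", "You are a helpful assistant.")
temperature = st.sidebar.slider("Temperature", 0.0, 2.0, 0.7, 0.05)
top_p = st.sidebar.slider("Top-p", 0.0, 1.0, 0.95, 0.01)
max_new_tokens = st.sidebar.slider("Max new tokens", 32, 2048, 512, 32)

# A running token counter that survives Streamlit reruns via session state.
if "tokens_generated" not in st.session_state:
    st.session_state.tokens_generated = 0

def record_generation(new_tokens: int) -> None:
    st.session_state.tokens_generated += new_tokens

st.sidebar.caption(f"Tokens this session: {st.session_state.tokens_generated}")
```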
To get started with Rag with Memory, follow these steps:
```bash
git clone https://github.com/JINO-ROHIT/rag-with-memory.git
cd rag-with-memory
streamlit run app.py
```
Before running the app, make sure to add your Hugging Face token to the `.env` file.
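A minimal `.env` might look like this (the `HF_TOKEN` variable name is an assumption; check how `app.py` reads the token):

```
# .env - placeholder value, replace with your own Hugging Face token
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxx
```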
- Open the application in your web browser.
- Upload a document for RAG processing.
- Chat with the document; short-term memory keeps recent turns in context.
- Use the UI to experiment with different prompts and text generation parameters.
Key features:
- Regular RAG over the uploaded document using a NumPy-backed database (Vlite); a sketch of the full retrieve-and-answer step follows this list
- Chat-based setting with short-term memory
- UI for changing prompts and experimenting with text generation parameters
- Token count tracking in the UI
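Putting retrieval and generation together, a hedged sketch of the answer step (helper names such as `embed` and `generate` are placeholders, not the repo's actual functions):

```python
# Illustrative RAG answer step: fetch the top-k chunks for the (condensed)
# question, then prompt the chat model with them as context.
ANSWER_PROMPT = """Answer the question using only the context below.

Context:
{context}

Question: {question}
Answer:"""

def answer(question: str, store, embed, generate, top_k: int = 3) -> str:
    chunks = store.search(embed(question), top_k=top_k)  # e.g. TinyVectorStore above
    context = "\n\n".join(chunks)
    prompt = ANSWER_PROMPT.format(context=context, question=question)
    return generate(prompt)
```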
If you'd like to contribute to Rag with Memory, follow these steps:
- Fork the repository.
- Create a new branch (`git checkout -b feature/new-feature`).
- Make your changes and commit them (`git commit -am 'Add new feature'`).
- Push to the branch (`git push origin feature/new-feature`).
- Create a new pull request.
This project is licensed under the MIT License - see the LICENSE file for details.
For any questions or feedback, feel free to reach out:
Email: [email protected]