This project implements a specialized Farmer Query Assistant that provides accurate, instant answers to farmer queries, trained on Kisan Call Center (KCC) data. It uses a Retrieval-Augmented Generation (RAG) architecture to answer questions related to agriculture and horticulture, leveraging a fine-tuned LLM and a vector database for context-aware responses.
The entire application is packaged for serverless GPU deployment using Modal.
- Overview
- How it Works (Architecture)
- Technology Stack
- Project Structure
- Deployment with Modal
- How to Use
The core of this project is a `modal.App` that serves a Gradio web interface. This UI allows users to select a category and ask a farming-related question.
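Below is a minimal, hypothetical sketch of that serving pattern (not the project's actual script): a Modal function exposed as an ASGI app with a Gradio interface mounted on it. The image contents, function options, and the placeholder `answer` callback are illustrative assumptions.

```python
# Hypothetical sketch: a Modal function that serves a Gradio UI as an ASGI app.
import modal

app = modal.App("farmer-assistant")
image = modal.Image.debian_slim().pip_install("gradio", "fastapi[standard]")


@app.function(image=image)
@modal.asgi_app()
def gradio_app():
    import gradio as gr
    from fastapi import FastAPI

    def answer(category: str, query: str) -> str:
        # Placeholder: the real app runs the full RAG pipeline here.
        return f"[{category}] {query}"

    demo = gr.Interface(
        fn=answer,
        inputs=[
            gr.Dropdown(choices=["Plant Protection"], label="Category"),
            gr.Textbox(label="Your Question"),
        ],
        outputs=gr.Textbox(label="Answer"),
        title="🌾 Farmer Query Assistant",
    )
    # Mount the Gradio Blocks onto a FastAPI app and hand it back to Modal.
    return gr.mount_gradio_app(FastAPI(), demo, path="/")
```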
- RAG Pipeline: Uses a ChromaDB vector store to retrieve relevant Q&A pairs from a knowledge base (`train.pkl`) before generating an answer.
- High-Performance LLM: Utilizes the `meta-llama/Meta-Llama-3.1-8B` model, fine-tuned with PEFT adapters (`naveenng10/farmer-assistant`).
- Optimized Inference: The model is loaded with 4-bit quantization (`BitsAndBytesConfig`) and runs on a T4 GPU for efficient inference (see the loading sketch after this list).
- Strong Guardrails:
  - Uses a keyword-based filter (`is_agriculture_related`) to immediately reject off-topic questions (sports, politics, etc.).
  - Provides a detailed system prompt to the LLM, instructing it to only answer farming-related queries.
- Vector Search: Employs `sentence-transformers/all-MiniLM-L6-v2` for embedding queries and documents.
- Multilingual Support: Includes a `Helsinki-NLP/opus-mt-mul-en` model to translate model outputs (and potentially inputs) to English, ensuring consistent response language.
- Serverless Deployment: Fully configured for deployment on Modal, handling environment setup, model downloads, and scaling automatically.
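As a rough illustration of the quantized loading described above, the sketch below loads the base model in 4-bit precision and attaches the PEFT adapters. The exact `BitsAndBytesConfig` settings are assumptions, not the project's verified configuration (float16 compute is chosen here because the doc targets a T4 GPU).

```python
# Sketch (assumed settings): load the base model with 4-bit quantization and
# attach the fine-tuned PEFT adapters.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

BASE_MODEL = "meta-llama/Meta-Llama-3.1-8B"
ADAPTERS = "naveenng10/farmer-assistant"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # T4 has no bfloat16 support
)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, ADAPTERS)

inputs = tokenizer("How do I control aphids on chilli plants?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```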
The application follows a RAG pipeline with strict guardrails:
1. User Input: A user provides a `category` and a `query` via the Gradio UI.
2. Guardrail 1 (Keyword Filter): The query is first checked by the `is_agriculture_related` function. If it contains blocked keywords (e.g., "cricket", "politics"), the process stops and returns a default rejection message.
3. RAG - Retrieve:
   - The user's `query` is encoded into a vector embedding.
   - ChromaDB is queried to find the top 3 most similar questions from its database (which was built from `train.pkl`).
   - A relevance check (`is_query_relevant`) is performed based on the distance of the results.
4. RAG - Augment:
   - The similar questions and their corresponding answers are retrieved from the vector store's metadata.
   - This information is formatted into a `context` string.
5. Prompt Engineering: A final prompt is constructed containing (steps 2-5 are sketched in code after this list):
   - System Message: A strict set of rules defining the assistant's role and limitations.
   - RAG Context: The similar Q&A pairs found in step 4.
   - User Query: The original question from the user.
6. Generation: The complete prompt is sent to the fine-tuned Llama 3.1 model (`FarmerAssistantModel`) to generate an answer.
7. Post-processing: The generated text is passed through the `translate_to_english` function to ensure the final output is in English.
8. Response: The final answer is displayed to the user in the Gradio interface.
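The sketch below strings together steps 2 to 5 under assumed names: the blocked-keyword list, collection name, metadata key, and prompt wording are illustrative guesses, not the project's exact code.

```python
# Illustrative sketch of the query-time flow: keyword guardrail, vector
# retrieval from ChromaDB, and prompt assembly. Details are assumptions.
import chromadb
from sentence_transformers import SentenceTransformer

BLOCKED_KEYWORDS = {"cricket", "football", "politics", "movie"}  # assumed list

embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="src/data/chroma_database")
collection = client.get_or_create_collection("kcc_queries")  # assumed name


def is_agriculture_related(query: str) -> bool:
    # Guardrail 1: reject queries containing any blocked keyword.
    return not any(word in query.lower() for word in BLOCKED_KEYWORDS)


def build_prompt(category: str, query: str) -> str:
    if not is_agriculture_related(query):
        return ""  # caller returns the default rejection message

    # Retrieve: embed the query and fetch the 3 most similar stored questions.
    embedding = embedder.encode(query).tolist()
    results = collection.query(query_embeddings=[embedding], n_results=3)

    # Augment: pull the stored answers from metadata and build a context string.
    context = "\n".join(
        f"Q: {doc}\nA: {meta['answer']}"
        for doc, meta in zip(results["documents"][0], results["metadatas"][0])
    )

    # Prompt engineering: system rules + RAG context + user query.
    return (
        "You are a farming assistant. Answer only agriculture and horticulture "
        "questions, using the reference Q&A where relevant.\n\n"
        f"Reference Q&A:\n{context}\n\n"
        f"Category: {category}\nQuestion: {query}\nAnswer:"
    )
```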
- Deployment & Infrastructure: Modal
- Web UI: Gradio
- LLM: `meta-llama/Meta-Llama-3.1-8B`
- Fine-tuning: `peft` (adapters loaded from `naveenng10/farmer-assistant`)
- Quantization: `bitsandbytes`
- Core AI/ML: `torch`, `transformers`, `accelerate`
- Vector Database (RAG): `chromadb`
- Embedding Model: `sentence-transformers/all-MiniLM-L6-v2`
- Translation Model: `Helsinki-NLP/opus-mt-mul-en` (sketched below)
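To show how the post-processing step (`translate_to_english` in the pipeline) might look, here is a minimal sketch using the listed Marian model through standard `transformers` classes. The function name mirrors the script, but its body is an assumption.

```python
# Sketch of multilingual-to-English translation with the listed Marian model.
from transformers import MarianMTModel, MarianTokenizer

MODEL_ID = "Helsinki-NLP/opus-mt-mul-en"
tokenizer = MarianTokenizer.from_pretrained(MODEL_ID)
model = MarianMTModel.from_pretrained(MODEL_ID)


def translate_to_english(text: str) -> str:
    # Encode the (possibly non-English) text, generate, and decode to English.
    batch = tokenizer([text], return_tensors="pt", truncation=True)
    generated = model.generate(**batch, max_new_tokens=256)
    return tokenizer.decode(generated[0], skip_special_tokens=True)


print(translate_to_english("धान की फसल में खरपतवार कैसे नियंत्रित करें?"))
```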
For the Modal app to build correctly, your local directory should be structured as follows. The script assumes it is located in `src/inference/`.
your-project-root/
├── src/
│ ├── data/
│ │ ├── train.pkl # Required: Pickle file with Q&A data
│ │ └── chroma_database/ # Will be created by the script if it doesn't exist
│ │
│ └── inference/
│ ├── farmer-assistant.py # The main Modal script
│ └── constants.py # Required: Must contain QUERY_TYPE list
│
└── ... (other files)
Note: The script uses relative paths (`os.path.dirname(__file__)`) to find the `src/data` directory and `constants.py`.
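The vector store under `src/data/chroma_database/` is built from `train.pkl`. A hedged sketch of that indexing step is below; the pickle's layout (a DataFrame with `question` and `answer` columns) and the collection name are assumptions about the data, not documented facts.

```python
# Sketch of indexing train.pkl into ChromaDB. Column names, collection name,
# and embedding handling are assumptions about the data layout.
import os
import pandas as pd
import chromadb
from sentence_transformers import SentenceTransformer

DATA_DIR = os.path.join(os.path.dirname(__file__), "..", "data")

df = pd.read_pickle(os.path.join(DATA_DIR, "train.pkl"))  # assumed question/answer columns
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

client = chromadb.PersistentClient(path=os.path.join(DATA_DIR, "chroma_database"))
collection = client.get_or_create_collection("kcc_queries")  # assumed name

questions = df["question"].tolist()
collection.add(
    ids=[str(i) for i in range(len(questions))],
    documents=questions,
    embeddings=embedder.encode(questions).tolist(),
    metadatas=[{"answer": a} for a in df["answer"].tolist()],
)
```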
Follow these steps to deploy the Farmer Assistant.
- A Modal account.
- Python 3.8+ installed.
- A Hugging Face account with an access token. You must have accepted the license terms for `meta-llama/Meta-Llama-3.1-8B`.
1. Install the Modal client: `pip install modal`
2. Set up Modal authentication: `modal setup`. This will open a browser window to link your local machine to your Modal account.
3. Create a Hugging Face secret: The application requires your Hugging Face token to download the Llama model. Create a secret in Modal named `huggingface-secret` with your token: `modal secret create huggingface-secret HF_TOKEN="hf_YOUR_HUGGING_FACE_TOKEN"` (a sketch of how a Modal function consumes this secret follows these steps).
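For reference, a Modal function typically consumes the secret by attaching it and reading the token from the environment. The sketch below is an assumption about how the script might wire this up, not a copy of it.

```python
# Sketch: attaching the Hugging Face secret to a Modal function.
# The function body is illustrative only.
import modal

app = modal.App("farmer-assistant")


@app.function(
    gpu="T4",
    secrets=[modal.Secret.from_name("huggingface-secret")],
)
def download_model():
    import os
    from huggingface_hub import snapshot_download

    # HF_TOKEN is injected into the environment by the attached secret.
    snapshot_download(
        "meta-llama/Meta-Llama-3.1-8B",
        token=os.environ["HF_TOKEN"],
    )
```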
You can either run the app in "serve" mode for development or "deploy" it for a persistent endpoint.
- To Serve (Development Mode): This command runs the app from your local directory and streams logs to your terminal. The app will hot-reload if you save changes to the file, and it will stop when you press `Ctrl+C`. Run: `modal serve src/inference/farmer-assistant.py`
- To Deploy (Production Mode): This command builds the container image, uploads your files, and creates a permanent, shareable URL for your Gradio application. Run: `modal deploy src/inference/farmer-assistant.py`
After running either command, Modal will output a URL (e.g., https://your-username--farmer-assistant-gradio-app.modal.run) where you can access the Gradio UI.
- Open the Modal URL provided after deployment.
- You will see the 🌾 Farmer Query Assistant interface.
- Select a category from the dropdown menu (e.g., "Plant Protection").
- Type your farming-related question in the "Your Question" text box.
- Click the "Submit" button.
- The assistant will process your query through the RAG pipeline and display the answer in the output box.