Emma is an AI-powered interactive handbook designed specifically for Ignatian Marians at the University of the Immaculate Conception (UIC). She provides instant answers about academic policies, campus life, and student services - no handbook skimming required.
- Academic Policy Guidance - Get clear explanations about attendance, grading, and course requirements
- Campus Life Information - Learn about events, facilities, and resources available on campus
- Student Services Support - Navigate administrative processes, support services, and more
- Natural Language Interface - Ask questions in everyday language, just like chatting with a friend
Emma is built using:
- Google Gemma 3 - For natural language understanding and generation
- LM Studio - For local model deployment and management
- Tailwind CSS - For responsive and elegant UI design
- Vite - For lightning-fast frontend development
- ChromaDB - For vector database and semantic search capabilities
To run Emma locally, you'll need:

- Node.js (v18 or higher)
- Python (v3.9 or higher)
- Git
- LM Studio with the following models downloaded and available:
  - `gemma-3-4b-it-qat`
  - `gemma-3-12b-it-qat` (optional, for vision/OCR during ingestion)
  - `text-embedding-nomic-embed-text-v1.5`
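Before going further, you can verify that LM Studio's local server is reachable and the models are loaded. The snippet below is a minimal sketch in Python, assuming LM Studio's OpenAI-compatible server at its default address `http://localhost:1234` (the same endpoint used during ingestion below); the exact model identifiers may differ depending on how LM Studio names the downloaded files.

```python
# Sanity check: is LM Studio's local server up, and are the expected models loaded?
# Assumes the OpenAI-compatible endpoint at http://localhost:1234 (LM Studio's default).
import requests

resp = requests.get("http://localhost:1234/v1/models", timeout=5)
resp.raise_for_status()
available = [m["id"] for m in resp.json()["data"]]
print("Models served by LM Studio:", available)

for needed in ("gemma-3-4b-it-qat", "text-embedding-nomic-embed-text-v1.5"):
    if not any(needed in model_id for model_id in available):
        print(f"Warning: {needed} does not appear to be loaded")
```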
To set up Emma:

- Clone the repository

  ```bash
  git clone https://github.com/nedpals/emma.git
  cd emma
  ```

- Install dependencies

  ```bash
  # Install frontend dependencies
  cd frontend
  npm install

  # Install backend dependencies
  cd ..
  pip install -r requirements.txt
  ```

- Start the development servers

  ```bash
  # Start the frontend development server (in frontend directory)
  cd frontend
  npm run dev

  # In another terminal, start the backend server
  python main.py
  ```

- Access Emma at `http://localhost:8000`
Emma uses ChromaDB as its vector store to enable semantic search capabilities. There are two primary methods for ingesting handbook content:
Method 1: Using LM Studio (Recommended for local processing)
- Place your handbook documents (PDF format) in the project's root directory (e.g., `handbook.pdf`).
- Ensure LM Studio is running and serving the required models (`gemma-3-12b-it-qat` for vision and `text-embedding-nomic-embed-text-v1.5` for embeddings) at `http://localhost:1234`.
- Run the embedding script. Choose one of the following commands:
  - Standard Speed: Processes documents in smaller batches (default: 2). Suitable for systems with limited resources.

    ```bash
    python embedding.py
    ```

  - Faster Speed: Processes documents in larger batches (e.g., 600). Requires more system resources (RAM/VRAM) but significantly speeds up ingestion. Adjust the `MAX_EMBED_COUNT` value based on your system's capabilities.

    ```bash
    MAX_EMBED_COUNT=600 python embedding.py
    ```

- The script will first use the vision model (`gemma-3-12b-it-qat`) to extract text segments from each page of the PDF, caching the results in the `extracted_2` directory. It will then use the embedding model (`text-embedding-nomic-embed-text-v1.5`) to create vector embeddings for each segment (a rough sketch of this step is shown after this list).
- The embeddings and vector store data will be persisted in the `embeddings_db` directory.
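For orientation, the embedding stage boils down to something like the sketch below. This is not the actual `embedding.py`, just a minimal illustration under a few assumptions: page segments are cached as JSON lists of strings in `extracted_2/`, embeddings come from LM Studio's OpenAI-compatible `/v1/embeddings` endpoint, and the ChromaDB collection name `handbook` is hypothetical.

```python
# Minimal sketch of the embedding stage (illustrative, not the real embedding.py).
# Assumes extracted_2/ holds per-page JSON files of plain-text segments and that
# LM Studio serves text-embedding-nomic-embed-text-v1.5 at http://localhost:1234.
import json
import os
from pathlib import Path

import chromadb
import requests

LM_STUDIO_URL = "http://localhost:1234/v1/embeddings"
EMBED_MODEL = "text-embedding-nomic-embed-text-v1.5"
BATCH_SIZE = int(os.environ.get("MAX_EMBED_COUNT", 2))  # mirrors the commands above


def embed(texts: list[str]) -> list[list[float]]:
    """Request embeddings for a batch of text segments from LM Studio."""
    resp = requests.post(LM_STUDIO_URL, json={"model": EMBED_MODEL, "input": texts})
    resp.raise_for_status()
    return [item["embedding"] for item in resp.json()["data"]]


# Persist vectors to the embeddings_db directory via ChromaDB.
client = chromadb.PersistentClient(path="embeddings_db")
collection = client.get_or_create_collection("handbook")  # hypothetical name

# Load cached segments produced by the vision/OCR step (or by Google AI Studio).
segments: list[str] = []
for page_file in sorted(Path("extracted_2").glob("*.json")):
    segments.extend(json.loads(page_file.read_text()))

# Embed and store in batches of BATCH_SIZE.
for start in range(0, len(segments), BATCH_SIZE):
    batch = segments[start:start + BATCH_SIZE]
    collection.add(
        ids=[f"segment-{start + i}" for i in range(len(batch))],
        documents=batch,
        embeddings=embed(batch),
    )
```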
Method 2: Using Google AI Studio (Alternative for text extraction)
This method is useful if you encounter issues with local vision model processing or prefer using Google's cloud-based models for the initial text extraction.
- Go to Google AI Studio.
- Create a new prompt. Upload your handbook PDF file.
- Use the prompt content from the `ingest_gemini_prompt.txt` file in this repository. Ensure you are using a capable multimodal model like Gemini 2.5 Pro.
- Run the prompt. Google AI Studio will process the PDF and generate a JSON output containing the extracted text segments based on the prompt's instructions.
- Copy the entire JSON output.
- Create a new file named `page_0.json` inside the `extracted_2` directory within your local project folder (create the `extracted_2` directory if it doesn't exist).
- Paste the copied JSON content into `extracted_2/page_0.json` and save the file.
- Ensure LM Studio is running and serving only the required embedding model (`text-embedding-nomic-embed-text-v1.5`) at `http://localhost:1234`.
- Run the embedding script (choose standard or faster speed as described in Method 1):

  ```bash
  # Standard speed
  python embedding.py

  # OR faster speed
  # MAX_EMBED_COUNT=600 python embedding.py
  ```

- The script will detect the cached data in `extracted_2/page_0.json`, skip the vision/OCR step, and proceed directly to embedding the text segments using the local embedding model.
- The embeddings and vector store data will be persisted in the `embeddings_db` directory.
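After either method completes, a quick semantic-search query is a handy way to confirm the store was populated. The snippet below is again only an illustrative sketch under the same assumptions as before (LM Studio's embedding endpoint and a hypothetical `handbook` collection); Emma's backend may organize its collections differently.

```python
# Illustrative sanity check of the persisted vector store (not part of the backend).
import chromadb
import requests


def embed_query(text: str) -> list[float]:
    # Embed the query with the same local embedding model used for ingestion.
    resp = requests.post(
        "http://localhost:1234/v1/embeddings",
        json={"model": "text-embedding-nomic-embed-text-v1.5", "input": text},
    )
    resp.raise_for_status()
    return resp.json()["data"][0]["embedding"]


client = chromadb.PersistentClient(path="embeddings_db")
collection = client.get_or_create_collection("handbook")  # hypothetical collection name

results = collection.query(
    query_embeddings=[embed_query("What is the attendance policy?")],
    n_results=3,
)
for doc in results["documents"][0]:
    print("-", doc[:120])
```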
We welcome contributions to make Emma even better! If you'd like to contribute:
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
This project is not affiliated with, endorsed by, or connected to the University of the Immaculate Conception (UIC). Emma is an independent, personal project created with a strong desire to assist Ignatian Marians by utilizing the latest technologies available. All information provided should be verified with official UIC sources and personnel.
Made with ❤️ for Ignatian Marians