Chatbot with Memory

A production-ready chatbot system with persistent memory, user profiles, and document ingestion capabilities.

Features

  • Persistent Memory: Stores facts about users using a weighted fact system
  • User Profiles: Automatically builds and updates user profiles from conversations
  • Document Ingestion: Process and learn from PDF documents (50-200 pages)
  • Vector Search: Semantic search across conversation history and documents
  • Fact Connections: Discovers relationships between different pieces of information
  • Fast Retrieval: Uses PostgreSQL with pgvector for efficient similarity search

Architecture

  • Backend: FastAPI + PostgreSQL with pgvector
  • Embeddings: OpenAI text-embedding-3-large (3072 dimensions)
  • LLM: OpenAI GPT-4o for reasoning and responses
  • Short-term Memory: Redis for recent conversation context
  • Document Processing: PyMuPDF and Unstructured for PDF extraction
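A rough sketch of how these pieces fit together is shown below; the driver choices (psycopg and redis-py) are assumptions for illustration, and the repository may wire things differently:

import os

import psycopg              # assumed PostgreSQL driver
import redis                # short-term conversation context
from openai import OpenAI   # embeddings and GPT-4o responses

pg = psycopg.connect(os.environ["DATABASE_URL"])        # pgvector-enabled Postgres
cache = redis.Redis.from_url(os.environ["REDIS_URL"])   # recent conversation context
llm = OpenAI()                                          # reads OPENAI_API_KEY

# text-embedding-3-large returns 3072-dimensional vectors
vector = llm.embeddings.create(model="text-embedding-3-large",
                               input=["hello"]).data[0].embedding
assert len(vector) == 3072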

Setup

Prerequisites

  1. PostgreSQL with pgvector extension
  2. Redis
  3. Python 3.9+
  4. OpenAI API key

Database Setup

  1. Ensure you have PostgreSQL installed with the pgvector extension:
CREATE EXTENSION IF NOT EXISTS vector;
  2. Create a .env file based on .env.example:
cp .env.example .env
# Edit .env with your settings
  3. Initialize the database (see the schema sketch below):
python setup_database.py
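
For orientation, here is a minimal sketch of the kind of schema setup_database.py might create for embedded document chunks; the table and column names are assumptions, and the real schema is defined by the script:

-- hypothetical table; the actual schema comes from setup_database.py
CREATE TABLE IF NOT EXISTS document_chunks (
    id        BIGSERIAL PRIMARY KEY,
    user_id   BIGINT NOT NULL,
    content   TEXT NOT NULL,
    embedding vector(3072)  -- matches text-embedding-3-large
);

-- nearest-neighbour lookup; <=> is pgvector's cosine-distance operator
SELECT content
FROM document_chunks
WHERE user_id = 1
ORDER BY embedding <=> $1  -- query embedding passed in as a parameter
LIMIT 5;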

Installation

  1. Create a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  2. Install dependencies:
pip install -r requirements.txt

Running the API

python -m api.main

The API will be available at http://localhost:8000

API Endpoints

Users

  • POST /users - Create a new user
  • GET /users/{user_id} - Get user details

Conversations

  • POST /conversations - Create a new conversation
  • GET /conversations/{user_id} - List user's conversations

Chat

  • POST /chat - Send a message and get a response

Documents

  • POST /documents/upload - Upload and process a PDF
  • GET /documents/{user_id} - List user's documents

Memory

  • GET /memory/{user_id}/facts - Get user's memory facts

Usage Example

import requests

# Create a user
response = requests.post("http://localhost:8000/users", 
    json={"username": "alice"})
user_id = response.json()["user_id"]

# Create a conversation
response = requests.post(f"http://localhost:8000/conversations?user_id={user_id}",
    json={"title": "First Chat"})
conversation_id = response.json()["conversation_id"]

# Send a message
response = requests.post("http://localhost:8000/chat",
    json={
        "conversation_id": conversation_id,
        "message": "Hi! I'm interested in learning Python for data science."
    })
print(response.json()["response"])

# Upload a document
with open("python_tutorial.pdf", "rb") as f:
    response = requests.post(
        f"http://localhost:8000/documents/upload?user_id={user_id}",
        files={"file": f}
    )
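
The stored facts can be read back through the memory endpoint listed above; the exact shape of the returned JSON is an assumption here:

# Inspect what the system has remembered about the user
response = requests.get(f"http://localhost:8000/memory/{user_id}/facts")
for fact in response.json().get("facts", []):  # key name is an assumption
    print(fact)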

Memory System

How Facts are Stored

  • Facts are stored as subject-predicate-object triples
  • Each fact has a weight (importance) and confidence score
  • Facts decay over time if not reinforced
  • Similar facts are merged to avoid duplication
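
As an illustration only (the class and field names below are assumptions, not the repository's actual models), a weighted triple with time-based decay could look like this:

import math
from dataclasses import dataclass
from datetime import datetime, timezone

MEMORY_DECAY_DAYS = 90  # mirrors the MEMORY_DECAY_DAYS setting

@dataclass
class MemoryFact:
    subject: str        # e.g. "Alice"
    predicate: str      # e.g. "is interested in"
    obj: str            # e.g. "data science"
    weight: float       # importance assigned at extraction time
    confidence: float   # how sure the extractor is that the fact is true
    last_reinforced: datetime

    def effective_weight(self, now: datetime) -> float:
        """Decay the weight once the fact goes unreinforced past the window."""
        age_days = (now - self.last_reinforced).days
        if age_days <= MEMORY_DECAY_DAYS:
            return self.weight
        # halve the weight for every extra decay window that passes
        return self.weight * math.pow(0.5, (age_days - MEMORY_DECAY_DAYS) / MEMORY_DECAY_DAYS)

fact = MemoryFact("Alice", "is interested in", "data science",
                  weight=0.8, confidence=0.9,
                  last_reinforced=datetime(2024, 1, 1, tzinfo=timezone.utc))
print(fact.effective_weight(datetime.now(timezone.utc)))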

User Profile Schema

{
    "name": "Alice",
    "pronouns": "she/her",
    "interests": ["Python", "data science", "machine learning"],
    "skills": ["programming", "statistics"],
    "goals": ["build ML models", "analyze data"],
    "tone_preference": "friendly and encouraging",
    "reading_level": "technical",
    "conversation_style": "detailed explanations",
    "important_contacts": [{"name": "Bob", "relationship": "colleague"}],
    "constraints": ["prefers visual examples"]
}

Configuration

Key environment variables:

  • OPENAI_API_KEY: Your OpenAI API key
  • DATABASE_URL: PostgreSQL connection string
  • REDIS_URL: Redis connection string
  • MAX_MEMORY_FACTS: Maximum facts to store per user (default: 1000)
  • MEMORY_DECAY_DAYS: Days before memories start decaying (default: 90)
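
A sample .env illustrating these settings (host names, credentials, and the database name are placeholders):

OPENAI_API_KEY=sk-...
DATABASE_URL=postgresql://chatbot:chatbot@localhost:5432/chatbot_memory
REDIS_URL=redis://localhost:6379/0
MAX_MEMORY_FACTS=1000
MEMORY_DECAY_DAYS=90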

Performance Considerations

  • Document chunks are embedded in batches for efficiency
  • Embeddings are cached in memory to avoid recomputation
  • Vector similarity search is optimized with pgvector indexes
  • Facts are weighted and pruned to maintain relevance
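
The batching and caching described above could be approximated as follows; this is a sketch only, and the batch size and helper name are assumptions rather than the repository's actual code:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
_embedding_cache: dict[str, list[float]] = {}

def embed_batch(texts: list[str], batch_size: int = 64) -> list[list[float]]:
    """Embed texts in batches, reusing cached vectors to avoid recomputation."""
    missing = [t for t in dict.fromkeys(texts) if t not in _embedding_cache]
    for start in range(0, len(missing), batch_size):
        chunk = missing[start:start + batch_size]
        response = client.embeddings.create(model="text-embedding-3-large", input=chunk)
        for text, item in zip(chunk, response.data):
            _embedding_cache[text] = item.embedding
    return [_embedding_cache[t] for t in texts]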

Future Enhancements

  • Support for more document formats (Word, HTML, Markdown)
  • Graph database integration for complex relationships
  • Multi-modal memory (images, audio)
  • Export/import memory snapshots
  • Fine-tuned models for better fact extraction
