An automated system for mapping source medical concepts to OMOP standard concepts using vector similarity search and LLM-based reranking.
- Docker and Docker Compose installed
- OpenAI API key
- OMOP vocabulary files (CONCEPT.csv, CONCEPT_RELATIONSHIP.csv, CONCEPT_ANCESTOR.csv)
git clone <your-repo-url>
cd auto-omop-mapper
Copy the example environment file and edit it:
cp .env.example .env
Edit .env with your configuration:
# Database Configuration (PostgreSQL)
POSTGRES_USER=omop_user
POSTGRES_PASSWORD=your_secure_password_here
POSTGRES_DB=omop_mapper
POSTGRES_HOST=postgres-db
POSTGRES_PORT=5432
# Vector Database
QDRANT_URL=http://qdrant:6333
# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key_here
docker-compose up -d
This will start:
- PostgreSQL database (port 5432)
- Qdrant vector database (ports 6333, 6334)
- Auto OMOP Mapper web application (port 8501)
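Before opening the web interface, it can help to confirm that the three services are accepting connections. The following is a minimal sketch, not part of the project: it assumes the ports listed above are published on the local machine, and the names `SERVICES`, `is_port_open`, and `check_services` are hypothetical.

```python
import socket

# Ports taken from the service list above. The host is an assumption:
# it presumes Docker Compose publishes these ports on the local machine.
SERVICES = {
    "PostgreSQL": 5432,
    "Qdrant (HTTP)": 6333,
    "Web application": 8501,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


def check_services(host: str = "localhost") -> dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name}: {'up' if ok else 'not reachable'}")
```

An open port only shows that something is listening; it does not confirm that the database has finished initializing or that the application has loaded.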
Open your browser and go to: http://localhost:8501
- Place your OMOP vocabulary files in the vocabulary/ directory:
  - CONCEPT.csv: Main OMOP concept table
  - CONCEPT_RELATIONSHIP.csv: Relationships between concepts
  - CONCEPT_ANCESTOR.csv: Hierarchical relationships
- Go to "Import Data" → "OMOP Vocabulary Tables"
- Import the vocabulary files
- Go to "Import Data" → "ATC7 Processing"
- Click "Process ATC7 Codes" to find and store ATC7 codes for drug concepts
- Go to "Import Data" → "Source Concepts"
- Upload your CSV file with source concepts that need mapping
- Required columns: source_value, source_concept_name
- Go to "Import Data" → "Embedding Management"
- Click "Embed Standard Concepts" to create vector embeddings
- Only standard concepts will be embedded, with ATC7 metadata for drugs
- The number of embedded concepts is shown on this page.
- Go to the "Configuration" page to change:
  - LLM Model for reranking (gpt-4o, gpt-4o-mini, etc.)
  - Embedding Model (text-embedding-3-small, text-embedding-3-large, etc.)
- Changing the embedding settings creates a new vector collection.
- Search: Use the "Search" page to test similarity search
- Map: Use the "Map Concepts" page for interactive mapping
- Commit: Review and export mappings on the "Check and Commit" page
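The "Source Concepts" upload expects a CSV with the columns source_value and source_concept_name. A quick local check of the header row can catch a misnamed column before uploading. This is a minimal sketch under that assumption only; the function name `missing_columns` and the sample row are hypothetical, and the application may perform its own validation.

```python
import csv
import io

# Column names as listed in the "Source Concepts" step.
REQUIRED_COLUMNS = ("source_value", "source_concept_name")


def missing_columns(csv_text: str) -> list[str]:
    """Return the required columns that are absent from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file yields an empty header
    present = {name.strip() for name in header}
    return [col for col in REQUIRED_COLUMNS if col not in present]


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    sample = "source_value,source_concept_name\nA01,Paracetamol 500 mg tablet\n"
    print(missing_columns(sample))  # prints []
```

To check a file on disk, read it as text first (for example with `open(path, newline="", encoding="utf-8").read()`) and pass the result to `missing_columns`.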
The system uses PostgreSQL for data storage, Qdrant for vector search, and the OpenAI API for LLM calls.
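To illustrate what the vector search step does conceptually, the sketch below ranks candidate concepts by cosine similarity to a query embedding. It is not the project's code: in the application this work is delegated to Qdrant, the distance metric actually configured is not stated here and cosine is assumed, and the function names and toy vectors are hypothetical. The top results of such a search are what an LLM reranker would then reorder.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat as no similarity
    return dot / (norm_a * norm_b)


def top_k(
    query: list[float],
    candidates: dict[str, list[float]],
    k: int = 3,
) -> list[tuple[str, float]]:
    """Return the k candidates most similar to the query, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy two-dimensional "embeddings"; real embeddings have far more dimensions.
    concepts = {
        "concept_a": [1.0, 0.0],
        "concept_b": [0.0, 1.0],
        "concept_c": [1.0, 1.0],
    }
    for name, score in top_k([1.0, 0.0], concepts, k=2):
        print(f"{name}: {score:.3f}")
```

A brute-force scan like this is linear in the number of concepts, which is why a dedicated vector database is used for the full vocabulary.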
This repository contains work developed as part of my role at the Croatian Institute of Public Health.