This repository contains a Jupyter notebook that demonstrates how to build an advanced search system using BGE-M3 and Qdrant.
The key feature of this sample is the use of an all-in-one embedding model (BGE-M3) that generates three types of vectors in a single pass:
- Dense vectors: For semantic similarity (1024 dimensions)
- Sparse vectors: For lexical/keyword matching
- ColBERT token vectors: For fine-grained token-level matching
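As a sketch of what "one pass, three outputs" means: with the FlagEmbedding library, a single `BGEM3FlagModel.encode` call can return all three structures at once. The toy example below only illustrates their shapes with random stand-in values (the token count is illustrative; real values come from the model):

```python
import numpy as np

num_tokens = 7    # tokens in the input text (illustrative)
dense_dim = 1024  # BGE-M3 dense dimension

# Dense: one 1024-d vector per text, used for semantic similarity.
dense_vec = np.random.rand(dense_dim)

# Sparse: a token-id -> weight mapping ("lexical weights"); almost all
# vocabulary entries are zero, so it is stored as a dict.
sparse_vec = {102: 0.31, 2517: 0.18, 8049: 0.27}

# ColBERT: one vector per token, enabling fine-grained late interaction
# (MaxSim) at query time.
colbert_vecs = np.random.rand(num_tokens, dense_dim)

print(dense_vec.shape, len(sparse_vec), colbert_vecs.shape)
```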
This multi-vector approach improves search quality by combining the complementary strengths of semantic, lexical, and token-level matching within a single model.
Before running the notebook, you need:
- Python 3.9+
- Docker (for running Qdrant)
- Jupyter Notebook
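With Docker available, a local Qdrant instance can be started with the standard quickstart command (REST API on port 6333):

```shell
docker run -p 6333:6333 qdrant/qdrant
```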
The system operates in the following steps:
- Data Loading: Products are loaded from a CSV file
- Text Formatting: Product information is formatted for embedding
- Embedding Generation: BGE-M3 generates all three embedding types in one pass
- Vector Database Setup: A Qdrant collection is configured for hybrid search across all three vector types
- Data Indexing: Product data and embeddings are stored in Qdrant
- Search: Queries are embedded with the same model and matched against the stored vectors to retrieve results
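The text-formatting step above can be sketched as a small helper. The field names and separator here are assumptions for illustration; adapt them to the actual CSV columns:

```python
def format_product(product: dict) -> str:
    """Combine product fields into one text for embedding.

    Field names are hypothetical; the notebook's CSV may differ.
    """
    parts = [
        product.get("name", ""),
        product.get("category", ""),
        product.get("description", ""),
    ]
    # Skip empty fields so missing data does not leave stray separators.
    return " | ".join(p for p in parts if p)

text = format_product({
    "name": "Trail Backpack 30L",
    "category": "Outdoor",
    "description": "Lightweight pack for day hikes",
})
print(text)  # → Trail Backpack 30L | Outdoor | Lightweight pack for day hikes
```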
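The vector-database setup step might look like the sketch below, using the qdrant-client API. The vector names (`dense`, `sparse`, `colbert`) and the collection name are illustrative, and an in-memory client stands in for the Dockerized server the notebook would connect to:

```python
from qdrant_client import QdrantClient, models

# In-memory client for illustration; the notebook would use
# QdrantClient(url="http://localhost:6333") instead.
client = QdrantClient(":memory:")

client.create_collection(
    collection_name="products",
    vectors_config={
        # Dense semantic vector from BGE-M3.
        "dense": models.VectorParams(size=1024, distance=models.Distance.COSINE),
        # Per-token ColBERT vectors, compared with MaxSim late interaction.
        "colbert": models.VectorParams(
            size=1024,
            distance=models.Distance.COSINE,
            multivector_config=models.MultiVectorConfig(
                comparator=models.MultiVectorComparator.MAX_SIM
            ),
        ),
    },
    sparse_vectors_config={
        # Lexical weights from BGE-M3 (token id -> weight).
        "sparse": models.SparseVectorParams(),
    },
)
print(client.collection_exists("products"))
```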
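The indexing and search steps can be sketched together. This is a toy end-to-end run under stated assumptions: tiny 8-dimensional random vectors stand in for real BGE-M3 embeddings, the point payloads and vector names are illustrative, and an in-memory client replaces the server. The query fetches dense and sparse candidates, then scores them with ColBERT-style MaxSim:

```python
import numpy as np
from qdrant_client import QdrantClient, models

client = QdrantClient(":memory:")
client.create_collection(
    collection_name="products",
    vectors_config={
        "dense": models.VectorParams(size=8, distance=models.Distance.COSINE),
        "colbert": models.VectorParams(
            size=8,
            distance=models.Distance.COSINE,
            multivector_config=models.MultiVectorConfig(
                comparator=models.MultiVectorComparator.MAX_SIM
            ),
        ),
    },
    sparse_vectors_config={"sparse": models.SparseVectorParams()},
)

rng = np.random.default_rng(0)
# Indexing: store payloads plus all three embeddings per product.
client.upsert(
    collection_name="products",
    points=[
        models.PointStruct(
            id=i,
            payload={"name": name},
            vector={
                "dense": rng.random(8).tolist(),
                "sparse": models.SparseVector(indices=[i, i + 3], values=[0.5, 0.2]),
                "colbert": rng.random((4, 8)).tolist(),
            },
        )
        for i, name in enumerate(["backpack", "tent"])
    ],
)

# Search: dense and sparse prefetch gather candidates, which are then
# scored by the multi-vector (ColBERT) field via MaxSim.
hits = client.query_points(
    collection_name="products",
    prefetch=[
        models.Prefetch(query=rng.random(8).tolist(), using="dense", limit=10),
        models.Prefetch(
            query=models.SparseVector(indices=[0, 3], values=[0.5, 0.2]),
            using="sparse",
            limit=10,
        ),
    ],
    query=rng.random((4, 8)).tolist(),
    using="colbert",
    limit=2,
)
print([p.payload["name"] for p in hits.points])
```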