This repository contains a Jupyter notebook that demonstrates how to build an advanced search system using BGE-M3 and Qdrant.
The key feature of this sample is the use of an all-in-one embedding model (BGE-M3) that generates three types of vectors in a single pass:
- Dense vectors: For semantic similarity (1024 dimensions)
- Sparse vectors: For lexical/keyword matching
- ColBERT token vectors: For fine-grained token-level matching
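As a sketch of what "one pass, three outputs" means: with the FlagEmbedding library, a single `BGEM3FlagModel.encode` call can return all three structures at once. The toy example below only illustrates their shapes with random stand-in values (the token count is illustrative; real values come from the model):

```python
import numpy as np

num_tokens = 7    # tokens in the input text (illustrative)
dense_dim = 1024  # BGE-M3 dense dimension

# Dense: one 1024-d vector per text, used for semantic similarity.
dense_vec = np.random.rand(dense_dim)

# Sparse: a token-id -> weight mapping ("lexical weights"); almost all
# vocabulary entries are zero, so it is stored as a dict.
sparse_vec = {102: 0.31, 2517: 0.18, 8049: 0.27}

# ColBERT: one vector per token, enabling fine-grained late interaction
# (MaxSim) at query time.
colbert_vecs = np.random.rand(num_tokens, dense_dim)

print(dense_vec.shape, len(sparse_vec), colbert_vecs.shape)
```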
This multi-vector approach improves search quality by combining the complementary strengths of semantic, lexical, and token-level matching within a single model.
Before running the notebook, you need:
- Python 3.9+
- Docker (for running Qdrant)
- Jupyter Notebook
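With Docker available, a local Qdrant instance can be started with the standard quickstart command (REST API on port 6333):

```shell
docker run -p 6333:6333 qdrant/qdrant
```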
The system operates in the following steps:
- Data Loading: Products are loaded from a CSV file
- Text Formatting: Product information is formatted for embedding
- Embedding Generation: BGE-M3 generates all three embedding types in one pass
- Vector Database Setup: A Qdrant collection is configured for hybrid search across all three vector types
- Data Indexing: Product data and embeddings are stored in Qdrant
- Search: Queries are embedded with the same model and matched against the stored vectors to retrieve results
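The text-formatting step above can be sketched as a small helper. The field names and separator here are assumptions for illustration; adapt them to the actual CSV columns:

```python
def format_product(product: dict) -> str:
    """Combine product fields into one text for embedding.

    Field names are hypothetical; the notebook's CSV may differ.
    """
    parts = [
        product.get("name", ""),
        product.get("category", ""),
        product.get("description", ""),
    ]
    # Skip empty fields so missing data does not leave stray separators.
    return " | ".join(p for p in parts if p)

text = format_product({
    "name": "Trail Backpack 30L",
    "category": "Outdoor",
    "description": "Lightweight pack for day hikes",
})
print(text)  # → Trail Backpack 30L | Outdoor | Lightweight pack for day hikes
```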
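The vector-database setup step might look like the sketch below, using the qdrant-client API. The vector names (`dense`, `sparse`, `colbert`) and the collection name are illustrative, and an in-memory client stands in for the Dockerized server the notebook would connect to:

```python
from qdrant_client import QdrantClient, models

# In-memory client for illustration; the notebook would use
# QdrantClient(url="http://localhost:6333") instead.
client = QdrantClient(":memory:")

client.create_collection(
    collection_name="products",
    vectors_config={
        # Dense semantic vector from BGE-M3.
        "dense": models.VectorParams(size=1024, distance=models.Distance.COSINE),
        # Per-token ColBERT vectors, compared with MaxSim late interaction.
        "colbert": models.VectorParams(
            size=1024,
            distance=models.Distance.COSINE,
            multivector_config=models.MultiVectorConfig(
                comparator=models.MultiVectorComparator.MAX_SIM
            ),
        ),
    },
    sparse_vectors_config={
        # Lexical weights from BGE-M3 (token id -> weight).
        "sparse": models.SparseVectorParams(),
    },
)
print(client.collection_exists("products"))
```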
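The indexing and search steps can be sketched together. This is a toy end-to-end run under stated assumptions: tiny 8-dimensional random vectors stand in for real BGE-M3 embeddings, the point payloads and vector names are illustrative, and an in-memory client replaces the server. The query fetches dense and sparse candidates, then scores them with ColBERT-style MaxSim:

```python
import numpy as np
from qdrant_client import QdrantClient, models

client = QdrantClient(":memory:")
client.create_collection(
    collection_name="products",
    vectors_config={
        "dense": models.VectorParams(size=8, distance=models.Distance.COSINE),
        "colbert": models.VectorParams(
            size=8,
            distance=models.Distance.COSINE,
            multivector_config=models.MultiVectorConfig(
                comparator=models.MultiVectorComparator.MAX_SIM
            ),
        ),
    },
    sparse_vectors_config={"sparse": models.SparseVectorParams()},
)

rng = np.random.default_rng(0)
# Indexing: store payloads plus all three embeddings per product.
client.upsert(
    collection_name="products",
    points=[
        models.PointStruct(
            id=i,
            payload={"name": name},
            vector={
                "dense": rng.random(8).tolist(),
                "sparse": models.SparseVector(indices=[i, i + 3], values=[0.5, 0.2]),
                "colbert": rng.random((4, 8)).tolist(),
            },
        )
        for i, name in enumerate(["backpack", "tent"])
    ],
)

# Search: dense and sparse prefetch gather candidates, which are then
# scored by the multi-vector (ColBERT) field via MaxSim.
hits = client.query_points(
    collection_name="products",
    prefetch=[
        models.Prefetch(query=rng.random(8).tolist(), using="dense", limit=10),
        models.Prefetch(
            query=models.SparseVector(indices=[0, 3], values=[0.5, 0.2]),
            using="sparse",
            limit=10,
        ),
    ],
    query=rng.random((4, 8)).tolist(),
    using="colbert",
    limit=2,
)
print([p.payload["name"] for p in hits.points])
```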