Repository accompanying the Haystack US 2025 workshop "Learning to Hybrid Search"
- Docker to run OpenSearch and OpenSearch Dashboards
- Python and pip
- Dataset: the notebooks assume that the ESCI dataset has already been downloaded. Adjust the dataset path in the notebooks to point to where you stored it.
Execute the following command to fire up OpenSearch and OpenSearch Dashboards:
docker compose up -d
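OpenSearch can take a while to become available after `docker compose up -d` returns. The following is a minimal sketch of a readiness check. It assumes the compose file exposes OpenSearch on `http://localhost:9200` over plain HTTP with the security plugin disabled; if TLS and authentication are enabled, the URL and credentials need adjusting. `wait_for_opensearch` is a hypothetical helper, not part of this repository.

```python
import json
import time
import urllib.request


def wait_for_opensearch(url: str = "http://localhost:9200",
                        timeout_s: float = 60.0,
                        interval_s: float = 2.0) -> bool:
    """Poll the cluster health endpoint until it reports a usable status or the timeout expires.

    Assumption: plain HTTP without authentication (security plugin disabled).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(f"{url}/_cluster/health", timeout=5) as resp:
                health = json.load(resp)
                # "green" or "yellow" means the cluster can serve requests
                if health.get("status") in ("green", "yellow"):
                    return True
        except (OSError, ValueError):
            # connection refused, timeout, HTTP error or non-JSON body: still starting up
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


if __name__ == "__main__":
    print("OpenSearch ready:", wait_for_opensearch())
```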
Create a virtual environment:
python3 -m venv .venv
Activate the virtual environment:
source .venv/bin/activate
Install the requirements:
pip3 install -r requirements.txt
Start Jupyter:
jupyter notebook
Open http://localhost:8888 in your browser (if that does not work, try http://127.0.0.1:8888).
- Prepare OpenSearch: the setup steps needed to generate embeddings at indexing and query time.
- Index ESCI Data: load the product data.
- Queries, queries, queries: run lexical and hybrid queries.
- Baseline Search & Metrics: calculate search quality metrics for the baseline.
- Best Hybrid Search Configuration: identify the best configuration parameters for hybrid search with arithmetic score combination.
- Dynamic Hybrid Search Optimization - Model Training and Evaluation: engineer query features and evaluate which feature combinations work well.
- Calculate Search Metrics with Dynamic Optimizer: run the trained model on the test set.
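The notebooks talk to OpenSearch through its REST API. As a hedged sketch of the kind of request bodies involved in preparing OpenSearch, running hybrid queries and configuring the score combination, the helpers below only build JSON-ready dictionaries. They assume the neural-search plugin's `text_embedding` ingest processor, the `hybrid` and `neural` queries, and the `normalization-processor` search pipeline; the field names (`title`, `title_embedding`), the model ID placeholder and the function names are hypothetical and may differ from what the notebooks use.

```python
import json

# Hypothetical names: adjust to the index mapping and model deployed in your cluster.
MODEL_ID = "<your-model-id>"
TEXT_FIELD = "title"
VECTOR_FIELD = "title_embedding"


def ingest_pipeline_body(model_id: str = MODEL_ID) -> dict:
    """Body for PUT /_ingest/pipeline/<name>: embed TEXT_FIELD into VECTOR_FIELD at index time."""
    return {
        "description": "Generate embeddings at index time",
        "processors": [
            {"text_embedding": {"model_id": model_id, "field_map": {TEXT_FIELD: VECTOR_FIELD}}}
        ],
    }


def search_pipeline_body(neural_weight: float) -> dict:
    """Body for PUT /_search/pipeline/<name>: min-max normalization + weighted arithmetic mean."""
    lexical_weight = round(1.0 - neural_weight, 4)  # keep the two weights summing to 1
    return {
        "phase_results_processors": [
            {
                "normalization-processor": {
                    "normalization": {"technique": "min_max"},
                    "combination": {
                        "technique": "arithmetic_mean",
                        # weights follow the order of the sub-queries in the hybrid query
                        "parameters": {"weights": [lexical_weight, neural_weight]},
                    },
                }
            }
        ]
    }


def hybrid_query_body(query_text: str, model_id: str = MODEL_ID, k: int = 50) -> dict:
    """Body for GET /<index>/_search?search_pipeline=<name>: lexical + neural sub-queries."""
    return {
        "query": {
            "hybrid": {
                "queries": [
                    {"match": {TEXT_FIELD: {"query": query_text}}},
                    # the neural query embeds query_text at query time with the same model
                    {"neural": {VECTOR_FIELD: {"query_text": query_text, "model_id": model_id, "k": k}}},
                ]
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(hybrid_query_body("wireless earbuds"), indent=2))
```

Any HTTP client (or the `opensearch-py` client) can send these bodies; trying different `neural_weight` values in `search_pipeline_body` is one way to explore the arithmetic combination.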
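Search quality metrics such as NDCG compare a ranked result list with graded relevance judgments. The sketch below computes NDCG@k in plain Python. The ESCI gain mapping (Exact 1.0, Substitute 0.1, Complement 0.01, Irrelevant 0.0) is an assumption for illustration; the notebooks may use different gains or additional metrics.

```python
import math

# Assumed gains for ESCI labels; adjust if the notebooks use a different mapping.
ESCI_GAINS = {"E": 1.0, "S": 0.1, "C": 0.01, "I": 0.0}


def dcg_at_k(gains: list[float], k: int) -> float:
    """Discounted cumulative gain: position i (0-based) is discounted by log2(i + 2)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_doc_ids: list[str], judgments: dict[str, float], k: int = 10) -> float:
    """NDCG@k for one query; unjudged documents get a gain of 0."""
    gains = [judgments.get(doc, 0.0) for doc in ranked_doc_ids]
    # the ideal ranking orders all judged documents by gain, not only the retrieved ones
    ideal = sorted(judgments.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    labels = {"p1": "E", "p2": "S", "p3": "I"}
    judgments = {doc: ESCI_GAINS[label] for doc, label in labels.items()}
    print(ndcg_at_k(["p3", "p2", "p1"], judgments, k=10))
```

Averaging `ndcg_at_k` over all queries gives a single number per configuration, which makes a baseline and a hybrid setup directly comparable.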
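The arithmetic combination of hybrid search first normalizes the lexical and neural scores to a common range and then takes a weighted mean. The following is a simplified local sketch of min-max normalization followed by a weighted arithmetic mean, useful for reasoning about how a weight changes a ranking. It is not OpenSearch's implementation; in particular, a document missing from one result list simply contributes 0 for that list here, which may differ from how OpenSearch treats it. The function names are hypothetical.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one result list's scores to [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # all scores equal: treat every document as maximally relevant
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def combine(lexical: dict[str, float], neural: dict[str, float],
            neural_weight: float) -> list[tuple[str, float]]:
    """Weighted arithmetic mean of min-max-normalized scores, best first.

    The weights (1 - neural_weight, neural_weight) sum to 1, so no further division is needed.
    """
    lex, neu = min_max(lexical), min_max(neural)
    docs = set(lex) | set(neu)
    combined = {
        # simplification: a document missing from one list contributes 0 for that list
        doc: (1.0 - neural_weight) * lex.get(doc, 0.0) + neural_weight * neu.get(doc, 0.0)
        for doc in docs
    }
    # sort by descending score, breaking ties by document ID for a stable order
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    bm25 = {"a": 10.0, "b": 5.0, "c": 0.0}
    knn = {"b": 0.9, "c": 0.8, "d": 0.5}
    for w in [i / 10 for i in range(11)]:
        print(w, [doc for doc, _ in combine(bm25, knn, w)])
```

Sweeping the weight from 0.0 (lexical only) to 1.0 (neural only) and scoring each ranking with a metric is the basic shape of a search for the best global configuration.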
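Dynamic optimization replaces one global weight with a per-query prediction. One way to build training data for such a model, sketched here under assumptions: for every training query, evaluate each weight on a grid and keep the weight with the best metric score as the label, then pair that label with query features. `evaluate_query` stands in for running the hybrid query with a given neural weight and scoring it (for example with NDCG); the three features are hypothetical examples, not the notebooks' actual feature set, and the choice of model is left open.

```python
from typing import Callable

WEIGHT_GRID = [round(i * 0.1, 1) for i in range(11)]  # 0.0, 0.1, ..., 1.0


def best_neural_weight(evaluate: Callable[[float], float],
                       grid: list[float] = WEIGHT_GRID) -> float:
    """Grid weight with the highest metric score; ties go to the smaller weight."""
    return max(grid, key=lambda w: (evaluate(w), -w))


def query_features(query: str) -> dict[str, float]:
    """Hypothetical query features; real feature sets may also use result-list statistics."""
    terms = query.split()
    return {
        "num_terms": float(len(terms)),
        "num_chars": float(len(query)),
        "has_digit": float(any(ch.isdigit() for ch in query)),
    }


def training_rows(queries: list[str],
                  evaluate_query: Callable[[str, float], float]) -> list[dict[str, float]]:
    """One row per query: its features plus the best neural weight as the regression target."""
    rows = []
    for q in queries:
        label = best_neural_weight(lambda w: evaluate_query(q, w))
        rows.append({**query_features(q), "best_neural_weight": label})
    return rows
```

A regression model fitted on these rows would predict a neural weight for each unseen test query, and that weight would then drive the hybrid query before computing the same metrics as for the baseline.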