diff --git a/bootcamp/tutorials/quickstart/movie_recommendation_with_milvus.ipynb b/bootcamp/tutorials/quickstart/movie_recommendation_with_milvus.ipynb
new file mode 100644
index 000000000..019348632
--- /dev/null
+++ b/bootcamp/tutorials/quickstart/movie_recommendation_with_milvus.ipynb
@@ -0,0 +1,405 @@
+{
+ "cells": [
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Movie Recommendation with Milvus\n",
+ "In this notebook, we will explore how to generate embeddings of movie descriptions using OpenAI and leverage those embeddings within Milvus to recommend movies that match your preferences. To enhance our search results, we will utilize filtering to perform metadata searches. The dataset used in this example is sourced from HuggingFace datasets and contains over 8,000 movie entries, providing a rich pool of options for movie recommendations.\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Dependencies and Environment\n",
+ "You can install the dependencies by running the following command:\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! pip install openai pymilvus datasets tqdm"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "> If you are using Google Colab, to enable the dependencies you just installed, you may need to **restart the runtime** (click on the \"Runtime\" menu at the top of the screen, and select \"Restart session\" from the dropdown menu)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We will use OpenAI's `text-embedding-3-small` model to generate the embeddings in this example. You should prepare the [API key](https://platform.openai.com/docs/quickstart) `OPENAI_API_KEY` as an environment variable."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "\n",
+ "os.environ[\"OPENAI_API_KEY\"] = \"sk-***********\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Initialize OpenAI client and Milvus\n",
+ "Initialize the OpenAI client."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from openai import OpenAI\n",
+ "\n",
+ "openai_client = OpenAI()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Set the collection name and dimension for the embeddings. The dimension of 1536 matches the output size of the `text-embedding-3-small` embeddings used below."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "COLLECTION_NAME = \"movie_search\"\n",
+ "DIMENSION = 1536\n",
+ "\n",
+ "BATCH_SIZE = 1000"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Connect to Milvus."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from pymilvus import (\n",
+ " connections,\n",
+ " utility,\n",
+ " FieldSchema,\n",
+ " Collection,\n",
+ " CollectionSchema,\n",
+ " DataType,\n",
+ ")\n",
+ "\n",
+ "# Connect to Milvus Database\n",
+ "connections.connect(uri=\"./milvus.db\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "> As for the arguments of `uri` and `token`:\n",
+ "> - Setting the `uri` as a local file, e.g. `./milvus.db`, is the most convenient method, as it automatically utilizes [Milvus Lite](https://milvus.io/docs/milvus_lite.md) to store all data in this file.\n",
+ "> - If you have a large amount of data, say more than a million vectors, you can set up a more performant Milvus server on [Docker or Kubernetes](https://milvus.io/docs/quickstart.md). In this setup, please use the server address and port as your uri, e.g. `http://localhost:19530`. If you enable the authentication feature on Milvus, use \"<your_username>:<your_password>\" as the token; otherwise, don't set the token.\n",
+ "> - If you want to use [Zilliz Cloud](https://zilliz.com/cloud), the fully managed cloud service for Milvus, adjust the `uri` and `token`, which correspond to the [Public Endpoint and API key](https://docs.zilliz.com/docs/on-zilliz-cloud-console#free-cluster-details) in Zilliz Cloud."
+ ]
+ },
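+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As an illustrative sketch only (the addresses and credentials below are placeholders to replace with your own), connecting to a self-hosted Milvus server or to Zilliz Cloud would look like this:\n",
+ "\n",
+ "```python\n",
+ "from pymilvus import connections\n",
+ "\n",
+ "# Self-hosted Milvus server on Docker/Kubernetes (placeholder address and credentials)\n",
+ "connections.connect(uri=\"http://localhost:19530\", token=\"<your_username>:<your_password>\")\n",
+ "\n",
+ "# Zilliz Cloud (placeholder Public Endpoint and API key)\n",
+ "connections.connect(uri=\"<your_public_endpoint>\", token=\"<your_api_key>\")\n",
+ "```"
+ ]
+ },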
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Remove collection if it already exists\n",
+ "if utility.has_collection(COLLECTION_NAME):\n",
+ " utility.drop_collection(COLLECTION_NAME)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Define the fields for the collection, which include the id, title, type, release year, rating, description, and embedding."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Define the collection fields: the id, the movie metadata (title, type, release year, rating, description), and the embedding.\n",
+ "fields = [\n",
+ " FieldSchema(name=\"id\", dtype=DataType.INT64, is_primary=True, auto_id=True),\n",
+ " FieldSchema(name=\"title\", dtype=DataType.VARCHAR, max_length=64000),\n",
+ " FieldSchema(name=\"type\", dtype=DataType.VARCHAR, max_length=64000),\n",
+ " FieldSchema(name=\"release_year\", dtype=DataType.INT64),\n",
+ " FieldSchema(name=\"rating\", dtype=DataType.VARCHAR, max_length=64000),\n",
+ " FieldSchema(name=\"description\", dtype=DataType.VARCHAR, max_length=64000),\n",
+ " FieldSchema(name=\"embedding\", dtype=DataType.FLOAT_VECTOR, dim=DIMENSION),\n",
+ "]\n",
+ "schema = CollectionSchema(fields=fields)\n",
+ "collection = Collection(name=COLLECTION_NAME, schema=schema)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Create the index on the collection and load it."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Create the index on the collection and load it.\n",
+ "collection.create_index(\n",
+ " field_name=\"embedding\",\n",
+ " index_params={\"metric_type\": \"IP\", \"index_type\": \"AUTOINDEX\", \"params\": {}},\n",
+ ")\n",
+ "collection.load()"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Dataset\n",
+ "With Milvus up and running, we can begin grabbing our data. `Hugging Face Datasets` is a hub that holds many different user datasets, and for this example we are using the HuggingLearners netflix-shows dataset. This dataset contains over 8,000 movies along with their metadata. We are going to embed each description and store it within Milvus along with its title, type, release_year, and rating."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import datasets\n",
+ "\n",
+ "# Download the dataset\n",
+ "dataset = datasets.load_dataset(\"hugginglearners/netflix-shows\", split=\"train\")"
+ ]
+ },
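+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Optionally, preview a single record to confirm the fields we use below (title, type, release_year, rating, and description):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Optional: peek at one record to see the available fields\n",
+ "print(dataset[0])"
+ ]
+ },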
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Insert the Data\n",
+ "Now that we have our data on our machine, we can begin embedding it and inserting it into Milvus. The embedding function takes in a list of texts and returns a list of embeddings."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def emb_texts(texts):\n",
+ " res = openai_client.embeddings.create(input=texts, model=\"text-embedding-3-small\")\n",
+ " return [res_data.embedding for res_data in res.data]"
+ ]
+ },
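+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As an optional sanity check, we can embed a sample sentence and confirm that the vector length matches the `DIMENSION` configured for the collection (this assumes the `OPENAI_API_KEY` set earlier is valid):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Optional sanity check: each embedding should have length DIMENSION (1536)\n",
+ "sample_embeddings = emb_texts([\"A quick test sentence\"])\n",
+ "print(len(sample_embeddings), len(sample_embeddings[0]))"
+ ]
+ },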
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "This next step does the actual inserting. We iterate through all the entries and accumulate batches, inserting each batch once we hit our set batch size. After the loop is over, we insert the last remaining batch if it exists."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "100%|██████████| 8807/8807 [01:52<00:00, 78.52it/s] \n"
+ ]
+ }
+ ],
+ "source": [
+ "from tqdm import tqdm\n",
+ "\n",
+ "data = [\n",
+ " [], # title\n",
+ " [], # type\n",
+ " [], # release_year\n",
+ " [], # rating\n",
+ " [], # description\n",
+ "]\n",
+ "\n",
+ "# Embed and insert in batches\n",
+ "for i in tqdm(range(0, len(dataset))):\n",
+ " data[0].append(dataset[i][\"title\"] or \"\")\n",
+ " data[1].append(dataset[i][\"type\"] or \"\")\n",
+ " data[2].append(dataset[i][\"release_year\"] or -1)\n",
+ " data[3].append(dataset[i][\"rating\"] or \"\")\n",
+ " data[4].append(dataset[i][\"description\"] or \"\")\n",
+ " if len(data[0]) % BATCH_SIZE == 0:\n",
+ " data.append(emb_texts(data[4]))\n",
+ " collection.insert(data)\n",
+ " data = [[], [], [], [], []]\n",
+ "\n",
+ "# Embed and insert the remainder\n",
+ "if len(data[0]) != 0:\n",
+ " data.append(emb_texts(data[4]))\n",
+ " collection.insert(data)\n",
+ " data = [[], [], [], [], []]"
+ ]
+ },
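+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Optionally, flush the collection and check the entity count to verify that all rows were inserted (it should match the size of the dataset):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Optional: flush and verify the number of inserted entities\n",
+ "collection.flush()\n",
+ "print(collection.num_entities)"
+ ]
+ },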
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Query the Database\n",
+ "With our data safely inserted into Milvus, we can now perform a query. The query takes in a tuple of the movie description you are searching for and the filter expression to use. More info about filter expressions can be found [here](https://milvus.io/docs/boolean.md). The search first prints out your description and filter expression. After that, for each result, we print the score, title, type, release year, rating, and description of the matching movies."
+ ]
+ },
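+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "For reference, here are a couple of other filter expressions that would be valid against this schema (illustrative only; the query below uses its own expression):\n",
+ "\n",
+ "```python\n",
+ "'type == \"TV Show\" and release_year >= 2015'\n",
+ "'rating in [\"PG\", \"PG-13\"] and release_year < 2010'\n",
+ "```"
+ ]
+ },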
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Description: movie about a fluffy animal Expression: release_year < 2019 and rating like \"PG%\"\n",
+ "Results:\n",
+ "\tRank: 1 Score: 0.4221611022949219 Title: The Adventures of Tintin\n",
+ "\t\tType: Movie Release Year: 2011 Rating: PG\n",
+ "This 3-D motion capture adapts Georges Remi's classic comic strip about the adventures\n",
+ "of fearless young journalist Tintin and his trusty dog, Snowy.\n",
+ "\n",
+ "\tRank: 2 Score: 0.40418487787246704 Title: Hedgehogs\n",
+ "\t\tType: Movie Release Year: 2016 Rating: PG\n",
+ "When a hedgehog suffering from memory loss forgets his identity, he ends up on a big\n",
+ "city journey with a pigeon to save his habitat from a human threat.\n",
+ "\n",
+ "\tRank: 3 Score: 0.3980240225791931 Title: Osmosis Jones\n",
+ "\t\tType: Movie Release Year: 2001 Rating: PG\n",
+ "Peter and Bobby Farrelly outdo themselves with this partially animated tale about an\n",
+ "out-of-shape 40-year-old man who's the host to various organisms.\n",
+ "\n",
+ "\tRank: 4 Score: 0.3947009742259979 Title: The Lamb\n",
+ "\t\tType: Movie Release Year: 2017 Rating: PG\n",
+ "A big-dreaming donkey escapes his menial existence and befriends some free-spirited\n",
+ "animal pals in this imaginative retelling of the Nativity Story.\n",
+ "\n",
+ "\tRank: 5 Score: 0.39375394582748413 Title: Open Season 2\n",
+ "\t\tType: Movie Release Year: 2008 Rating: PG\n",
+ "Elliot the buck and his forest-dwelling cohorts must rescue their dachshund pal from\n",
+ "some spoiled pets bent on returning him to domesticity.\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "import textwrap\n",
+ "\n",
+ "\n",
+ "def query(query, top_k=5):\n",
+ " text, expr = query\n",
+ " res = collection.search(\n",
+ " emb_texts(text),\n",
+ " anns_field=\"embedding\",\n",
+ " expr=expr,\n",
+ " param={\n",
+ " \"metric_type\": \"IP\",\n",
+ " \"params\": {},\n",
+ " },\n",
+ " limit=top_k,\n",
+ " output_fields=[\"title\", \"type\", \"release_year\", \"rating\", \"description\"],\n",
+ " )\n",
+ " for hit in res:\n",
+ " print(\"Description:\", text, \"Expression:\", expr)\n",
+ " print(\"Results:\")\n",
+ " for ii, hits in enumerate(hit):\n",
+ " print(\n",
+ " \"\\t\" + \"Rank:\",\n",
+ " ii + 1,\n",
+ " \"Score:\",\n",
+ " hits.score,\n",
+ " \"Title:\",\n",
+ " hits.entity.get(\"title\"),\n",
+ " )\n",
+ " print(\n",
+ " \"\\t\\t\" + \"Type:\",\n",
+ " hits.entity.get(\"type\"),\n",
+ " \"Release Year:\",\n",
+ " hits.entity.get(\"release_year\"),\n",
+ " \"Rating:\",\n",
+ " hits.entity.get(\"rating\"),\n",
+ " )\n",
+ " print(textwrap.fill(hits.entity.get(\"description\"), 88))\n",
+ " print()\n",
+ "\n",
+ "\n",
+ "my_query = (\"movie about a fluffy animal\", 'release_year < 2019 and rating like \"PG%\"')\n",
+ "\n",
+ "query(my_query)"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "haystack",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.13"
+ },
+ "orig_nbformat": 4
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}