From fac4111a19d0e760e9deb9300d0c6aaf7e6db560 Mon Sep 17 00:00:00 2001
From: ChengZi <chen.zhang@zilliz.com>
Date: Mon, 21 Oct 2024 16:37:01 +0800
Subject: [PATCH 1/3] Add build RAG with Milvus and Lepton AI tutorial

Signed-off-by: ChengZi <chen.zhang@zilliz.com>
---
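Note on the API-key cells: both notebooks set the key by assigning a placeholder string to `os.environ` in a code cell. That makes it easy to commit a real key by accident. A minimal alternative, assuming the notebook runs interactively (Colab or Jupyter), is to prompt for the key with the standard-library `getpass` only when the variable is not already set. `ensure_env` is a hypothetical helper, not code from either notebook.

```python
import os
from getpass import getpass


def ensure_env(name: str) -> str:
    """Return the environment variable `name`, prompting for it if unset.

    Hypothetical helper: the value typed at the prompt is stored in
    os.environ for the current session only and is never echoed.
    """
    value = os.environ.get(name)
    if not value:
        value = getpass(f"Enter {name}: ")
        os.environ[name] = value
    return value


# Usage in the Lepton notebook (the Fireworks one would pass "FIREWORKS_API_KEY"):
# token = ensure_env("LEPTONAI_TOKEN")
```

The rest of each notebook stays unchanged, because the clients still read the key from `os.environ[...]`.
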
 .../build_RAG_with_milvus_and_lepton.ipynb    | 534 ++++++++++++++++++
 1 file changed, 534 insertions(+)
 create mode 100644 bootcamp/tutorials/integration/build_RAG_with_milvus_and_lepton.ipynb
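
Note on chunking: the loading cell splits every FAQ file on the literal string `"# "`. That string also matches the end of deeper headings such as `#### `, so each chunk keeps the leftover `###` of the next heading. This would explain the `###` tails on the retrieved chunks, assuming the FAQ files use `####` question headings. A file that starts with `# ` also yields an empty first chunk, which the notebooks embed and insert as-is. The sketch below reproduces the split and adds one step the notebooks do not take: dropping whitespace-only chunks. `split_faq_markdown` and the sample text are hypothetical.

```python
from typing import List


def split_faq_markdown(markdown_text: str) -> List[str]:
    """Split on the literal separator "# " (as the notebooks do), then drop
    empty or whitespace-only chunks (an extra step the notebooks skip)."""
    chunks = markdown_text.split("# ")
    return [chunk for chunk in chunks if chunk.strip()]


# Hypothetical FAQ file with a "#" title and "####" question headings.
sample = (
    "# FAQ\n\n"
    "#### Where does Milvus store data?\n\nIn object storage.\n\n"
    "#### How does Milvus flush data?\n\nVia data nodes.\n"
)
for chunk in split_faq_markdown(sample):
    print(repr(chunk))
```

This prints three chunks. The first two end in `###`, which is the residue of the `####` heading that follows them.
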
diff --git a/bootcamp/tutorials/integration/build_RAG_with_milvus_and_lepton.ipynb b/bootcamp/tutorials/integration/build_RAG_with_milvus_and_lepton.ipynb
new file mode 100644
index 000000000..49c12025d
--- /dev/null
+++ b/bootcamp/tutorials/integration/build_RAG_with_milvus_and_lepton.ipynb
@@ -0,0 +1,534 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": false,
+    "jupyter": {
+     "outputs_hidden": false
+    }
+   },
+   "source": [
+    "<a href=\"https://colab.research.google.com/github/milvus-io/bootcamp/blob/master/bootcamp/tutorials/integration/build_RAG_with_milvus_and_lepton.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>   <a href=\"https://github.com/milvus-io/bootcamp/blob/master/bootcamp/tutorials/integration/build_RAG_with_milvus_and_lepton.ipynb\" target=\"_blank\">\n",
+    "    <img src=\"https://img.shields.io/badge/View%20on%20GitHub-555555?style=flat&logo=github&logoColor=white\" alt=\"GitHub Repository\"/></a>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Build RAG with Milvus and Lepton AI\n",
+    "\n",
+    "[Lepton AI](https://www.lepton.ai/) enables developers and enterprises to run AI applications efficiently, within minutes, and at production-ready scale.\n",
+    "Lepton AI allows you to build models in a Python-native way, debug and test models locally, deploy them to the cloud with a single command, and consume models in any application with a simple, flexible API. It provides a comprehensive environment for deploying various AI models, including large language models (LLMs) and diffusion models, without the need for extensive infrastructure setup.\n",
+    "\n",
+    "In this tutorial, we will show you how to build a RAG (Retrieval-Augmented Generation) pipeline with Milvus and Lepton AI."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "\n",
+    "\n",
+    "## Preparation\n",
+    "### Dependencies and Environment"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "vscode": {
+     "languageId": "shellscript"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "! pip install --upgrade pymilvus[model] openai requests tqdm"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": false
+   },
+   "source": [
+    "> If you are using Google Colab, you may need to **restart the runtime** to enable the newly installed dependencies (click the \"Runtime\" menu at the top of the screen and select \"Restart session\" from the dropdown menu)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Lepton AI provides an OpenAI-compatible API. Log in to its official website, get an [API key](https://www.lepton.ai/docs), and set it as the `LEPTONAI_TOKEN` environment variable."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {
+    "collapsed": false,
+    "jupyter": {
+     "outputs_hidden": false
+    },
+    "pycharm": {
+     "name": "#%%\n"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "\n",
+    "os.environ[\"LEPTONAI_TOKEN\"] = \"***********\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Prepare the data\n",
+    "\n",
+    "We use the FAQ pages from the [Milvus Documentation 2.4.x](https://github.com/milvus-io/milvus-docs/releases/download/v2.4.6-preview/milvus_docs_2.4.x_en.zip) as the private knowledge for our RAG pipeline; these pages are a good data source for a simple RAG pipeline.\n",
+    "\n",
+    "Download the zip file and extract documents to the folder `milvus_docs`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "vscode": {
+     "languageId": "shellscript"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "! wget https://github.com/milvus-io/milvus-docs/releases/download/v2.4.6-preview/milvus_docs_2.4.x_en.zip\n",
+    "! unzip -q milvus_docs_2.4.x_en.zip -d milvus_docs"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We load all markdown files from the folder `milvus_docs/en/faq`. For each document, we simply split the content on \"# \", which roughly separates the main sections of the markdown file."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 17,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from glob import glob\n",
+    "\n",
+    "text_lines = []\n",
+    "\n",
+    "for file_path in glob(\"milvus_docs/en/faq/*.md\", recursive=True):\n",
+    "    with open(file_path, \"r\") as file:\n",
+    "        file_text = file.read()\n",
+    "\n",
+    "    text_lines += file_text.split(\"# \")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Prepare the LLM and Embedding Model\n",
+    "\n",
+    "Lepton AI provides an OpenAI-compatible API, so you can call its LLM with the standard OpenAI client by pointing `base_url` at the Lepton endpoint and passing your Lepton token as `api_key`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from openai import OpenAI\n",
+    "\n",
+    "lepton_client = OpenAI(\n",
+    "    api_key=os.environ[\"LEPTONAI_TOKEN\"],\n",
+    "    base_url=\"https://mistral-7b.lepton.run/api/v1/\",\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Define an embedding model to generate text embeddings with `milvus_model`. We use `DefaultEmbeddingFunction` as an example, which is a pre-trained, lightweight embedding model."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from pymilvus import model as milvus_model\n",
+    "\n",
+    "embedding_model = milvus_model.DefaultEmbeddingFunction()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Generate a test embedding and print its dimension and first few elements."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "768\n",
+      "[-0.04836066  0.07163023 -0.01130064 -0.03789345 -0.03320649 -0.01318448\n",
+      " -0.03041712 -0.02269499 -0.02317863 -0.00426028]\n"
+     ]
+    }
+   ],
+   "source": [
+    "test_embedding = embedding_model.encode_queries([\"This is a test\"])[0]\n",
+    "embedding_dim = len(test_embedding)\n",
+    "print(embedding_dim)\n",
+    "print(test_embedding[:10])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Load data into Milvus"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Create the Collection"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from pymilvus import MilvusClient\n",
+    "\n",
+    "milvus_client = MilvusClient(uri=\"./milvus_demo.db\")\n",
+    "\n",
+    "collection_name = \"my_rag_collection\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": false
+   },
+   "source": [
+    "> As for the arguments of `MilvusClient`:\n",
+    "> - Setting the `uri` as a local file, e.g. `./milvus.db`, is the most convenient method, as it automatically utilizes [Milvus Lite](https://milvus.io/docs/milvus_lite.md) to store all data in this file.\n",
+    "> - If you have a large amount of data, you can set up a more performant Milvus server on [Docker or Kubernetes](https://milvus.io/docs/quickstart.md). In this setup, use the server URI, e.g. `http://localhost:19530`, as your `uri`.\n",
+    "> - If you want to use [Zilliz Cloud](https://zilliz.com/cloud), the fully managed cloud service for Milvus, adjust the `uri` and `token`, which correspond to the [Public Endpoint and API key](https://docs.zilliz.com/docs/on-zilliz-cloud-console#free-cluster-details) in Zilliz Cloud."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Check if the collection already exists and drop it if it does."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "if milvus_client.has_collection(collection_name):\n",
+    "    milvus_client.drop_collection(collection_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Create a new collection with specified parameters. \n",
+    "\n",
+    "If we don't specify any field information, Milvus automatically creates a default `id` field as the primary key and a `vector` field to store the vector data. A reserved JSON field stores fields that are not defined in the schema, along with their values."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "milvus_client.create_collection(\n",
+    "    collection_name=collection_name,\n",
+    "    dimension=embedding_dim,\n",
+    "    metric_type=\"IP\",  # Inner product distance\n",
+    "    consistency_level=\"Strong\",  # Strong consistency level\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Insert data\n",
+    "Embed all text lines with a single `encode_documents` call, build one record per line, and then insert the data into Milvus.\n",
+    "\n",
+    "The `text` field is not defined in the collection schema. Milvus automatically stores it in the reserved JSON dynamic field, which can be treated as a normal field at a high level."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 19,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Creating embeddings: 100%|██████████| 72/72 [00:00<00:00, 1090216.20it/s]\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "{'insert_count': 72,\n",
+       " 'ids': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71],\n",
+       " 'cost': 0}"
+      ]
+     },
+     "execution_count": 19,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "from tqdm import tqdm\n",
+    "\n",
+    "data = []\n",
+    "\n",
+    "doc_embeddings = embedding_model.encode_documents(text_lines)\n",
+    "\n",
+    "for i, line in enumerate(tqdm(text_lines, desc=\"Creating embeddings\")):\n",
+    "    data.append({\"id\": i, \"vector\": doc_embeddings[i], \"text\": line})\n",
+    "\n",
+    "milvus_client.insert(collection_name=collection_name, data=data)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Build RAG\n",
+    "\n",
+    "### Retrieve data for a query\n",
+    "\n",
+    "Let's specify a frequently asked question about Milvus."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 20,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "question = \"How is data stored in milvus?\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Search for the question in the collection and retrieve the top-3 semantic matches."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 23,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "search_res = milvus_client.search(\n",
+    "    collection_name=collection_name,\n",
+    "    data=embedding_model.encode_queries(\n",
+    "        [question]\n",
+    "    ),  # Convert the question to an embedding vector\n",
+    "    limit=3,  # Return top 3 results\n",
+    "    search_params={\"metric_type\": \"IP\", \"params\": {}},  # Inner product distance\n",
+    "    output_fields=[\"text\"],  # Return the text field\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Let's take a look at the search results of the query.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 24,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[\n",
+      "    [\n",
+      "        \" Where does Milvus store data?\\n\\nMilvus deals with two types of data, inserted data and metadata. \\n\\nInserted data, including vector data, scalar data, and collection-specific schema, are stored in persistent storage as incremental log. Milvus supports multiple object storage backends, including [MinIO](https://min.io/), [AWS S3](https://aws.amazon.com/s3/?nc1=h_ls), [Google Cloud Storage](https://cloud.google.com/storage?hl=en#object-storage-for-companies-of-all-sizes) (GCS), [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs), [Alibaba Cloud OSS](https://www.alibabacloud.com/product/object-storage-service), and [Tencent Cloud Object Storage](https://www.tencentcloud.com/products/cos) (COS).\\n\\nMetadata are generated within Milvus. Each Milvus module has its own metadata that are stored in etcd.\\n\\n###\",\n",
+      "        0.6572665572166443\n",
+      "    ],\n",
+      "    [\n",
+      "        \"How does Milvus flush data?\\n\\nMilvus returns success when inserted data are loaded to the message queue. However, the data are not yet flushed to the disk. Then Milvus' data node writes the data in the message queue to persistent storage as incremental logs. If `flush()` is called, the data node is forced to write all data in the message queue to persistent storage immediately.\\n\\n###\",\n",
+      "        0.6312146186828613\n",
+      "    ],\n",
+      "    [\n",
+      "        \"How does Milvus handle vector data types and precision?\\n\\nMilvus supports Binary, Float32, Float16, and BFloat16 vector types.\\n\\n- Binary vectors: Store binary data as sequences of 0s and 1s, used in image processing and information retrieval.\\n- Float32 vectors: Default storage with a precision of about 7 decimal digits. Even Float64 values are stored with Float32 precision, leading to potential precision loss upon retrieval.\\n- Float16 and BFloat16 vectors: Offer reduced precision and memory usage. Float16 is suitable for applications with limited bandwidth and storage, while BFloat16 balances range and efficiency, commonly used in deep learning to reduce computational requirements without significantly impacting accuracy.\\n\\n###\",\n",
+      "        0.6115777492523193\n",
+      "    ]\n",
+      "]\n"
+     ]
+    }
+   ],
+   "source": [
+    "import json\n",
+    "\n",
+    "retrieved_lines_with_distances = [\n",
+    "    (res[\"entity\"][\"text\"], res[\"distance\"]) for res in search_res[0]\n",
+    "]\n",
+    "print(json.dumps(retrieved_lines_with_distances, indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Use LLM to get a RAG response\n",
+    "\n",
+    "Convert the retrieved documents into a string format."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 25,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "context = \"\\n\".join(\n",
+    "    [line_with_distance[0] for line_with_distance in retrieved_lines_with_distances]\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Define the system and user prompts for the language model. The user prompt is assembled from the documents retrieved from Milvus."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 26,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "SYSTEM_PROMPT = \"\"\"\n",
+    "Human: You are an AI assistant. You are able to find answers to the questions from the contextual passage snippets provided.\n",
+    "\"\"\"\n",
+    "USER_PROMPT = f\"\"\"\n",
+    "Use the following pieces of information enclosed in <context> tags to provide an answer to the question enclosed in <question> tags.\n",
+    "<context>\n",
+    "{context}\n",
+    "</context>\n",
+    "<question>\n",
+    "{question}\n",
+    "</question>\n",
+    "\"\"\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Use the `mistral-7b` model provided by Lepton AI to generate a response based on the prompts."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 27,
+   "metadata": {
+    "pycharm": {
+     "name": "#%%\n"
+    }
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Inserted data in Milvus, including vector data, scalar data, and collection-specific schema, are stored in persistent storage as incremental logs. Milvus supports multiple object storage backends such as MinIO, AWS S3, Google Cloud Storage, Azure Blob Storage, Alibaba Cloud OSS, and Tencent Cloud Object Storage (COS). Metadata are generated within Milvus and stored in etcd.\n"
+     ]
+    }
+   ],
+   "source": [
+    "response = lepton_client.chat.completions.create(\n",
+    "    model=\"mistral-7b\",\n",
+    "    messages=[\n",
+    "        {\"role\": \"system\", \"content\": SYSTEM_PROMPT},\n",
+    "        {\"role\": \"user\", \"content\": USER_PROMPT},\n",
+    "    ],\n",
+    ")\n",
+    "print(response.choices[0].message.content)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Great! We have successfully built a RAG pipeline with Milvus and Lepton AI."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.15"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
\ No newline at end of file
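
Note on the metric: both notebooks create and search the collection with `metric_type="IP"` (inner product). Inner product ranks results the same way as cosine similarity only when the embeddings are L2-normalized. Otherwise, a longer vector can outrank a better-aligned one. Neither notebook states whether its embedding model returns normalized vectors, so that is an assumption worth checking. The pure-Python sketch below, using hypothetical toy vectors, shows the difference.

```python
import math
from typing import Sequence


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product: the score Milvus computes for metric_type="IP"."""
    return sum(x * y for x, y in zip(a, b))


def l2_norm(v: Sequence[float]) -> float:
    return math.sqrt(inner_product(v, v))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product divided by both vector lengths (direction only)."""
    return inner_product(a, b) / (l2_norm(a) * l2_norm(b))


query = [1.0, 0.0]
doc_aligned = [0.9, 0.1]  # nearly the same direction as the query, but short
doc_long = [2.0, 2.0]     # 45 degrees away from the query, but longer

# Inner product favours the longer vector...
print(inner_product(query, doc_aligned), inner_product(query, doc_long))  # 0.9 2.0
# ...while cosine similarity favours the better-aligned one.
print(cosine_similarity(query, doc_aligned) > cosine_similarity(query, doc_long))  # True
```

To check a model in either notebook, `l2_norm(test_embedding)` close to 1.0 suggests that it returns normalized vectors. If it does not, `metric_type="COSINE"` is the alternative to consider.
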

From e802097ac8504cdcbcaf8c6fde5041ebe90b665f Mon Sep 17 00:00:00 2001
From: ChengZi <chen.zhang@zilliz.com>
Date: Tue, 22 Oct 2024 11:38:37 +0800
Subject: [PATCH 2/3] build RAG with Milvus and Fireworks AI

Signed-off-by: ChengZi <chen.zhang@zilliz.com>
---
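Note on embedding throughput: the Fireworks notebook calls `emb_text` once per text line. Its progress bar shows 72 sequential requests at about 2.5 it/s. The OpenAI embeddings API accepts a list of strings as `input`. Assuming the Fireworks OpenAI-compatible endpoint does too, batching the lines would cut the number of round trips. `embed_in_batches` and `fireworks_embed_batch` are hypothetical helpers, and `batch_size=32` is an arbitrary default, not a documented limit.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[Vector]],
    batch_size: int = 32,
) -> List[Vector]:
    """Embed `texts` in chunks of `batch_size`, keeping the input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    vectors: List[Vector] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start : start + batch_size])
        batch_vectors = embed_batch(batch)
        if len(batch_vectors) != len(batch):
            raise ValueError("embed_batch must return one vector per input text")
        vectors.extend(batch_vectors)
    return vectors


def fireworks_embed_batch(batch: List[str]) -> List[Vector]:
    """Hypothetical adapter around the notebook's `fireworks_client`.

    Assumes the endpoint accepts a list for `input`, as the OpenAI
    embeddings API does; results are re-ordered by their `index` field.
    """
    response = fireworks_client.embeddings.create(  # defined in the notebook
        input=batch, model="nomic-ai/nomic-embed-text-v1.5"
    )
    return [item.embedding for item in sorted(response.data, key=lambda d: d.index)]


# Usage inside the notebook, replacing the per-line emb_text loop:
# doc_embeddings = embed_in_batches(text_lines, fireworks_embed_batch)
# data = [{"id": i, "vector": v, "text": t}
#         for i, (v, t) in enumerate(zip(doc_embeddings, text_lines))]
```

The batching logic takes the embedding call as a parameter, so it can be tested without network access and reused with another provider.
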
 .../build_RAG_with_milvus_and_fireworks.ipynb | 539 ++++++++++++++++++
 1 file changed, 539 insertions(+)
 create mode 100644 bootcamp/tutorials/integration/build_RAG_with_milvus_and_fireworks.ipynb
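
Note on prompt assembly: both notebooks join the text of all top-3 hits into `context`, whatever their scores, and format it into `USER_PROMPT`. The sketch below wraps the same steps in a function. It reads the same hit shape the notebooks use (`hit["entity"]["text"]` and `hit["distance"]`). It also adds an optional `min_score` filter, which the notebooks do not have. A useful threshold depends on the embedding model, so no value is suggested. `build_user_prompt` and the toy hits are hypothetical.

```python
from typing import Any, Dict, List

# Same wording as the notebooks' USER_PROMPT, with placeholders instead of an f-string.
PROMPT_TEMPLATE = """
Use the following pieces of information enclosed in <context> tags to provide an answer to the question enclosed in <question> tags.
<context>
{context}
</context>
<question>
{question}
</question>
"""


def build_user_prompt(
    hits: List[Dict[str, Any]],
    question: str,
    min_score: float = float("-inf"),
) -> str:
    """Join the text of hits scoring at least `min_score` into the prompt."""
    texts = [hit["entity"]["text"] for hit in hits if hit["distance"] >= min_score]
    return PROMPT_TEMPLATE.format(context="\n".join(texts), question=question)


# Toy hits in the shape of one entry of `search_res` (i.e. search_res[0]).
hits = [
    {"entity": {"text": "A"}, "distance": 0.9},
    {"entity": {"text": "B"}, "distance": 0.5},
]
print(build_user_prompt(hits, "Q?", min_score=0.6))
```

In the notebooks, `build_user_prompt(search_res[0], question)` would produce the same text as `USER_PROMPT`. Only the context is passed through `str.format`, so braces inside the retrieved text are inserted literally.
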

diff --git a/bootcamp/tutorials/integration/build_RAG_with_milvus_and_fireworks.ipynb b/bootcamp/tutorials/integration/build_RAG_with_milvus_and_fireworks.ipynb
new file mode 100644
index 000000000..896864d21
--- /dev/null
+++ b/bootcamp/tutorials/integration/build_RAG_with_milvus_and_fireworks.ipynb
@@ -0,0 +1,539 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": false,
+    "jupyter": {
+     "outputs_hidden": false
+    }
+   },
+   "source": [
+    "<a href=\"https://colab.research.google.com/github/milvus-io/bootcamp/blob/master/bootcamp/tutorials/integration/build_RAG_with_milvus_and_fireworks.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>   <a href=\"https://github.com/milvus-io/bootcamp/blob/master/bootcamp/tutorials/integration/build_RAG_with_milvus_and_fireworks.ipynb\" target=\"_blank\">\n",
+    "    <img src=\"https://img.shields.io/badge/View%20on%20GitHub-555555?style=flat&logo=github&logoColor=white\" alt=\"GitHub Repository\"/></a>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Build RAG with Milvus and Fireworks AI\n",
+    "\n",
+    "[Fireworks AI](https://fireworks.ai/) is a generative AI inference platform offering industry-leading speed and production-readiness for running and customizing models.\n",
+    "Fireworks AI provides a variety of generative AI services, including serverless models, on-demand deployments, and fine-tuning capabilities. It offers a comprehensive environment for deploying various AI models, including large language models (LLMs) and embedding models. Fireworks AI aggregates numerous models, enabling users to easily access and utilize these resources without the need for extensive infrastructure setup.\n",
+    " \n",
+    "In this tutorial, we will show you how to build a RAG (Retrieval-Augmented Generation) pipeline with Milvus and Fireworks AI."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "\n",
+    "\n",
+    "## Preparation\n",
+    "### Dependencies and Environment"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "vscode": {
+     "languageId": "shellscript"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "! pip install --upgrade pymilvus openai requests tqdm"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": false
+   },
+   "source": [
+    "> If you are using Google Colab, you may need to **restart the runtime** to enable the newly installed dependencies (click the \"Runtime\" menu at the top of the screen and select \"Restart session\" from the dropdown menu)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Fireworks AI provides an OpenAI-compatible API. Log in to its official website, get an [API key](https://docs.fireworks.ai/getting-started/introduction), and set it as the `FIREWORKS_API_KEY` environment variable."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {
+    "collapsed": false,
+    "jupyter": {
+     "outputs_hidden": false
+    },
+    "pycharm": {
+     "name": "#%%\n"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "\n",
+    "os.environ[\"FIREWORKS_API_KEY\"] = \"***********\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Prepare the data\n",
+    "\n",
+    "We use the FAQ pages from the [Milvus Documentation 2.4.x](https://github.com/milvus-io/milvus-docs/releases/download/v2.4.6-preview/milvus_docs_2.4.x_en.zip) as the private knowledge for our RAG pipeline; these pages are a good data source for a simple RAG pipeline.\n",
+    "\n",
+    "Download the zip file and extract documents to the folder `milvus_docs`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "vscode": {
+     "languageId": "shellscript"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "! wget https://github.com/milvus-io/milvus-docs/releases/download/v2.4.6-preview/milvus_docs_2.4.x_en.zip\n",
+    "! unzip -q milvus_docs_2.4.x_en.zip -d milvus_docs"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We load all markdown files from the folder `milvus_docs/en/faq`. For each document, we simply split the content on \"# \", which roughly separates the main sections of the markdown file."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from glob import glob\n",
+    "\n",
+    "text_lines = []\n",
+    "\n",
+    "for file_path in glob(\"milvus_docs/en/faq/*.md\", recursive=True):\n",
+    "    with open(file_path, \"r\") as file:\n",
+    "        file_text = file.read()\n",
+    "\n",
+    "    text_lines += file_text.split(\"# \")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Prepare the LLM and Embedding Model\n",
+    "\n",
+    "We initialize a client to access the LLM and the embedding model. Fireworks AI provides an OpenAI-compatible API, so you can call both the embedding model and the LLM with the standard OpenAI client by pointing `base_url` at the Fireworks endpoint."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from openai import OpenAI\n",
+    "\n",
+    "fireworks_client = OpenAI(\n",
+    "    api_key=os.environ[\"FIREWORKS_API_KEY\"],\n",
+    "    base_url=\"https://api.fireworks.ai/inference/v1\",\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Define a function to generate text embeddings using the client. We use the `nomic-ai/nomic-embed-text-v1.5` model as an example."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def emb_text(text):\n",
+    "    return (\n",
+    "        fireworks_client.embeddings.create(\n",
+    "            input=text, model=\"nomic-ai/nomic-embed-text-v1.5\"\n",
+    "        )\n",
+    "        .data[0]\n",
+    "        .embedding\n",
+    "    )"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Generate a test embedding and print its dimension and first few elements."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "768\n",
+      "[0.04815673828125, 0.0261993408203125, -0.1749267578125, -0.03131103515625, 0.068115234375, -0.00621795654296875, 0.03955078125, -0.0210723876953125, 0.039703369140625, -0.0286102294921875]\n"
+     ]
+    }
+   ],
+   "source": [
+    "test_embedding = emb_text(\"This is a test\")\n",
+    "embedding_dim = len(test_embedding)\n",
+    "print(embedding_dim)\n",
+    "print(test_embedding[:10])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Load data into Milvus"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Create the Collection"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from pymilvus import MilvusClient\n",
+    "\n",
+    "milvus_client = MilvusClient(uri=\"./milvus_demo.db\")\n",
+    "\n",
+    "collection_name = \"my_rag_collection\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": false
+   },
+   "source": [
+    "> As for the arguments of `MilvusClient`:\n",
+    "> - Setting the `uri` as a local file, e.g. `./milvus.db`, is the most convenient method, as it automatically utilizes [Milvus Lite](https://milvus.io/docs/milvus_lite.md) to store all data in this file.\n",
+    "> - If you have a large amount of data, you can set up a more performant Milvus server on [Docker or Kubernetes](https://milvus.io/docs/quickstart.md). In this setup, use the server URI, e.g. `http://localhost:19530`, as your `uri`.\n",
+    "> - If you want to use [Zilliz Cloud](https://zilliz.com/cloud), the fully managed cloud service for Milvus, adjust the `uri` and `token`, which correspond to the [Public Endpoint and API key](https://docs.zilliz.com/docs/on-zilliz-cloud-console#free-cluster-details) in Zilliz Cloud."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Check if the collection already exists and drop it if it does."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "if milvus_client.has_collection(collection_name):\n",
+    "    milvus_client.drop_collection(collection_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Create a new collection with specified parameters. \n",
+    "\n",
+    "If we don't specify any field information, Milvus automatically creates a default `id` field as the primary key and a `vector` field to store the vector data. A reserved JSON field stores fields that are not defined in the schema, along with their values."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "milvus_client.create_collection(\n",
+    "    collection_name=collection_name,\n",
+    "    dimension=embedding_dim,\n",
+    "    metric_type=\"IP\",  # Inner product distance\n",
+    "    consistency_level=\"Strong\",  # Strong consistency level\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Insert data\n",
+    "Iterate through the text lines, create embeddings, and then insert the data into Milvus.\n",
+    "\n",
+    "The `text` field is not defined in the collection schema. Milvus automatically stores it in the reserved JSON dynamic field, which can be treated as a normal field at a high level."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Creating embeddings: 100%|██████████| 72/72 [00:28<00:00,  2.51it/s]\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "{'insert_count': 72, 'ids': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71], 'cost': 0}"
+      ]
+     },
+     "execution_count": 9,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "from tqdm import tqdm\n",
+    "\n",
+    "data = []\n",
+    "\n",
+    "for i, line in enumerate(tqdm(text_lines, desc=\"Creating embeddings\")):\n",
+    "    data.append({\"id\": i, \"vector\": emb_text(line), \"text\": line})\n",
+    "\n",
+    "milvus_client.insert(collection_name=collection_name, data=data)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Build RAG\n",
+    "\n",
+    "### Retrieve data for a query\n",
+    "\n",
+    "Let's specify a frequently asked question about Milvus."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "question = \"How is data stored in milvus?\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Search for the question in the collection and retrieve the top-3 semantic matches."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "search_res = milvus_client.search(\n",
+    "    collection_name=collection_name,\n",
+    "    data=[\n",
+    "        emb_text(question)\n",
+    "    ],  # Use the `emb_text` function to convert the question to an embedding vector\n",
+    "    limit=3,  # Return top 3 results\n",
+    "    search_params={\"metric_type\": \"IP\", \"params\": {}},  # Inner product distance\n",
+    "    output_fields=[\"text\"],  # Return the text field\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Let's take a look at the search results of the query.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[\n",
+      "    [\n",
+      "        \" Where does Milvus store data?\\n\\nMilvus deals with two types of data, inserted data and metadata. \\n\\nInserted data, including vector data, scalar data, and collection-specific schema, are stored in persistent storage as incremental log. Milvus supports multiple object storage backends, including [MinIO](https://min.io/), [AWS S3](https://aws.amazon.com/s3/?nc1=h_ls), [Google Cloud Storage](https://cloud.google.com/storage?hl=en#object-storage-for-companies-of-all-sizes) (GCS), [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs), [Alibaba Cloud OSS](https://www.alibabacloud.com/product/object-storage-service), and [Tencent Cloud Object Storage](https://www.tencentcloud.com/products/cos) (COS).\\n\\nMetadata are generated within Milvus. Each Milvus module has its own metadata that are stored in etcd.\\n\\n###\",\n",
+      "        0.8334928750991821\n",
+      "    ],\n",
+      "    [\n",
+      "        \"How does Milvus flush data?\\n\\nMilvus returns success when inserted data are loaded to the message queue. However, the data are not yet flushed to the disk. Then Milvus' data node writes the data in the message queue to persistent storage as incremental logs. If `flush()` is called, the data node is forced to write all data in the message queue to persistent storage immediately.\\n\\n###\",\n",
+      "        0.746377170085907\n",
+      "    ],\n",
+      "    [\n",
+      "        \"What is the maximum dataset size Milvus can handle?\\n\\n  \\nTheoretically, the maximum dataset size Milvus can handle is determined by the hardware it is run on, specifically system memory and storage:\\n\\n- Milvus loads all specified collections and partitions into memory before running queries. Therefore, memory size determines the maximum amount of data Milvus can query.\\n- When new entities and and collection-related schema (currently only MinIO is supported for data persistence) are added to Milvus, system storage determines the maximum allowable size of inserted data.\\n\\n###\",\n",
+      "        0.7328270673751831\n",
+      "    ]\n",
+      "]\n"
+     ]
+    }
+   ],
+   "source": [
+    "import json\n",
+    "\n",
+    "retrieved_lines_with_distances = [\n",
+    "    (res[\"entity\"][\"text\"], res[\"distance\"]) for res in search_res[0]\n",
+    "]\n",
+    "print(json.dumps(retrieved_lines_with_distances, indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Use LLM to get a RAG response\n",
+    "\n",
+    "Convert the retrieved documents into a string format."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "context = \"\\n\".join(\n",
+    "    [line_with_distance[0] for line_with_distance in retrieved_lines_with_distances]\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Define the system and user prompts for the language model. The user prompt is assembled from the documents retrieved from Milvus."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "SYSTEM_PROMPT = \"\"\"\n",
+    "Human: You are an AI assistant. You are able to find answers to the questions from the contextual passage snippets provided.\n",
+    "\"\"\"\n",
+    "USER_PROMPT = f\"\"\"\n",
+    "Use the following pieces of information enclosed in <context> tags to provide an answer to the question enclosed in <question> tags.\n",
+    "<context>\n",
+    "{context}\n",
+    "</context>\n",
+    "<question>\n",
+    "{question}\n",
+    "</question>\n",
+    "\"\"\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Use the `llama-v3p1-405b-instruct` model provided by Fireworks to generate a response based on the prompts.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "metadata": {
+    "pycharm": {
+     "name": "#%%\n"
+    }
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "According to the provided context, Milvus stores data in two ways:\n",
+      "\n",
+      "1. Inserted data, including vector data, scalar data, and collection-specific schema, are stored in persistent storage as incremental logs. This can be done using multiple object storage backends such as MinIO, AWS S3, Google Cloud Storage, Azure Blob Storage, Alibaba Cloud OSS, and Tencent Cloud Object Storage.\n",
+      "2. Metadata, which are generated within Milvus, are stored in etcd, with each Milvus module having its own metadata.\n",
+      "\n",
+      "Additionally, when data is inserted, it is first loaded into a message queue, and then written to persistent storage as incremental logs by the data node. The `flush()` function can be used to force the data node to write all data in the message queue to persistent storage immediately.\n"
+     ]
+    }
+   ],
+   "source": [
+    "response = fireworks_client.chat.completions.create(\n",
+    "    model=\"accounts/fireworks/models/llama-v3p1-405b-instruct\",\n",
+    "    messages=[\n",
+    "        {\"role\": \"system\", \"content\": SYSTEM_PROMPT},\n",
+    "        {\"role\": \"user\", \"content\": USER_PROMPT},\n",
+    "    ],\n",
+    ")\n",
+    "print(response.choices[0].message.content)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Great! We have successfully built a RAG pipeline with Milvus and Fireworks AI."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.13"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
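
The retrieval and prompt-assembly steps in the notebook above (an inner-product top-3 search, followed by joining the retrieved texts into the `<context>` block of `USER_PROMPT`) can be sketched in plain Python without a Milvus server or an LLM endpoint. This is a minimal, hypothetical sketch: `top_k_by_ip` is a brute-force stand-in for `milvus_client.search(..., limit=3)` with `metric_type` set to `"IP"`, `build_user_prompt` is a helper introduced only for illustration, and the made-up 3-dimensional vectors in `TOY_DOCS` replace the real `emb_text` embeddings.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy corpus of (text, embedding) pairs. In the notebook the
# vectors come from emb_text() and are stored in a Milvus collection; these
# 3-d vectors are made up purely for illustration.
TOY_DOCS: List[Tuple[str, List[float]]] = [
    ("Milvus stores metadata in etcd.", [0.9, 0.1, 0.0]),
    ("Milvus flushes data to object storage.", [0.7, 0.3, 0.1]),
    ("Unrelated text about cooking.", [0.0, 0.0, 1.0]),
    ("Milvus supports MinIO and S3.", [0.8, 0.2, 0.0]),
]


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product (IP) score: a larger value means more similar."""
    return sum(x * y for x, y in zip(a, b))


def top_k_by_ip(
    query_vec: Sequence[float],
    docs: Sequence[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Brute-force stand-in for a Milvus search with metric_type="IP" and limit=k.

    Scores every document against the query and returns the k best
    (text, score) pairs, highest score first.
    """
    scored = [(text, inner_product(query_vec, vec)) for text, vec in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_user_prompt(question: str, retrieved: Sequence[Tuple[str, float]]) -> str:
    """Hypothetical helper: drops the scores, joins the retrieved texts with
    newlines, and fills the same <context>/<question> template that the
    notebook uses for USER_PROMPT."""
    context = "\n".join(text for text, _score in retrieved)
    return f"""
Use the following pieces of information enclosed in <context> tags to provide an answer to the question enclosed in <question> tags.
<context>
{context}
</context>
<question>
{question}
</question>
"""


if __name__ == "__main__":
    hits = top_k_by_ip([1.0, 0.0, 0.0], TOY_DOCS, k=3)
    for text, score in hits:
        print(f"{score:.2f}  {text}")
    print(build_user_prompt("How is data stored in milvus?", hits))
```

For the 72 chunks inserted in this notebook, a linear scan like this is cheap; in larger collections, Milvus's vector indexes are what avoid scoring every stored vector on each query.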

From e554f1641ab359054b6016c110c87369de92adc0 Mon Sep 17 00:00:00 2001
From: ChengZi <chen.zhang@zilliz.com>
Date: Tue, 22 Oct 2024 11:48:04 +0800
Subject: [PATCH 3/3] Add badges to the vector visualization notebook

Signed-off-by: ChengZi <chen.zhang@zilliz.com>
---
 .../quickstart/vector_visualization.ipynb     | 30 ++++++++++++-------
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/bootcamp/tutorials/quickstart/vector_visualization.ipynb b/bootcamp/tutorials/quickstart/vector_visualization.ipynb
index f128af083..6b024bd65 100644
--- a/bootcamp/tutorials/quickstart/vector_visualization.ipynb
+++ b/bootcamp/tutorials/quickstart/vector_visualization.ipynb
@@ -2,15 +2,23 @@
  "cells": [
   {
    "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<a href=\"https://colab.research.google.com/github/milvus-io/bootcamp/blob/master/bootcamp/tutorials/quickstart/vector_visualization.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>   <a href=\"https://github.com/milvus-io/bootcamp/blob/master/bootcamp/tutorials/quickstart/vector_visualization.ipynb\" target=\"_blank\">\n",
+    "    <img src=\"https://img.shields.io/badge/View%20on%20GitHub-555555?style=flat&logo=github&logoColor=white\" alt=\"GitHub Repository\"/>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": false
+   },
    "source": [
     "# Vector Visualization\n",
     "In this example, we will show how to visualize the embeddings(vectors) in Milvus using [t-SNE](https://www.wikiwand.com/en/articles/T-distributed_stochastic_neighbor_embedding).\n",
     "\n",
     "Dimensionality reduction techniques, such as t-SNE, are invaluable for visualizing complex, high-dimensional data in a 2D or 3D space while preserving the local structure. This enables pattern recognition, enhances understanding of feature relationships, and facilitates the interpretation of machine learning model outcomes. Additionally, it aids in algorithm evaluation by visually comparing clustering results, simplifies data presentation to non-specialist audiences, and can reduce computational costs by working with lower-dimensional representations. Through these applications, t-SNE not only helps in gaining deeper insights into datasets but also supports more informed decision-making processes."
-   ],
-   "metadata": {
-    "collapsed": false
-   }
+   ]
   },
   {
    "cell_type": "markdown",
@@ -618,17 +626,17 @@
   },
   {
    "cell_type": "markdown",
-   "source": [
-    "As we can see, the query vector is close to the retrieved vectors. Although the retrieved vectors are not within a standard circle with a fixed radius centered on the query, we can see that they are still very close to the query vector on the 2D plane.\n",
-    "\n",
-    "Using dimensionality reduction techniques can facilitate the understanding of vectors and troubleshooting. Hope you can get a better understanding of vectors through this tutorial."
-   ],
    "metadata": {
     "collapsed": false,
     "pycharm": {
      "name": "#%% md\n"
     }
-   }
+   },
+   "source": [
+    "As we can see, the query vector is close to the retrieved vectors. Although the retrieved vectors are not within a standard circle with a fixed radius centered on the query, we can see that they are still very close to the query vector on the 2D plane.\n",
+    "\n",
+    "Dimensionality reduction techniques make vectors easier to understand and to troubleshoot. We hope this tutorial helps you better understand vectors."
+   ]
   }
  ],
  "metadata": {
@@ -652,4 +660,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 2
-}
\ No newline at end of file
+}