
Commit b9f70ee

Committed Mar 24, 2024
Added pdf, env and notebook
1 parent f035bee commit b9f70ee

File tree

4 files changed, +293 -0 lines changed
 

.DS_Store

6 KB
Binary file not shown.

data/budget_speech.pdf

1.42 MB
Binary file not shown.

llamaIndex_Gemini.ipynb

+288
@@ -0,0 +1,288 @@
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "#!pip install pinecone-client\n",
    "#!pip install llama-index-llms-gemini\n",
    "#!pip install llama-index-vector-stores-pinecone\n",
    "#!pip install llama-index\n",
    "#!pip install llama-index-embeddings-gemini"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Step 1: Import libraries and define API keys\n",
    "We'll need to import a few libraries and take care of some basics."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from pinecone import Pinecone\n",
    "from llama_index.llms.gemini import Gemini\n",
    "from llama_index.vector_stores.pinecone import PineconeVectorStore\n",
    "from llama_index.core import StorageContext, VectorStoreIndex, SimpleDirectoryReader, Settings\n",
    "from llama_index.embeddings.gemini import GeminiEmbedding"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Set the API keys and configure Gemini as the LLM."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# use your own keys here; never commit real API keys to source control\n",
    "GOOGLE_API_KEY = \"<YOUR_GOOGLE_API_KEY>\"\n",
    "PINECONE_API_KEY = \"<YOUR_PINECONE_API_KEY>\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "os.environ[\"GOOGLE_API_KEY\"] = GOOGLE_API_KEY\n",
    "os.environ[\"PINECONE_API_KEY\"] = PINECONE_API_KEY"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# use Gemini (gemini-pro by default) as the LLM\n",
    "llm = Gemini()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Step 2: Create a Pinecone client\n",
    "To send data back and forth between the app and Pinecone, we'll need to instantiate a Pinecone client. It's a one-liner:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "pinecone_client = Pinecone(api_key=PINECONE_API_KEY)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "testindex\n"
     ]
    }
   ],
   "source": [
    "# list the Pinecone indexes in this project\n",
    "for index in pinecone_client.list_indexes():\n",
    "    print(index['name'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Step 3: Select the Pinecone index\n",
    "Using our Pinecone client, we can select the index that we previously created and assign it to the variable pinecone_index:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "pinecone_index = pinecone_client.Index(\"testindex\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Step 4: Load the documents\n",
    "SimpleDirectoryReader loads every file in the data directory, including budget_speech.pdf."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "documents = SimpleDirectoryReader(\"data\").load_data()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Step 5: Generate embeddings using GeminiEmbedding\n",
    "\n",
    "By default, LlamaIndex assumes you are using OpenAI to generate embeddings.\n",
    "To use Gemini instead, we configure the global Settings object, which tells LlamaIndex which LLM and which embedding model to use."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "embed_model = GeminiEmbedding(model_name=\"models/embedding-001\")\n",
    "\n",
    "Settings.llm = llm\n",
    "Settings.embed_model = embed_model\n",
    "Settings.chunk_size = 512"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Step 6: Generate and store embeddings in the Pinecone index\n",
    "Using the VectorStoreIndex class, LlamaIndex sends the document chunks to the embedding model and stores the resulting vectors in the Pinecone index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Upserted vectors: 100%|██████████| 32/32 [00:01<00:00, 26.58it/s]\n"
     ]
    }
   ],
   "source": [
    "# wrap the Pinecone index as a LlamaIndex vector store\n",
    "vector_store = PineconeVectorStore(pinecone_index=pinecone_index)\n",
    "\n",
    "# create a StorageContext backed by the PineconeVectorStore\n",
    "storage_context = StorageContext.from_defaults(\n",
    "    vector_store=vector_store\n",
    ")\n",
    "\n",
    "# chunk, embed, and upsert the documents into the index\n",
    "index = VectorStoreIndex.from_documents(\n",
    "    documents,\n",
    "    storage_context=storage_context\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Step 7: Query the Pinecone vector store\n",
    "\n",
    "The contents of the PDF are now converted to embeddings and stored in the Pinecone index.\n",
    "Let's perform a similarity search by querying the index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "# build a query engine over the Pinecone-backed index\n",
    "query_engine = index.as_query_engine()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "gemini_response = query_engine.query(\"What are the plans covered under Rooftop solarization and muft bijli?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Through rooftop solarization, one crore households will be enabled to obtain up to 300 units free electricity every month.\n"
     ]
    }
   ],
   "source": [
    "# print the response\n",
    "print(gemini_response)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
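
Step 3 assumes a Pinecone index named testindex already exists. For reference, a minimal sketch (not part of this commit) of creating such an index with the v3 pinecone-client API; the serverless cloud and region values are assumptions, while dimension 768 matches the output size of Gemini's models/embedding-001 embedding model:

from pinecone import Pinecone, ServerlessSpec

pinecone_client = Pinecone(api_key="<YOUR_PINECONE_API_KEY>")

# create the index Step 3 expects; cloud/region here are assumptions
if "testindex" not in [ix["name"] for ix in pinecone_client.list_indexes()]:
    pinecone_client.create_index(
        name="testindex",
        dimension=768,  # embedding size of models/embedding-001
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )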

requirements.txt

+5
@@ -0,0 +1,5 @@
llama-index
llama-index-llms-gemini
llama-index-vector-stores-pinecone
llama-index-embeddings-gemini
llama-index-readers-file
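
With these packages installed, a separate script can reconnect to the populated index without re-ingesting the PDF. A minimal sketch, assuming the same testindex and Gemini settings as the notebook:

import os
from pinecone import Pinecone
from llama_index.core import Settings, VectorStoreIndex
from llama_index.embeddings.gemini import GeminiEmbedding
from llama_index.llms.gemini import Gemini
from llama_index.vector_stores.pinecone import PineconeVectorStore

# same model configuration as the notebook
Settings.llm = Gemini()
Settings.embed_model = GeminiEmbedding(model_name="models/embedding-001")

# attach to the existing Pinecone index instead of rebuilding it
pinecone_index = Pinecone(api_key=os.environ["PINECONE_API_KEY"]).Index("testindex")
vector_store = PineconeVectorStore(pinecone_index=pinecone_index)
index = VectorStoreIndex.from_vector_store(vector_store)

print(index.as_query_engine().query("Summarise the rooftop solarization plan."))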
