Commit 0159c93

Merge pull request #64 from vanna-ai/update-train-documentation
Add vn-ask notebook
2 parents 3c22af0 + 2c8edf9 commit 0159c93

19 files changed: +1061 -2055 lines changed

docs/index.md

Lines changed: 1 addition & 1 deletion
@@ -16,6 +16,6 @@ Vanna provides additional functionality to manage your training data to maintain

 ## Where can I use **Vanna.AI**?
 - Use in a [Streamlit app](streamlit.md)
-- Use in [Jupyter Notebooks](notebooks/vn-starter.md)
+- Use in [Jupyter Notebooks](notebooks/vn-ask.md)
 - Add a Slack bot that responds to `/askvanna [question]` (coming soon)
 - Use in a Python app

docs/intro-to-vanna.md

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 # Intro to Vanna: A Python-based AI SQL co-pilot

-**TLDR**: We help data people that know Python write SQL faster using AI. [See our starter notebook here](notebooks/vn-starter.md).
+**TLDR**: We help data people that know Python write SQL faster using AI. [See our starter notebook here](notebooks/vn-ask.md).

 ## The deluge of data

@@ -15,7 +15,7 @@ Since you are reading this, chances are you are one of those fortunate few (data

 ## Introducing Vanna, the SQL co-pilot

-Vanna, at its core, is a co-pilot to Python & SQL savvy data people to to streamline the process of writing custom SQL on your company’s data warehouse using AI and LLMs. Most of our users use our Python package directly via Jupyter Notebooks ([starter notebook here](notebooks/vn-starter.md)) —
+Vanna, at its core, is a co-pilot to Python & SQL savvy data people to to streamline the process of writing custom SQL on your company’s data warehouse using AI and LLMs. Most of our users use our Python package directly via Jupyter Notebooks ([starter notebook here](notebooks/vn-ask.md)) —

 ```python
 sql = vn.generate_sql(question='What are the top 10 customers by Sales?')

docs/notebooks/plot1.png

32.8 KB

docs/notebooks/plot2.png

40.2 KB

docs/notebooks/plot3.png

28.6 KB

docs/notebooks/vn-ask.md

Lines changed: 284 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,284 @@
1+
![Vanna AI](https://img.vanna.ai/vanna-full.svg)
2+
3+
This notebook will help you unleash the full potential of AI-powered data analysis at your organization. We'll go through how to "bulk train" Vanna and generate SQL, tables, charts, and explanations, all with minimal code and effort. For more about Vanna, see our [intro blog post](https://medium.com/vanna-ai/intro-to-vanna-a-python-based-ai-sql-co-pilot-218c25b19c6a).
4+
5+
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vanna-ai/vanna-py/blob/main/notebooks/vn-full.ipynb)
6+
7+
[![Open in GitHub](https://img.vanna.ai/github.svg)](https://github.com/vanna-ai/vanna-py/blob/main/notebooks/vn-full.ipynb)
8+
9+
# Install Vanna
10+
First we install Vanna from [PyPI](https://pypi.org/project/vanna/) and import it.
11+
Here, we'll also install the Snowflake connector. If you're using a different database, you'll need to install the appropriate connector.
12+
13+
14+
```python
%pip install vanna
%pip install snowflake-connector-python
```
18+
19+
20+
```python
import vanna as vn
import snowflake.connector
import pandas as pd
```
25+
26+
# Login
27+
Creating a login and getting an API key is as easy as entering your email (after you run this cell) and entering the code we send to you. Check your Spam folder if you don't see the code.
28+
29+
30+
```python
# Replace the placeholder with your own email address.
api_key = vn.get_api_key('my-email@example.com')
vn.set_api_key(api_key)
```
34+
35+
# Set your Model
36+
You need to choose a globally unique model name. Try using your company name or another unique string. Data is isolated per model, so nothing leaks between models.
37+
38+
39+
```python
vn.set_model('my-model') # Enter your dataset name here. This is a globally unique identifier for your dataset.
```
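
One way to arrive at a name that is unlikely to collide is to derive it from your company name and append a short random suffix. The helper below is only a sketch: `make_model_name` is a hypothetical function, not part of the `vanna` package, and the hyphenated format simply mirrors the `'my-model'` placeholder above.

```python
import re
import secrets


def make_model_name(company: str, suffix_bytes: int = 3) -> str:
    """Build a lowercase, hyphenated name with a random hex suffix.

    Hypothetical helper for illustration; it is not part of vanna.
    """
    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", company.lower()).strip("-")
    if not slug:
        raise ValueError("company must contain at least one letter or digit")
    # token_hex(n) returns 2 * n hexadecimal characters.
    return f"{slug}-{secrets.token_hex(suffix_bytes)}"


model_name = make_model_name("Acme Corp.")
print(model_name)  # e.g. acme-corp-1f9a3c (the suffix differs on every run)
# vn.set_model(model_name)
```

Store the generated name somewhere durable, since you need the same string on every later run to reach the same model.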
42+
43+
# Set Database Connection
44+
These details are only referenced within your notebook. These database credentials are never sent to Vanna's servers.
45+
46+
47+
```python
vn.connect_to_snowflake(account='my-account', username='my-username', password='my-password', database='my-database')
```
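
If you would rather not type credentials into a notebook cell, one option is to read them from environment variables. This is a minimal sketch under assumptions: the `SNOWFLAKE_*` variable names and the `snowflake_settings_from_env` helper are hypothetical, and only the four keyword names come from the call shown above.

```python
import os


def snowflake_settings_from_env(prefix: str = "SNOWFLAKE_") -> dict:
    """Collect connection settings from environment variables.

    The variable names (SNOWFLAKE_ACCOUNT, ...) are an assumed convention,
    not something vanna or Snowflake requires.
    """
    keys = ("account", "username", "password", "database")
    # Report every missing or empty variable at once.
    missing = [prefix + k.upper() for k in keys if not os.environ.get(prefix + k.upper())]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {k: os.environ[prefix + k.upper()] for k in keys}


# Usage, once the variables are set in your environment:
# settings = snowflake_settings_from_env()
# vn.connect_to_snowflake(**settings)
```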
50+
51+
# Get Results
52+
`vn.ask` generates the SQL, runs it to get a dataframe, and prints both. Note that we use your connection string to execute the SQL on your warehouse from your local instance. Neither your connection details nor your data is sent to Vanna's servers. For more info on how Vanna works, [see this post](https://medium.com/vanna-ai/how-vanna-works-how-to-train-it-data-security-8d8f2008042).
53+
54+
55+
```python
vn.ask("What are the top 10 customers by sales?")
```
58+
59+
SELECT c.c_name as customer_name,
60+
sum(l.l_extendedprice * (1 - l.l_discount)) as total_sales
61+
FROM snowflake_sample_data.tpch_sf1.lineitem l join snowflake_sample_data.tpch_sf1.orders o
62+
ON l.l_orderkey = o.o_orderkey join snowflake_sample_data.tpch_sf1.customer c
63+
ON o.o_custkey = c.c_custkey
64+
GROUP BY customer_name
65+
ORDER BY total_sales desc limit 10;
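
To see what the generated query computes, the same joins and the `l_extendedprice * (1 - l_discount)` aggregate can be run against a tiny in-memory SQLite database. This is an illustrative sketch only: the three rows of data are invented, and SQLite stands in for Snowflake so the example runs without a warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Minimal TPC-H-style tables with invented rows.
conn.executescript("""
CREATE TABLE customer (c_custkey INTEGER PRIMARY KEY, c_name TEXT);
CREATE TABLE orders (o_orderkey INTEGER PRIMARY KEY, o_custkey INTEGER);
CREATE TABLE lineitem (l_orderkey INTEGER, l_extendedprice REAL, l_discount REAL);
INSERT INTO customer VALUES (1, 'Customer#1'), (2, 'Customer#2');
INSERT INTO orders VALUES (10, 1), (11, 2), (12, 2);
INSERT INTO lineitem VALUES (10, 1000.0, 0.5), (11, 400.0, 0.0), (12, 800.0, 0.25);
""")

# Net sales per customer: each line item's price reduced by its discount.
rows = conn.execute("""
SELECT c.c_name AS customer_name,
       SUM(l.l_extendedprice * (1 - l.l_discount)) AS total_sales
FROM lineitem l
JOIN orders o ON l.l_orderkey = o.o_orderkey
JOIN customer c ON o.o_custkey = c.c_custkey
GROUP BY c.c_name
ORDER BY total_sales DESC
LIMIT 10
""").fetchall()

print(rows)  # [('Customer#2', 1000.0), ('Customer#1', 500.0)]
```

Customer#2 totals 400 × 1.0 + 800 × 0.75 = 1000, and Customer#1 totals 1000 × 0.5 = 500, which is why Customer#2 ranks first.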
66+
67+
![plot1](plot1.png)
68+
69+
70+
AI-generated follow-up questions:
71+
What are the countries of the top 10 customers by sales?
72+
How many orders did each of the top 10 customers place?
73+
What is the average sales amount per customer in the top 10?
74+
Can you provide a breakdown of the sales by country for the top 10 customers?
75+
Who are the top 10 customers in terms of returned parts gross value?
76+
What are the total sales for each customer in the top 3?
77+
Can you provide a breakdown of the sales by region for the top customers?
78+
How many customers are there in each country?
79+
What is the total revenue for the top 5 countries?
80+
Can you provide a breakdown of the sales by customer for the top 5 countries?
81+
82+
83+
84+
```python
vn.ask("Which 5 countries have the highest sales?")
```
87+
88+
SELECT n.n_name as country_name,
89+
sum(l.l_extendedprice * (1 - l.l_discount)) as total_sales
90+
FROM snowflake_sample_data.tpch_sf1.nation n join snowflake_sample_data.tpch_sf1.customer c
91+
ON n.n_nationkey = c.c_nationkey join snowflake_sample_data.tpch_sf1.orders o
92+
ON c.c_custkey = o.o_custkey join snowflake_sample_data.tpch_sf1.lineitem l
93+
ON o.o_orderkey = l.l_orderkey
94+
GROUP BY country_name
95+
ORDER BY total_sales desc limit 5;
96+
97+
98+
99+
<div>
100+
<style scoped>
101+
.dataframe tbody tr th:only-of-type {
102+
vertical-align: middle;
103+
}
104+
105+
.dataframe tbody tr th {
106+
vertical-align: top;
107+
}
108+
109+
.dataframe thead th {
110+
text-align: right;
111+
}
112+
</style>
113+
<table border="1" class="dataframe">
114+
<thead>
115+
<tr style="text-align: right;">
116+
<th></th>
117+
<th>COUNTRY_NAME</th>
118+
<th>TOTAL_SALES</th>
119+
</tr>
120+
</thead>
121+
<tbody>
122+
<tr>
123+
<th>0</th>
124+
<td>FRANCE</td>
125+
<td>8960205391.8314</td>
126+
</tr>
127+
<tr>
128+
<th>1</th>
129+
<td>INDONESIA</td>
130+
<td>8942575217.6237</td>
131+
</tr>
132+
<tr>
133+
<th>2</th>
134+
<td>RUSSIA</td>
135+
<td>8925318302.0710</td>
136+
</tr>
137+
<tr>
138+
<th>3</th>
139+
<td>MOZAMBIQUE</td>
140+
<td>8892984086.0088</td>
141+
</tr>
142+
<tr>
143+
<th>4</th>
144+
<td>JORDAN</td>
145+
<td>8873862546.7864</td>
146+
</tr>
147+
</tbody>
148+
</table>
149+
</div>
150+
151+
152+
![plot2](plot2.png)
153+
154+
155+
AI-generated follow-up questions:
156+
What are the total sales for each country?
157+
Which country has the highest number of customers?
158+
What are the total sales for each customer?
159+
What are the top 3 customers with the highest sales?
160+
What is the total revenue for each customer and country?
161+
What are the total sales for each customer in Europe?
162+
What are the top 10 countries with the highest total order amount?
163+
Which country has the highest number of failed orders?
164+
What are the top 3 customers with the highest sales?
165+
166+
167+
168+
```python
vn.ask("Who are the top 2 biggest customers in each region?")
```
171+
172+
with ranked_customers as (SELECT c.c_name as customer_name,
173+
r.r_name as region_name,
174+
row_number() OVER (PARTITION BY r.r_name
175+
ORDER BY sum(l.l_quantity * l.l_extendedprice) desc) as rank
176+
FROM snowflake_sample_data.tpch_sf1.customer c join snowflake_sample_data.tpch_sf1.orders o
177+
ON c.c_custkey = o.o_custkey join snowflake_sample_data.tpch_sf1.lineitem l
178+
ON o.o_orderkey = l.l_orderkey join snowflake_sample_data.tpch_sf1.nation n
179+
ON c.c_nationkey = n.n_nationkey join snowflake_sample_data.tpch_sf1.region r
180+
ON n.n_regionkey = r.r_regionkey
181+
GROUP BY customer_name, region_name)
182+
SELECT region_name,
183+
customer_name
184+
FROM ranked_customers
185+
WHERE rank <= 2;
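
The query above is a "top N per group" pattern: aggregate per customer and region, number the rows within each region from largest to smallest, and keep the first two. Note that this generated query ranks by `sum(l_quantity * l_extendedprice)` rather than the discounted price used in the earlier queries. The sketch below mirrors the pattern in plain Python on invented data; `top_n_per_group` is a hypothetical helper, and it breaks ties by customer name, whereas `row_number()` leaves tie order unspecified.

```python
from collections import defaultdict

# (region, customer, line-item value) rows; the data is invented.
rows = [
    ("ASIA", "A1", 50.0), ("ASIA", "A2", 70.0),
    ("ASIA", "A3", 20.0), ("ASIA", "A1", 40.0),
    ("EUROPE", "E1", 10.0), ("EUROPE", "E2", 30.0),
]


def top_n_per_group(rows, n=2):
    """Return (region, customer, rank) for the n largest totals per region."""
    # Step 1: the GROUP BY customer_name, region_name aggregate.
    totals = defaultdict(float)
    for region, customer, value in rows:
        totals[(region, customer)] += value

    # Step 2: collect the per-customer totals under each region.
    by_region = defaultdict(list)
    for (region, customer), total in totals.items():
        by_region[region].append((total, customer))

    # Step 3: number rows within each region (largest first), keep rank <= n.
    result = []
    for region in sorted(by_region):
        ranked = sorted(by_region[region], key=lambda tc: (-tc[0], tc[1]))
        for rank, (_total, customer) in enumerate(ranked, start=1):
            if rank <= n:
                result.append((region, customer, rank))
    return result


print(top_n_per_group(rows))
# [('ASIA', 'A1', 1), ('ASIA', 'A2', 2), ('EUROPE', 'E2', 1), ('EUROPE', 'E1', 2)]
```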
186+
187+
188+
189+
<div>
190+
<style scoped>
191+
.dataframe tbody tr th:only-of-type {
192+
vertical-align: middle;
193+
}
194+
195+
.dataframe tbody tr th {
196+
vertical-align: top;
197+
}
198+
199+
.dataframe thead th {
200+
text-align: right;
201+
}
202+
</style>
203+
<table border="1" class="dataframe">
204+
<thead>
205+
<tr style="text-align: right;">
206+
<th></th>
207+
<th>REGION_NAME</th>
208+
<th>CUSTOMER_NAME</th>
209+
</tr>
210+
</thead>
211+
<tbody>
212+
<tr>
213+
<th>0</th>
214+
<td>ASIA</td>
215+
<td>Customer#000102022</td>
216+
</tr>
217+
<tr>
218+
<th>1</th>
219+
<td>ASIA</td>
220+
<td>Customer#000148750</td>
221+
</tr>
222+
<tr>
223+
<th>2</th>
224+
<td>AMERICA</td>
225+
<td>Customer#000095257</td>
226+
</tr>
227+
<tr>
228+
<th>3</th>
229+
<td>AMERICA</td>
230+
<td>Customer#000091630</td>
231+
</tr>
232+
<tr>
233+
<th>4</th>
234+
<td>EUROPE</td>
235+
<td>Customer#000028180</td>
236+
</tr>
237+
<tr>
238+
<th>5</th>
239+
<td>EUROPE</td>
240+
<td>Customer#000053809</td>
241+
</tr>
242+
<tr>
243+
<th>6</th>
244+
<td>MIDDLE EAST</td>
245+
<td>Customer#000143500</td>
246+
</tr>
247+
<tr>
248+
<th>7</th>
249+
<td>MIDDLE EAST</td>
250+
<td>Customer#000103834</td>
251+
</tr>
252+
<tr>
253+
<th>8</th>
254+
<td>AFRICA</td>
255+
<td>Customer#000131113</td>
256+
</tr>
257+
<tr>
258+
<th>9</th>
259+
<td>AFRICA</td>
260+
<td>Customer#000134380</td>
261+
</tr>
262+
</tbody>
263+
</table>
264+
</div>
265+
266+
267+
![plot3](plot3.png)
268+
269+
270+
AI-generated follow-up questions:
271+
- What are the total sales for each customer in Europe?
272+
- What are the total sales for each customer in the United States?
273+
- How many customers are there in each country?
274+
- What is the total revenue for each customer in each country?
275+
- Which customers have the highest total sales?
276+
- Which customers have the highest number of orders?
277+
- Which customers have the highest returned parts gross value in Africa?
278+
- What are the total sales for the top 3 customers?
279+
- What are the total sales for the top 10 customers?
280+
- What is the total sales for each customer?
281+
282+
283+
# Run as a Web App
284+
If you would like to use this functionality in a web app, you can deploy the Vanna Streamlit app and use your own secrets. See [this repo](https://github.com/vanna-ai/vanna-streamlit).
