Description
For example, a user query ...
Assume that csv_file.csv has the following metadata (one type per column):
string,uint,float
csv(csv_file.csv)
| project state = $0 as string, age = $1 as uint, income = $2 as float
| where age >= 20
| project state, age_group = age / 10, income
| group by [state, age_group]
| project state, age_group,
          population = count(*), sum_income = sum(income),
          max_income = max(income), min_income = min(income),
          avg_income = average(income), median_income = median(income)
| order by median_income desc
| limit 10
csv(csv_file.csv, csv_file_meta.txt)
| where age >= 20
| project state, age_group = age / 10, income
| group by [state, age_group]
| project state, age_group,
          population = count(*), sum_income = sum(income),
          max_income = max(income), min_income = min(income),
          avg_income = average(income), median_income = median(income)
| order by median_income desc
| limit 10
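The role of the metadata line can be illustrated with a small Python sketch. It assumes that `string,uint,float` lists one type per CSV column, in order, which matches the explicit `$0 as string, $1 as uint, $2 as float` casts in the first query. The names `CASTS`, `to_uint`, `parse_metadata`, and `cast_row` are hypothetical and are not part of the proposed system.

```python
def to_uint(raw):
    """Parse a non-negative integer; reject negative values."""
    value = int(raw)
    if value < 0:
        raise ValueError(f"not an unsigned integer: {raw!r}")
    return value


# Hypothetical mapping from metadata type names to Python converters.
CASTS = {"string": str, "uint": to_uint, "float": float}


def parse_metadata(line):
    """Turn a line such as 'string,uint,float' into a list of converters."""
    return [CASTS[name.strip()] for name in line.split(",")]


def cast_row(fields, converters):
    """Apply the i-th converter to the i-th raw field ($0, $1, ...)."""
    if len(fields) != len(converters):
        raise ValueError("column count does not match metadata")
    return [convert(raw) for convert, raw in zip(converters, fields)]


converters = parse_metadata("string,uint,float")
typed_row = cast_row(["CA", "34", "52000.5"], converters)
```

With typed columns available up front, the second query can presumably skip the leading `project ... as ...` step, which is the only difference between the two queries shown above.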
The above query can be translated into the following execution tree:
top_n_sort : state, age_group, population, sum_income, max_income, min_income,
 |           avg_income, median_income with limit(10) & sort by median_income desc
 |
 +-- project : state, age_group, population, sum_income, max_income, min_income,
      |        avg_income = sum_income / convert(population, double),
      |        median_income
      |
      +-- hash_agg : group [state, age_group],
           |         aggregate [population = count(*), sum_income = sum(income),
           |                    max_income = max(income), min_income = min(income),
           |                    median_income = median(income)]
           |
           +-- csv_file_scanner : state = $0, age_group = convert($1, uint) / 10,
                                  income = convert($2, float) with filter(age_group >= 2)
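To make the data flow through the tree concrete, here is a minimal Python sketch that evaluates the same four operators bottom-up with generators. It is an illustration under stated assumptions, not the proposed engine: the function names simply mirror the node names in the tree, `SAMPLE_CSV` is made-up data, and the median is computed by buffering every income per group.

```python
import csv
import heapq
import io
from statistics import median

# Illustrative sample data only; not taken from the original issue.
SAMPLE_CSV = """CA,34,52000
CA,38,61000
CA,19,12000
NY,45,80000
NY,41,70000
NY,47,90000
"""


def csv_file_scanner(text):
    """Scan rows, derive age_group, and apply the pushed-down filter."""
    for state, age, income in csv.reader(io.StringIO(text)):
        age_group = int(age) // 10      # convert($1, uint) / 10
        if age_group >= 2:              # same rows as `where age >= 20`
            yield state, age_group, float(income)


def hash_agg(rows):
    """Group by (state, age_group) in a dict, then emit the aggregates."""
    groups = {}
    for state, age_group, income in rows:
        groups.setdefault((state, age_group), []).append(income)
    for (state, age_group), incomes in groups.items():
        yield {
            "state": state,
            "age_group": age_group,
            "population": len(incomes),
            "sum_income": sum(incomes),
            "max_income": max(incomes),
            "min_income": min(incomes),
            "median_income": median(incomes),
        }


def project(rows):
    """Derive avg_income from sum_income and population."""
    for row in rows:
        row["avg_income"] = row["sum_income"] / float(row["population"])
        yield row


def top_n_sort(rows, n=10):
    """Keep the n rows with the largest median_income, in descending order."""
    return heapq.nlargest(n, rows, key=lambda r: r["median_income"])


result = top_n_sort(project(hash_agg(csv_file_scanner(SAMPLE_CSV))))
```

Two rewrites in the tree show up directly in the sketch. For unsigned integers, `age / 10 >= 2` selects the same rows as `age >= 20`, so the filter can run inside the scanner. And `average(income)` needs no aggregate state of its own, because it is derived from `sum_income` and `population` in the `project` node. An exact median, by contrast, needs the group's values (buffered here), so it cannot be maintained as a single running value the way count, sum, max, and min can.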