I use VegaFusion to chart big DataFrames. In addition to pandas, I want to support PySpark DataFrames. But since they might be too big to fit into the memory of a single machine, I want to push the actual aggregation out of VegaFusion and run it directly in Spark. As far as I understand, VegaFusion doesn't support this natively, but could you point me in a direction I should look into? As I understand it, in VF 1.x this could be done by using the "SQL API" (… Initially I was thinking about using …
Replies: 1 comment 4 replies
Hi @OlegWock, as you noted, we had some architecture in place in version 1 for generating SQL for different dialects. This was removed for v2 because it was a significant source of complexity (given how it worked) and wasn't being used beyond DuckDB. And after the optimizations in VegaFusion 2 (and probably in DataFusion as well), the DuckDB path wasn't faster, so I didn't put the work into exploring the unparse workflow after all.

That said, I'm definitely interested in bringing this functionality back if there's demand and we develop a cleaner architecture.

Here's what I'm thinking might work.

We still convert the Vega transforms to DataFusion DataFrame operations, but instead of only evaluating these to Arrow at the end of the transform pipeline with …

I think what I'd propose is having VegaFusion restore the Datasets interface that was present in version 1. The … Take a look and see what you think!
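To make the general idea concrete (describe the chart transforms as a pipeline, then turn that pipeline into SQL for an external engine instead of evaluating it locally), here is a minimal, self-contained sketch. It is not VegaFusion's or DataFusion's API: `Filter`, `Aggregate`, and `pipeline_to_sql` are hypothetical names invented for illustration, and the `movies` table and its columns are made up.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Filter:
    """Hypothetical filter step; `predicate` is a SQL boolean expression."""
    predicate: str


@dataclass
class Aggregate:
    """Hypothetical aggregate step; `measures` are SQL aggregate expressions."""
    groupby: List[str]
    measures: List[str]


Step = Union[Filter, Aggregate]


def pipeline_to_sql(table: str, steps: List[Step]) -> str:
    """Fold a list of transform steps into one nested SQL query string."""
    query = f"SELECT * FROM {table}"
    for i, step in enumerate(steps):
        # Wrap the query built so far as an aliased subquery.
        sub = f"({query}) AS t{i}"
        if isinstance(step, Filter):
            query = f"SELECT * FROM {sub} WHERE {step.predicate}"
        elif isinstance(step, Aggregate):
            cols = ", ".join(step.groupby + step.measures)
            query = f"SELECT {cols} FROM {sub}"
            if step.groupby:
                query += " GROUP BY " + ", ".join(step.groupby)
        else:
            raise TypeError(f"unsupported step: {step!r}")
    return query


# Made-up example: mean rating per genre, ignoring missing ratings.
sql = pipeline_to_sql(
    "movies",
    [
        Filter("rating IS NOT NULL"),
        Aggregate(["genre"], ["AVG(rating) AS mean_rating"]),
    ],
)
print(sql)
```

Assuming a `SparkSession` named `spark` with `movies` registered as a temporary view, the resulting string could be passed to `spark.sql(sql)`, so the aggregation runs in Spark and only the small aggregated result has to be collected for charting.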