What are the advantages of using Presto on a single data source? #15445

matanper · 2020-11-17T16:06:32Z

If I have only a single data source (like Kudu, MongoDB, Druid, Cassandra, etc.), are there any advantages to querying it through Presto instead of directly?

Answered by bitsondatadev

bitsondatadev · 2020-12-28T14:51:05Z

@matanper I actually answered a question about this on the Presto Community Broadcast, though that question was geared more toward RDBMSes. The general answer is that there is very little benefit to adding Presto over a single data source unless you are replacing the Hive runtime with a Presto distribution (see the Hive connector). Outside of this use case, which is what Presto was built around, you generally won't get any extra performance from putting a distributed query engine on top of another distributed database. Those databases typically have their own optimizations (indexes and caching) that make them the better choice when you are only dealing with that one data source. To answer the specific question, some advantages of adding Presto over a single data source are:

- You get ANSI standard SQL over a database that otherwise exposes a complex query DSL, a proprietary SQL flavor, or some other interface.
- You're replacing or augmenting Spark, Impala, or Hive to support interactive queries over big data using the Hive connector (mentioned above).
- You keep the option to run federated analytics queries across multiple data sources in the future.
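
To make the federated point concrete: Presto addresses every table as `catalog.schema.table`, where each catalog is backed by a configured connector, so a single SQL statement can join tables living in different systems. The sketch below only builds such a statement as a string; the catalog, schema, table, and column names (`mongodb`, `kudu`, `shop`, `analytics`, `orders`, `customers`, `customer_id`) and the helper functions are hypothetical and depend entirely on how your catalogs are configured. Submitting the query (via the Presto CLI or a client library) is not shown.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a Presto-style fully qualified table name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def federated_join_sql(left: tuple, right: tuple, key: str) -> str:
    """Build one SQL statement joining tables from two (hypothetical) catalogs.

    `left` and `right` are (catalog, schema, table) tuples; `key` is the
    column name shared by both tables and used in the join condition.
    """
    return (
        f"SELECT l.*, r.*\n"
        f"FROM {qualified(*left)} AS l\n"
        f"JOIN {qualified(*right)} AS r ON l.{key} = r.{key}"
    )


if __name__ == "__main__":
    # Hypothetical names: a MongoDB catalog and a Kudu catalog configured side by side.
    sql = federated_join_sql(
        ("mongodb", "shop", "orders"),
        ("kudu", "analytics", "customers"),
        "customer_id",
    )
    print(sql)
```

With only one catalog configured, the same naming scheme still applies; adding a second connector later makes this kind of cross-source join possible without changing the query language.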

This was answered for the Trino distribution (a different flavor of Presto, formerly known as PrestoSQL), but it generally applies to all Presto distributions. Here are the show notes for the episode.