What are the advantages of using Presto on a single data source? #15445
If I have only a single data source (such as Kudu, MongoDB, Druid, or Cassandra), are there any advantages to querying it through Presto instead of querying it directly?
Replies: 1 comment
@matanper I answered a question about this on the Presto Community Broadcast, though that question was geared more toward RDBMSes. The general answer is that there is very little benefit to adding Presto over a single data source unless you are replacing the Hive runtime with a Presto distribution (see more on the Hive connector). Outside of that use case, which Presto was built around, you generally won't get any extra performance from putting a distributed query engine on top of another distributed database. These databases typically have better optimizations (indexes and caching) that make them the better choice when you are only dealing with that one data source. To answer the specific question, some advantages of adding Presto over a single data source would be:
This was answered for the Trino distribution (a different flavor of Presto, formerly known as PrestoSQL), but it generally applies to all Presto distributions. Here are the show notes for the episode.