Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIVE-28576: Add jdbc tests for tpcds queries #5510

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

soumyakanti3578
Copy link
Contributor

What changes were proposed in this pull request?

Add a new test driver for jdbc tests TestMiniLlapLocalJdbcCliDriver to create tpcds tables in dockerized postgres, and create corresponding external tables. Generate explain plans for tpcds queries.

Why are the changes needed?

To improve JDBC test coverage

Does this PR introduce any user-facing change?

No

Is the change a dependency upgrade?

No

How was this patch tested?

mvn test -pl itests/qtest  -Pitests -Dtest=TestMiniLlapLocalJdbcCliDriver -Dtest.output.overwrite=true

@deniskuzZ
Copy link
Member

deniskuzZ commented Oct 24, 2024

@soumyakanti3578, isn't that already covered by TestTezTPCDS30TBPerfCliDriver?

@soumyakanti3578
Copy link
Contributor Author

@deniskuzZ No, TestTezTPCDS30TBPerfCliDriver generates logical plans and uses a stats dump. This is different because here we actually create tpcds tables in Postgres, create external tables in Hive corresponding to the tables in Postgres, and then generate the jdbc plans.

@deniskuzZ
Copy link
Member

deniskuzZ commented Oct 28, 2024

@deniskuzZ No, TestTezTPCDS30TBPerfCliDriver generates logical plans and uses a stats dump. This is different because here we actually create tpcds tables in Postgres, create external tables in Hive corresponding to the tables in Postgres, and then generate the jdbc plans.

hi @soumyakanti3578 , why not simply parameterize the db to run tests against? default is derby, which could be overridden. why create a separate driver for that?

Copy link

sonarcloud bot commented Oct 28, 2024

@soumyakanti3578
Copy link
Contributor Author

soumyakanti3578 commented Oct 29, 2024

@deniskuzZ

why not simply parameterize the db to run tests against?

Only option I saw was to add something like --!qt:database:postgres:<script-to-run-in-postgres>. Did you have this in mind as well? I didn't go with this approach because I need to run all tpcds queries on external tables. There are 2 ways to do this:

  1. Have a single qfile - test_all_tpcds_queries_postgres.q with --!qt:database:postgres:q_test_tpcds_tables.postgres.sql (q_test_tpcds_tables.postgres.sql) on the top of the file to create all tpcds queries in postgres. Then we would need the contents of q_test_external_tpcds_tables_postgres.q in test_all_tpcds_queries_postgres.q to create all external tables. And then, we will need all tpcds queries too. This would be a huge file, and the resulting qout file could be very difficult to maintain too.
  2. Other approach that I can think of is to have individual q files for each tpcds query, but then we will need to create <script-to-run-in-postgres> for each one of them if we want to only create the required tables in postgres. We can also just create all tpcds tables for each q file but that would be inefficient I guess. We would still need to create external tables in each q file. For example, for cbo_query3.q, we would need to create 3 jdbc tables (date_dim, store_sales, item) in <script-to-run-in-postgres-for-cbo_query3>, and 3 external tables in cbo_query3_postgres.q, before the actual query. This could become very difficult to maintain too.

With the new driver, we can configure CliConfigs.java to run 1 file to create all tpcds tables once with

setJdbcInitScript("q_test_tpcds_tables.postgres.sql");

and run 1 file to create all external tables once with

setExternalTablesForJdbcInitScript("q_test_external_tpcds_tables_postgres.q");

and in CoreJdbcCliDriver.java, we can launch the docker container before starting the tests with beforeClass(), run all the tests, then stop the docker container in shutdown() method.

I think this approach is easier to maintain.

But please let me know what your thoughts are regarding this, and if you have something else in mind too!

Thanks!

@deniskuzZ
Copy link
Member

hi @soumyakanti3578, sorry for not replying, please give me some time, I am overwhelmed with diff requests atm

@soumyakanti3578
Copy link
Contributor Author

@deniskuzZ No problem at all. Thanks for looking into this :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants