Skip to content

HIVE-28576: Add jdbc tests for tpcds queries #5510

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

soumyakanti3578
Copy link
Contributor

What changes were proposed in this pull request?

Add a new test driver for jdbc tests TestMiniLlapLocalJdbcCliDriver to create tpcds tables in dockerized postgres, and create corresponding external tables. Generate explain plans for tpcds queries.

Why are the changes needed?

To improve JDBC test coverage

Does this PR introduce any user-facing change?

No

Is the change a dependency upgrade?

No

How was this patch tested?

mvn test -pl itests/qtest  -Pitests -Dtest=TestMiniLlapLocalJdbcCliDriver -Dtest.output.overwrite=true

@deniskuzZ
Copy link
Member

deniskuzZ commented Oct 24, 2024

@soumyakanti3578, isn't that already covered by TestTezTPCDS30TBPerfCliDriver?

@soumyakanti3578
Copy link
Contributor Author

@deniskuzZ No, TestTezTPCDS30TBPerfCliDriver generates logical plans and uses a stats dump. This is different because here we actually create tpcds tables in Postgres, create external tables in Hive corresponding to the tables in Postgres, and then generate the jdbc plans.

@deniskuzZ
Copy link
Member

deniskuzZ commented Oct 28, 2024

@deniskuzZ No, TestTezTPCDS30TBPerfCliDriver generates logical plans and uses a stats dump. This is different because here we actually create tpcds tables in Postgres, create external tables in Hive corresponding to the tables in Postgres, and then generate the jdbc plans.

hi @soumyakanti3578 , why not simply parameterize the db to run tests against? default is derby, which could be overridden. why create a separate driver for that?

Copy link

@soumyakanti3578
Copy link
Contributor Author

soumyakanti3578 commented Oct 29, 2024

@deniskuzZ

why not simply parameterize the db to run tests against?

Only option I saw was to add something like --!qt:database:postgres:<script-to-run-in-postgres>. Did you have this in mind as well? I didn't go with this approach because I need to run all tpcds queries on external tables. There are 2 ways to do this:

  1. Have a single qfile - test_all_tpcds_queries_postgres.q with --!qt:database:postgres:q_test_tpcds_tables.postgres.sql (q_test_tpcds_tables.postgres.sql) on the top of the file to create all tpcds queries in postgres. Then we would need the contents of q_test_external_tpcds_tables_postgres.q in test_all_tpcds_queries_postgres.q to create all external tables. And then, we will need all tpcds queries too. This would be a huge file, and the resulting qout file could be very difficult to maintain too.
  2. Other approach that I can think of is to have individual q files for each tpcds query, but then we will need to create <script-to-run-in-postgres> for each one of them if we want to only create the required tables in postgres. We can also just create all tpcds tables for each q file but that would be inefficient I guess. We would still need to create external tables in each q file. For example, for cbo_query3.q, we would need to create 3 jdbc tables (date_dim, store_sales, item) in <script-to-run-in-postgres-for-cbo_query3>, and 3 external tables in cbo_query3_postgres.q, before the actual query. This could become very difficult to maintain too.

With the new driver, we can configure CliConfigs.java to run 1 file to create all tpcds tables once with

setJdbcInitScript("q_test_tpcds_tables.postgres.sql");

and run 1 file to create all external tables once with

setExternalTablesForJdbcInitScript("q_test_external_tpcds_tables_postgres.q");

and in CoreJdbcCliDriver.java, we can launch the docker container before starting the tests with beforeClass(), run all the tests, then stop the docker container in shutdown() method.

I think this approach is easier to maintain.

But please let me know what your thoughts are regarding this, and if you have something else in mind too!

Thanks!

@deniskuzZ
Copy link
Member

hi @soumyakanti3578, sorry for not replying, please give me some time, I am overwhelmed with diff requests atm

@soumyakanti3578
Copy link
Contributor Author

@deniskuzZ No problem at all. Thanks for looking into this :)

Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Feel free to reach out on the [email protected] list if the patch is in need of reviews.

@github-actions github-actions bot added the stale label Dec 31, 2024
@github-actions github-actions bot closed this Jan 7, 2025
@deniskuzZ
Copy link
Member

deniskuzZ commented Feb 9, 2025

hi @soumyakanti3578, sorry, I didn't grasp the meaning of those tests at first. I'll reopen the PR.
Basically, that would cover the tpcds queries on external JDBC tables, right?

Would it make sence to rename init script like:

q_test_tpcds_schema.postgres.sql, 
q_test_tpcds_extDB_schema-postgres.sql

LOG.info("No jdbc init script detected. Skipping");
return;
}
externalDB = QTestDatabaseHandler.DatabaseType.valueOf("POSTGRES").create();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we reuse QTestDatabaseHandler here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately I don't think we can reuse QTestDatabaseHandler here.
We want to create a DB in @BeforeClass method so that the tables are available for all 200 tests. Otherwise we will have to create tpcds tables and corresponding hive external tables again and again for each test file.

QTestDatabaseHandler is set up to create DB and launch docker for each individual test, as it is done in beforeTest method.

Please let me know if I am missing something, and if there is a better way to handle this.

Copy link
Member

@deniskuzZ deniskuzZ Mar 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QTestDatabaseHandler dbHandler = qt.getDatabaseHandler();
externalDB = dbHandler.initDb("postgres", scriptFile)

Copy link

sonarqubecloud bot commented Feb 9, 2025

@soumyakanti3578
Copy link
Contributor Author

Thanks @deniskuzZ for reopening the PR. I will look into your suggestions, and also the failure, and revive this PR soon. :)

@deniskuzZ
Copy link
Member

hi @soumyakanti3578, please ping me directly, I might miss GitHub notifications.

Copy link

@@ -345,6 +347,22 @@ protected void setInitScript(String initScript) {
this.initScript = initScript;
}
}

public String getJdbcInitScript() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we reuse the existing initScript?

Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Feel free to reach out on the [email protected] list if the patch is in need of reviews.

@github-actions github-actions bot added the stale label May 28, 2025
@github-actions github-actions bot closed this Jun 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants