Optimize join tables from different databases: executor by ea-rus · Pull Request #10146 · mindsdb/mindsdb

ea-rus · 2024-11-11T14:04:09Z

Description

Updates:

The subselect step is used to get values from the previous step's data:

select distinct <column> from <step data1>

The next step fetches data using these values as a filter:

select * from db2.table2 where <column> in (<ids from previous step>)
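The two steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the executor's code: the helper names (`distinct_values`, `sql_literal`, `build_in_filter_query`) and the sample rows are hypothetical, and values are inlined into the SQL string purely for readability; real code would normally go through the query builder or the driver's parameter binding.

```python
def distinct_values(rows, column):
    """Collect distinct non-null values of `column`, keeping first-seen order."""
    seen = set()
    values = []
    for row in rows:
        value = row[column]
        # NULLs never match an IN list, so they are skipped.
        if value is not None and value not in seen:
            seen.add(value)
            values.append(value)
    return values


def sql_literal(value):
    """Render a Python value as a SQL literal (illustration only)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def build_in_filter_query(table, column, values):
    """Build the second-step query filtered by values from the first step."""
    if not values:
        # No keys came back from the first step, so nothing can match.
        return f"SELECT * FROM {table} WHERE 0=1"
    in_list = ", ".join(sql_literal(v) for v in values)
    return f"SELECT * FROM {table} WHERE {column} IN ({in_list})"


# Hypothetical data returned by the first step.
step1_rows = [{"id": 3, "name": "a"}, {"id": 1, "name": "b"}, {"id": 3, "name": "c"}]
ids = distinct_values(step1_rows, "id")
query = build_in_filter_query("db2.table2", "id", ids)
print(query)
```

The point of the design is that the second database receives a narrow, filtered query instead of a full table scan, so only rows that can participate in the join are transferred.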

Side fix:
If a join has no condition:

add a 0=0 filter (multiplying the rows),
but apply a limit: if the expected number of rows exceeds the limit, raise an exception.
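A conditionless join with a row cap could look roughly like the sketch below. It works on plain lists of dicts rather than MindsDB's ResultSet objects, and the names `cross_join`, `MAX_ROWS`, and `JoinTooLargeError` are hypothetical; only the 10 ** 7 threshold is taken from the check in join_step.py quoted in the review thread.

```python
from itertools import product

MAX_ROWS = 10 ** 7  # hypothetical name; same threshold as the check in the PR


class JoinTooLargeError(Exception):
    """Raised when a conditionless join would produce too many rows."""


def cross_join(left_rows, right_rows, max_rows=MAX_ROWS):
    """Return the Cartesian product of two row lists, refusing oversized results."""
    expected = len(left_rows) * len(right_rows)
    # Check the expected size before building anything, to avoid exhausting memory.
    if expected >= max_rows:
        raise JoinTooLargeError(
            f"join without condition would produce {expected} rows "
            f"(limit is {max_rows})"
        )
    # Pair every left row with every right row; right keys win on name clashes.
    return [{**left, **right} for left, right in product(left_rows, right_rows)]


left = [{"a": 1}, {"a": 2}]
right = [{"b": 10}, {"b": 20}, {"b": 30}]
rows = cross_join(left, right)
print(len(rows))
```

The size check uses only the two row counts, so it costs nothing compared with materializing the product and then discovering it does not fit.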

Dependent on mindsdb/mindsdb_sql#412

Fixes #issue_number

Type of change

⚡ New feature (non-breaking change which adds functionality)

Verification Process

To ensure the changes are working as expected:

Test Location: Specify the URL or path for testing.
Verification Steps: Outline the steps or queries needed to validate the change. Include any data, configurations, or actions required to reproduce or see the new functionality.

Additional Media:

I have attached a brief loom video or screenshots showcasing the new functionality or change.

Checklist:

[x] My code follows the style guidelines (PEP 8) of MindsDB.
I have appropriately commented on my code, especially in complex areas.
Necessary documentation updates are either made or tracked in issues.
Relevant unit and integration tests are updated or added.

StpMax · 2024-11-12T09:07:33Z

mindsdb/api/executor/sql_query/steps/join_step.py

            if step.query.condition is None:
-                raise NotSupportedYet('Unable to join table without condition')
+                # prevent memory overflow
+                if len(left_data) * len(right_data) < 10 ** 7:


Are left_data and right_data dataframes? If so, it may be better to get their real size (df.memory_usage(index=True, deep=True).sum()) and compare it with the free memory.

They are ResultSets

ea-rus added 5 commits November 11, 2024 16:29

if join is without condition - multiply rows. but with limitation

a5c3779

support values from CTE in subselect step

b7c2a05

Merge branch 'main' into cte-support

74f5bdb

join tables test

24e6be4

join tables test

77f6768

ea-rus requested a review from StpMax November 11, 2024 15:09

StpMax approved these changes Nov 12, 2024


updated mindsdb sql

e9ef44d

ZoranPandovski merged commit f71ba0c into main Nov 14, 2024

ZoranPandovski deleted the cte-support branch November 14, 2024 13:28

mindsdb locked and limited conversation to collaborators Nov 14, 2024

ZoranPandovski temporarily deployed to alpha-dev November 14, 2024 13:39 — with GitHub Actions Inactive

ZoranPandovski temporarily deployed to dev November 14, 2024 13:39 — with GitHub Actions Inactive

ZoranPandovski temporarily deployed to hackathon November 14, 2024 13:39 — with GitHub Actions Inactive

ZoranPandovski temporarily deployed to staging November 14, 2024 13:39 — with GitHub Actions Inactive

ZoranPandovski merged 6 commits into main from cte-support
