Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BugFix] fix slot id conflicts in array_map (backport #52692) #52759

Closed
wants to merge 1 commit into from

Conversation

mergify[bot]
Copy link
Contributor

@mergify mergify bot commented Nov 8, 2024

Why I'm doing:

What I'm doing:

Before we rewrite the lambda expression, we need to find the max slot id currently used, and the slot id of new rewritten ColumnRef needs to be allocated from then on.

In the previous implementation, max_used_slot_id only counts the slot ids used in the lambda expression.
If multiple children of the same expression contain lambda expression, such as coalesce(element_at(array_map(x->x+any_match(array_map(x->x<10,arr_1)), arr_1), 1),element_at(array_map(x->x+any_match(array_map(x->x<10,arr_2)), arr_2),1)), then there is a possibility of duplicate allocation of slot ids.

In this PR, I made some changes to avoid this problem:

  1. When allocating slot ids, allocate from the maximum slot id used by the entire expression tree to avoid conflicts in slot ids within the expression tree. But this is not global after all. If there are multiple expression trees doing the same thing, there may still be problems with duplicate allocations.
  2. Considering that the result of outer common expression will only be used locally by the lambda function, as long as we avoid polluting the input chunk, it doesn’t matter even if the local slot id is repeated, so we choose to use a temporary chunk to execute the lambda expression.

Note: This is just a temporary solution and a bit ugly. When we support expression reuse strategy of scan predicates, the slot id allocation of common expressions will be handed over to FE, which has a global perspective and will definitely ensure that there will be no duplicate allocations. This is the most elegant solution. I will support it later.

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.3
    • 3.2
    • 3.1
    • 3.0
    • 2.5

This is an automatic backport of pull request #52692 done by [Mergify](https://mergify.com). ## Why I'm doing:

What I'm doing:

Before we rewrite the lambda expression, we need to find the max slot id currently used, and the slot id of new rewritten ColumnRef needs to be allocated from then on.

In the previous implementation, max_used_slot_id only counts the slot ids used in the lambda expression.
If multiple children of the same expression contain lambda expression, such as coalesce(element_at(array_map(x->x+any_match(array_map(x->x<10,arr_1)), arr_1), 1),element_at(array_map(x->x+any_match(array_map(x->x<10,arr_2)), arr_2),1)), then there is a possibility of duplicate allocation of slot ids.

In this PR, I made some changes to avoid this problem:

  1. When allocating slot ids, allocate from the maximum slot id used by the entire expression tree to avoid conflicts in slot ids within the expression tree. But this is not global after all. If there are multiple expression trees doing the same thing, there may still be problems with duplicate allocations.
  2. Considering that the result of outer common expression will only be used locally by the lambda function, as long as we avoid polluting the input chunk, it doesn’t matter even if the local slot id is repeated, so we choose to use a temporary chunk to execute the lambda expression.

Note: This is just a temporary solution and a bit ugly. When we support expression reuse strategy of scan predicates, the slot id allocation of common expressions will be handed over to FE, which has a global perspective and will definitely ensure that there will be no duplicate allocations. This is the most elegant solution. I will support it later.

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Signed-off-by: silverbullet233 <[email protected]>
(cherry picked from commit 59dc2e1)

# Conflicts:
#	be/src/exprs/array_map_expr.cpp
#	be/src/exprs/expr.cpp
Copy link
Contributor Author

mergify bot commented Nov 8, 2024

Cherry-pick of 59dc2e1 has failed:

On branch mergify/bp/branch-3.2/pr-52692
Your branch is up to date with 'origin/branch-3.2'.

You are currently cherry-picking commit 59dc2e14df.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	modified:   be/src/exprs/expr.h
	modified:   be/src/exprs/lambda_function.cpp
	modified:   be/src/exprs/lambda_function.h
	modified:   test/sql/test_array_fn/R/test_array_map_2
	modified:   test/sql/test_array_fn/T/test_array_map_2

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   be/src/exprs/array_map_expr.cpp
	both modified:   be/src/exprs/expr.cpp

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

Copy link
Contributor Author

mergify bot commented Nov 8, 2024

@mergify[bot]: Backport conflict, please reslove the conflict and resubmit the pr

@mergify mergify bot closed this Nov 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant