Collaborator

@ivakegg ivakegg commented Oct 31, 2025

date partitioned query planner

jschmidt10 previously approved these changes Nov 7, 2025
DefaultQueryPlanner subPlan = basePlanner.clone();

// Get the range stream for the new date range and query
return subPlan.reprocess(subPlanConfig, subPlanConfig.getQuery(), scannerFactory);
Collaborator

@apmoriarty apmoriarty Nov 7, 2025

I'm curious about the memory impact of starting a range stream and hanging onto the reference. I suppose if this solution isn't tenable we could always grab the range stream, verify it has a hit, and then close it -- marking this subplan as 'has data'.

The point is this: even though the concurrency is limited for how many range streams are executing at one point in time, and even though the scanners close between next calls, we still have the entire object in memory and it ain't cheap.
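A minimal sketch of the "verify it has a hit, then close it" idea, assuming each subplan can cheaply re-open its stream later. `HitStream`, `SubPlan`, and `streamOf` are hypothetical stand-ins for illustration, not the actual planner or range stream classes in this PR.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical sketch: probe each partition's stream for a hit, then release it. */
final class SubPlanProbe {

    /** Stand-in for a range stream: an iterator of hits that holds resources until closed. */
    interface HitStream extends Iterator<String>, AutoCloseable {
        @Override
        void close();
    }

    /** Keeps only a factory and a flag, not the opened stream itself. */
    static final class SubPlan {
        private final String partition;
        private final Supplier<HitStream> opener;
        private boolean hasData;

        SubPlan(String partition, Supplier<HitStream> opener) {
            this.partition = partition;
            this.opener = opener;
        }

        /** Opens the stream, checks for at least one hit, and closes it right away. */
        void probe() {
            try (HitStream stream = opener.get()) {
                hasData = stream.hasNext();
            }
        }

        boolean hasData() { return hasData; }

        String partition() { return partition; }

        /** Re-opens the stream later, when the hits are actually consumed. */
        HitStream open() { return opener.get(); }
    }

    /** Probes every subplan and returns only those that reported a hit. */
    static List<SubPlan> withData(List<SubPlan> plans) {
        List<SubPlan> kept = new ArrayList<>();
        for (SubPlan plan : plans) {
            plan.probe();
            if (plan.hasData()) {
                kept.add(plan);
            }
        }
        return kept;
    }

    /** Simple in-memory HitStream used only for demonstration. */
    static HitStream streamOf(List<String> hits) {
        Iterator<String> it = hits.iterator();
        return new HitStream() {
            @Override public boolean hasNext() { return it.hasNext(); }
            @Override public String next() { return it.next(); }
            @Override public void close() { /* nothing to release in the demo */ }
        };
    }
}
```

The cost of this shape is that any partition with data gets its stream opened twice: once to probe and once to read the hits.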

Collaborator Author

So we may need to reduce the concurrency down to 1 if that is an issue. This is a straightforward tradeoff between memory and speed.

Collaborator Author

I'm not fully grasping what you are proposing as a solution. If it has a hit and we close it, then we can't get the hits. What am I missing?

Collaborator Author

After our discussion I understand what you were getting at. Essentially the Intersection/Union tree for each partition will be held in memory. So if we have a very large query and several partitions, that in itself could become quite memory intensive.
I am now thinking that I should do these subplans within the range stream so that they are processed as needed, instead of all up front.
I am going to keep this PR up but I will work on an alternative solution.
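
One possible shape for the "processed as needed" alternative, as a rough sketch only: an iterator that builds a partition's stream when iteration reaches it, so at most one partition's tree is live at a time. `LazyPartitionStream` and the `planner` function are hypothetical names for illustration and are not taken from this PR.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

/** Hypothetical sketch: build each partition's stream only when iteration reaches it. */
final class LazyPartitionStream<T> implements Iterator<T> {
    private final Iterator<String> partitions;
    private final Function<String, Iterator<T>> planner;
    private Iterator<T> current; // the only partition stream held at any time
    private int opened;          // how many partition streams have been built so far

    LazyPartitionStream(List<String> partitions, Function<String, Iterator<T>> planner) {
        this.partitions = partitions.iterator();
        this.planner = planner;
    }

    @Override
    public boolean hasNext() {
        // Advance to the next partition only when the current one is exhausted.
        while (current == null || !current.hasNext()) {
            if (!partitions.hasNext()) {
                current = null; // drop the last reference so it can be collected
                return false;
            }
            current = planner.apply(partitions.next());
            opened++;
        }
        return true;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }

    int openedCount() {
        return opened;
    }
}
```

This trades away the parallelism of planning partitions up front, which matches the memory-versus-speed tradeoff raised earlier in the thread.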

jschmidt10 previously approved these changes Nov 17, 2025
4 participants