Updated the planning to be concurrent for index holes in the #3258
base: integration
date partitioned query planner
warehouse/query-core/src/main/java/datawave/query/planner/DatePartitionedQueryPlanner.java
```java
DefaultQueryPlanner subPlan = basePlanner.clone();

// Get the range stream for the new date range and query
return subPlan.reprocess(subPlanConfig, subPlanConfig.getQuery(), scannerFactory);
```
I'm curious about the memory impact of starting a range stream and holding onto the reference. If this approach isn't tenable, we could always grab the range stream, verify it has a hit, and then close it, marking this subplan as 'has data'.
The point is this: even though concurrency limits how many range streams execute at any one time, and even though the scanners close between next calls, we still hold the entire object in memory, and it ain't cheap.
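A minimal sketch of the 'has data' probe suggested above. It is written against a hypothetical `RangeStream` interface (an `Iterator` that is also `AutoCloseable`), not DataWave's actual range stream class, and `HasDataProbe`, `probe`, and `ListRangeStream` are illustrative names only. Each subplan's stream is opened, checked for a single hit, and closed immediately, so only one boolean per subplan is retained and the stream itself becomes eligible for garbage collection.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

public class HasDataProbe {

    /** Hypothetical stand-in for a range stream: iterable ranges that must be closed. */
    public interface RangeStream extends Iterator<String>, AutoCloseable {
        @Override
        void close(); // narrowed so callers need not handle a checked exception
    }

    /**
     * Opens each subplan's stream, records whether it has at least one hit,
     * and closes it right away. Only the boolean flags are kept.
     */
    public static List<Boolean> probe(List<Supplier<RangeStream>> subPlans) {
        List<Boolean> hasData = new ArrayList<>();
        for (Supplier<RangeStream> subPlan : subPlans) {
            try (RangeStream stream = subPlan.get()) {
                hasData.add(stream.hasNext());
            }
        }
        return hasData;
    }

    /** In-memory RangeStream for illustration; remembers whether it was closed. */
    public static final class ListRangeStream implements RangeStream {
        private final Iterator<String> ranges;
        private boolean closed;

        public ListRangeStream(List<String> ranges) {
            this.ranges = ranges.iterator();
        }

        @Override public boolean hasNext() { return ranges.hasNext(); }
        @Override public String next() { return ranges.next(); }
        @Override public void close() { closed = true; }

        public boolean isClosed() { return closed; }
    }
}
```

The tradeoff is that closing the stream discards its hits, so any subplan marked 'has data' would have to be planned again when its ranges are actually consumed.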
So we may need to reduce the concurrency to 1 if that becomes an issue. This is a straightforward tradeoff between memory and speed.
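To make the memory/speed knob concrete, here is a sketch using a fixed-size `ExecutorService`. `BoundedPlanner` and `planAll` are hypothetical names, and each `Callable` stands in for building one date partition's subplan. The pool size bounds how many subplans are built at the same time, so `concurrency = 1` serializes planning and keeps peak transient memory lowest. Whatever each task returns is still retained in the result list until the caller drops it, which is the memory concern raised earlier in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BoundedPlanner {

    /**
     * Runs the (hypothetical) per-partition planning tasks on a pool of
     * {@code concurrency} threads and returns their results in input order.
     * The pool size caps how many tasks run simultaneously; it does not cap
     * how many results are retained afterwards.
     */
    public static <T> List<T> planAll(List<Callable<T>> subPlans, int concurrency) {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        try {
            // invokeAll blocks until every task completes; futures keep input order.
            List<Future<T>> futures = pool.invokeAll(subPlans);
            List<T> results = new ArrayList<>(futures.size());
            for (Future<T> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("planning interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a subplan failed", e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }
}
```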
I'm not fully grasping what you're proposing as a solution. If a stream has a hit and we close it, then we can't retrieve the hits. What am I missing?
After our discussion, I understand what you were getting at. Essentially, the Intersection/Union tree for each partition is held in memory, so with a very large query and several partitions, that alone could become quite memory intensive.
I am now thinking that I should build these subplans within the range stream so that they are processed as needed, instead of all up front.
I am going to keep this PR open, but I will work on an alternative solution.
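One way to read the "process the subplans within the range stream as needed" idea, sketched with hypothetical names (`LazyPartitionedRanges`, `partitionsBuilt`) and plain `Iterator`s standing in for range streams. Each partition's subplan is supplied lazily and built only when the previous partition's iterator is exhausted, so at most one partition's iterator is referenced at a time, and hits are consumed directly rather than probed and discarded.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Supplier;

/**
 * Sketch: chains per-partition range iterators lazily. A partition's
 * (hypothetical) subplan is built only when the previous one runs dry.
 */
public class LazyPartitionedRanges<T> implements Iterator<T> {
    private final Iterator<Supplier<Iterator<T>>> partitions;
    private Iterator<T> current = Collections.emptyIterator();
    private int built = 0;

    public LazyPartitionedRanges(List<Supplier<Iterator<T>>> partitionPlans) {
        this.partitions = partitionPlans.iterator();
    }

    @Override
    public boolean hasNext() {
        // Advance past partitions that yield no ranges, building each only on demand.
        while (!current.hasNext()) {
            if (!partitions.hasNext()) {
                return false;
            }
            current = partitions.next().get(); // previous iterator becomes unreachable
            built++;
        }
        return true;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }

    /** Number of partition subplans built so far (for illustration). */
    public int partitionsBuilt() {
        return built;
    }
}
```

Callers iterate it like any other `Iterator`; partitions that produce no ranges are skipped without the caller ever seeing them.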