Skip to content

Add abilty to yield in Ivarators, AndIterator, OrIterator and return metrics before yield (#704) #2042

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
May 15, 2025

Conversation

billoley
Copy link
Collaborator

@billoley billoley commented Jul 21, 2023

Add abilty to yield in Ivarators, AndIterator, OrIterator and return metrics before yield (#704)

WaitWindowObserver object tracks the remaining time before a yield and has a bunch of convenience methods for manipulating keys. !YIELD_AT_BEGIN and \uffffYIELD_AT_END are used to yield either before a given key or after a given key. They are used most often in the colFam but sometimes in the colQual if the query is using non-sorted UIDs as an optimization. The ! sorts before alphanumeric characters and the \uffff sorts after alphanumerics. The rest of the strings are for easy identification.

The remaining time is tracked in the WaitWindowObserver using a separate thread to minimize the calls to System.currentTimeMillis.

When time is expired, a WaitWindowOverrunException is thrown and then caught/propagated (determining the correct yieldKey) at each AndIterator/OrIterator up to the WaitWindowOverseerIterator which is under either the SerialIterator of the PipelineIterator but above the rest of the boolean stack of iterators.

If collectTimingDetails=false, we simply yield when either the SerialIterator of the PipelineIterator detects a WaitWindowOverrun. If collectTimingDetails=true, then we return a document with a WAIT_WINDOW_OVERRUN attribute and a TIMING_METADATA that hands off its souce, next, yield, etc metrics to the DocumentTransformerSupport class and then gets ignored. On the next call to the QueryIterator, the hasTop method checks with the WaitWindowObserver and yields.

IveratorFutures are now tracked in IteratorThreadPoolManager so that when an Ivarator yields before fillSortedSet is completed for all ranges, a post-yield call can reclaim the HDFS-backed sorted set from the IvaratorFuture.

Ivarator yielding and timeouts:

Yielding is taken care of by checks in WaitWindowObserver and uses property query.iterator.yield.threshold.ms

IteratorThreadPoolManager has a timer-based check that an Ivarator has not been in fillSortedSet more than ivaratorCacheScanTimeout which is set by the logic property IvaratorCacheScanTimeoutMinutes which is set by the property query.max.call.time.minutes (default 60 minutes). When an Ivarator suspends and resumes, the start time is maintained such that this check operates against the cumulative time in fillSortedSet for that Ivarator.

IteratorThreadPoolManager has a timer-based check that an IvaratorRunnable (used in fillSortedSet to fill the set from part of the Range) is not running for greater than tserver.datawave.ivarator.runnableTimeoutMinutes which is set in Accumulo properties and checked frequently for changes on a timer. The default is also 60 minutes -- so it will likely never be involked, but can be used as a failsafe. It can also be set lower temporarily to force shutdown of all IvaratorRunnables.

IteratorThreadPoolManager also uses the tserver.datawave.ivarator.runnableTimeoutMinutes property to ensure that IvaratorFuture objectd are evicted from the Caffeine cache after 1.1 * that setting. These objects are removed when an Ivarator completes but could be abandoned if an Ivarator is suspended on yield and then not resumed.

Included fixes to the PipelineIterator discovered during testing:

  1. When yielding, PipelineIterator should evaluate all possible keys to find the lowest, including the evaluationQueue, results, and yieldKey from WaitWindowOverrun. This could have caused missed data when yielding in the PipelineIterator

  2. Must fill the evaluationQueue after calling flushCompletedResults in getNext otherwise flushCompletedResults might empty the evaluationQueue with no valid results and on the next call to getNext, results will be empty and when cacheNextResult is called, evaluationQueue will be empty and PipelineIterator declares itself done. This could have caused missed data when yielding in the PipelineIterator

  3. Pipeline should save any caught Exception so that the PipelineIterator can propagate the Exception when calling getResult. This is particularly important when running Pipline in an Executor via the PipelineIterator. Otherwise, the evaluation of that Document just returns null which is the same as a Document not matching. The SerialIterator runs the Pipline.run() method inline and was not subject to this bug.

QueryIteratorIT and extended classes WaitWindowQueryIteratorSerialIT and WaitWindowQueryIteratorPipelineIT use a test harness that more closely simulates Accumulo's LookupTask.

IvaratorYieldingTest now tests Pipeline/Serial, sortedUIDs/unsortedUIDs, and collectTimingDetails T/F for 8 variations instead of the previous 2 variations Serial + sortedUIDs/unsortedUIDs.

@billoley billoley force-pushed the feature/issue-704 branch 2 times, most recently from 5be0608 to 1935337 Compare October 3, 2023 13:28
apmoriarty
apmoriarty previously approved these changes Nov 3, 2023
@billoley billoley force-pushed the feature/issue-704 branch 2 times, most recently from cf0b765 to 32e9748 Compare April 18, 2025 15:02
@billoley billoley force-pushed the feature/issue-704 branch 2 times, most recently from f978af9 to fd8ea92 Compare April 23, 2025 19:47
@billoley billoley requested review from ivakegg and apmoriarty April 23, 2025 19:48
apmoriarty
apmoriarty previously approved these changes Apr 24, 2025
Copy link
Collaborator

@apmoriarty apmoriarty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted a few preferences, otherwise looks good.

apmoriarty
apmoriarty previously approved these changes May 7, 2025
Copy link
Collaborator

@ivakegg ivakegg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still have to complete this review, but here is a start.

@billoley billoley force-pushed the feature/issue-704 branch from 320da8b to fc01e41 Compare May 12, 2025 18:47
cq.substring(fieldnameIndex + 1) + '\0' + cf + '\0');
Key startKey = r.getStartKey();
if (!sortedUIDs) {
String cq = WaitWindowObserver.removeMarkers(startKey.getColumnQualifier()).toString();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get a little more inline documentation as to what we are doing when we update the WaitWindowObserver?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add more documentation as necessary. However, this line is a static call to strip the YIELD_AT_BEGIN or YIELD_AT_END marker from the colQual (if it exists). The WaitWindowObserver itself is not affected.

@hgklohr hgklohr added this pull request to the merge queue May 15, 2025
Merged via the queue into integration with commit eee4940 May 15, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants