-
Notifications
You must be signed in to change notification settings - Fork 259
Add abilty to yield in Ivarators, AndIterator, OrIterator and return metrics before yield (#704) #2042
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
64a4062
to
5eb1278
Compare
5eb1278
to
b887203
Compare
warehouse/query-core/src/main/java/datawave/query/iterator/waitwindow/WaitWindowObserver.java
Outdated
Show resolved
Hide resolved
warehouse/query-core/src/main/java/datawave/query/iterator/waitwindow/WaitWindowObserver.java
Outdated
Show resolved
Hide resolved
warehouse/query-core/src/main/java/datawave/query/iterator/waitwindow/WaitWindowObserver.java
Show resolved
Hide resolved
.../query-core/src/main/java/datawave/query/iterator/waitwindow/WaitWindowOverseerIterator.java
Show resolved
Hide resolved
warehouse/query-core/src/main/java/datawave/query/iterator/waitwindow/WaitWindowObserver.java
Outdated
Show resolved
Hide resolved
warehouse/query-core/src/main/java/datawave/query/iterator/waitwindow/WaitWindowObserver.java
Outdated
Show resolved
Hide resolved
warehouse/query-core/src/main/java/datawave/query/iterator/waitwindow/WaitWindowObserver.java
Show resolved
Hide resolved
warehouse/query-core/src/main/java/datawave/core/iterators/IteratorThreadPoolManager.java
Outdated
Show resolved
Hide resolved
warehouse/query-core/src/main/java/datawave/query/exceptions/WaitWindowOverrunException.java
Outdated
Show resolved
Hide resolved
.../query-core/src/main/java/datawave/core/iterators/DatawaveFieldIndexCachingIteratorJexl.java
Outdated
Show resolved
Hide resolved
warehouse/query-core/src/main/java/datawave/query/jexl/functions/KeyAdjudicator.java
Show resolved
Hide resolved
5be0608
to
1935337
Compare
4fa6746
to
7582320
Compare
.../query-core/src/main/java/datawave/core/iterators/DatawaveFieldIndexCachingIteratorJexl.java
Outdated
Show resolved
Hide resolved
warehouse/query-core/src/main/java/datawave/query/iterator/waitwindow/WaitWindowObserver.java
Outdated
Show resolved
Hide resolved
warehouse/query-core/src/main/java/datawave/query/iterator/logic/OrIterator.java
Outdated
Show resolved
Hide resolved
warehouse/query-core/src/main/java/datawave/query/iterator/logic/AndIterator.java
Show resolved
Hide resolved
41a39a8
to
1fced91
Compare
1fced91
to
d91ca47
Compare
d91ca47
to
071254f
Compare
071254f
to
50599b9
Compare
50599b9
to
3632ca3
Compare
3632ca3
to
0d3080e
Compare
f59d712
to
fe67b2d
Compare
cf0b765
to
32e9748
Compare
f978af9
to
fd8ea92
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted a few preferences, otherwise looks good.
warehouse/query-core/src/main/java/datawave/query/iterator/pipeline/SerialIterator.java
Outdated
Show resolved
Hide resolved
warehouse/query-core/src/main/java/datawave/query/iterator/profile/QuerySpanCollector.java
Show resolved
Hide resolved
warehouse/query-core/src/test/java/datawave/query/iterator/QueryIteratorIT.java
Outdated
Show resolved
Hide resolved
warehouse/query-core/src/test/java/datawave/query/iterator/QueryIteratorIT.java
Outdated
Show resolved
Hide resolved
fd8ea92
to
b0649b0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still have to complete this review, but here is a start.
...use/query-core/src/main/java/datawave/core/iterators/DatawaveFieldIndexListIteratorJexl.java
Outdated
Show resolved
Hide resolved
...use/query-core/src/main/java/datawave/core/iterators/DatawaveFieldIndexListIteratorJexl.java
Outdated
Show resolved
Hide resolved
...use/query-core/src/main/java/datawave/core/iterators/DatawaveFieldIndexListIteratorJexl.java
Outdated
Show resolved
Hide resolved
...se/query-core/src/main/java/datawave/core/iterators/DatawaveFieldIndexRangeIteratorJexl.java
Outdated
Show resolved
Hide resolved
...se/query-core/src/main/java/datawave/core/iterators/DatawaveFieldIndexRangeIteratorJexl.java
Outdated
Show resolved
Hide resolved
...e/query-core/src/main/java/datawave/core/iterators/DatawaveFieldIndexFilterIteratorJexl.java
Outdated
Show resolved
Hide resolved
...e/query-core/src/main/java/datawave/core/iterators/DatawaveFieldIndexFilterIteratorJexl.java
Outdated
Show resolved
Hide resolved
.../query-core/src/main/java/datawave/core/iterators/DatawaveFieldIndexCachingIteratorJexl.java
Outdated
Show resolved
Hide resolved
warehouse/query-core/src/main/java/datawave/query/attributes/WaitWindowExceededMetadata.java
Outdated
Show resolved
Hide resolved
warehouse/query-core/src/main/java/datawave/query/iterator/waitwindow/WaitWindowObserver.java
Show resolved
Hide resolved
…metrics before yield (#704)
…es; add some documentation
320da8b
to
fc01e41
Compare
cq.substring(fieldnameIndex + 1) + '\0' + cf + '\0'); | ||
Key startKey = r.getStartKey(); | ||
if (!sortedUIDs) { | ||
String cq = WaitWindowObserver.removeMarkers(startKey.getColumnQualifier()).toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get a little more inline documentation as to what we are doing when we update the WaitWindowObserver?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can add more documentation as necessary. However, this line is a static call to strip the YIELD_AT_BEGIN or YIELD_AT_END marker from the colQual (if it exists). The WaitWindowObserver itself is not affected.
Add abilty to yield in Ivarators, AndIterator, OrIterator and return metrics before yield (#704)
WaitWindowObserver object tracks the remaining time before a yield and has a bunch of convenience methods for manipulating keys. !YIELD_AT_BEGIN and \uffffYIELD_AT_END are used to yield either before a given key or after a given key. They are used most often in the colFam but sometimes in the colQual if the query is using non-sorted UIDs as an optimization. The ! sorts before alphanumeric characters and the \uffff sorts after alphanumerics. The rest of the strings are for easy identification.
The remaining time is tracked in the WaitWindowObserver using a separate thread to minimize the calls to System.currentTimeMillis.
When time is expired, a WaitWindowOverrunException is thrown and then caught/propagated (determining the correct yieldKey) at each AndIterator/OrIterator up to the WaitWindowOverseerIterator which is under either the SerialIterator of the PipelineIterator but above the rest of the boolean stack of iterators.
If collectTimingDetails=false, we simply yield when either the SerialIterator of the PipelineIterator detects a WaitWindowOverrun. If collectTimingDetails=true, then we return a document with a WAIT_WINDOW_OVERRUN attribute and a TIMING_METADATA that hands off its souce, next, yield, etc metrics to the DocumentTransformerSupport class and then gets ignored. On the next call to the QueryIterator, the hasTop method checks with the WaitWindowObserver and yields.
IveratorFutures are now tracked in IteratorThreadPoolManager so that when an Ivarator yields before fillSortedSet is completed for all ranges, a post-yield call can reclaim the HDFS-backed sorted set from the IvaratorFuture.
Ivarator yielding and timeouts:
Yielding is taken care of by checks in WaitWindowObserver and uses property query.iterator.yield.threshold.ms
IteratorThreadPoolManager has a timer-based check that an Ivarator has not been in fillSortedSet more than ivaratorCacheScanTimeout which is set by the logic property IvaratorCacheScanTimeoutMinutes which is set by the property query.max.call.time.minutes (default 60 minutes). When an Ivarator suspends and resumes, the start time is maintained such that this check operates against the cumulative time in fillSortedSet for that Ivarator.
IteratorThreadPoolManager has a timer-based check that an IvaratorRunnable (used in fillSortedSet to fill the set from part of the Range) is not running for greater than tserver.datawave.ivarator.runnableTimeoutMinutes which is set in Accumulo properties and checked frequently for changes on a timer. The default is also 60 minutes -- so it will likely never be involked, but can be used as a failsafe. It can also be set lower temporarily to force shutdown of all IvaratorRunnables.
IteratorThreadPoolManager also uses the tserver.datawave.ivarator.runnableTimeoutMinutes property to ensure that IvaratorFuture objectd are evicted from the Caffeine cache after 1.1 * that setting. These objects are removed when an Ivarator completes but could be abandoned if an Ivarator is suspended on yield and then not resumed.
Included fixes to the PipelineIterator discovered during testing:
When yielding, PipelineIterator should evaluate all possible keys to find the lowest, including the evaluationQueue, results, and yieldKey from WaitWindowOverrun. This could have caused missed data when yielding in the PipelineIterator
Must fill the evaluationQueue after calling flushCompletedResults in getNext otherwise flushCompletedResults might empty the evaluationQueue with no valid results and on the next call to getNext, results will be empty and when cacheNextResult is called, evaluationQueue will be empty and PipelineIterator declares itself done. This could have caused missed data when yielding in the PipelineIterator
Pipeline should save any caught Exception so that the PipelineIterator can propagate the Exception when calling getResult. This is particularly important when running Pipline in an Executor via the PipelineIterator. Otherwise, the evaluation of that Document just returns null which is the same as a Document not matching. The SerialIterator runs the Pipline.run() method inline and was not subject to this bug.
QueryIteratorIT and extended classes WaitWindowQueryIteratorSerialIT and WaitWindowQueryIteratorPipelineIT use a test harness that more closely simulates Accumulo's LookupTask.
IvaratorYieldingTest now tests Pipeline/Serial, sortedUIDs/unsortedUIDs, and collectTimingDetails T/F for 8 variations instead of the previous 2 variations Serial + sortedUIDs/unsortedUIDs.