[df] [for discussion] JIT graph creation functions only once #17282
This PR
This PR contains code, which could still be improved further, that reduces the per-sample JIT cost in RDataFrame and is potentially a way to avoid the "controlled memory leaks" (#15520), though it does not yet address them in full.
I don't propose to merge this code as is, but I'm opening the PR as a reference for future discussion.
Before this PR
Currently, for JITed nodes of the computation graph, RDF JITs the function code just once, but JITs the lines of code that create the computation graph once per graph. This is done by methods like BookDefineJit, which craft the code as a string and pass it to RLoopManager::ToJitExec. Calls are accumulated and executed in one go by RLoopManager::Jit().
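As a rough illustration of this accumulate-then-execute pattern, here is a minimal sketch that does not use ROOT at all. `ToyLoopManager` and its callback are hypothetical stand-ins: the callback plays the role of the interpreter, and the method names only mirror the ones mentioned above for readability.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for a loop manager that batches code for an interpreter.
class ToyLoopManager {
   std::string fCodeToJit;                              // code accumulated so far
   std::function<void(const std::string &)> fInterpret; // stand-in for the interpreter

public:
   explicit ToyLoopManager(std::function<void(const std::string &)> interpret)
      : fInterpret(std::move(interpret))
   {
   }

   // Booking methods only append their crafted code string; nothing is executed yet.
   void ToJitExec(const std::string &code) { fCodeToJit += code; }

   // Hand everything accumulated so far to the interpreter in a single call.
   void Jit()
   {
      if (fCodeToJit.empty())
         return;
      fInterpret(fCodeToJit);
      fCodeToJit.clear();
   }

   std::size_t PendingSize() const { return fCodeToJit.size(); }
};
```

In this toy model each new graph appends its own strings again, which is the per-graph cost discussed here.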
With this PR
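Methods like BookDefineJit now call a new function, RLoopManager::RegisterJitHelperCall, and the whole operation is refactored in three steps: (1) declaration of the helper functions to gInterpreter, (2) lookup of their addresses, and (3) deferred calls to those functions.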
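The declare / look up / call-later split can be sketched without ROOT as follows. Everything here is a toy under stated assumptions: `ToyInterpreter`, `DeferredCall` and this `ToyLoopManager` are hypothetical, the signature of `RegisterJitHelperCall` is invented for illustration, and "declaring" a function just stores a function pointer instead of compiling code.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

using HelperFn = void (*)(void *);

// Hypothetical interpreter stand-in: a helper is declared at most once per name.
class ToyInterpreter {
   std::map<std::string, HelperFn> fKnown;
   int fNDeclared = 0;

public:
   void Declare(const std::string &name, HelperFn fn)
   {
      if (fKnown.emplace(name, fn).second) // false if the name is already known
         ++fNDeclared;
   }
   HelperFn Lookup(const std::string &name) const
   {
      auto it = fKnown.find(name);
      return it == fKnown.end() ? nullptr : it->second;
   }
   int NDeclared() const { return fNDeclared; }
};

// A call to be made later: helper name, per-graph arguments, resolved address.
struct DeferredCall {
   std::string fName;
   void *fArgs;
   HelperFn fAddress;
};

class ToyLoopManager {
   std::vector<std::pair<std::string, HelperFn>> fToDeclare;
   std::vector<DeferredCall> fCalls;

public:
   // Assumed signature, for illustration only.
   void RegisterJitHelperCall(const std::string &name, HelperFn body, void *args)
   {
      fToDeclare.emplace_back(name, body);
      fCalls.push_back(DeferredCall{name, args, nullptr});
   }

   void Jit(ToyInterpreter &interp)
   {
      for (auto &d : fToDeclare) // (1) declarations
         interp.Declare(d.first, d.second);
      fToDeclare.clear();
      for (auto &c : fCalls) // (2) address lookup
         c.fAddress = interp.Lookup(c.fName);
      RunDeferredCalls(); // (3) deferred calls
   }

   void RunDeferredCalls()
   {
      for (auto &c : fCalls)
         if (c.fAddress)
            c.fAddress(c.fArgs);
      fCalls.clear();
   }
};
```

With two loop managers registering the same helper name, the toy interpreter records a single declaration while each manager still makes its own deferred call with its own arguments.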
Similarly to the current setup, the declarations from point (1) are accumulated in a single string and passed to the interpreter in one go by RLoopManager::Jit(), and the address lookup (2) is also done inside RLoopManager::Jit().
The deferred function calls (3) are done both in RLoopManager::Jit() and in RLoopManager::Run, so that they can also run in multiple threads when RunGraphs is used (RunGraphs calls Jit on only one of the loop managers, and then calls Run on all of them, multithreaded).
What next?
BookDefineJit and deleted in the JITed calls like JitDefineHelper could instead be owned by the deferred function call object via RAII
void * for the weak pointer, the extra column names for variations could be in a smart pointer, ...
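One possible shape for the RAII idea is sketched below. It is not taken from the PR: `DeferredCall` here is a hypothetical class template that owns its arguments through a `std::unique_ptr`, so the helper only borrows them and nothing has to be deleted inside the JITed call.

```cpp
#include <functional>
#include <memory>
#include <utility>

// Hypothetical deferred call that owns its arguments.
// Args would hold whatever the helper needs (column names, pointers to nodes, ...).
template <typename Args>
class DeferredCall {
   std::unique_ptr<Args> fArgs;                   // freed when the call object is destroyed
   std::function<void(const Args &)> fHelper;     // stand-in for the JITed helper

public:
   DeferredCall(std::unique_ptr<Args> args, std::function<void(const Args &)> helper)
      : fArgs(std::move(args)), fHelper(std::move(helper))
   {
   }

   // The helper receives a reference and never takes ownership of the arguments.
   void operator()() const { fHelper(*fArgs); }
};
```

Because ownership sits in the call object, the arguments are released even if the deferred call is never executed, for instance when a graph is booked but never run.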