Improve construct query result to triples#2654
marvin7122 wants to merge 187 commits into ad-freiburg:master from
Conversation
Comment regarding commit 62b0ec9: Precomputing the constants (IRIs, literals) that are present in the CONSTRUCT-query template, and then skipping them when evaluating the construct-template triple patterns for a particular row of the result table, yields an improvement of about 23% (as I have measured it, in comparison to commit caed761, both binaries built in Release mode) on the following query on the DBLP index:
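A minimal sketch of the precomputation idea described above (all names here — `Iri`, `Variable`, `TripleComponent`, `Precomputed`, `precompute` — are illustrative stand-ins, not QLever's actual types): constants in the template are stringified once, before the per-row loop, and only variables keep their column index for per-row resolution.

```cpp
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

// Hypothetical simplified template components of a CONSTRUCT clause.
struct Iri { std::string value; };
struct Variable { size_t columnIndex; };
using TripleComponent = std::variant<Iri, Variable>;

// Precomputed form: constants are already rendered to their final string,
// variables are reduced to the column index that must be looked up per row.
struct Precomputed {
  std::variant<std::string, size_t> rendered;
};

// Runs once per query, not once per result-table row.
std::vector<Precomputed> precompute(const std::vector<TripleComponent>& tmpl) {
  std::vector<Precomputed> out;
  for (const auto& c : tmpl) {
    if (const auto* iri = std::get_if<Iri>(&c)) {
      // Constant: render once and reuse for every row.
      out.push_back({"<" + iri->value + ">"});
    } else {
      // Variable: only the column index is needed later.
      out.push_back({std::get<Variable>(c).columnIndex});
    }
  }
  return out;
}
```

Per row, the evaluator then only touches the `size_t` entries, which is where the measured ~23% improvement comes from.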
# Conflicts: # CMakeLists.txt
… each row of the result table (result of the WHERE clause), and thus treat them in the caching mechanism in the same way that we treat Variables
…stats in the server log (even when I build with the Debug build type), I don't get why
…a CONSTRUCT-query, move the cache stats computation and report to after we have iterated over all rows in the result table for the WHERE-clause
…ryExporter cache, to check if this is the reason why the statistics are not written to the server log
…s of the constructQueryExporter are still written to the server log
…variableHits_; now returns variableMisses_ as it should be
…, Iri, Literal classes and put them into the ConstructQueryEvaluator class. copy helper functions from ExportQueryExecutionTrees to this class as well. Those still need to be refactored
…ueryCache for Iri's and Literals, since their values should be the same across all rows of the WHERE-clause-result-table and across all triples in the CONSTRUCT-query clause.
…different types of Graphterms in ConstructQueryEvaluator instead of the classes themselves
Force-pushed: 544b1d0 → 4e5c6e2, then 4e5c6e2 → 932da72.
According to my measurements with the binary for this commit. Settings for the benchmark: results:
Performance comparison of the CONSTRUCT query exporter: query: results:
I have benchmarked and analyzed many more queries. Also, I did some empirical tests for choosing the batch size, but I do not want to spam this PR with too many comments right now.
Codecov Report ❌ Patch coverage — additional details and impacted files: @@ Coverage Diff @@
## master #2654 +/- ##
==========================================
- Coverage 91.60% 91.58% -0.02%
==========================================
Files 483 487 +4
Lines 41360 41607 +247
Branches 5493 5540 +47
==========================================
+ Hits 37886 38104 +218
- Misses 1897 1919 +22
- Partials 1577 1584 +7
…anced readability
…StringTriples method
joka921
left a comment
Some initial comments.
There is a lot of good stuff in there,
but we can have a discussion about the details in person.
template <typename K, typename V>
class StableLRUCache {
 public:
  explicit StableLRUCache(size_t capacity) : capacity_{capacity} {
    AD_CONTRACT_CHECK(capacity > 0);
I think there are simpler ways to do this (e.g. using a node-based hashmap, or wrapping the values in a unique_ptr); relying on the reserve/capacity behavior is a little bit wonky.
But as the interface of this cache is simple, we can iterate on that later once we have identified the impact of the different tradeoffs.
TL;DR: If affordable, I would like to have this as "the ordinary LruCache we already have, but configured with different template parameters to reduce code bloat".
// Class for computing the result of an already parsed and planned query and
// exporting it in different formats (TSV, CSV, Turtle, JSON, Binary).
//
// Blocks, where all rows are before OFFSET, are requested (and hence
// computed), but skipped.
//
// Blocks, where at least one row is after OFFSET but before the effective
// export limit (minimum of the LIMIT and the value of the `send` parameter),
// are requested and yielded (together with the corresponding `LocalVocab`
// and the range from that `IdTable` that belongs to the result).
//
// Blocks after the effective export limit until the LIMIT are requested, and
// counted towards the `totalResultSize`, but not yielded.
//
// Blocks after the LIMIT are not even requested.
Don't delete comments, but move them alongside the declaration?
constexpr ConstructOutputFormat mediaTypeToConstructFormat(
    ad_utility::MediaType mediaType) {
  using enum ad_utility::MediaType;
  using enum ConstructOutputFormat;
  switch (mediaType) {
    case turtle:
      return TURTLE;
    case csv:
      return CSV;
    case tsv:
      return TSV;
    default:
      // This should never be reached for valid CONSTRUCT formats
      return TURTLE;
  }
I don't see why we can't use the `MediaType` directly? That transformation doesn't seem to do much :)
static_assert(
    format == MediaType::octetStream || format == MediaType::csv ||
    format == MediaType::tsv || format == MediaType::sparqlXml ||
    format == MediaType::sparqlJson || format == MediaType::qleverJson ||
    format == MediaType::binaryQleverExport || format == MediaType::turtle);
Can be simplified: set up a constexpr array of the supported media types (including a `using enum` etc.), and then use `ad_utility::contains` in the assertion (opportunity to improve).
// TODO<ms2144>: Use more principled approach: maybe compute batch size
// dynamically based on the number of variables and available cache size,
// rather than using a fixed value. And also monitor how much of the L2 cache
// is used when a batch is being processed.
Yes, and also: is 64 enough s.t. it is not the reading from the vocabulary that is still the bottleneck? (I am very interested in the perf graphs / flame graphs.)
// Get value for a specific blank node at a row in the batch
const std::string& getBlankNodeValue(size_t blankNodeIdx,
                                     size_t rowInBatch) const {
  return blankNodeValues_[blankNodeIdx][rowInBatch];
}
This code duplication can be abstracted away in a 2D-array class etc. (that stores the vector + the get function + is templated).
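A minimal sketch of such a 2D-array abstraction (the name `Array2D` and its interface are illustrative, not existing QLever code); one templated class replaces the per-field vectors and their accessor boilerplate:

```cpp
#include <cstddef>
#include <vector>

// Flat row-major 2D array: one allocation instead of a vector of vectors,
// and one templated accessor instead of getBlankNodeValue/getXxxValue pairs.
template <typename T>
class Array2D {
 public:
  Array2D(size_t numRows, size_t numCols, T init = T{})
      : numCols_{numCols}, data_(numRows * numCols, std::move(init)) {}

  // Element access via (row, col); row-major index computation.
  T& operator()(size_t row, size_t col) {
    return data_[row * numCols_ + col];
  }
  const T& operator()(size_t row, size_t col) const {
    return data_[row * numCols_ + col];
  }

 private:
  size_t numCols_;
  std::vector<T> data_;  // numRows * numCols elements, contiguous
};
```

The flat storage also keeps a whole batch contiguous in memory, which fits the column-oriented access pattern this PR introduces.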
// Ordered list of `BlankNodes` with precomputed format info for evaluation
// (index corresponds to cache index)
std::vector<BlankNodeFormatInfo> blankNodesToEvaluate_;
The ideas are all nice, but I currently think the module is definitely too long.
For example, all the caching + statistics can be separate,
and the analysis of the template can also be in a separate module that is then just used by the evaluator.
namespace {
// Parse QLEVER_CONSTRUCT_BATCH_SIZE environment variable.
// Returns the configured value if valid, or DEFAULT_BATCH_SIZE otherwise.
size_t parseBatchSizeFromEnv() {
  const char* envVal = std::getenv("QLEVER_CONSTRUCT_BATCH_SIZE");
  if (envVal == nullptr) {
    AD_LOG_INFO << "CONSTRUCT batch size: "
                << ConstructTripleGenerator::DEFAULT_BATCH_SIZE
                << " (default)\n";
    return ConstructTripleGenerator::DEFAULT_BATCH_SIZE;
  }
  try {
    size_t val = std::stoull(envVal);
    if (val > 0) {
      AD_LOG_INFO << "CONSTRUCT batch size from environment: " << val << "\n";
      return val;
    }
    AD_LOG_WARN << "QLEVER_CONSTRUCT_BATCH_SIZE must be > 0, got: " << envVal
                << ", using default: "
                << ConstructTripleGenerator::DEFAULT_BATCH_SIZE << "\n";
  } catch (const std::exception& e) {
    AD_LOG_WARN << "Invalid QLEVER_CONSTRUCT_BATCH_SIZE value: " << envVal
                << " (" << e.what() << "), using default: "
                << ConstructTripleGenerator::DEFAULT_BATCH_SIZE << "\n";
  }
  return ConstructTripleGenerator::DEFAULT_BATCH_SIZE;
}
}  // namespace
If you want this, use an established mechanism like QLever's runtime parameters etc.
This is surprising to see somewhere in a cpp file :))
std::optional<BatchEvaluationCache> batchCache_;
std::vector<const std::string*> variableStrings_;
};
This code is very long, maybe first clean up :)
…tility::MediaType`, and therefore introduces complexity without a benefit.
Overview
Conformance check passed ✅ No test result changes.



Improvement of performance of CONSTRUCT query export runtimes of about 70%.
This PR improves the performance of CONSTRUCT query result serialization through 4 main optimizations:
1. Id-to-string caching: A `StableLRUCache` memoizes Id-to-string conversions, avoiding redundant vocabulary lookups when the same entity appears multiple times across multiple rows of the result table.
2. Column-oriented batch processing: Rows of the result table are processed in batches (default batch size 64; I did not get "stable" results when trying to empirically find out which batch size was best, more on that below). This allows us to fetch the values for the variables one variable after the other across the rows in the batch (first fetch the values for variable ?x across rows 0 to 63, then fetch the values for variable ?y for rows 0 to 63, and so on). Since `IdTable` uses a column-major memory layout, reading all Ids for a variable across different result-table rows creates sequential memory access patterns that benefit from CPU prefetching.
3. Direct formatting: For streaming output, the generator now yields formatted strings directly, eliminating intermediate `StringTriple` object allocations.
4. Precomputation: Constants (IRIs, literals) and the column indices corresponding to the variables in the `IdTable` are computed once, before we iterate over any result-table rows.
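The column-oriented batching from point 2 can be sketched as follows. This is a simplified model, not QLever's code: `ColumnTable` stands in for `IdTable`'s column-major layout, `int` stands in for `Id`, and `std::to_string` stands in for the vocabulary lookup; `materializeBatched` and `batchSize` are names invented for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Column-major table: columns_[col][row], mimicking IdTable's layout.
struct ColumnTable {
  std::vector<std::vector<int>> columns_;  // one vector of "Ids" per variable
  size_t numRows() const { return columns_.empty() ? 0 : columns_[0].size(); }
};

// Process rows in batches; within each batch, resolve all values of one
// variable before moving to the next. Because the table is column-major,
// the inner loop reads memory sequentially, which helps CPU prefetching.
std::vector<std::vector<std::string>> materializeBatched(
    const ColumnTable& table, size_t batchSize) {
  size_t numCols = table.columns_.size();
  std::vector<std::vector<std::string>> out(
      numCols, std::vector<std::string>(table.numRows()));
  for (size_t begin = 0; begin < table.numRows(); begin += batchSize) {
    size_t end = std::min(begin + batchSize, table.numRows());
    for (size_t col = 0; col < numCols; ++col) {    // variable by variable
      for (size_t row = begin; row < end; ++row) {  // sequential column reads
        out[col][row] = std::to_string(table.columns_[col][row]);
      }
    }
  }
  return out;
}
```

A row-oriented loop (all variables of row 0, then all variables of row 1, ...) would instead stride across columns on every access, which is the pattern this PR replaces.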