Improve construct query result to triples#2654

Draft
marvin7122 wants to merge 187 commits into ad-freiburg:master from marvin7122:improveConstructQueryResultToTriples

@marvin7122
Contributor

@marvin7122 marvin7122 commented Jan 21, 2026

Reduces the runtime of CONSTRUCT query export by about 70%.
This PR improves the performance of CONSTRUCT query result serialization through four main optimizations:

  1. Id-to-string caching: A StableLRUCache memoizes Id-to-string conversions, avoiding redundant vocabulary lookups when the same entity appears in multiple rows of the result table.

  2. Column-oriented batch processing: Rows of the result table are processed in batches. The default batch size is 64; my attempts to determine the best batch size empirically did not give stable results (more on that below). Batching lets us fetch the values one variable at a time across all rows of the batch: first the values of ?x for rows 0 to 63, then the values of ?y for rows 0 to 63, and so on. Since IdTable uses a column-major memory layout, reading all Ids of one variable across consecutive rows produces sequential memory accesses that benefit from CPU prefetching.

  3. Direct formatting: For streaming output, the generator now yields formatted strings directly, eliminating intermediate StringTriple object allocations.

  4. Precomputed constants and column indices: Constants (IRIs, literals) and the IdTable column indices of the variables are computed once, before we iterate over any rows of the result table.
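
To make optimizations 1 to 3 concrete, here is a minimal, self-contained sketch of column-wise batch resolution with memoized Id-to-string conversion and direct string output. It is not the code from this PR: `Id`, `ColumnMajorTable`, `idToString`, `resolveBatch` and `formatTriples` are hypothetical stand-ins, the memo is an unbounded `std::unordered_map` rather than a StableLRUCache, and the table is assumed to have exactly three columns (?s ?p ?o).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for illustration; these are not QLever's real types.
using Id = std::uint64_t;
using Column = std::vector<Id>;
using ColumnMajorTable = std::vector<Column>;  // table[column][row]

// Stand-in for an expensive vocabulary lookup.
inline std::string idToString(Id id) { return "<e" + std::to_string(id) + ">"; }

// Resolve rows [begin, end) one column at a time, so that each inner loop
// walks a contiguous range of a single column. Strings are memoized per Id.
inline std::vector<std::vector<std::string>> resolveBatch(
    const ColumnMajorTable& table, std::size_t begin, std::size_t end,
    std::unordered_map<Id, std::string>& memo, std::size_t& numLookups) {
  std::vector<std::vector<std::string>> strings(table.size());
  for (std::size_t col = 0; col < table.size(); ++col) {
    strings[col].reserve(end - begin);
    for (std::size_t row = begin; row < end; ++row) {
      const Id id = table[col][row];
      auto it = memo.find(id);
      if (it == memo.end()) {
        ++numLookups;  // only cache misses reach the "vocabulary"
        it = memo.emplace(id, idToString(id)).first;
      }
      strings[col].push_back(it->second);
    }
  }
  return strings;
}

// Format every row of a three-column table as one "s p o ." line, appending
// directly to the output string without intermediate triple objects.
inline std::string formatTriples(const ColumnMajorTable& table,
                                 std::size_t batchSize,
                                 std::size_t& numLookups) {
  std::string out;
  if (table.size() != 3 || batchSize == 0) return out;
  std::unordered_map<Id, std::string> memo;
  const std::size_t numRows = table[0].size();
  for (std::size_t begin = 0; begin < numRows; begin += batchSize) {
    const std::size_t end = std::min(begin + batchSize, numRows);
    const auto strings = resolveBatch(table, begin, end, memo, numLookups);
    for (std::size_t i = 0; i < end - begin; ++i) {
      out += strings[0][i];
      out += ' ';
      out += strings[1][i];
      out += ' ';
      out += strings[2][i];
      out += " .\n";
    }
  }
  return out;
}
```

In this sketch the memo outlives a single batch, so an Id that recurs in a later batch is still served without another lookup.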

@marvin7122
Contributor Author

marvin7122 commented Jan 21, 2026

Comment regarding commit with hash 62b0ec9

Precomputing the constants (IRIs, literals) in the CONSTRUCT template, and skipping them when a template triple pattern is evaluated for a particular row of the result table, improves the runtime by about 23% on the following query on the dblp index (my measurement, compared to commit caed761, both binaries built in Release mode):

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

CONSTRUCT {
  ?s rdf:type ?type .
  ?s <http://example.org/active> "yes" .
  ?s ?p ?o .
}
WHERE {
  ?s ?p ?o .
}
LIMIT 1000000
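
The idea of analyzing the template once can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `Constant`, `ColumnRef`, `prepareTemplate` and `instantiate` are hypothetical names, rows are assumed to be already converted to strings, and a variable that is missing from the result table makes `prepareTemplate` throw (a real exporter would instead skip such triples).

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical types for illustration only.
struct Constant { std::string text; };      // IRI or literal, already formatted
struct ColumnRef { std::size_t column; };   // variable -> column index
using PreparedTerm = std::variant<Constant, ColumnRef>;
using PreparedTriple = std::array<PreparedTerm, 3>;

// Done once per query: classify every template term as a constant or as a
// reference to a result-table column.
inline std::vector<PreparedTriple> prepareTemplate(
    const std::vector<std::array<std::string, 3>>& templateTriples,
    const std::unordered_map<std::string, std::size_t>& variableToColumn) {
  std::vector<PreparedTriple> prepared;
  prepared.reserve(templateTriples.size());
  for (const auto& triple : templateTriples) {
    PreparedTriple p;
    for (std::size_t i = 0; i < 3; ++i) {
      const std::string& term = triple[i];
      if (!term.empty() && term[0] == '?') {
        // Simplification: `at` throws std::out_of_range for unknown variables.
        p[i] = ColumnRef{variableToColumn.at(term)};
      } else {
        p[i] = Constant{term};
      }
    }
    prepared.push_back(std::move(p));
  }
  return prepared;
}

// Done once per row: constants are copied as-is, only variables touch the row.
inline std::string instantiate(const PreparedTriple& triple,
                               const std::vector<std::string>& row) {
  std::string out;
  for (const auto& term : triple) {
    if (const auto* constant = std::get_if<Constant>(&term)) {
      out += constant->text;
    } else {
      out += row[std::get<ColumnRef>(term).column];
    }
    out += ' ';
  }
  out += '.';
  return out;
}
```

With this split, the per-row work for a constant term is a plain string append; no parsing or variable lookup happens inside the row loop.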

… each row of the result table (result of the WHERE clause), and thus treat them in the caching mechanism in the same way that we treat Variables
…stats in the server log (even when I build for the Debug Release type), I don't get why
…a CONSTRUCT-query, move the cache stats computation and report to after we have iterated over all rows in the result table for the WHERE-clause
…ryExporter cache, to check if this is the reason why the statistics are not written to the server log
…s of the constructQueryExporter are still written to the server log
…variableHits_; now returns variableMisses_ as it should be
…, Iri, Literal classes and put them into the ConstructQueryEvaluator class. copy helper functions from ExportQueryExecutionTrees to this class as well. Those still need to be refactored
…ueryCache for IRIs and Literals, since their values should be the same across all rows of the WHERE-clause-result-table and across all triples in the CONSTRUCT-query clause.
…different types of Graphterms in ConstructQueryEvaluator instead of the classes themselves
@marvin7122 marvin7122 closed this Feb 2, 2026
@marvin7122 marvin7122 reopened this Feb 2, 2026
@marvin7122 marvin7122 force-pushed the improveConstructQueryResultToTriples branch from 544b1d0 to 4e5c6e2 on February 2, 2026 11:01
@marvin7122 marvin7122 force-pushed the improveConstructQueryResultToTriples branch from 4e5c6e2 to 932da72 on February 2, 2026 12:19
@marvin7122
Contributor Author

marvin7122 commented Feb 2, 2026

According to my measurements, with the binary for commit cfddafe built in Release mode, running the attached SELECT query takes about 1750 ms, while running the attached CONSTRUCT query takes about 1170 ms. Thus, the CONSTRUCT export is now even faster than the SELECT export.

Settings for the benchmark:

=== Query Comparison Configuration ===
Date: Mon Feb  2 02:24:24 PM CET 2026
Server binary: /home/userNoPriv/code/qlever/qlever-code/build/qlever-server
Fresh server instance: YES (per query)

Query A: SELECT
  Accept header: text/csv
  Response extension: csv
  Query: SELECT ?s ?p ?o WHERE { ?s ?p ?o . } LIMIT 1000000

Query B: CONSTRUCT
  Accept header: text/csv
  Response extension: csv
  Query: CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o } LIMIT 1000000

Warmup runs: 1
Measured runs: 4
Query timeout: 3600s

Responses saved in: /home/userNoPriv/code/qlever/profiles_benchmarks/benchmarks/compare_queries_20260202-142424/responses

results:

Query A (SELECT): 1756 ms (avg) wall-clock time from sending the request to receiving the whole response.
Query B (CONSTRUCT):  1174 ms (avg) wall-clock time [...].
Time delta: CONSTRUCT is 582ms faster than SELECT (66%)
=== Memory (RSS) ===
--------------------------------------------------------------------------------
SELECT         : avg peak    217 MB, max peak    218 MB, avg delta +117 MB
CONSTRUCT      : avg peak    232 MB, max peak    234 MB, avg delta +131 MB
--------------------------------------------------------------------------------
Memory delta: CONSTRUCT uses +14MB more than SELECT.

@marvin7122
Contributor Author

Performance comparison of the CONSTRUCT query exporter:
commit cca37689 (HEAD of the master branch as of writing) vs commit cfddafeb (HEAD of improveConstructQueryResultToTriples).
Both binaries built in Release mode, query run on the dblp index.

query:

CONSTRUCT {
  ?s ?p ?o .
}
WHERE {
  ?s ?p ?o .
}
LIMIT 1000000

results:

======================================================================
QLever Benchmark Analysis Report
======================================================================
Analysis date: 2026-02-02 15:12:39
Version 1: masterBranchHEAD
Version 2: myFeatureBranchHEAD
Results directory: /home/userNoPriv/code/qlever/profiles_benchmarks/benchmarks/benchmark_20260202-150727

SUMMARY STATISTICS
----------------------------------------------------------------------
Statistic                 masterBranchHEAD     myFeatureBranchHEAD  Difference     
----------------------------------------------------------------------
Mean Response Time        3607ms               1070ms               -70.3%         
Median Response Time      3597ms               1068ms               -70.3%             

Test Used: Student's t-test
Sample Sizes: 10 vs 10
Test Statistic: t = 206.818

@marvin7122
Contributor Author

I have benchmarked and analyzed many more queries. I also ran some empirical tests to choose the batch size, but I do not want to spam this PR with too many comments right now.

@codecov

codecov bot commented Feb 2, 2026

Codecov Report

❌ Patch coverage is 89.87342% with 32 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.58%. Comparing base (cca3768) to head (a4d83f8).
⚠️ Report is 2 commits behind head on master.

Files with missing lines                    Patch %   Lines
src/engine/ConstructTripleGenerator.cpp     84.50%    8 Missing and 3 partials ⚠️
src/engine/ConstructIdCache.cpp             12.50%    6 Missing and 1 partial ⚠️
src/engine/ConstructBatchProcessor.cpp      96.59%    2 Missing and 4 partials ⚠️
src/engine/ConstructIdCache.h               37.50%    5 Missing ⚠️
src/engine/ConstructQueryEvaluator.cpp      92.30%    1 Missing and 1 partial ⚠️
src/engine/ExportQueryExecutionTrees.cpp    94.11%    0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2654      +/-   ##
==========================================
- Coverage   91.60%   91.58%   -0.02%     
==========================================
  Files         483      487       +4     
  Lines       41360    41607     +247     
  Branches     5493     5540      +47     
==========================================
+ Hits        37886    38104     +218     
- Misses       1897     1919      +22     
- Partials     1577     1584       +7     


Member

@joka921 joka921 left a comment

Some initial comments.
There is a lot of good stuff in there,
but we can have a discussion about the details in person.

Comment on lines +23 to +27
template <typename K, typename V>
class StableLRUCache {
 public:
  explicit StableLRUCache(size_t capacity) : capacity_{capacity} {
    AD_CONTRACT_CHECK(capacity > 0);
Member

I think there are simpler ways to do this, e.g. using a node-based hash map, or wrapping the values in a unique_ptr (relying on the reserve/capacity behavior is a little bit wonky).
But as the interface of this cache is simple, we can iterate on that later, once we have identified the impact of the different tradeoffs.

TL;DR: If affordable, I would like to have this as "the ordinary LruCache we already have, but configured with different template parameters", to reduce code bloat.
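
For illustration, one node-based way to get stable references is sketched below. It is neither QLever's existing LruCache nor the StableLRUCache from this PR; `NodeBasedLruCache` and `getOrCompute` are hypothetical names. Entries live in a `std::list`, whose nodes never move, so a returned reference stays valid until that particular entry is evicted, without relying on reserve/capacity behavior.

```cpp
#include <cstddef>
#include <list>
#include <stdexcept>
#include <unordered_map>
#include <utility>

// Hypothetical LRU cache with stable references to cached values.
template <typename K, typename V>
class NodeBasedLruCache {
 public:
  explicit NodeBasedLruCache(std::size_t capacity) : capacity_{capacity} {
    if (capacity == 0) throw std::invalid_argument("capacity must be > 0");
  }

  // Return the cached value for `key`; on a miss, call `compute(key)` and
  // store the result. The reference is valid until this entry is evicted.
  template <typename F>
  const V& getOrCompute(const K& key, F&& compute) {
    auto it = index_.find(key);
    if (it != index_.end()) {
      ++hits_;
      // Move the node to the front; splice does not invalidate iterators.
      entries_.splice(entries_.begin(), entries_, it->second);
      return it->second->second;
    }
    ++misses_;
    if (entries_.size() == capacity_) {
      // Evict the least recently used entry (back of the list).
      index_.erase(entries_.back().first);
      entries_.pop_back();
    }
    entries_.emplace_front(key, compute(key));
    index_[key] = entries_.begin();
    return entries_.front().second;
  }

  std::size_t hits() const { return hits_; }
  std::size_t misses() const { return misses_; }

 private:
  using Entry = std::pair<K, V>;
  std::size_t capacity_;
  std::list<Entry> entries_;  // front = most recently used
  std::unordered_map<K, typename std::list<Entry>::iterator> index_;
  std::size_t hits_ = 0;
  std::size_t misses_ = 0;
};
```

The price of this layout is one heap allocation per entry and worse locality than a contiguous buffer, which is exactly the kind of tradeoff that would need to be measured.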


// Class for computing the result of an already parsed and planned query and
// exporting it in different formats (TSV, CSV, Turtle, JSON, Binary).
//
Member

don't delete comments?

Comment on lines -255 to -267
//
// Blocks, where all rows are before OFFSET, are requested (and hence
// computed), but skipped.
//
// Blocks, where at least one row is after OFFSET but before the effective
// export limit (minimum of the LIMIT and the value of the `send` parameter),
// are requested and yielded (together with the corresponding `LocalVocab`
// and the range from that `IdTable` that belongs to the result).
//
// Blocks after the effective export limit until the LIMIT are requested, and
// counted towards the `totalResultSize`, but not yielded.
//
// Blocks after the LIMIT are not even requested.
Member

Don't delete comments, but move them alongside the declaration?

Comment on lines 1086 to 1100
constexpr ConstructOutputFormat mediaTypeToConstructFormat(
    ad_utility::MediaType mediaType) {
  using enum ad_utility::MediaType;
  using enum ConstructOutputFormat;
  switch (mediaType) {
    case turtle:
      return TURTLE;
    case csv:
      return CSV;
    case tsv:
      return TSV;
    default:
      // This should never be reached for valid CONSTRUCT formats
      return TURTLE;
  }
Member

I don't see why we can't use the `MediaType` directly? That transformation doesn't seem to do much :)

Comment on lines +1112 to +1116
static_assert(
    format == MediaType::octetStream || format == MediaType::csv ||
    format == MediaType::tsv || format == MediaType::sparqlXml ||
    format == MediaType::sparqlJson || format == MediaType::qleverJson ||
    format == MediaType::binaryQleverExport || format == MediaType::turtle);
Member

This can be simplified: set up a constexpr array of the supported media types (including a `using enum` etc.), and then use ad_utility::contains in the assertion (opportunity to improve).
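
A self-contained sketch of that suggestion might look as follows. The `MediaType` enum here is a local stand-in for ad_utility::MediaType (with an extra, made-up `unsupportedExample` value), `contains` is a minimal constexpr stand-in for ad_utility::contains, and `isSupportedExportFormat` is a hypothetical name.

```cpp
#include <array>

// Local stand-in for ad_utility::MediaType; `unsupportedExample` is made up.
enum class MediaType {
  octetStream, csv, tsv, sparqlXml, sparqlJson, qleverJson,
  binaryQleverExport, turtle, unsupportedExample
};

// The supported formats are listed exactly once.
inline constexpr std::array supportedMediaTypes{
    MediaType::octetStream, MediaType::csv,        MediaType::tsv,
    MediaType::sparqlXml,   MediaType::sparqlJson, MediaType::qleverJson,
    MediaType::binaryQleverExport, MediaType::turtle};

// Minimal constexpr stand-in for ad_utility::contains.
template <typename Range, typename T>
constexpr bool contains(const Range& range, const T& value) {
  for (const auto& element : range) {
    if (element == value) return true;
  }
  return false;
}

// The long chain of `format == ...` comparisons collapses to one check.
template <MediaType format>
constexpr bool isSupportedExportFormat() {
  static_assert(contains(supportedMediaTypes, format),
                "unsupported media type for export");
  return true;
}
```

Adding a new format then means touching only the array, and the same array can be reused for runtime checks.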

// TODO<ms2144>: Use more principled approach: maybe compute batch size
// dynamically based on the number of variables and available cache size,
// rather than using a fixed value. And also monitor how much of the L2 cache
// is used when a batch is being processed.
Member

Yes, and also:
Is 64 enough such that reading from the vocabulary is no longer the bottleneck? (I am very interested in the perf graphs / flame graphs.)

Comment on lines 221 to 225
// Get value for a specific blank node at a row in the batch
const std::string& getBlankNodeValue(size_t blankNodeIdx,
                                     size_t rowInBatch) const {
  return blankNodeValues_[blankNodeIdx][rowInBatch];
}
Member

This code duplication can be abstracted away in a 2D-array class etc. (one that stores the vector and the get function, and is templated).


// Ordered list of `BlankNodes` with precomputed format info for evaluation
// (index corresponds to cache index)
std::vector<BlankNodeFormatInfo> blankNodesToEvaluate_;
Member

The ideas are all nice,
but I currently think the module is definitely too long.
For example, all the caching + statistics can be separate,

and the analysis of the template can also be in a separate module that is then just used by the evaluator.

Comment on lines 24 to 51
namespace {
// Parse QLEVER_CONSTRUCT_BATCH_SIZE environment variable.
// Returns the configured value if valid, or DEFAULT_BATCH_SIZE otherwise.
size_t parseBatchSizeFromEnv() {
  const char* envVal = std::getenv("QLEVER_CONSTRUCT_BATCH_SIZE");
  if (envVal == nullptr) {
    AD_LOG_INFO << "CONSTRUCT batch size: "
                << ConstructTripleGenerator::DEFAULT_BATCH_SIZE
                << " (default)\n";
    return ConstructTripleGenerator::DEFAULT_BATCH_SIZE;
  }
  try {
    size_t val = std::stoull(envVal);
    if (val > 0) {
      AD_LOG_INFO << "CONSTRUCT batch size from environment: " << val << "\n";
      return val;
    }
    AD_LOG_WARN << "QLEVER_CONSTRUCT_BATCH_SIZE must be > 0, got: " << envVal
                << ", using default: "
                << ConstructTripleGenerator::DEFAULT_BATCH_SIZE << "\n";
  } catch (const std::exception& e) {
    AD_LOG_WARN << "Invalid QLEVER_CONSTRUCT_BATCH_SIZE value: " << envVal
                << " (" << e.what() << "), using default: "
                << ConstructTripleGenerator::DEFAULT_BATCH_SIZE << "\n";
  }
  return ConstructTripleGenerator::DEFAULT_BATCH_SIZE;
}
}  // namespace
Member

If you want this, use an established mechanism like QLever's runtime parameters etc.
This is surprising to see somewhere in a cpp file :))

std::optional<BatchEvaluationCache> batchCache_;
std::vector<const std::string*> variableStrings_;
};

Member

This code is very long, maybe first clean up:)

@sparql-conformance

Overview

Number of Tests   Passed ✅   Intended ✅   Failed ❌   Not tested
547               449         73            25          0

Conformance check passed ✅

No test result changes.

Details: https://qlever.dev/sparql-conformance-ui?cur=a4d83f81bf2641a825d457873206d6a234a936a7&prev=f35a290fc35e28fefdc9ac56139660fad14ab860

@sonarqubecloud

sonarqubecloud bot commented Feb 4, 2026
