Draft
187 commits
638b26d
replace auto with explicit types, order function parameters for impro…
marvin7122 Dec 31, 2025
9031950
applied improvements as suggested by clang-tidy
marvin7122 Dec 31, 2025
494ae39
add comments with notes for myself and refactor code to make it more …
marvin7122 Jan 2, 2026
88f90d3
reverse function constructQueryResultToTriples to state from before p…
marvin7122 Jan 3, 2026
72042d3
add caching mechanism for construct query exporter
marvin7122 Jan 3, 2026
ce4ccc3
convert CacheStats from struct to class and add apropriate functions
marvin7122 Jan 3, 2026
b032418
remove logging of cache stats
marvin7122 Jan 3, 2026
de914dc
write construtQueryResultToTriples cache statistics to AD_LOG_DEBUG
marvin7122 Jan 4, 2026
60391dc
add ConstructQueryCache::evaluateWithCacheImpl<BlankNode>
marvin7122 Jan 4, 2026
744c019
fix caching for BlankNodes: we need to clear the blankNodeCache after…
marvin7122 Jan 4, 2026
a4b3233
only compute logging statistics for ConstructQueryExportCache when Lo…
marvin7122 Jan 4, 2026
9d0fcc6
write to AD_LOG_INFO instead to AD_LOG_DEBUG, because I dont see the …
marvin7122 Jan 5, 2026
215b8ac
bugfix: do not compute stats after every row of the result table for …
marvin7122 Jan 5, 2026
a458d6f
remove if clause for logging of cache statistics for the constructQue…
marvin7122 Jan 5, 2026
0ebd5d8
fix logging, such that, when the function exits early, the cache stat…
marvin7122 Jan 5, 2026
e853087
write to AD_LOG_INFO instead of to AD_LOG_DEBUG
marvin7122 Jan 5, 2026
e8e6ce1
fix formatting of construct query cache logger
marvin7122 Jan 5, 2026
118fce3
fix bug: variableMisses() function in ConstructQueryCache.h returned …
marvin7122 Jan 5, 2026
951ff10
fix bug: forgot to delete line referencing local var that no longer e…
marvin7122 Jan 5, 2026
a25da20
decouple implementation of evaluate function from Variable, BlankNode…
marvin7122 Jan 8, 2026
e6f1043
do not use the context for creating the key hashes for the constructQ…
marvin7122 Jan 8, 2026
d6ca5c7
fix compilation failure
marvin7122 Jan 8, 2026
843788e
fix typo: clearRowCache was accidentally defined in ConstructQueryCac…
marvin7122 Jan 9, 2026
7984636
remove function declaration of function which does not exist anymore
marvin7122 Jan 9, 2026
67889e0
fix comments
marvin7122 Jan 9, 2026
1afbce9
remove unused function
marvin7122 Jan 9, 2026
ae6de47
apply clang formatter pre-commit hook
marvin7122 Jan 9, 2026
e5d02a6
remove [nodiscard] from const methods of class
marvin7122 Jan 9, 2026
0b9d6b5
remove old functions that are no longer needed since we evaluate the …
marvin7122 Jan 9, 2026
37c5ea9
unify parser/data/Literal.h and rdfTypes/Literal.h aswell as parser/d…
marvin7122 Jan 13, 2026
31eb335
create constructor for Literal with only one parameter
marvin7122 Jan 14, 2026
824ef15
fix blank line
marvin7122 Jan 14, 2026
fc2d9c3
undo unnecessary nullptr check
marvin7122 Jan 14, 2026
1a38802
fix typo
marvin7122 Jan 14, 2026
7803716
make parameter name more meaningful
marvin7122 Jan 14, 2026
c1b58f7
--amend
marvin7122 Jan 14, 2026
2646c79
undo unnecessary change
marvin7122 Jan 14, 2026
7f1fa4d
remove constructor which should not be used
marvin7122 Jan 14, 2026
fdf5d81
revert test files to version from commit 6ea62c63
marvin7122 Jan 14, 2026
7e8f32e
redo changes that do not directly have to do with my task
marvin7122 Jan 14, 2026
ac0264b
remove outdated TODO<ms2144> comments
marvin7122 Jan 14, 2026
b88c37a
remove comment where function declaration no longer exists
marvin7122 Jan 14, 2026
af758f8
make constructor that was previously private private again
marvin7122 Jan 14, 2026
c257453
update function calls s.t. that they use the appropriate functions in…
marvin7122 Jan 14, 2026
1057ba9
BlankNode should note have evaluate method, the appropriate method in…
marvin7122 Jan 15, 2026
0fc3c68
rename evaluate functions in ConstructQueryEvaluator, since the objec…
marvin7122 Jan 15, 2026
22ecae5
start fixing tests that use the old versions of Literal.h and Iri.h
marvin7122 Jan 15, 2026
e193c4a
try to reverse changes to Literal and Iri classes and classes where t…
marvin7122 Jan 16, 2026
4147ac4
change code s.t. there is no diff to HEAD of current master
marvin7122 Jan 16, 2026
ffbda41
change idTable s.t. it matches HEAD of upstream/master
marvin7122 Jan 16, 2026
8d6ba21
revert version of SparqlAntlrParserTestHelpers.h to before changes we…
marvin7122 Jan 16, 2026
6167741
rename ConstructQueryEvaluator::evaluateVar to ConstructQueryEvaluato…
marvin7122 Jan 16, 2026
0dc9c0d
fix SparqlDataTypesTest.cpp: use ConstructQueryEvaluator to evaluate …
marvin7122 Jan 16, 2026
3c50d58
fix linker error, remove evaluate method from parser/data/Iri.h class…
marvin7122 Jan 16, 2026
d45bec8
fix linker error, remove evaluate method from parser/data/Iri.h class…
marvin7122 Jan 16, 2026
77c3f9b
rename function param
marvin7122 Jan 16, 2026
5bcf8b3
commit to save work
marvin7122 Jan 16, 2026
3bf754c
refactor constructQueryResultToTriples
marvin7122 Jan 17, 2026
84f1251
refactor for improved readability
marvin7122 Jan 17, 2026
b96b69e
refactor ExportQueryExecutionTrees::constructQueryResultToTriples for…
marvin7122 Jan 19, 2026
a13cd18
RowTripleProducer now only holds a reference to the constructTemplate…
marvin7122 Jan 19, 2026
a092eb0
move function definitions from ConstructTripleGenerator.h to Construc…
marvin7122 Jan 19, 2026
8cb42e6
since template implementations can only be defined in header files, m…
marvin7122 Jan 19, 2026
ed8fade
fix compile errors
marvin7122 Jan 19, 2026
6fd2f4b
improve documentation and formatting
marvin7122 Jan 20, 2026
c0e342c
remove header file from engine library
marvin7122 Jan 22, 2026
45dc544
add copyright messages to files which did not have one before.
marvin7122 Jan 22, 2026
e31dd14
make paths of files in #include directives absolute.
marvin7122 Jan 22, 2026
57bec88
resolve improvement suggestion made in https://github.com/ad-freiburg…
marvin7122 Jan 22, 2026
89d5166
remove ConstructQueryExportContext.h from include directives where it…
marvin7122 Jan 22, 2026
eb5b35d
remove [[nodiscard]] annotations.
marvin7122 Jan 22, 2026
85c2dd8
Reintroduce comments that provide context for classes.
marvin7122 Jan 22, 2026
209415b
Remove include of ConstructQueryExportContext.h where it is not needed.
marvin7122 Jan 22, 2026
4dd0d9a
Introduce a lambda to abbreviate calling of ConstructQueryEvaluator::…
marvin7122 Jan 22, 2026
c168be1
Abbreviate `ConstructQueryEvaluator::evaluate` to `evaluate` by makin…
marvin7122 Jan 22, 2026
2ea12d3
remove [[nodiscard]] annotations.
marvin7122 Jan 22, 2026
9d61d35
Remove outdated part of code: Variable.h no longer needs forward decl…
marvin7122 Jan 22, 2026
c9f82e5
remove unused include directive.
marvin7122 Jan 22, 2026
36f23fe
add dummy comments
marvin7122 Jan 22, 2026
62822c0
add dummy comments
marvin7122 Jan 22, 2026
7697744
add dummy comments
marvin7122 Jan 22, 2026
16ac7fa
add comments that I added when trying to understand the code
marvin7122 Jan 16, 2026
13cd21a
Do not make deep copies of BlankNode, Iri, Literal.
marvin7122 Jan 22, 2026
9645a6b
use visitor pattern for ConstructQueryEvaluator::evaluate(GraphTerm& …
marvin7122 Jan 22, 2026
2240d37
Remove unnecessary includes.
marvin7122 Jan 22, 2026
d5d81b8
add dummy //____ comments
marvin7122 Jan 22, 2026
e1ce5a9
add dummy //____ comments
marvin7122 Jan 22, 2026
5ef04e6
use piped syntax for view transforms
marvin7122 Jan 22, 2026
1ced4d7
add informative description to generateForTable (and also rename gene…
marvin7122 Jan 22, 2026
3178f80
move evaluateTriple method from ConstructTripleGenerator to Construct…
marvin7122 Jan 22, 2026
b1f3846
Make use of using directive at top of the file to shorten ad_utility:…
marvin7122 Jan 22, 2026
20054cf
move logic from ExportQueryExecutionTrees::constructQueryResultToTrip…
marvin7122 Jan 23, 2026
2c0e590
Add comments to function declarations in ConstructQueryEvaluator.h
marvin7122 Jan 23, 2026
0a76619
Move implementation of generateStringTriples into ConstructTripleGene…
marvin7122 Jan 23, 2026
cb2fe57
remove using directives from ConstructTripleGenerator.cpp since that …
marvin7122 Jan 23, 2026
abad7d4
change syntax from multi-line comment to multiple single-line comments.
marvin7122 Jan 23, 2026
53f4661
add dummy comment
marvin7122 Jan 23, 2026
8492f60
add dummy comment
marvin7122 Jan 23, 2026
45ed929
Simplify filter of range
marvin7122 Jan 23, 2026
29198ac
fix comment
marvin7122 Jan 23, 2026
5228546
fix comment, give innerTransformer more descriptive name
marvin7122 Jan 23, 2026
f93dd0f
add static-assert with templated bool to ensure all Variants of Graph…
marvin7122 Jan 23, 2026
7ef8bc4
fix bug introduced in commit 45ed9291. Non-empty StringTriples were f…
marvin7122 Jan 23, 2026
f0afcb5
remove the qualifier from calls to the evaluate functions in the Cons…
marvin7122 Jan 23, 2026
f1c8c64
Rename ConstructQueryEvaluator::evaluate(GrapthTerm& term, ...) to Co…
marvin7122 Jan 23, 2026
195b135
Rewrite comment for enhanced readability.
marvin7122 Jan 23, 2026
50e7c45
Rewrite comments to make code easier to understand.
marvin7122 Jan 23, 2026
f5fc5a2
rename constructTriples_ to templateTriples_, since that name is more…
marvin7122 Jan 23, 2026
9ab871c
move generateStringTriplesForResultTable from ConstructTripleGenerato…
marvin7122 Jan 23, 2026
5bc9e99
try to fix the following error, which appears when trying to build th…
marvin7122 Jan 23, 2026
19219c1
try to fix the following error, which appears when trying to build th…
marvin7122 Jan 24, 2026
6e80aeb
move std::shared_ptr instead of copying it for perfomance improvement
marvin7122 Jan 25, 2026
ba4f594
add caching mechanism for construct query exporter
marvin7122 Jan 3, 2026
1d1325b
unify parser/data/Literal.h and rdfTypes/Literal.h aswell as parser/d…
marvin7122 Jan 13, 2026
98c8d06
add cache for evaluation of Variables and BlankNodes that is located …
marvin7122 Jan 21, 2026
b81d2a1
add per-row cache for BlankNode evaluations.
marvin7122 Jan 21, 2026
b33caac
Fix the compiler errors on gcc 11
joka921 Jan 26, 2026
dd7011d
Add a technically missing include.
joka921 Jan 26, 2026
a08364c
Revert "try to fix the following error, which appears when trying to …
joka921 Jan 26, 2026
b91f293
Revert "try to fix the following error, which appears when trying to …
joka921 Jan 26, 2026
3579ce7
Use the alias template InputRangeTypeErased only where we can...
joka921 Jan 26, 2026
a0bf7ed
fix SonarQube issue: Remove redundant access specifier; it does not c…
marvin7122 Jan 26, 2026
6e622d0
fix SonarQube issue: Rename _resultTableRowIdx to resultTableRowIdx_
marvin7122 Jan 26, 2026
b3dd3fe
fix SonarQube issue: pass TableWithRange not by value but by const re…
marvin7122 Jan 26, 2026
4b17592
pass `LimitOffsetClause` by const reference, because `getRowIndices` …
marvin7122 Jan 26, 2026
d17feb4
Fix the compiler errors on gcc 11
joka921 Jan 26, 2026
dd9a137
Add a technically missing include.
joka921 Jan 26, 2026
ca52480
Revert "try to fix the following error, which appears when trying to …
joka921 Jan 26, 2026
d765645
Revert "try to fix the following error, which appears when trying to …
joka921 Jan 26, 2026
71ae922
Use the alias template InputRangeTypeErased only where we can...
joka921 Jan 26, 2026
8beb6f9
fix SonarQube issue: Remove redundant access specifier; it does not c…
marvin7122 Jan 26, 2026
d63aa51
fix SonarQube issue: Rename _resultTableRowIdx to resultTableRowIdx_
marvin7122 Jan 26, 2026
5a880d3
fix SonarQube issue: pass TableWithRange not by value but by const re…
marvin7122 Jan 26, 2026
18785c8
pass `LimitOffsetClause` by const reference, because `getRowIndices` …
marvin7122 Jan 26, 2026
542e431
commit to save work
marvin7122 Jan 27, 2026
6de76fc
Merge remote-tracking branch 'origin/master' into decoupleEvalFunctio…
Jan 30, 2026
5b26cfb
do not use linebreaks in comments, use lowercase when referencing con…
marvin7122 Jan 30, 2026
d06af71
do not use linebreaks in comments, use lowercase when referencing con…
marvin7122 Jan 30, 2026
43207e3
merge decoupleEvalFunctionality
marvin7122 Jan 30, 2026
552c6fb
set Literal.h to HEAD of decoupleEvalFunctionality
marvin7122 Jan 30, 2026
e3d7a0e
set tests to their version of decoupleEvalFunctionality
marvin7122 Jan 30, 2026
f2591d9
fix merge
marvin7122 Jan 30, 2026
60a21e9
Remove BlankNodeCache, move the logic to precompute the values or Lit…
marvin7122 Jan 30, 2026
0bfef44
performance improvements.
marvin7122 Jan 31, 2026
24bf81f
Avoid double storage of variable strings in CONSTRUCT batch cache
marvin7122 Jan 31, 2026
77250df
Precompute BlankNode format strings in CONSTRUCT queries
marvin7122 Jan 31, 2026
736c613
Cache variable lookups per result row in CONSTRUCT queries
marvin7122 Jan 31, 2026
ddc571d
Fix QLEVER_CONSTRUCT_BATCH_SIZE environment variable name
marvin7122 Jan 31, 2026
8051b72
Add logging for CONSTRUCT batch size configuration
marvin7122 Jan 31, 2026
fc9f343
Document CONSTRUCT batch size configuration and effects
marvin7122 Jan 31, 2026
6420720
Change default CONSTRUCT batch size from 1000 to 64
marvin7122 Jan 31, 2026
b18b902
Refactored the CONSTRUCT query Turtle exporter to avoid StringTriple …
marvin7122 Jan 31, 2026
fa90df4
Unify CONSTRUCT export formats to avoid StringTriple allocations
marvin7122 Jan 31, 2026
9818ce2
Refactor CONSTRUCT exporter for readability
marvin7122 Jan 31, 2026
5aa9eec
Document ConstructTripleGenerator design decisions and optimizations
marvin7122 Jan 31, 2026
b2b861e
Clean up ConstructTripleGenerator: remove dead code, avoid string cop…
marvin7122 Jan 31, 2026
2e6c785
Replace stream_generator with InputRangeTypeErased in CONSTRUCT expor…
marvin7122 Feb 1, 2026
2c35291
Optimize IdCache: store string pointers directly, eliminate double lo…
marvin7122 Feb 1, 2026
b8a5a1e
rewrite comments for enhanced clarity
marvin7122 Feb 1, 2026
1454260
Bound IdCache memory with StableLRUCache to prevent unbounded growth
marvin7122 Feb 1, 2026
ade0f04
Move StableLRUCache into separate header file.
marvin7122 Feb 1, 2026
c568ddd
Refactor CONSTRUCT IdCache stats to use RAII logger at INFO level
marvin7122 Feb 1, 2026
c25ab08
Refactor CONSTRUCT exporter: extract FormattedTripleIterator, add RAI…
marvin7122 Feb 1, 2026
cbb9fa9
Refactor CONSTRUCT exporter for improved readability
marvin7122 Feb 1, 2026
6a6d11a
Avoid unnecessary instantiation of `val` variable.
marvin7122 Feb 2, 2026
3132254
Rewrite comments. Add TODO comments.
marvin7122 Feb 2, 2026
3272147
Improve batch processing documentation and make return types explicit
marvin7122 Feb 2, 2026
7e153d9
Clarify batch processing documentation and improve comments
marvin7122 Feb 2, 2026
932da72
simplify comments
marvin7122 Feb 2, 2026
65fdee6
merge upstream/master
marvin7122 Feb 2, 2026
241848c
remove dead code, rewrite comments
marvin7122 Feb 2, 2026
cfddafe
remove deprecated ConstructQueryCache
marvin7122 Feb 2, 2026
b4f528d
code quality improvements
marvin7122 Feb 2, 2026
eee51de
fix typo
marvin7122 Feb 2, 2026
36a623d
Extract init-captures into local variables before the lambda, for enh…
marvin7122 Feb 2, 2026
936e373
Improve comments
marvin7122 Feb 3, 2026
572f19b
Fix SonarQube code smell: extract 29-line lambda into processBatchFor…
marvin7122 Feb 3, 2026
d9fb287
add comment back in, which was deleted by accident
marvin7122 Feb 3, 2026
860c631
add another comment back in
marvin7122 Feb 3, 2026
9ff9736
Remove `ConstructOutputFormat`since it is just a wrapper around `ad_u…
marvin7122 Feb 3, 2026
4de2b7e
Replace raw pointers with owned strings in BatchEvaluationCache
marvin7122 Feb 3, 2026
d118a17
Do not populate BatchEvaluationCache with string objects but with sha…
marvin7122 Feb 4, 2026
80fa8b7
decouple batch processing of result-table rows from ConstructTripleGe…
marvin7122 Feb 4, 2026
1b05b70
small refactor: rename some methods and some types to ease understand…
marvin7122 Feb 4, 2026
1b20b67
Remove variableStrings indirection in batch processing
marvin7122 Feb 4, 2026
9a89ee0
Move all batch evaluation logic to ConstructBatchProcessor
marvin7122 Feb 4, 2026
a4d83f8
fix comment
marvin7122 Feb 4, 2026
2 changes: 1 addition & 1 deletion src/engine/CMakeLists.txt
@@ -18,7 +18,7 @@ add_library(engine
QueryExecutionContext.cpp ExistsJoin.cpp SparqlProtocol.cpp ParsedRequestBuilder.cpp
NeutralOptional.cpp Load.cpp StripColumns.cpp NamedResultCache.cpp
ExplicitIdTableOperation.cpp StringMapping.cpp MaterializedViews.cpp
- PermutationSelector.cpp ConstructQueryEvaluator.cpp ConstructTripleGenerator.cpp)
+ PermutationSelector.cpp ConstructQueryEvaluator.cpp ConstructTripleGenerator.cpp ConstructIdCache.cpp ConstructBatchProcessor.cpp)

qlever_target_link_libraries(engine util index parser global sparqlExpressions SortPerformanceEstimator Boost::iostreams s2 spatialjoin-dev pb_util pb_util_geo)

376 changes: 376 additions & 0 deletions src/engine/ConstructBatchProcessor.cpp
@@ -0,0 +1,376 @@
// Copyright 2025 The QLever Authors, in particular:
//
// 2025 Marvin Stoetzel <[email protected]>, UFR
//
// UFR = University of Freiburg, Chair of Algorithms and Data Structures

#include "engine/ConstructBatchProcessor.h"

#include <absl/strings/str_cat.h>

#include "backports/StartsWithAndEndsWith.h"
#include "engine/ConstructQueryEvaluator.h"
#include "parser/data/ConstructQueryExportContext.h"
#include "rdfTypes/RdfEscaping.h"

// _____________________________________________________________________________
ConstructBatchProcessor::ConstructBatchProcessor(
std::shared_ptr<const InstantiationBlueprint> blueprint,
const TableWithRange& table, ad_utility::MediaType format,
size_t currentRowOffset)
: blueprint_(std::move(blueprint)),
format_(format),
tableWithVocab_(table.tableWithVocab_),
rowIndicesVec_(ql::ranges::begin(table.view_),
ql::ranges::end(table.view_)),
currentRowOffset_(currentRowOffset),
batchSize_(ConstructBatchProcessor::getBatchSize()) {
auto [cache, logger] = createIdCacheWithStats(rowIndicesVec_.size());
idCache_ = std::move(cache);
statsLogger_ = std::move(logger);
}

// _____________________________________________________________________________
std::optional<std::string> ConstructBatchProcessor::get() {
while (batchStart_ < rowIndicesVec_.size()) {
blueprint_->cancellationHandle_->throwIfCancelled();

loadBatchIfNeeded();

if (auto result = processCurrentBatch()) {
return result;
}

advanceToNextBatch();
}

return std::nullopt;
}

// _____________________________________________________________________________
void ConstructBatchProcessor::loadBatchIfNeeded() {
if (batchCache_.has_value()) {
return;
}
const size_t batchEnd =
std::min(batchStart_ + batchSize_, rowIndicesVec_.size());

auto batchRowIndices = ql::span<const uint64_t>(
rowIndicesVec_.data() + batchStart_, batchEnd - batchStart_);

batchCache_ = evaluateBatchColumnOriented(tableWithVocab_.idTable(),
tableWithVocab_.localVocab(),
batchRowIndices, currentRowOffset_);

// A new batch has just been loaded, so reset the indices for iterating over
// the rows/triples of this batch.
rowInBatchIdx_ = 0;
tripleIdx_ = 0;
}

// _____________________________________________________________________________
std::optional<std::string> ConstructBatchProcessor::processCurrentBatch() {
while (rowInBatchIdx_ < batchCache_->numRows_) {
if (auto result = processCurrentRow()) {
return result;
}
advanceToNextRow();
}
return std::nullopt;
}

// _____________________________________________________________________________
std::optional<std::string> ConstructBatchProcessor::processCurrentRow() {
while (tripleIdx_ < blueprint_->numTemplateTriples()) {
auto subject =
getTermStringPtr(tripleIdx_, 0, *batchCache_, rowInBatchIdx_);
auto predicate =
getTermStringPtr(tripleIdx_, 1, *batchCache_, rowInBatchIdx_);
auto object = getTermStringPtr(tripleIdx_, 2, *batchCache_, rowInBatchIdx_);

++tripleIdx_;

std::string formatted = formatTriple(subject, predicate, object);
if (!formatted.empty()) {
return formatted;
}
// Triple was UNDEF (incomplete), continue to next triple pattern.
}

return std::nullopt;
}

// _____________________________________________________________________________
std::optional<ConstructBatchProcessor::StringTriple>
ConstructBatchProcessor::getStringTriple() {
while (batchStart_ < rowIndicesVec_.size()) {
blueprint_->cancellationHandle_->throwIfCancelled();

loadBatchIfNeeded();

if (auto result = processCurrentBatchAsStringTriple()) {
return result;
}

advanceToNextBatch();
}

return std::nullopt;
}

// _____________________________________________________________________________
std::optional<ConstructBatchProcessor::StringTriple>
ConstructBatchProcessor::processCurrentBatchAsStringTriple() {
while (rowInBatchIdx_ < batchCache_->numRows_) {
if (auto result = processCurrentRowAsStringTriple()) {
return result;
}
advanceToNextRow();
}
return std::nullopt;
}

// _____________________________________________________________________________
std::optional<ConstructBatchProcessor::StringTriple>
ConstructBatchProcessor::processCurrentRowAsStringTriple() {
while (tripleIdx_ < blueprint_->numTemplateTriples()) {
auto subject =
getTermStringPtr(tripleIdx_, 0, *batchCache_, rowInBatchIdx_);
auto predicate =
getTermStringPtr(tripleIdx_, 1, *batchCache_, rowInBatchIdx_);
auto object = getTermStringPtr(tripleIdx_, 2, *batchCache_, rowInBatchIdx_);

++tripleIdx_;

StringTriple triple = instantiateTriple(subject, predicate, object);
if (!triple.isEmpty()) {
return triple;
}
// Triple was UNDEF (incomplete), continue to next triple pattern.
}

return std::nullopt;
}

// _____________________________________________________________________________
ConstructBatchProcessor::StringTriple
ConstructBatchProcessor::instantiateTriple(
const std::shared_ptr<const std::string>& subject,
const std::shared_ptr<const std::string>& predicate,
const std::shared_ptr<const std::string>& object) const {
if (!subject || !predicate || !object) {
return StringTriple{};
}
return StringTriple{*subject, *predicate, *object};
}

// _____________________________________________________________________________
void ConstructBatchProcessor::advanceToNextRow() {
++rowInBatchIdx_;
tripleIdx_ = 0;
}

// _____________________________________________________________________________
void ConstructBatchProcessor::advanceToNextBatch() {
batchStart_ += batchSize_;
batchCache_.reset();
}

// _____________________________________________________________________________
// Batch Evaluation (Column-Oriented Processing of multiple result-table rows)
//
// Evaluates Variables and BlankNodes for a batch of rows.
// Column-oriented access pattern for variables:
// for each variable V occurring in the template triples:
// for each row R in batch:
// read idTable[column(V)][R] <-- Sequential reads within a column
//
// This is more cache-friendly than row-oriented access, because the memory
// layout of `IdTable` is column-major.
BatchEvaluationCache ConstructBatchProcessor::evaluateBatchColumnOriented(
const IdTable& idTable, const LocalVocab& localVocab,
ql::span<const uint64_t> rowIndices, size_t currentRowOffset) {
BatchEvaluationCache batchCache;
batchCache.numRows_ = rowIndices.size();

evaluateVariablesForBatch(batchCache, idTable, localVocab, rowIndices,
currentRowOffset);
evaluateBlankNodesForBatch(batchCache, rowIndices, currentRowOffset);

return batchCache;
}

// _____________________________________________________________________________
void ConstructBatchProcessor::evaluateVariablesForBatch(
BatchEvaluationCache& batchCache, const IdTable& idTable,
const LocalVocab& localVocab, ql::span<const uint64_t> rowIndices,
size_t currentRowOffset) {
const size_t numRows = rowIndices.size();
auto& cacheStats = statsLogger_->stats();
const auto& variablesToEvaluate = blueprint_->variablesToEvaluate_;

// Initialize variable strings: [varIdx][rowInBatch]
// shared_ptr defaults to nullptr, representing UNDEF values.
batchCache.variableStrings_.resize(variablesToEvaluate.size());
for (auto& column : batchCache.variableStrings_) {
column.resize(numRows);
}

// Evaluate variables column-by-column for better cache locality.
// The IdTable is accessed sequentially for each column.
for (size_t varIdx = 0; varIdx < variablesToEvaluate.size(); ++varIdx) {
const auto& varInfo = variablesToEvaluate[varIdx];
auto& columnStrings = batchCache.variableStrings_[varIdx];

if (!varInfo.columnIndex_.has_value()) {
// Variable not in result - all values are nullptr (already default).
continue;
}

const size_t colIdx = varInfo.columnIndex_.value();

// Read all IDs from this column for all rows in the batch,
// look up their string values in the cache, and share them with the batch.
for (size_t rowInBatch = 0; rowInBatch < numRows; ++rowInBatch) {
const uint64_t rowIdx = rowIndices[rowInBatch];
Id id = idTable(rowIdx, colIdx);

// Use LRU cache's getOrCompute: returns cached value or computes and
// caches it.
size_t missesBefore = cacheStats.misses_;
const VariableToColumnMap& varCols = blueprint_->variableColumns_.get();
const Index& idx = blueprint_->index_.get();
const std::shared_ptr<const std::string>& cachedValue =
idCache_->getOrCompute(id, [&cacheStats, &varCols, &idx, &colIdx,
rowIdx, &idTable, &localVocab,
currentRowOffset](const Id&) {
++cacheStats.misses_;
ConstructQueryExportContext context{
rowIdx, idTable, localVocab, varCols, idx, currentRowOffset};
auto value = ConstructQueryEvaluator::evaluateWithColumnIndex(
colIdx, context);
if (value.has_value()) {
return std::make_shared<const std::string>(std::move(*value));
}
return std::shared_ptr<const std::string>{};
});

if (cacheStats.misses_ == missesBefore) {
++cacheStats.hits_;
}

columnStrings[rowInBatch] = cachedValue;
}
}
}

// _____________________________________________________________________________
void ConstructBatchProcessor::evaluateBlankNodesForBatch(
BatchEvaluationCache& batchCache, ql::span<const uint64_t> rowIndices,
size_t currentRowOffset) const {
const size_t numRows = rowIndices.size();
const auto& blankNodesToEvaluate = blueprint_->blankNodesToEvaluate_;

// Initialize blank node values: [blankNodeIdx][rowInBatch]
batchCache.blankNodeValues_.resize(blankNodesToEvaluate.size());
for (auto& column : batchCache.blankNodeValues_) {
column.resize(numRows);
}

// Evaluate blank nodes using precomputed prefix and suffix.
// Only the row number needs to be concatenated per row.
// Format: prefix + (currentRowOffset + rowIdx) + suffix
for (size_t blankIdx = 0; blankIdx < blankNodesToEvaluate.size();
++blankIdx) {
const BlankNodeFormatInfo& formatInfo = blankNodesToEvaluate[blankIdx];
auto& columnValues = batchCache.blankNodeValues_[blankIdx];

for (size_t rowInBatch = 0; rowInBatch < numRows; ++rowInBatch) {
const uint64_t rowIdx = rowIndices[rowInBatch];
// Use precomputed prefix and suffix, only concatenate row number.
columnValues[rowInBatch] = absl::StrCat(
formatInfo.prefix_, currentRowOffset + rowIdx, formatInfo.suffix_);
}
}
}

// _____________________________________________________________________________
std::shared_ptr<const std::string> ConstructBatchProcessor::getTermStringPtr(
size_t tripleIdx, size_t pos, const BatchEvaluationCache& batchCache,
size_t rowIdxInBatch) const {
const TriplePatternInfo& info = blueprint_->triplePatternInfos_[tripleIdx];
const TriplePatternInfo::TermLookupInfo& lookup = info.lookups_[pos];

switch (lookup.type) {
case TriplePatternInfo::TermType::CONSTANT: {
return std::make_shared<const std::string>(
blueprint_->precomputedConstants_[tripleIdx][pos]);
}
case TriplePatternInfo::TermType::VARIABLE: {
// The `shared_ptr`s for variables are stored in the batch cache, which
// eliminates hash lookups during instantiation.
return batchCache.getVariableString(lookup.index, rowIdxInBatch);
}
case TriplePatternInfo::TermType::BLANK_NODE: {
// Blank node values are always valid (computed for each row).
return std::make_shared<const std::string>(
batchCache.getBlankNodeValue(lookup.index, rowIdxInBatch));
}
}
// TODO<ms2144>: Returning a nullptr here is questionable, because callers
// treat nullptr as UNDEF. Consider throwing an exception instead.
return nullptr;  // Unreachable
}

// _____________________________________________________________________________
std::pair<std::shared_ptr<ConstructBatchProcessor::IdCache>,
std::shared_ptr<ConstructBatchProcessor::IdCacheStatsLogger>>
ConstructBatchProcessor::createIdCacheWithStats(size_t numRows) const {
// Cache capacity is sized to maximize cross-batch cache hits on repeated
// values (e.g., predicates that appear in many rows).
const size_t numVars = blueprint_->variablesToEvaluate_.size();
const size_t minCapacityForBatch = ConstructBatchProcessor::getBatchSize() *
std::max(numVars, size_t{1}) * 2;
const size_t capacity =
std::max(CONSTRUCT_ID_CACHE_MIN_CAPACITY, minCapacityForBatch);
auto idCache = std::make_shared<IdCache>(capacity);
auto statsLogger = std::make_shared<IdCacheStatsLogger>(numRows, capacity);
return {std::move(idCache), std::move(statsLogger)};
}

// _____________________________________________________________________________
// Formats triples for different output formats without intermediate
// `StringTriple` allocations.
std::string ConstructBatchProcessor::formatTriple(
const std::shared_ptr<const std::string>& subject,
const std::shared_ptr<const std::string>& predicate,
const std::shared_ptr<const std::string>& object) const {
if (!subject || !predicate || !object) {
return {};
}

switch (format_) {
case ad_utility::MediaType::turtle: {
// Only escape literals (strings starting with "). IRIs and blank nodes
// are used as-is, avoiding an unnecessary string copy.
if (ql::starts_with(*object, "\"")) {
return absl::StrCat(*subject, " ", *predicate, " ",
RdfEscaping::validRDFLiteralFromNormalized(*object),
" .\n");
}
return absl::StrCat(*subject, " ", *predicate, " ", *object, " .\n");
}
case ad_utility::MediaType::csv: {
return absl::StrCat(RdfEscaping::escapeForCsv(*subject), ",",
RdfEscaping::escapeForCsv(*predicate), ",",
RdfEscaping::escapeForCsv(*object), "\n");
}
case ad_utility::MediaType::tsv: {
return absl::StrCat(RdfEscaping::escapeForTsv(*subject), "\t",
RdfEscaping::escapeForTsv(*predicate), "\t",
RdfEscaping::escapeForTsv(*object), "\n");
}
default:
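// Unsupported format: the empty string is treated like an UNDEF triple by
// the callers and is silently skipped.
// TODO<ms2144>: throw a proper error here instead?
return {};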
}
}
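
Review note: the resumable batch, row, triple iteration that `get()`, `processCurrentBatch()` and `processCurrentRow()` implement together can be hard to follow across three functions. The following is a minimal, self-contained sketch of the same control flow, not the code from this PR. `ResumableBatchReader` and its members are hypothetical names; each row is modelled as a plain list of strings, and an empty string stands in for an UNDEF triple.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the batch processor. Each "row" is a list of
// already formatted items; an empty string plays the role of an UNDEF triple.
class ResumableBatchReader {
 public:
  ResumableBatchReader(std::vector<std::vector<std::string>> rows,
                       std::size_t batchSize)
      : rows_(std::move(rows)), batchSize_(batchSize) {
    assert(batchSize_ > 0);
  }

  // Return the next non-empty item, or `std::nullopt` when all rows are
  // exhausted. The three indices persist between calls, so every call
  // resumes exactly where the previous one returned.
  std::optional<std::string> get() {
    while (batchStart_ < rows_.size()) {
      const std::size_t batchEnd =
          std::min(batchStart_ + batchSize_, rows_.size());
      while (batchStart_ + rowInBatch_ < batchEnd) {
        const std::vector<std::string>& row = rows_[batchStart_ + rowInBatch_];
        while (itemIdx_ < row.size()) {
          const std::string& item = row[itemIdx_];
          ++itemIdx_;  // Advance before returning, like `++tripleIdx_`.
          if (!item.empty()) {
            return item;
          }
          // Empty item (UNDEF): continue with the next item of this row.
        }
        // Row exhausted: move to the next row of the batch.
        ++rowInBatch_;
        itemIdx_ = 0;
      }
      // Batch exhausted: move to the next batch.
      batchStart_ += batchSize_;
      rowInBatch_ = 0;
    }
    return std::nullopt;
  }

 private:
  std::vector<std::vector<std::string>> rows_;
  std::size_t batchSize_;
  std::size_t batchStart_ = 0;
  std::size_t rowInBatch_ = 0;
  std::size_t itemIdx_ = 0;
};
```

The sketch leaves out the per-batch cache that `loadBatchIfNeeded()` fills and the cancellation check. It only shows why the indices have to be advanced before returning and reset when moving on to the next row or batch.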