Skip to content

Conversation

@hannahbast
Copy link
Member

@hannahbast hannahbast commented Oct 17, 2025

So far, all twin-relation sorters used 4 GB of memory for merging. This is fine for the SPO/SOP and OSP/OPS permutation pairs, where "relations" (the number of triples with the same S or O) are usually small. However, for PSO/POS, relations can be very large (think rdf:type), and 4 GB can be too small for large datasets.

For the twin-relation sorter of the PSO/POS permutation pair, we now provide half of the memory available for the whole index building (which is configurable via --stxxl-memory). We can do that because each permutation sorter gets half of that memory and for the last permutation pair, no other pair is sorted at the same time, so half of that memory was available anyway.

So far, all twin-relation sorters were 4_GB of memory for merging. This
is fine for the SPO/SOP and OSP/OPS permutation pairs, where "relations"
(the number of triples with the same S or O) are usually small. However,
for PSO/POS, relations can be very large (think `rdf:type`), and 4_GB
can be too small for large datasets.

For that permutation pair, we now provide half of the memory available
for the whole index building (which is configurabel via --stxxl-memory).
That is OK because each permutation sorter gets half of that memory and
for the last permutation pair, no other pair is sorted at the same time.
@hannahbast hannahbast requested a review from Copilot October 17, 2025 04:03
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adjusts the memory allocation for the twin-relation sorter to better handle large relations in the PSO/POS permutation by making the memory configurable and allocating more memory when building PSO/POS.

  • Introduces a twinRelationSorterMemory parameter through the permutation creation stack.
  • Adds a constant default memory (5 GB) for SPO/SOP and OSP/OPS; uses half of the index-building memory limit for PSO/POS.
  • Updates tests to pass the new parameter explicitly.

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
test/CompressedRelationsTest.cpp Updates test call to CompressedRelationWriter::createPermutationPair with explicit memory argument.
src/index/IndexImpl.h Adds twinRelationSorterMemory parameter to relevant methods.
src/index/IndexImpl.cpp Propagates and applies the new memory parameter across permutation creation calls; assigns different defaults depending on permutation stage.
src/index/ConstantsIndexBuilding.h Defines DEFAULT_MEMORY_FOR_TWIN_RELATION_SORTER; adds using directive for memory literals.
src/index/CompressedRelation.h Adds twinRelationSorterMemory parameter to createPermutationPair signature.
src/index/CompressedRelation.cpp Uses the passed twinRelationSorterMemory when constructing the external sorter.

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

std::move(blockGenerator), *thirdSorter);
createSecondPermutationPair(
NumColumnsIndexBuilding + 2, std::move(blockGenerator),
DEFAULT_MEMORY_LIMIT_INDEX_BUILDING, *thirdSorter);
Copy link

Copilot AI Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DEFAULT_MEMORY_LIMIT_INDEX_BUILDING is not defined; this will not compile. Replace it with the intended constant DEFAULT_MEMORY_FOR_TWIN_RELATION_SORTER (as used elsewhere in this PR) to set the memory for the SPO/SOP or OSP/OPS twin-relation sorter.

Suggested change
DEFAULT_MEMORY_LIMIT_INDEX_BUILDING, *thirdSorter);
DEFAULT_MEMORY_FOR_TWIN_RELATION_SORTER, *thirdSorter);

Copilot uses AI. Check for mistakes.
Comment on lines 344 to +348
ad_utility::InputRangeTypeErased<IdTableStatic<0>> sortedTriples,
qlever::KeyOrder permutation,
const std::vector<std::function<void(const IdTableStatic<0>&)>>&
perBlockCallbacks);
perBlockCallbacks,
ad_utility::MemorySize twinRelationSorterMemory);
Copy link

Copilot AI Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new twinRelationSorterMemory parameter should be documented in the function comment to explain its purpose (memory cap for merging relations), typical values, and how it affects SPO/SOP vs. OSP/OPS vs. PSO/POS.

Copilot uses AI. Check for mistakes.
@sparql-conformance
Copy link

Overview

Number of Tests Passed ✅ Failed ❌ Intended ⚠️ Not tested
522 439 16 67 0

Conformance check passed ✅

No test result changes.

Details: https://qlever.dev/sparql-conformance-ui?cur=cf1a367e996da2cf01cd606b46986dc1e6059030&prev=8ff2011e4acc8262b55b75cef46b96b9d5da8f23

@codecov
Copy link

codecov bot commented Oct 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.45%. Comparing base (8ff2011) to head (cf1a367).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2443   +/-   ##
=======================================
  Coverage   91.44%   91.45%           
=======================================
  Files         462      462           
  Lines       46893    46902    +9     
  Branches     5240     5240           
=======================================
+ Hits        42881    42893   +12     
+ Misses       2513     2511    -2     
+ Partials     1499     1498    -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@sonarqubecloud
Copy link

@hannahbast hannahbast requested a review from joka921 October 17, 2025 13:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants