-
Notifications
You must be signed in to change notification settings - Fork 97
Give more memory to twin-relation sorter for PSO/POS #2443
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
So far, all twin-relation sorters were 4_GB of memory for merging. This is fine for the SPO/SOP and OSP/OPS permutation pairs, where "relations" (the number of triples with the same S or O) are usually small. However, for PSO/POS, relations can be very large (think `rdf:type`), and 4_GB can be too small for large datasets. For that permutation pair, we now provide half of the memory available for the whole index building (which is configurabel via --stxxl-memory). That is OK because each permutation sorter gets half of that memory and for the last permutation pair, no other pair is sorted at the same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adjusts the memory allocation for the twin-relation sorter to better handle large relations in the PSO/POS permutation by making the memory configurable and allocating more memory when building PSO/POS.
- Introduces a twinRelationSorterMemory parameter through the permutation creation stack.
- Adds a constant default memory (5 GB) for SPO/SOP and OSP/OPS; uses half of the index-building memory limit for PSO/POS.
- Updates tests to pass the new parameter explicitly.
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| test/CompressedRelationsTest.cpp | Updates test call to CompressedRelationWriter::createPermutationPair with explicit memory argument. |
| src/index/IndexImpl.h | Adds twinRelationSorterMemory parameter to relevant methods. |
| src/index/IndexImpl.cpp | Propagates and applies the new memory parameter across permutation creation calls; assigns different defaults depending on permutation stage. |
| src/index/ConstantsIndexBuilding.h | Defines DEFAULT_MEMORY_FOR_TWIN_RELATION_SORTER; adds using directive for memory literals. |
| src/index/CompressedRelation.h | Adds twinRelationSorterMemory parameter to createPermutationPair signature. |
| src/index/CompressedRelation.cpp | Uses the passed twinRelationSorterMemory when constructing the external sorter. |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
src/index/IndexImpl.cpp
Outdated
| std::move(blockGenerator), *thirdSorter); | ||
| createSecondPermutationPair( | ||
| NumColumnsIndexBuilding + 2, std::move(blockGenerator), | ||
| DEFAULT_MEMORY_LIMIT_INDEX_BUILDING, *thirdSorter); |
Copilot
AI
Oct 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DEFAULT_MEMORY_LIMIT_INDEX_BUILDING is not defined; this will not compile. Replace it with the intended constant DEFAULT_MEMORY_FOR_TWIN_RELATION_SORTER (as used elsewhere in this PR) to set the memory for the SPO/SOP or OSP/OPS twin-relation sorter.
| DEFAULT_MEMORY_LIMIT_INDEX_BUILDING, *thirdSorter); | |
| DEFAULT_MEMORY_FOR_TWIN_RELATION_SORTER, *thirdSorter); |
| ad_utility::InputRangeTypeErased<IdTableStatic<0>> sortedTriples, | ||
| qlever::KeyOrder permutation, | ||
| const std::vector<std::function<void(const IdTableStatic<0>&)>>& | ||
| perBlockCallbacks); | ||
| perBlockCallbacks, | ||
| ad_utility::MemorySize twinRelationSorterMemory); |
Copilot
AI
Oct 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new twinRelationSorterMemory parameter should be documented in the function comment to explain its purpose (memory cap for merging relations), typical values, and how it affects SPO/SOP vs. OSP/OPS vs. PSO/POS.
Overview
Conformance check passed ✅No test result changes. |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #2443 +/- ##
=======================================
Coverage 91.44% 91.45%
=======================================
Files 462 462
Lines 46893 46902 +9
Branches 5240 5240
=======================================
+ Hits 42881 42893 +12
+ Misses 2513 2511 -2
+ Partials 1499 1498 -1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|



So far, all twin-relation sorters used
4 GBof memory for merging. This is fine for the SPO/SOP and OSP/OPS permutation pairs, where "relations" (the number of triples with the same S or O) are usually small. However, for PSO/POS, relations can be very large (thinkrdf:type), and4 GBcan be too small for large datasets.For the twin-relation sorter of the PSO/POS permutation pair, we now provide half of the memory available for the whole index building (which is configurable via
--stxxl-memory). We can do that because each permutation sorter gets half of that memory and for the last permutation pair, no other pair is sorted at the same time, so half of that memory was available anyway.