Lazy and prefiltered `OPTIONAL` #2695

RobinTF · 2026-02-05T09:44:44Z

The `5` should be a named constant.
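A minimal sketch of that suggestion, assuming the `5` in question is the buffer-size check `prefetchedValues_.size() > 5`. The constant name and the free function are hypothetical and only illustrate the pattern.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical name: the PR does not say what the constant should be called.
inline constexpr std::size_t kMaxPrefetchedTablesWithoutBlock = 5;

// Stand-in for the check `prefetchedValues_.size() > 5` in the diff.
template <typename T>
bool prefetchBufferTooLarge(const std::vector<T>& prefetched) {
  return prefetched.size() > kMaxPrefetchedTablesWithoutBlock;
}
```

With the threshold named, the comment "If buffer grows too large" and the condition say the same thing, and the value can be tuned in one place.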

RobinTF · 2026-02-05T11:04:06Z

This `if` statement can be turned into an assertion: the meta blocks range is never empty at this point.
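A rough sketch of the idea, assuming the comment refers to the search for the next block after the last prefetched value. `Block` and `takeNextBlock` are hypothetical stand-ins, not types or functions from the code base, and plain `assert` stands in for `AD_CORRECTNESS_CHECK`.

```cpp
#include <cassert>
#include <deque>

// Hypothetical stand-in for the block metadata: only the first and last
// join-column value of a block are modeled.
struct Block {
  int firstId;
  int lastId;
};

// Take the first remaining block. Instead of branching on "is there a
// suitable block?", the invariants are asserted.
inline Block takeNextBlock(std::deque<Block>& remaining, int lastValue) {
  assert(!remaining.empty());
  assert(remaining.front().firstId > lastValue);
  Block block = remaining.front();
  remaining.pop_front();
  return block;
}
```

If the invariant really holds, the loop and the `foundBlock` flag collapse into a single step that takes the first block.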

RobinTF · 2026-02-05T09:45:15Z

This should be extracted into a helper function.
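One possible shape for such a helper, assuming the code in question is the search for the first block whose first entry is larger than a given value (the in-code TODO also mentions duplication there). `BlockMeta` and `findFirstBlockAfter` are hypothetical names for illustration.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for the block metadata.
struct BlockMeta {
  int firstId;
  int lastId;
};

// Return the index of the first block whose first entry is strictly larger
// than `lastValue`, or `std::nullopt` if there is no such block.
inline std::optional<std::size_t> findFirstBlockAfter(
    const std::vector<BlockMeta>& blocks, int lastValue) {
  for (std::size_t i = 0; i < blocks.size(); ++i) {
    if (blocks[i].firstId > lastValue) {
      return i;
    }
  }
  return std::nullopt;
}
```

Both call sites could then share the search and differ only in what they do with the result.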

RobinTF · 2026-02-05T09:59:35Z

What if you don't yield this value up until this point? I think this would simplify the logic in `fetch()` noticeably.

RobinTF · 2026-02-05T11:07:28Z

Consider moving this param up a bit in the struct; then you wouldn't have to re-define all of these default arguments again.
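A small sketch of why the member order matters for aggregate initialization. `StateSketch` is a hypothetical, simplified stand-in for `SharedGeneratorState`; the member names and types are illustrative only.

```cpp
#include <optional>
#include <vector>

// Hypothetical, simplified stand-in for the shared state struct.
struct StateSketch {
  int input;
  int joinColumn;
  // Placed directly after the members that every caller has to pass.
  bool filterLeftSide = true;
  // Members below keep their default member initializers.
  std::optional<int> iterator = std::nullopt;
  std::vector<int> prefetched = {};
  bool hasUndef = false;
  bool doneFetching = false;
};
```

With this order, `StateSketch{input, joinColumn, false}` sets the flag and leaves every later member at its default, so the call site does not have to repeat `std::nullopt`, `{}`, `false`, and so on.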

@@ -16,6 +16,7 @@
     #include "backports/concepts.h"
     #include "engine/LocalVocab.h"
+    #include "engine/Result.h"
     #include "engine/idTable/IdTable.h"
     #include "engine/idTable/IdTableConcepts.h"
     #include "global/Id.h"
@@ -259,6 +260,14 @@ class AddCombinedRowToIdTable {
       LocalVocab& localVocab() { return mergedVocab_; }
+      // Move both the result table and local vocab out as an IdTableVocabPair.
+      // This is a convenience method for the common pattern of moving both out.
+      auto toIdTableVocabPair() && {
+        flush();
+        return Result::IdTableVocabPair{std::move(resultTable_),
+                                        std::move(mergedVocab_)};
+      }
       // Disable copying and moving, it is currently not needed and makes it harder
       // to reason about
       AddCombinedRowToIdTable(const AddCombinedRowToIdTable&) = delete;
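The `&&` after the parameter list of `toIdTableVocabPair()` makes the member callable only on rvalues, which fits a method that moves its members out. A minimal sketch of that pattern with a hypothetical `RowCollector` class (not the real `AddCombinedRowToIdTable`):

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in: buffers rows and owns a "vocab" string.
class RowCollector {
 public:
  void add(int row) { pending_.push_back(row); }

  // Move buffered rows into the result.
  void flush() {
    rows_.insert(rows_.end(), pending_.begin(), pending_.end());
    pending_.clear();
  }

  // Rvalue-qualified: only callable as `std::move(collector).toPair()`,
  // which makes it visible at the call site that the object is consumed.
  std::pair<std::vector<int>, std::string> toPair() && {
    flush();
    return {std::move(rows_), std::move(vocab_)};
  }

 private:
  std::vector<int> pending_;
  std::vector<int> rows_;
  std::string vocab_ = "vocab";
};
```

Calling `collector.toPair()` on an lvalue does not compile; the caller has to write `std::move(collector).toPair()`.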

@@ -586,6 +586,9 @@ struct IndexScan::SharedGeneratorState {
       bool hasUndef_ = false;
       // Indicates if the generator has been fully consumed.
       bool doneFetching_ = false;
+      // If true, filter the left side (skip non-matching inputs). If false, pass
+      // through all inputs even if they don't match any blocks.
+      bool filterLeftSide_ = true;
       // Advance the `iterator` to the next non-empty table. Set `hasUndef_` to true
       // if the first table is undefined. Also set `doneFetching_` if the generator
@@ -643,13 +646,59 @@ struct IndexScan::SharedGeneratorState {
               // We have seen entries in the join column that are larger than the
               // largest block in the index scan, which means that there will be no
               // more matches.
+              if (!filterLeftSide_) {
+                // Case B: Push current table before marking as done.
+                prefetchedValues_.push_back(std::move(*iterator_.value()));
+              }
               doneFetching_ = true;
               return;
             }
-            // The current `joinColumn` has no matching block in the index, we can
-            // safely skip appending it to `prefetchedValues_`, but future values
-            // might require later blocks from the index.
-            continue;
+            // Case A: The current `joinColumn` has no matching block in the index.
+            if (filterLeftSide_) {
+              // We can safely skip appending it to `prefetchedValues_`, but future
+              // values might require later blocks from the index.
+              continue;
+            } else {
+              // When not filtering, push the table to prefetchedValues.
+              prefetchedValues_.push_back(std::move(*iterator_.value()));
+              // If buffer grows too large, find a dummy block to add.
+              if (prefetchedValues_.size() > 5) {
+                // Find the last value in the join column of the last prefetched
+                // table.
+                const auto& lastPrefetched = prefetchedValues_.back();
+                auto lastJoinColumn =
+                    lastPrefetched.idTable_.getColumn(joinColumn_);
+                AD_CORRECTNESS_CHECK(!lastJoinColumn.empty());
+                Id lastValue = lastJoinColumn.back();
+                // Find the smallest block whose first entry is larger than
+                // lastValue.
+                // TODO<joka921> This should always be the first block that is still
+                // available. Also remove code duplication with the above code.
+                bool foundBlock = false;
+                size_t numBlocksHandled = 0;
+                for (const auto& block : metaBlocks_.getBlockMetadataView()) {
+                  ++numBlocksHandled;
+                  if (CompressedRelationReader::getRelevantIdFromTriple(
+                          block.firstTriple_, metaBlocks_) > lastValue) {
+                    // Found a suitable block, add it to pendingBlocks.
+                    pendingBlocks_.push_back(block);
+                    lastEntryInBlocks_ =
+                        CompressedRelationReader::getRelevantIdFromTriple(
+                            block.lastTriple_, metaBlocks_);
+                    AD_CORRECTNESS_CHECK(numBlocksHandled == 1);
+                    metaBlocks_.removePrefix(numBlocksHandled);
+                    foundBlock = true;
+                    break;
+                  }
+                }
+                if (!foundBlock) {
+                  // No more blocks available, mark as done.
+                  doneFetching_ = true;
+                  return;
+                }
+              }
+              continue;
+            }
           }
           prefetchedValues_.push_back(std::move(*iterator_.value()));
           ql::ranges::move(newBlocks, std::back_inserter(pendingBlocks_));
@@ -690,7 +739,19 @@ Result::LazyResult IndexScan::createPrefilteredJoinSide(
             if (prefetched.empty()) {
               AD_CORRECTNESS_CHECK(state->doneFetching_);
-              return LoopControl::makeBreak();
+              // If not filtering left side, yield all remaining elements.
+              AD_CORRECTNESS_CHECK(state->iterator_.has_value());
+              auto it = state->iterator_.value();
+              if (!state->filterLeftSide_ && it != state->generator_.end()) {
+                // Advance the iterator past the last value we already yielded.
+                ++it;
+                return LoopControl::breakWithYieldAll(
+                    ql::ranges::subrange(it, state->generator_.end()) |
+                    ql::views::filter(
+                        [](const auto& block) { return !block.idTable_.empty(); }));
+              } else {
+                return LoopControl::makeBreak();
+              }
             }
             // Make a defensive copy of the values to avoid modification during
@@ Expand Down Expand Up @@
     // _____________________________________________________________________________
     std::pair<Result::LazyResult, Result::LazyResult> IndexScan::prefilterTables(
-        Result::LazyResult input, ColumnIndex joinColumn) {
+        Result::LazyResult input, ColumnIndex joinColumn, bool filterLeftSide) {
       AD_CORRECTNESS_CHECK(numVariables_ <= 3 && numVariables_ > 0);
       auto metaBlocks = getMetadataForScan();
       if (!metaBlocks.has_value()) {
         // Return empty results
-        return {Result::LazyResult{}, Result::LazyResult{}};
+        return {filterLeftSide ? Result::LazyResult{} : std::move(input),
+                Result::LazyResult{}};
       }
-      auto state = std::make_shared<SharedGeneratorState>(SharedGeneratorState{
-          std::move(input), joinColumn, std::move(metaBlocks.value())});
+      auto state = std::make_shared<SharedGeneratorState>(
+          SharedGeneratorState{std::move(input),
+                               joinColumn,
+                               std::move(metaBlocks.value()),
+                               std::nullopt,
+                               {},
+                               {},
+                               std::nullopt,
+                               false,
+                               false,
+                               filterLeftSide});
       return {createPrefilteredJoinSide(state),
               createPrefilteredIndexScanSide(state)};
     }
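The effect of the new `filterLeftSide` flag can be illustrated on plain integers. This is a simplified sketch, not the lazy implementation from the diff: `BlockRange`, `hasMatchingBlock`, and `prefilterLeftSketch` are hypothetical, and each input "table" is reduced to a single join key.

```cpp
#include <utility>
#include <vector>

// Hypothetical stand-in: a block covers the inclusive key range [first, last].
using BlockRange = std::pair<int, int>;

inline bool hasMatchingBlock(int key, const std::vector<BlockRange>& blocks) {
  for (const auto& [first, last] : blocks) {
    if (first <= key && key <= last) {
      return true;
    }
  }
  return false;
}

// With `filterLeftSide == true`, keys without a matching block are dropped
// (sufficient for a join). With `false`, every key is passed through, which
// is what an OPTIONAL needs because unmatched left rows stay in the result.
inline std::vector<int> prefilterLeftSketch(
    const std::vector<int>& leftKeys, const std::vector<BlockRange>& blocks,
    bool filterLeftSide) {
  std::vector<int> kept;
  for (int key : leftKeys) {
    if (!filterLeftSide || hasMatchingBlock(key, blocks)) {
      kept.push_back(key);
    }
  }
  return kept;
}
```

For keys `{1, 5, 9}` and a single block `[4, 6]`, filtering keeps only `5`, while the pass-through mode keeps all three keys.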

@@ -132,7 +132,8 @@ class IndexScan final : public Operation {
       // there are undef values, the second generator represents the full index
       // scan.
       std::pair<Result::LazyResult, Result::LazyResult> prefilterTables(
-          Result::LazyResult input, ColumnIndex joinColumn);
+          Result::LazyResult input, ColumnIndex joinColumn,
+          bool filterLeftSide = true);
      private:
       // Implementation detail that allows to consume a lazy range from two other

@@ -11,6 +11,7 @@
     #include <sstream>
     #include <vector>
+    #include "JoinWithIndexScanHelpers.h"
     #include "backports/functional.h"
     #include "backports/type_traits.h"
     #include "engine/AddCombinedRowToTable.h"
@@ -29,7 +30,7 @@
     #include "util/JoinAlgorithms/JoinAlgorithms.h"
     using namespace qlever::joinHelpers;
+    using namespace qlever::joinWithIndexScanHelpers;
     using std::endl;
     using std::string;
@@ Expand Down Expand Up @@
       }
     }
-    // _____________________________________________________________________________
-    namespace {
-    // Type alias for the general InputRangeTypeErased with specific types.
-    using IteratorWithSingleCol = InputRangeTypeErased<IdTableAndFirstCol<IdTable>>;
-    // Convert a `CompressedRelationReader::IdTableGeneratorInputRange` to a
-    // `InputRangeTypeErased<IdTableAndFirstCol<IdTable>>` for more efficient access
-    // in the join columns below. This also makes sure the runtime information of
-    // the passed `IndexScan` is updated properly as the range is consumed.
-    IteratorWithSingleCol convertGenerator(
-        CompressedRelationReader::IdTableGeneratorInputRange gen, IndexScan& scan) {
-      // Store the generator in a wrapper so we can access its details after moving
-      auto generatorStorage =
-          std::make_shared<CompressedRelationReader::IdTableGeneratorInputRange>(
-              std::move(gen));
-      using SendPriority = RuntimeInformation::SendPriority;
-      auto range = CachingTransformInputRange(
-          *generatorStorage,
-          [generatorStorage, &scan,
-           sendPriority = SendPriority::Always](auto& table) mutable {
-            scan.updateRuntimeInfoForLazyScan(generatorStorage->details(),
-                                              sendPriority);
-            sendPriority = SendPriority::IfDue;
-            // IndexScans don't have a local vocabulary, so we can just use an empty
-            // one.
-            return IdTableAndFirstCol{std::move(table), LocalVocab{}};
-          });
-      return IteratorWithSingleCol{std::move(range)};
-    }
-    }  // namespace
     // ______________________________________________________________________________________________________
     Result Join::computeResultForTwoIndexScans(bool requestLaziness) const {
       return createResult(
@@ Expand All @@
             // of the child. If we serialize it whenever the join operation yields a
             // table that's frequent enough and reduces the overhead.
             auto leftBlocks =
-                convertGenerator(std::move(leftBlocksInternal), *leftScan);
-            auto rightBlocks =
-                convertGenerator(std::move(rightBlocksInternal), *rightScan);
+                convertGeneratorFromScan(std::move(leftBlocksInternal), *leftScan);
+            auto rightBlocks = convertGeneratorFromScan(
+                std::move(rightBlocksInternal), *rightScan);
             ad_utility::zipperJoinForBlocksWithoutUndef(leftBlocks, rightBlocks,
                                                         std::less{}, rowAdder);
-            leftScan->runtimeInfo().status_ =
-                RuntimeInformation::Status::lazilyMaterializedCompleted;
-            rightScan->runtimeInfo().status_ =
-                RuntimeInformation::Status::lazilyMaterializedCompleted;
+            setScanStatusToLazilyCompleted(*leftScan, *rightScan);
             auto localVocab = std::move(rowAdder.localVocab());
             return Result::IdTableVocabPair{std::move(rowAdder).resultTable(),
                                             std::move(localVocab)};
@@ -661,11 +624,12 @@ Result Join::computeResultForIndexScanAndIdTable(
             const IdTable& idTable = resultWithIdTable->idTable();
             auto rowAdder = makeRowAdder(std::move(yieldTable));
-            auto permutationIdTable = ad_utility::IdTableAndFirstCol{
-                idTable.asColumnSubsetView(idTableIsRightInput
-                                               ? joinColMap.permutationRight()
-                                               : joinColMap.permutationLeft()),
-                resultWithIdTable->getCopyOfLocalVocab()};
+            auto permutationIdTable =
+                ad_utility::IdTableAndFirstCols<1, IdTableView<0>>{
+                    idTable.asColumnSubsetView(idTableIsRightInput
+                                                   ? joinColMap.permutationRight()
+                                                   : joinColMap.permutationLeft()),
+                    resultWithIdTable->getCopyOfLocalVocab()};
             ad_utility::Timer timer{
                 ad_utility::timer::Timer::InitialStatus::Started};
@@ -676,7 +640,7 @@ Result Join::computeResultForIndexScanAndIdTable(
             std::optional<std::shared_ptr<const Result>> indexScanResult =
                 std::nullopt;
             auto rightBlocks = [&scan, idTableHasUndef, &permutationIdTable,
-                                &indexScanResult]() -> LazyInputView {
+                                &indexScanResult]() -> LazyInputView<1> {
               if (idTableHasUndef) {
                 indexScanResult =
                     scan->getResult(false, ComputationMode::LAZY_IF_SUPPORTED);
@@ -686,7 +650,8 @@ Result Join::computeResultForIndexScanAndIdTable(
               } else {
                 auto rightBlocksInternal =
                     scan->lazyScanForJoinOfColumnWithScan(permutationIdTable.col());
-                return convertGenerator(std::move(rightBlocksInternal), *scan);
+                return convertGeneratorFromScan(std::move(rightBlocksInternal),
+                                                *scan);
               }
             }();
@@ -704,8 +669,7 @@ Result Join::computeResultForIndexScanAndIdTable(
             } else {
               doJoin(blockForIdTable, rightBlocks);
             }
-            scan->runtimeInfo().status_ =
-                RuntimeInformation::Status::lazilyMaterializedCompleted;
+            setScanStatusToLazilyCompleted(*scan);
             auto localVocab = std::move(rowAdder.localVocab());
             return Result::IdTableVocabPair{std::move(rowAdder).resultTable(),
@@ -741,8 +705,7 @@ Result Join::computeResultForIndexScanAndLazyOperation(
                 convertGenerator(std::move(indexScanSide),
                                  joinColMap.permutationRight()),
                 std::less{}, rowAdder);
-            scan->runtimeInfo().status_ =
-                RuntimeInformation::Status::lazilyMaterializedCompleted;
+            setScanStatusToLazilyCompleted(*scan);
             auto localVocab = std::move(rowAdder.localVocab());
             return Result::IdTableVocabPair{std::move(rowAdder).resultTable(),