Apply PrefilterExpressionIndex in IndexScan #1619

Open
wants to merge 91 commits into
base: master
Conversation

realHannes
Collaborator

No description provided.

realHannes and others added 30 commits April 25, 2024 19:07
realHannes and others added 27 commits October 12, 2024 14:01
This allows the `CartesianProductJoin` operation to produce lazy results as well as to consume a single lazy result itself.
Currently only the last child of the Cartesian product can be consumed lazily, but this can in principle be changed in the future. In particular, the most memory-saving way would be to lazily consume the largest child that supports lazy evaluation.
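
The lazy consumption described above can be illustrated with a small sketch. This is not the actual `CartesianProductJoin` code: the names `Chunk`, `LazyInput`, `Row`, and `lazyCartesianProduct` are hypothetical, rows are reduced to single integers, and the output row order is an assumption. One child is fully materialized, the last child is pulled chunk by chunk, and each pulled chunk yields one output chunk.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical simplification: a chunk is a vector of single-column rows.
using Chunk = std::vector<int>;
// A lazy input returns the next chunk, or std::nullopt once it is exhausted.
using LazyInput = std::function<std::optional<Chunk>()>;
using Row = std::pair<int, int>;
using LazyOutput = std::function<std::optional<std::vector<Row>>()>;

// Combine a fully materialized child with a lazily consumed last child.
// Each call of the returned function pulls exactly one chunk from the lazy
// child, so only one chunk of that child is held in memory at a time.
LazyOutput lazyCartesianProduct(std::vector<int> materialized,
                                LazyInput lazyChild) {
  return [materialized = std::move(materialized),
          lazyChild = std::move(lazyChild)]()
             -> std::optional<std::vector<Row>> {
    std::optional<Chunk> chunk = lazyChild();
    if (!chunk.has_value()) {
      return std::nullopt;  // The lazy child is exhausted.
    }
    std::vector<Row> out;
    out.reserve(materialized.size() * chunk->size());
    for (int right : *chunk) {
      for (int left : materialized) {
        out.emplace_back(left, right);
      }
    }
    return out;
  };
}
```

Peak memory in this sketch grows with the materialized child times one chunk of the lazy child, which is why consuming the largest child lazily would save the most memory.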
The `SparqlExpression` base class has been extended with the method `getPrefilterExpressionForMetadata`. For suitable (logical) expressions used inside a `FILTER`, this method constructs a corresponding `PrefilterExpression` (see PR ad-freiburg#1503). These `PrefilterExpression`s can be used to prefilter the blocks of an `IndexScan` by looking only at their metadata.
At the moment, the following expressions provide an overridden implementation of `getPrefilterExpressionForMetadata`: `strstarts` (preliminary), `logical-or` and `logical-and` (binary), `logical-not` (unary), and the standard `RelationalExpression`s (`<`, `==`, `>`, `<=`, `>=`).
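
To illustrate the idea of prefiltering blocks by metadata alone, here is a minimal sketch. It is not the `PrefilterExpression` implementation: `BlockMetadata`, `CompOp`, `mayContainMatch`, and `prefilterBlocks` are hypothetical names, values are plain integers, and only the relational operators are covered. The sketch assumes each block records its smallest and largest value.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified block metadata: the smallest and largest value
// stored in the block.
struct BlockMetadata {
  std::int64_t firstValue;
  std::int64_t lastValue;
};

enum class CompOp { LT, LE, EQ, GE, GT };

// Returns true if the block may contain a value `v` with `v op reference`.
// Only the metadata is inspected, the block content is never read.
bool mayContainMatch(const BlockMetadata& block, CompOp op,
                     std::int64_t reference) {
  switch (op) {
    case CompOp::LT:
      return block.firstValue < reference;
    case CompOp::LE:
      return block.firstValue <= reference;
    case CompOp::EQ:
      return block.firstValue <= reference && reference <= block.lastValue;
    case CompOp::GE:
      return block.lastValue >= reference;
    case CompOp::GT:
      return block.lastValue > reference;
  }
  return true;  // Unknown operator: keep the block to stay conservative.
}

// Keep only the blocks that may contain matching values.
std::vector<BlockMetadata> prefilterBlocks(
    const std::vector<BlockMetadata>& blocks, CompOp op,
    std::int64_t reference) {
  std::vector<BlockMetadata> relevant;
  for (const BlockMetadata& block : blocks) {
    if (mayContainMatch(block, op, reference)) {
      relevant.push_back(block);
    }
  }
  return relevant;
}
```

A block that survives the prefilter may still contain non-matching values, so in this sketch the regular `FILTER` would still have to run on the remaining blocks.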
Add a new aggregate function `STDEV(X)` that computes the (sample) standard deviation, so that users no longer have to repeatedly type `math:sqrt(sum(math:pow((X - avg(X)), 2)) / (count(*) - 1))`. This is not part of the SPARQL standard, but it does not cause any conflicts.
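
For reference, the formula quoted above can be written as a small standalone function. This is a sketch of the formula only, not of the aggregate's implementation in the engine. The name `sampleStdev` is hypothetical, and returning `0.0` for fewer than two values is an assumption made here to avoid dividing by zero.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Sample standard deviation: sqrt(sum((x - avg)^2) / (n - 1)).
// Assumption: inputs with fewer than two values yield 0.0.
double sampleStdev(const std::vector<double>& values) {
  const std::size_t n = values.size();
  if (n < 2) {
    return 0.0;
  }
  double sum = 0.0;
  for (double v : values) {
    sum += v;
  }
  const double mean = sum / static_cast<double>(n);
  double squaredDeviations = 0.0;
  for (double v : values) {
    squaredDeviations += (v - mean) * (v - mean);
  }
  return std::sqrt(squaredDeviations / static_cast<double>(n - 1));
}
```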
PRs ad-freiburg#1582 and ad-freiburg#1603 gave all index-lookup methods access to a snapshot of the (located) delta triples. With this change, these triples are now merged with the original triples during query processing whenever necessary. When an index block does not contain any located triples, the performance for accessing that block is the same as before.

The methods for obtaining the result size of an index scan now have two versions: one for an approximate size (cheap, because it can be computed from the metadata of the blocks and the located triples) and one for the exact size (expensive if there are located triples, because it requires reading and decompressing a block and merging the located triples).
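
The trade-off between the two size methods can be sketched as follows. This is not the engine's code: `LocatedTriple`, `Block`, `approximateSize`, and `exactSize` are hypothetical names, triples are reduced to integer ids, and the way the estimate treats insertions and deletions is an assumption made for illustration.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical delta triple that was located in a block: either an
// insertion or a deletion of the triple with the given id.
struct LocatedTriple {
  int id;
  bool isInsertion;
};

// Hypothetical block. `storedTriples` stands in for the compressed block
// content and is assumed to contain distinct ids. A real index would keep
// the stored count in the block metadata.
struct Block {
  std::vector<int> storedTriples;
  std::vector<LocatedTriple> locatedTriples;
};

// Cheap estimate from counts only. Assumption: every insertion is counted
// as a new triple and deletions are ignored, so this is an upper bound.
std::size_t approximateSize(const std::vector<Block>& blocks) {
  std::size_t size = 0;
  for (const Block& block : blocks) {
    size += block.storedTriples.size();
    for (const LocatedTriple& located : block.locatedTriples) {
      if (located.isInsertion) {
        ++size;
      }
    }
  }
  return size;
}

// Exact size. Blocks without located triples take the fast path. All other
// blocks are merged with their located triples, which stands in for reading
// and decompressing the block.
std::size_t exactSize(const std::vector<Block>& blocks) {
  std::size_t size = 0;
  for (const Block& block : blocks) {
    if (block.locatedTriples.empty()) {
      size += block.storedTriples.size();
      continue;
    }
    std::set<int> merged(block.storedTriples.begin(),
                         block.storedTriples.end());
    for (const LocatedTriple& located : block.locatedTriples) {
      if (located.isInsertion) {
        merged.insert(located.id);  // No effect if already present.
      } else {
        merged.erase(located.id);  // No effect if not present.
      }
    }
    size += merged.size();
  }
  return size;
}
```

In this sketch the two results agree whenever no block has located triples, which mirrors the statement that untouched blocks behave as before.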
codecov bot commented Nov 15, 2024

Codecov Report

Attention: Patch coverage is 86.90476% with 11 lines in your changes missing coverage. Please review.

Project coverage is 89.25%. Comparing base (77ac964) to head (96ded86).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/engine/IndexScan.cpp | 81.81% | 10 Missing ⚠️ |
| src/engine/Filter.cpp | 91.66% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1619   +/-   ##
=======================================
  Coverage   89.24%   89.25%           
=======================================
  Files         374      374           
  Lines       35315    35370   +55     
  Branches     3988     3994    +6     
=======================================
+ Hits        31518    31569   +51     
- Misses       2503     2510    +7     
+ Partials     1294     1291    -3     

sonarcloud bot commented Nov 15, 2024

4 participants