Completely refactor the fulltext operations #1093

NickG-1 · 2023-09-14T12:19:47Z

As of this commit, the fulltext index (triggered by ql:contains-word and ql:contains-entity) uses two basic operations:

TextIndexScanForWord: For a given word or prefix, return all text records that contain the word, possibly together with the score of the match and, in the case of a prefix, the matched word.
TextIndexScanForEntity: For a given word or prefix, return a superset of all pairs of (text, entity) where the entity is contained in the text according to ql:contains-entity and the text contains the word. For technical reasons this is a superset: we always have to scan the complete block from the half-inverted index, which might belong to a shorter prefix.
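
To make the contract of the word scan concrete, here is a minimal sketch over a toy in-memory inverted index. All names (`ToyInvertedIndex`, `Posting`, `WordScanRow`, `scanForWord`) are hypothetical and chosen for illustration only; they are not QLever's actual classes or signatures, and the real index is block-based and stored on disk.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical toy types for illustration; NOT QLever's actual classes.
using TextRecordId = std::uint64_t;

// One occurrence of a word in a text record, with a score.
struct Posting {
  TextRecordId record;
  std::uint32_t score;
};

// One output row of the toy scan: text record, matched word, score.
struct WordScanRow {
  TextRecordId record;
  std::string matchedWord;
  std::uint32_t score;
};

// Toy inverted index: word -> postings. std::map keeps the words sorted.
using ToyInvertedIndex = std::map<std::string, std::vector<Posting>>;

// `query` is either a full word ("astronaut") or a prefix that ends
// with '*' ("astro*"). This convention is an assumption of the sketch.
std::vector<WordScanRow> scanForWord(const ToyInvertedIndex& index,
                                     std::string_view query) {
  assert(!query.empty());
  const bool isPrefix = query.back() == '*';
  if (isPrefix) query.remove_suffix(1);

  std::vector<WordScanRow> result;
  // The words are sorted, so all words that share the prefix form a
  // contiguous range that starts at lower_bound(prefix).
  for (auto it = index.lower_bound(std::string{query}); it != index.end();
       ++it) {
    const std::string& word = it->first;
    const bool matches = isPrefix
                             ? word.compare(0, query.size(), query) == 0
                             : word == query;
    if (!matches) break;
    for (const Posting& p : it->second) {
      result.push_back({p.record, word, p.score});
    }
  }
  return result;
}
```

For a prefix query the `matchedWord` column tells the caller which concrete word produced each row; for an exact word it simply repeats the query.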

The general processing is then as follows:

For each word or prefix that appears as part of the object of a ql:contains-word triple, a TextIndexScanForWord is created.
For each entity or variable that appears as the object of a ql:contains-entity triple, a TextIndexScanForEntity is created.
The rest of the query processing is handled by the "ordinary" query planner using the normal operations like JOIN that are also used to process standard SPARQL queries.
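
The division of labour described above can be sketched as follows: the two scans only produce tables, and a generic join on the text record column combines them. The hash join below is a stand-in written for this illustration, not QLever's JOIN operation, and all type and function names (`WordRow`, `EntityRow`, `JoinedRow`, `joinOnTextRecord`) are hypothetical.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical toy types for illustration; NOT QLever's actual classes.
using TextRecordId = std::uint64_t;

// Row of a (toy) word scan result.
struct WordRow {
  TextRecordId record;
  std::string word;
};

// Row of a (toy) entity scan result. It may contain records that do not
// contain the word at all, because the entity scan returns a superset.
struct EntityRow {
  TextRecordId record;
  std::string entity;
};

struct JoinedRow {
  TextRecordId record;
  std::string word;
  std::string entity;
};

// Generic equi-join on the text record column. Rows of the entity scan
// whose record has no partner in the word scan are dropped here, which
// is what turns the superset into the exact result.
std::vector<JoinedRow> joinOnTextRecord(
    const std::vector<WordRow>& words,
    const std::vector<EntityRow>& entities) {
  std::unordered_multimap<TextRecordId, const WordRow*> byRecord;
  for (const WordRow& w : words) byRecord.emplace(w.record, &w);

  std::vector<JoinedRow> result;
  for (const EntityRow& e : entities) {
    auto range = byRecord.equal_range(e.record);
    for (auto it = range.first; it != range.second; ++it) {
      result.push_back({e.record, it->second->word, e.entity});
    }
  }
  return result;
}
```

Because the join knows nothing about text search, the same code path serves ordinary SPARQL joins, which is the point of separating scans from joins.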

This is much cleaner than the old TextOperationWith[out]Filter operations, which combined the functionality of the above scan operations with JOIN operations. The old approach led to a lot of code duplication (the code for a join of two tables was duplicated for the fulltext module). The new approach also makes queries easier to optimize and to reason about, because the runtime information trees become much clearer when the scans and joins are represented separately.

codecov · 2024-01-03T00:26:08Z

Codecov Report

Attention: 69 lines in your changes are missing coverage. Please review.

Comparison is base (f7c2c32) 84.34% compared to head (c7e5855) 85.25%.

Files	Patch %	Lines
src/engine/QueryPlanner.cpp	78.31%	31 Missing and 5 partials ⚠️
src/index/IndexImpl.Text.cpp	81.00%	15 Missing and 4 partials ⚠️
src/engine/QueryPlanner.h	64.70%	4 Missing and 2 partials ⚠️
src/parser/sparqlParser/SparqlQleverVisitor.cpp	89.65%	0 Missing and 3 partials ⚠️
src/engine/TextIndexScanForEntity.h	95.45%	0 Missing and 2 partials ⚠️
src/index/Vocabulary.cpp	71.42%	0 Missing and 2 partials ⚠️
src/index/FTSAlgorithms.cpp	96.00%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1093      +/-   ##
==========================================
+ Coverage   84.34%   85.25%   +0.90%     
==========================================
  Files         304      308       +4     
  Lines       29100    29385     +285     
  Branches     3446     3464      +18     
==========================================
+ Hits        24544    25051     +507     
+ Misses       3153     2938     -215     
+ Partials     1403     1396       -7


joka921

A second round of reviews, this is already much much much cleaner.

src/engine/QueryPlanner.cpp

src/engine/QueryPlanner.h

test/QueryPlannerTest.cpp

test/engine/TextIndexScanForWordTest.cpp

test/engine/TextIndexScanForEntityTest.cpp

test/engine/TextIndexScanTestHelpers.h

joka921

Mostly very minor stuff.

src/engine/QueryPlanner.h

src/engine/TextIndexScanForEntity.cpp

src/engine/TextIndexScanForEntity.h

src/engine/TextIndexScanForWord.h

src/parser/data/Variable.h

src/parser/data/VariableToColumnMapPrinters.cpp

test/QueryPlannerTestHelpers.h

sonarqubecloud · 2024-01-18T12:12:30Z

Quality Gate passed

The SonarCloud Quality Gate passed, but some issues were introduced.

13 New issues
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

joka921

Thank you very much,
This makes the text index code much much cleaner.

Since #1093 we use a much simpler approach for answering full-text queries that contain `ql:contains-word` and `ql:contains-entity`. That PR made a lot of old code for the text index obsolete. This code is now deleted.

NickG-1 and others added 30 commits February 24, 2023 19:09

quick fix so indexToOptionalString works

a91a811

Increased readability in IndexImpl.Text.cpp plus WordId shows up in r…

36bc2fd

…esultTable

small fix

a8483fc

functional added ?completedWord to output

aef409d

Revert "quick fix so indexToOptionalString works"

b7c468e

This reverts commit a91a811.

changed gitignore

8d8d54b

Merge branch 'uptodate'

ddc76ef

reintegrated vocab quick fix

88f2783

merge fixes

0e2db52

fixed output of completedWord

ae113f4

formatting

9afc2e5

Merge branch 'ad-freiburg:master' into master

fe4ae9a

Merge branch 'uptodate'

4f2e57c

Merge branch 'master' of https://github.com/NickG-1/qlever

9ccb914

Merge branch 'ad-freiburg:master' into master

371eb30

PR review changes

f523dc1

sonar and formatter

012786d

renaming and bug fix in aggScoresAndTakeTop...

6dd7bcf

sonar stayle changes

3d7b9c5

small fix

486fef5

sonar

f00eeef

fixed tests

bf40643

formatting

dd8e9e0

Merge branch 'uptodate'

89602c2

clean up

b8f1823

added test-cases

e2c9ec5

formatting

ea86a37

Merge branch 'uptodate'

4a1e3af

adapt to merge

1c2c8b4

Merge branch 'uptodate'

32bc101

NickG-1 added 6 commits December 25, 2023 13:46

added score columns to output and made their names unambiguous

675bad9

review changes

9ac14b4

Merge branch 'uptodate' into wordIndexScan

bb1b101

review changes

06fe14d

review changes

4fbbbe6

bug fix and added tests

ffd79f3

sonar and codecov changes

5ff7aff

joka921 reviewed Jan 10, 2024

NickG-1 added 8 commits January 11, 2024 17:55

review changes

a3998b7

bug fix

6a91b48

format

db86119

codecov and sonar

e098b75

bug fix

835956d

formatting

8a0913a

Revert "formatting"

f870f3b

This reverts commit 8a0913a.

formatting

4c835a7

joka921 requested changes Jan 16, 2024

NickG-1 and others added 5 commits January 17, 2024 23:30

review changes

a21da1f

Fix newline character

1e03a7e

Merge branch 'ad-freiburg:master' into wordIndexScan

8456b3b

Merge branch 'wordIndexScan' of https://github.com/NickG-1/qlever int…

00ae623

…o wordIndexScan

sonar

c7e5855

joka921 approved these changes Jan 18, 2024

joka921 changed the title ~~Added WordIndexScan~~ Completely refactor the fulltext operations Jan 18, 2024

joka921 merged commit 8f9b13a into ad-freiburg:master Jan 18, 2024

NickG-1 deleted the wordIndexScan branch January 18, 2024 15:09

joka921 mentioned this pull request Jan 19, 2024

Delete old and by now unused code for answering full-text queries #1231

Merged
