Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ql:contains-word now can show the respective word score #1397

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

Flixtastic
Copy link

modified: src/engine/TextIndexScanForWord.cpp
Updated computeVariableToColumnMap to get a variable for word score
Updated getResultWidth which now is one column wider

modified: src/index/FTSAlgorithms.cpp
Updated now filters a table with 3 instead of 2 columns
TODO generalize to a table with arbitrary amount of columns (also use views::zip once c++23 is in use)

modified: src/index/IndexImpl.Text.cpp
Updated readWordCl to read the frequency compressed list for scores and add an extra column for them before returning the table

modified: src/parser/sparqlParser/SparqlQleverVisitor.cpp
Updated setMatchingWordAndScoreVisibleIfPresent to add the score variables for words when implicitly asked for with *

@Flixtastic
Copy link
Author

modified: src/index/IndexImpl.Text.cpp
Formatted with clang

modified: test/QueryPlannerTestHelpers.h
Updated expected result width in TextIndexScanForWord

modified: test/engine/TextIndexScanForWordTest.cpp
Added include to print variables
Updated result widths expected in tests
Updated variables expected in ColumnMap

Problems:
When using clang-16 with the .clang-format file provided in the qlever folder on the .cpp files, the respective .h includes get moved down to all other includes instead of being on top.

Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This already looks very nice, I only have a few comments, but the most things we have already discussed today.

smallIdTable.resize(idTable.numRows());
std::ranges::copy(idTable.getColumn(0), smallIdTable.getColumn(0).begin());
std::ranges::copy(idTable.getColumn(2), smallIdTable.getColumn(1).begin());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need to copy all the columns...
The IdTable class has a function setColumnSubset which can be used to do this more efficiently(stripping columns works well with column based layouts).

@@ -1279,6 +1279,7 @@ void Visitor::setMatchingWordAndScoreVisibleIfPresent(
}
for (std::string_view s : std::vector<std::string>(
absl::StrSplit(name.substr(1, name.size() - 2), ' '))) {
addVisibleVariable(var->getScoreVariable(std::string(s)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget to drop the std::string(...) again once you have a dedicated function for the getWordScoreVariable.
And rename the getScoreVariable to getEntityScoreVariable or something.


result = s2.computeResultOnlyForTesting();
ASSERT_EQ(result.idTable().numColumns(), 1);
ASSERT_EQ(result.idTable().numColumns(), 2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, let the getQec()function take the contents of a ContextFile. Then we can finally properly test the test index.

Copy link

sonarcloud bot commented Jul 12, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants