Add missing internal triples for UPDATEs #2674

Open
RobinTF wants to merge 9 commits into ad-freiburg:master from RobinTF:add-missing-internal-triples

Conversation

Collaborator

@RobinTF RobinTF commented Jan 28, 2026

This PR adds the missing internal triples of the form "object"@language ql:langtag <@language> to the delta triples on insertion, so that all kinds of language filters now work with updates. One caveat: these triples are never removed again, so memory usage only ever grows. The behaviour nevertheless stays correct, because the new triples are always joined with a regular index scan before being used.
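A rough sketch of the idea described above, not the code from this PR: for every inserted triple whose object is a language-tagged literal, one extra internal triple links that literal to its language tag. The `Triple` struct, the helpers `extractLanguageTag` and `addInternalLangtagTriples`, and the plain-string representation of terms are all hypothetical; the actual implementation presumably works on IDs and delta triples rather than strings.

```cpp
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical string-based triple, used only for illustration.
struct Triple {
  std::string subject;
  std::string predicate;
  std::string object;
};

// Return the language tag of a literal such as "\"Haus\"@de", or
// std::nullopt if the term is not a language-tagged literal.
std::optional<std::string> extractLanguageTag(std::string_view term) {
  if (term.empty() || term.front() != '"') return std::nullopt;
  // The term starts with a quote, so rfind always finds a position.
  auto closingQuote = term.rfind('"');
  if (closingQuote == 0 || closingQuote + 1 >= term.size() ||
      term[closingQuote + 1] != '@') {
    return std::nullopt;
  }
  auto tag = term.substr(closingQuote + 2);
  if (tag.empty()) return std::nullopt;
  return std::string{tag};
}

// Copy the inserted triples and append, for each language-tagged object,
// the internal triple  "object"@lang ql:langtag <@lang>.
std::vector<Triple> addInternalLangtagTriples(
    const std::vector<Triple>& inserted) {
  std::vector<Triple> result = inserted;
  for (const auto& triple : inserted) {
    if (auto tag = extractLanguageTag(triple.object)) {
      result.push_back({triple.object, "ql:langtag", "<@" + *tag + ">"});
    }
  }
  return result;
}
```

In this sketch, inserting a triple with the object "Haus"@de yields one additional triple whose subject is that literal and whose object is <@de>. Nothing here removes such a triple again, which mirrors the caveat stated in the description.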

@RobinTF RobinTF requested review from hannahbast and joka921 January 28, 2026 14:03

codecov bot commented Jan 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.59%. Comparing base (8c2d7c0) to head (6b1d3eb).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2674   +/-   ##
=======================================
  Coverage   91.58%   91.59%           
=======================================
  Files         480      480           
  Lines       41357    41368   +11     
  Branches     5494     5496    +2     
=======================================
+ Hits        37877    37889   +12     
+ Misses       1901     1900    -1     
  Partials     1579     1579           

Member

@joka921 joka921 left a comment

A request for minor improvements.

Member

@joka921 joka921 left a comment

Small suggestions for the caching. Let's see how easy they are to implement, but there is a chance to learn something here :)

Member

@joka921 joka921 left a comment

Definitely improved, I have some minor suggestions, but the next round should be good to go.

ad_utility::util::LRUCache<std::string, Id> languageTagCache_{
languageTagCacheSize_};

// Cache commonly used predicates between calls.
Member

This can be more precise: it caches the IDs of commonly used language-tagged predicates like @en@rdfs:label.

Collaborator Author

That's not what it does though. It just caches the predicates without the language tags, because those might be the ones that are most likely expensive to look up.

CPP_template(typename Key, typename Func)(
requires ad_utility::InvocableWithConvertibleReturnType<
- Func, V, const K&>) const V& getOrCompute(const K& key,
+ Func, V, const K&>) const V& getOrCompute(Key&& key,
Member

Should be Func, V, const Key& in the requires clause for a little more precision.

Collaborator Author

No, it shouldn't, because we end up passing a const ref of the actual stored key to the function. (I ended up with slightly different code than we discussed.)

Member

Agreed, thanks for the explanation.

}
- auto result = cache_.try_emplace(key, computeFunction(key), keys_.begin());
+ auto result = cache_.try_emplace(
+     AD_FWD(key), computeFunction(keys_.front()), keys_.begin());
Member

yes, that does the trick (the double string creation still is not nice, but that was already there before...)
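A minimal sketch of the pattern discussed in the two threads above, under stated assumptions. `SimpleLruCache` is a hypothetical stand-in, not the real `ad_utility::util::LRUCache`, and it uses a plain `std::enable_if_t` constraint where the PR uses the `CPP_template` macro. The key is taken as a forwarding reference, copied once into the recency list, and then forwarded into the map; the compute function is called with the stored key (`keys_.front()`), which is why the constraint mentions `const K&` rather than `const Key&`.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <type_traits>
#include <unordered_map>
#include <utility>

// Hypothetical LRU cache for illustration only.
template <typename K, typename V>
class SimpleLruCache {
  using KeyList = std::list<K>;
  using Entry = std::pair<V, typename KeyList::iterator>;

  std::size_t capacity_;
  KeyList keys_;  // Most recently used key is at the front.
  std::unordered_map<K, Entry> cache_;

 public:
  // A capacity of 0 is clamped to 1 so that eviction is always well defined.
  explicit SimpleLruCache(std::size_t capacity)
      : capacity_{capacity == 0 ? 1 : capacity} {}

  // `Func` must be callable with `const K&` (the stored key) and return
  // something convertible to `V`.
  template <typename Key, typename Func,
            typename = std::enable_if_t<
                std::is_same_v<std::decay_t<Key>, K> &&
                std::is_invocable_r_v<V, Func&, const K&>>>
  const V& getOrCompute(Key&& key, Func computeFunction) {
    if (auto it = cache_.find(key); it != cache_.end()) {
      // Cache hit: mark the key as most recently used.
      keys_.splice(keys_.begin(), keys_, it->second.second);
      return it->second.first;
    }
    keys_.push_front(key);  // First copy of the key (recency list).
    try {
      // The key is forwarded into the map (moved if the caller passed an
      // rvalue); the value is computed from the copy stored in `keys_`.
      auto it = cache_
                    .try_emplace(std::forward<Key>(key),
                                 computeFunction(keys_.front()), keys_.begin())
                    .first;
      if (cache_.size() > capacity_) {
        // Evict the least recently used entry.
        cache_.erase(keys_.back());
        keys_.pop_back();
      }
      return it->second.first;
    } catch (...) {
      keys_.pop_front();  // Keep list and map consistent on failure.
      throw;
    }
  }
};
```

As the review comment notes, the key still exists twice (once in the list, once in the map). With this sketch, passing a temporary `std::string` avoids one extra copy compared to taking `const K&`, but it does not remove the duplication.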

languageTagCacheSize_};

// Cache commonly used predicates between calls.
static constexpr size_t predicateCacheSize_ = 50;
Member

I think both of the cache sizes are a little small (the cache is global for the full index). Maybe use a few hundred or a few thousand?

Collaborator Author

For the language tags, the cache size seems reasonable in my opinion; there aren't that many frequently used languages. For the predicates you might have a point. It's heavily dataset-dependent though. For Wikidata, you'd have to cache at least 587 more frequently used predicates before reaching one that is used fewer than 1M times.
I'll increase it to 100 for now and leave the final decision to @hannahbast.

Member

@joka921 joka921 left a comment

Thank you very much, feel free to forward this to @hannahbast.

@sparql-conformance

Overview

Number of Tests | Passed ✅ | Intended ✅ | Failed ❌ | Not tested
547 | 450 | 73 | 24 | 0

Conformance check passed ✅

No test result changes.

Details: https://qlever.dev/sparql-conformance-ui?cur=6b1d3eb6b28b6b08b0c1e0905ad9e0eaf10c1cba&prev=8c2d7c0ae8710cd555004525bedd27ffac060b1b
