Search MVP default search method with IDN data #153
Comments
Testing with just the inner SELECT query now, the main issue seems to be that this query searches across all triples. Also, this weighted regex is significantly faster (0.035s vs 29.189s in Fuseki) if we implement something similar to the "skosWeighted" search method - https://github.com/RDFLib/prez/blob/main/prez/reference_data/search_methods/search_skos_weighted.ttl . See below:
SELECT ?search_result_uri ?predicate ?match (SUM(?w) AS ?weight) ?hashID
WHERE {
?search_result_uri ?predicate ?match .
?search_result_uri <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept> .
?search_result_uri <http://www.w3.org/2004/02/skos/core#inScheme> <https://linked.data.gov.au/def/data-access-rights> .
BIND(URI(CONCAT("urn:hash:", SHA256(CONCAT(STR(?search_result_uri), STR(?predicate), STR(?match))))) AS ?hashID)
{
?search_result_uri ?predicate ?match .
BIND (50 AS ?w)
FILTER (REGEX(?match, "^open$", "i"))
} UNION {
?search_result_uri ?predicate ?match .
BIND (20 AS ?w)
FILTER (REGEX(?match, "^open", "i"))
} UNION {
?search_result_uri ?predicate ?match .
BIND (10 AS ?w)
FILTER (REGEX(?match, "open", "i"))
}
}
GROUP BY ?search_result_uri ?predicate ?match ?hashID
ORDER BY DESC(?weight)
LIMIT 10
Since we'll probably only be searching across labels & descriptions, and returning objects that have endpoints in Prez, we could restrict the predicates that are matched and the base classes of the results to further optimise the query.
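A minimal sketch of that restriction, keeping the same 50/20/10 weighting. The predicate list and the class list are assumptions chosen for illustration, not the set Prez actually uses, and the `?hashID` binding is left out for brevity:

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?search_result_uri ?predicate ?match (SUM(?w) AS ?weight)
WHERE {
  # Assumed predicate list: only label and description predicates are matched.
  VALUES ?predicate { skos:prefLabel rdfs:label skos:definition }
  {
    ?search_result_uri ?predicate ?match .
    BIND (50 AS ?w)
    FILTER (REGEX(?match, "^open$", "i"))
  } UNION {
    ?search_result_uri ?predicate ?match .
    BIND (20 AS ?w)
    FILTER (REGEX(?match, "^open", "i"))
  } UNION {
    ?search_result_uri ?predicate ?match .
    BIND (10 AS ?w)
    FILTER (REGEX(?match, "open", "i"))
  }
  # Assumed class list: keep only results typed with a supported base class.
  # FILTER EXISTS avoids counting a result once per matching class.
  FILTER EXISTS {
    VALUES ?class { skos:Concept skos:ConceptScheme }
    ?search_result_uri rdf:type ?class .
  }
}
GROUP BY ?search_result_uri ?predicate ?match
ORDER BY DESC(?weight)
LIMIT 10
```

The triple pattern is repeated inside each UNION branch because a FILTER only sees variables bound within its own group.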
Looks like it's the query structure. Let's see if we can add the CONSTRUCT back into your performant REGEX query above. For context as well, FTS query below.
How does this look?
Looks good, nice and fast at about 0.035s. Not aggregating just means you'll get duplicate results when a result satisfies multiple matches. What do you think of restricting the matched predicates to labels & descriptions? Description matches could be worth less too. Also, what do you think of restricting the base class to classes Prez supports?
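One way to make description matches worth less is to pair each predicate with a multiplier and scale the regex weight by it. This is only a sketch: the `?factor` variable and its values (1.0 for labels, 0.5 for definitions) are hypothetical, as is the predicate list.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?search_result_uri ?predicate ?match (SUM(?w * ?factor) AS ?weight)
WHERE {
  # Hypothetical per-predicate multipliers: descriptions count for half.
  VALUES (?predicate ?factor) {
    (skos:prefLabel  1.0)
    (rdfs:label      1.0)
    (skos:definition 0.5)
  }
  {
    ?search_result_uri ?predicate ?match .
    BIND (50 AS ?w)
    FILTER (REGEX(?match, "^open$", "i"))
  } UNION {
    ?search_result_uri ?predicate ?match .
    BIND (20 AS ?w)
    FILTER (REGEX(?match, "^open", "i"))
  } UNION {
    ?search_result_uri ?predicate ?match .
    BIND (10 AS ?w)
    FILTER (REGEX(?match, "open", "i"))
  }
}
GROUP BY ?search_result_uri ?predicate ?match
ORDER BY DESC(?weight)
LIMIT 10
```

Because the query aggregates with SUM and GROUP BY, a value that satisfies several branches appears once with a combined weight instead of as duplicate rows.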
This would be a closed profile with no properties defined. You'll then get labels/descriptions when the annotations are added. Profile changes coming soon.
Sounds good - any issue adding LCASE back in too for "exact" match?
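For reference, an LCASE-based exact match could look like the sketch below. The use of `skos:prefLabel` is an assumption for illustration. `STR()` strips any language tag first, since `LCASE("Open"@en)` yields `"open"@en`, which does not equal the plain literal `"open"`.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?search_result_uri ?match
WHERE {
  # Assumed predicate; swap in whichever label predicates are searched.
  ?search_result_uri skos:prefLabel ?match .
  # Case-insensitive exact match without a regex.
  FILTER (LCASE(STR(?match)) = "open")
}
LIMIT 10
```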
Ideally, I think Prez could display whatever information is available about any object that is found, perhaps on a generic page if there isn't a suitable endpoint.
David to:
Resolved in #149
Testing the "default" regex search method takes over 30s against the IDN triplestore for the following query:
http://localhost:8000/search?term=open&method=default&limit=10&focus-to-filter[rdf:type]=http%3A%2F%2Fwww.w3.org%2F2004%2F02%2Fskos%2Fcore%23Concept&focus-to-filter[skos:inScheme]=https%3A%2F%2Flinked.data.gov.au%2Fdef%2Fdata-access-rights