Feature/relevant suggestions and more #1602

Merged
kwahlin merged 39 commits into develop from feature/relevant-suggestions-and-more
Jun 24, 2025

@kwahlin (Contributor) commented Jun 18, 2025

Depends on libris/lxlviewer#1328.

Mainly solves https://kbse.atlassian.net/browse/LWS-272. The API should now continually respond with relevant suggestions depending on which part of the query is being edited, provided that the current query string _q is accompanied by the supplementary API parameters _suggest=true and cursor=i, where i is the current cursor position.
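
As a rough illustration of the request shape described above, here is a minimal sketch of how a client could build a suggest URL. Only the parameter names _q, _suggest and cursor come from this PR. The host, the `SuggestRequest` class and the `suggestUrl` helper are hypothetical, and the sketch assumes the cursor is counted in characters of the unencoded query string.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class SuggestRequest {
    // Hypothetical base URL, used for illustration only.
    static final String BASE = "https://example.org/find";

    // Builds a suggest request for query string q with the cursor at the given position.
    // Assumption: cursor is an offset into the raw (unencoded) query string.
    static String suggestUrl(String q, int cursor) {
        if (cursor < 0 || cursor > q.length()) {
            throw new IllegalArgumentException("cursor outside query string");
        }
        return BASE
                + "?_q=" + URLEncoder.encode(q, StandardCharsets.UTF_8)
                + "&_suggest=true"
                + "&cursor=" + cursor;
    }
}
```

A client would call `SuggestRequest.suggestUrl(q, cursor)` again whenever the input or the cursor position changes.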

Basically much of the "supersearch" machinery has been moved from frontend to backend in order to get rid of fragile temporary solutions and address known flaws.

Summary of changes/improvements:

  • Always do a prefix search when editing free text, including free text within a property. This is done by appending an asterisk to the word currently being edited, so that we get suggestions for incomplete queries while typing (e.g. lotta på bråkm will give us lotta på bråkmakargatan).
  • In terms of relevancy, exact matches are favored over "soft" matches. When typing e.g. johan we expect fields containing the string johan to score higher than fields containing johanna, whereas when typing only joh we expect fields containing johan, johanna, johannes etc. to score similarly. This is achieved by running an extra query with the exact string alongside the prefix query.
  • The prefix query is built more or less the same way as a regular query, but the extra search that gives a small boost to exactly matching phrases must be done a little differently. For a regular query like lotta på bråkmakargatan we simply search for the phrase "lotta på bråkmakargatan" too. For a prefix query like lotta på bråkm* we need to exclude the prefixed word, since prefix queries don't work well (at least not performance-wise) with the scoring algorithm (BM25). In this example we'd therefore do the extra phrase search only for "lotta på", but with proper scoring.
  • Otherwise the Elastic query is composed more or less the same way as a regular query, see
    public Map<String, Object> toEs(EsMappings esMappings, EsBoost.Config boostConfig) {
    for details.
  • Free text searching within specific properties/fields now works too, quoted (e.g. contributor:"astrid lindgren") as well as unquoted (e.g. contribution:(astrid lindgren)). I had to change the structure quite a bit to achieve this, see mainly 88729ea. A FreeText object now holds a list of tokens instead of a plain string, and each Token in turn holds valuable information that was previously lost in the parsing step. Throughout the whole query machinery we now keep track of whether a token was quoted in the original query string and thus should be treated as a phrase. We also keep track of each token's position (index) in the original query string, which we match against the given cursor position to determine whether the token is currently being edited.
    @jannistsiroyannis I feel that I may have "cheated" with this part so I'd like your input. See changes in search2.parse. I just took what I needed for now 😄
  • Just like before, suggestions will differ depending on what part of a query is being edited. However, the suggestions should now be more accurate thanks to stricter vocabulary checks and fewer hard-coded rules.
    • For simple free text we'll search for Agent, Concept, Language or Work, just like before. These types are still hard-coded. Predicates that can be associated with a suggestion are also still hard-coded. Showing all possibilities is probably not a good idea since there are obscure properties that we don't want to expose to the user. The hard-coded mappings should at least be more comprehensive and accurate now. (We won't end up suggesting e.g. a Library to be added as subject.)
    • When searching within a property, we'll use the same base types (Agent, Concept, Language or Work) and match those against the property's range to know what types of resources to suggest. We'll query for all types that the property may point to and that also belong to any of the base classes. If the range does not match any of the base types, we'll just search the default type (:Work). So searching within :contributor will give suggestions of type :BibliographicAgent (pending Feature/adjustments for libris search definitions#532) while searching within :title will give suggestions of type :Work.
  • A suggest query result item that is not a direct hit (i.e. has a type other than :Work) is complemented with a list of predicates that may be used for adding it as a query criterion (e.g. add a :Person as contributor). Each predicate in the list comes with a full find URL with that criterion added to _q, along with the new expected cursor position after navigating to the URL (we probably need to tweak what the expected position should be). Using a ready-made URL solves the current problem of the raw resource URI being visible for a short moment instead of the "pretty" chip when adding a suggested search criterion.
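
To make the token and cursor handling in the list above more concrete, here is a simplified, self-contained sketch. It is not the code in this PR: `SuggestSketch`, its nested `Token` class and all method names are hypothetical, quoting and property prefixes are left out, and it assumes a token counts as "being edited" when the cursor is inside it or directly after its last character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class SuggestSketch {
    // Hypothetical stand-in for a free-text token that remembers its offset
    // in the original query string.
    static final class Token {
        final String value;
        final int offset;

        Token(String value, int offset) {
            this.value = value;
            this.offset = offset;
        }

        // Assumption: edited means the cursor is inside the token or right after it.
        boolean isEditedAt(int cursor) {
            return cursor > offset && cursor <= offset + value.length();
        }
    }

    // Splits on whitespace only, keeping each token's start offset.
    static List<Token> tokenize(String q) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < q.length()) {
            if (Character.isWhitespace(q.charAt(i))) {
                i++;
                continue;
            }
            int start = i;
            while (i < q.length() && !Character.isWhitespace(q.charAt(i))) {
                i++;
            }
            tokens.add(new Token(q.substring(start, i), start));
        }
        return tokens;
    }

    // Prefix query: an asterisk is appended to the token being edited.
    static String prefixQuery(List<Token> tokens, int cursor) {
        return tokens.stream()
                .map(t -> t.isEditedAt(cursor) ? t.value + "*" : t.value)
                .collect(Collectors.joining(" "));
    }

    // Phrase used for the extra exact-match boost: the edited token is left out.
    static String boostPhrase(List<Token> tokens, int cursor) {
        return tokens.stream()
                .filter(t -> !t.isEditedAt(cursor))
                .map(t -> t.value)
                .collect(Collectors.joining(" "));
    }
}
```

With the query lotta på bråkm and the cursor at position 14 (the end of the string), `prefixQuery` gives lotta på bråkm* and `boostPhrase` gives lotta på, matching the example in the list. Keeping the offset on the token is what allows the edited word to be found without re-parsing the query string.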

Not really related to the task but some other things that I couldn't help fixing while at it:

In their current state these changes require only a small frontend fix (libris/lxlviewer#1328), but more changes will be needed in order to make use of the new suggest functionality. I've started adjusting some of the frontend code myself (libris/lxlviewer#1329). Feel free to take over from there when this one is merged @jesperengstrom @johanbissemattsson.

Once we've switched over to putting input within brackets instead of quotes when searching within property (https://kbse.atlassian.net/browse/LWS-325) we should also revert 70d6482.

I intend to add some more unit tests and clarifying comments.

kwahlin added 30 commits May 22, 2025 13:19
* Set default phrase boost divisor to a high number for now to make sure that the score from phrase matching isn't too big.
* The number can and will be tuned with the temporary API parameter _phraseBoostDivisor.
* Keep together boosting config to facilitate passing on config supplied via REST API.
* Ensures that free text search within property works the same way as "simple" free text search
* Don't flatten AST, control the tree structure with QueryTreeBuilder
* Enables phrase searching within property
* Multiple free text tokens are now held together when within brackets
* WIP: Introduce new (structured) Value types: Numeric, Date
@kwahlin kwahlin merged commit 0695b11 into develop Jun 24, 2025
1 check passed
@kwahlin kwahlin deleted the feature/relevant-suggestions-and-more branch June 24, 2025 07:47