Feature/relevant suggestions and more #1602

kwahlin · 2025-06-18T16:56:54Z

Mainly solves https://kbse.atlassian.net/browse/LWS-272. The API should now continually respond with relevant suggestions depending on which part of the query is being edited, provided that the current query string _q is accompanied by the supplementary API parameters _suggest=true and cursor=i, where i is the current cursor position.
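For illustration, a minimal sketch of how a client might build such a request. Only the parameter names _q, _suggest and cursor come from this PR; the host, the /find path and the class name are placeholders, and the cursor is assumed to be counted in characters of the raw (unencoded) query string.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class SuggestRequest {
    // Placeholder base URL (assumption); only the parameter names are taken from the PR text.
    static final String BASE = "https://example.org/find";

    /** Builds a suggest URL for the query q with the cursor at the given character position. */
    static String url(String q, int cursor) {
        if (cursor < 0 || cursor > q.length()) {
            throw new IllegalArgumentException("cursor outside query string");
        }
        String encoded = URLEncoder.encode(q, StandardCharsets.UTF_8);
        return BASE + "?_q=" + encoded + "&_suggest=true&cursor=" + cursor;
    }

    public static void main(String[] args) {
        // The user has typed "astrid lindg" and the cursor is at the end of the input.
        System.out.println(url("astrid lindg", 12));
    }
}
```

A frontend would typically rebuild this URL on every keystroke or cursor move.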

Basically much of the "supersearch" machinery has been moved from frontend to backend in order to get rid of fragile temporary solutions and address known flaws.

Summary of changes/improvements:

Always do a prefix search when editing free text (including free text within a property). This is done by adding an asterisk at the end of the currently edited word. This way we'll get suggestions for incomplete queries while typing (e.g. lotta på bråkm will give us lotta på bråkmakargatan).
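The asterisk step can be sketched roughly as follows. This is a simplified stand-in that works on the raw string and splits on whitespace; the actual implementation works on parsed tokens, and the class and method names here are made up for the example.

```java
class PrefixEdit {
    /** Returns q with '*' appended to the whitespace-delimited word that the cursor touches. */
    static String withPrefixWildcard(String q, int cursor) {
        if (cursor < 0 || cursor > q.length()) {
            throw new IllegalArgumentException("cursor outside query string");
        }
        // Walk right from the cursor to the end of the current word.
        int end = cursor;
        while (end < q.length() && !Character.isWhitespace(q.charAt(end))) {
            end++;
        }
        // If no word ends here (cursor in whitespace or empty query), nothing is being edited.
        boolean hasWord = end > 0 && !Character.isWhitespace(q.charAt(end - 1));
        if (!hasWord || q.charAt(end - 1) == '*') {
            return q;
        }
        return q.substring(0, end) + "*" + q.substring(end);
    }
}
```

With the cursor at the end of `astrid lindg` this yields `astrid lindg*`; with the cursor inside the first word it yields `astrid* lindg`.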
In terms of relevancy, exact matches are favored over "soft" matches. When typing e.g. johan, we expect fields containing the string johan to score higher than those containing johanna; when typing only joh, we can expect fields containing johan, johanna, johannes etc. to score similarly. This is achieved by running an extra query with the exact string alongside the prefix query.
The prefix query is built more or less the same way as a regular query. The extra search that we run on the side to give exact matching phrases a small boost must be done a little differently, though. For a regular query like lotta på bråkmakargatan we simply search for the phrase "lotta på bråkmakargatan" too, but for a prefix query like lotta på bråkm* we need to exclude the prefixed word, since prefix queries don't work well (at least not performance-wise) with the scoring algorithm (BM25). So in this example the extra phrase search covers only "lotta på", but with proper scoring.
Otherwise the Elastic query is composed more or less the same way as a regular query, see

librisxl/whelk-core/src/main/groovy/whelk/search2/querytree/FreeText.java

Line 48 in 3613418

public Map<String, Object> toEs(EsMappings esMappings, EsBoost.Config boostConfig) {

for details.
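To make the phrase part concrete, here is a rough sketch of how the extra phrase could be derived from the words of a free text query. It is not the code in FreeText.java; the names are invented, and it simply drops every word ending in an asterisk before joining the rest into a quoted phrase.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class PhraseBoost {
    /**
     * Returns the quoted phrase to use for the extra exact-match query,
     * or empty if no complete word is left after dropping prefixed words.
     */
    static Optional<String> exactPhrase(List<String> words) {
        String phrase = words.stream()
                .filter(w -> !w.endsWith("*")) // prefixed words are left out of the phrase
                .collect(Collectors.joining(" "));
        return phrase.isEmpty() ? Optional.empty() : Optional.of("\"" + phrase + "\"");
    }
}
```

For `[astrid, lindgren]` this gives `"astrid lindgren"`, for `[astrid, lindg*]` only `"astrid"`, and for `[joh*]` no phrase at all. Note that a prefixed word in the middle of the query would make the remaining words look adjacent in this sketch, which a real implementation would probably want to handle differently.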
Free text searching within specific properties/fields now works too, quoted (e.g. contributor:"astrid lindgren") as well as unquoted (e.g. contribution:(astrid lindgren)). I had to change the structure quite a bit to achieve this, see mainly 88729ea. A FreeText object now holds a list of tokens instead of simply a string and each Token in turn holds valuable information that before was lost in the parsing step. Throughout the whole query machinery we now keep track of whether a token was quoted in the original query string and thus should be treated as a phrase. We also keep track of each token's position (index) in the original query string, which we need for knowing if it's currently being edited, by matching this value against the given cursor position.
@jannistsiroyannis I feel that I may have "cheated" with this part so I'd like your input. See changes in search2.parse. I just took what I needed for now 😄
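A rough sketch of the token bookkeeping described above, to show how a cursor position can be matched against token offsets. The record shape and method names are assumptions for the example and not the actual classes in search2; in particular, the offset is assumed to point at the first character of the token in the original string (the opening quote for a quoted token).

```java
import java.util.List;
import java.util.Optional;

class TokenCursor {
    // Hypothetical shape; the real Token in whelk.search2 may look different.
    record Token(String value, int offset, boolean quoted) {
        // Length in the original query string, counting the two quote characters if quoted.
        int rawLength() {
            return value.length() + (quoted ? 2 : 0);
        }

        // A token counts as edited when the cursor is inside it or directly after it.
        boolean isEditedAt(int cursor) {
            return cursor > offset && cursor <= offset + rawLength();
        }
    }

    /** Finds the token currently being edited, if any. */
    static Optional<Token> edited(List<Token> tokens, int cursor) {
        return tokens.stream().filter(t -> t.isEditedAt(cursor)).findFirst();
    }
}
```

For `astrid lindg` (tokens at offsets 0 and 7), a cursor at 12 selects `lindg`, a cursor at 6 selects `astrid`, and a cursor at 7 (directly before `lindg`) selects nothing.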
Just like before, suggestions will differ depending on which part of a query is being edited. However, the suggestions should now be more accurate thanks to stricter vocab control and fewer hard-coded rules.
- For simple free text we'll search for Agent, Concept, Language or Work, just like before. These types are still hard coded. Predicates that can be associated with a suggestion are also still hard coded. Showing all possibilities is probably not a good idea since there are obscure properties that we don't want to expose for the user. The hard coded mappings should at least be more comprehensive and accurate now. (We won't end up suggesting e.g. a Library to be added as subject.)
- When searching within a property, we'll use the same base types (Agent, Concept, Language or Work) and match those against the property's range to know what types of resources to suggest. We'll query for all types that the property may point to which also belong to any of the base classes. If the range does not match any of the base types, we'll just search the default type (:Work). So searching within :contributor will give suggestions of type :BibliographicAgent (pending Feature/adjustments for libris search definitions#532) while searching within :title will give suggestions of type :Work.
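The range matching in the second bullet can be sketched like this. The subclass table is a toy stand-in for the real vocabulary, and all names except the four base types and the Work default are invented for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class SuggestTypes {
    static final List<String> BASE_TYPES = List.of("Agent", "Concept", "Language", "Work");
    static final String DEFAULT_TYPE = "Work";

    // Toy subclass relation (child -> parent); the real vocabulary is much larger.
    static final Map<String, String> SUPER = Map.of(
            "Person", "Agent",
            "Organization", "Agent",
            "Topic", "Concept",
            "Text", "Work");

    /** True if type equals ancestor or is a (transitive) subclass of it in the toy table. */
    static boolean isSubClassOfOrSame(String type, String ancestor) {
        for (String t = type; t != null; t = SUPER.get(t)) {
            if (t.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    /** Keeps the range types that belong to a base type; falls back to the default type. */
    static List<String> forRange(List<String> range) {
        List<String> result = range.stream()
                .filter(r -> BASE_TYPES.stream().anyMatch(b -> isSubClassOfOrSame(r, b)))
                .collect(Collectors.toList());
        return result.isEmpty() ? List.of(DEFAULT_TYPE) : result;
    }
}
```

In this toy setup a range of `[Person]` gives `[Person]`, while a range of `[Title]` matches no base type and falls back to `[Work]`.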
A suggest query result item that is not a direct hit (i.e. has another type than :Work) is complemented with a list of predicates that may be used for adding it as a query criterion (e.g. add a :Person as contributor). Each predicate in the list comes with a full find URL that has the criterion added to _q, along with the new expected cursor position after navigating to the URL (we probably need to tweak what the expected position should be). Using a ready-made URL solves the current problem of the raw resource URI being visible for a short moment instead of the "pretty" chip when adding a suggested search criterion.
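As a loose illustration of the ready-made URL idea, the sketch below replaces the edited part of the query with a criterion and places the cursor right after it. Everything here is an assumption made for the example: the /find path, the `predicate:"iri"` rendering, the cursor rule and the names. The actual response format is not shown in this description.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class SuggestionLink {
    // Hypothetical shape of one entry in the predicate list.
    record Link(String predicate, String query, int cursor, String url) {}

    /**
     * Replaces q[start, end) (the text being edited) with predicate:"iri"
     * and puts the expected cursor directly after the inserted criterion.
     */
    static Link forPredicate(String q, int start, int end, String predicate, String iri) {
        String criterion = predicate + ":\"" + iri + "\"";
        String query = q.substring(0, start) + criterion + q.substring(end);
        int cursor = start + criterion.length();
        String url = "/find?_q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&cursor=" + cursor;
        return new Link(predicate, query, cursor, url);
    }
}
```

Since the URL already carries the rewritten _q, the client only has to navigate to it and never needs to render the raw resource URI in the input field itself.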

Not really related to the task but some other things that I couldn't help fixing while at it:

Dates and numeric values are now parsed as such, which should result in more efficient and accurate ES queries (at least range queries).
Fixed a bug that made the "include e-plikt" filter (https://github.com/libris/definitions/blob/1d0228161a97d3da50307aeb18700b3a7b8ae889/source/apps.jsonld#L96) ineffective (b5588b2).
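For the date and numeric parsing mentioned above, a minimal sketch of what "parsed as such" could look like. The type names Numeric and Date follow the commit message for the new Value types, but the fields, the Text fallback and the parsing rules (longs first, then ISO-8601 dates) are assumptions.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

class ValueParser {
    interface Value {}
    record Numeric(long value) implements Value {}
    record Date(LocalDate value) implements Value {}
    record Text(String value) implements Value {}

    /** Tries a whole number first, then an ISO-8601 date, and falls back to plain text. */
    static Value parse(String s) {
        try {
            return new Numeric(Long.parseLong(s));
        } catch (NumberFormatException ignored) {
            // not a whole number, try the next type
        }
        try {
            return new Date(LocalDate.parse(s)); // expects e.g. 2025-06-18
        } catch (DateTimeParseException ignored) {
            // not a full date either
        }
        return new Text(s);
    }
}
```

A typed value makes it possible to build a proper range query instead of comparing strings. In this sketch a bare year such as `1945` ends up as Numeric, not as Date.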

In their current state these changes require only a small frontend fix (libris/lxlviewer#1328), but more changes will be needed in order to make use of the new suggest functionality. I've started adjusting some of the frontend code myself (libris/lxlviewer#1329). Feel free to take over from there when this one is merged @jesperengstrom @johanbissemattsson.

Once we've switched over to putting input within brackets instead of quotes when searching within property (https://kbse.atlassian.net/browse/LWS-325) we should also revert 70d6482.

I intend to add some more unit tests and clarifying comments.

* Set default phrase boost divisor to a high number for now to make sure that the score from phrase matching isn't too big.
* The number can and will be tuned with the temporary API parameter _phraseBoostDivisor.
* Keep together boosting config to facilitate passing on config supplied via REST API.

* Ensures that free text search within property works the same way as "simple" free text search
* Don't flatten AST, control the tree structure with QueryTreeBuilder
* Enables phrase searching within property
* Multiple free text tokens are now held together when within brackets
* WIP: Introduce new (structured) Value types: Numeric, Date

kwahlin added 30 commits May 22, 2025 13:19

Make sure scores from all fields are always summed

a1b4a91

Support prefixed words in phrase

5296b54

Omit masked/truncated strings from phrase query

343f94f

WIP: Prefer exact matches if in suggest mode

f94e1bd

Refactor and add clarifying comments

6448bb6

Simplify boost config

2ebdba2

Fix method call

bada7f3

Cleanup: Remove never(?) used method

4c89979

Reuse logic for free text search within property

e099fba

WIP: Keep track of quotes and offsets for tokens

fa04334

Get rid of no longer needed method for quoting

7641eaa

Revert "Support prefixed words in phrase"

294d3e9

This reverts commit 5296b54.

Match cursor position with token offsets to get prefix query right

0d2dc2b

Recognize abstract classes such as BibliographicAgent

16eca42

Adjust suggestions depending on the query

4c9432f

Fix a few things

20fcb41

Handle queries that make little or no sense

d39473a

Recognize ES long fields

271c3ee

Add reverseLinks filter

7326987

Don't search alternative paths for platform terms

aae56a8

Parse reverseLinks filter from string

3c0a48e

Remove misconception

fda2cb7

Handle date fields

0b58bec

Limit suggestions to BibliographicAgent rather than Agent

2f826b7

Update unit tests and fix things accordingly

cbd7008

Merge branch 'develop' into feature/relevant-suggestions-and-more

71cee5a

Widen suggestion range from BibliographicAgent to Agent again

4a29a86

Bugfix: Get negation right for nested field

b5588b2

kwahlin added 8 commits June 11, 2025 09:18

Get date string right

1366a4c

Make Suggest a separate search mode

affec4e

Include supplementary data along with each item in suggest query result list

bc7421f

Fix things

cf3aaee

Support removing and replacing (in place) nodes anywhere in a query tree

f118a2a

Don't set _stats=false for suggestions

d8624c0

Simplify add/remove/replace operations on query tree to avoid confusion

3613418

Reintroduce quoted search bug temporarily

70d6482

kwahlin mentioned this pull request Jun 18, 2025

Expect quoted input strings to be quoted in search.mapping too libris/lxlviewer#1328

Merged

kwahlin requested review from olovy and jannistsiroyannis June 18, 2025 18:04

kwahlin mentioned this pull request Jun 18, 2025

feat(lxlweb): implement _suggest (LWS-272) libris/lxlviewer#1329

Merged

Fix failing unit tests

0823aba

kwahlin merged commit 0695b11 into develop Jun 24, 2025
1 check passed

kwahlin deleted the feature/relevant-suggestions-and-more branch June 24, 2025 07:47

jesperengstrom mentioned this pull request Jun 27, 2025

feat(lxlweb, supersearch): Use up links for pills again (LWS-387) libris/lxlviewer#1338

Merged
