Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Advanced search API #3161

Merged
merged 78 commits into from
Feb 12, 2025
Merged

Advanced search API #3161

merged 78 commits into from
Feb 12, 2025

Conversation

jiru
Copy link
Member

@jiru jiru commented Feb 5, 2025

This brings advanced search filtering over a consistent, documented and unit-tested API. Because this PR revamps the search filtering code, it may introduce bugs/changes in the existing web search functionality, although I did my best to prevent this.

Note: once this got merged, the new API will be on par with the existing /api_v0 in terms of features, so I think we could consider closing #2983, #3133 and #2524.

I put the documentation to api.dev.tatoeba.org for anyone to check out (documentation only; the branch is not deployed there, so you may not perform any API request).

jiru added 30 commits August 21, 2024 10:12
This adds support for searching sentences owned by one of the specified users.
for sentence language filter.

The current Search model implementation is very loose regarding invalid parameters,
it just ignores them. The reason is that it was implemented primarily for web sentence
search, which goal is to somehow always show search results no matter how wrong
the parameters are (and to eventually, sometimes display a warning message).

Now, in order to reuse the Search model for a search API, we need proper error reporting,
which will be implemented through exceptions.
Display all API errors in json instead of potientially HTML,
so that API consumers get machine-readable errors.

Use Cake HTTP exceptions instead of return $response->withStatus().
This way, when debug is enabled, it helps by showing the location
the exception was thrown from; when debug is disabled,
it shows a more complete message for the consumer.

What’s more, we can now customize the json format of errors.

Note that there is a limitation to this approach of putting an ErrorController.php
file in the prefix routing path: if an exception occurs before routing too early
(such as before routing is done), the error will be shown in HTML anyway. But hey,
this is a limitation of using CakePHP routing mechanism to split execution paths
between website and API.
More reusable code, more adjustments for API filters.
The goal is to split the Search model into different filter classes
that can all be manipulated the same way.
So that we can use a simpler version with just compile() and anyOf()
for the word count filter.
Also trying to make testSearchParams() more useful by testing
the post-execution state rather than what methods are internally
called. This is much needed because soon we’ll have nothing but
calls to Search::setFilter() and it will be a mess to know which
filter it is called with and when.
Rewrite translation filters into a group of filters
having currently one filter inside: translation lang.

This implementation also allows stacking multiple translation filter groups,
paving the way for #2031.
I was so excited to make this filter work with multiple users
that I named it using plural. But it’s better to keep consistent
with the other filter names.
Take advantage of parent class BoolFilter logic.
Use a name that indicates it’s a boolean filter.
Use the new class-based filters to filter results on a search.
jiru added 28 commits November 28, 2024 09:13
There are currently two ways to set the value of boolean filters,
either passing it as constructor param or setting it later with ->not().
This commit fixes the ->not() case and adds a test for it.
This will be useful for precise error reporting.
Throw exceptions in the filters where it does not make sense
(or it is error-prone) to combine with logical AND.
We want to forbid usage of ->and() on that filter (because it’s a boolean filter,
it does not make sense to do native=yes AND native=no, that would certainly be
some human mistake), however internally it uses ->and() to build
a list of user ids, so I made that method internal.
Throw exceptions in the filters where it does not make sense
(or it is not possible) to perform a logical NOT.
That method was used to carry the language currently set
in the LanguageFilter. Instead, this commit introduces a
more generic setSearch() method that allows any filter to
access the associated Search model it is used in, paving
the way for more filters depending on context data.
This parameter was removed in 02b5391.
Throw a proper InvalidValueException when an invalid list id is provided,
otherwise an InvalidArgumentException('Cannot convert value of type `string` to integer')
is thrown because we pass that value directly to the find() query.
Fail on arbitrary strings input. Previously, it used not to fail
because in PHP this statement is evaluated as true: "invalid" == 0
It does not really make sense because the only possible values for
this filter are "0" and "not 0". We prevent AND to avoid human errors.
* Supports all the search filters available
* Properly documented using openapi manifest
* Allows to combine filters with boolean AND, OR and NOT
* Multi-stacked translation filters (#2031)
* Good unit test coverage
Add sentence id as secondary sort key, so that the resulst order is
predictable even when the first key is equal.

For example, if two sentences with the same "last modified" time are
returned, when ordered by "last modified", these sentences should
always show up in the same order, and the same search made in reversed
order should always be the opposite of the normal order.

So far it was only the case for the "random" order (which can be
reproducted by providing the same random seed), but now all sort
orders are deterministic.
This makes the internal sort order (the "ORDER BY" keys) accessible
to search filters. This will be used by the cursor filter.
This will be used to perform keyset pagination.
This will be used for keyset pagination. The computed cursor
value is to be used as input value for the CursorFilter filter.
This will be used for keyset pagination. The computed cursor
value is to be used as input value for the CursorFilter filter.
Return cursor value along with search results to allow keyset pagination.

Keyset pagination allows to paginate through an arbirary number of results,
not limited by the Manticore "max_matches" limit of 1000 results.

This commit also makes the number of results available in the API response,
but only for the first page. For other pages, it is hidden because keyset
pagination actually turns that value into the "number of results in that page
and subsequent pages" (the number decreases as we go through the pages),
which is rather confusing for a "total".
Not sure how I ended up writing that line, but it’s not use.
We are still using the Paginator helper in that view,
so it should be declared here.
(I am not sure why it kept working without it.)
@jiru jiru added this to the next milestone Feb 12, 2025
@jiru jiru merged commit 8191f19 into dev Feb 12, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant