Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Typesense throwing errors whith latest v10.11.3 update #865

Closed
Braunson opened this issue Sep 18, 2024 · 6 comments
Closed

Typesense throwing errors whith latest v10.11.3 update #865

Braunson opened this issue Sep 18, 2024 · 6 comments

Comments

@Braunson
Copy link

Scout Version

10.11.3

Scout Driver

Typesense

Laravel Version

10.48.2

PHP Version

8.2.23

Database Driver & Version

No response

SDK Version

No response

Meilisearch CLI Version

No response

Description

The most recent MR #858 causes an issue with Typesense. For example, this code works:

$results = Names::search($this->keyword)
    ->options([
        'filter_by' => $this->buildFilterString(),
        'sort_by' => implode(',', $this->sortBy),
    ])
    ->paginate(48);

But if I drop in this line between options and paginate:

->query(fn (Builder $query) => $query->with('someRelation'))

We then get this error back from Typesense:

Only up to 250 hits can be fetched per page.

The only solution to solve that is to again set the per_page option to match the ->paginate() value.

It looks like adding back #824 resolves this.

Steps To Reproduce

  1. Install Laravel Scout, Typesense
  2. Setup model with thousands of records and import with Scout
  3. Setup model relation
  4. Use query above using code to re-test
Copy link

Thank you for reporting this issue!

As Laravel is an open source project, we rely on the community to help us diagnose and fix issues as it is not possible to research and fix every issue reported to us via GitHub.

If possible, please make a pull request fixing the issue you have described, along with corresponding tests. All pull requests are promptly reviewed by the Laravel team.

Thank you!

@tharropoulos
Copy link
Contributor

This issue and #866 are tightly linked. The paginate function calls the getTotalCount function to get the total count of the results.

'total' => $this->getTotalCount($rawResults),

When the query callback is defined, whether empty or not, it will pass the null check:

scout/src/Builder.php

Lines 495 to 497 in 8790681

if (is_null($this->queryCallback)) {
return $totalCount;
}

As the count of the mapped ids will always be lesser than total count (the number of the ids will at most match the per page variable), it will go on to:

scout/src/Builder.php

Lines 501 to 507 in 8790681

if (count($ids) < $totalCount) {
$ids = $engine->keys(tap(clone $this, function ($builder) use ($totalCount) {
$builder->take(
is_null($this->limit) ? $totalCount : min($this->limit, $totalCount)
);
}))->all();
}

The issue lies at:

is_null($this->limit) ? $totalCount : min($this->limit, $totalCount)

The builder's limit will be null by default, so the ternary check fails and defaults to the totalCount, which would, in this case, be the total matching count, bar the filter/query. This in turn exceeds Typesense's maximum amount of 250 results per page and will execute a search to Typesense with per_page that will result in an error.

You can check this manually yourself. If you first use the take function (which sets the limit of the builder to the input), you won't get an error:

$results = Names::search($this->keyword)
    ->options([
        'filter_by' => $this->buildFilterString(),
        'sort_by' => implode(',', $this->sortBy),
    ])
    ->query(function ($query) {})
+   ->take(200)
    ->paginate(48);

This is the reverse of what #858 was doing. The problem here is that the keys function maps the ids using a search query using the absolute maximum limit (that being totalAmount). In turn, Typesense will return an error. The main thing to be discussed is the main purpose of this check:

scout/src/Builder.php

Lines 501 to 507 in 8790681

if (count($ids) < $totalCount) {
$ids = $engine->keys(tap(clone $this, function ($builder) use ($totalCount) {
$builder->take(
is_null($this->limit) ? $totalCount : min($this->limit, $totalCount)
);
}))->all();
}

We could introduce a function that takes care of this limitation by continuously querying by different page sizes until it reaches the end:

public function search(Builder $builder)
{
    // If the limit is within Typesense's capabilities, perform a single search
    if (is_null($builder->limit) || $builder->limit <= $this->maxResultsPerPage) {
        return $this->performSearch(
            $builder,
            $this->buildSearchParameters($builder, 1, $builder->limit ?? $this->maxResultsPerPage)
        );
    }

    // For larger result sets, we need to paginate
    return $this->performPaginatedSearch($builder);
}

protected function performPaginatedSearch(Builder $builder)
{
   // Sends all the paginated queries to the server and concats the results
}

However, this will introduce horrible performance for every pagination not limited by the user. What's the functionality behind

scout/src/Builder.php

Lines 501 to 507 in 8790681

if (count($ids) < $totalCount) {
$ids = $engine->keys(tap(clone $this, function ($builder) use ($totalCount) {
$builder->take(
is_null($this->limit) ? $totalCount : min($this->limit, $totalCount)
);
}))->all();
}

and should we preserve it? If yes, I'll have to look into we can optimize the paginatedSearch function.

@tharropoulos
Copy link
Contributor

Meilisearch, for example, limits maximum results from an index at 1000 by default and that value must be manually changed to get more maximum results. If changed to fit the index's size, then it too would introduce performance issues, but that's an opt in. If a function were to be introduced to limit the maximum amount of results from a Typesense collection, then we should be able to mitigate the performance deficit we get by calling the Typesense server num_docs % 250. But that's all if the check on the builder class should be preserved.

@jasonbosco
Copy link
Contributor

Algolia also limits max results to 1K for any searches, even with pagination.

If a function were to be introduced to limit the maximum amount of results from a Typesense collection

That sounds like a reasonable trade-off to me, especially to maintain parity with the other engines supported in Scout. With Typesense we can then let users configure this limit in Scout dynamically.

@tharropoulos
Copy link
Contributor

Just posted a PR for this on #867, check it out when you get the chance

@crynobone
Copy link
Member

PR has been merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants