Skip to content

Pagination: put count(*) in subquery to improve performance (x10) #3876

@nkosi23

Description

@nkosi23

I have a table with about 600,000 rows taking quite a noticeable amount of time to be paginated and return 10 rows, despite having an index on the column being used by the sort by. By default, the pagination utility built-into marten produces the following query using Skip() and Take() with Stats() produces the following query:

My index is:

(data ->> 'CityName'::text)
select d.id, d.data, count(*) OVER() as total_rows 
from abccorp_ops_operations.mt_doc_world_cities as d 
where TRUE 
order by d.data ->> 'CityName' 
OFFSET 0 LIMIT 10;

With the following query plan:

"QUERY PLAN"
"Limit  (cost=0.42..2.87 rows=10 width=321) (actual time=554.433..554.439 rows=10 loops=1)"
"  ->  WindowAgg  (cost=0.42..150618.48 rows=614940 width=321) (actual time=554.432..554.437 rows=10 loops=1)"
"        ->  Index Scan using worldcities_cityname on mt_doc_world_cities d  (cost=0.42..141394.38 rows=614940 width=281) (actual time=0.020..267.800 rows=614940 loops=1)"
"Planning Time: 0.387 ms"
"Execution Time: 574.315 ms"

The idea would be to use the following instead:

SELECT d.id, d.data, 
       (SELECT COUNT(*) FROM abccorp_ops_operations.mt_doc_world_cities) AS total_rows
FROM abccorp_ops_operations.mt_doc_world_cities AS d
WHERE true
ORDER BY d.data ->> 'CityName'
LIMIT 10 OFFSET 0;

Which has a much more efficient query plan:


"Limit  (cost=9262.39..9264.72 rows=10 width=321) (actual time=46.136..47.692 rows=10 loops=1)"
"  InitPlan 1 (returns $1)"
"    ->  Finalize Aggregate  (cost=9261.96..9261.97 rows=1 width=8) (actual time=46.119..47.664 rows=1 loops=1)"
"          ->  Gather  (cost=9261.74..9261.95 rows=2 width=8) (actual time=46.114..47.660 rows=3 loops=1)"
"                Workers Planned: 2"
"                Workers Launched: 2"
"                ->  Partial Aggregate  (cost=8261.74..8261.75 rows=1 width=8) (actual time=16.132..16.133 rows=1 loops=3)"
"                      ->  Parallel Index Only Scan using worldcities_continent on mt_doc_world_cities  (cost=0.42..7621.20 rows=256217 width=0) (actual time=0.018..10.039 rows=204980 loops=3)"
"                            Heap Fetches: 0"
"  ->  Index Scan using worldcities_cityname on mt_doc_world_cities d  (cost=0.42..142988.62 rows=614921 width=321) (actual time=46.135..46.144 rows=10 loops=1)"
"Planning Time: 0.108 ms"
"Execution Time: 47.717 ms"

By the way, there is a where true, but this is because I have integrated Marten's pagination with a client-side datatable library, so I generate predicates dynamically etc. and for now a shortcut was to put that when there is no filter. It has no impact on performance from what i can see.

Normally, the table is supposed to contain 4 million rows, which was taking a ridiculous amount of time to load (10s of seconds), while the row count number is not that high.

I am not sure what would be the implications of making this change, and suspect that there may be use cases i have not considered.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions