-
-
Notifications
You must be signed in to change notification settings - Fork 506
Description
I have a table with about 600,000 rows taking quite a noticeable amount of time to be paginated and return 10 rows, despite having an index on the column being used by the sort by. By default, the pagination utility built-into marten produces the following query using Skip()
and Take()
with Stats()
produces the following query:
My index is:
(data ->> 'CityName'::text)
select d.id, d.data, count(*) OVER() as total_rows
from abccorp_ops_operations.mt_doc_world_cities as d
where TRUE
order by d.data ->> 'CityName'
OFFSET 0 LIMIT 10;
With the following query plan:
"QUERY PLAN"
"Limit (cost=0.42..2.87 rows=10 width=321) (actual time=554.433..554.439 rows=10 loops=1)"
" -> WindowAgg (cost=0.42..150618.48 rows=614940 width=321) (actual time=554.432..554.437 rows=10 loops=1)"
" -> Index Scan using worldcities_cityname on mt_doc_world_cities d (cost=0.42..141394.38 rows=614940 width=281) (actual time=0.020..267.800 rows=614940 loops=1)"
"Planning Time: 0.387 ms"
"Execution Time: 574.315 ms"
The idea would be to use the following instead:
SELECT d.id, d.data,
(SELECT COUNT(*) FROM abccorp_ops_operations.mt_doc_world_cities) AS total_rows
FROM abccorp_ops_operations.mt_doc_world_cities AS d
WHERE true
ORDER BY d.data ->> 'CityName'
LIMIT 10 OFFSET 0;
Which has a much more efficient query plan:
"Limit (cost=9262.39..9264.72 rows=10 width=321) (actual time=46.136..47.692 rows=10 loops=1)"
" InitPlan 1 (returns $1)"
" -> Finalize Aggregate (cost=9261.96..9261.97 rows=1 width=8) (actual time=46.119..47.664 rows=1 loops=1)"
" -> Gather (cost=9261.74..9261.95 rows=2 width=8) (actual time=46.114..47.660 rows=3 loops=1)"
" Workers Planned: 2"
" Workers Launched: 2"
" -> Partial Aggregate (cost=8261.74..8261.75 rows=1 width=8) (actual time=16.132..16.133 rows=1 loops=3)"
" -> Parallel Index Only Scan using worldcities_continent on mt_doc_world_cities (cost=0.42..7621.20 rows=256217 width=0) (actual time=0.018..10.039 rows=204980 loops=3)"
" Heap Fetches: 0"
" -> Index Scan using worldcities_cityname on mt_doc_world_cities d (cost=0.42..142988.62 rows=614921 width=321) (actual time=46.135..46.144 rows=10 loops=1)"
"Planning Time: 0.108 ms"
"Execution Time: 47.717 ms"
By the way, there is a where true, but this is because I have integrated Marten's pagination with a client-side datatable library, so I generate predicates dynamically etc. and for now a shortcut was to put that when there is no filter. It has no impact on performance from what i can see.
Normally, the table is supposed to contain 4 million rows, which was taking a ridiculous amount of time to load (10s of seconds), while the row count number is not that high.
I am not sure what would be the implications of making this change, and suspect that there may be use cases i have not considered.