Pagination: put count(*) in subquery to improve performance (x10)

I have a table with about 600,000 rows taking quite a noticeable amount of time to be paginated and return 10 rows, despite having an index on the column being used by the sort by. ~By default, the pagination utility built-into marten produces the following query~ using `Skip()` and `Take()` with `Stats()` produces the following query:

My index is:

```sql
(data ->> 'CityName'::text)
```

```sql
select d.id, d.data, count(*) OVER() as total_rows 
from abccorp_ops_operations.mt_doc_world_cities as d 
where TRUE 
order by d.data ->> 'CityName' 
OFFSET 0 LIMIT 10;
```

With the following query plan:

```
"QUERY PLAN"
"Limit  (cost=0.42..2.87 rows=10 width=321) (actual time=554.433..554.439 rows=10 loops=1)"
"  ->  WindowAgg  (cost=0.42..150618.48 rows=614940 width=321) (actual time=554.432..554.437 rows=10 loops=1)"
"        ->  Index Scan using worldcities_cityname on mt_doc_world_cities d  (cost=0.42..141394.38 rows=614940 width=281) (actual time=0.020..267.800 rows=614940 loops=1)"
"Planning Time: 0.387 ms"
"Execution Time: 574.315 ms"
```

The idea would be to use the following instead:

```sql
SELECT d.id, d.data, 
       (SELECT COUNT(*) FROM abccorp_ops_operations.mt_doc_world_cities) AS total_rows
FROM abccorp_ops_operations.mt_doc_world_cities AS d
WHERE true
ORDER BY d.data ->> 'CityName'
LIMIT 10 OFFSET 0;
```

Which has a much more efficient query plan:

```

"Limit  (cost=9262.39..9264.72 rows=10 width=321) (actual time=46.136..47.692 rows=10 loops=1)"
"  InitPlan 1 (returns $1)"
"    ->  Finalize Aggregate  (cost=9261.96..9261.97 rows=1 width=8) (actual time=46.119..47.664 rows=1 loops=1)"
"          ->  Gather  (cost=9261.74..9261.95 rows=2 width=8) (actual time=46.114..47.660 rows=3 loops=1)"
"                Workers Planned: 2"
"                Workers Launched: 2"
"                ->  Partial Aggregate  (cost=8261.74..8261.75 rows=1 width=8) (actual time=16.132..16.133 rows=1 loops=3)"
"                      ->  Parallel Index Only Scan using worldcities_continent on mt_doc_world_cities  (cost=0.42..7621.20 rows=256217 width=0) (actual time=0.018..10.039 rows=204980 loops=3)"
"                            Heap Fetches: 0"
"  ->  Index Scan using worldcities_cityname on mt_doc_world_cities d  (cost=0.42..142988.62 rows=614921 width=321) (actual time=46.135..46.144 rows=10 loops=1)"
"Planning Time: 0.108 ms"
"Execution Time: 47.717 ms"
```

By the way, there is a where true, but this is because I have integrated Marten's pagination with a client-side datatable library, so I generate predicates dynamically etc. and for now a shortcut was to put that when there is no filter. It has no impact on performance from what i can see.

Normally, the table is supposed to contain 4 million rows, which was taking a ridiculous amount of time to load (10s of seconds), while the row count number is not that high.

I am not sure what would be the implications of making this change, and suspect that there may be use cases i have not considered.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Pagination: put count(*) in subquery to improve performance (x10) #3876

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Pagination: put count(*) in subquery to improve performance (x10) #3876

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions