diff --git a/use-timescale/hypercore/secondary-indexes.md b/use-timescale/hypercore/secondary-indexes.md
index f6875263bc..d30c4cdee2 100644
--- a/use-timescale/hypercore/secondary-indexes.md
+++ b/use-timescale/hypercore/secondary-indexes.md
@@ -60,44 +60,51 @@ at the right time. When you segment your data to access specific columns, your q
 For example, to access information about a single device with a specific `device_id`, you segment on the
 `device_id` column. This enables you to run analytical queries on compressed data in the $COLUMNSTORE much faster.
 
-For example for the following $HYPERTABLE:
+To illustrate, let's create a $HYPERTABLE and then run the same query on it with and without optimizations:
 
-```sql
-CREATE TABLE metrics (
-    time TIMESTAMPTZ,
-    user_id INT,
-    device_id INT,
-    data JSONB
-) WITH (
-    tsdb.hypertable
-);
-```
+
-
+
-
+1. **Create a $HYPERTABLE**
-
+
+   Create a `metrics` $HYPERTABLE with the following command:
+
+   ```sql
+   CREATE TABLE metrics (
+       time TIMESTAMPTZ,
+       user_id INT,
+       device_id INT,
+       data JSONB
+   ) WITH (
+       tsdb.hypertable
+   );
+   ```
+
+
+
+1. **Execute a query on the $HYPERTABLE without optimizations**
-1. **Execute a query on a regular $HYPERTABLE**
    1. Query your data
 
       ```sql
-      SELECT device_id, AVG(cpu) AS avg_cpu, AVG(disk_io) AS avg_disk_io 
+      SELECT device_id, AVG(cpu) AS avg_cpu, AVG(disk_io) AS avg_disk_io
       FROM metrics
-      WHERE device_id = 5
+      WHERE time >= '2024-03-01 00:00:00+01'
+      AND time < '2024-03-02 00:00:00+01'
+      AND device_id = 5
       GROUP BY device_id;
       ```
 
      Gives the following result:
 
      ```sql
-      device_id | avg_cpu | avg_disk_io 
+      device_id |      avg_cpu       |     avg_disk_io
      -----------+--------------------+---------------------
-       5        | 0.4972598866221261 | 0.49820356730280524
+       5        | 0.4954351575883885 | 0.49725603413909114
      (1 row)
 
-      Time: 177,399 ms
+      Time: 29.216 ms
      ```
 
 1. **Execute a query on the same data segmented and ordered in the $COLUMNSTORE**
-   1. Control the way your data is ordered in the $COLUMNSTORE:
-
+   1. Control the way your data is ordered and segmented in the $COLUMNSTORE:
+
      ```sql
      ALTER TABLE metrics SET (
        timescaledb.enable_columnstore = true,
@@ -108,24 +115,30 @@ CREATE TABLE metrics (
    1. Query your data
 
      ```sql
-      select avg(cpu) from metrics where time >= '2024-03-01 00:00:00+01' and time < '2024-03-02 00:00:00+01';
-      ```
+      SELECT device_id, AVG(cpu) AS avg_cpu, AVG(disk_io) AS avg_disk_io
+      FROM metrics
+      WHERE time >= '2024-03-01 00:00:00+01'
+      AND time < '2024-03-02 00:00:00+01'
+      AND device_id = 5
+      GROUP BY device_id;
+      ```
 
      Gives the following result:
+
      ```sql
-      device_id |      avg_cpu      |     avg_disk_io
-      -----------+-------------------+---------------------
-       5        | 0.497259886622126 | 0.49820356730280535
+      device_id |      avg_cpu       |     avg_disk_io
+      -----------+--------------------+---------------------
+       5        | 0.4954351575883885 | 0.49725603413909114
      (1 row)
 
-      Time: 42,139 ms
+      Time: 1.828 ms
      ```
 
-   As you see, using `orderby` and `segmentby` not only reduces the amount of space taken by your data, but also 
+   As you can see, using `orderby` and `segmentby` not only reduces the amount of space taken by your data, but also
    vastly improves query speed.
 
 The number of rows that are compressed together in a single batch (like the ones we see above) is 1000.
-If your chunk does not contain enough data to create big enough batches, your compression ratio will be reduced.
+If your $CHUNK does not contain enough data to create big enough batches, your compression ratio will be reduced.
 This needs to be taken into account when you define your $COLUMNSTORE settings.
 
 [hypercore]: /use-timescale/:currentVersion:/hypercore/
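
Reviewer note: the first hunk above cuts off mid-way through the `ALTER TABLE ... SET` statement, so the actual `segmentby`/`orderby` values are not visible in this diff. For anyone trying the change locally, the following is a hypothetical sketch of the complete statement, assuming the segmenting and ordering the surrounding prose implies (segment on `device_id`, order by `time`); the values in the full file may differ:

```sql
-- Hypothetical completion of the truncated ALTER TABLE hunk (not taken from the diff):
-- compress into the columnstore, grouping batches by device_id and ordering rows by time.
ALTER TABLE metrics SET (
    timescaledb.enable_columnstore = true,
    timescaledb.segmentby = 'device_id',
    timescaledb.orderby = 'time DESC'
);
```

With these settings, all rows for one `device_id` land in the same compressed batches, which is what lets the `device_id = 5` query above skip unrelated data.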