**use-timescale/hypercore/secondary-indexes.md** (42 additions & 29 deletions)

For example, to access information about a single device with a specific `device_id`, you segment on the `device_id` column.
This enables you to run analytical queries on compressed data in the $COLUMNSTORE much faster.

To illustrate, let's create a $HYPERTABLE and then run the same query on it with and without optimizations:

> **Review comment (Contributor), suggested change:**
> To illustrate, run the same query on a $HYPERTABLE, first without, then with optimizations:

<Procedure>

1. **Create a $HYPERTABLE**

Create a `metrics` $HYPERTABLE with the following command:

   ```sql
   CREATE TABLE metrics (
     time TIMESTAMPTZ,
     user_id INT,
     device_id INT,
     -- cpu and disk_io are referenced by the queries in the next steps
     cpu DOUBLE PRECISION,
     disk_io DOUBLE PRECISION,
     data JSONB
   ) WITH (
     tsdb.hypertable
   );
   ```

<CreateHypertablePolicyNote />

1. **Execute a query on the $HYPERTABLE without optimizations**

   ```sql
   SELECT device_id, AVG(cpu) AS avg_cpu, AVG(disk_io) AS avg_disk_io
   FROM metrics
   WHERE time >= '2024-03-01 00:00:00+01'
     AND time < '2024-03-02 00:00:00+01'
     AND device_id = 5
   GROUP BY device_id;
```
Gives the following result:
```sql
    device_id |      avg_cpu       |     avg_disk_io
   -----------+--------------------+---------------------
            5 | 0.4954351575883885 | 0.49725603413909114
   (1 row)
   Time: 29.216 ms
```

1. **Execute a query on the same data segmented and ordered in the $COLUMNSTORE**

1. Control the way your data is ordered and segmented in the $COLUMNSTORE:

      ```sql
      ALTER TABLE metrics SET (
        timescaledb.enable_columnstore = true,
        -- The remaining settings are collapsed in this diff; judging from the
        -- surrounding text, assume they segment by device and order by time:
        timescaledb.segmentby = 'device_id',
        timescaledb.orderby = 'time'
      );
      ```

      > **Review comment (Contributor):** Columnstore is already enabled, you don't need to enable it.

1. Query your data
      ```sql
      SELECT device_id, AVG(cpu) AS avg_cpu, AVG(disk_io) AS avg_disk_io
      FROM metrics
      WHERE time >= '2024-03-01 00:00:00+01'
        AND time < '2024-03-02 00:00:00+01'
        AND device_id = 5
      GROUP BY device_id;
      ```
Gives the following result:

```sql
       device_id |      avg_cpu       |     avg_disk_io
      -----------+--------------------+---------------------
               5 | 0.4954351575883885 | 0.49725603413909114
      (1 row)
      Time: 1.828 ms
```

As you can see, using `orderby` and `segmentby` not only reduces the amount of space taken by your data, but also vastly improves query speed. To see where the speedup comes from, inspect the query plan, as sketched after this procedure.

</Procedure>
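
The plan inspection is a minimal sketch using standard PostgreSQL `EXPLAIN`; it is not part of the original procedure, and the exact plan nodes you see depend on your TimescaleDB version:

```sql
-- Show how the optimized query is executed; with segmentby = 'device_id',
-- batches belonging to other devices are excluded without being decompressed.
EXPLAIN (ANALYZE, COSTS OFF)
SELECT device_id, AVG(cpu) AS avg_cpu, AVG(disk_io) AS avg_disk_io
FROM metrics
WHERE time >= '2024-03-01 00:00:00+01'
  AND time < '2024-03-02 00:00:00+01'
  AND device_id = 5
GROUP BY device_id;
```

Filtering compressed batches on the `segmentby` column, rather than decompressing them first, is where most of the speedup comes from.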

The number of rows that are compressed together in a single batch is 1000.
If your $CHUNK does not contain enough data to create big enough batches, your compression ratio is reduced.
Take this into account when you define your $COLUMNSTORE settings.
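
To check how your settings play out, compare the size of your data before and after compression. This is a minimal sketch using the `hypertable_compression_stats()` function, assuming the `metrics` $HYPERTABLE from the procedure above:

```sql
-- The "after" columns are NULL until at least one chunk
-- has been converted to the columnstore.
SELECT pg_size_pretty(before_compression_total_bytes) AS before,
       pg_size_pretty(after_compression_total_bytes)  AS after
FROM hypertable_compression_stats('metrics');
```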

[hypercore]: /use-timescale/:currentVersion:/hypercore/