You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: docs/en/managing-data/core-concepts/parts.md
+4-3Lines changed: 4 additions & 3 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -5,10 +5,10 @@ description: What are data parts in ClickHouse
5
5
keywords: [part]
6
6
---
7
7
8
-
9
-
10
8
## What are table parts in ClickHouse?
11
9
10
+
<br/>
11
+
12
12
The data from each table in the ClickHouse [MergeTree engine family](/docs/en/engines/table-engines/mergetree-family) is organized on disk as a collection of immutable `data parts`.
13
13
14
14
To illustrate this, we use this table (adapted from the [UK property prices dataset](/docs/en/getting-started/example-datasets/uk-price-paid)) tracking the date, town, street, and price for sold properties in the United Kingdom:
@@ -30,6 +30,7 @@ ORDER BY (town, street);
30
30
A data part is created whenever a set of rows is inserted into the table. The following diagram sketches this:
When a ClickHouse server processes the example insert with 4 rows (e.g., via an [INSERT INTO statement](/docs/en/sql-reference/statements/insert-into)) sketched in the diagram above, it performs several steps:
35
36
@@ -48,6 +49,6 @@ Data parts are self-contained, including all metadata needed to interpret their
48
49
To manage the number of parts per table, a background merge job periodically combines smaller parts into larger ones until they reach a [configurable](/docs/en/operations/settings/merge-tree-settings#max-bytes-to-merge-at-max-space-in-pool) compressed size (typically ~150 GB). Merged parts are marked as inactive and deleted after a [configurable](/docs/en/operations/settings/merge-tree-settings#old-parts-lifetime) time interval. Over time, this process creates a hierarchical structure of merged parts, which is why it’s called a MergeTree table:
To minimize the number of initial parts and the overhead of merges, database clients are [encouraged](https://clickhouse.com/blog/asynchronous-data-inserts-in-clickhouse#data-needs-to-be-batched-for-optimal-performance) to either insert tuples in bulk, e.g. 20,000 rows at once, or to use the [asynchronous insert mode](https://clickhouse.com/blog/asynchronous-data-inserts-in-clickhouse), in which ClickHouse buffers rows from multiple incoming INSERTs into the same table and creates a new part only after the buffer size exceeds a configurable threshold, or a timeout expires.
Delete mutations refers to `ALTER` queries that manipulate table data through delete. Most notably they are queries like `ALTER TABLE DELETE`, etc. Performing such queries will produce new mutated versions of the data parts. This means that such statements would trigger a rewrite of whole data parts for all data that was inserted before the mutation, translating to a large amount of write requests.
9
+
10
+
:::info
11
+
For deletes, you can avoid these large amounts of write requests by using specialised table engines like [ReplacingMergeTree](/docs/en/guides/replacing-merge-tree) or [CollapsingMergeTree](/docs/en/engines/table-engines/mergetree-family/collapsingmergetree) instead of the default MergeTree table engine.
12
+
:::
13
+
14
+
import DeleteMutations from '@site/docs/en/sql-reference/statements/alter/delete.md';
Partitioning is specified on a table when it is initially defined via the `PARTITION BY` clause. This clause can contain a SQL expression on any columns, the results of which will define which partition a row is sent to.
11
+
12
+
The data parts are logically associated with each partition on disk and can be queried in isolation. For the example below, we partition the `posts` table by year using the expression `toYear(CreationDate)`. As rows are inserted into ClickHouse, this expression will be evaluated against each row and routed to the resulting partition if it exists (if the row is the first for a year, the partition will be created).
ORDER BY (PostTypeId, toDate(CreationDate), CreationDate)
26
+
PARTITION BY toYear(CreationDate)
27
+
```
28
+
29
+
Read about setting the partition expression in a section [How to set the partition expression](/docs/en/sql-reference/statements/alter/partition/#how-to-set-partition-expression).
30
+
31
+
In ClickHouse, users should principally consider partitioning to be a data management feature, not a query optimization technique. By separating data logically based on a key, each partition can be operated on independently e.g. deleted. This allows users to move partitions, and thus subnets, between [storage tiers](/en/integrations/s3#storage-tiers) efficiently on time or [expire data/efficiently delete from the cluster](/en/sql-reference/statements/alter/partition).
32
+
33
+
## Drop Partitions
34
+
35
+
`ALTER TABLE ... DROP PARTITION` provides a cost-efficient way to drop a whole partition.
36
+
37
+
```sql
38
+
ALTERTABLE table_name [ON CLUSTER cluster] DROP PARTITION|PART partition_expr
39
+
```
40
+
41
+
This query tags the partition as inactive and deletes data completely, approximately in 10 minutes. The query is replicated – it deletes data on all replicas.
42
+
43
+
In example, below we remove posts from 2008 for the earlier table by dropping the associated partition.
Truncate allows the data in a table or database to be removed, while preserving their existence. This is a lightweight operation which cannot be reversed.
9
+
10
+
import Truncate from '@site/docs/en/sql-reference/statements/truncate.md';
Update mutations refers to `ALTER` queries that manipulate table data through updates. Most notably they are queries like `ALTER TABLE UPDATE`, etc. Performing such queries will produce new mutated versions of the data parts. This means that such statements would trigger a rewrite of whole data parts for all data that was inserted before the mutation, translating to a large amount of write requests.
9
+
10
+
:::info
11
+
For updates, you can avoid these large amounts of write requests by using specialised table engines like [ReplacingMergeTree](/docs/en/guides/replacing-merge-tree) or [CollapsingMergeTree](/docs/en/engines/table-engines/mergetree-family/collapsingmergetree) instead of the default MergeTree table engine.
12
+
:::
13
+
14
+
import UpdateMutations from '@site/docs/en/sql-reference/statements/alter/update.md';
This section contains tips and best practices for improving performance with ClickHouse. We recommend users read [Core Concepts](/docs/en/parts) as a precursor to this section, which covers the main concepts required to improve performance, especially [Primary Indices](/docs/en/optimize/sparse-primary-indexes).
0 commit comments