Commit 0cbbd2c

committed
ensure no page switching + add core concepts
1 parent 8c1a449 commit 0cbbd2c

11 files changed

+166
-32
lines changed

docs/en/guides/best-practices/sparse-primary-indexes.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
11
---
22
slug: /en/optimize/sparse-primary-indexes
3-
sidebar_label: Sparse Primary Indexes
3+
sidebar_label: Primary Indexes
44
sidebar_position: 1
55
description: In this guide we are going to do a deep dive into ClickHouse indexing.
66
---

docs/en/integrations/data-ingestion/dbms/mysql/index.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
11
---
22
sidebar_label: MySQL
33
sidebar_position: 10
4-
slug: /en/integrations/mysql
4+
slug: /en/integrations/connecting-to-mysql
55
description: The MySQL table engine allows you to connect ClickHouse to MySQL.
66
keywords: [clickhouse, mysql, connect, integrate, table, engine]
77
---
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
1+
---
2+
slug: /en/integrations/mysql
3+
sidebar_label: MySQL
4+
title: MySQL
5+
hide_title: true
6+
---
7+
8+
import MySQL from '@site/docs/en/integrations/data-ingestion/dbms/mysql/index.md';
9+
10+
<MySQL/>

docs/en/managing-data/core-concepts/parts.md

Lines changed: 4 additions & 3 deletions
@@ -5,10 +5,10 @@ description: What are data parts in ClickHouse
55
keywords: [part]
66
---
77

8-
9-
108
## What are table parts in ClickHouse?
119

10+
<br/>
11+
1212
The data from each table in the ClickHouse [MergeTree engine family](/docs/en/engines/table-engines/mergetree-family) is organized on disk as a collection of immutable `data parts`.
1313

1414
To illustrate this, we use this table (adapted from the [UK property prices dataset](/docs/en/getting-started/example-datasets/uk-price-paid)) tracking the date, town, street, and price for sold properties in the United Kingdom:
@@ -30,6 +30,7 @@ ORDER BY (town, street);
3030
A data part is created whenever a set of rows is inserted into the table. The following diagram sketches this:
3131

3232
<img src={require('./images/part.png').default} alt='INSERT PROCESSING' class='image' style={{width: '100%'}} />
33+
<br/>
3334

3435
When a ClickHouse server processes the example insert with 4 rows (e.g., via an [INSERT INTO statement](/docs/en/sql-reference/statements/insert-into)) sketched in the diagram above, it performs several steps:
3536

@@ -48,6 +49,6 @@ Data parts are self-contained, including all metadata needed to interpret their
4849
To manage the number of parts per table, a background merge job periodically combines smaller parts into larger ones until they reach a [configurable](/docs/en/operations/settings/merge-tree-settings#max-bytes-to-merge-at-max-space-in-pool) compressed size (typically ~150 GB). Merged parts are marked as inactive and deleted after a [configurable](/docs/en/operations/settings/merge-tree-settings#old-parts-lifetime) time interval. Over time, this process creates a hierarchical structure of merged parts, which is why it’s called a MergeTree table:
4950

5051
<img src={require('./images/merges.png').default} alt='PART MERGES' class='image' style={{width: '100%'}} />
51-
52+
<br/>
5253

5354
To minimize the number of initial parts and the overhead of merges, database clients are [encouraged](https://clickhouse.com/blog/asynchronous-data-inserts-in-clickhouse#data-needs-to-be-batched-for-optimal-performance) to either insert tuples in bulk, e.g. 20,000 rows at once, or to use the [asynchronous insert mode](https://clickhouse.com/blog/asynchronous-data-inserts-in-clickhouse), in which ClickHouse buffers rows from multiple incoming INSERTs into the same table and creates a new part only after the buffer size exceeds a configurable threshold, or a timeout expires.
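
As a rough sketch of the two insert patterns described above, the statements below send one multi-row batch and then one asynchronous insert. The table name `uk_price_paid_simple`, the column list, and the values are assumptions for illustration (the table definition is not visible in this diff); `async_insert` and `wait_for_async_insert` are existing ClickHouse settings.

```sql
-- Assumed table and columns (date, town, street, price); adjust to the real schema.

-- Pattern 1: one INSERT carrying many rows, so that ClickHouse typically
-- creates one part for the whole batch instead of one part per row.
INSERT INTO uk_price_paid_simple (date, town, street, price) VALUES
    ('2024-01-05', 'London', 'Baker Street', 750000),
    ('2024-01-06', 'Leeds', 'Park Row', 310000),
    ('2024-01-07', 'Bristol', 'Queen Square', 420000);

-- Pattern 2: asynchronous insert. The server buffers rows from many small
-- INSERTs and writes a part when the buffer threshold or timeout is reached.
-- wait_for_async_insert = 1 makes the client wait until the buffer is flushed.
INSERT INTO uk_price_paid_simple (date, town, street, price)
SETTINGS async_insert = 1, wait_for_async_insert = 1
VALUES ('2024-01-08', 'York', 'Stonegate', 280000);
```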
Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,16 @@
1+
---
2+
slug: /en/managing-data/delete_mutations
3+
sidebar_label: Delete Mutations
4+
title: Delete Mutations
5+
hide_title: false
6+
---
7+
8+
Delete mutations are `ALTER` queries that manipulate table data by deleting rows, most notably `ALTER TABLE DELETE`. Running such a query produces new, mutated versions of the affected data parts. This means that these statements trigger a rewrite of whole data parts for all data that was inserted before the mutation, which translates into a large number of write requests.
9+
10+
:::info
11+
For deletes, you can avoid this large number of write requests by using specialised table engines like [ReplacingMergeTree](/docs/en/guides/replacing-merge-tree) or [CollapsingMergeTree](/docs/en/engines/table-engines/mergetree-family/collapsingmergetree) instead of the default MergeTree table engine.
12+
:::
13+
14+
import DeleteMutations from '@site/docs/en/sql-reference/statements/alter/delete.md';
15+
16+
<DeleteMutations/>
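
A minimal sketch of a delete mutation and how its progress might be checked. The table `posts` and its `CreationDate` column are assumptions borrowed for illustration; `system.mutations` is the ClickHouse system table that tracks mutations.

```sql
-- Assumed table: posts, with a CreationDate column.
-- Issue a delete mutation; by default it is queued and executed asynchronously.
ALTER TABLE posts DELETE WHERE toYear(CreationDate) < 2009;

-- Inspect the mutation: is_done = 1 once every affected part has been rewritten,
-- parts_to_do shows how many parts are still waiting to be rewritten.
SELECT mutation_id, command, is_done, parts_to_do
FROM system.mutations
WHERE `table` = 'posts'
ORDER BY create_time DESC;
```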
Lines changed: 76 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,76 @@
1+
---
2+
slug: /en/managing-data/drop_partition
3+
sidebar_label: Drop Partition
4+
title: Dropping Partitions
5+
hide_title: false
6+
---
7+
8+
## Background
9+
10+
Partitioning is specified on a table when it is initially defined via the `PARTITION BY` clause. This clause can contain a SQL expression on any columns, the results of which will define which partition a row is sent to.
11+
12+
The data parts are logically associated with each partition on disk and can be queried in isolation. For the example below, we partition the `posts` table by year using the expression `toYear(CreationDate)`. As rows are inserted into ClickHouse, this expression is evaluated against each row, and the row is routed to the resulting partition (if the row is the first for a year, the partition is created).
13+
14+
```sql
15+
CREATE TABLE posts
16+
(
17+
`Id` Int32 CODEC(Delta(4), ZSTD(1)),
18+
`PostTypeId` Enum8('Question' = 1, 'Answer' = 2, 'Wiki' = 3, 'TagWikiExcerpt' = 4, 'TagWiki' = 5, 'ModeratorNomination' = 6, 'WikiPlaceholder' = 7, 'PrivilegeWiki' = 8),
19+
`AcceptedAnswerId` UInt32,
20+
`CreationDate` DateTime64(3, 'UTC'),
21+
...
22+
`ClosedDate` DateTime64(3, 'UTC')
23+
)
24+
ENGINE = MergeTree
25+
ORDER BY (PostTypeId, toDate(CreationDate), CreationDate)
26+
PARTITION BY toYear(CreationDate)
27+
```
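
To see how rows are distributed across the partitions this table definition produces, something like the following could be used. It is only a sketch against the `posts` table above; `_partition_id` is a virtual column that MergeTree tables expose, and the aliases are illustrative.

```sql
-- Count rows per year by re-evaluating the PARTITION BY expression.
SELECT toYear(CreationDate) AS year, count() AS row_count
FROM posts
GROUP BY year
ORDER BY year;

-- Equivalent view using the partition each row is actually stored in.
SELECT _partition_id, count() AS row_count
FROM posts
GROUP BY _partition_id
ORDER BY _partition_id;
```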
28+
29+
To learn how to set the partition expression, see [How to set the partition expression](/docs/en/sql-reference/statements/alter/partition/#how-to-set-partition-expression).
30+
31+
In ClickHouse, users should principally consider partitioning to be a data management feature, not a query optimization technique. By separating data logically based on a key, each partition can be operated on independently, e.g. deleted. This allows users to move partitions, and thus subsets of the data, between [storage tiers](/en/integrations/s3#storage-tiers) efficiently over time, or to [expire data/efficiently delete from the cluster](/en/sql-reference/statements/alter/partition).
32+
33+
## Drop Partitions
34+
35+
`ALTER TABLE ... DROP PARTITION` provides a cost-efficient way to drop a whole partition.
36+
37+
``` sql
38+
ALTER TABLE table_name [ON CLUSTER cluster] DROP PARTITION|PART partition_expr
39+
```
40+
41+
This query tags the partition as inactive and deletes the data completely after approximately 10 minutes. The query is replicated – it deletes data on all replicas.
42+
43+
In the example below, we remove posts from 2008 from the earlier table by dropping the associated partition.
44+
45+
```sql
46+
SELECT DISTINCT partition
47+
FROM system.parts
48+
WHERE `table` = 'posts'
49+
50+
┌─partition─┐
51+
2008
52+
2009
53+
2010
54+
2011
55+
2012
56+
2013
57+
2014
58+
2015
59+
2016
60+
2017
61+
2018
62+
2019
63+
2020
64+
2021
65+
2022
66+
2023
67+
2024
68+
└───────────┘
69+
70+
17 rows in set. Elapsed: 0.002 sec.
71+
72+
ALTER TABLE posts
73+
(DROP PARTITION '2008')
74+
75+
0 rows in set. Elapsed: 0.103 sec.
76+
```
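
One way to confirm the drop, sketched here against the same `posts` table, is to list only the active parts per partition. The aliases `part_count` and `total_rows` are illustrative; `active` and `rows` are columns of `system.parts`.

```sql
-- After the drop, partition 2008 should no longer appear among the active parts.
SELECT
    partition,
    count() AS part_count,
    sum(rows) AS total_rows
FROM system.parts
WHERE `table` = 'posts' AND active
GROUP BY partition
ORDER BY partition;
```

Filtering on `active` matters because parts that have been merged away or dropped can remain listed in `system.parts` as inactive until they are physically removed.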

docs/en/managing-data/truncate.md

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,12 @@
1+
---
2+
slug: /en/managing-data/truncate
3+
sidebar_label: Truncate Table
4+
title: Truncate Table
5+
hide_title: false
6+
---
7+
8+
Truncate removes all data from a table or database while preserving the table or database itself. It is a lightweight operation that cannot be reversed.
9+
10+
import Truncate from '@site/docs/en/sql-reference/statements/truncate.md';
11+
12+
<Truncate/>
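
A short sketch of truncating a table, assuming a table named `posts` exists (the name is illustrative only):

```sql
-- Remove all rows; the table definition itself is kept.
TRUNCATE TABLE IF EXISTS posts;

-- The table can still be queried afterwards and is now empty.
SELECT count() FROM posts;
```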
Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,16 @@
1+
---
2+
slug: /en/managing-data/update_mutations
3+
sidebar_label: Update Mutations
4+
title: Update Mutations
5+
hide_title: false
6+
---
7+
8+
Update mutations are `ALTER` queries that manipulate table data by updating rows, most notably `ALTER TABLE UPDATE`. Running such a query produces new, mutated versions of the affected data parts. This means that these statements trigger a rewrite of whole data parts for all data that was inserted before the mutation, which translates into a large number of write requests.
9+
10+
:::info
11+
For updates, you can avoid this large number of write requests by using specialised table engines like [ReplacingMergeTree](/docs/en/guides/replacing-merge-tree) or [CollapsingMergeTree](/docs/en/engines/table-engines/mergetree-family/collapsingmergetree) instead of the default MergeTree table engine.
12+
:::
13+
14+
import UpdateMutations from '@site/docs/en/sql-reference/statements/alter/update.md';
15+
16+
<UpdateMutations/>
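
A minimal sketch of an update mutation, assuming a `posts` table with `AcceptedAnswerId` and `CreationDate` columns (names used for illustration). The updated column must not be part of the primary or partition key. `mutations_sync` is an existing setting; setting it to 1 makes the statement wait until the mutation has finished on the current server.

```sql
-- Wait for the mutation to complete instead of returning immediately.
SET mutations_sync = 1;

-- Rewrite the affected parts with the new column value.
ALTER TABLE posts
    UPDATE AcceptedAnswerId = 0
    WHERE toYear(CreationDate) = 2008;
```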

docs/en/optimize/index.md

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
1+
---
2+
slug: /en/optimize
3+
sidebar_label: Overview
4+
title: Performance and Optimizations
5+
hide_title: false
6+
---
7+
8+
This section contains tips and best practices for improving performance with ClickHouse. We recommend users read [Core Concepts](/docs/en/parts) as a precursor to this section, which covers the main concepts required to improve performance, especially [Primary Indices](/docs/en/optimize/sparse-primary-indexes).

sidebars.js

Lines changed: 16 additions & 26 deletions
@@ -696,11 +696,7 @@ const sidebars = {
696696
"en/integrations/data-ingestion/apache-spark/spark-jdbc",
697697
],
698698
},
699-
{
700-
type: "doc",
701-
id: "en/integrations/data-ingestion/dbms/mysql/index",
702-
label: "MySQL",
703-
},
699+
"en/integrations/data-sources/mysql",
704700
"en/integrations/data-sources/cassandra",
705701
"en/integrations/data-sources/redis",
706702
"en/integrations/data-sources/rabbitmq",
@@ -876,18 +872,24 @@ const sidebars = {
876872
],
877873

878874
managingData: [
875+
{
876+
type: "category",
877+
label: "Core concepts",
878+
collapsed: false,
879+
collapsible: false,
880+
items: [
881+
"en/managing-data/core-concepts/parts",
882+
"en/guides/best-practices/sparse-primary-indexes",
883+
]
884+
},
879885
{
880886
type: "category",
881887
label: "Updating Data",
882888
collapsed: false,
883889
collapsible: false,
884890
items: [
885891
"en/managing-data/updates",
886-
{
887-
type: "link",
888-
label: "Update Mutations",
889-
href: "/en/sql-reference/statements/alter/update"
890-
},
892+
"en/managing-data/update_mutations",
891893
{
892894
type: "doc",
893895
label: "Lightweight Updates",
@@ -916,21 +918,9 @@ const sidebars = {
916918
label: "Lightweight Deletes",
917919
id: "en/guides/developer/lightweight-delete"
918920
},
919-
{
920-
type: "link",
921-
label: "Delete Mutations",
922-
href: "/en/sql-reference/statements/alter/delete"
923-
},
924-
{
925-
type: "link",
926-
label: "Truncate Table",
927-
href: "/en/sql-reference/statements/truncate"
928-
},
929-
{
930-
type: "link",
931-
label: "Drop Partition",
932-
href: "/en/sql-reference/statements/alter/partition#drop-partitionpart"
933-
}
921+
"en/managing-data/delete_mutations",
922+
"en/managing-data/truncate",
923+
"en/managing-data/drop_partition",
934924
]
935925
},
936926
{
@@ -1001,7 +991,7 @@ const sidebars = {
1001991
collapsed: false,
1002992
collapsible: false,
1003993
items: [
1004-
"en/guides/best-practices/sparse-primary-indexes",
994+
"en/optimize/index",
1005995
"en/operations/analyzer",
1006996
"en/guides/best-practices/asyncinserts",
1007997
"en/guides/best-practices/avoidmutations",
