ensure no page switching + add core concepts
gingerwizard committed Dec 20, 2024
1 parent 8c1a449 commit 0cbbd2c
Showing 11 changed files with 166 additions and 32 deletions.
2 changes: 1 addition & 1 deletion docs/en/guides/best-practices/sparse-primary-indexes.md
@@ -1,6 +1,6 @@
---
slug: /en/optimize/sparse-primary-indexes
sidebar_label: Sparse Primary Indexes
sidebar_label: Primary Indexes
sidebar_position: 1
description: In this guide we are going to do a deep dive into ClickHouse indexing.
---
2 changes: 1 addition & 1 deletion docs/en/integrations/data-ingestion/dbms/mysql/index.md
@@ -1,7 +1,7 @@
---
sidebar_label: MySQL
sidebar_position: 10
slug: /en/integrations/mysql
slug: /en/integrations/connecting-to-mysql
description: The MySQL table engine allows you to connect ClickHouse to MySQL.
keywords: [clickhouse, mysql, connect, integrate, table, engine]
---
10 changes: 10 additions & 0 deletions docs/en/integrations/data-sources/mysql.md
@@ -0,0 +1,10 @@
---
slug: /en/integrations/mysql
sidebar_label: MySQL
title: MySQL
hide_title: true
---

import MySQL from '@site/docs/en/integrations/data-ingestion/dbms/mysql/index.md';

<MySQL/>
7 changes: 4 additions & 3 deletions docs/en/managing-data/core-concepts/parts.md
@@ -5,10 +5,10 @@ description: What are data parts in ClickHouse
keywords: [part]
---



## What are table parts in ClickHouse?

<br/>

The data from each table in the ClickHouse [MergeTree engine family](/docs/en/engines/table-engines/mergetree-family) is organized on disk as a collection of immutable `data parts`.

To illustrate this, we use this table (adapted from the [UK property prices dataset](/docs/en/getting-started/example-datasets/uk-price-paid)) tracking the date, town, street, and price for sold properties in the United Kingdom:
@@ -30,6 +30,7 @@ ORDER BY (town, street);
A data part is created whenever a set of rows is inserted into the table. The following diagram sketches this:

<img src={require('./images/part.png').default} alt='INSERT PROCESSING' class='image' style={{width: '100%'}} />
<br/>

When a ClickHouse server processes the example insert with 4 rows (e.g., via an [INSERT INTO statement](/docs/en/sql-reference/statements/insert-into)) sketched in the diagram above, it performs several steps:

@@ -48,6 +49,6 @@ Data parts are self-contained, including all metadata needed to interpret their
To manage the number of parts per table, a background merge job periodically combines smaller parts into larger ones until they reach a [configurable](/docs/en/operations/settings/merge-tree-settings#max-bytes-to-merge-at-max-space-in-pool) compressed size (typically ~150 GB). Merged parts are marked as inactive and deleted after a [configurable](/docs/en/operations/settings/merge-tree-settings#old-parts-lifetime) time interval. Over time, this process creates a hierarchical structure of merged parts, which is why it’s called a MergeTree table:

<img src={require('./images/merges.png').default} alt='PART MERGES' class='image' style={{width: '100%'}} />

<br/>
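The merge behaviour described above can be approximated with a toy model. The sketch below is an illustrative assumption, not ClickHouse's actual merge selection algorithm: parts are plain objects with a hypothetical `bytes` field, `mergeOnce` and `maxBytes` are invented names, and `maxBytes` stands in for the configurable maximum size of a merged part. As in the text, merged source parts are only marked inactive rather than removed.

```javascript
// Toy model of background merges (illustrative only; not ClickHouse's merge selector).
// A part is { name, bytes, active }. `maxBytes` is a hypothetical stand-in for the
// configurable maximum compressed size of a merged part.
function mergeOnce(parts, maxBytes) {
  const active = parts.filter((p) => p.active);
  // Find the first pair of neighbouring active parts that fits under the size cap.
  for (let i = 0; i + 1 < active.length; i++) {
    const a = active[i];
    const b = active[i + 1];
    if (a.bytes + b.bytes <= maxBytes) {
      // Source parts become inactive; ClickHouse deletes such parts later.
      a.active = false;
      b.active = false;
      const merged = { name: `(${a.name}+${b.name})`, bytes: a.bytes + b.bytes, active: true };
      // Keep the merged part in the position of its sources.
      parts.splice(parts.indexOf(b) + 1, 0, merged);
      return merged;
    }
  }
  return null; // no neighbouring pair fits under maxBytes
}

// Four freshly inserted parts, merged repeatedly with a cap of 60 bytes.
const demoParts = [10, 20, 30, 40].map((bytes, i) => ({ name: `p${i + 1}`, bytes, active: true }));
while (mergeOnce(demoParts, 60) !== null) {
  // keep merging until no neighbouring pair fits under the cap
}
const activeParts = demoParts.filter((p) => p.active);
console.log(activeParts.map((p) => `${p.name}:${p.bytes}`).join(", ")); // prints "((p1+p2)+p3):60, p4:40"
```

The nested names reflect the hierarchy of merged parts, and `p4` stays separate because merging it would exceed the cap.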

To minimize the number of initial parts and the overhead of merges, database clients are [encouraged](https://clickhouse.com/blog/asynchronous-data-inserts-in-clickhouse#data-needs-to-be-batched-for-optimal-performance) to either insert tuples in bulk, e.g. 20,000 rows at once, or to use the [asynchronous insert mode](https://clickhouse.com/blog/asynchronous-data-inserts-in-clickhouse), in which ClickHouse buffers rows from multiple incoming INSERTs into the same table and creates a new part only after the buffer size exceeds a configurable threshold, or a timeout expires.
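The asynchronous insert mode can be pictured as a buffer that is flushed into one part when it grows past a threshold or when a timeout expires. The sketch below is a simplified model under stated assumptions: `AsyncInsertBuffer`, `maxRows`, `timeoutMs` and `tick` are hypothetical names, the threshold is counted in rows rather than bytes, and time is passed in explicitly so the behaviour is deterministic. It is not ClickHouse's implementation.

```javascript
// Simplified model of asynchronous inserts (not ClickHouse's implementation).
// Rows from many small INSERTs are buffered; the buffer becomes ONE new part when it
// holds at least `maxRows` rows or when `timeoutMs` has passed since its first row.
class AsyncInsertBuffer {
  constructor({ maxRows, timeoutMs }) {
    this.maxRows = maxRows;     // hypothetical size threshold (rows, not bytes)
    this.timeoutMs = timeoutMs; // hypothetical flush timeout
    this.buffer = [];
    this.firstRowAt = null;
    this.parts = [];            // each flush produces one part
  }

  // Called for every INSERT; `now` is a timestamp in milliseconds.
  insert(rows, now) {
    if (this.buffer.length === 0) this.firstRowAt = now;
    this.buffer.push(...rows);
    this.tick(now);
  }

  // Called by a (simulated) timer as well as after each insert.
  tick(now) {
    if (this.buffer.length === 0) return;
    const full = this.buffer.length >= this.maxRows;
    const expired = now - this.firstRowAt >= this.timeoutMs;
    if (full || expired) {
      this.parts.push(this.buffer);
      this.buffer = [];
      this.firstRowAt = null;
    }
  }
}

const asyncBuf = new AsyncInsertBuffer({ maxRows: 5, timeoutMs: 200 });
asyncBuf.insert([1, 2], 0);  // buffered
asyncBuf.insert([3, 4], 50); // buffered
asyncBuf.insert([5], 100);   // buffer reaches 5 rows: flushed as one part
asyncBuf.insert([6], 120);   // buffered; timeout clock starts at 120
asyncBuf.tick(400);          // 280 ms since the first buffered row: flushed
console.log(asyncBuf.parts.length); // 2
```

Four small INSERTs end up as two parts: one flushed by the size threshold and one by the timeout.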
16 changes: 16 additions & 0 deletions docs/en/managing-data/delete_mutations.md
@@ -0,0 +1,16 @@
---
slug: /en/managing-data/delete_mutations
sidebar_label: Delete Mutations
title: Delete Mutations
hide_title: false
---

Delete mutations are `ALTER` queries that manipulate table data through deletes, most notably `ALTER TABLE DELETE`. Performing such a query produces new, mutated versions of the data parts. This means these statements trigger a rewrite of whole data parts for all data inserted before the mutation, translating into a large number of write requests.

:::info
For deletes, you can avoid these large amounts of write requests by using specialised table engines like [ReplacingMergeTree](/docs/en/guides/replacing-merge-tree) or [CollapsingMergeTree](/docs/en/engines/table-engines/mergetree-family/collapsingmergetree) instead of the default MergeTree table engine.
:::
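To make the write amplification concrete, the sketch below models a delete mutation as the paragraph above describes it: every existing part is rewritten into a new version without the matching rows. `applyDeleteMutation`, the part objects, and the `_v2`-style part naming are hypothetical; this is not ClickHouse's mutation code.

```javascript
// Simplified model of a delete mutation such as ALTER TABLE ... DELETE WHERE ...
// Each part inserted before the mutation is rewritten as a new, mutated version,
// so the number of rows written depends on part sizes, not on how many rows are deleted.
function applyDeleteMutation(parts, predicate, version) {
  let rowsWritten = 0;
  const mutatedParts = parts.map((part) => {
    const kept = part.rows.filter((row) => !predicate(row));
    rowsWritten += kept.length; // every surviving row is written again
    return { name: `${part.name}_v${version}`, rows: kept };
  });
  return { parts: mutatedParts, rowsWritten };
}

// Two parts of four rows each; delete one row from each part.
const makeRows = (from) => Array.from({ length: 4 }, (_, i) => ({ id: from + i }));
const tableParts = [
  { name: "p1", rows: makeRows(1) }, // ids 1..4
  { name: "p2", rows: makeRows(5) }, // ids 5..8
];
const mutation = applyDeleteMutation(tableParts, (row) => row.id === 1 || row.id === 5, 2);
console.log(mutation.rowsWritten); // 6: deleting 2 rows rewrote 6 others
```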

import DeleteMutations from '@site/docs/en/sql-reference/statements/alter/delete.md';

<DeleteMutations/>
76 changes: 76 additions & 0 deletions docs/en/managing-data/drop_partition.md
@@ -0,0 +1,76 @@
---
slug: /en/managing-data/drop_partition
sidebar_label: Drop Partition
title: Dropping Partitions
hide_title: false
---

## Background

Partitioning is specified when a table is first defined, via the `PARTITION BY` clause. This clause can contain a SQL expression over any columns; its result determines which partition a row is sent to.

Data parts are logically associated with a partition on disk, and each partition can be queried in isolation. In the example below, we partition the `posts` table by year using the expression `toYear(CreationDate)`. As rows are inserted into ClickHouse, this expression is evaluated against each row, and the row is routed to the resulting partition (if the row is the first for a given year, the partition is created).

```sql
CREATE TABLE posts
(
`Id` Int32 CODEC(Delta(4), ZSTD(1)),
`PostTypeId` Enum8('Question' = 1, 'Answer' = 2, 'Wiki' = 3, 'TagWikiExcerpt' = 4, 'TagWiki' = 5, 'ModeratorNomination' = 6, 'WikiPlaceholder' = 7, 'PrivilegeWiki' = 8),
`AcceptedAnswerId` UInt32,
`CreationDate` DateTime64(3, 'UTC'),
...
`ClosedDate` DateTime64(3, 'UTC')
)
ENGINE = MergeTree
ORDER BY (PostTypeId, toDate(CreationDate), CreationDate)
PARTITION BY toYear(CreationDate)
```
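The routing step can be sketched as follows. This is a simplified model, assuming that each INSERT block is split by partition key and that every group becomes one new part in its partition; `insertBlock`, `partitionKey` and the in-memory `Map` are hypothetical stand-ins for the table on disk, and `getUTCFullYear()` plays the role of `toYear(CreationDate)` for the UTC timestamps in this table.

```javascript
// Simplified model of PARTITION BY toYear(CreationDate) (not ClickHouse internals).
// The table is a Map from partition key to the list of parts in that partition.
function partitionKey(row) {
  // Equivalent of toYear(CreationDate) for a UTC timestamp.
  return String(new Date(row.CreationDate).getUTCFullYear());
}

function insertBlock(table, rows) {
  // Evaluate the partition expression for every row and group rows by its result.
  const groups = new Map();
  for (const row of rows) {
    const key = partitionKey(row);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row);
  }
  // Each group becomes one new part; the first row for a year creates the partition.
  for (const [key, partRows] of groups) {
    if (!table.has(key)) table.set(key, []);
    table.get(key).push({ rows: partRows });
  }
  return table;
}

const postsTable = new Map();
insertBlock(postsTable, [
  { Id: 1, CreationDate: "2008-07-31T21:42:52Z" },
  { Id: 2, CreationDate: "2008-08-01T10:00:00Z" },
  { Id: 3, CreationDate: "2009-01-15T08:30:00Z" },
]);
console.log([...postsTable.keys()]); // [ '2008', '2009' ]
```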

Read more about setting the partition expression in the section [How to set the partition expression](/docs/en/sql-reference/statements/alter/partition/#how-to-set-partition-expression).

In ClickHouse, users should principally consider partitioning a data management feature, not a query optimization technique. Because data is separated logically by a key, each partition can be operated on independently, e.g. deleted. This allows users to efficiently move partitions, and thus subsets of the data, between [storage tiers](/en/integrations/s3#storage-tiers) based on time, or to [expire data and delete it efficiently from the cluster](/en/sql-reference/statements/alter/partition).

## Drop Partitions

`ALTER TABLE ... DROP PARTITION` provides a cost-efficient way to drop a whole partition.

```sql
ALTER TABLE table_name [ON CLUSTER cluster] DROP PARTITION|PART partition_expr
```

This query marks the partition as inactive and deletes its data completely after approximately 10 minutes. The query is replicated: it deletes data on all replicas.

In the example below, we remove posts from 2008 in the earlier table by dropping the associated partition.

```sql
SELECT DISTINCT partition
FROM system.parts
WHERE `table` = 'posts';

┌─partition─┐
│ 2008      │
│ 2009      │
│ 2010      │
│ 2011      │
│ 2012      │
│ 2013      │
│ 2014      │
│ 2015      │
│ 2016      │
│ 2017      │
│ 2018      │
│ 2019      │
│ 2020      │
│ 2021      │
│ 2022      │
│ 2023      │
│ 2024      │
└───────────┘

17 rows in set. Elapsed: 0.002 sec.

ALTER TABLE posts
DROP PARTITION '2008';

0 rows in set. Elapsed: 0.103 sec.
```
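Why this is cheap compared with a delete mutation can be shown with the same kind of toy model: dropping a partition removes its parts as whole units and leaves every other partition untouched, so no surviving rows are rewritten. `dropPartition`, the `inactive` list and the `Map`-of-parts table are hypothetical; the model records dropped parts as inactive rather than deleting them, echoing the delayed cleanup described above.

```javascript
// Simplified model of ALTER TABLE ... DROP PARTITION (not ClickHouse internals).
// `table` maps a partition key to its parts; `inactive` collects parts awaiting cleanup.
function dropPartition(table, inactive, key) {
  const parts = table.get(key) ?? [];
  table.delete(key);
  // Whole parts are marked inactive; in ClickHouse they are removed from disk later.
  for (const part of parts) inactive.push({ ...part, partition: key });
  const rowsDropped = parts.reduce((sum, part) => sum + part.rows.length, 0);
  return { partsDropped: parts.length, rowsDropped };
}

const postsByYear = new Map([
  ["2008", [{ rows: [1, 2] }, { rows: [3] }]],
  ["2009", [{ rows: [4, 5, 6] }]],
]);
const inactiveParts = [];
const dropResult = dropPartition(postsByYear, inactiveParts, "2008");
console.log(dropResult); // { partsDropped: 2, rowsDropped: 3 }
```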
12 changes: 12 additions & 0 deletions docs/en/managing-data/truncate.md
@@ -0,0 +1,12 @@
---
slug: /en/managing-data/truncate
sidebar_label: Truncate Table
title: Truncate Table
hide_title: false
---

`TRUNCATE` removes the data in a table or database while preserving the table or database itself. It is a lightweight operation that cannot be reversed.

import Truncate from '@site/docs/en/sql-reference/statements/truncate.md';

<Truncate/>
16 changes: 16 additions & 0 deletions docs/en/managing-data/update_mutations.md
@@ -0,0 +1,16 @@
---
slug: /en/managing-data/update_mutations
sidebar_label: Update Mutations
title: Update Mutations
hide_title: false
---

Update mutations are `ALTER` queries that manipulate table data through updates, most notably `ALTER TABLE UPDATE`. Performing such a query produces new, mutated versions of the data parts. This means these statements trigger a rewrite of whole data parts for all data inserted before the mutation, translating into a large number of write requests.

:::info
For updates, you can avoid these large amounts of write requests by using specialised table engines like [ReplacingMergeTree](/docs/en/guides/replacing-merge-tree) or [CollapsingMergeTree](/docs/en/engines/table-engines/mergetree-family/collapsingmergetree) instead of the default MergeTree table engine.
:::

import UpdateMutations from '@site/docs/en/sql-reference/statements/alter/update.md';

<UpdateMutations/>
8 changes: 8 additions & 0 deletions docs/en/optimize/index.md
@@ -0,0 +1,8 @@
---
slug: /en/optimize
sidebar_label: Overview
title: Performance and Optimizations
hide_title: false
---

This section contains tips and best practices for improving performance with ClickHouse. We recommend that users read [Core Concepts](/docs/en/parts) before this section; it covers the main concepts required to improve performance, especially [Primary Indexes](/docs/en/optimize/sparse-primary-indexes).
42 changes: 16 additions & 26 deletions sidebars.js
@@ -696,11 +696,7 @@ const sidebars = {
"en/integrations/data-ingestion/apache-spark/spark-jdbc",
],
},
{
type: "doc",
id: "en/integrations/data-ingestion/dbms/mysql/index",
label: "MySQL",
},
"en/integrations/data-sources/mysql",
"en/integrations/data-sources/cassandra",
"en/integrations/data-sources/redis",
"en/integrations/data-sources/rabbitmq",
@@ -876,18 +872,24 @@ const sidebars = {
],

managingData: [
{
type: "category",
label: "Core concepts",
collapsed: false,
collapsible: false,
items: [
"en/managing-data/core-concepts/parts",
"en/guides/best-practices/sparse-primary-indexes",
]
},
{
type: "category",
label: "Updating Data",
collapsed: false,
collapsible: false,
items: [
"en/managing-data/updates",
{
type: "link",
label: "Update Mutations",
href: "/en/sql-reference/statements/alter/update"
},
"en/managing-data/update_mutations",
{
type: "doc",
label: "Lightweight Updates",
@@ -916,21 +918,9 @@ const sidebars = {
label: "Lightweight Deletes",
id: "en/guides/developer/lightweight-delete"
},
{
type: "link",
label: "Delete Mutations",
href: "/en/sql-reference/statements/alter/delete"
},
{
type: "link",
label: "Truncate Table",
href: "/en/sql-reference/statements/truncate"
},
{
type: "link",
label: "Drop Partition",
href: "/en/sql-reference/statements/alter/partition#drop-partitionpart"
}
"en/managing-data/delete_mutations",
"en/managing-data/truncate",
"en/managing-data/drop_partition",
]
},
{
@@ -1001,7 +991,7 @@ const sidebars = {
collapsed: false,
collapsible: false,
items: [
"en/guides/best-practices/sparse-primary-indexes",
"en/optimize/index",
"en/operations/analyzer",
"en/guides/best-practices/asyncinserts",
"en/guides/best-practices/avoidmutations",
7 changes: 6 additions & 1 deletion src/theme/Navbar/Content/index.js
@@ -127,6 +127,11 @@ const dropdownCategories = [{
sidebar: 'managingData',
link: '/docs/en/updating-data',
menuItems: [
{
title: 'Core concepts',
description: 'Understand core concepts in ClickHouse',
link: '/docs/en/parts'
},
{
title: 'Updating Data',
description: 'Updating and replacing data in ClickHouse',
@@ -145,7 +150,7 @@ const dropdownCategories = [{
{
title: 'Performance and Optimizations',
description: 'Guides to help you optimize ClickHouse',
link: '/docs/en/optimize/sparse-primary-indexes'
link: '/docs/en/optimize'
}
]
},