diff --git a/docs/en/guides/best-practices/sparse-primary-indexes.md b/docs/en/guides/best-practices/sparse-primary-indexes.md
index 44f9bb72275..47df4660900 100644
--- a/docs/en/guides/best-practices/sparse-primary-indexes.md
+++ b/docs/en/guides/best-practices/sparse-primary-indexes.md
@@ -1,6 +1,6 @@
---
slug: /en/optimize/sparse-primary-indexes
-sidebar_label: Sparse Primary Indexes
+sidebar_label: Primary Indexes
sidebar_position: 1
description: In this guide we are going to do a deep dive into ClickHouse indexing.
---
diff --git a/docs/en/integrations/data-ingestion/dbms/mysql/index.md b/docs/en/integrations/data-ingestion/dbms/mysql/index.md
index eaa22339981..c3d292f783e 100644
--- a/docs/en/integrations/data-ingestion/dbms/mysql/index.md
+++ b/docs/en/integrations/data-ingestion/dbms/mysql/index.md
@@ -1,7 +1,7 @@
---
sidebar_label: MySQL
sidebar_position: 10
-slug: /en/integrations/mysql
+slug: /en/integrations/connecting-to-mysql
description: The MySQL table engine allows you to connect ClickHouse to MySQL.
keywords: [clickhouse, mysql, connect, integrate, table, engine]
---
diff --git a/docs/en/integrations/data-sources/mysql.md b/docs/en/integrations/data-sources/mysql.md
new file mode 100644
index 00000000000..7f6bb47ec98
--- /dev/null
+++ b/docs/en/integrations/data-sources/mysql.md
@@ -0,0 +1,10 @@
+---
+slug: /en/integrations/mysql
+sidebar_label: MySQL
+title: MySQL
+hide_title: true
+---
+
+import MySQL from '@site/docs/en/integrations/data-ingestion/dbms/mysql/index.md';
+
+<MySQL />
diff --git a/docs/en/managing-data/core-concepts/parts.md b/docs/en/managing-data/core-concepts/parts.md
index cb2bb74c445..cec38dcde18 100644
--- a/docs/en/managing-data/core-concepts/parts.md
+++ b/docs/en/managing-data/core-concepts/parts.md
@@ -5,10 +5,10 @@ description: What are data parts in ClickHouse
keywords: [part]
---
-
-
## What are table parts in ClickHouse?
+
+
The data from each table in the ClickHouse [MergeTree engine family](/docs/en/engines/table-engines/mergetree-family) is organized on disk as a collection of immutable `data parts`.
To illustrate this, we use this table (adapted from the [UK property prices dataset](/docs/en/getting-started/example-datasets/uk-price-paid)) tracking the date, town, street, and price for sold properties in the United Kingdom:
@@ -30,6 +30,7 @@ ORDER BY (town, street);
A data part is created whenever a set of rows is inserted into the table. The following diagram sketches this:
+
When a ClickHouse server processes the example insert with 4 rows (e.g., via an [INSERT INTO statement](/docs/en/sql-reference/statements/insert-into)) sketched in the diagram above, it performs several steps:
@@ -48,6 +49,6 @@ Data parts are self-contained, including all metadata needed to interpret their
To manage the number of parts per table, a background merge job periodically combines smaller parts into larger ones until they reach a [configurable](/docs/en/operations/settings/merge-tree-settings#max-bytes-to-merge-at-max-space-in-pool) compressed size (typically ~150 GB). Merged parts are marked as inactive and deleted after a [configurable](/docs/en/operations/settings/merge-tree-settings#old-parts-lifetime) time interval. Over time, this process creates a hierarchical structure of merged parts, which is why it’s called a MergeTree table:
-
+
To minimize the number of initial parts and the overhead of merges, database clients are [encouraged](https://clickhouse.com/blog/asynchronous-data-inserts-in-clickhouse#data-needs-to-be-batched-for-optimal-performance) to either insert tuples in bulk, e.g. 20,000 rows at once, or to use the [asynchronous insert mode](https://clickhouse.com/blog/asynchronous-data-inserts-in-clickhouse), in which ClickHouse buffers rows from multiple incoming INSERTs into the same table and creates a new part only after the buffer size exceeds a configurable threshold, or a timeout expires.
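+
+As a brief sketch of the asynchronous insert mode (the table name and values below are illustrative, not necessarily the exact example table above), a client can opt in per statement via settings:
+
+```sql
+-- illustrative table and values; async_insert buffers rows server-side,
+-- and wait_for_async_insert = 1 returns only once the buffer has been flushed to a part
+INSERT INTO uk_price_paid SETTINGS async_insert = 1, wait_for_async_insert = 1
+VALUES ('2024-07-01', 'London', 'Baker Street', 750000);
+```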
diff --git a/docs/en/managing-data/delete_mutations.md b/docs/en/managing-data/delete_mutations.md
new file mode 100644
index 00000000000..a118428edf4
--- /dev/null
+++ b/docs/en/managing-data/delete_mutations.md
@@ -0,0 +1,16 @@
+---
+slug: /en/managing-data/delete_mutations
+sidebar_label: Delete Mutations
+title: Delete Mutations
+hide_title: false
+---
+
+Delete mutations refer to `ALTER` queries that manipulate table data through deletes, most notably statements such as `ALTER TABLE ... DELETE`. Running such a query produces new, mutated versions of the affected data parts: every part containing rows inserted before the mutation is rewritten in full, which translates into a large number of write requests.
+
+:::info
+For deletes, you can avoid this large volume of write requests by using specialised table engines like [ReplacingMergeTree](/docs/en/guides/replacing-merge-tree) or [CollapsingMergeTree](/docs/en/engines/table-engines/mergetree-family/collapsingmergetree) instead of the default MergeTree table engine.
+:::
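+
+As a minimal sketch (the table, column, and filter below are hypothetical), a delete mutation is issued as follows:
+
+```sql
+-- rewrites every affected data part, removing rows that match the filter
+ALTER TABLE hits DELETE WHERE EventDate < '2022-01-01';
+```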
+
+import DeleteMutations from '@site/docs/en/sql-reference/statements/alter/delete.md';
+
+<DeleteMutations/>
\ No newline at end of file
diff --git a/docs/en/managing-data/drop_partition.md b/docs/en/managing-data/drop_partition.md
new file mode 100644
index 00000000000..4d8cd29e996
--- /dev/null
+++ b/docs/en/managing-data/drop_partition.md
@@ -0,0 +1,76 @@
+---
+slug: /en/managing-data/drop_partition
+sidebar_label: Drop Partition
+title: Dropping Partitions
+hide_title: false
+---
+
+## Background
+
+Partitioning is specified on a table when it is initially defined, via the `PARTITION BY` clause. This clause can contain a SQL expression over one or more columns, the result of which determines which partition a row is sent to.
+
+The data parts on disk are logically associated with a partition and can be queried in isolation. In the example below, we partition the `posts` table by year using the expression `toYear(CreationDate)`. As rows are inserted into ClickHouse, this expression is evaluated against each row, and the row is routed to the resulting partition (if the row is the first for a year, the partition is created).
+
+```sql
+CREATE TABLE posts
+(
+ `Id` Int32 CODEC(Delta(4), ZSTD(1)),
+ `PostTypeId` Enum8('Question' = 1, 'Answer' = 2, 'Wiki' = 3, 'TagWikiExcerpt' = 4, 'TagWiki' = 5, 'ModeratorNomination' = 6, 'WikiPlaceholder' = 7, 'PrivilegeWiki' = 8),
+ `AcceptedAnswerId` UInt32,
+ `CreationDate` DateTime64(3, 'UTC'),
+...
+ `ClosedDate` DateTime64(3, 'UTC')
+)
+ENGINE = MergeTree
+ORDER BY (PostTypeId, toDate(CreationDate), CreationDate)
+PARTITION BY toYear(CreationDate)
+```
+
+Read about setting the partition expression in the section [How to set the partition expression](/docs/en/sql-reference/statements/alter/partition/#how-to-set-partition-expression).
+
+In ClickHouse, users should principally consider partitioning to be a data management feature, not a query optimization technique. By separating data logically based on a key, each partition can be operated on independently, e.g. deleted. This allows users to move partitions, and thus subsets of the data, between [storage tiers](/en/integrations/s3#storage-tiers) efficiently over time, or to [expire data and efficiently delete it from the cluster](/en/sql-reference/statements/alter/partition).
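+
+For illustration, a single partition can be moved to another storage tier with one statement. The sketch below assumes a storage policy containing a volume named `cold`, which is not part of the example table definition above:
+
+```sql
+-- move the 2008 partition to the (assumed) 'cold' volume of the table's storage policy
+ALTER TABLE posts MOVE PARTITION '2008' TO VOLUME 'cold'
+```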
+
+## Drop Partitions
+
+`ALTER TABLE ... DROP PARTITION` provides a cost-efficient way to drop a whole partition.
+
+```sql
+ALTER TABLE table_name [ON CLUSTER cluster] DROP PARTITION|PART partition_expr
+```
+
+This query tags the partition as inactive and deletes the data completely, in approximately 10 minutes. The query is replicated: it deletes the data on all replicas.
+
+In the example below, we remove the posts from 2008 from the earlier table by dropping the associated partition.
+
+```sql
+SELECT DISTINCT partition
+FROM system.parts
+WHERE `table` = 'posts'
+
+┌─partition─┐
+│ 2008 │
+│ 2009 │
+│ 2010 │
+│ 2011 │
+│ 2012 │
+│ 2013 │
+│ 2014 │
+│ 2015 │
+│ 2016 │
+│ 2017 │
+│ 2018 │
+│ 2019 │
+│ 2020 │
+│ 2021 │
+│ 2022 │
+│ 2023 │
+│ 2024 │
+└───────────┘
+
+17 rows in set. Elapsed: 0.002 sec.
+
+ ALTER TABLE posts
+ (DROP PARTITION '2008')
+
+0 rows in set. Elapsed: 0.103 sec.
+```
diff --git a/docs/en/managing-data/truncate.md b/docs/en/managing-data/truncate.md
new file mode 100644
index 00000000000..7c58076ecd6
--- /dev/null
+++ b/docs/en/managing-data/truncate.md
@@ -0,0 +1,12 @@
+---
+slug: /en/managing-data/truncate
+sidebar_label: Truncate Table
+title: Truncate Table
+hide_title: false
+---
+
+Truncate allows the data in a table or database to be removed, while preserving the existence of the table or database itself. It is a lightweight operation that cannot be reversed.
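+
+For example, assuming a table named `hits` exists:
+
+```sql
+-- removes all rows from the table; the table definition itself remains
+TRUNCATE TABLE hits;
+```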
+
+import Truncate from '@site/docs/en/sql-reference/statements/truncate.md';
+
+<Truncate/>
diff --git a/docs/en/managing-data/update_mutations.md b/docs/en/managing-data/update_mutations.md
new file mode 100644
index 00000000000..b24c7a0b8b4
--- /dev/null
+++ b/docs/en/managing-data/update_mutations.md
@@ -0,0 +1,16 @@
+---
+slug: /en/managing-data/update_mutations
+sidebar_label: Update Mutations
+title: Update Mutations
+hide_title: false
+---
+
+Update mutations refer to `ALTER` queries that manipulate table data through updates, most notably statements such as `ALTER TABLE ... UPDATE`. Running such a query produces new, mutated versions of the affected data parts: every part containing rows inserted before the mutation is rewritten in full, which translates into a large number of write requests.
+
+:::info
+For updates, you can avoid this large volume of write requests by using specialised table engines like [ReplacingMergeTree](/docs/en/guides/replacing-merge-tree) or [CollapsingMergeTree](/docs/en/engines/table-engines/mergetree-family/collapsingmergetree) instead of the default MergeTree table engine.
+:::
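+
+As a minimal sketch (the table, columns, and filter below are hypothetical), an update mutation is issued as follows:
+
+```sql
+-- rewrites every affected data part, replacing the Title value in rows matching the filter
+ALTER TABLE hits UPDATE Title = 'Updated' WHERE EventDate < '2022-01-01';
+```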
+
+import UpdateMutations from '@site/docs/en/sql-reference/statements/alter/update.md';
+
+<UpdateMutations/>
\ No newline at end of file
diff --git a/docs/en/optimize/index.md b/docs/en/optimize/index.md
new file mode 100644
index 00000000000..54899f025a5
--- /dev/null
+++ b/docs/en/optimize/index.md
@@ -0,0 +1,8 @@
+---
+slug: /en/optimize
+sidebar_label: Overview
+title: Performance and Optimizations
+hide_title: false
+---
+
+This section contains tips and best practices for improving performance with ClickHouse. We recommend users read [Core Concepts](/docs/en/parts) as a precursor to this section; it covers the main concepts required to improve performance, especially [Primary Indices](/docs/en/optimize/sparse-primary-indexes).
diff --git a/sidebars.js b/sidebars.js
index dfb9d9234d3..b56b3cb5492 100644
--- a/sidebars.js
+++ b/sidebars.js
@@ -696,11 +696,7 @@ const sidebars = {
"en/integrations/data-ingestion/apache-spark/spark-jdbc",
],
},
- {
- type: "doc",
- id: "en/integrations/data-ingestion/dbms/mysql/index",
- label: "MySQL",
- },
+ "en/integrations/data-sources/mysql",
"en/integrations/data-sources/cassandra",
"en/integrations/data-sources/redis",
"en/integrations/data-sources/rabbitmq",
@@ -876,6 +872,16 @@ const sidebars = {
],
managingData: [
+ {
+ type: "category",
+ label: "Core concepts",
+ collapsed: false,
+ collapsible: false,
+ items: [
+ "en/managing-data/core-concepts/parts",
+ "en/guides/best-practices/sparse-primary-indexes",
+ ]
+ },
{
type: "category",
label: "Updating Data",
@@ -883,11 +889,7 @@ const sidebars = {
collapsible: false,
items: [
"en/managing-data/updates",
- {
- type: "link",
- label: "Update Mutations",
- href: "/en/sql-reference/statements/alter/update"
- },
+ "en/managing-data/update_mutations",
{
type: "doc",
label: "Lightweight Updates",
@@ -916,21 +918,9 @@ const sidebars = {
label: "Lightweight Deletes",
id: "en/guides/developer/lightweight-delete"
},
- {
- type: "link",
- label: "Delete Mutations",
- href: "/en/sql-reference/statements/alter/delete"
- },
- {
- type: "link",
- label: "Truncate Table",
- href: "/en/sql-reference/statements/truncate"
- },
- {
- type: "link",
- label: "Drop Partition",
- href: "/en/sql-reference/statements/alter/partition#drop-partitionpart"
- }
+ "en/managing-data/delete_mutations",
+ "en/managing-data/truncate",
+ "en/managing-data/drop_partition",
]
},
{
@@ -1001,7 +991,7 @@ const sidebars = {
collapsed: false,
collapsible: false,
items: [
- "en/guides/best-practices/sparse-primary-indexes",
+ "en/optimize/index",
"en/operations/analyzer",
"en/guides/best-practices/asyncinserts",
"en/guides/best-practices/avoidmutations",
diff --git a/src/theme/Navbar/Content/index.js b/src/theme/Navbar/Content/index.js
index a84999c6ac2..8741d9c2c1f 100644
--- a/src/theme/Navbar/Content/index.js
+++ b/src/theme/Navbar/Content/index.js
@@ -127,6 +127,11 @@ const dropdownCategories = [{
sidebar: 'managingData',
link: '/docs/en/updating-data',
menuItems: [
+ {
+ title: 'Core concepts',
+ description: 'Understand core concepts in ClickHouse',
+ link: '/docs/en/parts'
+ },
{
title: 'Updating Data',
description: 'Updating and replacing data in ClickHouse',
@@ -145,7 +150,7 @@ const dropdownCategories = [{
{
title: 'Performance and Optimizations',
description: 'Guides to help you optimize ClickHouse',
- link: '/docs/en/optimize/sparse-primary-indexes'
+ link: '/docs/en/optimize'
}
]
},