ensure no page switching + add core concepts

ClickHouse · Dec 20, 2024 · 0cbbd2c
1 parent 8c1a449

Showing 11 changed files with 166 additions and 32 deletions.
diff --git a/docs/en/guides/best-practices/sparse-primary-indexes.md b/docs/en/guides/best-practices/sparse-primary-indexes.md
@@ -1,6 +1,6 @@
 ---
 slug: /en/optimize/sparse-primary-indexes
-sidebar_label: Sparse Primary Indexes
+sidebar_label: Primary Indexes
 sidebar_position: 1
 description: In this guide we are going to do a deep dive into ClickHouse indexing.
 ---

diff --git a/docs/en/integrations/data-ingestion/dbms/mysql/index.md b/docs/en/integrations/data-ingestion/dbms/mysql/index.md
@@ -1,7 +1,7 @@
 ---
 sidebar_label: MySQL
 sidebar_position: 10
-slug: /en/integrations/mysql
+slug: /en/integrations/connecting-to-mysql
 description: The MySQL table engine allows you to connect ClickHouse to MySQL.
 keywords: [clickhouse, mysql, connect, integrate, table, engine]
 ---

diff --git a/docs/en/integrations/data-sources/mysql.md b/docs/en/integrations/data-sources/mysql.md
@@ -0,0 +1,10 @@
+---
+slug: /en/integrations/mysql
+sidebar_label: MySQL
+title: MySQL
+hide_title: true
+---
+
+import MySQL from '@site/docs/en/integrations/data-ingestion/dbms/mysql/index.md';
+
+<MySQL/>
diff --git a/docs/en/managing-data/core-concepts/parts.md b/docs/en/managing-data/core-concepts/parts.md
@@ -5,10 +5,10 @@ description: What are data parts in ClickHouse
 keywords: [part]
 ---
 
-
-
 ## What are table parts in ClickHouse?
 
+<br/>
+
 The data from each table in the ClickHouse [MergeTree engine family](/docs/en/engines/table-engines/mergetree-family) is organized on disk as a collection of immutable `data parts`. 
 
 To illustrate this, we use this table (adapted from the [UK property prices dataset](/docs/en/getting-started/example-datasets/uk-price-paid)) tracking the date, town, street, and price for sold properties in the United Kingdom:
@@ -30,6 +30,7 @@ ORDER BY (town, street);
 A data part is created whenever a set of rows is inserted into the table. The following diagram sketches this:
 
 <img src={require('./images/part.png').default} alt='INSERT PROCESSING' class='image' style={{width: '100%'}} />
+<br/>
 
 When a ClickHouse server processes the example insert with 4 rows (e.g., via an [INSERT INTO statement](/docs/en/sql-reference/statements/insert-into)) sketched in the diagram above, it performs several steps:
 
@@ -48,6 +49,6 @@ Data parts are self-contained, including all metadata needed to interpret their
 To manage the number of parts per table, a background merge job periodically combines smaller parts into larger ones until they reach a [configurable](/docs/en/operations/settings/merge-tree-settings#max-bytes-to-merge-at-max-space-in-pool) compressed size (typically ~150 GB). Merged parts are marked as inactive and deleted after a [configurable](/docs/en/operations/settings/merge-tree-settings#old-parts-lifetime) time interval. Over time, this process creates a hierarchical structure of merged parts, which is why it’s called a MergeTree table:
 
 <img src={require('./images/merges.png').default} alt='PART MERGES' class='image' style={{width: '100%'}} />
-
+<br/>
 
 To minimize the number of initial parts and the overhead of merges, database clients are [encouraged](https://clickhouse.com/blog/asynchronous-data-inserts-in-clickhouse#data-needs-to-be-batched-for-optimal-performance) to either insert tuples in bulk, e.g. 20,000 rows at once, or to use the [asynchronous insert mode](https://clickhouse.com/blog/asynchronous-data-inserts-in-clickhouse), in which ClickHouse buffers rows from multiple incoming INSERTs into the same table and creates a new part only after the buffer size exceeds a configurable threshold, or a timeout expires.
diff --git a/docs/en/managing-data/delete_mutations.md b/docs/en/managing-data/delete_mutations.md
@@ -0,0 +1,16 @@
+---
+slug: /en/managing-data/delete_mutations
+sidebar_label: Delete Mutations
+title: Delete Mutations
+hide_title: false
+---
+
+Delete mutations refer to `ALTER` queries that manipulate table data through deletes, most notably `ALTER TABLE ... DELETE`. Performing such queries produces new, mutated versions of the data parts. This means that such statements trigger a rewrite of whole data parts for all data inserted before the mutation, translating to a large number of write requests.
+
+:::info
+For deletes, you can avoid this large number of write requests by using specialized table engines like [ReplacingMergeTree](/docs/en/guides/replacing-merge-tree) or [CollapsingMergeTree](/docs/en/engines/table-engines/mergetree-family/collapsingmergetree) instead of the default MergeTree table engine.
+:::
+
+import DeleteMutations from '@site/docs/en/sql-reference/statements/alter/delete.md';
+
+<DeleteMutations/>
diff --git a/docs/en/managing-data/drop_partition.md b/docs/en/managing-data/drop_partition.md
@@ -0,0 +1,76 @@
+---
+slug: /en/managing-data/drop_partition
+sidebar_label: Drop Partition
+title: Dropping Partitions
+hide_title: false
+---
+
+## Background
+
+Partitioning is specified on a table when it is initially defined via the `PARTITION BY` clause. This clause can contain a SQL expression on any columns, the results of which will define which partition a row is sent to.
+
+Data parts are logically associated with a partition on disk, and each partition can be queried in isolation. In the example below, we partition the `posts` table by year using the expression `toYear(CreationDate)`. As rows are inserted into ClickHouse, this expression is evaluated against each row, and the row is routed to the resulting partition if it exists (if the row is the first for a year, the partition is created).
+
+```sql
+CREATE TABLE posts
+(
+	`Id` Int32 CODEC(Delta(4), ZSTD(1)),
+	`PostTypeId` Enum8('Question' = 1, 'Answer' = 2, 'Wiki' = 3, 'TagWikiExcerpt' = 4, 'TagWiki' = 5, 'ModeratorNomination' = 6, 'WikiPlaceholder' = 7, 'PrivilegeWiki' = 8),
+	`AcceptedAnswerId` UInt32,
+	`CreationDate` DateTime64(3, 'UTC'),
+...
+	`ClosedDate` DateTime64(3, 'UTC')
+)
+ENGINE = MergeTree
+ORDER BY (PostTypeId, toDate(CreationDate), CreationDate)
+PARTITION BY toYear(CreationDate)
+```
+
+For details on specifying the partition key, see [How to set the partition expression](/docs/en/sql-reference/statements/alter/partition/#how-to-set-partition-expression).
+
+In ClickHouse, users should principally consider partitioning to be a data management feature, not a query optimization technique. By separating data logically based on a key, each partition can be operated on independently, e.g. deleted. This allows users to efficiently move partitions, and thus subsets of the data, between [storage tiers](/en/integrations/s3#storage-tiers) based on time, or to [expire data/efficiently delete from the cluster](/en/sql-reference/statements/alter/partition).
+
+## Drop Partitions
+
+`ALTER TABLE ... DROP PARTITION` provides a cost-efficient way to drop a whole partition.
+
+```sql
+ALTER TABLE table_name [ON CLUSTER cluster] DROP PARTITION|PART partition_expr
+```
+
+This query tags the partition as inactive and deletes its data completely after approximately 10 minutes. The query is replicated: it deletes the data on all replicas.
+
+In the example below, we remove posts from 2008 from the `posts` table by dropping the associated partition.
+
+```sql
+SELECT DISTINCT partition
+FROM system.parts
+WHERE `table` = 'posts'
+
+┌─partition─┐
+│ 2008      │
+│ 2009      │
+│ 2010      │
+│ 2011      │
+│ 2012      │
+│ 2013      │
+│ 2014      │
+│ 2015      │
+│ 2016      │
+│ 2017      │
+│ 2018      │
+│ 2019      │
+│ 2020      │
+│ 2021      │
+│ 2022      │
+│ 2023      │
+│ 2024      │
+└───────────┘
+
+17 rows in set. Elapsed: 0.002 sec.
+
+ALTER TABLE posts
+DROP PARTITION '2008'
+
+0 rows in set. Elapsed: 0.103 sec.
+```
diff --git a/docs/en/managing-data/truncate.md b/docs/en/managing-data/truncate.md
@@ -0,0 +1,12 @@
+---
+slug: /en/managing-data/truncate
+sidebar_label: Truncate Table
+title: Truncate Table
+hide_title: false
+---
+
+`TRUNCATE` removes all data from a table or database while preserving the table or database itself. This is a lightweight operation that cannot be reversed.
+
+import Truncate from '@site/docs/en/sql-reference/statements/truncate.md';
+
+<Truncate/>
diff --git a/docs/en/managing-data/update_mutations.md b/docs/en/managing-data/update_mutations.md
@@ -0,0 +1,16 @@
+---
+slug: /en/managing-data/update_mutations
+sidebar_label: Update Mutations
+title: Update Mutations
+hide_title: false
+---
+
+Update mutations refer to `ALTER` queries that manipulate table data through updates, most notably `ALTER TABLE ... UPDATE`. Performing such queries produces new, mutated versions of the data parts. This means that such statements trigger a rewrite of whole data parts for all data inserted before the mutation, translating to a large number of write requests.
+
+:::info
+For updates, you can avoid this large number of write requests by using specialized table engines like [ReplacingMergeTree](/docs/en/guides/replacing-merge-tree) or [CollapsingMergeTree](/docs/en/engines/table-engines/mergetree-family/collapsingmergetree) instead of the default MergeTree table engine.
+:::
+
+import UpdateMutations from '@site/docs/en/sql-reference/statements/alter/update.md';
+
+<UpdateMutations/>
diff --git a/docs/en/optimize/index.md b/docs/en/optimize/index.md
@@ -0,0 +1,8 @@
+---
+slug: /en/optimize
+sidebar_label: Overview
+title: Performance and Optimizations
+hide_title: false
+---
+
+This section contains tips and best practices for improving performance with ClickHouse. We recommend that users read [Core Concepts](/docs/en/parts) before this section, as it covers the main concepts required to improve performance, especially [Primary Indexes](/docs/en/optimize/sparse-primary-indexes).
diff --git a/sidebars.js b/sidebars.js
@@ -696,11 +696,7 @@ const sidebars = {
             "en/integrations/data-ingestion/apache-spark/spark-jdbc",
           ],
         },
-        {
-          type: "doc",
-          id: "en/integrations/data-ingestion/dbms/mysql/index",
-          label: "MySQL",
-        },
+        "en/integrations/data-sources/mysql",
         "en/integrations/data-sources/cassandra",
         "en/integrations/data-sources/redis",
         "en/integrations/data-sources/rabbitmq",
@@ -876,18 +872,24 @@ const sidebars = {
   ],
 
   managingData: [
+    {
+      type: "category",
+      label: "Core concepts",
+      collapsed: false,
+      collapsible: false,
+      items: [
+        "en/managing-data/core-concepts/parts",
+        "en/guides/best-practices/sparse-primary-indexes",
+      ]
+    },
     {
       type: "category",
       label: "Updating Data",
       collapsed: false,
       collapsible: false,
       items: [
         "en/managing-data/updates",
-        {
-          type: "link",
-          label: "Update Mutations",
-          href: "/en/sql-reference/statements/alter/update"
-        },
+        "en/managing-data/update_mutations",
         {
           type: "doc",
           label: "Lightweight Updates",
@@ -916,21 +918,9 @@ const sidebars = {
             label: "Lightweight Deletes",
             id: "en/guides/developer/lightweight-delete"
           },
-          {
-            type: "link",
-            label: "Delete Mutations",
-            href: "/en/sql-reference/statements/alter/delete"
-          },
-          {
-            type: "link",
-            label: "Truncate Table",
-            href: "/en/sql-reference/statements/truncate"
-          },
-          {
-            type: "link",
-            label: "Drop Partition",
-            href: "/en/sql-reference/statements/alter/partition#drop-partitionpart"
-          }
+          "en/managing-data/delete_mutations",
+          "en/managing-data/truncate",
+          "en/managing-data/drop_partition",
         ]
       },
       {
@@ -1001,7 +991,7 @@ const sidebars = {
       collapsed: false,
       collapsible: false,
       items: [
-        "en/guides/best-practices/sparse-primary-indexes",
+        "en/optimize/index",
         "en/operations/analyzer",
         "en/guides/best-practices/asyncinserts",
         "en/guides/best-practices/avoidmutations",

diff --git a/src/theme/Navbar/Content/index.js b/src/theme/Navbar/Content/index.js
@@ -127,6 +127,11 @@ const dropdownCategories = [{
     sidebar: 'managingData',
     link: '/docs/en/updating-data',
     menuItems: [
+      {
+        title: 'Core concepts',
+        description: 'Understand core concepts in ClickHouse',
+        link: '/docs/en/parts'
+      },
       {
         title: 'Updating Data',
         description: 'Updating and replacing data in ClickHouse',
@@ -145,7 +150,7 @@ const dropdownCategories = [{
       {
         title: 'Performance and Optimizations',
         description: 'Guides to help you optimize ClickHouse',
-        link: '/docs/en/optimize/sparse-primary-indexes'
+        link: '/docs/en/optimize'
       }
     ]
   },