[doc] Move CDC to Develop #25289

Merged on Jan 3, 2025 (15 commits)
Original file line number Diff line number Diff line change
@@ -20,9 +20,9 @@ Change data capture (CDC) in YugabyteDB provides technology to ensure that any c

CDC using PostgreSQL protocol in YugabyteDB is based on the PostgreSQL Logical Replication model. The fundamental concept is that of the Replication Slot. A Replication Slot represents a stream of changes that can be replayed to the client in the order they were made on the origin server in a manner that preserves transactional consistency. This is the basis for the support for Transactional CDC in YugabyteDB. Where the strict requirements of Transactional CDC are not present, multiple replication slots can be used to stream changes from unrelated tables in parallel.

{{<lead link="../../../explore/change-data-capture/">}}
{{<lead link="../../../develop/change-data-capture/">}}

See [Change data capture](../../../explore/change-data-capture/) in Explore for more details and limitations.
See [Change data capture](../../../develop/change-data-capture/) for more details and limitations.

{{</lead>}}

@@ -79,13 +79,13 @@ The walsender sends changes to the output plugin, which filters them according t
<!--TODO (Siddharth): Fix the Links to the protocol section.

{{< note title="Note" >}}
Refer to [Replication Protocol](../../../explore/change-data-capture/using-logical-replication/#streaming-protocol) for more details.
Refer to [Replication Protocol](../../../develop/change-data-capture/using-logical-replication/#streaming-protocol) for more details.

{{< /note >}}

{{< tip title="Explore" >}}

See [Getting Started with Logical Replication](../../../explore/change-data-capture/using-logical-replication/getting-started/) to set up Logical Replication in YugabyteDB.
See [Getting Started with Logical Replication](../../../develop/change-data-capture/using-logical-replication/getting-started/) to set up Logical Replication in YugabyteDB.

{{< /tip >}}
-->
Original file line number Diff line number Diff line change
@@ -27,9 +27,9 @@ Every YB-TServer has a `CDC service` that is stateless. The main APIs provided b

![Stateless CDC Service](/images/architecture/stateless_cdc_service.png)

{{<lead link="../../../explore/change-data-capture/">}}
{{<lead link="../../../develop/change-data-capture/">}}

See [Change data capture](../../../explore/change-data-capture/) in Explore for more details and limitations.
See [Change data capture](../../../develop/change-data-capture/) for more details and limitations.

{{</lead>}}

Original file line number Diff line number Diff line change
@@ -9,11 +9,12 @@ tags:
menu:
preview:
identifier: explore-change-data-capture
parent: explore
weight: 280
parent: develop
weight: 575
type: indexpage
---
In databases, change data capture (CDC) is a set of software design patterns used to determine and track the data that has changed so that action can be taken using the changed data. CDC is beneficial in a number of scenarios:

Change data capture (CDC) is used to determine and track the data that has changed so that action can be taken using the changed data. CDC is used in a number of scenarios:

- **Microservice-oriented architectures**: Some microservices require a stream of changes to the data, and using CDC in YugabyteDB can provide consumable data changes to CDC subscribers.

Original file line number Diff line number Diff line change
@@ -6,6 +6,8 @@ description: CDC using YugabyteDB PostgreSQL replication protocol.
headcontent: Capture changes made to data in the database
tags:
feature: early-access
aliases:
- /preview/explore/change-data-capture/using-logical-replication/
menu:
preview:
identifier: explore-change-data-capture-logical-replication
Original file line number Diff line number Diff line change
@@ -4,6 +4,8 @@ headerTitle: Advanced configuration
linkTitle: Advanced configuration
description: Advanced Configurations for Logical Replication.
headcontent: Tune your CDC configuration
aliases:
- /preview/explore/change-data-capture/using-logical-replication/advanced-configuration/
menu:
preview:
parent: explore-change-data-capture-logical-replication
Original file line number Diff line number Diff line change
@@ -3,6 +3,8 @@ title: Advanced topics
headerTitle: Advanced topics
linkTitle: Advanced topics
description: Advanced topics for Change Data Capture in YugabyteDB.
aliases:
- /preview/explore/change-data-capture/using-logical-replication/advanced-topic/
menu:
preview:
parent: explore-change-data-capture-logical-replication
@@ -17,7 +19,7 @@ This section explores a range of topics designed to provide deeper insights and

A change in the schema of the tables (ALTER TABLE) being streamed is transparently handled by the database without manual intervention.

This is illustrated in the following example. The client used for the example is [pg_recvlogical](../get-started/#get-started-with-pg-recvlogical).
This is illustrated in the following example. The client used for the example is [pg_recvlogical](../../../../explore/change-data-capture/#try-it-out).

1. Create a table and create the replication slot. pg_recvlogical uses the test_decoding output plugin by default.

Original file line number Diff line number Diff line change
@@ -3,6 +3,8 @@ title: Best Practices for logical replication
headerTitle: Best practices
linkTitle: Best practices
description: Best Practices for logical replication with Change Data Capture in YugabyteDB.
aliases:
- /preview/explore/change-data-capture/using-logical-replication/best-practices/
menu:
preview:
parent: explore-change-data-capture-logical-replication
Original file line number Diff line number Diff line change
@@ -4,6 +4,8 @@ headerTitle: Get started
linkTitle: Get started
description: Get started with Change Data Capture in YugabyteDB.
headcontent: Start using CDC with logical replication
aliases:
- /preview/explore/change-data-capture/using-logical-replication/get-started/
menu:
preview:
parent: explore-change-data-capture-logical-replication
@@ -12,145 +14,16 @@ menu:
type: docs
---

To get started streaming data change events from a YugabyteDB database using a replication slot, you can use either of the following client options:
Use the following steps to get started streaming data change events from a YugabyteDB database using a replication slot and the YugabyteDB connector.

- [pg_recvlogical](#get-started-with-pg-recvlogical)
- [YugabyteDB connector](#get-started-with-yugabytedb-connector)
For an example of logical replication using the pg_recvlogical utility, see [Change data capture](../../../../explore/change-data-capture/).

{{< note title="Note" >}}

CDC via logical replication is supported in YugabyteDB starting from version 2024.1.1.

{{< /note >}}

## Get started with pg_recvlogical

pg_recvlogical is a command-line tool provided by PostgreSQL for interacting with the logical replication feature. It is specifically used to receive changes from the database using logical replication slots.

YugabyteDB provides the pg_recvlogical binary in the `<yugabyte-db-dir>/postgres/bin/` directory. This binary is based on PostgreSQL 11.2. Although PostgreSQL also offers a pg_recvlogical binary, you are strongly advised to use the YugabyteDB version to avoid compatibility issues.

### Set up pg_recvlogical

To set up pg_recvlogical, create and start the local cluster by running the following command from your YugabyteDB home directory:

```sh
./bin/yugabyted start \
--advertise_address=127.0.0.1 \
--base_dir="${HOME}/var/node1" \
--tserver_flags="allowed_preview_flags_csv={cdcsdk_enable_dynamic_table_support},cdcsdk_enable_dynamic_table_support=true,cdcsdk_publication_list_refresh_interval_secs=2"
```

#### Create tables

1. Use ysqlsh to connect to the default `yugabyte` database with the default superuser `yugabyte`, as follows:

```sh
bin/ysqlsh -h 127.0.0.1 -U yugabyte -d yugabyte
```

1. In the `yugabyte` database, create a table `employees`.

```sql
CREATE TABLE employees (
    employee_id SERIAL PRIMARY KEY,
    name VARCHAR(255),
    email VARCHAR(255),
    department_id INTEGER
);
```

#### Create a Replication slot

Create a logical replication slot named `test_logical_replication_slot` using the `test_decoding` output plugin via the following function:

```sql
SELECT *
FROM pg_create_logical_replication_slot('test_logical_replication_slot', 'test_decoding');
```

If the slot is created successfully, the command returns the following output:

```output
slot_name | lsn
-------------------------------+-----
test_logical_replication_slot | 0/2
```

#### Configure and start pg_recvlogical

The pg_recvlogical binary can be found under `<yugabyte-db-dir>/postgres/bin/`.

Open a new shell and start pg_recvlogical to connect to the `yugabyte` database with the superuser `yugabyte` and replicate changes using the replication slot you created as follows:

```sh
./pg_recvlogical -d yugabyte \
-U yugabyte \
-h 127.0.0.1 \
--slot test_logical_replication_slot \
--start \
-f -
```

Any changes that get replicated are printed to stdout.

For more pg_recvlogical configurations, refer to the PostgreSQL [pg_recvlogical](https://www.postgresql.org/docs/11/app-pgrecvlogical.html) documentation.

#### Verify Replication

Return to the shell where ysqlsh is running. Perform DMLs on the `employees` table.

```sql
BEGIN;

INSERT INTO employees (name, email, department_id)
VALUES ('Alice Johnson', '[email protected]', 1);

INSERT INTO employees (name, email, department_id)
VALUES ('Bob Smith', '[email protected]', 2);

COMMIT;
```

Expected output observed on stdout where pg_recvlogical is running:

```output
BEGIN 2
table public.employees: INSERT: employee_id[integer]:1 name[character varying]:'Alice Johnson' email[character varying]:'[email protected]' department_id[integer]:1
table public.employees: INSERT: employee_id[integer]:2 name[character varying]:'Bob Smith' email[character varying]:'[email protected]' department_id[integer]:2
COMMIT 2
```
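
The `test_decoding` output above is line-oriented, so a client can group it back into transactions. The following Python sketch is one hypothetical way to do that; it is not part of YugabyteDB or pg_recvlogical. The function names (`parse_columns`, `parse_stream`) and the regular expressions are assumptions based only on the sample output shown here, and may not cover every value format the plugin can emit.

```python
import re

# One column in a change line: name[type]:value, where value is either a
# single-quoted string ('' escapes a quote) or a bare token without spaces.
COLUMN_RE = re.compile(r"(\w+)\[([^\]]+)\]:('(?:[^']|'')*'|\S+)")
# One row change: table <schema>.<table>: <OPERATION>: <columns...>
CHANGE_RE = re.compile(r"^table (\S+): (INSERT|UPDATE|DELETE): ?(.*)$")


def parse_columns(text):
    """Return {column_name: (type_name, value)} for one change line."""
    columns = {}
    for name, type_name, raw in COLUMN_RE.findall(text):
        if raw.startswith("'"):
            # Strip the surrounding quotes and unescape doubled quotes.
            value = raw[1:-1].replace("''", "'")
        else:
            value = raw
        columns[name] = (type_name, value)
    return columns


def parse_stream(lines):
    """Group output lines into transactions delimited by BEGIN and COMMIT."""
    transactions = []
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith("BEGIN"):
            parts = line.split()
            xid = parts[1] if len(parts) > 1 else None
            current = {"xid": xid, "changes": []}
        elif line.startswith("COMMIT"):
            # Only emit a transaction once its COMMIT has been seen.
            if current is not None:
                transactions.append(current)
            current = None
        else:
            match = CHANGE_RE.match(line)
            if match and current is not None:
                table, op, rest = match.groups()
                current["changes"].append(
                    {"table": table, "op": op, "columns": parse_columns(rest)}
                )
    return transactions
```

To try it, pass the lines printed by pg_recvlogical (for example, `sys.stdin` when the output is piped in) to `parse_stream`. Each returned dictionary holds the transaction ID from the `BEGIN` line and the row changes in the order they were received.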

#### Add tables (Dynamic table addition)

You can add a new table to the `yugabyte` database, and any DMLs performed on the new table are also replicated to pg_recvlogical.

1. In the `yugabyte` database, create a new table `projects`:

```sql
CREATE TABLE projects (
    project_id SERIAL PRIMARY KEY,
    name VARCHAR(255),
    description TEXT
);
```

2. Perform DMLs on the `projects` table:

```sql
INSERT INTO projects (name, description)
VALUES ('Project A', 'Description of Project A');
```

Expected output observed on stdout where pg_recvlogical is running:

```output
BEGIN 3
table public.projects: INSERT: project_id[integer]:1 name[character varying]:'Project A' description[text]:'Description of Project A'
COMMIT 3
```

{{% explore-cleanup-local %}}

## Get started with YugabyteDB connector

This tutorial demonstrates how to use Debezium to monitor a YugabyteDB database. As the data in the database changes, you will see the resulting event streams.
@@ -180,7 +53,7 @@ To start the services needed for this tutorial, you must:

Zookeeper is the first service you must start.

1. Open a terminal and use it to start Zookeeper in a container. This command runs a new container using version `2.5.2.Final` of the `debezium/zookeeper` image:
Open a terminal and use it to start Zookeeper in a container. This command runs a new container using version `2.5.2.Final` of the `debezium/zookeeper` image:

```sh
docker run -d --rm --name zookeeper -p 2181:2181 -p 2888:2888 -p 3888:3888 debezium/zookeeper:2.5.2.Final
@@ -190,7 +63,7 @@ docker run -d --rm --name zookeeper -p 2181:2181 -p 2888:2888 -p 3888:3888 debez

After starting Zookeeper, you can start Kafka in a new container.

1. Open a new terminal and use it to start Kafka in a container. This command runs a new container using version `2.5.2.Final` of the `debezium/kafka` image:
Open a new terminal and use it to start Kafka in a container. This command runs a new container using version `2.5.2.Final` of the `debezium/kafka` image:

```sh
docker run -d --rm --name kafka -p 9092:9092 --link zookeeper:zookeeper debezium/kafka:2.5.2.Final
Original file line number Diff line number Diff line change
@@ -4,6 +4,8 @@ headerTitle: Key concepts
linkTitle: Key concepts
description: Change Data Capture in YugabyteDB.
headcontent: PostgreSQL logical replication concepts
aliases:
- /preview/explore/change-data-capture/using-logical-replication/key-concepts/
menu:
preview:
parent: explore-change-data-capture-logical-replication
Original file line number Diff line number Diff line change
@@ -3,6 +3,8 @@ title: CDC monitoring in YugabyteDB
headerTitle: Monitor
linkTitle: Monitor
description: Monitor Change Data Capture in YugabyteDB.
aliases:
- /preview/explore/change-data-capture/using-logical-replication/monitor/
menu:
preview:
parent: explore-change-data-capture-logical-replication