cdc-postgres is a project designed to implement Change Data Capture (CDC) for PostgreSQL databases. It leverages tools like Kafka and Debezium to monitor and capture data changes in real-time, ensuring that downstream applications have access to the latest data.
The project is built to snapshot an OLTP database and then stream its ongoing changes to an OLAP database.
- Real-time Data Capture: Monitors PostgreSQL databases for changes and captures INSERT, UPDATE, and DELETE operations as they occur.
- Integration with Kafka: Streams captured data changes into Kafka topics for further processing or integration with other systems.
- Debezium Integration: Utilizes Debezium connectors to handle the CDC process efficiently.
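To make the captured operations concrete, here is a minimal sketch that decodes one Debezium change event. It assumes the standard Debezium envelope as produced by the JSON converter, with `before`, `after`, and `op` fields, optionally nested under `payload` when schemas are enabled. The helper name `describe_change` and the sample event are illustrative and not part of this project.

```python
import json

# Debezium operation codes; "r" marks rows read during the initial snapshot.
OPERATIONS = {"c": "INSERT", "u": "UPDATE", "d": "DELETE", "r": "SNAPSHOT"}


def describe_change(raw_event):
    """Return (operation, row) for one Debezium change event given as JSON text.

    Hypothetical helper: assumes the standard Debezium envelope.
    """
    event = json.loads(raw_event)
    # With schemas enabled, the envelope is nested under "payload".
    envelope = event.get("payload", event)
    operation = OPERATIONS.get(envelope["op"], "UNKNOWN")
    # Deletes carry the previous row state (or only its key, depending on the
    # table's REPLICA IDENTITY) in "before"; other operations use "after".
    row = envelope["before"] if operation == "DELETE" else envelope["after"]
    return operation, row


# Made-up sample event for illustration only.
sample = json.dumps({
    "payload": {"before": None, "after": {"id": 1, "name": "alice"}, "op": "c"}
})
print(describe_change(sample))
```

A downstream loader could branch on the returned operation, for example upserting on INSERT, UPDATE, and SNAPSHOT and removing the row on DELETE.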
Before setting up the project, ensure you have the following installed:
- Docker
- Docker Compose
- Clone the Repository:

      git clone https://github.com/shaking54/cdc-postgres.git
      cd cdc-postgres

- Start the Services:

  The project includes Docker Compose configurations for setting up the necessary services.

  - To start Kafka and Zookeeper:

        make kafka

  - To start Debezium:

        make connect

- Set Up Debezium Connector:
  Configure Debezium to monitor your PostgreSQL database by sending a POST request to the Kafka Connect REST API. Replace the placeholder values as needed.

      {
        "name": "postgres-connector",
        "config": {
          "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
          "database.hostname": "[replace with your db host]",
          "database.port": "5432",
          "database.user": "[replace with your db user]",
          "database.password": "[replace with your db password]",
          "database.dbname": "[replace with your db name]",
          "database.server.name": "[replace with your server name]",
          "table.include.list": "[replace with your table names, separated by commas, in the format schema_name.table_name1,schema_name.table_name2]",
          "plugin.name": "pgoutput"
        }
      }

  You can use a tool like curl to send this configuration:

      curl -X POST -H "Content-Type: application/json" -d @connector-config.json http://localhost:8083/connectors

  Ensure that the connector-config.json file contains the JSON configuration shown above.
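If you would rather register the connector from a script than with curl, the same request can be sketched in Python using only the standard library. The helper names `build_connector_config` and `register_connector`, and all the argument values, are placeholders introduced for illustration; they are not part of this repository.

```python
import json
import urllib.request

# Kafka Connect REST endpoint, taken from the curl command above.
CONNECT_URL = "http://localhost:8083/connectors"


def build_connector_config(host, user, password, dbname, server_name, tables, port=5432):
    """Build the connector payload shown above (hypothetical helper)."""
    return {
        "name": "postgres-connector",
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "database.hostname": host,
            "database.port": str(port),
            "database.user": user,
            "database.password": password,
            "database.dbname": dbname,
            "database.server.name": server_name,
            # Debezium expects a comma-separated list of schema.table names.
            "table.include.list": ",".join(tables),
            "plugin.name": "pgoutput",
        },
    }


def register_connector(payload, url=CONNECT_URL):
    """POST the payload to Kafka Connect and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Placeholder values for illustration; substitute your own before registering.
payload = build_connector_config(
    host="localhost", user="postgres", password="postgres",
    dbname="appdb", server_name="appserver",
    tables=["public.orders", "public.customers"],
)
print(json.dumps(payload, indent=2))
# register_connector(payload)  # uncomment once Kafka Connect is running
```

`urlopen` raises `urllib.error.HTTPError` on a non-2xx response, for instance when a connector with the same name is already registered. Depending on the Debezium version in your image, you may also need `topic.prefix` in place of `database.server.name`, since Debezium 2.x renamed that property.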
Once set up, the system will capture changes from the specified PostgreSQL tables and stream them into the corresponding Kafka topics. You can then consume these topics using your preferred Kafka consumer to process the data changes.
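To know which topics to subscribe to, it helps to remember that the Debezium PostgreSQL connector by default names each topic after the server name, schema, and table. The sketch below derives those names from the connector settings. It assumes the default naming with no topic-routing transforms configured; `expected_topics` is a hypothetical helper, and the values passed to it are placeholders.

```python
def expected_topics(server_name, table_include_list):
    """List the topics Debezium would write to, one per captured table.

    Hypothetical helper; assumes the default <server>.<schema>.<table> naming
    with no topic-routing transforms configured.
    """
    tables = [t.strip() for t in table_include_list.split(",") if t.strip()]
    return [f"{server_name}.{table}" for table in tables]


# Placeholder values for illustration only.
for topic in expected_topics("appserver", "public.orders, public.customers"):
    print(topic)
```

The resulting names can be passed to the subscribe call of whichever Kafka client you use.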
- Debezium for providing CDC connectors.
- Apache Kafka for the robust streaming platform.
- Docker for containerization.