Amazon RDS MySQL Zero-ETL integrations with Amazon Redshift Serverless

This repository provides CDK scripts and sample code to create an Amazon RDS zero-ETL integration with Amazon Redshift Serverless.

An Amazon RDS zero-ETL integration with Amazon Redshift enables near real-time analytics and machine learning (ML) using Amazon Redshift on petabytes of transactional data from RDS.

rds-mysql-zero-etl-integration-with-redshift-serverless

The cdk.json file tells the CDK Toolkit how to execute your app.

This project is set up like a standard Python project. The initialization process also creates a virtualenv within this project, stored under the .venv directory. To create the virtualenv it assumes that there is a python3 (or python for Windows) executable in your path with access to the venv package. If for any reason the automatic creation of the virtualenv fails, you can create the virtualenv manually.

To manually create a virtualenv on MacOS and Linux:

$ git clone --depth=1 https://github.com/aws-samples/aws-kr-startup-samples.git
$ cd aws-kr-startup-samples
$ git sparse-checkout init --cone
$ git sparse-checkout set analytics/zero-etl-integrations/rds-mysql-to-redshift
$ cd analytics/zero-etl-integrations/rds-mysql-to-redshift

$ python3 -m venv .venv

After the init process completes and the virtualenv is created, you can use the following step to activate your virtualenv.

$ source .venv/bin/activate

If you are on a Windows platform, you would activate the virtualenv like this:

% .venv\Scripts\activate.bat

Once the virtualenv is activated, you can install the required dependencies.

(.venv) $ pip install -r requirements.txt

To add additional dependencies, for example other CDK libraries, just add them to your setup.py file and rerun the pip install -r requirements.txt command.

Prerequisites

Set up cdk.context.json

Before synthesizing the CloudFormation, you need to prepare the cdk context configuration file, cdk.context.json:

For example,

{
  "rds_instance_name": "zero-etl-source-rds",
  "redshift": {
    "db_name": "zero-etl-target-rs",
    "namespace": "zero-etl-target-rs-ns",
    "workgroup": "zero-etl-target-rs-wg"
  },
  "zero_etl_integration": {
    "data_filter": "include: demodb.retail_trans",
    "integration_name": "zero-etl-rss"
  }
}
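Inside a CDK stack these values are typically read with `self.node.try_get_context(...)`. A minimal sketch of how the example above might be consumed (using a plain dict in place of the live CDK context; the key names simply mirror the sample file):

```python
import json

# Sample cdk.context.json content from above.
context_json = """
{
  "rds_instance_name": "zero-etl-source-rds",
  "redshift": {
    "db_name": "zero-etl-target-rs",
    "namespace": "zero-etl-target-rs-ns",
    "workgroup": "zero-etl-target-rs-wg"
  },
  "zero_etl_integration": {
    "data_filter": "include: demodb.retail_trans",
    "integration_name": "zero-etl-rss"
  }
}
"""

context = json.loads(context_json)

# In a CDK stack this would be: self.node.try_get_context("redshift")
redshift_cfg = context["redshift"]
print(redshift_cfg["namespace"])  # zero-etl-target-rs-ns
print(context["zero_etl_integration"]["integration_name"])  # zero-etl-rss
```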

Bootstrap AWS environment for AWS CDK app

Also, before any AWS CDK app can be deployed, you have to bootstrap your AWS environment to create certain AWS resources that the AWS CDK CLI (Command Line Interface) uses to deploy your AWS CDK app.

Run the cdk bootstrap command to bootstrap the AWS environment.

(.venv) $ cdk bootstrap

Now you can deploy the CloudFormation template for this code.

List all CDK Stacks

(.venv) $ export CDK_DEFAULT_ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
(.venv) $ export CDK_DEFAULT_REGION=$(aws configure get region)
(.venv) $ cdk list
MySQLVpcStack
MySQLStack
MySQLClientHostStack
RedshiftServerlessStack
ZeroETLfromRDStoRSS

Create MySQL cluster

  (.venv) $ cdk deploy MySQLVpcStack MySQLStack MySQLClientHostStack
  

Create a sample database and table

In order to set up MySQL, you need to connect to the MySQL cluster from an EC2 bastion host.

ℹ️ The MySQL username and password are stored in AWS Secrets Manager under a name such as DatabaseSecret-xxxxxxxxxxxx.

To retrieve a secret (AWS console)

  • (Step 1) Open the Secrets Manager console at https://console.aws.amazon.com/secretsmanager/.
  • (Step 2) In the list of secrets, choose the secret you want to retrieve.
  • (Step 3) In the Secret value section, choose Retrieve secret value.
    Secrets Manager displays the current version (AWSCURRENT) of the secret. To see other versions of the secret, such as AWSPREVIOUS or custom labeled versions, use the AWS CLI.
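From a script, the same secret can be fetched with the AWS CLI (`aws secretsmanager get-secret-value`) or boto3; the `SecretString` of an RDS-managed database secret is a JSON document. A small offline sketch of parsing it (the payload below is illustrative, not real credentials):

```python
import json

# Illustrative SecretString payload, shaped like an RDS database secret.
# In practice you would obtain it with, e.g.:
#   aws secretsmanager get-secret-value --secret-id DatabaseSecret-xxxxxxxxxxxx
secret_string = '{"username": "admin", "password": "example-password"}'

creds = json.loads(secret_string)
mysql_user, mysql_password = creds["username"], creds["password"]
print(mysql_user)  # admin
```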

To confirm that binary logging is enabled

  1. Connect to the cluster writer node.

     $ BASTION_HOST_ID=$(aws cloudformation describe-stacks --stack-name MySQLClientHostStack --region ${CDK_DEFAULT_REGION} | \
     jq -r '.Stacks[0].Outputs | .[] | select(.OutputKey | endswith("EC2InstanceId")) | .OutputValue')
    
     $ aws ec2-instance-connect ssh --instance-id ${BASTION_HOST_ID} --os-user ec2-user
    
     [ec2-user@ip-172-31-7-186 ~]$ mysql -hrds-instance-name.xxxxxxxxxxxx.region-name.rds.amazonaws.com -uadmin -p
     Enter password:
     Welcome to the MariaDB monitor.  Commands end with ; or \g.
     Your MySQL connection id is 20
     Server version: 8.0.23 Source distribution
    
     Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
    
     Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
    
     MySQL [(none)]>
    

    ℹ️ MySQLClientHostStack is the CDK stack that creates the bastion host.

    ℹ️ You can connect to an EC2 instance using the EC2 Instance Connect CLI: aws ec2-instance-connect ssh. For more information, see Connect using the EC2 Instance Connect CLI.

  2. At the SQL prompt, run the following command to confirm that binary logging is enabled:

     MySQL [(none)]> SHOW GLOBAL VARIABLES LIKE "log_bin";
     +---------------+-------+
     | Variable_name | Value |
     +---------------+-------+
     | log_bin       | ON    |
     +---------------+-------+
     1 row in set (0.00 sec)
    
  3. Also run the following so that binary logs are retained long enough for replication:

     MySQL [(none)]> CALL mysql.rds_set_configuration('binlog retention hours', 24);
     Query OK, 0 rows affected (0.01 sec)
    
     MySQL [(none)]> CALL mysql.rds_show_configuration;
     +------------------------+-------+-----------------------------------------------------------------------------------------------------------+
     | name                   | value | description                                                                                               |
     +------------------------+-------+-----------------------------------------------------------------------------------------------------------+
     | binlog retention hours | 24    | binlog retention hours specifies the duration in hours before binary logs are automatically deleted.      |
     | source delay           | 0     | source delay specifies replication delay in seconds between current instance and its master.              |
     | target delay           | 0     | target delay specifies replication delay in seconds between current instance and its future read-replica. |
     +------------------------+-------+-----------------------------------------------------------------------------------------------------------+
     3 rows in set (0.00 sec)
    
     Query OK, 0 rows affected (0.00 sec)
    
     MySQL [(none)]>
    
  4. Run the following commands to create the sample database named demodb.

     MySQL [(none)]> SHOW DATABASES;
     +--------------------+
     | Database           |
     +--------------------+
     | information_schema |
     | mysql              |
     | performance_schema |
     | sys                |
     +--------------------+
     4 rows in set (0.00 sec)
    
     MySQL [(none)]> CREATE DATABASE IF NOT EXISTS demodb;
     Query OK, 1 row affected (0.01 sec)
    
     MySQL [(none)]> USE demodb;
     Database changed
     MySQL [demodb]> SHOW TABLES;
     Empty set (0.00 sec)
    
  5. Then run the following to create the sample table named retail_trans:

     MySQL [demodb]> CREATE TABLE IF NOT EXISTS demodb.retail_trans (
              trans_id BIGINT(20) AUTO_INCREMENT,
              customer_id VARCHAR(12) NOT NULL,
              event VARCHAR(10) DEFAULT NULL,
              sku VARCHAR(10) NOT NULL,
              amount INT DEFAULT 0,
              device VARCHAR(10) DEFAULT NULL,
              trans_datetime DATETIME DEFAULT CURRENT_TIMESTAMP,
              PRIMARY KEY(trans_id),
              KEY(trans_datetime)
            ) ENGINE=InnoDB AUTO_INCREMENT=0;
     Query OK, 0 rows affected, 1 warning (0.04 sec)
    
     MySQL [demodb]> SHOW TABLES;
     +------------------+
     | Tables_in_demodb |
     +------------------+
     | retail_trans     |
     +------------------+
     1 row in set (0.00 sec)
    
     MySQL [demodb]> DESC retail_trans;
     +----------------+-------------+------+-----+-------------------+-------------------+
     | Field          | Type        | Null | Key | Default           | Extra             |
     +----------------+-------------+------+-----+-------------------+-------------------+
     | trans_id       | bigint      | NO   | PRI | NULL              | auto_increment    |
     | customer_id    | varchar(12) | NO   |     | NULL              |                   |
     | event          | varchar(10) | YES  |     | NULL              |                   |
     | sku            | varchar(10) | NO   |     | NULL              |                   |
     | amount         | int         | YES  |     | 0                 |                   |
     | device         | varchar(10) | YES  |     | NULL              |                   |
     | trans_datetime | datetime    | YES  | MUL | CURRENT_TIMESTAMP | DEFAULT_GENERATED |
     +----------------+-------------+------+-----+-------------------+-------------------+
     7 rows in set (0.00 sec)
    
     MySQL [demodb]>
    

After setting up MySQL, return to the terminal where you are deploying the stacks.

Create Redshift Serverless cluster

  (.venv) $ cdk deploy RedshiftServerlessStack
  

Configure authorization for your Amazon Redshift data warehouse

Before you create a zero-ETL integration, you must create a source database and a target Amazon Redshift data warehouse. You also must allow replication into the data warehouse by adding the database as an authorized integration source.

You can configure authorized integration sources from the Resource Policy tab on the Amazon Redshift console or using the Amazon Redshift PutResourcePolicy API operation.

To control the source that can create an inbound integration into the namespace, create a resource policy and attach it to the namespace. With the resource policy, you can specify the source that has access to the integration.

The following is a sample resource policy (e.g., rs-rp.json).

  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Principal": {
          "Service": "redshift.amazonaws.com"
        },
        "Action": "redshift:AuthorizeInboundIntegration",
        "Resource": "arn:aws:redshift-serverless:{region}:{account-id}:namespace/namespace-uuid",
        "Condition": {
          "StringEquals": {
            "aws:SourceArn": "arn:aws:rds:{region}:{account-id}:db:{rds-instance-name}"
          }
        }
      }
    ]
  }
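The same policy document can also be generated programmatically before calling put-resource-policy. A sketch under the assumption that you already have both ARNs in hand (the ARNs below are placeholders, and the helper name is hypothetical):

```python
import json

def build_inbound_integration_policy(namespace_arn: str, source_rds_arn: str) -> str:
    """Build a resource policy that authorizes one RDS source to create
    an inbound integration into the given Redshift Serverless namespace."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "redshift.amazonaws.com"},
                "Action": "redshift:AuthorizeInboundIntegration",
                "Resource": namespace_arn,
                "Condition": {"StringEquals": {"aws:SourceArn": source_rds_arn}},
            }
        ],
    }
    return json.dumps(policy, indent=2)

# Placeholder ARNs -- substitute the values returned by describe-stacks.
policy_json = build_inbound_integration_policy(
    "arn:aws:redshift-serverless:us-east-1:123456789012:namespace/namespace-uuid",
    "arn:aws:rds:us-east-1:123456789012:db:zero-etl-source-rds",
)
print(policy_json)
```

Writing this string to rs-rp.json reproduces the sample policy above.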
  

ℹ️ You can find out the Amazon RDS Instance ARN by running the following.

  aws cloudformation describe-stacks --stack-name MySQLStack --region {region} | \
  jq -r '.Stacks[0].Outputs | .[] | select(.OutputKey | endswith("RDSZeroETLSourceArn")) | .OutputValue'
  

ℹ️ You can find out the Amazon Redshift namespace ARN by running the following.

  aws cloudformation describe-stacks --stack-name RedshiftServerlessStack --region {region} | \
  jq -r '.Stacks[0].Outputs | .[] | select(.OutputKey | endswith("NamespaceNameArn")) | .OutputValue'
  

To put a resource policy on your Amazon Redshift namespace ARN for a MySQL source, run an AWS CLI command similar to the following.

  (.venv) $ export RSS_RESOURCE_ARN=$(aws cloudformation describe-stacks --stack-name RedshiftServerlessStack --region ${CDK_DEFAULT_REGION} | \
  jq -r '.Stacks[0].Outputs | .[] | select(.OutputKey | endswith("NamespaceNameArn")) | .OutputValue')
  (.venv) $ aws redshift put-resource-policy \
                --region ${CDK_DEFAULT_REGION} \
                --policy file://rs-rp.json \
                --resource-arn ${RSS_RESOURCE_ARN}
  

Create Zero ETL Integration with filters

In this example we only want to replicate data from the MySQL table demodb.retail_trans to Redshift. So we add the data filtering option to cdk.context.json like this:

   {
     ...
     "zero_etl_integration": {
       "data_filter": "include: demodb.retail_trans",
       "integration_name": "zero-etl-rss"
     }
   }
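As a quick sanity check before deploying, the filter expression can be validated locally. A minimal sketch, assuming the `include:`/`exclude:` prefix with comma-separated `database.table` patterns shown above (the helper is illustrative, not part of this repository):

```python
def parse_data_filter(expression: str) -> tuple[str, list[str]]:
    """Split a filter expression like 'include: demodb.retail_trans'
    into its action and its database.table patterns."""
    action, _, patterns = expression.partition(":")
    action = action.strip().lower()
    if action not in ("include", "exclude"):
        raise ValueError(f"unexpected filter action: {action!r}")
    return action, [p.strip() for p in patterns.split(",") if p.strip()]

action, patterns = parse_data_filter("include: demodb.retail_trans")
print(action, patterns)  # include ['demodb.retail_trans']
```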
  

ℹ️ Data filtering for Amazon RDS zero-ETL integrations with Amazon Redshift

Now let's create the zero-ETL integration. It takes a few minutes for the status of the zero-ETL integration to change from Creating to Active. The time varies depending on the size of the dataset already available in the source.

  (.venv) $ cdk deploy ZeroETLfromRDStoRSS
  

Test Zero-ETL Integration

(1) Load Data Into the MySQL Cluster

  $ BASTION_HOST_ID=$(aws cloudformation describe-stacks --stack-name MySQLClientHostStack --region ${CDK_DEFAULT_REGION} | jq -r '.Stacks[0].Outputs | .[] | select(.OutputKey | endswith("EC2InstanceId")) | .OutputValue')

  $ aws ec2-instance-connect ssh --instance-id ${BASTION_HOST_ID} --os-user ec2-user

  [ec2-user@ip-172-31-7-186 ~]$ python3 gen_fake_mysql_data.py \
                                  --database {your-database-name} \
                                  --table {your-table-name} \
                                  --user {user-name} \
                                  --password {password} \
                                  --host {rds-instance-name}.xxxxxxxxxxxx.{region-name}.rds.amazonaws.com \
                                  --max-count 10
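gen_fake_mysql_data.py ships with this repository; conceptually it produces rows shaped like the retail_trans table and inserts them over a MySQL connection. A simplified offline sketch of the row-generation idea (no database connection; the field choices are illustrative, not the script's exact logic):

```python
import random
import string
from datetime import datetime

def fake_retail_trans_row() -> dict:
    """Generate one record shaped like the demodb.retail_trans table."""
    event = random.choice(["visit", "list", "like", "cart", "purchase"])
    return {
        "customer_id": "".join(random.choices(string.digits, k=12)),
        "event": event,
        "sku": "".join(random.choices(string.ascii_uppercase + string.digits, k=10)),
        "amount": random.randint(1, 100) if event in ("cart", "purchase") else 1,
        "device": random.choice(["pc", "mobile", "tablet"]),
        "trans_datetime": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
    }

rows = [fake_retail_trans_row() for _ in range(10)]
print(rows[0])
```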
  

After loading data into the MySQL table, connect to the primary instance and run some queries.

For example, retrieve some records.

  $ mysql -hrds-instance-name.xxxxxxxxxxxx.region-name.rds.amazonaws.com -uadmin -p
  Enter password:
  Welcome to the MariaDB monitor.  Commands end with ; or \g.
  Your MySQL connection id is 20
  Server version: 8.0.23 Source distribution

  Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

  Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

  MySQL [(none)]> USE demodb;
  Database changed

  MySQL [demodb]> SELECT count(*) FROM retail_trans;
  +----------+
  | count(*) |
  +----------+
  |      10  |
  +----------+
  1 row in set (0.01 sec)

  MySQL [demodb]> SELECT * FROM retail_trans LIMIT 10;
  +----------+--------------+----------+------------+--------+--------+---------------------+
  | trans_id | customer_id  | event    | sku        | amount | device | trans_datetime      |
  +----------+--------------+----------+------------+--------+--------+---------------------+
  |        1 | 460104780596 | cart     | IQ6879MMTB |      8 | mobile | 2023-01-16 06:08:06 |
  |        2 | 758933025159 | like     | RL1573WWLT |      1 | tablet | 2023-01-16 06:17:21 |
  |        3 | 754384589074 | like     | PX4135DYNT |      1 | mobile | 2023-01-16 06:08:52 |
  |        4 | 602811489876 | purchase | PI7913TREO |     66 | pc     | 2023-01-16 06:01:07 |
  |        5 | 222732129586 | like     | AS6987HGLN |      1 | mobile | 2023-01-16 06:09:06 |
  |        6 | 387378799012 | list     | AI6161BEFX |      1 | pc     | 2023-01-16 06:10:27 |
  |        7 | 843982894991 | cart     | DA7930CJBR |     81 | pc     | 2023-01-16 06:11:41 |
  |        8 | 818177069814 | like     | JS6166YPTE |      1 | pc     | 2023-01-16 06:17:08 |
  |        9 | 248083404876 | visit    | AS8552DVOO |      1 | pc     | 2023-01-16 06:24:39 |
  |       10 | 731184658511 | visit    | XZ9997LSJN |      1 | tablet | 2023-01-16 06:12:18 |
  +----------+--------------+----------+------------+--------+--------+---------------------+
  10 rows in set (0.00 sec)
  

(2) Create a database from the integration in Amazon Redshift

To create your database, complete the following steps:

  1. On the Redshift Serverless dashboard, navigate to the zero-etl-target-rs-ns namespace.

  2. Choose Query data to open Query Editor v2.

  3. Connect to the Redshift Serverless data warehouse by choosing Create connection.

  4. Obtain the integration_id from the svv_integration system table:

    ---- copy this result, use in the next sql
    SELECT integration_id FROM svv_integration;
    
  5. Use the integration_id from the previous step to create a new database from the integration:

    CREATE DATABASE aurora_zeroetl FROM INTEGRATION '<result from above>';
    

    The integration is now complete, and an entire snapshot of the source will reflect as is in the destination. Ongoing changes will be synced in near-real time.

  6. On the Redshift Serverless dashboard, open Query Editor v2 using the database you created as part of the integration setup. Use the following query to get information about the checkpoint, snapshot, and subsequent CDC data replication:

    SELECT * FROM SYS_INTEGRATION_ACTIVITY
    WHERE TRUNC(INTEGRATION_START_TIME)= CURRENT_DATE
    ORDER BY INTEGRATION_START_TIME;
    

Clean Up

  1. To delete a zero-ETL integration, run the following command.

    (.venv) $ cdk destroy --force -e ZeroETLfromRDStoRSS
    

    When you delete a zero-ETL integration, your transactional data isn’t deleted from Amazon RDS or Amazon Redshift, but Amazon RDS doesn’t send new data to Amazon Redshift.

  2. If you want to delete all CloudFormation stacks, run the following command.

    (.venv) $ cdk destroy --force --all
    

Useful commands

CDK CLI Installation

  • npm install -g aws-cdk install the AWS CDK Toolkit (the cdk command).
  • npm install -g aws-cdk@latest install the latest AWS CDK Toolkit (the cdk command).

CDK CLI commands

  • cdk init app --language python create a new, empty CDK Python project.
  • cdk bootstrap --profile <AWS Profile> Deploys the CDK Toolkit staging stack; see Bootstrapping
  • cdk ls list all stacks in the app
  • cdk synth emits the synthesized CloudFormation template
  • cdk deploy deploy this stack to your default AWS account/region
  • cdk diff compare deployed stack with current state
  • cdk docs open CDK documentation

Enjoy!

References