Run dbt serverless in the Cloud (AWS)
Requirements:
- AWS credentials configured in ~/.aws/credentials
- AWS CLI: pip install awscli
- Terraform
The infrastructure is based on terraform. I set up a terraform backend to keep the terraform state. The backend is based on an S3 bucket that was created manually. You can create an S3 bucket simply by running:
aws s3api create-bucket --bucket nicor88-eu-west-1-terraform --region eu-west-1 --create-bucket-configuration LocationConstraint=eu-west-1
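If you prefer doing it from Python rather than the AWS CLI, this is a minimal boto3 sketch equivalent to the command above (same bucket name and region):

import boto3

# Equivalent of the aws s3api create-bucket command above.
s3 = boto3.client('s3', region_name='eu-west-1')
s3.create_bucket(
    Bucket='nicor88-eu-west-1-terraform',
    CreateBucketConfiguration={'LocationConstraint': 'eu-west-1'},
)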
Remember to change the name of the S3 bucket inside infrastructure/provider.tf before running the following commands:
export AWS_PROFILE=your_profile
make infra-plan
make infra-apply
After the infra is created correctly, you can push a new image to the ECR repository by running:
make push-to-ecr AWS_ACCOUNT_ID=your_account_id
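To verify that the push succeeded, you can list the images in the repository. This is a minimal sketch; the repository name is an assumption, use the one created by the terraform code:

import boto3

# List the images in the ECR repository to verify the push succeeded.
# 'dbt' is an assumed repository name; use the one from your infrastructure.
ecr = boto3.client('ecr', region_name='eu-west-1')
response = ecr.describe_images(repositoryName='dbt')
for image in response['imageDetails']:
    print(image.get('imageTags'), image['imagePushedAt'])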
Currently Aurora Postgres is only accessible inside the VPC. I created a Network Load Balancer to connect to the DB from everywhere, but you need to get the private IPs of the Aurora endpoint. You can simply run:
nslookup your_aurora_endpoint # returned from the terraform outputs
Then you need to replace the 2 variables:
- aurora_postgres_serverless_private_ip_1
- aurora_postgres_serverless_private_ip_2
and apply the changes again with make infra-apply. If you prefer Python over nslookup, a small helper to resolve the endpoint is sketched below.
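This is a minimal sketch; your_aurora_endpoint is a placeholder for the endpoint returned by the terraform outputs:

import socket

# Resolve the Aurora endpoint to the private IPs that go into the two
# variables above. The endpoint value is a placeholder from the terraform outputs.
aurora_endpoint = 'your_aurora_endpoint'
_, _, private_ips = socket.gethostbyname_ex(aurora_endpoint)
print(private_ips)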
Here is an example with two dbt command lists:
{
  "commands1": [
    "dbt",
    "run",
    "--models",
    "example"
  ],
  "commands2": [
    "dbt",
    "run",
    "--models",
    "just_another_example"
  ]
}
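Command lists like the ones above can be passed to the Fargate container as overrides when the task is started. This is a minimal boto3 sketch; the cluster name, task definition, container name, subnets and security groups are assumptions, take the real values from the terraform outputs:

import boto3

ecs = boto3.client('ecs', region_name='eu-west-1')

# Start the dbt Fargate task, overriding the container command with one of the
# command lists above. Cluster, task definition, container name and network
# settings are assumptions; take the real values from the terraform outputs.
response = ecs.run_task(
    cluster='dbt-cluster',
    taskDefinition='dbt',
    launchType='FARGATE',
    networkConfiguration={
        'awsvpcConfiguration': {
            'subnets': ['subnet_id_1', 'subnet_id_2'],
            'securityGroups': ['sg_1'],
            'assignPublicIp': 'ENABLED',
        }
    },
    overrides={
        'containerOverrides': [
            {'name': 'dbt', 'command': ['dbt', 'run', '--models', 'example']}
        ]
    },
)
print(response['tasks'][0]['taskArn'])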
It's also possible to invoke the ECS Fargate containers that run dbt from Airflow. Here is an example of how to call a DbtOperator from Airflow:
dbt_run_example = DbtOperator(
    dag=dag,
    task_id='dbt_example',
    command='run',
    target='dev',
    dbt_models='my_example',
    subnets=['subnet_id_1', 'subnet_id_2'],
    security_groups=['sg_1']
)