This is a full-stack processing pipeline for transcribing voice data using Whipser model.
It is built on top of AWS services, including S3, OpenSearch, Lambda, Step Functions, and Batch.
We use AWS Chalice to develop lambdas.
The batch jobs are just plain-old Docker containers.
We use AWS CDK to manage and deploy the entire stack.
The step function definitions are in infrastructure/stacks/processing_stack.py
. Please make sure to edit based on Step Functions documentation.
make deploy stack="whisper-processing-aws-ecr"
make build-lambdas
make deploy stack="whisper-processing-aws-stack"