This project explores how well general-purpose AI models, such as Anthropic's Claude 3.5 Sonnet (claude-3-5-sonnet-20241022), can generate synthetic datasets. The goal is to evaluate the model's ability to understand and replicate database schemas, generate realistic data, and preserve the integrity of the data and the relationships within it.
For more information, check out the blog post: Vibe Coding With AI to Generate Synthetic Data: Part 1
Two PostgreSQL databases are required:
- The first is the production database, which must contain the schema (data is optional).
- The second is an empty database.
- Both databases must use the same version of PostgreSQL.
- The PostgreSQL version installed on the Action's runner should match the versions of the databases.
- An Anthropic API key is required to access the AI model.
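The version requirement above can be checked before anything is run. The following is a minimal sketch, not part of the project's scripts: `majorVersion` and `versionsMatch` are hypothetical helpers, the sample strings are illustrative, and comparing only the major version is an assumption about how strict the match needs to be. The input strings are of the kind returned by `SELECT version()` on each database or printed by `pg_dump --version` on the runner.

```javascript
// Hypothetical helper: extract the PostgreSQL major version from a version
// string, e.g. the output of `SELECT version()` or `pg_dump --version`.
function majorVersion(versionString) {
  const match = /PostgreSQL\)?\s+(\d+)/.exec(versionString);
  if (!match) {
    throw new Error(`Unrecognized PostgreSQL version string: ${versionString}`);
  }
  return Number(match[1]);
}

// Returns true only when every supplied version string has the same major version.
// Assumption: a major-version match is sufficient for this comparison.
function versionsMatch(...versionStrings) {
  const majors = versionStrings.map(majorVersion);
  return majors.every((major) => major === majors[0]);
}

// Illustrative values, not taken from a real run.
const production = "PostgreSQL 16.3 on x86_64-pc-linux-gnu";
const target = "PostgreSQL 16.1 on aarch64-unknown-linux-gnu";
const runner = "pg_dump (PostgreSQL) 15.7";

console.log(versionsMatch(production, target));         // true
console.log(versionsMatch(production, target, runner)); // false
```

If the second check fails, as in this illustrative case, the runner's PostgreSQL installation would need to be changed to match the databases.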
- Rename .env.example to .env to configure the environment.
- Install the dependencies.
- Run one of the following commands from your terminal:
node .github/scripts/generate-data-ai-only.mjs
node .github/scripts/generate-data-hybrid.mjs
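Before running either script, it can help to confirm that the environment is fully configured. The sketch below is an assumption-laden illustration rather than code from this repository: `missingVars` is a hypothetical helper, and the variable names in `REQUIRED_VARS` are placeholders that should be replaced with the names actually listed in .env.example.

```javascript
// Placeholder names (assumption): replace with the variables from .env.example.
const REQUIRED_VARS = [
  "ANTHROPIC_API_KEY",
  "SOURCE_DATABASE_URL",
  "TARGET_DATABASE_URL",
];

// Hypothetical helper: returns the names of required variables that are
// unset or contain only whitespace in the given environment object.
function missingVars(env, required = REQUIRED_VARS) {
  return required.filter((name) => !env[name] || env[name].trim() === "");
}

// Illustrative environment object with one blank and one absent variable.
const sample = { ANTHROPIC_API_KEY: "example-key", SOURCE_DATABASE_URL: "" };
console.log(missingVars(sample)); // [ 'SOURCE_DATABASE_URL', 'TARGET_DATABASE_URL' ]
```

In a real run, `process.env` would be passed in place of `sample`, and a non-empty result would indicate which entries still need to be filled in.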
- Add the variables outlined in .env.example to GitHub Secrets.
- Manually trigger the Actions via the GitHub UI.
The production database follows the schema defined in schema.sql. There is no need to add data to the production database in order to run these tests.