# Analytics Engineer Interview Project

This repository contains a technical evaluation project for Analytics Engineering candidates at Jarvus. It provides a structured environment for working with data pipelines using dbt and DuckDB.

## Project Structure

```
.
├── data/               # Data directory (gitignored except for .gitkeep)
│   └── raw/            # Raw data storage
├── models/             # dbt models
│   ├── staging/        # Staging models
│   ├── intermediate/   # Intermediate models
│   └── marts/          # Mart models
├── scripts/            # Data ingestion scripts
├── tests/              # dbt tests
└── macros/             # dbt macros
```

## Prerequisites

- Docker and Docker Compose
- Git

## Local Development Setup

1. Clone the repository:

   ```bash
   git clone https://github.com/your-org/analytics-engineer-interview.git
   cd analytics-engineer-interview
   ```

2. Build the images and start the development environment:

   ```bash
   docker compose build
   docker compose run dev
   ```

3. Run the full pipeline:

   ```bash
   docker compose run analytics
   ```

## Development Workflow

### Running Individual Components

1. Data Ingestion:

   ```bash
   docker compose run ingest
   ```

2. dbt Commands:

   ```bash
   docker compose run dbt deps   # Install dbt dependencies
   docker compose run dbt debug  # Test the connection
   docker compose run dbt build  # Run models, tests, snapshots, and seeds
   ```
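
While iterating on a single model, rebuilding the whole project is often unnecessary. The sketch below prints two narrower `dbt build` invocations that use dbt's `--select` flag. It assumes the `dbt` service forwards extra arguments to the dbt CLI, as the commands above suggest; `dbt_build_cmd` is a hypothetical helper, not something this repository is known to provide.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print a `dbt build` invocation for one selector.
# dbt_build_cmd is a hypothetical helper used only for illustration.
dbt_build_cmd() {
  printf 'docker compose run dbt build --select %s\n' "$1"
}

# This model plus everything downstream of it (the trailing + is a dbt graph operator).
dbt_build_cmd 'stg_bus_shelters+'

# Every model under models/staging/ (path: is dbt's path selection method).
dbt_build_cmd 'path:models/staging'
```

Copy a printed command into the terminal once the selector looks right.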

### Making Changes

1. Create a new branch:

   ```bash
   git checkout -b feature/your-feature-name
   ```

2. Make your changes to the models, scripts, or configuration files.

3. Test your changes by running the full pipeline:

   ```bash
   docker compose run analytics
   ```

4. Open a pull request on GitHub.

## Deployment Process

The project includes a GitHub Actions workflow that:

1. Runs on every push to the `main` branch and on pull requests
2. Executes the full pipeline
3. Uploads the DuckDB database and dbt artifacts as workflow artifacts

## Project Components

### Data Pipeline

- Downloads example data from public sources
- Loads data into a local DuckDB database
- Processes data through dbt models:
  - Staging models for initial data cleaning
  - Intermediate models for data transformation
  - Mart models for final analysis
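
The stages above can be sketched as a small wrapper script. This is only an illustration: it assumes the `analytics` service runs roughly the same steps as the separate `ingest` and `dbt` services, which this README does not confirm. `step`, `run_pipeline`, and `DRY_RUN` are hypothetical names.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Run one command, or only print it when DRY_RUN=1.
# DRY_RUN is a hypothetical variable introduced for this sketch.
step() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Ingest raw data, install dbt packages, then build the
# staging, intermediate, and mart models in dependency order.
run_pipeline() {
  step docker compose run ingest
  step docker compose run dbt deps
  step docker compose run dbt build
}

# Preview the commands without starting any containers.
DRY_RUN=1 run_pipeline
```

Calling `run_pipeline` without `DRY_RUN=1` executes the steps, and `set -e` stops the script at the first step that fails.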

### dbt Models

- `stg_bus_shelters`: Initial cleaning and column renaming
- `int_shelter_locations`: Geographic clustering analysis
- `mart_shelter_distribution`: High-level shelter distribution metrics

### Configuration Files

- `pyproject.toml`: Python dependencies managed by Poetry
- `dbt_project.yml`: dbt project configuration
- `profiles.yml`: dbt connection profiles
- `docker-compose.yml`: Container configuration
- `Dockerfile`: Development environment definition

## Contributing

See our [Contributing Guidelines](CONTRIBUTING.md) for development practices and standards.

## Support

If you encounter any issues or have questions, please:

1. Check the existing GitHub issues
2. Create a new issue if needed
3. Comment on your assigned evaluation task

## License

This project is proprietary and confidential. All rights reserved.