| 1 | +--- |
| 2 | +name: Analytics Engineering Task |
| 3 | +about: Technical evaluation task for Analytics Engineering candidates |
| 4 | +title: 'Analytics Engineering Task: SEPTA GTFS Analysis' |
| 5 | +labels: 'evaluation' |
| 6 | +assignees: '' |
| 7 | +--- |
| 8 | + |
| 9 | +# Analytics Engineering Task |
| 10 | + |
| 11 | +## Background |
| 12 | + |
| 13 | +As part of our evaluation process, we'd like you to work on a real-world data analysis task. Assume that we are working on a project for SEPTA (Southeastern Pennsylvania Transportation Authority), which has asked us to pull together data for a visualization of the number of routes per mode within the system. The task requires access to their GTFS (General Transit Feed Specification) data. |
| 14 | + |
| 15 | +## Task Description |
| 16 | + |
| 17 | +1. Download the [bus and rail GTFS feeds for SEPTA](https://github.com/septadev/GTFS/releases) and import the routes and agency tables for both into DuckDB |
| 18 | +2. Clean up the imported tables, adding a text version of [`route_type`](https://gtfs.org/documentation/schedule/reference/#routestxt) (i.e., the mode name) |
| 19 | +3. Produce a view showing the total routes per mode, with the columns `agency_name`, `mode_name`, and `route_count` |
| 20 | +4. Open a pull request for us to review, just as you would when working on a team |
| 21 | + |
| 22 | +## Expected Time |
| 23 | + |
| 24 | +You are expected to spend 1–2 hours on this task. Feel free to extend the exercise as you see fit, as long as you meet the core requirements. |
| 25 | + |
| 26 | +## Getting Started |
| 27 | + |
| 28 | +1. Clone this repository |
| 29 | +2. Install dependencies using Poetry: `poetry install` |
| 30 | +3. Run the existing pipeline to verify setup: `docker compose run analytics` |
| 31 | + |
| 32 | +## Development Workflow |
| 33 | + |
| 34 | +1. Create a new branch for your work |
| 35 | +2. Implement your solution following these steps: |
| 36 | + - Add a data ingestion script for GTFS feeds |
| 37 | + - Create staging models for routes and agency tables |
| 38 | + - Implement the mode name mapping |
| 39 | + - Create the final view with route counts |
| 40 | +3. Test your changes locally |
| 41 | +4. Open a pull request with your solution |
| 42 | + |
| 43 | +## Evaluation Criteria |
| 44 | + |
| 45 | +We will evaluate your submission based on: |
| 46 | + |
| 47 | +- Code quality and organization |
| 48 | +- SQL/dbt best practices |
| 49 | +- Documentation |
| 50 | +- Git workflow |
| 51 | +- Problem-solving approach |
| 52 | + |
| 53 | +## Resources |
| 54 | + |
| 55 | +- [GTFS Reference](https://gtfs.org/documentation/schedule/reference/) |
| 56 | +- [dbt Documentation](https://docs.getdbt.com/) |
| 57 | +- [DuckDB Documentation](https://duckdb.org/docs/) |
| 58 | + |
| 59 | +## Questions? |
| 60 | + |
| 61 | +If you have any questions about the task or requirements, please don't hesitate to ask by commenting on this issue. |
| 62 | + |
| 63 | +Good luck! |
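
For reference, the mode-name mapping and route count described in the task could look roughly like the sketch below. It is an illustration in plain Python, not the expected dbt/DuckDB solution. The codes are the basic `route_type` values from the GTFS Schedule reference, with shortened labels chosen here; `mode_name`, `routes_per_mode`, and the sample rows are hypothetical and do not come from the SEPTA feeds.

```python
from collections import Counter

# Basic route_type codes from the GTFS Schedule reference (routes.txt).
# Labels are shortened for display; this is an assumption, not SEPTA's naming.
ROUTE_TYPE_NAMES = {
    0: "Light rail",
    1: "Subway",
    2: "Rail",
    3: "Bus",
    4: "Ferry",
    5: "Cable tram",
    6: "Aerial lift",
    7: "Funicular",
    11: "Trolleybus",
    12: "Monorail",
}


def mode_name(route_type):
    """Return a text label for a route_type code, or "Unknown" if unmapped."""
    try:
        code = int(route_type)
    except (TypeError, ValueError):
        return "Unknown"
    return ROUTE_TYPE_NAMES.get(code, "Unknown")


def routes_per_mode(routes):
    """Count routes per (agency_name, mode_name).

    `routes` is an iterable of (agency_name, route_type) pairs, i.e. the
    routes table already joined to the agency table.
    Returns sorted (agency_name, mode_name, route_count) tuples.
    """
    counts = Counter((agency, mode_name(rt)) for agency, rt in routes)
    return sorted((agency, mode, n) for (agency, mode), n in counts.items())


if __name__ == "__main__":
    # Hypothetical sample rows, not real SEPTA data.
    sample = [("Agency A", "3"), ("Agency A", "3"), ("Agency B", "2")]
    for row in routes_per_mode(sample):
        print(row)
```

In a dbt project the same mapping would more likely live in a seed or a `CASE` expression in a staging model, with the count expressed as a `GROUP BY` in the final view.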