
Commit 7f198cb

ci: add GitHub Actions workflow and issue template

1 parent 232df19 · commit 7f198cb

2 files changed: +96 −0

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
---
name: Analytics Engineering Task
about: Technical evaluation task for Analytics Engineering candidates
title: 'Analytics Engineering Task: SEPTA GTFS Analysis'
labels: 'evaluation'
assignees: ''
---

# Analytics Engineering Task

## Background

As part of our evaluation process, we'd like you to work on a real-world data analysis task. Assume we are working on a project for SEPTA (the Southeastern Pennsylvania Transportation Authority), which has asked us to pull together data to inform a visualization of the number of routes for each mode in the system. This task requires access to SEPTA's GTFS (General Transit Feed Specification) data.

## Task Description

1. Download the [bus and rail GTFS feeds for SEPTA](https://github.com/septadev/GTFS/releases) and import the routes and agency tables from both feeds into DuckDB
2. Clean up the imported tables, adding a text version of [`route_type`](https://gtfs.org/documentation/schedule/reference/#routestxt) (i.e. a mode name)
3. Produce a view showing the total routes per mode, with columns `agency_name`, `mode_name`, and `route_count` (a sketch of one possible shape follows this list)
4. Open a pull request for us to review, just as you would when working on a team
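
The GTFS spec defines `route_type` as a numeric code (0 = tram/streetcar/light rail, 1 = subway/metro, 2 = rail, 3 = bus, and so on), so step 2 amounts to mapping those codes to labels. A minimal DuckDB SQL sketch of the mapping and final view, assuming the staged tables are named `stg_routes` and `stg_agency` (illustrative names, not a requirement):

```sql
-- Sketch only: assumes routes and agency were staged as stg_routes and
-- stg_agency, and that agency_id is populated in both tables.
CREATE OR REPLACE VIEW routes_per_mode AS
SELECT
    a.agency_name,
    CASE r.route_type            -- GTFS route_type codes
        WHEN 0 THEN 'Tram/Streetcar/Light Rail'
        WHEN 1 THEN 'Subway/Metro'
        WHEN 2 THEN 'Rail'
        WHEN 3 THEN 'Bus'
        WHEN 4 THEN 'Ferry'
        ELSE 'Other'
    END AS mode_name,
    COUNT(*) AS route_count
FROM stg_routes AS r
JOIN stg_agency AS a USING (agency_id)
GROUP BY 1, 2;
```
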
## Expected Time

You are expected to spend 1–2 hours on this task. Feel free to extend the exercise as you see fit, as long as you meet the core requirements.

## Getting Started

1. Clone this repository
2. Install dependencies using Poetry: `poetry install`
3. Run the existing pipeline to verify setup: `docker compose run analytics`

## Development Workflow

1. Create a new branch for your work
2. Implement your solution following these steps (see the ingestion sketch after this list):
   - Add a data ingestion script for the GTFS feeds
   - Create staging models for the routes and agency tables
   - Implement the mode name mapping
   - Create the final view with route counts
3. Test your changes locally
4. Open a pull request with your solution
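
GTFS feeds ship as zip archives of plain-text CSV files, which DuckDB can read directly. A minimal ingestion sketch under that assumption; the unzip locations and table names below are illustrative, not part of the task:

```sql
-- Sketch only: assumes the bus and rail feeds were unzipped to
-- data/raw/bus/ and data/raw/rail/ (illustrative paths).
CREATE OR REPLACE TABLE raw_routes AS
SELECT *, 'bus'  AS feed FROM read_csv_auto('data/raw/bus/routes.txt')
UNION ALL BY NAME
SELECT *, 'rail' AS feed FROM read_csv_auto('data/raw/rail/routes.txt');

CREATE OR REPLACE TABLE raw_agency AS
SELECT *, 'bus'  AS feed FROM read_csv_auto('data/raw/bus/agency.txt')
UNION ALL BY NAME
SELECT *, 'rail' AS feed FROM read_csv_auto('data/raw/rail/agency.txt');
```

`UNION ALL BY NAME` aligns columns by name rather than position, which helps here since the two feeds may not publish identical column sets.
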
## Evaluation Criteria

We will evaluate your submission based on:

- Code quality and organization
- SQL/dbt best practices
- Documentation
- Git workflow
- Problem-solving approach

## Resources

- [GTFS Reference](https://gtfs.org/documentation/schedule/reference/)
- [dbt Documentation](https://docs.getdbt.com/)
- [DuckDB Documentation](https://duckdb.org/docs/)

## Questions?

If you have any questions about the task or requirements, please don't hesitate to ask by commenting on this issue.

Good luck!

.github/workflows/pipeline.yml

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
```yaml
name: Analytics Pipeline

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  pipeline:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - name: Build and run pipeline
        run: |
          docker compose build
          docker compose run analytics

      - name: Upload DuckDB database
        uses: actions/upload-artifact@v3
        with:
          name: analytics-db
          path: data/analytics.db
          if-no-files-found: error

      - name: Upload dbt artifacts
        uses: actions/upload-artifact@v3
        with:
          name: dbt-artifacts
          path: target/
          if-no-files-found: error
```
