Commit afc42a3 (parent d0ad320)

docs: add project documentation and contributing guidelines

2 files changed, 220 additions, 0 deletions

CONTRIBUTING.md (90 additions, 0 deletions)

# Contributing Guidelines

## Commit Messages

We follow the [Conventional Commits](https://www.conventionalcommits.org/) specification for commit messages. This leads to more readable messages that are easy to follow when looking through the project history.

### Commit Message Format

Each commit message consists of a **header**, a **body** and a **footer**. The header has a special format that includes a **type**, a **scope** and a **subject**:

```
<type>(<scope>): <subject>
<BLANK LINE>
<body>
<BLANK LINE>
<footer>
```

The **header** is mandatory and the **scope** of the header is optional.
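
The header grammar can also be checked mechanically. Below is a minimal bash sketch, not an official tool of this project: `parse_header` is a hypothetical helper name, and the regular expression is an assumption that accepts a lowercase type, an optional scope without nested parentheses, then `: ` and the subject. It does not accept the `!` breaking-change marker (for example `feat!:`) that the Conventional Commits specification also allows.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split a header of the form <type>(<scope>): <subject>.
# Prints "type|scope|subject" (scope may be empty); returns 1 if the header
# does not have the expected shape.
parse_header() {
  local header=$1
  # Group 1: type, group 3: optional scope (inside parentheses), group 4: subject.
  local re='^([a-z]+)(\(([^()]+)\))?: (.+)$'
  if [[ $header =~ $re ]]; then
    printf '%s|%s|%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[3]}" "${BASH_REMATCH[4]}"
  else
    return 1
  fi
}
```

For example, `parse_header 'feat(models): add new staging model for GTFS routes'` prints `feat|models|add new staging model for GTFS routes`, while a header with no scope, such as `docs: update readme`, prints `docs||update readme`.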

### Type

Must be one of the following:

* **feat**: A new feature
* **fix**: A bug fix
* **docs**: Documentation-only changes
* **style**: Changes that do not affect the meaning of the code (whitespace, formatting, etc.)
* **refactor**: A code change that neither fixes a bug nor adds a feature
* **perf**: A code change that improves performance
* **test**: Adding missing tests or correcting existing tests
* **chore**: Changes to the build process or auxiliary tools and libraries such as documentation generation

### Scope

The scope should be the name of the module affected (as perceived by the person reading the changelog generated from commit messages).

### Subject

The subject contains a succinct description of the change:

* use the imperative, present tense: "change", not "changed" or "changes"
* don't capitalize the first letter
* no dot (.) at the end
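
The capitalization and trailing-dot rules are mechanical, so a script can enforce them; the imperative mood cannot be checked this way and is left to reviewers. A minimal bash sketch, where `check_subject` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: check the mechanical subject rules.
# Returns 0 if the subject passes, 1 otherwise (reason printed to stderr).
# The imperative-mood rule is NOT checked here.
check_subject() {
  local subject=$1
  if [[ -z $subject ]]; then
    echo "subject must not be empty" >&2
    return 1
  fi
  # Rule: don't capitalize the first letter.
  if [[ $subject == [[:upper:]]* ]]; then
    echo "subject must not start with a capital letter" >&2
    return 1
  fi
  # Rule: no dot (.) at the end.
  if [[ $subject == *. ]]; then
    echo "subject must not end with a period" >&2
    return 1
  fi
  return 0
}
```

One trade-off of this sketch: a subject that legitimately starts with an acronym (for example `GTFS ...`) is rejected, so the check may need relaxing if such subjects are common.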

### Body

Just as in the **subject**, use the imperative, present tense. The body should include the motivation for the change and contrast this with previous behavior.

### Footer

The footer should contain any information about **Breaking Changes** and is also the place to reference GitHub issues that this commit **Closes**.
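
The sketch below shows how the three parts fit together, with a blank line between header, body, and footer. `build_commit_message` is a hypothetical helper; an empty body or footer argument is simply skipped.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a commit message assembled from a header,
# an optional body, and an optional footer, separated by blank lines.
build_commit_message() {
  local header=$1 body=${2-} footer=${3-}
  printf '%s\n' "$header"
  if [[ -n $body ]]; then
    printf '\n%s\n' "$body"
  fi
  if [[ -n $footer ]]; then
    printf '\n%s\n' "$footer"
  fi
}

# Example usage (commented out so the sketch has no side effects):
# build_commit_message "fix(pipeline): correct CSV download path" \
#   "The CSV download path was incorrectly pointing to a temporary directory." \
#   "Closes #456" | git commit -F -
```

Piping the output to `git commit -F -` makes Git read the message from standard input. Passing several `-m` options to `git commit` gives a similar layout, because Git joins multiple `-m` values as separate paragraphs.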
53+
54+
### Examples
55+
56+
```
57+
feat(models): add new staging model for GTFS routes
58+
59+
* Create stg_routes.sql
60+
* Add route_type mapping
61+
* Include agency reference
62+
63+
Closes #123
64+
```
65+
66+
```
67+
fix(pipeline): correct CSV download path
68+
69+
The CSV download path was incorrectly pointing to a temporary directory.
70+
Now uses the configured data/raw directory.
71+
72+
Closes #456
73+
```
74+
75+
```
76+
docs(readme): update development setup instructions
77+
78+
* Add Poetry installation steps
79+
* Clarify Docker requirements
80+
* Update troubleshooting section
81+
```
82+
83+
## Pull Request Process
84+
85+
1. Create a feature branch from `main`
86+
2. Follow the conventional commits specification for all commits
87+
3. Update documentation as needed
88+
4. Open a pull request with a clear description of the changes
89+
5. Ensure all checks pass
90+
6. Request review from maintainers
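
Before opening a pull request, it can help to confirm that every commit on the feature branch follows the format. A minimal bash sketch, assuming the branch was created from `main` (as in step 1) and that the allowed types are exactly those listed under **Type**; `check_headers` is a hypothetical name, and it checks only the header shape and the mechanical subject rules.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read commit headers (one per line) on stdin and
# report any that do not follow the guidelines. Returns 1 if any fail.
check_headers() {
  # Allowed types from the "Type" section; the scope is optional.
  local re='^(feat|fix|docs|style|refactor|perf|test|chore)(\([^()]+\))?: (.+)$'
  local header subject status=0
  # "|| [[ -n $header ]]" also handles a final line without a newline.
  while IFS= read -r header || [[ -n $header ]]; do
    if [[ ! $header =~ $re ]]; then
      echo "bad header: $header" >&2
      status=1
      continue
    fi
    subject=${BASH_REMATCH[3]}
    # Subject rules: no leading capital letter, no trailing period.
    if [[ $subject == [[:upper:]]* || $subject == *. ]]; then
      echo "bad subject: $header" >&2
      status=1
    fi
  done
  return "$status"
}

# Example usage: check every commit on the current branch that is not on main.
# git log --format=%s main..HEAD | check_headers
```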

README.md (130 additions, 0 deletions)

# Analytics Engineer Interview Project

This repository contains a technical evaluation project for Analytics Engineering candidates at Jarvus. It provides a structured environment for working with data pipelines using dbt and DuckDB.

## Project Structure

```
.
├── data/               # Data directory (gitignored except for .gitkeep)
│   └── raw/            # Raw data storage
├── models/             # dbt models
│   ├── staging/        # Staging models
│   ├── intermediate/   # Intermediate models
│   └── marts/          # Mart models
├── scripts/            # Data ingestion scripts
├── tests/              # dbt tests
└── macros/             # dbt macros
```

## Prerequisites

- Docker and Docker Compose
- Git

## Local Development Setup

1. Clone the repository:

   ```bash
   git clone https://github.com/your-org/analytics-engineer-interview.git
   cd analytics-engineer-interview
   ```

2. Start the development environment:

   ```bash
   docker compose build
   docker compose run dev
   ```

3. Run the full pipeline:

   ```bash
   docker compose run analytics
   ```

## Development Workflow

### Running Individual Components

1. Data Ingestion:

   ```bash
   docker compose run ingest
   ```

2. dbt Commands:

   ```bash
   docker compose run dbt deps   # Install dbt dependencies
   docker compose run dbt debug  # Test connection
   docker compose run dbt build  # Run seeds, models, snapshots, and tests
   ```
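
dbt's node selection syntax lets you build part of the project instead of everything. The sketch below assumes, as the commands above suggest, that the `dbt` Compose service passes its arguments straight to the dbt CLI; `build_with_parents` is a hypothetical wrapper. The `+` prefix in `--select +<model>` selects the model together with all of its upstream parents.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: build one dbt model plus everything upstream of it.
# Assumes the `dbt` service in docker-compose.yml forwards its arguments
# to the dbt CLI, as the README's `docker compose run dbt ...` commands imply.
build_with_parents() {
  local model=$1
  docker compose run dbt build --select "+${model}"
}

# Example usage:
# build_with_parents mart_shelter_distribution
```

Moving the `+` to the end (for example `--select stg_bus_shelters+`) selects a model and everything downstream of it instead.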

### Making Changes

1. Create a new branch:

   ```bash
   git checkout -b feature/your-feature-name
   ```

2. Make your changes to the models, scripts, or configurations

3. Test your changes:

   ```bash
   docker compose run analytics
   ```

4. Open a pull request on GitHub

## Deployment Process

The project includes a GitHub Actions workflow that:

1. Runs on pushes to the `main` branch and on pull requests
2. Executes the full pipeline
3. Uploads the DuckDB database and dbt artifacts as workflow artifacts

## Project Components

### Data Pipeline

- Downloads example data from public sources
- Loads data into a local DuckDB database
- Processes data through dbt models:
  - Staging models for initial data cleaning
  - Intermediate models for data transformation
  - Mart models for final analysis

### dbt Models

- `stg_bus_shelters`: Initial cleaning and column renaming
- `int_shelter_locations`: Geographic clustering analysis
- `mart_shelter_distribution`: High-level shelter distribution metrics

### Configuration Files

- `pyproject.toml`: Python dependencies managed by Poetry
- `dbt_project.yml`: dbt project configuration
- `profiles.yml`: dbt connection profiles
- `docker-compose.yml`: Container configuration
- `Dockerfile`: Development environment definition

## Contributing

See our [Contributing Guidelines](CONTRIBUTING.md) for development practices and standards.

## Support

If you encounter any issues or have questions, please:

1. Check the existing GitHub issues
2. Create a new issue if needed
3. Comment on your assigned evaluation task

## License

This project is proprietary and confidential. All rights reserved.
