Add mdformat to pre-commit and lint repo #2923

Merged · 1 commit · Aug 24, 2023
1 change: 0 additions & 1 deletion .github/ISSUE_TEMPLATE/bug_report.md
@@ -4,7 +4,6 @@ about: Create a report to help us improve
title: 'Bug: '
labels: ''
assignees: ''

---

**Describe the bug**
Expand Down
8 changes: 6 additions & 2 deletions .github/ISSUE_TEMPLATE/new-team-member.md
@@ -2,9 +2,9 @@
name: New Team Member
about: Kick off the onboarding process.
title: New Team Member - [Name]
labels: 'new team member'

labels: new team member
---

Name:
Role:
Reports to:
@@ -14,9 +14,11 @@ GitHub Username:
Slack Username:

**Set-up:**

- [ ] Technical Onboarding call scheduled

- [ ] Added to tools:

- [ ] Github
- [ ] Organization: Cal-ITP
- [ ] Team: warehouse-users and warehouse-contributors
@@ -26,12 +28,14 @@ Slack Username:
- [ ] Slack

- [ ] Added to meetings:

- [ ] Analyst Round Tables (Tuesday & Thursday)
- [ ] Lunch n' Learn
- [ ] All-hands
- [ ] Data & Digital Services email list

- [ ] Added to Slack channels:

- [ ] #data-analyses
- [ ] #data-office-hours
- [ ] #data
Expand Down
8 changes: 2 additions & 6 deletions .github/ISSUE_TEMPLATE/user-story.md
@@ -4,20 +4,16 @@ about: Submit a user story or feature request
title: ''
labels: ''
assignees: ''

---

## User story / feature request

_Please describe your need, outlining the key users, the feature being requested, and the goal that the feature will facilitate. For example: **As a [user or stakeholder type], I want [software feature] so that [some business value]**_


_Please describe your need, outlining the key users, the feature being requested, and the goal that the feature will facilitate. For example: **As a \[user or stakeholder type\], I want \[software feature\] so that \[some business value\]**_

### Acceptance Criteria

_Please enter something that can be verified to show that this user story is satisfied. For example: **I can join table X with table Y.** or **Column A appears in table Z in Metabase.**_



### Notes

_Please enter any additional information that will facilitate the completion of this ticket. For example: Are there any constraints not mentioned above? Are there any alternatives you have considered?_
9 changes: 4 additions & 5 deletions .github/pull_request_template.md
@@ -1,9 +1,8 @@
# Description
_Describe your changes and why you're making them. Please include the context, motivation, and relevant dependencies._


_Describe your changes and why you're making them. Please include the context, motivation, and relevant dependencies._

Resolves #[issue]
Resolves #\[issue\]

## Type of change

@@ -13,11 +12,11 @@ Resolves #[issue]
- [ ] Documentation

## How has this been tested?
_Include commands/logs/screenshots as relevant._


_Include commands/logs/screenshots as relevant._

## Post-merge follow-ups

_Document any actions that must be taken post-merge to deploy or otherwise implement the changes in this PR (for example, running a full refresh of some incremental model in dbt). If these actions will take more than a few hours after the merge or if they will be completed by someone other than the PR author, please create a dedicated follow-up issue and link it here to track resolution._

- [ ] No action required
Expand Down
21 changes: 11 additions & 10 deletions .github/workflows/README.md
@@ -13,32 +13,33 @@ While we're using GCP Composer, "deployment" of Airflow consists of two parts:

This workflow builds a static website from the Svelte app and deploys it to Netlify.

## build-*.yml workflows
## build-\*.yml workflows

Workflows prefixed with `build-` generally lint, test, and (usually) publish either a Python package or a Docker image.

## service-*.yml workflows
## service-\*.yml workflows

Workflows prefixed with `service-` deal with Kubernetes deployments.

* `service-release-candidate.yml` creates candidate branches, using [hologit](https://github.com/JarvusInnovations/hologit) to bring in external Helm charts and remove irrelevant (i.e. non-infra) code
* `service-release-diff.yml` renders kubectl diffs on PRs targeting release branches
* `service-release-channel.yml` deploys to a given channel (i.e. environment) on updates to a release branch
- `service-release-candidate.yml` creates candidate branches, using [hologit](https://github.com/JarvusInnovations/hologit) to bring in external Helm charts and remove irrelevant (i.e. non-infra) code
- `service-release-diff.yml` renders kubectl diffs on PRs targeting release branches
- `service-release-channel.yml` deploys to a given channel (i.e. environment) on updates to a release branch

Some of these workflows use hologit or invoke. See the READMEs in [.holo](../.holo) and [ci](../ci) for documentation regarding hologit and invoke, respectively.

## GitOps

The workflows described above also define their triggers. In general, developer workflows should follow these steps.

1. Check out a feature branch
2. Put up a PR for that feature branch, targeting `main`
* `service-release-candidate` will run and create a remote branch named `candidate/<feature-branch-name>`
- `service-release-candidate` will run and create a remote branch named `candidate/<feature-branch-name>`
3. Create and merge a PR from the candidate branch to `releases/test`
* `service-release-diff` will run on the PR and print the expected changes
* `service-release-channel` will run on merge (i.e. push on `releases/test`) and deploy
- `service-release-diff` will run on the PR and print the expected changes
- `service-release-channel` will run on merge (i.e. push on `releases/test`) and deploy
4. Merge the original PR
* `service-release-candidate` will then update the remote `candidates/main` branch
- `service-release-candidate` will then update the remote `candidates/main` branch
5. Create and merge a PR from `candidates/main` to `releases/prod`
* `service-release-channel` will run and deploy to `prod` this time
- `service-release-channel` will run and deploy to `prod` this time

Note: One alternative would be to use `candidates/main` to deploy into both `test` and `prod`. This is very possible but can be a bit annoying if GitHub is configured to delete branches on merge and the `cleanup-release-candidates` action then deletes `candidates/main` after it has been merged into `releases/test`.
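
As a rough sketch of steps 1 and 2 above (the branch name and PR title are hypothetical, and the GitHub CLI `gh` is assumed to be available):

```bash
# Start a feature branch off main (branch name is hypothetical)
git checkout main
git pull
git checkout -b feat/my-change

# ...commit work, then push and open a PR targeting main
git push -u origin feat/my-change
gh pr create --base main --title "feat: my change" --body "Describe the change"
```

Per step 2, `service-release-candidate` would then create `candidate/feat/my-change` on the remote.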
4 changes: 2 additions & 2 deletions .holo/README.md
@@ -5,10 +5,10 @@ GitHub branches in a manner that facilitates an "obvious" GitOps workflow for CI
hologit allows:

1. Building branches containing only a subset of repository contents (for example, a branch only including infra-related code)
* This action is called "projection"
- This action is called "projection"
2. Bringing in contents from another repository without relying on published artifacts such as Helm charts
3. Applying transformations to files as part of #1
* These transformations are called "lenses"
- These transformations are called "lenses"

In this repository, we declare one holobranch named [release-candidate](../branches/release-candidate).
By projecting this holobranch in GitHub Actions, individual "candidate" branches end up containing
Expand Down
10 changes: 10 additions & 0 deletions .pre-commit-config.yaml
@@ -60,3 +60,13 @@ repos:
rev: 0.6.1
hooks:
- id: nbstripout
- repo: https://github.com/executablebooks/mdformat
rev: 0.7.16
hooks:
- id: mdformat
exclude: ^warehouse/(?!README.md)
args: ["--number"]
additional_dependencies:
- mdformat-gfm
- mdformat-frontmatter
- mdformat-footnote
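
With this hook in place, the formatter can be exercised locally. A minimal sketch, assuming `pre-commit` is already installed:

```bash
# Install the git hooks defined in .pre-commit-config.yaml
pre-commit install

# Run only the mdformat hook against every file in the repo;
# the exclude pattern above still skips warehouse/ (except its README.md)
pre-commit run mdformat --all-files
```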
24 changes: 13 additions & 11 deletions CONTRIBUTING.md
@@ -4,16 +4,18 @@ These guidelines are meant to provide a foundation for collaboration in Cal-ITP's
primarily [#data-infra](https://github.com/cal-itp/data-infra).

## Issues
* When submitting an issue, please try to use an existing template if one is appropriate
* Provide enough information and context; try to do one or more of the following:
* Include links to specific lines of code, error logs, Slack context, etc.
* Include error messages or tracebacks if relevant and short
* Connect issues to Sentry issues

- When submitting an issue, please try to use an existing template if one is appropriate
- Provide enough information and context; try to do one or more of the following:
- Include links to specific lines of code, error logs, Slack context, etc.
- Include error messages or tracebacks if relevant and short
- Connect issues to Sentry issues

## Pull Requests
* We generally use merge commits as we think they provide clarity in a PR-based workflow
* PRs should be linked to any issues that they close. [Keywords](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword) are one good way to do this
* Google provides a [How to do a code review reference](https://google.github.io/eng-practices/review/reviewer/) that reviewers may find helpful
* Use draft PRs to keep track of work without notifying reviewers, and avoid giving pre-emptive feedback on draft PRs
* Reviewers should not generally merge PRs themselves and should instead let the author merge, since authors will have the most context about merge considerations (for example, whether additional reviews are still needed, or whether any communication is needed about the impacts of the PR when it merges)
* After a PR is merged, the author has the responsibility of monitoring any subsequent CI actions for successful completions

- We generally use merge commits as we think they provide clarity in a PR-based workflow
- PRs should be linked to any issues that they close. [Keywords](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword) are one good way to do this
- Google provides a [How to do a code review reference](https://google.github.io/eng-practices/review/reviewer/) that reviewers may find helpful
- Use draft PRs to keep track of work without notifying reviewers, and avoid giving pre-emptive feedback on draft PRs
- Reviewers should not generally merge PRs themselves and should instead let the author merge, since authors will have the most context about merge considerations (for example, whether additional reviews are still needed, or whether any communication is needed about the impacts of the PR when it merges)
- After a PR is merged, the author has the responsibility of monitoring any subsequent CI actions for successful completions
30 changes: 15 additions & 15 deletions README.md
@@ -6,21 +6,21 @@ Documentation for this codebase lives at [docs.calitp.org/data-infra](https://do

## Repository Structure

* [./airflow](./airflow) contains the local dev setup and source code for Airflow DAGs (i.e. ETL)
* [./ci](./ci) contains continuous integration and deployment scripts using GitHub actions.
* [./docs](./docs) builds the [docs site](https://docs.calitp.org/data-infra).
* [./kubernetes](./kubernetes) contains helm charts, scripts and more for deploying apps/services (e.g. Metabase, JupyterHub) on our kubernetes cluster.
* [./images](./images) contains images we build and deploy for use by services such as JupyterHub.
* [./services](./services) contains apps that we write and deploy to kubernetes.
* [./warehouse](./warehouse) contains our dbt project that builds and tests models in the BigQuery warehouse.
- [./airflow](./airflow) contains the local dev setup and source code for Airflow DAGs (i.e. ETL)
- [./ci](./ci) contains continuous integration and deployment scripts using GitHub actions.
- [./docs](./docs) builds the [docs site](https://docs.calitp.org/data-infra).
- [./kubernetes](./kubernetes) contains helm charts, scripts and more for deploying apps/services (e.g. Metabase, JupyterHub) on our kubernetes cluster.
- [./images](./images) contains images we build and deploy for use by services such as JupyterHub.
- [./services](./services) contains apps that we write and deploy to kubernetes.
- [./warehouse](./warehouse) contains our dbt project that builds and tests models in the BigQuery warehouse.

## Contributing

* Follow the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) standard for all commits
* Use Conventional Commit format for PR titles
* Use GitHub's *draft* status to indicate PRs that are not ready for review/merging
* Do not use GitHub's "update branch" button or merge the `main` branch back into a PR branch to update it. Instead, rebase PR branches to update them and resolve any merge conflicts.
* We use GitHub's "code owners" functionality to designate a person or group of people who are in the line of approval for changes to some parts of this repository - if one or more people are automatically tagged as reviewers by GitHub when you create a PR, an approving review from at least one of them is required to merge. This does not automatically place the PR review in somebody's list of priorities, so please reach out to a reviewer to get eyes on your PR if it's time-sensitive.
- Follow the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) standard for all commits (see the sketch after this list)
- Use Conventional Commit format for PR titles
- Use GitHub's *draft* status to indicate PRs that are not ready for review/merging
- Do not use GitHub's "update branch" button or merge the `main` branch back into a PR branch to update it. Instead, rebase PR branches to update them and resolve any merge conflicts.
- We use GitHub's "code owners" functionality to designate a person or group of people who are in the line of approval for changes to some parts of this repository - if one or more people are automatically tagged as reviewers by GitHub when you create a PR, an approving review from at least one of them is required to merge. This does not automatically place the PR review in somebody's list of priorities, so please reach out to a reviewer to get eyes on your PR if it's time-sensitive.
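
For example, a Conventional Commits-style message (the type, scope, and description here are hypothetical):

```bash
# Conventional Commits format: type(scope): description
git commit -m "feat(docs): document the mdformat pre-commit hook"
```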

## Linting and type-checking

@@ -41,9 +41,9 @@ which runs in the GitHub Actions that build the individual images. If you are
unfamiliar with Python type hints or mypy, the following documentation links
will prove useful.

* [PEP 484](https://peps.python.org/pep-0484/), which added type hints
* [The typing module docs](https://docs.python.org/3/library/typing.html)
* [The mypy docs](https://mypy.readthedocs.io/en/stable/)
- [PEP 484](https://peps.python.org/pep-0484/), which added type hints
- [The typing module docs](https://docs.python.org/3/library/typing.html)
- [The mypy docs](https://mypy.readthedocs.io/en/stable/)

In general, it should be relatively easy to make most of our code pass mypy
since we make heavy use of Pydantic types. Some of our imported modules will
Expand Down
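
A sketch of running the same check locally (the target path is a placeholder, and `--ignore-missing-imports` is one common way to quiet third-party modules that lack type stubs):

```bash
# Type-check one image's source tree; the path is hypothetical
mypy --ignore-missing-imports images/example-image/
```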
5 changes: 4 additions & 1 deletion airflow/README.md
@@ -39,11 +39,13 @@ gcloud init
```

Next, run the initial database migration, which also creates a default user named `airflow`.

```shell
docker-compose run airflow db init
```

Next, start all services including the Airflow web server.

```console
docker-compose up
```
@@ -60,6 +62,7 @@ docker-compose run airflow tasks test download_gtfs_schedule_v2 download_schedul
Additional reading about this setup can be found on the [Airflow Docs](https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html)

### PodOperators

Airflow PodOperator tasks execute a specific Docker image; as of 2023-08-24 these images are pushed to [GitHub Container Registry](https://ghcr.io/) and production uses `:latest` tags while local uses `:development`. If you want to test these tasks locally, you must build and push development versions of the images used by the tasks. The Dockerfiles and code that make up the images live in the [../jobs](../jobs) directory. For example:

```bash
@@ -77,7 +80,7 @@ docker-compose run airflow tasks test unzip_and_validate_gtfs_schedule_hourly va

### Common Issues

* `docker-compose up` exits with code 137: check that Docker has enough RAM (e.g. 8 GB). See [this post](https://stackoverflow.com/questions/44533319/how-to-assign-more-memory-to-docker-container) on how to increase its resources.
- `docker-compose up` exits with code 137: check that Docker has enough RAM (e.g. 8 GB). See [this post](https://stackoverflow.com/questions/44533319/how-to-assign-more-memory-to-docker-container) on how to increase its resources.

## Deploying to production

Expand Down
1 change: 1 addition & 0 deletions airflow/dags/download_gtfs_schedule_v2/README.md
@@ -5,4 +5,5 @@ Type: [Now / Scheduled](https://docs.calitp.org/data-infra/airflow/dags-maintena
This DAG orchestrates raw data capture for GTFS schedule data. It reads GTFS data configuration files that are generated by the [`airtable_loader_2` DAG](../airtable_loader_v2/README.md) to determine the list of GTFS schedule URLs to scrape (this DAG will just find the latest such configuration file, so there is no formal dependency between the two DAGs on a daily run basis).

## Secrets

You may need to change authentication information in [Secret Manager](https://console.cloud.google.com/security/secret-manager); auth keys are loaded from Secret Manager at the start of DAG executions. You may create new versions of existing secrets, or add entirely new secrets. Secrets must be tagged with `gtfs_schedule: true` to be loaded and are referenced by `url_secret_key_name` or `header_secret_key_name` in Airtable's GTFS dataset records.
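
For illustration only, and assuming the `gtfs_schedule: true` tag is a Secret Manager label (the secret name and values here are hypothetical), creating or rotating such a secret might look like:

```bash
# Create a new secret labeled for GTFS loading
echo -n "example-api-key" | gcloud secrets create EXAMPLE_FEED_API_KEY \
  --data-file=- \
  --labels=gtfs_schedule=true

# Rotate it later by adding a new version
echo -n "rotated-api-key" | gcloud secrets versions add EXAMPLE_FEED_API_KEY \
  --data-file=-
```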
9 changes: 5 additions & 4 deletions airflow/dags/transform_warehouse/README.md
@@ -5,10 +5,11 @@ Type: [Now / Scheduled](https://docs.calitp.org/data-infra/airflow/dags-maintena
This DAG orchestrates the running of the Cal-ITP dbt project and deployment of associated artifacts like the [dbt docs site](https://dbt-docs.calitp.org/#!/overview).

This DAG has some special considerations:
* If a task fails, look carefully before assuming that clearing the task will help. If the failure was caused by a `DbtModelError`, there is an issue with the SQL or data in an individual model and clearing the task will not help until that issue is fixed.

* While this DAG does not have any formal dependencies on other DAGs, the data transformations within the dbt project do depend on successful upstream data capture and parsing.
- If a task fails, look carefully before assuming that clearing the task will help. If the failure was caused by a `DbtModelError`, there is an issue with the SQL or data in an individual model and clearing the task will not help until that issue is fixed.

* Because the tasks in this DAG involve running a large volume of SQL transformations, they risk triggering data quotas if the DAG is run multiple times in a single day.
- While this DAG does not have any formal dependencies on other DAGs, the data transformations within the dbt project do depend on successful upstream data capture and parsing.

* This task can be run with a `dbt_select` statement provided (use the `Trigger DAG w/ config` button (option under the "play" icon in the upper right corner when looking at an individual DAG) in the Airflow UI and provide a JSON configuration like `{"dbt_select": "<+ if you want to run parents><your_model_here><+ if you want to run children>"}` using [dbt selection syntax](https://docs.getdbt.com/reference/node-selection/syntax#specifying-resources)) to re-run a specific individual model's lineage.
- Because the tasks in this DAG involve running a large volume of SQL transformations, they risk triggering data quotas if the DAG is run multiple times in a single day.

- This DAG can be re-run for a specific model's lineage by providing a `dbt_select` statement: use the `Trigger DAG w/ config` button in the Airflow UI (under the "play" icon in the upper right corner when viewing an individual DAG) and supply a JSON configuration like `{"dbt_select": "<+ if you want to run parents><your_model_here><+ if you want to run children>"}` using [dbt selection syntax](https://docs.getdbt.com/reference/node-selection/syntax#specifying-resources); see the sketch below.
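
The same configuration can also be passed via the Airflow CLI. A sketch that assumes the DAG id matches this directory's name and uses a placeholder model:

```bash
# Trigger the DAG with a dbt selector; the DAG id and model name are assumptions
airflow dags trigger transform_warehouse \
  --conf '{"dbt_select": "+my_model+"}'
```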
1 change: 1 addition & 0 deletions apps/maps/README.md
@@ -38,6 +38,7 @@ Netlify sites deployed via `netlify deploy ...` with `--alias=some-alias` and/or
The site is deployed to production on merges to main, as defined in [../../.github/workflows/deploy-apps-maps.yml](../../.github/workflows/deploy-apps-maps.yml).

You may also deploy manually with the following:

```bash
# from the apps/maps folder
npm run build
Expand Down
1 change: 1 addition & 0 deletions ci/README.md
@@ -5,6 +5,7 @@ a deployment named `archiver` is configured in [the prod channel](./channels/pro
by `invoke` (see below) calling `kubectl` commands.

## invoke (aka pyinvoke)

[invoke](https://docs.pyinvoke.org/en/stable/) is a Python framework for executing subprocesses and building a CLI application.
The tasks are defined in `tasks.py` and configuration in `invoke.yaml`; config values under the top-level `calitp`
are specific to our defined tasks.
Expand Down
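
As a minimal sketch, assuming `invoke` is installed and run from the `ci` directory:

```bash
# List the tasks defined in ci/tasks.py
cd ci
invoke --list
```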