diff --git a/.github/README.md b/.github/workflows/README.md similarity index 97% rename from .github/README.md rename to .github/workflows/README.md index ba831f628b..6af1d57193 100644 --- a/.github/README.md +++ b/.github/workflows/README.md @@ -1,6 +1,6 @@ # GitHub Actions -All CI/CD automation in this project is executed via GitHub Actions, whose workflow files live in the [./workflows/](./workflows) directory. +All CI/CD automation in this project is executed via GitHub Actions, whose workflow files live in this directory. ## deploy-airflow.yml diff --git a/.github/workflows/build-dbt.yml b/.github/workflows/build-warehouse-image.yml similarity index 97% rename from .github/workflows/build-dbt.yml rename to .github/workflows/build-warehouse-image.yml index 3c2e678e15..0e4a3df02e 100644 --- a/.github/workflows/build-dbt.yml +++ b/.github/workflows/build-warehouse-image.yml @@ -5,11 +5,11 @@ on: branches: - 'main' paths: - - '.github/workflows/build-dbt.yml' + - '.github/workflows/build-warehouse-image.yml' - 'warehouse/**' pull_request: paths: - - '.github/workflows/build-dbt.yml' + - '.github/workflows/build-warehouse-image.yml' - 'warehouse/**' concurrency: diff --git a/airflow/README.md b/airflow/README.md index 26cdf83e26..06d0399936 100644 --- a/airflow/README.md +++ b/airflow/README.md @@ -59,6 +59,22 @@ docker-compose run airflow tasks test download_gtfs_schedule_v2 download_schedul Additional reading about this setup can be found on the [Airflow Docs](https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html) +### PodOperators +Airflow PodOperator tasks execute a specific Docker image; as of 2023-08-24 these images are pushed to [GitHub Container Registry](https://ghcr.io/) and production uses `:latest` tags while local uses `:development`. If you want to test these tasks locally, you must build and push development versions of the images used by the tasks. The Dockerfiles and code that make up the images live in the [../jobs](../jobs) directory. For example: + +```bash +# running from jobs/gtfs-schedule-validator/ +docker build -t ghcr.io/cal-itp/data-infra/gtfs-schedule-validator:development . +docker push ghcr.io/cal-itp/data-infra/gtfs-schedule-validator:development +``` + +Then, you could execute a task using this updated image. + +```bash +# running from airflow/ +docker-compose run airflow tasks test unzip_and_validate_gtfs_schedule_hourly validate_gtfs_schedule 2023-06-07T16:00:00 +``` + ### Common Issues * `docker-compose up` exits with code 137 - Check that your docker has enough RAM (e.g. 8Gbs). See [this post](https://stackoverflow.com/questions/44533319/how-to-assign-more-memory-to-docker-container) on how to increase its resources. diff --git a/airflow/dags/download_gtfs_schedule_v2/README.md b/airflow/dags/download_gtfs_schedule_v2/README.md index 93b6825d08..af857c5df5 100644 --- a/airflow/dags/download_gtfs_schedule_v2/README.md +++ b/airflow/dags/download_gtfs_schedule_v2/README.md @@ -3,3 +3,6 @@ Type: [Now / Scheduled](https://docs.calitp.org/data-infra/airflow/dags-maintenance.html) This DAG orchestrates raw data capture for GTFS schedule data. It reads GTFS data configuration files that are generated by the [`airtable_loader_2` DAG](../airtable_loader_v2/README.md) to determine the list of GTFS schedule URLs to scrape (this DAG will just find the latest such configuration file, so there is no formal dependency between the two DAGs on a daily run basis.) 
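For local experimentation, you can exercise this DAG end-to-end with the dockerized Airflow CLI described in the `airflow/` README; the command below is only a sketch (it assumes the local docker-compose setup is available, and any reasonably recent logical date should work):

```bash
# running from airflow/ (sketch; assumes the local docker-compose Airflow setup)
docker-compose run airflow dags test download_gtfs_schedule_v2 2023-06-07
```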
+ +## Secrets +You may need to change authentication information in [Secret Manager](https://console.cloud.google.com/security/secret-manager); auth keys are loaded from Secret Manager at the start of DAG executions. You may create new versions of existing secrets, or add entirely new secrets. Secrets must be tagged with `gtfs_schedule: true` to be loaded and are referenced by `url_secret_key_name` or `header_secret_key_name` in Airtable's GTFS dataset records. diff --git a/ci/README.md b/ci/README.md index cf56ef1d00..f63c471ad5 100644 --- a/ci/README.md +++ b/ci/README.md @@ -15,3 +15,29 @@ Individual release channels/environments are config files that are passed to inv ```bash poetry run invoke release -f channels/test.yaml ``` + +## GitOps + +In this diagram, arrows represent human actions such as opening and merging PRs and nodes (except for the very first) represent automated actions such as `invoke` deploying to the cluster. Green nodes indicate a deployment while white nodes indicate an automated git action such as branch creation or commenting on a pull request. + +```mermaid +flowchart TD +classDef default fill:white, color:black, stroke:black +classDef initial fill:lightblue, color:black +classDef deploy fill:lightgreen, color:black + +pr[Push commits to a branch.\nDoes a test environment exist?] +candidates_branch[GitHub Action renders candidates/branch-name] +branch_diff[invoke diff renders on test PR] +branch_invoke[invoke releases to test] + +candidates_main[GitHub Action builds images and renders candidates/main\nNote: if you stop here, no Kubernetes changes will actually be deployed.] +prod_diff[invoke diff renders on prod PR] +prod_invoke[invoke releases to prod] + +pr -- Yes --> candidates_branch -- "Open PR from candidates/branch-name to releases/test" --> branch_diff -- "Merge candidate PR to releases/test" --> branch_invoke -- Merge original PR to main after review and testing --> candidates_main -- "Open PR from candidates/main to releases/prod" --> prod_diff -- "Merge candidate PR to releases/prod" --> prod_invoke +pr -- "No; merge to main after review" --> candidates_main + +class pr initial +class branch_invoke,prod_invoke deploy +``` diff --git a/docs/_toc.yml b/docs/_toc.yml index c5e4a6dc0c..6237295c5d 100644 --- a/docs/_toc.yml +++ b/docs/_toc.yml @@ -47,12 +47,6 @@ parts: - file: architecture/data - file: airflow/dags-maintenance - file: transit_database/transitdatabase - - file: kubernetes/README - sections: - - file: kubernetes/JupyterHub - - file: kubernetes/architecture - - file: kubernetes/deployment - - file: backups/metabase - caption: Contribute to the Docs! chapters: - file: contribute/overview diff --git a/docs/backups/metabase.md b/docs/backups/metabase.md deleted file mode 100644 index 68e2b68d34..0000000000 --- a/docs/backups/metabase.md +++ /dev/null @@ -1,61 +0,0 @@ -# Backups - -## Metabase - -For most of our backups we utilize [Restic](https://restic.readthedocs.io/en/latest/010_introduction.html) - -To verify that metabase configuration backups have been created, there are three pieces of information you require: - -1. Name of the Restic repository -2. Restic password -3. Google Access token - -There are several ways to obtain the Restic information. - -## Google Cloud Engine - -Within the kubernetes engine on GCE, go to the sidebar of `Secrets and Config Maps`. Select `cluster = data-infra-apps(us-west1)` and `namespace = metabase`, then select `database-backup`. 
This will have the Restic password that you will need but it will be encrypted. - -## Lens - -The preferred method is to use the Lens Kubernetes IDE https://k8slens.dev/. Once Lens desktop is set up, sync the following cluster `gke_cal-itp-data-infra_us-west1_data-infra-apps`. Within the configuration sidebar, navigate to `Secrets`. Select the `database-backup` secret where you will see the `RESTIC_PASSWORD`. Click the eye icon to unencrypt the password. - -Navigate to the Workloads parent folder and select `CronJobs`. Select the cronjob `postgresql-backup`. If you click the edit button you can look at it in YAML form. There you will obtain the Restic repository info. - -```shell -name: RESTIC_REPOSITORY -value: gs:calitp-backups-metabase:/ -- name: PGHOST -value: database.metabase.svc.cluster.local -``` - -Once you have the name of the Restic repository, the password and your google access token you can connect to Restic. - -## Restic - -Within Restic you can see the snapshots by running the following terminal commands: - -`restic list snapshot` or `restic snapshots latest` - -For spot testing, create a folder within the tmp directory -`mkdir /tmp/pgdump` then run the Restic restore command to extract the data from a snapshot. - -`restic restore -t /tmp/pgdump latest` - -This will be a zipped file, unzip it by using - -`gunzip /tmp/pgdump/pg_dumpall.sql` - -## Verify SQL in Postgres - -To verify the SQL schema and underlying data has not been corrupted , open the SQL file within a Docker container. For initial Docker container setup please visit [Docker Documentation](https://docs.docker.com/get-started/) - -`docker run --rm -v /tmp/sql:/workspace -e POSTGRES_HOST_AUTH_METHOD=trust postgres:13.5` - -It is important to note that the version of postgres used to take the metabase snapshots (13.5) needs to be the same version of postgres that is restoring the dump. - -To load the sql into postgres, run the following command: - -`psql -U postgres < pg_dumpall.sql` - -Then you can verify the schema and underlying data within postgres. diff --git a/docs/kubernetes/JupyterHub.md b/docs/kubernetes/JupyterHub.md deleted file mode 100644 index 3ad6295c47..0000000000 --- a/docs/kubernetes/JupyterHub.md +++ /dev/null @@ -1,197 +0,0 @@ -# JupyterHub - -This page outlines how to deploy JupyterHub to Cal-ITP's Kubernetes cluster. - -As we are not yet able to commit encrypted secrets to the cluster, we'll have to do some work ahead of a `helm install`. - -## Installation - -### 1. Create the Namespace - -``` -kubectl create ns jupyterhub -``` - -### 2. Add Secrets to Namespace - -Two base64-encoded secrets must be added to the `jupyterhub` namespace before installing the Helm chart. - -We'll cover the purpose of each secret in the subsections below. - -We'll put both of these secrets in a local file, `jupyterhub-secrets.yaml`, which will contain something that looks like this: - -```yaml -apiVersion: v1 -data: - service-key.json: -kind: Secret -metadata: - name: jupyterhub-gcloud-service-key - namespace: jupyterhub ---- - -apiVersion: v1 -data: - values.yaml: -kind: Secret -metadata: - name: jupyterhub-github-config - namespace: jupyterhub -``` - -With the above configured, we can go ahead and apply: - -``` -kubectl apply -f jupyterhub-secrets.yaml -``` - -**This file should never be committed!!!** - -#### jupyterhub-gcloud-service-key - -The GCloud service key, a .json file, is used to authenticate users to GCloud. 
- -Currently, the secret `jupyterhub-gcloud-service-key` is mounted to every JupyterHub user's running container at `/usr/local/secrets/service-key.json`. As we refine our authentication approach, this secret may become obsolete and may be removed from this process, but for now the process of volume mounting this secret is required for authentication. - -To create the base64 encoded string from your terminal: - -``` -cat the-service-key.json | base64 -w 0 -``` - -Then add the terminal's output to the `jupyterhub-secrets.yaml` outlined above. - -#### jupyterhub-github-config - -Because we use [Github OAuth](https://zero-to-jupyterhub.readthedocs.io/en/latest/administrator/authentication.html?highlight=oauth#github) for user authentication in JupyterHub, we have to provide a client-id and client-secret to the JupyterHub Helm chart. We also have to provide a bunch of non-sensitive information to the chart, as well, for GitHub OAuth to function. - -For context, here is what the full configuration for the GitHub OAuth in our JupyterHub Helm chart's `values.yaml` might look like: - -```yaml -hub: - config: - GitHubOAuthenticator: - client_id: - client_secret: - oauth_callback_url: https://your-jupyterhub-domain/hub/oauth_callback - allowed_organizations: - - cal-itp:warehouse-users - scope: - - read:org - JupyterHub: - authenticator_class: github - Authenticator: - admin_users: - - machow - - themightchris - - lottspot -``` - -Fortunately, we don't have to store all of this information in the secret! The JupyterHub chart affords us the ability to use the `hub.existingSecret` parameter to pass in the sensitive information, so we can mix-and-match parts of the configuration. - -This means that we can leave parameters like `hub.config.GitHubOAuthenticator.oauth_callback_url` and `hub.config.GitHubOAuthenticator.allowed_organizations` in plain text in our `values.yaml`, and place sensitive information like `hub.config.GitHubOAuthenticator.client_id` and `hub.config.GitHubOAuthenticator.client_secret` in our `jupyterhub-secrets.yaml`. - -So, what format must our base64 encoded string take in order for the JuptyerHub chart to accept it? - -##### Create a Temporary File - -``` -touch github-secrets.yaml -``` - -##### Fill the File with Chart-Formatted Secrets - -Your `github-secrets.yaml` should look like this: - -```yaml -hub: - config: - GitHubOAuthenticator: - client_id: - client_secret: -``` - -##### Encode the File Contents - -From your terminal: - -``` -cat github-secrets.yaml | base64 -w 0 -``` - -##### Add the Encoding to Your Secret - -Add the terminal output from above to your `jupyterhub-secrets.yaml` file. - -##### Clean up - -From your terminal: - -``` -rm github-secrets.yaml -``` - -#### Reminder - Apply the secrets file! - -This was a long section, don't forget to apply the following before proceeding. - -``` -kubectl apply -f jupyterhub-secrets.yaml -``` - -After you apply it - you should delete it or keep it somewhere very safe! - -### 3. Install the Helm Chart - -You are now ready to install the chart to your cluster using Helm. - -``` -helm dependency update kubernetes/apps/charts/jupyterhub -helm install jupyterhub kubernetes/apps/charts/jupyterhub -n jupyterhub -``` - -## Updating - -In general, any non-secret changes to the chart can be added to / adjusted in the chart's `values.yaml`. 
- -Upgrade with: - -``` -# On changes to dependencies in Chart.yaml, remember to re-run: -# helm dependency update kubernetes/apps/charts/jupyterhub -helm upgrade jupyterhub kubernetes/apps/charts/jupyterhub -n jupyterhub -``` - -Note that if you haven't yet connected to the kubernetes cluster, you may need to run the following. - -``` -source kubernetes/gke/config-cluster.sh -export KUBECONFIG=$HOME/.kube/data-infra-apps.yaml -gcloud container clusters get-credentials "$GKE_NAME" --region "$GKE_REGION" -``` - -## Domain Name Changes - -At the time of this writing, a JupyterHub deployment is available at `https://hubtest.k8s.calitp.jarv.us`. - -If, in the future, the domain name were to change to something more permanent, some configuration would have to change. Fortunately, though, none of the secrets we covered above are affected by these configuration changes! - -It is advised the below changes are planned and executed in a coordinated effort. - -### Changes in GitHub OAuth - -Within the GitHub OAuth application, in Github, the homepage and callback URLs would need to be changed. Cal-ITP owns the Github OAUth application in GitHub, and [this Cal-ITP Github issue](https://github.com/cal-itp/data-infra/issues/367) can be referenced for individual contributors who may be able to helm adjusting the Github OAUth application's homepage and callback URLs. - -### Changes in the Helm Chart - -After the changes have been made to the GitHub OAuth application, the following portions of the JupyterHub chart's `values.yaml` must be changed: - - - `hub.config.GitHubOAuthenticator.oauth_callback_url` - - `ingress.hosts` - - `ingress.tls.hosts` - -Apply these chart changes with: - -``` -helm upgrade jupyterhub kubernetes/apps/charts/jupyterhub -n jupyterhub -``` diff --git a/docs/kubernetes/README.md b/docs/kubernetes/README.md deleted file mode 100644 index 3a0a9a1a92..0000000000 --- a/docs/kubernetes/README.md +++ /dev/null @@ -1,127 +0,0 @@ ---- -jupytext: - cell_metadata_filter: -all - formats: md:myst - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.10.3 -kernelspec: - display_name: Python 3 (ipykernel) - language: python - name: python3 ---- - -# Kubernetes -## Cluster Administration ## -### preflight ### - -Check logged in user - -```bash -gcloud auth list -# ensure correct active user -# gcloud auth login -``` - -Check active project - -```bash -gcloud config get-value project -# project should be cal-itp-data-infra -# gcloud config set project cal-itp-data-infra -``` - -Check compute region - -```bash -gcloud config get-value compute/region -# region should be us-west1 -# gcloud config set compute/region us-west1 -``` - -### quick start ### - -```bash -./kubernetes/gke/cluster-create.sh -# ... -export KUBECONFIG=$PWD/kubernetes/gke/kube/admin.yaml -kubectl cluster-info -``` - -### cluster lifecycle ### - -Create the cluster by running `kubernetes/gke/cluster-create.sh`. - -The cluster level configuration parameters are stored in -[`kubernetes/gke/config-cluster.sh`](https://github.com/cal-itp/data-infra/blob/main/kubernetes/gke/config-cluster.sh). -Creating the cluster also requires configuring parameters for a node pool -named "default-pool" (unconfigurable name defined by GKE) in -[`kubernetes/gke/config-nodepool.sh`](https://github.com/cal-itp/data-infra/blob/main/kubernetes/gke/config-nodepool.sh). -Any additional node pools configured in this file are also stood up at cluster -creation time. 
- -Once the cluster is created, it can be managed by pointing the `KUBECONFIG` -environment variable to `kubernetes/gke/kube/admin.yaml`. - -The cluster can be deleted by running `kubernetes/gke/cluster-delete.sh`. - -### nodepool lifecycle ### - -Certain features of node pools are immutable (e.g., machine type); to change -such parameters requires creating a new node pool with the desired new values, -migrating workloads off of the old node pool, and then deleting the old node pool. -The node pool lifecycle scripts help simplify this process. - -#### create a new node pool #### - -Configure a new node pool by adding its name to the `GKE_NODEPOOL_NAMES` array -in [`kubernetes/gke/config-nodepool.sh`](https://github.com/cal-itp/data-infra/blob/main/kubernetes/gke/config-nodepool.sh). -For each nodepool property (`GKE_NODEPOOL_NODE_COUNT`, `GKE_NODEPOOL_NODE_LOCATIONS`, etc) -it is required to add an entry to the array which is mapped to the nodepool name. - -Once the new nodepool is configured, it can be stood up by running `kubernetes/gke/nodepool-up.sh [nodepool-name]`, -or by simply running `kubernetes/gke/nodepool-up.sh`, which will stand up all configured node pools which do not yet -exist. - -#### drain and delete an old node pool #### - -Once a new nodepool has been created to replace an active node pool, the old node pool must be -removed from the `GKE_NODEPOOL_NAMES` array. - -Once the old node pool is removed from the array, it can be drained and deleted by running `kubernetes/gke/nodepool-down.sh `. - -## Deploy Cluster Workloads ## - -Cluster workloads are divided into two classes: - -1. system -2. apps - -Apps are the workloads that users actually care about. - -### system workloads ### - -```bash -kubectl apply -k kubernetes/system -``` - -System workloads are used to support running applications. This includes items -such as an ingress controller, monitoring, logging, etc. The system deploy command -is run at cluster create time, but when new system workloads are added it may need -to be run again. - -### app: metabase ### - -First deploy: - -```bash -helm install metabase kubernetes/apps/charts/metabase -f kubernetes/apps/values/metabase.yaml -n metabase --create-namespace -``` - -Apply changes: - -```bash -helm upgrade metabase kubernetes/apps/charts/metabase -f kubernetes/apps/values/metabase.yaml -n metabase -``` diff --git a/docs/kubernetes/architecture.md b/docs/kubernetes/architecture.md deleted file mode 100644 index 67a1e21196..0000000000 --- a/docs/kubernetes/architecture.md +++ /dev/null @@ -1,7 +0,0 @@ -# architecture - -This page displays the architecture of our kubernetes environment. - -## - -![Collection Matrix](assets/kubernetes_architecture.png) diff --git a/docs/kubernetes/assets/deployment_process.png b/docs/kubernetes/assets/deployment_process.png deleted file mode 100644 index a053d5e32f..0000000000 Binary files a/docs/kubernetes/assets/deployment_process.png and /dev/null differ diff --git a/docs/kubernetes/assets/kubernetes_architecture.png b/docs/kubernetes/assets/kubernetes_architecture.png deleted file mode 100644 index 06b9845ea6..0000000000 Binary files a/docs/kubernetes/assets/kubernetes_architecture.png and /dev/null differ diff --git a/docs/kubernetes/deployment.md b/docs/kubernetes/deployment.md deleted file mode 100644 index 2b1651d2aa..0000000000 --- a/docs/kubernetes/deployment.md +++ /dev/null @@ -1,7 +0,0 @@ -# deployment - -This page outlines the procress for deploying to Cal-ITP's Kubernetes cluster. 
- -## - -![Collection Matrix](assets/deployment_process.png) diff --git a/kubernetes/README.md b/kubernetes/README.md new file mode 100644 index 0000000000..a4536575bc --- /dev/null +++ b/kubernetes/README.md @@ -0,0 +1,181 @@ +# Kubernetes + +> :bulb: See the [ci README](../ci/README.md) for the specifics of deploying Kubernetes changes via GitOps. Only workloads (i.e. applications) are deployed via CI/CD and pull requests; changing the Kubernetes cluster itself (e.g. adding a node pool) is a manual operation. + +> :notebook: Both the Google Kubernetes Engine UI and the Lens Kubernetes IDE are useful GUI tools for interacting with a Kubernetes cluster, though you can get by with `kubectl` on the command line. + +We deploy our applications and services to a Google Kubernetes Engine cluster. If you are unfamiliar with Kubernetes, we recommend reading through [the official tutorial](https://kubernetes.io/docs/tutorials/kubernetes-basics/) to understand the main components (you do not have to actually perform all the steps). + +A [glossary](#Glossary) exists at the end of this document. + +## Cluster Administration + +We do not currently use Terraform to manage our cluster, nodepools, etc. and major changes to the cluster are unlikely to be necessary, but we do have some bash scripts that can help with tasks such as creating new node pools or creating a test cluster. + +First, verify you are logged in and gcloud is pointed at `cal-itp-data-infra` and the `us-west1` region. +```bash +gcloud auth list +gcloud config get-value project +gcloud config get-value compute/region +``` + +### Deploying the cluster + +> :red_circle: You should only run this script if you intend to actually deploy a new cluster, though it will stop if the cluster already exists. This is likely to be a rare operation but may be necessary for migrating regions, creating a totally isolated test cluster, etc. + +The cluster level configuration parameters are stored in [config-cluster.sh](./gke/config-cluster.sh). Creating the cluster also requires configuring parameters for a node pool named "default-pool" (unconfigurable name defined by GKE) in [kubernetes/gke/config-nodepool.sh](./gke/config-nodepool.sh). Any additional node pools configured in this file are also stood up at cluster creation time. + +Once the cluster is created, it can be managed by pointing the `KUBECONFIG` +environment variable to `kubernetes/gke/kube/admin.yaml`. + +```bash +./kubernetes/gke/cluster-create.sh +export KUBECONFIG=$PWD/kubernetes/gke/kube/admin.yaml +kubectl cluster-info +``` + +The cluster can be deleted by running `kubernetes/gke/cluster-delete.sh`. + +### Nodepool lifecycle + +It's much more likely that a user may want to add or change node pools than make changes to the cluster itself. Certain features of node pools are immutable (e.g. machine type); to change such parameters requires creating a new node pool with the desired new values, migrating workloads off of the old node pool, and then deleting the old node pool. The node pool lifecycle scripts help simplify this process. + +#### Create a new node pool + +Configure a new node pool by adding its name to the `GKE_NODEPOOL_NAMES` array in [kubernetes/gke/config-nodepool.sh](./gke/config-nodepool.sh). For each nodepool property (`GKE_NODEPOOL_NODE_COUNT`, `GKE_NODEPOOL_NODE_LOCATIONS`, etc) it is required to add an entry to the array which is mapped to the nodepool name. 
This config file is also where you will set Kubernetes [taints](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/) and [labels](https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/) on the nodes.

Once the new nodepool is configured, it can be stood up by running `kubernetes/gke/nodepool-up.sh <nodepool-name>`, or by simply running `kubernetes/gke/nodepool-up.sh`, which will stand up all configured node pools that do not yet exist.

#### Drain and delete an old node pool

Once a new nodepool has been created to replace an active node pool, the old node pool must be removed from the `GKE_NODEPOOL_NAMES` array.

Once the old node pool is removed from the array, it can be drained and deleted by running `kubernetes/gke/nodepool-down.sh <nodepool-name>`.

## Deploying workloads

Cluster workloads are divided into two classes:

1. Apps are the workloads that users actually care about; this includes deployed "applications" such as the GTFS-RT archiver, as well as "services" like Grafana and Sentry. These workloads are deployed using `invoke` as defined in the [ci](../ci/) folder.
2. System workloads are used to support running applications. This includes items such as an ingress controller, an HTTPS certificate manager, etc. The system deploy command is run at cluster creation time, but it may need to be run again when new system workloads are added.

   ```bash
   kubectl apply -k kubernetes/system
   ```

## JupyterHub

JupyterHub is a good example of an application that uses a Helm chart and is ultimately exposed to the outside internet for user access. In general, any non-secret changes to the chart can be accomplished by modifying the chart's `values.yaml` and running the `invoke release` specific to JupyterHub.

```bash
poetry run invoke release -f channels/prod.yaml --app=jupyterhub
```

### Secrets

Because we use [GitHub OAuth](https://zero-to-jupyterhub.readthedocs.io/en/latest/administrator/authentication.html?highlight=oauth#github) for user authentication in JupyterHub, we have to provide a client ID and client secret to the JupyterHub Helm chart. Here is what the full GitHub OAuth configuration in our JupyterHub Helm chart's `values.yaml` might look like:

```yaml
hub:
  config:
    GitHubOAuthenticator:
      client_id:
      client_secret:
      oauth_callback_url: https://your-jupyterhub-domain/hub/oauth_callback
      allowed_organizations:
        - cal-itp:warehouse-users
      scope:
        - read:org
    JupyterHub:
      authenticator_class: github
    Authenticator:
      admin_users:
        - machow
        - themightchris
        - lottspot
```

We want to avoid committing these secrets to GitHub, but we also want to version control as much of the `values.yaml` as possible. Fortunately, the JupyterHub chart lets us use the `hub.existingSecret` parameter to reference an existing secret containing additional `values.yaml` entries. For GitHub OAuth specifically, the `jupyterhub-github-config` secret must contain a `values.yaml` key containing a base64-encoded representation of the following YAML:

```yaml
hub:
  config:
    GitHubOAuthenticator:
      client_id:
      client_secret:
```

This encoding could be accomplished by running `cat <the file> | base64` or similar CLI tools; do not use an online base64 converter for secrets!
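For example, here is a minimal sketch of producing that encoded value and of inspecting what the cluster currently holds; the local file name is hypothetical, and this assumes the chart is still deployed in the `jupyterhub` namespace:

```bash
# Encode a local file containing only the sensitive values.yaml entries
# (hypothetical file name; keep this file out of version control).
base64 -w 0 github-oauth-values.yaml

# Decode the values.yaml key currently stored in the jupyterhub-github-config secret.
kubectl get secret jupyterhub-github-config -n jupyterhub \
  -o jsonpath='{.data.values\.yaml}' | base64 --decode
```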
### Domain Name Changes

At the time of this writing, a JupyterHub deployment is available at [https://notebooks.calitp.org](https://notebooks.calitp.org). If this domain name needs to change, the following configurations must also change so that OAuth and ingress continue to function.

1. Within the GitHub OAuth application, the homepage and callback URLs would need to be changed. Cal-ITP owns the GitHub OAuth application, and [this Cal-ITP GitHub issue](https://github.com/cal-itp/data-infra/issues/367) can be referenced for individual contributors who may be able to help adjust the GitHub OAuth application's homepage and callback URLs.

2. After the changes have been made to the GitHub OAuth application, the following portions of the JupyterHub chart's `values.yaml` must be changed:

   - `hub.config.GitHubOAuthenticator.oauth_callback_url`
   - `ingress.hosts`
   - `ingress.tls.hosts`

## Backups

For most of our backups we utilize [Restic](https://restic.readthedocs.io/en/latest/010_introduction.html); this section uses the Metabase database backup as an example.

To verify that Metabase configuration backups have been created, there are three pieces of information you require:

1. Name of the Restic repository
2. Restic password
3. Google access token (if you have previously authenticated to `gcloud`, this should already be complete)

There are several ways to obtain the Restic information, listed in order of effort.

1. In the Google Cloud Console, find the `database-backup` K8s Secret in the appropriate namespace (e.g. `metabase`) in the data-infra-apps cluster
2. Perform #1 but using the [Lens Kubernetes IDE](https://k8slens.dev)
3. Print out the K8s Secret and decode it from base64 using `kubectl` and `jq`
4. Determine the name of the secret from the deployment YAML (e.g. `metabase_database-backup`) and track it down in Google Cloud Secret Manager; the secrets generally follow the pattern of `<namespace>_<secret name>`

## Glossary

> Mostly cribbed from the [official Kubernetes documentation](https://kubernetes.io/docs/concepts/workloads)

* Kubernetes - a platform for orchestrating (i.e. deploying) containerized software applications onto a collection of virtual machines
* Cluster - a collection of virtual machines (i.e. nodes) on which Kubernetes is installed, and onto which Kubernetes in turn deploys pods
* Pod - one (or more) containers deployed to run within a Kubernetes cluster
  * For deployed services/applications, Pods exist because of a Deployment
  * For ephemeral workloads (think Airflow tasks or database backups), Pods may be managed directly or via a Job
* Deployment - a Kubernetes object that manages a set of Pods, such as multiple replicas of the same web application
* StatefulSet - similar to a Deployment but provides guarantees (e.g. deterministic network identifiers) necessary for stateful applications such as databases
* Service - an abstraction around Pods that provides a network interface _within the cluster_
  * For example, a Redis instance needs a Service to be usable by other Pods
* Ingress - exposes Services to the outside world
  * For example, a Metabase Service needs an Ingress to be accessible from the internet
* Volume - an abstraction of storage that is typically mounted into the file system of Pods
* Secrets/ConfigMaps - abstractions of configuration information, typically mounted into Pods as environment variables or files (see the sketch below)
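To make the last few glossary entries concrete, here is a minimal, hypothetical Pod spec (not one of our real workloads; all names and the image are placeholders) that consumes a Secret as an environment variable and a ConfigMap as files mounted through a Volume:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app          # hypothetical names throughout
  namespace: example
spec:
  containers:
    - name: app
      image: ghcr.io/cal-itp/data-infra/example-app:latest   # placeholder image
      env:
        - name: EXAMPLE_PASSWORD        # injected from a Secret key
          valueFrom:
            secretKeyRef:
              name: example-secret
              key: password
      volumeMounts:
        - name: app-config              # ConfigMap contents appear as files under /etc/app
          mountPath: /etc/app
  volumes:
    - name: app-config
      configMap:
        name: example-config
```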