The Giga DataOps Platform project follows Trunk-Based Development: User Stories are
worked on in pull requests (PRs), which are merged to main once approved by another
developer. The main branch serves as the most up-to-date version of the code base.
Refer to Conventional Commits.
[<Feature/Fix/Release/Hotfix>](<issue-id>) <Short desc>
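The naming pattern above can be checked mechanically. The following is a minimal sketch, assuming the pattern applies to a single-line title and that the issue ID is any non-empty text without a closing parenthesis; `is_valid_title` and the sample title are hypothetical and not part of the project.

```shell
# Hypothetical helper: returns 0 when a title matches
# [<Feature/Fix/Release/Hotfix>](<issue-id>) <Short desc>.
# Assumption: <issue-id> is any non-empty text without a closing parenthesis.
is_valid_title() {
  printf '%s\n' "$1" | grep -Eq '^\[(Feature|Fix|Release|Hotfix)\]\([^)]+\) .+$'
}

# Sample title with a made-up issue ID.
if is_valid_title '[Fix](123) Handle empty uploads'; then
  echo "title ok"
fi
```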
- Branch off from main to ensure you get the latest code.
- Name your branch according to the Naming Conventions.
- Keep your commits self-contained and your PRs small and tailored to a specific feature as much as possible.
- Push your commits, open a PR and fill in the PR template.
- Request a review from 1 other developer.
- Once approved, rebase/squash your commits into main. Rule of thumb:
  - If the PR contains 1 or 2 commits, perform a Rebase.
  - If the PR contains several commits that build toward a larger feature, perform a Squash.
  - If the PR contains several commits that are relatively unrelated (e.g., an assortment of bug fixes), perform a Rebase.
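The rebase/squash rule of thumb above can be written as a small decision helper. This is a sketch only: `suggest_merge_strategy` is a hypothetical function, and treating "several" as "more than 2" is an assumption.

```shell
# Hypothetical helper encoding the rule of thumb above.
# $1: number of commits in the PR
# $2: "related" if the commits build toward one larger feature, anything else otherwise
# Assumption: "several" means more than 2 commits.
suggest_merge_strategy() {
  if [ "$1" -le 2 ]; then
    echo "rebase"
  elif [ "$2" = "related" ]; then
    echo "squash"
  else
    echo "rebase"
  fi
}

suggest_merge_strategy 5 related   # prints "squash"
```

On a feature branch, the commit count could be taken from `git rev-list --count main..HEAD`.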
- azure/ - Contains all configuration for Azure DevOps pipelines.
- dagster/ - Contains all custom Dagster code.
- docs/ - Contains all Markdown files for creating Backstage TechDocs.
- hive/ - Contains Docker build items for custom Hive Metastore image.
- infra/ - Contains all Kubernetes & Helm configuration.
- spark/ - Contains Docker build items for custom Spark image.
- oauth2-proxy/ - Contains all Docker build items for custom OAuth2 Proxy image.
- Kubernetes
- If you are using Docker Desktop on Windows, you can use the bundled Kubernetes distribution.
- Helm
Skip this step if you are on Linux or Mac.
- Check your USERPROFILE directory for a file named .wslconfig. You can navigate to this directory by opening the file explorer and entering %USERPROFILE% in the address bar. If the file does not exist, create it.
- Ensure the following contents are in the file:
[wsl2]
memory=16GB
swap=20GB
These values assume a workstation with 4 cores, 32GB RAM, and 1TB of storage. Adjust them accordingly if you have different hardware specifications. Ideally, do not give WSL more than half of your available RAM.
- Install WSL. You may be prompted to restart your device.
- In a separate PowerShell/Command Prompt (CMD) terminal, run:
wsl --set-default-version 2
- Open the Microsoft Store, search for and install Ubuntu.
- In the PowerShell/CMD terminal, run:
wsl --set-default Ubuntu
- In the start menu, Ubuntu should show up in the recently added programs. Open it.
- You will be prompted for a new username and password. Enter any credentials and make sure to remember them. You may be prompted to restart again.
- If you are not prompted to restart, close Ubuntu and open it again. You should now have a working WSL installation.
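The .wslconfig sizing guidance earlier in this section (no more than half of available RAM) can be sketched as a small generator. `print_wslconfig` is a hypothetical helper; keeping swap at 20GB by default simply mirrors the sample values and is an assumption, not a recommendation from the project.

```shell
# Hypothetical helper: print .wslconfig contents for a machine with $1 GB of RAM.
# Assumptions: memory is set to half of total RAM (integer division);
# swap stays at the 20GB used in the sample unless $2 overrides it.
print_wslconfig() {
  total_gb="$1"
  swap_gb="${2:-20}"
  printf '[wsl2]\nmemory=%sGB\nswap=%sGB\n' "$((total_gb / 2))" "$swap_gb"
}

# For the 32GB workstation described above, this reproduces the sample file.
print_wslconfig 32
```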
Important
From this point on, all commands should be run inside the Ubuntu terminal, unless otherwise specified.
- Install Docker Desktop. You may be prompted to restart your device.
- Open the Docker Desktop app and go to settings.
- Ensure you have the following settings:
[!NOTE] WSL integration settings are only applicable if you are on Windows.
- Wait for the Kubernetes installation to complete.
- To test if everything is set up correctly, run this inside an Ubuntu terminal:
docker image ls -a
kubectl get all
If you get no errors, you're good to go!
Kubernetes is installed as part of the Docker Desktop installation. You can optionally
install the kubectx and kubens plugins to make it easier to switch between
contexts/namespaces.
Install Krew:
- Run the following:
(
  set -x; cd "$(mktemp -d)" &&
  OS="$(uname | tr '[:upper:]' '[:lower:]')" &&
  ARCH="$(uname -m | sed -e 's/x86_64/amd64/' -e 's/\(arm\)\(64\)\?.*/\1\2/' -e 's/aarch64$/arm64/')" &&
  KREW="krew-${OS}_${ARCH}" &&
  curl -fsSLO "https://github.com/kubernetes-sigs/krew/releases/latest/download/${KREW}.tar.gz" &&
  tar zxvf "${KREW}.tar.gz" &&
  ./"${KREW}" install krew
)
- Add the Krew path to your system PATH by appending to your .bashrc/.zshrc (i.e. run the following):
echo 'export PATH="${KREW_ROOT:-$HOME/.krew}/bin:$PATH"' >> ~/.bashrc
- Load your new shell config:
# bash
source ~/.bashrc
# zsh
source ~/.zshrc
- Download the Krew plugin list:
kubectl krew update
- Install kubectx and kubens:
kubectl krew install ctx
kubectl krew install ns
- Test if installation is ok:
kubectl ctx
kubectl ns
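Re-running the `echo ... >> ~/.bashrc` step for the Krew path appends a duplicate line each time. A minimal sketch of an idempotent alternative follows; `append_once` is a hypothetical helper, not something the project provides.

```shell
# Hypothetical helper: append a line to a shell config file only if that exact
# line is not already present.
# $1: line to add, $2: config file (e.g. ~/.bashrc or ~/.zshrc)
append_once() {
  touch "$2"
  grep -qxF -- "$1" "$2" || printf '%s\n' "$1" >> "$2"
}

# Example call (left commented out so that pasting this block changes nothing):
# append_once 'export PATH="${KREW_ROOT:-$HOME/.krew}/bin:$PATH"' "$HOME/.bashrc"
```

The same helper would work for any other PATH line that gets appended to a shell config file.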
- Install asdf.
- Test installation:
asdf
- Install Python build dependencies:
  - macOS
brew install openssl readline sqlite3 xz zlib tcl-tk
  - Linux/WSL
sudo apt-get update
sudo apt-get install -y build-essential libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev curl libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev
- Install Python
asdf plugin add python
asdf install python 3.11.7
- Install Poetry
asdf plugin add poetry
asdf install poetry 1.7.1
- Add Poetry path to your shell config:
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
- Reload shell config:
source ~/.bashrc
- Test installation:
poetry --version
- Set recommended settings:
poetry config virtualenvs.in-project true
- Install Task:
sh -c "$(curl --location https://taskfile.dev/install.sh)" -- -d -b ~/.local/bin
- Test installation:
task --version
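asdf can read pinned versions from a `.tool-versions` file in a project directory. The sketch below writes one with the Python and Poetry versions listed above; `write_tool_versions` is a hypothetical helper, and the repository may already ship its own `.tool-versions`, in which case that file should take precedence.

```shell
# Hypothetical helper: write a .tool-versions file pinning the versions
# used in the install steps above.
# $1: target directory (defaults to the current directory)
write_tool_versions() {
  printf 'python 3.11.7\npoetry 1.7.1\n' > "${1:-.}/.tool-versions"
}

# Example call (left commented out so that pasting this block writes nothing):
# write_tool_versions .
```

With such a file in place, running `asdf install` without arguments in that directory installs every listed version.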
- git clone the repository to your workstation.
- Run initial setup:
task setup
Dagster, Spark, and Hive have their own respective .env files. The contents of these
files can be provided upon request. There are also .env.example files which you can
use as reference. Copy the contents of each .env.example file into a new file named
.env in the same directory, then supply your own values.
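The copy step for the .env files can be scripted. This is a sketch under assumptions: `init_env_files` is a hypothetical helper, and the directory names `dagster`, `spark`, and `hive` are guesses at where the .env.example files live.

```shell
# Hypothetical helper: for each directory given, copy .env.example to .env
# unless .env already exists, so existing values are never overwritten.
init_env_files() {
  for dir in "$@"; do
    if [ -f "$dir/.env.example" ] && [ ! -f "$dir/.env" ]; then
      cp "$dir/.env.example" "$dir/.env"
      echo "created $dir/.env"
    fi
  done
}

# Assumed directory names; adjust to where the .env.example files actually live.
init_env_files dagster spark hive
```

After running it from the repository root, edit each new .env file to supply your own values.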
Ensure that the Pre-requisites have already been set up and all the necessary
command-line executables are in your PATH.
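One way to confirm that the executables are on your PATH is a quick check with `command -v`. The sketch below is illustrative: `check_tools` is a hypothetical helper, and the tool list is assembled from the tools mentioned in this document rather than from an official list.

```shell
# Hypothetical helper: report which of the given executables are missing from PATH.
# Returns non-zero if any are missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Tool list assumed from the pre-requisites described in this document.
check_tools docker kubectl helm asdf python poetry task \
  || echo "install the missing tools before continuing"
```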
# Spin up Docker containers
task
# Follow Docker logs
task logs
# List all tasks (inspect Taskfile.yml to see the actual commands being run)
task -l
At the end of your development tasks, stop the containers to free resources:
task stop
Example: Adding dagster-azure
# cd to relevant folder
cd dagster
# Add the dependency using poetry
poetry add dagster-azure
# Re-run task
task