Commit 444af78

address pr feedback

Signed-off-by: Jesse Nelson <[email protected]>
jnels124 committed Oct 29, 2024
1 parent 24670dc commit 444af78
Showing 10 changed files with 158 additions and 87 deletions.
34 changes: 3 additions & 31 deletions docs/runbook/change-citus-node-pool-machine-type.md
@@ -7,42 +7,14 @@ Need to Change Machine Type for Citus Node Pool(s)
## Prerequisites

- Have `jq` installed
- kubectl is pointing to the cluster you want to create snapshots from
- kubectl is pointing to the cluster you want to change the machine type for
- All bash commands assume your working directory is `docs/runbook/scripts`

## Solution

1. Follow the steps to [create a disk snapshot for Citus cluster](./create-disk-snapshot-for-citus-cluster.md)
to backup the current cluster data
2. Configure and export env vars
2. Run
```bash
export GCP_PROJECT="my-gcp-project"
export GCP_K8S_CLUSTER_NAME="my-cluster-name"
export GCP_K8S_CLUSTER_REGION="my-cluster-region"
export GCP_WORKER_POOL_NAME="citus-worker"
export GCP_COORDINATOR_POOL_NAME="citus-coordinator"
export MACHINE_TYPE="new-machine-type"
export AUTO_UNROUTE="true" # Automatically suspend/resume helm release and scale monitor
export POOLS_TO_UPDATE=("${GCP_WORKER_POOL_NAME}" "${GCP_COORDINATOR_POOL_NAME}")
```
3. Run
```bash
source ./utils.sh
NAMESPACES=($(kubectl get sgshardedclusters.stackgres.io -A -o jsonpath='{.items[*].metadata.namespace}'))
for namespace in "${NAMESPACES[@]}"
do
unrouteTraffic "${namespace}"
pauseCitus "${namespace}"
done
resizeCitusNodePools 0
for pool in "${POOLS_TO_UPDATE[@]}"
do
gcloud container node-pools update ${pool} --project=${GCP_PROJECT} --cluster=${GCP_K8S_CLUSTER_NAME} --location=${GCP_K8S_CLUSTER_REGION} --machine-type=${MACHINE_TYPE}
done
resizeCitusNodePools 1
for namespace in "${NAMESPACES[@]}"
do
unpauseCitus "${namespace}"
routeTraffic "${namespace}"
done
./change-machine-type.sh
```
12 changes: 6 additions & 6 deletions docs/runbook/create-disk-snapshot-for-citus-cluster.md
@@ -6,14 +6,14 @@ Need to create disk snapshots for Citus cluster(s)

## Prerequisites

- Have access to a running Citus cluster deployed by the `hedera-mirror` chart
- Have `jq` installed
- All bash commands assume your working directory is `docs/runbook/scripts`
- kubectl is pointing to the cluster you want to create snapshots from
- The kubectl context is set to the cluster you want to create snapshots from

## Solution

1. Run script and follow along with all prompts
```bash
./volume-snapshot.sh
```
Run the script and follow along with all prompts
```bash
./volume-snapshot.sh
```
14 changes: 7 additions & 7 deletions docs/runbook/increase-zfs-disksize.md
@@ -5,11 +5,11 @@ The pvc for a shard is running out of space and needs to be increased beyond cur
## Prerequisites

- Have `jq` installed
- The kubectl context is set to the cluster containing the disks you want to resize

## Solution

1. Configure kubectl to point to the cluster
2. Identify the worker (and/or coordinator) pvc(s) that needs to be resized
1. Identify the worker (and/or coordinator) pvc(s) that need to be resized
```bash
kubectl get pv -o \
custom-columns='PVC_NAME:.spec.claimRef.name,PV_NAME:.metadata.name,CAPACITY:..spec.capacity.storage,NODE_ID:.spec.nodeAffinity.required.nodeSelectorTerms[0].matchExpressions[0].values[0]' \
@@ -32,7 +32,7 @@ The pvc for a shard is running out of space and needs to be increased beyond cur
mirror-citus-shard0-data-mirror-citus-shard0-0 pvc-5dd58b07-db59-4c3a-882f-dcd7467dfd49 10000Gi worker-us-central1-c-0
mirror-citus-shard1-data-mirror-citus-shard1-0 pvc-f9b980a9-0771-4222-9034-bd44279ddde8 12000Gi worker-us-central1-f-0
```
3. Using the `nodeId` from the previous step, increase the disk size for all disks needed
2. Using the `nodeId` from the previous step, increase the disk size for all disks needed
```text
diskPrefix - value of zfs.init.diskPrefix in values.yaml
diskName - {diskPrefix}-{nodeId}-zfs
@@ -42,17 +42,17 @@ The pvc for a shard is running out of space and needs to be increased beyond cur
```bash
gcloud compute disks resize "{diskName}" --size="{diskSize}" --zone="{zone}"
```
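For example, a hypothetical resize for the `worker-us-central1-c-0` node shown in the sample output above, assuming `zfs.init.diskPrefix` is `mirror` and a new size of 14000Gi (both assumptions; substitute your actual prefix and target size):
```bash
# Hypothetical values: the disk name follows {diskPrefix}-{nodeId}-zfs and the zone comes from the nodeId
gcloud compute disks resize "mirror-worker-us-central1-c-0-zfs" --size="14000Gi" --zone="us-central1-c"
```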
4. Restart the zfs init pods
3. Restart the zfs init pods
```bash
kubectl rollout restart daemonset -n common mirror-zfs-init
```
5. Verify the pool size has been increased
4. Verify the pool size has been increased
```bash
kubectl get pods -n common -l component=openebs-zfs-node -o json |
jq -r '.items[].metadata.name' |
xargs -I % kubectl exec -c openebs-zfs-plugin -n common % -- zfs list
```
6. Update the `hedera-mirror` chart's `values.yaml` to reflect the new disk size
5. Update the `hedera-mirror` chart's `values.yaml` to reflect the new disk size
```yaml
stackgres:
coordinator:
@@ -73,5 +73,5 @@ The pvc for a shard is running out of space and needs to be increased beyond cur
persistentVolume:
size: 3200Gi
```
7. Deploy the changes. Be sure to leave wiggle room for zfs rounding
6. Deploy the changes. Be sure to leave wiggle room for zfs rounding
see [here](https://github.com/openebs/zfs-localpv/blob/develop/docs/faq.md#7-why-the-zfs-volume-size-is-different-than-the-reqeusted-size-in-pvc)
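A minimal deploy sketch, assuming a release named `mirror` in namespace `mirror` and a local checkout of the `hedera-mirror` chart (all assumptions; use your actual release name, namespace, and chart source):
```bash
# Hypothetical invocation; release name, namespace, and chart path are assumptions.
# Keep persistentVolume sizes slightly below the resized disks to allow for zfs rounding.
helm upgrade mirror charts/hedera-mirror -n mirror -f values.yaml
```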
4 changes: 2 additions & 2 deletions docs/runbook/restore-citus-from-disk-snapshot.md
@@ -9,14 +9,14 @@ Need to restore Citus cluster from disk snapshots
- Snapshots of disks were created by following the [create snapshot](create-disk-snapshot-for-citus-cluster.md) runbook
- Have `jq` and `ksd`(kubernetes secret decrypter) installed
- The snapshots are from a compatible version of `postgres`
- The `target cluster` has a running `hedera-mirror` chart with Stackgres enabled
- The `target cluster` has a running Citus cluster deployed with `hedera-mirror` chart
- The `target cluster` you are restoring to doesn't have any pvcs with a size larger than the size of the pvc in the
snapshot. You can't decrease the size of a pvc. If needed, you can delete the existing cluster in the `target cluster`
and redeploy the `hedera-mirror` chart with the default disk sizes.
- If you have multiple Citus clusters in the `target cluster`, you will need to restore all of them
- All bash commands assume your working directory is `docs/runbook/scripts`
- Only a single citus cluster is installed per namespace
- kubectl is pointing to the cluster you want to restore snapshots to
- The kubectl context is set to the cluster you want to restore snapshots to

## Steps

69 changes: 69 additions & 0 deletions docs/runbook/scripts/change-machine-type.sh
@@ -0,0 +1,69 @@
#!/usr/bin/env bash

set -euo pipefail

source ./utils.sh

GCP_PROJECT="$(readUserInput "Enter GCP Project for target: ")"
if [[ -z "${GCP_PROJECT}" ]]; then
log "GCP_PROJECT is not set and is required. Exiting"
exit 1
else
gcloud projects describe "${GCP_PROJECT}" > /dev/null
fi

GCP_K8S_CLUSTER_REGION="$(readUserInput "Enter target cluster region: ")"
if [[ -z "${GCP_K8S_CLUSTER_REGION}" ]]; then
log "GCP_K8S_CLUSTER_REGION is not set and is required. Exiting"
exit 1
else
gcloud compute regions describe "${GCP_K8S_CLUSTER_REGION}" --project "${GCP_PROJECT}" > /dev/null
fi

GCP_K8S_CLUSTER_NAME="$(readUserInput "Enter target cluster name: ")"
if [[ -z "${GCP_K8S_CLUSTER_NAME}" ]]; then
log "GCP_K8S_CLUSTER_NAME is not set and is required. Exiting"
exit 1
else
gcloud container clusters describe --project "${GCP_PROJECT}" \
--region="${GCP_K8S_CLUSTER_REGION}" \
"${GCP_K8S_CLUSTER_NAME}" > /dev/null
fi

MACHINE_TYPE="$(readUserInput "Enter new machine type: ")"
if [[ -z "${MACHINE_TYPE}" ]]; then
log "MACHINE_TYPE is not set and is required. Exiting"
exit 1
fi

POOLS_TO_UPDATE_INPUT="$(readUserInput "Enter the node pools to update (space-separated): ")"
if [[ -z "${POOLS_TO_UPDATE_INPUT}" ]]; then
log "POOLS_TO_UPDATE_INPUT is not set and is required. Exiting"
exit 1
else
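# Split the comma- and/or space-separated input into an array of pool names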
IFS=', ' read -r -a POOLS_TO_UPDATE <<< "${POOLS_TO_UPDATE_INPUT}"
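# Confirm the requested machine type is available in every zone used by each pool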
for pool in "${POOLS_TO_UPDATE[@]}"; do
POOL_LOCATIONS=($(gcloud container node-pools describe "${pool}" --project="${GCP_PROJECT}" --cluster="${GCP_K8S_CLUSTER_NAME}" --region="${GCP_K8S_CLUSTER_REGION}" --format="json" | jq -r '.locations[]'))
for location in "${POOL_LOCATIONS[@]}"; do
gcloud compute machine-types describe "${MACHINE_TYPE}" --project="${GCP_PROJECT}" --zone="${location}" > /dev/null
done
done
fi

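# Unroute traffic and pause Citus in every namespace that has a StackGres sharded cluster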
NAMESPACES=($(kubectl get sgshardedclusters.stackgres.io -A -o jsonpath='{.items[*].metadata.namespace}'))
for namespace in "${NAMESPACES[@]}"
do
unrouteTraffic "${namespace}"
pauseCitus "${namespace}"
done
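# With Citus paused, scale the Citus node pools to zero so the machine type can be changed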
resizeCitusNodePools 0
for pool in "${POOLS_TO_UPDATE[@]}"
do
gcloud container node-pools update "${pool}" --project="${GCP_PROJECT}" --cluster="${GCP_K8S_CLUSTER_NAME}" --location="${GCP_K8S_CLUSTER_REGION}" --machine-type="${MACHINE_TYPE}"
done
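# Scale the pools back up, then unpause Citus and restore traffic in each namespace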
resizeCitusNodePools 1
for namespace in "${NAMESPACES[@]}"
do
unpauseCitus "${namespace}"
routeTraffic "${namespace}"
done
5 changes: 0 additions & 5 deletions docs/runbook/scripts/restore-volume-snapshot.sh
Original file line number Diff line number Diff line change
@@ -5,12 +5,7 @@ set -euo pipefail
source ./utils.sh

REPLACE_DISKS="${REPLACE_DISKS:-true}"
COMMON_NAMESPACE="${COMMON_NAMESPACE:-common}"
ZFS_POOL_NAME="${ZFS_POOL_NAME:-zfspv-pool}"
GCP_COORDINATOR_POOL_NAME="${GCP_COORDINATOR_POOL_NAME:-citus-coordinator}"
GCP_WORKER_POOL_NAME="${GCP_WORKER_POOL_NAME:-citus-worker}"
AUTO_UNROUTE="${AUTO_UNROUTE:-true}"


function configureAndValidate() {
CURRENT_CONTEXT=$(kubectl config current-context)
61 changes: 61 additions & 0 deletions docs/runbook/scripts/upgrade-k8s-version-citus.sh
@@ -0,0 +1,61 @@
#!/usr/bin/env bash

set -euo pipefail

source ./utils.sh

NAMESPACES=($(kubectl get sgshardedclusters.stackgres.io -A -o jsonpath='{.items[*].metadata.namespace}'))
POOLS_TO_UPDATE=("${GCP_WORKER_POOL_NAME}" "${GCP_COORDINATOR_POOL_NAME}" "default-pool")

GCP_PROJECT="$(readUserInput "Enter GCP Project for target: ")"
if [[ -z "${GCP_PROJECT}" ]]; then
log "GCP_PROJECT is not set and is required. Exiting"
exit 1
else
gcloud projects describe "${GCP_PROJECT}" > /dev/null
fi

GCP_K8S_CLUSTER_REGION="$(readUserInput "Enter target cluster region: ")"
if [[ -z "${GCP_K8S_CLUSTER_REGION}" ]]; then
log "GCP_K8S_CLUSTER_REGION is not set and is required. Exiting"
exit 1
else
gcloud compute regions describe "${GCP_K8S_CLUSTER_REGION}" --project "${GCP_PROJECT}" > /dev/null
fi

GCP_K8S_CLUSTER_NAME="$(readUserInput "Enter target cluster name: ")"
if [[ -z "${GCP_K8S_CLUSTER_NAME}" ]]; then
log "GCP_K8S_CLUSTER_NAME is not set and is required. Exiting"
exit 1
else
gcloud container clusters describe --project "${GCP_PROJECT}" \
--region="${GCP_K8S_CLUSTER_REGION}" \
"${GCP_K8S_CLUSTER_NAME}" > /dev/null
fi

VERSION="$(readUserInput "Enter the new Kubernetes version: ")"
if [[ -z "${VERSION}" ]]; then
log "VERSION is not set and is required. Exiting"
exit 1
else
HAS_VERSION="$(gcloud container get-server-config --location="${GCP_K8S_CLUSTER_REGION}" --project="${GCP_PROJECT}" --format="json(validNodeVersions)" | jq -r --arg VERSION "${VERSION}" 'any(.validNodeVersions[]; . == $VERSION)')"
if [[ "${HAS_VERSION}" != "true" ]]; then
log "Version ${VERSION} is not valid. Exiting"
exit 1
fi
fi

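# Unroute traffic and pause Citus in every namespace before upgrading node pools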
for namespace in "${NAMESPACES[@]}"
do
unrouteTraffic "${namespace}"
pauseCitus "${namespace}"
done
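# Upgrade each node pool to the requested Kubernetes version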
for pool in "${POOLS_TO_UPDATE[@]}"
do
gcloud container clusters upgrade "${GCP_K8S_CLUSTER_NAME}" --node-pool="${pool}" --cluster-version="${VERSION}" --location="${GCP_K8S_CLUSTER_REGION}" --project="${GCP_PROJECT}"
done
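# Bring Citus back online and restore traffic once all pools are upgraded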
for namespace in "${NAMESPACES[@]}"
do
unpauseCitus "${namespace}"
routeTraffic "${namespace}"
done
11 changes: 7 additions & 4 deletions docs/runbook/scripts/utils.sh
100644 → 100755
@@ -59,13 +59,13 @@ function routeTraffic() {
local namespace="${1}"

log "Running test queries"
kubectl exec -it -n "${namespace}" "${HELM_RELEASE_NAME}-citus-coord-0" -- psql -U mirror_rest -d mirror_node -c "select * from transaction limit 10"
kubectl exec -it -n "${namespace}" "${HELM_RELEASE_NAME}-citus-coord-0" -- psql -U mirror_node -d mirror_node -c "select * from transaction limit 10"
kubectl exec -it -n "${namespace}" "${HELM_RELEASE_NAME}-citus-coord-0" -c postgres-util -- psql -U mirror_rest -d mirror_node -c "select * from transaction limit 10"
kubectl exec -it -n "${namespace}" "${HELM_RELEASE_NAME}-citus-coord-0" -c postgres-util -- psql -U mirror_node -d mirror_node -c "select * from transaction limit 10"
doContinue
scaleDeployment "${namespace}" 1 "app.kubernetes.io/component=importer"
while true; do
local statusQuery="select $(date +%s) - (max(consensus_end) / 1000000000) from record_file"
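# Importer lag in seconds: current epoch time minus the newest record file's consensus_end (nanoseconds converted to seconds)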
local status=$(kubectl exec -n "${namespace}" "${HELM_RELEASE_NAME}-citus-coord-0" -- psql -q --csv -t -U mirror_rest -d mirror_node -c "select $(date +%s) - (max(consensus_end) / 1000000000) from record_file" | tail -n 1)
local status=$(kubectl exec -n "${namespace}" "${HELM_RELEASE_NAME}-citus-coord-0" -c postgres-util -- psql -q --csv -t -U mirror_rest -d mirror_node -c "select $(date +%s) - (max(consensus_end) / 1000000000) from record_file" | tail -n 1)
if [[ "${status}" -lt 10 ]]; then
log "Importer is caught up with the source"
break
@@ -207,4 +207,7 @@ function resizeCitusNodePools() {
COMMON_NAMESPACE="${COMMON_NAMESPACE:-common}"
HELM_RELEASE_NAME="${HELM_RELEASE_NAME:-mirror}"
CURRENT_CONTEXT="$(kubectl config current-context)"
CITUS_CLUSTERS="$(getCitusClusters)"
AUTO_UNROUTE="${AUTO_UNROUTE:-true}"
GCP_COORDINATOR_POOL_NAME="${GCP_COORDINATOR_POOL_NAME:-citus-coordinator}"
GCP_WORKER_POOL_NAME="${GCP_WORKER_POOL_NAME:-citus-worker}"
2 changes: 0 additions & 2 deletions docs/runbook/scripts/volume-snapshot.sh
@@ -4,8 +4,6 @@ set -euo pipefail

source ./utils.sh

AUTO_UNROUTE="${AUTO_UNROUTE:-true}"

GCP_PROJECT="$(readUserInput "Enter GCP Project for target: ")"
if [[ -z "${GCP_PROJECT}" ]]; then
log "GCP_PROJECT is not set and is required. Exiting"
33 changes: 3 additions & 30 deletions docs/runbook/upgrade-k8s-version-citus-nodepool.md
@@ -7,41 +7,14 @@ Need to update k8s version for Citus Node Pool(s)
## Prerequisites

- Have `jq` installed
- kubectl is pointing to the cluster you want to upgrade
- The kubectl context is set to the cluster you want to upgrade
- All bash commands assume your working directory is `docs/runbook/scripts`

## Solution

1. Follow the steps to [create a disk snapshot for Citus cluster](./create-disk-snapshot-for-citus-cluster.md)
to backup the current cluster data
2. Configure and export env vars
2. Run
```bash
export GCP_PROJECT="my-gcp-project"
export GCP_K8S_CLUSTER_NAME="my-cluster-name"
export GCP_K8S_CLUSTER_REGION="my-cluster-region"
export GCP_WORKER_POOL_NAME="citus-worker"
export GCP_COORDINATOR_POOL_NAME="citus-coordinator"
export MACHINE_TYPE="new-machine-type"
export AUTO_UNROUTE="true" # Automatically suspend/resume helm release and scale monitor
export VERSION="new-k8s-version" # Specify the new k8s version
export POOLS_TO_UPDATE=("${GCP_WORKER_POOL_NAME}" "${GCP_COORDINATOR_POOL_NAME}" "default-pool")
```
3. Run
```bash
source ./utils.sh
NAMESPACES=($(kubectl get sgshardedclusters.stackgres.io -A -o jsonpath='{.items[*].metadata.namespace}'))
for namespace in "${NAMESPACES[@]}"
do
unrouteTraffic "${namespace}"
pauseCitus "${namespace}"
done
for pool in "${POOLS_TO_UPDATE[@]}"
do
gcloud container clusters upgrade ${GCP_K8S_CLUSTER_NAME} --node-pool=${pool} --cluster-version=${VERSION} --location=${GCP_K8S_CLUSTER_REGION} --project=${GCP_PROJECT}
done
for namespace in "${NAMESPACES[@]}"
do
unpauseCitus "${namespace}"
routeTraffic "${namespace}"
done
./upgrade-k8s-version-citus.sh
```
