diff --git a/docs/runbook/change-citus-node-pool-machine-type.md b/docs/runbook/change-citus-node-pool-machine-type.md
index dc5c0d7774..8e3cb62e63 100644
--- a/docs/runbook/change-citus-node-pool-machine-type.md
+++ b/docs/runbook/change-citus-node-pool-machine-type.md
@@ -7,42 +7,14 @@ Need to Change Machine Type for Citus Node Pool(s)
 
 ## Prerequisites
 
 - Have `jq` installed
-- kubectl is pointing to the cluster you want to create snapshots from
+- The kubectl context is set to the cluster you want to change the machine type for
 - All bash commands assume your working directory is `docs/runbook/scripts`
 
 ## Solution
 
 1. Follow the steps to [create a disk snapshot for Citus cluster](./create-disk-snapshot-for-citus-cluster.md) to
    backup the current cluster data
-2. Configure and export env vars
-   ```bash
-   export GCP_PROJECT="my-gcp-project"
-   export GCP_K8S_CLUSTER_NAME="my-cluster-name"
-   export GCP_K8S_CLUSTER_REGION="my-cluster-region"
-   export GCP_WORKER_POOL_NAME="citus-worker"
-   export GCP_COORDINATOR_POOL_NAME="citus-coordinator"
-   export MACHINE_TYPE="new-machine-type"
-   export AUTO_UNROUTE="true" # Automatically suspend/resume helm release and scale monitor
-   export POOLS_TO_UPDATE=("${GCP_WORKER_POOL_NAME}" "${GCP_COORDINATOR_POOL_NAME}")
-   ```
-3. Run
+2. Run
    ```bash
-   source ./utils.sh
-   NAMESPACES=($(kubectl get sgshardedclusters.stackgres.io -A -o jsonpath='{.items[*].metadata.namespace}'))
-   for namespace in "${NAMESPACES[@]}"
-   do
-     unrouteTraffic "${namespace}"
-     pauseCitus "${namespace}"
-   done
-   resizeCitusNodePools 0
-   for pool in "${POOLS_TO_UPDATE[@]}"
-   do
-     gcloud container node-pools update ${pool} --project=${GCP_PROJECT} --cluster=${GCP_K8S_CLUSTER_NAME} --location=${GCP_K8S_CLUSTER_REGION} --machine-type=${MACHINE_TYPE}
-   done
-   resizeCitusNodePools 1
-   for namespace in "${NAMESPACES[@]}"
-   do
-     unpauseCitus "${namespace}"
-     routeTraffic "${namespace}"
-   done
+   ./change-machine-type.sh
    ```
diff --git a/docs/runbook/create-disk-snapshot-for-citus-cluster.md b/docs/runbook/create-disk-snapshot-for-citus-cluster.md
index 6f38cede10..4c759413ee 100644
--- a/docs/runbook/create-disk-snapshot-for-citus-cluster.md
+++ b/docs/runbook/create-disk-snapshot-for-citus-cluster.md
@@ -6,14 +6,14 @@ Need to create disk snapshots for Citus cluster(s)
 
 ## Prerequisites
 
-- Have access to a running Citus cluster deployed by the `hedera-mirror` chart 
+- Have access to a running Citus cluster deployed by the `hedera-mirror` chart
 - Have `jq` installed
 - All bash commands assume your working directory is `docs/runbook/scripts`
-- kubectl is pointing to the cluster you want to create snapshots from
+- The kubectl context is set to the cluster you want to create snapshots from
 
 ## Solution
 
-1. Run script and follow along with all prompts
-   ```bash
-   ./volume-snapshot.sh
-   ```
+Run the script and follow along with all prompts
+```bash
+./volume-snapshot.sh
+```
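After `./change-machine-type.sh` (added below under `docs/runbook/scripts`) finishes, it is worth confirming that each pool actually reports the requested machine type. A minimal sketch, using placeholder project, cluster, region, and pool names in place of the values entered at the script prompts:

```bash
# Sketch: print the machine type each updated node pool now reports.
# All values below are placeholders; use the ones entered at the script prompts.
GCP_PROJECT="my-gcp-project"
GCP_K8S_CLUSTER_NAME="my-cluster-name"
GCP_K8S_CLUSTER_REGION="us-central1"

for pool in citus-worker citus-coordinator; do
  gcloud container node-pools describe "${pool}" \
    --project="${GCP_PROJECT}" \
    --cluster="${GCP_K8S_CLUSTER_NAME}" \
    --region="${GCP_K8S_CLUSTER_REGION}" \
    --format="value(config.machineType)"
done
```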
diff --git a/docs/runbook/increase-zfs-disksize.md b/docs/runbook/increase-zfs-disksize.md
index bc15533778..3aa65d5463 100644
--- a/docs/runbook/increase-zfs-disksize.md
+++ b/docs/runbook/increase-zfs-disksize.md
@@ -5,11 +5,11 @@ The pvc for a shard is running out of space and needs to be increased beyond cur
 
 ## Prerequisites
 
 - Have `jq` installed
+- The kubectl context is set to the cluster containing the disks you want to resize
 
 ## Solution
 
-1. Configure kubectl to point to the cluster
-2. Identify the worker (and/or coordinator) pvc(s) that needs to be resized
+1. Identify the worker (and/or coordinator) pvc(s) that needs to be resized
    ```bash
    kubectl get pv -o \
    custom-columns='PVC_NAME:.spec.claimRef.name,PV_NAME:.metadata.name,CAPACITY:..spec.capacity.storage,NODE_ID:.spec.nodeAffinity.required.nodeSelectorTerms[0].matchExpressions[0].values[0]' \
@@ -32,7 +32,7 @@ The pvc for a shard is running out of space and needs to be increased beyond cur
    mirror-citus-shard0-data-mirror-citus-shard0-0   pvc-5dd58b07-db59-4c3a-882f-dcd7467dfd49   10000Gi   worker-us-central1-c-0
    mirror-citus-shard1-data-mirror-citus-shard1-0   pvc-f9b980a9-0771-4222-9034-bd44279ddde8   12000Gi   worker-us-central1-f-0
    ```
-3. Using the `nodeId` from the previous step, increase the disk size for all disks needed
+2. Using the `nodeId` from the previous step, increase the disk size for all disks needed
    ```text
    diskPrefix - value of zfs.init.diskPrefix in values.yaml
    diskName - {diskPrefix}-{nodeId}-zfs
@@ -42,17 +42,17 @@ The pvc for a shard is running out of space and needs to be increased beyond cur
    ```bash
    gcloud compute disks resize "{diskName}" --size="{diskSize}" --zone="{zone}"
    ```
-4. Restart the zfs init pods
+3. Restart the zfs init pods
    ```bash
    kubectl rollout restart daemonset -n common mirror-zfs-init
    ```
-5. Verify the pool size has been increased
+4. Verify the pool size has been increased
    ```bash
    kubectl get pods -n common -l component=openebs-zfs-node -o json | jq -r '.items[].metadata.name' | xargs -I % kubectl exec -c openebs-zfs-plugin -n common % -- zfs list
    ```
-6. Update the `hedera-mirror` chart's `values.yaml` to reflect the new disk size
+5. Update the `hedera-mirror` chart's `values.yaml` to reflect the new disk size
    ```yaml
    stackgres:
      coordinator:
@@ -73,5 +73,5 @@ The pvc for a shard is running out of space and needs to be increased beyond cur
        persistentVolume:
          size: 3200Gi
    ```
-7. Deploy the changes. Be sure to leave wiggle room for zfs rounding
+6. Deploy the changes. Be sure to leave wiggle room for zfs rounding
    see [here](https://github.com/openebs/zfs-localpv/blob/develop/docs/faq.md#7-why-the-zfs-volume-size-is-different-than-the-reqeusted-size-in-pvc)
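The resize command in the runbook above is issued once per disk; for clusters with several nodes it can be looped over the node IDs reported by the `kubectl get pv` output. A minimal sketch, with a placeholder disk prefix, size, and node-to-zone mapping (the real prefix comes from `zfs.init.diskPrefix` in `values.yaml`):

```bash
# Sketch: resize the zfs-backed disk behind each node ID from the PV listing above.
# DISK_PREFIX, NEW_SIZE, and the nodeId -> zone map are placeholders.
DISK_PREFIX="mirror"
NEW_SIZE="16000GB"

declare -A NODE_ZONES=(
  ["worker-us-central1-c-0"]="us-central1-c"
  ["worker-us-central1-f-0"]="us-central1-f"
)

for nodeId in "${!NODE_ZONES[@]}"; do
  gcloud compute disks resize "${DISK_PREFIX}-${nodeId}-zfs" \
    --size="${NEW_SIZE}" \
    --zone="${NODE_ZONES[${nodeId}]}"
done
```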
diff --git a/docs/runbook/restore-citus-from-disk-snapshot.md b/docs/runbook/restore-citus-from-disk-snapshot.md
index 266c922bba..ec3770a5b9 100644
--- a/docs/runbook/restore-citus-from-disk-snapshot.md
+++ b/docs/runbook/restore-citus-from-disk-snapshot.md
@@ -9,14 +9,14 @@ Need to restore Citus cluster from disk snapshots
 - Snapshots of disks were created by following the [create snapshot](create-disk-snapshot-for-citus-cluster.md) runbook
 - Have `jq` and `ksd`(kubernetes secret decrypter) installed
 - The snapshots are from a compatible version of `postgres`
-- The `target cluster` has a running `hedera-mirror` chart with Stackgres enabled
+- The `target cluster` has a running Citus cluster deployed with the `hedera-mirror` chart
 - The `target cluster` you are restoring to doesn't have any pvcs with a size larger than the size of the pvc in the
   snapshot. You can't decrease the size of a pvc. If needed, you can delete the existing cluster in the
   `target cluster` and redeploy the `hedera-mirror` chart with the default disk sizes.
 - If you have multiple Citus clusters in the `target cluster`, you will need to restore all of them
 - All bash commands assume your working directory is `docs/runbook/scripts`
 - Only a single citus cluster is installed per namespace
-- kubectl is pointing to the cluster you want to restore snapshots to
+- The kubectl context is set to the cluster you want to restore snapshots to
 
 ## Steps
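One prerequisite above is that no existing PVC in the `target cluster` is larger than its counterpart in the snapshot, since a PVC can grow but never shrink. A minimal sketch for listing the current sizes to compare against the snapshots, assuming kubectl already points at the target cluster:

```bash
# Sketch: list every PVC with its requested size so it can be compared
# against the sizes recorded when the snapshots were taken.
kubectl get pvc -A \
  -o custom-columns='NAMESPACE:.metadata.namespace,PVC:.metadata.name,SIZE:.spec.resources.requests.storage'
```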
Exiting" + exit 1 +else + IFS=', ' read -r -a POOLS_TO_UPDATE <<< "${POOLS_TO_UPDATE_INPUT}" + for pool in "${POOLS_TO_UPDATE[@]}"; do + POOL_LOCATIONS=($(gcloud container node-pools describe "${pool}" --project="${GCP_PROJECT}" --cluster="${GCP_K8S_CLUSTER_NAME}" --region="${GCP_K8S_CLUSTER_REGION}" --format="json" | jq -r '.locations[]')) + for location in "${POOL_LOCATIONS[@]}"; do + gcloud compute machine-types describe "${MACHINE_TYPE}" --project="${GCP_PROJECT}" --zone="${location}" > /dev/null + done + done +fi + +NAMESPACES=($(kubectl get sgshardedclusters.stackgres.io -A -o jsonpath='{.items[*].metadata.namespace}')) +for namespace in "${NAMESPACES[@]}" +do + unrouteTraffic "${namespace}" + pauseCitus "${namespace}" +done +resizeCitusNodePools 0 +for pool in "${POOLS_TO_UPDATE[@]}" +do +gcloud container node-pools update "${pool}" --project="${GCP_PROJECT}" --cluster="${GCP_K8S_CLUSTER_NAME}" --location="${GCP_K8S_CLUSTER_REGION}" --machine-type="${MACHINE_TYPE}" +done +resizeCitusNodePools 1 +for namespace in "${NAMESPACES[@]}" +do + unpauseCitus "${namespace}" + routeTraffic "${namespace}" +done \ No newline at end of file diff --git a/docs/runbook/scripts/restore-volume-snapshot.sh b/docs/runbook/scripts/restore-volume-snapshot.sh index 40c9719a8b..73b9484c31 100755 --- a/docs/runbook/scripts/restore-volume-snapshot.sh +++ b/docs/runbook/scripts/restore-volume-snapshot.sh @@ -5,12 +5,7 @@ set -euo pipefail source ./utils.sh REPLACE_DISKS="${REPLACE_DISKS:-true}" -COMMON_NAMESPACE="${COMMON_NAMESPACE:-common}" ZFS_POOL_NAME="${ZFS_POOL_NAME:-zfspv-pool}" -GCP_COORDINATOR_POOL_NAME="${GCP_COORDINATOR_POOL_NAME:-citus-coordinator}" -GCP_WORKER_POOL_NAME="${GCP_WORKER_POOL_NAME:-citus-worker}" -AUTO_UNROUTE="${AUTO_UNROUTE:-true}" - function configureAndValidate() { CURRENT_CONTEXT=$(kubectl config current-context) diff --git a/docs/runbook/scripts/upgrade-k8s-version-citus.sh b/docs/runbook/scripts/upgrade-k8s-version-citus.sh new file mode 100755 index 0000000000..36a4c3c115 --- /dev/null +++ b/docs/runbook/scripts/upgrade-k8s-version-citus.sh @@ -0,0 +1,61 @@ +#!/usr/bin/env bash + +set -euo pipefail + +source ./utils.sh + +NAMESPACES=($(kubectl get sgshardedclusters.stackgres.io -A -o jsonpath='{.items[*].metadata.namespace}')) +POOLS_TO_UPDATE=("${GCP_WORKER_POOL_NAME}" "${GCP_COORDINATOR_POOL_NAME}" "default-pool") + +GCP_PROJECT="$(readUserInput "Enter GCP Project for target: ")" +if [[ -z "${GCP_PROJECT}" ]]; then + log "GCP_PROJECT is not set and is required. Exiting" + exit 1 +else + gcloud projects describe "${GCP_PROJECT}" > /dev/null +fi + +GCP_K8S_CLUSTER_REGION="$(readUserInput "Enter target cluster region: ")" +if [[ -z "${GCP_K8S_CLUSTER_REGION}" ]]; then + log "GCP_K8S_CLUSTER_REGION is not set and is required. Exiting" + exit 1 +else + gcloud compute regions describe "${GCP_K8S_CLUSTER_REGION}" --project "${GCP_PROJECT}" > /dev/null +fi + +GCP_K8S_CLUSTER_NAME="$(readUserInput "Enter target cluster name: ")" +if [[ -z "${GCP_K8S_CLUSTER_NAME}" ]]; then + log "GCP_K8S_CLUSTER_NAME is not set and is required. Exiting" + exit 1 +else + gcloud container clusters describe --project "${GCP_PROJECT}" \ + --region="${GCP_K8S_CLUSTER_REGION}" \ + "${GCP_K8S_CLUSTER_NAME}" > /dev/null +fi + +VERSION="$(readUserInput "Enter the new Kubernetes version: ")" +if [[ -z "${VERSION}" ]]; then + log "VERSION is not set and is required. 
Exiting" + exit 1 +else + HAS_VERSION="$(gcloud container get-server-config --location="${GCP_K8S_CLUSTER_REGION}" --project="${GCP_PROJECT}" --format="json(validNodeVersions)" | jq -r --arg VERSION "${VERSION}" 'any(.validNodeVersions[]; . == $VERSION)')" + if [[ "${HAS_VERSION}" != "true" ]]; then + log "Version ${VERSION} is not valid. Exiting" + exit 1 + fi +fi + +for namespace in "${NAMESPACES[@]}" +do + unrouteTraffic "${namespace}" + pauseCitus "${namespace}" +done +for pool in "${POOLS_TO_UPDATE[@]}" +do +gcloud container clusters upgrade "${GCP_K8S_CLUSTER_NAME}" --node-pool="${pool}" --cluster-version="${VERSION}" --location="${GCP_K8S_CLUSTER_REGION}" --project="${GCP_PROJECT}" +done +for namespace in "${NAMESPACES[@]}" +do + unpauseCitus "${namespace}" + routeTraffic "${namespace}" +done \ No newline at end of file diff --git a/docs/runbook/scripts/utils.sh b/docs/runbook/scripts/utils.sh old mode 100644 new mode 100755 index 405ad6c12b..fa542894c7 --- a/docs/runbook/scripts/utils.sh +++ b/docs/runbook/scripts/utils.sh @@ -59,13 +59,13 @@ function routeTraffic() { local namespace="${1}" log "Running test queries" - kubectl exec -it -n "${namespace}" "${HELM_RELEASE_NAME}-citus-coord-0" -- psql -U mirror_rest -d mirror_node -c "select * from transaction limit 10" - kubectl exec -it -n "${namespace}" "${HELM_RELEASE_NAME}-citus-coord-0" -- psql -U mirror_node -d mirror_node -c "select * from transaction limit 10" + kubectl exec -it -n "${namespace}" "${HELM_RELEASE_NAME}-citus-coord-0" -c postgres-util -- psql -U mirror_rest -d mirror_node -c "select * from transaction limit 10" + kubectl exec -it -n "${namespace}" "${HELM_RELEASE_NAME}-citus-coord-0" -c postgres-util -- psql -U mirror_node -d mirror_node -c "select * from transaction limit 10" doContinue scaleDeployment "${namespace}" 1 "app.kubernetes.io/component=importer" while true; do local statusQuery="select $(date +%s) - (max(consensus_end) / 1000000000) from record_file" - local status=$(kubectl exec -n "${namespace}" "${HELM_RELEASE_NAME}-citus-coord-0" -- psql -q --csv -t -U mirror_rest -d mirror_node -c "select $(date +%s) - (max(consensus_end) / 1000000000) from record_file" | tail -n 1) + local status=$(kubectl exec -n "${namespace}" "${HELM_RELEASE_NAME}-citus-coord-0" -c postgres-util -- psql -q --csv -t -U mirror_rest -d mirror_node -c "select $(date +%s) - (max(consensus_end) / 1000000000) from record_file" | tail -n 1) if [[ "${status}" -lt 10 ]]; then log "Importer is caught up with the source" break @@ -207,4 +207,7 @@ function resizeCitusNodePools() { COMMON_NAMESPACE="${COMMON_NAMESPACE:-common}" HELM_RELEASE_NAME="${HELM_RELEASE_NAME:-mirror}" CURRENT_CONTEXT="$(kubectl config current-context)" -CITUS_CLUSTERS="$(getCitusClusters)" \ No newline at end of file +CITUS_CLUSTERS="$(getCitusClusters)" +AUTO_UNROUTE="${AUTO_UNROUTE:-true}" +GCP_COORDINATOR_POOL_NAME="${GCP_COORDINATOR_POOL_NAME:-citus-coordinator}" +GCP_WORKER_POOL_NAME="${GCP_WORKER_POOL_NAME:-citus-worker}" \ No newline at end of file diff --git a/docs/runbook/scripts/volume-snapshot.sh b/docs/runbook/scripts/volume-snapshot.sh index fe00508b72..f2e69cce31 100755 --- a/docs/runbook/scripts/volume-snapshot.sh +++ b/docs/runbook/scripts/volume-snapshot.sh @@ -4,8 +4,6 @@ set -euo pipefail source ./utils.sh -AUTO_UNROUTE="${AUTO_UNROUTE:-true}" - GCP_PROJECT="$(readUserInput "Enter GCP Project for target: ")" if [[ -z "${GCP_PROJECT}" ]]; then log "GCP_PROJECT is not set and is required. 
Exiting" diff --git a/docs/runbook/upgrade-k8s-version-citus-nodepool.md b/docs/runbook/upgrade-k8s-version-citus-nodepool.md index 3418486f99..897c939524 100644 --- a/docs/runbook/upgrade-k8s-version-citus-nodepool.md +++ b/docs/runbook/upgrade-k8s-version-citus-nodepool.md @@ -7,41 +7,14 @@ Need to update k8s version for Citus Node Pool(s) ## Prerequisites - Have `jq` installed -- kubectl is pointing to the cluster you want to upgrade +- The kubectl context is set to the cluster you want to upgrade - All bash commands assume your working directory is `docs/runbook/scripts` ## Solution 1. Follow the steps to [create a disk snapshot for Citus cluster](./create-disk-snapshot-for-citus-cluster.md) to backup the current cluster data -2. Configure and export env vars +2. Run ```bash - export GCP_PROJECT="my-gcp-project" - export GCP_K8S_CLUSTER_NAME="my-cluster-name" - export GCP_K8S_CLUSTER_REGION="my-cluster-region" - export GCP_WORKER_POOL_NAME="citus-worker" - export GCP_COORDINATOR_POOL_NAME="citus-coordinator" - export MACHINE_TYPE="new-machine-type" - export AUTO_UNROUTE="true" # Automatically suspend/resume helm release and scale monitor - export VERSION="new-k8s-version" # Specify the new k8s version - export POOLS_TO_UPDATE=("${GCP_WORKER_POOL_NAME}" "${GCP_COORDINATOR_POOL_NAME}" "default-pool") - ``` -3. Run - ```bash - source ./utils.sh - NAMESPACES=($(kubectl get sgshardedclusters.stackgres.io -A -o jsonpath='{.items[*].metadata.namespace}')) - for namespace in "${NAMESPACES[@]}" - do - unrouteTraffic "${namespace}" - pauseCitus "${namespace}" - done - for pool in "${POOLS_TO_UPDATE[@]}" - do - gcloud container clusters upgrade ${GCP_K8S_CLUSTER_NAME} --node-pool=${pool} --cluster-version=${VERSION} --location=${GCP_K8S_CLUSTER_REGION} --project=${GCP_PROJECT} - done - for namespace in "${NAMESPACES[@]}" - do - unpauseCitus "${namespace}" - routeTraffic "${namespace}" - done + ./upgrade-k8s-version-citus.sh ```