Must be developed on dsmlp-login, as it is coupled to that system.

start-cluster.sh and stop-cluster.sh are meant to run as Kubernetes lifecycle hooks, starting and stopping the cluster as the pod starts and stops.
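As a sketch, wiring the two scripts in as lifecycle hooks would look roughly like the pod-spec fragment below (the script paths and shell invocation are illustrative assumptions; the actual spec is produced by the Helm chart):

```yaml
lifecycle:
  postStart:
    exec:
      # Assumed path: bring the Spark cluster up when the container starts.
      command: ["/bin/sh", "-c", "start-cluster.sh"]
  preStop:
    exec:
      # Assumed path: tear the Spark cluster down before the container stops.
      command: ["/bin/sh", "-c", "stop-cluster.sh"]
```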
The cluster and master/worker nodes may be configured via the following environment variables.
SPARK_CHART_NAME
: Helm chart used to instantiate the Spark cluster
: Default =
SPARK_CLUSTER_IMAGE_REGISTRY
: Container registry hosting the cluster node image
: Default = ghcr.io
SPARK_CLUSTER_IMAGE_REPO
: Image repository for the cluster node image
: Default = ucsd-ets/spark-node
SPARK_CLUSTER_IMAGE_TAG
: Image tag for the cluster node image
: Default = fa22-3
SPARK_CLUSTER_MASTER_CPU
: Number of CPU cores assigned to the Master node (sets the Kubernetes request and limit)
: Default = 2
SPARK_CLUSTER_MASTER_MEM
: Memory assigned to the Master node (sets the Kubernetes request and limit)
: Default = 8G
SPARK_CLUSTER_WORKER_CPU
: Number of CPU cores assigned to each Worker node (sets the Kubernetes request and limit)
: Default = 2
SPARK_CLUSTER_WORKER_MEM
: Memory assigned to each Worker node (sets the Kubernetes request and limit)
: Default = 20G
SPARK_CLUSTER_WORKER_APP_MEM
: Spark application memory limit (should be ~2G less than SPARK_CLUSTER_WORKER_MEM)
: Default = 18G
SPARK_CLUSTER_REPLICAS
: Number of worker nodes to start up
: Default = 3
SPARK_CLUSTER_RUNASGROUP
: Primary Unix group ID assigned to cluster nodes
: Default = 0
SPARK_CLUSTER_FSGROUP
: Supplemental Unix group ID assigned to cluster nodes
: Default = 0