add advanced configuration

atlassian · Mar 12, 2018 · 68cf7e9
1 parent 1c46b2c
Showing 3 changed files with 53 additions and 4 deletions.
diff --git a/README.md b/README.md
@@ -8,11 +8,11 @@ Kubernetes is a container orchestration framework that schedules Docker containe
 
 **The key preliminary features are:**
 
-- Cluster-level utilisation node scaling.
+- Utilisation based node scaling
 - Calculate requests and capacity to determine whether to scale up, down or to stay at the current scale
 - Wait until non-daemonset pods on nodes have completed before terminating the node
 - Designed to work on selected auto-scaling groups to allow the default
-  [Kubernetes Autoscaler](https://github.com/kubernetes/autoscaler) to continue to scale our service based workloads
+  [Kubernetes Autoscaler](https://github.com/kubernetes/autoscaler) to continue to scale service based workloads
 - Automatically terminate oldest nodes first
 - Support for different cloud providers - only AWS at the moment
 
@@ -36,7 +36,7 @@ make build
 ### Locally (out of cluster)
 
 ```bash
-go run cmd/main.go --kubeconfig=~/.kube/config --nodegroups=nodegroups.yaml
+go run cmd/main.go --kubeconfig=~/.kube/config --nodegroups=nodegroups_config.yaml
 ```
 
 ### Inside cluster
@@ -92,7 +92,8 @@ make test
 
 #### Test a specific package
 
-To test the controller package:
+For example, to test the controller package:
+
 ```bash
 go test ./pkg/controller
 ```
diff --git a/docs/configuration/README.md b/docs/configuration/README.md
@@ -2,3 +2,4 @@
 
 - [Command line configuration](./command-line.md) - configuring command line flags/args
 - [Node group configuration](./nodegroup.md) - configuring nodegroup.yml file
+- [Advanced Configuration](./advanced-configuration.md) - Configuring thresholds and slack capacity
diff --git a/docs/configuration/advanced-configuration.md b/docs/configuration/advanced-configuration.md
@@ -0,0 +1,47 @@
+# Advanced Configuration
+
+## Threshold Configuration
+
+Scaling thresholds are configured using the following main nodegroup configuration options:
+
+```yaml
+taint_upper_capacity_threshhold_percent: 40
+taint_lower_capacity_threshhold_percent: 10
+scale_up_threshhold_percent: 70
+```
+Full configuration options can be found in [Node Group Configuration](./nodegroup.md).
+
+These three options tell Escalator when to scale the cluster up, when to scale it down,
+and when to do nothing.
+
+- `scale_up_threshhold_percent` defines the threshold for scaling up.
+- `taint_upper_capacity_threshhold_percent` and `taint_lower_capacity_threshhold_percent` define the thresholds for
+tainting, which leads to scaling down.
+- If the cluster utilisation is at or above `taint_upper_capacity_threshhold_percent` but below `scale_up_threshhold_percent`,
+Escalator will do nothing.
+
+Given the above threshold values, Escalator will do the following:
+
+- Utilisation: 110% = **scale up** (exceeded `scale_up_threshhold_percent`)
+- Utilisation: 75% = **scale up** (exceeded `scale_up_threshhold_percent`)
+- Utilisation: 70% = **scale up** (reached `scale_up_threshhold_percent`)
+- Utilisation: 50% = **do nothing**
+- Utilisation: 40% = **do nothing** (utilisation must fall below `taint_upper_capacity_threshhold_percent` to trigger tainting)
+- Utilisation: 38% = **scale down slowly** (below `taint_upper_capacity_threshhold_percent`
+but above `taint_lower_capacity_threshhold_percent`, so only scale down slowly)
+- Utilisation: 10% = **scale down slowly** (below `taint_upper_capacity_threshhold_percent`
+and not below `taint_lower_capacity_threshhold_percent`, so only scale down slowly)
+- Utilisation: 9% = **scale down quickly** (below `taint_lower_capacity_threshhold_percent`)
+- Utilisation: 0% = **scale down quickly** (below `taint_lower_capacity_threshhold_percent`)
+
+## Slack Capacity Configuration
+
+Slack capacity is configured through the `scale_up_threshhold_percent` option. If this option is set to **70**, for
+example, a scale up will occur whenever the node group utilisation reaches or exceeds **70%**. Escalator will try to
+keep the node group utilisation below **70%**, leaving a "slack capacity" of **30%**.
+
+To completely eliminate slack capacity, `scale_up_threshhold_percent` can be set to **100**, so that
+Escalator only scales up the node group when the utilisation reaches or exceeds **100%**.
+
+Keeping some slack capacity is recommended: if there is a sudden spike of new pods, it gives Escalator time to
+increase the node group size before pods can no longer be scheduled.
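
The threshold rules and the slack capacity arithmetic documented in `advanced-configuration.md` can be sketched in Go as follows. This is a minimal illustration only, not Escalator's actual implementation: the `Thresholds` type and the `Decide` and `SlackPercent` functions are hypothetical names, and the boundary handling (at or above a threshold counts as reaching it) is inferred from the worked utilisation examples in the new doc.

```go
package main

import "fmt"

// Thresholds mirrors the three nodegroup options described in the doc.
// The type and field names are hypothetical, not Escalator's own.
type Thresholds struct {
	TaintUpper float64 // taint_upper_capacity_threshhold_percent
	TaintLower float64 // taint_lower_capacity_threshhold_percent
	ScaleUp    float64 // scale_up_threshhold_percent
}

// Decide maps a utilisation percentage to the action listed in the doc.
// Assumption: a value equal to a threshold is treated as not below it.
func Decide(utilisation float64, t Thresholds) string {
	switch {
	case utilisation >= t.ScaleUp:
		return "scale up"
	case utilisation >= t.TaintUpper:
		return "do nothing"
	case utilisation >= t.TaintLower:
		return "scale down slowly"
	default:
		return "scale down quickly"
	}
}

// SlackPercent returns the slack capacity implied by the scale-up threshold.
func SlackPercent(t Thresholds) float64 {
	return 100 - t.ScaleUp
}

func main() {
	t := Thresholds{TaintUpper: 40, TaintLower: 10, ScaleUp: 70}
	for _, u := range []float64{110, 75, 70, 50, 40, 38, 10, 9, 0} {
		fmt.Printf("%v%% -> %s\n", u, Decide(u, t))
	}
	fmt.Printf("slack capacity: %v%%\n", SlackPercent(t))
}
```

With the example values (40, 10, 70), the sketch reproduces the nine utilisation cases listed in the doc and a slack capacity of 30%. Setting `ScaleUp` to 100 yields a slack capacity of 0, matching the "completely eliminate slack capacity" case.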