`docs/en/administration/going-production.md` (+47 −1)
@@ -13,7 +13,7 @@ Best practices and recommended settings when going production.
* The `--writeback` option is strongly advised against, as it can easily cause data loss when not properly managed, especially inside containers. See ["Write Cache in Client (Community Edition)"](/docs/community/guide/cache#client-write-cache) and ["Write Cache in Client (Cloud Service)"](/docs/cloud/guide/cache#client-write-cache);
* When cluster resources are limited, use the techniques in [Resource Optimization](../guide/resource-optimization.md#mount-pod-resources);
* It's recommended to set a non-preempting PriorityClass for Mount Pod, see [documentation](../guide/resource-optimization.md#set-non-preempting-priorityclass-for-mount-pod) for details.
- * It's recommended to set PodDisruptionBudget for Mount Pod, see [documentation](../guide/resource-optimization.md#set-poddisruptionbudget-for-mount-pod) for details.
+ * Best practices for reducing node capacity, see [documentation](#scale-down-node).
## Sidecar recommendations {#sidecar}
@@ -455,3 +455,49 @@ spec:
```yaml
fsGroup: 2000
fsGroupChangePolicy: "OnRootMismatch"
```
## Scale Down {#scale-down-node}
The cluster manager may need to drain a node for maintenance or upgrades, or rely on [cluster autoscaling tools](https://kubernetes.io/docs/concepts/cluster-administration/node-autoscaling) to scale the cluster automatically.
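For reference, a manual drain is typically initiated as below; the node name is a placeholder:

```shell
# Cordon the node and evict its Pods (DaemonSet Pods are skipped)
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
```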
When a node is drained, Kubernetes evicts all Pods on it, including Mount Pods. However, if a Mount Pod is evicted prematurely, the remaining application Pods will hit errors when accessing the JuiceFS PV. Moreover, since the Mount Pod is still referenced by application Pods, CSI Node will re-create it, leading to a restart loop in which all JuiceFS file system requests fail.
To avoid this, read the sections below.
### Use PodDisruptionBudget {#pdb}
Set a [PodDisruptionBudget](https://kubernetes.io/docs/tasks/run-application/configure-pdb) for the Mount Pod. The PDB ensures that Mount Pods are protected during a node drain until all application Pods referencing them are evicted, so application Pods keep normal access to the JuiceFS PV throughout the drain. As an example:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: jfs-pdb
  namespace: kube-system  # The namespace where JuiceFS CSI is installed
spec:
  minAvailable: "100%"  # Protect Mount Pods during a node drain
  selector:
    matchLabels:
      app.kubernetes.io/name: juicefs-mount
```
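A quick way to confirm the protection is in place; the manifest file name is assumed, and `ALLOWED DISRUPTIONS` should read `0` while Mount Pods are running:

```shell
# Apply the PDB and check its status
kubectl apply -f jfs-pdb.yaml
kubectl -n kube-system get pdb jfs-pdb
```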
:::note Compatibility
Different service providers make their own modifications to Kubernetes, some of which break PDB. If this is the case, refer to the next section and use a validating webhook to protect the Mount Pod.
:::
### Use validating webhook {#validating-webhook}
In certain Kubernetes environments, PDB does not work as expected (e.g. [Karpenter](https://github.com/aws/karpenter-provider-aws/issues/7853)): once a PDB is created, scale-down no longer works properly.
To prevent this, use our validating webhook instead. When the CSI Driver detects that a Mount Pod being evicted is still in use, it simply rejects the eviction; the autoscaling tool enters a retry loop until the Mount Pod is successfully deleted by CSI Node. To enable this feature, refer to the following Helm configuration:
:::note
This feature requires at least JuiceFS CSI Driver v0.27.1.
:::
```yaml
validatingWebhook:
  enabled: true
```
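For instance, assuming the CSI Driver was installed from the official Helm chart under the release name `juicefs-csi-driver` (both the release name and chart reference below are assumptions, adjust to your installation), the webhook could be enabled in place:

```shell
# Enable the validating webhook on an existing Helm release
helm upgrade juicefs-csi-driver juicefs/juicefs-csi-driver \
  -n kube-system \
  --reuse-values \
  --set validatingWebhook.enabled=true
```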
When using the [Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler), if a node cannot be scaled down due to Mount Pods, it might be because the Cluster Autoscaler cannot evict [not-replicated Pods](https://github.com/kubernetes/autoscaler/issues/351), preventing normal scale-down. In this case, try adding the `cluster-autoscaler.kubernetes.io/safe-to-evict: "true"` annotation to the Mount Pods while utilizing the aforementioned webhook, as sketched below.
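As a sketch, the annotation can be applied to currently running Mount Pods with `kubectl annotate`, reusing the label selector from the PDB example above; note that newly created Mount Pods would need the annotation applied again:

```shell
# Mark existing Mount Pods as safe to evict for the Cluster Autoscaler;
# the validating webhook still rejects evictions while a Mount Pod is in use
kubectl -n kube-system annotate pod \
  -l app.kubernetes.io/name=juicefs-mount \
  cluster-autoscaler.kubernetes.io/safe-to-evict="true"
```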
`docs/en/guide/resource-optimization.md` (−19)
@@ -243,25 +243,6 @@ However, when the Mount Pod is created, if the node resources are insufficient,
```shell
kubectl -n kube-system set env -c juicefs-plugin statefulset/juicefs-csi-controller JUICEFS_MOUNT_PRIORITY_NAME=juicefs-mount-priority-nonpreempting JUICEFS_MOUNT_PREEMPTION_POLICY=Never
```
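The environment variables above reference a PriorityClass by name, which must already exist in the cluster. A minimal sketch of such a non-preempting PriorityClass (the `value` is an assumption, tune it to your cluster):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: juicefs-mount-priority-nonpreempting
value: 1000000000        # High priority, so Mount Pods are scheduled early
preemptionPolicy: Never  # Never preempt (evict) other Pods for resources
globalDefault: false
description: "Non-preempting PriorityClass for JuiceFS Mount Pods"
```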
## Set PodDisruptionBudget for Mount Pod {#set-poddisruptionbudget-for-mount-pod}

The cluster manager may need to drain a node for maintenance or upgrades. When a node is drained, Kubernetes evicts all Pods on it, including Mount Pods. However, a Mount Pod's eviction leaves application Pods unable to use the JuiceFS PV; moreover, the Mount Pod is re-created when CSI Node detects that it is still used by application Pods, leading to a delete-recreate loop.

To avoid this, set a [PodDisruptionBudget](https://kubernetes.io/docs/tasks/run-application/configure-pdb) for the Mount Pod. The PodDisruptionBudget ensures that the Mount Pod is not evicted during the drain until the related application Pods are evicted, after which CSI Node deletes it. This preserves application Pods' access to the JuiceFS PV during the drain, avoids the delete-recreate loop, and does not affect the drain operation. Here is an example:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: jfs-pdb
  namespace: kube-system  # The namespace where JuiceFS CSI is located
spec:
  minAvailable: "100%"  # Avoid all Mount Pods being evicted during a node drain
  selector:
    matchLabels:
      app.kubernetes.io/name: juicefs-mount
```
## Share Mount Pod for the same StorageClass {#share-mount-pod-for-the-same-storageclass}
By default, a Mount Pod is shared only when multiple application Pods use the same PV. You can take this a step further and share one Mount Pod (on the same node, of course) across all PVs created from the same StorageClass. Under this policy, different application Pods bind different paths under the host mount point, so one Mount Pod serves multiple application Pods, as sketched below.
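In the JuiceFS CSI Driver this policy is toggled via an environment variable on the CSI controller, mirroring the `set env` command earlier; the variable name `STORAGE_CLASS_SHARE_MOUNT` should be verified against your CSI Driver version:

```shell
# Let all PVs from the same StorageClass share one Mount Pod per node
kubectl -n kube-system set env -c juicefs-plugin \
  statefulset/juicefs-csi-controller STORAGE_CLASS_SHARE_MOUNT=true
```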
`docs/zh_cn/administration/going-production.md`

* Consider setting a non-preempting PriorityClass for the Mount Pod, so that when resources are tight the Mount Pod does not evict application containers. See [documentation](../guide/resource-optimization.md#set-non-preempting-priorityclass-for-mount-pod);
- * Consider setting a PodDisruptionBudget for the Mount Pod to avoid Mount Pods being evicted during a node drain. See [documentation](../guide/resource-optimization.md#set-poddisruptionbudget-for-mount-pod).
+ * Best practices for reducing node capacity, see [documentation](#scale-down-node).
## Scale Down {#scale-down-node}

The cluster manager may need to drain a node for maintenance or upgrades, or rely on [cluster autoscaling tools](https://kubernetes.io/docs/concepts/cluster-administration/node-autoscaling) to scale the cluster automatically.

When a node is drained, Kubernetes evicts all Pods on it, including Mount Pods. If a Mount Pod is evicted before the application Pods, those Pods can no longer access the JuiceFS PV; and when CSI Node detects that the Mount Pod exited unexpectedly while application Pods still use it, it pulls the Mount Pod up again, trapping it in a delete-recreate loop, blocking node scale-down, and causing errors whenever application Pods access the JuiceFS PV.
To avoid these problems during scale-down, read the sections below.
### Set PodDisruptionBudget {#pdb}
You can set a [PodDisruptionBudget](https://kubernetes.io/docs/tasks/run-application/configure-pdb) for the Mount Pod. The PDB guarantees that during a node drain the Mount Pod is not evicted until its corresponding application Pods are evicted, at which point CSI Node deletes it. This preserves application Pods' access to the JuiceFS PV during the drain, avoids the Mount Pod delete-recreate loop, and does not interfere with the drain itself. For example:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: jfs-pdb
  namespace: kube-system  # The namespace where JuiceFS CSI is installed
spec:
  minAvailable: "100%"  # Avoid all Mount Pods being evicted during a node drain
  selector:
    matchLabels:
      app.kubernetes.io/name: juicefs-mount
```
:::note Compatibility
Different service providers adapt and modify Kubernetes in their own ways, so PDB may not work as expected. If that happens, refer to the next section and use a validating webhook to ensure Mount Pods are not evicted prematurely during a node drain.
:::

### Use validating webhook {#validating-webhook}

In certain Kubernetes environments, PDB does not work as expected (e.g. [Karpenter](https://github.com/aws/karpenter-provider-aws/issues/7853)): once a PDB is created, scale-down no longer works properly.

In this situation, do not use PDB; instead, enable the validating webhook for the CSI Driver. When the CSI Driver detects that a Mount Pod being evicted is still used by application Pods, it rejects the eviction request. The autoscaling tool keeps retrying until the Mount Pod's reference count drops to zero and it is released normally. An example of enabling it via Helm:
:::note
This feature requires JuiceFS CSI Driver v0.27.1 or later.
:::
```yaml
validatingWebhook:
  enabled: true
```
When using the [Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler), if a node with Mount Pods cannot be scaled down, it may be because the Cluster Autoscaler cannot evict [not-replicated Pods](https://github.com/kubernetes/autoscaler/issues/351), preventing normal scale-down. In this case, try adding the `cluster-autoscaler.kubernetes.io/safe-to-evict: "true"` annotation to the Mount Pods, combined with the webhook above, to achieve a normal scale-down.
`docs/zh_cn/guide/resource-optimization.md` (−19)
@@ -243,25 +243,6 @@ When creating a Mount Pod, CSI Node by default sets its PriorityClass to `syst
```shell
kubectl -n kube-system set env -c juicefs-plugin statefulset/juicefs-csi-controller JUICEFS_MOUNT_PRIORITY_NAME=juicefs-mount-priority-nonpreempting JUICEFS_MOUNT_PREEMPTION_POLICY=Never
```
## Set PodDisruptionBudget for Mount Pod {#set-poddisruptionbudget-for-mount-pod}

The cluster manager sometimes drains a node for maintenance or upgrades. When a node is drained, Kubernetes evicts all Pods on it, including Mount Pods. However, a Mount Pod's eviction may leave application Pods unable to access the JuiceFS PV, and when CSI Node detects that the evicted Mount Pod is still used by application Pods, it pulls the Pod up again, trapping the Mount Pod in a delete-recreate loop.

To avoid this, you can set a [PodDisruptionBudget](https://kubernetes.io/docs/tasks/run-application/configure-pdb) for the Mount Pod. The PDB guarantees that during a node drain the Mount Pod is not evicted until its corresponding application Pods are evicted, at which point CSI Node deletes it. This preserves application Pods' access to the JuiceFS PV during the drain, avoids the Mount Pod delete-recreate loop, and does not interfere with the drain itself. For example:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: jfs-pdb
  namespace: kube-system  # The namespace where JuiceFS CSI is installed
spec:
  minAvailable: "100%"  # Avoid all Mount Pods being evicted during a node drain
  selector:
    matchLabels:
      app.kubernetes.io/name: juicefs-mount
```
## Share Mount Pod for the same StorageClass {#share-mount-pod-for-the-same-storageclass}
By default, a Mount Pod is shared only when multiple application Pods use the same PV. To cut overhead further, you can share Mount Pods more aggressively, letting all PVs created from the same StorageClass reuse one Mount Pod (sharing, naturally, only happens within the same node). Different application Pods bind different paths under the shared mount point, so one Mount Pod serves multiple application containers.
0 commit comments