Commit 99a50bd

Merge pull request #688 from red-hat-storage/sync_ds--master
Syncing latest changes from master for rook
2 parents 453dc30 + 8388c7c commit 99a50bd

11 files changed: +182 additions, −45 deletions

Documentation/CRDs/Cluster/external-cluster/external-cluster.md

Lines changed: 54 additions & 19 deletions

@@ -121,25 +121,6 @@ The storageclass is used to create a volume in the pool matching the topology wh
 
 For more details, see the [Topology-Based Provisioning](topology-for-external-mode.md)
 
-### Upgrade Example
-
-1. If consumer cluster doesn't have restricted caps, this will upgrade all the default csi-users (non-restricted):
-
-```console
-python3 create-external-cluster-resources.py --upgrade
-```
-
-2. If the consumer cluster has restricted caps:
-Restricted users created using `--restricted-auth-permission` flag need to pass mandatory flags: '`--rbd-data-pool-name`(if it is a rbd user), `--k8s-cluster-name` and `--run-as-user`' flags while upgrading, in case of cephfs users if you have passed `--cephfs-filesystem-name` flag while creating csi-users then while upgrading it will be mandatory too. In this example the user would be `client.csi-rbd-node-rookstorage-replicapool` (following the pattern `csi-user-clusterName-poolName`)
-
-```console
-python3 create-external-cluster-resources.py --upgrade --rbd-data-pool-name replicapool --k8s-cluster-name rookstorage --run-as-user client.csi-rbd-node-rookstorage-replicapool
-```
-
-!!! note
-An existing non-restricted user cannot be converted to a restricted user by upgrading.
-The upgrade flag should only be used to append new permissions to users. It shouldn't be used for changing a csi user already applied permissions. For example, you shouldn't change the pool(s) a user has access to.
-
 ### Admin privileges
 
 If in case the cluster needs the admin keyring to configure, update the admin key `rook-ceph-mon` secret with client.admin keyring

@@ -305,3 +286,57 @@ you can export the settings from this cluster with the following steps.
 
 !!! important
 For other clusters to connect to storage in this cluster, Rook must be configured with a networking configuration that is accessible from other clusters. Most commonly this is done by enabling host networking in the CephCluster CR so the Ceph daemons will be addressable by their host IPs.
+
+## Upgrades
+
+Upgrading the cluster is different for restricted and non-restricted caps:
+
+1. If the consumer cluster doesn't have restricted caps, this will upgrade all the default CSI users (non-restricted):
+
+```console
+python3 create-external-cluster-resources.py --upgrade
+```
+
+2. If the consumer cluster has restricted caps:
+
+Restricted users created with the `--restricted-auth-permission` flag must pass the mandatory flags `--rbd-data-pool-name` (if it is an RBD user), `--k8s-cluster-name`, and `--run-as-user` while upgrading. For CephFS users, if the `--cephfs-filesystem-name` flag was passed while creating the CSI users, it is mandatory while upgrading as well. In this example the user would be `client.csi-rbd-node-rookstorage-replicapool` (following the pattern `csi-user-clusterName-poolName`).
+
+```console
+python3 create-external-cluster-resources.py --upgrade --rbd-data-pool-name replicapool --k8s-cluster-name rookstorage --run-as-user client.csi-rbd-node-rookstorage-replicapool
+```
+
+!!! note
+    1) An existing non-restricted user cannot be converted to a restricted user by upgrading.
+    2) The upgrade flag should only be used to append new permissions to users. It shouldn't be used to change a CSI user's already-applied permissions. For example, be careful not to change the pool(s) that a user has access to.
+
+### Upgrade cluster to utilize new feature
+
+Some Rook upgrades may require re-running the import steps, or may introduce new external cluster features that can be most easily enabled by re-running the import steps.
+
+To re-run the import steps with new options, the python script should be re-run using the same configuration options that were used for past invocations, plus the configurations that are being added or modified.
+
+Starting with Rook v1.15, the script stores the configuration in the `external-cluster-user-command` configmap for easy future reference.
+
+* `args`: the exact arguments that were used when the script was run. Arguments are resolved with the priority: command-line args > config.ini file values > default values.
+
+#### Example `external-cluster-user-command` ConfigMap:
+
+1. Get the last-applied config, if it's available:
+
+```console
+$ kubectl get configmap --namespace rook-ceph external-cluster-user-command --output jsonpath='{.data.args}'
+```
+
+2. Copy the output to `config.ini`
+
+3. Make any desired modifications and additions to `config.ini`
+
+4. Run the python script again using the [config file](#config-file)
+
+5. [Copy the bash output](#2-copy-the-bash-output)
+
+6. Run the steps under [import-the-source-data](#import-the-source-data)
+
+!!! warning
+    If the last-applied config is unavailable, run the current version of the script again using previously-applied config and CLI flags.
+    Failure to reuse the same configuration options when re-invoking the python script can result in unexpected changes when re-running the import script.
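The restricted-user naming convention used in the upgrade example above can be sketched as a small helper. `restricted_upgrade_command` is hypothetical (not part of `create-external-cluster-resources.py`); it only illustrates how the `--run-as-user` value is composed from the cluster and pool names:

```python
# Hypothetical helper (not part of the Rook script) showing how the
# restricted RBD node user and the upgrade invocation are composed.
def restricted_upgrade_command(k8s_cluster_name: str, rbd_data_pool_name: str) -> str:
    # Restricted RBD node users follow the pattern csi-rbd-node-<clusterName>-<poolName>
    run_as_user = f"client.csi-rbd-node-{k8s_cluster_name}-{rbd_data_pool_name}"
    return (
        "python3 create-external-cluster-resources.py --upgrade"
        f" --rbd-data-pool-name {rbd_data_pool_name}"
        f" --k8s-cluster-name {k8s_cluster_name}"
        f" --run-as-user {run_as_user}"
    )

print(restricted_upgrade_command("rookstorage", "replicapool"))
```

Running this prints exactly the restricted-caps command shown in the example above.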

Documentation/Troubleshooting/ceph-common-issues.md

Lines changed: 36 additions & 0 deletions

@@ -67,6 +67,9 @@ title: Ceph Common Issues
 - [Symptoms](#symptoms-11)
 - [Investigation](#investigation-7)
 - [Solution](#solution-12)
+- [The cluster is in an unhealthy state or fails to configure when LimitNOFILE=infinity in containerd](#the-cluster-is-in-an-unhealthy-state-or-fails-to-configure-when-limitnofileinfinity-in-containerd)
+- [Symptoms](#symptoms-12)
+- [Solution](#solution-13)
 
 
 Many of these problem cases are hard to summarize down to a short phrase that adequately describes the problem. Each problem will start with a bulleted list of symptoms. Keep in mind that all symptoms may not apply depending on the configuration of Rook. If the majority of the symptoms are seen there is a fair chance you are experiencing that problem.

@@ -774,3 +777,36 @@ data: {}
 ```
 
 If the ConfigMap exists, remove any keys that you wish to configure through the environment.
+
+## The cluster is in an unhealthy state or fails to configure when LimitNOFILE=infinity in containerd
+
+### Symptoms
+
+When trying to create a new deployment, Ceph mons keep crashing and the cluster fails to configure or remains in an unhealthy state. The nodes' CPUs are stuck at 100%.
+
+```console
+NAME        DATADIRHOSTPATH   MONCOUNT   AGE    PHASE   MESSAGE                            HEALTH       EXTERNAL   FSID
+rook-ceph   /var/lib/rook     3          4m6s   Ready   Failed to configure ceph cluster   HEALTH_ERR
+```
+
+### Solution
+
+Before systemd v240, systemd would leave `fs.nr_open` as-is because it had no mechanism to set a safe upper limit for it. The kernel's hard-coded default for the maximum number of open files is **1048576**. Starting with systemd v240, when `LimitNOFILE=infinity` is specified in the containerd.service configuration, this value will typically be set to **~1073741816** (INT_MAX for x86_64 divided by two).
+
+To fix this, set LimitNOFILE in the systemd service configuration to **1048576**.
+
+Create an override.conf file with the new LimitNOFILE value:
+
+```console
+$ vim /etc/systemd/system/containerd.service.d/override.conf
+[Service]
+LimitNOFILE=1048576
+```
+
+Reload the systemd manager configuration, restart containerd, and restart all monitor deployments:
+
+```console
+$ systemctl daemon-reload
+$ systemctl restart containerd
+$ kubectl rollout restart deployment rook-ceph-mon-a rook-ceph-mon-b rook-ceph-mon-c -n rook-ceph
+```
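Before reloading systemd, the drop-in above can be sanity-checked. The snippet below is a minimal sketch (not a full systemd unit-file parser, and not part of Rook) that verifies a drop-in pins `LimitNOFILE` to the kernel default rather than leaving it at infinity:

```python
# Minimal sketch: check that a containerd drop-in sets LimitNOFILE to a
# bounded value no larger than the kernel default (1048576).
KERNEL_DEFAULT_NR_OPEN = 1048576

def limit_nofile_ok(override_conf: str) -> bool:
    for line in override_conf.splitlines():
        line = line.strip()
        if line.startswith("LimitNOFILE="):
            value = line.split("=", 1)[1]
            return value != "infinity" and int(value) <= KERNEL_DEFAULT_NR_OPEN
    return False  # no explicit limit set in this drop-in

override = """\
[Service]
LimitNOFILE=1048576
"""
print(limit_nofile_ok(override))  # → True
```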

build/csv/ceph/ceph.rook.io_cephclusters.yaml

Lines changed: 1 addition & 0 deletions

@@ -449,6 +449,7 @@ spec:
 - ""
 - crush-compat
 - upmap
+- read
 - upmap-read
 type: string
 type: object

deploy/charts/rook-ceph/templates/resources.yaml

Lines changed: 1 addition & 0 deletions

@@ -1601,6 +1601,7 @@ spec:
 - ""
 - crush-compat
 - upmap
+- read
 - upmap-read
 type: string
 type: object

deploy/examples/crds.yaml

Lines changed: 1 addition & 0 deletions

@@ -1599,6 +1599,7 @@ spec:
 - ""
 - crush-compat
 - upmap
+- read
 - upmap-read
 type: string
 type: object
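With `read` now an allowed enum value, a CephCluster could opt into the new mode through the balancer mgr module settings. This fragment is illustrative only (field layout follows the `balancerMode` setting in `ModuleSettings`; the metadata values are assumptions):

```yaml
# Illustrative CephCluster fragment (not part of this commit): enabling the
# balancer mgr module with the newly allowed "read" mode.
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  mgr:
    modules:
      - name: balancer
        enabled: true
        settings:
          balancerMode: read
```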

go.mod

Lines changed: 3 additions & 3 deletions

@@ -16,7 +16,7 @@ replace (
 
 require (
 	github.com/IBM/keyprotect-go-client v0.14.3
-	github.com/aws/aws-sdk-go v1.54.20
+	github.com/aws/aws-sdk-go v1.55.3
 	github.com/banzaicloud/k8s-objectmatcher v1.8.0
 	github.com/ceph/go-ceph v0.28.0
 	github.com/coreos/pkg v0.0.0-20230601102743-20bbbf26f4d8

@@ -30,8 +30,8 @@ require (
 	github.com/kube-object-storage/lib-bucket-provisioner v0.0.0-20221122204822-d1a8c34382f1
 	github.com/libopenstorage/secrets v0.0.0-20240416031220-a17cf7f72c6c
 	github.com/pkg/errors v0.9.1
-	github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring v0.75.1
-	github.com/prometheus-operator/prometheus-operator/pkg/client v0.75.1
+	github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring v0.75.2
+	github.com/prometheus-operator/prometheus-operator/pkg/client v0.75.2
 	github.com/rook/rook/pkg/apis v0.0.0-20231204200402-5287527732f7
 	github.com/spf13/cobra v1.8.1
 	github.com/spf13/pflag v1.0.5

go.sum

Lines changed: 6 additions & 6 deletions

@@ -144,8 +144,8 @@ github.com/armon/go-socks5 v0.0.0-20160902184237-e75332964ef5/go.mod h1:wHh0iHkY
 github.com/asaskevich/govalidator v0.0.0-20180720115003-f9ffefc3facf/go.mod h1:lB+ZfQJz7igIIfQNfa7Ml4HSf2uFQQRzpGGRXenZAgY=
 github.com/asaskevich/govalidator v0.0.0-20190424111038-f61b66f89f4a/go.mod h1:lB+ZfQJz7igIIfQNfa7Ml4HSf2uFQQRzpGGRXenZAgY=
 github.com/aws/aws-sdk-go v1.44.164/go.mod h1:aVsgQcEevwlmQ7qHE9I3h+dtQgpqhFB+i8Phjh7fkwI=
-github.com/aws/aws-sdk-go v1.54.20 h1:FZ2UcXya7bUkvkpf7TaPmiL7EubK0go1nlXGLRwEsoo=
-github.com/aws/aws-sdk-go v1.54.20/go.mod h1:eRwEWoyTWFMVYVQzKMNHWP5/RV4xIUGMQfXQHfHkpNU=
+github.com/aws/aws-sdk-go v1.55.3 h1:0B5hOX+mIx7I5XPOrjrHlKSDQV/+ypFZpIHOx5LOk3E=
+github.com/aws/aws-sdk-go v1.55.3/go.mod h1:eRwEWoyTWFMVYVQzKMNHWP5/RV4xIUGMQfXQHfHkpNU=
 github.com/banzaicloud/k8s-objectmatcher v1.8.0 h1:Nugn25elKtPMTA2br+JgHNeSQ04sc05MDPmpJnd1N2A=
 github.com/banzaicloud/k8s-objectmatcher v1.8.0/go.mod h1:p2LSNAjlECf07fbhDyebTkPUIYnU05G+WfGgkTmgeMg=
 github.com/benbjohnson/clock v1.1.0/go.mod h1:J11/hYXuz8f4ySSvYwY0FKfm+ezbsZBKZxNJlLklBHA=

@@ -770,11 +770,11 @@ github.com/prashantv/gostub v1.1.0 h1:BTyx3RfQjRHnUWaGF9oQos79AlQ5k8WNktv7VGvVH4
 github.com/prashantv/gostub v1.1.0/go.mod h1:A5zLQHz7ieHGG7is6LLXLz7I8+3LZzsrV0P1IAHhP5U=
 github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring v0.44.1/go.mod h1:3WYi4xqXxGGXWDdQIITnLNmuDzO5n6wYva9spVhR4fg=
 github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring v0.46.0/go.mod h1:3WYi4xqXxGGXWDdQIITnLNmuDzO5n6wYva9spVhR4fg=
-github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring v0.75.1 h1:+iiljhJV6niK7MuifJs/n3NeLxikd85nrQfn53sLJkU=
-github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring v0.75.1/go.mod h1:XYrdZw5dW12Cjkt4ndbeNZZTBp4UCHtW0ccR9+sTtPU=
+github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring v0.75.2 h1:6UsAv+jAevuGO2yZFU/BukV4o9NKnFMOuoouSA4G0ns=
+github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring v0.75.2/go.mod h1:XYrdZw5dW12Cjkt4ndbeNZZTBp4UCHtW0ccR9+sTtPU=
 github.com/prometheus-operator/prometheus-operator/pkg/client v0.46.0/go.mod h1:k4BrWlVQQsvBiTcDnKEMgyh/euRxyxgrHdur/ZX/sdA=
-github.com/prometheus-operator/prometheus-operator/pkg/client v0.75.1 h1:s7GlsRYGLWP+L1eQKy6RmLatX+k3v9NQwutUix4l5uM=
-github.com/prometheus-operator/prometheus-operator/pkg/client v0.75.1/go.mod h1:qca3qWGdknRpHvPyThepe5a6QYAh38IQ2ml93E6V3NY=
+github.com/prometheus-operator/prometheus-operator/pkg/client v0.75.2 h1:71GOmhZFA2/17maXqCcuJEzpJDyqPty8SpEOGZWyVec=
+github.com/prometheus-operator/prometheus-operator/pkg/client v0.75.2/go.mod h1:Sv6XsfGGkR9gKnhP92F5dNXEpsSePn0W+7JwYP0NVkc=
 github.com/prometheus/client_golang v0.9.0/go.mod h1:7SWBe2y4D6OKWSNQJUaRYU/AaXPKyh/dDVn+NZz0KFw=
 github.com/prometheus/client_golang v0.9.1/go.mod h1:7SWBe2y4D6OKWSNQJUaRYU/AaXPKyh/dDVn+NZz0KFw=
 github.com/prometheus/client_golang v0.9.3/go.mod h1:/TN21ttK/J9q6uSwhBd54HahCDft0ttaMvbicHlPoso=

pkg/apis/ceph.rook.io/v1/types.go

Lines changed: 1 addition & 1 deletion

@@ -679,7 +679,7 @@ type Module struct {
 
 type ModuleSettings struct {
 	// BalancerMode sets the `balancer` module with different modes like `upmap`, `crush-compact` etc
-	// +kubebuilder:validation:Enum="";crush-compat;upmap;upmap-read
+	// +kubebuilder:validation:Enum="";crush-compat;upmap;read;upmap-read
 	BalancerMode string `json:"balancerMode,omitempty"`
 }
pkg/daemon/ceph/client/mgr.go

Lines changed: 30 additions & 6 deletions

@@ -22,12 +22,18 @@ import (
 
 	"github.com/pkg/errors"
 	"github.com/rook/rook/pkg/clusterd"
+	cephver "github.com/rook/rook/pkg/operator/ceph/version"
 )
 
 var (
 	moduleEnableWaitTime = 5 * time.Second
 )
 
+const (
+	readBalancerMode      = "read"
+	upmapReadBalancerMode = "upmap-read"
+)
+
 func CephMgrMap(context *clusterd.Context, clusterInfo *ClusterInfo) (*MgrMap, error) {
 	args := []string{"mgr", "dump"}
 	buf, err := NewCephCommand(context, clusterInfo, args).Run()

@@ -132,12 +138,12 @@ func setBalancerMode(context *clusterd.Context, clusterInfo *ClusterInfo, mode s
 	return nil
 }
 
-// setMinCompatClientLuminous set the minimum compatibility for clients to Luminous
-func setMinCompatClientLuminous(context *clusterd.Context, clusterInfo *ClusterInfo) error {
-	args := []string{"osd", "set-require-min-compat-client", "luminous", "--yes-i-really-mean-it"}
+// setMinCompatClient set the minimum compatibility for clients
+func setMinCompatClient(context *clusterd.Context, clusterInfo *ClusterInfo, version string) error {
+	args := []string{"osd", "set-require-min-compat-client", version, "--yes-i-really-mean-it"}
 	_, err := NewCephCommand(context, clusterInfo, args).Run()
 	if err != nil {
-		return errors.Wrap(err, "failed to set set-require-min-compat-client to luminous")
+		return errors.Wrapf(err, "failed to set set-require-min-compat-client to %q", version)
 	}
 
 	return nil

@@ -165,8 +171,12 @@ func mgrSetBalancerMode(context *clusterd.Context, clusterInfo *ClusterInfo, bal
 
 // ConfigureBalancerModule configures the balancer module
 func ConfigureBalancerModule(context *clusterd.Context, clusterInfo *ClusterInfo, balancerModuleMode string) error {
-	// Set min compat client to luminous before enabling the balancer mode "upmap"
-	err := setMinCompatClientLuminous(context, clusterInfo)
+	minCompatClientVersion, err := desiredMinCompatClientVersion(clusterInfo, balancerModuleMode)
+	if err != nil {
+		return errors.Wrap(err, "failed to get minimum compatibility client version")
+	}
+
+	err = setMinCompatClient(context, clusterInfo, minCompatClientVersion)
 	if err != nil {
 		return errors.Wrap(err, "failed to set minimum compatibility client")
 	}

@@ -179,3 +189,17 @@ func ConfigureBalancerModule(context *clusterd.Context, clusterInfo *ClusterInfo
 
 	return nil
 }
+
+func desiredMinCompatClientVersion(clusterInfo *ClusterInfo, balancerModuleMode string) (string, error) {
+	// Set min compat client to luminous before enabling the balancer mode "upmap"
+	minCompatClientVersion := "luminous"
+	if balancerModuleMode == readBalancerMode || balancerModuleMode == upmapReadBalancerMode {
+		if !clusterInfo.CephVersion.IsAtLeast(cephver.CephVersion{Major: 19}) {
+			return "", errors.New("minimum ceph v19 (Squid) is required for upmap-read or read balancer modes")
+		}
+		// Set min compat client to reef before enabling the balancer mode "upmap-read" or "read"
+		minCompatClientVersion = "reef"
+	}
+
+	return minCompatClientVersion, nil
+}
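The version-gating logic added in `desiredMinCompatClientVersion` can be summarized in a short sketch. This is a Python rendering of the Go function above (mode names and error text taken from the diff; it is illustrative, not runnable against a cluster):

```python
# Python sketch mirroring desiredMinCompatClientVersion from mgr.go:
# "read"/"upmap-read" require Ceph v19 (Squid) and raise the min compat
# client to "reef"; other modes keep the historical "luminous" floor.
READ_BALANCER_MODE = "read"
UPMAP_READ_BALANCER_MODE = "upmap-read"

def desired_min_compat_client(ceph_major: int, balancer_mode: str) -> str:
    if balancer_mode in (READ_BALANCER_MODE, UPMAP_READ_BALANCER_MODE):
        if ceph_major < 19:
            raise ValueError(
                "minimum ceph v19 (Squid) is required for upmap-read or read balancer modes"
            )
        return "reef"
    return "luminous"

print(desired_min_compat_client(19, "upmap-read"))  # → reef
print(desired_min_compat_client(19, "upmap"))       # → luminous
```

The same cases are exercised by `TestGetMinCompatClientVersion` in the Go test diff below.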

pkg/daemon/ceph/client/mgr_test.go

Lines changed: 35 additions & 0 deletions

@@ -21,6 +21,7 @@ import (
 
 	"github.com/pkg/errors"
 	"github.com/rook/rook/pkg/clusterd"
+	cephver "github.com/rook/rook/pkg/operator/ceph/version"
 	exectest "github.com/rook/rook/pkg/util/exec/test"
 	"github.com/stretchr/testify/assert"
 )

@@ -135,3 +136,37 @@ func TestSetBalancerMode(t *testing.T) {
 	err := setBalancerMode(&clusterd.Context{Executor: executor}, AdminTestClusterInfo("mycluster"), "upmap")
 	assert.NoError(t, err)
 }
+
+func TestGetMinCompatClientVersion(t *testing.T) {
+	clusterInfo := AdminTestClusterInfo("mycluster")
+	t.Run("upmap-read balancer mode with ceph v19", func(t *testing.T) {
+		clusterInfo.CephVersion = cephver.CephVersion{Major: 19}
+		result, err := desiredMinCompatClientVersion(clusterInfo, upmapReadBalancerMode)
+		assert.NoError(t, err)
+		assert.Equal(t, "reef", result)
+	})
+
+	t.Run("read balancer mode with ceph v19", func(t *testing.T) {
+		clusterInfo.CephVersion = cephver.CephVersion{Major: 19}
+		result, err := desiredMinCompatClientVersion(clusterInfo, readBalancerMode)
+		assert.NoError(t, err)
+		assert.Equal(t, "reef", result)
+	})
+	t.Run("upmap-read balancer mode with ceph below v19 should fail", func(t *testing.T) {
+		clusterInfo.CephVersion = cephver.CephVersion{Major: 18}
+		_, err := desiredMinCompatClientVersion(clusterInfo, upmapReadBalancerMode)
+		assert.Error(t, err)
+	})
+	t.Run("read balancer mode with ceph below v19 should fail", func(t *testing.T) {
+		clusterInfo.CephVersion = cephver.CephVersion{Major: 18}
+		_, err := desiredMinCompatClientVersion(clusterInfo, readBalancerMode)
+		assert.Error(t, err)
+	})
+
+	t.Run("upmap balancer set min compat client to luminous", func(t *testing.T) {
+		clusterInfo.CephVersion = cephver.CephVersion{Major: 19}
+		result, err := desiredMinCompatClientVersion(clusterInfo, "upmap")
+		assert.NoError(t, err)
+		assert.Equal(t, "luminous", result)
+	})
+}
