
Unable to provision volumes on all nodes with StorageClass with Custom Node Labels #356

Open
khalMeg opened this issue Dec 16, 2024 · 11 comments
Labels
to-be-scoped Need scoping

Comments

@khalMeg

khalMeg commented Dec 16, 2024

I am using OpenEBS LocalPV LVM with a StorageClass with custom node labels (covering all my worker nodes). However, volumes are always provisioned on node worker-01, and if I cordon worker-01, volumes are never scheduled on the other nodes.

Workaround: if I create a new storage class for a specific node (other than worker-01), the volume is provisioned without any problem.

Here is my storage class:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: main-sc
allowVolumeExpansion: true
parameters:
  thinProvision: "yes"
  storage: "lvm"
  volgroup: "data"
  fsType: "xfs"
provisioner: local.csi.openebs.io
allowedTopologies:
- matchLabelExpressions:
  - key: kubernetes.io/hostname
    values:
    - worker-01
    - worker-02
    - worker-03
    - worker-04
     
kubectl get pods -n openebs
NAME                                             READY   STATUS    RESTARTS       AGE
openebs-localpv-provisioner-6bd66f8598-nnzzv     1/1     Running   2 (4d3h ago)   17d
openebs-lvm-localpv-controller-6bbd64786-kvbw4   5/5     Running   0              17d
openebs-lvm-localpv-node-2plnt                   2/2     Running   8 (17d ago)    39d
openebs-lvm-localpv-node-bw68q                   2/2     Running   8 (17d ago)    39d
openebs-lvm-localpv-node-nb7wl                   2/2     Running   0              38h
openebs-lvm-localpv-node-vjpgb                   2/2     Running   0              13d
openebs-zfs-localpv-controller-b8bc9ff46-x9k9w   5/5     Running   0              17d
openebs-zfs-localpv-node-bmgl6                   2/2     Running   13 (17d ago)   39d
openebs-zfs-localpv-node-lsh8p                   2/2     Running   9 (17d ago)    39d
openebs-zfs-localpv-node-n6p8p                   2/2     Running   0              38h
openebs-zfs-localpv-node-r9lmd                   2/2     Running   0              13d

Environment:

  • LVM Driver version
LVM version:     2.03.16(2) (2022-05-18)
Library version: 1.02.185 (2022-05-18)
Driver version:  4.47.0
  • Kubernetes version :
Client Version: v1.30.4
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.4
  • Kubernetes installer & version:
    k8s cluster deployed using Kubespray
  • OS:
    Debian GNU/Linux 12 (bookworm)
@sinhaashish sinhaashish added the to-be-scoped Need scoping label Jan 16, 2025
@abhilashshetty04
Member

Hi @khalMeg,

Can you share the new storage class with which provisioning works for the other nodes?

Please also see the docs for custom node labels in a StorageClass:

https://openebs.io/docs/user-guides/local-storage-user-guide/local-pv-lvm/lvm-configuration#storageclass-with-custom-node-labels

@khalMeg
Author

khalMeg commented Jan 16, 2025

Hi @abhilashshetty04,
It is the same StorageClass, only with a different node selected. This is the only way I can provision on nodes other than worker-01.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: w2-sc
allowVolumeExpansion: true
parameters:
  thinProvision: "yes"
  storage: "lvm"
  volgroup: "data"
  fsType: "xfs"
provisioner: local.csi.openebs.io
allowedTopologies:
- matchLabelExpressions:
  - key: kubernetes.io/hostname
    values:
       - worker-02

The same issue occurs when I use a storage class without specifying any node; the volumes are always provisioned on worker-01!

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: main-sc
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
allowVolumeExpansion: true
parameters:
  thinProvision: "yes"
  storage: "lvm"
  volgroup: "data"
  fsType: "xfs"
provisioner: local.csi.openebs.io

Please note that I am using Thin Provisioning in my LVM setup.
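
For completeness, the volumes are requested through ordinary PVCs against these storage classes; a minimal example would look like this (the PVC name and size are just placeholders):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  storageClassName: main-sc
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi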

@abhilashshetty04
Member

Hi @khalMeg,

I hope you have a VG named data with enough capacity (at the time the PVC is created) and have loaded dm_thin_pool on all the nodes.

What are the k8s node names? You mentioned you are using custom labels. Have you referred to the following docs?

https://openebs.io/docs/user-guides/local-storage-user-guide/local-pv-lvm/lvm-configuration#storageclass-with-custom-node-labels

Can you share the output of the following command?

kubectl get lvmnodes -n <namespace> -oyaml

@khalMeg
Author

khalMeg commented Jan 29, 2025

Hi @abhilashshetty04,

Yes, the data volume group exists and has enough capacity, and dm_thin_pool is loaded on all worker nodes:

lsmod | grep dm_thin_pool
dm_thin_pool           90112  7
dm_persistent_data    106496  1 dm_thin_pool
dm_bio_prison          20480  1 dm_thin_pool
dm_mod                184320  34 dm_thin_pool,dm_bufio

Here is the output of the command kubectl get lvmnodes -n <namespace> -oyaml:

# kubectl get lvmnodes -n openebs -oyaml
apiVersion: v1
items:
- apiVersion: local.openebs.io/v1alpha1
  kind: LVMNode
  metadata:
    creationTimestamp: "2024-11-03T13:06:15Z"
    generation: 221
    name: worker-01
    namespace: openebs
    ownerReferences:
    - apiVersion: v1
      controller: true
      kind: Node
      name: worker-01
      uid: 6a3a89ea-c231-4e85-a674-a0b5b8fe099c
    resourceVersion: "256281700"
    uid: 02c27672-b91d-46b8-b6e9-79430eb6b3d4
  volumeGroups:
  - allocationPolicy: 0
    free: "0"
    lvCount: 131
    maxLv: 0
    maxPv: 0
    metadataCount: 1
    metadataFree: "471552"
    metadataSize: 1020Ki
    metadataUsedCount: 1
    missingPvCount: 0
    name: data
    permissions: 0
    pvCount: 1
    size: 2252796Mi
    snapCount: 0
    uuid: LmZsF1-KkTH-o91f-AgeM-FJO3-mKUu-MGYPmV
- apiVersion: local.openebs.io/v1alpha1
  kind: LVMNode
  metadata:
    creationTimestamp: "2024-11-03T13:06:07Z"
    generation: 30
    name: worker-02
    namespace: openebs
    ownerReferences:
    - apiVersion: v1
      controller: true
      kind: Node
      name: worker-02
      uid: a73ef935-e91c-4248-bd66-5fcb2c4163b7
    resourceVersion: "197477537"
    uid: 856c086c-7a2f-4f18-a75e-6071d2136ec1
  volumeGroups:
  - allocationPolicy: 0
    free: "0"
    lvCount: 19
    maxLv: 0
    maxPv: 0
    metadataCount: 1
    metadataFree: "512512"
    metadataSize: 1020Ki
    metadataUsedCount: 1
    missingPvCount: 0
    name: data
    permissions: 0
    pvCount: 1
    size: 2252796Mi
    snapCount: 0
    uuid: 1Uz02y-kOKK-0zXL-ixoU-xiiV-7uIN-7ln0Kq
- apiVersion: local.openebs.io/v1alpha1
  kind: LVMNode
  metadata:
    creationTimestamp: "2024-12-02T21:27:32Z"
    generation: 44
    name: worker-03
    namespace: openebs
    ownerReferences:
    - apiVersion: v1
      controller: true
      kind: Node
      name: worker-03
      uid: dfc37503-20e9-4b87-9f3d-0c1712fa5349
    resourceVersion: "153305158"
    uid: cfc761a9-ed0c-4e22-89be-2927da45dd39
  volumeGroups:
  - allocationPolicy: 0
    free: "0"
    lvCount: 27
    maxLv: 0
    maxPv: 0
    metadataCount: 1
    metadataFree: "509440"
    metadataSize: 1020Ki
    metadataUsedCount: 1
    missingPvCount: 0
    name: data
    permissions: 0
    pvCount: 1
    size: 3526652Mi
    snapCount: 0
    uuid: zNvEfq-M99r-cddb-25V1-eTaz-Lb94-GBn1KY
- apiVersion: local.openebs.io/v1alpha1
  kind: LVMNode
  metadata:
    creationTimestamp: "2024-12-14T22:59:49Z"
    generation: 20
    name: worker-04
    namespace: openebs
    ownerReferences:
    - apiVersion: v1
      controller: true
      kind: Node
      name: worker-04
      uid: 28f5b69d-21bd-46a6-840a-61cd2161d51d
    resourceVersion: "246832683"
    uid: ced502d4-5dda-4d22-85d2-a52a53a86d04
  volumeGroups:
  - allocationPolicy: 0
    free: "0"
    lvCount: 10
    maxLv: 0
    maxPv: 0
    metadataCount: 1
    metadataFree: "515584"
    metadataSize: 1020Ki
    metadataUsedCount: 1
    missingPvCount: 0
    name: data
    permissions: 0
    pvCount: 1
    size: 3526652Mi
    snapCount: 0
    uuid: 3Y9inX-rW08-aSvp-c64j-PqvM-DMHX-KD1aqM
- apiVersion: local.openebs.io/v1alpha1
  kind: LVMNode
  metadata:
    creationTimestamp: "2025-01-06T12:14:24Z"
    generation: 14
    name: worker-05
    namespace: openebs
    ownerReferences:
    - apiVersion: v1
      controller: true
      kind: Node
      name: worker-05
      uid: faae0a95-1af9-4a76-86fd-4187c916787f
    resourceVersion: "251189334"
    uid: c5e206bc-ff31-4851-a547-cb6c381a4378
  volumeGroups:
  - allocationPolicy: 0
    free: "0"
    lvCount: 9
    maxLv: 0
    maxPv: 0
    metadataCount: 1
    metadataFree: 504Ki
    metadataSize: 1020Ki
    metadataUsedCount: 1
    missingPvCount: 0
    name: data
    permissions: 0
    pvCount: 1
    size: 3526652Mi
    snapCount: 0
    uuid: r1s6o7-AVOv-mCuW-m9Ic-7GZW-vYpG-vbp1vv
  - allocationPolicy: 0
    free: "0"
    lvCount: 1
    maxLv: 0
    maxPv: 0
    metadataCount: 1
    metadataFree: 507Ki
    metadataSize: 1020Ki
    metadataUsedCount: 1
    missingPvCount: 0
    name: lvmvg
    permissions: 0
    pvCount: 1
    size: 3071996Mi
    snapCount: 0
    uuid: DGA5DB-zZIA-srz0-J4y2-4ydR-oLX3-ch81Zk
- apiVersion: local.openebs.io/v1alpha1
  kind: LVMNode
  metadata:
    creationTimestamp: "2025-01-06T12:14:24Z"
    generation: 11
    name: worker-06
    namespace: openebs
    ownerReferences:
    - apiVersion: v1
      controller: true
      kind: Node
      name: worker-06
      uid: 6de575d2-704b-4c6c-903d-d315fee70c18
    resourceVersion: "251205543"
    uid: bc829745-2ff8-42d7-b6a4-3316b374a54e
  volumeGroups:
  - allocationPolicy: 0
    free: "0"
    lvCount: 6
    maxLv: 0
    maxPv: 0
    metadataCount: 1
    metadataFree: 505Ki
    metadataSize: 1020Ki
    metadataUsedCount: 1
    missingPvCount: 0
    name: data
    permissions: 0
    pvCount: 1
    size: 3526652Mi
    snapCount: 0
    uuid: b1OeSA-4F6A-84ra-c9S3-RsQY-NzvO-xio4dG
  - allocationPolicy: 0
    free: "0"
    lvCount: 1
    maxLv: 0
    maxPv: 0
    metadataCount: 1
    metadataFree: 507Ki
    metadataSize: 1020Ki
    metadataUsedCount: 1
    missingPvCount: 0
    name: lvmvg
    permissions: 0
    pvCount: 1
    size: 3071996Mi
    snapCount: 0
    uuid: 8ehJW8-uepE-LG0f-ro4k-QnAT-1m3Y-TQzPbO
kind: List
metadata:
  resourceVersion: ""

Note that I created another VG named lvmvg using other disks on nodes worker-05 and worker-06, just to use the same VG name as in the documentation and check whether the VG name was the issue; I can confirm that it is not.

@abhilashshetty04
Member

abhilashshetty04 commented Jan 30, 2025

Hi @khalMeg,

Note that I created another VG named lvmvg using other disks on nodes worker-05 and worker-06, just to use the same VG name as in the documentation and check whether the VG name was the issue; I can confirm that it is not.

The VG name should not matter at all. lvmvg on worker-05 and worker-06 has one LV each. Do you see the scheduling issue there as well?

More importantly, I see that free on every VG across the lvmnodes is set to "0". It indicates the free space left on the VG, which means the lvm-controller does not see any free space in the cluster during scheduling. Can you list the VGs using vgs --options vg_all --reportformat json on all nodes and share the output?

@khalMeg
Author

khalMeg commented Jan 30, 2025

Hi @abhilashshetty04,

The VG name should not matter at all. lvmvg on worker-05 and worker-06 has one LV each.

The lvCount: 1 shown by the get lvmnodes command is the thin pool itself; here is the output of the lvs command (see the last line):

root@worker-06:~# lvs
LV                                       VG    Attr       LSize   Pool          Origin Data%  Meta%  Move Log Cpy%Sync Convert
data_thinpool                            data  twi-aotz--   3.36t                      8.10   4.10
pvc-0ca5e5ec-c899-473e-a6c4-f3869ad00fe3 data  Vwi-aotz-- 220.00g data_thinpool        99.43
pvc-1356b525-2729-4ed5-9f76-572d8964e486 data  Vwi-aotz--  35.00g data_thinpool        31.27
pvc-3ccda4e2-0e7d-4116-bbe4-288822be3d08 data  Vwi-aotz--  35.00g data_thinpool        0.19
pvc-681cfc32-6be2-486c-aed3-cbd134b8d5e5 data  Vwi-a-tz-- 250.00g data_thinpool        0.00
pvc-885a56b3-9308-4ca9-b3f3-df1804176852 data  Vwi-aotz--  35.00g data_thinpool        2.83
pvc-9a10e321-2325-4471-94d8-ca9365b10be2 data  Vwi-aotz--  75.00g data_thinpool        0.09
pvc-ae0986a4-621f-448d-85a6-d56d3d5f78ac data  Vwi-aotz-- 170.00g data_thinpool        3.89
pvc-c0453c9e-e7ef-499f-8480-8d5005cba087 data  Vwi-aotz--  35.00g data_thinpool        3.03
pvc-c44b3189-ee37-46f0-9205-7a6494ad0b8b data  Vwi-aotz--  45.00g data_thinpool        87.33
pvc-e729a770-e186-42a4-bd59-f1280b10fcd9 data  Vwi-aotz--  35.00g data_thinpool        2.35
pvc-ef71775c-718d-4920-a5dc-113e24864d2d data  Vwi-aotz-- 150.00g data_thinpool        0.05
pvc-f9a8d255-d19b-40e3-83fd-f704ff053cfc data  Vwi-aotz-- 170.00g data_thinpool        0.05
lvmvg_thinpool                            lvmvg twi-a-tz--  <2.93t                      0.00   1.86

The thin pool was created like this:
pvcreate /dev/sdc
vgcreate lvmvg /dev/sdc
lvcreate -l 100%FREE -T lvmvg/lvmvg_thinpool --poolmetadatasize 1G

Do you see the scheduling issue there as well?

Volumes are scheduled on the first node in the node list, which is worker-05, when using this newly created storage class, just like the storage class with vgname: data:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: lvmvg-sc
allowVolumeExpansion: true
parameters:
  thinProvision: "yes"
  storage: "lvm"
  volgroup: "lvmvg"
  fsType: "xfs"
provisioner: local.csi.openebs.io
allowedTopologies:
- matchLabelExpressions:
  - key: openebs.io/nodename
    values:
    - worker-05
    - worker-06

More importantly, I see that free on every VG across the lvmnodes is set to "0". It indicates the free space left on the VG, which means the lvm-controller does not see any free space in the cluster during scheduling. Can you list the VGs using vgs --options vg_all --reportformat json on all nodes and share the output?

That's because the space is allocated to the thin pool rather than remaining available for new logical volumes; the actual usage can be seen with the lvs command (see the Data% column in the lvs output above).

Here is the output of vgs --options vg_all --reportformat json from worker-06:

  {
      "report": [
          {
              "vg": [
                  {"vg_fmt":"lvm2", "vg_uuid":"b1OeSA-4F6A-84ra-c9S3-RsQY-NzvO-xio4dG", "vg_name":"data", "vg_attr":"wz--n-", "vg_permissions":"writeable", "vg_extendable":"extendable", "vg_exported":"", "vg_autoactivation":"enabled", "vg_partial":"", "vg_allocation_policy":"normal", "vg_clustered":"", "vg_shared":"", "vg_size":"3.36t", "vg_free":"0 ", "vg_sysid":"", "vg_systemid":"", "vg_lock_type":"", "vg_lock_args":"", "vg_extent_size":"4.00m", "vg_extent_count":"881663", "vg_free_count":"0", "max_lv":"0", "max_pv":"0", "pv_count":"1", "vg_missing_pv_count":"0", "lv_count":"13", "snap_count":"0", "vg_seqno":"45", "vg_tags":"", "vg_profile":"", "vg_mda_count":"1", "vg_mda_used_count":"1", "vg_mda_free":"502.50k", "vg_mda_size":"1020.00k", "vg_mda_copies":"unmanaged"},
                  {"vg_fmt":"lvm2", "vg_uuid":"8ehJW8-uepE-LG0f-ro4k-QnAT-1m3Y-TQzPbO", "vg_name":"lvmvg", "vg_attr":"wz--n-", "vg_permissions":"writeable", "vg_extendable":"extendable", "vg_exported":"", "vg_autoactivation":"enabled", "vg_partial":"", "vg_allocation_policy":"normal", "vg_clustered":"", "vg_shared":"", "vg_size":"<2.93t", "vg_free":"0 ", "vg_sysid":"", "vg_systemid":"", "vg_lock_type":"", "vg_lock_args":"", "vg_extent_size":"4.00m", "vg_extent_count":"767999", "vg_free_count":"0", "max_lv":"0", "max_pv":"0", "pv_count":"1", "vg_missing_pv_count":"0", "lv_count":"1", "snap_count":"0", "vg_seqno":"4", "vg_tags":"", "vg_profile":"", "vg_mda_count":"1", "vg_mda_used_count":"1", "vg_mda_free":"507.00k", "vg_mda_size":"1020.00k", "vg_mda_copies":"unmanaged"}
              ]
          }
      ]
  }

Please note that I am able to provision volumes normally with a per-node StorageClass using the data VG, as described in my previous comments, so there is no issue with the available capacity.

EDIT:
I have edited my comment; volumes are provisioned on the first node in the selected node list (worker-05) for vgname: lvmvg.

@abhilashshetty04
Member

Hi @khalMeg, I see. We take care of thin pool creation ourselves: the thin pool is created when the first thin volume is provisioned.

Let me run some tests and I'll get back to you. Thanks.

@abhilashshetty04
Member

abhilashshetty04 commented Jan 31, 2025

Hi @khalMeg, I was able to reproduce the issue.

The localpv-lvm scheduler has no visibility of lvmvg_thinpool (which I feel should change in the future). Since you are creating lvmvg_thinpool equal in size to the VG on every VG, no VG has any free space left, so the scheduler assumes there is no space left in the cluster.

The scheduler builds a node candidate list, which is empty here because no VG matches the storage requirement. That list is then passed to further logic where the topology is evaluated. Even though the localpv-lvm scheduler has not found a VG with enough capacity, if a topology is used in the SC it will select the first node on the list (worker-01 in your case) to create the CR.

If you have not used any topology, it just picks the first node, i.e. worker-01.

You don't have to create lvmvg_thinpool yourself. It is created by the node-plugin when the first thin volume is provisioned. The thin pool created by the plugin is usually only a little larger than the first thin LV being provisioned, so the VG keeps free space and scheduling works as expected.
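
If you want the VGs to report free space again, removing the manually created pool should be enough. A rough illustration (only safe if no thin volumes have been created in that pool yet):

# run on worker-05 / worker-06; only if lvmvg_thinpool has no thin LVs in it
lvremove lvmvg/lvmvg_thinpool
vgs lvmvg   # vg_free should now show the reclaimed capacity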

You might want to enable thin pool monitoring on all VGs. There is also a configuration in lvm.conf to extend a thin pool by a certain amount when Data% reaches a certain threshold.
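
For example, a minimal lvm.conf sketch for monitoring and auto-extension could look like this (the threshold and percent values here are only examples, adjust them to your environment):

# /etc/lvm/lvm.conf
activation {
    # let dmeventd monitor thin pools
    monitoring = 1
    # start auto-extending a thin pool once it is 80% full...
    thin_pool_autoextend_threshold = 80
    # ...and grow it by 20% of its current size each time
    thin_pool_autoextend_percent = 20
}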

You can refer to the documentation:
https://openebs.io/docs/user-guides/local-storage-user-guide/local-pv-lvm/advanced-operations/lvm-thin-provisioning.

We would like to change some of the scheduling logic; please see the issue below:

#312

@khalMeg
Author

khalMeg commented Feb 2, 2025

Thank you @abhilashshetty04,

I have deleted the manually created thin pool on my test disks on worker-05 and worker-06, and now I am able to schedule PVs on node worker-06 rather than worker-05 (the first on the selected node list). I have also enabled thin pool auto-extending and it works!

But I'm seeing strange behavior in the scheduling process: even when I cordon node worker-06, new PVs are still scheduled on it!

For testing, I tried to deploy an HA Postgres cluster with 2 replicas using Crunchy Postgres for Kubernetes, but the two PVs were scheduled on the same node every time, even though podAntiAffinity is enabled. I'm not sure if it is an issue with lvm-localpv or something else. Do you have any idea about this?
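
For reference, the anti-affinity I mean is the standard Kubernetes podAntiAffinity form, roughly like this (the label is a placeholder, not the exact spec generated by Crunchy):

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: my-postgres   # placeholder label
      topologyKey: kubernetes.io/hostname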

@abhilashshetty04
Member

abhilashshetty04 commented Feb 3, 2025

Hi @khalMeg,

I have deleted the manually created thin pool on my test disks on worker-05 and worker-06, and now I am able to schedule PVs on node worker-06 rather than worker-05 (the first on the selected node list). I have also enabled thin pool auto-extending and it works!

Thanks for confirming :)

But I'm seeing strange behavior in the scheduling process: even when I cordon node worker-06, new PVs are still scheduled on it!

I don't think we check that; probably we should. Can you create a ticket for it? We can take that up.

For testing, I tried to deploy an HA Postgres cluster with 2 replicas using Crunchy Postgres for Kubernetes, but the two PVs were scheduled on the same node every time, even though podAntiAffinity is enabled. I'm not sure if it is an issue with lvm-localpv or something else. Do you have any idea about this?

This we need to check. I'm not sure about Crunchy Data PostgreSQL; does it deploy a StatefulSet? Can you also create a new ticket for this with the lvm-controller logs and additional details on the workflow?

@khalMeg
Author

khalMeg commented Feb 4, 2025

Thank you again @abhilashshetty04,

A new issue has been created here: #366
