Description
Hello,
I am trying to set up NVMe-oF for testing on Ceph 19.2.2 with 2 nodes, but I am running into some issues.
Process:
I created the nvmeof service on the Services page for the ceph-nvme1 node (which runs Ubuntu 22.04 and Podman). The daemon runs and the container is deployed to the node. I configured the subsystem, listener, namespace, and initiator. When I run a discovery from the ESXi 8.0.2 host, it sees the controller. When I connect to it, it sees the namespace, but the LUN stays in standby.
Log on the ESXi side:
2025-07-23T09:05:01.234Z In(182) vmkernel: cpu30:2098764 opID=772545d0)NVMEDEV:6278 Discover namespace on controller 452
2025-07-23T09:05:01.234Z In(182) vmkernel: cpu47:2097610)NvmeDiscover: 2192: Failed to get valid NS for controller nqn.2001-07.com.ceph:1753261226568#vmhba65#10.10.11.8:4420 with with status Invalid or missing namespace
2025-07-23T09:05:01.234Z In(182) vmkernel: cpu30:2098764 opID=772545d0)NVMEDEV:5417 Controller 452, construct namespace 1
2025-07-23T09:05:01.234Z In(182) vmkernel: cpu30:2098764 opID=772545d0)NVMEDEV:5359 Namespace identifier type 0x2, length 0x10.
2025-07-23T09:05:01.234Z In(182) vmkernel: cpu30:2098764 opID=772545d0)NVMEDEV:5359 Namespace identifier type 0x3, length 0x10.
2025-07-23T09:05:01.234Z In(182) vmkernel: cpu30:2098764 opID=772545d0)NVMEDEV:5301 Set name of ns 1 to eui.610465c484794101931f35ab92209987
2025-07-23T09:05:01.234Z In(182) vmkernel: cpu30:2098764 opID=772545d0)NvmeDiscover: 6804: Scan operation 3 received on adapter vmhba65
2025-07-23T09:05:01.234Z In(182) vmkernel: cpu30:2098764 opID=772545d0)NvmeDiscover: 6241: Add namespace on controller = nqn.2001-07.com.ceph:1753261226568#vmhba65#10.10.11.8:4420
2025-07-23T09:05:01.234Z In(182) vmkernel: cpu30:2098764 opID=772545d0)NvmeDiscover: 6274: Failed to get valid NS handle from controller nqn.2001-07.com.ceph:1753261226568#vmhba65#10.10.11.8:4420 with status Invalid or missing namespace
2025-07-23T09:05:01.234Z In(182) vmkernel: cpu30:2098764 opID=772545d0)NVMEPSA:1631 adpater: vmhba65, action: 0
2025-07-23T09:05:01.234Z In(182) vmkernel: cpu30:2098764 opID=772545d0)NvmeDiscover: 3167: Created path for NS (under nqn.2001-07.com.ceph:1753261226568#vmhba65#10.10.11.8:4420) 0x4306f6a6b858 = 0x431e6b807800 for ns = 0x1 for path vmhba65:C0:T0:L0
2025-07-23T09:05:01.234Z In(182) vmkernel: cpu30:2098764 opID=772545d0)NvmeDiscover: 3221: ANA enabled = 1 on ns=0x1
2025-07-23T09:05:01.234Z In(182) vmkernel: cpu30:2098764 opID=772545d0)NvmeDiscover: 2955: NGUID based 16-byte identifier: 0x61 0x4 0x65 0xc4 0x84 0x79 0x41 0x1 0x93 0x1f 0x35 0xab 0x92 0x20 0x99 0x87
2025-07-23T09:05:01.234Z In(182) vmkernel: cpu30:2098764 opID=772545d0)NvmeDiscover: 3045: Path 'vmhba65:C0:T0:L0' : Current UID is eui.610465c484794101931f35ab92209987
2025-07-23T09:05:01.234Z In(182) vmkernel: cpu30:2098764 opID=772545d0)ns = 0x1
2025-07-23T09:05:01.234Z In(182) vmkernel: cpu30:2098764 opID=772545d0)NvmeDiscover: 6828: Scan operation 3 completed with status Success on adapter vmhba65
2025-07-23T09:05:01.234Z In(182) vmkernel: cpu30:2098764 opID=772545d0)NVMEDEV:6419 Discover namespaces on controller 452 is complete
2025-07-23T09:05:01.234Z In(182) vmkernel: cpu30:2098764 opID=772545d0)NVMEDEV:8533 Controller 452 setting event[2,3]: aec=2048
2025-07-23T09:05:01.234Z In(182) vmkernel: cpu30:2098764 opID=772545d0)NVMEDEV:8617 Configuring aec: 0x900 on controller 452.
2025-07-23T09:05:01.234Z In(182) vmkernel: cpu30:2098764 opID=772545d0)NVMEDEV:8645 Async event configuration value is 0x900 on controller 452.
2025-07-23T09:05:01.234Z In(182) vmkernel: cpu30:2098764 opID=772545d0)NVMEDEV:9252 Succeeded to create recovery world for controller 452
2025-07-23T09:05:01.234Z In(182) vmkernel: cpu30:2098764 opID=772545d0)NVMFDEV:1245 Controller 452 (nqn.2001-07.com.ceph:1753261226568#vmhba65#10.10.11.8:4420) registered successfully.
2025-07-23T09:05:01.234Z In(182) vmkernel: cpu30:2098764 opID=772545d0)NVMFVSI:1342 Connected to controller successfully
2025-07-23T09:05:01.234Z In(182) vmkernel: cpu30:2098764 opID=772545d0)NVMFVSI:693 Masking out trType 1
2025-07-23T09:05:01.234Z In(182) vmkernel: cpu30:2098764 opID=772545d0)NVMFVSI:693 Masking out trType 2
2025-07-23T09:05:02.236Z In(182) vmkernel: cpu37:2097607)NvmeDiscover: 6200: NS discover waited 1002 msec for controller nqn.2001-07.com.ceph:1753261226568#vmhba65#10.10.11.8:4420 , now: 417980701 nsAddTimestamp: 417979699
2025-07-23T09:05:02.236Z In(182) vmkernel: cpu13:2097592)HPP: HppCreateDevice:2807: Created logical device 'eui.610465c484794101931f35ab92209987'.
2025-07-23T09:05:02.236Z In(182) vmkernel: cpu13:2097592)HPP: HppClaimPath:3582: ALUA/ANA target (vmhba65:C0:T0:L0)
2025-07-23T09:05:02.236Z In(182) vmkernel: cpu13:2097592)HPP: HppClaimPath:3647: Added path 'vmhba65:C0:T0:L0' to logical device 'eui.610465c484794101931f35ab92209987'. Total Path Count 1
2025-07-23T09:05:02.236Z In(182) vmkernel: cpu36:2098204)HPP: HppRegisterDevice:3109: Registering logical device with uid 'eui.610465c484794101931f35ab92209987'.
2025-07-23T09:05:02.236Z In(182) vmkernel: cpu36:2098204)StorageApdHandler: 965: APD Handle Created with lock[StorageApd-0x4306f6a5eaa0]
2025-07-23T09:05:02.236Z In(182) vmkernel: cpu36:2098204)StorageDevice: 1142: Alloc'd device 0x4306f6a7a980
2025-07-23T09:05:02.237Z In(182) vmkernel: cpu36:2098204)HPP: HppPathGroupMovePath:688: Path "vmhba65:C0:T0:L0" state changed from "dead" to "standby"
2025-07-23T09:05:02.237Z Wa(180) vmkwarning: cpu36:2098204)WARNING: HPP: HppRegisterDeviceEvents:3005: Could not register reservation events on device "Unregistered Device", Status: Failure. Event Registration will be retried on next path eval.
2025-07-23T09:05:02.237Z Wa(180) vmkwarning: cpu36:2098204)WARNING: HPP: HppRegisterDevice:3131: Not registering device eui.610465c484794101931f35ab92209987 - no active paths
2025-07-23T09:05:02.237Z In(182) vmkernel: cpu36:2098204)StorageApdHandler: 1051: Freeing APD handle 0x4306f6a5eaa0 []
2025-07-23T09:05:02.237Z In(182) vmkernel: cpu36:2098204)StorageApdHandler: 1135: APD Handle freed!
2025-07-23T09:05:21.139Z In(182) vmkernel: cpu36:2098204)HPP: HppRegisterDevice:3109: Registering logical device with uid 'eui.610465c484794101931f35ab92209987'.
2025-07-23T09:05:21.139Z In(182) vmkernel: cpu36:2098204)StorageApdHandler: 965: APD Handle Created with lock[StorageApd-0x4306f6a5eaa0]
2025-07-23T09:05:21.139Z In(182) vmkernel: cpu36:2098204)StorageDevice: 1142: Alloc'd device 0x4306f6a774c0
2025-07-23T09:05:21.140Z Wa(180) vmkwarning: cpu36:2098204)WARNING: HPP: HppRegisterDeviceEvents:3005: Could not register reservation events on device "Unregistered Device", Status: Failure. Event Registration will be retried on next path eval.
2025-07-23T09:05:21.140Z Wa(180) vmkwarning: cpu36:2098204)WARNING: HPP: HppRegisterDevice:3131: Not registering device eui.610465c484794101931f35ab92209987 - no active paths
2025-07-23T09:05:21.140Z In(182) vmkernel: cpu36:2098204)StorageApdHandler: 1051: Freeing APD handle 0x4306f6a5eaa0 []
2025-07-23T09:05:21.140Z In(182) vmkernel: cpu36:2098204)StorageApdHandler: 1135: APD Handle freed!
2025-07-23T09:05:21.140Z In(182) vmkernel: cpu36:2098204)HPP: HppRegisterDevice:3109: Registering logical device with uid 'eui.610465c484794101931f35ab92209987'.
2025-07-23T09:05:21.140Z In(182) vmkernel: cpu36:2098204)StorageApdHandler: 965: APD Handle Created with lock[StorageApd-0x4306f6a5eaa0]
2025-07-23T09:05:21.140Z In(182) vmkernel: cpu36:2098204)StorageDevice: 1142: Alloc'd device 0x4306f6a7a980
2025-07-23T09:05:21.140Z Wa(180) vmkwarning: cpu36:2098204)WARNING: HPP: HppRegisterDeviceEvents:3005: Could not register reservation events on device "Unregistered Device", Status: Failure. Event Registration will be retried on next path eval.
2025-07-23T09:05:21.140Z Wa(180) vmkwarning: cpu36:2098204)WARNING: HPP: HppRegisterDevice:3131: Not registering device eui.610465c484794101931f35ab92209987 - no active paths
2025-07-23T09:05:21.140Z In(182) vmkernel: cpu36:2098204)StorageApdHandler: 1051: Freeing APD handle 0x4306f6a5eaa0 []
2025-07-23T09:05:21.140Z In(182) vmkernel: cpu36:2098204)StorageApdHandler: 1135: APD Handle freed!
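For anyone triaging a log like the one above, the relevant lines are the HPP path state change and the repeated "no active paths" warnings. The following is a minimal sketch, not an official tool, that pulls those two signals out of vmkernel log lines. The function name `summarize` and the regular expressions are assumptions based only on the message formats visible in this log; other ESXi versions may format these messages differently.

```python
import re

# Matches HPP path state transitions such as:
#   Path "vmhba65:C0:T0:L0" state changed from "dead" to "standby"
# (pattern is an assumption derived from the log lines in this issue)
STATE_RE = re.compile(
    r'Path "(?P<path>[^"]+)" state changed from "(?P<old>[^"]+)" to "(?P<new>[^"]+)"'
)
# Matches the warning logged when a device cannot be registered
# because none of its paths is active.
NO_ACTIVE_RE = re.compile(r"Not registering device (?P<dev>\S+) - no active paths")


def summarize(lines):
    """Return (state_changes, no_active_counts) extracted from vmkernel log lines."""
    changes = []    # list of (path, old_state, new_state)
    no_active = {}  # device UID -> number of "no active paths" warnings
    for line in lines:
        m = STATE_RE.search(line)
        if m:
            changes.append((m["path"], m["old"], m["new"]))
        m = NO_ACTIVE_RE.search(line)
        if m:
            no_active[m["dev"]] = no_active.get(m["dev"], 0) + 1
    return changes, no_active


# Two lines copied from the log above, used as sample input.
sample = [
    '2025-07-23T09:05:02.237Z In(182) vmkernel: cpu36:2098204)HPP: '
    'HppPathGroupMovePath:688: Path "vmhba65:C0:T0:L0" state changed '
    'from "dead" to "standby"',
    '2025-07-23T09:05:02.237Z Wa(180) vmkwarning: cpu36:2098204)WARNING: HPP: '
    'HppRegisterDevice:3131: Not registering device '
    'eui.610465c484794101931f35ab92209987 - no active paths',
]
changes, no_active = summarize(sample)
print(changes)
print(no_active)
```

Run against the full log, this would show the single path moving from "dead" to "standby" and never reaching an active state, which matches the symptom described here.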
I checked the configuration multiple times and it looks fine to me.
I tried to make it work with the base image (1.2.5) and tried a couple more. The 1.3 and 1.4 images say the ANA group cannot be 0, and the 1.5 image says the server is shutting down when I try to create the namespace.
This issue seems to describe the same problem, in my opinion:
#683
But even if I try it with 1 NVMe node instead of 2, the same issue happens on Ceph 19.2.2.
Interestingly, I also tried this setup on Ceph 18.2.7, and there it does work with the base image until I reboot the Ceph NVMe node. After the reboot, the container does not appear on the node again. At least, that is what I thought: when I ran podman ps repeatedly, I saw that the container gets created and deleted 3-4 times, and then it stops. If I remove the daemon and reconfigure everything on the NVMe-oF side, it works again as long as I do not reboot the node.
Is there any way to make this work? Or am I doing something wrong? I checked both the open and closed GitHub issues, and I did not see anything that would fix it.