Description
What version of hwloc are you using?
2.13.0
Which operating system and hardware are you running on?
Ubuntu 24.04 / Nvidia DGX Spark OS 7.4.0
Linux 6.17.0-1008-nvidia #8-Ubuntu SMP PREEMPT_DYNAMIC Wed Jan 21 17:56:56 UTC 2026 aarch64 aarch64 aarch64 GNU/Linux
Dell Pro Max with GB10
Topology
Machine (P#0 total=125443040KB SoC0ID=jep106:0426:8901 SoC0Family=jep106:0426 SoC0Revision=0x00000000 DMIProductName="Dell Pro Max with GB10 FCM1253" DMIProductVersion= DMIBoardVendor="Dell Inc." DMIBoardName=038RY7 DMIBoardVersion=A02 DMIBoardAssetTag= DMIChassisVendor="Dell Inc." DMIChassisType=3 DMIChassisVersion= DMIChassisAssetTag=AFTPASS DMIBIOSVendor="Dell Inc." DMIBIOSVersion=5.36_1.1.1 DMIBIOSDate=09/23/2025 DMISysVendor="Dell Inc." Backend=Linux LinuxCgroup=/user.slice/user-273949.slice/session-1905.scope OSName=Linux OSRelease=6.17.0-1008-nvidia OSVersion="#8-Ubuntu SMP PREEMPT_DYNAMIC Wed Jan 21 17:56:56 UTC 2026" HostName=xxxx Architecture=aarch64 hwlocVersion=2.13.0 ProcessName=lstopo-no-graphics)
Package L#0 (P#36 total=125443040KB CPUImplementer=0x41 CPUArchitecture=8 CPUVariant=0x0 CPUPart=0xd87 CPURevision=1)
NUMANode L#0 (P#0 local=125443040KB total=125443040KB)
L3Cache L#0 (size=8192KB linesize=64 ways=16)
L2Cache L#0 (size=512KB linesize=64 ways=8)
L1dCache L#0 (size=64KB linesize=64 ways=4)
L1iCache L#0 (size=64KB linesize=64 ways=4)
Core L#0 (P#0)
PU L#0 (P#0)
L2Cache L#1 (size=512KB linesize=64 ways=8)
L1dCache L#1 (size=64KB linesize=64 ways=4)
L1iCache L#1 (size=64KB linesize=64 ways=4)
Core L#1 (P#1)
PU L#1 (P#1)
L2Cache L#2 (size=512KB linesize=64 ways=8)
L1dCache L#2 (size=64KB linesize=64 ways=4)
L1iCache L#2 (size=64KB linesize=64 ways=4)
Core L#2 (P#2)
PU L#2 (P#2)
L2Cache L#3 (size=512KB linesize=64 ways=8)
L1dCache L#3 (size=64KB linesize=64 ways=4)
L1iCache L#3 (size=64KB linesize=64 ways=4)
Core L#3 (P#3)
PU L#3 (P#3)
L2Cache L#4 (size=512KB linesize=64 ways=8)
L1dCache L#4 (size=64KB linesize=64 ways=4)
L1iCache L#4 (size=64KB linesize=64 ways=4)
Core L#4 (P#4)
PU L#4 (P#4)
L2Cache L#5 (size=2048KB linesize=64 ways=8)
L1dCache L#5 (size=64KB linesize=64 ways=4)
L1iCache L#5 (size=64KB linesize=64 ways=4)
Core L#5 (P#5)
PU L#5 (P#5)
L2Cache L#6 (size=2048KB linesize=64 ways=8)
L1dCache L#6 (size=64KB linesize=64 ways=4)
L1iCache L#6 (size=64KB linesize=64 ways=4)
Core L#6 (P#6)
PU L#6 (P#6)
L2Cache L#7 (size=2048KB linesize=64 ways=8)
L1dCache L#7 (size=64KB linesize=64 ways=4)
L1iCache L#7 (size=64KB linesize=64 ways=4)
Core L#7 (P#7)
PU L#7 (P#7)
L2Cache L#8 (size=2048KB linesize=64 ways=8)
L1dCache L#8 (size=64KB linesize=64 ways=4)
L1iCache L#8 (size=64KB linesize=64 ways=4)
Core L#8 (P#8)
PU L#8 (P#8)
L2Cache L#9 (size=2048KB linesize=64 ways=8)
L1dCache L#9 (size=64KB linesize=64 ways=4)
L1iCache L#9 (size=64KB linesize=64 ways=4)
Core L#9 (P#9)
PU L#9 (P#9)
L3Cache L#1 (size=16384KB linesize=64 ways=16)
L2Cache L#10 (size=512KB linesize=64 ways=8)
L1dCache L#10 (size=64KB linesize=64 ways=4)
L1iCache L#10 (size=64KB linesize=64 ways=4)
Core L#10 (P#10)
PU L#10 (P#10)
L2Cache L#11 (size=512KB linesize=64 ways=8)
L1dCache L#11 (size=64KB linesize=64 ways=4)
L1iCache L#11 (size=64KB linesize=64 ways=4)
Core L#11 (P#11)
PU L#11 (P#11)
L2Cache L#12 (size=512KB linesize=64 ways=8)
L1dCache L#12 (size=64KB linesize=64 ways=4)
L1iCache L#12 (size=64KB linesize=64 ways=4)
Core L#12 (P#12)
PU L#12 (P#12)
L2Cache L#13 (size=512KB linesize=64 ways=8)
L1dCache L#13 (size=64KB linesize=64 ways=4)
L1iCache L#13 (size=64KB linesize=64 ways=4)
Core L#13 (P#13)
PU L#13 (P#13)
L2Cache L#14 (size=512KB linesize=64 ways=8)
L1dCache L#14 (size=64KB linesize=64 ways=4)
L1iCache L#14 (size=64KB linesize=64 ways=4)
Core L#14 (P#14)
PU L#14 (P#14)
L2Cache L#15 (size=2048KB linesize=64 ways=8)
L1dCache L#15 (size=64KB linesize=64 ways=4)
L1iCache L#15 (size=64KB linesize=64 ways=4)
Core L#15 (P#15)
PU L#15 (P#15)
L2Cache L#16 (size=2048KB linesize=64 ways=8)
L1dCache L#16 (size=64KB linesize=64 ways=4)
L1iCache L#16 (size=64KB linesize=64 ways=4)
Core L#16 (P#16)
PU L#16 (P#16)
L2Cache L#17 (size=2048KB linesize=64 ways=8)
L1dCache L#17 (size=64KB linesize=64 ways=4)
L1iCache L#17 (size=64KB linesize=64 ways=4)
Core L#17 (P#17)
PU L#17 (P#17)
L2Cache L#18 (size=2048KB linesize=64 ways=8)
L1dCache L#18 (size=64KB linesize=64 ways=4)
L1iCache L#18 (size=64KB linesize=64 ways=4)
Core L#18 (P#18)
PU L#18 (P#18)
L2Cache L#19 (size=2048KB linesize=64 ways=8)
L1dCache L#19 (size=64KB linesize=64 ways=4)
L1iCache L#19 (size=64KB linesize=64 ways=4)
Core L#19 (P#19)
PU L#19 (P#19)
HostBridge L#0 (buses=0000:[00-0f])
PCIBridge L#1 (busid=0000:00:00.0 id=10de:22ce class=0604(PCIBridge) link=15.75GB/s buses=0000:[01-0f])
PCI L#0 (busid=0000:01:00.0 id=15b3:1021 class=0200(Ethernet) link=15.75GB/s)
PCI L#1 (busid=0000:01:00.1 id=15b3:1021 class=0200(Ethernet) link=15.75GB/s)
HostBridge L#2 (buses=0002:[00-0f])
PCIBridge L#3 (busid=0002:00:00.0 id=10de:22ce class=0604(PCIBridge) link=15.75GB/s buses=0002:[01-0f])
PCI L#2 (busid=0002:01:00.0 id=15b3:1021 class=0200(Ethernet) link=15.75GB/s)
PCI L#3 (busid=0002:01:00.1 id=15b3:1021 class=0200(Ethernet) link=15.75GB/s)
HostBridge L#4 (buses=0004:[00-0f])
PCIBridge L#5 (busid=0004:00:00.0 id=10de:22ce class=0604(PCIBridge) link=7.88GB/s buses=0004:[01-0f])
PCI L#4 (busid=0004:01:00.0 id=1987:5027 class=0108(NVMExp) link=7.88GB/s PCISlot=4)
Block(Disk) L#8 (Size=3907018584 SectorSize=512 LinuxDeviceID=259:0 Model=ESL04TBTLCZ-27J4-TYN Revision=ERFM12.0 SerialNumber=F47D7258122901595184) "nvme0n1"
HostBridge L#6 (buses=0007:[00-0f])
PCIBridge L#7 (busid=0007:00:00.0 id=10de:22d0 class=0604(PCIBridge) link=1.97GB/s buses=0007:[01-0f])
PCI L#5 (busid=0007:01:00.0 id=10ec:8127 class=0200(Ethernet) link=1.97GB/s PCISlot=7)
HostBridge L#8 (buses=0009:[00-0f])
PCIBridge L#9 (busid=0009:00:00.0 id=10de:22d0 class=0604(PCIBridge) link=0.62GB/s buses=0009:[01-0f])
PCI L#6 (busid=0009:01:00.0 id=14c3:7925 class=0280(Network) link=0.62GB/s PCISlot=9)
HostBridge L#10 (buses=000f:[00-01])
PCIBridge L#11 (busid=000f:00:00.0 id=10de:22d1 class=0604(PCIBridge) buses=000f:[01-01])
PCI L#7 (busid=000f:01:00.0 id=10de:2e12 class=0300(VGA) link=0.25GB/s)
Co-Processor(CUDA) L#11 (Backend=CUDA GPUVendor="NVIDIA Corporation" GPUModel="NVIDIA GB10" CUDAGlobalMemorySize=125443040 CUDAL2CacheSize=24576 CUDAMultiProcessors=48 CUDACoresPerMP=128 CUDASharedMemorySizePerMP=48) "cuda0"
GPU(NVML) L#12 (Backend=NVML GPUVendor="NVIDIA Corporation" GPUModel="NVIDIA GB10" NVIDIAUUID=GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) "nvml0"
depth 0: 1 Machine (type #0)
depth 1: 1 Package (type #1)
depth 2: 2 L3Cache (type #6)
depth 3: 20 L2Cache (type #5)
depth 4: 20 L1dCache (type #4)
depth 5: 20 L1iCache (type #9)
depth 6: 20 Core (type #2)
depth 7: 20 PU (type #3)
Special depth -3: 1 NUMANode (type #13)
Special depth -4: 12 Bridge (type #14)
Special depth -5: 8 PCIDev (type #15)
Special depth -6: 13 OSDev (type #16)
CPU kind #0 efficiency 0 cpuset 0x0000001f
FrequencyMaxMHz = 2808
FrequencyBaseMHz = 2808
LinuxCapacity = 718
CPU kind #1 efficiency 1 cpuset 0x00007c00
FrequencyMaxMHz = 2808
FrequencyBaseMHz = 2808
LinuxCapacity = 731
CPU kind #2 efficiency 2 cpuset 0x000003e0
FrequencyMaxMHz = 3900
FrequencyBaseMHz = 3900
LinuxCapacity = 997
CPU kind #3 efficiency 3 cpuset 0x00078000
FrequencyMaxMHz = 3900
FrequencyBaseMHz = 3900
LinuxCapacity = 1017
CPU kind #4 efficiency 4 cpuset 0x00080000
FrequencyMaxMHz = 3900
FrequencyBaseMHz = 3900
LinuxCapacity = 1024
Details of the problem
hwloc-ls reports five CPU kinds on the NVIDIA GB10 CPU instead of the expected two, presumably because of very minor variations in LinuxCapacity. NVIDIA GB10 has two CPU kinds: cores 0-4 and 10-14 are the "slow" kind, and cores 5-9 and 15-19 are the "fast" kind.
CPU kind #0 efficiency 0 cpuset 0x0000001f
FrequencyMaxMHz = 2808
LinuxCapacity = 718
CPU kind #1 efficiency 1 cpuset 0x00007c00
FrequencyMaxMHz = 2808
LinuxCapacity = 731
CPU kind #2 efficiency 2 cpuset 0x000003e0
FrequencyMaxMHz = 3900
LinuxCapacity = 997
CPU kind #3 efficiency 3 cpuset 0x00078000
FrequencyMaxMHz = 3900
LinuxCapacity = 1017
CPU kind #4 efficiency 4 cpuset 0x00080000
FrequencyMaxMHz = 3900
LinuxCapacity = 1024
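To double-check the core ranges against the output above, the cpuset masks can be decoded by hand. The Python sketch below is illustrative only (the helper names are made up, and this is not how hwloc builds its CPU kinds); the values are copied from the hwloc-ls output. Merging kinds by exact LinuxCapacity keeps all five groups, while merging them by FrequencyMaxMHz alone already yields the two expected kinds. Since each core has exactly one PU here, PU indexes equal core indexes.

```python
# CPU kinds copied from the hwloc-ls output above:
# (cpuset mask, FrequencyMaxMHz, LinuxCapacity)
KINDS = [
    (0x0000001F, 2808, 718),
    (0x00007C00, 2808, 731),
    (0x000003E0, 3900, 997),
    (0x00078000, 3900, 1017),
    (0x00080000, 3900, 1024),
]


def pus_in_mask(mask: int) -> list[int]:
    """Return the PU indexes whose bits are set in a cpuset mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def merge_kinds_by(field: int) -> dict[int, list[int]]:
    """Merge the cpusets of all kinds that share the same value in `field`
    (1 = FrequencyMaxMHz, 2 = LinuxCapacity) and decode them to PU lists."""
    merged: dict[int, int] = {}
    for kind in KINDS:
        merged[kind[field]] = merged.get(kind[field], 0) | kind[0]
    return {value: pus_in_mask(mask) for value, mask in sorted(merged.items())}


by_capacity = merge_kinds_by(2)   # one group per distinct capacity: 5 groups
by_frequency = merge_kinds_by(1)  # one group per distinct max frequency: 2 groups

if __name__ == "__main__":
    for freq, pus in by_frequency.items():
        print(f"{freq} MHz: PUs {pus}")
```

Running it prints PUs 0-4 and 10-14 for 2808 MHz and PUs 5-9 and 15-19 for 3900 MHz, matching the two kinds described above.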
Additional information
The two core kinds can be distinguished by their L2 cache size (512KB vs. 2048KB). They can also be distinguished by their ARM CPU part identifiers (0xd87 and 0xd85), but hwloc currently makes the incorrect assumption that the CPU part is the same for all cores of a package, for which I have opened #767.
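As an illustration of the L2-based distinction, the sketch below groups cores by the size of the L2 cache listed above them in the flattened hwloc-ls -v text. It is a hypothetical helper, not part of hwloc, and it assumes that each Core line follows the L2Cache line of the cache covering it, as in the output above (one private L2 per core). In the hwloc C API the same information is available on each L2Cache object as `attr->cache.size` (in bytes).

```python
import re
from collections import defaultdict

# Patterns for the flattened `hwloc-ls -v` lines shown above.
L2_RE = re.compile(r"L2Cache L#\d+ \(size=(\d+)KB")
CORE_RE = re.compile(r"Core L#\d+ \(P#(\d+)\)")


def cores_by_l2_size(lines):
    """Group core P# indexes by the size (in KB) of the L2 cache above them.

    Assumes each Core line appears after the L2Cache line of the cache that
    covers it, as in the output above (one private L2 per core on GB10).
    """
    groups = defaultdict(list)
    current_l2_kb = None
    for line in lines:
        if m := L2_RE.search(line):
            current_l2_kb = int(m.group(1))
        elif current_l2_kb is not None and (m := CORE_RE.search(line)):
            groups[current_l2_kb].append(int(m.group(1)))
    return dict(groups)


# Excerpt of the output above (cores 4 and 5, on either side of the boundary).
SAMPLE = """\
L2Cache L#4 (size=512KB linesize=64 ways=8)
L1dCache L#4 (size=64KB linesize=64 ways=4)
L1iCache L#4 (size=64KB linesize=64 ways=4)
Core L#4 (P#4)
PU L#4 (P#4)
L2Cache L#5 (size=2048KB linesize=64 ways=8)
L1dCache L#5 (size=64KB linesize=64 ways=4)
L1iCache L#5 (size=64KB linesize=64 ways=4)
Core L#5 (P#5)
PU L#5 (P#5)
"""

if __name__ == "__main__":
    print(cores_by_l2_size(SAMPLE.splitlines()))
```

Fed the complete topology listing above instead of the excerpt, it should group cores 0-4 and 10-14 under 512 and cores 5-9 and 15-19 under 2048.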
I observed the issue on a Dell-branded GB10, but I assume it affects DGX Spark systems from any vendor.
This seems very similar to #634, which affected NVIDIA Grace and was resolved by adding a quirk based on its SoC ID. That seems like a viable option here too (the SoC ID is jep106:0426:8901), but the implementation will differ from the Grace one because we still need to distinguish the two CPU kinds.
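A minimal sketch of what such a SoC-ID-keyed quirk could look like, written in Python for illustration rather than as hwloc code. The function and constant names are hypothetical. The per-PU CPU parts are assumed to come from somewhere like the per-processor `CPU part` lines of /proc/cpuinfo. The assignment of 0xd87 to the "slow" kind and 0xd85 to the "fast" kind is also an assumption, inferred from the package-level CPUPart=0xd87 in the output above, which presumably comes from PU 0.

```python
GB10_SOC_ID = "jep106:0426:8901"  # SoC0ID attribute of the Machine object above

# Assumption: CPU part 0xd87 is the "slow" core type and 0xd85 the "fast" one,
# inferred from the package-level CPUPart=0xd87 (presumably read from PU 0).
GB10_PART_TO_KIND = {0xD87: "slow", 0xD85: "fast"}


def gb10_cpukinds(soc_id, part_by_pu):
    """Hypothetical quirk: on GB10, build CPU kinds from each PU's CPU part
    instead of from LinuxCapacity.

    Returns None when the SoC is not GB10 or a part is unknown, so that the
    caller can fall back to its default grouping.
    """
    if soc_id != GB10_SOC_ID:
        return None
    kinds = {}
    for pu, part in sorted(part_by_pu.items()):
        kind = GB10_PART_TO_KIND.get(part)
        if kind is None:
            return None  # unexpected core type: do not guess
        kinds.setdefault(kind, []).append(pu)
    return kinds


if __name__ == "__main__":
    # Example input built from the core ranges given in this issue.
    slow = set(range(0, 5)) | set(range(10, 15))
    parts = {pu: 0xD87 if pu in slow else 0xD85 for pu in range(20)}
    print(gb10_cpukinds(GB10_SOC_ID, parts))
```

Returning None instead of guessing keeps the quirk narrow: any other SoC, or a GB10 reporting an unexpected part, would keep the existing behavior.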