Incorrect CPU kinds on Nvidia GB10 CPU #766

@mkuron

What version of hwloc are you using?

2.13.0

Which operating system and hardware are you running on?

Ubuntu 24.04 / Nvidia DGX Spark OS 7.4.0
Linux 6.17.0-1008-nvidia #8-Ubuntu SMP PREEMPT_DYNAMIC Wed Jan 21 17:56:56 UTC 2026 aarch64 aarch64 aarch64 GNU/Linux

Dell Pro Max with GB10

Topology
Machine (P#0 total=125443040KB SoC0ID=jep106:0426:8901 SoC0Family=jep106:0426 SoC0Revision=0x00000000 DMIProductName="Dell Pro Max with GB10 FCM1253" DMIProductVersion= DMIBoardVendor="Dell Inc." DMIBoardName=038RY7 DMIBoardVersion=A02 DMIBoardAssetTag= DMIChassisVendor="Dell Inc." DMIChassisType=3 DMIChassisVersion= DMIChassisAssetTag=AFTPASS DMIBIOSVendor="Dell Inc." DMIBIOSVersion=5.36_1.1.1 DMIBIOSDate=09/23/2025 DMISysVendor="Dell  Inc." Backend=Linux LinuxCgroup=/user.slice/user-273949.slice/session-1905.scope OSName=Linux OSRelease=6.17.0-1008-nvidia OSVersion="#8-Ubuntu SMP PREEMPT_DYNAMIC Wed Jan 21 17:56:56 UTC 2026" HostName=xxxx Architecture=aarch64 hwlocVersion=2.13.0 ProcessName=lstopo-no-graphics)
  Package L#0 (P#36 total=125443040KB CPUImplementer=0x41 CPUArchitecture=8 CPUVariant=0x0 CPUPart=0xd87 CPURevision=1)
    NUMANode L#0 (P#0 local=125443040KB total=125443040KB)
    L3Cache L#0 (size=8192KB linesize=64 ways=16)
      L2Cache L#0 (size=512KB linesize=64 ways=8)
        L1dCache L#0 (size=64KB linesize=64 ways=4)
          L1iCache L#0 (size=64KB linesize=64 ways=4)
            Core L#0 (P#0)
              PU L#0 (P#0)
      L2Cache L#1 (size=512KB linesize=64 ways=8)
        L1dCache L#1 (size=64KB linesize=64 ways=4)
          L1iCache L#1 (size=64KB linesize=64 ways=4)
            Core L#1 (P#1)
              PU L#1 (P#1)
      L2Cache L#2 (size=512KB linesize=64 ways=8)
        L1dCache L#2 (size=64KB linesize=64 ways=4)
          L1iCache L#2 (size=64KB linesize=64 ways=4)
            Core L#2 (P#2)
              PU L#2 (P#2)
      L2Cache L#3 (size=512KB linesize=64 ways=8)
        L1dCache L#3 (size=64KB linesize=64 ways=4)
          L1iCache L#3 (size=64KB linesize=64 ways=4)
            Core L#3 (P#3)
              PU L#3 (P#3)
      L2Cache L#4 (size=512KB linesize=64 ways=8)
        L1dCache L#4 (size=64KB linesize=64 ways=4)
          L1iCache L#4 (size=64KB linesize=64 ways=4)
            Core L#4 (P#4)
              PU L#4 (P#4)
      L2Cache L#5 (size=2048KB linesize=64 ways=8)
        L1dCache L#5 (size=64KB linesize=64 ways=4)
          L1iCache L#5 (size=64KB linesize=64 ways=4)
            Core L#5 (P#5)
              PU L#5 (P#5)
      L2Cache L#6 (size=2048KB linesize=64 ways=8)
        L1dCache L#6 (size=64KB linesize=64 ways=4)
          L1iCache L#6 (size=64KB linesize=64 ways=4)
            Core L#6 (P#6)
              PU L#6 (P#6)
      L2Cache L#7 (size=2048KB linesize=64 ways=8)
        L1dCache L#7 (size=64KB linesize=64 ways=4)
          L1iCache L#7 (size=64KB linesize=64 ways=4)
            Core L#7 (P#7)
              PU L#7 (P#7)
      L2Cache L#8 (size=2048KB linesize=64 ways=8)
        L1dCache L#8 (size=64KB linesize=64 ways=4)
          L1iCache L#8 (size=64KB linesize=64 ways=4)
            Core L#8 (P#8)
              PU L#8 (P#8)
      L2Cache L#9 (size=2048KB linesize=64 ways=8)
        L1dCache L#9 (size=64KB linesize=64 ways=4)
          L1iCache L#9 (size=64KB linesize=64 ways=4)
            Core L#9 (P#9)
              PU L#9 (P#9)
    L3Cache L#1 (size=16384KB linesize=64 ways=16)
      L2Cache L#10 (size=512KB linesize=64 ways=8)
        L1dCache L#10 (size=64KB linesize=64 ways=4)
          L1iCache L#10 (size=64KB linesize=64 ways=4)
            Core L#10 (P#10)
              PU L#10 (P#10)
      L2Cache L#11 (size=512KB linesize=64 ways=8)
        L1dCache L#11 (size=64KB linesize=64 ways=4)
          L1iCache L#11 (size=64KB linesize=64 ways=4)
            Core L#11 (P#11)
              PU L#11 (P#11)
      L2Cache L#12 (size=512KB linesize=64 ways=8)
        L1dCache L#12 (size=64KB linesize=64 ways=4)
          L1iCache L#12 (size=64KB linesize=64 ways=4)
            Core L#12 (P#12)
              PU L#12 (P#12)
      L2Cache L#13 (size=512KB linesize=64 ways=8)
        L1dCache L#13 (size=64KB linesize=64 ways=4)
          L1iCache L#13 (size=64KB linesize=64 ways=4)
            Core L#13 (P#13)
              PU L#13 (P#13)
      L2Cache L#14 (size=512KB linesize=64 ways=8)
        L1dCache L#14 (size=64KB linesize=64 ways=4)
          L1iCache L#14 (size=64KB linesize=64 ways=4)
            Core L#14 (P#14)
              PU L#14 (P#14)
      L2Cache L#15 (size=2048KB linesize=64 ways=8)
        L1dCache L#15 (size=64KB linesize=64 ways=4)
          L1iCache L#15 (size=64KB linesize=64 ways=4)
            Core L#15 (P#15)
              PU L#15 (P#15)
      L2Cache L#16 (size=2048KB linesize=64 ways=8)
        L1dCache L#16 (size=64KB linesize=64 ways=4)
          L1iCache L#16 (size=64KB linesize=64 ways=4)
            Core L#16 (P#16)
              PU L#16 (P#16)
      L2Cache L#17 (size=2048KB linesize=64 ways=8)
        L1dCache L#17 (size=64KB linesize=64 ways=4)
          L1iCache L#17 (size=64KB linesize=64 ways=4)
            Core L#17 (P#17)
              PU L#17 (P#17)
      L2Cache L#18 (size=2048KB linesize=64 ways=8)
        L1dCache L#18 (size=64KB linesize=64 ways=4)
          L1iCache L#18 (size=64KB linesize=64 ways=4)
            Core L#18 (P#18)
              PU L#18 (P#18)
      L2Cache L#19 (size=2048KB linesize=64 ways=8)
        L1dCache L#19 (size=64KB linesize=64 ways=4)
          L1iCache L#19 (size=64KB linesize=64 ways=4)
            Core L#19 (P#19)
              PU L#19 (P#19)
  HostBridge L#0 (buses=0000:[00-0f])
    PCIBridge L#1 (busid=0000:00:00.0 id=10de:22ce class=0604(PCIBridge) link=15.75GB/s buses=0000:[01-0f])
      PCI L#0 (busid=0000:01:00.0 id=15b3:1021 class=0200(Ethernet) link=15.75GB/s)
      PCI L#1 (busid=0000:01:00.1 id=15b3:1021 class=0200(Ethernet) link=15.75GB/s)
  HostBridge L#2 (buses=0002:[00-0f])
    PCIBridge L#3 (busid=0002:00:00.0 id=10de:22ce class=0604(PCIBridge) link=15.75GB/s buses=0002:[01-0f])
      PCI L#2 (busid=0002:01:00.0 id=15b3:1021 class=0200(Ethernet) link=15.75GB/s)
      PCI L#3 (busid=0002:01:00.1 id=15b3:1021 class=0200(Ethernet) link=15.75GB/s)
  HostBridge L#4 (buses=0004:[00-0f])
    PCIBridge L#5 (busid=0004:00:00.0 id=10de:22ce class=0604(PCIBridge) link=7.88GB/s buses=0004:[01-0f])
      PCI L#4 (busid=0004:01:00.0 id=1987:5027 class=0108(NVMExp) link=7.88GB/s PCISlot=4)
        Block(Disk) L#8 (Size=3907018584 SectorSize=512 LinuxDeviceID=259:0 Model=ESL04TBTLCZ-27J4-TYN Revision=ERFM12.0 SerialNumber=F47D7258122901595184) "nvme0n1"
  HostBridge L#6 (buses=0007:[00-0f])
    PCIBridge L#7 (busid=0007:00:00.0 id=10de:22d0 class=0604(PCIBridge) link=1.97GB/s buses=0007:[01-0f])
      PCI L#5 (busid=0007:01:00.0 id=10ec:8127 class=0200(Ethernet) link=1.97GB/s PCISlot=7)
  HostBridge L#8 (buses=0009:[00-0f])
    PCIBridge L#9 (busid=0009:00:00.0 id=10de:22d0 class=0604(PCIBridge) link=0.62GB/s buses=0009:[01-0f])
      PCI L#6 (busid=0009:01:00.0 id=14c3:7925 class=0280(Network) link=0.62GB/s PCISlot=9)
  HostBridge L#10 (buses=000f:[00-01])
    PCIBridge L#11 (busid=000f:00:00.0 id=10de:22d1 class=0604(PCIBridge) buses=000f:[01-01])
      PCI L#7 (busid=000f:01:00.0 id=10de:2e12 class=0300(VGA) link=0.25GB/s)
        Co-Processor(CUDA) L#11 (Backend=CUDA GPUVendor="NVIDIA Corporation" GPUModel="NVIDIA GB10" CUDAGlobalMemorySize=125443040 CUDAL2CacheSize=24576 CUDAMultiProcessors=48 CUDACoresPerMP=128 CUDASharedMemorySizePerMP=48) "cuda0"
        GPU(NVML) L#12 (Backend=NVML GPUVendor="NVIDIA Corporation" GPUModel="NVIDIA GB10" NVIDIAUUID=GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) "nvml0"
depth 0:           1 Machine (type #0)
 depth 1:          1 Package (type #1)
  depth 2:         2 L3Cache (type #6)
   depth 3:        20 L2Cache (type #5)
    depth 4:       20 L1dCache (type #4)
     depth 5:      20 L1iCache (type #9)
      depth 6:     20 Core (type #2)
       depth 7:    20 PU (type #3)
Special depth -3:  1 NUMANode (type #13)
Special depth -4:  12 Bridge (type #14)
Special depth -5:  8 PCIDev (type #15)
Special depth -6:  13 OSDev (type #16)
CPU kind #0 efficiency 0 cpuset 0x0000001f
  FrequencyMaxMHz = 2808
  FrequencyBaseMHz = 2808
  LinuxCapacity = 718
CPU kind #1 efficiency 1 cpuset 0x00007c00
  FrequencyMaxMHz = 2808
  FrequencyBaseMHz = 2808
  LinuxCapacity = 731
CPU kind #2 efficiency 2 cpuset 0x000003e0
  FrequencyMaxMHz = 3900
  FrequencyBaseMHz = 3900
  LinuxCapacity = 997
CPU kind #3 efficiency 3 cpuset 0x00078000
  FrequencyMaxMHz = 3900
  FrequencyBaseMHz = 3900
  LinuxCapacity = 1017
CPU kind #4 efficiency 4 cpuset 0x00080000
  FrequencyMaxMHz = 3900
  FrequencyBaseMHz = 3900
  LinuxCapacity = 1024

Details of the problem

hwloc-ls reports five CPU kinds on the NVIDIA GB10 CPU instead of the expected two, presumably due to very minor variations in LinuxCapacity. The NVIDIA GB10 has two CPU kinds: cores 0-4 and 10-14 are the "slow" kind, and cores 5-9 and 15-19 are the "fast" kind.

CPU kind #0 efficiency 0 cpuset 0x0000001f
  FrequencyMaxMHz = 2808
  LinuxCapacity = 718
CPU kind #1 efficiency 1 cpuset 0x00007c00
  FrequencyMaxMHz = 2808
  LinuxCapacity = 731
CPU kind #2 efficiency 2 cpuset 0x000003e0
  FrequencyMaxMHz = 3900
  LinuxCapacity = 997
CPU kind #3 efficiency 3 cpuset 0x00078000
  FrequencyMaxMHz = 3900
  LinuxCapacity = 1017
CPU kind #4 efficiency 4 cpuset 0x00080000
  FrequencyMaxMHz = 3900
  LinuxCapacity = 1024
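
To make the expected grouping concrete, here is a minimal Python sketch (not hwloc code) that decodes the cpuset masks from the output above and groups the cores by FrequencyMaxMHz. The names `REPORTED_KINDS`, `cpuset_to_cores` and `group_by_max_frequency` are made up for illustration; only the numbers come from the report.

```python
# CPU kinds as reported in this issue: (cpuset mask, FrequencyMaxMHz, LinuxCapacity).
REPORTED_KINDS = [
    (0x0000001F, 2808, 718),
    (0x00007C00, 2808, 731),
    (0x000003E0, 3900, 997),
    (0x00078000, 3900, 1017),
    (0x00080000, 3900, 1024),
]


def cpuset_to_cores(mask):
    """Return the sorted list of PU indexes whose bit is set in the mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def group_by_max_frequency(kinds):
    """Map each FrequencyMaxMHz value to the sorted cores that report it."""
    groups = {}
    for mask, freq_mhz, _capacity in kinds:
        groups.setdefault(freq_mhz, []).extend(cpuset_to_cores(mask))
    return {freq: sorted(cores) for freq, cores in groups.items()}


if __name__ == "__main__":
    for freq, cores in sorted(group_by_max_frequency(REPORTED_KINDS).items()):
        print(f"{freq} MHz: cores {cores}")
```

Grouping by maximum frequency alone yields exactly two sets, cores 0-4 plus 10-14 and cores 5-9 plus 15-19, which matches the "slow" and "fast" kinds described above.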

Additional information

The two core kinds can be distinguished based on their L2 cache size (512KB vs. 2048KB). They can also be distinguished based on their ARM CPU part identifiers (0xd87 and 0xd85), but hwloc currently makes the incorrect assumption that the part identifier is the same for all cores of a package, for which I have opened #767.
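
For anyone who wants to check the per-core part identifiers independently of hwloc, the following Python sketch groups cores by the `CPU part` field that Linux exposes in `/proc/cpuinfo` on arm64. The function name `cores_by_cpu_part` is hypothetical, and the sketch assumes the usual `processor : N` / `CPU part : 0x...` line layout.

```python
def cores_by_cpu_part(cpuinfo_text):
    """Group processor numbers by their 'CPU part' value.

    Expects text in the arm64 /proc/cpuinfo layout, where each block starts
    with a 'processor : N' line and contains a 'CPU part : 0x...' line.
    """
    parts = {}
    current = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line or line without a key/value pair
        key, value = key.strip(), value.strip()
        if key == "processor":
            current = int(value)
        elif key == "CPU part" and current is not None:
            parts.setdefault(value.lower(), []).append(current)
    return parts


if __name__ == "__main__":
    # Only meaningful on a Linux arm64 system.
    with open("/proc/cpuinfo") as f:
        for part, cores in sorted(cores_by_cpu_part(f.read()).items()):
            print(f"CPU part {part}: cores {cores}")
```

On a heterogeneous SoC this should print one line per distinct part identifier, which makes it easy to compare against the single CPUPart value that hwloc attaches to the Package object.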

I observed the issue on a Dell-branded GB10, but I assume it affects GB10-based (DGX Spark) systems from any vendor.

This seems very similar to #634, which affected the Nvidia Grace. That one was resolved by adding a quirk based on its SoC ID. That seems like a viable option here too (the SoC ID is jep106:0426:8901), but the implementation will have to differ from the one for Grace, as we still need to distinguish the two CPU kinds.
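
One possible shape for such a quirk, sketched in Python rather than as actual hwloc code: when the SoC ID matches, merge kinds whose LinuxCapacity values lie within a small relative tolerance of each other. The 5% tolerance and the names `GB10_SOC_ID`, `merge_close_capacities` and `apply_quirk` are assumptions made for illustration only; this is not how #634 was implemented.

```python
GB10_SOC_ID = "jep106:0426:8901"  # SoC ID taken from the lstopo output in this issue


def merge_close_capacities(kinds, tolerance=0.05):
    """Merge kinds whose capacities are within a relative tolerance.

    kinds is a list of (cpuset mask, LinuxCapacity). Kinds are visited in
    order of increasing capacity, and a kind joins the previous group when
    its capacity is at most (1 + tolerance) times that group's capacity.
    The tolerance value is an arbitrary assumption.
    """
    merged = []
    for mask, capacity in sorted(kinds, key=lambda kind: kind[1]):
        if merged and capacity <= merged[-1][1] * (1 + tolerance):
            previous_mask, _ = merged[-1]
            # Union the cpusets and keep the larger capacity for the group.
            merged[-1] = (previous_mask | mask, capacity)
        else:
            merged.append((mask, capacity))
    return merged


def apply_quirk(soc_id, kinds):
    """Apply the merge only on the SoC that is known to need it."""
    if soc_id == GB10_SOC_ID:
        return merge_close_capacities(kinds)
    return kinds


if __name__ == "__main__":
    reported = [
        (0x0000001F, 718),
        (0x00007C00, 731),
        (0x000003E0, 997),
        (0x00078000, 1017),
        (0x00080000, 1024),
    ]
    for mask, capacity in apply_quirk(GB10_SOC_ID, reported):
        print(f"cpuset 0x{mask:08x} capacity {capacity}")
```

With the capacities from this report, the gap between 731 and 997 is far larger than the variation inside each group, so any reasonably small tolerance leaves exactly two kinds while still keeping the "slow" and "fast" cores apart.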
