Description
Git commit
https://github.com/ggml-org/llama.cpp/tree/a52dc60ba3ae0ef1e941ce9a4585672cc335a175
Operating systems
Linux
GGML backends
CUDA
Problem description & steps to reproduce
Building at the commit above with CUDA 12.8.1 fails with the ptxas errors shown in the log output below. Building with CUDA 12.4.1 is fine, and building release b7376 with CUDA 12.8.1 is also fine.
First Bad Commit
No response
Compile command
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:23:50_PST_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0
update from llama.cpp main repo
Building for CUDA
Running CMake with arguments: -DBUILD_SHARED_LIBS=OFF -DCMAKE_POSITION_INDEPENDENT_CODE=ON -DCMAKE_INSTALL_LIBDIR=lib -DLLAMA_CURL=OFF -DLLAMA_LLGUIDANCE=ON -DCMAKE_AR=/home/runner/miniconda3/envs/llamacpp/bin/x86_64-conda-linux-gnu-ar -DCMAKE_CXX_COMPILER_AR=/home/runner/miniconda3/envs/llamacpp/bin/x86_64-conda-linux-gnu-gcc-ar -DCMAKE_C_COMPILER_AR=/home/runner/miniconda3/envs/llamacpp/bin/x86_64-conda-linux-gnu-gcc-ar -DCMAKE_RANLIB=/home/runner/miniconda3/envs/llamacpp/bin/x86_64-conda-linux-gnu-ranlib -DCMAKE_CXX_COMPILER_RANLIB=/home/runner/miniconda3/envs/llamacpp/bin/x86_64-conda-linux-gnu-gcc-ranlib -DCMAKE_C_COMPILER_RANLIB=/home/runner/miniconda3/envs/llamacpp/bin/x86_64-conda-linux-gnu-gcc-ranlib -DCMAKE_LINKER=/home/runner/miniconda3/envs/llamacpp/bin/x86_64-conda-linux-gnu-ld -DCMAKE_STRIP=/home/runner/miniconda3/envs/llamacpp/bin/x86_64-conda-linux-gnu-strip -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=OFF -DGGML_CUDA=ON -DGGML_CUDA_FORCE_MMQ=ON -DCMAKE_CUDA_ARCHITECTURES=all
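The configure step above passes -DCMAKE_CUDA_ARCHITECTURES=all, which with CUDA 12.8 expands to include sm_120, the target ptxas rejects below. As a hedged workaround sketch only (an untested assumption, not a confirmed fix for this issue), one could configure with an explicit architecture list that leaves out sm_120:

```shell
# Sketch: build only for explicitly listed architectures instead of "all".
# The list below is illustrative; pick the compute capabilities of the GPUs
# you actually target.
cmake -B build \
    -DGGML_CUDA=ON \
    -DGGML_CUDA_FORCE_MMQ=ON \
    -DCMAKE_CUDA_ARCHITECTURES="75;80;86;89;90"
cmake --build build --config Release
```

This narrows the build rather than addressing the root cause (the mxfp4 MMQ instance emitting block-scale MMA PTX for sm_120), so it is only a stopgap for reproducing or avoiding the failure.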
cd /home/runner/work/xllamacpp/xllamacpp/thirdparty/llama.cpp/build/ggml/src/ggml-cuda && /home/runner/miniconda3/envs/llamacpp/bin/nvcc -forward-unknown-to-host-compiler -DGGML_CUDA_FORCE_MMQ -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 --options-file CMakeFiles/ggml-cuda.dir/includes_CUDA.rsp -O3 -DNDEBUG -std=c++17 -arch=all -Xcompiler=-fPIC -use_fast_math -extended-lambda -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmq-instance-mxfp4.cu.o -MF CMakeFiles/ggml-cuda.dir/template-instances/mmq-instance-mxfp4.cu.o.d -x cu -c /home/runner/work/xllamacpp/xllamacpp/thirdparty/llama.cpp/ggml/src/ggml-cuda/template-instances/mmq-instance-mxfp4.cu -o CMakeFiles/ggml-cuda.dir/template-instances/mmq-instance-mxfp4.cu.o
Relevant log output
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
ptxas /tmp/tmpxft_00004b0d_00000000-6_mmq-instance-mxfp4.compute_120.ptx, line 1025; error : Instruction 'mma with block scale' not supported on .target 'sm_120'
ptxas /tmp/tmpxft_00004b0d_00000000-6_mmq-instance-mxfp4.compute_120.ptx, line 1025; error : Feature '.kind::mxf4' not supported on .target 'sm_120'
ptxas /tmp/tmpxft_00004b0d_00000000-6_mmq-instance-mxfp4.compute_120.ptx, line 1025; error : Feature '.block_scale' not supported on .target 'sm_120'
ptxas /tmp/tmpxft_00004b0d_00000000-6_mmq-instance-mxfp4.compute_120.ptx, line 1025; error : Feature '.scale_vec::2X' not supported on .target 'sm_120'
ptxas /tmp/tmpxft_00004b0d_00000000-6_mmq-instance-mxfp4.compute_120.ptx, line 1039; error : Instruction 'mma with block scale' not supported on .target 'sm_120'
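For context on the error class: ptxas is rejecting the block-scale mma instruction (.kind::mxf4, .block_scale, .scale_vec::2X) when compiling the compute_120 PTX for sm_120, i.e. the instruction is being emitted for an architecture that does not support it. The usual pattern for avoiding this is a compile-time architecture guard around the specialized path; the sketch below is purely illustrative (the macro values and structure are assumptions, not llama.cpp's actual code):

```cuda
// Illustrative guard pattern only. The idea: compile the block-scale MMA
// path solely for architectures where ptxas accepts it, and compile a
// generic fallback everywhere else, so sm_120 never sees the instruction.
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ == 1000)  // assumed supported arch
    // emit the mma ... block_scale ... kind::mxf4 path here
#else
    // generic dequantize + standard MMA fallback for sm_120 and others
#endif
```

Whether the right fix is a guard like this or adjusting which template instances are built per architecture is for the maintainers to determine.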