Compile bug: b7551 build failed on CUDA 12.8.1 #18447

@codingl2k1

Git commit

https://github.com/ggml-org/llama.cpp/tree/a52dc60ba3ae0ef1e941ce9a4585672cc335a175

Operating systems

Linux

GGML backends

CUDA

Problem description & steps to reproduce

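Building b7551 with CUDA 12.8.1 fails with the ptxas errors shown in the log output below. Building with CUDA 12.4.1 succeeds, and building b7376 with CUDA 12.8.1 also succeeds.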

First Bad Commit

No response

Compile command

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:23:50_PST_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0
update from llama.cpp main repo
Building for CUDA
Running CMake with arguments: -DBUILD_SHARED_LIBS=OFF -DCMAKE_POSITION_INDEPENDENT_CODE=ON -DCMAKE_INSTALL_LIBDIR=lib -DLLAMA_CURL=OFF -DLLAMA_LLGUIDANCE=ON -DCMAKE_AR=/home/runner/miniconda3/envs/llamacpp/bin/x86_64-conda-linux-gnu-ar -DCMAKE_CXX_COMPILER_AR=/home/runner/miniconda3/envs/llamacpp/bin/x86_64-conda-linux-gnu-gcc-ar -DCMAKE_C_COMPILER_AR=/home/runner/miniconda3/envs/llamacpp/bin/x86_64-conda-linux-gnu-gcc-ar -DCMAKE_RANLIB=/home/runner/miniconda3/envs/llamacpp/bin/x86_64-conda-linux-gnu-ranlib -DCMAKE_CXX_COMPILER_RANLIB=/home/runner/miniconda3/envs/llamacpp/bin/x86_64-conda-linux-gnu-gcc-ranlib -DCMAKE_C_COMPILER_RANLIB=/home/runner/miniconda3/envs/llamacpp/bin/x86_64-conda-linux-gnu-gcc-ranlib -DCMAKE_LINKER=/home/runner/miniconda3/envs/llamacpp/bin/x86_64-conda-linux-gnu-ld -DCMAKE_STRIP=/home/runner/miniconda3/envs/llamacpp/bin/x86_64-conda-linux-gnu-strip -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=OFF -DGGML_CUDA=ON -DGGML_CUDA_FORCE_MMQ=ON -DCMAKE_CUDA_ARCHITECTURES=all


cd /home/runner/work/xllamacpp/xllamacpp/thirdparty/llama.cpp/build/ggml/src/ggml-cuda && /home/runner/miniconda3/envs/llamacpp/bin/nvcc -forward-unknown-to-host-compiler -DGGML_CUDA_FORCE_MMQ -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 --options-file CMakeFiles/ggml-cuda.dir/includes_CUDA.rsp -O3 -DNDEBUG -std=c++17 -arch=all -Xcompiler=-fPIC -use_fast_math -extended-lambda -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmq-instance-mxfp4.cu.o -MF CMakeFiles/ggml-cuda.dir/template-instances/mmq-instance-mxfp4.cu.o.d -x cu -c /home/runner/work/xllamacpp/xllamacpp/thirdparty/llama.cpp/ggml/src/ggml-cuda/template-instances/mmq-instance-mxfp4.cu -o CMakeFiles/ggml-cuda.dir/template-instances/mmq-instance-mxfp4.cu.o

Relevant log output

nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
ptxas /tmp/tmpxft_00004b0d_00000000-6_mmq-instance-mxfp4.compute_120.ptx, line 1025; error   : Instruction 'mma with block scale' not supported on .target 'sm_120'
ptxas /tmp/tmpxft_00004b0d_00000000-6_mmq-instance-mxfp4.compute_120.ptx, line 1025; error   : Feature '.kind::mxf4' not supported on .target 'sm_120'
ptxas /tmp/tmpxft_00004b0d_00000000-6_mmq-instance-mxfp4.compute_120.ptx, line 1025; error   : Feature '.block_scale' not supported on .target 'sm_120'
ptxas /tmp/tmpxft_00004b0d_00000000-6_mmq-instance-mxfp4.compute_120.ptx, line 1025; error   : Feature '.scale_vec::2X' not supported on .target 'sm_120'
ptxas /tmp/tmpxft_00004b0d_00000000-6_mmq-instance-mxfp4.compute_120.ptx, line 1039; error   : Instruction 'mma with block scale' not supported on .target 'sm_120'
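
Every error above names `.target 'sm_120'`, and the build was configured with `-DCMAKE_CUDA_ARCHITECTURES=all`. One way to narrow this down would be to configure two separate builds, one targeting only architecture 120 and one that leaves it out. The following is a rough sketch that only prints the two configure commands; the build directory names and the second architecture list (`75;80;86;89;90`) are assumptions chosen for illustration, and whether the second build actually succeeds has not been verified here.

```shell
#!/usr/bin/env bash
# Sketch: print two CMake configure commands for isolating the failure by GPU target.
# The architecture lists and build directory names are illustrative assumptions.
set -euo pipefail

# Subset of the flags from the original report that are relevant to the CUDA build.
common_flags="-DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=OFF -DGGML_CUDA=ON -DGGML_CUDA_FORCE_MMQ=ON"

configure_cmd() {
    # $1 = build directory, $2 = value for CMAKE_CUDA_ARCHITECTURES
    printf 'cmake -S . -B %s %s -DCMAKE_CUDA_ARCHITECTURES="%s"\n' "$1" "$common_flags" "$2"
}

# Only the target named in the ptxas errors; expected to reproduce the failure.
configure_cmd build-sm120 "120"

# Same flags without architecture 120 (hypothetical list).
configure_cmd build-no-sm120 "75;80;86;89;90"
```

Running the printed commands from the llama.cpp source root, followed by `cmake --build <dir>`, should show whether the failure is limited to the `sm_120` target.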
