Description
Git commit
https://github.com/ggml-org/llama.cpp/tree/a52dc60ba3ae0ef1e941ce9a4585672cc335a175
Operating systems
Linux
GGML backends
CUDA
Problem description & steps to reproduce
Building at the commit above with CUDA 12.8.1 fails with the ptxas errors shown in the log output below. Building with CUDA 12.4.1 is fine, and building release b7376 with CUDA 12.8.1 is also fine.
First Bad Commit
No response
Compile command
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:23:50_PST_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0
update from llama.cpp main repo
Building for CUDA
Running CMake with arguments: -DBUILD_SHARED_LIBS=OFF -DCMAKE_POSITION_INDEPENDENT_CODE=ON -DCMAKE_INSTALL_LIBDIR=lib -DLLAMA_CURL=OFF -DLLAMA_LLGUIDANCE=ON -DCMAKE_AR=/home/runner/miniconda3/envs/llamacpp/bin/x86_64-conda-linux-gnu-ar -DCMAKE_CXX_COMPILER_AR=/home/runner/miniconda3/envs/llamacpp/bin/x86_64-conda-linux-gnu-gcc-ar -DCMAKE_C_COMPILER_AR=/home/runner/miniconda3/envs/llamacpp/bin/x86_64-conda-linux-gnu-gcc-ar -DCMAKE_RANLIB=/home/runner/miniconda3/envs/llamacpp/bin/x86_64-conda-linux-gnu-ranlib -DCMAKE_CXX_COMPILER_RANLIB=/home/runner/miniconda3/envs/llamacpp/bin/x86_64-conda-linux-gnu-gcc-ranlib -DCMAKE_C_COMPILER_RANLIB=/home/runner/miniconda3/envs/llamacpp/bin/x86_64-conda-linux-gnu-gcc-ranlib -DCMAKE_LINKER=/home/runner/miniconda3/envs/llamacpp/bin/x86_64-conda-linux-gnu-ld -DCMAKE_STRIP=/home/runner/miniconda3/envs/llamacpp/bin/x86_64-conda-linux-gnu-strip -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=OFF -DGGML_CUDA=ON -DGGML_CUDA_FORCE_MMQ=ON -DCMAKE_CUDA_ARCHITECTURES=all
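The configure step above passes -DCMAKE_CUDA_ARCHITECTURES=all, which with CUDA 12.8 expands to include sm_120, the target ptxas rejects below. As a hedged workaround sketch only (an untested assumption, not a confirmed fix for this issue), one could configure with an explicit architecture list that leaves out sm_120:

```shell
# Sketch: build only for explicitly listed architectures instead of "all".
# The list below is illustrative; pick the compute capabilities of the GPUs
# you actually target.
cmake -B build \
    -DGGML_CUDA=ON \
    -DGGML_CUDA_FORCE_MMQ=ON \
    -DCMAKE_CUDA_ARCHITECTURES="75;80;86;89;90"
cmake --build build --config Release
```

This narrows the build rather than addressing the root cause (the mxfp4 MMQ instance emitting block-scale MMA PTX for sm_120), so it is only a stopgap for reproducing or avoiding the failure.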
cd /home/runner/work/xllamacpp/xllamacpp/thirdparty/llama.cpp/build/ggml/src/ggml-cuda && /home/runner/miniconda3/envs/llamacpp/bin/nvcc -forward-unknown-to-host-compiler -DGGML_CUDA_FORCE_MMQ -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 --options-file CMakeFiles/ggml-cuda.dir/includes_CUDA.rsp -O3 -DNDEBUG -std=c++17 -arch=all -Xcompiler=-fPIC -use_fast_math -extended-lambda -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmq-instance-mxfp4.cu.o -MF CMakeFiles/ggml-cuda.dir/template-instances/mmq-instance-mxfp4.cu.o.d -x cu -c /home/runner/work/xllamacpp/xllamacpp/thirdparty/llama.cpp/ggml/src/ggml-cuda/template-instances/mmq-instance-mxfp4.cu -o CMakeFiles/ggml-cuda.dir/template-instances/mmq-instance-mxfp4.cu.o
Relevant log output
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
ptxas /tmp/tmpxft_00004b0d_00000000-6_mmq-instance-mxfp4.compute_120.ptx, line 1025; error : Instruction 'mma with block scale' not supported on .target 'sm_120'
ptxas /tmp/tmpxft_00004b0d_00000000-6_mmq-instance-mxfp4.compute_120.ptx, line 1025; error : Feature '.kind::mxf4' not supported on .target 'sm_120'
ptxas /tmp/tmpxft_00004b0d_00000000-6_mmq-instance-mxfp4.compute_120.ptx, line 1025; error : Feature '.block_scale' not supported on .target 'sm_120'
ptxas /tmp/tmpxft_00004b0d_00000000-6_mmq-instance-mxfp4.compute_120.ptx, line 1025; error : Feature '.scale_vec::2X' not supported on .target 'sm_120'
ptxas /tmp/tmpxft_00004b0d_00000000-6_mmq-instance-mxfp4.compute_120.ptx, line 1039; error : Instruction 'mma with block scale' not supported on .target 'sm_120'
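For context on the error class: ptxas is rejecting the block-scale mma instruction (.kind::mxf4, .block_scale, .scale_vec::2X) when compiling the compute_120 PTX for sm_120, i.e. the instruction is being emitted for an architecture that does not support it. The usual pattern for avoiding this is a compile-time architecture guard around the specialized path; the sketch below is purely illustrative (the macro values and structure are assumptions, not llama.cpp's actual code):

```cuda
// Illustrative guard pattern only. The idea: compile the block-scale MMA
// path solely for architectures where ptxas accepts it, and compile a
// generic fallback everywhere else, so sm_120 never sees the instruction.
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ == 1000)  // assumed supported arch
    // emit the mma ... block_scale ... kind::mxf4 path here
#else
    // generic dequantize + standard MMA fallback for sm_120 and others
#endif
```

Whether the right fix is a guard like this or adjusting which template instances are built per architecture is for the maintainers to determine.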