webui crashes after sending prompt #6508

Open
ndrew222 opened this issue Nov 2, 2024 · 0 comments
Labels
bug Something isn't working

ndrew222 commented Nov 2, 2024

Describe the bug

The web UI crashes as soon as any prompt is sent: the model loads without errors, but the first generation aborts with a ROCm "invalid device function" error and takes the whole process down (full log below).

Is there an existing issue for this?

  • I have searched the existing issues

Reproduction

  1. ./start_linux.sh
  2. load a model
  3. send any prompt (a minimal standalone repro is sketched below)
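
Outside the webui, the same call path (model load plus one generation) can be exercised with a few lines of llama-cpp-python, the backend the webui uses for GGUF models. This is a sketch, not the webui's exact code; paths and parameters are illustrative and mirror the load log below (n_ctx=4096, all 33 layers offloaded):

  from llama_cpp import Llama

  # Illustrative path/params; mirrors the webui's llama.cpp loader settings
  llm = Llama(
      model_path="models/Phi-3-mini-4k-instruct-fp16.gguf",
      n_ctx=4096,
      n_gpu_layers=-1,  # offload every layer, as in the log below
  )

  # The crash happens on the first forward pass, i.e. right here
  print(llm("Hello", max_tokens=16)["choices"][0]["text"])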

Screenshot

No response

Logs

❯ ./start_linux.sh
14:01:15-573868 INFO     Starting Text generation web UI                                             

Running on local URL:  http://127.0.0.1:7860

14:01:45-565977 INFO     Loading "Phi-3-mini-4k-instruct-fp16.gguf"                                  
14:01:45-598096 INFO     llama.cpp weights detected: "models/Phi-3-mini-4k-instruct-fp16.gguf"       
llama_model_loader: loaded meta data with 24 key-value pairs and 291 tensors from models/Phi-3-mini-4k-instruct-fp16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = LLaMA v2
llama_model_loader: - kv   2:                           llama.vocab_size u32              = 32064
llama_model_loader: - kv   3:                       llama.context_length u32              = 4096
llama_model_loader: - kv   4:                     llama.embedding_length u32              = 3072
llama_model_loader: - kv   5:                          llama.block_count u32              = 32
llama_model_loader: - kv   6:                  llama.feed_forward_length u32              = 8192
llama_model_loader: - kv   7:                 llama.rope.dimension_count u32              = 96
llama_model_loader: - kv   8:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   9:              llama.attention.head_count_kv u32              = 32
llama_model_loader: - kv  10:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  11:                       llama.rope.freq_base f32              = 10000.000000
llama_model_loader: - kv  12:                          general.file_type u32              = 1
llama_model_loader: - kv  13:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  14:                      tokenizer.ggml.tokens arr[str,32064]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  15:                      tokenizer.ggml.scores arr[f32,32064]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  16:                  tokenizer.ggml.token_type arr[i32,32064]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  17:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  18:                tokenizer.ggml.eos_token_id u32              = 32000
llama_model_loader: - kv  19:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  20:            tokenizer.ggml.padding_token_id u32              = 32000
llama_model_loader: - kv  21:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  22:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  23:                    tokenizer.chat_template str              = {{ bos_token }}{% for message in mess...
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type  f16:  226 tensors
llm_load_vocab: control-looking token: '<|end|>' was not control-type; this is probably a bug in the model. its type will be overridden
llm_load_vocab: control-looking token: '<|endoftext|>' was not control-type; this is probably a bug in the model. its type will be overridden
llm_load_vocab: special tokens cache size = 67
llm_load_vocab: token to piece cache size = 0.1691 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32064
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 4096
llm_load_print_meta: n_embd           = 3072
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 32
llm_load_print_meta: n_rot            = 96
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_embd_head_k    = 96
llm_load_print_meta: n_embd_head_v    = 96
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: n_embd_k_gqa     = 3072
llm_load_print_meta: n_embd_v_gqa     = 3072
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 8192
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 4096
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: ssm_dt_b_c_rms   = 0
llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = F16
llm_load_print_meta: model params     = 3.82 B
llm_load_print_meta: model size       = 7.12 GiB (16.00 BPW) 
llm_load_print_meta: general.name     = LLaMA v2
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 32000 '<|endoftext|>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: PAD token        = 32000 '<|endoftext|>'
llm_load_print_meta: LF token         = 13 '<0x0A>'
llm_load_print_meta: EOT token        = 32007 '<|end|>'
llm_load_print_meta: EOG token        = 32000 '<|endoftext|>'
llm_load_print_meta: EOG token        = 32007 '<|end|>'
llm_load_print_meta: max token length = 48
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 6900 XT, compute capability 10.3, VMM: no
llm_load_tensors: ggml ctx size =    0.27 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors:      ROCm0 buffer size =  7100.64 MiB
llm_load_tensors:        CPU buffer size =   187.88 MiB
.................................................................................................
llama_new_context_with_model: n_ctx      = 4096
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:      ROCm0 KV buffer size =  1536.00 MiB
llama_new_context_with_model: KV self size  = 1536.00 MiB, K (f16):  768.00 MiB, V (f16):  768.00 MiB
llama_new_context_with_model:  ROCm_Host  output buffer size =     0.12 MiB
llama_new_context_with_model:      ROCm0 compute buffer size =   288.00 MiB
llama_new_context_with_model:  ROCm_Host compute buffer size =    14.01 MiB
llama_new_context_with_model: graph nodes  = 1030
llama_new_context_with_model: graph splits = 2
AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | 
Model metadata: {'tokenizer.chat_template': "{{ bos_token }}{% for message in messages %}{{'<|' + message['role'] + '|>' + '\n' + message['content'] + '<|end|>\n' }}{% endfor %}{% if add_generation_prompt %}{{ '<|assistant|>\n' }}{% else %}{{ eos_token }}{% endif %}", 'tokenizer.ggml.add_eos_token': 'false', 'tokenizer.ggml.padding_token_id': '32000', 'tokenizer.ggml.unknown_token_id': '0', 'tokenizer.ggml.eos_token_id': '32000', 'tokenizer.ggml.model': 'llama', 'general.architecture': 'llama', 'llama.rope.freq_base': '10000.000000', 'llama.context_length': '4096', 'general.name': 'LLaMA v2', 'llama.vocab_size': '32064', 'general.file_type': '1', 'tokenizer.ggml.add_bos_token': 'true', 'llama.embedding_length': '3072', 'llama.feed_forward_length': '8192', 'llama.attention.layer_norm_rms_epsilon': '0.000010', 'llama.rope.dimension_count': '96', 'tokenizer.ggml.bos_token_id': '1', 'llama.attention.head_count': '32', 'llama.block_count': '32', 'llama.attention.head_count_kv': '32'}
Available chat formats from metadata: chat_template.default
Using gguf chat template: {{ bos_token }}{% for message in messages %}{{'<|' + message['role'] + '|>' + '
' + message['content'] + '<|end|>
' }}{% endfor %}{% if add_generation_prompt %}{{ '<|assistant|>
' }}{% else %}{{ eos_token }}{% endif %}
Using chat eos_token: <|endoftext|>
Using chat bos_token: <s>
14:01:47-732817 INFO     Loaded "Phi-3-mini-4k-instruct-fp16.gguf" in 2.17 seconds.                  
14:01:47-733607 INFO     LOADER: "llama.cpp"                                                         
14:01:47-734197 INFO     TRUNCATION LENGTH: 4096                                                     
14:01:47-734685 INFO     INSTRUCTION TEMPLATE: "Custom (obtained from model metadata)"               
ggml_cuda_compute_forward: RMS_NORM failed
CUDA error: invalid device function
  current device: 0, in function ggml_cuda_compute_forward at /home/runner/work/llama-cpp-python-cuBLAS-wheels/llama-cpp-python-cuBLAS-wheels/vendor/llama.cpp/ggml/src/ggml-cuda.cu:2368
  err
/home/runner/work/llama-cpp-python-cuBLAS-wheels/llama-cpp-python-cuBLAS-wheels/vendor/llama.cpp/ggml/src/ggml-cuda.cu:106: CUDA error
[New LWP 12809]
[New LWP 12789]
[New LWP 12788]
[New LWP 12751]
[New LWP 12716]
[New LWP 12715]
[New LWP 12714]
[New LWP 12713]
[New LWP 12712]
[New LWP 12711]
[New LWP 12710]
[New LWP 12709]
[New LWP 12708]
[New LWP 12707]
[New LWP 12706]
[New LWP 12705]
[New LWP 12704]
[New LWP 12703]
[New LWP 12702]
[New LWP 12701]
[New LWP 12700]
[New LWP 12699]

This GDB supports auto-downloading debuginfo from the following URLs:
  <https://debuginfod.fedoraproject.org/>
Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal]
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007fe8071e5c13 in clock_nanosleep@GLIBC_2.2.5 () from /lib64/libc.so.6
#0  0x00007fe8071e5c13 in clock_nanosleep@GLIBC_2.2.5 () from /lib64/libc.so.6
#1  0x0000000000645275 in pysleep (timeout=<optimized out>) at /usr/local/src/conda/python-3.11.10/Modules/timemodule.c:2159
warning: 2159	/usr/local/src/conda/python-3.11.10/Modules/timemodule.c: No such file or directory
#2  time_sleep (self=<optimized out>, timeout_obj=<optimized out>) at /usr/local/src/conda/python-3.11.10/Modules/timemodule.c:383
383	in /usr/local/src/conda/python-3.11.10/Modules/timemodule.c
#3  0x0000000000511e46 in _PyEval_EvalFrameDefault (tstate=tstate@entry=0x8a7a38 <_PyRuntime+166328>, frame=<optimized out>, frame@entry=0x7fe8073fa020, throwflag=throwflag@entry=0) at /usr/local/src/conda/python-3.11.10/Python/ceval.c:5020
warning: 5020	/usr/local/src/conda/python-3.11.10/Python/ceval.c: No such file or directory
#4  0x00000000005cc1ea in _PyEval_EvalFrame (throwflag=0, frame=0x7fe8073fa020, tstate=0x8a7a38 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.10/Include/internal/pycore_ceval.h:73
warning: 73	/usr/local/src/conda/python-3.11.10/Include/internal/pycore_ceval.h: No such file or directory
#5  _PyEval_Vector (tstate=tstate@entry=0x8a7a38 <_PyRuntime+166328>, func=func@entry=0x7fe8070987c0, locals=locals@entry=0x7fe8070f24c0, args=args@entry=0x0, argcount=argcount@entry=0, kwnames=kwnames@entry=0x0) at /usr/local/src/conda/python-3.11.10/Python/ceval.c:6434
warning: 6434	/usr/local/src/conda/python-3.11.10/Python/ceval.c: No such file or directory
#6  0x00000000005cb8bf in PyEval_EvalCode (co=co@entry=0xbac8130, globals=globals@entry=0x7fe8070f24c0, locals=locals@entry=0x7fe8070f24c0) at /usr/local/src/conda/python-3.11.10/Python/ceval.c:1148
1148	in /usr/local/src/conda/python-3.11.10/Python/ceval.c
#7  0x00000000005ec9e7 in run_eval_code_obj (tstate=tstate@entry=0x8a7a38 <_PyRuntime+166328>, co=co@entry=0xbac8130, globals=globals@entry=0x7fe8070f24c0, locals=locals@entry=0x7fe8070f24c0) at /usr/local/src/conda/python-3.11.10/Python/pythonrun.c:1741
warning: 1741	/usr/local/src/conda/python-3.11.10/Python/pythonrun.c: No such file or directory
#8  0x00000000005e8580 in run_mod (mod=mod@entry=0xbae9900, filename=filename@entry=0x7fe80702d300, globals=globals@entry=0x7fe8070f24c0, locals=locals@entry=0x7fe8070f24c0, flags=flags@entry=0x7fff954f7af8, arena=arena@entry=0x7fe80701b630) at /usr/local/src/conda/python-3.11.10/Python/pythonrun.c:1762
1762	in /usr/local/src/conda/python-3.11.10/Python/pythonrun.c
#9  0x00000000005fd4d2 in pyrun_file (fp=fp@entry=0xba23080, filename=filename@entry=0x7fe80702d300, start=start@entry=257, globals=globals@entry=0x7fe8070f24c0, locals=locals@entry=0x7fe8070f24c0, closeit=closeit@entry=1, flags=0x7fff954f7af8) at /usr/local/src/conda/python-3.11.10/Python/pythonrun.c:1657
1657	in /usr/local/src/conda/python-3.11.10/Python/pythonrun.c
#10 0x00000000005fc89f in _PyRun_SimpleFileObject (fp=0xba23080, filename=0x7fe80702d300, closeit=1, flags=0x7fff954f7af8) at /usr/local/src/conda/python-3.11.10/Python/pythonrun.c:440
440	in /usr/local/src/conda/python-3.11.10/Python/pythonrun.c
#11 0x00000000005fc5c3 in _PyRun_AnyFileObject (fp=0xba23080, filename=filename@entry=0x7fe80702d300, closeit=closeit@entry=1, flags=flags@entry=0x7fff954f7af8) at /usr/local/src/conda/python-3.11.10/Python/pythonrun.c:79
79	in /usr/local/src/conda/python-3.11.10/Python/pythonrun.c
#12 0x00000000005f723e in pymain_run_file_obj (skip_source_first_line=0, filename=0x7fe80702d300, program_name=0x7fe8070f26b0) at /usr/local/src/conda/python-3.11.10/Modules/main.c:360
warning: 360	/usr/local/src/conda/python-3.11.10/Modules/main.c: No such file or directory
#13 pymain_run_file (config=0x88da80 <_PyRuntime+59904>) at /usr/local/src/conda/python-3.11.10/Modules/main.c:379
379	in /usr/local/src/conda/python-3.11.10/Modules/main.c
#14 pymain_run_python (exitcode=0x7fff954f7af0) at /usr/local/src/conda/python-3.11.10/Modules/main.c:605
605	in /usr/local/src/conda/python-3.11.10/Modules/main.c
#15 Py_RunMain () at /usr/local/src/conda/python-3.11.10/Modules/main.c:684
684	in /usr/local/src/conda/python-3.11.10/Modules/main.c
#16 0x00000000005bbf89 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at /usr/local/src/conda/python-3.11.10/Modules/main.c:738
738	in /usr/local/src/conda/python-3.11.10/Modules/main.c
#17 0x00007fe80712c088 in __libc_start_call_main () from /lib64/libc.so.6
#18 0x00007fe80712c14b in __libc_start_main_impl () from /lib64/libc.so.6
#19 0x00000000005bbdd3 in _start ()
[Inferior 1 (process 12681) detached]

System Info

OS: Fedora 40
Kernel: 6.11.5-200.fc40.x86_64
CPU: AMD Ryzen 7 5800X (16) @ 4.85 GHz
GPU: AMD Radeon RX 6900 XT [Discrete]


rocminfo
========
ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
Runtime Ext Version:     1.4
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 7 5800X 8-Core Processor 
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 7 5800X 8-Core Processor 
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   4851                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            16                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    40969404(0x27124bc) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    40969404(0x27124bc) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    40969404(0x27124bc) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx1030                            
  Uuid:                    GPU-762c9ecf002e0002               
  Marketing Name:          AMD Radeon RX 6900 XT              
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      4096(0x1000) KB                    
    L3:                      131072(0x20000) KB                 
  Chip ID:                 29615(0x73af)                      
  ASIC Revision:           1(0x1)                             
  Cacheline Size:          128(0x80)                          
  Max Clock Freq. (MHz):   2720                               
  BDFID:                   2816                               
  Internal Node ID:        1                                  
  Compute Unit:            80                                 
  SIMDs per CU:            2                                  
  Shader Engines:          4                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    1024(0x400)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 120                                
  SDMA engine uCode::      83                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    16760832(0xffc000) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    16760832(0xffc000) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Recommended Granule:0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1030         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***    



rocm-clinfo
===========
Number of platforms:				 1
  Platform Profile:				 FULL_PROFILE
  Platform Version:				 OpenCL 2.1 AMD-APP (3614.0)
  Platform Name:				 AMD Accelerated Parallel Processing
  Platform Vendor:				 Advanced Micro Devices, Inc.
  Platform Extensions:				 cl_khr_icd cl_amd_event_callback 


  Platform Name:				 AMD Accelerated Parallel Processing
Number of devices:				 1
  Device Type:					 CL_DEVICE_TYPE_GPU
  Vendor ID:					 1002h
  Board name:					 AMD Radeon RX 6900 XT
  Device Topology:				 PCI[ B#11, D#0, F#0 ]
  Max compute units:				 40
  Max work items dimensions:			 3
    Max work items[0]:				 1024
    Max work items[1]:				 1024
    Max work items[2]:				 1024
  Max work group size:				 256
  Preferred vector width char:			 4
  Preferred vector width short:			 2
  Preferred vector width int:			 1
  Preferred vector width long:			 1
  Preferred vector width float:			 1
  Preferred vector width double:		 1
  Native vector width char:			 4
  Native vector width short:			 2
  Native vector width int:			 1
  Native vector width long:			 1
  Native vector width float:			 1
  Native vector width double:			 1
  Max clock frequency:				 2720Mhz
  Address bits:					 64
  Max memory allocation:			 14588628168
  Image support:				 Yes
  Max number of images read arguments:		 128
  Max number of images write arguments:		 8
  Max image 2D width:				 16384
  Max image 2D height:				 16384
  Max image 3D width:				 16384
  Max image 3D height:				 16384
  Max image 3D depth:				 8192
  Max samplers within kernel:			 16
  Max size of kernel argument:			 1024
  Alignment (bits) of base address:		 1024
  Minimum alignment (bytes) for any datatype:	 128
  Single precision floating point capability
    Denorms:					 Yes
    Quiet NaNs:					 Yes
    Round to nearest even:			 Yes
    Round to zero:				 Yes
    Round to +ve and infinity:			 Yes
    IEEE754-2008 fused multiply-add:		 Yes
  Cache type:					 Read/Write
  Cache line size:				 128
  Cache size:					 16384
  Global memory size:				 17163091968
  Constant buffer size:				 14588628168
  Max number of constant args:			 8
  Local memory type:				 Scratchpad
  Local memory size:				 65536
  Max pipe arguments:				 16
  Max pipe active reservations:			 16
  Max pipe packet size:				 1703726280
  Max global variable size:			 14588628168
  Max global variable preferred total size:	 17163091968
  Max read/write image args:			 64
  Max on device events:				 1024
  Queue on device max size:			 8388608
  Max on device queues:				 1
  Queue on device preferred size:		 262144
  SVM capabilities:				 
    Coarse grain buffer:			 Yes
    Fine grain buffer:				 Yes
    Fine grain system:				 No
    Atomics:					 No
  Preferred platform atomic alignment:		 0
  Preferred global atomic alignment:		 0
  Preferred local atomic alignment:		 0
  Kernel Preferred work group size multiple:	 32
  Error correction support:			 0
  Unified memory for Host and Device:		 0
  Profiling timer resolution:			 1
  Device endianess:				 Little
  Available:					 Yes
  Compiler available:				 Yes
  Execution capabilities:				 
    Execute OpenCL kernels:			 Yes
    Execute native function:			 No
  Queue on Host properties:				 
    Out-of-Order:				 No
    Profiling :					 Yes
  Queue on Device properties:				 
    Out-of-Order:				 Yes
    Profiling :					 Yes
  Platform ID:					 0x7efe81d1c7c8
  Name:						 gfx1030
  Vendor:					 Advanced Micro Devices, Inc.
  Device OpenCL C version:			 OpenCL C 2.0 
  Driver version:				 3614.0 (HSA1.1,LC)
  Profile:					 FULL_PROFILE
  Version:					 OpenCL 2.0 
  Extensions:					 cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program
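
Note: the backtrace paths point at a prebuilt wheel (llama-cpp-python-cuBLAS-wheels), so the kernels baked into that wheel may simply not cover gfx1030, which would explain the "invalid device function" on the first RMS_NORM dispatch. If it helps triage, here is a quick check of which llama_cpp build is actually loaded (a sketch; assumes the webui's bundled Python environment is active):

  import llama_cpp

  # Shows which wheel is imported and its version; the ggml-cuda.cu path in the
  # crash log suggests a prebuilt binary rather than a local ROCm build of llama.cpp.
  print(llama_cpp.__file__)
  print(llama_cpp.__version__)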