Description
environment:
PCIe environment
transformers: 4.51.0
torch: 2.6.0
LLM-TPU: f5502b5 2025.03.29
tpu-mlir: 5eba2a0e2 2025.03.20
driver version: 0.5.2 (release date: 20250411-003300)
path:
/workspace/LLM-TPU/template/parallel_demo
operate:
python3 pipeline.py --devid 4,5,6,7 --dir_path /workspace/LLM-TPU/inference_bmodels/Llama-3.2-3B-Instruct-w4-4dev
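As a sanity check before the 4-dev run, it can help to confirm that devices 4-7 are enumerated by the driver. This is only a sketch assuming a standard libsophon install; the /dev/bm-sophon* device-node naming and the bm-smi tool come from libsophon and are not part of the original report:
ls /dev/bm-sophon*   # PCIe device nodes; 4, 5, 6, and 7 should appear here
bm-smi               # per-chip status, memory, and utilization reported by libsophon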
question:
Hello, I'm using the SC7-224T card with Sophon SDK version 24.04.01 and libsophon version 0.5.2, and I've encountered an issue with the chat.cpp module located in the template/parallel_demo directory. Previously, models configured to run on 1, 2, 4, and 8 chips all worked as expected. Now, however, only the 1-chip and 2-chip models are working; the 4-chip and 8-chip configurations, which used to run without any issues, now fail to load. You can download the full debug trace using this link. Do you have any idea what might be causing this issue or how I can fix it?