Description
environment:
PCIe environment
transformers: 4.51.0
torch: 2.6.0
LLM-TPU: f5502b5 2025.03.29
tpu-mlir: 5eba2a0e2 2025.03.20
driver version: 0.5.2 (release date: 20250411-003300)
path:
/workspace/LLM-TPU/template/parallel_demo
operate:
python3 pipeline.py --devid 4,5,6,7 --dir_path /workspace/LLM-TPU/inference_bmodels/Llama-3.2-3B-Instruct-w4-4dev
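As a sanity check before the 4-dev run, it can help to confirm that devices 4-7 are enumerated by the driver. This is only a sketch assuming a standard libsophon install; the /dev/bm-sophon* device-node naming and the bm-smi tool come from libsophon and are not part of the original report:
ls /dev/bm-sophon*   # PCIe device nodes; 4, 5, 6, and 7 should appear here
bm-smi               # per-chip status, memory, and utilization reported by libsophon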
question:
Hello, I'm using the SC7-224T card with Sophon SDK version 24.04.01 and libsophon version 0.5.2, and I've encountered an issue with the chat.cpp module located in the template/parallel_demo directory. Previously, models configured to run on 1, 2, 4, and 8 chips all worked as expected. Now, however, only the 1-chip and 2-chip models are working; the 4-chip and 8-chip configurations, which used to run without any issues, now fail to load. You can download the full debug trace using this link. Do you have any idea what might be causing this issue or how I can fix it?