Enable deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B #26

@guangy10

Description

The bos_token_id doesn't match between the model config and its tokenizer. This happens for the distills that use Qwen as the base model. Opened a discussion here: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/discussions/25
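A minimal sketch of how such a mismatch can be surfaced by comparing the two configs side by side. The helper name and the illustrative id values below are assumptions for demonstration, not the actual values from the model repo:

```python
def find_special_token_mismatches(model_config: dict,
                                  tokenizer_config: dict,
                                  keys=("bos_token_id", "eos_token_id")):
    """Return {key: (model_value, tokenizer_value)} for every special-token
    id that differs between the model config and the tokenizer config."""
    return {
        k: (model_config.get(k), tokenizer_config.get(k))
        for k in keys
        if model_config.get(k) != tokenizer_config.get(k)
    }

# Hypothetical ids, chosen only to illustrate the failure mode reported above.
model_cfg = {"bos_token_id": 151646, "eos_token_id": 151643}
tokenizer_cfg = {"bos_token_id": 151643, "eos_token_id": 151643}

print(find_special_token_mismatches(model_cfg, tokenizer_cfg))
# bos_token_id differs, eos_token_id agrees
```

In practice the two dicts would come from the repo's `config.json` and `tokenizer_config.json` (e.g. via `AutoConfig.from_pretrained` and `AutoTokenizer.from_pretrained` in transformers).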

It may not fit on device without quantization, but exporting the Llama-based DeepSeek-R1 to ExecuTorch works just fine, e.g. by setting model_id to deepseek-ai/DeepSeek-R1-Distill-Llama-8B.

Labels: bug