
OSError with Llama3.2-3B-Instruct-QLORA_INT4_EO8 - missing files? #194

StephenQuirolgico opened this issue Oct 25, 2024 · 6 comments

@StephenQuirolgico

When trying to run Llama3.2-3B-Instruct-QLORA_INT4_EO8, I'm getting the error:

OSError: meta-llama/Llama-3.2-3B-Instruct-QLORA_INT4_EO8 does not appear to have a file named pytorch_model.bin, model.safetensors, tf_model.h5, model.ckpt or flax_model.msgpack.

I've tried using transformers to pull the model and also downloading it directly with llama model download. In both cases the model downloads successfully, so I'm not sure why files are missing.

@ashwinb
Contributor

ashwinb commented Oct 25, 2024

The files we provide via llama model download are intended to be run either via ExecuTorch or via llama-stack, so they don't include the checkpoint formats that transformers looks for. Since your inference code is based on HuggingFace transformers, you should download the weights from the corresponding HuggingFace repositories instead.

I'm curious which code needs these files and is producing this error?
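
For reference, a minimal sketch of what should work with transformers today (this assumes you only need the non-quantized instruct weights, which are published in HF format):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# The non-quantized repo ships HF-format weights (model.safetensors shards),
# so transformers can load it directly.
model_id = "meta-llama/Llama-3.2-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)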

@StephenQuirolgico
Author

StephenQuirolgico commented Oct 25, 2024

Yes, I'm using transformers. I've tried both the transformers pipeline and AutoModel APIs:

import torch
from transformers import pipeline

model_id = "meta-llama/Llama-3.2-3B-Instruct-QLORA_INT4_EO8"
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

and AutoModel:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'meta-llama/Llama-3.2-3B-Instruct-QLORA_INT4_EO8'
model = AutoModelForCausalLM.from_pretrained(model_name, device_map='auto')

Both methods produce the same error:

OSError: meta-llama/Llama-3.2-3B-Instruct-QLORA_INT4_EO8 does not appear to have a file named pytorch_model.bin, model.safetensors, tf_model.h5, model.ckpt or flax_model.msgpack.

@init27

init27 commented Oct 25, 2024

@StephenQuirolgico Can you kindly confirm what version of transformers you're using?

@StephenQuirolgico
Author

transformers 4.46.0

@WuhanMonkey

Hey @StephenQuirolgico, we are working with HF to have these weights converted and supported in transformers. For now, you can either try llama-stack or export the model with ExecuTorch; our official Llama website has more details on both.

We can also help better if you share which platform you plan to run inference on and what use cases you're trying to cover.

@StephenQuirolgico
Author

@WuhanMonkey I'm running this on RHEL 8. I have existing code that uses transformers and Llama3.2-3B, and I just wanted to test the quantized version by swapping out the model ID in that code. Is there a rough timeframe for when these models will be supported in HF?
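
As an interim workaround I may quantize the BF16 repo on the fly with bitsandbytes instead; a rough sketch of what I have in mind (note this uses bitsandbytes NF4 quantization, not the released QLORA_INT4_EO8 weights):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantize the standard BF16 checkpoint to 4-bit at load time
# (requires the bitsandbytes package).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_id = "meta-llama/Llama-3.2-3B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)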
