
OSError with Llama3.2-3B-Instruct-QLORA_INT4_EO8 - missing files? #194

StephenQuirolgico opened this issue Oct 25, 2024 · 6 comments

@StephenQuirolgico

When trying to run Llama3.2-3B-Instruct-QLORA_INT4_EO8, I'm getting the error:

OSError: meta-llama/Llama-3.2-3B-Instruct-QLORA_INT4_EO8 does not appear to have a file named pytorch_model.bin, model.safetensors, tf_model.h5, model.ckpt or flax_model.msgpack.

I've tried using transformers to pull the model and also downloading it directly with llama model download. In both cases the model downloads successfully, so I'm not sure why files are missing.

@ashwinb
Contributor

ashwinb commented Oct 25, 2024

The files we provide via llama model download are intended to be run either via ExecuTorch or via llama-stack, so they don't include the checkpoint formats that transformers looks for. Since your inference code is based on HuggingFace transformers, you should download the weights from the corresponding HuggingFace repositories instead.

I'm curious which code needs these files and is producing this error?
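
For reference, a minimal sketch of what should work with transformers today (this assumes you only need the non-quantized instruct weights, which are published in HF format):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# The non-quantized repo ships HF-format weights (model.safetensors shards),
# so transformers can load it directly.
model_id = "meta-llama/Llama-3.2-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)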

@StephenQuirolgico
Author

StephenQuirolgico commented Oct 25, 2024

Yes, I'm using transformers. I've tried both the transformers pipeline and AutoModel APIs:

import torch
from transformers import pipeline

model_id = "meta-llama/Llama-3.2-3B-Instruct-QLORA_INT4_EO8"
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

and AutoModel:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'meta-llama/Llama-3.2-3B-Instruct-QLORA_INT4_EO8'
model = AutoModelForCausalLM.from_pretrained(model_name, device_map='auto')

Both methods produce the same error:

OSError: meta-llama/Llama-3.2-3B-Instruct-QLORA_INT4_EO8 does not appear to have a file named pytorch_model.bin, model.safetensors, tf_model.h5, model.ckpt or flax_model.msgpack.

@init27

init27 commented Oct 25, 2024

@StephenQuirolgico Can you kindly confirm what version of transformers you're using?

@StephenQuirolgico
Author

transformers 4.46.0

@WuhanMonkey

Hey @StephenQuirolgico, we are working with HF to have these weights converted and supported in transformers. For now, you can either try llama-stack or export the model with ExecuTorch; our official Llama website has more details on both.

We can also help better if you share which platform you plan to run inference on and what use cases you're trying to cover.

@StephenQuirolgico
Author

@WuhanMonkey I'm running this on RHEL 8. I have existing code that uses transformers and Llama3.2-3B, and I just wanted to test the quantized version by swapping out the model ID in that code. Is there a rough timeframe for when these models will be supported in HF?
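
As an interim workaround I may quantize the BF16 repo on the fly with bitsandbytes instead; a rough sketch of what I have in mind (note this uses bitsandbytes NF4 quantization, not the released QLORA_INT4_EO8 weights):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantize the standard BF16 checkpoint to 4-bit at load time
# (requires the bitsandbytes package).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_id = "meta-llama/Llama-3.2-3B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)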
