### System Info
Model: Qwen/Qwen2.5-0.5B-Instruct, BitsAndBytesConfig(load_in_8bit=True)
### Who can help?
No response
### Information
- The official example scripts
- My own modified scripts
### Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
### Reproduction
When calling model.generate() and accessing hidden_states, the first element (the first generation step) is itself a tuple with one entry per layer. Indexing into a layer gives a tensor of shape (batch_size, tokens, embedding_size), similar to the hidden_states returned by a plain model() forward pass.
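To make the nesting concrete, here is a minimal sketch that prints the shapes at each level (it assumes `model` and `model_inputs` are set up as in the reproduction script below):

```python
fwd = model(**model_inputs, output_hidden_states=True)
gen = model.generate(
    **model_inputs,
    output_hidden_states=True,
    return_dict_in_generate=True,
    max_new_tokens=2,
)

# Forward pass: tuple over layers, each of shape (batch, seq_len, hidden).
print(len(fwd.hidden_states), fwd.hidden_states[0].shape)

# generate(): tuple over generation steps, each a tuple over layers.
# Step 0 covers the full prompt: (batch, seq_len, hidden). Later steps hold
# the new token's states ((batch, 1, hidden) with the KV cache; full-sequence
# shapes when the cache is disabled).
print(len(gen.hidden_states), len(gen.hidden_states[0]), gen.hidden_states[0][0].shape)
```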
I always thought that `generated_hidden_states[0][layer][:, -1, :]` and `hidden_states[layer][:, -1, :]` should be equal: both have the same shape, and I assumed both represent the last prompt token's hidden state at the given layer. But when computing the cosine similarity between these vectors, I get different results.

Does model.generate() do something that model() does not, or vice versa?
For reproducing:

```python
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "Qwen/Qwen2.5-0.5B-Instruct"

def load_model(model_name):
    # Load the model in 8-bit via bitsandbytes.
    bnb_config = BitsAndBytesConfig(load_in_8bit=True)
    model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb_config)
    model.config.use_cache = False
    tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="left")
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id
    return model, tokenizer

model, tokenizer = load_model(model_name)

model_inputs = tokenizer(
    ["A list of colors: red, blue", "Portugal is"], return_tensors="pt", padding=True
).to("cuda")

# Plain forward pass over the prompt.
output = model(**model_inputs, output_hidden_states=True)
my_hidden_states = output.hidden_states

# Generation from the same prompt.
generated_ids = model.generate(
    **model_inputs,
    output_hidden_states=True,
    return_dict_in_generate=True,
    max_new_tokens=13,
    output_scores=True,
)

# Last layer, last prompt token, first batch element:
# forward pass vs. generation step 0.
F.cosine_similarity(
    my_hidden_states[-1][0, -1, :],
    generated_ids["hidden_states"][0][-1][0, -1, :],
    dim=0,
)
# results in different values
```
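To see where the two paths diverge, one can extend the comparison to every layer; a minimal sketch, reusing `my_hidden_states` and `generated_ids` from the script above:

```python
# Per-layer cosine similarity between the forward pass and generate()'s
# first step, for the last prompt token of the first batch element.
for layer in range(len(my_hidden_states)):
    sim = F.cosine_similarity(
        my_hidden_states[layer][0, -1, :],
        generated_ids["hidden_states"][0][layer][0, -1, :],
        dim=0,
    )
    print(f"layer {layer}: cosine similarity = {sim.item():.6f}")
```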
### Expected behavior
The two vectors should be equal, i.e. the cosine similarity should be 1.