### System Info
Model: Qwen/Qwen2.5-0.5B-Instruct, BitsAndBytesConfig(load_in_8bit=True)
### Who can help?
No response
### Information
- The official example scripts
- My own modified scripts
### Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
### Reproduction
When calling model.generate() and accessing hidden_states, the first element (the first generation step) is itself a tuple with one entry per layer. Indexing into a layer gives a tensor of shape (batch_size, tokens, embedding_size), similar to the hidden_states returned by a plain model() forward pass.
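To make the nesting concrete, here is a minimal sketch that prints the shapes at each level (it assumes `model` and `model_inputs` are set up as in the reproduction script below):

```python
fwd = model(**model_inputs, output_hidden_states=True)
gen = model.generate(
    **model_inputs,
    output_hidden_states=True,
    return_dict_in_generate=True,
    max_new_tokens=2,
)

# Forward pass: tuple over layers, each of shape (batch, seq_len, hidden).
print(len(fwd.hidden_states), fwd.hidden_states[0].shape)

# generate(): tuple over generation steps, each a tuple over layers.
# Step 0 covers the full prompt: (batch, seq_len, hidden). Later steps hold
# the new token's states ((batch, 1, hidden) with the KV cache; full-sequence
# shapes when the cache is disabled).
print(len(gen.hidden_states), len(gen.hidden_states[0]), gen.hidden_states[0][0].shape)
```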
I always thought that `generated_hidden_states[0][layer][:, -1, :]` and `hidden_states[layer][:, -1, :]` should be equal: both have the same shape, and I assumed both represent the last prompt token's hidden state at the given layer. But when computing the cosine similarity between these vectors, I get different results.

Does model.generate() do something that model() does not, or vice versa?
For reproducing:

```python
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "Qwen/Qwen2.5-0.5B-Instruct"

def load_model(model_name):
    # Load the model in 8-bit via bitsandbytes.
    bnb_config = BitsAndBytesConfig(load_in_8bit=True)
    model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb_config)
    model.config.use_cache = False
    tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="left")
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id
    return model, tokenizer

model, tokenizer = load_model(model_name)

model_inputs = tokenizer(
    ["A list of colors: red, blue", "Portugal is"], return_tensors="pt", padding=True
).to("cuda")

# Plain forward pass over the prompt.
output = model(**model_inputs, output_hidden_states=True)
my_hidden_states = output.hidden_states

# Generation from the same prompt.
generated_ids = model.generate(
    **model_inputs,
    output_hidden_states=True,
    return_dict_in_generate=True,
    max_new_tokens=13,
    output_scores=True,
)

# Last layer, last prompt token, first batch element:
# forward pass vs. generation step 0.
F.cosine_similarity(
    my_hidden_states[-1][0, -1, :],
    generated_ids["hidden_states"][0][-1][0, -1, :],
    dim=0,
)
# results in different values
```
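To see where the two paths diverge, one can extend the comparison to every layer; a minimal sketch, reusing `my_hidden_states` and `generated_ids` from the script above:

```python
# Per-layer cosine similarity between the forward pass and generate()'s
# first step, for the last prompt token of the first batch element.
for layer in range(len(my_hidden_states)):
    sim = F.cosine_similarity(
        my_hidden_states[layer][0, -1, :],
        generated_ids["hidden_states"][0][layer][0, -1, :],
        dim=0,
    )
    print(f"layer {layer}: cosine similarity = {sim.item():.6f}")
```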
### Expected behavior
The two vectors should be equal, i.e. the cosine similarity should be 1.