Hidden states are different for model() and model.generate() #38538

Closed
@okihnjo

Description

System Info

Model: Qwen/Qwen2.5-0.5B-Instruct, BitsAndBytesConfig(load_in_8bit=True)

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

When calling model.generate() with output_hidden_states=True, I index the first element of hidden_states, which is itself a tuple with one entry per layer.
Indexing a layer then gives a tensor of shape (batch_size, tokens, embedding_size), the same shape as an entry of hidden_states returned by model().
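
The nesting described above can be sketched without loading a model. The helper below is hypothetical (`generate_hidden_state_shapes` is not a transformers function) and only computes the shapes one would expect under stated assumptions: the per-layer tuple also contains the embedding output (so `num_layers + 1` entries), no sequence stops early, and steps after the first see one new token when the KV cache is used or the full sequence when it is not. The sizes in the usage lines are placeholders, not the Qwen configuration.

```python
from typing import Tuple

Shape = Tuple[int, int, int]


def generate_hidden_state_shapes(
    batch_size: int,
    prompt_len: int,
    hidden_size: int,
    num_layers: int,
    max_new_tokens: int,
    use_cache: bool = True,
) -> Tuple[Tuple[Shape, ...], ...]:
    """Expected shapes of generate() hidden states, indexed as [step][layer]."""
    steps = []
    for step in range(max_new_tokens):
        if step == 0:
            # The first step processes the whole (padded) prompt.
            seq_len = prompt_len
        elif use_cache:
            # Assumption: with a KV cache only the newest token is fed in.
            seq_len = 1
        else:
            # Assumption: without a cache the full sequence is re-processed.
            seq_len = prompt_len + step
        # One entry for the embedding output plus one per decoder layer.
        layers = tuple(
            (batch_size, seq_len, hidden_size) for _ in range(num_layers + 1)
        )
        steps.append(layers)
    return tuple(steps)


# Placeholder sizes for illustration only.
shapes = generate_hidden_state_shapes(
    batch_size=2, prompt_len=8, hidden_size=16, num_layers=4, max_new_tokens=3
)
print(len(shapes))      # number of generation steps
print(len(shapes[0]))   # entries per step (embeddings + layers)
print(shapes[0][-1])    # last layer at the first step
print(shapes[1][-1])    # last layer at the second step
```

Under these assumptions, only the first step (`[0]`) has the same sequence length as the forward pass over the prompt, which is why the comparison in this issue indexes `[0]`.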

I always assumed that

generated_hidden_states[0][layer][:, -1, :]
hidden_states[layer][:, -1, :]

are equal: both have the same shape, and I thought both represent the same thing. However, the cosine similarity between these vectors shows that they differ.
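
When checking whether two such vectors match, it may help to look at the largest element-wise difference alongside the cosine similarity, since the similarity alone does not show how large the gap is. The following is a minimal plain-Python sketch with made-up vectors; `cosine_similarity` and `max_abs_diff` are illustrative helpers defined here, and with real tensors one would use the torch equivalents instead.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))


# Made-up vectors standing in for two hidden-state slices.
from_forward = [1.0, 2.0, 3.0]
from_generate = [1.0, 2.0, 3.5]

print(cosine_similarity(from_forward, from_generate))
print(max_abs_diff(from_forward, from_generate))
```

Reporting both numbers makes it easier to tell a small numerical discrepancy from a genuinely different vector.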

Does model.generate() do something that model() does not, or vice versa?

For reproducing:

import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "Qwen/Qwen2.5-0.5B-Instruct"

def load_model(model_name):
    # Load the model with 8-bit quantization via bitsandbytes.
    bnb_config = BitsAndBytesConfig(load_in_8bit=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=bnb_config,
    )
    model.config.use_cache = False

    # Left padding, with EOS reused as the pad token.
    tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="left")
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id
    return model, tokenizer

model, tokenizer = load_model(model_name)
model_inputs = tokenizer(
    ["A list of colors: red, blue", "Portugal is"], return_tensors="pt", padding=True
).to("cuda")

# Plain forward pass over the prompt.
output = model(**model_inputs, output_hidden_states=True)
my_hidden_states = output.hidden_states

# Generation, returning the hidden states of every step.
generated_ids = model.generate(
    **model_inputs,
    output_hidden_states=True,
    return_dict_in_generate=True,
    max_new_tokens=13,
    output_scores=True,
)

# Last layer, first sequence, last position: forward pass vs. first generation step.
F.cosine_similarity(my_hidden_states[-1][0, -1, :], generated_ids["hidden_states"][0][-1][0, -1, :], dim=0)
# the similarity shows that the two vectors differ

Expected behavior

The two hidden-state vectors should be equal, so their cosine similarity should be 1 up to numerical precision.
