How to retrieve the raw attention scores or logits from the BLIP model (image captioning) #206

umme17 · 2024-04-11T14:47:52Z

from PIL import Image
import requests
url = "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTrp7yKuY1NxcXlHQX10JtTlECuna-xWv-jetxnv73WBw&s"
image = Image.open(requests.get(url, stream=True).raw)

# prepare image for the model

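# processor_base and model_base are assumed to be an already-loaded BLIP processor and captioning model
inputs_base = processor_base(images=image, return_tensors="pt")
pixel_values = inputs_base.pixel_values
outputs_base = model_base.generate(**inputs_base, renormalize_logits=True, max_length=50)
# decode the whole batch of sequences; indexing outputs_base[0] first would decode token by token
generated_caption_base = processor_base.batch_decode(outputs_base, skip_special_tokens=True)[0]
print(f"Generated caption: {generated_caption_base}")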

I want to retrieve the raw attention scores or logits from outputs_base, modify the logits, and then generate a new caption from the modified logits. How can I do that?
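One possible approach, sketched under the assumption that model_base is a Hugging Face transformers BlipForConditionalGeneration and processor_base the matching BlipProcessor (the post does not say which checkpoint is used): by default generate() returns only token ids, so outputs_base carries no logits. Passing return_dict_in_generate=True and output_scores=True should return the per-step scores as well, and a custom LogitsProcessor lets you change the logits while decoding runs, so the new caption comes from the edited values. SuppressTokens and caption_with_scores below are hypothetical names made up for this illustration, and the sketch relies on BLIP's generate() forwarding extra keyword arguments to the text decoder's generate().

```python
import torch
from transformers import LogitsProcessor, LogitsProcessorList


class SuppressTokens(LogitsProcessor):
    """Hypothetical example processor: sets chosen token ids to -inf at every step."""

    def __init__(self, banned_token_ids):
        self.banned_token_ids = list(banned_token_ids)

    def __call__(self, input_ids, scores):
        # scores has shape (batch_size, vocab_size): next-token logits for this step.
        scores = scores.clone()  # leave the caller's tensor untouched
        scores[:, self.banned_token_ids] = float("-inf")
        return scores


def caption_with_scores(model, processor, image, logits_processor=None, max_length=50):
    """Generate a caption and also return the per-step scores.

    Assumes model.generate() accepts the standard transformers generation
    keyword arguments (an assumption about the BLIP wrapper, not verified here).
    """
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_length=max_length,
            return_dict_in_generate=True,  # return an output object, not only ids
            output_scores=True,            # keep one score tensor per generated step
            logits_processor=logits_processor,
        )
    caption = processor.batch_decode(out.sequences, skip_special_tokens=True)[0]
    # out.scores is a tuple with one (batch_size, vocab_size) tensor per step.
    return caption, out.scores
```

With the objects from the post, caption_with_scores(model_base, processor_base, image) would give the baseline caption and its scores, and passing logits_processor=LogitsProcessorList([SuppressTokens(ids)]) would regenerate with those token ids ruled out; ids could come from processor_base.tokenizer.convert_tokens_to_ids(...). Note that the returned scores are the values after any processors have run (including renormalize_logits=True if you pass it). As far as I know, recent transformers releases also accept output_logits=True for the unprocessed logits, and output_attentions=True for attention weights, which are post-softmax; getting pre-softmax attention scores would probably need forward hooks on the attention modules.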
