Description
System Info
- `transformers` version: 4.53.0.dev0
- Platform: Linux-5.4.0-1128-aws-fips-x86_64-with-glibc2.31
- Python version: 3.11.11
- Huggingface_hub version: 0.33.0
- Safetensors version: 0.5.3
- Accelerate version: 1.8.1
- Accelerate config: not found
- DeepSpeed version: not installed
- PyTorch version (accelerator?): 2.7.1+cu126 (NA)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
import transformers

architecture = "csarron/mobilebert-uncased-squad-v2"
tokenizer = transformers.AutoTokenizer.from_pretrained(architecture, low_cpu_mem_usage=True)
model = transformers.MobileBertForQuestionAnswering.from_pretrained(
    architecture, low_cpu_mem_usage=True
)
pipeline = transformers.pipeline(task="question-answering", model=model, tokenizer=tokenizer)
data = [
    {
        'question': ['What color is it?', 'How do the people go?', "What does the 'wolf' howl at?"],
        'context': [
            "Some people said it was green but I know that it's pink.",
            'The people on the bus go up and down. Up and down.',
            "The pack of 'wolves' stood on the cliff and a 'lone wolf' howled at the moon for hours.",
        ],
    }
]
# prediction result is wrong
pipeline(data, top_k=2, max_answer_len=5)
Expected behavior
Expected prediction response:
[[{'score': 0.5683297514915466, 'start': 51, 'end': 55, 'answer': 'pink'}, {'score': 0.028800610452890396, 'start': 51, 'end': 56, 'answer': 'pink.'}], [{'score': 0.3008899986743927, 'start': 25, 'end': 36, 'answer': 'up and down'}, {'score': 0.12070021033287048, 'start': 38, 'end': 49, 'answer': 'Up and down'}], [{'score': 0.8356598615646362, 'start': 68, 'end': 76, 'answer': 'the moon'}, {'score': 0.0971309095621109, 'start': 72, 'end': 76, 'answer': 'moon'}]]
But it actually gets the following response: the second result is a single merged answer dict rather than a list of two answers (one 'Up and down' answer is missing):
[[{'score': 0.5683297514915466, 'start': 51, 'end': 55, 'answer': 'pink'}, {'score': 0.028800610452890396, 'start': 51, 'end': 56, 'answer': 'pink.'}], {'score': 0.4215902090072632, 'start': 25, 'end': 36, 'answer': 'up and down'}, [{'score': 0.8356598615646362, 'start': 68, 'end': 76, 'answer': 'the moon'}, {'score': 0.0971309095621109, 'start': 72, 'end': 76, 'answer': 'moon'}]]
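Until the fix lands, a caller-side workaround (a sketch, not part of the library; the helper name is made up here) is to normalize each per-question result back to a list, since the buggy response contains a bare dict wherever merging left only one answer:

```python
def normalize_qa_results(results):
    """Ensure each per-question result is a list of answer dicts.

    With the bug above, the question-answering pipeline returns a bare
    dict instead of a list when merging leaves a single answer; wrap it
    so downstream code can always iterate uniformly.
    """
    return [r if isinstance(r, list) else [r] for r in results]

# Simplified stand-in for the buggy response shown above
# (only the 'answer' field is kept for brevity):
buggy = [
    [{"answer": "pink"}, {"answer": "pink."}],
    {"answer": "up and down"},  # bare dict, not a list
    [{"answer": "the moon"}, {"answer": "moon"}],
]
fixed = normalize_qa_results(buggy)
```

This only papers over the shape inconsistency; it cannot recover the second `top_k` answer that the merging dropped.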
Activity
Rocketknight1 commentedon Jun 23, 2025
cc @yushi2006, I did a `git bisect` and this change occurs because of #38761! I think the issue is that `top_k` and the new answer-merging logic are conflicting, so we get fewer than `top_k` answers because they get merged. What users probably want is for answers to get merged before `top_k` is applied. I probably should have caught this in the review. Maybe we should do a follow-up PR to fix it and move the score-merging before `top_k`? There are multiple ways to do this - if you want to take the PR, let me know; if not, we'll do it internally at some point.
itsmejul commentedon Jul 2, 2025
I think the easiest way would be to remove the top-k sampling in `decode_spans` and keep the full scores matrix until after we merge duplicate answers, calculate answer probabilities, and save the answers, sampling `top_k` only at the very end.
Obviously this adds quadratic overhead, because we would need to add up the probabilities for all start-end combinations; I'm not sure whether there are more efficient ways around this. The only alternative I can think of would be to artificially increase `top_k` at first (say, `10 * top_k`) before the merging and then sample again later with the actual `top_k` value, which would only add constant extra overhead but would not guarantee exact probabilities in the results (which we currently also don't have). @Rocketknight1 What do you think?
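The merge-before-`top_k` ordering being discussed can be sketched independently of the pipeline internals. This is an illustrative standalone sketch, not the library's actual code: the function name, the flat candidate-dict representation, and the score-summing merge rule are all assumptions.

```python
def merge_then_topk(candidates, top_k):
    """Merge candidate answers sharing the same (start, end) span,
    then keep the top_k highest-scoring merged answers.

    candidates: list of dicts with 'score', 'start', 'end', 'answer'.
    Duplicate spans have their scores summed (one plausible merge rule).
    """
    merged = {}
    for c in candidates:
        key = (c["start"], c["end"])
        if key in merged:
            merged[key]["score"] += c["score"]
        else:
            merged[key] = dict(c)
    # Apply top_k only AFTER merging, so merging can no longer shrink
    # the result below top_k when enough distinct spans exist.
    return sorted(merged.values(), key=lambda a: a["score"], reverse=True)[:top_k]

# Two duplicate spans plus one distinct span: merging collapses the
# duplicates, and top_k=2 still returns two distinct answers.
cands = [
    {"score": 0.30, "start": 25, "end": 36, "answer": "up and down"},
    {"score": 0.12, "start": 25, "end": 36, "answer": "up and down"},
    {"score": 0.10, "start": 38, "end": 49, "answer": "Up and down"},
]
```

If top-k were taken first (as in the buggy path), the two duplicates could occupy both slots and then collapse into one, leaving fewer than `top_k` results.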
yushi2006 commentedon Jul 3, 2025
Hey @Rocketknight1! I just noticed I was tagged here — sorry I missed it earlier. I’m jumping on the bug now and will get a fix out soon. Appreciate the mention!
yushi2006 commentedon Jul 7, 2025
Hey @Rocketknight1! I have finished fixing this bug; I'd appreciate it if you could review it.