QA pipeline prediction generates wrong response when `top_k` param > 1

### System Info

- `transformers` version: 4.53.0.dev0
- Platform: Linux-5.4.0-1128-aws-fips-x86_64-with-glibc2.31
- Python version: 3.11.11
- Huggingface_hub version: 0.33.0
- Safetensors version: 0.5.3
- Accelerate version: 1.8.1
- Accelerate config: 	not found
- DeepSpeed version: not installed
- PyTorch version (accelerator?): 2.7.1+cu126 (NA)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed


### Who can help?

_No response_

### Information

- [ ] The official example scripts
- [x] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [x] My own task or dataset (give details below)

### Reproduction

```
import transformers

architecture = "csarron/mobilebert-uncased-squad-v2"
tokenizer = transformers.AutoTokenizer.from_pretrained(architecture, low_cpu_mem_usage=True)
model = transformers.MobileBertForQuestionAnswering.from_pretrained(
    architecture, low_cpu_mem_usage=True
)
pipeline = transformers.pipeline(task="question-answering", model=model, tokenizer=tokenizer)


data = [
    {'question': ['What color is it?', 'How do the people go?', "What does the 'wolf' howl at?"],
     'context': [
         "Some people said it was green but I know that it's pink.",
         'The people on the bus go up and down. Up and down.',
         "The pack of 'wolves' stood on the cliff and a 'lone wolf' howled at the moon for hours."
     ]}
]

# prediction result is wrong
pipeline(data, top_k=2, max_answer_len=5)
```
### Expected behavior

Expected prediction response:

```
[[{'score': 0.5683297514915466, 'start': 51, 'end': 55, 'answer': 'pink'}, {'score': 0.028800610452890396, 'start': 51, 'end': 56, 'answer': 'pink.'}], [{'score': 0.3008899986743927, 'start': 25, 'end': 36, 'answer': 'up and down'}, {'score': 0.12070021033287048, 'start': 38, 'end': 49, 'answer': 'Up and down'}], [{'score': 0.8356598615646362, 'start': 68, 'end': 76, 'answer': 'the moon'}, {'score': 0.0971309095621109, 'start': 72, 'end': 76, 'answer': 'moon'}]]
```
But it gets the following response (**one 'Up and down' answer is missing** )

```
[[{'score': 0.5683297514915466, 'start': 51, 'end': 55, 'answer': 'pink'}, {'score': 0.028800610452890396, 'start': 51, 'end': 56, 'answer': 'pink.'}], {'score': 0.4215902090072632, 'start': 25, 'end': 36, 'answer': 'up and down'}, [{'score': 0.8356598615646362, 'start': 68, 'end': 76, 'answer': 'the moon'}, {'score': 0.0971309095621109, 'start': 72, 'end': 76, 'answer': 'moon'}]]
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

QA pipeline prediction generates wrong response when `top_k` param > 1 #38984

System Info

Who can help?

Information

Tasks

Reproduction

Expected behavior

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Participants

QA pipeline prediction generates wrong response when top_k param > 1 #38984

Description

System Info

Who can help?

Information

Tasks

Reproduction

Expected behavior

Activity

Rocketknight1 commented on Jun 23, 2025

itsmejul commented on Jul 2, 2025

yushi2006 commented on Jul 3, 2025

yushi2006 commented on Jul 7, 2025

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Participants

Issue actions

QA pipeline prediction generates wrong response when `top_k` param > 1 #38984