First of all, thank you for the great work on the evaluate library — it's been incredibly useful for benchmarking and analyzing model performance!
I’d like to suggest a feature idea that could enhance the interpretability and debugging process when evaluating models. Specifically, it would be great if evaluate could provide a way to retrieve the list of mispredicted examples, including both the model's prediction and the corresponding ground truth.
This could be especially useful for creating visualizations or reports of incorrect predictions.
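For illustration, here is a rough sketch of the kind of workflow I have in mind, using the existing accuracy metric and collecting the mismatches by hand today (the `texts` variable and the `mispredicted` structure are just placeholders for whatever the library might expose directly):

```python
import evaluate

# Current workaround: compute the metric as usual...
accuracy = evaluate.load("accuracy")

predictions = [0, 1, 1, 0]
references = [0, 1, 0, 0]
texts = ["ex A", "ex B", "ex C", "ex D"]  # hypothetical inputs, kept for context

results = accuracy.compute(predictions=predictions, references=references)

# ...then gather the mispredicted examples manually. The feature request is for
# evaluate to return something like this alongside (or in addition to) the score.
mispredicted = [
    {"input": t, "prediction": p, "reference": r}
    for t, p, r in zip(texts, predictions, references)
    if p != r
]

print(results)       # {'accuracy': 0.75}
print(mispredicted)  # [{'input': 'ex C', 'prediction': 1, 'reference': 0}]
```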
Thanks again 🙌