Add ignore_decode_errors option to Image feature for robust decoding #7612 #7638

**Open** · ArjunJagdale (Contributor) wants to merge 1 commit into `main`

This PR adds optional robust image decoding to the `Image` feature, as discussed in issue #7612.

## 🔧 What was added

- A new boolean field, `ignore_decode_errors` (default: `False`).
- When set to `True`, any exception raised while decoding an image is caught, and `None` is returned instead of the error propagating.

```python
from datasets import Features, Image

features = Features({
    # ignore_decode_errors is the new option introduced by this PR
    "image": Image(decode=True, ignore_decode_errors=True),
})
```

This makes it possible to iterate over datasets that may contain corrupted samples without the loop failing on the first bad image. It is especially useful when streaming datasets such as WebDataset archives or large, image-heavy public datasets, where individual samples may be corrupted.
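
With the option enabled, a corrupted sample shows up as `None` rather than as an exception, so the consumer decides what to do with it. A minimal sketch of skipping such samples during iteration, assuming each example is a dict whose `"image"` value is `None` when decoding failed; `iter_valid_images` is a hypothetical helper, not part of `datasets`:

```python
from typing import Any, Dict, Iterable, Iterator


def iter_valid_images(
    examples: Iterable[Dict[str, Any]], column: str = "image"
) -> Iterator[Dict[str, Any]]:
    """Yield only the examples whose image decoded successfully.

    Hypothetical helper (not part of `datasets`): it assumes that a failed
    decode leaves None in `column`, as proposed for ignore_decode_errors=True.
    """
    for example in examples:
        if example.get(column) is None:
            # Decoding failed (or the column is missing): skip this sample.
            continue
        yield example


# Toy rows standing in for decoded examples; None marks a corrupted image.
rows = [{"image": "img-0"}, {"image": None}, {"image": "img-2"}]
print([row["image"] for row in iter_valid_images(rows)])  # ['img-0', 'img-2']
```

With a `datasets.IterableDataset`, the same check could instead be passed to `.filter(lambda ex: ex["image"] is not None)`, which keeps the skipping inside the streaming pipeline.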

## 🧪 Behavior

- If `ignore_decode_errors=False` (the default), decoding behaves exactly as before.
- If `ignore_decode_errors=True`, decoding errors are caught and a warning is emitted:

  ```
  [Image.decode_example] Skipped corrupted image: ...
  ```
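
The two modes above can be illustrated without `datasets` installed. The sketch below is a hypothetical stand-in, not the PR's diff: `decode_with_fallback` and the toy `decode_png_size` decoder (which only parses a PNG header instead of decoding pixels with Pillow, as `Image` does) are illustrative names, and the warning goes through Python's standard `warnings` module, which may differ from how the PR emits it.

```python
import struct
import warnings
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_png_size(data: bytes) -> Tuple[int, int]:
    """Toy decoder: read (width, height) from a PNG's IHDR chunk, or raise."""
    if len(data) < 24 or not data.startswith(PNG_SIGNATURE) or data[12:16] != b"IHDR":
        raise ValueError("not a valid PNG header")
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", then big-endian width/height.
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def decode_with_fallback(
    data: bytes,
    decode: Callable[[bytes], T],
    ignore_decode_errors: bool = False,
) -> Optional[T]:
    """Decode `data`; on failure, re-raise by default or warn and return None."""
    try:
        return decode(data)
    except Exception as err:  # broad catch, matching "any exceptions" in the PR text
        if not ignore_decode_errors:
            raise  # default: behave exactly as before
        # Prefix mirrors the PR's warning; appending the error text is an assumption.
        warnings.warn(f"[Image.decode_example] Skipped corrupted image: {err}")
        return None


header = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 640, 480)
print(decode_with_fallback(header, decode_png_size))  # (640, 480)
print(decode_with_fallback(b"not an image", decode_png_size, ignore_decode_errors=True))  # None
```

Catching every `Exception` matches the behavior described in this PR, but it could also hide failures unrelated to corruption, such as an I/O error while streaming; restricting the catch to specific exception types would be one possible alternative.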
    

## 🧵 Linked issue

Closes #7612

Let me know if you'd like a follow-up test PR. Happy to write one!

**ArjunJagdale** (Contributor, Author)

cc @lhoestq

Development: merging this pull request may close *Provide an option of robust dataset iterator with error handling*.