[datasets] Filter corrupted and wrongly annotated files in ready-to-use datasets #935

### Bug description

While testing #933, I saw some errors (empty crops, non-UTF-8 strings, and so on).
We need to filter out the invalid files and annotations.

- [x] ensure all ready-to-use datasets work with the eval_detection and eval_recognition scripts (TF and PT)
- [ ] unify the recognition part with the recognition dataset / word generator so it returns the string directly instead of {labels: ['string']} #954
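
For illustration, the kind of filtering meant here could look roughly like the sketch below. It is only a minimal example, not the code from the linked PRs: the helper names (`is_valid_label`, `is_valid_box`, `filter_annotations`) and the `(xmin, ymin, xmax, ymax)` box layout are assumptions made for this example.

```python
from typing import List, Sequence, Tuple, Union

# Hypothetical box layout for this sketch: (xmin, ymin, xmax, ymax) in pixels.
Box = Tuple[int, int, int, int]
Label = Union[str, bytes]


def is_valid_label(label: Label) -> bool:
    """Return True for a non-empty label that is valid UTF-8."""
    if isinstance(label, bytes):
        try:
            label = label.decode("utf-8")
        except UnicodeDecodeError:
            return False
    if not isinstance(label, str) or not label.strip():
        return False
    try:
        # Fails on lone surrogates, which lossy decoding can leave behind.
        label.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


def is_valid_box(box: Box) -> bool:
    """Return True if the box has a positive area, i.e. the crop is not empty."""
    xmin, ymin, xmax, ymax = box
    return xmax > xmin and ymax > ymin


def filter_annotations(
    boxes: Sequence[Box], labels: Sequence[Label]
) -> Tuple[List[Box], List[Label]]:
    """Keep only the (box, label) pairs where both parts are valid."""
    kept = [
        (box, label)
        for box, label in zip(boxes, labels)
        if is_valid_box(box) and is_valid_label(label)
    ]
    return [box for box, _ in kept], [label for _, label in kept]


boxes = [(0, 0, 10, 10), (5, 5, 5, 9), (1, 1, 4, 4), (0, 0, 3, 3)]
labels = ["total", "x", b"\xff\xfe", "  "]
clean_boxes, clean_labels = filter_annotations(boxes, labels)
```

In this example only the first pair survives: the second box has zero width, the third label is not valid UTF-8, and the fourth label is whitespace only.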


**detection:**

- [x] CORD  (PT/TF) #983
- [x] FUNSD (PT/TF) #983 
- [x] IC03 (PT/TF) #983 
- [x] IC13 (PT/TF) validated by: @felixdittrich92 
- [x] IIIT5K (PT/TF) validated by: @felixdittrich92 
- [x] IMGUR5K (PT/TF) validated by: @felixdittrich92 
- [x] SROIE (PT/TF) #983 
- [x] SVHN (PT/TF) validated by: @felixdittrich92
- [x] SVT (PT/TF) #955 
- [x] SynthText (PT/TF) validated by: @felixdittrich92 

**recognition:**

- [x] MJSynth (PT/TF) #956 
- [x] CORD  (PT/TF) #983 
- [x] FUNSD (PT/TF) #983 
- [x] IC03 (PT/TF) #983 
- [x] IC13 (PT/TF) validated by: @felixdittrich92 
- [x] IIIT5K (PT/TF) validated by: @felixdittrich92 
- [x] IMGUR5K (PT/TF) #1038 
- [x] SROIE (PT/TF) #983 (NOTE: 99% of the labels contain whitespace, so excluding them is not possible)
- [x] SVHN (PT/TF) #987 
- [x] SVT (PT/TF) #955 
- [x] SynthText (PT/TF)  #1038 
