Closed
Bug description
While testing #933, I saw some errors (empty crops, non-UTF-8 strings, and so on).
We need to filter out some invalid files/annotations:
- ensure all ready-to-use datasets work fine with the eval_detection and eval_recognition scripts (TF and PT)
- unify the recognition part with the recognition dataset / word generator so that it returns the string directly instead of {labels: ['string']}: update reco datasets and tests #954
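
To make the two points above concrete, here is a minimal sketch of the kind of filtering meant. It is not the code from the linked PRs: the helper names (`normalize_label`, `is_valid_box`, `filter_samples`) and the `(xmin, ymin, xmax, ymax)` box layout are assumptions made for illustration only. It unwraps the old `{labels: ['string']}` form into a plain string, and drops samples whose crop would be empty or whose label is not valid UTF-8.

```python
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical box layout for this sketch: (xmin, ymin, xmax, ymax)
Box = Tuple[float, float, float, float]
RawLabel = Union[str, bytes, Dict[str, List[str]]]


def normalize_label(label: RawLabel) -> Optional[str]:
    """Return the label as a plain string, or None if it is unusable."""
    # Unwrap the old {"labels": ["string"]} form into the bare string
    if isinstance(label, dict):
        labels = label.get("labels", [])
        if len(labels) != 1:
            return None
        label = labels[0]
    # Raw bytes must decode as UTF-8
    if isinstance(label, bytes):
        try:
            label = label.decode("utf-8")
        except UnicodeDecodeError:
            return None
    if not isinstance(label, str) or len(label) == 0:
        return None
    # A str can still carry lone surrogates that cannot be encoded as UTF-8
    try:
        label.encode("utf-8")
    except UnicodeEncodeError:
        return None
    return label


def is_valid_box(box: Box) -> bool:
    """A box with zero or negative width/height would give an empty crop."""
    xmin, ymin, xmax, ymax = box
    return xmax > xmin and ymax > ymin


def filter_samples(samples: List[Tuple[Box, RawLabel]]) -> List[Tuple[Box, str]]:
    """Keep only samples with a non-empty crop and a usable string label."""
    kept = []
    for box, label in samples:
        text = normalize_label(label)
        if text is not None and is_valid_box(box):
            kept.append((box, text))
    return kept
```

Filtering at dataset construction time, rather than in the eval scripts, would keep the behaviour identical for TF and PT.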
detection:
- CORD (PT/TF) [datasets] update IC / SROIE / FUNSD / CORD #983
- FUNSD (PT/TF) [datasets] update IC / SROIE / FUNSD / CORD #983
- IC03 (PT/TF) [datasets] update IC / SROIE / FUNSD / CORD #983
- IC13 (PT/TF) validated by: @felixdittrich92
- IIIT5K (PT/TF) validated by: @felixdittrich92
- IMGUR5K (PT/TF) validated by: @felixdittrich92
- SROIE (PT/TF) [datasets] update IC / SROIE / FUNSD / CORD #983
- SVHN (PT/TF) validated by: @felixdittrich92
- SVT (PT/TF) [Fix] SVT dataset: clip box values and add shape and label check #955
- SynthText (PT/TF) validated by: @felixdittrich92
recognition:
- MJSynth (PT/TF) [Fix] MJSynth dataset: filter corrupted or missing images #956
- CORD (PT/TF) [datasets] update IC / SROIE / FUNSD / CORD #983
- FUNSD (PT/TF) [datasets] update IC / SROIE / FUNSD / CORD #983
- IC03 (PT/TF) [datasets] update IC / SROIE / FUNSD / CORD #983
- IC13 (PT/TF) validated by: @felixdittrich92
- IIIT5K (PT/TF) validated by: @felixdittrich92
- IMGUR5K (PT/TF) [datasets] Fix recognition parts of SynthText and IMGUR5K #1038
- SROIE (PT/TF) [datasets] update IC / SROIE / FUNSD / CORD #983 (NOTE: 99% of samples contain whitespace, so excluding them is not possible)
- SVHN (PT/TF) [datasets] revert whitespace filtering and fix svhn reco #987
- SVT (PT/TF) [Fix] SVT dataset: clip box values and add shape and label check #955
- SynthText (PT/TF) [datasets] Fix recognition parts of SynthText and IMGUR5K #1038
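
For the "clip box values and add shape and label check" style of fix mentioned for SVT, a rough sketch of what such a check could look like follows. The function names and the absolute-pixel `(xmin, ymin, xmax, ymax)` layout are assumptions for illustration, not the implementation from #955.

```python
from typing import List, Optional, Tuple

# Hypothetical box layout for this sketch: absolute pixels, (xmin, ymin, xmax, ymax)
Box = Tuple[float, float, float, float]


def clip_box(box: Box, width: float, height: float) -> Box:
    """Clamp box coordinates to the image bounds [0, width] x [0, height]."""
    xmin, ymin, xmax, ymax = box
    return (
        min(max(xmin, 0), width),
        min(max(ymin, 0), height),
        min(max(xmax, 0), width),
        min(max(ymax, 0), height),
    )


def check_annotation(box: Box, label: str, width: float, height: float) -> Optional[Tuple[Box, str]]:
    """Clip the box, then reject it if the clipped shape or the label is empty."""
    xmin, ymin, xmax, ymax = clip_box(box, width, height)
    if xmax <= xmin or ymax <= ymin:
        return None  # box lies outside the image or has no area
    if len(label) == 0:
        return None
    return (xmin, ymin, xmax, ymax), label


def clean_annotations(
    annotations: List[Tuple[Box, str]], width: float, height: float
) -> List[Tuple[Box, str]]:
    """Apply check_annotation to every (box, label) pair and drop rejects."""
    checked = (check_annotation(b, t, width, height) for b, t in annotations)
    return [item for item in checked if item is not None]
```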