📚 Do-GOOD: Towards Distribution Shift Evaluation for Pre-Trained Visual Document Understanding Models
The Do-GOOD repository evaluates distribution shifts for pre-trained visual document understanding models across three modalities: image, layout, and text. It covers the generation of nine kinds of OOD data, the application of five distribution shifts, the acquisition of the manually labeled FUNSD-H and FUNSD-R datasets, the generation of the FUNSD-L dataset, and code for running two OOD baseline methods, Deep CORAL and Mixup, under all shifts.
The shift types covered by the Do-GOOD benchmark are shown in the figure below.
This code was developed with:

```
python 3.9.11
transformers 4.24.0
pytesseract 0.3.9
tesseract 0.1.3
textattack 0.3.7
yarl 1.7.2
detectron2 0.6
editdistance 0.6.0
einops 0.4.1
```
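A minimal setup sketch, assuming a pip-based environment (detectron2 usually needs to be installed following its own instructions, and the Tesseract OCR engine itself is a system-level install):

```
pip install transformers==4.24.0 pytesseract==0.3.9 yarl==1.7.2 editdistance==0.6.0 einops==0.4.1
```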
Installation: clone the project. If you want to study model robustness to text shifts, you also need to install TextAttack.

```
git clone https://anonymous.4open.science/r/Do-GOOD-D88A && cd Do-GOOD
```
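TextAttack can typically be installed from PyPI, pinned to the version listed above:

```
pip install textattack==0.3.7
```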
We provide the manually labeled FUNSD-H and FUNSD-R datasets, which can be obtained from the links below, as well as methods for generating the FUNSD-L, CDIP-L, CDIP-I1, and CDIP-I2 datasets.
| Dataset | Header | Question | Answer | Other | Total | Link |
|---|---|---|---|---|---|---|
| FUNSD | 122 | 1077 | 821 | 312 | 2332 | download |
| FUNSD-H | 126 | 981 | 755 | 380 | 2304 | download |
| FUNSD-R | 90 | 475 | 445 | 471 | 1487 | download |
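The label counts above can be reproduced from FUNSD-style annotations, which store each document's entities in a `form` list with a `label` field. A minimal sketch (the annotation directory path is a placeholder):

```python
import json
from collections import Counter
from pathlib import Path

def count_entity_labels(ann_dir):
    """Count entity labels (header/question/answer/other) across all
    FUNSD-style annotation files in a directory."""
    counts = Counter()
    for path in Path(ann_dir).glob("*.json"):
        ann = json.loads(path.read_text(encoding="utf-8"))
        for entity in ann["form"]:  # FUNSD stores entities in a "form" list
            counts[entity["label"]] += 1
    return counts

print(count_entity_labels("FUNSD-H/annotations"))  # placeholder path
```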
First, generate the strong and weak semantic entity libraries, which produces the following files: `/weak_other_map`, `/strong_answer_map`, `/strong_question_map`, `/weak_Q_map`, `/weak_A_map`. We provide five strong/weak semantic entity libraries extracted with our shuffle-layout method on the FUNSD test set, one for each of five different pre-trained models. Fill `v3`, `v2`, `v1`, `bros`, or `lilt` into `{ }` and run:

```
python map_{ }_funsd_L.py
```
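For example, to build the entity libraries for LayoutLMv3:

```
python map_v3_funsd_L.py
```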
Then modify the file paths to generate the FUNSD-L test data, which is saved in `mix_test.txt`. You can adjust the number of rows and columns in the generated layout, the bounding-box size, the probability of random filling, and the number of generated documents:

```
generate_ood_data("mix_test.txt", "/strong_question_map",
                  "/strong_answer_map", "/weak_Q_map",
                  "/weak_A_map", "/weak_other_map", 50)
```

```
python gen_ood_mix.py
```
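For intuition, here is a heavily simplified sketch of such a generator. The parameter names, the map-file format (assumed to be a JSON list of entity strings), and the output format below are assumptions for illustration, not the actual implementation in `gen_ood_mix.py`:

```python
import json
import random

def generate_ood_data(out_file, strong_q_map, strong_a_map, weak_q_map,
                      weak_a_map, weak_other_map, n_docs,
                      rows=10, cols=4, box_w=120, box_h=30, fill_p=0.7):
    """Sketch: place entities sampled from the strong/weak semantic entity
    libraries onto a rows x cols grid of fixed-size bounding boxes."""
    maps = {}
    for name, path in [("strong_question", strong_q_map),
                       ("strong_answer", strong_a_map),
                       ("weak_Q", weak_q_map),
                       ("weak_A", weak_a_map),
                       ("weak_other", weak_other_map)]:
        with open(path, encoding="utf-8") as f:
            maps[name] = json.load(f)  # assumed: JSON list of strings
    with open(out_file, "w", encoding="utf-8") as f:
        for _ in range(n_docs):
            for r in range(rows):
                for c in range(cols):
                    if random.random() > fill_p:  # leave some cells empty
                        continue
                    label = random.choice(list(maps))
                    text = random.choice(maps[label])
                    x0, y0 = c * box_w, r * box_h
                    f.write(f"{text}\t{label}\t{x0} {y0} {x0 + box_w} {y0 + box_h}\n")
```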
For ease of use, we place this script separately in the main directory. Adjust two parameters: `lamda1` controls the horizontal distance and `lamda2` controls the vertical distance. Merging follows a priority order: horizontal first, then vertical.

```
python merge_layout.py
```
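A minimal sketch of the horizontal-first merge rule (the box format `(x0, y0, x1, y1)` and the exact adjacency tests are assumptions; see `merge_layout.py` for the actual logic):

```python
def merge_once(boxes, lamda1, lamda2):
    """Merge one pair of boxes: horizontally adjacent pairs (gap < lamda1)
    take priority over vertically adjacent pairs (gap < lamda2)."""
    def union(a, b):
        return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))
    # Pass 1: horizontal merges have priority.
    for i, a in enumerate(boxes):
        for j, b in enumerate(boxes):
            if i == j:
                continue
            same_row = not (a[3] < b[1] or b[3] < a[1])  # vertical overlap
            if same_row and 0 <= b[0] - a[2] < lamda1:   # horizontal gap
                rest = [x for k, x in enumerate(boxes) if k not in (i, j)]
                return rest + [union(a, b)], True
    # Pass 2: vertical merges only if no horizontal merge was possible.
    for i, a in enumerate(boxes):
        for j, b in enumerate(boxes):
            if i == j:
                continue
            same_col = not (a[2] < b[0] or b[2] < a[0])  # horizontal overlap
            if same_col and 0 <= b[1] - a[3] < lamda2:   # vertical gap
                rest = [x for k, x in enumerate(boxes) if k not in (i, j)]
                return rest + [union(a, b)], True
    return boxes, False

def merge_layout(boxes, lamda1, lamda2):
    """Repeat single merges until no pair is close enough to merge."""
    merged = True
    while merged:
        boxes, merged = merge_once(boxes, lamda1, lamda2)
    return boxes

print(merge_layout([(0, 0, 50, 20), (55, 0, 100, 20)], lamda1=10, lamda2=5))
```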
Separate text pixels from non-text pixels in the document image, then overlay the text onto natural-scene MSCOCO images:

```
python mixup_image.py
```
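A rough sketch of the idea with OpenCV (using Otsu thresholding as the text/non-text separator is an assumption, and the file paths are placeholders):

```python
import cv2

doc = cv2.imread("document.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
bg = cv2.imread("coco_background.jpg")                  # placeholder path
bg = cv2.resize(bg, (doc.shape[1], doc.shape[0]))

# Text pixels are dark on a light page, so an Otsu threshold yields a text mask.
_, mask = cv2.threshold(doc, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Paint the text pixels onto the natural-scene background.
mixed = bg.copy()
mixed[mask > 0] = (0, 0, 0)
cv2.imwrite("mixed.png", mixed)
```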
Using the pre-trained DocGeoNet (see its documentation for the specific process), perform a forward pass on a normal document image to obtain a distorted image, then run OCR again:

```
python inference.py
```
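Re-running OCR on the distorted output could look like this with pytesseract (the image path is a placeholder, and the Tesseract engine must be installed):

```python
import pytesseract
from PIL import Image

img = Image.open("distorted.png")  # placeholder path
# image_to_data returns word-level text together with bounding boxes.
data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)
for text, x, y, w, h in zip(data["text"], data["left"], data["top"],
                            data["width"], data["height"]):
    if text.strip():
        print(text, (x, y, x + w, y + h))
```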
Select the model and task to fill in: the first `{ }` takes `v3`, `v2`, `v1`, `bros`, or `lilt`; the second `{ }` takes `funsd` or `cdip`. Fine-tune your own LayoutLMv3 model or download our fine-tuned model (download), then run:

```
python -m torch.distributed.launch --nproc_per_node --use_env finetune_{ }_{ }.py --config config.yaml --output_dir
```
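For example, fine-tuning LayoutLMv3 on FUNSD (the GPU count and output path here are placeholders):

```
python -m torch.distributed.launch --nproc_per_node=4 --use_env finetune_v3_funsd.py --config config.yaml --output_dir ./output
```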
For VQA tasks, use a separate command line; fill in the selected model at `{ }`:

```
python docvqa_{}_main.py
```
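For example, with LayoutLMv3:

```
python docvqa_v3_main.py
```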
To run the OOD evaluations, select the model and task as before: the first `{ }` takes `v3`, `v2`, `v1`, `bros`, or `lilt`; the second `{ }` takes `funsd` or `cdip`. Modify the following parameters to apply a shift to a single modality: `--text_aug`, `--image_aug`, `--aug_layout`.

```
--text_aug={'WordSwapMaskedLM','WordSwapEmbedding','WordSwapHomoglyphSwap','WordSwapChangeNumber','WordSwapRandomCharacterDeletion'} , --image_aug=True/False , --aug_layout=True/False
```

```
python demo_{ }_ood_{ }.py
```
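For example, evaluating LayoutLMv3 on FUNSD under a text shift only (the word-swap transformation shown is one of the options listed above):

```
python demo_v3_ood_funsd.py --text_aug=WordSwapMaskedLM --image_aug=False --aug_layout=False
```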