📚 Do-GOOD: Towards Distribution Shift Evaluation for Pre-Trained Visual Document Understanding Models
The Do-GOOD repository evaluates distribution shifts for pre-trained visual document understanding models across three modalities: image, layout, and text. It covers the generation of nine kinds of OOD data, the application of five distribution shifts, the acquisition of the manually labeled FUNSD-H and FUNSD-R datasets, the generation of the FUNSD-L dataset, and code for running two OOD baseline methods, Deep CORAL and Mixup, under all shifts.
The shift types covered by the Do-GOOD benchmark are shown in the figure below.
This code was developed with:

```
python 3.9.11
transformers 4.24.0
pytesseract 0.3.9
tesseract 0.1.3
textattack 0.3.7
yarl 1.7.2
detectron2 0.6
editdistance 0.6.0
einops 0.4.1
```
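A minimal setup sketch, assuming a pip-based environment (detectron2 usually needs to be installed following its own instructions, and the Tesseract OCR engine itself is a system-level install):

```
pip install transformers==4.24.0 pytesseract==0.3.9 yarl==1.7.2 editdistance==0.6.0 einops==0.4.1
```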
Installation: clone the project. If you want to study model robustness to text shifts, you also need to install TextAttack.

```
git clone https://anonymous.4open.science/r/Do-GOOD-D88A && cd Do-GOOD
```
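TextAttack can typically be installed from PyPI, pinned to the version listed above:

```
pip install textattack==0.3.7
```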
We provide the manually labeled FUNSD-H and FUNSD-R datasets, which can be obtained from the links below, as well as methods for generating the FUNSD-L, CDIP-L, CDIP-I1, and CDIP-I2 datasets.
| Dataset | Header | Question | Answer | Other | Total | Link |
|---|---|---|---|---|---|---|
| FUNSD | 122 | 1077 | 821 | 312 | 2332 | download |
| FUNSD-H | 126 | 981 | 755 | 380 | 2304 | download |
| FUNSD-R | 90 | 475 | 445 | 471 | 1487 | download |
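The label counts above can be reproduced from FUNSD-style annotations, which store each document's entities in a `form` list with a `label` field. A minimal sketch (the annotation directory path is a placeholder):

```python
import json
from collections import Counter
from pathlib import Path

def count_entity_labels(ann_dir):
    """Count entity labels (header/question/answer/other) across all
    FUNSD-style annotation files in a directory."""
    counts = Counter()
    for path in Path(ann_dir).glob("*.json"):
        ann = json.loads(path.read_text(encoding="utf-8"))
        for entity in ann["form"]:  # FUNSD stores entities in a "form" list
            counts[entity["label"]] += 1
    return counts

print(count_entity_labels("FUNSD-H/annotations"))  # placeholder path
```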
First, generate the strong and weak semantic entity libraries, which produces the following files: `/weak_other_map`, `/strong_answer_map`, `/strong_question_map`, `/weak_Q_map`, `/weak_A_map`. We provide five strong/weak semantic entity libraries extracted with our shuffle-layout method on the FUNSD test set, one for each of five different pre-trained models. Fill `v3`, `v2`, `v1`, `bros`, or `lilt` into `{ }` and run:

```
python map_{ }_funsd_L.py
```
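For example, to build the entity libraries for LayoutLMv3:

```
python map_v3_funsd_L.py
```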
Then modify the file paths to generate the FUNSD-L test data, which is saved in `mix_test.txt`. You can adjust the number of rows and columns in the generated layout, the bounding-box size, the probability of random filling, and the number of generated documents:

```
generate_ood_data("mix_test.txt", "/strong_question_map",
                  "/strong_answer_map", "/weak_Q_map",
                  "/weak_A_map", "/weak_other_map", 50)
```

```
python gen_ood_mix.py
```
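For intuition, here is a heavily simplified sketch of such a generator. The parameter names, the map-file format (assumed to be a JSON list of entity strings), and the output format below are assumptions for illustration, not the actual implementation in `gen_ood_mix.py`:

```python
import json
import random

def generate_ood_data(out_file, strong_q_map, strong_a_map, weak_q_map,
                      weak_a_map, weak_other_map, n_docs,
                      rows=10, cols=4, box_w=120, box_h=30, fill_p=0.7):
    """Sketch: place entities sampled from the strong/weak semantic entity
    libraries onto a rows x cols grid of fixed-size bounding boxes."""
    maps = {}
    for name, path in [("strong_question", strong_q_map),
                       ("strong_answer", strong_a_map),
                       ("weak_Q", weak_q_map),
                       ("weak_A", weak_a_map),
                       ("weak_other", weak_other_map)]:
        with open(path, encoding="utf-8") as f:
            maps[name] = json.load(f)  # assumed: JSON list of strings
    with open(out_file, "w", encoding="utf-8") as f:
        for _ in range(n_docs):
            for r in range(rows):
                for c in range(cols):
                    if random.random() > fill_p:  # leave some cells empty
                        continue
                    label = random.choice(list(maps))
                    text = random.choice(maps[label])
                    x0, y0 = c * box_w, r * box_h
                    f.write(f"{text}\t{label}\t{x0} {y0} {x0 + box_w} {y0 + box_h}\n")
```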
For ease of use, we place this script separately in the main directory. Adjust two parameters: `lamda1` controls the horizontal distance and `lamda2` controls the vertical distance. Merging follows a priority order: horizontal first, then vertical.

```
python merge_layout.py
```
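A minimal sketch of the horizontal-first merge rule (the box format `(x0, y0, x1, y1)` and the exact adjacency tests are assumptions; see `merge_layout.py` for the actual logic):

```python
def merge_once(boxes, lamda1, lamda2):
    """Merge one pair of boxes: horizontally adjacent pairs (gap < lamda1)
    take priority over vertically adjacent pairs (gap < lamda2)."""
    def union(a, b):
        return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))
    # Pass 1: horizontal merges have priority.
    for i, a in enumerate(boxes):
        for j, b in enumerate(boxes):
            if i == j:
                continue
            same_row = not (a[3] < b[1] or b[3] < a[1])  # vertical overlap
            if same_row and 0 <= b[0] - a[2] < lamda1:   # horizontal gap
                rest = [x for k, x in enumerate(boxes) if k not in (i, j)]
                return rest + [union(a, b)], True
    # Pass 2: vertical merges only if no horizontal merge was possible.
    for i, a in enumerate(boxes):
        for j, b in enumerate(boxes):
            if i == j:
                continue
            same_col = not (a[2] < b[0] or b[2] < a[0])  # horizontal overlap
            if same_col and 0 <= b[1] - a[3] < lamda2:   # vertical gap
                rest = [x for k, x in enumerate(boxes) if k not in (i, j)]
                return rest + [union(a, b)], True
    return boxes, False

def merge_layout(boxes, lamda1, lamda2):
    """Repeat single merges until no pair is close enough to merge."""
    merged = True
    while merged:
        boxes, merged = merge_once(boxes, lamda1, lamda2)
    return boxes

print(merge_layout([(0, 0, 50, 20), (55, 0, 100, 20)], lamda1=10, lamda2=5))
```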
Separate text pixels from non-text pixels in the document image, then overlay the text onto natural-scene MSCOCO images:

```
python mixup_image.py
```
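A rough sketch of the idea with OpenCV (using Otsu thresholding as the text/non-text separator is an assumption, and the file paths are placeholders):

```python
import cv2

doc = cv2.imread("document.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
bg = cv2.imread("coco_background.jpg")                  # placeholder path
bg = cv2.resize(bg, (doc.shape[1], doc.shape[0]))

# Text pixels are dark on a light page, so an Otsu threshold yields a text mask.
_, mask = cv2.threshold(doc, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Paint the text pixels onto the natural-scene background.
mixed = bg.copy()
mixed[mask > 0] = (0, 0, 0)
cv2.imwrite("mixed.png", mixed)
```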
Using the pre-trained DocGeoNet (see its documentation for the specific process), perform a forward pass on a normal document image to obtain a distorted image, then run OCR again:

```
python inference.py
```
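Re-running OCR on the distorted output could look like this with pytesseract (the image path is a placeholder, and the Tesseract engine must be installed):

```python
import pytesseract
from PIL import Image

img = Image.open("distorted.png")  # placeholder path
# image_to_data returns word-level text together with bounding boxes.
data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)
for text, x, y, w, h in zip(data["text"], data["left"], data["top"],
                            data["width"], data["height"]):
    if text.strip():
        print(text, (x, y, x + w, y + h))
```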
Select the model and task to fill in: the first `{ }` takes `v3`, `v2`, `v1`, `bros`, or `lilt`; the second `{ }` takes `funsd` or `cdip`. Fine-tune your own LayoutLMv3 model or download our fine-tuned model (download), then run:

```
python -m torch.distributed.launch --nproc_per_node --use_env finetune_{ }_{ }.py --config config.yaml --output_dir
```
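For example, fine-tuning LayoutLMv3 on FUNSD (the GPU count and output path here are placeholders):

```
python -m torch.distributed.launch --nproc_per_node=4 --use_env finetune_v3_funsd.py --config config.yaml --output_dir ./output
```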
For VQA tasks, use a separate command line; fill in the selected model at `{ }`:

```
python docvqa_{}_main.py
```
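For example, with LayoutLMv3:

```
python docvqa_v3_main.py
```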
To run the OOD evaluations, select the model and task as before: the first `{ }` takes `v3`, `v2`, `v1`, `bros`, or `lilt`; the second `{ }` takes `funsd` or `cdip`. Modify the following parameters to apply a shift to a single modality: `--text_aug`, `--image_aug`, `--aug_layout`.

```
--text_aug={'WordSwapMaskedLM','WordSwapEmbedding','WordSwapHomoglyphSwap','WordSwapChangeNumber','WordSwapRandomCharacterDeletion'} , --image_aug=True/False , --aug_layout=True/False
```

```
python demo_{ }_ood_{ }.py
```
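For example, evaluating LayoutLMv3 on FUNSD under a text shift only (the word-swap transformation shown is one of the options listed above):

```
python demo_v3_ood_funsd.py --text_aug=WordSwapMaskedLM --image_aug=False --aug_layout=False
```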