Unofficial implementation of DocMAE: Document Image Rectification via Self-supervised Representation Learning
https://arxiv.org/abs/2304.10341
- Document background segmentation network using U²-Net
- Synthetic data generation for self-supervised pre-training
- Pre-training
- Fine-tuning for document rectification (In progress)
- Evaluation
- Code clean up and documentation
- Model release
A Jupyter notebook demonstrating background segmentation is available at demo/background_segmentation.ipynb
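For orientation, here is a minimal sketch of the segmentation step, assuming the reference U²-Net implementation (`model.U2NET` from https://github.com/xuebinqin/U-2-Net) and a local checkpoint; the notebook is the authoritative version, and details here may differ from this repo's code:

```python
# Minimal background-segmentation sketch. Assumes the reference U^2-Net
# implementation and a local checkpoint; see demo/background_segmentation.ipynb
# for the real pipeline.
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import transforms

from model import U2NET  # import path from the U^2-Net repo (assumption)

def segment_document(image_path: str, checkpoint: str = "u2net.pth") -> torch.Tensor:
    """Return a soft foreground mask (H, W) in [0, 1] for the document."""
    net = U2NET(3, 1)
    net.load_state_dict(torch.load(checkpoint, map_location="cpu"))
    net.eval()

    image = Image.open(image_path).convert("RGB")
    tf = transforms.Compose([
        transforms.Resize((320, 320)),  # U^2-Net's usual input size
        transforms.ToTensor(),
    ])
    x = tf(image).unsqueeze(0)

    with torch.no_grad():
        d0, *_ = net(x)  # first side output is the fused prediction

    mask = d0[:, 0]  # (1, 320, 320); sigmoid is applied inside the reference model
    mask = (mask - mask.min()) / (mask.max() - mask.min() + 1e-8)
    # Resize back to the original (H, W) resolution
    mask = F.interpolate(mask.unsqueeze(0), size=image.size[::-1], mode="bilinear")[0, 0]
    return mask
```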
- 3,411,482 pages from ~1M documents in the DocILE dataset (https://github.com/rossumai/docile)
- Rendered with the doc3D renderer (https://github.com/Dawars/doc3D-renderer)
- 558 HDR environment lighting maps from https://hdri-haven.com/
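The on-disk layout produced by the renderer is not documented here; the loader below is a hypothetical sketch that assumes paired RGB renders and backward maps in an `img/` + `bm/` layout similar to the original doc3D dataset, with backward maps stored as `.npy`:

```python
# Hypothetical loader for the rendered data. The img/ + bm/ layout, file
# naming, and .npy storage format are assumptions, not this repo's API.
from pathlib import Path

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset

class RenderedDocDataset(Dataset):
    def __init__(self, root: str):
        self.images = sorted(Path(root, "img").glob("*.png"))

    def __len__(self) -> int:
        return len(self.images)

    def __getitem__(self, idx: int):
        img_path = self.images[idx]
        image = torch.from_numpy(
            np.asarray(Image.open(img_path).convert("RGB"), dtype=np.float32) / 255.0
        ).permute(2, 0, 1)
        # Backward map: per-pixel (x, y) sampling coordinates into the flat
        # document, used as the rectification target during fine-tuning.
        bm_path = img_path.parent.parent / "bm" / (img_path.stem + ".npy")
        bm = np.load(bm_path)
        return image, torch.from_numpy(bm.astype(np.float32))
```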
Pre-training on 200k documents:
`python pretrain.py -c config/config.json`
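The real training loop lives in `pretrain.py` and is driven by `config/config.json`. As a rough sketch of what one MAE pre-training step involves, here is a minimal loop built on Hugging Face's ViTMAE (the implementation the visualization notebook below also relies on); the hyperparameters and dummy data are assumptions, not values from the config:

```python
# Sketch of MAE pre-training with Hugging Face's ViTMAE. Hyperparameters
# below are assumptions; the real settings live in config/config.json.
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import ViTMAEConfig, ViTMAEForPreTraining

config = ViTMAEConfig(mask_ratio=0.75)  # assumed masking ratio
model = ViTMAEForPreTraining(config)
optimizer = torch.optim.AdamW(model.parameters(), lr=1.5e-4)

# Stand-in data: in the real script this would be the rendered documents.
dataset = TensorDataset(torch.rand(8, 3, 224, 224))
loader = DataLoader(dataset, batch_size=4, shuffle=True)

model.train()
for (images,) in loader:
    outputs = model(pixel_values=images)  # masks patches, reconstructs pixels
    outputs.loss.backward()               # MSE on the masked patches only
    optimizer.step()
    optimizer.zero_grad()
```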
Visualize the trained model with the ViT-MAE visualization demo: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/ViTMAE/ViT_MAE_visualization_demo.ipynb
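Condensed from that notebook, the core of the visualization is a single forward pass followed by un-patchifying the decoder output (the checkpoint path below is a placeholder):

```python
# Reconstruct a masked input with a pre-trained MAE, following the linked
# ViT_MAE_visualization_demo notebook. Checkpoint path is a placeholder.
import torch
from transformers import ViTMAEForPreTraining

model = ViTMAEForPreTraining.from_pretrained("path/to/checkpoint")  # placeholder
model.eval()

pixel_values = torch.rand(1, 3, 224, 224)  # stand-in for a preprocessed page
with torch.no_grad():
    outputs = model(pixel_values=pixel_values)

reconstruction = model.unpatchify(outputs.logits)  # back to (1, 3, H, W) pixels
mask = outputs.mask  # 1 where a patch was masked, 0 where it was visible
```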
Test documents come from the DIR300 dataset (https://github.com/fh2019ustc/DocGeoNet).
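On DIR300, rectification quality is commonly reported as MS-SSIM and Local Distortion. As a rough sanity check while evaluation is in progress, MS-SSIM between a rectified page and its flat reference can be computed with the third-party `pytorch_msssim` package (an assumption; the eventual evaluation script may compute it differently):

```python
# Minimal MS-SSIM check between a rectified page and its ground-truth scan,
# using the third-party pytorch_msssim package (an assumption; this repo's
# evaluation code may differ). Local Distortion is not covered here.
import numpy as np
import torch
from PIL import Image
from pytorch_msssim import ms_ssim

def load_gray(path: str, size=(598, 688)) -> torch.Tensor:
    """Load an image as a (1, 1, H, W) grayscale tensor in [0, 255]."""
    img = Image.open(path).convert("L").resize(size)
    return torch.from_numpy(np.asarray(img, dtype=np.float32))[None, None]

rectified = load_gray("results/sample_rectified.png")  # placeholder paths
reference = load_gray("DIR300/gt/sample.png")
score = ms_ssim(rectified, reference, data_range=255)
print(f"MS-SSIM: {score.item():.4f}")
```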