DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Transformer

This is a fork of the official repo for the paper "DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Transformer", which was accepted to AAAI 2023.

Introduction

Abstract. Recently, Transformer-based methods, which predict polygon points or Bezier curve control points for localizing texts, are popular in scene text detection. However, these methods built upon detection transformer framework might achieve sub-optimal training efficiency and performance due to coarse positional query modeling. In addition, the point label form exploited in previous works implies the reading order of humans, which impedes the detection robustness from our observation. To address these challenges, this paper proposes a concise Dynamic Point Text DEtection TRansformer network, termed DPText-DETR. In detail, DPText-DETR directly leverages explicit point coordinates to generate position queries and dynamically updates them in a progressive way. Moreover, to improve the spatial inductive bias of non-local self-attention in Transformer, we present an Enhanced Factorized Self-Attention module which provides point queries within each instance with circular shape guidance. Furthermore, we design a simple yet effective positional label form to tackle the side effect of the previous form. To further evaluate the impact of different label forms on the detection robustness in real-world scenario, we establish an Inverse-Text test set containing 500 manually labeled images. Extensive experiments prove the high training efficiency, robustness, and state-of-the-art performance of our method on popular benchmarks.

Updates

[Apr.04, 2024] Repo forked from main repo. This fork may not reflect changes in the main repo from this point.

[Mar.07, 2023] The code and models of our latest work DeepSolo (CVPR 2023, Code) are released. 🔥🔥

[Nov.29, 2022] The code and models are released. The arXiv version of the paper is updated.

[Jul.12, 2022] Inverse-Text is available.

[Jul.10, 2022] The paper is submitted to arXiv. Inverse-Text test set will be available very soon. Work in progress.

Main Results

Benchmark	Backbone	Precision	Recall	F-measure	Pre-trained Model	Fine-tuned Model
Total-Text	Res50	91.8	86.4	89.0	OneDrive/Baidu(osxo)	OneDrive/Baidu(p268)
CTW1500	Res50	91.7	86.2	88.8	The same as above ↑	OneDrive/Baidu(disp)
ICDAR19 ArT	Res50	83.0	73.7	78.1	OneDrive/Baidu(7sfe)	OneDrive/Baidu(z8if)
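
As a reading aid for the table above, the sketch below recomputes the F-measure column from the Precision and Recall columns, assuming the standard definition (the harmonic mean of precision and recall). The function name `f_measure` and the `rows` dictionary are illustrative and are not part of this repository. Because the table reports rounded precision and recall, a recomputed value can differ from the listed one in the last digit.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in the same units, e.g. percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the table above.
rows = {
    "Total-Text": (91.8, 86.4),
    "CTW1500": (91.7, 86.2),
    "ICDAR19 ArT": (83.0, 73.7),
}

if __name__ == "__main__":
    for benchmark, (p, r) in rows.items():
        # Values are computed from rounded inputs, so small deviations are expected.
        print(f"{benchmark}: F-measure = {f_measure(p, r):.2f}")
```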

Installation

git clone https://github.com/maps-as-data/DPText-DETR.git
cd DPText-DETR
pip install .
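
After running the commands above, one way to check that the install succeeded is to ask Python whether the package can be located. The sketch below is a minimal check using only the standard library. The import name `dptext_detr` is an assumption based on the repository's `src/dptext_detr` directory, and the helper `is_installed` is hypothetical, not something this repository provides.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module with this name can be found on the current Python path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumed import name, inferred from the src/dptext_detr directory.
    name = "dptext_detr"
    print(f"{name} importable: {is_installed(name)}")
```

If this prints `False`, the package was probably installed into a different environment than the one running the script.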

Citation

If you find DPText-DETR useful in your research, please consider citing:

@inproceedings{ye2022dptext,
  title={DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Transformer},
  author={Ye, Maoyuan and Zhang, Jing and Zhao, Shanshan and Liu, Juhua and Du, Bo and Tao, Dacheng},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={37},
  number={3},
  pages={3241--3249},
  year={2023}
}

Acknowledgement

DPText-DETR is heavily inspired by Deformable DETR, DAB-DETR, and TESTR. Thanks for their great work!
