DocTron-Formula: Generalized Formula Recognition in Complex and Structured Scenarios

Yufeng Zhong, Zhixiong Zeng†, Lei Chen, Longrong Yang, Liming Zheng, Jing Huang, Siqi Yang, Lin Ma*

Meituan Group

† Project Leader; * Corresponding Author

Optical Character Recognition (OCR) for mathematical formula is essential for the intelligent analysis of scientific literature. However, both task-specific and general vision-language models often struggle to handle the structural diversity, complexity, and real-world variability inherent in mathematical content. In this work, we present DocTron-Formula, a unified framework built upon general vision-language models, thereby eliminating the need for specialized architectures. Furthermore, we introduce CSFormula, a large-scale and challenging dataset that encompasses multidisciplinary and structurally complex formulas at the line, paragraph, and page levels. Through straightforward supervised fine-tuning, our approach achieves state-of-the-art performance across a variety of styles, scientific domains, and complex layouts. Experimental results demonstrate that our method not only surpasses specialized models in terms of accuracy and robustness, but also establishes a new paradigm for the automated understanding of complex scientific documents.

📢 News and Updates

2025.08.01 We have released our model weights (DocTron-Formula) and an interactive Demo on Hugging Face.
2025.08.01 🔥🔥🔥 We release the technical report of DocTron-Formula at arXiv link.

🤗 Models

Model	Download Link
DocTron-Formula	DocTron/DocTron-Formula

The DocTron-Formula is Qwen2.5-VL-7B-Instruct fine-tuned via supervised learning on the Im2LaTeX-160k, the UniMER, and the CSFormula datasets.

📊 Performance

🔍 Usage Example

Clone the repo and download the model

git clone https://github.com/DocTron-hub/DocTron-Formula.git

Installation

conda create -n DTFormula python=3.10
conda activate DTFormula

pip install qwen_vl_utils torch transformers rapidfuzz

The following are three simple examples of how to use DocTron-Formula to predict LaTeX code from an image at the line level, paragraph level, and page level. If you want to test other cases, please first organize your data in JSON format, such as asset/test_jsons/line-level.json.

python demo.py --input_file line-level        # Test the line-level case
python demo.py --input_file paragraph-level   # Test the paragraph-level case
python demo.py --input_file page-level        # Test the page-level case

📌 Acknowledgement

We sincerely appreciate LLaMA-Factory for providing reference training framework.

📖 Citation

If you find this project useful, please feel free to leave a star and cite our paper:

@misc{zhong2025doctronformulageneralizedformularecognition,
      title={DocTron-Formula: Generalized Formula Recognition in Complex and Structured Scenarios}, 
      author={Yufeng Zhong and Zhixiong Zeng and Lei Chen and Longrong Yang and Liming Zheng and Jing Huang and Siqi Yang and Lin Ma},
      year={2025},
      eprint={2508.00311},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2508.00311}, 
}

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
assets		assets
README.md		README.md
demo.py		demo.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

DocTron-Formula: Generalized Formula Recognition in Complex and Structured Scenarios

📢 News and Updates

🤗 Models

📊 Performance

🔍 Usage Example

Clone the repo and download the model

Installation

📌 Acknowledgement

📖 Citation

About

Uh oh!

Releases

Packages

Contributors 2

Languages

DocTron-hub/DocTron-Formula

Folders and files

Latest commit

History

Repository files navigation

DocTron-Formula: Generalized Formula Recognition in Complex and Structured Scenarios

📢 News and Updates

🤗 Models

📊 Performance

🔍 Usage Example

Clone the repo and download the model

Installation

📌 Acknowledgement

📖 Citation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages