Optical Character Recognition (OCR) for mathematical formula is essential for the intelligent analysis of scientific literature. However, both task-specific and general vision-language models often struggle to handle the structural diversity, complexity, and real-world variability inherent in mathematical content. In this work, we present DocTron-Formula, a unified framework built upon general vision-language models, thereby eliminating the need for specialized architectures. Furthermore, we introduce CSFormula, a large-scale and challenging dataset that encompasses multidisciplinary and structurally complex formulas at the line, paragraph, and page levels. Through straightforward supervised fine-tuning, our approach achieves state-of-the-art performance across a variety of styles, scientific domains, and complex layouts. Experimental results demonstrate that our method not only surpasses specialized models in terms of accuracy and robustness, but also establishes a new paradigm for the automated understanding of complex scientific documents.
2025.08.01
We have released our model weights (DocTron-Formula) and an interactive Demo on Hugging Face.2025.08.01
🔥🔥🔥 We release the technical report of DocTron-Formula at arXiv link.
Model | Download Link |
---|---|
DocTron-Formula | DocTron/DocTron-Formula |
The DocTron-Formula
is Qwen2.5-VL-7B-Instruct fine-tuned via supervised learning on the Im2LaTeX-160k, the UniMER, and the CSFormula datasets.
git clone https://github.com/DocTron-hub/DocTron-Formula.git
conda create -n DTFormula python=3.10
conda activate DTFormula
pip install qwen_vl_utils torch transformers rapidfuzz
The following are three simple examples of how to use DocTron-Formula to predict LaTeX code from an image at the line level, paragraph level, and page level. If you want to test other cases, please first organize your data in JSON format, such as asset/test_jsons/line-level.json
.
python demo.py --input_file line-level # Test the line-level case
python demo.py --input_file paragraph-level # Test the paragraph-level case
python demo.py --input_file page-level # Test the page-level case
We sincerely appreciate LLaMA-Factory for providing reference training framework.
If you find this project useful, please feel free to leave a star and cite our paper:
@misc{zhong2025doctronformulageneralizedformularecognition,
title={DocTron-Formula: Generalized Formula Recognition in Complex and Structured Scenarios},
author={Yufeng Zhong and Zhixiong Zeng and Lei Chen and Longrong Yang and Liming Zheng and Jing Huang and Siqi Yang and Lin Ma},
year={2025},
eprint={2508.00311},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2508.00311},
}