Skip to content

DocTron-hub/DocTron-Formula

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 

Repository files navigation

DocTron-Formula: Generalized Formula Recognition in Complex and Structured Scenarios

Yufeng Zhong, Zhixiong Zeng†, Lei Chen, Longrong Yang, Liming Zheng, Jing Huang, Siqi Yang, Lin Ma*
Meituan Group
† Project Leader; * Corresponding Author

Optical Character Recognition (OCR) for mathematical formula is essential for the intelligent analysis of scientific literature. However, both task-specific and general vision-language models often struggle to handle the structural diversity, complexity, and real-world variability inherent in mathematical content. In this work, we present DocTron-Formula, a unified framework built upon general vision-language models, thereby eliminating the need for specialized architectures. Furthermore, we introduce CSFormula, a large-scale and challenging dataset that encompasses multidisciplinary and structurally complex formulas at the line, paragraph, and page levels. Through straightforward supervised fine-tuning, our approach achieves state-of-the-art performance across a variety of styles, scientific domains, and complex layouts. Experimental results demonstrate that our method not only surpasses specialized models in terms of accuracy and robustness, but also establishes a new paradigm for the automated understanding of complex scientific documents.

📢 News and Updates

  • 2025.08.01 We have released our model weights (DocTron-Formula) and an interactive Demo on Hugging Face.
  • 2025.08.01 🔥🔥🔥 We release the technical report of DocTron-Formula at arXiv link.

🤗 Models

Model Download Link
DocTron-Formula DocTron/DocTron-Formula

The DocTron-Formula is Qwen2.5-VL-7B-Instruct fine-tuned via supervised learning on the Im2LaTeX-160k, the UniMER, and the CSFormula datasets.

📊 Performance

🔍 Usage Example

Clone the repo and download the model

git clone https://github.com/DocTron-hub/DocTron-Formula.git

Installation

conda create -n DTFormula python=3.10
conda activate DTFormula

pip install qwen_vl_utils torch transformers rapidfuzz

The following are three simple examples of how to use DocTron-Formula to predict LaTeX code from an image at the line level, paragraph level, and page level. If you want to test other cases, please first organize your data in JSON format, such as asset/test_jsons/line-level.json.

python demo.py --input_file line-level        # Test the line-level case
python demo.py --input_file paragraph-level   # Test the paragraph-level case
python demo.py --input_file page-level        # Test the page-level case

📌 Acknowledgement

We sincerely appreciate LLaMA-Factory for providing reference training framework.

📖 Citation

If you find this project useful, please feel free to leave a star and cite our paper:

@misc{zhong2025doctronformulageneralizedformularecognition,
      title={DocTron-Formula: Generalized Formula Recognition in Complex and Structured Scenarios}, 
      author={Yufeng Zhong and Zhixiong Zeng and Lei Chen and Longrong Yang and Liming Zheng and Jing Huang and Siqi Yang and Lin Ma},
      year={2025},
      eprint={2508.00311},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2508.00311}, 
}

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages