    Repositories list

    • SigBench

      Public
      GNU General Public License v3.0
      0000 Updated Jun 1, 2025
    • [arXiv: 2505.17163] OCR-Reasoning Benchmark: Unveiling the True Capabilities of MLLMs in Complex Text-Rich Image Reasoning
      Python
      Apache License 2.0
      23710 Updated May 27, 2025
    • PAVENet

      Public
      [IEEE TPAMI 2025] Official repository of "Privacy-Preserving Biometric Verification With Handwritten Random Digit String".
      Python
      GNU General Public License v3.0
      0600 Updated May 25, 2025
    • A Comprehensive Benchmark for Chinese Long Historical Document Understanding
      Python
      0100 Updated May 23, 2025
    • AutoHDR

      Public
      [ACL 2025 main] The official GitHub page of "Reviving Cultural Heritage: A Novel Approach for Comprehensive Historical Document Restoration"
      0500 Updated May 20, 2025
    • DOLPHIN

      Public
      [IEEE TIFS 2024] Official repository of "Online Writer Retrieval with Chinese Handwritten Phrases: A Synergistic Temporal-Frequency Representation Learning Approach".
      Python
      GNU General Public License v3.0
      0900 Updated May 17, 2025
    • The official GitHub page of "AutoScaler: Self Scale Alignment for Handwritten Mathematical Expression Recognition"
      Python
      0300 Updated May 15, 2025
    • [PR 2025] The official GitHub page of "MegaHan97K: A Large-Scale Dataset for Mega-Category Chinese Character Recognition with over 97K Categories"
      Python
      02000 Updated May 12, 2025
    • ACP-RAG

      Public
      [NAACL 2025] Large-Scale Corpus Construction and Retrieval-Augmented Generation for Ancient Chinese Poetry: New Method and Data Insights (ACP-Corpus; ACP-QA; ACP-RAG)
      Python
      0300 Updated May 6, 2025
    • MCS-Bench

      Public
      Python
      0200 Updated May 6, 2025
    • C3bench

      Public
      C3 benchmark
      0210 Updated Mar 30, 2025
    • HisDoc1B

      Public
      11110 Updated Mar 2, 2025
    • Algorithms, papers, datasets, performance comparisons for Document AI. Continuously updating.
      719300 Updated Mar 1, 2025
    • DCOH-120K

      Public
      1300 Updated Feb 20, 2025
    • RFUND

      Public
      [MM'2024] Official release of RFUND introduced in the MM'2024 paper "PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction"
      01900 Updated Dec 4, 2024
    • [EMNLP 2024] TongGu, a classical Chinese language model.
      13850 Updated Sep 28, 2024
    • WenMind

      Public
      WenMind benchmark.
      Python
      1600 Updated Sep 26, 2024
    • .github

      Public
      0000 Updated Jun 4, 2024
    • SCUT-EnsExam is a real-world handwritten text erasure dataset for examination paper scenarios, consisting of 545 examination paper images. The dataset is randomly divided into a training set of 430 images and a test set of 115 images.
      01100 Updated Dec 5, 2023
    • Evaluation of the Optical Character Recognition (OCR) capabilities of GPT-4V(ision)
      Python
      412400 Updated Nov 13, 2023
    • A CNN model built with PyTorch that reaches 99.7% accuracy
      Python
      2400 Updated May 1, 2021