Releases: opendatalab/MinerU

mineru-2.1.1-released

16 Jul 10:20
0bdbdff

What's Changed

  • 2025/07/16 2.1.1 Released

    • Bug fixes
      • Fixed text block content loss issue that could occur in certain pipeline scenarios #3005
      • Fixed issue where sglang-client required unnecessary packages like torch #2968
      • Updated dockerfile to fix incomplete text content parsing due to missing fonts in Linux #2915
    • Usability improvements
      • Updated compose.yaml so that the sglang-server, mineru-api, and mineru-gradio services can be started directly
      • Launched a new online documentation site and simplified the README for a better documentation experience

Full Changelog: mineru-2.1.0-released...mineru-2.1.1-released

mineru-2.1.0-released

04 Jul 21:27
aa53c9a

What's Changed

  • 2025/07/05 Version 2.1.0 Released

    • This is the first major update to MinerU 2. It brings many new features along with performance optimizations, user experience improvements, and bug fixes, detailed below:
    • Performance Optimizations:
      • Significantly improved preprocessing speed for documents at certain resolutions (around 2000 pixels on the long side).
      • Greatly improved post-processing speed when the pipeline backend batch-processes large numbers of short documents (fewer than 10 pages each).
      • Increased the layout analysis speed of the pipeline backend by approximately 20%.
    • Experience Enhancements:
      • Added a built-in, ready-to-use fastapi service and gradio webui. For detailed usage instructions, please refer to the documentation.
      • Adapted to sglang version 0.4.8, significantly reducing the GPU memory requirements of the vlm-sglang backend. It can now run on graphics cards with as little as 8GB of GPU memory (Turing architecture or newer).
      • Added sglang parameter pass-through to all commands, so the sglang-engine backend accepts all sglang parameters, consistent with sglang-server.
      • Added support for configuration-file-based feature extensions, including custom formula delimiters, enabling heading level classification, and custom local model directories. For detailed usage instructions, please refer to the documentation.
    • New Features:
      • Updated the pipeline backend to the PP-OCRv5 multilingual text recognition model, which supports text recognition in 37 languages, including French, Spanish, Portuguese, Russian, and Korean, with an average accuracy improvement of over 30%.
      • Introduced limited support for vertical text layout in the pipeline backend.

Full Changelog: mineru-2.0.6-released...mineru-2.1.0-released

mineru-2.0.6-released

20 Jun 11:52
48085d0

What's Changed

  • 2025/06/20 2.0.6 Released
    • Fixed occasional parsing interruptions caused by invalid block content in vlm mode #2687 #2749
    • Fixed parsing interruptions caused by incomplete table structures in vlm mode #2690

Full Changelog: mineru-2.0.5-released...mineru-2.0.6-released

mineru-2.0.5-released

17 Jun 14:03
e8865a6

What's Changed

  • 2025/06/17 2.0.5 Released

    • Fixed an issue where sglang-client mode still required models to be downloaded
    • Fixed an issue where sglang-client mode depended on packages such as torch that are not needed at runtime
    • Fixed an issue where only the first instance took effect when launching multiple sglang-client instances via multiple URLs within the same process

Full Changelog: mineru-2.0.3-released...mineru-2.0.5-released

mineru-2.0.3-released

15 Jun 03:15
9f0008a

What's Changed

  • 2025/06/15 2.0.3 Released

    • Fixed a configuration file key-value update error that occurred when the model download type was set to all #2643
    • Fixed an issue where the formula and table feature switches had no effect in command line mode, so the features could not be disabled #2641
    • Fixed compatibility issues with sglang version 0.4.7 in sglang-engine mode #2651
    • Updated the Dockerfile and installation documentation for deploying the full version of MinerU in an sglang environment

Full Changelog: mineru-2.0.0-released...mineru-2.0.3-released

mineru-2.0.0-released

13 Jun 12:59
c5480b9
  • 2025/06/13 2.0.0 Released

    • MinerU 2.0 represents a comprehensive reconstruction and upgrade from architecture to functionality, delivering a more streamlined design, enhanced performance, and more flexible user experience.
      • New Architecture: MinerU 2.0 has been deeply restructured in code organization and interaction methods, significantly improving system usability, maintainability, and extensibility.
        • Removal of Third-party Dependency Limitations: Completely eliminated the dependency on pymupdf, moving the project toward a more open and compliant open-source direction.
        • Ready-to-use, Easy Configuration: No need to manually edit JSON configuration files; most parameters can now be set directly via command line or API.
        • Automatic Model Management: Added automatic model download and update mechanisms, allowing users to complete model deployment without manual intervention.
        • Offline Deployment Friendly: Provides built-in model download commands, supporting deployment requirements in completely offline environments.
        • Streamlined Code Structure: Removed thousands of lines of redundant code and simplified class inheritance, significantly improving code readability and development efficiency.
        • Unified Intermediate Format Output: Adopted the standardized middle_json format, which is compatible with most secondary development built on that format, so downstream projects can migrate seamlessly.
      • New Model: MinerU 2.0 integrates our latest small-parameter, high-performance multimodal document parsing model, achieving end-to-end high-speed, high-precision document understanding.
        • Small Model, Big Capabilities: With fewer than 1B parameters, the model surpasses traditional 72B-class vision-language models (VLMs) in parsing accuracy.
        • Multiple Functions in One: A single model covers multilingual recognition, handwriting recognition, layout analysis, table parsing, formula recognition, reading order sorting, and other core tasks.
        • Ultimate Inference Speed: Achieves peak throughput exceeding 10,000 tokens/s through sglang acceleration on a single NVIDIA 4090 card, easily handling large-scale document processing requirements.
        • Online Demo: You can try this model online through our Hugging Face demo.
      • Incompatible Changes Notice: To improve the overall soundness of the architecture and its long-term maintainability, this version contains some incompatible changes:
        • The Python package name changed from magic-pdf to mineru, and the command-line tool changed from magic-pdf to mineru. Please update your scripts and command calls accordingly.
        • For the sake of modular design and ecosystem consistency, MinerU 2.0 no longer includes the LibreOffice document conversion module. If you need to process Office documents, we recommend converting them to PDF through an independently deployed LibreOffice service before parsing them.
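
Since Office conversion is no longer built in, a pre-conversion step has to happen outside MinerU. The following is a minimal sketch of one way to do that, assuming a local LibreOffice installation whose `soffice` binary is on PATH (the release notes only say "independently deployed LibreOffice service", so a local binary is an assumption). The helper names `build_convert_command` and `office_to_pdf` are hypothetical and not part of MinerU.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source: Path, out_dir: Path, soffice: str = "soffice") -> list[str]:
    """Build a headless LibreOffice command that converts `source` to PDF."""
    return [
        soffice,
        "--headless",            # run without a GUI
        "--convert-to", "pdf",   # target format
        "--outdir", str(out_dir),
        str(source),
    ]


def office_to_pdf(source: Path, out_dir: Path) -> Path:
    """Convert an Office document to PDF and return the expected PDF path.

    Assumes LibreOffice is installed locally; this is not a MinerU API.
    """
    if shutil.which("soffice") is None:
        raise RuntimeError("LibreOffice (soffice) was not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_convert_command(source, out_dir), check=True)
    # LibreOffice names the output after the input file's stem.
    return out_dir / (source.stem + ".pdf")
```

The returned PDF path can then be handed to MinerU like any other PDF input.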

Full Changelog: magic_pdf-1.3.12-released...mineru-2.0.0-released

magic_pdf-1.3.12-released

24 May 08:09
a989444

What's Changed

  • 2025/05/24 1.3.12 Released

    • Added support for ppocrv5 model, updated ch_server model to PP-OCRv5_rec_server and ch_lite model to PP-OCRv5_rec_mobile (model update required)
      • In testing, ppocrv5 (server) showed some improvement on handwritten documents but slightly lower accuracy than v4_server_doc on other document types. The default ch model therefore remains PP-OCRv4_server_rec_doc.
      • Because ppocrv5 improves recognition of handwritten text and special characters, you can manually select the ppocrv5 models for mixed Japanese and Traditional Chinese documents and for handwritten documents
      • You can select a model via the lang parameter, for example lang='ch_server' (Python API) or --lang ch_server (command line):
        • ch: PP-OCRv4_rec_server_doc (default) (Chinese, English, Japanese, Traditional Chinese mixed/15k dictionary)
        • ch_server: PP-OCRv5_rec_server (Chinese, English, Japanese, Traditional Chinese mixed + handwriting/18k dictionary)
        • ch_lite: PP-OCRv5_rec_mobile (Chinese, English, Japanese, Traditional Chinese mixed + handwriting/18k dictionary)
        • ch_server_v4: PP-OCRv4_rec_server (Chinese, English mixed/6k dictionary)
        • ch_lite_v4: PP-OCRv4_rec_mobile (Chinese, English mixed/6k dictionary)
    • Added support for handwritten documents by optimizing layout recognition of handwritten text areas
      • This feature is supported by default, no additional configuration needed
      • You can refer to the instructions above to manually select ppocrv5 model for better handwritten document parsing
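
The lang values listed above can be summarized as a lookup table. The sketch below copies the model names and coverage notes from that list verbatim; the `pick_lang` helper and its parameter names are hypothetical and only illustrate the selection advice given in these notes, they are not part of the magic-pdf API.

```python
# lang value -> (recognition model, coverage), copied from the list above.
OCR_MODELS = {
    "ch": ("PP-OCRv4_rec_server_doc", "Chinese, English, Japanese, Traditional Chinese mixed / 15k dictionary"),
    "ch_server": ("PP-OCRv5_rec_server", "same languages + handwriting / 18k dictionary"),
    "ch_lite": ("PP-OCRv5_rec_mobile", "same languages + handwriting / 18k dictionary"),
    "ch_server_v4": ("PP-OCRv4_rec_server", "Chinese, English mixed / 6k dictionary"),
    "ch_lite_v4": ("PP-OCRv4_rec_mobile", "Chinese, English mixed / 6k dictionary"),
}


def pick_lang(needs_v5: bool = False, lightweight: bool = False) -> str:
    """Hypothetical helper: choose a lang value for the lang parameter.

    needs_v5: handwritten or mixed Japanese/Traditional Chinese documents,
              the cases where these notes suggest selecting ppocrv5 manually.
    lightweight: prefer the mobile model over the server model.
    """
    if not needs_v5:
        return "ch"  # the default stays on the v4 server_doc model
    return "ch_lite" if lightweight else "ch_server"


if __name__ == "__main__":
    lang = pick_lang(needs_v5=True)
    print(lang, "->", OCR_MODELS[lang][0])
```

The chosen value is what would be passed as lang='...' in the Python API or --lang on the command line.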

Full Changelog: magic_pdf-1.3.11-released...magic_pdf-1.3.12-released

magic_pdf-1.3.11-released

14 May 02:37
ea61928

What's Changed

  • Limited the supported Python version to <3.14
  • Added support for torch==2.7
  • Updated pdfminer.six to the latest version

Full Changelog: magic_pdf-1.3.10-released...magic_pdf-1.3.11-released

magic_pdf-1.3.10-released

29 Apr 08:08
8802687

What's Changed

  • 2025/04/29 1.3.10 Released

    • Added support for custom formula delimiters, configured through the latex-delimiter-config item in the magic-pdf.json file in the user directory.
    • Pinned pdfminer.six to version 20250324 to prevent parsing failures caused by new versions.
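
As an illustration of the delimiter setting, here is a minimal sketch that updates the latex-delimiter-config item in a JSON config file while leaving other keys untouched. The nested layout used here (separate left/right strings for display and inline formulas) is an assumption, since these notes do not spell out the schema; check the project documentation before relying on it. The function name `set_latex_delimiters` is hypothetical.

```python
import json
from pathlib import Path


def set_latex_delimiters(
    config_path: Path,
    display: tuple[str, str],
    inline: tuple[str, str],
) -> dict:
    """Write custom formula delimiters into the config file and return the updated config."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    # Assumed schema: left/right delimiter pairs for display and inline formulas.
    config["latex-delimiter-config"] = {
        "display": {"left": display[0], "right": display[1]},
        "inline": {"left": inline[0], "right": inline[1]},
    }
    config_path.write_text(
        json.dumps(config, indent=4, ensure_ascii=False), encoding="utf-8"
    )
    return config


# Example (not executed here): switch to \[ \] and \( \) in the user-directory config.
# set_latex_delimiters(Path.home() / "magic-pdf.json", ("\\[", "\\]"), ("\\(", "\\)"))
```

Reading the existing file first keeps every other setting in magic-pdf.json intact.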

Full Changelog: magic_pdf-1.3.9-released...magic_pdf-1.3.10-released

magic_pdf-1.3.9-released

27 Apr 10:25
0d5762e

What's Changed

  • 1.3.9 Released

    • Optimized the formula parsing function to improve the success rate of formula rendering
    • Updated pdfminer.six to the latest version, fixing some abnormal PDF parsing issues

Full Changelog: magic_pdf-1.3.8-released...magic_pdf-1.3.9-released