dots-ocr-client

A lightweight Python client for dots.ocr. It calls either a self-hosted vLLM service or Replicate through the same API, with no heavy dependencies and no file output. Based on the original dots.ocr project.

Usage

Use with Replicate

First, get your API token from https://replicate.com/account/api-tokens.

from dots_ocr_client.parser import DotsOCRParser

# Default: use the public model sljeff/dots.ocr
parser = DotsOCRParser(
    backend="replicate",
    api_token="your-replicate-token"  # Required
)
results = parser.parse_file("/path/to/file.pdf", prompt_mode="prompt_layout_all_en")

Advanced: use your own Replicate deployment

For better performance and dedicated resources, you can create your own deployment:

1. Create a deployment on Replicate:
   - Go to your Replicate Deployments page
   - Click "Create deployment"
   - Select model: sljeff/dots.ocr
   - Choose your hardware configuration
   - Name your deployment (e.g., yourname/dots-ocr)
2. Get your API token from https://replicate.com/account/api-tokens
3. Use your deployment in code:

from dots_ocr_client.parser import DotsOCRParser

parser = DotsOCRParser(
    backend="replicate",
    api_token="your-api-token",               # Required
    replicate_deployment="yourname/dots-ocr",  # your deployment name
)
results = parser.parse_file("/path/to/file.pdf", prompt_mode="prompt_layout_all_en")

Use with vLLM

Prerequisite: have a running dots.ocr vLLM service.

from dots_ocr_client.parser import DotsOCRParser

parser = DotsOCRParser(
    backend="vllm",
    base_url="http://localhost:8000",  # Your vLLM server URL
    api_token="your-api-token",       # Optional, depends on your setup
    model_name="model",
)
results = parser.parse_file("/path/to/file.pdf", prompt_mode="prompt_layout_all_en")

Installation

Install this project directly from Git.

With uv:

uv add git+https://github.com/sljeff/dots-ocr-client.git

With pip:

pip install git+https://github.com/sljeff/dots-ocr-client.git

Why this fork & Differences

This is a client-only fork focusing on:

- Minimal dependencies (no transformers/flash-attn, etc.)
- Simple API to call existing deployments (vLLM or Replicate)
- No file outputs; functions return in-memory results
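
Because results stay in memory, persisting them is left to the caller. The following is a minimal sketch of one way to do that, assuming the per-page keys listed under Data Structure (page_no, md_content, image_with_layout). The helper name save_results and the file naming scheme are hypothetical and not part of this package.

```python
import json
from pathlib import Path
from typing import Any


def save_results(results: list[dict[str, Any]], out_dir: str) -> list[Path]:
    """Write one Markdown file and one JSON file per page; return the written paths.

    Hypothetical helper, not part of dots-ocr-client.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for page in results:
        stem = f"page_{page['page_no']:04d}"

        # Markdown rendering of the page, if the backend returned one.
        md_path = out / f"{stem}.md"
        md_path.write_text(page.get("md_content", ""), encoding="utf-8")

        # The layout image is an in-memory image object and is not
        # JSON-serializable, so it is left out of the JSON dump.
        serializable = {k: v for k, v in page.items() if k != "image_with_layout"}
        json_path = out / f"{stem}.json"
        json_path.write_text(
            json.dumps(serializable, ensure_ascii=False, indent=2),
            encoding="utf-8",
        )
        written += [md_path, json_path]
    return written
```

Call it as save_results(results, "out") after parse_file returns.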

API Reference

Constructor:

DotsOCRParser(
  backend: str = "vllm",                   # "vllm" or "replicate"
  base_url: str = "http://127.0.0.1:8000", # for vLLM backend
  api_token: str | None = None,            # API token for both backends
  model_name: str = "model",
  temperature: float = 0.1,
  top_p: float = 1.0,
  max_completion_tokens: int = 16384,
  num_thread: int = 64,
  dpi: int = 200,
  min_pixels: int | None = None,
  max_pixels: int | None = None,
  replicate_deployment: str | None = None, # if None and backend=replicate -> public model sljeff/dots.ocr
)
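
Since both backends share one constructor, the backend choice can be moved into configuration. The sketch below builds constructor keyword arguments from environment variables; it uses only parameter names from the signature above. The DOTS_OCR_* variable names and the helper parser_kwargs_from_env are assumptions made for illustration.

```python
import os
from typing import Any, Mapping


def parser_kwargs_from_env(env: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Build DotsOCRParser keyword arguments from environment variables.

    The DOTS_OCR_* names are hypothetical; rename them to fit your setup.
    """
    backend = env.get("DOTS_OCR_BACKEND", "vllm")
    if backend not in ("vllm", "replicate"):
        raise ValueError(f"unsupported backend: {backend!r}")

    kwargs: dict[str, Any] = {"backend": backend}
    token = env.get("DOTS_OCR_API_TOKEN")

    if backend == "replicate":
        # The Replicate backend needs a token; the deployment is optional.
        if not token:
            raise ValueError("the Replicate backend requires an API token")
        kwargs["api_token"] = token
        deployment = env.get("DOTS_OCR_REPLICATE_DEPLOYMENT")
        if deployment:
            kwargs["replicate_deployment"] = deployment
    else:
        # For vLLM the token is optional and depends on the server setup.
        kwargs["base_url"] = env.get("DOTS_OCR_BASE_URL", "http://127.0.0.1:8000")
        if token:
            kwargs["api_token"] = token
    return kwargs


# Usage (requires the package to be installed):
# from dots_ocr_client.parser import DotsOCRParser
# parser = DotsOCRParser(**parser_kwargs_from_env())
```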

Methods:

parse_file(path, prompt_mode="prompt_layout_all_en", bbox=None, fitz_preprocess=False)
parse_pdf(input_path, filename, prompt_mode, save_dir)
parse_image(input_path, filename, prompt_mode, save_dir, bbox=None, fitz_preprocess=False)

The sampling parameters temperature, top_p, and max_completion_tokens have the same meaning on both backends.

Data Structure

The SDK returns a list of dictionaries, where each dictionary represents one page:

[
  {
    "page_no": 0,
    "file_path": "document.pdf",
    "input_height": 2212,
    "input_width": 1708,

    # Core data: detected layout elements
    "cells": [
      {
        "bbox": [41, 589, 103, 1587],
        "category": "Text",
        "text": "Extracted text..."
      },
      {
        "bbox": [167, 323, 1486, 464],
        "category": "Title",
        "text": "Document Title Here"
      }
      # ...
    ],

    # Additional outputs
    "image_with_layout": "<PIL.Image>",
    "md_content": "# Title\n...",
    "md_content_no_hf": "..."
  },
  # ... more pages
]
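
Downstream code often needs resolution-independent coordinates. The sketch below scales each cell's bbox by the page's input_width and input_height. It assumes bbox is [x1, y1, x2, y2] in pixels of the input image, which the sample values suggest but this README does not state; verify it against your own output. The helper name normalized_cells is hypothetical.

```python
from typing import Any


def normalized_cells(page: dict[str, Any]) -> list[dict[str, Any]]:
    """Return copies of a page's cells with bbox scaled to the 0..1 range.

    Assumption: bbox is [x1, y1, x2, y2] in pixels of the input image.
    """
    width, height = page["input_width"], page["input_height"]
    out = []
    for cell in page["cells"]:
        x1, y1, x2, y2 = cell["bbox"]
        # Copy the cell so the original result is left untouched.
        out.append({**cell, "bbox": [x1 / width, y1 / height, x2 / width, y2 / height]})
    return out
```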

Common categories include: Text, Title, Table, Picture, Formula, Section-header, List-item, Caption, Footnote, Page-header, Page-footer, Other, Unknown.
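
The category field makes it easy to pull out specific kinds of content, such as all titles or all table cells. Here is a minimal sketch that groups cell text by category across pages. It assumes some cells may lack a text key and skips those; the helper name text_by_category is hypothetical.

```python
from collections import defaultdict
from typing import Any


def text_by_category(results: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Group the text of all cells by category, in page and cell order."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for page in results:
        for cell in page.get("cells", []):
            # Assumption: a cell may carry no text; skip empty entries.
            text = cell.get("text")
            if text:
                grouped[cell.get("category", "Unknown")].append(text)
    return dict(grouped)


# Sample input shaped like the structure shown above.
sample = [
    {
        "page_no": 0,
        "cells": [
            {"bbox": [41, 589, 103, 1587], "category": "Text", "text": "Extracted text..."},
            {"bbox": [167, 323, 1486, 464], "category": "Title", "text": "Document Title Here"},
        ],
    }
]
titles = text_by_category(sample).get("Title", [])
```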

License

See LICENSE and NOTICE.
