Haystack Docling integration

PyPI version | Python version | Poetry | Code style: black | Imports: isort | Pydantic v2 | pre-commit | License: MIT

A Docling integration for Haystack.

Installation

Install docling-haystack with your package manager, e.g. pip:

pip install docling-haystack

Usage

Basic usage

Basic usage of DoclingConverter looks as follows:

from haystack import Pipeline
from docling_haystack.converter import DoclingConverter

idx_pipe = Pipeline()
# ...
converter = DoclingConverter()
idx_pipe.add_component("converter", converter)
# ...
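
The elided parts of the snippet above depend on your setup. As a minimal sketch of one way to complete it, the function below wires the converter into Haystack's in-memory document store through a DocumentWriter. The function name build_indexing_pipeline is hypothetical, and the "documents" output socket on the converter is an assumption that this README does not state.

```python
def build_indexing_pipeline():
    """Assemble a small indexing pipeline around DoclingConverter (sketch)."""
    # Imports are kept inside the function so that defining it does not
    # require haystack or docling-haystack to be installed yet.
    from haystack import Pipeline
    from haystack.components.writers import DocumentWriter
    from haystack.document_stores.in_memory import InMemoryDocumentStore

    from docling_haystack.converter import DoclingConverter

    document_store = InMemoryDocumentStore()

    idx_pipe = Pipeline()
    idx_pipe.add_component("converter", DoclingConverter())
    idx_pipe.add_component("writer", DocumentWriter(document_store=document_store))

    # Assumption: the converter emits its results on a "documents" output,
    # which is what DocumentWriter expects as input.
    idx_pipe.connect("converter.documents", "writer.documents")

    return idx_pipe, document_store
```

Assuming the converter accepts its input files under a paths argument, the pipeline could then be run with something like idx_pipe.run({"converter": {"paths": [...]}}), where the list holds your own file paths or URLs.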

Advanced usage

When initializing a DoclingConverter, you can use the following parameters:

  • converter (optional): any specific Docling DocumentConverter instance to use
  • convert_kwargs (optional): any specific kwargs for conversion execution
  • export_type (optional): export mode to use: ExportType.DOC_CHUNKS (default) or ExportType.MARKDOWN
  • md_export_kwargs (optional): any specific Markdown export kwargs (for Markdown mode)
  • chunker (optional): any specific Docling chunker instance to use (for doc-chunk mode)
  • meta_extractor (optional): any specific metadata extractor to use
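
To illustrate how these parameters combine, here is a hedged sketch with one builder per export mode. The function names are hypothetical. The image_placeholder key is assumed to be a valid Docling Markdown export option, and HybridChunker from docling.chunking is assumed to be an acceptable chunker instance; neither is confirmed by this README.

```python
def build_markdown_converter():
    """Return a DoclingConverter that exports whole documents as Markdown."""
    # Imported lazily so the sketch can be defined without the package installed.
    from docling_haystack.converter import DoclingConverter, ExportType

    return DoclingConverter(
        export_type=ExportType.MARKDOWN,
        # Assumption: passed through to Docling's Markdown export;
        # here it replaces image placeholders with an empty string.
        md_export_kwargs={"image_placeholder": ""},
    )


def build_chunking_converter():
    """Return a DoclingConverter that exports document chunks (the default mode)."""
    from docling.chunking import HybridChunker  # assumed chunker implementation
    from docling_haystack.converter import DoclingConverter, ExportType

    return DoclingConverter(
        export_type=ExportType.DOC_CHUNKS,
        # Only used in doc-chunk mode; ignored when exporting Markdown.
        chunker=HybridChunker(),
    )
```

Markdown mode and md_export_kwargs go together, as do doc-chunk mode and chunker, so each builder sets only the options relevant to its mode.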

Example

For an end-to-end usage example, check out this notebook.