A Docling integration for Haystack.
Simply install docling-haystack from your package manager, e.g. pip:
pip install docling-haystack
Basic usage of DoclingConverter looks as follows:
from haystack import Pipeline
from docling_haystack.converter import DoclingConverter
idx_pipe = Pipeline()
# ...
converter = DoclingConverter()
idx_pipe.add_component("converter", converter)
# ...
When initializing a DoclingConverter, you can use the following parameters:
- converter (optional): any specific Docling DocumentConverter instance to use
- convert_kwargs (optional): any specific kwargs for conversion execution
- export_type (optional): export mode to use: ExportType.DOC_CHUNKS (default) or ExportType.MARKDOWN
- md_export_kwargs (optional): any specific Markdown export kwargs (for Markdown mode)
- chunker (optional): any specific Docling chunker instance to use (for doc-chunk mode)
- meta_extractor (optional): any specific metadata extractor to use
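As a minimal sketch of how these parameters might be passed, the snippet below selects the export mode by name and forwards any other options to DoclingConverter. The helper names converter_kwargs and build_indexing_pipeline are hypothetical, and importing ExportType from docling_haystack.converter is an assumption; check the package for the actual import location.

```python
from typing import Any

# Export mode names as listed in the parameter description above.
EXPORT_MODES = ("DOC_CHUNKS", "MARKDOWN")


def converter_kwargs(mode: str = "DOC_CHUNKS", **extra: Any) -> dict[str, Any]:
    """Collect keyword arguments for DoclingConverter (hypothetical helper).

    `mode` is the name of an ExportType member; `extra` may hold any of the
    other documented parameters, e.g. chunker or md_export_kwargs.
    """
    if mode not in EXPORT_MODES:
        raise ValueError(f"unknown export mode {mode!r}, expected one of {EXPORT_MODES}")
    return {"export_type": mode, **extra}


def build_indexing_pipeline(mode: str = "DOC_CHUNKS", **extra: Any):
    """Create a Haystack pipeline containing a configured DoclingConverter."""
    # Imported lazily so the helper above works without the packages installed.
    from haystack import Pipeline
    # Assumption: ExportType is importable from the same module as DoclingConverter.
    from docling_haystack.converter import DoclingConverter, ExportType

    kwargs = converter_kwargs(mode, **extra)
    # Turn the mode name into the enum member, e.g. ExportType.MARKDOWN.
    kwargs["export_type"] = ExportType[kwargs["export_type"]]

    idx_pipe = Pipeline()
    idx_pipe.add_component("converter", DoclingConverter(**kwargs))
    return idx_pipe
```

With docling-haystack and Haystack installed, build_indexing_pipeline("MARKDOWN") would yield a pipeline whose converter exports Markdown, while the default call keeps the doc-chunk mode.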
For an end-to-end usage example, check out this notebook.