> [!IMPORTANT]
> Quackling is now part of Docling!
Easily build document-native generative AI applications, such as RAG, leveraging Docling's efficient PDF extraction and rich data model, while still using your favorite framework, LlamaIndex or LangChain.
- Enables rich gen AI applications by providing capabilities at the native document level, not just plain text / Markdown!
- Leverages Docling's conversion quality and speed.
- Plug-and-play integration with LlamaIndex and LangChain for building powerful applications like RAG.
To use Quackling, simply install it from your package manager, e.g. via pip:

```sh
pip install quackling
```
Quackling offers core capabilities (`quackling.core`) as well as framework integration components (`quackling.llama_index` and `quackling.langchain`). Below you will find examples of both.
Here is a basic RAG pipeline using LlamaIndex:
> [!NOTE]
> To run the example as is, first `pip install llama-index-embeddings-huggingface llama-index-llms-huggingface-api` in addition to `quackling`, to install the model dependencies. Otherwise, you can set `EMBED_MODEL` and `LLM` as desired, e.g. using local models, as sketched below.
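For instance, a minimal local-model setup might look as follows (a sketch, assuming `llama-index-llms-ollama` is additionally installed and an Ollama server is running locally; the model names are illustrative):

```python
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

# Embeddings are computed locally by the HuggingFace model:
EMBED_MODEL = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
# Generation is served by a locally running Ollama instance:
LLM = Ollama(model="mistral", request_timeout=120.0)
```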
```python
import os

from llama_index.core import VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI

from quackling.llama_index.node_parsers import HierarchicalJSONNodeParser
from quackling.llama_index.readers import DoclingPDFReader

DOCS = ["https://arxiv.org/pdf/2206.01062"]
QUESTION = "How many pages were human annotated?"

EMBED_MODEL = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
LLM = HuggingFaceInferenceAPI(
    token=os.getenv("HF_TOKEN"),
    model_name="mistralai/Mistral-7B-Instruct-v0.3",
)

index = VectorStoreIndex.from_documents(
    documents=DoclingPDFReader(parse_type=DoclingPDFReader.ParseType.JSON).load_data(DOCS),
    embed_model=EMBED_MODEL,
    transformations=[HierarchicalJSONNodeParser()],
)
query_engine = index.as_query_engine(llm=LLM)
result = query_engine.query(QUESTION)
print(result.response)
# > 80K pages were human annotated
```
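To see which document chunks backed the answer, you can inspect the response's source nodes via the standard LlamaIndex API. A small sketch; the exact metadata keys (e.g. a `path` pointer) depend on the node parser and are shown here as an assumption:

```python
# Print each retrieved node's similarity score and its (assumed) source pointer:
for src in result.source_nodes:
    print(src.score, src.node.metadata.get("path"))
```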
You can also use Quackling standalone with any pipeline. For instance, to split a document into chunks based on its structure, with each chunk pointing back to the corresponding node of the Docling document:
```python
from docling.document_converter import DocumentConverter

from quackling.core.chunkers import HierarchicalChunker

doc = DocumentConverter().convert_single("https://arxiv.org/pdf/2408.09869").output
chunks = list(HierarchicalChunker().chunk(doc))
# > [
# >     ChunkWithMetadata(
# >         path='$.main-text[4]',
# >         text='Docling Technical Report\n[...]',
# >         page=1,
# >         bbox=[117.56, 439.85, 494.07, 482.42]
# >     ),
# >     [...]
# > ]
```
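Since each chunk's `path` is a JSON path into the Docling document, you can resolve a chunk back to its source node. A minimal sketch, assuming a pydantic-style dict export on the converted document and the `$.main-text[<i>]` pattern shown above:

```python
import re

# Export the Docling document to a plain dict (assumes a pydantic model):
doc_dict = doc.model_dump()

# Resolve the first chunk's pointer back to its source node:
match = re.fullmatch(r"\$\.main-text\[(\d+)\]", chunks[0].path)
if match:
    node = doc_dict["main-text"][int(match.group(1))]
    print(node)
```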
Further examples include:

- Milvus basic RAG (dense embeddings)
- Milvus hybrid RAG (dense & sparse embeddings combined, e.g. via Reciprocal Rank Fusion) & reranker model usage (see the RRF sketch after this list)
- Milvus RAG also fetching native document metadata for search results
- Local node transformations (e.g. embeddings)
- ...
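For reference, here is a minimal, generic sketch of Reciprocal Rank Fusion (RRF), the fusion scheme mentioned in the hybrid RAG example above for combining dense and sparse rankings (an illustration of the technique itself, not a Quackling API):

```python
def rrf(rankings: list[list[str]], k: int = 60) -> dict[str, float]:
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first:
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))

# e.g. fuse a dense and a sparse ranking of document IDs:
print(rrf([["doc2", "doc1", "doc3"], ["doc1", "doc3", "doc2"]]))
```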
Please read Contributing to Quackling for details.
If you use Quackling in your projects, please consider citing the following:
```bibtex
@techreport{Docling,
  author = "Deep Search Team",
  month = 8,
  title = "Docling Technical Report",
  url = "https://arxiv.org/abs/2408.09869",
  eprint = "2408.09869",
  doi = "10.48550/arXiv.2408.09869",
  version = "1.0.0",
  year = 2024
}
```
The Quackling codebase is under the MIT license. For individual component usage, please refer to the component licenses found in the original packages.