
Docling integration #1337


Open: gabe-l-hart wants to merge 9 commits into base: master


gabe-l-hart
Contributor

@gabe-l-hart gabe-l-hart commented Apr 11, 2025

Description

This PR extends the local document parsing capabilities so that users can choose docling as the converter and, optionally, pair it with a VLM such as smoldocling or granite3.2-vision for visual parsing. Besides giving users docling's output quality, this also lets them supply png and jpeg images as context.

Testing

I've tested this in conjunction with my #1278 branch for a fully local LLM / VLM researcher. I tested it two ways: with a custom script and through the nextjs frontend.

Custom Script

import asyncio
import json
import os
import sys
from gpt_researcher import GPTResearcher

os.environ["FAST_LLM"] = "ollama:granite3.2:2b"
os.environ["SMART_LLM"] = "ollama:granite3.2:8b"
os.environ["STRATEGIC_LLM"] = "ollama:granite3.2:8b"
os.environ["EMBEDDING"] = "ollama:granite-embedding:278m"
os.environ["OPENAI_API_KEY"] = "ollama"
os.environ["OLLAMA_BASE_URL"] = "http://localhost:11434"
os.environ["RETRIEVER"] = "duckduckgo"
os.environ["PROMPT_FAMILY"] = "granite"
os.environ["LLM_KWARGS"] = json.dumps({"num_ctx": 1024 * 128})

os.environ["REPORT_SOURCE"] = "hybrid"
os.environ["DOC_PATH"] = "./report_docs"
os.environ["CONVERT_WITH_DOCLING"] = "True"
os.environ["DOCLING_VLM"] = "granite_vision_ollama"

if not (query := " ".join(sys.argv[1:])):
    query = input("?> ")

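async def main() -> str:
    # Build the researcher inside the coroutine so it runs on asyncio.run's loop
    researcher = GPTResearcher(
        query=query,
        report_type="research_report",
        report_source="hybrid",
    )
    await researcher.conduct_research()  # gather web and local-document context
    return await researcher.write_report()

# asyncio.run replaces the deprecated get_event_loop()/run_until_complete pattern
print(asyncio.run(main()))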

nextjs Frontend

I set the following env vars before running python main.py from the root of the repo:

export FAST_LLM="ollama:granite3.2:2b"
export SMART_LLM="ollama:granite3.2:8b"
export STRATEGIC_LLM="ollama:granite3.2:8b"
export EMBEDDING="ollama:granite-embedding:278m"
export OPENAI_API_KEY="ollama"
export OLLAMA_BASE_URL="http://localhost:11434"
export RETRIEVER="duckduckgo"
export PROMPT_FAMILY="granite"
export LLM_KWARGS='{"num_ctx": 131072}'
export REPORT_SOURCE="hybrid"
export DOC_PATH="./report_docs"
export CONVERT_WITH_DOCLING="true"
export DOCLING_VLM="granite_vision_ollama"
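
The script above sets CONVERT_WITH_DOCLING to "True" while the shell exports use "true", and LLM_KWARGS is built once with json.dumps and once as a literal string. How the PR itself parses these variables is not shown in this thread, so the following is only a sketch under the assumption of a case-insensitive boolean parse. The helper name env_flag and the set of truthy spellings are hypothetical, not part of gpt_researcher.

```python
import json
import os

# Assumed set of spellings treated as "on"; the PR's real parser may differ.
_TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name: str, default: bool = False) -> bool:
    """Hypothetical helper: read an env var as a case-insensitive boolean."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUTHY


# The two LLM_KWARGS spellings from the script and the shell exports.
script_kwargs = json.dumps({"num_ctx": 1024 * 128})
shell_kwargs = '{"num_ctx": 131072}'

# Compare the decoded values rather than the raw strings.
same_kwargs = json.loads(script_kwargs) == json.loads(shell_kwargs)
```

With a parse like this, both "True" and "true" enable docling conversion, and the two LLM_KWARGS values decode to the same context size.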

From there, I've tested swapping to hybrid via the Preferences panel and then dropping in a .png screenshot to ensure it gets converted correctly.
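
Before repeating the screenshot test, it can help to confirm which files under DOC_PATH are images that would go through the new conversion path. This is a standalone sketch rather than code from the PR: find_image_inputs and IMAGE_SUFFIXES are hypothetical names, and the suffix set is an assumption based on the png and jpeg formats named in the description.

```python
from pathlib import Path

# Assumed image suffixes, taken from the formats named in the PR description.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def find_image_inputs(doc_path: str) -> list[Path]:
    """Hypothetical helper: list image files under doc_path, recursively."""
    root = Path(doc_path)
    return sorted(
        p
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

For example, find_image_inputs("./report_docs") returns the image paths under the directory used in the env vars above, or an empty list if there are none.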

Dependencies

This PR depends on getting the next release of docling out with ollama support for granite3.2-vision. I'm coordinating with @PeterStaar-IBM on that release, so it should go out soon!

Ollama support for Granite Vision 3.2 in Docling was added in 2.30.0

@gabe-l-hart gabe-l-hart marked this pull request as ready for review April 14, 2025 16:26
@assafelovic
Owner

Hey @gabe-l-hart love your contributions! Same as before, can you please update related docs? Happy to merge afterward!

@gabe-l-hart
Contributor Author

@assafelovic Docs are updated. Thanks!

Branch: DoclingIntegration

Signed-off-by: Gabe Goodhart <[email protected]>
NOTE: The ability to invoke Granite 3.2 Vision via Ollama was added in
2.30.0, so that is the required lower bound.
