experimental: introduce img understand pipeline #95

dolfim-ibm · 2024-09-22T18:30:44Z

This new feature creates a new ImgUnderstand pipeline which uses vision LLMs to describe the pictures contained in documents.

The pipeline allows to use

Local LLM, via vLLM
LLM as a service, e.g. on watsonx.ai or openai compatible apis

Checklist:

Commit Message Formatting: Commit titles and messages follow guidelines in the
conventional commits.
Documentation has been updated, if necessary.
Examples have been added, if necessary.
Tests have been added, if necessary.

Signed-off-by: Michele Dolfi <[email protected]>

dolfim-ibm · 2024-09-22T18:41:58Z

Offline LLM

vLLM

Pros:

efficiently run vision models offline, see the docs page.
supports different models without further specialization
already used by InstructLab and part of RHEL AI

Cons:

no support for mac (any architecture)
vLLM has an exact pinning of torch, which creates issues with poetry.
- vllm==0.5.x depends on torch==2.3.0
- vllm==0.6.x depends on torch==2.4.0.

HF `transforms`

Pros:

no strong pinning of torch

Cons:

more code needed
different models require different implementations, e.g. llava-next is different than phi-3-v.

cau-git · 2024-09-23T09:36:54Z

docling/models/img_understand_base_model.py

+    # if the relative area of the image with respect to the whole image page
+    # is larger than this threshold it will be processed, otherwise not.
+    # TODO: implement the skip logic
+    min_area: float = 0.05


Let's call that min_area_frac to be aligned.

PeterStaar-IBM · 2024-09-24T04:31:27Z

@dolfim-ibm could we not use some standard HF models (eg florence and onechart)?

introduce img understand pipeline

a122a7b

Signed-off-by: Michele Dolfi <[email protected]>

cau-git reviewed Sep 23, 2024

View reviewed changes

dolfim-ibm changed the title ~~feat: introduce img understand pipeline~~ experimental: introduce img understand pipeline Oct 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

experimental: introduce img understand pipeline #95

experimental: introduce img understand pipeline #95

dolfim-ibm commented Sep 22, 2024

dolfim-ibm commented Sep 22, 2024

cau-git Sep 23, 2024

PeterStaar-IBM commented Sep 24, 2024

experimental: introduce img understand pipeline #95

Are you sure you want to change the base?

experimental: introduce img understand pipeline #95

Conversation

dolfim-ibm commented Sep 22, 2024

dolfim-ibm commented Sep 22, 2024

Offline LLM

vLLM

HF transforms

cau-git Sep 23, 2024

Choose a reason for hiding this comment

PeterStaar-IBM commented Sep 24, 2024

HF `transforms`