Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

experimental: introduce img understand pipeline #95

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

dolfim-ibm
Copy link
Contributor

This new feature creates a new ImgUnderstand pipeline which uses vision LLMs to describe the pictures contained in documents.

The pipeline allows to use

  1. Local LLM, via vLLM
  2. LLM as a service, e.g. on watsonx.ai or openai compatible apis

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the
    conventional commits.
  • Documentation has been updated, if necessary.
  • Examples have been added, if necessary.
  • Tests have been added, if necessary.

@dolfim-ibm
Copy link
Contributor Author

Offline LLM

vLLM

Pros:

  • efficiently run vision models offline, see the docs page.
  • supports different models without further specialization
  • already used by InstructLab and part of RHEL AI

Cons:

  • no support for mac (any architecture)
  • vLLM has an exact pinning of torch, which creates issues with poetry.
    • vllm==0.5.x depends on torch==2.3.0
    • vllm==0.6.x depends on torch==2.4.0.

HF transforms

Pros:

  • no strong pinning of torch

Cons:

  • more code needed
  • different models require different implementations, e.g. llava-next is different than phi-3-v.

# if the relative area of the image with respect to the whole image page
# is larger than this threshold it will be processed, otherwise not.
# TODO: implement the skip logic
min_area: float = 0.05
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's call that min_area_frac to be aligned.

@PeterStaar-IBM
Copy link
Contributor

@dolfim-ibm could we not use some standard HF models (eg florence and onechart)?

@dolfim-ibm dolfim-ibm changed the title feat: introduce img understand pipeline experimental: introduce img understand pipeline Oct 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants