VisionLLM Batcher

VisionLLM Batcher is a local-first tool for batch-processing images with any large language model (LLM) that accepts image inputs. It automates analysis and information extraction across a folder of images using a prompt you supply. The tool is prompt-agnostic: extracting UI specifications is one use case, but it also handles tasks such as text extraction, document summarization, and object identification.

We recommend LM Studio with the Qwen2-VL-7B-Instruct model for an efficient, private local inference setup. The model offers strong image understanding and high-quality text output, and LM Studio makes it easy to run locally through a simple interface and an API. The tool is LLM-agnostic, however, so you are free to integrate any other LLM API that supports image inputs.

Features

Batch processes images using any compatible LLM
Extracts data from images based on user-defined prompts
Tracks processing history with batch IDs
Organized folder structure and auto-sorting
Clear logging and error handling
Progress tracking with visual feedback

Installation

Clone the repository:

git clone https://github.com/mhd-fettah/VisionLLM-Batcher.git

Navigate to the project directory:

cd VisionLLM-Batcher

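Create and activate a virtual environment (the activation command shown is for Windows; on macOS or Linux, run source venv/bin/activate instead):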

python -m venv venv
venv\Scripts\activate

Install required packages:

pip install -r requirements.txt

Create a .env file in the project root (the repository includes an env.example file) and set the following variables:

LLM_API_URL=http://localhost:1234/v1/chat/completions
LLM_MODEL_NAME=qwen2-vl-7b-instruct

Usage

Place your images in the input_images folder
Add your prompt in input_images/prompt.txt
Run the application:

python main.py

Processed results will be saved in the output_responses folder
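
The workflow above can be sketched roughly as follows. This is not the project's main.py: `process_batch`, the `send` callback, the list of image extensions, and the one-text-file-per-image output naming are all assumptions for illustration. Only the folder names and the prompt.txt location come from the steps above.

```python
from pathlib import Path

# Assumed set of supported image formats (not confirmed by the project).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def process_batch(send, input_dir="input_images", output_dir="output_responses"):
    """Run the prompt against every image and save one response file per image.

    `send(prompt, image_path)` is a hypothetical callback that calls the
    LLM API and returns the response text.
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    prompt = (input_dir / "prompt.txt").read_text(encoding="utf-8").strip()
    output_dir.mkdir(parents=True, exist_ok=True)

    results = {}
    images = sorted(
        p for p in input_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
    )
    for image in images:
        try:
            text = send(prompt, image)
        except Exception as exc:
            # Record the failure and continue so one bad image
            # does not stop the whole batch.
            results[image.name] = f"error: {exc}"
            continue
        (output_dir / f"{image.stem}.txt").write_text(text, encoding="utf-8")
        results[image.name] = "ok"
    return results
```

In a real run, `send` would POST the request to LLM_API_URL and extract the reply text, and wrapping `images` in tqdm would give a progress bar like the one mentioned under Features.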

Sample Prompts

Customize the prompt to match the kind of data you want to extract from the images. Below are a few useful examples:

UI to Laravel Dev Spec

Analyze the attached image of a user interface and generate a minimal product requirement summary with the following structure:

Controller:
    Controller name (e.g., CounselorController)
    List the actions/methods needed (e.g., edit, update, uploadProfilePicture)

Database:
    Main table name
    Fields (column names + types if visible/inferable)
    Note any relationships (e.g., city → country foreign key)

View:
    Blade file name (e.g., counselor/edit.blade.php)
    Key components or sections (e.g., profile image upload, basic info form)

Other Notes:
    Form behavior (e.g., dynamic dropdowns, validation, file upload handling)
    Required/optional field assumptions
    Permissions or role assumptions if relevant

- Format the response in clean bullet points, no extra explanation. Keep it short and developer-friendly.

UI to Feature List JSON

You are a UI feature analyzer. I will give you a screenshot.

Focus only on the central content area. Ignore sidebars, navigation, and footers completely.

Return your output strictly in the following JSON format:

{
  "pageTitle": "Extracted Title",
  "purpose": "What the screen is for",
  "fields": [
    "Field 1",
    "Field 2"
  ],
  "actions": [
    "Button 1",
    "Button 2"
  ]
}

Keep field and action names exactly as shown in the UI, and make sure the output is clean, minimal, and accurate. No extra explanations or comments — only valid JSON.
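
Because this prompt asks for strict JSON, it can help to validate the reply before relying on it. The sketch below is one possible approach, assuming the model sometimes wraps its answer in a Markdown code fence; `parse_feature_json` is a hypothetical helper and not part of the tool.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker
REQUIRED_KEYS = {"pageTitle", "purpose", "fields", "actions"}


def parse_feature_json(reply):
    """Parse the model reply and check that it matches the requested shape."""
    text = reply.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line and, if present, the closing fence.
        lines = text.splitlines()
        if lines[-1].strip() == FENCE:
            lines = lines[:-1]
        text = "\n".join(lines[1:])
    data = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data
```

A reply that fails this check could be logged as an error for that image instead of being saved as a valid result.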

Extract UI Specifications

Extract a detailed specification sheet from this UI design image, including component names, hierarchy, dimensions, and color codes.

Extract All Visible Text

Extract all visible text from this image exactly as shown, maintaining the order and structure.

Describe the Image (General)

Provide a general description of what is happening in this image, mentioning people, objects, and context.

Describe a UI Design

Describe the purpose and layout of this UI design. Mention interactive components and potential user actions.

Extract Book Page Text

Extract and reconstruct the text from this scanned book page. Maintain paragraph structure and line breaks if possible.

Summarize Document Content

Summarize the content of this document image, focusing on key points, names, and any numerical data.

Object Identification

Identify and list all objects present in this image, along with how many of each are visible.

Requirements

Python 3.8+
An image-capable LLM API (e.g., LM Studio with Qwen2-VL-7B-Instruct)
Python packages:
- requests
- python-dotenv
- tqdm

Contributing

Pull requests are welcome. For major changes, please open an issue to discuss what you want to change first.

License

MIT
