Skip to content

VisionLLM Batcher is a local-first batch processing tool for images using any LLM that supports vision. Automate analysis and information extraction across image batches with custom prompts. Fully prompt-agnostic and model-flexible—built for tasks beyond basic UI specs.

Notifications You must be signed in to change notification settings

mhd-fettah/VisionLLM-Batcher

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

VisionLLM Batcher

VisionLLM Batcher is a local-first AI image batch processing tool designed to work with any large language model (LLM) that supports image inputs. The tool enables automated analysis and information extraction from batches of images using custom prompts. It is fully prompt-agnostic and can handle a variety of image-processing tasks beyond UI specifications.

VisionLLM Batcher banner

We highly recommend using LM Studio in combination with the Qwen2-VL-7B-Instruct model for an efficient and private local inference setup. This model offers robust image understanding and high-quality text output. LM Studio makes it easy to run the model locally with a simple interface and API access. However, the tool is LLM-agnostic—so you're free to integrate any other LLM API that supports image inputs.

Features

  • Batch processes images using any compatible LLM
  • Extracts data from images based on user-defined prompts
  • Tracks processing history with batch IDs
  • Organized folder structure and auto-sorting
  • Clear logging and error handling
  • Progress tracking with visual feedback

Installation

  1. Clone the repository:
git clone https://github.com/mhd-fettah/VisionLLM-Batcher.git
  1. Navigate to the project directory:
cd VisionLLM-Batcher
  1. Create and activate a virtual environment:
python -m venv venv
venv\Scripts\activate
  1. Install required packages:
pip install -r requirements.txt
  1. Create a .env file in the root directory. Configure the following variables:
LLM_API_URL=http://localhost:1234/v1/chat/completions
LLM_MODEL_NAME=qwen2-vl-7b-instruct

Usage

  1. Place your images in the input_images folder
  2. Add your prompt in input_images/prompt.txt
  3. Run the application:
python main.py
  1. Processed results will be saved in the output_responses folder

Sample Prompts

Customize the prompt to match the kind of data you want to extract from the images. Below are a few useful examples:

UI to laravel Dev Spec

Analyze the attached image of a user interface and generate a minimal product requirement summary with the following structure:

Controller:
    Controller name (e.g., CounselorController)
    List the actions/methods needed (e.g., edit, update, uploadProfilePicture)

Database:
    Main table name
    Fields (column names + types if visible/inferable)
    Note any relationships (e.g., city → country foreign key)

View:
    Blade file name (e.g., counselor/edit.blade.php)
    Key components or sections (e.g., profile image upload, basic info form)

Other Notes:
    Form behavior (e.g., dynamic dropdowns, validation, file upload handling)
    Required/optional field assumptions
    Permissions or role assumptions if relevant

- Format the response in clean bullet points, no extra explanation. Keep it short and developer-friendly.

UI to feature list Json

You are a UI feature analyzer. I will give you a screenshot.

Focus only on the central content area. Ignore sidebars, navigation, and footers completely.

Return your output strictly in the following JSON format:

{
  "pageTitle": "Extracted Title",
  "purpose": "What the screen is for",
  "fields": [
    "Field 1",
    "Field 2"
  ],
  "actions": [
    "Button 1",
    "Button 2"
  ]
}

Keep field and action names exactly as shown in the UI, and make sure the output is clean, minimal, and accurate. No extra explanations or comments — only valid JSON.

Extract UI Specifications

Extract a detailed specification sheet from this UI design image, including component names, hierarchy, dimensions, and color codes.

Extract All Visible Text

Extract all visible text from this image exactly as shown, maintaining the order and structure.

Describe the Image (General)

Provide a general description of what is happening in this image, mentioning people, objects, and context.

Describe a UI Design

Describe the purpose and layout of this UI design. Mention interactive components and potential user actions.

Extract Book Page Text

Extract and reconstruct the text from this scanned book page. Maintain paragraph structure and line breaks if possible.

Summarize Document Content

Summarize the content of this document image, focusing on key points, names, and any numerical data.

Object Identification

Identify and list all objects present in this image, along with how many of each are visible.

Requirements

  • Python 3.8+
  • An image-capable LLM API (e.g., LM Studio with Qwen2-VL-7B-Instruct)
  • Python packages:
    • requests
    • python-dotenv
    • tqdm

Contributing

Pull requests are welcome. For major changes, please open an issue to discuss what you want to change first.

License

MIT

About

VisionLLM Batcher is a local-first batch processing tool for images using any LLM that supports vision. Automate analysis and information extraction across image batches with custom prompts. Fully prompt-agnostic and model-flexible—built for tasks beyond basic UI specs.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages