This repository provides the evaluation environment and scripts for UI-Redline-bench, a benchmark for Web UI code modification based on visual instructions.
🤗 Hugging Face Dataset: https://huggingface.co/datasets/future-architect/UI-Redline-bench (The visual instruction images and metadata are hosted on Hugging Face.)
```
.
├── data/                              # Contains HTML/CSS code for base and reference sites
│   ├── news/
│   │   ├── bootstrap/
│   │   │   ├── src/                   # Original Website (Base)
│   │   │   │   ├── index.html
│   │   │   │   ├── styles.css
│   │   │   │   └── images/            # Image assets
│   │   │   ├── ref_01/                # Reference Website (Ground Truth)
│   │   │   │   ├── index.html
│   │   │   │   └── styles.css
│   │   │   └── ...
│   │   └── ...
│   └── ...
├── script/
│   ├── llm_eval.py                            # LLM-based automatic evaluation script
│   ├── llm_utils.py                           # Common utilities for LLM clients and image processing
│   ├── prediction_based_on_image_claude.py    # Inference script for Claude (Bedrock)
│   ├── prediction_based_on_image_gemini.py    # Inference script for Gemini
│   ├── prediction_based_on_image_gpt5.py      # Inference script for GPT (Azure/OpenAI)
│   ├── prediction_based_on_image_qwen.py      # Inference script for Qwen (vLLM)
│   ├── launch_vllm_server.sh                  # Launch script for vLLM server (Qwen)
│   └── setup_images.py                        # Helper script to distribute image assets
├── cpu-env/                           # Environment for API-based models & evaluation
│   ├── pyproject.toml
│   └── uv.lock
└── gpu-env/                           # Environment for local models (vLLM/Qwen)
    ├── pyproject.toml
    └── uv.lock
```
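Each `src` directory pairs with one or more sibling `ref_XX` directories holding the ground-truth variants. A minimal sketch (directory-name pattern assumed from the tree above) for enumerating those base/reference pairs:

```python
from pathlib import Path

def list_eval_pairs(data_root="data"):
    """Yield (src_dir, ref_dir) pairs following the data/<site>/<framework>/ layout."""
    pairs = []
    # Each framework directory contains one src/ and one or more ref_* siblings.
    for src in sorted(Path(data_root).glob("*/*/src")):
        framework_dir = src.parent
        for ref in sorted(framework_dir.glob("ref_*")):
            pairs.append((src, ref))
    return pairs
```

For example, `data/news/bootstrap/` with `src/` and `ref_01/` yields the single pair `(data/news/bootstrap/src, data/news/bootstrap/ref_01)`.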
This project uses uv for dependency management. Environments are split into cpu-env (for API-based models and evaluation) and gpu-env (for local models requiring CUDA).
Run the sync command for the environment you need.
For API Models & Evaluation (CPU): This environment is used for the GPT, Claude, and Gemini scripts, and for the evaluation script.

```bash
uv sync --project cpu-env
```

For Local Models (GPU): This environment is used for running Qwen (vLLM). Requires NVIDIA drivers.

```bash
uv sync --project gpu-env
```

By default, image assets are stored only in the src directories to avoid duplication. To make the ref (Reference) HTML files render correctly in a browser or during evaluation, run the following script using the cpu-env:

```bash
uv run --project cpu-env script/setup_images.py
```

Now you can open any index.html (e.g., data/news/bootstrap/ref_01/index.html) in your browser to inspect the UI.
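The core of this asset-distribution step is presumably copying each `src/images` directory into the sibling `ref_*` directories so that the references' relative image paths resolve. A hypothetical sketch of that operation (the actual `setup_images.py` may differ):

```python
import shutil
from pathlib import Path

def distribute_images(data_root="data"):
    """Copy each src/images directory into the sibling ref_* directories."""
    for images in Path(data_root).glob("*/*/src/images"):
        framework_dir = images.parent.parent  # .../<site>/<framework>/
        for ref in framework_dir.glob("ref_*"):
            # dirs_exist_ok makes the copy safe to re-run.
            shutil.copytree(images, ref / "images", dirs_exist_ok=True)
```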
We provide scripts to generate modified HTML/CSS code based on visual instructions using various VLMs.
Set the environment variables corresponding to the model you wish to use.
For GPT (Azure OpenAI / OpenAI):

```bash
# Azure OpenAI (Recommended)
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
export OPENAI_API_KEY="your-api-key"
export OPENAI_API_VERSION="2024-10-21"

# Or Standard OpenAI
export OPENAI_API_KEY="your-api-key"
```

For Claude (AWS Bedrock):
```bash
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_DEFAULT_REGION="ap-northeast-1"  # A region that supports the model
```

For Gemini (Google GenAI):

```bash
export GEMINI_API_KEY="your-api-key"
```

For Qwen (Local vLLM):
Running the local model requires a GPU environment (e.g., 4 GPUs for tensor-parallel inference with the 32B model).
Start the vLLM server using the gpu-env before running inference:
```bash
uv run --project gpu-env bash script/launch_vllm_server.sh
```

This starts an OpenAI-compatible server at http://localhost:8000 with the API key `local`.
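Because the server speaks the OpenAI chat-completions protocol, any OpenAI-style client can query it; the instruction image is typically embedded as a base64 data URL in the message content. A minimal sketch of building such a request body (the model name and prompt are placeholders, not taken from the repo's scripts):

```python
import base64

def build_chat_payload(image_path, prompt, model="local-model"):
    """Build an OpenAI-style chat completion body with an inline base64 image."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": model,  # placeholder; use the name the vLLM server was launched with
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }
```

POST this body to http://localhost:8000/v1/chat/completions with the header `Authorization: Bearer local`.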
Use uv run --project <env> to execute scripts in the correct environment.
Example usage (GPT-5) [CPU Env]:

```bash
uv run --project cpu-env script/prediction_based_on_image_gpt5.py \
    --html_path "data/news/bootstrap/src/index.html" \
    --css_path "data/news/bootstrap/src/styles.css" \
    --image "path/to/instruction_image.png" \
    --output "output/news/bootstrap/ref_01"
```

Example usage (Qwen via vLLM) [GPU Env]:
```bash
uv run --project gpu-env script/prediction_based_on_image_qwen.py \
    --html_path "data/news/bootstrap/src/index.html" \
    --css_path "data/news/bootstrap/src/styles.css" \
    --image "path/to/instruction_image.png" \
    --output "output/news/bootstrap/ref_01"
```

Available Scripts:

- script/prediction_based_on_image_gpt5.py (use cpu-env)
- script/prediction_based_on_image_claude.py (use cpu-env)
- script/prediction_based_on_image_gemini.py (use cpu-env)
- script/prediction_based_on_image_qwen.py (use gpu-env)
Note: Replace arguments with your actual paths. The instruction images can be retrieved from the Hugging Face dataset.
We provide an automatic LLM-based evaluation script, as described in the paper.
The evaluation script uses GPT-5. Set your OpenAI/Azure API keys as described in the Inference section.
Use script/llm_eval.py with cpu-env to evaluate predicted code against the ground truth.
```bash
uv run --project cpu-env script/llm_eval.py \
    --org_html "data/news/bootstrap/src/index.html" \
    --org_css "data/news/bootstrap/src/styles.css" \
    --ref_html "data/news/bootstrap/ref_01/index.html" \
    --ref_css "data/news/bootstrap/ref_01/styles.css" \
    --pred_html "output/news/bootstrap/ref_01/index.html" \
    --pred_css "output/news/bootstrap/ref_01/styles.css" \
    --image "path/to/instruction_image.png" \
    --output "evaluation_result.json"
```

```bibtex
@inproceedings{hiai2026uiredline,
  title={UI-Redline-bench: A Benchmark for Web UI Code Modification via Redline Instructions},
  author={Satoshi Hiai and Ryo Fujii and Yosuke Kishinami and Makoto Morishita},
  booktitle={Proceedings of the 32nd Annual Meeting of the Association for Natural Language Processing (NLP2026)},
  year={2026}
}
```