🧩 IMAGHarmony 🧩: Controllable image editing with consistent object quantity and layout. A structure-aware framework that ensures high fidelity and coherence in complex multi-object edits. It integrates harmony-aware attention and preference-guided noise selection to enable precise, stable, and semantically aligned generation.

muzishen/IMAGHarmony

IMAGHarmony: Controllable Image Editing with Consistent Object Quantity and Layout

🗓️ Release

  • [2025/5/30] 🔥 We released the technical report of IMAGHarmony.
  • [2025/5/28] 🔥 We released the training and inference code of IMAGHarmony.
  • [2025/5/17] 🎉 We launched the project page of IMAGHarmony.

💡 Introduction

IMAGHarmony tackles the challenge of controllable image editing in multi-object scenes, where existing models struggle to keep object quantity and spatial layout aligned with the edit. To this end, IMAGHarmony introduces a structure-aware framework for quantity-and-layout consistent image editing (QL-Edit), enabling precise control over object count, category, and arrangement. We propose a harmony-aware (HA) module to jointly model object structure and semantics, and a preference-guided noise selection (PNS) strategy to stabilize generation by selecting semantically aligned initial noise. Our method is trained and evaluated on HarmonyBench, a newly curated benchmark with diverse editing scenarios.
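The idea behind PNS, choosing an initial noise among several candidates according to a preference score, can be illustrated with a minimal sketch. This is not the repository's implementation: the function names `select_initial_noise` and `toy_score`, the flat list-of-floats noise, and the scoring rule are all assumptions made for illustration. In a real pipeline the score would come from whatever preference signal the method uses to judge semantic alignment.

```python
import random
from typing import Callable, List, Sequence, Tuple


def select_initial_noise(
    seeds: Sequence[int],
    score_fn: Callable[[List[float]], float],
    dim: int = 16,
) -> Tuple[int, List[float]]:
    """Sample one Gaussian noise vector per seed and keep the best-scoring one.

    Hypothetical helper: `score_fn` stands in for a preference signal.
    """
    if not seeds:
        raise ValueError("seeds must not be empty")

    best_seed, best_noise, best_score = seeds[0], [], float("-inf")
    for seed in seeds:
        rng = random.Random(seed)  # seeded, so each candidate is reproducible
        noise = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        score = score_fn(noise)
        if score > best_score:  # on ties, the earliest seed wins
            best_seed, best_noise, best_score = seed, noise, score
    return best_seed, best_noise


def toy_score(noise: List[float]) -> float:
    """Placeholder preference (assumption): favor noise whose mean is near zero."""
    return -abs(sum(noise) / len(noise))


if __name__ == "__main__":
    seed, noise = select_initial_noise(range(8), toy_score, dim=16)
    print("selected seed:", seed)
```

Keeping the seed alongside the noise makes the selected candidate reproducible, so the same starting point can be reused across edits.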

architecture

🚀 HarmonyBench Dataset Demo

dataset_demo

🚀 Examples

results_1

results_2

Dual-Category Editing

results_5

🔧 Requirements

conda create --name IMAGHarmony python=3.8.18
conda activate IMAGHarmony

# Install requirements
pip install -r requirements.txt

🌐 Download Models

You can download our models from Hugging Face. The other component models can be downloaded from their original repositories.

🚀 How to train

# Please download the HarmonyBench data first or prepare your own images,
# and modify the path in run.sh.
# Write a caption for each image in your train.json file.
# Start training:

sh train.sh
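If you prepare your own images, a small script can generate the caption file. The sketch below is an assumption, not the repository's documented format: the field names `image_file` and `text` follow the convention used by IP-Adapter's training data, and the helper names are hypothetical. Check the dataset loader in the training script for the exact keys before using it.

```python
import json
from pathlib import Path
from typing import Dict, List


def build_caption_entries(captions: Dict[str, str]) -> List[Dict[str, str]]:
    """Turn {image filename: caption} into a list of caption records.

    The keys "image_file" and "text" are assumed, not confirmed by the repo.
    """
    return [
        {"image_file": name, "text": text.strip()}
        for name, text in sorted(captions.items())
    ]


def write_train_json(captions: Dict[str, str], out_path: str) -> None:
    """Write the caption records to out_path as UTF-8 JSON."""
    entries = build_caption_entries(captions)
    Path(out_path).write_text(
        json.dumps(entries, indent=2, ensure_ascii=False), encoding="utf-8"
    )


if __name__ == "__main__":
    # Hypothetical file names and captions, for illustration only.
    write_train_json(
        {"cats_3.jpg": "three cats on a sofa", "dogs_2.jpg": "two dogs in a park"},
        "train.json",
    )
```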

🚀 How to test

# Convert your checkpoints first
python conver_bin.py

# Fill in your paths in test.py, then run:

python test.py

Alternatively, you can test it with the Gradio demo:

python demo.py

Acknowledgement

We would like to thank the contributors to the InstantStyle and IP-Adapter repositories for their open research and exploration.

The IMAGHarmony code is available for both academic and commercial use. Users are permitted to generate images using this tool, provided they comply with local laws and exercise responsible use. The developers disclaim all liability for any misuse or unlawful activity by users.

Citation

If you find IMAGHarmony useful for your research and applications, please cite using this BibTeX:

@misc{shen2025imagharmonycontrollableimageediting,
      title={IMAGHarmony: Controllable Image Editing with Consistent Object Quantity and Layout}, 
      author={Fei Shen and Yutong Gao and Jian Yu and Xiaoyu Du and Jinhui Tang},
      year={2025},
      eprint={2506.01949},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2506.01949}, 
}

🕒 TODO List

  • Paper
  • Train Code
  • Inference Code
  • HarmonyBench Dataset
  • Model Weights

👉 Our other projects:

  • IMAGEdit: Training-Free Controllable Video Editing with Consistent Object Layout. [controllable multi-object video editing]
  • IMAGDressing: Controllable dressing generation. [controllable dressing generation]
  • IMAGGarment: Fine-grained controllable garment generation. [controllable garment generation]
  • IMAGHarmony: Controllable image editing with consistent object layout. [controllable multi-object image editing]
  • IMAGPose: Pose-guided person generation with high fidelity. [controllable multi-mode person generation]
  • RCDMs: Rich-contextual conditional diffusion for story visualization. [controllable story generation]
  • PCDMs: Progressive conditional diffusion for pose-guided image synthesis. [controllable person generation]
  • V-Express: Explores strong and weak conditional relationships for portrait video generation. [controllable digital human generation]
  • FaceShot: Talking-face plugin for any character. [controllable anime digital human generation]
  • CharacterShot: Controllable and consistent 4D character animation framework. [controllable 4D character generation]
  • StyleTailor: An agent for personalized fashion styling. [personalized fashion agent]
  • SignVip: Controllable sign language video generation. [controllable sign language generation]

📨 Contact

If you have any questions, please feel free to contact us at [email protected] and [email protected].
