- [2025/5/30] 🔥 We released the technical report of IMAGHarmony.
- [2025/5/28] 🔥 We released the training and inference code of IMAGHarmony.
- [2025/5/17] 🎉 We launched the project page of IMAGHarmony.
IMAGHarmony tackles the challenge of controllable image editing in multi-object scenes, where existing models struggle to align object quantity and spatial layout. To this end, IMAGHarmony introduces a structure-aware framework for quantity-and-layout consistent image editing (QL-Edit), enabling precise control over object count, category, and arrangement. We propose a harmony-aware (HA) module to jointly model object structure and semantics, and a preference-guided noise selection (PNS) strategy that stabilizes generation by selecting semantically aligned initial noise. Our method is trained and evaluated on HarmonyBench, a newly curated benchmark with diverse editing scenarios.
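The PNS idea can be illustrated with a small sketch: draw several candidate initial noises and keep the one a preference score likes best. This is only a conceptual illustration under assumptions, not the paper's exact procedure, and the scoring function below is a placeholder for a semantic-alignment criterion.

```python
# Conceptual sketch of preference-guided noise selection (PNS), not the paper's
# exact procedure: sample several candidate initial noises and keep the one that
# a caller-supplied preference score ranks highest.
import torch

def select_initial_noise(score_fn, shape, num_candidates=8, device="cpu", seed=0):
    """Return the candidate noise tensor with the highest preference score."""
    g = torch.Generator(device=device).manual_seed(seed)
    candidates = [torch.randn(shape, generator=g, device=device) for _ in range(num_candidates)]
    scores = [score_fn(z) for z in candidates]
    best = max(range(num_candidates), key=lambda i: scores[i])
    return candidates[best]

# Dummy usage: prefer the noise whose mean is closest to zero (a stand-in for a
# semantic score such as CLIP similarity of a quick low-step preview).
noise = select_initial_noise(lambda z: -z.mean().abs().item(), shape=(1, 4, 64, 64))
```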
- Python>=3.8
- PyTorch>=2.0.0
- CUDA>=11.8
conda create --name IMAGHarmony python=3.8.18
conda activate IMAGHarmony
# Install requirements
pip install -r requirements.txt
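After installing the requirements, you can optionally sanity-check that the Python, PyTorch, and CUDA versions meet the constraints listed above:

```python
# Optional sanity check for the environment requirements listed above.
import sys
import torch

print("Python:", sys.version.split()[0])             # expect >= 3.8
print("PyTorch:", torch.__version__)                  # expect >= 2.0.0
print("CUDA available:", torch.cuda.is_available())
print("CUDA (built with):", torch.version.cuda)       # expect >= 11.8
```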
You can download our models from Hugging Face. The other component models can be downloaded from their original repositories, as follows.
# Please download the HarmonyBench data first or prepare your own images
# and modify the path in run.sh
# Write the captions of your images in your train.json file
# start training
sh train.sh
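The exact schema of train.json depends on this repository's data loader; as a rough, hypothetical illustration, each entry pairs an image path with its caption, for example:

```python
# Hypothetical illustration of captioned train.json entries; adjust the field
# names to whatever the repository's data loader actually expects.
import json

records = [
    {
        "image": "data/HarmonyBench/images/0001.jpg",  # hypothetical image path
        "caption": "three red apples arranged in a row on a wooden table",
    },
]

with open("train.json", "w") as f:
    json.dump(records, f, indent=2)
```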
# Please convert your checkpoints
python conver_bin.py
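conver_bin.py handles the conversion in this repository; conceptually, turning a training checkpoint into a standalone .bin weights file works roughly as follows (paths and key names below are hypothetical):

```python
# Illustrative sketch only (use the repo's conver_bin.py in practice): extract a
# model state_dict from a training checkpoint and save it as a .bin file.
import torch

ckpt = torch.load("checkpoints/last.ckpt", map_location="cpu")  # hypothetical input path
state_dict = ckpt.get("state_dict", ckpt)                       # unwrap if the checkpoint nests it
torch.save(state_dict, "checkpoints/imagharmony.bin")           # hypothetical output path
```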
# Please fill in your paths in test.py
# then run
python test.py
Alternatively, you can try it in a Gradio demo:
python demo.py
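demo.py launches the Gradio app shipped with this repository; as a rough sketch of what such a wrapper looks like (the editing function below is a placeholder, not the actual pipeline call):

```python
# Minimal Gradio sketch, not the repo's demo.py: wrap an editing function in a web UI.
import gradio as gr

def edit_image(image, prompt):
    # Placeholder: a real demo would run the IMAGHarmony pipeline here.
    return image

gr.Interface(
    fn=edit_image,
    inputs=[gr.Image(type="pil"), gr.Textbox(label="Edit prompt")],
    outputs=gr.Image(),
    title="IMAGHarmony demo (sketch)",
).launch()
```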
We would like to thank the contributors to the InstantStyle and IP-Adapter repositories for their open research and exploration.
The IMAGHarmony code is available for both academic and commercial use. Users are permitted to generate images using this tool, provided they comply with local laws and exercise responsible use. The developers disclaim all liability for any misuse or unlawful activity by users.
If you find IMAGHarmony useful for your research and applications, please cite using this BibTeX:
@misc{shen2025imagharmonycontrollableimageediting,
title={IMAGHarmony: Controllable Image Editing with Consistent Object Quantity and Layout},
author={Fei Shen and Yutong Gao and Jian Yu and Xiaoyu Du and Jinhui Tang},
year={2025},
eprint={2506.01949},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2506.01949},
}

- Paper
- Train Code
- Inference Code
- HarmonyBench Dataset
- Model Weights
- IMAGEdit: Training-Free Controllable Video Editing with Consistent Object Layout. [Controllable multi-object video editing]
- IMAGDressing: Controllable dressing generation. [Controllable dressing generation]
- IMAGGarment: Fine-grained controllable garment generation. [Controllable garment generation]
- IMAGHarmony: Controllable image editing with consistent object layout. [Controllable multi-object image editing]
- IMAGPose: Pose-guided person generation with high fidelity. [Controllable multi-mode person generation]
- RCDMs: Rich-contextual conditional diffusion for story visualization. [Controllable story generation]
- PCDMs: Progressive conditional diffusion for pose-guided image synthesis. [Controllable person generation]
- V-Express: Explores strong and weak conditional relationships for portrait video generation. [Controllable digital human generation]
- FaceShot: Talking-face plugin for any character. [Controllable anime digital human generation]
- CharacterShot: Controllable and consistent 4D character animation framework. [Controllable 4D character generation]
- StyleTailor: An agent for personalized fashion styling. [Personalized fashion agent]
- SignVip: Controllable sign language video generation. [Controllable sign language generation]
If you have any questions, please feel free to contact us at [email protected] and [email protected].