A Python tool that identifies and helps you delete duplicate and similar images, using OpenAI's CLIP (Contrastive Language-Image Pre-Training) neural network. It uses the model to generate image embeddings and compares them using cosine similarity.
This tool helps you:
- Find exact duplicates and visually similar images in a directory
- Preview images using the terminal-based image viewer viu
- Interactively choose which images to keep or delete
- Work with common image formats (jpg, jpeg, png, gif, bmp, tiff, webp)
The similarity detection is powered by OpenAI's CLIP model (ViT-B-32) through the sentence-transformers library, providing visual similarity matching beyond just pixel-perfect duplicates.
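The comparison step itself is simple. Here is a minimal plain-Python sketch of cosine similarity between two embedding vectors. It is not the tool's own code. The function name `cosine_similarity` and the vectors are hypothetical: the vectors are made-up 3-dimensional stand-ins, whereas CLIP ViT-B-32 produces 512-dimensional embeddings. With sentence-transformers, the real vectors would come from the model's `encode(...)` call on images, and `sentence_transformers.util.cos_sim` computes the same quantity.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between vectors a and b (range -1.0 to 1.0)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


# Toy "embeddings" for illustration only.
photo = [0.2, 0.9, 0.1]
scaled = [0.4, 1.8, 0.2]     # same direction as photo, twice the length
unrelated = [0.9, -0.1, 0.3]

print(round(cosine_similarity(photo, scaled), 4))  # 1.0
print(round(cosine_similarity(photo, unrelated), 4))
```

Cosine similarity depends only on direction, so scaling a vector leaves the score unchanged. A score close to 1.0 therefore means the two embeddings point the same way.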
Features:
- Smart Similarity Detection: uses the CLIP (ViT-B-32) model to detect both exact duplicates (similarity score ≥ 0.9999) and visually similar images (customizable threshold)
- Interactive Review Process:
  - Visual preview of each image group using the terminal-based viu viewer
  - User-friendly selection interface for choosing which images to keep or delete
- Robust File Handling:
  - Recursive directory scanning
  - Support for multiple image formats (jpg, jpeg, png, gif, bmp, tiff, webp)
  - Cross-platform compatibility using Path objects
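The recursive scan and format filter listed above could be written roughly like this with pathlib. This is a sketch, not the repository's implementation. `find_images` and `IMAGE_EXTENSIONS` are hypothetical names, and matching extensions case-insensitively is an assumption.

```python
from pathlib import Path
from typing import List

# Extensions listed in this README, compared case-insensitively (assumption).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tiff", ".webp"}


def find_images(root: str) -> List[Path]:
    """Recursively collect image files under root, sorted for a stable review order."""
    return sorted(
        path
        for path in Path(root).rglob("*")  # rglob("*") walks every subdirectory
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )
```

Because `Path` handles separators for the host OS, the same code works on Linux, macOS, and Windows.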
Limitations:
- Needs enough memory to load the CLIP model and process image batches
- Loading the CLIP model for the first time may take a significant amount of time
- Terminal-based image previews have limited resolution
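A common way to bound memory is to open and embed images a fixed number at a time rather than all at once. The generator below is a hypothetical helper that shows this idea. This README does not state the batch size the tool actually uses. On Python 3.12+, `itertools.batched` does the same job but yields tuples.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items, so only one batch is held at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


print(list(batched(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```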
Prerequisites: CLIP, sentence-transformers, PyTorch, and viu must be installed for the tool to work. Before starting the installation below, install each of them by following the instructions on its project page.
Installation:
- Clone the repository:
    git clone https://github.com/erenmenges/image-dedup-with-CLIP.git
    cd image-dedup-with-CLIP
- Install the required Python packages:
    pip install sentence-transformers Pillow
- If you haven't already, install the viu terminal image viewer:
  - On Ubuntu/Debian: sudo apt install viu
  - On macOS: brew install viu
  - Other systems: see the viu repository
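After installing, you can confirm that the prerequisites are reachable with a check like the one below. It is a hypothetical helper and not part of the repository. It assumes the importable module names `torch`, `clip` (OpenAI's CLIP package), `sentence_transformers`, and `PIL` (Pillow), and it looks for the `viu` binary on `PATH`.

```python
import importlib.util
import shutil
from typing import Dict


def check_prerequisites() -> Dict[str, bool]:
    """Report whether each prerequisite can be found, without importing heavy packages."""
    modules = ["torch", "clip", "sentence_transformers", "PIL"]
    # find_spec locates a top-level module without executing it.
    status = {name: importlib.util.find_spec(name) is not None for name in modules}
    # viu is a standalone binary, so search PATH for it instead.
    status["viu"] = shutil.which("viu") is not None
    return status


for name, found in check_prerequisites().items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```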
Usage:
- Basic Usage:
    python main.py /path/to/image/directory
- Interactive Process:
  - The tool scans the directory and processes images in batches
  - For each group of similar images:
    - The original image is displayed first
    - Its potential duplicates are displayed next
  - Enter numbers to select the images to delete: enter -1 to delete the original image, or comma-separated indices such as 0,1,2 to delete specific duplicates
- Review Results:
  - The program shows the list of deleted files
  - It also shows the list of kept files
  - Previously reviewed images are skipped automatically
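The input convention above could be parsed as in the following sketch. Both function names are hypothetical. Two behaviors are assumptions that this README does not state: an empty answer deletes nothing, and -1 may be combined with duplicate indices in one answer. The sketch never deletes anything. It only maps the answer to file paths.

```python
from typing import List, Set


def parse_deletion_choice(raw: str, num_duplicates: int) -> Set[int]:
    """Parse input such as "-1" or "0,2" into a set of indices to delete.

    -1 stands for the original image; 0..num_duplicates-1 index the duplicates.
    An empty answer yields an empty set (assumed to mean "delete nothing").
    """
    choices: Set[int] = set()
    for token in raw.split(","):
        token = token.strip()
        if not token:
            continue
        index = int(token)  # raises ValueError on non-numeric input
        if not -1 <= index < num_duplicates:
            raise ValueError(f"index {index} is out of range")
        choices.add(index)
    return choices


def files_to_delete(original: str, duplicates: List[str], raw: str) -> List[str]:
    """Map the parsed indices back to file paths, listing the original first if chosen."""
    chosen = parse_deletion_choice(raw, len(duplicates))
    paths = [original] if -1 in chosen else []
    paths += [duplicates[i] for i in sorted(chosen) if i >= 0]
    return paths


print(files_to_delete("a.jpg", ["a_copy.jpg", "a_small.png"], "1"))  # ['a_small.png']
```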
Note: The default similarity threshold is 0.8 (80%). Images with similarity score ≥ 0.9999 are considered exact duplicates.
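The two thresholds can be combined into a grouping pass roughly like the one below. This is a greedy sketch under stated assumptions, not the tool's algorithm. Each image not yet placed in a group acts as the "original". Later images scoring ≥ 0.8 against it become its duplicates, and scores ≥ 0.9999 are labeled exact. Images already grouped are not offered again, which is one possible reading of "skips previously reviewed images". The embeddings are toy vectors and `group_similar` is a hypothetical name.

```python
import math
from typing import Dict, List, Sequence, Tuple

EXACT_DUPLICATE = 0.9999  # exact-duplicate cutoff stated in this README
DEFAULT_THRESHOLD = 0.8   # default similarity threshold stated in this README


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def group_similar(
    embeddings: Dict[str, Sequence[float]], threshold: float = DEFAULT_THRESHOLD
) -> List[Tuple[str, List[Tuple[str, float, str]]]]:
    """Greedily group images: each ungrouped image collects later images scoring >= threshold."""
    names = list(embeddings)
    reviewed = set()
    groups = []
    for i, original in enumerate(names):
        if original in reviewed:
            continue  # already shown as part of an earlier group
        matches = []
        for candidate in names[i + 1:]:
            if candidate in reviewed:
                continue
            score = _cosine(embeddings[original], embeddings[candidate])
            if score >= threshold:
                kind = "exact" if score >= EXACT_DUPLICATE else "similar"
                matches.append((candidate, score, kind))
        if matches:
            reviewed.add(original)
            reviewed.update(name for name, _, _ in matches)
            groups.append((original, matches))
    return groups


vectors = {
    "beach.jpg": [1.0, 0.0, 0.0],
    "beach_copy.png": [2.0, 0.0, 0.0],
    "beach_cropped.jpg": [0.9, 0.3, 0.0],
    "forest.jpg": [0.0, 0.0, 1.0],
}
for original, matches in group_similar(vectors):
    print(original, [(name, kind) for name, _, kind in matches])
# beach.jpg [('beach_copy.png', 'exact'), ('beach_cropped.jpg', 'similar')]
```

Comparing every pair is quadratic in the number of images. That is acceptable for a sketch, but a larger implementation might compute the whole similarity matrix in a single vectorized call instead.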