TL;DR: A Python script that translates text in images, using your favorite generative AI models (Stable Diffusion, Midjourney, DALL·E) to in-paint the text regions.
Before | After (Hungarian translation)
---|---
![]() | ![]()
InpaintTranslate runs text detection on your image, masks the text boxes, and in-paints the masked regions until your image is text-free.
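The masking step above can be sketched in pure Python. This is an illustration only: the box format `(x, y, width, height)` is an assumption, not the library's confirmed representation.

```python
def boxes_to_mask(width, height, boxes):
    """Build a binary mask: 1 inside any detected text box, 0 elsewhere.

    Each box is a hypothetical (x, y, w, h) tuple in pixel coordinates.
    """
    mask = [[0] * width for _ in range(height)]
    for x, y, w, h in boxes:
        for row in range(y, min(y + h, height)):
            for col in range(x, min(x + w, width)):
                mask[row][col] = 1
    return mask

# A 8x4 image with one 3x2 text box starting at (1, 1).
mask = boxes_to_mask(8, 4, [(1, 1, 3, 2)])
```

The in-painter then regenerates only the pixels where the mask is 1, leaving the rest of the image untouched.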
InpaintTranslate can run entirely on your local machine, using

- Tesseract or PaddleOCR for text detection and
- Stable Diffusion for in-painting,

or it can call existing APIs.
You can translate the text in your image in just a few lines:

```python
from deep_translator import MyMemoryTranslator

from inpaint_translate.text_detector import PaddleTextDetector
from inpaint_translate.inpainter import LocalSDInpainter
from inpaint_translate.inpaint_translator import InpaintTranslator

text_detector = PaddleTextDetector()
inpainter = LocalSDInpainter()
translator = MyMemoryTranslator(source="en-US", target="hu-HU")
inpaint_translator = InpaintTranslator(text_detector, inpainter, translator)
inpaint_translator.inpaint_translate("/my/input/image/path.png", "/my/output/image/path.png")
```
or through the handy `run.py` script:

```shell
python run.py "/my/input/image/path.png" -o "/my/output/image/path.png"
```
Use verbose mode to create intermediary images in a `debug` folder, either by passing the `-v` flag to `run.py` or by setting the `logging` library's logger to debug level.
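Enabling debug logging from Python code could look like this; note that the logger name `inpaint_translate` is an assumption, as the README does not state which logger the project uses.

```python
import logging

# Show debug output from all loggers.
logging.basicConfig(level=logging.DEBUG)

# Or target only the (hypothetically named) project logger.
logging.getLogger("inpaint_translate").setLevel(logging.DEBUG)
```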
We provide multiple implementations for text detection and in-painting (both local and API-based), and you are also free to add your own.
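A custom implementation could follow the same plug-in pattern. This sketch is purely illustrative: the method name `detect_text` and the box format are assumptions, not the library's confirmed interface.

```python
class MyTextDetector:
    """A hypothetical detector that returns fixed boxes, for illustration."""

    def detect_text(self, image_path):
        # Return a list of (x, y, width, height) text boxes found in the image.
        # A real implementation would run an OCR engine here.
        return [(10, 20, 100, 30)]

detector = MyTextDetector()
boxes = detector.detect_text("example.png")
```

Any object exposing the expected detection method could then be passed to the translator pipeline in place of the built-in detectors.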
TesseractTextDetector (based on Tesseract) runs locally. Follow this guide to install the `tesseract` library locally. On Ubuntu:

```shell
sudo apt install tesseract-ocr
sudo apt install libtesseract-dev
```

To find the path where it was installed (and pass it to the TesseractTextDetector constructor):

```shell
whereis tesseract
```
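The same lookup can also be done from Python with the standard library (a convenience sketch, not part of the project's API):

```python
import shutil

# Returns the full path to the tesseract binary, or None if it is not on PATH.
tesseract_path = shutil.which("tesseract")
```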
AzureTextDetector calls a computer vision API from Microsoft Azure. You will first need to create a Computer Vision resource via the Azure portal. Once created, take note of the endpoint and the key.

```python
AZURE_CV_ENDPOINT = "https://your-endpoint.cognitiveservices.azure.com"
AZURE_CV_KEY = "your-azure-key"

text_detector = AzureTextDetector(AZURE_CV_ENDPOINT, AZURE_CV_KEY)
```
Our evaluation shows that the two text detectors produce comparable results.
PaddleTextDetector (based on PaddleOCR) runs locally. Follow this guide to install the `paddlepaddle` library locally, or just use

```shell
pip install -r requirements_paddleocr.txt
```
- LocalSDInpainter (implemented via Huggingface's `diffusers` library) runs locally and requires a GPU. Defaults to Stable Diffusion v2 for in-painting.
- ReplicateSDInpainter calls the Replicate API. Defaults to Stable Diffusion v2 for in-painting (and requires an API key).
- DalleInpainter calls the DALL·E 2 API from OpenAI (and requires an API key).
```python
# You only need to instantiate one of the following:
local_inpainter = LocalSDInpainter()
replicate_inpainter = ReplicateSDInpainter("your-replicate-key")
dalle_inpainter = DalleInpainter("your-openai-key")
```
Translations are provided with deep-translator.
Keep in mind that different providers support different languages and impose different usage restrictions.
This project was based on detexify by Mihail Eric and Julia Turc.
Created by Dolers for his own amusement.