
deepdub

🗣️ Making videos more accessible to people around the world.

This repository is an implementation of the paper "Automated Dubbing and Facial Synchronization using Deep Learning" (DOI: 10.1109/ICAI55435.2022.9773697).

Welcome

This is the main repo of deepdub. It contains a CLI engine, plus a Wooey server that adds a web UI on top of the CLI.
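For context, Wooey builds its web forms by introspecting ordinary argparse-based Python scripts, so the CLI and the UI can share one entry point. Below is a minimal sketch of that kind of script; the filename, flags, and stubbed body are illustrative, not deepdub's actual interface:

```python
# dub_video.py -- an illustrative argparse script of the kind Wooey can
# turn into a web form. Flags and the stubbed body are hypothetical,
# not deepdub's actual interface.
import argparse


def main():
    parser = argparse.ArgumentParser(
        description="Dub a video into a target language."
    )
    parser.add_argument("video", help="path to the input video")
    parser.add_argument("--target-lang", default="es",
                        help="language code to dub into")
    parser.add_argument("--output", default="dubbed.mp4",
                        help="where to write the lip-synced result")
    args = parser.parse_args()

    # Real dubbing logic would be delegated to the deepdub CLI engine here.
    print(f"Dubbing {args.video} into '{args.target_lang}' -> {args.output}")


if __name__ == "__main__":
    main()
```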

Installation Guide

The CLI and the web server have separate installation mechanisms; however, in our tests we found that installing the CLI first and the web server afterwards brings everything together nicely.

Click here for the main installation guide.


What is Deepdub?

Over 2.3 billion people worldwide use YouTube at least once a month. Yet it is estimated that only about 25% of the world's population understands English at any level. Many viewers therefore struggle to understand videos that are only available in English, which creates a need for translating videos into other languages.

This is normally tackled by manual dubbing, in which a translated script is written, voice actors are hired, and video editors mix tracks together to create a dubbed version of the original video. Yet even after all this labor, the lip movements of the actor in the original video do not match the dubbed audio. Here we see another gap: automatic voice dubbing, and lip-syncing the dubbed audio with the original clip.

To solve these problems, we aim to build a pipeline called "Deepdub" which takes a video as input and outputs a dubbed version of it in a preferred language, with the lips synced to the dubbed audio.
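Conceptually, such a pipeline chains a handful of stages: extract the audio, transcribe it, translate the transcript, synthesize speech in the target language, and re-sync the lips. Here is a minimal sketch of that data flow; every function is a hypothetical placeholder for a model in the pipeline, not part of the deepdub codebase:

```python
# Hypothetical sketch of the target data flow. Every function below is a
# placeholder for a model/tool in the pipeline, not deepdub's actual API.

def extract_audio(video_path: str) -> str:
    """Pull the original audio track out of the video."""
    ...

def transcribe(audio_path: str) -> str:
    """Speech-to-text: recover the spoken script."""
    ...

def translate(text: str, target_lang: str) -> str:
    """Translate the transcript into the target language."""
    ...

def synthesize(text: str, target_lang: str) -> str:
    """Text-to-speech: render the translated script as audio."""
    ...

def lip_sync(video_path: str, audio_path: str) -> str:
    """Re-render the speaker's lips to match the new audio."""
    ...

def deepdub(video_path: str, target_lang: str) -> str:
    audio = extract_audio(video_path)
    script = transcribe(audio)
    translated = translate(script, target_lang)
    dubbed_audio = synthesize(translated, target_lang)
    return lip_sync(video_path, dubbed_audio)
```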


Iterations

Pipeline 0 (Release 1)

We started off simple and tried our best to make a pipeline that at least WORKS.

During our research, we came across text-to-speech implementations and speech-to-lip (speech-driven facial generation) implementations. So we got a cool idea for our first pipeline... why not put those two together!

```
🖊🍍🍎🖊
 \(•_•)/
  ( (      pen-pineapple-apple-pen
  / \
```

We had a rough idea in our head about what we were aiming for:

[Figure: gray-box diagram of the rough pipeline idea]

Out of the implementations we saw, we identified some of the best to be:

So we visualized a pipeline as the following: [Figure: pipeline flow diagram]
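In code, that composition can be as thin as running the two models back to back: the text-to-speech model produces an audio file, which (together with the original video) drives the speech-to-lip model. A rough sketch of that glue follows, where both inference scripts and their flags are hypothetical stand-ins for whichever implementations get plugged in:

```python
# Hypothetical glue for Pipeline 0: run TTS, then speech-driven lip sync.
# Both script names and their flags are stand-ins, not real tools' CLIs.
import subprocess

def run_pipeline0(translated_text: str, face_video: str, output_video: str) -> None:
    # Stage 1: text-to-speech renders the translated script -> dubbed.wav
    subprocess.run(
        ["python", "tts_infer.py",
         "--text", translated_text, "--out", "dubbed.wav"],
        check=True,
    )
    # Stage 2: speech-to-lip re-renders the face, driven by the TTS audio
    subprocess.run(
        ["python", "lip_sync_infer.py",
         "--face", face_video, "--audio", "dubbed.wav", "--out", output_video],
        check=True,
    )
```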

More on the implementation-level details, to-dos, etc. can be found here.

Sample Results

Original Video: https://drive.google.com/file/d/1bedh5iu2L8y_zG8XWu2x6wFZG4yykIyv/view?usp=sharing

Translated Video (using Deepdub): https://drive.google.com/file/d/1pDv6-JCHaTafGNarOuMuhGIcv6x-PPM5/view?usp=sharing