Chatify is an AI-powered Flask web application that takes a YouTube video URL and automatically generates meaningful chapter-wise summaries and titles from spoken Hindi or Hinglish content. It uses OpenAI's Whisper, mBART, and KeyBERT to convert speech to text, summarize it, and generate chapter titles. The output is a clean, timestamped JSON file, ideal for content indexing, accessibility, or quick navigation.
This project is the practical Flask implementation of our published research paper:
Automatic Chapter Generation for Hindi-English YouTube Videos (JISEM 2024)
The full research repository (dataset pipeline, methodology, experiments):
Automatic-Chapter-Generation-for-Hindi-English-YouTube-Videos
- Flask: Lightweight Python web framework for handling routes and requests.
- Whisper (by OpenAI): For speech-to-text transcription from Hindi/Hinglish audio.
- mBART (by Facebook AI): For abstractive summarization and Hindi-to-English translation.
- KeyBERT: For keyword-based title generation using BERT embeddings.
- yt-dlp: For downloading audio from YouTube videos.
- ffmpeg: For converting and processing audio formats (MP4 to MP3/WAV).
- uuid: For generating unique job identifiers.
- pathlib / os / json: For safe file and directory operations.
- HTML: For rendering dynamic content using Flask templates.
- JSON: Chapters with timestamps, titles, and summaries.
- Accepts a YouTube video URL as input
- Converts spoken Hindi/Hinglish content into English summaries
- Breaks videos into timestamped chapters (default: every 5 minutes)
- Generates meaningful chapter titles using keyword extraction
- Generates a structured `.json` file containing start time, title, and summary
- Simple Flask UI to interact with the tool via browser
chatify/
├── app.py                 # Flask application entry point
├── workspace/             # Temporary folder to store job-specific files
├── templates/
│   └── index.html         # Main web interface
├── static/
│   └── style.css          # Web design
├── trail/                 # Demo files (sample output)
│   ├── try.ipynb
│   └── chapters.ipynb
└── pipeline/
    ├── downloader.py      # Uses yt-dlp to extract audio from YouTube
    ├── transcriber.py     # Whisper transcription + transcript saver
    ├── chapterizer.py     # Chunking + summarization + title generation
    └── utils.py           # Time conversion utilities
- The user provides a YouTube video URL.
- The audio stream is extracted and saved as an MP3 using `yt-dlp` and `ffmpeg`.
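The download step above can be sketched roughly as follows. This is a minimal illustration, not the code in `pipeline/downloader.py`: the function names `build_ydl_opts` and `download_audio`, the `audio.mp3` file name, and the 192 kbps quality setting are assumptions made for the example. It relies on yt-dlp's `FFmpegExtractAudio` post-processor, which requires ffmpeg to be installed.

```python
from pathlib import Path


def build_ydl_opts(job_dir: Path) -> dict:
    """Options asking yt-dlp for the best audio stream, converted to MP3 by ffmpeg."""
    return {
        "format": "bestaudio/best",
        # Hypothetical output name; yt-dlp fills in %(ext)s itself.
        "outtmpl": str(job_dir / "audio.%(ext)s"),
        "postprocessors": [
            {
                "key": "FFmpegExtractAudio",
                "preferredcodec": "mp3",
                "preferredquality": "192",  # assumed bitrate for this sketch
            }
        ],
        "quiet": True,
    }


def download_audio(url: str, job_dir: Path) -> Path:
    """Download the audio track of `url` into `job_dir` and return the MP3 path."""
    import yt_dlp  # imported lazily so the options helper works without it

    job_dir.mkdir(parents=True, exist_ok=True)
    with yt_dlp.YoutubeDL(build_ydl_opts(job_dir)) as ydl:
        ydl.download([url])
    return job_dir / "audio.mp3"
```

Keeping the options in a separate helper makes it possible to inspect or adjust them without touching the network.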
- Audio is transcribed using OpenAI's Whisper model.
- Output: Timestamped transcript in Hindi/Hinglish.
- Format:
[start_time - end_time]: text
- The transcript is cleaned and formatted.
- Each segment includes a timestamp and its corresponding spoken content.
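As an illustration of the transcript format described above, the sketch below turns Whisper-style segments into `[start_time - end_time]: text` lines. It assumes each segment is a dict with `start`, `end` (seconds), and `text` keys, which is how Whisper's `transcribe()` result lists its segments; the helper names `to_hms`, `format_segment`, and `format_transcript` are hypothetical and may differ from what `pipeline/transcriber.py` and `pipeline/utils.py` actually contain.

```python
from datetime import timedelta


def to_hms(seconds: float) -> str:
    """Convert seconds to an H:MM:SS string, dropping fractional seconds."""
    return str(timedelta(seconds=int(seconds)))


def format_segment(segment: dict) -> str:
    """Render one segment as '[start_time - end_time]: text'."""
    # Collapse repeated whitespace and trim the edges of the spoken text.
    text = " ".join(segment["text"].split())
    return f"[{to_hms(segment['start'])} - {to_hms(segment['end'])}]: {text}"


def format_transcript(segments: list) -> str:
    """Join all formatted segments, one per line."""
    return "\n".join(format_segment(seg) for seg in segments)
```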
- The transcript is split into fixed-length chunks (e.g., 300 seconds = 5 minutes).
- Timestamp alignment is preserved.
- Each chunk is treated as a potential chapter.
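The fixed-length chunking described above might look like the following sketch. It assigns each segment to a window by its start time, so a segment that straddles a boundary stays in the window where it began; that rule, the function name `chunk_segments`, and the choice to skip windows with no speech are assumptions of this example rather than confirmed behavior of `pipeline/chapterizer.py`.

```python
def chunk_segments(segments: list, chunk_seconds: int = 300) -> list:
    """Group transcript segments into fixed-length windows (default 5 minutes)."""
    windows = {}
    for seg in segments:
        # Window index is decided by where the segment starts.
        index = int(seg["start"] // chunk_seconds)
        windows.setdefault(index, []).append(seg["text"].strip())

    # Each chunk keeps the window's start offset so timestamps stay aligned.
    return [
        {"start": index * chunk_seconds, "text": " ".join(texts)}
        for index, texts in sorted(windows.items())
    ]
```

Each returned chunk carries the start offset in seconds, which can later be converted to the `start_time` string used in the output.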
- Each chunk is summarized using mBART, a multilingual transformer fine-tuned for Hindi-to-English summarization.
- Output: Concise English summary of the chunk's content.
- Using KeyBERT, important keywords are extracted from each summary.
- The most relevant keyword or phrase is selected as the chapter title.
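A minimal sketch of the title step, assuming KeyBERT's default embedding model and its `extract_keywords` method, which returns `(phrase, score)` pairs. The n-gram range, `top_n` value, title-casing, and the helper names `pick_title` and `title_from_summary` are illustrative choices, not settings taken from the project.

```python
def pick_title(keywords: list, fallback: str = "Untitled") -> str:
    """Choose the highest-scoring (phrase, score) pair and title-case it."""
    if not keywords:
        return fallback
    phrase, _score = max(keywords, key=lambda pair: pair[1])
    return phrase.title()


def title_from_summary(summary: str) -> str:
    """Extract candidate keyphrases from an English summary and pick a title."""
    from keybert import KeyBERT  # imported lazily; loads an embedding model

    model = KeyBERT()
    keywords = model.extract_keywords(
        summary,
        keyphrase_ngram_range=(1, 2),  # assumed: allow one- and two-word titles
        stop_words="english",
        top_n=5,
    )
    return pick_title(keywords)
```

In a long-running Flask process it is usually worth creating the `KeyBERT` instance once and reusing it, since loading the model on every call is slow.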
- For each chunk, the following are saved: `start_time`, `summary`, `title`
- Final output is stored as a structured `.json` file.
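The saving step can be sketched as below. The field names follow the output format used by this project; the helper names `build_chapter` and `save_chapters` and the output path are hypothetical.

```python
import json
from datetime import timedelta
from pathlib import Path


def build_chapter(start_seconds: float, title: str, summary: str) -> dict:
    """Assemble one chapter record with an H:MM:SS start time."""
    return {
        "start_time": str(timedelta(seconds=int(start_seconds))),
        "title": title,
        "summary": summary,
    }


def save_chapters(chapters: list, path: Path) -> None:
    """Write the chapter list to `path` as pretty-printed UTF-8 JSON."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False keeps any non-ASCII text readable in the file.
    path.write_text(
        json.dumps(chapters, ensure_ascii=False, indent=2),
        encoding="utf-8",
    )
```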
[
{
"start_time": "0:00:00",
"title": "Social Professions",
"summary": "The speaker discusses how certain professions like tea vendors, garbage collectors, and dancers are perceived with bias in Indian society..."
},
{
"start_time": "0:05:00",
"title": "Education Challenges",
"summary": "The video highlights problems in the Indian education system including outdated curriculum, exam pressure, and limited access in rural areas..."
}
]
git clone https://github.com/avanigupta06/Chaptify.git
cd Chaptify
python -m venv venv
source venv/bin/activate # For Linux/Mac
venv\Scripts\activate # For Windows
pip install -r requirements.txt
python app.py
Visit: http://127.0.0.1:5000/
Paste any YouTube URL and get automatic chapters & summaries
- Install ffmpeg locally and make sure it is available on your PATH
- A GPU is recommended, since Whisper and mBART inference is slow on CPU