WhatsArchive is a command-line tool to parse and export WhatsApp chat transcripts. It supports exporting data in multiple formats (JSON, TXT, HTML) and includes options for handling media files.
First, clone the repository and navigate into the project directory. Run the following command to install the required dependencies:
npm install
This tool requires exiftool and ffmpeg to be installed for media file validation and conversion.
To install exiftool, follow these steps:
- macOS (with Homebrew installed): brew install exiftool
- Linux (Debian/Ubuntu-based systems): sudo apt install exiftool
- Windows: Download the installer from the official ExifTool website and follow the instructions.
To install ffmpeg, follow these steps:
- macOS (with Homebrew installed): brew install ffmpeg
- Linux (Debian/Ubuntu-based systems): sudo apt install ffmpeg
- Windows: Download the installer from the official FFmpeg website and follow the instructions.
Ensure both exiftool and ffmpeg are accessible from the command line.
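One way to confirm that both tools are reachable is a short shell check along these lines. This is only a sketch: check_tools is a hypothetical helper name, not something WhatsArchive ships, and it relies on nothing beyond the standard command -v lookup.

```shell
# check_tools is a hypothetical helper, not part of WhatsArchive.
# It prints one line per tool and returns non-zero if any tool is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

check_tools exiftool ffmpeg || echo "Install the missing tools before running WhatsArchive."
```

If both lines read "found", running exiftool -ver and ffmpeg -version should also print each tool's version.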
To use the tool, run the following command:
npm start -- --input <file_path> --output <output_directory> [options]
Replace <file_path> with the path to your .zip file containing the WhatsApp chat transcript, and <output_directory> with the path where the parsed output will be saved.
- --input: Required. Specifies the input file path.
- --output: Required. Specifies the output directory where the parsed data and/or media will be saved.
- --convert-to <json | txt | html>: Specifies the output format. Supports json, txt, or html (default: json).
- --convert-opus: Converts .opus audio files to .mp3 format for compatibility with more audio players. Requires ffmpeg.
- --no-media: Skips downloading media files; only the chat transcript will be saved.
The WhatsArchive CLI tool includes a mock data generator (mocker.ts) that creates a test .zip file containing a simulated WhatsApp chat transcript along with optional media files. This can be useful for testing the tool.
To generate a mock .zip file, run the following command:
npm run mock -- --output <output_zip_path> [options]
- --output <path> (Required): Specifies the output path where the mock .zip file will be saved.
- --messageCount <number> (Optional): Number of messages to generate (default: 100).
- --textToMediaRatio <number> (Optional): Defines the ratio of text messages to media messages (default: 5).
- --addMedia (Optional): Includes media messages in the chat if enabled.
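To get a feel for how --messageCount and --textToMediaRatio interact, the arithmetic can be sketched as below. This assumes a ratio of R means roughly R text messages for every one media message. That is a literal reading of the option description, not a confirmed detail of mocker.ts, and because media messages are inserted randomly the real count may differ. The helper name estimate_media_count is hypothetical.

```shell
# estimate_media_count is a hypothetical helper, not part of mocker.ts.
# Assumption: a ratio of R means R text messages per 1 media message,
# so about 1 in every (R + 1) messages is media.
estimate_media_count() {
  message_count=$1
  ratio=$2
  echo $(( message_count / (ratio + 1) ))
}

estimate_media_count 100 5   # prints 16
estimate_media_count 50 3    # prints 12
```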
npm run mock -- --output test_files/mock_chat.zip
npm run mock -- --output test_files/mock_chat.zip --messageCount 200 --addMedia
npm run mock -- --output test_files/mock_chat.zip --messageCount 50 --textToMediaRatio 3 --addMedia
- The script generates a chat transcript with realistic timestamps and randomized message content.
- If the --addMedia flag is used, the script randomly inserts media messages and includes sample media files (.jpg, .opus).
- The generated chat and media files are bundled into a .zip archive, simulating a real exported WhatsApp chat.
After generating the .zip file, you can use it as an input when running the WhatsArchive CLI tool:
npm start -- --input test_files/mock_chat.zip --output output
This will parse the mock chat and process it based on the provided options.
npm start -- --input test_files/test_zip.zip --output output
This will parse the chat from test_zip.zip, save the output in JSON format, and include all media files.
npm start -- --input test_files/test_zip.zip --output output --convert-to html
This will parse the chat and save the output as an HTML file, including media files.
npm start -- --input test_files/test_zip.zip --output output --convert-to json --noMedia
This will parse the chat and save the output without downloading any media files.
npm start -- --input test_files/test_zip_opus.zip --output output --convert-to json --convertOpus
This will parse the chat, convert .opus audio files to .mp3, and download other media files.
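When several exports need processing, the single-file commands above can be wrapped in a loop. The sketch below is one possible approach, not part of WhatsArchive: export_all and the RUNNER variable are hypothetical names, and writing each chat to its own subdirectory of the output root is an assumption about a convenient layout. With RUNNER=echo the commands are only printed, which makes it easy to review them before running anything.

```shell
# export_all is a hypothetical helper, not part of WhatsArchive.
# Usage: export_all <zip_dir> <output_root> [format]
# Set RUNNER=echo to print the commands instead of running them.
export_all() {
  zip_dir=$1
  output_root=$2
  format=${3:-json}
  for zip in "$zip_dir"/*.zip; do
    [ -e "$zip" ] || continue   # skip when the glob matches nothing
    name=$(basename "$zip" .zip)
    ${RUNNER:-} npm start -- --input "$zip" --output "$output_root/$name" --convert-to "$format"
  done
}

# Dry run: print one command per .zip found in test_files.
RUNNER=echo
export_all test_files output html
```

Clearing RUNNER (for example with unset RUNNER) makes the same call run the exports for real.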