A command-line tool for downloading ZIP files (or other file types, such as PDFs) from the National Archives JFK Bulk Download page or any other page that links to them.
Note: This tool has been tested on macOS. Windows and Linux support should work as described, but has not been extensively tested. Feedback and bug reports are welcome!
- Downloads all ZIP files or a specified subset
- Handles connection errors with retries
- Supports resuming interrupted downloads
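- Shows a progress bar while downloading
- Verifies downloaded files
- Detects existing files and skips them, prompts before overwriting, or re-downloads them, depending on the options used
- Configurable input URL and output directory
- Downloads multiple files in parallel (configurable number of workers)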
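The retry, resume, and verification features listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code in `bulk_download.py`: the helper names (`with_retries`, `resume_headers`, `is_valid_zip`), the exponential backoff schedule, and the use of an HTTP `Range` header for resuming are all hypothetical choices made for this example.

```python
# Hypothetical helpers illustrating retries, resuming, and verification.
# They are NOT taken from bulk_download.py.
import os
import time
import zipfile
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[], T], max_attempts: int = 3,
                 base_delay: float = 1.0,
                 retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError)) -> T:
    """Call fn(), retrying on connection errors with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay * 1, 2, 4, ... seconds between attempts
            time.sleep(base_delay * 2 ** (attempt - 1))
    # Only reached when the loop never runs, i.e. max_attempts < 1
    raise ValueError("max_attempts must be at least 1")


def resume_headers(partial_path: str) -> dict:
    """Build an HTTP Range header that resumes from the end of a partial file."""
    if os.path.exists(partial_path):
        size = os.path.getsize(partial_path)
        if size > 0:
            return {"Range": f"bytes={size}-"}
    return {}  # no partial data: request the whole file


def is_valid_zip(path: str) -> bool:
    """Check that a file is a readable ZIP whose members pass CRC checks."""
    if not zipfile.is_zipfile(path):
        return False
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() returns the name of the first corrupt member, or None
            return zf.testzip() is None
    except zipfile.BadZipFile:
        return False
```

With the `requests` library you would pass `retry_on=(requests.exceptions.ConnectionError, requests.exceptions.Timeout)`, because those exception classes do not inherit from Python's built-in `ConnectionError`. A server may also ignore the `Range` header and reply with `200 OK` instead of `206 Partial Content`; in that case the partial file has to be discarded and the download restarted from the beginning.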
# Clone the repository
git clone https://github.com/yourusername/jfk-dl.git
cd jfk-dl
# Create and activate virtual environment
# On Windows:
python -m venv venv
venv\Scripts\activate
# On macOS/Linux:
python3 -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt

Alternatively, set up the environment with uv:

# Clone the repository
git clone https://github.com/yourusername/jfk-dl.git
cd jfk-dl
# Install dependencies with uv
# If you don't have uv installed:
# pip install uv
# On all platforms:
uv venv # Creates a .venv directory by default
uv pip install -r requirements.txt
# Activate the virtual environment
# On Windows: .venv\Scripts\activate
# On macOS/Linux: source .venv/bin/activate

Once installed, you can download all JFK document ZIP files with:
# Activate the virtual environment if needed
# For standard venv:
# On Windows: venv\Scripts\activate
# On macOS/Linux: source venv/bin/activate
#
# For uv:
# On Windows: .venv\Scripts\activate
# On macOS/Linux: source .venv/bin/activate
# Without arguments, the script displays the help message
python bulk_download.py
# Run the downloader with default settings
python bulk_download.py --cowboyup

Usage:

python bulk_download.py [OPTIONS]

Options:

- `--url URL`: URL of the page containing the files to download (default: https://www.archives.gov/research/jfk/jfkbulkdownload)
- `--output-dir DIR`: Directory to save downloaded files (default: auto-generated based on the URL)
- `--max-files N`: Maximum number of files to download (default: 0, download all)
- `--retry ATTEMPTS`: Maximum number of retry attempts (default: 3)
- `--workers N`: Number of parallel downloads (default: 4)
- `--force`: Download without prompting, even if files already exist
- `--skip-existing`: Skip files that already exist without prompting (default: True)
- `--no-skip-existing`: Prompt for each existing file
- `--smart-check`: Skip existing files whose size matches the remote file (default: True)
- `--no-smart-check`: Disable smart file-size checking
- `--filter PATTERN`: Download only files whose names match a pattern (e.g., 'doc-*')
- `--extension EXT`: File extension to look for, without the dot (e.g., 'zip', 'pdf', 'docx'; default: zip)
- `--cowboyup`: Run with default settings without showing the help message (only needed when running with no other arguments)
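How `--extension`, `--filter`, and `--max-files` narrow down the links on a page can be illustrated with a standard-library-only sketch. It assumes the page exposes plain `<a href>` links and that `--filter` takes a shell-style glob such as `'doc-*'`; `LinkCollector` and `select_links` are hypothetical names, not the tool's actual functions.

```python
# Hypothetical sketch of link selection; not the implementation in bulk_download.py.
import posixpath
from fnmatch import fnmatchcase
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect every href attribute found on <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def select_links(html: str, base_url: str, extension: str = "zip",
                 pattern: Optional[str] = None, max_files: int = 0) -> List[str]:
    """Return absolute URLs of linked files, filtered like the CLI options.

    Assumed mapping: extension ~ --extension, pattern ~ --filter,
    max_files ~ --max-files (0 means no limit).
    """
    parser = LinkCollector()
    parser.feed(html)
    suffix = "." + extension.lower().lstrip(".")
    selected: List[str] = []
    seen = set()
    for href in parser.hrefs:
        url = urljoin(base_url, href)  # resolve relative links against the page URL
        filename = posixpath.basename(urlparse(url).path)
        if not filename.lower().endswith(suffix):
            continue  # wrong file type
        if pattern and not fnmatchcase(filename, pattern):
            continue  # name does not match the glob (case-sensitive here)
        if url in seen:
            continue  # skip duplicate links to the same file
        seen.add(url)
        selected.append(url)
        if max_files and len(selected) >= max_files:
            break
    return selected


if __name__ == "__main__":
    # Made-up sample HTML for demonstration only
    sample = ('<a href="/files/doc-001.zip">1</a>'
              '<a href="/files/record-7.zip">2</a>'
              '<a href="/files/notes.pdf">3</a>')
    print(select_links(sample, "https://www.archives.gov/research/jfk/jfkbulkdownload",
                       pattern="doc-*"))
    # ['https://www.archives.gov/files/doc-001.zip']
```

The sketch matches patterns case-sensitively with `fnmatchcase` so results are the same on every platform; the real tool may treat case differently.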
## Download 2016 to 2023 bulk files
python bulk_download.py --cowboyup
## Download the 2025 release files (PDFs); existing files are skipped, so you can rerun this as the Archives adds new files
python bulk_download.py --url https://www.archives.gov/research/jfk/release-2025 --output-dir data/raw/archive_gov/2025 --extension pdf
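Skipping files that are already on disk, as in the rerun above, can be sketched as a size comparison. This assumes the expected size comes from the server's `Content-Length` header and that, when no size is available, the tool falls back to the `--skip-existing` setting; both are assumptions, and `should_skip` and `remote_size_from_headers` are hypothetical helpers.

```python
# Hypothetical sketch of the --smart-check decision; not the tool's actual logic.
import os
from typing import Mapping, Optional


def remote_size_from_headers(headers: Mapping[str, str]) -> Optional[int]:
    """Read the expected file size from a Content-Length header, if present."""
    value = headers.get("Content-Length")
    if value is not None and value.strip().isdigit():
        return int(value)
    return None  # server did not report a usable size


def should_skip(local_path: str, remote_size: Optional[int],
                smart_check: bool = True, skip_existing: bool = True) -> bool:
    """Return True if an already-downloaded file can be skipped."""
    if not os.path.exists(local_path):
        return False  # nothing on disk yet: download it
    if smart_check and remote_size is not None:
        # Same size: assume complete. Different size: likely partial or outdated.
        return os.path.getsize(local_path) == remote_size
    # No size to compare: fall back to the --skip-existing behaviour (assumed)
    return skip_existing
```

A size match is cheap to check but is not a content check: a same-sized file with corrupted bytes would still be skipped, so a separate integrity check remains useful.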
## For testing, you can limit to just a few files
python bulk_download.py --url https://www.archives.gov/research/jfk/release-2025 --output-dir data/raw/archive_gov/2025 --extension pdf --max-files 5

Additional Options:
- Use `--max-files 5` to download only the first 5 files
- Use `--filter "record-*"` to download files matching a pattern
- Use `--workers 8` to increase parallel downloads for faster performance
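Parallel downloading (`--workers`) maps naturally onto a thread pool, since downloads spend most of their time waiting on the network. The sketch below uses `concurrent.futures.ThreadPoolExecutor`; `download_all` and the `download_one` callback are hypothetical names, and the real tool may organize its workers differently.

```python
# Hypothetical sketch of running downloads on N worker threads.
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List


def download_all(urls: List[str], download_one: Callable[[str], str],
                 workers: int = 4) -> Dict[str, str]:
    """Run download_one(url) for every URL using `workers` threads.

    Returns url -> result, or url -> "error: ..." when a download failed.
    """
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(download_one, url): url for url in urls}
        for future in as_completed(futures):  # handle downloads as they finish
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # one failed file should not stop the others
                results[url] = f"error: {exc}"
    return results
```

More workers help only up to a point; beyond that, your connection or the server becomes the bottleneck, and many simultaneous requests may be throttled.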
License: MIT