The JFK files are now part of the public domain, offering a trove of historical documents for researchers, journalists, and enthusiasts alike.
However, the collection is vast, largely unindexed, and lacks a text layer, so it is difficult to search and hard to analyze effectively, especially with AI-powered research tools.
As a leader in OCR (Optical Character Recognition) technology, ABBYY is supporting this research by providing the JFK files as fully searchable, structured PDFs, freely available to the open-source community. By making these documents machine-readable, we aim to unlock deeper insights, accelerate historical research, and enable advanced AI-driven analysis.
With this dataset, you can, for example:
- 🔍 Perform Full-Text Search – Instantly locate key events, names, and places across thousands of pages.
- 🏗 Build AI-Powered Research Tools – Leverage Retrieval-Augmented Generation (RAG) to create AI assistants that can answer JFK-related questions.
- 📊 Run NLP & Machine Learning Analysis – Detect patterns, extract key insights, and apply entity recognition to map relationships.
- 📜 Enhance Historical Investigations – Cross-reference details, analyze declassified records, and uncover new connections.
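The full-text search use case can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an official tool for the dataset: it assumes the PDFs have been downloaded into a local folder and that the third-party `pypdf` package is installed. The helper names `load_text_layer` and `search_pages` are hypothetical.

```python
import re
from pathlib import Path


def load_text_layer(pdf_dir):
    """Read the embedded text layer of every PDF in a local folder.

    Hypothetical helper. Assumes the PDFs were downloaded to pdf_dir and that
    the third-party pypdf package is installed (pip install pypdf).
    Returns {file name: [text of page 1, text of page 2, ...]}.
    """
    # Imported here so search_pages() can be used without pypdf installed.
    from pypdf import PdfReader

    corpus = {}
    for path in sorted(Path(pdf_dir).glob("*.pdf")):
        reader = PdfReader(path)
        # extract_text() reads the existing text layer; it does not run OCR.
        corpus[path.name] = [page.extract_text() or "" for page in reader.pages]
    return corpus


def search_pages(corpus, term, context=40):
    """Case-insensitive literal search over {file: [page texts]}.

    Hypothetical helper. Returns a list of (file name, 1-based page number,
    snippet) tuples, one per match, with up to `context` characters on each side.
    """
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    hits = []
    for name, pages in corpus.items():
        for page_no, text in enumerate(pages, start=1):
            for match in pattern.finditer(text):
                start = max(match.start() - context, 0)
                end = match.end() + context
                # Collapse line breaks and repeated spaces in the snippet.
                snippet = " ".join(text[start:end].split())
                hits.append((name, page_no, snippet))
    return hits
```

For instance, `search_pages(load_text_layer("jfk_pdfs"), "Mexico City")`, where `jfk_pdfs` stands for whatever local folder holds the files, returns one `(file, page, snippet)` tuple per match. Keeping the text per page preserves page numbers, so each hit can be checked against the original scan; since OCR output can contain recognition errors, an exact-match search like this may miss some misspelled occurrences.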
These records are sourced from the U.S. National Archives and are part of the public domain:
🔗 JFK Records Collection (National Archives)
⚠ Disclaimer: While these records are in the public domain, any copyrighted material within them remains the property of its respective copyright owner. These documents are provided for private study, scholarship, or research purposes only and are shared as-is, without warranty of any kind.
The JFK Files were made machine-readable using the Document AI API. Get access here: https://hubs.li/Q039Y11p0