Releases: MissCrispenCakes/DigitalChild
Release v2.0.0 - REST API + Documentation Restructure
Release Date: January 26, 2026
Zenodo DOI: 10.5281/zenodo.18318098
Major release adding a production-ready REST API (Phase 4) and a complete documentation reorganization.
🎉 What's New
REST API (Phase 4 Complete)
Production-ready REST API with 14 endpoints for programmatic data access:
- Documents API - List, filter, paginate, sort documents with 9 filter options
- Scorecard API - Country indicators, summary statistics, regional filtering
- Tags API - Frequency analysis, version management, multi-dimensional filtering
- Timeline API - Temporal analysis (tags over time, year × tag matrices)
- Export API - CSV downloads with SPDX license headers
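To make the "year × tag matrix" of the Timeline API concrete, here is a minimal sketch of how such a matrix can be computed from tagged documents. It is an illustration only, not the project's implementation: the function name `year_tag_matrix`, the record keys `year` and `tags`, and the sample tag names are all assumptions.

```python
from collections import Counter, defaultdict


def year_tag_matrix(documents):
    """Count how often each tag appears in each year.

    `documents` is an iterable of dicts with the hypothetical keys
    "year" (int) and "tags" (list of str).
    """
    counts = defaultdict(Counter)
    for doc in documents:
        counts[doc["year"]].update(doc["tags"])

    years = sorted(counts)
    tags = sorted({tag for per_year in counts.values() for tag in per_year})
    # One row per year, one column per tag; a missing pair counts as 0.
    rows = [[counts[year][tag] for tag in tags] for year in years]
    return years, tags, rows


# Hypothetical sample records, not taken from the dataset.
docs = [
    {"year": 2021, "tags": ["privacy", "child_rights"]},
    {"year": 2021, "tags": ["privacy"]},
    {"year": 2023, "tags": ["ai_policy"]},
]
years, tags, rows = year_tag_matrix(docs)
for year, row in zip(years, rows):
    print(year, dict(zip(tags, row)))
```

Sorting both axes keeps the output stable between runs, which makes the matrix easier to diff or export.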
Features:
- Optional API key authentication with dynamic rate limiting (100-2000 req/hr)
- Redis caching with 15min-1hr TTLs
- Complete Docker deployment (docker-compose, Nginx, Redis)
- 104 integration tests (100% pass rate, 100% endpoint coverage)
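The rate limiting described above can be pictured with a small fixed-window limiter. This is a sketch under stated assumptions, not the API's actual mechanism: the release notes give only the 100-2000 req/hr range, so mapping anonymous clients to 100 and keyed clients to 2000 is a guess, and `FixedWindowLimiter` and `limit_for` are hypothetical names.

```python
import time

# Assumed mapping; only the 100-2000 req/hr range comes from the release notes.
ANONYMOUS_LIMIT = 100
KEYED_LIMIT = 2000


def limit_for(api_key, known_keys):
    """Pick an hourly limit depending on whether a valid API key was sent."""
    return KEYED_LIMIT if api_key in known_keys else ANONYMOUS_LIMIT


class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each fixed time window."""

    def __init__(self, window_seconds=3600, clock=time.time):
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without waiting
        self._counts = {}  # client id -> (window index, requests seen)

    def allow(self, client_id, limit):
        window_index = int(self.clock() // self.window)
        seen_window, seen = self._counts.get(client_id, (window_index, 0))
        if seen_window != window_index:
            seen = 0  # a new window started, so reset the counter
        if seen >= limit:
            return False
        self._counts[client_id] = (window_index, seen + 1)
        return True


# Usage with a fake clock: the 101st anonymous request in the same hour is refused.
now = [0.0]
limiter = FixedWindowLimiter(clock=lambda: now[0])
limit = limit_for(None, known_keys={"demo-key"})
results = [limiter.allow("203.0.113.7", limit) for _ in range(101)]
now[0] = 3600.0  # one hour later the window resets
after_reset = limiter.allow("203.0.113.7", limit)
```

A production deployment would more likely keep these counters in Redis so that all worker processes share them; the in-memory dictionary here only keeps the sketch self-contained.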
📖 API Documentation: https://grimdata.org/api/
Documentation Restructure
Complete reorganization with dedicated landing pages:
- API Landing Page - Overview, features, quick start examples
- Scorecard Landing Page - Design, methodology, data access guides
- Projects Landing Page - LittleRainbowRights and SGBV-UPR overviews
- Clean professional navigation throughout
Technical Improvements
- Test Coverage: Expanded from 124 → 274 tests (170 pipeline + 104 API)
- Codebase: Grew from 15,000+ → 21,000+ lines of Python
- Dependencies: Updated Flask ecosystem to latest stable versions
- CITATION.cff: Updated with REST API keywords, SGBV journal article DOI
📦 Installation
# Clone and install
git clone https://github.com/MissCrispenCakes/DigitalChild.git
cd DigitalChild
pip install -r requirements.txt
# Run API server
pip install -r api_requirements.txt
python run_api.py
# Access at http://localhost:5000
curl http://localhost:5000/api/health
🔧 Changed
- Test coverage: 124 → 274 tests
- Codebase: 15k → 21k+ lines
- Documentation structure reorganized
- API docs moved to docs/website/api/
- Scorecard docs moved to docs/website/scorecard/
- Navigation structure improved
🐛 Fixed
- Navigation links consistency
- Duplicate content removed
- TOC integration in sidebar
- 404 errors on Projects pages
- Scorecard v1.0.0 release notes corrected
📊 Metrics
- 14 API endpoints operational
- 194 countries tracked
- 10 indicators per country
- 2,543 source URLs validated
- 274 tests passing (100% success rate)
- 21,000+ lines of code
- 75+ documentation files
🔗 Links
- Website: https://grimdata.org
- API Docs: https://grimdata.org/api/
- Scorecard: https://grimdata.org/scorecard/
- Deployment Guide: docs/guides/PRODUCTION_DEPLOYMENT.md
- Zenodo Archive: https://doi.org/10.5281/zenodo.18318098
📖 Citation
@software{digitalchild2026,
  title = {DigitalChild: Human Rights Data Pipeline for Child and LGBTQ+ Digital Protection},
  author = {Vollmer, S.C. and Vollmer, D.T.},
  year = {2026},
  version = {2.0.0},
  url = {https://github.com/MissCrispenCakes/DigitalChild},
  doi = {10.5281/zenodo.18318098},
  note = {Available at: https://grimdata.org. ORCID: 0000-0002-3359-2810 (S.C. Vollmer)}
}
🙏 Acknowledgments
- Python 3.12, Flask, BeautifulSoup4, Selenium, pandas, pypdf, pytest
- Redis, Docker, Nginx for production infrastructure
- GitHub Actions for CI/CD
- MkDocs Material for documentation
Full Changelog: https://github.com/MissCrispenCakes/DigitalChild/blob/basecamp/CHANGELOG.md
DigitalChild v1.0.1 - Zenodo Archive Release
DigitalChild v1.0.0 - Initial Public Release
First stable release of the DigitalChild data pipeline for analyzing human rights documents, with a focus on child and LGBTQ+ digital protection.
🎯 What's Included
Core Pipeline
- 7 automated scrapers - AU Policy, OHCHR, UPR, UNICEF, ACERWC, ACHPR, manual upload
- Multi-format processing - PDF, DOCX, HTML document conversion
- Versioned tagging system - 4 tag versions (v1, v2, v3, digital) with 20+ rights themes
- Recommendations extraction - Regex-based extraction with versioning and history tracking
- Timeline analysis - Global, by-country, and by-region temporal analysis
- Comparison analytics - Side-by-side version comparison for tags and recommendations
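As an illustration of versioned tagging and side-by-side version comparison, the sketch below applies two rule sets to the same text and reports the difference. The rule contents, tag names, and the functions `tag_text` and `compare_versions` are hypothetical; only the idea of named tag versions such as v1 and v2 comes from the notes above.

```python
import re

# Hypothetical rule sets: each tag version maps a tag name to a regex.
TAG_RULES = {
    "v1": {
        "privacy": r"\bprivacy\b",
    },
    "v2": {
        "privacy": r"\b(privacy|data protection)\b",
        "child_rights": r"\bchild(ren)?\b",
    },
}


def tag_text(text, version):
    """Return the sorted tags whose pattern matches `text` under one rule version."""
    rules = TAG_RULES[version]
    return sorted(
        tag
        for tag, pattern in rules.items()
        if re.search(pattern, text, re.IGNORECASE)
    )


def compare_versions(text, old, new):
    """Show which tags a newer rule version adds or removes for the same text."""
    old_tags = set(tag_text(text, old))
    new_tags = set(tag_text(text, new))
    return {
        "added": sorted(new_tags - old_tags),
        "removed": sorted(old_tags - new_tags),
    }


sample = "States should strengthen data protection for children online."
print(tag_text(sample, "v1"))
print(tag_text(sample, "v2"))
print(compare_versions(sample, "v1", "v2"))
```

Keeping every rule version available, rather than overwriting old rules, is what allows earlier tagging results to be reproduced and compared later.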
Scorecard System
- 194 countries tracked with 10 human rights indicators per country
- 2,543 authoritative source URLs validated and monitored
- Automated validation - URL checking, change detection, link rot monitoring
- CSV exports - Summary tables, by-indicator breakdowns, regional analysis
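One common way to implement change detection for monitored sources is to store a hash of each page and compare it on the next check. The sketch below shows that technique in isolation, with fetching left out. It is an assumption about how such monitoring could work, not a description of the project's validator; `fingerprint`, `detect_changes`, and the example URLs are hypothetical.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a stable SHA-256 hex digest of a page's raw bytes."""
    return hashlib.sha256(content).hexdigest()


def detect_changes(previous, current):
    """Classify URLs by comparing stored fingerprints with fresh content.

    previous: dict mapping url -> fingerprint from the last run
    current:  dict mapping url -> bytes, or None when the fetch failed
    """
    report = {"unchanged": [], "changed": [], "unreachable": [], "new": []}
    for url, content in current.items():
        if content is None:
            report["unreachable"].append(url)  # candidate for link rot
        elif url not in previous:
            report["new"].append(url)
        elif previous[url] == fingerprint(content):
            report["unchanged"].append(url)
        else:
            report["changed"].append(url)
    return report


# Hypothetical data standing in for two validation runs.
previous = {
    "https://example.org/law": fingerprint(b"version 1"),
    "https://example.org/policy": fingerprint(b"stable text"),
    "https://example.org/gone": fingerprint(b"old page"),
}
current = {
    "https://example.org/law": b"version 2",
    "https://example.org/policy": b"stable text",
    "https://example.org/gone": None,
    "https://example.org/added": b"fresh source",
}
report = detect_changes(previous, current)
```

Hashing raw bytes flags any edit, including cosmetic ones, so a real monitor might normalize the page text before hashing.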
Quality & Testing
- 124 tests - Comprehensive test coverage (scrapers, processors, validators, scorecard)
- 68 validator tests - Input validation, path traversal protection, URL validation, file size limits
- CI/CD pipeline - Automated testing with GitHub Actions
- Pre-commit hooks - Code formatting (black, isort, flake8), markdown linting
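Path traversal protection of the kind listed among the validator tests is often done by resolving the requested path and checking that it still lies inside an allowed base directory. The following is a minimal sketch of that check, assuming Python 3.9+ for `Path.is_relative_to`. The function name `resolve_inside` and the directory names are hypothetical, not the project's validator API.

```python
from pathlib import Path


def resolve_inside(base_dir, user_path):
    """Resolve `user_path` under `base_dir`, refusing anything that escapes it."""
    base = Path(base_dir).resolve()
    # Joining with an absolute `user_path` discards `base`, and ".." segments
    # are collapsed by resolve(); both cases are caught by the check below.
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return candidate


# A normal relative path is accepted.
safe = resolve_inside("data", "raw/report.pdf")

# A traversal attempt is rejected.
try:
    resolve_inside("data", "../secrets.txt")
    rejected = False
except ValueError:
    rejected = True
```

Because `resolve()` follows symlinks that exist on disk, this check also catches a symlink inside the base directory that points outside it.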
Documentation
- 25+ markdown files - Installation guides, API docs, standards, architecture
- CLAUDE.md - Comprehensive development guide for AI assistants
- Website deployment - Material for MkDocs with GitHub Pages integration
📊 Dataset Highlights
- 10 indicators tracked per country:
  - AI Policy Status
  - Data Protection Law
  - LGBTQ Legal Status
  - Child Online Protection
  - Biometric SIM Registration
  - Digital Services Taxation
  - Internet Penetration
  - Mobile Coverage
  - Digital Skills Investment
  - Online Content Regulation
- Data sources: UNESCO, UNCTAD, ILGA, UNICEF, national governments, treaty bodies
🔧 Technical Specifications
- Language: Python 3.12
- Key libraries: BeautifulSoup4, Selenium, pandas, PyPDF2, pytest
- Lines of code: 15,000+ (Python, config, tests)
- Export formats: CSV, JSON
- License: MIT (code), CC BY 4.0 (data)
📝 Citation
@software{digitalchild2026,
  author = {Vollmer, S.C.},
  title = {DigitalChild: Human Rights Data Pipeline for Child and LGBTQ+ Digital Protection},
  year = {2026},
  version = {1.0.0},
  url = {https://github.com/MissCrispenCakes/DigitalChild},
  doi = {10.5281/zenodo.XXXXXXX}
}
🌍 Related Projects
- GRIMdata.org - Main platform website
- LittleRainbowRights.com - Child & LGBTQ+ digital rights project
- SGBV-UPR Research - Precursor research on SGBV and Universal Periodic Review
⚠️ Known Limitations
- Scorecard sources require periodic manual validation (some URLs change)
- PDF extraction may have OCR limitations for scanned documents
- Regional coverage currently strongest in Africa (global expansion planned)
🚀 What's Next (Phase 4)
- Research dashboard with interactive visualizations
- REST API for data access
- NLP-based recommendations extraction
- Global expansion (Europe, Asia, Americas)
📖 Documentation
- Full docs: https://grimdata.org
- Installation: See README.md
- API docs: See docs/
Note: This is the first public release suitable for research, citation, and replication. Future versions will include dashboard features and expanded geographic coverage.