Releases: dual/doubletake
1.1.0
Full Changelog: 1.0.3...1.1.0
1.0.3
Full Changelog: 1.0.2...1.0.3
1.0.0
🎉 doubletake v1.0.0 - First Official Release!
We're incredibly excited to announce the first official release of doubletake! 🚀
After months of development, rigorous testing, and careful optimization, we're proud to bring you a powerful, flexible, and production-ready library for intelligent PII detection and replacement in Python.
🌟 What is doubletake?
doubletake is a sophisticated library that automatically detects and replaces Personally Identifiable Information (PII) in your data structures. Whether you're anonymizing datasets for testing, protecting sensitive information in logs, or supporting GDPR compliance, doubletake makes it effortless and reliable.
✨ Key Features in v1.0.0
🚀 Dual-Strategy Architecture
- JSONGrepper: Lightning-fast JSON serialization + regex replacement for simple use cases
- DataWalker: Flexible recursive tree traversal with full context for advanced scenarios
- Automatic Strategy Selection: The library intelligently chooses the optimal approach based on your configuration
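To make the fast path concrete, here is a minimal sketch of the serialize-then-substitute idea behind JSONGrepper. It is not doubletake's code. It uses the standard-library `json` module in place of msgspec, a single illustrative email regex, and a hypothetical `grep_replace` helper.

```python
import json
import re

# Illustrative email regex; doubletake's built-in pattern may differ.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def grep_replace(records, pattern=EMAIL, replacement="****"):
    """Hypothetical helper: serialize once, substitute in one pass, parse back."""
    text = json.dumps(records)               # the whole structure becomes one string
    masked = pattern.sub(replacement, text)  # one regex pass, no tree walking
    return json.loads(masked)                # restore the original shape


data = [{"id": 1, "email": "jane@example.org", "note": "contact jane@example.org"}]
print(grep_replace(data))
# [{'id': 1, 'email': '****', 'note': 'contact ****'}]
```

The trade-off shows up even in this sketch: one pass over one string is cheap, but the substitution never learns which key or path a match came from, and a replacement containing quotes or backslashes would need JSON escaping to keep the text parseable. Per-value context of that kind is what the DataWalker strategy keeps.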
🎯 Smart PII Detection
Built-in patterns for the most common PII types:
- 📧 Email addresses ([email protected])
- 📱 Phone numbers (555-123-4567, (555) 123-4567)
- 🆔 Social Security Numbers (123-45-6789)
- 💳 Credit card numbers (4532-1234-5678-9012)
- 🌐 IP addresses (192.168.1.1)
- 🔗 URLs (https://example.com/path)
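For readers curious about what detecting these types involves, the sketch below pairs each listed type with a simple regular expression and checks it against the examples above. The `PII_PATTERNS` table and `detect` helper are hypothetical, and the regexes are deliberately simple. doubletake's built-in patterns may be stricter or looser.

```python
import re

# Illustrative patterns keyed by type; not doubletake's actual regexes.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b"),
    "ip_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "url": re.compile(r"https?://[^\s\"']+"),
}


def detect(text):
    """Return the name of every pattern that matches somewhere in `text`."""
    return [name for name, rx in PII_PATTERNS.items() if rx.search(text)]


samples = ["jane@example.org", "555-123-4567", "(555) 123-4567", "123-45-6789",
           "4532-1234-5678-9012", "192.168.1.1", "https://example.com/path"]
for sample in samples:
    print(f"{sample!r:26} -> {detect(sample)}")
```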
🔧 Highly Configurable
- Custom Patterns: Add your own regex patterns for domain-specific PII
- Allowed Lists: Exclude certain pattern types from replacement
- Path Targeting: Precisely target specific data paths using dot notation
- Flexible Replacement: Choose between asterisks, custom characters, or realistic fake data
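The sketch below shows one way these options can fit together: custom regexes merged into a default table, an allowed list that removes pattern types before matching, and a configurable masking character. One assumption worth flagging: the mask keeps separators such as `@`, `.` and `-` and replaces letters and digits, which is one way to get output shaped like the Quick Start result (`"***-***-****"` for a phone number). `build_patterns`, `replace_all` and the `employee_id` pattern are hypothetical names, not doubletake options.

```python
import re

# Hypothetical default table; the real built-in patterns may differ.
DEFAULTS = {
    "email": r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
    "phone": r"\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]\d{4}\b",
}


def build_patterns(extra=None, allowed=()):
    """Merge custom patterns into the defaults, then drop allowed types."""
    merged = {**DEFAULTS, **(extra or {})}
    return {name: re.compile(rx) for name, rx in merged.items() if name not in allowed}


def mask(value, char="*"):
    """Swap each letter or digit for `char`, keeping separators in place."""
    return re.sub(r"[A-Za-z0-9]", char, value)


def replace_all(text, patterns, char="*"):
    """Mask every match of every active pattern in `text`."""
    for rx in patterns.values():
        text = rx.sub(lambda m: mask(m.group(0), char), text)
    return text


patterns = build_patterns(extra={"employee_id": r"\bEMP-\d{5}\b"}, allowed=("phone",))
print(replace_all("john@domain.com, 555-123-4567, EMP-00042", patterns))
# ****@******.***, 555-123-4567, ***-*****
```

Because `phone` is on the allowed list here, the phone number passes through untouched while the email and the custom ID are masked.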
📊 Realistic Fake Data Generation
Powered by the Faker library to generate believable replacement data:
# Instead of: email: "****@******.***"
# Get: email: "[email protected]"
🌳 Deep Structure Support
- Handle complex nested dictionaries and lists automatically
- Preserve data structure and non-PII content perfectly
- Breadcrumb navigation for context-aware processing
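Here is a minimal sketch of a recursive walk that carries breadcrumbs. It assumes matching is done per whole string value, and that the `callback` signature used in the Custom Replacement Logic example (`item, key, pattern_type, breadcrumbs`) receives the containing dict or list as `item`. It also shows one plausible reading of dot-notation path targeting: a value is considered only if its joined breadcrumbs appear in `known_paths`. `walk`, `visit` and `PATTERNS` are hypothetical names, and list indexes are deliberately left out of the breadcrumbs.

```python
import re

# Illustrative patterns; doubletake's own detection may differ.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
}


def walk(node, callback, known_paths=None, crumbs=()):
    """Return a rebuilt copy of `node` with matching strings replaced."""
    if isinstance(node, dict):
        return {k: visit(node, k, v, callback, known_paths, crumbs + (str(k),))
                for k, v in node.items()}
    if isinstance(node, list):
        # List positions are not added to the breadcrumbs in this sketch.
        return [visit(node, i, v, callback, known_paths, crumbs)
                for i, v in enumerate(node)]
    return node


def visit(item, key, value, callback, known_paths, crumbs):
    """Recurse into containers; test whole string values against each pattern."""
    if not isinstance(value, str):
        return walk(value, callback, known_paths, crumbs)
    if known_paths is not None and ".".join(crumbs) not in known_paths:
        return value  # path targeting: skip values outside the listed paths
    for pattern_type, rx in PATTERNS.items():
        if rx.fullmatch(value):
            return callback(item, key, pattern_type, list(crumbs))
    return value


def label(item, key, pattern_type, breadcrumbs):
    """Example callback: report what was found and where."""
    return f"<{pattern_type} at {'.'.join(breadcrumbs)}>"


record = {
    "customer": {"name": "Ann", "email": "ann@example.org"},
    "billing": {"ssn": "123-45-6789"},
    "contacts": [{"email": "bo@example.org"}],
}
print(walk(record, label))
print(walk(record, label, known_paths={"customer.email"}))
```

Rebuilding the structure instead of mutating it keeps the input intact, so the same record can be masked under different configurations.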
🛡️ Type Safe & Robust
- Full type hints for excellent IDE support
- Comprehensive input validation
- 100% test coverage with rigorous edge case testing
🚀 Quick Start
pip install doubletake
from doubletake import DoubleTake
# Initialize with default settings
db = DoubleTake()
# Your data with PII
data = [
    {
        "user_id": 12345,
        "name": "John Doe",
        "email": "[email protected]",
        "phone": "555-123-4567",
        "ssn": "123-45-6789"
    }
]
# Replace PII automatically
masked_data = db.mask_data(data)
# Result: email becomes "****@******.***", phone becomes "***-***-****"
🎛️ Advanced Configuration Examples
Generate Realistic Fake Data
db = DoubleTake(use_faker=True)
# Emails become: [email protected]
# Phones become: +1-555-234-5678
Custom Replacement Logic
def custom_replacer(item, key, pattern_type, breadcrumbs):
    if pattern_type == 'email':
        return "***REDACTED_EMAIL***"
    elif pattern_type == 'ssn':
        return "XXX-XX-XXXX"
    return "***CLASSIFIED***"

db = DoubleTake(callback=custom_replacer)
Precise Path Targeting
# Only replace PII at specific locations
db = DoubleTake(known_paths=[
    'customer.email',
    'billing.ssn',
    'contacts.emergency.phone'
])
🏗️ Architecture Highlights
Intelligent Strategy Selection
# Fast path: Uses JSONGrepper for simple replacements
db = DoubleTake()
# Advanced path: Uses DataWalker for complex scenarios
db = DoubleTake(use_faker=True)
db = DoubleTake(callback=my_function)
Performance Optimized
- JSONGrepper: ~0.1s for 10,000 records (simple patterns)
- DataWalker: ~0.3s for 10,000 records (with fake data generation)
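The selection rule implied by the examples above can be written down in a few lines. This sketch encodes the rule only as these notes describe it (defaults take the JSONGrepper path; `use_faker` or a `callback` switches to DataWalker). `choose_strategy` is a hypothetical name, and the sketch says nothing about how other options such as `known_paths` are routed.

```python
from typing import Callable, Optional


def choose_strategy(use_faker: bool = False, callback: Optional[Callable] = None) -> str:
    """Return the strategy a configuration needs, per the rule in these notes."""
    # The notes route fake-data generation and callbacks to the tree walker;
    # callbacks in particular receive the key and breadcrumbs, which a regex
    # pass over serialized JSON does not track.
    if use_faker or callback is not None:
        return "DataWalker"
    # Plain masking can run as one substitution over the serialized text.
    return "JSONGrepper"


print(choose_strategy())
print(choose_strategy(use_faker=True))
print(choose_strategy(callback=lambda item, key, pattern_type, breadcrumbs: "***"))
```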
🧪 Real-World Use Cases
API Response Sanitization
Perfect for sanitizing API responses before logging:
api_response = {
    "status": "success",
    "data": {
        "users": [
            {"id": 1, "email": "[email protected]", "role": "admin"}
        ]
    }
}
db = DoubleTake()
safe_response = db.mask_data([api_response])[0]
# Safe to log without exposing PII
Database Export Anonymization
Anonymize database exports for development environments:
db_records = [
{"patient_id": "PT001", "ssn": "123-45-6789", "email": "[email protected]"}
]
db = DoubleTake(use_faker=True)
anonymized_records = db.mask_data(db_records)
# Safe for development with realistic data
🔬 Quality & Testing
- 100% Test Coverage: Comprehensive test suite with edge case coverage
- Type Safety: Full type hints and mypy compatibility
- Input Validation: Robust configuration validation with clear error messages
- Cross-Platform: Tested on Python 3.9+ across major platforms
- Performance Tested: Benchmarked with large datasets
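As an illustration of configuration validation with clear error messages, here is a hypothetical `validate_config` that checks the three options used in these notes before any data is touched. The specific checks and messages are assumptions for this sketch, not doubletake's actual validation rules.

```python
from typing import Callable, Optional, Sequence


def validate_config(use_faker: bool = False,
                    callback: Optional[Callable] = None,
                    known_paths: Optional[Sequence[str]] = None) -> None:
    """Raise a descriptive error for a bad option; return None otherwise."""
    if not isinstance(use_faker, bool):
        raise TypeError(f"use_faker must be True or False, got {use_faker!r}")
    if callback is not None and not callable(callback):
        raise TypeError(f"callback must be callable, got {type(callback).__name__}")
    if known_paths is not None:
        if not isinstance(known_paths, (list, tuple)):
            raise TypeError("known_paths must be a list of dot-notation strings")
        for path in known_paths:
            # Rejects non-strings and '', '.a', 'a.', 'a..b' (empty segments).
            if not isinstance(path, str) or "" in path.split("."):
                raise ValueError(f"invalid path {path!r}; expected e.g. 'customer.email'")


validate_config(known_paths=["customer.email", "billing.ssn"])  # passes silently
try:
    validate_config(known_paths=["customer..email"])
except ValueError as err:
    print(err)
```

Failing before any records are processed means a typo in a path surfaces as one readable error instead of PII that silently goes unmasked.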
🤝 Contributing & Community
We're thrilled to have built something that we hope will be valuable to the Python community! This is just the beginning, and we're excited to see how you use doubletake in your projects.
Get Involved
- 🐛 Found a bug? Open an issue
- 💡 Have a feature idea? Start a discussion
- 🤝 Want to contribute? Check out our contributing guidelines
Development Setup
git clone https://github.com/dual/doubletake.git
cd doubletake
pipenv install --dev
pipenv run test
📋 What's Next?
We have exciting plans for future releases:
- Additional PII pattern types (driver's licenses, passport numbers, etc.)
- Performance optimizations for extremely large datasets
- Plugin architecture for custom PII detectors
- Integration with popular data processing frameworks
- Enhanced documentation and tutorials
🙏 Acknowledgments
Special thanks to our early adopters, beta testers, and everyone who provided feedback during development. Your input was invaluable in making doubletake robust and user-friendly.
📄 License & Links
- License: MIT License
- PyPI: https://pypi.org/project/doubletake/
- Documentation: https://github.com/dual/doubletake/wiki (coming soon)
- Issues: https://github.com/dual/doubletake/issues
- Security: See SECURITY.md
🎯 Installation
pip install doubletake
Minimum Requirements: Python 3.9+
Dependencies:
- faker (for realistic fake data generation)
- msgspec (for high-performance JSON processing)
- typing_extensions (for enhanced type support)
Thank you for your interest in doubletake! We can't wait to see what you build with it.
Made with ❤️ for data privacy and security.
— The doubletake Team
Full Changelog: https://github.com/dual/doubletake/commits/1.0.0