Releases: dual/doubletake

1.1.0

19 Sep 02:41
Immutable release. Only release title and notes can be modified.
d1b7ce9

Full Changelog: 1.0.3...1.1.0

1.0.3

04 Sep 05:54
Immutable release. Only release title and notes can be modified.
997b1f4

Full Changelog: 1.0.2...1.0.3

1.0.0

30 Aug 02:26
4d52958

🎉 doubletake v1.0.0 - First Official Release!

We're incredibly excited to announce the first official release of doubletake! 🚀

After months of development, rigorous testing, and careful optimization, we're proud to bring you a powerful, flexible, and production-ready library for intelligent PII detection and replacement in Python.

🌟 What is doubletake?

doubletake is a library that automatically detects and replaces Personally Identifiable Information (PII) in your data structures. Whether you're anonymizing datasets for testing, protecting sensitive information in logs, or supporting GDPR compliance work, doubletake is built to make the masking step simple and repeatable.

✨ Key Features in v1.0.0

🚀 Dual-Strategy Architecture

  • JSONGrepper: Lightning-fast JSON serialization + regex replacement for simple use cases
  • DataWalker: Flexible recursive tree traversal with full context for advanced scenarios
  • Automatic Strategy Selection: The library intelligently chooses the optimal approach based on your configuration
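
To make the "JSON serialization + regex replacement" idea concrete, here is a minimal standard-library sketch of that general technique. It is not doubletake's JSONGrepper implementation: the `grep_mask` function and the simplified `EMAIL_RE` pattern are hypothetical names introduced only for illustration.

```python
import json
import re

# Hypothetical, deliberately simple email pattern (illustration only).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def grep_mask(records):
    """Serialize to JSON text, mask every email match, then parse it back."""
    text = json.dumps(records)
    # Replace each match with asterisks of the same length.
    masked = EMAIL_RE.sub(lambda m: "*" * len(m.group(0)), text)
    return json.loads(masked)


records = [{"id": 1, "contact": {"email": "jane@example.com"}}]
result = grep_mask(records)
print(result)
```

The appeal of this approach is that one regex pass covers the whole structure regardless of nesting. The trade-off is that the replacement step sees only text, so it has no knowledge of which key or path a value came from; that is the gap a tree-walking strategy fills.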

🎯 Smart PII Detection

Built-in patterns for the most common PII types:

  • 📧 Email addresses (user@example.com)
  • 📱 Phone numbers (555-123-4567, (555) 123-4567)
  • 🆔 Social Security Numbers (123-45-6789)
  • 💳 Credit card numbers (4532-1234-5678-9012)
  • 🌐 IP addresses (192.168.1.1)
  • 🔗 URLs (https://example.com/path)
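
As a rough illustration of pattern-based detection for the types listed above, the sketch below classifies a single string value with simplified regular expressions. These patterns and the `classify` helper are assumptions made for this example; they are not the patterns shipped with doubletake, and they do no validation (no Luhn check for cards, no octet range check for IPs).

```python
import re

# Simplified, hypothetical patterns (illustration only).
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}"),
    "ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "credit_card": re.compile(r"\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}"),
    "ip_address": re.compile(r"(?:\d{1,3}\.){3}\d{1,3}"),
    "url": re.compile(r"https?://\S+"),
}


def classify(value):
    """Return the first pattern name that matches the whole value, else None."""
    for name, pattern in PATTERNS.items():
        if pattern.fullmatch(value):
            return name
    return None


samples = [
    "user@example.com",
    "555-123-4567",
    "(555) 123-4567",
    "123-45-6789",
    "4532-1234-5678-9012",
    "192.168.1.1",
    "https://example.com/path",
    "John Doe",
]
labels = [classify(s) for s in samples]
print(labels)
```

Matching the whole value with `fullmatch` keeps the example unambiguous. Finding PII embedded in longer free text would need `search` or `finditer` plus more careful boundaries.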

🔧 Highly Configurable

  • Custom Patterns: Add your own regex patterns for domain-specific PII
  • Allowed Lists: Exclude certain pattern types from replacement
  • Path Targeting: Precisely target specific data paths using dot notation
  • Flexible Replacement: Choose between asterisks, custom characters, or realistic fake data

📊 Realistic Fake Data Generation

Powered by the Faker library to generate believable replacement data:

# Instead of: email: "****@******.***"
# Get something like: email: "jsmith@example.org"

🌳 Deep Structure Support

  • Handle complex nested dictionaries and lists automatically
  • Preserve data structure and non-PII content perfectly
  • Breadcrumb navigation for context-aware processing
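
The breadcrumb idea can be sketched with a small recursive walker that records the path to each value it replaces. This is a generic illustration, not doubletake's DataWalker: `walk`, `redact`, and the simplified email pattern are hypothetical.

```python
import re

# Hypothetical, simplified email pattern (illustration only).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def walk(node, replace, breadcrumbs=()):
    """Rebuild the structure, calling replace(value, breadcrumbs) on email-like strings."""
    if isinstance(node, dict):
        return {k: walk(v, replace, breadcrumbs + (k,)) for k, v in node.items()}
    if isinstance(node, list):
        return [walk(v, replace, breadcrumbs + (i,)) for i, v in enumerate(node)]
    if isinstance(node, str) and EMAIL_RE.search(node):
        return replace(node, breadcrumbs)
    return node  # non-PII values pass through unchanged


seen = []


def redact(value, breadcrumbs):
    seen.append(breadcrumbs)  # remember where the value was found
    return "<redacted>"


data = {"users": [{"id": 1, "email": "jane@example.com", "note": "ok"}]}
out = walk(data, redact)
print(out, seen)
```

Because the walker builds new dictionaries and lists, the input is left untouched, and the replacement function can make decisions based on where in the structure a value sits.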

🛡️ Type Safe & Robust

  • Full type hints for excellent IDE support
  • Comprehensive input validation
  • 100% test coverage with rigorous edge case testing

🚀 Quick Start

pip install doubletake

from doubletake import DoubleTake

# Initialize with default settings
db = DoubleTake()

# Your data with PII
data = [
    {
        "user_id": 12345,
        "name": "John Doe",
        "email": "john.doe@example.com",
        "phone": "555-123-4567",
        "ssn": "123-45-6789"
    }
]

# Replace PII automatically
masked_data = db.mask_data(data)
# Result: email becomes "****@******.***", phone becomes "***-***-****"
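
The phone result in the comment above is what you would get by replacing each letter or digit with an asterisk while keeping punctuation. The sketch below shows that idea with the standard library only. It is an assumption about the masking style, not doubletake's code, and the release notes do not say whether the library preserves the length of each email segment; `mask_chars` is a hypothetical helper.

```python
import re


def mask_chars(value, char="*"):
    """Replace every ASCII letter or digit with `char`, keeping separators."""
    return re.sub(r"[A-Za-z0-9]", char, value)


phone = mask_chars("555-123-4567")
ssn = mask_chars("123-45-6789")
email = mask_chars("jane@example.com")
custom = mask_chars("555-123-4567", char="X")
print(phone, ssn, email, custom)
```

Keeping separators such as `-`, `@`, and `.` leaves the shape of the value recognizable, which is often helpful when reading masked logs.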

🎛️ Advanced Configuration Examples

Generate Realistic Fake Data

db = DoubleTake(use_faker=True)
# Emails become something like: jsmith@example.org
# Phones become: +1-555-234-5678

Custom Replacement Logic

def custom_replacer(item, key, pattern_type, breadcrumbs):
    if pattern_type == 'email':
        return "***REDACTED_EMAIL***"
    elif pattern_type == 'ssn':
        return "XXX-XX-XXXX"
    return "***CLASSIFIED***"

db = DoubleTake(callback=custom_replacer)
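
Since the callback is a plain function, it can be exercised on its own before being passed to `DoubleTake`. The sketch below repeats `custom_replacer` and calls it directly. The argument meanings used here are assumptions, not confirmed by these release notes: `item` is taken to be the container holding the value, `key` the field name, and `breadcrumbs` the path to that container.

```python
def custom_replacer(item, key, pattern_type, breadcrumbs):
    if pattern_type == 'email':
        return "***REDACTED_EMAIL***"
    elif pattern_type == 'ssn':
        return "XXX-XX-XXXX"
    return "***CLASSIFIED***"


# Hypothetical arguments, chosen only to exercise each branch.
sample_item = {"email": "jane@example.com", "ssn": "123-45-6789", "phone": "555-123-4567"}
path = ["users", 0]

email_result = custom_replacer(sample_item, "email", "email", path)
ssn_result = custom_replacer(sample_item, "ssn", "ssn", path)
other_result = custom_replacer(sample_item, "phone", "phone", path)
print(email_result, ssn_result, other_result)
```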

Precise Path Targeting

# Only replace PII at specific locations
db = DoubleTake(known_paths=[
    'customer.email',
    'billing.ssn',
    'contacts.emergency.phone'
])
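
For readers unfamiliar with dot-notation targeting, the following sketch resolves paths like the ones above against a nested dictionary and replaces only the values found there. It is a generic illustration under stated assumptions: `mask_at_paths` is a hypothetical helper, it handles dictionaries only, and how doubletake itself treats lists or missing paths is not described in these notes.

```python
import copy


def mask_at_paths(record, paths, replacement="***"):
    """Return a copy of `record` with the values at each dotted path replaced."""
    result = copy.deepcopy(record)
    for path in paths:
        *parents, leaf = path.split(".")
        node = result
        for key in parents:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break  # path does not exist in this record
        if isinstance(node, dict) and leaf in node:
            node[leaf] = replacement
    return result


record = {
    "customer": {"name": "Jane", "email": "jane@example.com"},
    "billing": {"ssn": "123-45-6789"},
    "notes": "reach jane@example.com after 5pm",
}
masked = mask_at_paths(record, ["customer.email", "billing.ssn", "contacts.emergency.phone"])
print(masked)
```

Note that the email inside `notes` is left alone because that path was not listed, which is the point of targeting: only the named locations are touched.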

🏗️ Architecture Highlights

Intelligent Strategy Selection

# Fast path: Uses JSONGrepper for simple replacements
db = DoubleTake()  

# Advanced path: Uses DataWalker for complex scenarios  
db = DoubleTake(use_faker=True)
db = DoubleTake(callback=my_function)
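
The selection rule implied by the comments above can be written down as a tiny function. This is only a sketch of the described behavior, assuming that `use_faker` and `callback` are the options that trigger the advanced path; `choose_strategy` is a hypothetical name, and the library's real selection logic may consider other settings.

```python
def choose_strategy(use_faker=False, callback=None):
    """Pick a strategy name from the options, as described in the notes above."""
    if use_faker or callback is not None:
        return "DataWalker"   # needs per-value context
    return "JSONGrepper"      # plain pattern replacement is enough


default_choice = choose_strategy()
faker_choice = choose_strategy(use_faker=True)
callback_choice = choose_strategy(callback=lambda *args: "***")
print(default_choice, faker_choice, callback_choice)
```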

Performance Optimized

  • JSONGrepper: ~0.1s for 10,000 records (simple patterns)
  • DataWalker: ~0.3s for 10,000 records (with fake data generation)
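
Timings like these depend on hardware and data shape, so it may be worth measuring on your own records. The harness below is a minimal sketch using `time.perf_counter`. The `stand_in_mask` function is a placeholder so the example runs without doubletake installed; to measure the library, you would pass `db.mask_data` instead. The helper names are hypothetical.

```python
import re
import time


def make_records(n):
    """Build n synthetic records containing email and phone fields."""
    return [
        {"user_id": i, "email": f"user{i}@example.com", "phone": "555-123-4567"}
        for i in range(n)
    ]


def stand_in_mask(records):
    """Placeholder masker: asterisk out letters and digits in string values."""
    return [
        {k: re.sub(r"[A-Za-z0-9]", "*", v) if isinstance(v, str) else v
         for k, v in record.items()}
        for record in records
    ]


def time_call(fn, *args, repeats=3):
    """Return the best wall-clock time over several runs, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


records = make_records(10_000)
elapsed = time_call(stand_in_mask, records)
print(f"{elapsed:.3f}s for {len(records)} records")
```

Taking the best of a few runs reduces noise from other processes; for more careful measurements, the standard `timeit` module is an alternative.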

🧪 Real-World Use Cases

API Response Sanitization

Perfect for sanitizing API responses before logging:

api_response = {
    "status": "success", 
    "data": {
        "users": [
            {"id": 1, "email": "admin@example.com", "role": "admin"}
        ]
    }
}

db = DoubleTake()
safe_response = db.mask_data([api_response])[0]
# Safe to log without exposing PII

Database Export Anonymization

Anonymize database exports for development environments:

db_records = [
    {"patient_id": "PT001", "ssn": "123-45-6789", "email": "patient@example.com"}
]

db = DoubleTake(use_faker=True)
anonymized_records = db.mask_data(db_records)
# Safe for development with realistic data

🔬 Quality & Testing

  • 100% Test Coverage: Comprehensive test suite with edge case coverage
  • Type Safety: Full type hints and mypy compatibility
  • Input Validation: Robust configuration validation with clear error messages
  • Cross-Platform: Tested on Python 3.9+ across major platforms
  • Performance Tested: Benchmarked with large datasets

🤝 Contributing & Community

We're thrilled to have built something that we hope will be valuable to the Python community! This is just the beginning, and we're excited to see how you use doubletake in your projects.

Get Involved

Development Setup

git clone https://github.com/dual/doubletake.git
cd doubletake
pipenv install --dev
pipenv run test

📋 What's Next?

We have exciting plans for future releases:

  • Additional PII pattern types (driver's licenses, passport numbers, etc.)
  • Performance optimizations for extremely large datasets
  • Plugin architecture for custom PII detectors
  • Integration with popular data processing frameworks
  • Enhanced documentation and tutorials

🙏 Acknowledgments

Special thanks to our early adopters, beta testers, and everyone who provided feedback during development. Your input was invaluable in making doubletake robust and user-friendly.

📄 License & Links


🎯 Installation

pip install doubletake

Minimum Requirements: Python 3.9+

Dependencies:

  • faker (for realistic fake data generation)
  • msgspec (for high-performance JSON processing)
  • typing_extensions (for enhanced type support)

Thank you for your interest in doubletake! We can't wait to see what you build with it.

Made with ❤️ for data privacy and security.

— The doubletake Team

Full Changelog: https://github.com/dual/doubletake/commits/1.0.0