Commit d1b7ce9

Handles single data items correctly
Ensures the mask_data method correctly processes single data items, as well as lists of data items, and returns the appropriate type. Previously, the code only worked correctly for lists.

Bump pytest-cov from 6.2.1 to 6.3.0 (#1)

Updates supported version in security policy
Reflects that version 1.0.x is currently supported with security updates.

Bump mypy from 1.17.1 to 1.18.1 (#4)

Bump pytest-cov from 6.3.0 to 7.0.0 (#3)

Updates pylint target for doubletake
Updates the pylint target in the lint-code command to analyze the 'doubletake' project instead of 'chilo_api'. This ensures that the linting process focuses on the relevant codebase for the doubletake project.

Adds MetaMatch to pass matching context
Introduces a MetaMatch dataclass to encapsulate pattern matching context, including the matched pattern, value, replacement, and breadcrumbs. This allows custom callback functions to access more comprehensive information about the match during replacement, enabling more flexible and context-aware data masking logic. It replaces passing individual pattern key/value pairs, and is provided to the user callback.

Refactors extras setting to use a dictionary
Updates the "extras" setting to use a dictionary for custom regex patterns, allowing for named patterns. This change enhances configurability and readability by allowing users to define custom regex patterns with associated names, making it easier to manage and understand the purpose of each pattern. It also improves validation and type safety.

Adds faker-based address masking with extras
Implements masking for address data using Faker, enhancing data obfuscation with dynamically generated realistic address values. Also adds support for idempotent address masking, ensuring consistent replacement for duplicate address values.

Creates setup.py for package installation
Initializes the setup.py file with metadata and dependencies. Defines the package name, version, author, description, required Python version, install dependencies, keywords, project URLs, classifiers, license, and supported platforms. This allows the package to be installed using setuptools.
1 parent 997b1f4 commit d1b7ce9

20 files changed: +431 -217 lines changed

.circleci/config.yml

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@ commands:
   lint-code:
     steps:
       - run: pipenv run lint
-      - run: pipenv run pylint chilo_api --output-format=json > ./coverage/lint/pylint_report.json || exit 0
+      - run: pipenv run pylint doubletake --output-format=json > ./coverage/lint/pylint_report.json || exit 0
       - run: pipenv run pylint-json2html ./coverage/lint/pylint_report.json -o ./coverage/lint/pylint_report.html || exit 0
       - store_artifacts:
           path: ./coverage/lint

Pipfile.lock

Lines changed: 45 additions & 46 deletions
Some generated files are not rendered by default.

README.md

Lines changed: 22 additions & 7 deletions
@@ -108,15 +108,27 @@ masked_data = db.mask_data(data)
 ### Custom Replacement Logic
 
 ```python
-def custom_replacer(pattern_key: str, pattern_value: str, possible_replacement: str, value: Any):
+# The callback receives:
+# meta_match: MetaMatch (fields: pattern, value, replacement, breadcrumbs)
+# faker: Faker instance (for generating fake data)
+# value: The matched value being replaced
+
+def custom_replacer(meta_match, faker, value):
     """Custom replacement with full context"""
-    if pattern_key == 'email':
+    # meta_match.pattern: the pattern key (e.g. 'email', 'ssn', etc.)
+    # meta_match.value: the regex pattern or extra pattern string
+    # meta_match.replacement: the default replacement value
+    # meta_match.breadcrumbs: set of path keys (for path-aware logic)
+    if meta_match.pattern == 'email':
         return "***REDACTED_EMAIL***"
-    if pattern_key == 'ssn':
+    if meta_match.pattern == 'ssn':
         return "XXX-XX-XXXX"
+    if meta_match.pattern == 'city':
+        # Use Faker to generate a fake city name
+        return faker.city()
     if 'secret' in value:
         return "***CLASSIFIED***"
-    return replacement
+    return meta_match.replacement
 
 db = DoubleTake(callback=custom_replacer)
 ```
@@ -127,7 +139,10 @@ db = DoubleTake(callback=custom_replacer)
 # Only replace certain types, allow others through
 db = DoubleTake(
     allowed=['email'], # Don't replace emails
-    extras=[r'CUST-\d+', r'REF-[A-Z]{3}-\d{4}'] # Custom patterns
+    extras={
+        'customer_id': r'CUST-\d+',
+        'reference': r'REF-[A-Z]{3}-\d{4}'
+    } # Custom patterns as a dict
 )
 ```
 
@@ -279,7 +294,7 @@ db = DoubleTake(
     use_faker=False, # Use fake data vs asterisks
     callback=None, # Custom replacement function
     allowed=[], # Pattern types to skip
-    extras=[], # Additional regex patterns
+    extras={}, # Additional regex patterns as a dict
     safe_values=[], # Values to protect from replacement
     idempotent=False, # Prevent double-masking operations
     known_paths=[], # Specific paths to target
@@ -359,7 +374,7 @@ logs = [
 
 db = DoubleTake(
     safe_values=['[email protected]'], # Keep official support email visible
-    extras=[r'\+1-555-SUPPORT'] # Keep support phone pattern
+    extras={'phone': r'\+1-555-SUPPORT'} # Keep support phone pattern
 )
 
 sanitized_logs = db.mask_data(logs)

SECURITY.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ currently being supported with security updates.
 
 | Version | Supported |
 | ------- | ------------------ |
-| 2.0.x | :white_check_mark: |
+| 1.0.x | :white_check_mark: |
 
 ## Reporting a Vulnerability
 
doubletake/__init__.py

Lines changed: 3 additions & 2 deletions
@@ -106,11 +106,12 @@ def mask_data(self, data: list[Any]) -> list[Any]:
             >>> result = db.mask_data(data)
             >>> # Emails and phone numbers will be replaced, names and IDs preserved
         """
+        process_data = data if isinstance(data, list) else [data]
         return_data: list[Any] = []
-        for item in data:
+        for item in process_data:
             masked_item = self.__process_data_item(item)
             return_data.append(masked_item)
-        return return_data
+        return return_data if isinstance(data, list) else return_data[0]
 
     def __process_data_item(self, item: Any) -> Any:
         if not self.__use_faker and self.__callback is None:
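
The wrap-and-unwrap approach in this hunk can be sketched in isolation. The block below is a minimal illustration, not the library's code: `mask_one_or_many` and `redact` are hypothetical names, and `redact` only stands in for the real per-item masking step.

```python
from typing import Any, Callable


def mask_one_or_many(data: Any, process_item: Callable[[Any], Any]) -> Any:
    """Apply process_item to a single item or to each item of a list.

    Returns a list when given a list, otherwise a single processed item.
    """
    # Wrap a lone item so the loop handles both cases uniformly.
    items = data if isinstance(data, list) else [data]
    results = [process_item(item) for item in items]
    # Unwrap again so the return type mirrors the input type.
    return results if isinstance(data, list) else results[0]


# Hypothetical stand-in for the real masking step.
def redact(item: Any) -> Any:
    return "***" if isinstance(item, str) else item


print(mask_one_or_many("secret", redact))   # single item in, single item out
print(mask_one_or_many(["a", 1], redact))   # list in, list out
print(mask_one_or_many([], redact))         # an empty list stays a list
```

Because the check is `isinstance(data, list)`, a tuple or any other non-list container is treated as one item rather than iterated.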

doubletake/searcher/data_walker.py

Lines changed: 6 additions & 4 deletions
@@ -3,6 +3,7 @@
 
 from doubletake.utils.pattern_manager import PatternManager
 from doubletake.types.settings import Settings
+from doubletake.utils.meta_match import MetaMatch
 
 
 class DataWalker:
@@ -64,18 +65,19 @@ class DataWalker:
     """
 
     def __init__(self, **kwargs: Unpack[Settings]) -> None:
-        self.__breadcrumbs: set[str] = set()
         self.__known_paths: list[str] = kwargs.get('known_paths', [])
+        self.__meta_match: MetaMatch = MetaMatch()
+        kwargs['meta_match'] = self.__meta_match
         self.__pattern_manager: PatternManager = PatternManager(**kwargs)
 
     def walk_and_replace(self, item: dict[str, Any]) -> dict[str, Any]:
-        self.__breadcrumbs = set()
+        self.__meta_match.breadcrumbs = set()
         self.__walk_dict(item, None)
         return item
 
     def __walk_dict(self, item: dict[str, Any], current_key: Optional[str]) -> None:
         if current_key is not None:
-            self.__breadcrumbs.add(current_key)
+            self.__meta_match.breadcrumbs.add(current_key)
         for key in item.keys():
             self.__determine_next_step(item, key)
 
@@ -96,6 +98,6 @@ def __replace_known_paths(self, item: Any) -> None:
         for known_pattern in self.__known_paths:
             known_list = known_pattern.split('.')
             key = known_list.pop()
-            if known_list == list(self.__breadcrumbs):
+            if known_list == list(self.__meta_match.breadcrumbs):
                 if isinstance(item, dict) and key in item:
                     item[key] = self.__pattern_manager.replace_value(item[key], item[key], key, None)

doubletake/searcher/json_grepper.py

Lines changed: 2 additions & 0 deletions
@@ -5,6 +5,7 @@
 
 from doubletake.utils.pattern_manager import PatternManager
 from doubletake.types.settings import Settings
+from doubletake.utils.meta_match import MetaMatch
 
 
 class JSONGrepper:
@@ -78,6 +79,7 @@ class JSONGrepper:
     """
 
     def __init__(self, **kwargs: Unpack[Settings]) -> None:
+        kwargs['meta_match'] = MetaMatch()
         self.__pattern_manager: PatternManager = PatternManager(**kwargs)
 
     def grep_and_replace(self, item: Any) -> Any:

doubletake/searcher/string_replacer.py

Lines changed: 2 additions & 0 deletions
@@ -3,6 +3,7 @@
 
 from doubletake.utils.pattern_manager import PatternManager
 from doubletake.types.settings import Settings
+from doubletake.utils.meta_match import MetaMatch
 
 
 class StringReplacer:
@@ -71,6 +72,7 @@ class StringReplacer:
     """
 
     def __init__(self, **kwargs: Unpack[Settings]) -> None:
+        kwargs['meta_match'] = MetaMatch()
         self.__pattern_manager: PatternManager = PatternManager(**kwargs)
 
     def scan_and_replace(self, item: str) -> Union[str, None]:

doubletake/types/settings.py

Lines changed: 4 additions & 1 deletion
@@ -1,13 +1,16 @@
 from typing_extensions import TypedDict, NotRequired, Callable
 
+from doubletake.utils.meta_match import MetaMatch
+
 
 class Settings(TypedDict, total=False):
     allowed: NotRequired[list[str]]
     callback: NotRequired[Callable]
-    extras: NotRequired[list[str]]
+    extras: NotRequired[dict[str, str]]
     idempotent: NotRequired[bool]
     known_paths: NotRequired[list[str]]
     maintain_length: NotRequired[bool]
     replace_with: NotRequired[str]
     safe_values: NotRequired[list[str]]
     use_faker: NotRequired[bool]
+    meta_match: NotRequired[MetaMatch]
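
The diff for `doubletake/utils/meta_match.py` is not rendered on this page, so the shape of `MetaMatch` can only be inferred from the commit message and the README comments (fields `pattern`, `value`, `replacement`, `breadcrumbs`, and a no-argument constructor). The sketch below is an assumption built from those hints, not the actual class; the field types and defaults are guesses.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class MetaMatch:
    """Assumed shape only; the real class lives in doubletake/utils/meta_match.py."""
    pattern: Optional[str] = None       # pattern key, e.g. 'email' or 'ssn'
    value: Optional[str] = None         # regex or extra pattern string
    replacement: Optional[str] = None   # default replacement value
    # default_factory gives every instance its own set instead of a shared one.
    breadcrumbs: set[str] = field(default_factory=set)


def custom_replacer(meta_match: MetaMatch, faker: Any, value: Any) -> Any:
    """Callback in the README's (meta_match, faker, value) form."""
    if meta_match.pattern == 'email':
        return '***REDACTED_EMAIL***'
    return meta_match.replacement


# faker is unused by this callback, so None is passed as a placeholder.
match = MetaMatch(pattern='email', replacement='*****')
print(custom_replacer(match, None, 'jane@example.com'))
```

Defaults on every field are what allow the bare `MetaMatch()` calls seen in the searcher classes; the mutable `breadcrumbs` set needs `default_factory` because dataclasses reject a plain `set()` default.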

doubletake/utils/config_validator.py

Lines changed: 7 additions & 9 deletions
@@ -101,17 +101,15 @@ def _validate_extras(config: Settings) -> None:
         extras = config.get('extras')
         if extras is None:
             return
-
-        if not isinstance(extras, list):
-            raise ValueError('The "extras" key must be a list of regexstrings if provided.')
-
-        for item in extras:
-            if not isinstance(item, str):
-                raise ValueError('The "extras" key must be a list of regexstrings if provided.')
+        if not isinstance(extras, dict):
+            raise ValueError('The "extras" key must be a dict of {str: regex string} if provided.')
+        for key, value in extras.items():
+            if not isinstance(key, str) or not isinstance(value, str):
+                raise ValueError('Each key and value in "extras" must be a string (key: name, value: regex pattern).')
             try:
-                re.compile(item)
+                re.compile(value)
             except re.error as error:
-                raise ValueError('The "extras" key must be a list of regexstrings if provided.') from error
+                raise ValueError(f'Invalid regex pattern in "extras": {value}') from error
 
     @staticmethod
     def _validate_safe_values(config: Settings) -> None:
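
To see how the new dict-based check behaves, the same logic can be run as a free function. This is a sketch that mirrors the added lines above; `validate_extras` is a hypothetical standalone name, whereas the real check is a static method that reads `extras` from the config.

```python
import re


def validate_extras(extras):
    """Standalone mirror of the dict-based extras check (hypothetical helper)."""
    if extras is None:
        return
    if not isinstance(extras, dict):
        raise ValueError('The "extras" key must be a dict of {str: regex string} if provided.')
    for key, value in extras.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise ValueError('Each key and value in "extras" must be a string (key: name, value: regex pattern).')
        try:
            re.compile(value)
        except re.error as error:
            # Chain the original regex error so the cause stays visible.
            raise ValueError(f'Invalid regex pattern in "extras": {value}') from error


# A named pattern, as in the updated README, passes silently.
validate_extras({'customer_id': r'CUST-\d+'})

# The old list form and a malformed regex are both rejected.
for bad in ([r'CUST-\d+'], {'broken': '['}):
    try:
        validate_extras(bad)
    except ValueError as error:
        print(error)
```

Callers still passing a list for `extras` now get a `ValueError`, so the README changes in this commit double as the migration guide.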
