feat(components): add PII Masking component for enhanced privacy #11453
base: main
Important: Review skipped. Auto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the … You can disable this status message by setting the …
Walkthrough

Introduces a new `PIIMaskingComponent` that detects and masks various PII patterns (emails, phones, credit cards, SSN, IPs) in text using configurable regex patterns and replacement templates. The component is added to the processing module's public API with lazy import support and comprehensive unit tests.

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 3
❌ Failed checks (3 warnings)
✅ Passed checks (4 passed)
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/lfx/src/lfx/components/processing/pii_masking.py`:
- Around line 70-77: The PHONE regex in PII_PATTERNS is inconsistent with the
inline phone regex in get_masked_text, creating dead code and maintenance
confusion; choose one approach and make them consistent — either remove the
"PHONE" entry from PII_PATTERNS and keep the refined inline pattern in
get_masked_text, or update PII_PATTERNS["PHONE"] to the refined pattern and
replace the inline pattern in get_masked_text to reference
PII_PATTERNS["PHONE"]; ensure all references use the same symbol (PII_PATTERNS
and PHONE key) so future updates only need to change one location.
- Around line 113-117: The exception handler in the try block that applies
custom patterns (where pattern, label = line.split(...); masked_text =
re.sub(..., template.format(entity=label.strip()), masked_text)) doesn't catch
format-string errors from template.format, so add KeyError and IndexError to the
except tuple and update the self.log call to include the caught exception
variable (e) as done for other exceptions; specifically modify the except clause
to catch (re.error, AttributeError, ValueError, KeyError, IndexError) around the
template.format call used in the pii masking logic.
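Both suggestions can be sketched together in a standalone form. This is only an illustration under stated assumptions, not the PR's code: the helper names `mask_builtin` and `apply_custom_patterns`, the `log` parameter, the use of `rsplit`, and the regexes shown are all hypothetical. Only the `PII_PATTERNS` name, the `PHONE` key, and the exception tuple come from the comments above.

```python
import re

# Single source of truth for built-in patterns, as the first comment suggests.
# The regexes here are illustrative assumptions, not the patterns from the PR.
PII_PATTERNS = {
    "EMAIL": r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
    "PHONE": r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b",
}


def mask_builtin(text, template="<{entity}>"):
    """Mask built-in PII, always reading each regex from PII_PATTERNS."""
    for label, pattern in PII_PATTERNS.items():
        text = re.sub(pattern, template.format(entity=label), text)
    return text


def apply_custom_patterns(text, custom_patterns, template, log=print):
    """Apply 'regex:LABEL' lines; log and skip any line that fails."""
    for line in custom_patterns.splitlines():
        if not line.strip():
            continue
        try:
            # Assumption: the label follows the last colon on the line.
            pattern, label = line.rsplit(":", 1)
            # template.format can raise KeyError or IndexError for templates
            # such as "{missing}" or "{0}", so both are caught below.
            replacement = template.format(entity=label.strip())
            text = re.sub(pattern, replacement, text)
        except (re.error, AttributeError, ValueError, KeyError, IndexError) as e:
            log(f"Skipping custom pattern {line!r}: {e}")
    return text
```

With this shape, a change to the phone regex touches only the `PHONE` entry, and a malformed template is logged per line instead of aborting the whole masking pass.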
🧹 Nitpick comments (3)
src/lfx/tests/unit/components/processing/test_pii_masking.py (2)
68-75: Missing initialization of boolean mask flags.
`test_template_customization` only sets `mask_emails` and `replacement_template` but doesn't initialize `mask_phones`, `mask_credit_cards`, `mask_ssn`, `mask_ip`, or `custom_patterns`. This relies on the component's default values (`True` for all mask flags), which could cause unexpected masking if the input contained other PII types.

For test clarity and isolation, explicitly set all flags:
Suggested fix
     def test_template_customization(self):
         component = PIIMaskingComponent()
         component.text_input = "Email me at dev@langflow.org"
         component.mask_emails = True
    +    component.mask_phones = False
    +    component.mask_credit_cards = False
    +    component.mask_ssn = False
    +    component.mask_ip = False
    +    component.custom_patterns = ""
         component.replacement_template = "REDACTED"
         result = component.get_masked_text()
         assert result.text == "Email me at REDACTED"
5-66: Consider adding edge case and error condition tests.

Per coding guidelines, tests should cover negative scenarios and edge cases. Consider adding tests for:
- Invalid custom pattern - verify error is logged but processing continues
- No PII in input - text returned unchanged
- Text with only whitespace
- Custom pattern with invalid regex syntax
- Overlapping PII patterns (e.g., a string that could match both phone and credit card)
Example test for invalid custom pattern:
Example additional test
    def test_invalid_custom_pattern_continues_processing(self):
        component = PIIMaskingComponent()
        component.text_input = "Email: test@example.com"
        component.mask_emails = True
        component.mask_phones = False
        component.mask_credit_cards = False
        component.mask_ssn = False
        component.mask_ip = False
        # Invalid regex pattern - unclosed bracket
        component.custom_patterns = r"[invalid:INVALID" + "\n" + r"\d{5}:ZIP"
        component.replacement_template = "<{entity}>"
        # Should not raise, invalid pattern is logged and skipped
        result = component.get_masked_text()
        assert "<EMAIL>" in result.text

    def test_no_pii_in_input(self):
        component = PIIMaskingComponent()
        component.text_input = "Hello, this is plain text with no PII."
        component.mask_emails = True
        component.mask_phones = True
        component.mask_credit_cards = True
        component.mask_ssn = True
        component.mask_ip = True
        component.custom_patterns = ""
        component.replacement_template = "<{entity}>"
        result = component.get_masked_text()
        assert result.text == "Hello, this is plain text with no PII."

Based on coding guidelines requiring edge case and error condition coverage.
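The overlapping-pattern case in the list above is worth a test because the resulting label can depend on the order in which patterns run. The sketch below shows the effect in isolation. It does not use the component; `PATTERNS`, `mask_in_order`, and both regexes are hypothetical and were chosen so that they overlap.

```python
import re

# Hypothetical patterns chosen to overlap; they are not the PR's regexes.
PATTERNS = {
    "CREDIT_CARD": r"\b(?:\d{4}[- ]?){3}\d{4}\b",
    "PHONE": r"\+?\d[\d\s().-]{7,}\d",  # deliberately loose
}


def mask_in_order(text, order, template="<{entity}>"):
    """Apply each named pattern in the given order and return the masked text."""
    for label in order:
        text = re.sub(PATTERNS[label], template.format(entity=label), text)
    return text


def test_overlap_depends_on_order():
    text = "Card: 4111 1111 1111 1111"
    # Card pattern first: the digits are labelled as a credit card.
    assert mask_in_order(text, ["CREDIT_CARD", "PHONE"]) == "Card: <CREDIT_CARD>"
    # Loose phone pattern first: the same digits are labelled as a phone number.
    assert mask_in_order(text, ["PHONE", "CREDIT_CARD"]) == "Card: <PHONE>"
```

A component-level version of this test would pin down whichever order `get_masked_text` actually applies, so a later reordering cannot silently change the labels.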
src/lfx/src/lfx/components/processing/pii_masking.py (1)
76-76: IP pattern matches invalid addresses (0-999 per octet).

The pattern `\b(?:\d{1,3}\.){3}\d{1,3}\b` will match strings like `999.999.999.999`. For PII masking, this is likely acceptable (false positives are safer than false negatives), but if precision matters, consider validating octet ranges.
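If octet validation is wanted, one possible approach is to restrict each octet to 0-255 inside the regex itself. The sketch below is an assumption about how that could look, not code from the PR; the names `OCTET`, `STRICT_IPV4`, and `LOOSE_IPV4` are made up for the example, and `LOOSE_IPV4` reproduces the pattern quoted above for comparison.

```python
import re

# One octet in the range 0-255: 250-255, 200-249, 100-199, or 0-99.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"

# Four dot-separated octets, bounded by word boundaries.
STRICT_IPV4 = re.compile(rf"\b{OCTET}(?:\.{OCTET}){{3}}\b")

# The pattern quoted in the review comment, for comparison.
LOOSE_IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_ipv4(text, strict=True):
    """Return all IPv4-looking substrings, optionally validating octet ranges."""
    pattern = STRICT_IPV4 if strict else LOOSE_IPV4
    return pattern.findall(text)
```

The trade-off noted in the comment still applies: the strict pattern leaves strings such as `999.999.999.999` unmasked, which may or may not be desirable for a privacy filter.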
This PR introduces a new `PIIMasking` component to the processing category. It allows users to automatically detect and mask sensitive information such as emails, phone numbers, credit cards, SSNs, and IP addresses using pre-defined regex patterns. It also supports custom regex patterns and customizable masking templates. This is a critical feature for users who need to ensure data privacy before sending information to LLMs or other downstream components.

Changes:
- `PIIMaskingComponent` in `src/lfx/src/lfx/components/processing/pii_masking.py`.
- `src/lfx/tests/unit/components/processing/test_pii_masking.py`.

Summary by CodeRabbit
Release Notes
New Features
Tests