@adwaitjr10 adwaitjr10 commented Jan 26, 2026

This PR introduces a new PIIMasking component to the processing category. It allows users to automatically detect and mask sensitive information such as emails, phone numbers, credit cards, SSNs, and IP addresses using pre-defined regex patterns. It also supports custom regex patterns and customizable masking templates. This is a critical feature for users who need to ensure data privacy before sending information to LLMs or other downstream components.

Changes:

  • Added PIIMaskingComponent in src/lfx/src/lfx/components/processing/pii_masking.py.
  • Registered the component in the processing module.
  • Added unit tests in src/lfx/tests/unit/components/processing/test_pii_masking.py.

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced a new PII Masking component that detects and masks sensitive information in text, including emails, phones, credit cards, SSN, and IP addresses.
    • Supports user-configurable custom patterns and replacement templates for masked text.
  • Tests

    • Added comprehensive unit tests validating PII masking functionality across various scenarios.


@github-actions github-actions bot added the community Pull Request from an external contributor label Jan 26, 2026

coderabbitai bot commented Jan 26, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Walkthrough

Introduces a new PIIMaskingComponent that detects and masks various PII patterns (emails, phones, credit cards, SSN, IPs) in text using configurable regex patterns and replacement templates. The component is added to the processing module's public API with lazy import support and comprehensive unit tests.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Public API Exposure (`src/lfx/src/lfx/components/processing/__init__.py`) | Adds PIIMaskingComponent to the module's public surface: import in the TYPE_CHECKING block, lazy import mapping in `_dynamic_imports`, and inclusion in `__all__` for package export. |
| Component Implementation (`src/lfx/src/lfx/components/processing/pii_masking.py`) | New PIIMaskingComponent class that masks PII patterns. Defines configurable inputs (text, masking toggles for email/phone/credit card/SSN/IP, custom patterns, replacement template) and an output for the masked text. Implements a `get_masked_text()` method that applies the predefined regex patterns in sequence, followed by user-provided custom patterns with error handling. Uses `<{entity}>` as the default replacement template. |
| Unit Tests (`src/lfx/tests/unit/components/processing/test_pii_masking.py`) | Comprehensive test suite validating PII masking across emails, phones, credit cards, SSN, IP addresses, custom patterns, empty input handling, and replacement template customization. |
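
The flow described in the walkthrough (built-in patterns applied in sequence, then custom patterns with error handling) can be sketched roughly as follows. This is a standalone illustration and not the PR's code: the function name `mask_pii`, the regexes, the `enabled` parameter, and the `pattern:LABEL` line format split on the last colon are all assumptions.

```python
import re

# Illustrative patterns only (assumption); the PR's actual regexes may differ.
PII_PATTERNS = {
    "EMAIL": r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b",
    "SSN": r"\b\d{3}-\d{2}-\d{4}\b",
    "IP": r"\b(?:\d{1,3}\.){3}\d{1,3}\b",
}


def mask_pii(text, enabled=("EMAIL", "SSN", "IP"), custom_patterns="",
             template="<{entity}>"):
    """Hypothetical helper: mask built-in entities, then custom ones."""
    masked = text
    for entity in enabled:
        replacement = template.format(entity=entity)
        # A callable replacement is inserted literally, so backslashes in
        # the template are not interpreted as group references.
        masked = re.sub(PII_PATTERNS[entity], lambda _m, r=replacement: r, masked)

    # Custom patterns: one "regex:LABEL" entry per line (assumed format).
    for line in custom_patterns.splitlines():
        if not line.strip():
            continue
        try:
            pattern, label = line.rsplit(":", 1)
            replacement = template.format(entity=label.strip())
            masked = re.sub(pattern, lambda _m, r=replacement: r, masked)
        except (re.error, ValueError, KeyError, IndexError):
            continue  # skip a malformed line and keep processing the rest
    return masked
```

Under these assumptions, a malformed custom line (invalid regex or missing colon) is skipped while the remaining lines are still applied.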

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 3
❌ Failed checks (3 warnings)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, below the required threshold of 80.00%. | Write docstrings for the functions that lack them to meet the coverage threshold. |
| Test Quality And Coverage | ⚠️ Warning | The test suite has critical gaps and does not catch the identified bugs: it does not validate the PHONE pattern inconsistency, has no error-handling tests for the format-string vulnerability with curly braces in labels, and has insufficient edge-case coverage. | Add error-handling tests for custom pattern labels with special characters, invalid regex patterns, and missing colons; validate PII_PATTERNS usage; add edge-case tests for unicode and long inputs; update the exception handler to catch KeyError and IndexError. |
| Test File Naming And Structure | ⚠️ Warning | The test suite lacks coverage of error conditions, including custom patterns with special characters that break format strings, invalid regex patterns, and verification of error logging. | Add test cases for format-string errors with special characters, invalid patterns, and malformed specifications, and verify error handling and logging for KeyError and IndexError exceptions. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: adding a PII Masking component to the processing module for privacy enhancement. It is concise, focused, and directly reflects the primary objective of the changeset. |
| Test Coverage For New Implementations | ✅ Passed | The PR includes a comprehensive unit test file (test_pii_masking.py) that follows the project's backend naming convention and contains five meaningful test methods with real assertions validating PII masking functionality. |
| Excessive Mock Usage Warning | ✅ Passed | Test file uses real component instantiation with actual input/output assertions rather than excessive mocking, demonstrating appropriate unit test design. |


@github-actions github-actions bot added the enhancement New feature or request label Jan 26, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/lfx/src/lfx/components/processing/pii_masking.py`:
- Around line 70-77: The PHONE regex in PII_PATTERNS is inconsistent with the
inline phone regex in get_masked_text, creating dead code and maintenance
confusion; choose one approach and make them consistent — either remove the
"PHONE" entry from PII_PATTERNS and keep the refined inline pattern in
get_masked_text, or update PII_PATTERNS["PHONE"] to the refined pattern and
replace the inline pattern in get_masked_text to reference
PII_PATTERNS["PHONE"]; ensure all references use the same symbol (PII_PATTERNS
and PHONE key) so future updates only need to change one location.
- Around line 113-117: The exception handler in the try block that applies
custom patterns (where pattern, label = line.split(...); masked_text =
re.sub(..., template.format(entity=label.strip()), masked_text)) doesn't catch
format-string errors from template.format, so add KeyError and IndexError to the
except tuple and update the self.log call to include the caught exception
variable (e) as done for other exceptions; specifically modify the except clause
to catch (re.error, AttributeError, ValueError, KeyError, IndexError) around the
template.format call used in the pii masking logic.
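
To make the second item concrete: `str.format` raises `KeyError` for an unknown named field and `IndexError` for a positional field when no positional arguments are passed, and neither is covered by `(re.error, AttributeError, ValueError)`. The sketch below shows the widened except clause in isolation. The helper `apply_custom_pattern`, its `log` parameter, and the split on the last colon are hypothetical and stand in for the PR's actual code, which is not shown here.

```python
import re


def apply_custom_pattern(text, line, template, log=print):
    """Apply one 'regex:LABEL' line; return text unchanged on any failure."""
    try:
        pattern, label = line.rsplit(":", 1)  # ValueError if no colon
        # KeyError / IndexError / ValueError can all originate here,
        # depending on what the user typed into the template.
        replacement = template.format(entity=label.strip())
        return re.sub(pattern, lambda _m: replacement, text)
    except (re.error, AttributeError, ValueError, KeyError, IndexError) as e:
        # Include the exception so the user can see what went wrong.
        log(f"Skipping custom pattern {line!r}: {type(e).__name__}: {e}")
        return text
```

With this shape, a template such as `<{label}>` or `<{0}>` results in a logged message and unmasked text for that line rather than an uncaught exception.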
🧹 Nitpick comments (3)
src/lfx/tests/unit/components/processing/test_pii_masking.py (2)

68-75: Missing initialization of boolean mask flags.

test_template_customization only sets mask_emails and replacement_template but doesn't initialize mask_phones, mask_credit_cards, mask_ssn, mask_ip, or custom_patterns. This relies on the component's default values (True for all mask flags), which could cause unexpected masking if the input contained other PII types.

For test clarity and isolation, explicitly set all flags:

Suggested fix
     def test_template_customization(self):
         component = PIIMaskingComponent()
         component.text_input = "Email me at dev@langflow.org"
         component.mask_emails = True
+        component.mask_phones = False
+        component.mask_credit_cards = False
+        component.mask_ssn = False
+        component.mask_ip = False
+        component.custom_patterns = ""
         component.replacement_template = "REDACTED"

         result = component.get_masked_text()
         assert result.text == "Email me at REDACTED"

5-66: Consider adding edge case and error condition tests.

Per coding guidelines, tests should cover negative scenarios and edge cases. Consider adding tests for:

  1. Invalid custom pattern - verify error is logged but processing continues
  2. No PII in input - text returned unchanged
  3. Text with only whitespace
  4. Custom pattern with invalid regex syntax
  5. Overlapping PII patterns (e.g., a string that could match both phone and credit card)

Example test for invalid custom pattern:

Example additional test
def test_invalid_custom_pattern_continues_processing(self):
    component = PIIMaskingComponent()
    component.text_input = "Email: test@example.com"
    component.mask_emails = True
    component.mask_phones = False
    component.mask_credit_cards = False
    component.mask_ssn = False
    component.mask_ip = False
    # Invalid regex pattern - unclosed bracket
    component.custom_patterns = r"[invalid:INVALID" + "\n" + r"\d{5}:ZIP"
    component.replacement_template = "<{entity}>"

    # Should not raise, invalid pattern is logged and skipped
    result = component.get_masked_text()
    assert "<EMAIL>" in result.text

def test_no_pii_in_input(self):
    component = PIIMaskingComponent()
    component.text_input = "Hello, this is plain text with no PII."
    component.mask_emails = True
    component.mask_phones = True
    component.mask_credit_cards = True
    component.mask_ssn = True
    component.mask_ip = True
    component.custom_patterns = ""
    component.replacement_template = "<{entity}>"

    result = component.get_masked_text()
    assert result.text == "Hello, this is plain text with no PII."

Based on coding guidelines requiring edge case and error condition coverage.

src/lfx/src/lfx/components/processing/pii_masking.py (1)

76-76: IP pattern matches invalid addresses (0-999 per octet).

The pattern \b(?:\d{1,3}\.){3}\d{1,3}\b will match strings like 999.999.999.999. For PII masking, this is likely acceptable (false positives are safer than false negatives), but if precision matters, consider validating octet ranges.
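
If octet validation is wanted, one common approach is to spell out the 0-255 range in the regex itself. The following is a sketch of that alternative next to the loose pattern quoted above; the names `OCTET`, `STRICT_IPV4`, and `LOOSE_IPV4` are illustrative and do not come from the PR.

```python
import re

# One octet in the range 0-255: 250-255, 200-249, 100-199, or 0-99.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"

# Stricter alternative (assumption, not the PR's pattern).
STRICT_IPV4 = re.compile(rf"\b(?:{OCTET}\.){{3}}{OCTET}\b")

# The pattern quoted in the review comment, for comparison.
LOOSE_IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def contains_ip(text, strict=True):
    """Return True if the text contains something that looks like an IPv4 address."""
    pattern = STRICT_IPV4 if strict else LOOSE_IPV4
    return pattern.search(text) is not None
```

The trade-off is the one the comment names: the strict form avoids masking strings like version numbers with out-of-range parts, while the loose form errs toward masking more.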


Labels

community Pull Request from an external contributor enhancement New feature or request
