
pagopa/pagopa-anonymizer


PII Anonymization Service


This project is a simple Flask API that exposes an endpoint to anonymize text, with a particular focus on identifying and masking Personally Identifiable Information (PII) in Italian text using Microsoft Presidio.


API Documentation 📖

This service exposes a single POST endpoint: /anonymize.

Request:

  • Method: POST
  • Path: /anonymize
  • Headers: Content-Type: application/json
  • Body:
    {
      "text": "String containing the text to be anonymized."
    }

Successful Response (200 OK):

  • Headers: Content-Type: application/json
  • Body:
    {
      "text": "String containing the anonymized text."
    }

Error Responses:

  • 400 Bad Request: Invalid JSON or missing text field.
  • 500 Internal Server Error: Internal processing error.
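Both 400 cases can be detected before any PII analysis runs. The following is a minimal, hypothetical sketch of that validation step using only the standard library. `extract_text` and `BadRequest` are illustrative names, not functions from app.py, and the error messages are assumptions because the error response body is not documented here.

```python
import json


class BadRequest(Exception):
    """Hypothetical error type standing in for a 400 Bad Request response."""
    status_code = 400


def extract_text(raw_body: bytes) -> str:
    """Validate a raw /anonymize request body and return its "text" field.

    Raises BadRequest for the two documented 400 cases: the body is not
    valid JSON, or it is not a JSON object with a "text" field.
    """
    try:
        data = json.loads(raw_body)
    except ValueError:  # covers json.JSONDecodeError and undecodable bytes
        raise BadRequest("Invalid JSON") from None
    if not isinstance(data, dict) or "text" not in data:
        # Assumption: a non-object JSON body is treated like a missing field.
        raise BadRequest("Missing 'text' field")
    return data["text"]
```

A handler built this way would map `BadRequest` to a 400 response and let any unexpected exception during anonymization surface as a 500.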

Technology Stack 🛠️

  • Python 3.8+
  • Flask: Micro web framework for creating the API.
  • Presidio Analyzer: For PII detection.
  • Presidio Anonymizer: For PII anonymization/masking.
  • spaCy: NLP library used by Presidio for entity recognition (specifically with the Italian model it_core_news_lg).
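Presidio splits the work in two: the analyzer returns typed character spans with confidence scores (Presidio's `RecognizerResult`), and the anonymizer rewrites those spans (Presidio's `AnonymizerEngine`). The stdlib-only sketch below mimics the second step to show the data flow; `Span`, `replace_spans`, and the 0.5 score threshold are hypothetical, not Presidio's API. The `<ENTITY_TYPE>` placeholder mirrors the style of Presidio's default `replace` operator.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    """Hypothetical, simplified stand-in for one analyzer detection."""
    entity_type: str
    start: int  # character offset where the entity begins
    end: int    # character offset just past the entity
    score: float


def replace_spans(text: str, spans: List[Span], min_score: float = 0.5) -> str:
    """Replace each sufficiently confident span with "<ENTITY_TYPE>".

    Assumes spans do not overlap. Spans are applied from the end of the
    string backwards so earlier offsets stay valid after each replacement.
    """
    kept = [s for s in spans if s.score >= min_score]  # drop low-confidence hits
    for span in sorted(kept, key=lambda s: s.start, reverse=True):
        text = text[: span.start] + f"<{span.entity_type}>" + text[span.end :]
    return text
```

Replacing right to left is the simple way to avoid recomputing offsets after each substitution changes the string length. Presidio itself also resolves overlapping detections, which this sketch deliberately leaves out.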

Start Project Locally 🚀

Prerequisites

  • Python 3.8+ and pip
  • git (for cloning)

Setup Instructions

  1. Clone the repository:

    git clone https://github.com/<YOUR_GITHUB_USER>/<YOUR_REPO_NAME>.git
    cd <YOUR_REPO_NAME>
  2. Create and activate a virtual environment:

    python3 -m venv venv
    # On Windows:
    # venv\Scripts\activate
    # On macOS/Linux:
    source venv/bin/activate
  3. Install dependencies:

    pip install -r requirements.txt

Run the Application

Start the Flask development server:

python3 app.py

The application will typically be available at http://127.0.0.1:3000/.


Development 💻

Project Structure

  • app.py: Flask application entry point, API endpoint definition.
  • presidio_logic.py: Core Presidio setup, custom recognizers, and anonymization functions.
  • requirements.txt: Python dependencies.
  • venv/: Virtual environment directory (usually gitignored).
  • README.md: This file.
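The custom recognizers in presidio_logic.py are not described here. One common kind is pattern based, the idea behind Presidio's `PatternRecognizer`: a regular expression plus a confidence score. The stdlib-only sketch below is a hypothetical example of that idea for strings shaped like an Italian codice fiscale; the pattern, the `IT_FISCAL_CODE` label, and the 0.6 score are assumptions, not the project's actual recognizer.

```python
import re
from typing import List, Tuple

# Hypothetical pattern: 6 letters (surname + name), 2 digits (year),
# 1 month letter, 2 digits (day), 1 letter + 3 digits (municipality),
# 1 check letter. It ignores "omocodia" variants (digits replaced by
# letters) and does not verify the check character.
FISCAL_CODE_RE = re.compile(
    r"\b[A-Z]{6}[0-9]{2}[ABCDEHLMPRST][0-9]{2}[A-Z][0-9]{3}[A-Z]\b"
)


def find_fiscal_codes(text: str, score: float = 0.6) -> List[Tuple[str, int, int, float]]:
    """Return (entity_type, start, end, score) for each regex match in `text`."""
    return [
        ("IT_FISCAL_CODE", m.start(), m.end(), score)
        for m in FISCAL_CODE_RE.finditer(text)
    ]
```

A fixed, moderate score reflects that a format match alone does not prove the string is a real fiscal code; validating the check character would justify a higher one.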

Local Development Server

As described in "Run the Application" above:

python app.py

Python Version Management

Use a virtual environment (venv) to isolate each project's dependencies. A venv reuses the Python interpreter it was created with, so to manage multiple Python versions use a tool such as pyenv.


Testing 🧪

Unit Testing (Conceptual)

This project does not include specific unit tests; they would typically be written with pytest or Python's built-in unittest module.

To run unit tests (example using pytest):

  1. Install pytest: pip install pytest
  2. Create test files (e.g., test_presidio_logic.py, test_app.py) in a tests/ directory.
  3. Run tests:
    pytest
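The endpoint contract can be tested without starting the server by calling the app in-process as a WSGI callable. Flask's `app.test_client()` is the usual tool for this; the sketch below shows the same idea with only the standard library (`wsgiref`). `call_wsgi` and `stub_app` are hypothetical, and `stub_app` merely stands in for the real application. Assuming app.py defines a module-level Flask object named `app` (not confirmed here), a file such as tests/test_app.py could import it and pass it instead of `stub_app`.

```python
import io
import json
from wsgiref.util import setup_testing_defaults


def call_wsgi(app, path, payload):
    """POST `payload` as JSON to a WSGI app in-process; return (status_code, body_bytes)."""
    body = json.dumps(payload).encode("utf-8")
    environ = {}
    setup_testing_defaults(environ)  # fills in SERVER_NAME, wsgi.* keys, etc.
    environ.update({
        "REQUEST_METHOD": "POST",
        "PATH_INFO": path,
        "CONTENT_TYPE": "application/json",
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    })
    status_holder, written = [], []

    def start_response(status, headers, exc_info=None):
        status_holder.append(int(status.split()[0]))  # "400 BAD REQUEST" -> 400
        return written.append  # PEP 3333 write() callable

    result = app(environ, start_response)
    try:
        chunks = list(result)
    finally:
        if hasattr(result, "close"):
            result.close()
    return status_holder[-1], b"".join(written + chunks)


def stub_app(environ, start_response):
    """Hypothetical stand-in for the real app: echoes "text" back, 400 otherwise."""
    try:
        size = int(environ.get("CONTENT_LENGTH") or 0)
        data = json.loads(environ["wsgi.input"].read(size))
        status, body = "200 OK", {"text": data["text"]}
    except (ValueError, KeyError, TypeError):
        status, body = "400 Bad Request", {"error": "bad request"}
    start_response(status, [("Content-Type", "application/json")])
    return [json.dumps(body).encode("utf-8")]


# pytest collects functions named test_* from files named test_*.py.
def test_valid_request_returns_text():
    status, body = call_wsgi(stub_app, "/anonymize", {"text": "Mario Rossi"})
    assert status == 200 and "text" in json.loads(body)


def test_missing_text_returns_400():
    status, _ = call_wsgi(stub_app, "/anonymize", {"foo": "bar"})
    assert status == 400
```

The tests only check the status code and the presence of the `text` key, so they stay valid when the stub is swapped for the real app, whose output depends on Presidio's detections.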

Manual API Testing

Use tools like curl, Postman, or Insomnia to send POST requests to the /anonymize endpoint.

Example using curl:

curl -X POST \
  http://127.0.0.1:3000/anonymize \
  -H 'Content-Type: application/json' \
  -d '{
    "text": "Il signor Mario Rossi vive in Via Roma 123. Contattare a [email protected]"
  }'
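For scripted checks, the same request can be sent from Python using only the standard library. This is a sketch: `build_anonymize_request` and `anonymize` are hypothetical helpers, and the base URL assumes the default local address given in "Run the Application".

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:3000"  # assumed local address of the dev server


def build_anonymize_request(text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /anonymize request with a JSON body."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/anonymize",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def anonymize(text: str, base_url: str = BASE_URL, timeout: float = 30.0) -> str:
    """Send the request to a running server and return the anonymized "text" field."""
    req = build_anonymize_request(text, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["text"]
```

With the server running, `anonymize("Il signor Mario Rossi vive in Via Roma 123.")` returns the masked string. `urlopen` raises `urllib.error.HTTPError` for the 400 and 500 responses, so callers may want to catch it.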

About

Anonymize text from PII
