detection-of-personal-data is a CLI tool to detect sensitive personal data, including names, contact information, health details, identification numbers, and financial details.
Users can supply a variety of text files (e.g., .txt, .csv); the tool processes them and returns a JSON file. The JSON indicates whether personal information is present and provides tags for the detected data.
NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active discussion forum.
A regular expression is a method used in programming for pattern matching. Regular expressions provide a flexible and concise means to match strings of text.
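As an illustration of pattern matching for contact information, here is a minimal sketch using Python's standard `re` module. The patterns, the `PATTERNS` table, and the `find_matches` helper are hypothetical and deliberately simple; they are not the rules this tool actually applies.

```python
import re

# Illustrative patterns only (assumptions, not the tool's actual rules).
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def find_matches(text):
    """Return (label, matched_text) pairs for every pattern hit in text."""
    results = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            results.append((label, match.group()))
    return results


sample = "Contact Jane at jane.doe@example.com or +33 1 23 45 67 89."
matches = find_matches(sample)
print(matches)
```

Patterns this loose will over-match and under-match on real data, which is one reason regular expressions are usually combined with other detection techniques.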
State-of-the-art Machine Learning for PyTorch, TensorFlow and JAX. Transformers provides APIs to easily download and train state-of-the-art pretrained models.
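Model-based detectors generally attach a confidence score to each detected span, and the CLI's `-tr` option sets a minimum probability per label. The sketch below shows one plausible way such a filter could work. The `filter_by_threshold` helper, the detection dictionaries, and the `default` fallback value are all assumptions made for illustration, not the project's actual code or output format.

```python
def filter_by_threshold(detections, thresholds, default=0.5):
    """Keep detections whose score meets the minimum for their label.

    `default` is a hypothetical fallback for labels without an explicit
    threshold; the real tool may behave differently.
    """
    kept = []
    for detection in detections:
        minimum = thresholds.get(detection["label"], default)
        if detection["score"] >= minimum:
            kept.append(detection)
    return kept


# Hypothetical detections, shaped like (label, text, score) records.
detections = [
    {"label": "person", "text": "Jane Doe", "score": 0.42},
    {"label": "passport", "text": "X1234567", "score": 0.12},
]

# Mirrors `-tr person 0.3 -tr passport 0.3` on the command line.
thresholds = {"person": 0.3, "passport": 0.3}

kept = filter_by_threshold(detections, thresholds)
print(kept)
```

Lowering a threshold keeps more candidate spans for that label at the cost of more false positives.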
Retrieve command help with:
poetry run detection-of-personal-data pii-detect --help

Usage: detection-of-personal-data pii-detect [OPTIONS]

  Represents cli 'pii_detect' command

Options:
  -i, --input TEXT               path to text file  [required]
  -o, --output TEXT              output directory where json file will be
                                 written  [default: .]
  -tr, --thresh <TEXT FLOAT>...  the minimum probability of private data for
                                 labels
  -f, --force                    overwrite existing file
  --dry-run                      passthrough, will not write anything
  --help                         Show this message and exit.

Example:
poetry run detection-of-personal-data pii-detect \
-tr person 0.3 \
-tr passport 0.3 \
-i ./tests/data/inputs_test/text \
-o ./tests/data/outputs -f

The repository targets Python 3.9 and higher.
The repository uses Poetry for Python packaging and dependency management. Be sure to have it properly installed beforehand.
curl -sSL https://install.python-poetry.org | python3

You can follow the link below on how to install and configure Docker on your local machine:
The project is built with Poetry. Initialize the project using:
poetry install
⚠️ Ensure your code complies with our linters to pass CI checks.
Code linting is performed by flake8.
poetry run flake8 --count --show-source --statistics

Static type checking is performed by mypy.
poetry run mypy .

To improve code quality, we use other linters in our workflows. If you want them to succeed in the CI, please check these additional linters.
Markdown linting is performed by markdownlint-cli.
markdownlint "**/*.md"

Docker linting is performed by hadolint.
hadolint Dockerfile
⚠️ Be sure to write tests that succeed to pass CI checks.
Unit testing is performed by the pytest testing framework.
poetry run pytest -v

Build a local Docker image using the following command line:
docker build -t detection-of-personal-data .

Once built, you can run the container locally with the following command line:
docker run -ti --rm detection-of-personal-data

Please check out OKP4 health files: