Update pipeline card (#1399)

JohnSnowLabs · Aug 21, 2024 · ed4d2d3
1 parent 3abbede
commit ed4d2d3
Showing 1 changed file with 27 additions and 4 deletions.
diff --git a/..._posts/kolia1985/2024-06-12-pdf_deid_subentity_context_augmented_pipeline_en.md b/..._posts/kolia1985/2024-06-12-pdf_deid_subentity_context_augmented_pipeline_en.md
@@ -18,7 +18,8 @@ use_language_switcher: "Python-Scala-Java"
 
 ## Description
 
-This pipeline can be used to extract PHI information from PDF such as AGE, BIOID, CITY, COUNTRY, DATE, DEVICE, DOCTOR, EMAIL, FAX, HEALTHPLAN, HOSPITAL, IDNUM, LOCATION, MEDICALRECORD, ORGANIZATION, PATIENT, PHONE, PROFESSION, STATE, STREET, URL, USERNAME, ZIP, ACCOUNT, LICENSE, VIN, SSN, DLN, PLATE, IPADDR entities.
+This pipeline can be used to mask PHI (protected health information) in PDFs. Masked entities include AGE, BIOID, CITY, COUNTRY, DATE, DEVICE, DOCTOR, EMAIL, FAX, HEALTHPLAN, HOSPITAL, IDNUM, LOCATION, MEDICALRECORD, ORGANIZATION, PATIENT, PHONE, PROFESSION, STATE, STREET, URL, USERNAME, ZIP, ACCOUNT, LICENSE, VIN, SSN, DLN, PLATE, and IPADDR.
+The output is a PDF document that matches the input, except that black boxes are drawn over the targeted entities.
 
 {:.btn-box}
 <button class="button button-orange" disabled>Live Demo</button>
@@ -28,8 +29,6 @@ This pipeline can be used to extract PHI information from PDF such as AGE, BIOID
 
 ## How to use
 
-
-
 <div class="tabs-box" markdown="1">
 {% include programmingLanguageSelectScalaPythonNLU.html %}
 ```python
@@ -82,4 +81,28 @@ deid_pipeline = PretrainedPipeline("pdf_deid_subentity_context_augmented_pipelin
 - PositionFinder
 - ImageDrawRegions
 - ImageToPdf
-- PdfAssembler
+- PdfAssembler
+
+
+## Speed Benchmarks
+
+- **Dataset:** 1000 scanned PDF pages.
+- **Instances:**
+  - m5n.4xlarge (16 vCPUs, 64 GiB memory)
+  - m5n.8xlarge (32 vCPUs, 128 GiB memory)
+- **AMI:** ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20240411
+- **Versions:**
+  - **spark-nlp Version:** v5.4.0
+  - **visual-nlp Version:** v5.3.2
+  - **spark-nlp-jsl Version:** v5.3.2
+  - **Spark Version:** v3.4.1
+- **Visual NLP Pipeline:** `pdf_deid_subentity_context_augmented_pipeline`
+
+
+#### Benchmark Table
+
+{:.table-model-big}
+| Instance      | Memory  | Cores | Input pages | Partitions | Seconds per page | Total time |
+| ------------- | ------- | ----- | ----------- | ---------- | ---------------- | ---------- |
+| m5n.4xlarge   | 64 GiB  | 16    | 1000        | 10         | 0.24             | 4 mins     |
+| m5n.8xlarge   | 128 GiB | 32    | 1000        | 32         | 0.15             | 2.5 mins   |
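
Review note: the per-page rate and the total time in the benchmark table can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the commit; the helper names `total_minutes` and `pages_per_minute` are hypothetical, and the only inputs are the page counts and per-page rates copied from the table.

```python
# Benchmark rows copied from the table: (instance, pages, seconds per page).
BENCHMARKS = [
    ("m5n.4xlarge", 1000, 0.24),
    ("m5n.8xlarge", 1000, 0.15),
]


def total_minutes(pages: int, seconds_per_page: float) -> float:
    """Total processing time in minutes implied by a per-page rate."""
    return pages * seconds_per_page / 60


def pages_per_minute(seconds_per_page: float) -> float:
    """Throughput in pages per minute implied by a per-page rate."""
    return 60 / seconds_per_page


if __name__ == "__main__":
    for instance, pages, rate in BENCHMARKS:
        print(
            f"{instance}: {total_minutes(pages, rate):.1f} min total, "
            f"{pages_per_minute(rate):.0f} pages/min"
        )
```

Under these assumptions the computed totals agree with the reported 4 and 2.5 minutes, and doubling the instance size yields roughly a 1.6x reduction in time per page rather than 2x.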