diff --git a/docs/_posts/aymanechilah/2023-01-10-general_model_table_detection_v2_en_3_2.md b/docs/_posts/aymanechilah/2023-01-10-general_model_table_detection_v2_en_3_2.md
index e716e0f821..6a104475b2 100644
--- a/docs/_posts/aymanechilah/2023-01-10-general_model_table_detection_v2_en_3_2.md
+++ b/docs/_posts/aymanechilah/2023-01-10-general_model_table_detection_v2_en_3_2.md
@@ -25,7 +25,7 @@ Here it is used the CascadeTabNet general model for table detection inspired by
 ## Predicted Entities

 {:.btn-box}
-
+[Live Demo](https://demo.johnsnowlabs.com/ocr/IMAGE_TABLE_DETECTION_ONLY/){:.button.button-orange.button-orange-trans.co.button-icon}
 [Open in Colab](https://colab.research.google.com/github/JohnSnowLabs/spark-ocr-workshop/blob/master/jupyter/Cards/SparkOcrImageTableDetection.ipynb){:.button.button-orange.button-orange-trans.co.button-icon}
 [Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/ocr/general_model_table_detection_v2_en_3.3.0_3.0_1623301511401.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
 [Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/ocr/general_model_table_detection_v2_en_3.3.0_3.0_1623301511401.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
diff --git a/docs/_posts/aymanechilah/2023-01-10-ocr_small_handwritten_en_2_4.md b/docs/_posts/aymanechilah/2023-01-10-ocr_small_handwritten_en_2_4.md
index c753452825..0d9a7f686c 100644
--- a/docs/_posts/aymanechilah/2023-01-10-ocr_small_handwritten_en_2_4.md
+++ b/docs/_posts/aymanechilah/2023-01-10-ocr_small_handwritten_en_2_4.md
@@ -47,8 +47,7 @@ text_detector = ImageTextDetectorV2 \
     .setOutputCol("text_regions") \
     .setWithRefiner(True) \
     .setSizeThreshold(-1) \
-    .setLinkThreshold(0.3) \
-    .setWidth(500)
+    .setLinkThreshold(0.3).setWidth(500)

 # Try "ocr_base_handwritten" for better quality
 ocr = ImageToTextV2.pretrained("ocr_small_handwritten", "en", "clinical/ocr") \
diff --git a/docs/_posts/aymanechilah/2023-01-10-visualner_keyvalue_10kfilings_en_3_2.md b/docs/_posts/aymanechilah/2023-01-10-visualner_keyvalue_10kfilings_en_3_2.md
index eaba18573c..9ae588a524 100644
--- a/docs/_posts/aymanechilah/2023-01-10-visualner_keyvalue_10kfilings_en_3_2.md
+++ b/docs/_posts/aymanechilah/2023-01-10-visualner_keyvalue_10kfilings_en_3_2.md
@@ -26,7 +26,8 @@ This is a Form Recognition / Key Value extraction model, trained on the summary
 `KEY`, `VALUE`, `HEADER`

 {:.btn-box}
-[Live Demo](https://nlp.johnsnowlabs.com/demos){:.button.button-orange.button-orange-trans.co.button-icon}
+[Live Demo](https://demo.johnsnowlabs.com/finance/VISUALNER_10KFILINGS/){:.button.button-orange.button-orange-trans.co.button-icon}
+
 [Open in Colab](https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/finance-nlp/90.2.Financial_Visual_NER.ipynb){:.button.button-orange.button-orange-trans.co.button-icon}
 [Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/ocr/visualner_keyvalue_10kfilings_en_4.0.0_3.2_1663781115795.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
 [Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/ocr/visualner_keyvalue_10kfilings_en_4.0.0_3.2_1663781115795.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
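For reference, the sketch below shows how the `ocr_small_handwritten` stages touched in the hunk above might be assembled into a complete pipeline. It is not part of the card or of this diff: the `BinaryToImage` stage, the `image_text_detector_v2` checkpoint name, the active `spark` session, and the input path are assumptions, and only the setter calls visible in the diff are taken from the card.

```python
# Minimal sketch (not from the card): handwritten OCR with text-region detection,
# assuming the Spark OCR Python API and an existing `spark` session.
from pyspark.ml import PipelineModel
from sparkocr.transformers import *

# Convert raw binary files into an image column.
binary_to_image = BinaryToImage() \
    .setOutputCol("image")

# Text-region detector; the checkpoint name here is an assumption.
text_detector = ImageTextDetectorV2 \
    .pretrained("image_text_detector_v2", "en", "clinical/ocr") \
    .setInputCol("image") \
    .setOutputCol("text_regions") \
    .setWithRefiner(True) \
    .setSizeThreshold(-1) \
    .setLinkThreshold(0.3) \
    .setWidth(500)

# Handwritten OCR model from the card above.
ocr = ImageToTextV2.pretrained("ocr_small_handwritten", "en", "clinical/ocr") \
    .setInputCols(["image", "text_regions"]) \
    .setGroupImages(True) \
    .setOutputCol("text")

pipeline = PipelineModel(stages=[binary_to_image, text_detector, ocr])

# Hypothetical input path.
image_df = spark.read.format("binaryFile").load("path/to/handwritten_note.jpg")
result = pipeline.transform(image_df)
result.select("text").show(truncate=False)
```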
diff --git a/docs/_posts/aymanechilah/2023-01-17-ocr_base_handwritten_v2_en_3_2.md b/docs/_posts/aymanechilah/2023-01-17-ocr_base_handwritten_v2_en_3_2.md
index 086ed8d4da..880145c2ac 100644
--- a/docs/_posts/aymanechilah/2023-01-17-ocr_base_handwritten_v2_en_3_2.md
+++ b/docs/_posts/aymanechilah/2023-01-17-ocr_base_handwritten_v2_en_3_2.md
@@ -49,7 +49,7 @@ text_detector = ImageTextDetectorV2 \
     .setLinkThreshold(0.3) \
     .setWidth(500)

-ocr = ImageToTextV2Opt.pretrained("ocr_base_handwritten_v2", "en", "clinical/ocr") \
+ocr = ImageToTextV2.pretrained("ocr_base_handwritten_v2", "en", "clinical/ocr") \
     .setInputCols(["image", "text_regions"]) \
     .setGroupImages(True) \
     .setOutputCol("text") \
diff --git a/docs/_posts/aymanechilah/2023-01-17-ocr_base_handwritten_v2_opt_en_3_2.md b/docs/_posts/aymanechilah/2023-01-17-ocr_base_handwritten_v2_opt_en_3_2.md
index 06eb3798f7..63b96b89ba 100644
--- a/docs/_posts/aymanechilah/2023-01-17-ocr_base_handwritten_v2_opt_en_3_2.md
+++ b/docs/_posts/aymanechilah/2023-01-17-ocr_base_handwritten_v2_opt_en_3_2.md
@@ -50,7 +50,7 @@ text_detector = ImageTextDetectorV2 \
     .setLinkThreshold(0.3) \
     .setWidth(500)

-ocr = ImageToTextV2Opt.pretrained("ocr_base_handwritten_v2_opt", "en", "clinical/ocr") \
+ocr = ImageToTextV2.pretrained("ocr_base_handwritten_v2_opt", "en", "clinical/ocr") \
     .setInputCols(["image", "text_regions"]) \
     .setGroupImages(True) \
     .setOutputCol("text") \
diff --git a/docs/_posts/aymanechilah/2023-01-17-ocr_base_printed_v2_en_3_2.md b/docs/_posts/aymanechilah/2023-01-17-ocr_base_printed_v2_en_3_2.md
index a840962511..c4a48e27f9 100644
--- a/docs/_posts/aymanechilah/2023-01-17-ocr_base_printed_v2_en_3_2.md
+++ b/docs/_posts/aymanechilah/2023-01-17-ocr_base_printed_v2_en_3_2.md
@@ -50,7 +50,7 @@ text_detector = ImageTextDetectorV2 \
     .setLinkThreshold(0.3) \
     .setWidth(500)

-ocr = ImageToTextV2Opt.pretrained("ocr_base_printed_v2", "en", "clinical/ocr") \
+ocr = ImageToTextV2.pretrained("ocr_base_printed_v2", "en", "clinical/ocr") \
     .setInputCols(["image", "text_regions"]) \
     .setGroupImages(True) \
     .setOutputCol("text") \
diff --git a/docs/_posts/aymanechilah/2023-01-17-ocr_base_printed_v2_opt_en_3_2.md b/docs/_posts/aymanechilah/2023-01-17-ocr_base_printed_v2_opt_en_3_2.md
index d0c17eeab8..756bf162e5 100644
--- a/docs/_posts/aymanechilah/2023-01-17-ocr_base_printed_v2_opt_en_3_2.md
+++ b/docs/_posts/aymanechilah/2023-01-17-ocr_base_printed_v2_opt_en_3_2.md
@@ -49,7 +49,7 @@ text_detector = ImageTextDetectorV2 \
     .setLinkThreshold(0.3) \
     .setWidth(500)

-ocr = ImageToTextV2Opt.pretrained("ocr_base_printed_v2_opt", "en", "clinical/ocr") \
+ocr = ImageToTextV2.pretrained("ocr_base_printed_v2_opt", "en", "clinical/ocr") \
     .setInputCols(["image", "text_regions"]) \
     .setGroupImages(True) \
     .setOutputCol("text") \
diff --git a/docs/_posts/aymanechilah/2023-07-11-dit_base_finetuned_rvlcdip_en_3_2.md b/docs/_posts/aymanechilah/2023-07-11-dit_base_finetuned_rvlcdip_en_3_2.md
new file mode 100644
index 0000000000..028d755a69
--- /dev/null
+++ b/docs/_posts/aymanechilah/2023-07-11-dit_base_finetuned_rvlcdip_en_3_2.md
@@ -0,0 +1,116 @@
+---
+layout: model
+title: DiT model pretrained on IIT-CDIP and finetuned on RVL-CDIP for document classification
+author: John Snow Labs
+name: dit_base_finetuned_rvlcdip
+date: 2023-07-11
+tags: [en, licensed]
+task: OCR Document Classification
+language: en
+nav_key: models
+edition: Visual NLP 4.0.0
+spark_version: 3.2.1
+supported: true
+annotator: VisualDocumentClassifierv3
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+DiT was proposed in DiT: Self-supervised Pre-training for Document Image Transformer by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.
+DiT applies the self-supervised objective of BEiT (BERT pre-training of Image Transformers) to 42 million document images. This model was trained for document image classification on the RVL-CDIP dataset (a collection of 400,000 images belonging to one of 16 classes).
+
+The abstract from the paper is the following: Image Transformer has recently achieved significant progress for natural image understanding, either using supervised (ViT, DeiT, etc.) or self-supervised (BEiT, MAE, etc.) pre-training techniques. In this paper, we propose DiT, a self-supervised pre-trained Document Image Transformer model using large-scale unlabeled text images for Document AI tasks, which is essential since no supervised counterparts ever exist due to the lack of human labeled document images. We leverage DiT as the backbone network in a variety of vision-based Document AI tasks, including document image classification, document layout analysis, as well as table detection. Experiment results have illustrated that the self-supervised pre-trained DiT model achieves new state-of-the-art results on these downstream tasks, e.g. document image classification (91.11 → 92.69), document layout analysis (91.0 → 94.9) and table detection (94.23 → 96.55).
+
+
+## Predicted Entities
+
+
+
+{:.btn-box}
+[Live Demo](https://demo.johnsnowlabs.com/ocr/IMAGE_CLASSIFIER/){:.button.button-orange.button-orange-trans.co.button-icon}
+[Open in Colab](https://github.com/JohnSnowLabs/spark-ocr-workshop/blob/master/tutorials/Certification_Trainings/5.2.Visual_Document_Classifier_v3.ipynb){:.button.button-orange.button-orange-trans.co.button-icon}
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/ocr/dit_base_finetuned_rvlcdip_en_3.3.0_3.0_1654798502586.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+
+## How to use
+
+
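As a rough sketch only, the `dit_base_finetuned_rvlcdip` model added above might be wired into a Visual NLP pipeline as shown below. The class name follows the card's `annotator` field (`VisualDocumentClassifierv3`); the `BinaryToImage` stage, the `label` output column, the active `spark` session, and the input path are assumptions rather than content from this diff, and the exact call pattern should be taken from the published card and notebook.

```python
# Rough sketch (assumptions noted above): DiT-based document classification,
# assuming the Spark OCR Python API and an existing `spark` session.
from pyspark.ml import PipelineModel
from sparkocr.transformers import *

# Convert raw binary files into an image column.
binary_to_image = BinaryToImage() \
    .setOutputCol("image")

# DiT document classifier from this card; class name taken from the card's
# `annotator` field, output column name is an assumption.
document_classifier = VisualDocumentClassifierv3 \
    .pretrained("dit_base_finetuned_rvlcdip", "en", "clinical/ocr") \
    .setInputCols(["image"]) \
    .setOutputCol("label")

pipeline = PipelineModel(stages=[binary_to_image, document_classifier])

# Hypothetical input path; the model predicts one of the 16 RVL-CDIP classes.
image_df = spark.read.format("binaryFile").load("path/to/document.png")
result = pipeline.transform(image_df)
result.select("label").show(truncate=False)
```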