diff --git a/.gitignore b/.gitignore index 2a827386..7b96deef 100644 --- a/.gitignore +++ b/.gitignore @@ -8,14 +8,12 @@ .styleenv .coverage flake8.txt -/build/ +**/build/* generated /dist/ *.hdf5 *.h5 -#!nlp_architect/server/angular-ui/dist/angular-ui/*.html -#*.html -docs-sources/build/** +docs-source/_build/** !nlp_architect/solutions/set_expansion/ui/templates/*.html .vscode !tests/fixtures/data/server/*.gz diff --git a/README.md b/README.md index 1fce530e..8482e875 100644 --- a/README.md +++ b/README.md @@ -41,10 +41,11 @@ Features: * Core NLP models used in many NLP tasks and useful in many NLP applications * Novel NLU models showcasing novel topologies and techniques * Optimized NLP/NLU models showcasing different optimization algorithms on neural NLP/NLU models -* Simple REST API server ([doc](http://nlp_architect.nervanasys.com/service.html)): - * serving trained models (for inference) - * plug-in system for adding your own model -* 4 Demos of models (pre-trained by us) showcasing NLP Architect (Dependency parser, NER, Intent Extraction, Q&A) +* Model-oriented design: + * Train and run models from the command line. + * API for using models for inference in Python. + * Procedures for defining custom training, inference, or other processing flows. + * CLI sub-system for running procedures. * Based on optimized Deep Learning frameworks: * [TensorFlow] @@ -52,13 +53,21 @@ Features: * [Intel-Optimized TensorFlow with MKL-DNN] * [Dynet] -* Documentation [website](http://nlp_architect.nervanasys.com/) and [tutorials](http://nlp_architect.nervanasys.com/tutorials.html) * Essential utilities for working with NLP models - Text/String pre-processing, IO, data-manipulation, metrics, embeddings. +* Pluggable REST API server for serving models. ## Installing NLP Architect We recommend to install NLP Architect in a new python environment, to use python 3.6+ with up-to-date `pip`, `setuptools` and `h5py`. 
+### Install using `pip` + +Includes only core library (without `examples/` directory) + +```sh +pip install nlp-architect +``` + ### Install from source (Github) Includes core library and all content (example scripts, datasets, tutorials) @@ -76,14 +85,6 @@ Install (in develop mode) pip install -e . ``` -### Install from pypi (using `pip install`) - -Includes only core library - -```sh -pip install nlp-architect -``` - ### Further installation options Refer to our full [installation instructions](http://nlp_architect.nervanasys.com/installation.html) page on our website for complete details on how to install NLP Architect and other backend installations such as MKL-DNN or GPU backends. @@ -93,40 +94,30 @@ Users can install any deep learning backends manually before/after they install NLP models that provide best (or near) in class performance: -* [Word chunking](http://nlp_architect.nervanasys.com/chunker.html) -* [Named Entity Recognition](http://nlp_architect.nervanasys.com/ner_crf.html) +* [Word chunking](http://nlp_architect.nervanasys.com/tagging/sequence_tagging.html#word-chunker) +* [Named Entity Recognition](http://nlp_architect.nervanasys.com/tagging/sequence_tagging.html#named-entity-recognition) * [Dependency parsing](http://nlp_architect.nervanasys.com/bist_parser.html) * [Intent Extraction](http://nlp_architect.nervanasys.com/intent.html) -* [Sentiment classification](http://nlp_architect.nervanasys.com/supervised_sentiment.html) -* [Language models](http://nlp_architect.nervanasys.com/tcn.html) -* [Transformers](http://nlp_architect.nervanasys.com/) (for most NLP tasks) +* [Sentiment classification](http://nlp_architect.nervanasys.com/sentiment.html#supervised-sentiment) +* [Language models](http://nlp_architect.nervanasys.com/lm.html#language-modeling-with-tcn) +* [Transformers](http://nlp_architect.nervanasys.com/transformers.html) (for NLP tasks) Natural Language Understanding (NLU) models that address semantic understanding: * [Aspect Based 
Sentiment Analysis (ABSA)](http://nlp_architect.nervanasys.com/absa.html) +* [Joint intent detection and slot tagging](http://nlp_architect.nervanasys.com/intent.html) * [Noun phrase embedding representation (NP2Vec)](http://nlp_architect.nervanasys.com/np2vec.html) * [Most common word sense detection](http://nlp_architect.nervanasys.com/word_sense.html) * [Relation identification](http://nlp_architect.nervanasys.com/identifying_semantic_relation.html) * [Cross document coreference](http://nlp_architect.nervanasys.com/cross_doc_coref.html) * [Noun phrase semantic segmentation](http://nlp_architect.nervanasys.com/np_segmentation.html) -Components instrumental for conversational AI: - -* [Joint intent detection and slot tagging](http://nlp_architect.nervanasys.com/intent.html) -* [Memory Networks for goal oriented dialog](http://nlp_architect.nervanasys.com/memn2n.html) - Optimizing NLP/NLU models and misc. optimization techniques: -* [Quantized BERT (8bit)](http://nlp_architect.nervanasys.com/) -* [Knowledge Distillation using BERT](http://nlp_architect.nervanasys.com/) +* [Quantized BERT (8bit)](http://nlp_architect.nervanasys.com/quantized_bert.html) +* [Knowledge Distillation using Transformers](http://nlp_architect.nervanasys.com/transformers_distillation.html) * [Sparse and Quantized Neural Machine Translation (GNMT)](http://nlp_architect.nervanasys.com/sparse_gnmt.html) -End-to-end Deep Learning-based NLP models: - -* [Reading comprehension](http://nlp_architect.nervanasys.com/reading_comprehension.html) -* [Language Modeling using Temporal Convolution Network (TCN)](http://nlp_architect.nervanasys.com/tcn.html) -* [Unsupervised Cross-lingual embeddings](http://nlp_architect.nervanasys.com/crosslingual_emb.html) - Solutions (End-to-end applications) using one or more models: * [Term Set expansion](http://nlp_architect.nervanasys.com/term_set_expansion.html) - uses the included word chunker as a noun phrase extractor and NP2Vec to create semantic term sets @@ 
-155,34 +146,8 @@ The main design guidelines are: * REST API servers with ability to serve trained models via HTTP * Extensive model documentation and tutorials -## Demo UI examples - -Dependency parser -

- -

-Intent Extraction -

- -

- -## Packages - -| Package | Description | -|------------------------- |------------------------------------------------------ | -| `nlp_architect.api` | Model API interfaces | -| `nlp_architect.common` | Common packages | -| `nlp_architect.cli` | Command line module | -| `nlp_architect.data` | Datasets, loaders and data processors | -| `nlp_architect.models` | NLP, NLU and End-to-End models | -| `nlp_architect.nn` | Topology related models and additions (per framework) | -| `nlp_architect.pipelines` | End-to-end NLP apps | -| `nlp_architect.procedures`| Procedure scripts | -| `nlp_architect.server` | API Server and demos UI | -| `nlp_architect.solutions` | Solution applications | -| `nlp_architect.utils` | Misc. I/O, metric, pre-processing and text utilities | - ### Note + NLP Architect is an active space of research and development; Throughout future releases new models, solutions, topologies and framework additions and changes will be made. We aim to make sure all models run with Python 3.6+. We @@ -191,7 +156,7 @@ encourage researchers and developers to contribute their work into the library. ## Citing If you use NLP Architect in your research, please use the following citation: -``` + @misc{izsak_peter_2018_1477518, title = {NLP Architect by Intel AI Lab}, month = nov, @@ -199,9 +164,9 @@ If you use NLP Architect in your research, please use the following citation: doi = {10.5281/zenodo.1477518}, url = {https://doi.org/10.5281/zenodo.1477518} } -``` ## Disclaimer + The NLP Architect is released as reference code for research purposes. It is not an official Intel product, and the level of quality and support may not be as expected from an official product. 
NLP Architect is intended to be used diff --git a/dev-requirements.txt b/dev-requirements.txt index 6148f87d..8ef6db14 100644 --- a/dev-requirements.txt +++ b/dev-requirements.txt @@ -1,4 +1,4 @@ -sphinx +sphinx==1.8.5 sphinx_rtd_theme flake8-html pep8 diff --git a/docs-source/source/CONTRIBUTING.rst b/docs-source/source/CONTRIBUTING.rst index 3bc6a8b3..67625145 100644 --- a/docs-source/source/CONTRIBUTING.rst +++ b/docs-source/source/CONTRIBUTING.rst @@ -22,8 +22,9 @@ Contribution Process * Create an issue on GitHub: https://github.com/NervanaSystems/nlp-architect/issues -2. Clone and/or update your checked out copy of nlp-architect to ensure you have the - most recent commits from the master branch: +2. Fork NLP Architect and/or update your checked out copy of + nlp-architect to ensure you have the + most recent commits from the master branch, for example: .. code-block:: bash @@ -48,8 +49,8 @@ Contribution Process .. code-block:: bash - nlp_architect test # ensure all are OK - nlp_architect style # ensure there are no style related issues + ./scripts/run_tests.sh # ensure all are OK + ./scripts/check_style.sh # ensure there are no style related issues 5. If necessary you may want to update and/or rebuild the documentation. This all exists under docs-source/source and is in @@ -57,8 +58,7 @@ Contribution Process .. code-block:: bash - cd scripts/ - sh create_docs.sh # builds the doc and starts a local server directly + ./scripts/create_docs.sh # builds the doc and starts a local server directly 6. Commit your changes and push your feature branch to your GitHub fork. Be sure to add a descriptive message and reference the GitHub issue associated @@ -71,23 +71,10 @@ Contribution Process git commit -m "Added new awesome functionality. Closes issue #1" git push origin my_new_feature_branch -7. Create a new pull request to get your feature branch merged into master for - others to use. 
You'll first need to ensure your feature branch contains the - latest changes from master. Furthermore, internal devs will need to assign - the request to someone else for a code review. You must also ensure there - are no errors when run through the items defined in step 4. - - .. code-block:: bash - - # (external contribs): make a new pull request: - https://github.com/NervanaSystems/nlp-architect/pulls - - # merge latest master changes into your feature branch - git fetch origin - git checkout master - git pull origin master - git checkout my_new_feature_branch - git merge master # you may need to manually resolve any merge conflicts +7. Create a new `pull request `_ + to get your feature branch merged into master for others to use. + You'll first need to ensure your feature branch contains the latest changes from + master. 8. If there are issues you can continue to push commits to your feature branch by following step 6. They will automatically be added to this same merge diff --git a/docs-source/source/api.rst b/docs-source/source/api.rst deleted file mode 100644 index 80581c1e..00000000 --- a/docs-source/source/api.rst +++ /dev/null @@ -1,138 +0,0 @@ -.. --------------------------------------------------------------------------- -.. Copyright 2017-2018 Intel Corporation -.. -.. Licensed under the Apache License, Version 2.0 (the "License"); -.. you may not use this file except in compliance with the License. -.. You may obtain a copy of the License at -.. -.. http://www.apache.org/licenses/LICENSE-2.0 -.. -.. Unless required by applicable law or agreed to in writing, software -.. distributed under the License is distributed on an "AS IS" BASIS, -.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -.. See the License for the specific language governing permissions and -.. limitations under the License. -.. 
--------------------------------------------------------------------------- - -API -### - -This API documentation covers each model within NLP Architect. Most modules have a -corresponding user guide section that introduces the main concepts. See this -API for specific function definitions. - -.. .. csv-table:: -.. :header: "Module API", "Description" -.. :widths: 20, 40 -.. :delim: | -.. -.. :py:mod:`nlp_architect.models` | Model architecture -.. :py:mod:`nlp_architect.layers` | Model layers -.. :py:mod:`nlp_architect.data` | Data loading and handling - -``nlp_architect.models`` ------------------------- -.. currentmodule:: nlp_architect.models - -Model classes stores a list of layers describing the model. Methods are provided -to train the model weights, perform inference, and save/load the model. - -.. autosummary:: - :toctree: generated/ - :nosignatures: - - nlp_architect.models.bist_parser.BISTModel - nlp_architect.models.chunker.SequenceChunker - nlp_architect.models.intent_extraction.Seq2SeqIntentModel - nlp_architect.models.intent_extraction.MultiTaskIntentModel - nlp_architect.models.matchlstm_ansptr.MatchLSTMAnswerPointer - nlp_architect.models.memn2n_dialogue.MemN2N_Dialog - nlp_architect.models.most_common_word_sense.MostCommonWordSense - nlp_architect.models.ner_crf.NERCRF - nlp_architect.models.np2vec.NP2vec - nlp_architect.models.np_semantic_segmentation.NpSemanticSegClassifier - nlp_architect.models.temporal_convolutional_network.TCN - nlp_architect.models.crossling_emb.WordTranslator - nlp_architect.models.cross_doc_sieves - nlp_architect.models.cross_doc_coref.sieves_config.EventSievesConfiguration - nlp_architect.models.cross_doc_coref.sieves_config.EntitySievesConfiguration - nlp_architect.models.cross_doc_coref.sieves_resource.SievesResources - nlp_architect.models.gnmt_model.GNMTModel - - -``nlp_architect.data`` ----------------------- -.. 
currentmodule:: nlp_architect.data - -Dataset implementations and data loaders (check deep learning framework compatibility of dataset/loader in documentation) - -.. autosummary:: - :toctree: generated/ - :nosignatures: - - nlp_architect.data.amazon_reviews.Amazon_Reviews - nlp_architect.data.babi_dialog.BABI_Dialog - nlp_architect.data.conll.ConllEntry - nlp_architect.data.intent_datasets.IntentDataset - nlp_architect.data.intent_datasets.TabularIntentDataset - nlp_architect.data.intent_datasets.SNIPS - nlp_architect.data.ptb.PTBDataLoader - nlp_architect.data.sequential_tagging.CONLL2000 - nlp_architect.data.sequential_tagging.SequentialTaggingDataset - nlp_architect.data.fasttext_emb.FastTextEmb - nlp_architect.data.cdc_resources.relations.computed_relation_extraction.ComputedRelationExtraction - nlp_architect.data.cdc_resources.relations.referent_dict_relation_extraction.ReferentDictRelationExtraction - nlp_architect.data.cdc_resources.relations.verbocean_relation_extraction.VerboceanRelationExtraction - nlp_architect.data.cdc_resources.relations.wikipedia_relation_extraction.WikipediaRelationExtraction - nlp_architect.data.cdc_resources.relations.within_doc_coref_extraction.WithinDocCoref - nlp_architect.data.cdc_resources.relations.word_embedding_relation_extraction.WordEmbeddingRelationExtraction - nlp_architect.data.cdc_resources.relations.wordnet_relation_extraction.WordnetRelationExtraction - nlp_architect.data.cdc_resources.relations.relation_types_enums.RelationType - - -``nlp_architect.pipelines`` ---------------------------- -.. currentmodule:: nlp_architect.pipelines - -NLP pipelines modules using NLP Architect models - -.. autosummary:: - :toctree: generated/ - :nosignatures: - - nlp_architect.pipelines.spacy_bist.SpacyBISTParser - nlp_architect.pipelines.spacy_np_annotator.NPAnnotator - nlp_architect.pipelines.spacy_np_annotator.SpacyNPAnnotator - - -``nlp_architect.nn`` -------------------------- -.. 
currentmodule:: nlp_architect.nn - -In addition to imported layers, the library also contains its own set of network layers and additions. -These are currently stored in the various models or related to which DL frameworks it was based on. - -.. autosummary:: - :toctree: generated/ - :nosignatures: - - nlp_architect.nn.tensorflow.python.keras.layers.crf.CRF - nlp_architect.nn.tensorflow.python.keras.utils.layer_utils.save_model - nlp_architect.nn.tensorflow.python.keras.utils.layer_utils.load_model - nlp_architect.nn.tensorflow.python.keras.callbacks.ConllCallback - - -``nlp_architect.common`` ------------------------- -.. currentmodule:: nlp_architect.common - -Common types of data structures used by NLP models - -.. autosummary:: - :toctree: generated/ - :nosignatures: - - nlp_architect.common.core_nlp_doc.CoreNLPDoc - nlp_architect.common.high_level_doc.HighLevelDoc - nlp_architect.common.cdc.mention_data.MentionDataLight - nlp_architect.common.cdc.mention_data.MentionData diff --git a/docs-source/source/archived/additional.rst b/docs-source/source/archived/additional.rst new file mode 100644 index 00000000..b9eb03bd --- /dev/null +++ b/docs-source/source/archived/additional.rst @@ -0,0 +1,30 @@ +.. --------------------------------------------------------------------------- +.. Copyright 2017-2018 Intel Corporation +.. +.. Licensed under the Apache License, Version 2.0 (the "License"); +.. you may not use this file except in compliance with the License. +.. You may obtain a copy of the License at +.. +.. http://www.apache.org/licenses/LICENSE-2.0 +.. +.. Unless required by applicable law or agreed to in writing, software +.. distributed under the License is distributed on an "AS IS" BASIS, +.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +.. See the License for the specific language governing permissions and +.. limitations under the License. +.. 
--------------------------------------------------------------------------- + +================= +Additional Models +================= + + +.. include:: crosslingual_emb.rst + +---- + +.. include:: memn2n.rst + +---- + +.. include:: reading_comprehension.rst diff --git a/docs-source/source/crosslingual_emb.rst b/docs-source/source/archived/crosslingual_emb.rst similarity index 98% rename from docs-source/source/crosslingual_emb.rst rename to docs-source/source/archived/crosslingual_emb.rst index 6a5367e7..dda1e822 100755 --- a/docs-source/source/crosslingual_emb.rst +++ b/docs-source/source/archived/crosslingual_emb.rst @@ -5,7 +5,7 @@ Overview ======== This model uses a GAN to learn mapping between two language embeddings without supervision as demonstrated in Word Translation Without Parallel Data [1]_. -.. image:: assets/w2w.png +.. image:: ../assets/w2w.png Files diff --git a/docs-source/source/memn2n.rst b/docs-source/source/archived/memn2n.rst similarity index 100% rename from docs-source/source/memn2n.rst rename to docs-source/source/archived/memn2n.rst diff --git a/docs-source/source/reading_comprehension.rst b/docs-source/source/archived/reading_comprehension.rst similarity index 98% rename from docs-source/source/reading_comprehension.rst rename to docs-source/source/archived/reading_comprehension.rst index 1e8bcdd0..d97c772a 100755 --- a/docs-source/source/reading_comprehension.rst +++ b/docs-source/source/archived/reading_comprehension.rst @@ -27,7 +27,7 @@ input to the pointer network which identifies the start and end indices of the a Model Architecture ------------------ -.. image:: ../../examples/reading_comprehension/MatchLSTM_Model.png +.. 
image:: ../../../examples/reading_comprehension/MatchLSTM_Model.png Files diff --git a/docs-source/source/assets/cnn-lstm-fig.png b/docs-source/source/assets/cnn-lstm-fig.png new file mode 100644 index 00000000..d1a82ed1 Binary files /dev/null and b/docs-source/source/assets/cnn-lstm-fig.png differ diff --git a/docs-source/source/assets/idcnn-fig.png b/docs-source/source/assets/idcnn-fig.png new file mode 100644 index 00000000..47d2e77f Binary files /dev/null and b/docs-source/source/assets/idcnn-fig.png differ diff --git a/docs-source/source/assets/logo.png b/docs-source/source/assets/logo.png new file mode 100644 index 00000000..60b4f407 Binary files /dev/null and b/docs-source/source/assets/logo.png differ diff --git a/docs-source/source/conf.py b/docs-source/source/conf.py index e7575a58..f0f66fba 100644 --- a/docs-source/source/conf.py +++ b/docs-source/source/conf.py @@ -18,10 +18,16 @@ import os import sys +import sphinx_rtd_theme # noqa: E402 +from sphinx.ext import apidoc + +from nlp_architect.version import NLP_ARCHITECT_VERSION + +# -- Options for HTML output ---------------------------------------------- # If extensions (or modules to document with autodoc) are in another directory, # add these directories to sys.path here. If the directory is relative to the # documentation root, use os.path.abspath to make it absolute, like shown here. 
-from nlp_architect.version import NLP_ARCHITECT_VERSION + sys.path.insert(0, os.path.abspath('../..')) @@ -39,16 +45,16 @@ 'sphinx.ext.autosummary', 'sphinx.ext.napoleon', 'sphinx.ext.doctest', - 'sphinx.ext.intersphinx', - 'sphinx.ext.todo', - 'sphinx.ext.coverage', + # 'sphinx.ext.intersphinx', + # 'sphinx.ext.todo', + # 'sphinx.ext.coverage', 'sphinx.ext.mathjax', - 'sphinx.ext.ifconfig', + # 'sphinx.ext.ifconfig', 'sphinx.ext.viewcode', + 'sphinx_rtd_theme', ] - # Autodoc settings -autodoc_default_flags = ['members', 'undoc-members', 'inherited-members'] +# autodoc_default_flags = ['members', 'undoc-members', 'inherited-members'] # Autosummary settings autosummary_generate = True @@ -122,7 +128,7 @@ # show_authors = False # The name of the Pygments (syntax highlighting) style to use. -pygments_style = 'tango' +pygments_style = 'default' # A list of ignored prefixes for module index sorting. # modindex_common_prefix = [] @@ -131,14 +137,12 @@ # keep_warnings = False -# -- Options for HTML output ---------------------------------------------- -import sphinx_rtd_theme # noqa: E402 # The theme to use for HTML and HTML Help pages. See the documentation for # a list of builtin themes. # html_theme = 'sphinx_rtd_theme' -html_theme_path = [sphinx_rtd_theme.get_html_theme_path()] +# html_theme_path = [sphinx_rtd_theme.get_html_theme_path()] html_theme_options = { # 'canonical_url': '', # 'analytics_id': '', @@ -165,7 +169,7 @@ # The name of an image file (relative to this directory) to place at the top # of the sidebar. -html_logo = '../../assets/nlp_architect_logo_white.png' +html_logo = 'assets/logo.png' # The name of an image file (within the static path) to use as favicon of the # docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32 @@ -176,10 +180,17 @@ # relative to this directory. They are copied after the builtin static files, # so a file named "default.css" will overwrite the builtin "default.css". 
html_static_path = ['static/'] +html_css_files = [ + 'nlp_arch_theme.css', + # 'https://fonts.googleapis.com/css?family=Lato', + # 'https://fonts.googleapis.com/css?family=Oswald', + 'https://fonts.googleapis.com/css?family=Roboto+Mono', + 'https://fonts.googleapis.com/css?family=Open+Sans:100,900' +] -def setup(app): - app.add_stylesheet('https://fonts.googleapis.com/css?family=Lato') - app.add_stylesheet('theme.css') +html_js_files = [ + 'install.js' +] # Add any extra paths that contain custom files (such as robots.txt or # .htaccess) here, relative to this directory. These files are copied @@ -277,3 +288,13 @@ def setup(app): .. |Geon| replace:: Nervana Graph .. |TF| replace:: TensorFlow\ |trade| """ +def run_apidoc(_): + api_docs = os.path.join(os.path.abspath("./source/"), "generated_api") + argv = ["-f", "-o", api_docs, os.path.abspath("../nlp_architect/")] + + apidoc.main(argv) + os.remove(os.path.join(api_docs, "modules.rst")) + os.remove(os.path.join(api_docs, "nlp_architect.rst")) + +def setup(app): + app.connect("builder-inited", run_apidoc) diff --git a/docs-source/source/developer_guide.rst b/docs-source/source/developer_guide.rst index 87d6ec5e..dcfca8d4 100644 --- a/docs-source/source/developer_guide.rst +++ b/docs-source/source/developer_guide.rst @@ -14,15 +14,18 @@ .. limitations under the License. .. --------------------------------------------------------------------------- +=============== Developer Guide -############### +=============== -The following sections describe how to set up a development environment, how to contribute your code and what are our contribution standards. +The following sections describe how to set up a development environment, +how to contribute your code and what are our contribution standards. Prepare Environment =================== -Install NLP Architect from source (Github) and install supplemental development packages: +Install NLP Architect from source (Github) and install supplemental +development packages: .. 
code:: bash @@ -58,11 +61,19 @@ Documentation Conventions * Limit your docs to 2-3 levels of headings. -* New .rst files will show up in the sidebar, and any first and second level headings will also show up in the menu. Keep the sidebar short and only add essentials items there. Otherwise, add your documentation to the pre-existing files. You can add to the toctree manually, but please don't add or create pages unless absolutely necessary! +* New .rst files will show up in the sidebar, and any first and second level + headings will also show up in the menu. Keep the sidebar short and only + add essentials items there. Otherwise, add your documentation to the + pre-existing files. You can add to the toctree manually, but please don't + add or create pages unless absolutely necessary! -* If you created a new class, add it to the API by creating a new section in api.rst and create an autosummary_. Anytime you add an autosummary, please remember to add :nosignatures: to keep things consistent with the rest of our docs. +* If you created a new class, add it to the API by creating a new section in + api.rst and create an autosummary_. Anytime you add an autosummary, please + remember to add :nosignatures: to keep things consistent with the rest of + our docs. -* Every time you make a significant contribution, add a short description of it in the relevant document. +* Every time you make a significant contribution, add a short description + of it in the relevant document. .. include:: writing_tests.rst diff --git a/docs-source/source/generated_api/nlp_architect.api.rst b/docs-source/source/generated_api/nlp_architect.api.rst new file mode 100644 index 00000000..5bcc03ea --- /dev/null +++ b/docs-source/source/generated_api/nlp_architect.api.rst @@ -0,0 +1,62 @@ +nlp\_architect.api package +========================== + +Submodules +---------- + +nlp\_architect.api.abstract\_api module +--------------------------------------- + +.. 
automodule:: nlp_architect.api.abstract_api + :members: + :undoc-members: + :show-inheritance: + +nlp\_architect.api.base module +------------------------------ + +.. automodule:: nlp_architect.api.base + :members: + :undoc-members: + :show-inheritance: + +nlp\_architect.api.bist\_parser\_api module +------------------------------------------- + +.. automodule:: nlp_architect.api.bist_parser_api + :members: + :undoc-members: + :show-inheritance: + +nlp\_architect.api.intent\_extraction\_api module +------------------------------------------------- + +.. automodule:: nlp_architect.api.intent_extraction_api + :members: + :undoc-members: + :show-inheritance: + +nlp\_architect.api.machine\_comprehension\_api module +----------------------------------------------------- + +.. automodule:: nlp_architect.api.machine_comprehension_api + :members: + :undoc-members: + :show-inheritance: + +nlp\_architect.api.ner\_api module +---------------------------------- + +.. automodule:: nlp_architect.api.ner_api + :members: + :undoc-members: + :show-inheritance: + + +Module contents +--------------- + +.. automodule:: nlp_architect.api + :members: + :undoc-members: + :show-inheritance: diff --git a/docs-source/source/generated_api/nlp_architect.cli.rst b/docs-source/source/generated_api/nlp_architect.cli.rst new file mode 100644 index 00000000..894bca2a --- /dev/null +++ b/docs-source/source/generated_api/nlp_architect.cli.rst @@ -0,0 +1,30 @@ +nlp\_architect.cli package +========================== + +Submodules +---------- + +nlp\_architect.cli.cli\_commands module +--------------------------------------- + +.. automodule:: nlp_architect.cli.cli_commands + :members: + :undoc-members: + :show-inheritance: + +nlp\_architect.cli.cmd\_registry module +--------------------------------------- + +.. automodule:: nlp_architect.cli.cmd_registry + :members: + :undoc-members: + :show-inheritance: + + +Module contents +--------------- + +.. 
automodule:: nlp_architect.cli
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs-source/source/generated_api/nlp_architect.common.cdc.rst b/docs-source/source/generated_api/nlp_architect.common.cdc.rst
new file mode 100644
index 00000000..1d525718
--- /dev/null
+++ b/docs-source/source/generated_api/nlp_architect.common.cdc.rst
@@ -0,0 +1,38 @@
+nlp\_architect.common.cdc package
+=================================
+
+Submodules
+----------
+
+nlp\_architect.common.cdc.cluster module
+----------------------------------------
+
+.. automodule:: nlp_architect.common.cdc.cluster
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.common.cdc.mention\_data module
+----------------------------------------------
+
+.. automodule:: nlp_architect.common.cdc.mention_data
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.common.cdc.topics module
+---------------------------------------
+
+.. automodule:: nlp_architect.common.cdc.topics
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+
+Module contents
+---------------
+
+.. automodule:: nlp_architect.common.cdc
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs-source/source/generated_api/nlp_architect.common.rst b/docs-source/source/generated_api/nlp_architect.common.rst
new file mode 100644
index 00000000..d844e061
--- /dev/null
+++ b/docs-source/source/generated_api/nlp_architect.common.rst
@@ -0,0 +1,37 @@
+nlp\_architect.common package
+=============================
+
+Subpackages
+-----------
+
+.. toctree::
+
+   nlp_architect.common.cdc
+
+Submodules
+----------
+
+nlp\_architect.common.core\_nlp\_doc module
+-------------------------------------------
+
+.. automodule:: nlp_architect.common.core_nlp_doc
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.common.high\_level\_doc module
+---------------------------------------------
+
+.. automodule:: nlp_architect.common.high_level_doc
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+
+Module contents
+---------------
+
+.. automodule:: nlp_architect.common
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs-source/source/generated_api/nlp_architect.data.cdc_resources.data_types.rst b/docs-source/source/generated_api/nlp_architect.data.cdc_resources.data_types.rst
new file mode 100644
index 00000000..f2c185c5
--- /dev/null
+++ b/docs-source/source/generated_api/nlp_architect.data.cdc_resources.data_types.rst
@@ -0,0 +1,18 @@
+nlp\_architect.data.cdc\_resources.data\_types package
+======================================================
+
+Subpackages
+-----------
+
+.. toctree::
+
+   nlp_architect.data.cdc_resources.data_types.wiki
+   nlp_architect.data.cdc_resources.data_types.wn
+
+Module contents
+---------------
+
+.. automodule:: nlp_architect.data.cdc_resources.data_types
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs-source/source/generated_api/nlp_architect.data.cdc_resources.data_types.wiki.rst b/docs-source/source/generated_api/nlp_architect.data.cdc_resources.data_types.wiki.rst
new file mode 100644
index 00000000..870cf318
--- /dev/null
+++ b/docs-source/source/generated_api/nlp_architect.data.cdc_resources.data_types.wiki.rst
@@ -0,0 +1,38 @@
+nlp\_architect.data.cdc\_resources.data\_types.wiki package
+===========================================================
+
+Submodules
+----------
+
+nlp\_architect.data.cdc\_resources.data\_types.wiki.wikipedia\_page module
+--------------------------------------------------------------------------
+
+.. automodule:: nlp_architect.data.cdc_resources.data_types.wiki.wikipedia_page
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.data.cdc\_resources.data\_types.wiki.wikipedia\_page\_extracted\_relations module
+------------------------------------------------------------------------------------------------
+
+.. automodule:: nlp_architect.data.cdc_resources.data_types.wiki.wikipedia_page_extracted_relations
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.data.cdc\_resources.data\_types.wiki.wikipedia\_pages module
+---------------------------------------------------------------------------
+
+.. automodule:: nlp_architect.data.cdc_resources.data_types.wiki.wikipedia_pages
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+
+Module contents
+---------------
+
+.. automodule:: nlp_architect.data.cdc_resources.data_types.wiki
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs-source/source/generated_api/nlp_architect.data.cdc_resources.data_types.wn.rst b/docs-source/source/generated_api/nlp_architect.data.cdc_resources.data_types.wn.rst
new file mode 100644
index 00000000..178013d5
--- /dev/null
+++ b/docs-source/source/generated_api/nlp_architect.data.cdc_resources.data_types.wn.rst
@@ -0,0 +1,22 @@
+nlp\_architect.data.cdc\_resources.data\_types.wn package
+=========================================================
+
+Submodules
+----------
+
+nlp\_architect.data.cdc\_resources.data\_types.wn.wordnet\_page module
+----------------------------------------------------------------------
+
+.. automodule:: nlp_architect.data.cdc_resources.data_types.wn.wordnet_page
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+
+Module contents
+---------------
+
+.. automodule:: nlp_architect.data.cdc_resources.data_types.wn
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs-source/source/generated_api/nlp_architect.data.cdc_resources.embedding.rst b/docs-source/source/generated_api/nlp_architect.data.cdc_resources.embedding.rst
new file mode 100644
index 00000000..8769f3c5
--- /dev/null
+++ b/docs-source/source/generated_api/nlp_architect.data.cdc_resources.embedding.rst
@@ -0,0 +1,30 @@
+nlp\_architect.data.cdc\_resources.embedding package
+====================================================
+
+Submodules
+----------
+
+nlp\_architect.data.cdc\_resources.embedding.embed\_elmo module
+---------------------------------------------------------------
+
+.. automodule:: nlp_architect.data.cdc_resources.embedding.embed_elmo
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.data.cdc\_resources.embedding.embed\_glove module
+----------------------------------------------------------------
+
+.. automodule:: nlp_architect.data.cdc_resources.embedding.embed_glove
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+
+Module contents
+---------------
+
+.. automodule:: nlp_architect.data.cdc_resources.embedding
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs-source/source/generated_api/nlp_architect.data.cdc_resources.gen_scripts.rst b/docs-source/source/generated_api/nlp_architect.data.cdc_resources.gen_scripts.rst
new file mode 100644
index 00000000..95c0f869
--- /dev/null
+++ b/docs-source/source/generated_api/nlp_architect.data.cdc_resources.gen_scripts.rst
@@ -0,0 +1,62 @@
+nlp\_architect.data.cdc\_resources.gen\_scripts package
+=======================================================
+
+Submodules
+----------
+
+nlp\_architect.data.cdc\_resources.gen\_scripts.create\_reference\_dict\_dump module
+------------------------------------------------------------------------------------
+
+.. automodule:: nlp_architect.data.cdc_resources.gen_scripts.create_reference_dict_dump
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.data.cdc\_resources.gen\_scripts.create\_verbocean\_dump module
+------------------------------------------------------------------------------
+
+.. automodule:: nlp_architect.data.cdc_resources.gen_scripts.create_verbocean_dump
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.data.cdc\_resources.gen\_scripts.create\_wiki\_dump module
+-------------------------------------------------------------------------
+
+.. automodule:: nlp_architect.data.cdc_resources.gen_scripts.create_wiki_dump
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.data.cdc\_resources.gen\_scripts.create\_word\_embed\_elmo\_dump module
+--------------------------------------------------------------------------------------
+
+.. automodule:: nlp_architect.data.cdc_resources.gen_scripts.create_word_embed_elmo_dump
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.data.cdc\_resources.gen\_scripts.create\_word\_embed\_glove\_dump module
+---------------------------------------------------------------------------------------
+
+.. automodule:: nlp_architect.data.cdc_resources.gen_scripts.create_word_embed_glove_dump
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.data.cdc\_resources.gen\_scripts.create\_wordnet\_dump module
+----------------------------------------------------------------------------
+
+.. automodule:: nlp_architect.data.cdc_resources.gen_scripts.create_wordnet_dump
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+
+Module contents
+---------------
+
+.. automodule:: nlp_architect.data.cdc_resources.gen_scripts
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs-source/source/generated_api/nlp_architect.data.cdc_resources.relations.rst b/docs-source/source/generated_api/nlp_architect.data.cdc_resources.relations.rst
new file mode 100644
index 00000000..18eff729
--- /dev/null
+++ b/docs-source/source/generated_api/nlp_architect.data.cdc_resources.relations.rst
@@ -0,0 +1,86 @@
+nlp\_architect.data.cdc\_resources.relations package
+====================================================
+
+Submodules
+----------
+
+nlp\_architect.data.cdc\_resources.relations.computed\_relation\_extraction module
+----------------------------------------------------------------------------------
+
+.. automodule:: nlp_architect.data.cdc_resources.relations.computed_relation_extraction
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.data.cdc\_resources.relations.referent\_dict\_relation\_extraction module
+----------------------------------------------------------------------------------------
+
+.. automodule:: nlp_architect.data.cdc_resources.relations.referent_dict_relation_extraction
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.data.cdc\_resources.relations.relation\_extraction module
+------------------------------------------------------------------------
+
+.. automodule:: nlp_architect.data.cdc_resources.relations.relation_extraction
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.data.cdc\_resources.relations.relation\_types\_enums module
+--------------------------------------------------------------------------
+
+.. automodule:: nlp_architect.data.cdc_resources.relations.relation_types_enums
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.data.cdc\_resources.relations.verbocean\_relation\_extraction module
+-----------------------------------------------------------------------------------
+
+.. automodule:: nlp_architect.data.cdc_resources.relations.verbocean_relation_extraction
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.data.cdc\_resources.relations.wikipedia\_relation\_extraction module
+-----------------------------------------------------------------------------------
+
+.. automodule:: nlp_architect.data.cdc_resources.relations.wikipedia_relation_extraction
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.data.cdc\_resources.relations.within\_doc\_coref\_extraction module
+----------------------------------------------------------------------------------
+
+.. automodule:: nlp_architect.data.cdc_resources.relations.within_doc_coref_extraction
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.data.cdc\_resources.relations.word\_embedding\_relation\_extraction module
+-----------------------------------------------------------------------------------------
+
+.. automodule:: nlp_architect.data.cdc_resources.relations.word_embedding_relation_extraction
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.data.cdc\_resources.relations.wordnet\_relation\_extraction module
+---------------------------------------------------------------------------------
+
+.. automodule:: nlp_architect.data.cdc_resources.relations.wordnet_relation_extraction
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+
+Module contents
+---------------
+
+.. automodule:: nlp_architect.data.cdc_resources.relations
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs-source/source/generated_api/nlp_architect.data.cdc_resources.rst b/docs-source/source/generated_api/nlp_architect.data.cdc_resources.rst
new file mode 100644
index 00000000..c8e5e12f
--- /dev/null
+++ b/docs-source/source/generated_api/nlp_architect.data.cdc_resources.rst
@@ -0,0 +1,22 @@
+nlp\_architect.data.cdc\_resources package
+==========================================
+
+Subpackages
+-----------
+
+.. toctree::
+
+   nlp_architect.data.cdc_resources.data_types
+   nlp_architect.data.cdc_resources.embedding
+   nlp_architect.data.cdc_resources.gen_scripts
+   nlp_architect.data.cdc_resources.relations
+   nlp_architect.data.cdc_resources.wikipedia
+   nlp_architect.data.cdc_resources.wordnet
+
+Module contents
+---------------
+
+.. automodule:: nlp_architect.data.cdc_resources
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs-source/source/generated_api/nlp_architect.data.cdc_resources.wikipedia.rst b/docs-source/source/generated_api/nlp_architect.data.cdc_resources.wikipedia.rst
new file mode 100644
index 00000000..fe7604bf
--- /dev/null
+++ b/docs-source/source/generated_api/nlp_architect.data.cdc_resources.wikipedia.rst
@@ -0,0 +1,46 @@
+nlp\_architect.data.cdc\_resources.wikipedia package
+====================================================
+
+Submodules
+----------
+
+nlp\_architect.data.cdc\_resources.wikipedia.wiki\_elastic module
+-----------------------------------------------------------------
+
+.. automodule:: nlp_architect.data.cdc_resources.wikipedia.wiki_elastic
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.data.cdc\_resources.wikipedia.wiki\_offline module
+-----------------------------------------------------------------
+
+.. automodule:: nlp_architect.data.cdc_resources.wikipedia.wiki_offline
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.data.cdc\_resources.wikipedia.wiki\_online module
+----------------------------------------------------------------
+
+.. automodule:: nlp_architect.data.cdc_resources.wikipedia.wiki_online
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.data.cdc\_resources.wikipedia.wiki\_search\_page\_result module
+------------------------------------------------------------------------------
+
+.. automodule:: nlp_architect.data.cdc_resources.wikipedia.wiki_search_page_result
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+
+Module contents
+---------------
+
+.. automodule:: nlp_architect.data.cdc_resources.wikipedia
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs-source/source/generated_api/nlp_architect.data.cdc_resources.wordnet.rst b/docs-source/source/generated_api/nlp_architect.data.cdc_resources.wordnet.rst
new file mode 100644
index 00000000..f1376d9d
--- /dev/null
+++ b/docs-source/source/generated_api/nlp_architect.data.cdc_resources.wordnet.rst
@@ -0,0 +1,30 @@
+nlp\_architect.data.cdc\_resources.wordnet package
+==================================================
+
+Submodules
+----------
+
+nlp\_architect.data.cdc\_resources.wordnet.wordnet\_offline module
+------------------------------------------------------------------
+
+.. automodule:: nlp_architect.data.cdc_resources.wordnet.wordnet_offline
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.data.cdc\_resources.wordnet.wordnet\_online module
+-----------------------------------------------------------------
+
+.. automodule:: nlp_architect.data.cdc_resources.wordnet.wordnet_online
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+
+Module contents
+---------------
+
+.. automodule:: nlp_architect.data.cdc_resources.wordnet
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs-source/source/generated_api/nlp_architect.data.rst b/docs-source/source/generated_api/nlp_architect.data.rst
new file mode 100644
index 00000000..6fa4c35c
--- /dev/null
+++ b/docs-source/source/generated_api/nlp_architect.data.rst
@@ -0,0 +1,101 @@
+nlp\_architect.data package
+===========================
+
+Subpackages
+-----------
+
+.. toctree::
+
+   nlp_architect.data.cdc_resources
+
+Submodules
+----------
+
+nlp\_architect.data.amazon\_reviews module
+------------------------------------------
+
+.. automodule:: nlp_architect.data.amazon_reviews
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.data.babi\_dialog module
+---------------------------------------
+
+.. automodule:: nlp_architect.data.babi_dialog
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.data.conll module
+--------------------------------
+
+.. automodule:: nlp_architect.data.conll
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.data.fasttext\_emb module
+----------------------------------------
+
+.. automodule:: nlp_architect.data.fasttext_emb
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.data.glue\_tasks module
+--------------------------------------
+
+.. automodule:: nlp_architect.data.glue_tasks
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.data.intent\_datasets module
+-------------------------------------------
+
+.. automodule:: nlp_architect.data.intent_datasets
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.data.ptb module
+------------------------------
+
+.. automodule:: nlp_architect.data.ptb
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.data.sequence\_classification module
+---------------------------------------------------
+
+.. automodule:: nlp_architect.data.sequence_classification
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.data.sequential\_tagging module
+----------------------------------------------
+
+.. automodule:: nlp_architect.data.sequential_tagging
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.data.utils module
+--------------------------------
+
+.. automodule:: nlp_architect.data.utils
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+
+Module contents
+---------------
+
+.. automodule:: nlp_architect.data
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs-source/source/generated_api/nlp_architect.models.absa.inference.rst b/docs-source/source/generated_api/nlp_architect.models.absa.inference.rst
new file mode 100644
index 00000000..19163af2
--- /dev/null
+++ b/docs-source/source/generated_api/nlp_architect.models.absa.inference.rst
@@ -0,0 +1,30 @@
+nlp\_architect.models.absa.inference package
+============================================
+
+Submodules
+----------
+
+nlp\_architect.models.absa.inference.data\_types module
+-------------------------------------------------------
+
+.. automodule:: nlp_architect.models.absa.inference.data_types
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.models.absa.inference.inference module
+-----------------------------------------------------
+
+.. automodule:: nlp_architect.models.absa.inference.inference
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+
+Module contents
+---------------
+
+.. automodule:: nlp_architect.models.absa.inference
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs-source/source/generated_api/nlp_architect.models.absa.rst b/docs-source/source/generated_api/nlp_architect.models.absa.rst
new file mode 100644
index 00000000..574d1c23
--- /dev/null
+++ b/docs-source/source/generated_api/nlp_architect.models.absa.rst
@@ -0,0 +1,30 @@
+nlp\_architect.models.absa package
+==================================
+
+Subpackages
+-----------
+
+.. toctree::
+
+   nlp_architect.models.absa.inference
+   nlp_architect.models.absa.train
+
+Submodules
+----------
+
+nlp\_architect.models.absa.utils module
+---------------------------------------
+
+.. automodule:: nlp_architect.models.absa.utils
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+
+Module contents
+---------------
+
+.. automodule:: nlp_architect.models.absa
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs-source/source/generated_api/nlp_architect.models.absa.train.rst b/docs-source/source/generated_api/nlp_architect.models.absa.train.rst
new file mode 100644
index 00000000..0c6a92f6
--- /dev/null
+++ b/docs-source/source/generated_api/nlp_architect.models.absa.train.rst
@@ -0,0 +1,62 @@
+nlp\_architect.models.absa.train package
+========================================
+
+Submodules
+----------
+
+nlp\_architect.models.absa.train.acquire\_terms module
+------------------------------------------------------
+
+.. automodule:: nlp_architect.models.absa.train.acquire_terms
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.models.absa.train.data\_types module
+---------------------------------------------------
+
+.. automodule:: nlp_architect.models.absa.train.data_types
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.models.absa.train.generate\_lexicons module
+----------------------------------------------------------
+
+.. automodule:: nlp_architect.models.absa.train.generate_lexicons
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.models.absa.train.rerank\_terms module
+-----------------------------------------------------
+
+.. automodule:: nlp_architect.models.absa.train.rerank_terms
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.models.absa.train.rules module
+---------------------------------------------
+
+.. automodule:: nlp_architect.models.absa.train.rules
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.models.absa.train.train module
+---------------------------------------------
+
+.. automodule:: nlp_architect.models.absa.train.train
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+
+Module contents
+---------------
+
+.. automodule:: nlp_architect.models.absa.train
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs-source/source/generated_api/nlp_architect.models.bist.eval.conllu.rst b/docs-source/source/generated_api/nlp_architect.models.bist.eval.conllu.rst
new file mode 100644
index 00000000..38233155
--- /dev/null
+++ b/docs-source/source/generated_api/nlp_architect.models.bist.eval.conllu.rst
@@ -0,0 +1,22 @@
+nlp\_architect.models.bist.eval.conllu package
+==============================================
+
+Submodules
+----------
+
+nlp\_architect.models.bist.eval.conllu.conll17\_ud\_eval module
+---------------------------------------------------------------
+
+.. automodule:: nlp_architect.models.bist.eval.conllu.conll17_ud_eval
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+
+Module contents
+---------------
+
+.. automodule:: nlp_architect.models.bist.eval.conllu
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs-source/source/generated_api/nlp_architect.models.bist.eval.rst b/docs-source/source/generated_api/nlp_architect.models.bist.eval.rst
new file mode 100644
index 00000000..88f2d155
--- /dev/null
+++ b/docs-source/source/generated_api/nlp_architect.models.bist.eval.rst
@@ -0,0 +1,17 @@
+nlp\_architect.models.bist.eval package
+=======================================
+
+Subpackages
+-----------
+
+.. toctree::
+
+   nlp_architect.models.bist.eval.conllu
+
+Module contents
+---------------
+
+.. automodule:: nlp_architect.models.bist.eval
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs-source/source/generated_api/nlp_architect.models.bist.rst b/docs-source/source/generated_api/nlp_architect.models.bist.rst
new file mode 100644
index 00000000..c86e8d1e
--- /dev/null
+++ b/docs-source/source/generated_api/nlp_architect.models.bist.rst
@@ -0,0 +1,45 @@
+nlp\_architect.models.bist package
+==================================
+
+Subpackages
+-----------
+
+.. toctree::
+
+   nlp_architect.models.bist.eval
+
+Submodules
+----------
+
+nlp\_architect.models.bist.decoder module
+-----------------------------------------
+
+.. automodule:: nlp_architect.models.bist.decoder
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.models.bist.mstlstm module
+-----------------------------------------
+
+.. automodule:: nlp_architect.models.bist.mstlstm
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.models.bist.utils module
+---------------------------------------
+
+.. automodule:: nlp_architect.models.bist.utils
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+
+Module contents
+---------------
+
+.. automodule:: nlp_architect.models.bist
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs-source/source/generated_api/nlp_architect.models.cross_doc_coref.rst b/docs-source/source/generated_api/nlp_architect.models.cross_doc_coref.rst
new file mode 100644
index 00000000..5e3e21db
--- /dev/null
+++ b/docs-source/source/generated_api/nlp_architect.models.cross_doc_coref.rst
@@ -0,0 +1,37 @@
+nlp\_architect.models.cross\_doc\_coref package
+===============================================
+
+Subpackages
+-----------
+
+.. toctree::
+
+   nlp_architect.models.cross_doc_coref.system
+
+Submodules
+----------
+
+nlp\_architect.models.cross\_doc\_coref.sieves\_config module
+-------------------------------------------------------------
+
+.. automodule:: nlp_architect.models.cross_doc_coref.sieves_config
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.models.cross\_doc\_coref.sieves\_resource module
+---------------------------------------------------------------
+
+.. automodule:: nlp_architect.models.cross_doc_coref.sieves_resource
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+
+Module contents
+---------------
+
+.. automodule:: nlp_architect.models.cross_doc_coref
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs-source/source/generated_api/nlp_architect.models.cross_doc_coref.system.rst b/docs-source/source/generated_api/nlp_architect.models.cross_doc_coref.system.rst
new file mode 100644
index 00000000..02701f81
--- /dev/null
+++ b/docs-source/source/generated_api/nlp_architect.models.cross_doc_coref.system.rst
@@ -0,0 +1,37 @@
+nlp\_architect.models.cross\_doc\_coref.system package
+======================================================
+
+Subpackages
+-----------
+
+.. toctree::
+
+   nlp_architect.models.cross_doc_coref.system.sieves
+
+Submodules
+----------
+
+nlp\_architect.models.cross\_doc\_coref.system.cdc\_utils module
+----------------------------------------------------------------
+
+.. automodule:: nlp_architect.models.cross_doc_coref.system.cdc_utils
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.models.cross\_doc\_coref.system.sieves\_container\_init module
+-----------------------------------------------------------------------------
+
+.. automodule:: nlp_architect.models.cross_doc_coref.system.sieves_container_init
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+
+Module contents
+---------------
+
+.. automodule:: nlp_architect.models.cross_doc_coref.system
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs-source/source/generated_api/nlp_architect.models.cross_doc_coref.system.sieves.rst b/docs-source/source/generated_api/nlp_architect.models.cross_doc_coref.system.sieves.rst
new file mode 100644
index 00000000..0fcac38a
--- /dev/null
+++ b/docs-source/source/generated_api/nlp_architect.models.cross_doc_coref.system.sieves.rst
@@ -0,0 +1,30 @@
+nlp\_architect.models.cross\_doc\_coref.system.sieves package
+=============================================================
+
+Submodules
+----------
+
+nlp\_architect.models.cross\_doc\_coref.system.sieves.run\_sieve\_system module
+-------------------------------------------------------------------------------
+
+.. automodule:: nlp_architect.models.cross_doc_coref.system.sieves.run_sieve_system
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.models.cross\_doc\_coref.system.sieves.sieves module
+-------------------------------------------------------------------
+
+.. automodule:: nlp_architect.models.cross_doc_coref.system.sieves.sieves
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+
+Module contents
+---------------
+
+.. automodule:: nlp_architect.models.cross_doc_coref.system.sieves
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs-source/source/generated_api/nlp_architect.models.gnmt.rst b/docs-source/source/generated_api/nlp_architect.models.gnmt.rst
new file mode 100644
index 00000000..5f7c93b5
--- /dev/null
+++ b/docs-source/source/generated_api/nlp_architect.models.gnmt.rst
@@ -0,0 +1,46 @@
+nlp\_architect.models.gnmt package
+==================================
+
+Subpackages
+-----------
+
+.. toctree::
+
+   nlp_architect.models.gnmt.scripts
+   nlp_architect.models.gnmt.utils
+
+Submodules
+----------
+
+nlp\_architect.models.gnmt.attention\_model module
+--------------------------------------------------
+
+.. automodule:: nlp_architect.models.gnmt.attention_model
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.models.gnmt.model module
+---------------------------------------
+
+.. automodule:: nlp_architect.models.gnmt.model
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.models.gnmt.model\_helper module
+-----------------------------------------------
+
+.. automodule:: nlp_architect.models.gnmt.model_helper
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+
+Module contents
+---------------
+
+.. automodule:: nlp_architect.models.gnmt
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs-source/source/generated_api/nlp_architect.models.gnmt.scripts.rst b/docs-source/source/generated_api/nlp_architect.models.gnmt.scripts.rst
new file mode 100644
index 00000000..6a3787cc
--- /dev/null
+++ b/docs-source/source/generated_api/nlp_architect.models.gnmt.scripts.rst
@@ -0,0 +1,30 @@
+nlp\_architect.models.gnmt.scripts package
+==========================================
+
+Submodules
+----------
+
+nlp\_architect.models.gnmt.scripts.bleu module
+----------------------------------------------
+
+.. automodule:: nlp_architect.models.gnmt.scripts.bleu
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.models.gnmt.scripts.rouge module
+-----------------------------------------------
+
+.. automodule:: nlp_architect.models.gnmt.scripts.rouge
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+
+Module contents
+---------------
+
+.. automodule:: nlp_architect.models.gnmt.scripts
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs-source/source/generated_api/nlp_architect.models.gnmt.utils.rst b/docs-source/source/generated_api/nlp_architect.models.gnmt.utils.rst
new file mode 100644
index 00000000..b23b909b
--- /dev/null
+++ b/docs-source/source/generated_api/nlp_architect.models.gnmt.utils.rst
@@ -0,0 +1,62 @@
+nlp\_architect.models.gnmt.utils package
+========================================
+
+Submodules
+----------
+
+nlp\_architect.models.gnmt.utils.evaluation\_utils module
+---------------------------------------------------------
+
+.. automodule:: nlp_architect.models.gnmt.utils.evaluation_utils
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.models.gnmt.utils.iterator\_utils module
+-------------------------------------------------------
+
+.. automodule:: nlp_architect.models.gnmt.utils.iterator_utils
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.models.gnmt.utils.misc\_utils module
+---------------------------------------------------
+
+.. automodule:: nlp_architect.models.gnmt.utils.misc_utils
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.models.gnmt.utils.nmt\_utils module
+--------------------------------------------------
+
+.. automodule:: nlp_architect.models.gnmt.utils.nmt_utils
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.models.gnmt.utils.standard\_hparams\_utils module
+----------------------------------------------------------------
+
+.. automodule:: nlp_architect.models.gnmt.utils.standard_hparams_utils
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.models.gnmt.utils.vocab\_utils module
+----------------------------------------------------
+
+.. automodule:: nlp_architect.models.gnmt.utils.vocab_utils
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+
+Module contents
+---------------
+
+.. automodule:: nlp_architect.models.gnmt.utils
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs-source/source/generated_api/nlp_architect.models.rst b/docs-source/source/generated_api/nlp_architect.models.rst
new file mode 100644
index 00000000..bc30e2fc
--- /dev/null
+++ b/docs-source/source/generated_api/nlp_architect.models.rst
@@ -0,0 +1,145 @@
+nlp\_architect.models package
+=============================
+
+Subpackages
+-----------
+
+.. toctree::
+
+   nlp_architect.models.absa
+   nlp_architect.models.bist
+   nlp_architect.models.cross_doc_coref
+   nlp_architect.models.gnmt
+   nlp_architect.models.transformers
+
+Submodules
+----------
+
+nlp\_architect.models.bist\_parser module
+-----------------------------------------
+
+.. automodule:: nlp_architect.models.bist_parser
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.models.chunker module
+------------------------------------
+
+.. automodule:: nlp_architect.models.chunker
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.models.cross\_doc\_sieves module
+-----------------------------------------------
+
+.. automodule:: nlp_architect.models.cross_doc_sieves
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.models.crossling\_emb module
+-------------------------------------------
+
+.. automodule:: nlp_architect.models.crossling_emb
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.models.gnmt\_model module
+----------------------------------------
+
+.. automodule:: nlp_architect.models.gnmt_model
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.models.intent\_extraction module
+-----------------------------------------------
+
+.. automodule:: nlp_architect.models.intent_extraction
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.models.matchlstm\_ansptr module
+----------------------------------------------
+
+.. automodule:: nlp_architect.models.matchlstm_ansptr
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.models.memn2n\_dialogue module
+---------------------------------------------
+
+.. automodule:: nlp_architect.models.memn2n_dialogue
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.models.most\_common\_word\_sense module
+------------------------------------------------------
+
+.. automodule:: nlp_architect.models.most_common_word_sense
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.models.ner\_crf module
+-------------------------------------
+
+.. automodule:: nlp_architect.models.ner_crf
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.models.np2vec module
+-----------------------------------
+
+.. automodule:: nlp_architect.models.np2vec
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.models.np\_semantic\_segmentation module
+-------------------------------------------------------
+
+.. automodule:: nlp_architect.models.np_semantic_segmentation
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.models.supervised\_sentiment module
+--------------------------------------------------
+
+.. automodule:: nlp_architect.models.supervised_sentiment
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.models.tagging module
+------------------------------------
+
+.. automodule:: nlp_architect.models.tagging
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.models.temporal\_convolutional\_network module
+-------------------------------------------------------------
+
+.. automodule:: nlp_architect.models.temporal_convolutional_network
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+
+Module contents
+---------------
+
+.. automodule:: nlp_architect.models
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs-source/source/generated_api/nlp_architect.models.transformers.rst b/docs-source/source/generated_api/nlp_architect.models.transformers.rst
new file mode 100644
index 00000000..195ca9e7
--- /dev/null
+++ b/docs-source/source/generated_api/nlp_architect.models.transformers.rst
@@ -0,0 +1,46 @@
+nlp\_architect.models.transformers package
+==========================================
+
+Submodules
+----------
+
+nlp\_architect.models.transformers.base\_model module
+-----------------------------------------------------
+
+.. automodule:: nlp_architect.models.transformers.base_model
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.models.transformers.quantized\_bert module
+---------------------------------------------------------
+
+.. automodule:: nlp_architect.models.transformers.quantized_bert
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.models.transformers.sequence\_classification module
+------------------------------------------------------------------
+
+.. automodule:: nlp_architect.models.transformers.sequence_classification
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+nlp\_architect.models.transformers.token\_classification module
+---------------------------------------------------------------
+
+.. automodule:: nlp_architect.models.transformers.token_classification
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+
+Module contents
+---------------
+
+.. automodule:: nlp_architect.models.transformers
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs-source/source/generated_api/nlp_architect.nlp.rst b/docs-source/source/generated_api/nlp_architect.nlp.rst
new file mode 100644
index 00000000..0f6b6be3
--- /dev/null
+++ b/docs-source/source/generated_api/nlp_architect.nlp.rst
@@ -0,0 +1,10 @@
+nlp\_architect.nlp package
+==========================
+
+Module contents
+---------------
+
+.. automodule:: nlp_architect.nlp
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs-source/source/generated_api/nlp_architect.nn.rst b/docs-source/source/generated_api/nlp_architect.nn.rst
new file mode 100644
index 00000000..f745db92
--- /dev/null
+++ b/docs-source/source/generated_api/nlp_architect.nn.rst
@@ -0,0 +1,18 @@
+nlp\_architect.nn package
+=========================
+
+Subpackages
+-----------
+
+.. toctree::
+
+   nlp_architect.nn.tensorflow
+   nlp_architect.nn.torch
+
+Module contents
+---------------
+
+.. automodule:: nlp_architect.nn
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs-source/source/generated_api/nlp_architect.nn.tensorflow.python.keras.layers.rst b/docs-source/source/generated_api/nlp_architect.nn.tensorflow.python.keras.layers.rst
new file mode 100644
index 00000000..8933c35c
--- /dev/null
+++ b/docs-source/source/generated_api/nlp_architect.nn.tensorflow.python.keras.layers.rst
@@ -0,0 +1,22 @@
+nlp\_architect.nn.tensorflow.python.keras.layers package
+========================================================
+
+Submodules
+----------
+
+nlp\_architect.nn.tensorflow.python.keras.layers.crf module
+-----------------------------------------------------------
+
+.. automodule:: nlp_architect.nn.tensorflow.python.keras.layers.crf
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+
+Module contents
+---------------
+
+..
automodule:: nlp_architect.nn.tensorflow.python.keras.layers + :members: + :undoc-members: + :show-inheritance: diff --git a/docs-source/source/generated_api/nlp_architect.nn.tensorflow.python.keras.rst b/docs-source/source/generated_api/nlp_architect.nn.tensorflow.python.keras.rst new file mode 100644 index 00000000..4c64094d --- /dev/null +++ b/docs-source/source/generated_api/nlp_architect.nn.tensorflow.python.keras.rst @@ -0,0 +1,30 @@ +nlp\_architect.nn.tensorflow.python.keras package +================================================= + +Subpackages +----------- + +.. toctree:: + + nlp_architect.nn.tensorflow.python.keras.layers + nlp_architect.nn.tensorflow.python.keras.utils + +Submodules +---------- + +nlp\_architect.nn.tensorflow.python.keras.callbacks module +---------------------------------------------------------- + +.. automodule:: nlp_architect.nn.tensorflow.python.keras.callbacks + :members: + :undoc-members: + :show-inheritance: + + +Module contents +--------------- + +.. automodule:: nlp_architect.nn.tensorflow.python.keras + :members: + :undoc-members: + :show-inheritance: diff --git a/docs-source/source/generated_api/nlp_architect.nn.tensorflow.python.keras.utils.rst b/docs-source/source/generated_api/nlp_architect.nn.tensorflow.python.keras.utils.rst new file mode 100644 index 00000000..accdae8e --- /dev/null +++ b/docs-source/source/generated_api/nlp_architect.nn.tensorflow.python.keras.utils.rst @@ -0,0 +1,22 @@ +nlp\_architect.nn.tensorflow.python.keras.utils package +======================================================= + +Submodules +---------- + +nlp\_architect.nn.tensorflow.python.keras.utils.layer\_utils module +------------------------------------------------------------------- + +.. automodule:: nlp_architect.nn.tensorflow.python.keras.utils.layer_utils + :members: + :undoc-members: + :show-inheritance: + + +Module contents +--------------- + +.. 
automodule:: nlp_architect.nn.tensorflow.python.keras.utils + :members: + :undoc-members: + :show-inheritance: diff --git a/docs-source/source/generated_api/nlp_architect.nn.tensorflow.python.rst b/docs-source/source/generated_api/nlp_architect.nn.tensorflow.python.rst new file mode 100644 index 00000000..5f868bac --- /dev/null +++ b/docs-source/source/generated_api/nlp_architect.nn.tensorflow.python.rst @@ -0,0 +1,17 @@ +nlp\_architect.nn.tensorflow.python package +=========================================== + +Subpackages +----------- + +.. toctree:: + + nlp_architect.nn.tensorflow.python.keras + +Module contents +--------------- + +.. automodule:: nlp_architect.nn.tensorflow.python + :members: + :undoc-members: + :show-inheritance: diff --git a/docs-source/source/generated_api/nlp_architect.nn.tensorflow.rst b/docs-source/source/generated_api/nlp_architect.nn.tensorflow.rst new file mode 100644 index 00000000..3b1921c3 --- /dev/null +++ b/docs-source/source/generated_api/nlp_architect.nn.tensorflow.rst @@ -0,0 +1,17 @@ +nlp\_architect.nn.tensorflow package +==================================== + +Subpackages +----------- + +.. toctree:: + + nlp_architect.nn.tensorflow.python + +Module contents +--------------- + +.. automodule:: nlp_architect.nn.tensorflow + :members: + :undoc-members: + :show-inheritance: diff --git a/docs-source/source/generated_api/nlp_architect.nn.torch.data.rst b/docs-source/source/generated_api/nlp_architect.nn.torch.data.rst new file mode 100644 index 00000000..29e6a077 --- /dev/null +++ b/docs-source/source/generated_api/nlp_architect.nn.torch.data.rst @@ -0,0 +1,22 @@ +nlp\_architect.nn.torch.data package +==================================== + +Submodules +---------- + +nlp\_architect.nn.torch.data.dataset module +------------------------------------------- + +.. automodule:: nlp_architect.nn.torch.data.dataset + :members: + :undoc-members: + :show-inheritance: + + +Module contents +--------------- + +.. 
automodule:: nlp_architect.nn.torch.data + :members: + :undoc-members: + :show-inheritance: diff --git a/docs-source/source/generated_api/nlp_architect.nn.torch.layers.rst b/docs-source/source/generated_api/nlp_architect.nn.torch.layers.rst new file mode 100644 index 00000000..302c64dc --- /dev/null +++ b/docs-source/source/generated_api/nlp_architect.nn.torch.layers.rst @@ -0,0 +1,22 @@ +nlp\_architect.nn.torch.layers package +====================================== + +Submodules +---------- + +nlp\_architect.nn.torch.layers.crf module +----------------------------------------- + +.. automodule:: nlp_architect.nn.torch.layers.crf + :members: + :undoc-members: + :show-inheritance: + + +Module contents +--------------- + +.. automodule:: nlp_architect.nn.torch.layers + :members: + :undoc-members: + :show-inheritance: diff --git a/docs-source/source/generated_api/nlp_architect.nn.torch.modules.rst b/docs-source/source/generated_api/nlp_architect.nn.torch.modules.rst new file mode 100644 index 00000000..31702a61 --- /dev/null +++ b/docs-source/source/generated_api/nlp_architect.nn.torch.modules.rst @@ -0,0 +1,22 @@ +nlp\_architect.nn.torch.modules package +======================================= + +Submodules +---------- + +nlp\_architect.nn.torch.modules.embedders module +------------------------------------------------ + +.. automodule:: nlp_architect.nn.torch.modules.embedders + :members: + :undoc-members: + :show-inheritance: + + +Module contents +--------------- + +.. automodule:: nlp_architect.nn.torch.modules + :members: + :undoc-members: + :show-inheritance: diff --git a/docs-source/source/generated_api/nlp_architect.nn.torch.rst b/docs-source/source/generated_api/nlp_architect.nn.torch.rst new file mode 100644 index 00000000..e7bd6d02 --- /dev/null +++ b/docs-source/source/generated_api/nlp_architect.nn.torch.rst @@ -0,0 +1,39 @@ +nlp\_architect.nn.torch package +=============================== + +Subpackages +----------- + +.. 
toctree:: + + nlp_architect.nn.torch.data + nlp_architect.nn.torch.layers + nlp_architect.nn.torch.modules + +Submodules +---------- + +nlp\_architect.nn.torch.distillation module +------------------------------------------- + +.. automodule:: nlp_architect.nn.torch.distillation + :members: + :undoc-members: + :show-inheritance: + +nlp\_architect.nn.torch.quantization module +------------------------------------------- + +.. automodule:: nlp_architect.nn.torch.quantization + :members: + :undoc-members: + :show-inheritance: + + +Module contents +--------------- + +.. automodule:: nlp_architect.nn.torch + :members: + :undoc-members: + :show-inheritance: diff --git a/docs-source/source/generated_api/nlp_architect.pipelines.rst b/docs-source/source/generated_api/nlp_architect.pipelines.rst new file mode 100644 index 00000000..ff11f869 --- /dev/null +++ b/docs-source/source/generated_api/nlp_architect.pipelines.rst @@ -0,0 +1,30 @@ +nlp\_architect.pipelines package +================================ + +Submodules +---------- + +nlp\_architect.pipelines.spacy\_bist module +------------------------------------------- + +.. automodule:: nlp_architect.pipelines.spacy_bist + :members: + :undoc-members: + :show-inheritance: + +nlp\_architect.pipelines.spacy\_np\_annotator module +---------------------------------------------------- + +.. automodule:: nlp_architect.pipelines.spacy_np_annotator + :members: + :undoc-members: + :show-inheritance: + + +Module contents +--------------- + +.. automodule:: nlp_architect.pipelines + :members: + :undoc-members: + :show-inheritance: diff --git a/docs-source/source/generated_api/nlp_architect.procedures.rst b/docs-source/source/generated_api/nlp_architect.procedures.rst new file mode 100644 index 00000000..433ee8f0 --- /dev/null +++ b/docs-source/source/generated_api/nlp_architect.procedures.rst @@ -0,0 +1,45 @@ +nlp\_architect.procedures package +================================= + +Subpackages +----------- + +.. 
toctree:: + + nlp_architect.procedures.transformers + +Submodules +---------- + +nlp\_architect.procedures.procedure module +------------------------------------------ + +.. automodule:: nlp_architect.procedures.procedure + :members: + :undoc-members: + :show-inheritance: + +nlp\_architect.procedures.registry module +----------------------------------------- + +.. automodule:: nlp_architect.procedures.registry + :members: + :undoc-members: + :show-inheritance: + +nlp\_architect.procedures.token\_tagging module +----------------------------------------------- + +.. automodule:: nlp_architect.procedures.token_tagging + :members: + :undoc-members: + :show-inheritance: + + +Module contents +--------------- + +.. automodule:: nlp_architect.procedures + :members: + :undoc-members: + :show-inheritance: diff --git a/docs-source/source/generated_api/nlp_architect.procedures.transformers.rst b/docs-source/source/generated_api/nlp_architect.procedures.transformers.rst new file mode 100644 index 00000000..39ae7dbd --- /dev/null +++ b/docs-source/source/generated_api/nlp_architect.procedures.transformers.rst @@ -0,0 +1,38 @@ +nlp\_architect.procedures.transformers package +============================================== + +Submodules +---------- + +nlp\_architect.procedures.transformers.base module +-------------------------------------------------- + +.. automodule:: nlp_architect.procedures.transformers.base + :members: + :undoc-members: + :show-inheritance: + +nlp\_architect.procedures.transformers.glue module +-------------------------------------------------- + +.. automodule:: nlp_architect.procedures.transformers.glue + :members: + :undoc-members: + :show-inheritance: + +nlp\_architect.procedures.transformers.seq\_tag module +------------------------------------------------------ + +.. automodule:: nlp_architect.procedures.transformers.seq_tag + :members: + :undoc-members: + :show-inheritance: + + +Module contents +--------------- + +.. 
automodule:: nlp_architect.procedures.transformers + :members: + :undoc-members: + :show-inheritance: diff --git a/docs-source/source/generated_api/nlp_architect.server.angular-ui.rst b/docs-source/source/generated_api/nlp_architect.server.angular-ui.rst new file mode 100644 index 00000000..91608085 --- /dev/null +++ b/docs-source/source/generated_api/nlp_architect.server.angular-ui.rst @@ -0,0 +1,10 @@ +nlp\_architect.server.angular\-ui package +========================================= + +Module contents +--------------- + +.. automodule:: nlp_architect.server.angular-ui + :members: + :undoc-members: + :show-inheritance: diff --git a/docs-source/source/generated_api/nlp_architect.server.rst b/docs-source/source/generated_api/nlp_architect.server.rst new file mode 100644 index 00000000..ceee75a0 --- /dev/null +++ b/docs-source/source/generated_api/nlp_architect.server.rst @@ -0,0 +1,37 @@ +nlp\_architect.server package +============================= + +Subpackages +----------- + +.. toctree:: + + nlp_architect.server.angular-ui + +Submodules +---------- + +nlp\_architect.server.serve module +---------------------------------- + +.. automodule:: nlp_architect.server.serve + :members: + :undoc-members: + :show-inheritance: + +nlp\_architect.server.service module +------------------------------------ + +.. automodule:: nlp_architect.server.service + :members: + :undoc-members: + :show-inheritance: + + +Module contents +--------------- + +.. 
automodule:: nlp_architect.server + :members: + :undoc-members: + :show-inheritance: diff --git a/docs-source/source/generated_api/nlp_architect.solutions.absa_solution.rst b/docs-source/source/generated_api/nlp_architect.solutions.absa_solution.rst new file mode 100644 index 00000000..70e800e8 --- /dev/null +++ b/docs-source/source/generated_api/nlp_architect.solutions.absa_solution.rst @@ -0,0 +1,38 @@ +nlp\_architect.solutions.absa\_solution package +=============================================== + +Submodules +---------- + +nlp\_architect.solutions.absa\_solution.sentiment\_solution module +------------------------------------------------------------------ + +.. automodule:: nlp_architect.solutions.absa_solution.sentiment_solution + :members: + :undoc-members: + :show-inheritance: + +nlp\_architect.solutions.absa\_solution.ui module +------------------------------------------------- + +.. automodule:: nlp_architect.solutions.absa_solution.ui + :members: + :undoc-members: + :show-inheritance: + +nlp\_architect.solutions.absa\_solution.utils module +---------------------------------------------------- + +.. automodule:: nlp_architect.solutions.absa_solution.utils + :members: + :undoc-members: + :show-inheritance: + + +Module contents +--------------- + +.. automodule:: nlp_architect.solutions.absa_solution + :members: + :undoc-members: + :show-inheritance: diff --git a/docs-source/source/generated_api/nlp_architect.solutions.rst b/docs-source/source/generated_api/nlp_architect.solutions.rst new file mode 100644 index 00000000..aa23b440 --- /dev/null +++ b/docs-source/source/generated_api/nlp_architect.solutions.rst @@ -0,0 +1,31 @@ +nlp\_architect.solutions package +================================ + +Subpackages +----------- + +.. 
toctree:: + + nlp_architect.solutions.absa_solution + nlp_architect.solutions.set_expansion + nlp_architect.solutions.trend_analysis + +Submodules +---------- + +nlp\_architect.solutions.start\_ui module +----------------------------------------- + +.. automodule:: nlp_architect.solutions.start_ui + :members: + :undoc-members: + :show-inheritance: + + +Module contents +--------------- + +.. automodule:: nlp_architect.solutions + :members: + :undoc-members: + :show-inheritance: diff --git a/docs-source/source/generated_api/nlp_architect.solutions.set_expansion.rst b/docs-source/source/generated_api/nlp_architect.solutions.set_expansion.rst new file mode 100644 index 00000000..4bed0a22 --- /dev/null +++ b/docs-source/source/generated_api/nlp_architect.solutions.set_expansion.rst @@ -0,0 +1,45 @@ +nlp\_architect.solutions.set\_expansion package +=============================================== + +Subpackages +----------- + +.. toctree:: + + nlp_architect.solutions.set_expansion.ui + +Submodules +---------- + +nlp\_architect.solutions.set\_expansion.expand\_server module +------------------------------------------------------------- + +.. automodule:: nlp_architect.solutions.set_expansion.expand_server + :members: + :undoc-members: + :show-inheritance: + +nlp\_architect.solutions.set\_expansion.prepare\_data module +------------------------------------------------------------ + +.. automodule:: nlp_architect.solutions.set_expansion.prepare_data + :members: + :undoc-members: + :show-inheritance: + +nlp\_architect.solutions.set\_expansion.set\_expand module +---------------------------------------------------------- + +.. automodule:: nlp_architect.solutions.set_expansion.set_expand + :members: + :undoc-members: + :show-inheritance: + + +Module contents +--------------- + +.. 
automodule:: nlp_architect.solutions.set_expansion + :members: + :undoc-members: + :show-inheritance: diff --git a/docs-source/source/generated_api/nlp_architect.solutions.set_expansion.ui.rst b/docs-source/source/generated_api/nlp_architect.solutions.set_expansion.ui.rst new file mode 100644 index 00000000..569776bb --- /dev/null +++ b/docs-source/source/generated_api/nlp_architect.solutions.set_expansion.ui.rst @@ -0,0 +1,30 @@ +nlp\_architect.solutions.set\_expansion.ui package +================================================== + +Submodules +---------- + +nlp\_architect.solutions.set\_expansion.ui.main module +------------------------------------------------------ + +.. automodule:: nlp_architect.solutions.set_expansion.ui.main + :members: + :undoc-members: + :show-inheritance: + +nlp\_architect.solutions.set\_expansion.ui.settings module +---------------------------------------------------------- + +.. automodule:: nlp_architect.solutions.set_expansion.ui.settings + :members: + :undoc-members: + :show-inheritance: + + +Module contents +--------------- + +.. automodule:: nlp_architect.solutions.set_expansion.ui + :members: + :undoc-members: + :show-inheritance: diff --git a/docs-source/source/generated_api/nlp_architect.solutions.trend_analysis.rst b/docs-source/source/generated_api/nlp_architect.solutions.trend_analysis.rst new file mode 100644 index 00000000..e920a0b1 --- /dev/null +++ b/docs-source/source/generated_api/nlp_architect.solutions.trend_analysis.rst @@ -0,0 +1,53 @@ +nlp\_architect.solutions.trend\_analysis package +================================================ + +Subpackages +----------- + +.. toctree:: + + nlp_architect.solutions.trend_analysis.ui + +Submodules +---------- + +nlp\_architect.solutions.trend\_analysis.np\_scorer module +---------------------------------------------------------- + +.. 
automodule:: nlp_architect.solutions.trend_analysis.np_scorer + :members: + :undoc-members: + :show-inheritance: + +nlp\_architect.solutions.trend\_analysis.scoring\_utils module +-------------------------------------------------------------- + +.. automodule:: nlp_architect.solutions.trend_analysis.scoring_utils + :members: + :undoc-members: + :show-inheritance: + +nlp\_architect.solutions.trend\_analysis.topic\_extraction module +----------------------------------------------------------------- + +.. automodule:: nlp_architect.solutions.trend_analysis.topic_extraction + :members: + :undoc-members: + :show-inheritance: + +nlp\_architect.solutions.trend\_analysis.trend\_analysis module +--------------------------------------------------------------- + +.. automodule:: nlp_architect.solutions.trend_analysis.trend_analysis + :members: + :undoc-members: + :show-inheritance: + + +Module contents +--------------- + +.. automodule:: nlp_architect.solutions.trend_analysis + :members: + :undoc-members: + :show-inheritance: diff --git a/docs-source/source/generated_api/nlp_architect.solutions.trend_analysis.ui.rst b/docs-source/source/generated_api/nlp_architect.solutions.trend_analysis.ui.rst new file mode 100644 index 00000000..cd79a479 --- /dev/null +++ b/docs-source/source/generated_api/nlp_architect.solutions.trend_analysis.ui.rst @@ -0,0 +1,22 @@ +nlp\_architect.solutions.trend\_analysis.ui package +=================================================== + +Submodules +---------- + +nlp\_architect.solutions.trend\_analysis.ui.main module +------------------------------------------------------- + +.. automodule:: nlp_architect.solutions.trend_analysis.ui.main + :members: + :undoc-members: + :show-inheritance: + + +Module contents +--------------- + +.. 
automodule:: nlp_architect.solutions.trend_analysis.ui + :members: + :undoc-members: + :show-inheritance: diff --git a/docs-source/source/generated_api/nlp_architect.utils.resources.rst b/docs-source/source/generated_api/nlp_architect.utils.resources.rst new file mode 100644 index 00000000..e3718b5a --- /dev/null +++ b/docs-source/source/generated_api/nlp_architect.utils.resources.rst @@ -0,0 +1,10 @@ +nlp\_architect.utils.resources package +====================================== + +Module contents +--------------- + +.. automodule:: nlp_architect.utils.resources + :members: + :undoc-members: + :show-inheritance: diff --git a/docs-source/source/generated_api/nlp_architect.utils.rst b/docs-source/source/generated_api/nlp_architect.utils.rst new file mode 100644 index 00000000..8b505bdc --- /dev/null +++ b/docs-source/source/generated_api/nlp_architect.utils.rst @@ -0,0 +1,101 @@ +nlp\_architect.utils package +============================ + +Subpackages +----------- + +.. toctree:: + + nlp_architect.utils.resources + +Submodules +---------- + +nlp\_architect.utils.ansi2html module +------------------------------------- + +.. automodule:: nlp_architect.utils.ansi2html + :members: + :undoc-members: + :show-inheritance: + +nlp\_architect.utils.embedding module +------------------------------------- + +.. automodule:: nlp_architect.utils.embedding + :members: + :undoc-members: + :show-inheritance: + +nlp\_architect.utils.ensembler module +------------------------------------- + +.. automodule:: nlp_architect.utils.ensembler + :members: + :undoc-members: + :show-inheritance: + +nlp\_architect.utils.generic module +----------------------------------- + +.. automodule:: nlp_architect.utils.generic + :members: + :undoc-members: + :show-inheritance: + +nlp\_architect.utils.io module +------------------------------ + +.. 
automodule:: nlp_architect.utils.io + :members: + :undoc-members: + :show-inheritance: + +nlp\_architect.utils.metrics module +----------------------------------- + +.. automodule:: nlp_architect.utils.metrics + :members: + :undoc-members: + :show-inheritance: + +nlp\_architect.utils.mrc\_utils module +-------------------------------------- + +.. automodule:: nlp_architect.utils.mrc_utils + :members: + :undoc-members: + :show-inheritance: + +nlp\_architect.utils.string\_utils module +----------------------------------------- + +.. automodule:: nlp_architect.utils.string_utils + :members: + :undoc-members: + :show-inheritance: + +nlp\_architect.utils.testing module +----------------------------------- + +.. automodule:: nlp_architect.utils.testing + :members: + :undoc-members: + :show-inheritance: + +nlp\_architect.utils.text module +-------------------------------- + +.. automodule:: nlp_architect.utils.text + :members: + :undoc-members: + :show-inheritance: + + +Module contents +--------------- + +.. automodule:: nlp_architect.utils + :members: + :undoc-members: + :show-inheritance: diff --git a/docs-source/source/generated_api/nlp_architect_api_index.rst b/docs-source/source/generated_api/nlp_architect_api_index.rst new file mode 100644 index 00000000..5111f2a0 --- /dev/null +++ b/docs-source/source/generated_api/nlp_architect_api_index.rst @@ -0,0 +1,17 @@ +``nlp_architect`` package +========================== + +.. 
toctree:: + + nlp_architect.api + nlp_architect.cli + nlp_architect.common + nlp_architect.data + nlp_architect.models + nlp_architect.nlp + nlp_architect.nn + nlp_architect.pipelines + nlp_architect.procedures + nlp_architect.server + nlp_architect.solutions + nlp_architect.utils diff --git a/docs-source/source/index.rst b/docs-source/source/index.rst index 5dc71b21..2e9ea0c8 100755 --- a/docs-source/source/index.rst +++ b/docs-source/source/index.rst @@ -21,12 +21,10 @@ :hidden: :maxdepth: 1 - Home + quick_start.rst installation.rst publications.rst Jupyter Tutorials - developer_guide.rst - REST Server Model Zoo .. toctree:: @@ -34,23 +32,23 @@ :maxdepth: 1 :caption: NLP/NLU Models - Aspect Based Sentiment Analysis - chunker.rst - ner_crf.rst - intent.rst - np_segmentation.rst - bist_parser.rst - word_sense.rst - np2vec.rst - supervised_sentiment.rst - reading_comprehension.rst - memn2n.rst - TCN Language Model - Unsupervised Crosslingual Embeddings - Cross Document Co-Reference - Semantic Relation Identification - Sparse Neural Machine Translation + Sequence Tagging + Sentiment Analysis + Dependency Parsing + Intent Extraction + Language Models + Information Extraction + Transformers + Additional Models + +.. toctree:: + :hidden: + :maxdepth: 1 + :caption: Optimized Models + Quantized BERT + Transformers Distillation + Sparse Neural Machine Translation .. toctree:: :hidden: @@ -61,19 +59,21 @@ Set Expansion Trend Analysis -.. toctree:: - :hidden: - :maxdepth: 1 - :caption: Pipelines +.. .. toctree:: +.. :hidden: +.. :maxdepth: 1 +.. :caption: Pipelines - spacy_bist.rst - spacy_np_annotator.rst +.. spacy_bist.rst +.. spacy_np_annotator.rst .. toctree:: :hidden: :maxdepth: 1 :caption: For Developers - api.rst + nlp_architect API + REST Server + developer_guide.rst .. 
_https://github.com/NervanaSystems/nlp-architect: https://github.com/NervanaSystems/nlp-architect diff --git a/docs-source/source/information_extraction.rst b/docs-source/source/information_extraction.rst new file mode 100644 index 00000000..deb30d08 --- /dev/null +++ b/docs-source/source/information_extraction.rst @@ -0,0 +1,37 @@ +.. --------------------------------------------------------------------------- +.. Copyright 2017-2018 Intel Corporation +.. +.. Licensed under the Apache License, Version 2.0 (the "License"); +.. you may not use this file except in compliance with the License. +.. You may obtain a copy of the License at +.. +.. http://www.apache.org/licenses/LICENSE-2.0 +.. +.. Unless required by applicable law or agreed to in writing, software +.. distributed under the License is distributed on an "AS IS" BASIS, +.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +.. See the License for the specific language governing permissions and +.. limitations under the License. +.. --------------------------------------------------------------------------- + +====================== +Information Extraction +====================== + +.. include:: np2vec.rst + +---- + +.. include:: cross_doc_coref.rst + +---- + +.. include:: identifying_semantic_relation.rst + +---- + +.. include:: np_segmentation.rst + +---- + +.. include:: word_sense.rst diff --git a/docs-source/source/installation.rst b/docs-source/source/installation.rst index f913d795..0cf0626f 100644 --- a/docs-source/source/installation.rst +++ b/docs-source/source/installation.rst @@ -35,11 +35,16 @@ Before installing the library make sure you has the most recent packages listed pkg-config, pkg-config, Retrieves information about installed libraries .. note:: - The default installation of NLP Architect use CPU-based binaries of all deep learning frameworks. Intel Optimized MKL-DNN binaries will be installed if a Linux is detected. 
GPU backed is supported online on Linux and if a GPU is present. See details below for instructions on how to install each backend. + The default installation of NLP Architect uses CPU-based binaries of all deep learning frameworks. Intel Optimized MKL-DNN binaries will be installed if Linux is detected. On Linux with a GPU present, the GPU backend will install TensorFlow with MKL-DNN. See details below for instructions on how to install each backend. + +.. note:: + + To install a specific backend of TensorFlow or PyTorch (CPU/MKL/GPU), we recommend installing NLP Architect first and then installing the desired framework package. Installation ============ + Prerequisites ------------- @@ -121,47 +126,3 @@ Using `pip` .. code:: bash pip install -U nlp-architect - -====== - -Compiling Intel® optimized Tensorflow with MKL-DNN -================================================== - -NLP Architect supports MKL-DNN flavor of Tensorflow out of the box, however, if the user wishes to compile Tensorflow we provide instructions below. - -Tensorflow has a guide `guide `_ for compiling and installing Tensorflow with with MKL-DNN optimization. Make sure to install all required tools: bazel and python development dependencies. - -Alternatively, follow the instructions below to compile and install the latest version of Tensorflow with MKL-DNN: - -* Clone Tensorflow repository from GitHub: - - .. code:: - - git clone https://github.com/tensorflow/tensorflow - cd tensorflow - -* Configure Tensorflow for compilation: - - .. code:: - - ./configure - -* Compile Tensorflow with MKL-DNN: - - .. code:: - - bazel build --config=mkl --config=opt //tensorflow/tools/pip_package:build_pip_package - -* Create pip package in ``/tmp/tensorflow_pkg``: - - .. code:: - - bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg - -* Install Tensorflow pip package: - - .. 
code:: - - pip install .whl - -* Refer to `this `_ guide for specific configuration to get optimal performance when running your model. diff --git a/docs-source/source/lm.rst b/docs-source/source/lm.rst new file mode 100644 index 00000000..fe897635 --- /dev/null +++ b/docs-source/source/lm.rst @@ -0,0 +1,21 @@ +.. --------------------------------------------------------------------------- +.. Copyright 2017-2018 Intel Corporation +.. +.. Licensed under the Apache License, Version 2.0 (the "License"); +.. you may not use this file except in compliance with the License. +.. You may obtain a copy of the License at +.. +.. http://www.apache.org/licenses/LICENSE-2.0 +.. +.. Unless required by applicable law or agreed to in writing, software +.. distributed under the License is distributed on an "AS IS" BASIS, +.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +.. See the License for the specific language governing permissions and +.. limitations under the License. +.. --------------------------------------------------------------------------- + +=============== +Language Models +=============== + +.. include:: tcn.rst \ No newline at end of file diff --git a/docs-source/source/main.rst b/docs-source/source/main.rst index d5335a7e..e04bdc6b 100644 --- a/docs-source/source/main.rst +++ b/docs-source/source/main.rst @@ -14,69 +14,65 @@ .. limitations under the License. .. --------------------------------------------------------------------------- +============================== NLP Architect by Intel® AI Lab -############################### +============================== -NLP Architect is an open-source Python library for exploring the state-of-the-art deep learning topologies and techniques for natural language processing and natural -language understanding. It is intended to be a platform for future research and -collaboration. 
+NLP Architect is an open source Python library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing and Natural Language Understanding neural networks. The library includes our past and ongoing NLP research and development efforts as part of Intel AI Lab. NLP Architect can be downloaded from Github: https://github.com/NervanaSystems/nlp-architect -Library Overview -================ - -Research driven NLP/NLU models ------------------------------- -The library contains state-of-art and novel NLP and NLU models in a variety of topics: - -- Dependency parsing -- Intent detection and Slot tagging model for Intent based applications -- Memory Networks for goal-oriented dialog -- Noun phrase embedding vectors model -- Noun phrase semantic segmentation -- Named Entity Recognition -- Word Chunking -- Reading comprehension -- Language modeling using Temporal Convolution Network -- Unsupervised Crosslingual Word Embedding -- Aspect Based Sentiment Analysis -- Supervised sentiment analysis -- Sparse and quantized neural machine translation -- Relation Identification and cross document coreference - -.. 
include:: _quick_install.rst - -How can NLP Architect be used -============================= - -- Train models using provided algorithms, reference datasets and configurations -- Train models using your own data -- Create new/extend models based on existing models or topologies -- Explore how deep learning models tackle various NLP tasks -- Experiment and optimize state-of-the-art deep learning algorithms -- integrate modules and utilities from the library to solutions - -Deep Learning frameworks ------------------------- -Because of the current research nature of the library, several open source deep learning frameworks are used in this repository including: - -- Tensorflow_ or `Intel-Optimized TensorFlow`_ -- Dynet_ - -Overtime the list of models and frameworks included in this space will change, though all generally run with Python 3.6+ - -Using the Models ----------------- -Each of the models includes a comprehensive description on algorithms, network topologies, reference dataset descriptions and loader, and evaluation results. Overtime the list of models included in this space will grow. - -Contributing to the library ---------------------------- -We welcome collaboration, suggestions, and critiques. For information on how to become a developer -on this project, please see the :doc:`developer guide `. +Overview +======== +NLP Architect is designed to be flexible for adding new models, neural network components, data handling methods and for easy training and running models. + +Features: + +* Core NLP models used in many NLP tasks and useful in many NLP applications +* Novel NLU models showcasing novel *topologies* and *techniques* +* **Optimized NLP/NLU models** showcasing different optimization algorithms on neural NLP/NLU models +* Model-oriented design: + + * Train and run models from command-line. + * API for using models for inference in python. + * Procedures to define custom processes for training, inference or anything related to processing. 
+ * CLI sub-system for running procedures + +* Based on the following Deep Learning frameworks: + + * TensorFlow + * PyTorch + * Intel-Optimized TensorFlow with MKL-DNN + * Dynet + +* Essential utilities for working with NLP models - Text/String pre-processing, IO, data-manipulation, metrics, embeddings. +* Plug-able REST API server to serve models via REST API + + +Library design philosophy +========================= + +NLP Architect is a model-oriented library designed to showcase novel and different neural network optimizations. The library contains NLP/NLU related models per task, different neural network topologies (which are used in models), procedures for simplifying workflows in the library, pre-defined data processors and dataset loaders and misc utilities. The library is designed to be a tool for model development: pre-process data, build a model, train, validate, infer, save or load a model. + +The main design guidelines are: + +* Deep Learning framework agnostic +* NLP/NLU models per task +* Different topology (module) implementations that can be used with models +* Showcase End-to-End applications (Solutions) utilizing one or more NLP Architect models +* Generic dataset loaders, textual data processing utilities, and miscellaneous utilities that support NLP model development (loaders, text processors, io, metrics, etc.) +* ``Procedures`` for defining processes for training, inference, optimization or any kind of elaborate script. +* Pythonic API for using models for inference +* REST API servers with the ability to serve trained models via HTTP +* Extensive model documentation and tutorials + +Disclaimer +========== + +NLP Architect is an active space of research and development; throughout future releases, new models, solutions, topologies and framework additions and changes will be made. We aim to make sure all models run with Python 3.6+. We encourage researchers and developers to contribute their work into the library. .. 
_Tensorflow: https://www.tensorflow.org/ .. _Intel-Optimized TensorFlow: https://software.intel.com/en-us/articles/intel-optimized-tensorflow-wheel-now-available diff --git a/docs-source/source/model_zoo.rst b/docs-source/source/model_zoo.rst index a5b04b2a..e435323b 100644 --- a/docs-source/source/model_zoo.rst +++ b/docs-source/source/model_zoo.rst @@ -14,8 +14,9 @@ .. limitations under the License. .. --------------------------------------------------------------------------- +======================= NLP Architect Model Zoo -####################### +======================= .. list-table:: :widths: 10 30 10 @@ -25,7 +26,7 @@ NLP Architect Model Zoo - Description - Links * - :doc:`Sparse GNMT ` - - 90% sparse GNMT model and a 2x2 block sparse translating German to English trained on Europarl-v7 [1]_ , Common Crawl and News Commentary 11 datasets + - 90% sparse GNMT model and a 2x2 block sparse translating German to English trained on Europarl-v7 [#]_ , Common Crawl and News Commentary 11 datasets - | `model `_ | `2x2 block sparse model `_ * - :doc:`Intent Extraction ` @@ -49,5 +50,6 @@ NLP Architect Model Zoo | `params `_ References -========== -.. [1] Europarl-v7: A Parallel Corpus for Statistical Machine Translation, Philipp Koehn, MT Summit 2005 \ No newline at end of file +---------- + +.. 
[#] Europarl-v7: A Parallel Corpus for Statistical Machine Translation, Philipp Koehn, MT Summit 2005 diff --git a/docs-source/source/publications.rst b/docs-source/source/publications.rst index 1bd15483..ec194f76 100644 --- a/docs-source/source/publications.rst +++ b/docs-source/source/publications.rst @@ -30,6 +30,8 @@ Blog posts - `Exploring Term Set Expansion with NLP Architect `_ - `Extracting Semantic Relations using External Knowledge Resources with NLP Architect `_ - `Future Directions for NLP in Commercial Environments `_ +- `Introducing Aspect-Based Sentiment Analysis in NLP Architect `_ +- `Advances in Cross-document Entity and Event Coreference Resolution for NLP `_ Conference Proceedings diff --git a/docs/_sources/quantized_bert.rst b/docs-source/source/quantized_bert.rst similarity index 97% rename from docs/_sources/quantized_bert.rst rename to docs-source/source/quantized_bert.rst index ef1b240b..d931810e 100644 --- a/docs/_sources/quantized_bert.rst +++ b/docs-source/source/quantized_bert.rst @@ -14,8 +14,9 @@ .. limitations under the License. .. --------------------------------------------------------------------------- +============================================== Quantize BERT with Quantization Aware Training -############################################## +============================================== Overview ======== @@ -38,7 +39,7 @@ to overcome this error. \ In this work we use the quantization scheme and method offered by Jacob et al [2]_. At the forward pass we use fake quantization to simulate the quantization error during the forward pass and at the backward pass we estimate -the fake quantization gradients using Straigh-Through Estimator [3]_. +the fake quantization gradients using Straight-Through Estimator [3]_. Results ======= @@ -121,6 +122,3 @@ References .. _`Microsoft Research Paraphrase Corpus (MRPC)`: https://www.microsoft.com/en-us/download/details.aspx?id=52398 .. 
_`GLUE benchmark`: https://gluebenchmark.com/ - - - diff --git a/docs-source/source/quick_start.rst b/docs-source/source/quick_start.rst new file mode 100644 index 00000000..024e3d97 --- /dev/null +++ b/docs-source/source/quick_start.rst @@ -0,0 +1,100 @@ +.. --------------------------------------------------------------------------- +.. Copyright 2017-2018 Intel Corporation +.. +.. Licensed under the Apache License, Version 2.0 (the "License"); +.. you may not use this file except in compliance with the License. +.. You may obtain a copy of the License at +.. +.. http://www.apache.org/licenses/LICENSE-2.0 +.. +.. Unless required by applicable law or agreed to in writing, software +.. distributed under the License is distributed on an "AS IS" BASIS, +.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +.. See the License for the specific language governing permissions and +.. limitations under the License. +.. --------------------------------------------------------------------------- + +=========== +Quick start +=========== + +Installation +------------ + +Make sure to use **Python 3.6+** and a virtual environment. + +Using ``pip`` +~~~~~~~~~~~~~ + +.. code:: bash + + pip install nlp-architect + +From source +~~~~~~~~~~~ + +.. code:: bash + + git clone https://github.com/NervanaSystems/nlp-architect.git + cd nlp-architect + pip install -e . # install in development mode + +.. note:: + + For a specific backend installation of Tensorflow or PyTorch (CPU/MKL/GPU), we recommend installing NLP Architect first and then installing the desired framework package. 
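The quantization-aware training scheme described in the quantized-BERT section above simulates quantization error with a fake-quantize (quantize-dequantize) step in the forward pass. A minimal plain-Python sketch of symmetric linear fake quantization — an illustration only, not the library's implementation:

```python
def fake_quantize(x, num_bits=8, max_range=None):
    """Quantize-dequantize: map each float to its nearest representable
    integer grid point, then back to float, simulating quantization error.
    In QAT the backward pass uses the Straight-Through Estimator, i.e.
    gradients flow through this op as if it were the identity."""
    if max_range is None:
        max_range = max(abs(v) for v in x)
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for 8-bit
    scale = max_range / qmax if max_range else 1.0
    out = []
    for v in x:
        q = round(v / scale)                # quantize to the integer grid
        q = max(-qmax, min(qmax, q))        # clamp to representable range
        out.append(q * scale)               # dequantize back to float
    return out

weights = [0.31, -0.9, 0.05, 0.77]
fq = fake_quantize(weights)
# each unclipped value moves by at most half a quantization step (scale / 2)
```

The difference between `weights` and `fq` is exactly the quantization error that the training loop learns to compensate for.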
+ +Usage +----- + +NLP Architect has the following packages: + ++---------------------------+-------------------------------------------------------+ +| Package | Description | ++===========================+=======================================================+ +| `nlp_architect.api` | Model API interfaces | ++---------------------------+-------------------------------------------------------+ +| `nlp_architect.common` | Common packages | ++---------------------------+-------------------------------------------------------+ +| `nlp_architect.cli` | Command line module | ++---------------------------+-------------------------------------------------------+ +| `nlp_architect.data` | Datasets, loaders and data processors | ++---------------------------+-------------------------------------------------------+ +| `nlp_architect.models` | NLP, NLU and End-to-End models | ++---------------------------+-------------------------------------------------------+ +| `nlp_architect.nn` | Topology related models and additions (per framework) | ++---------------------------+-------------------------------------------------------+ +| `nlp_architect.pipelines` | End-to-end NLP apps | ++---------------------------+-------------------------------------------------------+ +| `nlp_architect.procedures`| Procedure scripts | ++---------------------------+-------------------------------------------------------+ +| `nlp_architect.server` | API Server and demos UI | ++---------------------------+-------------------------------------------------------+ +| `nlp_architect.solutions` | Solution applications | ++---------------------------+-------------------------------------------------------+ +| `nlp_architect.utils` | Misc. I/O, metric, pre-processing and text utilities | ++---------------------------+-------------------------------------------------------+ + + +CLI +--- + +NLP Architect comes with a CLI application that helps users run procedures and processes from the library. + +.. 
warning:: + + The CLI is in development and some functionality is not yet complete + and will be added in future versions + +The list of possible options can be obtained by ``nlp_architect -h``: + +``nlp_architect`` commands: + +.. code-block:: text + + train Train a model from the library + run Run a model from the library + process Run a data processor from the library + solution Run a solution process from the library + serve Serve a trained model using a REST service + +Use ``nlp_architect -h`` for per-command usage instructions. \ No newline at end of file diff --git a/docs-source/source/sentiment.rst b/docs-source/source/sentiment.rst new file mode 100644 index 00000000..d9cdc5ad --- /dev/null +++ b/docs-source/source/sentiment.rst @@ -0,0 +1,26 @@ +.. --------------------------------------------------------------------------- +.. Copyright 2017-2018 Intel Corporation +.. +.. Licensed under the Apache License, Version 2.0 (the "License"); +.. you may not use this file except in compliance with the License. +.. You may obtain a copy of the License at +.. +.. http://www.apache.org/licenses/LICENSE-2.0 +.. +.. Unless required by applicable law or agreed to in writing, software +.. distributed under the License is distributed on an "AS IS" BASIS, +.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +.. See the License for the specific language governing permissions and +.. limitations under the License. +.. --------------------------------------------------------------------------- + +================== +Sentiment Analysis +================== + +.. include:: supervised_sentiment.rst + +------ + +.. 
include:: absa.rst + diff --git a/docs-source/source/service.rst b/docs-source/source/service.rst index 08556f9f..a386c446 100644 --- a/docs-source/source/service.rst +++ b/docs-source/source/service.rst @@ -191,5 +191,4 @@ In order to add a new service to the server you need to go over 3 steps: "" : {"file_name": "", "type": <"core"/"high_level>"} - -.. include:: service_deploy.rst \ No newline at end of file +.. .. include:: service_deploy.rst \ No newline at end of file diff --git a/docs-source/source/static/nlp_arch_theme.css b/docs-source/source/static/nlp_arch_theme.css new file mode 100644 index 00000000..3bdf2ea3 --- /dev/null +++ b/docs-source/source/static/nlp_arch_theme.css @@ -0,0 +1,85 @@ +.rst-content div[class^='highlight'] pre { + border: none; + background: none; + font-family: "Roboto Mono", "Consolas", "Menlo", "Andale Mono WT", "Andale Mono", "Lucida Console", "monospace"; + } + +body, h1, h2, h3, h4, h5, h6, legend { + font-family: 'Open Sans', sans-serif; + /* background: #ffffff; */ + font-weight: 400; +} +.wy-menu-vertical header, .wy-menu-vertical p.caption { + color: #EF4C23; + font-weight: 400; + font-size: 100%; +} + +.rst-content .toctree-wrapper p.caption { + font-family: 'Open Sans', sans-serif; + /* background: #ffffff; */ + font-weight: 300; +} +.wy-menu-vertical { + font-weight: 300; +} + +.wy-menu-vertical a { + color: #d9d9d9; +} + +.wy-menu-vertical li ul li a { + font-weight: 300; +} + +.wy-menu-vertical li.on a, .wy-menu-vertical li.current>a { + font-weight: 400; +} + +.installation_table label { + display: inline; +} + +.wy-table-responsive table td, .wy-table-responsive table th { + white-space: inherit; +} + +.wy-nav-content { + padding: 1.618em 3.236em; + height: 100%; + /* max-width: 1000px !important; */ + margin: 0; +} + +@media screen and (min-width: 767px) { + + .wy-table-responsive table td { + /* !important prevents the common CSS stylesheets from overriding + this as on RTD they are loaded after this stylesheet */ 
+ white-space: normal !important; + } + + .wy-table-responsive { + overflow: visible !important; + } + } + + .wy-side-nav-search { + background-color: #EF4C23; + } + + .wy-side-nav-search input[type=text] { + border-color: #555; + } + + .wy-nav-side { + background: #231F20; + } + + .wy-menu-vertical a:active { + background: #EF4C23; + } + +.rst-content a:link, .rst-content a:visited, .rst-content a:hover, .rst-content a:active { + color: #EF4C23; + } \ No newline at end of file diff --git a/docs-source/source/static/theme.css b/docs-source/source/static/theme.css deleted file mode 100644 index d7f92cf1..00000000 --- a/docs-source/source/static/theme.css +++ /dev/null @@ -1,55 +0,0 @@ -@media screen and (min-width: 767px) { - - .wy-table-responsive table td { - /* !important prevents the common CSS stylesheets from overriding - this as on RTD they are loaded after this stylesheet */ - white-space: normal !important; - } - - .wy-table-responsive { - overflow: visible !important; - } -} - -.wy-side-nav-search { - background-color: #0071c5; -} - -.wy-nav-content-wrap { - background: #ffffff; -} - -.wy-nav-content { - padding: 1.618em 3.236em; - height: 100%; - max-width: 1000px !important; - margin: 0; - background: #ffffff; -} - -.wy-nav-side { - background: #343131; -} - -.rst-content div[class^='highlight'] pre { - border: none; - background: none; - font-family: "Monaco", "Consolas", "Menlo", "Andale Mono WT", "Andale Mono", "Lucida Console", "monospace"; -} - -body, h1, h2, .rst-content .toctree-wrapper p.caption, h3, h4, h5, h6, legend { - font-family: "Lato", "Menlo", "Andale Mono WT", "Andale Mono", "Lucida Console", "monospace"; - background: #ffffff; -} - -.wy-nav-top { - background: #0071c5; -} - -.installation_table label { - display: inline; -} - -.wy-table-responsive table td, .wy-table-responsive table th { - white-space: inherit; -} diff --git a/docs-source/source/chunker.rst b/docs-source/source/tagging/chunker.rst similarity index 98% rename from 
docs-source/source/chunker.rst rename to docs-source/source/tagging/chunker.rst index 9d3f5e30..b48718dc 100755 --- a/docs-source/source/chunker.rst +++ b/docs-source/source/tagging/chunker.rst @@ -14,11 +14,11 @@ .. limitations under the License. .. --------------------------------------------------------------------------- -Sequence Chunker -################ +Word Chunker +================ Overview -======== +-------- Phrase chunking is a basic NLP task that consists of tagging parts of a sentence (1 or more tokens) syntactically, i.e. POS tagging. @@ -33,7 +33,7 @@ In this example the sentence can be divided into 4 phrases, ``The quick brown fo are noun phrases, ``jumped`` is a verb phrase and ``over`` is a prepositional phrase. Dataset -======= +------- We used the CONLL2000_ shared task dataset in our example for training a phrase chunker. More info about the CONLL2000_ shared task can be found here: https://www.clips.uantwerpen.be/conll2000/chunking/. The terms and conditions of the data set license apply. Intel does not grant any rights to the data files. The annotation of the data has been derived from the WSJ corpus by a program written by Sabine Buchholz from Tilburg University, The Netherlands. @@ -56,7 +56,7 @@ To get the dataset follow these steps: 3. provide ``CONLL2000`` data loader or ``train.py`` sample below the directory containing the files. Model -===== +----- The sequence chunker is a Tensorflow-keras based model and it is implemented in :py:class:`SequenceChunker ` and comes with several options for creating the topology depending on what input is given (tokens, external word embedding model, topology parameters). @@ -72,7 +72,7 @@ The model has additional improvements to the model presented in the paper: The model's embedding vector size and LSTM layer hidden state have equal sizes, the default training optimizer is Adam with default parameters. 
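The phrase structure described in the chunker overview (noun, verb and prepositional phrases) is commonly encoded as BIO tags. A small standalone helper — not part of the library, for illustration only — shows how a predicted tag sequence maps back to phrases:

```python
def bio_to_chunks(tokens, tags):
    """Group BIO tags (e.g. B-NP, I-NP, O) into (phrase, type) chunks."""
    chunks, current, ctype = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and ctype != tag[2:]):
            if current:                      # close the previous chunk
                chunks.append((" ".join(current), ctype))
            current, ctype = [tok], tag[2:]  # open a new chunk
        elif tag.startswith("I-"):
            current.append(tok)              # continue the open chunk
        else:                                # O tag closes any open chunk
            if current:
                chunks.append((" ".join(current), ctype))
            current, ctype = [], None
    if current:
        chunks.append((" ".join(current), ctype))
    return chunks

tokens = "The quick brown fox jumped over the fence".split()
tags = ["B-NP", "I-NP", "I-NP", "I-NP", "B-VP", "B-PP", "B-NP", "I-NP"]
print(bio_to_chunks(tokens, tags))
# → [('The quick brown fox', 'NP'), ('jumped', 'VP'), ('over', 'PP'), ('the fence', 'NP')]
```

This recovers the 4 phrases of the example sentence from the overview.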
Running Modalities -================== +------------------ We provide a simple example for training and running inference using the :py:class:`SequenceChunker ` model. @@ -81,7 +81,8 @@ We provide a simple example for training and running inference using the :py:cla ``examples/chunker/inference.py`` will load a saved model and a given text file with sentences and print the chunks found on the stdout. Training --------- +~~~~~~~~ + Quick train ^^^^^^^^^^^ Train a model with default parameters (use sentence words and default network settings): @@ -120,7 +121,7 @@ Saving the model after training is done automatically by specifying a model name * ``chunker_model.params`` - model parameter files (topology parameters, vocabs) Inference ---------- +~~~~~~~~~ Running inference on a trained model using an input file (text based, each line is a document): diff --git a/docs-source/source/tagging/ner.rst b/docs-source/source/tagging/ner.rst new file mode 100644 index 00000000..3fd539a0 --- /dev/null +++ b/docs-source/source/tagging/ner.rst @@ -0,0 +1,139 @@ +.. --------------------------------------------------------------------------- +.. Copyright 2017-2018 Intel Corporation +.. +.. Licensed under the Apache License, Version 2.0 (the "License"); +.. you may not use this file except in compliance with the License. +.. You may obtain a copy of the License at +.. +.. http://www.apache.org/licenses/LICENSE-2.0 +.. +.. Unless required by applicable law or agreed to in writing, software +.. distributed under the License is distributed on an "AS IS" BASIS, +.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +.. See the License for the specific language governing permissions and +.. limitations under the License. +.. --------------------------------------------------------------------------- + +Named Entity Recognition +======================== +``NeuralTagger`` +---------------- + +A model for training token tagging tasks, such as NER or POS. 
``NeuralTagger`` requires an **embedder** for +extracting the contextual features of the data; see the embedders below. +The model uses either a *Softmax* or a *Conditional Random Field* classifier to classify the words into +the correct labels. Implemented in PyTorch; supports only PyTorch-based embedders. + +See :py:class:`NeuralTagger ` for complete documentation of model methods. + + +.. autoclass:: nlp_architect.models.tagging.NeuralTagger + +``CNNLSTM`` +----------- + +This module is an embedder based on `End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF`_ by Ma and Hovy (2016). +The model uses CNNs to embed character representations of words in a sentence and stacked bi-directional LSTM layers to embed the context of words and characters. + +.. figure:: ../assets/cnn-lstm-fig.png + + CNN-LSTM topology (taken from original paper) + + +**Usage** + +Use :py:class:`TokenClsProcessor ` for parsing input files for the model, and :py:class:`NeuralTagger ` for training or loading a trained model. + +Training a model:: + + nlp_architect train tagger --model_type cnn-lstm --data_dir --output_dir + +See ``nlp_architect train tagger -h`` for a full list of options for training. + +Running inference on a trained model:: + + nlp_architect run tagger --data_file --model_dir --output_dir + +See ``nlp_architect run tagger -h`` for a full list of options for running a trained model. + +.. autoclass:: nlp_architect.nn.torch.modules.embedders.CNNLSTM + +``IDCNN`` +--------- + + +The module is an embedder based on `Fast and Accurate Entity Recognition with Iterated Dilated Convolutions`_ by Strubell et al. (2017). +The model uses Iterated Dilated Convolutions for sequence labelling. A dilated CNN block utilizes CNNs and dilations to capture the context of a whole sentence and the relationships between words. +In the figure below you can see an example of a dilated CNN block with maximum dilation of 4 and filter width of 3. 
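The context-widening effect of stacked dilations can be sketched numerically (an illustration, not the library's code): with filter width *w*, a layer with dilation *d* adds *(w − 1) · d* tokens to the receptive field, so doubling dilations grow the context exponentially with depth.

```python
def receptive_field(filter_width, dilations):
    """Receptive field of stacked 1-D dilated convolutions:
    starts at 1 token; each layer adds (filter_width - 1) * dilation."""
    field = 1
    for d in dilations:
        field += (filter_width - 1) * d
    return field

# A block like the one in the figure: width 3, dilations doubling up to 4
print(receptive_field(3, [1, 2, 4]))         # → 15 tokens
# Two more doubling layers cover a much longer sentence:
print(receptive_field(3, [1, 2, 4, 8, 16]))  # → 63 tokens
```

A stack of LSTM-free layers of this kind sees a whole sentence after only a handful of layers, which is where the speedup over recurrent models comes from.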
+This model is a fast alternative to LSTM-based models, with ~10x speedup. + +.. figure:: ../assets/idcnn-fig.png + + A dilated CNN block (taken from original paper) + +We added a word character convolution feature extractor whose output is concatenated to the embedded word representations. + +**Usage** + +Use :py:class:`TokenClsProcessor ` for parsing input files for the model, and :py:class:`NeuralTagger ` for training or loading a trained model. + +Training a model:: + + nlp_architect train tagger --model_type id-cnn --data_dir --output_dir + + +See ``nlp_architect train tagger -h`` for a full list of options for training. + +Running inference on a trained model:: + + nlp_architect run tagger --data_file --model_dir --output_dir + + +See ``nlp_architect run tagger -h`` for a full list of options for running a trained model. + +.. autoclass:: nlp_architect.nn.torch.modules.embedders.IDCNN + +.. _transformer_cls: + +``TransformerTokenClassifier`` +------------------------------ + +A tagger using a Transformer-based topology and a model pre-trained on a large collection of data (usually Wikipedia and similar corpora). + +We provide :py:class:`TransformerTokenClassifier `, a token tagging classifier head module for Transformer-based pre-trained models. +Currently we support BERT/XLNet and quantized BERT base models, which utilize a fully-connected layer with a *Softmax* classifier. Tokens which were broken into multiple sub-tokens (using the Wordpiece algorithm or similar) are ignored. Run ``nlp_architect train transformer_token -h`` for a complete list of transformer base models that can be fine-tuned to your task. + +**Usage** + +Use :py:class:`TokenClsProcessor ` for parsing input files for the model. Depending on which model you choose, the padding and sentence formatting are adjusted to fit the chosen base model. + +See the model class :py:class:`TransformerTokenClassifier ` for usage documentation. 
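The sub-token handling mentioned above — only the first piece of a word receives a label, continuation pieces are ignored — can be sketched as follows. This is a simplified illustration, not the library's code; `IGNORE` stands in for whatever mask value the implementation uses:

```python
IGNORE = -100  # hypothetical mask value for tokens excluded from the loss

def align_labels(word_labels, wordpieces):
    """Assign each word's label to its first sub-token only;
    continuation pieces (marked with '##') are ignored."""
    aligned, word_idx = [], 0
    for piece in wordpieces:
        if piece.startswith("##"):          # continuation of the previous word
            aligned.append(IGNORE)
        else:                               # first piece of the next word
            aligned.append(word_labels[word_idx])
            word_idx += 1
    return aligned

pieces = ["john", "lives", "in", "cape", "##town"]  # "capetown" split in two
labels = ["B-PER", "O", "O", "B-LOC"]
print(align_labels(labels, pieces))
# → ['B-PER', 'O', 'O', 'B-LOC', -100]
```

Masking the continuation pieces keeps the per-word label set intact regardless of how aggressively the tokenizer splits words.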
+ +Training a model:: + + nlp_architect train transformer_token \ + --data_dir \ + --model_name_or_path \ + --model_type [bert, quant_bert, xlnet] \ + --output_dir + +See ``nlp_architect train transformer_token -h`` for a full list of options for training. + +Running inference on a trained model:: + + nlp_architect run transformer_token \ + --data_file \ + --model_path \ + --model_type [bert, quant_bert, xlnet] \ + --output_dir + +See ``nlp_architect run transformer_token -h`` for a full list of options for running a trained model. + +.. _BIO: https://en.wikipedia.org/wiki/Inside%E2%80%93outside%E2%80%93beginning_(tagging) +.. _`Lample et al.`: https://arxiv.org/abs/1603.01360 +.. _`Neural Architectures for Named Entity Recognition`: https://arxiv.org/abs/1603.01360 +.. _`Conditional Random Field classifier`: https://en.wikipedia.org/wiki/Conditional_random_field +.. _`End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF`: https://arxiv.org/abs/1603.01354 +.. _`Fast and Accurate Entity Recognition with Iterated Dilated Convolutions`: https://arxiv.org/abs/1702.02098 +.. _`Deep multi-task learning with low level tasks supervised at lower layers`: http://anthology.aclweb.org/P16-2038 + diff --git a/docs-source/source/ner_crf.rst b/docs-source/source/tagging/ner_crf.rst similarity index 99% rename from docs-source/source/ner_crf.rst rename to docs-source/source/tagging/ner_crf.rst index 7479f4d5..1e4939d0 100755 --- a/docs-source/source/ner_crf.rst +++ b/docs-source/source/tagging/ner_crf.rst @@ -15,7 +15,7 @@ .. --------------------------------------------------------------------------- Named Entity Recognition -######################## +======================== Overview ======== diff --git a/docs-source/source/tagging/sequence_tagging.rst b/docs-source/source/tagging/sequence_tagging.rst new file mode 100644 index 00000000..cf11d17f --- /dev/null +++ b/docs-source/source/tagging/sequence_tagging.rst @@ -0,0 +1,51 @@ +.. 
--------------------------------------------------------------------------- +.. Copyright 2017-2018 Intel Corporation +.. +.. Licensed under the Apache License, Version 2.0 (the "License"); +.. you may not use this file except in compliance with the License. +.. You may obtain a copy of the License at +.. +.. http://www.apache.org/licenses/LICENSE-2.0 +.. +.. Unless required by applicable law or agreed to in writing, software +.. distributed under the License is distributed on an "AS IS" BASIS, +.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +.. See the License for the specific language governing permissions and +.. limitations under the License. +.. --------------------------------------------------------------------------- + +================================== +Neural Models for Sequence Tagging +================================== + +Overview +======== + +Token tagging is a core information extraction task in which words (or phrases) are classified using a pre-defined label set. +Common core NLP tagging tasks are word chunking, part-of-speech (POS) tagging and named entity recognition (NER). + +Example +------- +Named Entity Recognition (NER) is a basic information extraction task in which words (or phrases) are classified into pre-defined entity groups (or marked as non-interesting). Entity groups share common characteristics in the words or phrases they consist of, and are identifiable by the shape of the word or the context in which it appears in a sentence. Examples of entity groups are: names, numbers, locations, currency, dates, company names, etc. + +Example sentence: + +.. code:: text + + John is planning a visit to London on October + | | | + Name City Date + +In this example, ``name``, ``city`` and ``date`` entities are identified. + +Models +------ + +NLP Architect includes the following models: + +* Word Chunking +* POS Tagging +* Named Entity Recognition + +.. include:: chunker.rst +.. 
include:: ner.rst diff --git a/docs-source/source/tcn.rst b/docs-source/source/tcn.rst index f0bda4dd..b749a0d4 100644 --- a/docs-source/source/tcn.rst +++ b/docs-source/source/tcn.rst @@ -14,12 +14,12 @@ .. limitations under the License. .. --------------------------------------------------------------------------- -Language Modeling -################# +Language Modeling with TCN +========================== Overview -======== +-------- A language model (LM) is a probability distribution over a sequence of words. Given a sequence, a trained language model can provide the probability that the sequence is realistic. Using deep learning, one manner of creating an LM is by training a neural network to predict the probability of occurrence of the next word (or character) in the sequence given all the words (or characters) preceding it. (In other words, the joint distribution over elements in a sequence is broken up using the chain rule.) @@ -28,7 +28,7 @@ This folder contains scripts that implement a word-level language model using Te Data Loading -============ +------------ - PTB can be downloaded from `here `_ - Wikitext can be downloaded from `here `_ @@ -44,9 +44,9 @@ Data Loading - Note that the data loader prompts the user to automatically download the data if not already present. Please provide the location to save the data as an argument to the data loader. Running Modalities -================== +------------------ Training --------- +~~~~~~~~ The base class that defines :py:class:`TCN ` topology can be imported as: .. code-block:: python diff --git a/docs-source/source/transformers.rst b/docs-source/source/transformers.rst new file mode 100644 index 00000000..5cfda0d8 --- /dev/null +++ b/docs-source/source/transformers.rst @@ -0,0 +1,81 @@ +.. --------------------------------------------------------------------------- +.. Copyright 2016-2018 Intel Corporation +.. +.. Licensed under the Apache License, Version 2.0 (the "License"); +.. 
you may not use this file except in compliance with the License. +.. You may obtain a copy of the License at +.. +.. http://www.apache.org/licenses/LICENSE-2.0 +.. +.. Unless required by applicable law or agreed to in writing, software +.. distributed under the License is distributed on an "AS IS" BASIS, +.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +.. See the License for the specific language governing permissions and +.. limitations under the License. +.. --------------------------------------------------------------------------- + +============ +Transformers +============ + +NLP Architect integrates the Transformer models available in `pytorch-transformers `_. Using a pre-trained Transformer model is usually done by attaching a classification head to the transformer model and fine-tuning the model (transformer and classifier) on the target (down-stream) task. + +Base model +---------- + +:py:class:`TransformerBase ` is a base class for handling +loading, saving, training and inference of transformer models. + +The base model supports `pytorch-transformers` configs, tokenizers and base models as documented in their `website `_ (see our base class for supported models). + +In order to use the Transformer models, sub-class the base model and include: + +* A classifier (head) for your task. +* A sub-method handling the conversion of inputs to the tensors used by the model. +* Any sub-methods needed to evaluate the task, run inference, etc. + +Models +------ +Sequence classification +~~~~~~~~~~~~~~~~~~~~~~~ + +:py:class:`TransformerSequenceClassifier ` is a transformer model with a sentence classification head (the ``[CLS]`` token representation is used for classification) for sentence classification tasks (classification/regression). + +See ``nlp_architect.procedures.transformers.glue`` for an example of training sequence classification models on GLUE benchmark tasks. + +Training a model on GLUE tasks, using the BERT-base uncased base model: + +.. 
code-block:: bash + + nlp_architect train transformer_glue \ + --task_name \ + --model_name_or_path bert-base-uncased \ + --model_type bert \ + --output_dir \ + --evaluate_during_training \ + --data_dir \ + --do_lower_case + +Running a model: + +.. code-block:: bash + + nlp_architect run transformer_glue \ + --model_path \ + --task_name \ + --model_type bert \ + --output_dir \ + --data_dir \ + --do_lower_case \ + --overwrite_output_dir + +Token classification +~~~~~~~~~~~~~~~~~~~~ + +:py:class:`TransformerTokenClassifier ` is a transformer model for token classification tasks such as NER, POS tagging or chunking. + +See the :ref:`transformer_cls` NER model description for a usage example. + + + + diff --git a/docs-source/source/transformers_distillation.rst b/docs-source/source/transformers_distillation.rst new file mode 100644 index 00000000..26e32d89 --- /dev/null +++ b/docs-source/source/transformers_distillation.rst @@ -0,0 +1,77 @@ +.. --------------------------------------------------------------------------- +.. Copyright 2017-2019 Intel Corporation +.. +.. Licensed under the Apache License, Version 2.0 (the "License"); +.. you may not use this file except in compliance with the License. +.. You may obtain a copy of the License at +.. +.. http://www.apache.org/licenses/LICENSE-2.0 +.. +.. Unless required by applicable law or agreed to in writing, software +.. distributed under the License is distributed on an "AS IS" BASIS, +.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +.. See the License for the specific language governing permissions and +.. limitations under the License. +.. --------------------------------------------------------------------------- + +============================== +Transformer model distillation +============================== + +Overview +======== + +Transformer models which were pre-trained on large corpora, such as BERT/XLNet/XLM, +have been shown to improve the accuracy of many NLP tasks. 
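To make the distillation loss defined below concrete, here is a minimal NumPy sketch of a temperature-scaled Student-Teacher loss. It is an illustration only, not the library's ``TeacherStudentDistill`` implementation; the function name, the KL direction and the default weights are assumptions for the example:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, student_task_loss,
                      w_student=0.5, w_distill=0.5, temperature=2.0):
    """loss = w_s * loss_student + w_d * KL(student/T || teacher/T)."""
    p_student = softmax(student_logits, temperature)
    p_teacher = softmax(teacher_logits, temperature)
    # KL divergence between the softened distributions, averaged over the batch
    kl = np.mean(np.sum(p_student * (np.log(p_student) - np.log(p_teacher)), axis=-1))
    return w_student * student_task_loss + w_distill * kl
```

With identical student and teacher logits the KL term vanishes, so the loss reduces to the weighted task loss.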
However, such models have two +distinct disadvantages: (1) model size and (2) speed, since such large models are +computationally heavy. + +One possible approach to overcome these drawbacks is to use Knowledge Distillation (KD). +In this approach, a large model is trained on the data set and then used to *teach* +a much smaller and more efficient network. This is often referred to as Student-Teacher +training, where the teacher network's output is added to the student's loss function, +helping the student network to converge to a better solution. + +Knowledge Distillation +====================== + +One approach is similar to the method in Hinton 2015 [#]_. The loss function is +modified to include a measure of divergence between distributions, which can be computed +using KL divergence or MSE between the logits of the student and the teacher networks. + + :math:`loss = w_s \cdot loss_{student} + w_d \cdot KL(logits_{student} / T || logits_{teacher} / T)` + +where *T* is a temperature value used for softening the logits prior to +applying softmax, :math:`loss_{student}` is the original loss of the student network +obtained during regular training, and :math:`w_s` and :math:`w_d` weight the two loss terms. + +``TeacherStudentDistill`` +------------------------- + +This class adds distillation support to a model. +To support distillation, the student model must handle training with the +``TeacherStudentDistill`` class; see ``nlp_architect.procedures.token_tagging.do_kd_training`` for +an example of how to train a neural tagger from a transformer model using distillation. + +.. autoclass:: nlp_architect.nn.torch.distillation.TeacherStudentDistill + :members: + +Supported models +================ + +``NeuralTagger`` +---------------- + +Useful for training taggers from Transformer models. :py:class:`NeuralTagger ` models that use LSTM and CNN based embedders are ~3M parameters in size (~30-100x smaller than BERT models) and ~10x faster on average. + +Usage: + +#. 
Train a transformer tagger using :py:class:`TransformerTokenClassifier ` or the ``nlp_architect train transformer_token`` command +#. Train a neural tagger (:py:class:`NeuralTagger `) using the trained transformer model as teacher, via the :py:class:`TeacherStudentDistill ` model configured with the transformer model. This can be done using :py:class:`NeuralTagger `'s train loop or the ``nlp_architect train tagger_kd`` command + + +.. note:: + More models supporting distillation will be added in future releases + + +.. [#] Distilling the Knowledge in a Neural Network: Geoffrey Hinton, Oriol Vinyals, Jeff Dean, https://arxiv.org/abs/1503.02531 \ No newline at end of file diff --git a/docs/CONTRIBUTING.html b/docs/CONTRIBUTING.html index 51b6ca90..2bd82e01 100644 --- a/docs/CONTRIBUTING.html +++ b/docs/CONTRIBUTING.html @@ -8,7 +8,7 @@ - Contribution Process — NLP Architect by Intel® AI Lab 0.4.post2 documentation + Contribution Process — NLP Architect by Intel® AI Lab 0.5 documentation @@ -25,6 +25,8 @@ + + @@ -33,8 +35,9 @@ - - + + + @@ -55,7 +58,7 @@ - + @@ -81,31 +84,27 @@

NLP/NLU Models

+

Optimized Models

+

Solutions

@@ -114,14 +113,11 @@
  • Set Expansion
  • Trend Analysis
  • -

    Pipelines

    -

    For Developers

    @@ -187,14 +183,14 @@

    Contribution Process

      -
    1. File an issue (to track your contribution):

      -
    git clone https://github.com/NervanaSystems/nlp-architect.git
    @@ -206,8 +202,8 @@ 

    Contribution Process

      -
    1. Create a new feature branch for your work and switch to it. Give it a -meaningful name related to the task(s) at hand:

    2. +
    3. Create a new feature branch for your work and switch to it. Give it a +meaningful name related to the task(s) at hand:
      -
    1. Ideally you’d start by creating one or more unit tests with the +

    2. Ideally you’d start by creating one or more unit tests with the functionality you expect your new feature to perform. These should reside under the appropriate tests subdirectory of whatever you are changing. Then hack away at the code until you feel your feature is complete. Once -satisfied, run the code through the following checks:

    3. +satisfied, run the code through the following checks:
    -
    nlp_architect test   # ensure all are OK
    -nlp_architect style  # ensure there are no style related issues
    +
    ./scripts/run_tests.sh    # ensure all are OK
    +./scripts/check_style.sh  # ensure there are no style related issues
     
      -
    1. If necessary you may want to update and/or rebuild the documentation. +

    2. If necessary you may want to update and/or rebuild the documentation. This all exists under docs-source/source and is in -Sphinx reStructuredText format:

    3. +Sphinx reStructuredText format:
    -
    cd scripts/
    -sh create_docs.sh   # builds the doc and starts a local server directly
    +
    ./scripts/create_docs.sh   # builds the doc and starts a local server directly
     
      -
    1. Commit your changes and push your feature branch to your GitHub fork. Be +

    2. Commit your changes and push your feature branch to your GitHub fork. Be sure to add a descriptive message and reference the GitHub issue associated with your task (ex. #1). You will also want to rebase your commits down to -a single sensible commit to make things clean for the merge process:

    3. +a single sensible commit to make things clean for the merge process:
    git add my_updated_files
    @@ -252,31 +247,15 @@ 

    Contribution Process

      -
    1. Create a new pull request to get your feature branch merged into master for -others to use. You’ll first need to ensure your feature branch contains the -latest changes from master. Furthermore, internal devs will need to assign -the request to someone else for a code review. You must also ensure there -are no errors when run through the items defined in step 4.

    2. -
    -
    -
    # (external contribs): make a new pull request:
    -https://github.com/NervanaSystems/nlp-architect/pulls
    -
    -# merge latest master changes into your feature branch
    -git fetch origin
    -git checkout master
    -git pull origin master
    -git checkout my_new_feature_branch
    -git merge master  # you may need to manually resolve any merge conflicts
    -
    -
    -
    -
      -
    1. If there are issues you can continue to push commits to your feature branch +

    2. Create a new pull request +to get your feature branch merged into master for others to use. +You’ll first need to ensure your feature branch contains the latest changes from +master.
    3. +
    4. If there are issues you can continue to push commits to your feature branch by following step 6. They will automatically be added to this same merge -request.

    5. -
    6. Once your change has been successfully merged, you can remove the source -branch and ensure your local copy is up to date:

    7. +request. +
    8. Once your change has been successfully merged, you can remove the source +branch and ensure your local copy is up to date:
    git fetch origin
    @@ -288,7 +267,7 @@ 

    Contribution Process

      -
    1. Give yourself a high five for a job well done!

    2. +
    3. Give yourself a high five for a job well done!
    diff --git a/docs/_images/absa_solution_flow.png b/docs/_images/absa_solution_flow.png deleted file mode 100644 index 8c39ce57..00000000 Binary files a/docs/_images/absa_solution_flow.png and /dev/null differ diff --git a/docs/_images/absa_solution_ui_1.png b/docs/_images/absa_solution_ui_1.png deleted file mode 100644 index 0e6ca61e..00000000 Binary files a/docs/_images/absa_solution_ui_1.png and /dev/null differ diff --git a/docs/_images/absa_solution_ui_2.png b/docs/_images/absa_solution_ui_2.png deleted file mode 100644 index 9b1a9ee5..00000000 Binary files a/docs/_images/absa_solution_ui_2.png and /dev/null differ diff --git a/docs/_images/absa_solution_ui_3.png b/docs/_images/absa_solution_ui_3.png new file mode 100644 index 00000000..1d28dc50 Binary files /dev/null and b/docs/_images/absa_solution_ui_3.png differ diff --git a/docs/_images/absa_solution_ui_4.png b/docs/_images/absa_solution_ui_4.png new file mode 100644 index 00000000..9fd9f9f9 Binary files /dev/null and b/docs/_images/absa_solution_ui_4.png differ diff --git a/docs/_images/absa_solution_workflow.png b/docs/_images/absa_solution_workflow.png new file mode 100644 index 00000000..e3197bc8 Binary files /dev/null and b/docs/_images/absa_solution_workflow.png differ diff --git a/docs/_images/cnn-lstm-fig.png b/docs/_images/cnn-lstm-fig.png new file mode 100644 index 00000000..d1a82ed1 Binary files /dev/null and b/docs/_images/cnn-lstm-fig.png differ diff --git a/docs/_images/idcnn-fig.png b/docs/_images/idcnn-fig.png new file mode 100644 index 00000000..47d2e77f Binary files /dev/null and b/docs/_images/idcnn-fig.png differ diff --git a/docs/_images/ner_crf_model.png b/docs/_images/ner_crf_model.png deleted file mode 100644 index bef975bb..00000000 Binary files a/docs/_images/ner_crf_model.png and /dev/null differ diff --git a/docs/_modules/index.html b/docs/_modules/index.html index 05bf1bdf..78ac7467 100644 --- a/docs/_modules/index.html +++ b/docs/_modules/index.html @@ -8,7 +8,7 @@ - 
Overview: module code — NLP Architect by Intel® AI Lab 0.4.post2 documentation + Overview: module code — NLP Architect by Intel® AI Lab 0.5 documentation @@ -25,6 +25,8 @@ + + @@ -33,8 +35,9 @@ - - + + + @@ -55,7 +58,7 @@ - + @@ -81,31 +84,27 @@

    NLP/NLU Models

    +

    Optimized Models

    +

    Solutions

    @@ -114,14 +113,11 @@
  • Set Expansion
  • Trend Analysis
  • -

    Pipelines

    -

    For Developers

    @@ -184,16 +180,27 @@

    All modules for which code is available

    diff --git a/docs/_modules/nlp_architect/api/abstract_api.html b/docs/_modules/nlp_architect/api/abstract_api.html index 9bf39a5a..6faa6f81 100644 --- a/docs/_modules/nlp_architect/api/abstract_api.html +++ b/docs/_modules/nlp_architect/api/abstract_api.html @@ -8,7 +8,7 @@ - nlp_architect.api.abstract_api — NLP Architect by Intel® AI Lab 0.4.post2 documentation + nlp_architect.api.abstract_api — NLP Architect by Intel® AI Lab 0.5 documentation @@ -25,6 +25,8 @@ + + @@ -33,8 +35,9 @@ - - + + + @@ -55,7 +58,7 @@ - + @@ -81,31 +84,27 @@

    NLP/NLU Models

    +

    Optimized Models

    +

    Solutions

    @@ -114,14 +113,11 @@
  • Set Expansion
  • Trend Analysis
  • -

    Pipelines

    -

    For Developers

    @@ -207,15 +203,15 @@

    Source code for nlp_architect.api.abstract_api

    Not_Implemented = 'Not Implemented' -

    [docs]class AbstractApi: +
    [docs]class AbstractApi: """ Abstract class for API's to the server """ -
    [docs] @abstractmethod +
    [docs] @abstractmethod def load_model(self): raise Not_Implemented
    -
    [docs] @abstractmethod +
    [docs] @abstractmethod def inference(self, doc): raise Not_Implemented
    diff --git a/docs/_modules/nlp_architect/api/base.html b/docs/_modules/nlp_architect/api/base.html new file mode 100644 index 00000000..b6ffe2b1 --- /dev/null +++ b/docs/_modules/nlp_architect/api/base.html @@ -0,0 +1,274 @@ + + + + + + + + + + + nlp_architect.api.base — NLP Architect by Intel® AI Lab 0.5 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + +
    + + + + + +
    + +
    + + + + + + + + + + + + + + + + + +
    + + + + +
    +
    +
    +
    + +

    Source code for nlp_architect.api.base

    +# ******************************************************************************
    +# Copyright 2017-2019 Intel Corporation
    +#
    +# Licensed under the Apache License, Version 2.0 (the "License");
    +# you may not use this file except in compliance with the License.
    +# You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +# ******************************************************************************
    +import logging
    +from typing import Union, List, Dict
    +
    +logger = logging.getLogger(__name__)
    +
    +
    +
    [docs]class ModelAPI: + """ Base class for a model API implementation + Implementing classes must provide a default model and/or a path to a model + + Args: + model_path (str): path to a trained model + + run method must return + """ + default_model = None # pre-trained model from library + + def __init__(self, model_path: str = None): + if model_path is not None: + self.load_model(model_path) + elif self.default_model is not None: + # get default model and load it + # TODO: implement model integration + raise NotImplementedError + else: + logger.error("Not model provided or not pre-trained model configured") + +
    [docs] def load_model(self, model_path: str): + raise NotImplementedError
    + +
    [docs] def run(self, inputs: Union[str, List[str]]) -> Dict: + raise NotImplementedError
    + + def __call__(self, inputs: Union[str, List[str]]): + return self.run(inputs)
    +
    + +
    + +
    + + +
    +
    + +
    + +
    + + + + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/nlp_architect/api/bist_parser_api.html b/docs/_modules/nlp_architect/api/bist_parser_api.html index 7d3bf455..deccb830 100644 --- a/docs/_modules/nlp_architect/api/bist_parser_api.html +++ b/docs/_modules/nlp_architect/api/bist_parser_api.html @@ -8,7 +8,7 @@ - nlp_architect.api.bist_parser_api — NLP Architect by Intel® AI Lab 0.4.post2 documentation + nlp_architect.api.bist_parser_api — NLP Architect by Intel® AI Lab 0.5 documentation @@ -25,6 +25,8 @@ + + @@ -33,8 +35,9 @@ - - + + + @@ -55,7 +58,7 @@ - + @@ -81,31 +84,27 @@

    NLP/NLU Models

    +

    Optimized Models

    +

    Solutions

    @@ -114,14 +113,11 @@
  • Set Expansion
  • Trend Analysis
  • -

    Pipelines

    -

    For Developers

    @@ -205,20 +201,20 @@

    Source code for nlp_architect.api.bist_parser_api

    from nlp_architect.pipelines.spacy_bist import SpacyBISTParser -
    [docs]class BistParserApi(AbstractApi): +
    [docs]class BistParserApi(AbstractApi): """ Bist Parser API """ def __init__(self): self.model = None -
    [docs] def load_model(self): +
    [docs] def load_model(self): """ Load SpacyBISTParser model """ self.model = SpacyBISTParser()
    -
    [docs] def inference(self, doc): +
    [docs] def inference(self, doc): """ Parse according to SpacyBISTParser's model diff --git a/docs/_modules/nlp_architect/api/intent_extraction_api.html b/docs/_modules/nlp_architect/api/intent_extraction_api.html new file mode 100644 index 00000000..7cf2c6fc --- /dev/null +++ b/docs/_modules/nlp_architect/api/intent_extraction_api.html @@ -0,0 +1,373 @@ + + + + + + + + + + + nlp_architect.api.intent_extraction_api — NLP Architect by Intel® AI Lab 0.5 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + +
    + + + + + +
    + +
    + + + + + + + + + + + + + + + + + +
    + +
      + +
    • Docs »
    • + +
    • Module code »
    • + +
    • nlp_architect.api.intent_extraction_api
    • + + +
    • + +
    • + +
    + + +
    +
    +
    +
    + +

    Source code for nlp_architect.api.intent_extraction_api

    +# ******************************************************************************
    +# Copyright 2017-2018 Intel Corporation
    +#
    +# Licensed under the Apache License, Version 2.0 (the "License");
    +# you may not use this file except in compliance with the License.
    +# You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +# ******************************************************************************
    +import numpy as np
    +import pickle
    +from os import makedirs, path, sys
    +
    +from nlp_architect.api.abstract_api import AbstractApi
    +from nlp_architect.models.intent_extraction import MultiTaskIntentModel, Seq2SeqIntentModel
    +from nlp_architect import LIBRARY_OUT
    +from nlp_architect.utils.generic import pad_sentences
    +from nlp_architect.utils.io import download_unlicensed_file
    +from nlp_architect.utils.text import SpacyInstance, bio_to_spans
    +
    +
    +
    [docs]class IntentExtractionApi(AbstractApi): + model_dir = str(LIBRARY_OUT / 'intent-pretrained') + pretrained_model_info = path.join(model_dir, 'model_info.dat') + pretrained_model = path.join(model_dir, 'model.h5') + + def __init__(self, prompt=True): + self.model = None + self.model_type = None + self.word_vocab = None + self.tags_vocab = None + self.char_vocab = None + self.intent_vocab = None + self._download_pretrained_model(prompt) + self.nlp = SpacyInstance(disable=['tagger', 'ner', 'parser', 'vectors', 'textcat']) + +
    [docs] def process_text(self, text): + input_text = ' '.join(text.strip().split()) + return self.nlp.tokenize(input_text)
    + + @staticmethod + def _prompt(): + response = input('\nTo download \'{}\', please enter YES: '. + format('intent_extraction')) + res = response.lower().strip() + if res == "yes" or (len(res) == 1 and res == 'y'): + print('Downloading {}...'.format('ner')) + responded_yes = True + else: + print('Download declined. Response received {} != YES|Y. '.format(res)) + responded_yes = False + return responded_yes + + @staticmethod + def _download_pretrained_model(prompt=True): + """Downloads the pre-trained BIST model if non-existent.""" + model_info_exists = path.isfile(IntentExtractionApi.pretrained_model_info) + model_exists = path.isfile(IntentExtractionApi.pretrained_model) + if not model_exists or not model_info_exists: + print('The pre-trained models to be downloaded for the intent extraction dataset ' + 'are licensed under Apache 2.0. By downloading, you accept the terms ' + 'and conditions provided by the license') + makedirs(IntentExtractionApi.model_dir, exist_ok=True) + if prompt is True: + agreed = IntentExtractionApi._prompt() + if agreed is False: + sys.exit(0) + download_unlicensed_file('https://s3-us-west-2.amazonaws.com/nlp-architect-data' + '/models/intent/', + 'model_info.dat', IntentExtractionApi.pretrained_model_info) + download_unlicensed_file('https://s3-us-west-2.amazonaws.com/nlp-architect-data' + '/models/intent/', + 'model.h5', IntentExtractionApi.pretrained_model) + print('Done.') + +
    [docs] @staticmethod + def display_results(text_str, predictions, intent_type): + ret = {'annotation_set': [], 'doc_text': ' '.join([t for t in text_str])} + spans = [] + available_tags = set() + for s, e, tag in bio_to_spans(text_str, predictions): + spans.append({ + 'start': s, + 'end': e, + 'type': tag + }) + available_tags.add(tag) + ret['annotation_set'] = list(available_tags) + ret['spans'] = spans + ret['title'] = intent_type + return {'doc': ret, 'type': 'high_level'}
    + +
    [docs] def vectorize(self, doc, vocab, char_vocab=None): + words = np.asarray([vocab[w.lower()] if w.lower() in vocab else 1 for w in doc])\ + .reshape(1, -1) + if char_vocab is not None: + sentence_chars = [] + for w in doc: + word_chars = [] + for c in w: + if c in char_vocab: + _cid = char_vocab[c] + else: + _cid = 1 + word_chars.append(_cid) + sentence_chars.append(word_chars) + sentence_chars = np.expand_dims(pad_sentences(sentence_chars, self.model.word_length), + axis=0) + return [words, sentence_chars] + return words
    + +
    [docs] def inference(self, doc): + text_arr = self.process_text(doc) + intent_type = None + if self.model_type == 'mtl': + doc_vec = self.vectorize(text_arr, self.word_vocab, self.char_vocab) + intent, tags = self.model.predict(doc_vec, batch_size=1) + intent = int(intent.argmax(1).flatten()) + intent_type = self.intent_vocab.get(intent, None) + print('Detected intent type: {}'.format(intent_type)) + else: + doc_vec = self.vectorize(text_arr, self.word_vocab, None) + tags = self.model.predict(doc_vec, batch_size=1) + tags = tags.argmax(2).flatten() + tag_str = [self.tags_vocab.get(n, None) for n in tags] + for t, n in zip(text_arr, tag_str): + print('{}\t{}\t'.format(t, n)) + return self.display_results(text_arr, tag_str, intent_type)
    + +
    [docs] def load_model(self): + with open(IntentExtractionApi.pretrained_model_info, 'rb') as fp: + model_info = pickle.load(fp) + self.model_type = model_info['type'] + self.word_vocab = model_info['word_vocab'] + self.tags_vocab = {v: k for k, v in model_info['tags_vocab'].items()} + if self.model_type == 'mtl': + self.char_vocab = model_info['char_vocab'] + self.intent_vocab = {v: k for k, v in model_info['intent_vocab'].items()} + model = MultiTaskIntentModel() + else: + model = Seq2SeqIntentModel() + model.load(self.pretrained_model) + self.model = model
    +
    + +
    + +
    + + +
    +
    + +
    + +
    + + + + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/nlp_architect/api/machine_comprehension_api.html b/docs/_modules/nlp_architect/api/machine_comprehension_api.html new file mode 100644 index 00000000..d8b1a7a5 --- /dev/null +++ b/docs/_modules/nlp_architect/api/machine_comprehension_api.html @@ -0,0 +1,446 @@ + + + + + + + + + + + nlp_architect.api.machine_comprehension_api — NLP Architect by Intel® AI Lab 0.5 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + +
    + + + + + +
    + +
    + + + + + + + + + + + + + + + + + +
    + +
      + +
    • Docs »
    • + +
    • Module code »
    • + +
    • nlp_architect.api.machine_comprehension_api
    • + + +
    • + +
    • + +
    + + +
    +
    +
    +
    + +

    Source code for nlp_architect.api.machine_comprehension_api

    +# ******************************************************************************
    +# Copyright 2017-2018 Intel Corporation
    +#
    +# Licensed under the Apache License, Version 2.0 (the "License");
    +# you may not use this file except in compliance with the License.
    +# You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +# ******************************************************************************
    +
    +from __future__ import division
    +from __future__ import print_function
    +
    +import os
    +import re
    +import zipfile
    +from os import makedirs
    +from random import shuffle
    +
    +import numpy as np
    +import tensorflow as tf
    +
    +from nlp_architect.api.abstract_api import AbstractApi
    +from nlp_architect.models.matchlstm_ansptr import MatchLSTMAnswerPointer
    +from nlp_architect import LIBRARY_OUT
    +from nlp_architect.utils.generic import license_prompt
    +from nlp_architect.utils.io import download_unlicensed_file
    +from nlp_architect.utils.mrc_utils import (
    +    create_squad_training, max_values_squad, get_data_array_squad)
    +
    +
    +
    [docs]class MachineComprehensionApi(AbstractApi): + """ + Machine Comprehension API + """ + dir = str(LIBRARY_OUT / 'mrc-pretrained') + data_path = os.path.join(dir, 'mrc_data', 'data') + data_dir = os.path.join(dir, 'mrc_data') + model_dir = os.path.join(dir, 'mrc_trained_model') + model_path = os.path.join(dir, 'mrc_trained_model', 'trained_model') + + def __init__(self, prompt=True): + self.prompt = None + self.vocab_dict = None + self.vocab_rev = None + self.model = None + self.dev = None + self.sess = None + self.prompt = prompt + self.params_dict = {'batch_size': 1, + 'hidden_size': 150, + 'max_para': 300, + 'epoch_no': 15, + 'inference_only': True} + self.file_name_dict = {'train_para_ids': 'train.ids.context', + 'train_ques_ids': 'train.ids.question', + 'train_answer': 'train.span', + 'val_para_ids': 'dev.ids.context', + 'val_ques_ids': 'dev.ids.question', + 'val_ans': 'dev.span', + 'vocab_file': 'vocab.dat', + 'embedding': 'glove.trimmed.300.npz'} + +
    [docs] def download_model(self): + # Validate contents of data_path folder: + data_path = self.data_path + download = False + for file_name in self.file_name_dict.values(): + if not os.path.exists(os.path.join(data_path, file_name)): + # prompt + download = True + print("The following required file is missing :", file_name) + + if download is True: + if self.prompt is True: + license_prompt('mrc_data', + 'https://s3-us-west-2.amazonaws.com/nlp-architect-data/models/mrc' + '/mrc_data.zip', + self.data_dir) + license_prompt('mrc_model', + 'https://s3-us-west-2.amazonaws.com/nlp-architect-data/models/mrc' + '/mrc_model.zip', + self.model_dir) + data_zipfile = os.path.join(self.data_dir, 'mrc_data.zip') + model_zipfile = os.path.join(self.model_dir, 'mrc_model.zip') + makedirs(self.data_dir, exist_ok=True) + makedirs(self.model_dir, exist_ok=True) + download_unlicensed_file('https://s3-us-west-2.amazonaws.com/nlp-architect-data' + '/models/mrc/', + 'mrc_data.zip', data_zipfile) + download_unlicensed_file('https://s3-us-west-2.amazonaws.com/nlp-architect-data' + '/models/mrc/', + 'mrc_model.zip', model_zipfile) + with zipfile.ZipFile(data_zipfile) as data_zip_ref: + data_zip_ref.extractall(self.data_dir) + with zipfile.ZipFile(model_zipfile) as model_zip_ref: + model_zip_ref.extractall(self.model_dir)
    + +
    [docs] def load_model(self): + select_device = 'GPU' + restore_model = True + # Create dictionary of filenames + self.download_model() + + data_path = self.data_path + # Paths for preprcessed files + path_gen = data_path # data is actually in mrc_data/data not, mrc_data + train_para_ids = os.path.join(path_gen, self.file_name_dict['train_para_ids']) + train_ques_ids = os.path.join(path_gen, self.file_name_dict['train_ques_ids']) + answer_file = os.path.join(path_gen, self.file_name_dict['train_answer']) + val_paras_ids = os.path.join(path_gen, self.file_name_dict['val_para_ids']) + val_ques_ids = os.path.join(path_gen, self.file_name_dict['val_ques_ids']) + val_ans_file = os.path.join(path_gen, self.file_name_dict['val_ans']) + vocab_file = os.path.join(path_gen, self.file_name_dict['vocab_file']) + + model_dir = self.model_path + # Create model dir if it doesn't exist + if not os.path.exists(model_dir): + os.makedirs(model_dir) + + model_path = model_dir + + # Create lists for train and validation sets + data_train = create_squad_training(train_para_ids, train_ques_ids, answer_file) + data_dev = create_squad_training(val_paras_ids, val_ques_ids, val_ans_file) + with open(vocab_file, encoding='UTF-8') as fp: + vocab_list = fp.readlines() + self.vocab_dict = {} + self.vocab_rev = {} + + for i in range(len(vocab_list)): + self.vocab_dict[i] = vocab_list[i].strip() + self.vocab_rev[vocab_list[i].strip()] = i + + self.params_dict['train_set_size'] = len(data_train) + + # Combine train and dev data + data_total = data_train + data_dev + + # obtain maximum length of question + _, max_question = max_values_squad(data_total) + self.params_dict['max_question'] = max_question + + # Load embeddings for vocab + print('Loading Embeddings') + embeddingz = np.load(os.path.join(path_gen, self.file_name_dict['embedding'])) + embeddings = embeddingz['glove'] + + # Create train and dev sets + print("Creating training and development sets") + self.dev = 
get_data_array_squad(self.params_dict, data_dev, set_val='val') + + # Define Reading Comprehension model + with tf.device('/device:' + select_device + ':0'): + self.model = MatchLSTMAnswerPointer(self.params_dict, embeddings) + + # Define Configs for training + run_config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=False) + + # Create session run training + self.sess = tf.Session(config=run_config) + init = tf.global_variables_initializer() + + # Model Saver + # pylint: disable=no-member + model_saver = tf.train.Saver() + model_ckpt = tf.train.get_checkpoint_state(model_path) + idx_path = model_ckpt.model_checkpoint_path + ".index" if model_ckpt else "" + + # Initialize with random or pretrained weights + # pylint: disable=no-member + if model_ckpt and restore_model and (tf.gfile.Exists( + model_ckpt.model_checkpoint_path) or tf.gfile.Exists(idx_path)): + model_saver.restore(self.sess, model_ckpt.model_checkpoint_path) + print("Loading from previously stored session") + else: + self.sess.run(init) + + shuffle(self.dev)
    + +
    [docs] @staticmethod + def paragraphs(valid, vocab_tuple, num_examples): + paragraphs = [] + vocab_forward = vocab_tuple[0] + for idx in range(num_examples): + test_paragraph = [vocab_forward[ele] for ele in valid[idx][0] if ele != 0] + para_string = " ".join(map(str, test_paragraph)) + paragraphs.append(re.sub(r'\s([?.!,"](?:\s|$))', r'\1', para_string)) # (?:\s|$)) + return paragraphs
    + +
    [docs] @staticmethod + def questions(valid, vocab_tuple, num_examples): + vocab_forward = vocab_tuple[0] + questions = [] + for idx in range(num_examples): + test_question = [vocab_forward[ele] for ele in valid[idx][1] if ele != 0] + ques_string = " ".join(map(str, test_question)) + questions.append(re.sub(r'\s([?.!"",])', r'\1', ques_string)) + return questions
    + +
    [docs] def inference(self, doc): + body = doc + print("Begin Inference Mode") + question = body['question'] + paragraph_id = body['paragraph'] + return self.model.inference_mode(self.sess, self.dev, [self.vocab_dict, self.vocab_rev], + dynamic_question_mode=True, num_examples=1, dropout=1.0, + dynamic_usr_question=question, + dynamic_question_index=paragraph_id)
    + +
    [docs] def get_paragraphs(self): + ret = {'paragraphs': self.paragraphs(self.dev, [self.vocab_dict, self.vocab_rev], + num_examples=5), + 'questions': self.questions(self.dev, [self.vocab_dict, self.vocab_rev], + num_examples=5)} + return ret
    +
    + +
    + +
    + + +
    +
    + +
    + +
    + + + + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/nlp_architect/api/ner_api.html b/docs/_modules/nlp_architect/api/ner_api.html index f8149c80..7ed27d68 100644 --- a/docs/_modules/nlp_architect/api/ner_api.html +++ b/docs/_modules/nlp_architect/api/ner_api.html @@ -8,7 +8,7 @@ - nlp_architect.api.ner_api — NLP Architect by Intel® AI Lab 0.4.post2 documentation + nlp_architect.api.ner_api — NLP Architect by Intel® AI Lab 0.5 documentation @@ -25,6 +25,8 @@ + + @@ -33,8 +35,9 @@ - - + + + @@ -55,7 +58,7 @@ - + @@ -81,31 +84,27 @@

    NLP/NLU Models

    +

    Optimized Models

    +

    Solutions

    @@ -114,14 +113,11 @@
  • Set Expansion
  • Trend Analysis
  • -

    Pipelines

    -

    For Developers

    @@ -213,7 +209,7 @@

    Source code for nlp_architect.api.ner_api

     from nlp_architect.utils.text import SpacyInstance, bio_to_spans
     
     
    -
    [docs]class NerApi(AbstractApi): +
    [docs]class NerApi(AbstractApi): """ NER model API """ @@ -264,7 +260,7 @@

    Source code for nlp_architect.api.ner_api

                                          'model_info_v4.dat', self.pretrained_model_info)
                 print('Done.')
     
    -
    [docs] def load_model(self): +
    [docs] def load_model(self): self.model = NERCRF() self.model.load(self.pretrained_model) with open(self.pretrained_model_info, 'rb') as fp: @@ -273,7 +269,7 @@

    Source code for nlp_architect.api.ner_api

             self.y_vocab = {v: k for k, v in model_info['y_vocab'].items()}
             self.char_vocab = model_info['char_vocab']
    -
    [docs] @staticmethod +
    [docs] @staticmethod def pretty_print(text, tags): spans = [] for s, e, tag in bio_to_spans(text, tags): @@ -290,11 +286,11 @@

    Source code for nlp_architect.api.ner_api

             print({"doc": ret, 'type': 'high_level'})
             return {"doc": ret, 'type': 'high_level'}
    -
    [docs] def process_text(self, text): +
    [docs] def process_text(self, text): input_text = ' '.join(text.strip().split()) return self.nlp.tokenize(input_text)
    -
    [docs] def vectorize(self, doc, vocab, char_vocab): +
    [docs] def vectorize(self, doc, vocab, char_vocab): words = np.asarray([vocab[w.lower()] if w.lower() in vocab else 1 for w in doc]) \ .reshape(1, -1) sentence_chars = [] @@ -311,7 +307,7 @@

    Source code for nlp_architect.api.ner_api

                                             axis=0)
             return words, sentence_chars
    -
    [docs] def inference(self, doc): +
    [docs] def inference(self, doc): text_arr = self.process_text(doc) doc_vec = self.vectorize(text_arr, self.word_vocab, self.char_vocab) seq_len = np.array([len(text_arr)]).reshape(-1, 1) diff --git a/docs/_modules/nlp_architect/cli.html b/docs/_modules/nlp_architect/cli.html new file mode 100644 index 00000000..ccf0aef5 --- /dev/null +++ b/docs/_modules/nlp_architect/cli.html @@ -0,0 +1,285 @@ + + + + + + + + + + + nlp_architect.cli — NLP Architect by Intel® AI Lab 0.5 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + +
    + + + + + +
    + +
    + + + + + + + + + + + + + + + + + +
    + + + + +
    +
    +
    +
    + +

    Source code for nlp_architect.cli

    +# ******************************************************************************
    +# Copyright 2017-2019 Intel Corporation
    +#
    +# Licensed under the Apache License, Version 2.0 (the "License");
    +# you may not use this file except in compliance with the License.
    +# You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +# ******************************************************************************
    +import argparse
    +import logging 
    +
    +# register all procedures by importing
    +import nlp_architect.procedures  # noqa: F401
    +from nlp_architect.cli.cli_commands import (cli_process_cmd, cli_run_cmd,
    +                                            cli_serve_cmd, cli_solution_cmd,
    +                                            cli_train_cmd)
    +from nlp_architect.version import NLP_ARCHITECT_VERSION
    +
    +# Setup logging
    +logging.basicConfig(format='%(asctime)s - %(levelname)s - %(name)s - %(message)s',
    +                    datefmt='%m/%d/%Y %H:%M:%S',
    +                    level=logging.INFO)
    +
    +
    [docs]def run_cli(): + """ Run nlp_architect command line application + """ + prog_name = 'nlp_architect' + desc = 'NLP Architect CLI [{}]'.format(NLP_ARCHITECT_VERSION) + parser = argparse.ArgumentParser(description=desc, prog=prog_name) + parser.add_argument('-v', '--version', action='version', + version='%(prog)s v{}'.format(NLP_ARCHITECT_VERSION)) + + parser.set_defaults(func=lambda _: parser.print_help()) + subparsers = parser.add_subparsers(title='commands', metavar='') + for command in sub_commands: + command(subparsers) + args = parser.parse_args() + if hasattr(args, 'func'): + args.func(args) + else: + parser.print_help()
    + + +# sub commands list +sub_commands = [ + cli_train_cmd, + cli_run_cmd, + cli_process_cmd, + cli_solution_cmd, + cli_serve_cmd +] + +if __name__ == "__main__": + run_cli() +
    + +
    + +
    + + +
    +
    + +
    + +
    + + + + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/nlp_architect/cli/cli_commands.html b/docs/_modules/nlp_architect/cli/cli_commands.html new file mode 100644 index 00000000..a3a67af6 --- /dev/null +++ b/docs/_modules/nlp_architect/cli/cli_commands.html @@ -0,0 +1,300 @@ + + + + + + + + + + + nlp_architect.cli.cli_commands — NLP Architect by Intel® AI Lab 0.5 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + +
    + + + + + +
    + +
    + + + + + + + + + + + + + + + + + +
    + + + + +
    +
    +
    +
    + +

    Source code for nlp_architect.cli.cli_commands

    +# ******************************************************************************
    +# Copyright 2017-2019 Intel Corporation
    +#
    +# Licensed under the Apache License, Version 2.0 (the "License");
    +# you may not use this file except in compliance with the License.
    +# You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +# ******************************************************************************
    +from argparse import _SubParsersAction
    +
    +from nlp_architect.cli.cmd_registry import CMD_REGISTRY
    +
    +
    +
    [docs]def generic_cmd(cmd_name: str, subtitle: str, description: str, subparsers: _SubParsersAction): + parser = subparsers.add_parser(cmd_name, + description=description, + help=description) + + subsubparsers = parser.add_subparsers(title=subtitle, + metavar='') + for model in CMD_REGISTRY[cmd_name]: + sp = subsubparsers.add_parser(model['name'], + description=model['description'], + help=model['description']) + model['arg_adder'](sp) + sp.set_defaults(func=model['fn']) + parser.set_defaults(func=lambda _: parser.print_help())
    + + +""" +cli commands definition +""" + + +
    [docs]def cli_train_cmd(subparsers: _SubParsersAction): + generic_cmd('train', + 'Available models', + 'Train a model from the library', + subparsers)
    + + +
    [docs]def cli_run_cmd(subparsers: _SubParsersAction): + generic_cmd('run', + 'Available models', + 'Run a model from the library', + subparsers)
    + + +
    [docs]def cli_process_cmd(subparsers: _SubParsersAction): + generic_cmd('process', + 'Available data processors', + 'Run a data processor from the library', + subparsers)
    + + +
    [docs]def cli_solution_cmd(subparsers: _SubParsersAction): + generic_cmd('solution', + 'Available solutions', + 'Run a solution process from the library', + subparsers)
    + + +
[docs]def cli_serve_cmd(subparsers: _SubParsersAction): + generic_cmd('serve', + 'Available models', + 'Serve a trained model using a REST service', + subparsers)
    +
    + +
    + +
    + + +
    +
    + +
    + +
    + + + + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/nlp_architect/common/cdc/cluster.html b/docs/_modules/nlp_architect/common/cdc/cluster.html new file mode 100644 index 00000000..aec24454 --- /dev/null +++ b/docs/_modules/nlp_architect/common/cdc/cluster.html @@ -0,0 +1,339 @@ + + + + + + + + + + + nlp_architect.common.cdc.cluster — NLP Architect by Intel® AI Lab 0.5 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + +
    + + + + + +
    + +
    + + + + + + + + + + + + + + + + + +
    + +
      + +
    • Docs »
    • + +
    • Module code »
    • + +
    • nlp_architect.common.cdc.cluster
    • + + +
    • + +
    • + +
    + + +
    +
    +
    +
    + +

    Source code for nlp_architect.common.cdc.cluster

    +# ******************************************************************************
    +# Copyright 2017-2018 Intel Corporation
    +#
    +# Licensed under the Apache License, Version 2.0 (the "License");
    +# you may not use this file except in compliance with the License.
    +# You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +# ******************************************************************************
    +from typing import List
    +
    +from nlp_architect.common.cdc.mention_data import MentionData
    +
    +
    +
[docs]class Cluster(object): + def __init__(self, coref_chain: int = -1) -> None: + """ + Object representing a set of mentions with the same coref chain id + + Args: + coref_chain (int): the cluster id/coref_chain value + """ + self.mentions = [] + self.cluster_strings = [] + self.merged = False + self.coref_chain = coref_chain + self.mentions_corefs = set() + +
    [docs] def get_mentions(self): + return self.mentions
    + +
    [docs] def add_mention(self, mention: MentionData) -> None: + if mention is not None: + mention.predicted_coref_chain = self.coref_chain + self.mentions.append(mention) + self.cluster_strings.append(mention.tokens_str) + self.mentions_corefs.add(mention.coref_chain)
    + +
    [docs] def merge_clusters(self, cluster) -> None: + """ + Args: + cluster: cluster to merge this cluster with + """ + for mention in cluster.mentions: + mention.predicted_coref_chain = self.coref_chain + + self.mentions.extend(cluster.mentions) + self.cluster_strings.extend(cluster.cluster_strings) + self.mentions_corefs.update(cluster.mentions_corefs)
    + +
    [docs] def get_cluster_id(self) -> str: + """ + Returns: + A generated cluster unique Id created from cluster mentions ids + """ + return '$'.join([mention.mention_id for mention in self.mentions])
    + + +
[docs]class Clusters(object): + cluster_coref_chain = 1000 + + def __init__(self, topic_id: str, mentions: List[MentionData] = None) -> None: + """ + + Args: + topic_id: ``str``, required + The topic id these clusters belong to + mentions: ``list[MentionData]``, required + The initial mentions to create the clusters from + """ + self.clusters_list = [] + self.topic_id = topic_id + self.set_initial_clusters(mentions) + +
    [docs] def set_initial_clusters(self, mentions: List[MentionData]) -> None: + """ + + Args: + mentions: ``list[MentionData]``, required + The initial mentions to create the clusters from + + """ + if mentions: + for mention in mentions: + cluster = Cluster(Clusters.cluster_coref_chain) + cluster.add_mention(mention) + self.clusters_list.append(cluster) + Clusters.cluster_coref_chain += 1
    + +
    [docs] def clean_clusters(self) -> None: + """ + Remove all clusters that were already merged with other clusters + """ + + self.clusters_list = [cluster for cluster in self.clusters_list if not cluster.merged]
    + +
    [docs] def set_coref_chain_to_mentions(self) -> None: + """ + Give all cluster mentions the same coref ID as cluster coref chain ID + + """ + for cluster in self.clusters_list: + for mention in cluster.mentions: + mention.predicted_coref_chain = str(cluster.coref_chain)
    + +
    [docs] def add_cluster(self, cluster: Cluster) -> None: + self.clusters_list.append(cluster)
    + +
    [docs] def add_clusters(self, clusters) -> None: + for cluster in clusters.clusters_list: + self.clusters_list.append(cluster)
    +
    + +
    + +
    + + +
    +
    + +
    + +
    + + + + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/nlp_architect/common/cdc/mention_data.html b/docs/_modules/nlp_architect/common/cdc/mention_data.html index b2214e5e..c243545b 100644 --- a/docs/_modules/nlp_architect/common/cdc/mention_data.html +++ b/docs/_modules/nlp_architect/common/cdc/mention_data.html @@ -8,7 +8,7 @@ - nlp_architect.common.cdc.mention_data — NLP Architect by Intel® AI Lab 0.4.post2 documentation + nlp_architect.common.cdc.mention_data — NLP Architect by Intel® AI Lab 0.5 documentation @@ -25,6 +25,8 @@ + + @@ -33,8 +35,9 @@ - - + + + @@ -55,7 +58,7 @@ - + @@ -81,31 +84,27 @@

    NLP/NLU Models

    +

    Optimized Models

    +

    Solutions

    @@ -114,14 +113,11 @@
  • Set Expansion
  • Trend Analysis
  • -

    Pipelines

    -

    For Developers

    @@ -207,8 +203,8 @@

    Source code for nlp_architect.common.cdc.mention_data

    from nlp_architect.utils.string_utils import StringUtils -
    [docs]class MentionDataLight(object): -
    [docs] def __init__(self, tokens_str: str, mention_context: str = None, mention_head: str = None, +
    [docs]class MentionDataLight(object): + def __init__(self, tokens_str: str, mention_context: str = None, mention_head: str = None, mention_head_lemma: str = None, mention_pos: str = None, mention_ner: str = None): """ @@ -227,11 +223,11 @@

    Source code for nlp_architect.common.cdc.mention_data

    self.mention_head = mention_head self.mention_head_lemma = mention_head_lemma self.mention_head_pos = mention_pos - self.mention_ner = mention_ner
    + self.mention_ner = mention_ner
    -
    [docs]class MentionData(MentionDataLight): -
    [docs] def __init__(self, topic_id: str, doc_id: str, sent_id: int, tokens_numbers: List[int], +
    [docs]class MentionData(MentionDataLight): + def __init__(self, topic_id: str, doc_id: str, sent_id: int, tokens_numbers: List[int], tokens_str: str, mention_context: List[str], mention_head: str, mention_head_lemma: str, coref_chain: str, mention_type: str = 'NA', is_continuous: bool = True, is_singleton: bool = False, score: float = float(-1), @@ -270,9 +266,9 @@

    Source code for nlp_architect.common.cdc.mention_data

    self.score = score self.predicted_coref_chain = predicted_coref_chain self.mention_id = self.gen_mention_id() - self.mention_index = mention_index
    + self.mention_index = mention_index -
    [docs] @staticmethod +
    [docs] @staticmethod def read_json_mention_data_line(mention_line: str): """ Args: @@ -361,7 +357,7 @@

    Source code for nlp_architect.common.cdc.mention_data

    return mention_data
    -
    [docs] @staticmethod +
    [docs] @staticmethod def read_mentions_json_to_mentions_data_list(mentions_json_file: str): """ @@ -380,10 +376,10 @@

    Source code for nlp_architect.common.cdc.mention_data

    return mentions
    -
    [docs] def get_tokens(self): +
    [docs] def get_tokens(self): return self.tokens_number
    -
    [docs] def gen_mention_id(self) -> str: +
    [docs] def gen_mention_id(self) -> str: if self.doc_id and self.sent_id is not None and self.tokens_number: tokens_ids = [str(self.doc_id), str(self.sent_id)] tokens_ids.extend([str(token_id) for token_id in self.tokens_number]) @@ -391,12 +387,12 @@

    Source code for nlp_architect.common.cdc.mention_data

    return '_'.join(self.tokens_str.split())
    -
    [docs] def get_mention_id(self) -> str: +
    [docs] def get_mention_id(self) -> str: if not self.mention_id: self.mention_id = self.gen_mention_id() return self.mention_id
    -
    [docs] @staticmethod +
    [docs] @staticmethod def static_gen_token_unique_id(doc_id: int, sent_id: int, token_id: int) -> str: return '_'.join([str(doc_id), str(sent_id), str(token_id)])
    diff --git a/docs/_modules/nlp_architect/common/cdc/topics.html b/docs/_modules/nlp_architect/common/cdc/topics.html new file mode 100644 index 00000000..03b0144a --- /dev/null +++ b/docs/_modules/nlp_architect/common/cdc/topics.html @@ -0,0 +1,310 @@ + + + + + + + + + + + nlp_architect.common.cdc.topics — NLP Architect by Intel® AI Lab 0.5 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + +
    + + + + + +
    + +
    + + + + + + + + + + + + + + + + + +
    + +
      + +
    • Docs »
    • + +
    • Module code »
    • + +
    • nlp_architect.common.cdc.topics
    • + + +
    • + +
    • + +
    + + +
    +
    +
    +
    + +

    Source code for nlp_architect.common.cdc.topics

    +# ******************************************************************************
    +# Copyright 2017-2018 Intel Corporation
    +#
    +# Licensed under the Apache License, Version 2.0 (the "License");
    +# you may not use this file except in compliance with the License.
    +# You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +# ******************************************************************************
    +
    +import logging
    +import time
    +from typing import List
    +
    +from nlp_architect.common.cdc.mention_data import MentionData
    +from nlp_architect.utils.io import load_json_file
    +
    +logger = logging.getLogger(__name__)
    +
    +
    +
    [docs]class Topic(object): + def __init__(self, topic_id): + self.topic_id = topic_id + self.mentions = []
    + + +
    [docs]class Topics(object): + def __init__(self): + self.topics_list = [] + self.keep_order = False + +
[docs] def create_from_file(self, mentions_file_path: str, keep_order: bool = False) -> None: + """ + + Args: + mentions_file_path: path to this topic's mentions json file + keep_order: whether to keep the original mentions order (default = False) + """ + self.keep_order = keep_order + self.topics_list = self.load_mentions_from_file(mentions_file_path)
    + +
    [docs] def load_mentions_from_file(self, mentions_file_path: str) -> List[Topic]: + start_data_load = time.time() + logger.info('Loading mentions from-%s', mentions_file_path) + mentions = load_json_file(mentions_file_path) + topics = self.order_mentions_by_topics(mentions) + end_data_load = time.time() + took_load = end_data_load - start_data_load + logger.info('Mentions file-%s, took:%.4f sec to load', mentions_file_path, took_load) + return topics
    + +
[docs] def order_mentions_by_topics(self, mentions: List[str]) -> List[Topic]: + """ + Order mentions into topics by their document topic id + Args: + mentions: mention entries loaded from a json mentions file + + Returns: + List[Topic] of the mentions grouped by their document topics + """ + running_index = 0 + topics = [] + current_topic_ref = None + for mention_line in mentions: + mention = MentionData.read_json_mention_data_line(mention_line) + + if self.keep_order: + if mention.mention_index == -1: + mention.mention_index = running_index + running_index += 1 + + topic_id = mention.topic_id + + if not current_topic_ref or len(topics) > 0 and topic_id != topics[-1].topic_id: + current_topic_ref = Topic(topic_id) + topics.append(current_topic_ref) + + current_topic_ref.mentions.append(mention) + + return topics
    +
    + +
    + +
    + + +
    +
    + +
    + +
    + + + + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/nlp_architect/common/core_nlp_doc.html b/docs/_modules/nlp_architect/common/core_nlp_doc.html index a4b7992a..1a325d81 100644 --- a/docs/_modules/nlp_architect/common/core_nlp_doc.html +++ b/docs/_modules/nlp_architect/common/core_nlp_doc.html @@ -8,7 +8,7 @@ - nlp_architect.common.core_nlp_doc — NLP Architect by Intel® AI Lab 0.4.post2 documentation + nlp_architect.common.core_nlp_doc — NLP Architect by Intel® AI Lab 0.5 documentation @@ -25,6 +25,8 @@ + + @@ -33,8 +35,9 @@ - - + + + @@ -55,7 +58,7 @@ - + @@ -81,31 +84,27 @@

    NLP/NLU Models

    +

    Optimized Models

    +

    Solutions

    @@ -114,14 +113,11 @@
  • Set Expansion
  • Trend Analysis
  • -

    Pipelines

    -

    For Developers

    @@ -203,25 +199,25 @@

    Source code for nlp_architect.common.core_nlp_doc

    import json -def merge_punct_tok(merged_punct_sentence, last_merged_punct_index, punct_text, is_traverse): +
    [docs]def merge_punct_tok(merged_punct_sentence, last_merged_punct_index, punct_text, is_traverse): # merge the text of the punct tok if is_traverse: merged_punct_sentence[last_merged_punct_index]["text"] = \ punct_text + merged_punct_sentence[last_merged_punct_index]["text"] else: merged_punct_sentence[last_merged_punct_index]["text"] = \ - merged_punct_sentence[last_merged_punct_index]["text"] + punct_text + merged_punct_sentence[last_merged_punct_index]["text"] + punct_text
    -def find_correct_index(orig_gov, merged_punct_sentence): +
    [docs]def find_correct_index(orig_gov, merged_punct_sentence): for tok_index, tok in enumerate(merged_punct_sentence): if tok["start"] == orig_gov["start"] and tok["len"] == orig_gov["len"] and tok["pos"] == \ orig_gov["pos"] and tok["text"] == orig_gov["text"]: return tok_index - return None + return None
    -def fix_gov_indexes(merged_punct_sentence, sentence): +
    [docs]def fix_gov_indexes(merged_punct_sentence, sentence): for merged_token in merged_punct_sentence: tok_gov = merged_token['gov'] if tok_gov == -1: # gov is root @@ -229,10 +225,10 @@

    Source code for nlp_architect.common.core_nlp_doc

    else: orig_gov = sentence[tok_gov] correct_index = find_correct_index(orig_gov, merged_punct_sentence) - merged_token['gov'] = correct_index + merged_token['gov'] = correct_index
    -def merge_punctuation(sentence): +
    [docs]def merge_punctuation(sentence): merged_punct_sentence = [] tmp_punct_text = None punct_text = None @@ -252,10 +248,10 @@

    Source code for nlp_architect.common.core_nlp_doc

    merge_punct_tok(merged_punct_sentence, last_merged_punct_index, punct_text, True) tmp_punct_text = None - return merged_punct_sentence + return merged_punct_sentence
    -
    [docs]class CoreNLPDoc(object): +
    [docs]class CoreNLPDoc(object): """Object for core-components (POS, Dependency Relations, etc). Attributes: @@ -264,11 +260,11 @@

    Source code for nlp_architect.common.core_nlp_doc

    represented by a dictionary, structured as follows: {'start': (int), 'len': (int), 'pos': (str), 'ner': (str), 'lemma': (str), 'gov': (int), 'rel': (str)} """ -
    [docs] def __init__(self, doc_text: str = '', sentences: list = None): + def __init__(self, doc_text: str = '', sentences: list = None): if sentences is None: sentences = [] self._doc_text = doc_text - self._sentences = sentences
    + self._sentences = sentences @property def doc_text(self): @@ -286,7 +282,7 @@

    Source code for nlp_architect.common.core_nlp_doc

    def sentences(self, val): self._sentences = val -
    [docs] @staticmethod +
    [docs] @staticmethod def decoder(obj): if '_doc_text' in obj and '_sentences' in obj: return CoreNLPDoc(obj['_doc_text'], obj['_sentences']) @@ -304,26 +300,26 @@

    Source code for nlp_architect.common.core_nlp_doc

    def __len__(self): return len(self.sentences) -
    [docs] def json(self): +
    [docs] def json(self): """Returns json representations of the object.""" return json.dumps(self.__dict__)
    -
    [docs] def pretty_json(self): +
    [docs] def pretty_json(self): """Returns pretty json representations of the object.""" return json.dumps(self.__dict__, indent=4)
    -
    [docs] def sent_text(self, i): +
    [docs] def sent_text(self, i): parsed_sent = self.sentences[i] first_tok, last_tok = parsed_sent[0], parsed_sent[-1] return self.doc_text[first_tok['start']: last_tok['start'] + last_tok['len']]
    -
    [docs] def sent_iter(self): +
    [docs] def sent_iter(self): for parsed_sent in self.sentences: first_tok, last_tok = parsed_sent[0], parsed_sent[-1] sent_text = self.doc_text[first_tok['start']: last_tok['start'] + last_tok['len']] yield sent_text, parsed_sent
    -
    [docs] def brat_doc(self): +
    [docs] def brat_doc(self): """Returns doc adapted to BRAT expected input.""" doc = {'text': '', 'entities': [], 'relations': []} tok_count = 0 @@ -348,7 +344,7 @@

    Source code for nlp_architect.common.core_nlp_doc

    doc['text'] = doc['text'][1:] return doc
    -
    [docs] def displacy_doc(self): +
    [docs] def displacy_doc(self): """Return doc adapted to displacyENT expected input.""" doc = [] for sentence in self.sentences: diff --git a/docs/_modules/nlp_architect/common/high_level_doc.html b/docs/_modules/nlp_architect/common/high_level_doc.html index f58d5c3b..938671a3 100644 --- a/docs/_modules/nlp_architect/common/high_level_doc.html +++ b/docs/_modules/nlp_architect/common/high_level_doc.html @@ -8,7 +8,7 @@ - nlp_architect.common.high_level_doc — NLP Architect by Intel® AI Lab 0.4.post2 documentation + nlp_architect.common.high_level_doc — NLP Architect by Intel® AI Lab 0.5 documentation @@ -25,6 +25,8 @@ + + @@ -33,8 +35,9 @@ - - + + + @@ -55,7 +58,7 @@ - + @@ -81,31 +84,27 @@

    NLP/NLU Models

    +

    Optimized Models

    +

    Solutions

    @@ -114,14 +113,11 @@
  • Set Expansion
  • Trend Analysis
  • -

    Pipelines

    -

    For Developers

    @@ -205,7 +201,7 @@

    Source code for nlp_architect.common.high_level_doc

    import json -
    [docs]class HighLevelDoc: +
    [docs]class HighLevelDoc: """ object for annotation documents @@ -215,12 +211,12 @@

    Source code for nlp_architect.common.high_level_doc

    self.spans (list(dict)): list of span dict, each span_dict is structured as follows: { 'end': (int), 'start': (int), 'type': (str) string of annotation } """ -
    [docs] def __init__(self): + def __init__(self): self.doc_text = None self.annotation_set = [] - self.spans = []
    + self.spans = [] -
    [docs] def json(self): +
    [docs] def json(self): """ Return json representations of the object @@ -229,7 +225,7 @@

    Source code for nlp_architect.common.high_level_doc

    """ return json.dumps(self.__dict__)
    -
    [docs] def pretty_json(self): +
    [docs] def pretty_json(self): """ Return pretty json representations of the object @@ -238,7 +234,7 @@

    Source code for nlp_architect.common.high_level_doc

    """ return json.dumps(self.__dict__, indent=4)
    -
    [docs] def displacy_doc(self): # only change annotations to lowercase +
    [docs] def displacy_doc(self): # only change annotations to lowercase """ Return doc adapted to displacyENT expected input """ diff --git a/docs/_modules/nlp_architect/data/amazon_reviews.html b/docs/_modules/nlp_architect/data/amazon_reviews.html index c2790234..fc8033bb 100644 --- a/docs/_modules/nlp_architect/data/amazon_reviews.html +++ b/docs/_modules/nlp_architect/data/amazon_reviews.html @@ -8,7 +8,7 @@ - nlp_architect.data.amazon_reviews — NLP Architect by Intel® AI Lab 0.4.post2 documentation + nlp_architect.data.amazon_reviews — NLP Architect by Intel® AI Lab 0.5 documentation @@ -25,6 +25,8 @@ + + @@ -33,8 +35,9 @@ - - + + + @@ -55,7 +58,7 @@ - + @@ -81,31 +84,27 @@

    NLP/NLU Models

    +

    Optimized Models

    +

    Solutions

    @@ -114,14 +113,11 @@
  • Set Expansion
  • Trend Analysis
  • -

    Pipelines

    -

    For Developers

    @@ -230,7 +226,7 @@

    Source code for nlp_architect.data.amazon_reviews

    ] -def review_to_sentiment(review): +
[docs]def review_to_sentiment(review): # Each review arrives as (overall rating, reviewText, summary) # Clean the summary and review text and map the rating to a positive or negative sentiment norm_text = normalize(review[2] + " " + review[1]) @@ -240,16 +236,16 @@

    Source code for nlp_architect.data.amazon_reviews

    elif review[0] < 3: review_sent = ['negative', norm_text] - return review_sent + return review_sent
    -
    [docs]class Amazon_Reviews(object): +
    [docs]class Amazon_Reviews(object): """ Take the *.json file of Amazon reviews as downloaded from http://jmcauley.ucsd.edu/data/amazon/ Then does data cleaning and balancing, as well as transforms the reviews 1-5 to a sentiment """ -
    [docs] def __init__(self, review_file, run_balance=True): + def __init__(self, review_file, run_balance=True): self.run_balance = run_balance print("Parsing and processing json file") @@ -269,9 +265,9 @@

    Source code for nlp_architect.data.amazon_reviews

    self.all_text = self.amazon['clean_text'] self.labels_0 = pd.get_dummies(self.amazon['Sentiment']) self.labels = self.labels_0.values - self.text = self.amazon['clean_text'].values
    + self.text = self.amazon['clean_text'].values -
    [docs] def process(self): +
    [docs] def process(self): self.amazon = self.amazon[self.amazon['Sentiment'].isin(['positive', 'negative'])] if self.run_balance: diff --git a/docs/_modules/nlp_architect/data/babi_dialog.html b/docs/_modules/nlp_architect/data/babi_dialog.html index aa72e3cf..1d54085e 100644 --- a/docs/_modules/nlp_architect/data/babi_dialog.html +++ b/docs/_modules/nlp_architect/data/babi_dialog.html @@ -8,7 +8,7 @@ - nlp_architect.data.babi_dialog — NLP Architect by Intel® AI Lab 0.4.post2 documentation + nlp_architect.data.babi_dialog — NLP Architect by Intel® AI Lab 0.5 documentation @@ -25,6 +25,8 @@ + + @@ -33,8 +35,9 @@ - - + + + @@ -55,7 +58,7 @@ - + @@ -81,31 +84,27 @@

    NLP/NLU Models

    +

    Optimized Models

    +

    Solutions

    @@ -114,14 +113,11 @@
  • Set Expansion
  • Trend Analysis
  • -

    Pipelines

    -

    For Developers

    @@ -217,7 +213,7 @@

    Source code for nlp_architect.data.babi_dialog

    from nlp_architect.utils.io import download_unlicensed_file, valid_path_append -def pad_sentences(sentences, sentence_length=0, pad_val=0.): +

    [docs]def pad_sentences(sentences, sentence_length=0, pad_val=0.): """ Pad all sentences to have the same length (number of words) """ @@ -231,10 +227,10 @@

    Source code for nlp_architect.data.babi_dialog

    for i, sent in enumerate(sentences): trunc = sent[-sentence_length:] X[i, :len(trunc)] = trunc - return X + return X

    -def pad_stories(stories, sentence_length, max_story_length, pad_val=0.): +
    [docs]def pad_stories(stories, sentence_length, max_story_length, pad_val=0.): """ Pad all stories to have the same number of sentences (max_story_length). """ @@ -251,10 +247,10 @@


    for i, story in enumerate(stories): X[i, :len(story)] = story - return X + return X
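The two padding helpers above share one scheme: truncate each sequence to its last N elements, then left-align it into a zero-filled matrix. The snippet below is a stand-alone sketch of that scheme for illustration, not an import of the library function:

```python
import numpy as np

def pad_sentences(sentences, sentence_length, pad_val=0.):
    # Same scheme as the listing: keep the last `sentence_length` tokens
    # of each sentence and left-align them into a `pad_val`-filled matrix.
    X = np.full((len(sentences), sentence_length), pad_val)
    for i, sent in enumerate(sentences):
        trunc = sent[-sentence_length:]
        X[i, :len(trunc)] = trunc
    return X

print(pad_sentences([[1, 2, 3], [4]], 2))  # rows: [2., 3.] and [4., 0.]
```

`pad_stories` applies the identical logic one level up, padding the list of sentences per story instead of the list of tokens per sentence.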

    -
    [docs]class BABI_Dialog(object): +
    [docs]class BABI_Dialog(object): """ This class loads in the Facebook bAbI goal oriented dialog dataset and vectorizes them into user utterances, bot utterances, and answers. @@ -298,7 +294,7 @@


    mapping from match-type to the associated fixed index of the candidate vector which indicated this match type. """ -

    [docs] def __init__(self, path='.', task=1, oov=False, use_match_type=False, + def __init__(self, path='.', task=1, oov=False, use_match_type=False, use_time=True, use_speaker_tag=True, cache_match_type=False, cache_vectorized=False): self.url = 'http://www.thespermwhale.com/jaseweston/babi' @@ -409,9 +405,9 @@


    'axes': ('batch', 'cand_axis', 'REC')} self.data_dict['test']['cands_mat'] = { 'data': self.create_cands_mat('test', cache_match_type), - 'axes': ('batch', 'cand_axis', 'REC')}

    + 'axes': ('batch', 'cand_axis', 'REC')} -
    [docs] def load_data(self): +
    [docs] def load_data(self): """ Fetch and extract the Facebook bAbI-dialog dataset if not already downloaded. @@ -478,7 +474,7 @@


    return train_file, dev_file, test_file, cand_file, kb_file, vocab_file, vectorized_file

    -
    [docs] @staticmethod +
    [docs] @staticmethod def parse_dialog(fn, use_time=True, use_speaker_tag=True): """ Given a dialog file, parse into user and bot utterances, adding time and speaker tags. @@ -537,7 +533,7 @@


    return all_dialogues

    -
    [docs] def words_to_vector(self, words): +
    [docs] def words_to_vector(self, words): """ Convert a list of words into vector form. @@ -550,7 +546,7 @@


    return [self.word_to_index[w] if w in self.vocab else self.word_to_index[ '<OOV>'] for w in words]

    -
    [docs] def one_hot_vector(self, answer): +
    [docs] def one_hot_vector(self, answer): """ Create one-hot representation of an answer. @@ -564,7 +560,7 @@


    vector[self.candidate_answers.index(answer)] = 1 return vector

    -
    [docs] def vectorize_stories(self, data): +
    [docs] def vectorize_stories(self, data): """ Convert (memory, user_utt, answer) word data into vectors. @@ -597,7 +593,7 @@


    return (m, m_mask, u, a)

    -
    [docs] def vectorize_cands(self, data): +
    [docs] def vectorize_cands(self, data): """ Convert candidate answer word data into vectors. @@ -616,7 +612,7 @@


    c = pad_sentences(c, self.max_cand_len) return c

    -
    [docs] def get_vocab(self, dialog): +
    [docs] def get_vocab(self, dialog): """ Compute vocabulary from the set of dialogs. """ @@ -647,7 +643,7 @@


    vocab = list(set(all_words)) return vocab

    -
    [docs] def compute_statistics(self): +
    [docs] def compute_statistics(self): """ Compute vocab, word index, and max length of stories and queries. """ @@ -697,14 +693,14 @@


    else: self.max_cand_len = self.max_cand_len_pre_match

    -
    [docs] @staticmethod +
    [docs] @staticmethod def clean_cands(cand): """ Remove leading line number and final newline from candidate answer """ return ' '.join(cand.split(' ')[1:]).replace('\n', '')
    -
    [docs] def load_candidate_answers(self): +
    [docs] def load_candidate_answers(self): """ Load candidate answers from file, compute number, and store for final softmax """ @@ -718,7 +714,7 @@


    map(lambda x: x.split(' '), self.candidate_answers)) return candidate_answers_w

    -
    [docs] def process_interactive( +
    [docs] def process_interactive( self, line_in, context, @@ -802,7 +798,7 @@


    return user_utt_pad, context, memory_pad, cands_mat, time_feat

    -
    [docs] def load_kb(self): +
    [docs] def load_kb(self): """ Load knowledge base from file, parse into entities and types """ @@ -820,7 +816,7 @@


    return kb_ents_to_type

    -
    [docs] def create_match_maps(self): +
    [docs] def create_match_maps(self): """ Create dictionary mapping from each entity in the knowledge base to the set of indicies in the candidate_answers array that contain that entity. Will be used for @@ -840,7 +836,7 @@


    return kb_ents_to_cand_idxs

    -
    [docs] def encode_match_feats(self): +
    [docs] def encode_match_feats(self): """ Replace entity names and match type names with indexes """ @@ -854,7 +850,7 @@


    self.word_to_index[k]: v for k, v in self.match_type_idxs.items()}

    -
    [docs] def create_cands_mat(self, data_split, cache_match_type): +
[docs] def create_cands_mat(self, data_split, cache_match_type): + """ + Add match type features to candidate answers for each example in the dataset. + Caches once complete.

diff --git a/docs/_modules/nlp_architect/data/cdc_resources/data_types/wiki/wikipedia_page.html b/docs/_modules/nlp_architect/data/cdc_resources/data_types/wiki/wikipedia_page.html
new file mode 100644
index 00000000..d52ba856
--- /dev/null
+++ b/docs/_modules/nlp_architect/data/cdc_resources/data_types/wiki/wikipedia_page.html
@@ -0,0 +1,314 @@
+ nlp_architect.data.cdc_resources.data_types.wiki.wikipedia_page — NLP Architect by Intel® AI Lab 0.5 documentation
    + + + +
    + + + + + +
    + +
    + + + + + + + + + + + + + + + + + +
    + +
      + +
    • Docs »
    • + +
    • Module code »
    • + +
    • nlp_architect.data.cdc_resources.data_types.wiki.wikipedia_page
    • + + +
    • + +
    • + +
    + + +
    +
    +
    +
    + +

    Source code for nlp_architect.data.cdc_resources.data_types.wiki.wikipedia_page

    +# ******************************************************************************
    +# Copyright 2017-2018 Intel Corporation
    +#
    +# Licensed under the Apache License, Version 2.0 (the "License");
    +# you may not use this file except in compliance with the License.
    +# You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +# ******************************************************************************
    +import re
    +from typing import Dict
    +
    +from nlp_architect.data.cdc_resources.data_types.wiki.wikipedia_page_extracted_relations import \
    +    WikipediaPageExtractedRelations, DISAMBIGUATION_TITLE
    +from nlp_architect.utils.string_utils import StringUtils
    +
    +
    +
[docs]class WikipediaPage(object): + def __init__(self, orig_phrase: str = None, orig_phrase_norm: str = None, + wiki_title: str = None, wiki_title_norm: str = None, + score: int = 0, pageid: int = 0, description: str = None, + relations: WikipediaPageExtractedRelations = None) -> None: + """ + Object representing a Wikipedia page and its extracted fields. + + Args: + orig_phrase (str): original search phrase + orig_phrase_norm (str): original search phrase normalized + wiki_title (str): page title + wiki_title_norm (str): page title normalized + score (int): score for getting wiki_title from orig_phrase + pageid (int): the unique page identifier + description (str, optional): the page description + relations (WikipediaPageExtractedRelations): Object that represents all + extracted Wikipedia relations + """ + self.orig_phrase = orig_phrase + if orig_phrase_norm is None: + self.orig_phrase_norm = StringUtils.normalize_str(orig_phrase) + else: + self.orig_phrase_norm = orig_phrase_norm + + self.wiki_title = wiki_title.replace(DISAMBIGUATION_TITLE, '') + if wiki_title_norm is None: + self.wiki_title_norm = StringUtils.normalize_str(wiki_title) + else: + self.wiki_title_norm = wiki_title_norm + + self.score = score + self.pageid = int(pageid) + self.description = description + self.relations = relations +
    [docs] def toJson(self) -> Dict: + result_dict = {} + result_dict['orig_phrase'] = self.orig_phrase + result_dict['orig_phrase_norm'] = self.orig_phrase_norm + result_dict['wiki_title'] = self.wiki_title + result_dict['wiki_title_norm'] = self.wiki_title_norm + result_dict['score'] = self.score + result_dict['pageid'] = self.pageid + result_dict['description'] = self.description + result_dict['relations'] = self.relations.toJson() + return result_dict
    + + def __eq__(self, other): + return self.orig_phrase == other.orig_phrase and self.wiki_title == other.wiki_title and \ + self.pageid == other.pageid + + def __hash__(self): + return hash(self.orig_phrase) + hash(self.pageid) + hash(self.wiki_title) + + def __str__(self) -> str: + result_str = '' + try: + title_strip = re.sub(u'(\u2018|\u2019)', '\'', self.orig_phrase) + wiki_title_strip = re.sub(u'(\u2018|\u2019)', '\'', self.wiki_title) + result_str = str(title_strip) + ', ' + str(wiki_title_strip) + ', ' + \ + str(self.score) + ', ' + str(self.pageid) + ', ' + \ + str(self.description) + ', ' + str(self.relations) + except Exception: + result_str = 'error in to_string()' + + return result_str
\ No newline at end of file

diff --git a/docs/_modules/nlp_architect/data/cdc_resources/data_types/wiki/wikipedia_page_extracted_relations.html b/docs/_modules/nlp_architect/data/cdc_resources/data_types/wiki/wikipedia_page_extracted_relations.html
new file mode 100644
index 00000000..34741de0
--- /dev/null
+++ b/docs/_modules/nlp_architect/data/cdc_resources/data_types/wiki/wikipedia_page_extracted_relations.html
@@ -0,0 +1,423 @@
+ nlp_architect.data.cdc_resources.data_types.wiki.wikipedia_page_extracted_relations — NLP Architect by Intel® AI Lab 0.5 documentation

    Source code for nlp_architect.data.cdc_resources.data_types.wiki.wikipedia_page_extracted_relations

    +# ******************************************************************************
    +# Copyright 2017-2018 Intel Corporation
    +#
    +# Licensed under the Apache License, Version 2.0 (the "License");
    +# you may not use this file except in compliance with the License.
    +# You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +# ******************************************************************************
    +import re
    +import string
    +from typing import Set, Dict
    +
    +from nlp_architect.utils.string_utils import StringUtils
    +
    +PART_NAME_CATEGORIES = ['name', 'given name', 'surname']
    +DISAMBIGUATION_TITLE = '(disambiguation)'
    +DISAMBIGUATION_CATEGORY = ['disambig', 'disambiguation']
    +
    +
    +
[docs]class WikipediaPageExtractedRelations(object): + def __init__(self, is_part_name: bool = False, is_disambiguation: bool = False, + parenthesis: Set[str] = None, + disambiguation_links: Set[str] = None, categories: Set[str] = None, + aliases: Set[str] = None, + be_comp: Set[str] = None, + disambiguation_links_norm: Set[str] = None, categories_norm: Set[str] = None, + aliases_norm: Set[str] = None, + title_parenthesis_norm: Set[str] = None, be_comp_norm: Set[str] = None) -> None: + """ + Object representing a Wikipedia relations schema + + Args: + is_part_name (bool): Whether the page title is part of a name (i.e. family name/given name) + is_disambiguation (bool): Whether the page is a disambiguation page + parenthesis (set): a set of all parenthesis links/titles + disambiguation_links (set): a set of all disambiguation links/titles + categories (set): a set of all category links/titles + aliases (set): a set of all aliases links/titles + be_comp (set): a set of all "is a" links/titles + disambiguation_links_norm (set): same as disambiguation_links, just normalized + categories_norm (set): same as categories, just normalized, lower and clean + aliases_norm (set): same as aliases, just normalized, lower and clean + title_parenthesis_norm (set): same as parenthesis, just normalized, lower and clean + be_comp_norm (set): same as be_comp, just normalized, lower and clean + """ + self.is_part_name = is_part_name + self.is_disambiguation = is_disambiguation + self.disambiguation_links = disambiguation_links + self.title_parenthesis = parenthesis + self.categories = categories + self.aliases = aliases + self.be_comp = be_comp + + self.disambiguation_links_norm = disambiguation_links_norm + self.categories_norm = categories_norm + self.aliases_norm = aliases_norm + self.title_parenthesis_norm = title_parenthesis_norm + self.be_comp_norm = be_comp_norm +
    [docs] def extract_relations_from_text_v0(self, text): + self.disambiguation_links = set() + self.categories = set() + self.title_parenthesis = set() + + self.disambiguation_links_norm = set() + self.categories_norm = set() + self.title_parenthesis_norm = set() + self.be_comp_norm = set() + + ext_links = set() + title_parenthesis = set() + + text_lines = text.split('\n') + for line in text_lines: + cat_links = self.extract_categories(line) + if not self.is_part_name: + self.is_part_name = self.is_name_part(line) + if not self.is_part_name and [s for s in PART_NAME_CATEGORIES if s in cat_links]: + self.is_part_name = True + + self.categories.update(cat_links) + self.categories_norm.update(StringUtils.normalize_string_list(cat_links)) + + links, parenthesis_links = self.extract_links_and_parenthesis(line) + ext_links.update(links) + title_parenthesis.update(parenthesis_links) + + if self.is_disambiguation: + self.disambiguation_links = ext_links + self.disambiguation_links_norm = StringUtils.normalize_string_list(ext_links) + self.title_parenthesis = title_parenthesis + self.title_parenthesis_norm = StringUtils.normalize_string_list(title_parenthesis)
    + + def __str__(self) -> str: + return str(self.is_disambiguation) + ', ' + str(self.is_part_name) + ', ' + \ + str(self.disambiguation_links) + ', ' + str(self.be_comp) + ', ' + str( + self.title_parenthesis) + ', ' + str(self.categories) + +
    [docs] def toJson(self) -> Dict: + result_dict = dict() + result_dict['isPartName'] = self.is_part_name + result_dict['isDisambiguation'] = self.is_disambiguation + + if self.disambiguation_links is not None: + result_dict['disambiguationLinks'] = list(self.disambiguation_links) + result_dict['disambiguationLinksNorm'] = list(self.disambiguation_links_norm) + if self.categories is not None: + result_dict['categories'] = list(self.categories) + result_dict['categoriesNorm'] = list(self.categories_norm) + if self.aliases is not None: + result_dict['aliases'] = list(self.aliases) + if self.title_parenthesis is not None: + result_dict['titleParenthesis'] = list(self.title_parenthesis) + result_dict['titleParenthesisNorm'] = list(self.title_parenthesis_norm) + if self.be_comp_norm is not None: + result_dict['beCompRelations'] = list(self.be_comp) + result_dict['beCompRelationsNorm'] = list(self.be_comp_norm) + + return result_dict
    + +
    [docs] @staticmethod + def extract_categories(line: str) -> Set[str]: + categories = set() + category_form1 = re.findall(r'\[\[Category:(.*)\]\]', line) + for cat in category_form1: + if DISAMBIGUATION_TITLE in cat: + cat = cat.replace(DISAMBIGUATION_TITLE, '') + categories.add(cat) + + prog = re.search('^{{(disambig.*|Disambig.*)}}$', line) + if prog is not None: + category_form2 = prog.group(1) + cats = category_form2.split('|') + categories.update(cats) + + return categories
    + + + +
    [docs] @staticmethod + def is_name_part(line: str) -> bool: + line = line.lower() + val = False + if WikipediaPageExtractedRelations.find_in_line(line, '===as surname==='): + val = True + elif WikipediaPageExtractedRelations.find_in_line(line, '===as given name==='): + val = True + elif WikipediaPageExtractedRelations.find_in_line(line, '===given names==='): + val = True + elif WikipediaPageExtractedRelations.find_in_line(line, '==as a surname=='): + val = True + elif WikipediaPageExtractedRelations.find_in_line(line, '==people with the surname=='): + val = True + elif WikipediaPageExtractedRelations.find_in_line(line, '==family name and surname=='): + val = True + elif WikipediaPageExtractedRelations.find_in_line(line, 'category:given names'): + val = True + elif WikipediaPageExtractedRelations.find_in_line(line, '{{given name}}'): + val = True + return val
    + +
    [docs] @staticmethod + def find_in_line(text: str, pattern: str) -> bool: + found = re.findall(pattern, text) + if found: + return True + return False
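The two category patterns above (`[[Category:...]]` links and whole-line `{{disambig...}}` templates) can be exercised in isolation. The snippet below mirrors `extract_categories` from the listing as a free function; it is a sketch for illustration, not the packaged API:

```python
import re

DISAMBIGUATION_TITLE = '(disambiguation)'

def extract_categories(line):
    categories = set()
    # Form 1: ordinary category links, e.g. [[Category:Surnames]]
    for cat in re.findall(r'\[\[Category:(.*)\]\]', line):
        categories.add(cat.replace(DISAMBIGUATION_TITLE, ''))
    # Form 2: a whole-line disambiguation template, e.g. {{disambiguation|geo}}
    prog = re.search('^{{(disambig.*|Disambig.*)}}$', line)
    if prog is not None:
        categories.update(prog.group(1).split('|'))
    return categories

print(extract_categories('[[Category:Surnames]]'))        # {'Surnames'}
print(extract_categories('{{disambiguation|geo|name}}'))  # the three template fields
```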
\ No newline at end of file

diff --git a/docs/_modules/nlp_architect/data/cdc_resources/data_types/wiki/wikipedia_pages.html b/docs/_modules/nlp_architect/data/cdc_resources/data_types/wiki/wikipedia_pages.html
new file mode 100644
index 00000000..9d8e54ce
--- /dev/null
+++ b/docs/_modules/nlp_architect/data/cdc_resources/data_types/wiki/wikipedia_pages.html
@@ -0,0 +1,328 @@
+ nlp_architect.data.cdc_resources.data_types.wiki.wikipedia_pages — NLP Architect by Intel® AI Lab 0.5 documentation

    Source code for nlp_architect.data.cdc_resources.data_types.wiki.wikipedia_pages

    +# ******************************************************************************
    +# Copyright 2017-2018 Intel Corporation
    +#
    +# Licensed under the Apache License, Version 2.0 (the "License");
    +# you may not use this file except in compliance with the License.
    +# You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +# ******************************************************************************
    +
    +
    +
[docs]class WikipediaPages(object): + def __init__(self): + """ + Object representing a set of Wikipedia pages + """ + self.pages = set() + self.is_empty_norm_phrase = True +
    [docs] def get_pages(self): + return self.pages
    + +
    [docs] def add_page(self, page): + self.pages.add(page) + if page.orig_phrase_norm is not None and page.orig_phrase_norm != '': + self.is_empty_norm_phrase = False
    + +
    [docs] def get_and_set_all_disambiguation(self): + all_disambiguations = [] + for page in self.pages: + if page.relations.disambiguation_links_norm is not None: + all_disambiguations.extend(page.relations.disambiguation_links_norm) + if page.relations.disambiguation_links is not None: + all_disambiguations.extend(page.relations.disambiguation_links) + return set(all_disambiguations)
    + +
    [docs] def get_and_set_all_categories(self): + all_categories = [] + for page in self.pages: + if page.relations.categories_norm is not None: + all_categories.extend(page.relations.categories_norm) + if page.relations.categories is not None: + all_categories.extend(page.relations.categories) + return set(all_categories)
    + +
    [docs] def get_and_set_all_aliases(self): + all_aliases = [] + for page in self.pages: + if page.relations.aliases_norm is not None: + all_aliases.extend(page.relations.aliases_norm) + if page.relations.aliases is not None: + all_aliases.extend(page.relations.aliases) + return set(all_aliases)
    + +
    [docs] def get_and_set_parenthesis(self): + all_parenthesis = [] + for page in self.pages: + if page.relations.title_parenthesis_norm is not None: + all_parenthesis.extend(page.relations.title_parenthesis_norm) + if page.relations.title_parenthesis is not None: + all_parenthesis.extend(page.relations.title_parenthesis) + return set(all_parenthesis)
    + +
    [docs] def get_and_set_be_comp(self): + all_be_comp = [] + for page in self.pages: + if page.relations.be_comp_norm is not None: + all_be_comp.extend(page.relations.be_comp_norm) + if page.relations.be_comp is not None: + all_be_comp.extend(page.relations.be_comp) + return set(all_be_comp)
    + +
    [docs] def get_and_set_titles(self): + all_titles = [] + for page in self.pages: + if page.orig_phrase != '': + all_titles.append(page.orig_phrase) + all_titles.append(page.orig_phrase_norm) + if page.wiki_title != '': + all_titles.append(page.wiki_title) + all_titles.append(page.wiki_title_norm) + return set(all_titles)
    + +
    [docs] def toJson(self): + result_dict = {} + page_list = [] + for page in self.pages: + page_list.append(page.toJson()) + + result_dict['pages'] = page_list + return result_dict
    + + def __str__(self) -> str: + result_str = '' + for page in self.pages: + result_str += str(page) + ', ' + + return result_str.strip()
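All of the `get_and_set_all_*` accessors above repeat one aggregation pattern: union the normalized and raw variants of a relation field across every page, skipping `None`. A generic stand-alone version of that pattern (illustrative only; the field names are taken from the listing, and `SimpleNamespace` stands in for real page objects):

```python
from types import SimpleNamespace

def aggregate_relation(pages, field, field_norm):
    # Union `relations.<field_norm>` and `relations.<field>` over all pages,
    # skipping fields that are None, as the accessors above do.
    values = []
    for page in pages:
        for name in (field_norm, field):
            field_values = getattr(page.relations, name)
            if field_values is not None:
                values.extend(field_values)
    return set(values)

pages = [
    SimpleNamespace(relations=SimpleNamespace(categories={'Physics'}, categories_norm={'physics'})),
    SimpleNamespace(relations=SimpleNamespace(categories=None, categories_norm={'math'})),
]
print(aggregate_relation(pages, 'categories', 'categories_norm'))  # union of the three values
```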
\ No newline at end of file

diff --git a/docs/_modules/nlp_architect/data/cdc_resources/data_types/wn/wordnet_page.html b/docs/_modules/nlp_architect/data/cdc_resources/data_types/wn/wordnet_page.html
new file mode 100644
index 00000000..ee9bd1ed
--- /dev/null
+++ b/docs/_modules/nlp_architect/data/cdc_resources/data_types/wn/wordnet_page.html
@@ -0,0 +1,295 @@
+ nlp_architect.data.cdc_resources.data_types.wn.wordnet_page — NLP Architect by Intel® AI Lab 0.5 documentation

    Source code for nlp_architect.data.cdc_resources.data_types.wn.wordnet_page

    +# ******************************************************************************
    +# Copyright 2017-2018 Intel Corporation
    +#
    +# Licensed under the Apache License, Version 2.0 (the "License");
    +# you may not use this file except in compliance with the License.
    +# You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +# ******************************************************************************
    +from typing import Set, Dict
    +
    +
    +
[docs]class WordnetPage(object): + def __init__(self, orig_phrase: str, clean_phrase: str, head: str, head_lemma: str, + head_synonyms: Set[str], head_lemma_synonyms: Set[str], + head_derivationally: Set[str], head_lemma_derivationally: Set[str], + all_clean_words_synonyms: Set[str]) -> None: + """ + Object representing a WordNet page and its extracted fields. + + Args: + orig_phrase (str): original search phrase + clean_phrase (str): original search phrase normalized + head (str): page title head + head_lemma (str): page title head lemma + head_synonyms (set): head synonyms words extracted from wordnet + head_lemma_synonyms (set): head lemma synonyms words extracted from wordnet + head_derivationally (set): wordnet head derivationally_related_forms() + head_lemma_derivationally (set): wordnet head lemma derivationally_related_forms() + all_clean_words_synonyms (set): clean_phrase wordnet synonyms + """ + self.orig_phrase = orig_phrase + self.clean_phrase = clean_phrase + self.head = head + self.head_lemma = head_lemma + self.head_synonyms = head_synonyms + self.head_lemma_synonyms = head_lemma_synonyms + self.head_derivationally = head_derivationally + self.head_lemma_derivationally = head_lemma_derivationally + self.all_clean_words_synonyms = all_clean_words_synonyms + + def __eq__(self, other): + return self.orig_phrase == other.orig_phrase and self.head == other.head and \ + self.head_lemma == other.head_lemma + + def __hash__(self): + return hash(self.orig_phrase) + hash(self.head) + hash(self.head_lemma) +
    [docs] def toJson(self) -> Dict: + result_dict = dict() + result_dict['orig_phrase'] = self.orig_phrase + result_dict['clean_phrase'] = self.clean_phrase + result_dict['head'] = self.head + result_dict['head_lemma'] = self.head_lemma + result_dict['head_synonyms'] = list(self.head_synonyms) + result_dict['head_lemma_synonyms'] = list(self.head_lemma_synonyms) + result_dict['head_derivationally'] = list(self.head_derivationally) + result_dict['head_lemma_derivationally'] = list(self.head_lemma_derivationally) + if self.all_clean_words_synonyms is not None: + all_as_list = [] + for set_ in self.all_clean_words_synonyms: + all_as_list.append(list(set_)) + result_dict['all_clean_words_synonyms'] = all_as_list + + return result_dict
\ No newline at end of file

diff --git a/docs/_modules/nlp_architect/data/cdc_resources/embedding/embed_elmo.html b/docs/_modules/nlp_architect/data/cdc_resources/embedding/embed_elmo.html
new file mode 100644
index 00000000..d8116de5
--- /dev/null
+++ b/docs/_modules/nlp_architect/data/cdc_resources/embedding/embed_elmo.html
@@ -0,0 +1,337 @@
+ nlp_architect.data.cdc_resources.embedding.embed_elmo — NLP Architect by Intel® AI Lab 0.5 documentation

    Source code for nlp_architect.data.cdc_resources.embedding.embed_elmo

    +# ******************************************************************************
    +# Copyright 2017-2018 Intel Corporation
    +#
    +# Licensed under the Apache License, Version 2.0 (the "License");
    +# you may not use this file except in compliance with the License.
    +# You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +# ******************************************************************************
    +
    +import logging
    +import pickle
    +from typing import List
    +
    +import numpy as np
    +
    +from nlp_architect.common.cdc.mention_data import MentionDataLight
    +from nlp_architect.utils.embedding import ELMoEmbedderTFHUB
    +
    +logger = logging.getLogger(__name__)
    +
    +
    +
[docs]class ElmoEmbedding(object): + def __init__(self): + logger.info('Loading Elmo Embedding module') + self.embeder = ELMoEmbedderTFHUB() + self.cache = dict() + logger.info('Elmo Embedding module loaded successfully') + +
    [docs] def get_head_feature_vector(self, mention: MentionDataLight): + if mention.mention_context is not None and mention.mention_context: + sentence = ' '.join(mention.mention_context) + return self.apply_get_from_cache(sentence, True, mention.tokens_number) + + sentence = mention.tokens_str + return self.apply_get_from_cache(sentence, False, [])
    + +
    [docs] def apply_get_from_cache(self, sentence: str, context: bool = False, indexs: List[int] = None): + if context and indexs is not None: + if sentence in self.cache: + elmo_full_vec = self.cache[sentence] + else: + elmo_full_vec = self.embeder.get_vector(sentence.split()) + self.cache[sentence] = elmo_full_vec + + elmo_ret_vec = self.get_mention_vec_from_sent(elmo_full_vec, indexs) + else: + if sentence in self.cache: + elmo_ret_vec = self.cache[sentence] + else: + elmo_ret_vec = self.get_elmo_avg(sentence.split()) + self.cache[sentence] = elmo_ret_vec + + return elmo_ret_vec
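`apply_get_from_cache` above wraps the expensive encoder call in a per-sentence memo dictionary. The caching pattern in isolation, with a trivial stand-in for the real `ELMoEmbedderTFHUB` encoder (the stand-in is hypothetical, for illustration only):

```python
def make_cached_embedder(embed_fn):
    # Compute embed_fn(sentence) at most once per distinct sentence,
    # mirroring the `if sentence in self.cache` branches above.
    cache = {}
    stats = {'misses': 0}

    def embed(sentence):
        if sentence not in cache:
            stats['misses'] += 1
            cache[sentence] = embed_fn(sentence)
        return cache[sentence]

    embed.stats = stats
    return embed

embed = make_cached_embedder(lambda s: [len(w) for w in s.split()])
embed('the cat sat')
embed('the cat sat')          # second call is served from the cache
print(embed.stats['misses'])  # 1
```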
    + +
    [docs] def get_avrg_feature_vector(self, tokens_str): + if tokens_str is not None: + return self.apply_get_from_cache(tokens_str) + return None
    + +
    [docs] def get_elmo_avg(self, sentence): + sentence_embedding = self.embeder.get_vector(sentence) + return np.mean(sentence_embedding, axis=0)
    + +
    [docs] @staticmethod + def get_mention_vec_from_sent(sent_vec, indexs): + if len(indexs) > 1: + elmo_ret_vec = np.mean(sent_vec[indexs[0]: indexs[-1] + 1], axis=0) + else: + elmo_ret_vec = sent_vec[indexs[0]] + + return elmo_ret_vec
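`get_mention_vec_from_sent` above reduces a multi-token mention to the mean of its token vectors, or returns a single token's vector unchanged. The arithmetic on a toy sentence matrix (a stand-alone copy of the static method, for illustration):

```python
import numpy as np

def get_mention_vec_from_sent(sent_vec, indexs):
    # Mean of rows indexs[0]..indexs[-1] inclusive; a single index
    # returns that row unchanged.
    if len(indexs) > 1:
        return np.mean(sent_vec[indexs[0]: indexs[-1] + 1], axis=0)
    return sent_vec[indexs[0]]

sent = np.array([[1., 1.], [3., 5.], [5., 9.]])  # one row per token
print(get_mention_vec_from_sent(sent, [1, 2]))  # [4. 7.]
print(get_mention_vec_from_sent(sent, [0]))     # [1. 1.]
```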
    + + +
[docs]class ElmoEmbeddingOffline(object): + def __init__(self, dump_file): + logger.info('Loading Elmo Offline Embedding module') + + if dump_file is not None: + with open(dump_file, 'rb') as out: + self.embeder = pickle.load(out) + else: + logger.warning('Elmo Offline without a loaded embedder!') + + logger.info('Elmo Offline Embedding module loaded successfully') + +
    [docs] def get_head_feature_vector(self, mention: MentionDataLight): + embed = None + if mention.mention_context is not None and mention.mention_context: + sentence = ' '.join(mention.mention_context) + if sentence in self.embeder: + elmo_full_vec = self.embeder[sentence] + return ElmoEmbedding.get_mention_vec_from_sent( + elmo_full_vec, mention.tokens_number) + + sentence = mention.tokens_str + if sentence in self.embeder: + embed = self.embeder[sentence] + + return embed
    + +
    [docs] def get_avrg_feature_vector(self, tokens_str): + embed = None + if tokens_str in self.embeder: + embed = self.embeder[tokens_str] + + return embed
    +
\ No newline at end of file

diff --git a/docs/_modules/nlp_architect/data/cdc_resources/embedding/embed_glove.html b/docs/_modules/nlp_architect/data/cdc_resources/embedding/embed_glove.html
new file mode 100644
index 00000000..a93e4e5b
--- /dev/null
+++ b/docs/_modules/nlp_architect/data/cdc_resources/embedding/embed_glove.html
@@ -0,0 +1,303 @@
+ nlp_architect.data.cdc_resources.embedding.embed_glove — NLP Architect by Intel® AI Lab 0.5 documentation
    + + + +
    + + + + + +
    + +
    + + + + + + + + + + + + + + + + + +
    + +
      + +
    • Docs »
    • + +
    • Module code »
    • + +
    • nlp_architect.data.cdc_resources.embedding.embed_glove
    • + + +
    • + +
    • + +
    + + +
    +
    +
    +
    + +

    Source code for nlp_architect.data.cdc_resources.embedding.embed_glove

    +# ******************************************************************************
    +# Copyright 2017-2018 Intel Corporation
    +#
    +# Licensed under the Apache License, Version 2.0 (the "License");
    +# you may not use this file except in compliance with the License.
    +# You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +# ******************************************************************************
    +
    +import logging
    +import pickle
    +
    +import numpy as np
    +
    +from nlp_architect.common.cdc.mention_data import MentionDataLight
    +
    +logger = logging.getLogger(__name__)
    +
    +
    +
    [docs]class GloveEmbedding(object): + def __init__(self, glove_file): + logger.info('Loading Glove Online Embedding module, this may take a while...') + self.word_to_ix, self.word_embeddings = self.load_glove_for_vocab(glove_file) + logger.info('Glove Online Embedding module loaded successfully') + +
    [docs] @staticmethod + def load_glove_for_vocab(glove_filename): + vocab = [] + embd = [] + with open(glove_filename) as glove_file: + for line in glove_file: + row = line.strip().split(' ') + word = row[0] + vocab.append(word) + embd.append(row[1:]) + + embeddings = np.asarray(embd, dtype=float) + word_to_ix = {word: i for i, word in enumerate(vocab)} + return word_to_ix, embeddings
    + + +
    [docs]class GloveEmbeddingOffline(object): + def __init__(self, embed_resources): + logger.info('Loading Glove Offline Embedding module') + with open(embed_resources, 'rb') as out: + self.word_to_ix, self.word_embeddings = pickle.load(out, encoding='latin1') + logger.info('Glove Offline Embedding module loaded successfully') + +
    [docs] def get_feature_vector(self, mention: MentionDataLight): + embed = None + head = mention.mention_head + lemma = mention.mention_head_lemma + if head in self.word_to_ix: + embed = self.word_embeddings[self.word_to_ix[head]] + elif lemma in self.word_to_ix: + embed = self.word_embeddings[self.word_to_ix[lemma]] + + return embed
    + +
    [docs] def get_avrg_feature_vector(self, tokens_str): + embed = np.zeros(300, dtype=np.float64) + mention_size = 0 + for token in tokens_str.split(): + if token in self.word_to_ix: + token_embed = self.word_embeddings[self.word_to_ix[token]] + embed = np.add(embed, token_embed) + mention_size += 1 + + if mention_size == 0: + mention_size = 1 + + return np.true_divide(embed, mention_size)
    +
    + +
    + +
    + + +
    +
    + +
    + +
    + + + + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/nlp_architect/data/cdc_resources/gen_scripts/create_word_embed_elmo_dump.html b/docs/_modules/nlp_architect/data/cdc_resources/gen_scripts/create_word_embed_elmo_dump.html new file mode 100644 index 00000000..375c35fd --- /dev/null +++ b/docs/_modules/nlp_architect/data/cdc_resources/gen_scripts/create_word_embed_elmo_dump.html @@ -0,0 +1,311 @@ + + + + + + + + + + + nlp_architect.data.cdc_resources.gen_scripts.create_word_embed_elmo_dump — NLP Architect by Intel® AI Lab 0.5 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + +
    + + + + + +
    + +
    + + + + + + + + + + + + + + + + + +
    + +
      + +
    • Docs »
    • + +
    • Module code »
    • + +
    • nlp_architect.data.cdc_resources.gen_scripts.create_word_embed_elmo_dump
    • + + +
    • + +
    • + +
    + + +
    +
    +
    +
    + +

    Source code for nlp_architect.data.cdc_resources.gen_scripts.create_word_embed_elmo_dump

    +# ******************************************************************************
    +# Copyright 2017-2018 Intel Corporation
    +#
    +# Licensed under the Apache License, Version 2.0 (the "License");
    +# you may not use this file except in compliance with the License.
    +# You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +# ******************************************************************************
    +import argparse
    +import logging
    +import os
    +import pickle
    +from os.path import join
    +
    +from nlp_architect.common.cdc.mention_data import MentionData
    +from nlp_architect.data.cdc_resources.embedding.embed_elmo import ElmoEmbedding
    +from nlp_architect.utils import io
    +
    +logging.basicConfig(level=logging.DEBUG)
    +logger = logging.getLogger(__name__)
    +
    +
    +
    [docs]def load_elmo_for_vocab(mentions): + """ + Create the embeddings using the cache logic in the embedding class + Args: + mentions: list of mentions to embed + + Returns: + the embedding cache of the ElmoEmbedding instance + + """ + elmo_embeddings = ElmoEmbedding() + + for mention in mentions: + elmo_embeddings.get_head_feature_vector(mention) + + logger.info('Total words/contexts in vocabulary %d', len(elmo_embeddings.cache)) + return elmo_embeddings.cache
    + + +
    [docs]def elmo_dump(): + out_file = args.output + mention_files = list() + if os.path.isdir(args.mentions): + for (dirpath, _, files) in os.walk(args.mentions): + for file in files: + if file == '.DS_Store': + continue + + mention_files.append(join(dirpath, file)) + else: + mention_files.append(args.mentions) + + mentions = [] + for _file in mention_files: + mentions.extend(MentionData.read_mentions_json_to_mentions_data_list(_file)) + + elmo_ecb_embeddings = load_elmo_for_vocab(mentions) + + with open(out_file, 'wb') as f: + pickle.dump(elmo_ecb_embeddings, f) + + logger.info('Saving dump to file-%s', out_file)
    + + +if __name__ == '__main__': + parser = argparse.ArgumentParser(description='Create Elmo Embedding dataset dump') + parser.add_argument('--mentions', type=str, help='mentions file', required=True) + parser.add_argument('--output', type=str, help='location where to create the dump file', + required=True) + + args = parser.parse_args() + + if os.path.isdir(args.mentions): + io.validate_existing_directory(args.mentions) + else: + io.validate_existing_filepath(args.mentions) + + elmo_dump() + print('Done!') +
    + +
    + +
    + + +
    +
    + +
    + +
    + + + + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/nlp_architect/data/cdc_resources/relations/computed_relation_extraction.html b/docs/_modules/nlp_architect/data/cdc_resources/relations/computed_relation_extraction.html index f0fba6c5..04d917a2 100644 --- a/docs/_modules/nlp_architect/data/cdc_resources/relations/computed_relation_extraction.html +++ b/docs/_modules/nlp_architect/data/cdc_resources/relations/computed_relation_extraction.html @@ -8,7 +8,7 @@ - nlp_architect.data.cdc_resources.relations.computed_relation_extraction — NLP Architect by Intel® AI Lab 0.4.post2 documentation + nlp_architect.data.cdc_resources.relations.computed_relation_extraction — NLP Architect by Intel® AI Lab 0.5 documentation @@ -25,6 +25,8 @@ + + @@ -33,8 +35,9 @@ - - + + + @@ -55,7 +58,7 @@ - + @@ -81,31 +84,27 @@

    NLP/NLU Models

    +

    Optimized Models

    +

    Solutions

    @@ -114,14 +113,11 @@
  • Set Expansion
  • Trend Analysis
  • -

    Pipelines

    -

    For Developers

    @@ -214,11 +210,11 @@

    Source code for nlp_architect.data.cdc_resources.relations.computed_relation logger = logging.getLogger(__name__) -
    [docs]class ComputedRelationExtraction(RelationExtraction): +
    [docs]class ComputedRelationExtraction(RelationExtraction): """ Extract Relation between two mentions according to computation and rule based algorithms """ -
    [docs] def extract_all_relations(self, mention_x: MentionDataLight, +
    [docs] def extract_all_relations(self, mention_x: MentionDataLight, mention_y: MentionDataLight) -> Set[RelationType]: """ Try to find if mentions has anyone or more of the relations this class support @@ -259,7 +255,7 @@

    Source code for nlp_architect.data.cdc_resources.relations.computed_relation return relations

    -
    [docs] def extract_sub_relations(self, mention_x: MentionDataLight, mention_y: MentionDataLight, +
    [docs] def extract_sub_relations(self, mention_x: MentionDataLight, mention_y: MentionDataLight, relation: RelationType) -> RelationType: """ Check if input mentions has the given relation between them @@ -293,7 +289,7 @@

    Source code for nlp_architect.data.cdc_resources.relations.computed_relation return RelationType.NO_RELATION_FOUND

    -
    [docs] @staticmethod +
    [docs] @staticmethod def extract_same_head_lemma(mention_x: MentionDataLight, mention_y: MentionDataLight) -> RelationType: """ @@ -315,7 +311,7 @@

    Source code for nlp_architect.data.cdc_resources.relations.computed_relation return RelationType.SAME_HEAD_LEMMA return RelationType.NO_RELATION_FOUND

    -
    [docs] @staticmethod +
    [docs] @staticmethod def extract_fuzzy_head_fit(mention_x: MentionDataLight, mention_y: MentionDataLight) -> RelationType: """ @@ -339,7 +335,7 @@

    Source code for nlp_architect.data.cdc_resources.relations.computed_relation return RelationType.FUZZY_HEAD_FIT return RelationType.NO_RELATION_FOUND

    -
    [docs] @staticmethod +
    [docs] @staticmethod def extract_fuzzy_fit(mention_x: MentionDataLight, mention_y: MentionDataLight) -> RelationType: """ @@ -371,7 +367,7 @@

    Source code for nlp_architect.data.cdc_resources.relations.computed_relation relation = RelationType.FUZZY_FIT return relation

    -
    [docs] @staticmethod +
    [docs] @staticmethod def extract_exact_string(mention_x: MentionDataLight, mention_y: MentionDataLight) -> RelationType: """ @@ -396,7 +392,7 @@

    Source code for nlp_architect.data.cdc_resources.relations.computed_relation return relation

    -
    [docs] @staticmethod +
    [docs] @staticmethod def get_supported_relations() -> List[RelationType]: """ Return all supported relations by this class diff --git a/docs/_modules/nlp_architect/data/cdc_resources/relations/referent_dict_relation_extraction.html b/docs/_modules/nlp_architect/data/cdc_resources/relations/referent_dict_relation_extraction.html index 940c6582..46b24a0a 100644 --- a/docs/_modules/nlp_architect/data/cdc_resources/relations/referent_dict_relation_extraction.html +++ b/docs/_modules/nlp_architect/data/cdc_resources/relations/referent_dict_relation_extraction.html @@ -8,7 +8,7 @@ - nlp_architect.data.cdc_resources.relations.referent_dict_relation_extraction — NLP Architect by Intel® AI Lab 0.4.post2 documentation + nlp_architect.data.cdc_resources.relations.referent_dict_relation_extraction — NLP Architect by Intel® AI Lab 0.5 documentation @@ -25,6 +25,8 @@ + + @@ -33,8 +35,9 @@ - - + + + @@ -55,7 +58,7 @@ - + @@ -81,31 +84,27 @@

    NLP/NLU Models

    +

    Optimized Models

    +

    Solutions

    @@ -114,14 +113,11 @@
  • Set Expansion
  • Trend Analysis
  • -

    Pipelines

    -

    For Developers

    @@ -215,8 +211,8 @@

    Source code for nlp_architect.data.cdc_resources.relations.referent_dict_rel logger = logging.getLogger(__name__) -
    [docs]class ReferentDictRelationExtraction(RelationExtraction): -
    [docs] def __init__(self, method: OnlineOROfflineMethod = OnlineOROfflineMethod.ONLINE, +
    [docs]class ReferentDictRelationExtraction(RelationExtraction): + def __init__(self, method: OnlineOROfflineMethod = OnlineOROfflineMethod.ONLINE, ref_dict: str = None): """ Extract Relation between two mentions according to Referent Dictionary knowledge @@ -236,15 +232,15 @@

    Source code for nlp_architect.data.cdc_resources.relations.referent_dict_rel else: raise FileNotFoundError('Referent Dict file not found or not in path:' + ref_dict) - super(ReferentDictRelationExtraction, self).__init__()

    + super(ReferentDictRelationExtraction, self).__init__() -
    [docs] def extract_all_relations(self, mention_x: MentionDataLight, +
    [docs] def extract_all_relations(self, mention_x: MentionDataLight, mention_y: MentionDataLight) -> Set[RelationType]: ret_ = set() ret_.add(self.extract_sub_relations(mention_x, mention_y, RelationType.REFERENT_DICT)) return ret_
    -
    [docs] def extract_sub_relations(self, mention_x: MentionDataLight, mention_y: MentionDataLight, +
    [docs] def extract_sub_relations(self, mention_x: MentionDataLight, mention_y: MentionDataLight, relation: RelationType) -> RelationType: """ Check if input mentions has the given relation between them @@ -272,7 +268,7 @@

    Source code for nlp_architect.data.cdc_resources.relations.referent_dict_rel return RelationType.NO_RELATION_FOUND

    -
    [docs] def is_referent_dict(self, mention_x: MentionDataLight, mention_y: MentionDataLight) -> bool: +
    [docs] def is_referent_dict(self, mention_x: MentionDataLight, mention_y: MentionDataLight) -> bool: """ Check if input mentions has referent dictionary relation between them @@ -304,7 +300,7 @@

    Source code for nlp_architect.data.cdc_resources.relations.referent_dict_rel return match_result

    -
    [docs] @staticmethod +
    [docs] @staticmethod def get_supported_relations(): """ Return all supported relations by this class @@ -314,7 +310,7 @@

    Source code for nlp_architect.data.cdc_resources.relations.referent_dict_rel """ return [RelationType.REFERENT_DICT]

    -
    [docs] @staticmethod +
    [docs] @staticmethod def load_reference_dict(dict_fname: str) -> Dict[str, List[str]]: """ Method to load referent dictionary to memory diff --git a/docs/_modules/nlp_architect/data/cdc_resources/relations/relation_extraction.html b/docs/_modules/nlp_architect/data/cdc_resources/relations/relation_extraction.html index 4cf17553..ec5b6f51 100644 --- a/docs/_modules/nlp_architect/data/cdc_resources/relations/relation_extraction.html +++ b/docs/_modules/nlp_architect/data/cdc_resources/relations/relation_extraction.html @@ -8,7 +8,7 @@ - nlp_architect.data.cdc_resources.relations.relation_extraction — NLP Architect by Intel® AI Lab 0.4.post2 documentation + nlp_architect.data.cdc_resources.relations.relation_extraction — NLP Architect by Intel® AI Lab 0.5 documentation @@ -25,6 +25,8 @@ + + @@ -33,8 +35,9 @@ - - + + + @@ -55,7 +58,7 @@ - + @@ -81,31 +84,27 @@

    NLP/NLU Models

    +

    Optimized Models

    +

    Solutions

    @@ -114,14 +113,11 @@
  • Set Expansion
  • Trend Analysis
  • -

    Pipelines

    -

    For Developers

    @@ -206,11 +202,11 @@

    Source code for nlp_architect.data.cdc_resources.relations.relation_extracti from nlp_architect.data.cdc_resources.relations.relation_types_enums import RelationType -class RelationExtraction(object): +
    [docs]class RelationExtraction(object): def __init__(self): pass - def extract_relation(self, mention_x: MentionDataLight, mention_y: MentionDataLight, +
    [docs] def extract_relation(self, mention_x: MentionDataLight, mention_y: MentionDataLight, relation: RelationType) -> RelationType: """ Base Class Check if Sub class support given relation before executing the sub class @@ -227,15 +223,15 @@

    Source code for nlp_architect.data.cdc_resources.relations.relation_extracti ret_relation = RelationType.NO_RELATION_FOUND if relation in self.get_supported_relations(): ret_relation = self.extract_sub_relations(mention_x, mention_y, relation) - return ret_relation + return ret_relation

    - def extract_sub_relations(self, mention_x: MentionDataLight, mention_y: MentionDataLight, +
    [docs] def extract_sub_relations(self, mention_x: MentionDataLight, mention_y: MentionDataLight, relation: RelationType) -> RelationType: - raise NotImplementedError + raise NotImplementedError
    - @staticmethod +
    [docs] @staticmethod def get_supported_relations() -> List[RelationType]: - raise NotImplementedError + raise NotImplementedError

    diff --git a/docs/_modules/nlp_architect/data/cdc_resources/relations/relation_types_enums.html b/docs/_modules/nlp_architect/data/cdc_resources/relations/relation_types_enums.html index 6f73bc3e..9a1ec37b 100644 --- a/docs/_modules/nlp_architect/data/cdc_resources/relations/relation_types_enums.html +++ b/docs/_modules/nlp_architect/data/cdc_resources/relations/relation_types_enums.html @@ -8,7 +8,7 @@ - nlp_architect.data.cdc_resources.relations.relation_types_enums — NLP Architect by Intel® AI Lab 0.4.post2 documentation + nlp_architect.data.cdc_resources.relations.relation_types_enums — NLP Architect by Intel® AI Lab 0.5 documentation @@ -25,6 +25,8 @@ + + @@ -33,8 +35,9 @@ - - + + + @@ -55,7 +58,7 @@ - + @@ -81,31 +84,27 @@

    NLP/NLU Models

    +

    Optimized Models

    +

    Solutions

    @@ -114,14 +113,11 @@
  • Set Expansion
  • Trend Analysis
  • -

    Pipelines

    -

    For Developers

    @@ -204,25 +200,25 @@

    Source code for nlp_architect.data.cdc_resources.relations.relation_types_en from enum import Enum -class EmbeddingMethod(Enum): +
    [docs]class EmbeddingMethod(Enum): GLOVE = 'glove' GLOVE_OFFLINE = 'glove_offline' ELMO = 'elmo' - ELMO_OFFLINE = 'elmo_offline' + ELMO_OFFLINE = 'elmo_offline'
    -class WikipediaSearchMethod(Enum): +
    [docs]class WikipediaSearchMethod(Enum): ONLINE = 'online' OFFLINE = 'offline' - ELASTIC = 'elastic' + ELASTIC = 'elastic'
    -class OnlineOROfflineMethod(Enum): +
    [docs]class OnlineOROfflineMethod(Enum): ONLINE = 'online' - OFFLINE = 'offline' + OFFLINE = 'offline'
    -
    [docs]class RelationType(Enum): +
    [docs]class RelationType(Enum): NO_RELATION_FOUND = 0 WIKIPEDIA_REDIRECT_LINK = 1 WIKIPEDIA_ALIASES = 2 diff --git a/docs/_modules/nlp_architect/data/cdc_resources/relations/verbocean_relation_extraction.html b/docs/_modules/nlp_architect/data/cdc_resources/relations/verbocean_relation_extraction.html index 2b05d5e8..d31cd0f1 100644 --- a/docs/_modules/nlp_architect/data/cdc_resources/relations/verbocean_relation_extraction.html +++ b/docs/_modules/nlp_architect/data/cdc_resources/relations/verbocean_relation_extraction.html @@ -8,7 +8,7 @@ - nlp_architect.data.cdc_resources.relations.verbocean_relation_extraction — NLP Architect by Intel® AI Lab 0.4.post2 documentation + nlp_architect.data.cdc_resources.relations.verbocean_relation_extraction — NLP Architect by Intel® AI Lab 0.5 documentation @@ -25,6 +25,8 @@ + + @@ -33,8 +35,9 @@ - - + + + @@ -55,7 +58,7 @@ - + @@ -81,31 +84,27 @@

    NLP/NLU Models

    +

    Optimized Models

    +

    Solutions

    @@ -114,14 +113,11 @@
  • Set Expansion
  • Trend Analysis
  • -

    Pipelines

    -

    For Developers

    @@ -215,8 +211,8 @@

    Source code for nlp_architect.data.cdc_resources.relations.verbocean_relatio logger = logging.getLogger(__name__) -
    [docs]class VerboceanRelationExtraction(RelationExtraction): -
    [docs] def __init__(self, method: OnlineOROfflineMethod = OnlineOROfflineMethod.ONLINE, +
    [docs]class VerboceanRelationExtraction(RelationExtraction): + def __init__(self, method: OnlineOROfflineMethod = OnlineOROfflineMethod.ONLINE, vo_file: str = None): """ Extract Relation between two mentions according to VerbOcean knowledge @@ -235,15 +231,15 @@

    Source code for nlp_architect.data.cdc_resources.relations.verbocean_relatio logger.info('Verb Ocean module loaded successfully') else: raise FileNotFoundError('VerbOcean file not found or not in path..') - super(VerboceanRelationExtraction, self).__init__()

    + super(VerboceanRelationExtraction, self).__init__() -
    [docs] def extract_all_relations(self, mention_x: MentionDataLight, +
    [docs] def extract_all_relations(self, mention_x: MentionDataLight, mention_y: MentionDataLight) -> Set[RelationType]: ret_ = set() ret_.add(self.extract_sub_relations(mention_x, mention_y, RelationType.VERBOCEAN_MATCH)) return ret_
    -
    [docs] def extract_sub_relations(self, mention_x: MentionDataLight, mention_y: MentionDataLight, +
    [docs] def extract_sub_relations(self, mention_x: MentionDataLight, mention_y: MentionDataLight, relation: RelationType) -> RelationType: """ Check if input mentions has the given relation between them @@ -271,7 +267,7 @@

    Source code for nlp_architect.data.cdc_resources.relations.verbocean_relatio return RelationType.NO_RELATION_FOUND

    -
    [docs] def is_verbocean_relation(self, mention_x: MentionDataLight, +
    [docs] def is_verbocean_relation(self, mention_x: MentionDataLight, mention_y: MentionDataLight) -> bool: """ Check if input mentions has VerbOcean relation between them @@ -300,7 +296,7 @@

    Source code for nlp_architect.data.cdc_resources.relations.verbocean_relatio return match_result

    -
    [docs] @staticmethod +
    [docs] @staticmethod def get_supported_relations(): """ Return all supported relations by this class @@ -310,7 +306,7 @@

    Source code for nlp_architect.data.cdc_resources.relations.verbocean_relatio """ return [RelationType.VERBOCEAN_MATCH]

    -
    [docs] @staticmethod +
    [docs] @staticmethod def load_verbocean_file(fname: str) -> Dict[str, Dict[str, str]]: """ Method to load referent dictionary to memory diff --git a/docs/_modules/nlp_architect/data/cdc_resources/relations/wikipedia_relation_extraction.html b/docs/_modules/nlp_architect/data/cdc_resources/relations/wikipedia_relation_extraction.html index b19706ec..01e1bb2c 100644 --- a/docs/_modules/nlp_architect/data/cdc_resources/relations/wikipedia_relation_extraction.html +++ b/docs/_modules/nlp_architect/data/cdc_resources/relations/wikipedia_relation_extraction.html @@ -8,7 +8,7 @@ - nlp_architect.data.cdc_resources.relations.wikipedia_relation_extraction — NLP Architect by Intel® AI Lab 0.4.post2 documentation + nlp_architect.data.cdc_resources.relations.wikipedia_relation_extraction — NLP Architect by Intel® AI Lab 0.5 documentation @@ -25,6 +25,8 @@ + + @@ -33,8 +35,9 @@ - - + + + @@ -55,7 +58,7 @@ - + @@ -81,31 +84,27 @@

    NLP/NLU Models

    +

    Optimized Models

    +

    Solutions

    @@ -114,14 +113,11 @@
  • Set Expansion
  • Trend Analysis
  • -

    Pipelines

    -

    For Developers

    @@ -220,8 +216,8 @@

    Source code for nlp_architect.data.cdc_resources.relations.wikipedia_relatio logger = logging.getLogger(__name__) -
    [docs]class WikipediaRelationExtraction(RelationExtraction): -
    [docs] def __init__(self, method: WikipediaSearchMethod = WikipediaSearchMethod.ONLINE, +
    [docs]class WikipediaRelationExtraction(RelationExtraction): + def __init__(self, method: WikipediaSearchMethod = WikipediaSearchMethod.ONLINE, wiki_file: str = None, host: str = None, port: int = None, index: str = None, filter_pronouns: bool = True, filter_time_data: bool = True) -> None: @@ -252,9 +248,9 @@

    Source code for nlp_architect.data.cdc_resources.relations.wikipedia_relatio self.pywiki_impl = WikiElastic(host, port, index) logger.info('Wikipedia module loaded successfully') - super(WikipediaRelationExtraction, self).__init__()

    + super(WikipediaRelationExtraction, self).__init__() -