This repository was archived by the owner on Nov 16, 2023. It is now read-only.

Merge pull request #479 from microsoft/staging
Staging
saidbleik authored Nov 18, 2019
2 parents a2ac143 + 5647db7 commit 967abcd
Showing 62 changed files with 7,659 additions and 2,935 deletions.
4 changes: 3 additions & 1 deletion NOTICE.txt
@@ -18,7 +18,7 @@ General Public License.

--

-https://github.com/huggingface/pytorch-transformers
+https://github.com/huggingface/transformers

Apache License
Version 2.0, January 2004
@@ -664,3 +664,5 @@ https://github.com/allenai/bi-att-flow
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
+
+
12 changes: 6 additions & 6 deletions README.md
@@ -48,13 +48,13 @@ The following is a summary of the commonly used NLP scenarios covered in the rep

| Scenario | Models | Description|Languages|
|-------------------------| ------------------- |-------|---|
-|Text Classification |BERT <br> XLNet| Text classification is a supervised learning method of learning and predicting the category or the class of a document given its text content. |English, Hindi, Arabic|
+|Text Classification |BERT, XLNet, RoBERTa| Text classification is a supervised learning method of learning and predicting the category or the class of a document given its text content. |English, Hindi, Arabic|
|Named Entity Recognition |BERT| Named entity recognition (NER) is the task of classifying words or key phrases of a text into predefined entities of interest. |English|
-|Entailment |BERT| Textual entailment is the task of classifying the binary relation between two natural-language texts, text and hypothesis’, to determine if the `text' agrees with the `hypothesis` or not. |English|
-|Question Answering |BiDAF <br> BERT| Question answering (QA) is the task of retrieving or generating a valid answer for a given query in natural language, provided with a passage related to the query. |English|
-|Sentence Similarity |Representation: TF-IDF, Word Embeddings, Doc Embeddings<br>Metrics: Cosine Similarity, Word Mover's Distance<br>Models: BERT, GenSen| Sentence similarity is the process of computing a similarity score given a pair of text documents. |English|
+|Entailment |BERT, XLNet, RoBERTa| Textual entailment is the task of classifying the binary relation between two natural-language texts, *text* and *hypothesis*, to determine if the *text* agrees with the *hypothesis* or not. |English|
+|Question Answering |BiDAF, BERT, XLNet| Question answering (QA) is the task of retrieving or generating a valid answer for a given query in natural language, provided with a passage related to the query. |English|
+|Sentence Similarity |BERT, GenSen| Sentence similarity is the process of computing a similarity score given a pair of text documents. |English|
|Embeddings| Word2Vec<br>fastText<br>GloVe| Embedding is the process of converting a word or a piece of text to a continuous vector space of real number, usually, in low dimension.|English|

+|Sentiment Analysis| Dependency Parser <br>GloVe| Provides an example of training and using Aspect-Based Sentiment Analysis with Azure ML and [Intel NLP Architect](http://nlp_architect.nervanasys.com/absa.html).|English|
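As a quick illustration of the sentence-similarity row in the table above (whose pre-change form mentions TF-IDF representations and cosine similarity), here is a minimal baseline sketch. It uses scikit-learn, which is an assumption on my part; the repo's own utilities are not part of this diff:

```python
# Minimal TF-IDF + cosine-similarity baseline for sentence similarity.
# scikit-learn is an assumption here; it is not referenced in this diff.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "A man is playing a guitar.",
    "Someone is playing an instrument.",
]

# Fit a shared vocabulary and represent both sentences as TF-IDF vectors.
tfidf = TfidfVectorizer().fit_transform(docs)

# Cosine similarity between the two row vectors.
score = cosine_similarity(tfidf[0], tfidf[1])[0, 0]
print(f"similarity: {score:.3f}")
```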
## Getting Started
While solving NLP problems, it is always good to start with the prebuilt [Cognitive Services](https://azure.microsoft.com/en-us/services/cognitive-services/directory/lang/). When the needs are beyond the bounds of the prebuilt cognitive service and when you want to search for custom machine learning methods, you will find this repository very useful. To get started, navigate to the [Setup Guide](SETUP.md), which lists instructions on how to setup your environment and dependencies.

@@ -80,7 +80,7 @@ The following is a list of related repositories that we like and think are usefu

|||
|---|---|
-|[pytorch-transformers](https://github.com/huggingface/pytorch-transformers)|A great PyTorch library from Hugging Face with implementations of popular transformer-based models. We've been using their package extensively in this repo and greatly appreciate their effort.|
+|[transformers](https://github.com/huggingface/transformers)|A great PyTorch library from Hugging Face with implementations of popular transformer-based models. We've been using their package extensively in this repo and greatly appreciate their effort.|
|[Azure Machine Learning Notebooks](https://github.com/Azure/MachineLearningNotebooks/)|ML and deep learning examples with Azure Machine Learning.|
|[AzureML-BERT](https://github.com/Microsoft/AzureML-BERT)|End-to-end recipes for pre-training and fine-tuning BERT using Azure Machine Learning service.|
|[MASS](https://github.com/microsoft/MASS)|MASS: Masked Sequence to Sequence Pre-training for Language Generation.|
10 changes: 0 additions & 10 deletions cgmanifest.json
@@ -1,14 +1,4 @@
{"Registrations":[
-    {
-        "component": {
-            "type": "git",
-            "git": {
-                "repositoryUrl": "https://github.com/huggingface/pytorch-transformers",
-                "commitHash": "b33a385091de604afb566155ec03329b84c96926"
-            }
-        },
-        "license": "Apache-2.0"
-    },
    {
        "component": {
            "type": "git",
18 changes: 9 additions & 9 deletions examples/README.md
@@ -4,20 +4,20 @@ This folder contains examples and best practices, written in Jupyter notebooks,

|Category|Applications|Methods|Languages|
|---| ------------------------ | ------------------- |---|
-|[Text Classification](text_classification)|Topic Classification|BERT, XLNet|en, hi, ar|
+|[Text Classification](text_classification)|Topic Classification|BERT, XLNet, RoBERTa, DistilBERT|en, hi, ar|
|[Named Entity Recognition](named_entity_recognition) |Wikipedia NER|BERT|en|
|[Entailment](entailment)|MultiNLI Natural Language Inference|BERT|en|
-|[Question Answering](question_answering) |SQuAD|BiDAF, BERT|en|
-|[Sentence Similarity](sentence_similarity)|STS Benchmark|Representation: TF-IDF, Word Embeddings, Doc Embeddings<br>Metrics: Cosine Similarity, Word Mover's Distance<br> Models: BERT, GenSen||
-|[Embeddings](embeddings)|Custom Embeddings Training|Word2Vec, fastText, GloVe||
-|[Annotation](annotation)|Text Annotation|Doccano||
-|[Model Explainability](model_explainability)|DNN Layer Explanation|DUUDNM (Guan et al.)|
+|[Question Answering](question_answering) |SQuAD|BiDAF, BERT, XLNet, DistilBERT|en|
+|[Sentence Similarity](sentence_similarity)|STS Benchmark|BERT, GenSen|en|
+|[Embeddings](embeddings)|Custom Embeddings Training|Word2Vec, fastText, GloVe|en|
+|[Annotation](annotation)|Text Annotation|Doccano|en|
+|[Model Explainability](model_explainability)|DNN Layer Explanation|DUUDNM (Guan et al.)|en|

## Data/Telemetry
The Azure Machine Learning notebooks collect browser usage data and send it to Microsoft to help improve our products and services. Read Microsoft's [privacy statement to learn more](https://privacy.microsoft.com/en-US/privacystatement).

-To opt out of tracking, please go to the raw `.ipynb` files and remove the following line of code (the URL will be slightly different depending on the file):
+To opt out of tracking, a Python [script](../tools/remove_pixelserver.py) under the `tools` folder is also provided. Executing the script will check all notebooks under the `examples` folder, and automatically remove the telemetry cell:

```sh
-"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/nlp/examples/text_classification/tc_bert_azureml.png)"
+python ../tools/remove_pixelserver.py
```
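The diff references `../tools/remove_pixelserver.py` but does not include its contents. A hypothetical sketch of what such a cleanup script might look like is shown below; the directory layout and the URL pattern are assumptions, not the actual script:

```python
# Hypothetical telemetry-cell remover; the real tools/remove_pixelserver.py
# is not included in this diff, so the paths and pattern are assumptions.
import json
import pathlib
import re

PIXEL = re.compile(r"PixelServer\w*\.azurewebsites\.net")

for nb_path in pathlib.Path("examples").rglob("*.ipynb"):
    nb = json.loads(nb_path.read_text(encoding="utf-8"))
    # Keep only cells whose source does not reference the pixel server.
    kept = [cell for cell in nb["cells"]
            if not PIXEL.search("".join(cell.get("source", [])))]
    if len(kept) != len(nb["cells"]):
        nb["cells"] = kept
        nb_path.write_text(json.dumps(nb, indent=1), encoding="utf-8")
        print(f"Removed telemetry cell from {nb_path}")
```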
4 changes: 2 additions & 2 deletions examples/embeddings/README.md
@@ -25,6 +25,6 @@ therefore can be very useful for tasks like sentence similary, text classifcati
## Summary


-|Notebook|Environment|Description|Dataset|
+|Notebook|Environment|Description|Dataset| Language |
|---|---|---|---|
-|[Developing Word Embeddings](embedding_trainer.ipynb)|Local| A notebook shows how to learn word representation with Word2Vec, fastText and Glove|[STS Benchmark dataset](http://ixa2.si.ehu.es/stswiki/index.php/STSbenchmark#STS_benchmark_dataset_and_companion_dataset) |
+|[Developing Word Embeddings](embedding_trainer.ipynb)|Local| A notebook shows how to learn word representation with Word2Vec, fastText and Glove|[STS Benchmark dataset](http://ixa2.si.ehu.es/stswiki/index.php/STSbenchmark#STS_benchmark_dataset_and_companion_dataset) | en |
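For readers unfamiliar with the training step this notebook row describes, here is a minimal word-embedding sketch. It assumes gensim, which this diff does not itself reference:

```python
# Minimal word-embedding training sketch, assuming gensim >= 4.0
# (earlier releases name the dimension parameter `size`, not `vector_size`).
from gensim.models import Word2Vec

# Toy corpus: a list of tokenized sentences.
sentences = [
    ["the", "quick", "brown", "fox"],
    ["the", "lazy", "dog"],
    ["a", "quick", "red", "fox"],
]

model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, workers=1)

vector = model.wv["fox"]                   # 50-dimensional numpy vector
print(model.wv.most_similar("fox", topn=2))
```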
8 changes: 4 additions & 4 deletions examples/entailment/README.md
@@ -21,7 +21,7 @@ entailment. For example,

## Summary

-|Notebook|Environment|Description|Dataset|
-|--------|:-----------:|-------|----------|
-|[entailment_multinli_bert.ipynb](entailment_multinli_bert.ipynb)|Local|Fine-tuning of pre-trained BERT model for NLI|[MultiNLI](https://www.nyu.edu/projects/bowman/multinli/)|
-|[entailment_xnli_bert_azureml.ipynb](entailment_xnli_bert_azureml.ipynb)|AzureML|**Distributed** fine-tuning of pre-trained BERT model for NLI|[XNLI](https://www.nyu.edu/projects/bowman/xnli/)|Yes
+|Notebook|Environment|Description|Dataset| Language |
+|--------|:-----------:|-------|----------|---------|
+|[entailment_multinli_bert.ipynb](entailment_multinli_bert.ipynb)|Local|Fine-tuning of pre-trained BERT model for NLI|[MultiNLI](https://www.nyu.edu/projects/bowman/multinli/)| en |
+|[entailment_xnli_bert_azureml.ipynb](entailment_xnli_bert_azureml.ipynb)|AzureML|**Distributed** fine-tuning of pre-trained BERT model for NLI|[XNLI](https://www.nyu.edu/projects/bowman/xnli/)| en
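To make the NLI task behind these notebooks concrete, here is a hedged inference sketch using a public MNLI checkpoint from the Hugging Face hub. The checkpoint name and a recent `transformers` version are assumptions; the notebooks' own fine-tuning code differs:

```python
# Sketch of natural language inference with an off-the-shelf MNLI model.
# "roberta-large-mnli" is a public checkpoint, not the notebooks' own model.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."

# Encode the (premise, hypothesis) pair as a single input sequence.
inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)[0]

# Label order for this checkpoint: contradiction, neutral, entailment.
for label, p in zip(["contradiction", "neutral", "entailment"], probs.tolist()):
    print(f"{label}: {p:.3f}")
```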
