This repository was archived by the owner on Nov 16, 2023. It is now read-only.

Merge pull request #479 from microsoft/staging
Staging
saidbleik authored Nov 18, 2019
2 parents a2ac143 + 5647db7 commit 967abcd
Showing 62 changed files with 7,659 additions and 2,935 deletions.
4 changes: 3 additions & 1 deletion NOTICE.txt
@@ -18,7 +18,7 @@ General Public License.

--

-https://github.com/huggingface/pytorch-transformers
+https://github.com/huggingface/transformers

Apache License
Version 2.0, January 2004
@@ -664,3 +664,5 @@ https://github.com/allenai/bi-att-flow
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
+
+
12 changes: 6 additions & 6 deletions README.md
@@ -48,13 +48,13 @@ The following is a summary of the commonly used NLP scenarios covered in the rep

| Scenario | Models | Description|Languages|
|-------------------------| ------------------- |-------|---|
-|Text Classification |BERT <br> XLNet| Text classification is a supervised learning method of learning and predicting the category or the class of a document given its text content. |English, Hindi, Arabic|
+|Text Classification |BERT, XLNet, RoBERTa| Text classification is a supervised learning method of learning and predicting the category or the class of a document given its text content. |English, Hindi, Arabic|
|Named Entity Recognition |BERT| Named entity recognition (NER) is the task of classifying words or key phrases of a text into predefined entities of interest. |English|
-|Entailment |BERT| Textual entailment is the task of classifying the binary relation between two natural-language texts, text and hypothesis’, to determine if the `text' agrees with the `hypothesis` or not. |English|
-|Question Answering |BiDAF <br> BERT| Question answering (QA) is the task of retrieving or generating a valid answer for a given query in natural language, provided with a passage related to the query. |English|
-|Sentence Similarity |Representation: TF-IDF, Word Embeddings, Doc Embeddings<br>Metrics: Cosine Similarity, Word Mover's Distance<br>Models: BERT, GenSen| Sentence similarity is the process of computing a similarity score given a pair of text documents. |English|
+|Entailment |BERT, XLNet, RoBERTa| Textual entailment is the task of classifying the binary relation between two natural-language texts, *text* and *hypothesis*, to determine if the *text* agrees with the *hypothesis* or not. |English|
+|Question Answering |BiDAF, BERT, XLNet| Question answering (QA) is the task of retrieving or generating a valid answer for a given query in natural language, provided with a passage related to the query. |English|
+|Sentence Similarity |BERT, GenSen| Sentence similarity is the process of computing a similarity score given a pair of text documents. |English|
|Embeddings| Word2Vec<br>fastText<br>GloVe| Embedding is the process of converting a word or a piece of text to a continuous vector space of real number, usually, in low dimension.|English|

+|Sentiment Analysis| Dependency Parser <br>GloVe| Provides an example of training and using Aspect-Based Sentiment Analysis with Azure ML and [Intel NLP Architect](http://nlp_architect.nervanasys.com/absa.html).|English|
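As a quick illustration of the sentence-similarity row in the table above (whose pre-change form mentions TF-IDF representations and cosine similarity), here is a minimal baseline sketch. It uses scikit-learn, which is an assumption on my part; the repo's own utilities are not part of this diff:

```python
# Minimal TF-IDF + cosine-similarity baseline for sentence similarity.
# scikit-learn is an assumption here; it is not referenced in this diff.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "A man is playing a guitar.",
    "Someone is playing an instrument.",
]

# Fit a shared vocabulary and represent both sentences as TF-IDF vectors.
tfidf = TfidfVectorizer().fit_transform(docs)

# Cosine similarity between the two row vectors.
score = cosine_similarity(tfidf[0], tfidf[1])[0, 0]
print(f"similarity: {score:.3f}")
```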
## Getting Started
While solving NLP problems, it is always good to start with the prebuilt [Cognitive Services](https://azure.microsoft.com/en-us/services/cognitive-services/directory/lang/). When the needs are beyond the bounds of the prebuilt cognitive service and when you want to search for custom machine learning methods, you will find this repository very useful. To get started, navigate to the [Setup Guide](SETUP.md), which lists instructions on how to setup your environment and dependencies.

@@ -80,7 +80,7 @@ The following is a list of related repositories that we like and think are usefu

|||
|---|---|
-|[pytorch-transformers](https://github.com/huggingface/pytorch-transformers)|A great PyTorch library from Hugging Face with implementations of popular transformer-based models. We've been using their package extensively in this repo and greatly appreciate their effort.|
+|[transformers](https://github.com/huggingface/transformers)|A great PyTorch library from Hugging Face with implementations of popular transformer-based models. We've been using their package extensively in this repo and greatly appreciate their effort.|
|[Azure Machine Learning Notebooks](https://github.com/Azure/MachineLearningNotebooks/)|ML and deep learning examples with Azure Machine Learning.|
|[AzureML-BERT](https://github.com/Microsoft/AzureML-BERT)|End-to-end recipes for pre-training and fine-tuning BERT using Azure Machine Learning service.|
|[MASS](https://github.com/microsoft/MASS)|MASS: Masked Sequence to Sequence Pre-training for Language Generation.|
10 changes: 0 additions & 10 deletions cgmanifest.json
@@ -1,14 +1,4 @@
{"Registrations":[
-    {
-        "component": {
-            "type": "git",
-            "git": {
-                "repositoryUrl": "https://github.com/huggingface/pytorch-transformers",
-                "commitHash": "b33a385091de604afb566155ec03329b84c96926"
-            }
-        },
-        "license": "Apache-2.0"
-    },
    {
        "component": {
            "type": "git",
18 changes: 9 additions & 9 deletions examples/README.md
@@ -4,20 +4,20 @@ This folder contains examples and best practices, written in Jupyter notebooks,

|Category|Applications|Methods|Languages|
|---| ------------------------ | ------------------- |---|
-|[Text Classification](text_classification)|Topic Classification|BERT, XLNet|en, hi, ar|
+|[Text Classification](text_classification)|Topic Classification|BERT, XLNet, RoBERTa, DistilBERT|en, hi, ar|
|[Named Entity Recognition](named_entity_recognition) |Wikipedia NER|BERT|en|
|[Entailment](entailment)|MultiNLI Natural Language Inference|BERT|en|
-|[Question Answering](question_answering) |SQuAD|BiDAF, BERT|en|
-|[Sentence Similarity](sentence_similarity)|STS Benchmark|Representation: TF-IDF, Word Embeddings, Doc Embeddings<br>Metrics: Cosine Similarity, Word Mover's Distance<br> Models: BERT, GenSen||
-|[Embeddings](embeddings)|Custom Embeddings Training|Word2Vec, fastText, GloVe||
-|[Annotation](annotation)|Text Annotation|Doccano||
-|[Model Explainability](model_explainability)|DNN Layer Explanation|DUUDNM (Guan et al.)|
+|[Question Answering](question_answering) |SQuAD|BiDAF, BERT, XLNet, DistilBERT|en|
+|[Sentence Similarity](sentence_similarity)|STS Benchmark|BERT, GenSen|en|
+|[Embeddings](embeddings)|Custom Embeddings Training|Word2Vec, fastText, GloVe|en|
+|[Annotation](annotation)|Text Annotation|Doccano|en|
+|[Model Explainability](model_explainability)|DNN Layer Explanation|DUUDNM (Guan et al.)|en|

## Data/Telemetry
The Azure Machine Learning notebooks collect browser usage data and send it to Microsoft to help improve our products and services. Read Microsoft's [privacy statement to learn more](https://privacy.microsoft.com/en-US/privacystatement).

-To opt out of tracking, please go to the raw `.ipynb` files and remove the following line of code (the URL will be slightly different depending on the file):
+To opt out of tracking, a Python [script](../tools/remove_pixelserver.py) under the `tools` folder is also provided. Executing the script will check all notebooks under the `examples` folder, and automatically remove the telemetry cell:

```sh
-"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/nlp/examples/text_classification/tc_bert_azureml.png)"
+python ../tools/remove_pixelserver.py
```
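The diff references `../tools/remove_pixelserver.py` but does not include its contents. A hypothetical sketch of what such a cleanup script might look like is shown below; the directory layout and the URL pattern are assumptions, not the actual script:

```python
# Hypothetical telemetry-cell remover; the real tools/remove_pixelserver.py
# is not included in this diff, so the paths and pattern are assumptions.
import json
import pathlib
import re

PIXEL = re.compile(r"PixelServer\w*\.azurewebsites\.net")

for nb_path in pathlib.Path("examples").rglob("*.ipynb"):
    nb = json.loads(nb_path.read_text(encoding="utf-8"))
    # Keep only cells whose source does not reference the pixel server.
    kept = [cell for cell in nb["cells"]
            if not PIXEL.search("".join(cell.get("source", [])))]
    if len(kept) != len(nb["cells"]):
        nb["cells"] = kept
        nb_path.write_text(json.dumps(nb, indent=1), encoding="utf-8")
        print(f"Removed telemetry cell from {nb_path}")
```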
4 changes: 2 additions & 2 deletions examples/embeddings/README.md
@@ -25,6 +25,6 @@ therefore can be very useful for tasks like sentence similary, text classifcati
## Summary


-|Notebook|Environment|Description|Dataset|
+|Notebook|Environment|Description|Dataset| Language |
|---|---|---|---|
-|[Developing Word Embeddings](embedding_trainer.ipynb)|Local| A notebook shows how to learn word representation with Word2Vec, fastText and Glove|[STS Benchmark dataset](http://ixa2.si.ehu.es/stswiki/index.php/STSbenchmark#STS_benchmark_dataset_and_companion_dataset) |
+|[Developing Word Embeddings](embedding_trainer.ipynb)|Local| A notebook shows how to learn word representation with Word2Vec, fastText and Glove|[STS Benchmark dataset](http://ixa2.si.ehu.es/stswiki/index.php/STSbenchmark#STS_benchmark_dataset_and_companion_dataset) | en |
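For readers unfamiliar with the training step this notebook row describes, here is a minimal word-embedding sketch. It assumes gensim, which this diff does not itself reference:

```python
# Minimal word-embedding training sketch, assuming gensim >= 4.0
# (earlier releases name the dimension parameter `size`, not `vector_size`).
from gensim.models import Word2Vec

# Toy corpus: a list of tokenized sentences.
sentences = [
    ["the", "quick", "brown", "fox"],
    ["the", "lazy", "dog"],
    ["a", "quick", "red", "fox"],
]

model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, workers=1)

vector = model.wv["fox"]                   # 50-dimensional numpy vector
print(model.wv.most_similar("fox", topn=2))
```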
8 changes: 4 additions & 4 deletions examples/entailment/README.md
@@ -21,7 +21,7 @@ entailment. For example,

## Summary

-|Notebook|Environment|Description|Dataset|
-|--------|:-----------:|-------|----------|
-|[entailment_multinli_bert.ipynb](entailment_multinli_bert.ipynb)|Local|Fine-tuning of pre-trained BERT model for NLI|[MultiNLI](https://www.nyu.edu/projects/bowman/multinli/)|
-|[entailment_xnli_bert_azureml.ipynb](entailment_xnli_bert_azureml.ipynb)|AzureML|**Distributed** fine-tuning of pre-trained BERT model for NLI|[XNLI](https://www.nyu.edu/projects/bowman/xnli/)|Yes
+|Notebook|Environment|Description|Dataset| Language |
+|--------|:-----------:|-------|----------|---------|
+|[entailment_multinli_bert.ipynb](entailment_multinli_bert.ipynb)|Local|Fine-tuning of pre-trained BERT model for NLI|[MultiNLI](https://www.nyu.edu/projects/bowman/multinli/)| en |
+|[entailment_xnli_bert_azureml.ipynb](entailment_xnli_bert_azureml.ipynb)|AzureML|**Distributed** fine-tuning of pre-trained BERT model for NLI|[XNLI](https://www.nyu.edu/projects/bowman/xnli/)| en
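To make the NLI task behind these notebooks concrete, here is a hedged inference sketch using a public MNLI checkpoint from the Hugging Face hub. The checkpoint name and a recent `transformers` version are assumptions; the notebooks' own fine-tuning code differs:

```python
# Sketch of natural language inference with an off-the-shelf MNLI model.
# "roberta-large-mnli" is a public checkpoint, not the notebooks' own model.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."

# Encode the (premise, hypothesis) pair as a single input sequence.
inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)[0]

# Label order for this checkpoint: contradiction, neutral, entailment.
for label, p in zip(["contradiction", "neutral", "entailment"], probs.tolist()):
    print(f"{label}: {p:.3f}")
```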
