
Commit

adding links to preprint + dataset in notebook
anaistrate authored Sep 28, 2022
1 parent 113568c commit 1a6dc9c
Showing 1 changed file with 7 additions and 7 deletions.
sample_notebooks/Interacting with the dataset.ipynb: 14 changes (7 additions, 7 deletions)
@@ -10,7 +10,7 @@
"This notebook offers examples of **interacting** with the <b>CZI Software Mentions dataset </b><br>\n",
"The <b>CZI Software Mentions dataset </b> is a large dataset of software mentions mined from the literature. \n",
"\n",
-    "**Dataset Overview**: Plain-text software mentions are extracted with a trained [SciBERT](#references_scibert)model from several sources: the NIH PubMed Central collection and from papers provided by various publishers to the Chan Zuckerberg Initiative. The dataset provides sources, context and metadata, and, for a number of mentions, the disambiguated software entities and links. Full description of the dataset, methodology, algorihms and evaluation used to create the dataset can be found in our preprint, [A large dataset of software mentions in the biomedical literature](Link) and on our [Github page](https://github.com/chanzuckerberg/software-mentions). \n",
+    "**Dataset Overview**: Plain-text software mentions are extracted with a trained [SciBERT](#references_scibert)model from several sources: the NIH PubMed Central collection and from papers provided by various publishers to the Chan Zuckerberg Initiative. The dataset provides sources, context and metadata, and, for a number of mentions, the disambiguated software entities and links. Full description of the dataset, methodology, algorihms and evaluation used to create the dataset can be found in our preprint, [A large dataset of software mentions in the biomedical literature](https://arxiv.org/abs/2209.00693) and on our [Github page](https://github.com/chanzuckerberg/software-mentions). \n",
"\n",
"\n",
"**The notebook is structured and offers the following information and examples, as follows:**\n",
@@ -35,11 +35,11 @@
"There is a different notebook, [CZI Software Mentions Dataset - Sample Use Cases](#link_here), that offers sample use cases for the dataset.\n",
"\n",
"**The full list of resources we have available for the dataset is**:\n",
-    "1. [Preprint: A large dataset of software mentions in the biomedical literature](link)\n",
+    "1. [Preprint: A large dataset of software mentions in the biomedical literature](https://arxiv.org/abs/2209.00693)\n",
"2. [Github Repository](https://github.com/chanzuckerberg/software-mentions)\n",
-    "3. [Dataset README.md](link)\n",
-    "4. [CZI Software Mentions Dataset - Interacting with the Dataset](#link_here) - Jupyter Notebook\n",
-    "5. [CZI Software Mentions Dataset - Sample Use Cases](#link_here) - Jupyter Notebook\n",
+    "3. [Dataset](https://datadryad.org/stash/dataset/doi:10.5061/dryad.6wwpzgn2c?)\n",
+    "4. [Interacting with the Dataset](https://github.com/chanzuckerberg/software-mentions/blob/main/sample_notebooks/Interacting%20with%20the%20dataset.ipynb) - Jupyter Notebook\n",
+    "5. [Sample Use Cases](https://github.com/chanzuckerberg/software-mentions/blob/main/sample_notebooks/Sample%20Use%20Cases.ipynb) - Jupyter Notebook\n",
"\n",
"For questions, please contact [email protected]"
]
@@ -62,7 +62,7 @@
"<a id='dataset_interaction'></a>\n",
"\n",
"## Interacting with the dataset\n",
-    "We offer a brief overview of the dataset below. For a full description, including detailed information about the available files and fields, and how they were obtained, please consult the dataset [README.md](#Linkhere) file, or the Appendix section of our [preprint](link)"
+    "We offer a brief overview of the dataset below. For a full description, including detailed information about the available files and fields, and how they were obtained, please consult the dataset [README.md](https://datadryad.org/stash/dataset/doi:10.5061/dryad.6wwpzgn2c?) file, or the Appendix section of our [preprint](https://arxiv.org/abs/2209.00693)"
]
},
{
@@ -2431,7 +2431,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-    "Note that for mentions that are marked as **unclear**, we don't recommend excluding them from analyses. They should rather be interpreted as *it cannot be assumed that this plain-text software mention will always be a true software mention when appearing in text*. The curators have only been provided with 5 sentences per software mention, and they did not curate each individual sentence in which a mention appears. The evaluations are based solely on those 5 sentences. We offer a more in-depth discussion about this in our [preprint](link) and [curation documents](link)"
+    "Note that for mentions that are marked as **unclear**, we don't recommend excluding them from analyses. They should rather be interpreted as *it cannot be assumed that this plain-text software mention will always be a true software mention when appearing in text*. The curators have only been provided with 5 sentences per software mention, and they did not curate each individual sentence in which a mention appears. The evaluations are based solely on those 5 sentences. We offer a more in-depth discussion about this in our [preprint](https://arxiv.org/abs/2209.00693) and [curation documents](link)"
]
},
{
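This commit replaces most placeholder link targets in the notebook, but a few remain visible in the diff, such as `[curation documents](link)` and the `(#link_here)` target in the Sample Use Cases sentence. The following is a minimal sketch for listing such leftovers. It assumes the notebook is standard nbformat JSON (a top-level `cells` list whose entries carry `cell_type` and `source`, as seen in the diff). The placeholder set and the function names `find_placeholder_links` and `check_notebook` are hypothetical, chosen only to match the placeholders that appear in this commit.

```python
import json
import re

# Matches markdown links of the form [text](target); captures text and target.
LINK_RE = re.compile(r"\[([^\]]*)\]\(([^)\s]*)\)")

# Hypothetical placeholder targets, based on those seen in this commit
# ("link", "Link", "#link_here", "#Linkhere"); compared case-insensitively.
PLACEHOLDERS = {"link", "#link_here", "#linkhere"}


def find_placeholder_links(nb: dict) -> list[tuple[int, str]]:
    """Return (cell index, link text) for markdown links with a placeholder target."""
    hits = []
    for i, cell in enumerate(nb.get("cells", [])):
        # Only markdown cells contain rendered links; skip code cells.
        if cell.get("cell_type") != "markdown":
            continue
        src = cell.get("source", "")
        # nbformat stores cell source either as one string or a list of strings.
        text = "".join(src) if isinstance(src, list) else src
        for label, target in LINK_RE.findall(text):
            if target.lower() in PLACEHOLDERS:
                hits.append((i, label))
    return hits


def check_notebook(path: str) -> list[tuple[int, str]]:
    """Load an .ipynb file from disk and scan it for placeholder links."""
    with open(path, encoding="utf-8") as f:
        return find_placeholder_links(json.load(f))
```

For example, `check_notebook("sample_notebooks/Interacting with the dataset.ipynb")` would list any remaining placeholder links by cell index. Matching on an explicit placeholder set, rather than flagging every non-URL target, keeps legitimate in-notebook anchors such as `#references_scibert` from being reported.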
