Commit
adding links to preprint + dataset in notebook
1 parent
113568c
commit 1a6dc9c
Showing 1 changed file with 7 additions and 7 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -10,7 +10,7 @@ | |
"This notebook offers examples of **interacting** with the <b>CZI Software Mentions dataset </b><br>\n", | ||
"The <b>CZI Software Mentions dataset </b> is a large dataset of software mentions mined from the literature. \n", | ||
"\n", | ||
"**Dataset Overview**: Plain-text software mentions are extracted with a trained [SciBERT](#references_scibert)model from several sources: the NIH PubMed Central collection and from papers provided by various publishers to the Chan Zuckerberg Initiative. The dataset provides sources, context and metadata, and, for a number of mentions, the disambiguated software entities and links. Full description of the dataset, methodology, algorihms and evaluation used to create the dataset can be found in our preprint, [A large dataset of software mentions in the biomedical literature](Link) and on our [Github page](https://github.com/chanzuckerberg/software-mentions). \n", | ||
"**Dataset Overview**: Plain-text software mentions are extracted with a trained [SciBERT](#references_scibert)model from several sources: the NIH PubMed Central collection and from papers provided by various publishers to the Chan Zuckerberg Initiative. The dataset provides sources, context and metadata, and, for a number of mentions, the disambiguated software entities and links. Full description of the dataset, methodology, algorihms and evaluation used to create the dataset can be found in our preprint, [A large dataset of software mentions in the biomedical literature](https://arxiv.org/abs/2209.00693) and on our [Github page](https://github.com/chanzuckerberg/software-mentions). \n", | ||
"\n", | ||
"\n", | ||
"**The notebook is structured and offers the following information and examples, as follows:**\n", | ||
|
@@ -35,11 +35,11 @@ | |
"There is a different notebook, [CZI Software Mentions Dataset - Sample Use Cases](#link_here), that offers sample use cases for the dataset.\n", | ||
"\n", | ||
"**The full list of resources we have available for the dataset is**:\n", | ||
"1. [Preprint: A large dataset of software mentions in the biomedical literature](link)\n", | ||
"1. [Preprint: A large dataset of software mentions in the biomedical literature](https://arxiv.org/abs/2209.00693)\n", | ||
"2. [Github Repository](https://github.com/chanzuckerberg/software-mentions)\n", | ||
"3. [Dataset README.md](link)\n", | ||
"4. [CZI Software Mentions Dataset - Interacting with the Dataset](#link_here) - Jupyter Notebook\n", | ||
"5. [CZI Software Mentions Dataset - Sample Use Cases](#link_here) - Jupyter Notebook\n", | ||
"3. [Dataset](https://datadryad.org/stash/dataset/doi:10.5061/dryad.6wwpzgn2c?)\n", | ||
"4. [Interacting with the Dataset](https://github.com/chanzuckerberg/software-mentions/blob/main/sample_notebooks/Interacting%20with%20the%20dataset.ipynb) - Jupyter Notebook\n", | ||
"5. [Sample Use Cases](https://github.com/chanzuckerberg/software-mentions/blob/main/sample_notebooks/Sample%20Use%20Cases.ipynb) - Jupyter Notebook\n", | ||
"\n", | ||
"For questions, please contact [email protected]" | ||
] | ||
|
@@ -62,7 +62,7 @@ | |
"<a id='dataset_interaction'></a>\n", | ||
"\n", | ||
"## Interacting with the dataset\n", | ||
"We offer a brief overview of the dataset below. For a full description, including detailed information about the available files and fields, and how they were obtained, please consult the dataset [README.md](#Linkhere) file, or the Appendix section of our [preprint](link)" | ||
"We offer a brief overview of the dataset below. For a full description, including detailed information about the available files and fields, and how they were obtained, please consult the dataset [README.md](https://datadryad.org/stash/dataset/doi:10.5061/dryad.6wwpzgn2c?) file, or the Appendix section of our [preprint](https://arxiv.org/abs/2209.00693)" | ||
] | ||
}, | ||
{ | ||
|
@@ -2431,7 +2431,7 @@ | |
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Note that for mentions that are marked as **unclear**, we don't recommend excluding them from analyses. They should rather be interpreted as *it cannot be assumed that this plain-text software mention will always be a true software mention when appearing in text*. The curators have only been provided with 5 sentences per software mention, and they did not curate each individual sentence in which a mention appears. The evaluations are based solely on those 5 sentences. We offer a more in-depth discussion about this in our [preprint](link) and [curation documents](link)" | ||
"Note that for mentions that are marked as **unclear**, we don't recommend excluding them from analyses. They should rather be interpreted as *it cannot be assumed that this plain-text software mention will always be a true software mention when appearing in text*. The curators have only been provided with 5 sentences per software mention, and they did not curate each individual sentence in which a mention appears. The evaluations are based solely on those 5 sentences. We offer a more in-depth discussion about this in our [preprint](https://arxiv.org/abs/2209.00693) and [curation documents](link)" | ||
] | ||
}, | ||
{ | ||
|