Update README.md
fixing repo names capitalizations
anaistrate authored Jul 25, 2022
1 parent c80d654 commit abf4fe7
Showing 1 changed file with 9 additions and 9 deletions.
18 changes: 9 additions & 9 deletions software-mentions-linker-disambiguator/README.md
@@ -1,8 +1,8 @@
# Software Mentions Linking + Disambiguation

The goal of this project is to produce a high quality dataset of software used in the biomedical literature to facilitate analysis of adoption and impact of open-source scientific software. Our overall methodology is the following:
-1. Extract plain-text software mentions from the PMC-OA access using an [NER Machine Learning Algorithm](https://github.com/chanzuckerberg/software-mention-extraction) (developed by Ivana Williams)
-2. Link the software mentions to repositories and generate metadata by querying a number of databases. We link mentions to: PyPI, Bioconductor, CRAN, Scicrunch and Github
+1. Extract plain-text software mentions from the PMC-OA access using an [NER Machine Learning Algorithm](https://github.com/chanzuckerberg/software-mention-extraction) (developed by Ivana Williams) G
+2. Link the software mentions to repositories and generate metadata by querying a number of databases. We link mentions to: PyPI, Bioconductor, CRAN, SciCrunch and GitHub
3. Disambiguate the software mentions

More detailed descriptions of the **[linking](#linking)** and **[disambiguation](#disambiguation)** steps can be found below, together with instructions on how to run the code.
@@ -14,11 +14,11 @@ More detailed descriptions of the **[linking](#linking)** and **[disambiguation]

## Linking Task description ##
1. We query the following databases, searching for exact matches for plain text sofware mentions in our dataset:
-- pypi Index: https://pypi.org/simple/
+- PyPI Index: https://pypi.org/simple/
- Bioconductor Index: https://www.bioconductor.org/packages/release/bioc/
- CRAN Index: https://cran.r-project.org/web/packages/available_packages_by_name.html
-- Github API: https://github.com
-- Scicrunch API: https://scicrunch.org/resources
+- GitHub API: https://github.com
+- SciCrunch API: https://scicrunch.org/resources

2. We normalize the metadata files to a [common schema](#linking-schema).
### Linking Schema
@@ -29,15 +29,15 @@ Metadata files are normalized to the following fields:
| ID | unique ID of software mention (generated by us) |
| software_mention | plain-text software mention |
| mapped_to | value the software_mention is being mapped to |
-| source | source of the mapping - eg Bioconductor Index, Github API|
-| platform | platform of software_mention - eg pypi, CRAN |
+| source | source of the mapping - eg Bioconductor Index, GitHub API|
+| platform | platform of software_mention - eg PyPI, CRAN |
| package_url | URL linking software_mention to source |
| description | description of software_mention |
| homepage_url | homepage_url of software_mention|
| other_urls |other related URLs |
| license | software license |
-| github_repo | Github repository |
-| github_repo_license | Github repository license |
+| github_repo | GitHub repository |
+| github_repo_license | GitHub repository license |
| exact_match | whether or not this mapping was an exact match |
| RRID | RRID for software_mention |
| reference | journal articles linked to software_mention (identified either through DOI, pmid or RRID)|
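
For illustration only, the exact-match linking step and the normalized schema described in the README could be sketched roughly as follows. This is not the repository's actual code: the `LinkedMention` class, the `link_exact` helper, the in-memory `toy_index` dictionary, and the `SM<n>` ID format are all hypothetical, and "exact match" is assumed here to mean case-sensitive string equality.

```python
from dataclasses import dataclass, asdict
from typing import Dict, List, Optional


@dataclass
class LinkedMention:
    """One normalized row; field names follow the schema table in the README."""
    ID: str
    software_mention: str
    mapped_to: str
    source: str
    platform: str
    package_url: Optional[str] = None
    description: Optional[str] = None
    homepage_url: Optional[str] = None
    other_urls: Optional[str] = None
    license: Optional[str] = None
    github_repo: Optional[str] = None
    github_repo_license: Optional[str] = None
    exact_match: bool = False
    RRID: Optional[str] = None
    reference: Optional[str] = None


def link_exact(mentions: List[str], index: Dict[str, str],
               source: str, platform: str) -> List[LinkedMention]:
    """Keep only mentions whose text is exactly a package name in `index`.

    `index` maps package name -> package URL (a hypothetical in-memory
    stand-in for a downloaded package listing).
    """
    records = []
    for i, mention in enumerate(mentions):
        if mention in index:  # assumption: case-sensitive equality
            records.append(LinkedMention(
                ID=f"SM{i}",  # hypothetical ID format
                software_mention=mention,
                mapped_to=mention,
                source=source,
                platform=platform,
                package_url=index[mention],
                exact_match=True,
            ))
    return records


# Toy data, not taken from the real dataset.
toy_index = {
    "numpy": "https://pypi.org/project/numpy/",
    "scipy": "https://pypi.org/project/scipy/",
}
rows = link_exact(["numpy", "NumPy", "limma"], toy_index,
                  source="PyPI Index", platform="PyPI")
for row in rows:
    print(asdict(row))
```

Under these assumptions only `numpy` is linked: `NumPy` differs in case and `limma` is not in the toy index. Fields that a given source cannot supply stay `None`, so rows from different sources share the same columns.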