Commit

Rephrased
borisveytsman committed Jul 6, 2022
1 parent ec9f5af commit 4b402cb
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions software-mentions-extractor/README.md
@@ -2,13 +2,13 @@

## Introduction

-The scripts in this directory extract software mentions from NXML files downloaded from PMC OA files, [https://www.ncbi.nlm.nih.gov/pmc/tools/openftlist/](https://www.ncbi.nlm.nih.gov/pmc/tools/openftlist/). We assume that the user downloaded the files, and extracted the contents. Then there is a file system with two main directories, comm and non\_comm for the commercial use subset and the non commercial use only subset. Each directory has subdurectories, one per journal. The subdirectories contain NXML files, one per paper.
+The scripts in this directory extract software mentions from NXML files downloaded from PMC OA files, [https://www.ncbi.nlm.nih.gov/pmc/tools/openftlist/](https://www.ncbi.nlm.nih.gov/pmc/tools/openftlist/). We assume that you downloaded the files, and extracted the contents. Then you have file system with two main directories, comm and non\_comm for the commercial use subset and the non commercial use only subset. Each directory has subdurectories, one per journal. The subdirectories contain NXML files, one per paper.

You use software-mentions-extractor.py to extract the mentions. If needed, the result can be detokenized using detokenyzer.py auxillary script.

The extractor is designed to work with GNU parallel command [https://doi.org/10.5281/zenodo.6682930](https://doi.org/10.5281/zenodo.6682930). For example, suppose your working directory contains comm and non\_comm directories, then you can extract all mentions from the commercial subset using

-ls -d comm/* | parallel -d ./software-mentions-extractor -o output/
+ls -d comm/* | parallel -d software-mentions-extractor -o output
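
Editorial note, not part of the commit: the directory layout the README describes (a subset directory such as comm, one subdirectory per journal, one NXML file per paper) can be checked before running the extractor. The following is a minimal sketch; the helper names `list_journal_dirs` and `papers_by_journal` are hypothetical, and the `.nxml` file extension is an assumption based on the README's wording rather than something the commit states.

```python
from pathlib import Path
from typing import Dict, List


def list_journal_dirs(subset_root: Path) -> List[Path]:
    """Return the per-journal subdirectories of a subset, sorted by path.

    These are the same entries that `ls -d comm/*` would list, restricted
    to directories.
    """
    return sorted(p for p in subset_root.iterdir() if p.is_dir())


def papers_by_journal(subset_root: Path, suffix: str = ".nxml") -> Dict[str, List[Path]]:
    """Map each journal directory name to its paper files.

    The default suffix is an assumption; adjust it if the extracted
    files use a different extension.
    """
    return {
        journal.name: sorted(
            f for f in journal.iterdir()
            if f.is_file() and f.suffix.lower() == suffix
        )
        for journal in list_journal_dirs(subset_root)
    }


if __name__ == "__main__":
    root = Path("comm")  # commercial use subset, as named in the README
    if root.is_dir():
        for name, papers in papers_by_journal(root).items():
            print(f"{name}: {len(papers)} paper(s)")
```

Run from the working directory that contains comm and non\_comm, this prints one line per journal with its paper count, which gives a quick sanity check that the archives were extracted as expected.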


## software-mentions-extractor.py ##
