A simple name cleaner, written in Python, that uses the Phylotastic TNRastic API at Taxosaurus.
mr-naims is a Phylotastic 2 project.
Requirements:
- Python 2.7. Running mr-naims inside a virtualenv is recommended.
- Requests: HTTP for humans. Install it in your virtualenv with:
    pip install requests
- DendroPy, for reading Newick and NeXML trees. Install it with:
    pip install dendropy
Usage:
    python simple.py [options] -f inputfile
inputfile may be a PDF, image, Office document, text file, Newick tree, or NeXML file (NeXML support is experimental). Unless you pass -s/--skip-gnrd, the file is sent to Global Names Recognition and Discovery (GNRD) to extract a list of scientific names. Run python simple.py -h for help.
If you provide a Newick tree, pass the -n option.
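Presumably, when a Newick tree is given, the names to clean are the tree's tip labels. mr-naims reads trees with DendroPy; the sketch below swaps in a small standard-library regular expression purely to show which labels would be picked up. The function newick_tip_labels is hypothetical, and the sketch assumes unquoted labels without comments; quoted labels such as 'Homo sapiens' need a real parser like DendroPy.

```python
import re

# Hypothetical helper for illustration only: mr-naims itself uses DendroPy.
# Assumes unquoted labels and no [comments]; not a full Newick parser.
TIP_LABEL = re.compile(r"[(,]\s*([^(),:;\s]+)")


def newick_tip_labels(newick):
    """Return the tip (leaf) labels of a simple Newick string, in order."""
    labels = []
    # A tip label directly follows '(' or ','; internal-node labels follow ')'
    # and branch lengths follow ':', so neither is matched.
    for match in TIP_LABEL.finditer(newick):
        # Newick convention: underscores in unquoted labels stand for spaces.
        labels.append(match.group(1).replace("_", " "))
    return labels
```

For example, newick_tip_labels("((Homo_sapiens:0.1,Pan_troglodytes:0.2):0.3,Gorilla_gorilla);") returns the three species names with spaces restored.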
To limit the TNRS search to a specific provider, use the --source option, e.g. --source MSW3.
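The options documented here could be declared with argparse roughly as follows. This is a hypothetical reconstruction, not the parser in simple.py: the long forms --file and --newick and the dest names are guesses, while -f, -s/--skip-gnrd, -n and --source come from this README.

```python
import argparse


# Hypothetical reconstruction of the documented options; simple.py may differ.
def build_parser():
    parser = argparse.ArgumentParser(description="Clean scientific names via TNRS.")
    # -f is documented; the long form --file is an assumption.
    parser.add_argument("-f", "--file", dest="inputfile", required=True,
                        help="PDF, image, Office document, text, Newick or NeXML file")
    parser.add_argument("-s", "--skip-gnrd", action="store_true",
                        help="do not send the input to GNRD for name extraction")
    # -n is documented; the long form --newick is an assumption.
    parser.add_argument("-n", "--newick", action="store_true",
                        help="treat the input file as a Newick tree")
    parser.add_argument("--source", default=None,
                        help="limit the TNRS search to one provider, e.g. MSW3")
    return parser
```

For example, build_parser().parse_args(["-f", "test-set.txt", "--source", "MSW3"]) yields inputfile="test-set.txt" and source="MSW3", with both flags left False.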
The file test-set.txt is included as an example list of names.
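A plausible way to load a names list like test-set.txt, assuming one scientific name per line (the exact format is not specified here). The helper parse_names is hypothetical; it trims whitespace, skips blank lines and drops repeats so that each name would be sent to the TNRS only once.

```python
# Hypothetical helper, assuming one scientific name per line as in test-set.txt.
def parse_names(lines):
    """Return unique, non-blank names from an iterable of lines, in file order."""
    seen = set()
    names = []
    for line in lines:
        # Trim the ends and collapse runs of internal whitespace to one space.
        name = " ".join(line.split())
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return names
```

Because it accepts any iterable of lines, it works directly on an open file, e.g. with io.open("test-set.txt", encoding="utf-8") as handle: names = parse_names(handle).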
mr-naims produces an inputfile.clean file containing the cleaned list of names, and outputs a CSV report with the match score and provenance of each result.
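The two outputs could be produced along these lines. The record fields (submitted, accepted, score, source) and the choice to fall back to the submitted name when there is no match are assumptions for illustration, not mr-naims's actual report format.

```python
import csv
import io

# Hypothetical report columns: submitted name, matched name, TNRS match
# score, and the provider the match came from (its provenance).
REPORT_FIELDS = ["submitted", "accepted", "score", "source"]


def render_outputs(results):
    """Return (clean_text, report_csv) for a list of result dicts."""
    clean = io.StringIO()
    report = io.StringIO()
    writer = csv.DictWriter(report, fieldnames=REPORT_FIELDS, lineterminator="\n")
    writer.writeheader()
    for record in results:
        # Assumption: keep the submitted name when TNRS returned no match.
        clean.write(record.get("accepted") or record["submitted"])
        clean.write("\n")
        writer.writerow(record)
    return clean.getvalue(), report.getvalue()
```

The returned strings can then be written to inputfile + ".clean" and to stdout or a .csv file. Returning strings rather than writing files directly keeps the formatting easy to check in isolation.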
Known issue: names containing Unicode characters can cause problems at the TNRS level.
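Until the TNRS-side issue is resolved, one possible client-side workaround is to fold names to ASCII before submitting them. This is a hypothetical step (ascii_fold is not part of mr-naims), and it assumes the failures are caused by non-ASCII characters, which the note above does not confirm.

```python
import unicodedata


# Hypothetical pre-processing step; not something mr-naims is documented to do.
def ascii_fold(name):
    """Strip diacritics from a name, e.g. 'troglodytés' -> 'troglodytes'."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    # Encoding to ASCII with 'ignore' drops the combining marks, and also any
    # character that has no ASCII decomposition at all.
    return decomposed.encode("ascii", "ignore").decode("ascii")
```

Characters with no ASCII decomposition, such as the ligature æ, are dropped rather than transliterated, so it is worth logging any name this step changes.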