check spp.key #1
Comments
@JWMorley @bselden I made a video to show how I was making some of the corrections. They still aren't perfect by any means, but I do show how I filter down to some of the entries that might merit a second glance. https://www.youtube.com/watch?v=RZlUds2Ph_0 Feel free to go about this however you want, if you can find time. Any help is much appreciated.
@JWMorley @bselden Note that I edited the link to the video, so in your email you will still only see the old link. The new link is here: https://www.youtube.com/watch?v=RZlUds2Ph_0 That is the same link that will appear on the GitHub issues site.
So, I'm updating the data sets (the US ones for now), and I found 1131 new raw taxonomic IDs that aren't in spp.key already ... whoa. I'll put my auto-match code to work, but everything added in this way will be given the Working on properly adding these to the spp.key. It's basically done, just need to integrate it well with
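For anyone who wants to reproduce the "new raw IDs" check, here is a minimal sketch of one way to do it. It is not the auto-match code mentioned above, just a plain set difference against the "ref" column. The function name and the toy rows are made up for illustration; only the column names "ref" and "spp" come from this thread.

```python
import csv
import io


def new_raw_names(raw_names, key_rows):
    """Return the raw names that have no matching 'ref' entry in the key, sorted."""
    known = {row["ref"] for row in key_rows}
    return sorted(set(raw_names) - known)


# Tiny in-memory stand-in for spp.key.csv (made-up rows, illustration only).
key_csv = io.StringIO(
    "ref,spp\n"
    "Zoroaster sp.,zoroaster\n"
    "GADUS MORHUA,Gadus morhua\n"
)
key_rows = list(csv.DictReader(key_csv))

# Raw names as they might appear in an updated data set. Matching is exact,
# so the trailing space makes the second entry a "new" raw name.
raw = ["GADUS MORHUA", "Gadus morhua ", "Zoroaster sp.", "Gadus morhua "]
print(new_raw_names(raw, key_rows))
```

Because "ref" holds the as-entered names, the comparison is deliberately exact: differences in case or whitespace count as new raw names.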
I've recently gone through most lines of the spp.key manually. I've manually checked 2654 rows in the recent effort; another 548 are "ok", 53 "manual", 577 "fine", 586 "bad", 316 "becca_batch2", and a lot of other random flags that indicate it's been checked in some way. In theory, the "bad" rows might need to be fixed, but they generally aren't ID'd to species, and are tossed out in the trim row due to that flag, so they aren't a big worry. There are 1009 rows that were "added_automatically", and 349 have an NA flag. None of these rows pertain to species that are in the current trawlDiversity analysis (due to subsampling years, day of year, and strata). So this is very near completion, and is much less of a worry for my current analysis, but could still use some work. I also wouldn't be surprised if some of my "check" rows had errors/typos (I found 1 or 2 already). So it ain't perfect.
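A tally like the one above could be produced along these lines. This is only a sketch: the column name "flag" is an assumption (the thread doesn't name the flag column), the helper names are hypothetical, and the rows are toy data. The flag values "ok", "bad", and "added_automatically" are the ones quoted above.

```python
from collections import Counter


def flag_counts(rows, flag_col="flag"):
    """Count rows per flag, treating a missing or empty flag as 'NA'."""
    return Counter(row.get(flag_col) or "NA" for row in rows)


def needs_review(rows, flag_col="flag", unchecked=("added_automatically", "NA")):
    """Return the rows whose flag suggests nobody has checked them by hand."""
    return [row for row in rows if (row.get(flag_col) or "NA") in unchecked]


# Toy rows (made up). "flag" is a hypothetical column name.
rows = [
    {"ref": "a", "spp": "A a", "flag": "ok"},
    {"ref": "b", "spp": "B b", "flag": "added_automatically"},
    {"ref": "c", "spp": "C c", "flag": ""},
    {"ref": "d", "spp": "D d", "flag": "bad"},
]

print(flag_counts(rows))
print([row["ref"] for row in needs_review(rows)])
```

The second function is the useful one for picking up where this left off: it pulls out only the automatically added and unflagged rows.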
@bselden @JWMorley
This repo will become an R package, but it's still in development.
The file spp.key.csv has all of the known "raw" (as-entered) taxonomic identifiers (species names) from all regions. But it needs to be checked.
Most species have had something found. The "raw" column is named "ref", and the "corrected" column is named "spp".
Looking through, some of the "corrected" spp names are clearly wrong, as are some of the common names.
Feel free to make corrections, and commit/push the changes. But please use Git. You may want to install Git LFS before downloading this repo (otherwise, the large file storage might break, or you'll end up with bigger files than you want; I'm not sure what happens).
Note that each value in "ref" is unique, but the "spp" values are not. Make sure you do not create any inconsistencies as you edit the file. E.g., if you see that spp=="zoroaster" does not actually have a common name of "frogfish", don't change the common name to "seastar" on only 1 line ... make sure that the updated file has the same common name for all "zoroaster".
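One way to sanity-check an edited file before committing is sketched below. It assumes the common-name column is called "common", which is a guess since this thread only names "ref" and "spp". The helper names and the toy rows are hypothetical.

```python
from collections import defaultdict


def duplicate_refs(rows):
    """Return any 'ref' values that appear on more than one row, sorted."""
    seen, dups = set(), set()
    for row in rows:
        ref = row["ref"]
        if ref in seen:
            dups.add(ref)
        seen.add(ref)
    return sorted(dups)


def inconsistent_common_names(rows, common_col="common"):
    """Map each 'spp' with more than one distinct common name to those names."""
    names = defaultdict(set)
    for row in rows:
        names[row["spp"]].add(row[common_col])
    return {spp: sorted(vals) for spp, vals in names.items() if len(vals) > 1}


# Toy rows (made up): the common name was fixed on only one "zoroaster" line.
rows = [
    {"ref": "Zoroaster sp.", "spp": "zoroaster", "common": "seastar"},
    {"ref": "ZOROASTER", "spp": "zoroaster", "common": "frogfish"},
    {"ref": "GADUS MORHUA", "spp": "Gadus morhua", "common": "Atlantic cod"},
]

print(duplicate_refs(rows))
print(inconsistent_common_names(rows))
```

If both results come back empty, the edit hasn't introduced a duplicate "ref" or a split common name.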
I can explain further when you decide to take a look. Just let me know.