-
Notifications
You must be signed in to change notification settings - Fork 6
/
FOSS4Spectroscopy.Rmd
234 lines (185 loc) · 10.3 KB
/
FOSS4Spectroscopy.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
---
title: FOSS For Spectroscopy
author: Bryan A. Hanson, DePauw University
date: >-
`r paste(format(Sys.time(),
format = "%Y-%m-%d %H:%M", tz = "GMT"), "UTC")`
output:
html_document:
theme: united
---
<script data-goatcounter="https://foss4spec.goatcounter.com/count"
async src="//gc.zgo.at/count.js"></script>
<!-- Following code to keep printed version cleaner; see https://stackoverflow.com/a/24467913/633251 -->
<style type="text/css" media="print">
a:link:after, a:visited:after {
content: normal !important;
}
</style>
<!-- See https://kryogenix.org/code/browser/sorttable/ -->
<script src="Include/sorttable.js"></script>
The following table collects information about free and open source software ([FOSS](https://en.wikipedia.org/wiki/Free_and_open-source_software)) for spectroscopy. All information is taken from the respective websites and/or repositories. Some projects have been described in publications. If so, a link is provided, but the document may be behind a paywall. This [blog post](https://chemospec.org/posts/2021-04-19-Search-GH-Topics/2021-04-19-Search-GH-Topics.html) give some details about how we find interesting repositories.
Unless otherwise noted, the software mentioned here:
* Is suitable for one or more of the following techniques: NMR, IR, Raman, ESR/EPR, fluorescence, XRF, LIBS and UV-Vis.
* Mass *spectrometry* software is not included (see the work by Stanstrup just below).
* Software for MRI is not included.
* Software for remote sensing is generally not included, though some projects involving hyperspectral imaging are included.
* While some folks publish open source add-ons for `Matlab`, `Matlab` is not open source and projects written in `Matlab` are not included.
[Stanstrup *et al.*](https://www.mdpi.com/2218-1989/9/10/200) have published a comprehensive paper describing the `R` packages suitable for use in metabolomics, which partially overlaps with the information here. The authors have also created a dynamic [document](https://rformassspectrometry.github.io/metaRbolomics-book/) with the same information and more. In addition, [awesome-spectra](https://github.com/erwanp/awesome-spectra) is a page somewhat in the spirit of the work here, but apparently depends on authors to add their own material, and is missing some key entries (e.g. no NMR packages).
The [CRAN Task View for Chemometrics & Computational Physics](https://cran.r-project.org/web/views/ChemPhys.html) includes some `R` packages listed here as well as related software.
#### How Does One Choose a Package?
*The projects listed here have been lightly vetted. If a project looks incomplete, is a class project, or I can't tell what it does, it's not included!* Of course, if you feel I have not included your package in error, feel free to request its inclusion. With that in mind, there are still many packages to consider. As a general guide, the excellent checklist provided by [Lortie *et al.*](https://onlinelibrary.wiley.com/doi/full/10.1002/ece3.5970) is included here for your consideration.
<img src="Include/Table1.png" alt="Lortie Table 1" width="50%">
```{r setupR, echo = FALSE, results = "hide"}
# Clean up the workspace but keep the local token, if present
# This is necessary for the local build
keep <- "github_token"
rm(list = ls()[!(ls() %in% keep)])
suppressPackageStartupMessages(library("knitr"))
suppressPackageStartupMessages(library("kableExtra"))
suppressPackageStartupMessages(library("readxl"))
suppressPackageStartupMessages(library("httr"))
suppressPackageStartupMessages(library("lubridate"))
suppressPackageStartupMessages(library("jsonlite"))
suppressPackageStartupMessages(library("rvest")) #???
suppressPackageStartupMessages(library("stringr"))
suppressPackageStartupMessages(library("xml2"))
suppressPackageStartupMessages(library("webu"))
opts_chunk$set(echo = FALSE)
set_config(timeout(20)) # httr setting for GET calls (default 13)
cnt <- 0L # counter for the number of GET calls to Github
username <- "bryanhanson"
```
```{r readDB}
# Please edit FOSS4Spec.xlsx to add information or make corrections.
# Please follow the conventions of existing entries for consistency.
# Remember the table in the web page is sortable so consistency in
# description and focus is especially important for users to obtain
# useful information easily.
DF <- as.data.frame(read_excel("FOSS4Spec.xlsx", na = "NA"))
# shorten names for less typing
names(DF) <- c("pkgname", "web", "repo", "pub", "lang", "desc", "focus", "maint")
DF <- DF[order(DF$pkgname),]
```
```{r token}
junk <- check_for_github_token(github_token)
```
```{r verifyURLs}
# This takes some time!
ne <- nrow(DF) # number of entries
# Check all URLs, if the site is down handle so table always looks good.
# Site URL might also just be missing from table, handle this too.
webLink <- rep(FALSE, ne) # If TRUE, there is a link in the input table
repoLink <- rep(FALSE, ne)
pubLink <- rep(FALSE, ne)
webOK <- rep(FALSE, ne) # If TRUE, URL was reachable
repoOK <- rep(FALSE, ne)
pubOK <- rep(FALSE, ne)
# These are used for internal reporting
# If TRUE, URL was given but not reachable
badWeb <- rep(FALSE, ne)
badRepo <- rep(FALSE, ne)
badPub <- rep(FALSE, ne)
for (i in 1:ne) {
if (!is.na(DF$web[i])) {
webOK[i] <- good_url(DF$web[i])
webLink[i] <- TRUE
if (webLink[i] != webOK[i]) badWeb[i] <- TRUE
}
if (!is.na(DF$repo[i])) {
repoOK[i] <- good_url(DF$repo[i])
repoLink[i] <- TRUE
if (repoLink[i] != repoOK[i]) badRepo[i] <- TRUE
}
if (!is.na(DF$pub[i])) {
pubOK[i] <- good_url(DF$pub[i])
pubLink[i] <- TRUE
if (pubLink[i] != pubOK[i]) badPub[i] <- TRUE
}
}
# If URLs are bad they will still be added to the table as hyperlinks, but
# those links will give status 404.
# Write a report so maintainers can check & fix if it's on our end
LinkReport <- data.frame(name = DF$pkgname, webLink, webOK, repoLink, repoOK, pubLink, pubOK, stringsAsFactors = FALSE)
keep <- badPub | badRepo | badWeb
LinkReport <- LinkReport[keep,]
if (nrow(LinkReport) > 0) write.csv(LinkReport, row.names = FALSE, file = "Reports/Links404.csv")
```
```{r checkUpdateDate, warning = FALSE}
# Use the info from checking URLs above
webDate <- as.POSIXct(rep(NA, ne)) # see stackoverflow.com/a/33002710/633251
commitDate <- as.POSIXct(rep(NA, ne))
issueDate <- as.POSIXct(rep(NA, ne))
updateDate <- as.POSIXct(rep(NA, ne))
repoType <- rep("xx", ne)
repoType[grepl("github\\.com", DF$repo)] <- "gh"
for (i in 1:ne) {
if (webOK[i]) {
ans <- find_page_date(flatten_web_page(DF$web[i]))
if (!is.na(ans)) webDate[i] <- ans
}
if (repoOK[i]) {
if (repoType[i] == "gh") {
# NA returned when repo path bad
tmp <- get_github_dates(DF$repo[i], "commits")
cnt <- cnt + 1
if (!is.na(tmp)) commitDate[i] <- ymd(tmp)
tmp <- get_github_dates(DF$repo[i], "issues")
cnt <- cnt + 1
if (!is.na(tmp)) issueDate[i] <- ymd(tmp)
}
}
# updateDate will be the most recent of webDate, issueDate, commitDate
# If all are NA, a warning is issued so to avoid that do:
if (is.na(webDate[i]) & is.na(commitDate[i]) & is.na(issueDate[i])) next
updateDate[i] <- max(webDate[i], commitDate[i], issueDate[i], na.rm = TRUE)
}
updateDate <- date(updateDate) # -> ymd
# Write a report
DateReport <- data.frame(name = DF$pkgname, webDate, commitDate, issueDate, updateDate, stringsAsFactors = FALSE)
write.csv(DateReport, row.names = FALSE, file = "Reports/DateReport.csv")
```
```{r createNamelink}
# Additional processing of the input values
# Combine name, website and pub as available to create hyperlink
namelink <- DF$pkgname # There must be at least a pkgname in the input table
for (i in 1:ne) {
if (!is.na(DF$web[i])) {
namelink[i] <- paste("[", DF$pkgname[i], "](", DF$web[i], ")", sep = "")
}
if (!is.na(DF$pub[i])) {
namelink[i] <- paste(namelink[i], " ([pub](", DF$pub[i], "))", sep = "")
}
}
```
```{r createTable}
DF2 <- data.frame(namelink, DF$desc, DF$lang, DF$focus, updateDate, stringsAsFactors = FALSE)
names(DF2) <- c("Name", "Description", "Lang", "Focus", "[Status](#status)")
```
<hr>
* *Click on a header to sort the table ("Focus" may be a good starting point)*
* [Abbreviations](#abbreviations) & terms below the table
* The table currently has `r nrow(DF2)` entries; the majority are `R` (`r length(DF2$Lang[DF2$Lang == "R"])`) or `Python` (`r length(DF2$Lang[DF2$Lang == "Python"])`).
```{r printTable, results = "asis"}
options(knitr.kable.NA = '')
kable(DF2, table.attr = "class=\"sortable\"") %>%
kable_styling(c("striped", "bordered", "sortable"))
```
Additions or corrections? Please [file an issue](https://github.com/bryanhanson/FOSS4Spectroscopy/issues) with the necessary information or submit a pull request at the [repo](https://github.com/bryanhanson/FOSS4Spectroscopy).
##### Status Column: {#status}
The status column in the table gives the date of the most recent:
* commit to a repository,
* activity on an issue filed in a repository,
* web site update, or
* submission to an archival network such as CRAN or PyPI
as a proxy for regular maintenance. Keep in mind that some software is fairly mature and thus an older status date does not necessarily mean the software is not maintained. Follow the links to the websites for more details.
*The status date is found by automatic checking of sites with valid links. Commits and issues are only checked for Github sites, via the Github API. Web site updates are found using a simple search for common date formats and may be inaccurate. Keep in mind the most recent activity may be on any branch in a repo, and may be newer than any official releases.*[^1]
##### Abbreviations & Terms: {#abbreviations}
* __CRAN:__ [Comprehensive R Archival Network](https://cran.r-project.org/)
* __EDA:__ Exploratory data analysis (unsupervised chemometrics)
* __Focus:__ The type of spectroscopy the software is focused on. Keep in mind that software designed with a given spectroscopy in mind may still work with other types of spectroscopic data.
* __Language:__ Most software is built on several languages. In the table, "lang" refers to the primary language used to create the software.
* __PyPI:__ [Python Package Index](https://pypi.org/)
* __Python:__ Multipurpose programming language [details](https://www.python.org/about/)
* __R:__ A software environment for statistical computing and graphics [details, platforms](https://www.r-project.org/)
[^1]: We made `r cnt` inquiries to Github to produce this report.