Commit f96b0ba

Release V0.1.0 (as submitted to CRAN)
0 parents  commit f96b0ba

23 files changed

+2469
-0
lines changed

DESCRIPTION

+37
@@ -0,0 +1,37 @@
1+
Package: ECOTOXr
2+
Type: Package
3+
Title: Download and Extract Data from US EPA's ECOTOX Database
4+
Version: 0.1.0
5+
Date: 2021-10-03
6+
Authors@R: c(person("Pepijn", "de Vries", role = c("aut", "cre", "dtc"),
7+
email = "[email protected]"))
8+
Author:
9+
Pepijn de Vries [aut, cre, dtc]
10+
Maintainer: Pepijn de Vries <[email protected]>
11+
Description: The US EPA ECOTOX database is a freely available database
12+
with a treasure of aquatic and terrestrial ecotoxicological data.
13+
As the online search interface doesn't come with an API, this
14+
package provides the means to easily access and search the database
15+
in R. To this end, all raw tables are downloaded from the EPA website
16+
and stored in a local SQLite database.
17+
Depends:
18+
R (>= 3.5.0),
19+
RSQLite
20+
Imports:
21+
crayon,
22+
dplyr,
23+
rappdirs,
24+
readr,
25+
rvest,
26+
stringr,
27+
utils
28+
Suggests:
29+
testthat (>= 3.0.0),
30+
webchem
31+
URL: https://github.com/pepijn-devries/ECOTOXr
32+
BugReports: https://github.com/pepijn-devries/ECOTOXr/issues
33+
License: GPL (>= 3)
34+
Encoding: UTF-8
35+
LazyData: true
36+
RoxygenNote: 7.1.2
37+
Config/testthat/edition: 3

LICENSE.md

+595
Large diffs are not rendered by default.

NAMESPACE

+18
@@ -0,0 +1,18 @@
1+
# Generated by roxygen2: do not edit by hand
2+
3+
export(build_ecotox_sqlite)
4+
export(check_ecotox_availability)
5+
export(cite_ecotox)
6+
export(dbConnectEcotox)
7+
export(dbDisconnectEcotox)
8+
export(download_ecotox_data)
9+
export(get_ecotox_info)
10+
export(get_ecotox_path)
11+
export(get_ecotox_sqlite_file)
12+
export(list_ecotox_fields)
13+
export(search_ecotox)
14+
export(search_query_ecotox)
15+
importFrom(RSQLite,dbConnect)
16+
importFrom(RSQLite,dbDisconnect)
17+
importFrom(RSQLite,dbExecute)
18+
importFrom(RSQLite,dbWriteTable)

NEWS

+8
@@ -0,0 +1,8 @@
1+
ECOTOXr v0.1.0 (Release date: 2021-10-03)
2+
=============
3+
4+
* Initial release which can:
5+
6+
* Download raw ECOTOX database tables from the EPA website
7+
* Build an SQLite database from those files
8+
* Search and extract data from the created local database
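
The three capabilities listed above map onto exported functions shown in the NAMESPACE diff. The following is a minimal sketch of how a session might check for a local database and open a connection. It assumes ECOTOXr (and therefore DBI, via RSQLite) is installed, and it deliberately does not trigger the download itself. The variable names `has_pkg` and `has_db` are illustrative only.

```r
# Sketch only: nothing is downloaded or built here.
has_pkg <- requireNamespace("ECOTOXr", quietly = TRUE)
has_db  <- has_pkg && isTRUE(ECOTOXr::check_ecotox_availability())

if (has_db) {
  # Open the most recent local copy, list its tables, and close it again.
  con <- ECOTOXr::dbConnectEcotox()
  print(DBI::dbListTables(con))
  ECOTOXr::dbDisconnectEcotox(con)
} else {
  message("No local ECOTOX database found; ",
          "run ECOTOXr::download_ecotox_data() first.")
}
```

When no database is present, the sketch only prints a hint, which mirrors the start-up message in R/imports.r.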

R/ECOTOXr.r

+103
@@ -0,0 +1,103 @@
1+
#' Package description
2+
#'
3+
#' Everything you need to know when you start using the ECOTOXr package.
4+
#'
5+
#' The ECOTOXr package provides the means to efficiently search, extract and analyse \href{https://www.epa.gov/}{US EPA}
6+
#' \href{https://cfpub.epa.gov/ecotox/}{ECOTOX} data, with a focus on reproducible results. Although the package
7+
#' creator/maintainer is confident in the quality of this software, it is the end user's sole responsibility to
8+
#' assure the quality of his or her work while using this software. As per the provided license terms the package
9+
#' maintainer is not liable for any damage resulting from its usage. That being said, below we present some tips
10+
#' for generating reproducible results with this package.
11+
#'
12+
#' @section How do I get started?:
13+
#' Installing this package is only the first step to get things started. You need to perform the following steps
14+
#' in order to use the package to its full capacity.
15+
#'
16+
#' \itemize{
17+
#' \item{
18+
#' First download a copy of the complete EPA database. This can be done by calling \code{\link{download_ecotox_data}}.
19+
#' This may not always work on all machines as R does not always accept the website SSL certificate from the EPA.
20+
#' In those cases the zipped archive with the database files can be downloaded manually with a different (more
21+
#' forgiving) browser. The files from the zip archive can be extracted to a location of choice.
22+
#' }
23+
#' \item{
24+
#' Next, an SQLite database needs to be built from the downloaded files. This will be done automatically when
25+
#' you used \code{\link{download_ecotox_data}} in the previous step. When you have manually downloaded the files
26+
#' you can call \code{\link{build_ecotox_sqlite}} to build the database locally.
27+
#' }
28+
#' \item{
29+
#' When the previous steps have been performed successfully, you can now search the database by calling
30+
#' \code{\link{search_ecotox}}. You can also use \code{\link{dbConnectEcotox}} to open a connection to the
31+
#' database. You can query the database using this connection and any of the methods provided from the
32+
#' \link[DBI:DBI]{DBI} or \link[RSQLite:RSQLite]{RSQLite} packages.
33+
#' }
34+
#' }
35+
#'
36+
#' @section How do I obtain reproducible results?:
37+
#' Each individual user is responsible for evaluating the reproducibility of his or her work. Although
38+
#' this package offers instruments to achieve reproducibility, it is not guaranteed. In order to increase the
39+
#' chances of generating reproducible results, one should adhere at least to the following rules:
40+
#' \itemize{
41+
#' \item{
42+
#' Always use an official release from CRAN, and cite the version used in your analyses (\code{citation("ECOTOXr")}).
43+
#' Different versions may produce different end results (although we will strive for backward compatibility).
44+
#' }
45+
#' \item{
46+
#' Make sure you are working with a clean (unaltered) version of the database. When in doubt, download and build
47+
#' a fresh copy of the database (\code{\link{download_ecotox_data}}). Also cite the (release) version of the downloaded
48+
#' database (\code{\link{cite_ecotox}}), and the operating system on which the local database was built
49+
#' (\code{\link{get_ecotox_info}}). Or, just make sure that you never modify the database (e.g., write data to it, delete
50+
#' data from it, etc.)
51+
#' }
52+
#' \item{
53+
#' In order to avoid platform dependencies it is advised to only include non-accented alpha-numerical characters in
54+
#' search terms. See also \link{search_ecotox} and \link{build_ecotox_sqlite}.
55+
#' }
56+
#' \item{
57+
#' When trying to reproduce database extractions from earlier database releases, filter out additions after
58+
#' that specific release. This can be done by adding output fields 'tests.modified_date', 'tests.created_date' and
59+
#' 'tests.published_date' to your search and compare those with the release date of the database you are trying to
60+
#' reproduce results from.
61+
#' }
62+
#' }
63+
#'
64+
#' @section Why isn't the database included in the package?:
65+
#' This package doesn't come bundled with a copy of the database, which needs to be downloaded the first time the
66+
#' package is used. Why is this? There are several reasons:
67+
#' \itemize{
68+
#' \item{
69+
#' The database is maintained and updated by the \href{https://www.epa.gov/}{US EPA}. This process is and should be
70+
#' outside the sphere of influence of the package maintainer.
71+
#' }
72+
#' \item{
73+
#' Packages on CRAN are not allowed to contain large amounts of data. Publication on CRAN is key to control
74+
#' the quality of this package and therefore outweighs the convenience of having the data bundled with the package.
75+
#' }
76+
#' \item{
77+
#' The user has full control over the release version of the database that is being used.
78+
#' }
79+
#' }
80+
#'
81+
#' @section Why doesn't this package search the online ECOTOX database?:
82+
#' Although this is possible, there are several reasons why we opted for creating a local copy:
83+
#' \itemize{
84+
#' \item{
85+
#' The user would be restricted to the search options provided on the website (\href{https://cfpub.epa.gov/ecotox/}{ECOTOX}).
86+
#' }
87+
#' \item{
88+
#' The online database doesn't come with an API that would allow for a convenient interface.
89+
#' }
90+
#' \item{
91+
#' The user is not limited by an internet connection and its bandwidth.
92+
#' }
93+
#' \item{
94+
#' Not all database fields can be retrieved from the online interface.
95+
#' }
96+
#' }
97+
#' @docType package
98+
#' @name ECOTOXr
99+
#' @author Pepijn de Vries
100+
#' @references
101+
#' Official US EPA ECOTOX website:
102+
#' \url{https://cfpub.epa.gov/ecotox/}
103+
NULL
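
The last reproducibility tip above (filtering out additions made after a given release) can be illustrated with plain data frame operations. This is a sketch under assumptions: the `results` data frame, its values, and the `release_date` are made up, the column names merely echo the fields named in the documentation, and the dates are assumed to have been converted to `Date` already. How the real database stores these dates is not shown in this commit.

```r
# Hypothetical search output with the three date fields mentioned above.
results <- data.frame(
  test_id        = c(1, 2, 3),
  created_date   = as.Date(c("2019-03-01", "2021-07-15", "2020-11-30")),
  modified_date  = as.Date(c("2019-03-01", "2021-08-01", "2021-08-20")),
  published_date = as.Date(c("2019-06-13", "2021-09-15", "2021-03-11"))
)

# Release date of the database version being reproduced (made up).
release_date <- as.Date("2021-06-10")

# Keep only records that were created and published on or before that release.
keep <- results$created_date   <= release_date &
        results$published_date <= release_date
reproduced <- results[keep, ]

# Flag retained records that were modified after the release; these may
# differ from what the older release contained and deserve a manual check.
changed_later <- reproduced$modified_date > release_date
```

Whether a record modified after the release should be dropped or only inspected is a judgment call that depends on the analysis; the sketch flags such records rather than deciding.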

R/database_access.r

+171
@@ -0,0 +1,171 @@
1+
#' @rdname get_path
2+
#' @name get_ecotox_sqlite_file
3+
#' @export
4+
get_ecotox_sqlite_file <- function(path = get_ecotox_path(), version) {
5+
if (missing(version)) {
6+
version <- NULL
7+
} else {
8+
if (length(version) != 1) stop("Argument 'version' should hold a single element!")
9+
version <- as.Date(version, format = "%m_%d_%Y")
10+
}
11+
files <- attributes(.fail_on_missing(path))$files
12+
results <- nrow(files)
13+
files <- files[which(files$date == ifelse(is.null(version), max(files$date)[[1]], version)),]
14+
if (results > 1 && is.null(version)) {
15+
warning(sprintf("Multiple versions of the database found and not one specified. Using the most recent version (%s)",
16+
format(files$date, "%Y-%m-%d")))
17+
}
18+
return(file.path(files$path, files$database))
19+
}
20+
21+
#' Open or close a connection to the local ECOTOX database
22+
#'
23+
#' Wrappers for \code{\link[RSQLite:SQLite]{dbConnect}} and \code{\link[RSQLite:SQLite]{dbDisconnect}} methods.
24+
#'
25+
#' Open or close a connection to the local ECOTOX database. These functions are only required when you want
26+
#' to send custom queries to the database. For most searches the \code{\link{search_ecotox}} function
27+
#' will be adequate.
28+
#'
29+
#' @param path A \code{character} string with the path to the location of the local database (default is
30+
#' \code{\link{get_ecotox_path}()}).
31+
#' @param version A \code{character} string referring to the release version of the database you wish to locate.
32+
#' It should have the same format as the date in the EPA download link, which is month, day, year, separated by
33+
#' underscores ("\%m_\%d_\%Y"). When missing, the most recent available copy is selected automatically.
34+
#' @param conn An open connection to the ECOTOX database that needs to be closed.
35+
#' @param ... Arguments that are passed to \code{\link[RSQLite:SQLite]{dbConnect}} method
36+
#' or \code{\link[RSQLite:SQLite]{dbDisconnect}} method.
37+
#' @return A database connection in the form of a \code{\link[DBI]{DBIConnection-class}} object.
38+
#' The object is tagged with: a time stamp; the package version used; and the
39+
#' file path of the SQLite database used in the connection. These tags are added as attributes
40+
#' to the object.
41+
#' @rdname dbConnectEcotox
42+
#' @name dbConnectEcotox
43+
#' @examples
44+
#' \dontrun{
45+
#' ## This will only work when a copy of the database exists:
46+
#' con <- dbConnectEcotox()
47+
#'
48+
#' ## check if the connection works by listing the tables in the database:
49+
#' dbListTables(con)
50+
#'
51+
#' ## Let's be a good boy/girl and close the connection to the database when we're done:
52+
#' dbDisconnectEcotox(con)
53+
#' }
54+
#' @author Pepijn de Vries
55+
#' @export
56+
dbConnectEcotox <- function(path = get_ecotox_path(), version, ...) {
57+
f <- get_ecotox_sqlite_file(path, version)
58+
return(.add_tags(RSQLite::dbConnect(RSQLite::SQLite(), f, ...), f))
59+
}
60+
61+
#' @rdname dbConnectEcotox
62+
#' @name dbDisconnectEcotox
63+
#' @export
64+
dbDisconnectEcotox <- function(conn, ...) {
65+
RSQLite::dbDisconnect(conn, ...)
66+
}
67+
68+
#' Cite the downloaded copy of the ECOTOX database
69+
#'
70+
#' Cite the downloaded copy of the ECOTOX database and this package for reproducible results.
71+
#'
72+
#' When you download a copy of the EPA ECOTOX database using \code{\link{download_ecotox_data}()}, a BibTeX file
73+
#' is stored that registers the database release version and the access (= download) date. Use this function
74+
#' to obtain a citation to that specific download.
75+
#'
76+
#' In order for others to reproduce your results, it is key to cite the data source as accurately as possible.
77+
#' @param path A \code{character} string with the path to the location of the local database (default is
78+
#' \code{\link{get_ecotox_path}()}).
79+
#' @param version A \code{character} string referring to the release version of the database you wish to locate.
80+
#' It should have the same format as the date in the EPA download link, which is month, day, year, separated by
81+
#' underscores ("\%m_\%d_\%Y"). When missing, the most recent available copy is selected automatically.
82+
#' @return Returns a \code{vector} of \code{\link{bibentry}}'s, containing a reference to the downloaded database
83+
#' and this package.
84+
#' @rdname cite_ecotox
85+
#' @name cite_ecotox
86+
#' @examples
87+
#' \dontrun{
88+
#' ## In order to cite downloaded database and this package:
89+
#' cite_ecotox()
90+
#' }
91+
#' @author Pepijn de Vries
92+
#' @export
93+
cite_ecotox <- function(path = get_ecotox_path(), version) {
94+
db <- get_ecotox_sqlite_file(path, version)
95+
bib <- gsub(".sqlite", "_cit.txt", db, fixed = TRUE)
96+
if (!file.exists(bib)) stop("No bibentry reference to database download found!")
97+
result <- utils::readCitationFile(bib)
98+
return(c(result, utils::citation("ECOTOXr")))
99+
}
100+
101+
#' Get information on the local ECOTOX database when available
102+
#'
103+
#' Get information on how and when the local ECOTOX database was built.
104+
#'
105+
#' Get information on how and when the local ECOTOX database was built. This information is retrieved
106+
#' from the log-file that is (optionally) stored with the local database when calling \code{\link{download_ecotox_data}}
107+
#' or \code{\link{build_ecotox_sqlite}}.
108+
#' @param path A \code{character} string with the path to the location of the local database (default is
109+
#' \code{\link{get_ecotox_path}()}).
110+
#' @param version A \code{character} string referring to the release version of the database you wish to locate.
111+
#' It should have the same format as the date in the EPA download link, which is month, day, year, separated by
112+
#' underscores ("\%m_\%d_\%Y"). When missing, the most recent available copy is selected automatically.
113+
#' @return Returns a \code{vector} of \code{character}s, containing information on the selected local ECOTOX database.
114+
#' @rdname get_ecotox_info
115+
#' @name get_ecotox_info
116+
#' @examples
117+
#' \dontrun{
118+
#' ## Show info on the current database (only works when one is downloaded and built):
119+
#' get_ecotox_info()
120+
#' }
121+
#' @author Pepijn de Vries
122+
#' @export
123+
get_ecotox_info <- function(path = get_ecotox_path(), version) {
124+
default <- "No information available\n"
125+
inf <- tryCatch({
126+
db <- get_ecotox_sqlite_file(path, version)
127+
gsub(".sqlite", ".log", db, fixed = TRUE)
128+
}, error = function(e) return(default))
129+
if (file.exists(inf)) {
130+
inf <- readLines(inf)
131+
} else {
132+
inf <- default
133+
}
134+
cat(paste(inf, collapse = "\n"))
135+
return(invisible(inf))
136+
}
137+
138+
#' List the field names that are available from the ECOTOX database
139+
#'
140+
#' List the field names (table headers) that are available from the ECOTOX database
141+
#'
142+
#' This can be useful when specifying a \code{\link{search_ecotox}}, to identify which fields
143+
#' are available from the database, for searching and output.
144+
#' @param which A \code{character} string that specifies which fields to return. Can be any of:
145+
#' '\code{default}': returns default output field names; '\code{all}': returns all fields; or
146+
#' '\code{full}': returns all except fields from table 'dose_response_details'.
147+
#' @param include_table A \code{logical} value indicating whether the table name should be included
148+
#' as prefix. Default is \code{TRUE}.
149+
#' @return Returns a \code{vector} of type \code{character} containing the field names from the ECOTOX database.
150+
#' @rdname list_ecotox_fields
151+
#' @name list_ecotox_fields
152+
#' @examples
153+
#' ## Fields that are included in search results by default:
154+
#' list_ecotox_fields("default")
155+
#'
156+
#' ## All fields that are available from the ECOTOX database:
157+
#' list_ecotox_fields("all")
158+
#'
159+
#' ## All except fields from the table 'dose_response_details'
160+
#' ## that are available from the ECOTOX database:
161+
#' list_ecotox_fields("full")
162+
#' @author Pepijn de Vries
163+
#' @export
164+
list_ecotox_fields <- function(which = c("default", "full", "all"), include_table = TRUE) {
165+
which <- match.arg(which)
166+
result <- .db_specs$field_name
167+
if (include_table) result <- paste(.db_specs$table, result, sep = ".")
168+
if (which == "default") result <- result[.db_specs$default_output]
169+
if (which == "full") result <- result[.db_specs$table != "dose_response_details"]
170+
return(result)
171+
}
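
Two conventions in this file are easy to try in isolation: the `version` argument is parsed with the format `"%m_%d_%Y"`, and the citation and log files are located by swapping the `.sqlite` suffix of the database path. The sketch below uses base R only. The version string and the database file name are hypothetical, since the real file name depends on the download.

```r
# Hypothetical release string in the month_day_year form described above.
version <- "09_15_2021"
release <- as.Date(version, format = "%m_%d_%Y")
release_iso <- format(release, "%Y-%m-%d")

# Hypothetical database path; only the suffix handling matters here.
db <- file.path("some_dir", "ecotox_ascii_09_15_2021.sqlite")

# Same substitutions as used by cite_ecotox() and get_ecotox_info().
bib_file <- gsub(".sqlite", "_cit.txt", db, fixed = TRUE)
log_file <- gsub(".sqlite", ".log", db, fixed = TRUE)

print(release_iso)
print(basename(bib_file))
print(basename(log_file))
```

Because `fixed = TRUE` is used, `.sqlite` is matched literally rather than as a regular expression in which the dot would match any character.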

R/helpers.r

+7
@@ -0,0 +1,7 @@
1+
.add_tags <- function(x, sqlite) {
2+
if (missing(sqlite)) sqlite <- attributes(x)$database_file
3+
attributes(x)$date_created <- Sys.Date()
4+
attributes(x)$created_with <- sprintf("Package ECOTOXr v%s", utils::packageVersion("ECOTOXr"))
5+
attributes(x)$database_file <- sqlite
6+
return(x)
7+
}
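
The tagging helper above stores provenance as attributes on the returned object. A minimal stand-alone illustration of the same idea follows. `tag_object` is a hypothetical stand-in for `.add_tags`, the version string is hard-coded rather than looked up with `utils::packageVersion()`, and a plain list takes the place of a database connection.

```r
# Hypothetical stand-in for the internal helper; not the package's own code.
tag_object <- function(x, sqlite) {
  attr(x, "date_created")  <- Sys.Date()
  attr(x, "created_with")  <- "Package ECOTOXr v0.1.0"  # hard-coded for the sketch
  attr(x, "database_file") <- sqlite
  x
}

# Tag a plain list; the path is made up.
tagged <- tag_object(list(a = 1), "some_dir/ecotox.sqlite")

# The tags travel with the object and can be read back for reporting.
print(attr(tagged, "created_with"))
print(attr(tagged, "database_file"))
```

Attributes set this way do not change the object's contents, so code that only uses the underlying value keeps working.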

R/imports.r

+12
@@ -0,0 +1,12 @@
1+
.onAttach <- function(libname, pkgname){
2+
packageStartupMessage({
3+
if (check_ecotox_availability()) {
4+
crayon::green("ECOTOX database file located, you are ready to go!\n")
5+
} else {
6+
crayon::red("ECOTOX database file not present! Invoke download and database build using 'download_ecotox_data()'\n")
7+
}
8+
})
9+
}
10+
11+
#' @importFrom RSQLite dbExecute dbConnect dbDisconnect dbWriteTable
12+
NULL
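
Because the attach hook above reports through `packageStartupMessage()`, callers can silence or capture the notice with standard base R tools. The sketch below demonstrates this with a hypothetical `announce` function that stands in for the hook. It uses plain strings instead of crayon colours and a logical argument instead of `check_ecotox_availability()`.

```r
# Hypothetical stand-in for the .onAttach body.
announce <- function(db_found) {
  packageStartupMessage(
    if (db_found) "ECOTOX database file located" else "ECOTOX database file not present"
  )
}

# Silenced: nothing is printed.
suppressPackageStartupMessages(announce(TRUE))

# Captured: the condition has class "packageStartupMessage".
msg <- tryCatch(
  announce(FALSE),
  packageStartupMessage = function(m) conditionMessage(m)
)
print(msg)
```

In a script, the same effect would be obtained by wrapping the `library()` call in `suppressPackageStartupMessages()`.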
