
Commit 527cc1a

update R files for documentation
1 parent c7753a7 commit 527cc1a

16 files changed: +72 / -133 lines changed

R/check_strat.R

Lines changed: 12 additions & 5 deletions
@@ -8,7 +8,7 @@
 #' @param prompt_strat_tol Logical, if in \code{\link{interactive}} mode, prompt user for tolerance? If not, and if \code{append_keep_strat} is TRUE and \code{strat_tol} is left \code{\link{missing}}, then a default will be selected for \code{strat_tol}
 #' @param strat_tol The maximum number of unsampled years that is tolerated for any stratum before all rows corresponding to that stratum have their value in the "keep_strat" column set to FALSE
 #' @param plot Logical, visualize strata over time and the number of strata sampled in all but N years?
-#'
+#'
 #' @details
 #' The aim of the function is to guide the selection of which strata to exclude from analysis because they are not sampled often enough. Having fewer gaps in your data set is better, but sometimes tolerating a tiny amount of missingness can result in huge increases in data; the visualization provided by this function will help gauge that tradeoff.
 #'
@@ -22,8 +22,12 @@
 #' \dontrun{
 #' # trim shelf
 #' shelf <- trawlTrim("shelf", c.add=c("val.src", "flag"))
-#' shelf <- shelf[(taxLvl=="species" |taxLvl=="subspecies") & (flag!="bad" | is.na(flag)) & (val.src!="m3" | (!is.na(flag) & flag!="bad"))]
-#'
+#' shelf <- shelf[
+#' (taxLvl=="species" |taxLvl=="subspecies") &
+#' (flag!="bad" | is.na(flag)) &
+#' (val.src!="m3" | (!is.na(flag) & flag!="bad"))
+#' ]
+#'
 #' # aggregate species within a haul (among individuals)
 #' # this means taking the sum of many bio metrics
 #' shelf <- trawlAgg(
@@ -36,7 +40,7 @@
 #' metaCols=c("reg","common","year","datetime","stratum", "lon", "lat"),
 #' meta.action=c("unique1")
 #' )
-#'
+#'
 #' # aggregate within a species within stratum
 #' # refer to the time_lvl column from previous trawlAgg()
 #' # can use mean for both bio and env
@@ -53,7 +57,10 @@
 #' )
 #' setnames(shelf, "time_lvl", "year")
 #' shelf[,year:=as.integer(as.character(year))]
-#' setcolorder(shelf, c("reg", "year", "stratum", "lon", "lat", "spp", "common", "btemp", "wtcpue", "nAgg"))
+#' setcolorder(shelf, c(
+#' "reg", "year", "stratum", "lon", "lat",
+#' "spp", "common", "btemp", "wtcpue", "nAgg"
+#' ))
 #' setkey(shelf, reg, year, stratum, spp, common)
 #' }
 #'
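The strat_tol / keep_strat rule documented above amounts to counting, for each stratum, how many of the surveyed years it was not sampled, and flagging strata that exceed the tolerance. A minimal data.table sketch of that rule (illustrative only, not the package's check_strat implementation; the year and stratum column names follow the example above):

library(data.table)

flag_strata <- function(X, strat_tol = 2L){
  stopifnot(is.data.table(X))
  n_years <- X[, uniqueN(year)]
  # number of surveyed years in which each stratum was not sampled
  X[, n_missing := n_years - uniqueN(year), by = stratum]
  X[, keep_strat := n_missing <= strat_tol]
  X[, n_missing := NULL]
  invisible(X)
}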

R/clean.columns.R

Lines changed: 2 additions & 0 deletions
@@ -7,6 +7,8 @@
 #'
 #' @template X_reg
 #'
+#' @template clean_seeAlso_template
+#'
 #' @import data.table
 #' @export clean.columns
 clean.columns <- function(X, reg=c("ai", "ebs", "gmex", "goa", "neus", "newf", "ngulf", "sa", "sgulf", "shelf", "wcann", "wctri")){

R/clean.format.R

Lines changed: 5 additions & 0 deletions
@@ -6,10 +6,15 @@
 #'
 #' @details
 #' It is this function that makes specific corrections for data entry errors. For example, in one region a tow duration of 3 should have been 30. In another region some of the \code{effort} values were entered as \code{0} or \code{NA}, but should have had a particular value.
+#'
 #' This function also ensures that longitude and latitude are in the same format among regions.
+#'
 #' Other data entry errors or necessary corrections are implemented here, too.
+#'
 #' Dates are not thoroughly formatted here, except in some cases where getting a \code{year}, e.g., requires parsing values out of other columns. POSIX class dates not created.
 #'
+#' @template clean_seeAlso_template
+#'
 #' @import data.table
 #' @export clean.format
 clean.format <- function(X, reg=c("ai", "ebs", "gmex", "goa", "neus", "newf", "ngulf", "sa", "sgulf", "shelf", "wcann", "wctri")){

R/clean.names.R

Lines changed: 2 additions & 0 deletions
@@ -7,6 +7,8 @@
 #' @details
 #' Regions tend to have very different column names for what are essentially the same measurements, descriptors, etc. This function tries to give everything a standardized name when it's appropriate.
 #'
+#' @template clean_seeAlso_template
+#'
 #' @import data.table
 #' @export clean.names
 clean.names <- function(X, reg=c("ai", "ebs", "gmex", "goa", "neus", "newf", "ngulf", "sa", "sgulf", "shelf", "wcann", "wctri")){

R/clean.tax.R

Lines changed: 2 additions & 0 deletions
@@ -10,6 +10,8 @@
 #'
 #' The \code{ref} column in the output is the original species name / taxonomic identifier.
 #'
+#' @template clean_seeAlso_template
+#'
 #' @import data.table
 #' @export clean.tax
 clean.tax <- function(X, reg=c("ai", "ebs", "gmex", "goa", "neus", "newf", "ngulf", "sa", "sgulf", "shelf", "wcann", "wctri")){

R/clean.trimCol.R

Lines changed: 2 additions & 0 deletions
@@ -22,6 +22,8 @@
 #' Names passed to \code{c.drop} take precedence over names passed to \code{cols} or \code{c.add}; e.g., if the same name is passed to both \code{c.drop} and \code{c.add}, it will not be included in the final data.table. The choice is somewhat arbitrary, although giving preference to dropping names is consistent with the intended use of the function.
 #'
 #' Finally, duplicate columns will not be returned if a name is supplied to both \code{cols} and to \code{c.add}.
+#'
+#' @template clean_seeAlso_template
 #'
 #' @examples
 #' # use a subset of Aleutian Islands
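The precedence rule in the details above means a name listed in c.drop is excluded even if it is also requested through cols or c.add, and a name supplied to both cols and c.add yields a single column. A hypothetical call (argument names taken from the details; the ai object stands in for a region's data.table):

# "flag" appears in both c.add and c.drop, so it is dropped; c.drop wins
ai.trim <- clean.trimCol(ai, c.add = c("flag", "datetime"), c.drop = "flag")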

R/clean.trimRow.R

Lines changed: 2 additions & 0 deletions
@@ -7,6 +7,8 @@
 #' @details
 #' Recommended rows to drop according to Malin's original scripts and what's in the OceanAdapt repo. Rows are not actually dropped; rather, a column called \code{keep.row} is added to the data.table; when \code{keep.row} is \code{FALSE}, it is recommended that the row be dropped.
 #'
+#' @template clean_seeAlso_template
+#'
 #' @export clean.trimRow
 clean.trimRow <- function(X, reg=c("ai", "ebs", "gmex", "goa", "neus", "newf", "ngulf", "sa", "sgulf", "shelf", "wcann", "wctri")){
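As the details above note, clean.trimRow only flags rows via keep.row rather than deleting them, so the caller does the actual filtering. A short usage sketch (ai standing in for any region's trawl data.table):

ai <- clean.trimRow(ai, reg = "ai")
ai <- ai[(keep.row)]  # drop the rows recommended for removal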

R/formatStrat.R

Lines changed: 2 additions & 120 deletions
@@ -38,6 +38,8 @@ ll2km <- function(x,y){
 #' @details
 #' If \code{frac} is 1, then round to the nearest whole number. If \code{frac} is 0.5, then snap everything to the nearest half a degree grid. If 10, then snap to the nearest multiple of 10, plus 5 (6 goes to 5, 8 goes to 5, 10 goes to 15, 21 goes to 25, etc). Handy if you have lat-lon data that you want to redefine as being on a grid.
 #'
+#' @seealso \code{\link{ll2strat}}
+#'
 #' @export
 roundGrid <- function(x, frac=1){
 # if frac is 1, then place in a 1º grid
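A rough R sketch of the snapping rule described in the roundGrid details above; it reproduces the documented examples (frac = 1 rounds to whole degrees, frac = 0.5 snaps to half degrees, frac = 10 gives 6 -> 5, 10 -> 15, 21 -> 25) but is not necessarily the package's exact implementation:

roundGrid_sketch <- function(x, frac = 1){
  if(frac <= 1){
    round(x / frac) * frac              # nearest multiple of frac
  }else{
    floor(x / frac) * frac + frac / 2   # bin, then move to the bin centre
  }
}
roundGrid_sketch(c(6, 8, 10, 21), frac = 10)  # 5 5 15 25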
@@ -65,126 +67,6 @@ ll2strat <- function(lon, lat, gridSize=1){
 }
 
 
-
-
-
-
-# save tolerance: "/Users/Battrd/Documents/School&Work/pinskyPost/trawl/Data/stratTol/"
-# save tolerance figures: "/Users/Battrd/Documents/School&Work/pinskyPost/trawl/Figures/stratTolFigs"
-
-# Function can operate in 1 of 2 ways
-# 1) don't save .txt or figures, don't display figures, don't ask for the tolerance (just read in from .txt file), but change stratum in data.table
-# 2) Figures of tolerance are saved, figures are displayed, .txt of tolerance is saved, and stratum is change in data.table
-#' Make Strata
-#'
-#' Function to make strata for a region, examing missingness
-#'
-#' @param x a data.table of trawl data
-#' @param regName the name of the region
-#' @param doLots option to specify tolerance for missingness; otherwise reads in file for it
-#'
-#' @section Warning:
-#' This function is not ready to be used. Saves figures, has hard-coded paths, looks for reference files outisde of package, etc.
-#'
-makeStrat <- function(x, regName, doLots=NULL){
-
-stopifnot(is.data.table(x))
-
-tolLoc <- "/Users/Battrd/Documents/School&Work/pinskyPost/trawl/Results/stratTol/"
-figLoc <- "/Users/Battrd/Documents/School&Work/pinskyPost/trawl/Figures/stratTolFigs/"
-tol.txt <- paste0(regName,"Tol.txt")
-
-if(is.null(doLots)){
-if(!tol.txt%in%list.files(tolLoc)){
-doLots <- TRUE
-}else{
-doLots <- FALSE
-}
-}
-
-if(!doLots & !tol.txt%in%list.files(tolLoc)){
-stop("cannot set doLots to FALSE b/c tolerance files not found")
-}
-
-
-# ==================
-# = Create Stratum =
-# ==================
-nyears <- x[,length(unique(year))]
-x[,strat2:=ll2strat(lon, lat)]
-
-
-if(doLots){
-# ===============
-# = Make Figure =
-# ===============
-lat.range <- x[,range(lat, na.rm=TRUE)]
-lon.range <- x[,range(lon, na.rm=TRUE)]
-
-nstrata <- c()
-nstrata.orig <- c()
-for(i in 0:(nyears-1)){
-nstrata[i+1] <- x[,sum(colSums(table(year, strat2)>0)>=(nyears-i))]
-nstrata.orig[i+1] <- x[,sum(colSums(table(year, stratum)>0)>=(nyears-i))]
-}
-
-# Initialize graphical device
-png(paste0(figLoc,paste0(regName,".StratTol.png")), width=7, height=8.5, res=150, units="in")
-layout(matrix(c(rep(1,3), rep(2,3), rep(1,3), rep(2,3), 3:8),ncol=3))
-par(mar=c(2.0,1.75,1,0.1), mgp=c(1,0.15,0), tcl=-0.15, ps=8, cex=1, family="Times")
-
-# Tolerance vs. Missingness Panels
-plot(0:(nyears-1), nstrata, type="o", xlab="threshold # years missing", ylab="# strata below threshold missingness", main="# strata vs. tolerance of missingness")
-lines(0:(nyears-1), nstrata.orig, type="o", col="red")
-legend("topleft", legend=c("original strata definition", "1 degree grid definition"), lty=1, pch=21, col=c("red","black"))
-image(x=x[,sort(unique(year))], y=x[,1:length(unique(strat2))], z=x[,table(year, strat2)>0], xlab="year", ylab="1 degree stratum ID", main="stratum presence vs. time; red is absent")
-
-# Tolerance Maps
-par(mar=c(1.25,1.25,0.1,0.1), mgp=c(1,0.15,0), tcl=-0.15, ps=8, cex=1, family="Times")
-tol0 <- x[strat2%in%x[,names(colSums(table(year, strat2)>0))[colSums(table(year, strat2)>0)>=(nyears-0)]]]
-tol0[,c("lat","lon"):=list(roundGrid(lat),roundGrid(lon))]
-for(i in 1:6){
-tolC <- x[strat2%in%x[,names(colSums(table(year, strat2)>0))[colSums(table(year, strat2)>0)>=(nyears-i)]]]
-tolC[,c("lat","lon"):=list(roundGrid(lat),roundGrid(lon))]
-setkey(tolC, lat, lon)
-tolC <- unique(tolC)
-tolC[,plot(lon, lat, xlab="", ylab="", xlim=lon.range, ylim=lat.range, col=1+(!paste(lon,lat)%in%tol0[,paste(lon,lat)]))]
-legend("topleft", paste("missing years =",i), inset=c(-0.1, -0.12), bty="n")
-
-tol0 <- tolC
-}
-dev.off()
-
-
-# ==========================================
-# = Determine and Save Extent of Tolerance =
-# ==========================================
-toleranceChoice <- as.integer(readline("How many years missing should be tolerated?"))
-write.table(cbind("region"=regName, "tolerance"=toleranceChoice), file=paste0(tolLoc,tol.txt), row.names=FALSE)
-
-}else{
-# ===============================
-# = Read in Extent of Tolerance =
-# ===============================
-toleranceChoice <- as.integer(read.table(file=paste0(tolLoc,tol.txt), header=TRUE)[,"tolerance"])
-}
-
-
-
-
-# ===================================
-# = Trim Strata (line 160 of malin) =
-# ===================================
-goodStrat2 <- x[,names(colSums(table(year, strat2)>0))[colSums(table(year, strat2)>0)>=(nyears-toleranceChoice)]]
-x <- x[strat2%in%goodStrat2]
-x[,stratum:=strat2]
-x[,strat2:=NULL]
-x
-
-
-}
-
-
 #' Calculate Area
 #'
 #' Calculate the area of a region defined by a vector of lon-lat coordinates
R/formatValue.R

Lines changed: 25 additions & 6 deletions
@@ -18,6 +18,8 @@
 #'
 #' @return a character vector that has been altered by removing content unlikely to belong to a species name.
 #'
+#' @seealso \code{\link{clean.tax}} \code{\link{clean.trimRow}}
+#'
 #' @export
 cull <- function(x) cullPost2(cullParen(cullSp(fixCase(cullExSpace(x)))))

@@ -55,7 +57,7 @@ cullPost2 <- function(x){
 #'
 #' @return
 #' logical vector of same length as x
-#' @export is.species
+#' @export
 is.species <- function(x){
 sapply(strsplit(x, " "), length) >= 2
 }
@@ -72,6 +74,8 @@ is.species <- function(x){
 #' @return
 #' Nothing, but has the side effect of impacting whatever object was passed as \code{x}.
 #'
+#' @seealso \code{\link{rm9s}} \code{\link{clean.format}}
+#'
 #' @export
 rmWhite <- function(x){
 stopifnot(is.data.table(x))
@@ -90,6 +94,9 @@ rmWhite <- function(x){
 #' All instances of -9999 (numeric or integer) are replaced as NA's of the appropriate class. Checks also for class "integer64".
 #'
 #' @return Nothing, but affects data.table passed as \code{x}.
+#'
+#' @seealso \code{\link{rmWhite}} \code{\link{clean.format}}
+#'
 #' @export
 rm9s <- function(x){
 stopifnot(is.data.table(x))
@@ -115,6 +122,8 @@ rm9s <- function(x){
 #' @details
 #' Dual functionality: turn factors into characters, and ensure those characters are encoded as ASCII. Converting to ASCII relies on the \code{stringi} package, particularly \code{stringi::stri_enc_mark} (for detection of non-ASCII) and \code{stringi::stri_enc_toascii} (for conversion to ASCII).
 #'
+#' This function is used when resaving data sets while building the package, to ensure that it is portable.
+#'
 #' @return NULL (invisibly), but affects the contents of the data.table whose name was passed to this function
 #'
 #' @export
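A small illustration of the two stringi calls named in the details above (just the detect-then-convert pattern, not makeAsciiChar itself):

x <- c("Se\u00f1or", "cod")
stringi::stri_enc_mark(x)     # reports the encoding of each element (e.g. ASCII vs UTF-8)
stringi::stri_enc_toascii(x)  # converts to ASCII, substituting characters it cannot map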
@@ -161,18 +170,28 @@ makeAsciiChar <- function(X){
 #' @details
 #' See \code{\link[lubridate]{parse_date_time}} for a summary of how to specify \code{orders}. Examples show a conversion of variable formats. The only reason this function exists is that \code{parse_date_time} did not handle the century very well on some test data.
 #'
-#' The default \code{orders} is \code{paste0(rep(c("ymd", "mdy", "Ymd", "mdY"),each=5), c(" HMS"," HM", " H", "M", ""))}
+#' The default \code{orders} is
+#' \code{paste0(
+#' rep(c("ymd", "mdy", "Ymd", "mdY"),each=5),
+#' c(" HMS"," HM", " H", "M", "")
+#' )}
 #'
 #' @section Note:
 #' In 2056 I will turn 70. At that point, I'll still be able to assume that a date of '57 associated with an ecological field observation was probably made in 1957. If I see '56, I'll round it up to 2056. I'll probably retire by the time I'm 70, or hopefully someone else will have cleaned up the date formats in all ecological data sets by that time. Either way, it is in my own self-interest to set the default as `year=1957`; I do not currently use very many data sets that begin before 1957 (and none of such vast size that I need computer code to automate the corrections), and as a result, the default 1957 will continue to work for me until I retire. After that, a date of '57 that was actually taken in 2057 will have its date reverted to 1957. Shame on them.
-#'
+#'
 #' Oh, and the oldest observation in this package is 1958, I believe (the soda bottom temperatures). As for trawl data, NEUS goes back to 1963. So 1957 is a date choice that will work for all dates currently in this package, and given a 1 year buffer, maximizes the duration of the appropriateness of this default for these data sets into the future.
 #'
 #' @return a vector of dates formatted as POSIXct
-
+#'
 #' @examples
-#' test <- c("2012-11-11", "12-5-23", "12/5/86", "2015-12-16 1300", "8/6/92 3:00", "11/6/14 4", "10/31/14 52", "06/15/2014 14:37:01", "2/10/06", "95-06-26", "82-10-03", "11/18/56 2:30:42pm", "11/18/57 1:00", "11/18/58")
-#' getDate(test, orders=orders, truncated=3) # note that default orders ignores the pm!
+#' test <- c(
+#' "2012-11-11", "12-5-23", "12/5/86",
+#' "2015-12-16 1300", "8/6/92 3:00",
+#' "11/6/14 4", "10/31/14 52", "06/15/2014 14:37:01",
+#' "2/10/06", "95-06-26", "82-10-03",
+#' "11/18/56 2:30:42pm", "11/18/57 1:00", "11/18/58"
+#' )
+#' getDate(test, orders=orders, truncated=3) # default orders ignores pm
 #'
 #' @export
 getDate <- function(x, orders, year=1957, tz="GMT", ...){
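The Note above describes a pivot-year rule for two-digit years: any year that would fall before the year argument (default 1957) is pushed forward a century. A worked illustration of that rule, separate from getDate() itself:

pivot_2digit_year <- function(yy, pivot = 1957){
  full <- 1900 + yy
  ifelse(full < pivot, full + 100, full)
}
pivot_2digit_year(c(56, 57, 58, 14))  # 2056 1957 1958 2014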

R/helperFile.R

Lines changed: 5 additions & 0 deletions
@@ -11,6 +11,9 @@
 #' @details uses data.table and LaF packages. The read is performed entirely by \code{LaF::laf_open_fwf}, but the output is converted to a data.table.
 #'
 #' @return a data.table
+#'
+#' @seealso \code{\link{read.zip}} \code{\link{read.trawl}}
+#'
 #' @export fread.fwf
 fread.fwf <- function(..., cols, column_types, column_names){
 # if(!requireNamespace("LaF", quietly = TRUE)){
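The details above say the read itself is done by LaF::laf_open_fwf, with the result converted to a data.table. A stripped-down sketch of that pattern (not the package's fread.fwf; the wrapper name and arguments here are invented):

read_fwf_dt <- function(file, widths, types, names){
  laf <- LaF::laf_open_fwf(file, column_types = types,
                           column_widths = widths, column_names = names)
  data.table::as.data.table(laf[ , ])  # read the whole file, then convert
}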
@@ -57,6 +60,8 @@ fread.fwf <- function(..., cols, column_types, column_names){
 #'
 #' @return a data.table, or list of data.tables. The name of each element of the list is the name of the file within the .zip file.
 #'
+#' @seealso \code{\link{fread.fwf}} \code{\link{read.trawl}}
+#'
 #' @export read.zip
 read.zip <- function(zipfile, pattern="\\.csv$", SIMPLIFY=TRUE, use.fwf=FALSE, ...){
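read.zip's return value (one data.table per file inside the archive, named after that file) suggests the usual unzip-then-read pattern. A rough base R plus data.table sketch, not the function's actual body:

read_zip_sketch <- function(zipfile, pattern = "\\.csv$"){
  exdir <- tempfile("zip")
  dir.create(exdir)
  files <- unzip(zipfile, exdir = exdir)
  files <- grep(pattern, files, value = TRUE)
  out <- lapply(files, data.table::fread)
  names(out) <- basename(files)
  out
}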
