[Feature]: Data Documentation Utility Helpers #1

@jimbrig

Description

Create helper functions to reduce the tedium of writing roxygen2 comments for data documentation, along with data dictionaries, codebooks, reports, visualizations, and metadata files.

  • document_data: given a data_obj (i.e. a data.frame or tibble), a description, a source, and a named list of column descriptions, write the roxygen2 skeleton for the dataset to an R/data.R file:
#' Document Datasets
#'
#' @description
#' Helper function to auto-generate the necessary `roxygen2` documentation for
#' datasets included/exported with an R package.
#'
#' @param data_obj The data object to be documented. Must be a `data.frame` or
#'   a [tibble::tibble()]; any other object is rejected with an error.
#' @param name Name of the dataset. If not provided, the name of the `data_obj`
#'   object will be used.
#' @param description Description of the dataset. If not provided, a
#'   placeholder will be used.
#' @param source The source of the dataset. If not provided, a
#'   placeholder will be used.
#' @param file Path to the file where the documentation will be written. If not provided, the
#'   documentation will be written to `R/data.R` by default. If you want to
#'   document individual datasets in separate files, you can provide a path to
#'   the file where the documentation will be written. The file will be created
#'   if it does not exist.
#' @param column_descriptions A named list of column descriptions for the dataset.
#'   The names should match the column names of the dataset. If not provided, a
#'   placeholder will be used.
#' @param ... Additional arguments. Currently unused.
#'
#' @return Invisibly returns the documentation string.
#'
#' @example examples/ex_document_datasets.R
#'
#' @export
document_data <- function(
  data_obj,
  name = deparse(substitute(data_obj)),
  description = "<Add a description here>",
  source = "<Add a source here>",
  file = "R/data.R",
  column_descriptions = NULL,
  ...
) {

  # Evaluate the default `name` up front so it captures the caller's expression
  force(name)

  # validate data_obj and name (tibbles are data frames, so one check suffices)
  if (!is.data.frame(data_obj)) {
    rlang::abort("The provided object is not a data frame or tibble object.")
  }
  if (!is.character(name) || length(name) != 1L || !nzchar(name)) {
    rlang::abort("`name` must be a single, non-empty string.")
  }

  # Coerce the data to a data.frame
  dat <- as.data.frame(data_obj)

  # Check if the column descriptions are provided
  if (!is.null(column_descriptions)) {
    if (!is.list(column_descriptions) || is.null(names(column_descriptions))) {
      rlang::abort("Column descriptions must be a named list.")
    }
    if (length(column_descriptions) != ncol(dat)) {
      rlang::abort("Number of column descriptions must match the number of columns in the dataset.")
    }
    if (!setequal(names(column_descriptions), names(dat))) {
      rlang::abort("Names of the column descriptions must match the column names of the dataset.")
    }
    # Reorder the descriptions to follow the column order of the dataset
    column_descriptions <- column_descriptions[names(dat)]
  } else {
    column_descriptions <- as.list(rep("<Add a description here>", ncol(dat)))
    names(column_descriptions) <- names(dat)
  }

  # One `\item{column}{description}` line per column
  items <- sprintf(
    "#'   \\item{%s}{%s}",
    names(column_descriptions),
    vapply(column_descriptions, as.character, character(1), USE.NAMES = FALSE)
  )

  # Create the documentation string. Datasets are documented by quoting their
  # name and must not carry an `@export` tag.
  doc <- paste(
    c(
      paste0("#' @title ", name),
      paste0("#' @description ", description),
      paste0("#' @usage data(", name, ")"),
      paste0("#' @format A data frame with ", nrow(dat), " rows and ", ncol(dat), " columns:"),
      "#' \\describe{",
      items,
      "#' }",
      paste0("#' @source ", source),
      paste0("\"", name, "\"")
    ),
    collapse = "\n"
  )
  doc <- paste0(doc, "\n")

  # Write the documentation to the file, creating its directory if needed
  dir.create(dirname(file), showWarnings = FALSE, recursive = TRUE)
  cat(doc, file = file, append = TRUE)

  # Return the documentation string
  invisible(doc)

}
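
As a rough illustration of how the `column_descriptions` argument could end up in the generated skeleton, the sketch below renders a named list as a roxygen2 `\describe{}` block of the kind that usually follows `@format`. The helper name `format_columns()` and the two example columns are assumptions made for this illustration and are not part of the proposal above.

```r
# Hypothetical helper (not part of the issue): render a named list of column
# descriptions as the roxygen2 `\describe{}` block that follows `@format`.
format_columns <- function(column_descriptions) {
  stopifnot(
    is.list(column_descriptions),
    !is.null(names(column_descriptions))
  )
  # Each description must be a single value; vapply() errors otherwise
  descs <- vapply(
    column_descriptions, as.character, character(1), USE.NAMES = FALSE
  )
  items <- sprintf("#'   \\item{%s}{%s}", names(column_descriptions), descs)
  paste(c("#' \\describe{", items, "#' }"), collapse = "\n")
}

# Example column names and descriptions are made up for illustration
block <- format_columns(list(
  mpg = "Miles per (US) gallon.",
  cyl = "Number of cylinders."
))
cat(block, "\n")
```

Building the block as a single string keeps it easy to splice into the larger documentation string before it is written to `R/data.R`.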

Labels: feature (New enhancements and features.)