Reframed JSON Tutorial #11

Merged
merged 11 commits into from
May 1, 2024
8 changes: 6 additions & 2 deletions .github/workflows/publish.yml
@@ -1,7 +1,11 @@
on:
workflow_dispatch:
push:
branches: main
branches: [main, master]
pull_request:
branches: [main, master]
release:
types: [published]
workflow_dispatch:

name: Quarto Publish

3 changes: 2 additions & 1 deletion _freeze/best_practices/execute-results/html.json

@@ -1,7 +1,8 @@
{
"hash": "0b74bcb0ee0f9019d9c30ea7a67b3169",
"result": {
"markdown": "\nThis section contains our recommendations for handling **file paths**. When you code collaboratively (e.g., with GitHub), accounting for the difference between your folder structure and those of your colleagues becomes critical. Ideally your code should be completely agnostic about (1) the operating system of the computer it is running on (e.g., Windows vs. Mac) and (2) the folder structure of the computer. We can--fortunately--handle these two considerations relatively simply.\n\nThis may seem somewhat dry but it is worth mentioning that failing to use relative file paths is a significant hindrance to reproducibility (see [Trisovic et al. 2022](https://www.nature.com/articles/s41597-022-01143-6)).\n\n### 1. Preserve File Paths as Objects Using `file.path`\n\nDepending on the operating system of the computer, the slashes between folder names are different (`\\` versus `/`). The `file.path` function joins folder names with a forward slash, which R accepts on every operating system (including Windows), so you never need to type the slashes yourself. We recommend using this function and assigning your file path to an object.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_path <- file.path(\"path\", \"to\", \"my\", \"file\")\nmy_path\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"path/to/my/file\"\n```\n:::\n:::\n\n\nOnce you have that path object, you can use it everywhere you import or export information to/from the code (with another use of `file.path` to add the file name!).\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Import\nmy_raw_data <- read.csv(file = file.path(my_path, \"raw_data.csv\"))\n\n# Export\nwrite.csv(x = data_object, file = file.path(my_path, \"tidy_data.csv\"))\n```\n:::\n\n\n### 2. Create Necessary Sub-Folders in the Code with `dir.create`\n\nUsing `file.path` guarantees that your code will work regardless of the upstream folder structure but what about the folders that you need to export or import things to/from? 
For example, say your `graphs.R` script saves a couple of useful exploratory graphs to the \"Plots\" folder. How would you guarantee that everyone running `graphs.R` *has* a \"Plots\" folder? You can use the `dir.create` function to create the folder in the code (and include your path object from step 1!).\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Create needed folder\ndir.create(path = file.path(my_path, \"Plots\"), showWarnings = FALSE)\n\n# Then export to that folder\nggplot2::ggsave(filename = file.path(my_path, \"Plots\", \"my_plot.png\"))\n```\n:::\n\n\nThe `showWarnings` argument of `dir.create` controls whether R warns you when the folder you're creating already exists. There is no downside to \"creating\" a folder that already exists (nothing is overwritten!) but the warning can be confusing, so we silence it ahead of time.\n\n### File Paths Summary\n\nWe strongly recommend following these guidelines so that your scripts work regardless of (1) the operating system, (2) folders \"upstream\" of the working directory, and (3) folders within the project. This will help your code be flexible and reproducible when others are attempting to re-run your scripts!\n\nAlso, for more information on how to read files in cloud storage locations such as Google Drive, Box, Dropbox, etc., please refer to our [Other Tutorials](https://nceas.github.io/scicomp.github.io/tutorials.html).",
"engine": "knitr",
"markdown": "\nThis section contains our recommendations for handling **file paths**. When you code collaboratively (e.g., with GitHub), accounting for the difference between your folder structure and those of your colleagues becomes critical. Ideally your code should be completely agnostic about (1) the operating system of the computer it is running on (e.g., Windows vs. Mac) and (2) the folder structure of the computer. We can--fortunately--handle these two considerations relatively simply.\n\nThis may seem somewhat dry but it is worth mentioning that failing to use relative file paths is a significant hindrance to reproducibility (see [Trisovic et al. 2022](https://www.nature.com/articles/s41597-022-01143-6)).\n\n### 1. Preserve File Paths as Objects Using `file.path`\n\nDepending on the operating system of the computer, the slashes between folder names are different (`\\` versus `/`). The `file.path` function joins folder names with a forward slash, which R accepts on every operating system (including Windows), so you never need to type the slashes yourself. We recommend using this function and assigning your file path to an object.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_path <- file.path(\"path\", \"to\", \"my\", \"file\")\nmy_path\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] \"path/to/my/file\"\n```\n\n\n:::\n:::\n\n\nOnce you have that path object, you can use it everywhere you import or export information to/from the code (with another use of `file.path` to add the file name!).\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Import\nmy_raw_data <- read.csv(file = file.path(my_path, \"raw_data.csv\"))\n\n# Export\nwrite.csv(x = data_object, file = file.path(my_path, \"tidy_data.csv\"))\n```\n:::\n\n\n### 2. Create Necessary Sub-Folders in the Code with `dir.create`\n\nUsing `file.path` guarantees that your code will work regardless of the upstream folder structure but what about the folders that you need to export or import things to/from? 
For example, say your `graphs.R` script saves a couple of useful exploratory graphs to the \"Plots\" folder. How would you guarantee that everyone running `graphs.R` *has* a \"Plots\" folder? You can use the `dir.create` function to create the folder in the code (and include your path object from step 1!).\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Create needed folder\ndir.create(path = file.path(my_path, \"Plots\"), showWarnings = FALSE)\n\n# Then export to that folder\nggplot2::ggsave(filename = file.path(my_path, \"Plots\", \"my_plot.png\"))\n```\n:::\n\n\nThe `showWarnings` argument of `dir.create` controls whether R warns you when the folder you're creating already exists. There is no downside to \"creating\" a folder that already exists (nothing is overwritten!) but the warning can be confusing, so we silence it ahead of time.\n\n### File Paths Summary\n\nWe strongly recommend following these guidelines so that your scripts work regardless of (1) the operating system, (2) folders \"upstream\" of the working directory, and (3) folders within the project. This will help your code be flexible and reproducible when others are attempting to re-run your scripts!\n\nAlso, for more information on how to read files in cloud storage locations such as Google Drive, Box, Dropbox, etc., please refer to our [Other Tutorials](https://nceas.github.io/scicomp.github.io/tutorials.html).",
"supporting": [],
"filters": [
"rmarkdown/pagebreak.lua"
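The file-path recommendations in the `best_practices` markdown above can be tried end to end with a minimal sketch like the one below. It assumes a throwaway folder under `tempdir()` in place of a real project root; the `my_project` folder name and the small example data frame are hypothetical and are not part of the tutorial.

```r
# Hypothetical project root; tempdir() stands in for a real project folder
my_path <- file.path(tempdir(), "my_project")

# Create the project folder and its "Plots" sub-folder in one call
# (recursive = TRUE also creates the missing parent folder)
dir.create(path = file.path(my_path, "Plots"),
           showWarnings = FALSE, recursive = TRUE)

# Export a small, made-up data frame through the path object
tidy_file <- file.path(my_path, "tidy_data.csv")
write.csv(x = data.frame(id = 1:3, value = c(2.5, 3.1, 4.8)),
          file = tidy_file, row.names = FALSE)

# Re-import it the same way
my_data <- read.csv(file = tidy_file)

# Check that the sub-folder exists and count the rows that came back
plots_ready <- dir.exists(file.path(my_path, "Plots"))
rows_read <- nrow(my_data)
```

Running the `dir.create` call a second time with the same arguments leaves the existing folder and its contents untouched.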
@@ -1,6 +1,7 @@
{
"hash": "ea1d300bfc65af384c36509a735900c2",
"result": {
"engine": "knitr",
"markdown": "\nLoading packages / libraries in R can be cumbersome when working collaboratively because there is no guarantee that you all have the same packages installed. While you could comment-out an `install.packages()` line for every package you need for a given script, we recommend using the R package `librarian` to greatly simplify this process!\n\n`librarian::shelf()` accepts the names of all of the packages--either CRAN or GitHub--installs those that are missing in that particular R session and then attaches all of them. See below for an example:\n\nTo load packages typically you'd have something like the following in your script:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n## Install packages (if needed)\n# install.packages(\"tidyverse\")\n# install.packages(\"devtools\")\n# devtools::install_github(\"NCEAS/scicomptools\")\n\n# Load libraries\nlibrary(tidyverse); library(scicomptools)\n```\n:::\n\n\nWith `librarian::shelf()` however this becomes *much* cleaner! In addition to being fewer lines, using `librarian` also removes the possibility that someone running your code misses one of the packages that your script depends on and then the script breaks for them later on. `librarian::shelf()` automatically detects whether a package is installed, installs it if necessary, and then attaches the package.\n\nIn essence, `librarian::shelf()` wraps `install.packages()`, `devtools::install_github()`, and `library()` into a single, human-readable function.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Install and load packages!\nlibrarian::shelf(tidyverse, NCEAS/scicomptools)\n```\n:::\n\n\nWhen using `librarian::shelf()`, package names do not need to be quoted and GitHub packages can be installed without the additional steps of installing the `devtools` package and using `devtools::install_github()` instead of `install.packages()`.\n",
"supporting": [],
"filters": [
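To make the install-if-missing idea in the `librarian` text above concrete, here is a rough base-R sketch of the same check, install, and attach sequence. This is not how `librarian::shelf()` is implemented; `is_installed` is a hypothetical helper, and the base packages `stats` and `utils` are used only so that the sketch runs without downloading anything.

```r
# Packages the script depends on (base packages chosen so nothing is downloaded)
needed <- c("stats", "utils")

# Hypothetical helper: TRUE if the package can be loaded in this R installation
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# Work out which packages are missing
missing_pkgs <- needed[!vapply(needed, is_installed, logical(1))]

# Install only the missing ones (a no-op when everything is already present)
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}

# Attach every package by name
for (pkg in needed) {
  library(pkg, character.only = TRUE)
}
```

Every collaborator runs the same few lines, and only the machines that lack a package spend time installing it.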
@@ -1,6 +1,7 @@
{
"hash": "87d419837b2e5b1cb69cb0bc6e260ff3",
"result": {
"engine": "knitr",
"markdown": "\nThe following steps include a sequence of command line operations that will be relayed in code chunks below. **Unless otherwise stated, all of the following code should be run in \"Terminal\".**\n\nIf you didn't check the \"Create a git repository\" button while creating the R project, you'll need to do that via the command line now. **If you did check that box, you should skip this step!**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Start a git repository on the \"main\" branch\ngit init -b main\n```\n:::\n\n\n**Stage all of the files in your project to the git repository.** This includes the .yml file, all .qmd files and all of their rendered versions created when you ran `quarto render` earlier. This code is equivalent to checking the box for the files in the \"Git\" pane of RStudio.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Stage all files\ngit add .\n```\n:::\n\n\nOnce everything has been staged, **you now must commit those staged files** with a message.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Commit all files with the message in quotes\ngit commit -m \"Initial commit\"\n```\n:::\n\n\nNow that your project files have been committed, you need to **tell your computer where you will be pushing to and pulling from.** Paste the link you copied at the end of \"Make a New GitHub Repository\" into the code shown in the chunk below (instead of `GITHUB_URL`) and run it.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Tell your computer which GitHub repository to connect to\ngit remote add origin GITHUB_URL\n```\n:::\n\n\n**Verify that URL** before continuing.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Confirm that URL worked\ngit remote -v\n```\n:::\n\n\nFinally, **push your committed changes** to the repository that you set as the remote in the preceding two steps.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Push all of the content to the main branch\ngit push -u origin main\n```\n:::\n\n\nNow, **go back to GitHub** and refresh the page to see your project 
content safe and sound in your new GitHub repository!\n\n<img src=\"images/tutorial_github-modules/git-github-connect-1.png\" width = \"100%\" />\n",
"supporting": [],
"filters": [
@@ -1,8 +1,11 @@
{
"hash": "3fde62c40fe7ac0792286608d3336563",
"result": {
"engine": "knitr",
"markdown": "\nIn order to connect R with Google Drive, we'll need to authorize `googledrive` to act on our behalf. This only needs to be done once (per computer) so follow along and you'll be building Google Drive into your workflows in no time!\n\nFirst, **install the `googledrive` and `httpuv` R packages**. The need for `googledrive` is self-evident, while `httpuv` makes the following steps a little easier than `googledrive` does alone. Be sure to load the `googledrive` package after you install it!\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Install packages\ninstall.packages(c(\"googledrive\", \"httpuv\"))\n\n# Load googledrive\nlibrary(googledrive)\n```\n:::\n\n\nOnce you've installed the packages, we can begin the authentication in R using the `drive_auth` function in the `googledrive` package.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ngoogledrive::drive_auth(email = \"enter your gmail here!\")\n```\n:::\n\n\nIf this is your *first time* using `googledrive`, `drive_auth` will kick you to a new tab of your browser (see below for a screen grab of that screen) where you can pick which Gmail you'd like to connect to R.\n\n<p align=\"center\">\n<img src=\"images/tutorial_drive-auth/drive-auth-1.png\" width = \"50%\" />\n</p>\n\n**Click the Gmail you want to use** and you will get a second screen where Google tells you that \"Tidyverse API\" wants access to your Google Account. This message is followed by three checkboxes: the first two are grayed out but the third is unchecked.\n\n<p align=\"center\">\n<img src=\"images/tutorial_drive-auth/drive-auth-2.png\" width = \"50%\" />\n</p>\n\n:::callout-important\n### NOTE\nThis next bit is vitally important so *carefully* read and follow the next instruction!\n:::\n\nIn this screen, **you must check the unchecked box** to be able to use the `googledrive` R package. 
If you do not check this box, all attempts to use `googledrive` functions will fail with an error that says \"insufficient permissions\".\n\n<p align=\"center\">\n<img src=\"images/tutorial_drive-auth/drive-auth-3.png\" width = \"50%\" />\n</p>\n\nWhile granting access to \"see, edit, create, and delete all of your Google Drive files\" sounds like a significant security risk, those powers are actually why you're using the `googledrive` package in the first place! You want to be able to download existing Drive files, change them in R on your computer, and then put them back in Google Drive, which is exactly what is meant by \"see, edit, create, and delete\".\n\nAlso, this power *only applies to the computer you're currently working on!* Granting access on your work computer allows **only** that computer to access your Drive files. So don't worry about giving the whole world access to your Drive; that access is protected by the same safeguards that you rely on when you let your computer remember a password to a website you frequent.\n\n*After* you've checked the authorization box, **scroll down and click the \"Continue\" button**.\n\n<p align=\"center\">\n<img src=\"images/tutorial_drive-auth/drive-auth-4.png\" width = \"50%\" />\n</p>\n\nThis should result in a plain text page that tells you to close this window and return to R. If you see this message you are ready to use the `googledrive` package!\n\n<p align=\"center\">\n<img src=\"images/tutorial_drive-auth/drive-auth-5.png\" width = \"75%\" />\n</p>",
"supporting": [],
"supporting": [
"googledrive-auth_files"
],
"filters": [
"rmarkdown/pagebreak.lua"
],
@@ -1,6 +1,7 @@
{
"hash": "0efea0999008606750af05957b3fc20e",
"result": {
"engine": "knitr",
"markdown": "\nNow that you've authorized the `googledrive` package, you can start downloading the Google Drive files you need through R! Let's say that you want to download a csv file from a folder or shared drive. You can save the URL of that folder/shared drive to a variable. \n\nThe `googledrive` package makes it straightforward to access Drive folders and files with the `as_id` function. This function allows the full link to a file or folder to serve as a direct connection to that file/folder. Most of the other `googledrive` functions will require a URL that is wrapped with `as_id` in this way. You would replace \"your url here\" with the actual link but make sure it is in quotation marks.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndrive_url <- googledrive::as_id(\"your url here\")\n```\n:::\n\n\nTo list all the contents of this folder, we can use the `drive_ls` function. You will get a dataframe-like object of the files back as the output. An example is shown below in the screenshot. Here, this Google Drive folder contains 4 csv files: `ingredients.csv`, `favorite_soups.csv`, `favorite_fruits.csv` and `favorite_desserts.csv`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndrive_folder <- googledrive::drive_ls(path = drive_url)\ndrive_folder\n```\n:::\n\n\n<p align=\"center\">\n<img src=\"images/tutorial_drive-auth/drive-download-0.png\" width = \"90%\" />\n</p>\n\nIf it has been a while since you've used `googledrive`, it will prompt you to refresh your token. Simply enter the number that corresponds to the correct Google Drive account.\n\n<p align=\"center\">\n<img src=\"images/tutorial_drive-auth/drive-download-1.png\" width = \"90%\" />\n</p>\n\nIf you only want to list files of a certain type, you can specify this in the `type` argument. And let's say that my folder contains a bunch of csv files, but I only want to download the one named \"favorite_desserts.csv\". 
In that case, I can also put a matching string in the `pattern` argument in order to filter down to 1 file.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndrive_folder <- googledrive::drive_ls(path = drive_url,\n type = \"csv\", \n pattern = \"favorite_desserts\")\ndrive_folder\n```\n:::\n\n\n<p align=\"center\">\n<img src=\"images/tutorial_drive-auth/drive-download-2.png\" width = \"90%\" />\n</p>\n\nOnce we've narrowed down to the file we want, we can download it using `drive_download`. This function takes the file identifier as an argument so we can access it using `drive_folder$id`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ngoogledrive::drive_download(file = drive_folder$id)\n```\n:::\n\n\nThis will automatically download the file to our working directory. If you want, you can specify a different path to download to. Just put the new file path into the `path` argument, replacing the \"your path here\", but keep the quotation marks.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ngoogledrive::drive_download(file = drive_folder$id, \n path = \"your path here\")\n```\n:::\n\n\nIf you've downloaded the file before, and you want to overwrite it, there's a handy `overwrite` argument that you can set to `TRUE`. Note that the default is `FALSE`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ngoogledrive::drive_download(file = drive_folder$id, \n path = \"your path here\",\n overwrite = TRUE)\n```\n:::\n\n\nIf there are multiple files in the Drive folder and you want to download them all, you can use a loop like so. Each file gets its own destination path (a folder plus the file's name) so that later downloads do not overwrite earlier ones:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# For each file:\nfor(focal_file in drive_folder$name){\n \n # Find the file identifier for that file\n file_id <- subset(drive_folder, name == focal_file)\n\n # Download that file into the chosen folder under its Drive name\n googledrive::drive_download(file = file_id$id, \n path = file.path(\"your folder here\", focal_file),\n overwrite = TRUE)\n}\n```\n:::\n",
"supporting": [],
"filters": [
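The multi-file download described above needs a distinct destination path per file, because the `path` argument of `drive_download` names the output file rather than a folder. The sketch below builds those paths without contacting Google Drive. The `drive_folder` data frame is a made-up stand-in for the output of `drive_ls`, and the IDs and the `drive_downloads` folder are hypothetical.

```r
# Made-up stand-in for the table returned by googledrive::drive_ls()
drive_folder <- data.frame(
  name = c("favorite_soups.csv", "favorite_fruits.csv"),
  id = c("hypothetical_id_1", "hypothetical_id_2"),
  stringsAsFactors = FALSE
)

# Hypothetical local folder to download into
dest_dir <- file.path(tempdir(), "drive_downloads")
dir.create(path = dest_dir, showWarnings = FALSE)

# Pair each Drive ID with its own local file path
download_plan <- data.frame(
  id = drive_folder$id,
  path = file.path(dest_dir, drive_folder$name),
  stringsAsFactors = FALSE
)

# With an authorized session, each row could then be downloaded like this:
# for (i in seq_len(nrow(download_plan))) {
#   googledrive::drive_download(file = googledrive::as_id(download_plan$id[i]),
#                               path = download_plan$path[i],
#                               overwrite = TRUE)
# }
```

Building the plan first makes it easy to inspect the destination paths before any file is written.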