23 commits
1bc8755
Move file from master to individual branch
stephan-koenig May 30, 2020
3100f3b
Fix text formatting and add donwload link for data
stephan-koenig Sep 25, 2020
09c6fca
Apply suggestions from TA code review
stephan-koenig Oct 6, 2020
5d9bcbb
Apply suggestions from code review
cathy-y Oct 17, 2020
a6ab1b7
Apply suggestions from code review
cathy-y Oct 17, 2020
53f8905
Fixed markups for code chunks
cathy-y Oct 23, 2020
4d92829
Restructured order of sections, moved learning objectives to the begi…
cathy-y Oct 24, 2020
d5f518b
Update r_and_rstudio_basic.html
cathy-y Oct 31, 2020
4f3e598
Rewrote questions (and uploaded images to go with them)
cathy-y Oct 31, 2020
2187691
Question revisions
cathy-y Oct 31, 2020
7267170
Removed the working with data section
cathy-y Oct 31, 2020
9c8116b
Changed capitalization
cathy-y Nov 7, 2020
1d952f5
Added section on data types, amended questions
cathy-y Nov 7, 2020
88bf430
Update inst/tutorials/r_and_rstudio_basic/r_and_rstudio_basic.Rmd
cathy-y Nov 9, 2020
9c93172
Added section on vectors and data frames, covered logical operators i…
cathy-y Nov 15, 2020
64ffef4
Started the troubleshooting section
cathy-y Dec 1, 2020
5a98376
Completed getting help section
cathy-y Dec 1, 2020
493826e
Edited the troubleshooting section
cathy-y Dec 6, 2020
ee6ba8e
Added internal links
cathy-y Dec 12, 2020
a9ca887
Updated internal links
cathy-y Dec 12, 2020
11a2ff1
Merge branch 'main' into r-and-rstudio-basic
cathy-y Feb 3, 2021
9e8bd34
Merge branch 'main' into r-and-rstudio-basic
stephan-koenig Feb 3, 2021
b6c988a
Merge branch 'main' into r-and-rstudio-basic
stephan-koenig Mar 17, 2021
389 changes: 389 additions & 0 deletions inst/tutorials/r_and_rstudio_basic/r_and_rstudio_basic.Rmd
@@ -0,0 +1,389 @@
---
title: "Introduction to R and RStudio fundamentals"
author: "Michelle Kang (adapted from Dr. Kim Dill-McFarland)"
date: "version `r format(Sys.time(), '%B %d, %Y')`"
output:
  learnr::tutorial:
    progressive: true
    allow_skip: true
runtime: shiny_prerendered
description: Welcome to R! If you want to analyze and visualize data reproducibly, you've come to the right place. This tutorial covers the basics of R and RStudio. RStudio is a free program used for coding in R. After learning about its features and functionality, we will dive into R language basics, where you will create functions and load packages.

Contributor

Nice, much better description.

---

```{r setup, include = FALSE}
# General learnr setup
library(learnr)
knitr::opts_chunk$set(echo = TRUE)
library(educer)
# Helper function to set path to images to "/images" etc.
setup_resources()

# Tutorial specific setup
library(dplyr)
library(readr)
total <- 4
```

## Learning objectives

By the end of this tutorial you should be able to:

- Identify the different components of RStudio.
- Declare variables in R.
- Recognize and use functions.
- Install and load R packages.
- Load and subset tabular data using tidyverse.
- Use the `help` function in the R console to troubleshoot and identify required arguments for a given function.


## A Tour of RStudio

By the end of this section, you will be able to:

- Name the three panes in RStudio and what they do
- Change the sizes of the panes
- Navigate through the console using common keyboard shortcuts
- Change the appearance of RStudio

When you start RStudio, you will see something like the following window appear:

![](/images/rstudio.png){width=100%}

Author

This image shows up


Notice that the window has three "panes":

There are four panes, and there is an option under Tools > Global Options > Pane Layout to customize the panels. Would it be useful to include that here?


- Console (lower left side): this is your view of the R engine. You can type in R commands here and see the output printed by R. (To tell them apart, your input is in blue, and the output is black.) There are several editing conveniences available: up and down arrow keys to go back to previously entered commands which you then can edit and re-run, TAB for completing the name before the cursor, and so on. See more in [online docs](http://www.rstudio.com/ide/docs/using/keyboard_shortcuts).

- Environment/History (tabbed in the upper right): view current user-defined objects and previously-entered commands, respectively.

- Files/Help/Plots/Packages (tabbed in the lower right): as their names suggest, you can view the contents of the current directory, the built-in help pages, and the graphics you created, as well as manage R packages.

To change the look of RStudio, you can go to Tools &rarr; Global Options &rarr; Appearance and select colours, font size, etc. If you plan on working for longer periods of time, we suggest choosing a dark background colour which is less hard on your computer battery and your eyes.
You can also change the sizes of the panes by dragging the dividers or clicking on the expand and compress icons at the top right corner of each pane.

```{r quiz: R Tour, echo=FALSE}
question("Which pane enables you to manage R packages?",
  answer("The console"),
  answer("Lower right pane", correct = TRUE),
  answer("Upper right pane")
)
```

Contributor

Be careful here--users can re-arrange the panes in the RStudio options. I can imagine someone might not have this configuration...I certainly don't!

## RStudio Projects

By the end of this section, you will be able to:

- List the benefits of using RStudio Projects
- Create a new RStudio Project
- Open or switch to an existing RStudio Project

When you create a project, RStudio creates an `.Rproj` file that links all of your files and outputs to the project directory. When you import data from a file, R automatically looks for it in the project directory instead of you having to specify a full file path on your computer (like `/Users/<username>/Desktop/`). R also automatically saves any output to the project directory. Finally, projects allow you to save your R environment in `.RData` so that when you close RStudio and then re-open it, you can start right where you left off without re-importing any data or re-calculating any intermediate steps.
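
As a minimal sketch of what this means in practice, a path written relative to the project directory stays short and does not depend on where the project lives on your computer. The `data` subdirectory and file name are the ones used later in this tutorial; `relative_path` is only an illustrative variable name.

```r
# Build a path relative to the project directory instead of a full path
# such as /Users/<username>/Desktop/...
relative_path <- file.path("data", "Saanich_OTU_metadata.csv")
relative_path

# The working directory that R resolves relative paths against;
# inside a project this is normally the project directory
getwd()
```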

RStudio has a simple interface to create and switch between projects, accessed from the button in the top-right corner of the RStudio window. (Labeled "Project: (None)", initially.)

Let's create a project to work in for this tutorial. Start by clicking the "Project" button in the upper right or going to the "File" menu. Select "New Project", and the following will appear:

![](/images/create_project.png){width=75%}


Choose "New Directory" followed by "New Project" and click on "Browse...". Navigate to your Desktop, and name the directory `<course_name> R` (replace `<course_name>` with the name of your class, e.g. `MICB301`) for this project.

After your project is created, navigate to its directory using your Finder/File Explorer or the integrated Terminal in RStudio. You will see that the `.Rproj` file has been created.

You can open this project in the future in one of three ways:

- In your file browser (e.g. Finder or Explorer), simply double-click on the `.Rproj` file
- In an open RStudio window, choose "File" &rarr; "Open Project"
- Switch among projects by clicking on the R project symbol in the upper right corner of RStudio

```{r quiz: R Projects, echo=FALSE}
question("What is not a benefit of using RStudio projects?",
answer("All of your files and outputs are linked to the project directory"),
answer("R automatically looks for files in the project directory so you don't have to specify a full file path"),
answer("When you reopen a project, your code is saved so all you need to do is rerun it", correct=TRUE)
)
```

Contributor

I'm not sure how useful it is having quizzes for this kind of content. What do you think, Cathy and Stephan? Though I can appreciate how it might be useful in getting someone to learn about RStudio projects, I might just give a few use cases, link to the docs, and leave it at that.


## Variables in R

By the end of this section, you will be able to:

- Declare variables
- Perform operations to change the value of variables

We use variables to store data that we want to access or manipulate later. Variables must have unique names.

Without declaring a variable, the sum of these two numbers will be printed to the console but cannot be accessed for future use:

```{r novar, exercise=TRUE}
2 + 2
```

To declare a variable, follow the pattern of: `variable <- value`. Let's declare a variable `total` as the sum of two numbers.

```{r d_var, exercise=TRUE}
total <- 2 + 2
```

We access the value of `total`:

```{r var, exercise=TRUE}
total
```

We can use the value stored in `total`:

```{r sub_var, exercise=TRUE}
total - 1
```

After declaring a variable, we can perform operations to change the value stored in the variable:

```{r sub_var2, exercise=TRUE}
total <- total - 1


total
```

Now it's your turn! Declare a variable "product" and set its value to 3 * 5. Next, operating on "product", declare a variable called "difference", whose final value is 8.

Contributor

Suggested change

- Now it's your turn! Declare a variable "product" and set its value to 3 * 5. Next, operating on "product", declare a variable called "difference", whose final value is 8.
+ Now it's your turn! Declare a variable "product" and set its value to the product of the numbers 3 and 5. Next, using the variable "product", declare a variable called "difference", whose final value is 8.


```{r product, exercise=TRUE}
# First declare "product"
product

# Operate on "product" to get 8 as the value for "difference"
difference
```

```{r product-hint-1}
# First declare "product"
product <- # your code here

# Operate on "product" to get 8 as the value for "difference"
difference <- product # your code here
```

```{r product-solution}
# First declare "product"
product <- 3 * 5

# Operate on "product" to get 8 as the value for "difference"
difference <- product - 7
```

Contributor

Nice job! I might also give a few use cases for functions. Python has those neat recursive functions (not sure if those exist in R), but you could also talk about taking some raw data doing some long processing all in one shot, if you have the processing function already written. This could open the door to the whole API style of programming, though that might be beyond the scope of this tutorial.

## Functions

By the end of this section, you will be able to:

- Explain what functions and arguments are

Functions are one of the basic units in programming. Generally speaking, a function takes some input and generates some output, in a reproducible way. Every R function follows the same basic syntax, where `function()` is the name of the function and `arguments` are the different parameters you can specify (i.e. your input):

`function(argument1 = ..., argument2 = ..., ...)`

You can treat a function as a black box: you do not necessarily need to know how it works under the hood, as long as the input you provide conforms to the format the function expects.
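
As a small illustration of this syntax, the sketch below calls the base R function `seq()` once with named arguments and once by position. The variable names are only for illustration.

```r
# Named arguments: each value is labelled with the argument it belongs to
named_call <- seq(from = 1, to = 10, by = 3)

# Positional arguments: values are matched to arguments in the order they are defined
positional_call <- seq(1, 10, 3)

# Both calls produce the same sequence
named_call
positional_call
```

Naming the arguments is more verbose, but it makes the call easier to read and less dependent on remembering the argument order.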

![](/images/function.png){width=75%}

For example, the function `sum()` (which outputs the sum of the arguments) expects numbers:

```{r sum_function, exercise=TRUE}
sum(3, 5, 9, 18)
```

If you instead pass text as arguments to `sum()` you will receive an error:

```{r sum_text, exercise=TRUE, error = TRUE}
sum("Sum", "does", "not", "accept", "text!")
```

The use of functions isn't limited to mathematical calculations. Functions can also be used to transform data.

For example, the function `t()` can be used to transpose a matrix. In the example below, we generate a matrix (`example_matrix`) using the functions `c()` and `as.matrix()`, which combine values into a vector and convert that vector into a matrix, respectively. We then use the `t()` function to transpose `example_matrix` into the matrix `example_transposed`.

```{r t_function, exercise=TRUE}
# Generate matrix
example_list <- c("T", "does", "accept", "text!")
example_matrix <- as.matrix(example_list)

# Transpose matrix
example_transposed <- t(example_matrix)

# Display original and transposed matrices
example_matrix
example_transposed
```

```{r quiz: R Functions, echo=FALSE}
question("True or False: Functions accept inputs of all types",
  answer("True"),
  answer("False", correct = TRUE)
)
```

## R packages

By the end of this section, you will be able to:

- Understand what R packages are and how they are used
- Install and load packages

The first functions we will look at are used to install and load R packages. R packages are units of shareable code, containing functions that facilitate and enhance analyses. In simpler terms, think of R packages as iPhone applications. Each app has specific capabilities that can be accessed when we install and then open the application. The same holds true for R packages. To use the functions contained in a specific R package, we first need to install the package; then, each time we want to use the package, we need to "open" it by loading it.

### Installing Packages

In this tutorial, we will be using the "tidyverse" package. This package contains a versatile set of functions designed for easy manipulation of data.

You should have already installed the "tidyverse" package using RStudio's graphical interface. Packages can also be installed by entering the function `install.packages()` in the console (to install a different package just replace "tidyverse" with the name of the desired package):

```{r eval = FALSE}
install.packages("tidyverse")
```



### Loading packages

After installing a package, and *every time* you open a new RStudio session, you first need to load (open) the packages you want to use with the `library()` function. This tells R to access the package's functions. Loading only the packages you need also avoids the lag that would occur if RStudio automatically loaded every installed package each time you opened it.

Packages can be loaded like this:

```{r eval = FALSE}
library(tidyverse)
```

It is a little trickier to load Bioconductor packages, as they are often not stored on the Comprehensive R Archive Network (CRAN) where most packages live. There is a package, however, that lives on CRAN and serves as an interface between CRAN and the Bioconductor packages.

To install a Bioconductor package, you must first install and load the BiocManager package, like so:

`install.packages("BiocManager")`

`library("BiocManager")`

You can then use the function `BiocManager::install()` to install a Bioconductor package. To install the `annotate` package, we would execute the following code:

`BiocManager::install("annotate")`

Sometimes two packages include functions with the same name. A common example is `select()`, which is included in both the `dplyr` and `MASS` packages. To specify which package a function should come from, precede the function name with the package name using the following notation: `package::function()`.
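
A minimal sketch of this notation follows. It uses the built-in `mtcars` data set purely for illustration, and the `dplyr` call is guarded so that the code still runs if that package is not installed.

```r
# Small built-in data frame, used only as an example
cars_subset <- head(mtcars, 3)

# package::function() works even when the package has not been attached with library()
median_mpg <- stats::median(cars_subset$mpg)
median_mpg

# Spelling out the package makes it unambiguous which select() is meant
if (requireNamespace("dplyr", quietly = TRUE)) {
  dplyr::select(cars_subset, mpg, cyl)
}
```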

```{r quiz: R Packages, echo=FALSE}
question("True or False: Packages are installed once, but loaded every time",
  answer("True", correct = TRUE),
  answer("False")
)
```

## Working with data

By the end of this section, you will be able to:

- Load data into R
- Save loaded data in the environment

### Data description

Contributor

Does this need to be in a "child document?"

I couldn't figure out how to do this on my tutorial though...

Member

Yes, exactly. Look at the Slack#educe, Gil talks about how he set up the child document.

Author

Do I just replace all of the text here with his code? Not super sure what exactly the user is supposed to see here.


The data used throughout this module were collected as part of an ongoing oceanographic time-series program in Saanich Inlet, a seasonally anoxic fjord on the east coast of Vancouver Island, British Columbia.

The data that you will use in R are 16S rRNA amplicon profiles of microbial communities at several depths in Saanich Inlet from one time-point in this series (August 2012). These ~300 bp sequences were processed using [mothur](https://www.mothur.org/wiki/Main_Page) to yield 97% (approximately species-level) operational taxonomic units (OTUs).

`Saanich_OTU_metadata.csv` is a comma-delimited table of counts of four OTUs in each sample, normalized to 100,000 sequences per sample and the corresponding conditions of each sample (Depth, NO2, NO3 etc.).

For a brief introduction to these data, see Hallam SJ et al. 2017. Monitoring microbial responses to ocean deoxygenation in a model oxygen minimum zone. Sci Data 4: 170158 [doi:10.1038/sdata.2017.158](https://www.nature.com/articles/sdata2017158).

### Loading tabular data

Tabular data can be loaded into R using the tidyverse `read_*` functions, which generate data frames. Each row in a data frame represents one observation, and each column represents one variable.

In your file browser, create a `data` directory in your project directory. Download the [`Saanich_OTU_metadata.csv`](https://github.com/EDUCE-UBC/educer/blob/master/data-raw/Saanich_OTU_metadata.csv) file and save it in your `data` directory.

For example, because our Saanich data are stored in a comma-separated file, we can load them into R with `read_csv()` and specify the arguments that describe our data as follows:

- `file`: the name of the file you want to load
- `col_names`: can take either the value `TRUE` or `FALSE` and tells R if the first row contains column names

```{r, eval = FALSE}
read_csv(file="data/Saanich_OTU_metadata.csv", col_names = TRUE)
```

```{r, echo = FALSE}
OTU_metadata_table <- combined
```

### Save data in the environment

Since we want to do more with our data after reading it in, we need to save it as a variable in R as we did previously with the `<-` operator. You can choose to name the object whatever you like, though this module assumes the names used below.

```{r eval = FALSE}
OTU_metadata_table <- read_csv(file="data/Saanich_OTU_metadata.csv", col_names = TRUE)
```
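
To try the same pattern without downloading anything, the sketch below writes a tiny made-up table to a temporary file and reads it back. The column names and values are hypothetical and are not the Saanich data; the code falls back to base R's `read.csv()` if `readr` is not installed.

```r
# Write a tiny, made-up CSV to a temporary file (hypothetical values)
tmp_file <- tempfile(fileext = ".csv")
writeLines(c("Depth,OTU0001", "10,5", "100,12"), tmp_file)

# Use readr if it is available, otherwise fall back to base R
if (requireNamespace("readr", quietly = TRUE)) {
  toy_table <- readr::read_csv(file = tmp_file, col_names = TRUE)
} else {
  toy_table <- read.csv(file = tmp_file, header = TRUE)
}

nrow(toy_table)  # number of observations (rows)
names(toy_table) # column names taken from the first row of the file
```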



## Getting Help

By the end of this section, you will be able to:

- Use R to understand how any given function works
- Identify required and optional arguments for functions

You can get help with any function in R by inputting `?function_name` into the Console. This will open a window in the bottom right under the Help tab with information on that function, including input options and example code.

```{r eval = FALSE}
?read_delim
```

The **Description** section tells us that `read_delim()` is the general case of `read_csv()` (the function we used) and `read_tsv()`.

The **Usage** section tells us which inputs need to be specified and which have default values in `read_delim()`:

- `file` and `delim` need to be specified as they are not followed by `=`
- all other parameters have a default value, e.g. `quote = "\""`, and do not have to be specified to run the function.
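
The same distinction can be checked from the console. As a sketch, `args()` prints a function's argument list in the same form as the Usage section; the base R function `rnorm()` is used here only because it ships with R and has both kinds of argument.

```r
# Print the argument list: n has no "=", so it must be supplied;
# mean and sd have defaults and can be omitted
args(rnorm)

# Supplying only the required argument uses the defaults for the rest
three_draws <- rnorm(n = 3)
length(three_draws)
```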

The **Arguments** section describes the requirements of each input argument in detail.

The **Examples** section has examples of the function that can be copied and pasted directly into your console and run.

Another widely used function, this time from base R, is `nrow()`:

```{r eval = FALSE}
?nrow
```

The **Description** section tells us that `nrow()` returns the number of rows of a matrix or an array.

The **Usage** section tells us the inputs that need to be specified and default inputs of `nrow()`:

- `x` is the data matrix or array for which the user is interested in identifying the number of rows
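
A short sketch of `nrow()` in use, with a small matrix created only for illustration:

```r
# A small matrix with 2 rows and 3 columns
small_matrix <- matrix(1:6, nrow = 2)

# nrow() takes the object to inspect as its only argument, x
row_count <- nrow(x = small_matrix)
row_count
```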

The **Arguments** section describes the requirements of each input argument in detail.

The **Examples** section has examples of the function that can be copied and pasted directly into your console and run.

The tidyverse is a collection of packages that provides many valuable functions widely used in R. One example is `select()`:

```{r eval = FALSE}
if (!require("tidyverse")) install.packages("tidyverse")
library(tidyverse)
?select
```

The **Description** section tells us that `select()` can be used to select certain columns from a parent dataset, and optionally to rename them.

The **Overview of selection features** section lists the operators and selection helpers that can be used with `select()`.

The **Usage** section tells us the inputs that need to be specified and the default inputs of `select()`:

- `.data` is a mandatory input which refers to the parent dataset or tibble to subset from
- the helper functions and operators can be used to specify the columns to subset

The **Value** section describes the object that `select()` returns. A more detailed account can be read on the help page.

The **Method** section describes the implementation method of the function `select()`.

The **Examples** section has examples of the function that can be copied and pasted directly into your console and run.
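
As a hedged sketch of what such a call can look like, the code below selects and renames columns of the built-in `mtcars` data set, which is used here only for illustration. It assumes `dplyr` is installed and skips the calls otherwise.

```r
if (requireNamespace("dplyr", quietly = TRUE)) {
  # .data is the data frame to subset; the remaining arguments name the columns to keep.
  # starts_with() is one of the selection helpers listed on the help page.
  selected <- dplyr::select(.data = mtcars, mpg, dplyr::starts_with("c"))
  print(names(selected))

  # Columns can also be renamed while selecting, using new_name = old_name
  renamed <- dplyr::select(.data = mtcars, miles_per_gallon = mpg)
  print(names(renamed))
}
```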

```{r quiz: navigating help, echo=FALSE}
question("How would you launch the help page for a given function?",
  answer("Place an exclamation mark in front of the function"),
  answer("Place a question mark in front of the function", correct = TRUE),
  answer("Place a question mark after the function"),
  answer("Type out help and the function name in the console")
)

question("What does an = sign indicate in the help section?",
  answer("There exists a default and the input is not mandatory", correct = TRUE),
  answer("The input is mandatory in the function")
)
```

## R Scripts

By the end of this section, you will be able to:

- Create an R script file
- List the benefits of using R scripts
- Annotate R scripts with comments

R script files are the primary way in which R facilitates reproducible research. They contain the code that loads your raw data, cleans it, performs the analyses, and creates and saves visualizations. R scripts maintain a record of everything that is done to the raw data to reach the final result. That way, it is very easy to write up and communicate your methods because you have a document listing the precise steps you used to conduct your analyses. This is one of R's primary advantages compared to traditional tools like Excel, where it may be unclear how to reproduce the results.

Generally, if you are testing an operation (*e.g.* what would my data look like if I applied a log-transformation to it?), you should do it in the console (left pane of RStudio). If you are committing a step to your analysis (*e.g.* I want to apply a log-transformation to my data and then conduct the rest of my analyses on the log-transformed data), you should add it to your R script so that it is saved for future use.

Additionally, you should annotate your R scripts with comments. In each line of code, any text preceded by the `#` symbol will not execute. Comments can be useful to remind yourself and to tell other readers what a specific chunk of code does.
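
A minimal sketch of an annotated script, using a made-up value and the log-transformation example from above:

```r
# Store a made-up measurement (a comment on its own line)
raw_value <- 100

log_value <- log10(raw_value) # a comment can also follow code on the same line

log_value
```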

Let's create an R script (File > New File > R Script) and save it as `tidyverse.R` in your main project directory. If you again look to the project directory on your computer, you will see `tidyverse.R` is now saved there.

We can copy and paste the previous commands in this tutorial and aggregate them in our R script.

```{r quiz: R Scripts, echo=FALSE}
question("How do R scripts make your work reproducible?",
  answer("Trick question: they work just like Excel and don't make work reproducible"),
  answer("They keep a record of all actions done to get from raw data to final results", correct = TRUE),
  answer("You can use them to test different operations on your data")
)

question("How do you annotate R scripts with comments?",
  answer("You start the line with 'comment:'"),
  answer("You start the line with the # symbol", correct = TRUE),
  answer("You can just start typing and R will know it's a comment automatically")
)
```