23 commits
1bc8755
Move file from master to individual branch
stephan-koenig May 30, 2020
3100f3b
Fix text formatting and add donwload link for data
stephan-koenig Sep 25, 2020
09c6fca
Apply suggestions from TA code review
stephan-koenig Oct 6, 2020
5d9bcbb
Apply suggestions from code review
cathy-y Oct 17, 2020
a6ab1b7
Apply suggestions from code review
cathy-y Oct 17, 2020
53f8905
Fixed markups for code chunks
cathy-y Oct 23, 2020
4d92829
Restructured order of sections, moved learning objectives to the begi…
cathy-y Oct 24, 2020
d5f518b
Update r_and_rstudio_basic.html
cathy-y Oct 31, 2020
4f3e598
Rewrote questions (and uploaded images to go with them)
cathy-y Oct 31, 2020
2187691
Question revisions
cathy-y Oct 31, 2020
7267170
Removed the working with data section
cathy-y Oct 31, 2020
9c8116b
Changed capitalization
cathy-y Nov 7, 2020
1d952f5
Added section on data types, amended questions
cathy-y Nov 7, 2020
88bf430
Update inst/tutorials/r_and_rstudio_basic/r_and_rstudio_basic.Rmd
cathy-y Nov 9, 2020
9c93172
Added section on vectors and data frames, covered logical operators i…
cathy-y Nov 15, 2020
64ffef4
Started the troubleshooting section
cathy-y Dec 1, 2020
5a98376
Completed getting help section
cathy-y Dec 1, 2020
493826e
Edited the troubleshooting section
cathy-y Dec 6, 2020
ee6ba8e
Added internal links
cathy-y Dec 12, 2020
a9ca887
Updated internal links
cathy-y Dec 12, 2020
11a2ff1
Merge branch 'main' into r-and-rstudio-basic
cathy-y Feb 3, 2021
9e8bd34
Merge branch 'main' into r-and-rstudio-basic
stephan-koenig Feb 3, 2021
b6c988a
Merge branch 'main' into r-and-rstudio-basic
stephan-koenig Mar 17, 2021
Binary file added inst/resources/images/mean.png
Binary file added inst/resources/images/meanNA.png
Binary file added inst/resources/images/objecterror.png
Binary file added inst/resources/images/selecterror.png
391 changes: 391 additions & 0 deletions inst/tutorials/r_and_rstudio_basic/r_and_rstudio_basic.Rmd
@@ -0,0 +1,391 @@
---
title: "Introduction to R and RStudio fundamentals"
author: "Michelle Kang (adapted from Dr. Kim Dill-McFarland)"
date: "version `r format(Sys.time(), '%B %d, %Y')`"
output:
learnr::tutorial:
progressive: true
allow_skip: true
runtime: shiny_prerendered
description: Welcome to R! If you want to analyze and visualize data reproducibly, you've come to the right place. This tutorial covers the basics of R and RStudio. RStudio is a free program used for coding in R. After learning about its features and functionality, we will dive into R language basics.
---

```{r setup, include = FALSE}
# General learnr setup
library(learnr)
knitr::opts_chunk$set(echo = TRUE)
library(educer)
# Helper function to set path to images to "/images" etc.
setup_resources()
# Tutorial specific setup
library(dplyr)
library(readr)
total <- 4
```

## Learning objectives

Here's what you'll learn from each section of this tutorial:

A Tour of RStudio:

- Name the three panes in RStudio and what they do
- Change the sizes of the panes
- Navigate through the console using common keyboard shortcuts
- Change the appearance of RStudio

RStudio Projects:

- List the benefits of using RStudio Projects
- Create a new RStudio Project
- Open or switch to an existing RStudio Project

R Scripts:

- Create an R script file
- List the benefits of using R scripts
- Annotate R scripts with comments

Variables in R:

- Declare variables
- Perform operations to change the value of variables

Functions in R:

- Explain what functions and arguments are
- Use R to understand how any given function works
- Identify required and optional arguments for functions

R Packages:

- Understand what R packages are and how they are used
- Install and load packages

Contributor

Nice, this is a much clearer way of conveying the learning goals.

## A Tour of RStudio

When you start RStudio, you will see something like the following window appear:

![](/images/rstudio.png){width=100%}
Author

This image shows up


Notice that the window has three "panes":

There are four panes and an option in, tools > global options > pane layout to customize the panels. Would it be useful to include here?


- Console (lower left side): this is your view of the R engine. You can type in R commands here and see the output printed by R. (To tell them apart, your input is in blue, and the output is black.) There are several editing conveniences available: up and down arrow keys to go back to previously entered commands which you then can edit and re-run, TAB for completing the name before the cursor, and so on. See more in [online docs](http://www.rstudio.com/ide/docs/using/keyboard_shortcuts).

- Environment/History (tabbed in the upper right): view current user-defined objects and previously-entered commands, respectively.

- Files/Help/Plots/Packages (tabbed in the lower right): as their names suggest, you can view the contents of the current directory, the built-in help pages, and the graphics you created, as well as manage R packages.

To change the look of RStudio, go to Tools &rarr; Global Options &rarr; Appearance and select colours, font size, etc. If you plan on working for longer periods of time, we suggest choosing a dark background colour, which can be easier on your eyes and, on some displays, on your computer battery.
You can also change the sizes of the panes by dragging the dividers or clicking on the expand and compress icons at the top right corner of each pane.

### Check Your Understanding

When trying to run a command, you see this error:

![](/images/objecterror.png){width=100%}
Author

This image does not show

Member

So where in the repo did you place your image?


```{r Tour, echo=FALSE}
quiz(
question("Where is the command typed?",
answer("The console", correct=TRUE),
answer("Lower right pane"),
answer("Upper right pane")),
question("Where can you go to try to locate 'object'?",
answer("The console"),
answer("Lower right pane"),
answer("Upper right pane", correct=TRUE))
)
```

## RStudio Projects

When you create a project, RStudio creates an `.Rproj` file that links all of your files and outputs to the project directory. When you import data from a file, R automatically looks for it in the project directory instead of you having to specify a full file path on your computer (like `/Users/<username>/Desktop/`). R also automatically saves any output to the project directory. Finally, projects allow you to save your R environment in `.RData` so that when you close RStudio and then re-open it, you can start right where you left off without re-importing any data or re-calculating any intermediate steps.
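
As a rough illustration of why relative paths work inside a project, the sketch below uses a temporary directory as a stand-in for the project directory and a made-up file name, `example_data.csv`. Both are assumptions for the example only; in a real project RStudio sets the working directory for you.

```r
# tempdir() stands in for your project directory (assumption for this sketch)
project_dir <- tempdir()
old_wd <- setwd(project_dir)   # an RStudio Project sets this for you on opening

# Save a small, made-up data set using only a file name (a relative path)
write.csv(data.frame(x = 1:3), "example_data.csv", row.names = FALSE)

# Reading with the same relative path works because R resolves it
# against the current working directory
dat <- read.csv("example_data.csv")
getwd()   # shows the directory R is currently working in
dat

setwd(old_wd)   # restore the previous working directory
```

Because only the file name is used, the same code would keep working if the whole project folder were moved or shared.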

RStudio has a simple interface to create and switch between projects, accessed from the button in the top-right corner of the RStudio window. (Labeled "Project: (None)", initially.)

Let's create a project to work in for this tutorial. Start by clicking the "Project" button in the upper right or going to the "File" menu. Select "New Project", and the following will appear:

![](/images/create_project.png){width=75%}


Choose "New Directory" followed by "New Project" and click on "Browse...". Navigate to your Desktop, and name the directory for this project `<course_name> R` (replace `<course_name>` with the name of your class, e.g. `MICB301`).

After your project is created, navigate to its directory using your Finder/File Explorer or the integrated Terminal in RStudio. You will see that the `.Rproj` file has been created.

You can open this project in the future in one of three ways:

- In your file browser (e.g. Finder or Explorer), simply double-click on the `.Rproj` file
- In an open RStudio window, choose "File" &rarr; "Open Project"
- Switch among projects by clicking on the "Project" button in the upper right corner of RStudio


## R Scripts

R script files are the primary way in which R facilitates reproducible research. They contain the code that loads your raw data, cleans it, performs the analyses, and creates and saves visualizations. R scripts maintain a record of everything that is done to the raw data to reach the final result. That way, it is very easy to write up and communicate your methods because you have a document listing the precise steps you used to conduct your analyses. This is one of R's primary advantages compared to traditional tools like Excel, where it may be unclear how to reproduce the results.

Generally, if you are testing an operation (*e.g.* what would my data look like if I applied a log-transformation to it?), you should do it in the console (lower left pane of RStudio). If you are committing a step to your analysis (*e.g.* I want to apply a log-transformation to my data and then conduct the rest of my analyses on the log-transformed data), you should add it to your R script so that it is saved for future use.

Additionally, you should annotate your R scripts with comments. In each line of code, any text preceded by the `#` symbol will not execute. Comments can be useful to remind yourself and to tell other readers what a specific chunk of code does.
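
Here is a minimal sketch of what an annotated script might look like. The values and variable names (`raw_values`, `log_values`) are made up for illustration.

```r
# Raw measurements (made-up values for illustration)
raw_values <- c(1, 10, 100)

# Apply a log10 transformation; R ignores everything after "#"
log_values <- log10(raw_values)  # a comment can also follow code on the same line

# Print the transformed values
log_values
```

A comment on its own line usually describes the step that follows, while a trailing comment explains a single line.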

Let's create an R script (File &rarr; New File &rarr; R Script) and save it as `tidyverse.R` in your main project directory. If you look in the project directory on your computer again, you will see that `tidyverse.R` is now saved there.

We can copy and paste the previous commands in this tutorial and collect them in our R script.


## Variables in R

We use variables to store data that we want to access or manipulate later. Variables must have unique names.

Without declaring a variable, the sum of these two numbers will be printed to the console but cannot be accessed for future use:

```{r novar, exercise=TRUE}
2 + 2
```

To declare a variable, follow the pattern of: `variable <- value`. Let's declare a variable `total` as the sum of two numbers.

```{r d_var, exercise=TRUE}
total <- 2 + 2
```

We access the value of `total`:

```{r var, exercise=TRUE}
total
```

We can use the value stored in `total`:

```{r sub_var, exercise=TRUE}
total - 1
```

After declaring a variable, we can perform operations to change the value stored in the variable:

```{r sub_var2, exercise=TRUE}
total <- total - 1
total
```

Now it's your turn! Declare a variable "product" and set its value to the product of the numbers 3 and 5. Next, using the variable "product", declare a variable called "difference", whose final value is 8.

```{r product, exercise=TRUE}
# First declare "product"
product
# Operate on "product" to get 8 as the value for "difference"
difference
```

```{r product-hint-1}
# First declare "product"
product <- #your code here
# Operate on "product" to get 8 as the value for "difference"
difference <- product #your code here
```

```{r product-solution}
# First declare "product"
product <- 3 * 5
# Operate on "product" to get 8 as the value for "difference"
difference <- product - 7
```

### Check Your Understanding

Without running the code below, what is the final value of x?

```{r solve-x, exercise=TRUE, exercise.eval=FALSE}
x <- 5
y <- 2
x <- y * x
y <- x - 4
```

```{r solve-x-q, echo=FALSE}
quiz(
question("What is the final value of x?",
answer("5"),
answer("10", correct=TRUE),
answer("6"))
)
```

## Functions in R

### Overview

Contributor

Nice job! I might also give a few use cases for functions. Python has those neat recursive functions (not sure if those exist in R), but you could also talk about taking some raw data doing some long processing all in one shot, if you have the processing function already written. This could open the door to the whole API style of programming, though that might be beyond the scope of this tutorial.

Functions are one of the basic units in programming. Generally speaking, a function takes some input and generates some output, in a reproducible way. Every R function follows the same basic syntax, where `function()` is the name of the function and `arguments` are the different parameters you can specify (i.e. your input):

`function(argument1 = ..., argument2 = ..., ...)`

You can treat functions as black boxes: you do not necessarily need to know how they work under the hood, as long as the input you provide conforms to the format the function expects.

![](/images/function.png){width=75%}

For example, the function `sum()` (which outputs the sum of the arguments) expects numbers:

```{r sum_function, exercise=TRUE}
sum(3, 5, 9, 18)
```

If you instead pass text as arguments to `sum()` you will receive an error:

```{r sum_text, exercise=TRUE, error = TRUE}
sum("Sum", "does", "not", "accept", "text!")
```

On the other hand, the function `paste()`, which links together words, does accept text as arguments.

```{r paste_function, exercise=TRUE}
paste("Hello", "world", sep = " ")
```
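As a small sketch of how arguments change a function's output, the calls below use `paste()` with and without the `sep` argument, and `round()` with its `digits` argument. The variable names are chosen only for this example.

```r
# With no sep argument, paste() falls back to its default separator, a space
greeting_default <- paste("Hello", "world")

# Supplying sep by name overrides the default
greeting_dash <- paste("Hello", "world", sep = "-")

# round() takes the number to round and, optionally, the number of digits
rounded <- round(3.14159, digits = 2)

greeting_default
greeting_dash
rounded
```

Naming an argument (`sep = "-"`, `digits = 2`) makes the call easier to read and means the order of the optional arguments does not matter.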


### Getting Help

You can get help with any function in R by inputting `?function_name` into the Console. This will open a window in the bottom right under the Help tab with information on that function, including input options and example code.

```{r eval = FALSE}
?read_delim
```

The **Description** section tells us that `read_delim()` is the general case of `read_csv()` and `read_tsv()`.

The **Usage** section tells us which inputs need to be specified and which have default values in `read_delim()`:

- `file` and `delim` need to be specified, as they are not followed by `=`
- all other parameters have a default value, e.g. `quote = "\""`, and do not have to be specified to run the function.

The **Arguments** section describes the requirements of each input argument in detail.

The **Examples** section has examples of the function that can be copied and pasted directly into your console and run.
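
The same information shown under **Usage** can also be inspected from the console. The sketch below uses the base R function `sd()` rather than `read_delim()`, purely so that it runs without any extra packages.

```r
# args() prints a function's arguments together with any default values
args(sd)

# formals() returns the arguments as a list; its names are the argument names
arg_names <- names(formals(sd))
arg_names

# An argument with a default can be left out of the call;
# here na.rm defaults to FALSE
default_na_rm <- formals(sd)$na.rm
default_na_rm
```

In this case `x` has no default and must be supplied, while `na.rm` is optional.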

Another widely used example from base R is the function `nrow()`:

```{r eval = FALSE}
?nrow
```

The **Description** section tells us that `nrow()` returns the number of rows of a matrix or an array (it also works on data frames).

The **Usage** section tells us the inputs that need to be specified and default inputs of `nrow()`:

- `x` is the data matrix or array for which the user is interested in identifying the number of rows

The **Arguments** section describes the requirements of each input argument in detail.

The **Examples** section has examples of the function that can be copied and pasted directly into your console and run.
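
A short sketch of `nrow()` in use, with a small matrix and a small data frame invented for this example:

```r
# A matrix with 2 rows and 3 columns
m <- matrix(1:6, nrow = 2, ncol = 3)
rows_m <- nrow(m)
rows_m

# A data frame with 4 rows (made-up values)
df <- data.frame(id = 1:4, value = c(2.5, 3.1, 4.8, 1.2))
rows_df <- nrow(df)
rows_df
```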



The tidyverse is a collection of packages that provide many valuable functions widely used in R. One example is `select()` from the `dplyr` package.

```{r eval = FALSE}
if (!require("tidyverse")) install.packages("tidyverse")
library(tidyverse)
?select
```

The **Description** section tells us that `select()` can be used to select certain columns from a parent dataset, and optionally to rename them.

The **Overview of selection features** section provides a list of operators and selection helpers that let you make full use of `select()`.

The **Usage** section tells us which inputs need to be specified and which have default values in `select()`:

- `.data` is a mandatory input which refers to the parent dataset or tibble to subset from
- the helper functions and operators can be used to specify the columns to subset

The **Value** section describes the object that `select()` returns. A more detailed account can be read on the help page.

The **Methods** section describes how `select()` is implemented for different kinds of input.

The **Examples** section has examples of the function that can be copied and pasted directly into your console and run.
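
To make this concrete, here is a minimal sketch of `select()`, assuming the `dplyr` package is installed. The data frame `samples` and its column names are made up for illustration.

```r
library(dplyr)

# A small, made-up data set
samples <- data.frame(
  id = 1:3,
  depth = c(10, 100, 200),
  temperature = c(12.1, 8.4, 6.9)
)

# Keep only the id and depth columns
subset_cols <- select(samples, id, depth)
subset_cols

# Rename id while selecting, and use a selection helper for the other column
renamed <- select(samples, sample_id = id, starts_with("temp"))
renamed
```

The first argument is the dataset (`.data`); everything after it describes which columns to keep.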

### Check Your Understanding

Here is the help page for the function `mean()`:

![](/images/mean.png){width=100%}

```{r Functions1, echo=FALSE}
quiz(
question("What types of arguments can be passed to mean()?",
answer("Logical (True/False)", correct=TRUE),
answer("Numeric", correct=TRUE),
answer("Text"),
answer("Numeric vectors", correct=TRUE))
)
```

When trying to find the mean of x, NA is the output:

![](/images/meanNA.png){width=100%}

```{r Functions2, echo=FALSE}
quiz(
question("Why did this happen?",
answer("The last value of x needs to be removed using the 'trim' argument"),
answer("Since x is not composed of only numbers and logicals, an error is thrown"),
answer("The NA in x needs to be removed using the na.rm argument", correct=TRUE))
)
```
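Once you have answered the question, you can check the behaviour yourself. The sketch below uses a made-up vector `x`, which is not necessarily the one shown in the screenshot.

```r
# A made-up vector containing one missing value
x <- c(2, 4, NA, 6)

# By default the missing value propagates, so the result is NA
mean_with_na <- mean(x)
mean_with_na

# Setting na.rm = TRUE drops missing values before averaging
mean_without_na <- mean(x, na.rm = TRUE)
mean_without_na
```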

## R Packages

The functions we will look at in this section are used to install and load R packages. R packages are units of shareable code, containing functions that facilitate and enhance analyses. In simpler terms, think of R packages as iPhone applications. Each app has specific capabilities that can be accessed when we install and then open the application. The same holds true for R packages. To use the functions contained in a specific R package, we first need to install the package; then, each time we want to use the package, we need to "open" it by loading it.

### Installing Packages

In this tutorial, we will be using the "tidyverse" package. This package contains a versatile set of functions designed for easy manipulation of data.

You should have already installed the "tidyverse" package using RStudio's graphical interface. Packages can also be installed by entering the function `install.packages()` in the console (to install a different package just replace "tidyverse" with the name of the desired package):

```{r eval = FALSE}
install.packages("tidyverse")
```

### Loading Packages

After installing a package, and *every time* you open a new RStudio session, you need to first load (open) the packages you want to use with the `library()` function. This tells R to make the package's functions available, and it spares RStudio the lag that would occur if it automatically loaded every installed package each time you opened it.

Packages can be loaded like this:

```{r eval = FALSE}
library(tidyverse)
```
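If you are unsure whether a package is currently loaded, one way to check is `search()`, which lists everything attached to the session. The sketch below uses the `tools` package only because it ships with R and is usually not attached in a fresh session; the same pattern applies to any installed package.

```r
# Is the package attached yet? (usually FALSE in a fresh session)
"package:tools" %in% search()

# Load (attach) the package
library(tools)

# Now it appears in the search path
is_attached <- "package:tools" %in% search()
is_attached
```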

It is a little trickier to install Bioconductor packages, as they are generally not stored on the Comprehensive R Archive Network (CRAN), where most packages live. There is a package, however, that lives on CRAN and serves as an interface to the Bioconductor packages.

To install a Bioconductor package, you must first install and load the BiocManager package, like so.

`install.packages("BiocManager")`

`library("BiocManager")`

You can then use the function `BiocManager::install()` to install a Bioconductor package. To install the `annotate` package, we would execute the following code.

`BiocManager::install("annotate")`

Sometimes two packages include functions with the same name. A common example is that a `select()` function is included in both the `dplyr` and `MASS` packages. Therefore, to specify the use of a function from a particular package, you can precede the function name with the package name, using the following notation: `package::function()`.
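
The `package::function()` notation can be tried out with packages that ship with R. The sketch below uses `stats::median()` and `utils::head()` only as stand-ins; the same pattern applies to `dplyr::select()` and `MASS::select()`.

```r
# Call median() explicitly from the stats package
middle <- stats::median(c(1, 3, 10))
middle

# Call head() explicitly from the utils package
first_letters <- utils::head(letters, 3)
first_letters
```

Writing the package name explicitly also documents where a function comes from, which can help readers of your script.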

### Check Your Understanding

After installing the `dplyr` package, you encounter this error:

![](/images/selecterror.png){width=100%}

```{r Packages, echo=FALSE}
quiz(
question("Why did this happen?",
answer("The function `select()` wasn't installed"),
answer("The package `dplyr` wasn't loaded after installation", correct=TRUE),
answer("The arguments given to `select()` are invalid"))
)
```