R and rstudio basic #15
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,391 @@ | ||
--- | ||
title: "Introduction to R and RStudio fundamentals" | ||
author: "Michelle Kang (adapted from Dr. Kim Dill-McFarland)" | ||
date: "version `r format(Sys.time(), '%B %d, %Y')`" | ||
output: | ||
learnr::tutorial: | ||
progressive: true | ||
allow_skip: true | ||
runtime: shiny_prerendered | ||
description: Welcome to R! If you want to analyze and visualize data reproducibly, you've come to the right place. This tutorial covers the basics of R and RStudio. RStudio is a free program used for coding in R. After learning about its features and functionality, we will dive into R language basics. | ||
--- | ||
|
||
```{r setup, include = FALSE} | ||
# General learnr setup | ||
library(learnr) | ||
knitr::opts_chunk$set(echo = TRUE) | ||
library(educer) | ||
# Helper function to set path to images to "/images" etc. | ||
setup_resources() | ||
# Tutorial specific setup | ||
library(dplyr) | ||
library(readr) | ||
total <- 4 | ||
``` | ||
|
||
## Learning objectives | ||
|
||
Here's what you'll learn from each section of this tutorial: | ||
|
||
A Tour of RStudio: | ||
|
||
- Name the three panes in RStudio and what they do | ||
- Change the sizes of the panes | ||
- Navigate through the console using common keyboard shortcuts | ||
- Change the appearance of RStudio | ||
|
||
RStudio Projects: | ||
|
||
- List the benefits of using RStudio Projects | ||
- Create a new RStudio Project | ||
- Open or switch to an existing RStudio Project | ||
|
||
R Scripts: | ||
|
||
- Create an R script file | ||
- List the benefits of using R scripts | ||
- Annotate R scripts with comments | ||
|
||
Variables in R: | ||
|
||
- Declare variables | ||
- Perform operations to change the value of variables | ||
|
||
Functions in R: | ||
|
||
- Explain what functions and arguments are | ||
- Use R to understand how any given function works | ||
- Identify required and optional arguments for functions | ||
|
||
R Packages: | ||
|
||
- Understand what R packages are and how they are used | ||
- Install and load packages | ||
|
||
## A Tour of RStudio | ||
|
||
|
||
When you start RStudio, you will see something like the following window appear: | ||
|
||
{width=100%} | ||
Review comment: This image shows up |
||
|
||
Notice that the window has three "panes": | ||
Review comment: There are four panes and an option in, tools > global options > pane layout to customize the panels. Would it be useful to include here? |
||
|
||
- Console (lower left side): this is your view of the R engine. You can type in R commands here and see the output printed by R. (To tell them apart, your input is in blue, and the output is black.) There are several editing conveniences available: up and down arrow keys to go back to previously entered commands which you then can edit and re-run, TAB for completing the name before the cursor, and so on. See more in [online docs](http://www.rstudio.com/ide/docs/using/keyboard_shortcuts). | ||
|
||
|
||
- Environment/History (tabbed in the upper right): view current user-defined objects and previously-entered commands, respectively. | ||
|
||
- Files/Help/Plots/Packages (tabbed in the lower right): as their names suggest, you can view the contents of the current directory, the built-in help pages, and the graphics you created, as well as manage R packages. | ||
|
||
To change the look of RStudio, you can go to Tools → Global Options → Appearance and select colours, font size, etc. If you plan on working for longer periods of time, we suggest choosing a dark background colour which is less hard on your computer battery and your eyes. | ||
You can also change the sizes of the panes by dragging the dividers or clicking on the expand and compress icons at the top right corner of each pane. | ||
|
||
|
||
### Check Your Understanding | ||
|
||
When trying to run a command, you see this error: | ||
|
||
{width=100%} | ||
|
||
|
||
```{r Tour, echo=FALSE} | ||
quiz( | ||
question("Where is the command typed?", | ||
answer("The console", correct=TRUE), | ||
answer("Lower right pane"), | ||
answer("Upper right pane")), | ||
question("Where can you go to try to locate 'object'?", | ||
answer("The console"), | ||
answer("Lower right pane"), | ||
answer("Upper right pane", correct=TRUE)) | ||
) | ||
``` | ||
|
||
## RStudio Projects | ||
|
||
|
||
When you create a project, RStudio creates an `.Rproj` file in the project directory and uses that directory as the working directory whenever the project is open. When you import data from a file, R looks for it relative to the project directory, so you do not have to specify a full file path on your computer (like `/Users/<username>/Desktop/`). Output that you save with a relative path also lands in the project directory. Finally, projects allow you to save your R environment in `.RData` so that when you close RStudio and then re-open it, you can start right where you left off without re-importing any data or re-calculating any intermediate steps. | ||
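As a minimal sketch of what relative paths look like in practice (the folder name `data` and the file name `samples.csv` are hypothetical and only used for illustration):

```r
# Print the current working directory; inside an RStudio Project this is
# normally the project directory
getwd()

# Build a path relative to the working directory.
# "data" and "samples.csv" are hypothetical names, not files from this tutorial.
rel_path <- file.path("data", "samples.csv")
rel_path

# The same file written as a full path
full_path <- file.path(getwd(), rel_path)
full_path
```

Because the relative path does not mention your user name or Desktop, a script that uses it should keep working if the whole project folder is moved or shared.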
|
||
RStudio has a simple interface to create and switch between projects, accessed from the button in the top-right corner of the RStudio window. (Labeled "Project: (None)", initially.) | ||
|
||
Let's create a project to work in for this tutorial. Start by clicking the "Project" button in the upper right or going to the "File" menu. Select "New Project", and the following will appear: | ||
|
||
{width=75%} | ||
|
||
|
||
Choose "New Directory" followed by "New Project" and click on "Browse...". Navigate to your Desktop, and name the directory `<course_name> R` (replace `<course_name>` with the name of your class, e.g. `MICB301`) for this project. | ||
|
||
After your project is created, navigate to its directory using your Finder/File Explorer or the integrated Terminal in RStudio. You will see that the `.Rproj` file has been created. | ||
|
||
You can open this project in the future in one of three ways: | ||
|
||
- In your file browser (e.g. Finder or Explorer), simply double-click on the `.Rproj` file | ||
- In an open RStudio window, choose "File" → "Open Project" | ||
- Switch among projects by clicking on the R project symbol in the upper right corner of RStudio | ||
|
||
|
||
|
||
## R Scripts | ||
|
||
R script files are the primary way in which R facilitates reproducible research. They contain the code that loads your raw data, cleans it, performs the analyses, and creates and saves visualizations. R scripts maintain a record of everything that is done to the raw data to reach the final result. That way, it is very easy to write up and communicate your methods because you have a document listing the precise steps you used to conduct your analyses. This is one of R's primary advantages compared to traditional tools like Excel, where it may be unclear how to reproduce the results. | ||
|
||
Generally, if you are testing an operation (*e.g.* what would my data look like if I applied a log-transformation to it?), you should do it in the console (left pane of RStudio). If you are committing a step to your analysis (*e.g.* I want to apply a log-transformation to my data and then conduct the rest of my analyses on the log-transformed data), you should add it to your R script so that it is saved for future use. | ||
|
||
Additionally, you should annotate your R scripts with comments. In each line of code, any text preceded by the `#` symbol will not execute. Comments can be useful to remind yourself and to tell other readers what a specific chunk of code does. | ||
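As a short sketch of how comments behave (the variable name `heights_cm` and its values are made up for illustration):

```r
# Store three example measurements (made-up values)
heights_cm <- c(150, 160, 170)

mean(heights_cm)  # a comment can also follow code on the same line

# The next line is commented out, so R will not run it
# heights_cm <- heights_cm * 100
```

Only the assignment and the call to `mean()` are executed; everything after a `#` on a line is skipped.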
|
||
Let's create an R script (File > New File > R Script) and save it as `tidyverse.R` in your main project directory. If you again look to the project directory on your computer, you will see `tidyverse.R` is now saved there. | ||
|
||
We can copy and paste the previous commands in this tutorial and collect them in our R script. | ||
|
||
|
||
## Variables in R | ||
|
||
|
||
We use variables to store data that we want to access or manipulate later. Variables must have unique names. | ||
|
||
Without declaring a variable, the sum of these two numbers will be printed to the console but cannot be accessed for future use: | ||
|
||
```{r novar, exercise=TRUE} | ||
2 + 2 | ||
``` | ||
|
||
To declare a variable, follow the pattern of: `variable <- value`. Let's declare a variable `total` as the sum of two numbers. | ||
|
||
```{r d_var, exercise=TRUE} | ||
total <- 2 + 2 | ||
``` | ||
|
||
We access the value of `total`: | ||
|
||
```{r var, exercise=TRUE} | ||
total | ||
``` | ||
|
||
We can use the value stored in `total`: | ||
|
||
```{r sub_var, exercise=TRUE} | ||
total - 1 | ||
``` | ||
|
||
After declaring a variable, we can perform operations to change the value stored in the variable: | ||
|
||
```{r sub_var2, exercise=TRUE} | ||
total <- total - 1 | ||
total | ||
``` | ||
|
||
|
||
Now it's your turn! Declare a variable "product" and set its value to the product of the numbers 3 and 5. Next, using the variable "product", declare a variable called "difference", whose final value is 8. | ||
|
||
```{r product, exercise=TRUE} | ||
# First declare "product" | ||
product | ||
# Operate on "product" to get 8 as the value for "difference" | ||
difference | ||
``` | ||
|
||
```{r product-hint-1} | ||
# First declare "product" | ||
product <- #your code here | ||
# Operate on "product" to get 8 as the value for "difference" | ||
difference <- product #your code here | ||
``` | ||
|
||
|
||
```{r product-solution} | ||
# First declare "product" | ||
product <- 3 * 5 | ||
# Operate on "product" to get 8 as the value for "difference" | ||
difference <- product - 7 | ||
``` | ||
|
||
### Check Your Understanding | ||
|
||
Without running the code below, what is the final value of x? | ||
|
||
```{r solve-x, exercise=TRUE, exercise.eval=FALSE} | ||
x <- 5 | ||
y <- 2 | ||
x <- y * x | ||
y <- x - 4 | ||
``` | ||
|
||
```{r solve-x-q, echo=FALSE} | ||
quiz( | ||
question("What is the final value of x?", | ||
answer("5"), | ||
answer("10", correct=TRUE), | ||
answer("6")) | ||
) | ||
``` | ||
|
||
## Functions in R | ||
|
||
### Overview | ||
|
||
Review comment: Nice job! I might also give a few use cases for functions. Python has those neat recursive functions (not sure if those exist in R), but you could also talk about taking some raw data doing some long processing all in one shot, if you have the processing function already written. This could open the door to the whole API style of programming, though that might be beyond the scope of this tutorial. |
||
Functions are one of the basic units in programming. Generally speaking, a function takes some input and generates some output, in a reproducible way. Every R function follows the same basic syntax, where `function()` is the name of the function and `arguments` are the different parameters you can specify (i.e. your input): | ||
|
||
`function(argument1 = ..., argument2 = ..., ...)` | ||
|
||
You can treat functions as black boxes: you do not necessarily need to know how they work under the hood, as long as the input you provide conforms to the format they expect. | ||
|
||
{width=75%} | ||
|
||
For example, the function `sum()` (which outputs the sum of the arguments) expects numbers: | ||
|
||
```{r sum_function, exercise=TRUE} | ||
sum(3, 5, 9, 18) | ||
``` | ||
|
||
If you instead pass text as arguments to `sum()` you will receive an error: | ||
|
||
```{r sum_text, exercise=TRUE, error = TRUE} | ||
sum("Sum", "does", "not", "accept", "text!") | ||
``` | ||
|
||
On the other hand, the function `paste()`, which links together words, does accept text as arguments. | ||
|
||
```{r paste_function, exercise=TRUE} | ||
paste("Hello", "world", sep = " ") | ||
``` | ||
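To illustrate the difference between optional, named arguments and ordinary inputs, here is a small sketch using only `paste()`:

```r
# sep is an optional argument; its default is a single space
paste("Hello", "world")

# Supplying sep by name overrides the default
paste("Hello", "world", sep = "-")

# Without the name, "-" is treated as one more piece of text to join
paste("Hello", "world", "-")
```

The last call shows why optional arguments are usually written with their names: R otherwise has no way of telling them apart from the text to be joined.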
|
||
|
||
### Getting Help | ||
|
||
You can get help with any function in R by inputting `?function_name` into the Console. This will open a window in the bottom right under the Help tab with information on that function, including input options and example code. | ||
|
||
|
||
```{r eval = FALSE} | ||
?read_delim | ||
``` | ||
|
||
The **Description** section tells us that `read_delim()` is the general case of the functions `read_csv()` and `read_tsv()`. | ||
|
||
The **Usage** section tells us the inputs that need to be specified and the default inputs of `read_delim()`: | ||
|
||
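- `file` needs to be specified as it is not followed by `=`; the same holds for `delim` in readr versions where it is listed without `=` (recent versions show `delim = NULL` and guess the delimiter) | ||
- all other parameters have a default value, e.g. `quote = "\""`, and do not have to be specified to run the function. | ||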
|
||
The **Arguments** section describes the requirements of each input argument in detail. | ||
|
||
The **Examples** section has examples of the function that can be directly copied and pasted into your console and run. | ||
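The same information can also be inspected from the console. The sketch below uses the base R function `read.csv()` instead of `read_delim()`, purely because it needs no extra package; that substitution is an assumption made for illustration.

```r
# args() prints a function's arguments and any default values;
# arguments shown without "=" (here, file) have no default
args(read.csv)

# formals() returns the same information as a list, so a single default
# can be looked up by name
formals(read.csv)$sep
formals(read.csv)$header
```

Reading the printed signature this way mirrors what the **Usage** section of a help page shows.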
|
||
Another widely used example from base R is the function `nrow()`: | ||
|
||
```{r eval = FALSE} | ||
?nrow | ||
``` | ||
|
||
The **Description** section tells us that `nrow()` returns the number of rows of a matrix or an array. | ||
|
||
The **Usage** section tells us the inputs that need to be specified and default inputs of `nrow()`: | ||
|
||
- `x` is the matrix, array, or data frame whose number of rows you want to know | ||
|
||
The **Arguments** section describes the requirements of each input argument in detail. | ||
|
||
The **Examples** section has examples of the function that can be directly copied and pasted into your console and run. | ||
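A brief sketch of `nrow()` on different kinds of input (the object names and values are made up for illustration):

```r
# A small data frame with three rows and two columns
samples <- data.frame(id = c("a", "b", "c"), depth = c(10, 20, 30))
nrow(samples)

# A matrix with two rows and three columns
m <- matrix(1:6, nrow = 2)
nrow(m)

# A plain vector has no dimensions, so nrow() returns NULL
nrow(1:6)
```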
|
||
|
||
|
||
The tidyverse is a collection of packages that provide many functions widely used in R. One example is `select()`, which comes from the dplyr package: | ||
|
||
```{r eval = FALSE} | ||
if (!require("tidyverse")) install.packages("tidyverse") | ||
library(tidyverse) | ||
?select | ||
``` | ||
|
||
The **Description** section tells us that `select()` can be used to select certain columns from a parent dataset, and optionally rename them. | ||
|
||
The **Overview of selection features** section lists the operators and selection helpers that can be used with `select()`. | ||
|
||
The **Usage** section tells us the inputs that need to be specified and the default inputs of `select()`: | ||
|
||
- `.data` is a mandatory input which refers to the parent data frame or tibble to subset from | ||
- the selection helpers and operators can be used to specify the columns to subset | ||
|
||
The **Value** section describes the object that `select()` returns. A more detailed account can be read on the help page. | ||
|
||
The **Methods** section describes the implementations (methods) of `select()` that are available for different classes of input. | ||
|
||
The **Examples** section has examples of the function that can be directly copied and pasted into your console and run. | ||
|
||
|
||
### Check Your Understanding | ||
|
||
Here is the help page for the function `mean()`: | ||
|
||
{width=100%} | ||
|
||
```{r Functions1, echo=FALSE} | ||
quiz( | ||
question("What types of arguments can be passed to mean()?", | ||
answer("Logical (True/False)", correct=TRUE), | ||
answer("Numeric", correct=TRUE), | ||
answer("Text"), | ||
answer("Numeric vectors", correct=TRUE)) | ||
) | ||
``` | ||
|
||
When trying to find the mean of x, NA is the output: | ||
|
||
{width=100%} | ||
|
||
```{r Functions2, echo=FALSE} | ||
quiz( | ||
question("Why did this happen?", | ||
answer("The last value of x needs to be removed using the 'trim' argument"), | ||
answer("Since x is not composed of only numbers and logicals, an error is thrown"), | ||
answer("The NA in x needs to be removed using the na.rm argument", correct=TRUE)) | ||
) | ||
``` | ||
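For reference, a small sketch of how missing values affect `mean()` (the vector below is made up and is not the one in the screenshot):

```r
# A numeric vector with one missing value
x <- c(2, 4, NA, 6)

# The missing value propagates, so the result is NA
mean(x)

# na.rm = TRUE drops missing values before averaging: (2 + 4 + 6) / 3
mean(x, na.rm = TRUE)
```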
|
||
|
||
## R Packages | ||
|
||
The first functions we will look at are used to install and load R packages. R packages are units of shareable code, containing functions that facilitate and enhance analyses. In simpler terms, think of R packages as iPhone Applications. Each App has specific capabilities that can be accessed when we install and then open the application. The same holds true for R packages. To use the functions contained in a specific R package, we first need to install the package, then each time we want to use the package we need to "open" the package by loading it. | ||
|
||
### Installing Packages | ||
|
||
In this tutorial, we will be using the "tidyverse" package. This package contains a versatile set of functions designed for easy manipulation of data. | ||
|
||
You should have already installed the "tidyverse" package using RStudio's graphical interface. Packages can also be installed by entering the function `install.packages()` in the console (to install a different package just replace "tidyverse" with the name of the desired package): | ||
|
||
```{r eval = FALSE} | ||
install.packages("tidyverse") | ||
``` | ||
|
||
### Loading Packages | ||
|
||
After installing a package, and *every time* you open a new RStudio session, you need to first load (open) the packages you want to use with the `library()` function. This tells R to access the package's functions, and it spares RStudio the lag that would occur if it automatically loaded every installed package every time you opened it. | ||
|
||
Packages can be loaded like this: | ||
|
||
```{r eval = FALSE} | ||
library(tidyverse) | ||
``` | ||
|
||
It is a little trickier to install Bioconductor packages, as they are often not stored on the Comprehensive R Archive Network (CRAN) where most packages live. There is a package, however, that lives on CRAN and serves as an interface between CRAN and the Bioconductor packages. | ||
|
||
To install a Bioconductor package, you must first install and load the BiocManager package, like so: | ||
|
||
`install.packages("BiocManager")` | ||
`library("BiocManager")` | ||
|
||
You can then use the function `BiocManager::install()` to install a Bioconductor package. To install the `annotate` package, we would execute the following code: | ||
|
||
`BiocManager::install("annotate")` | ||
|
||
Sometimes two packages include functions with the same name. A common example is that a `select()` function is included both in the `dplyr` and `MASS` packages. Therefore, to specify the use of a function from a particular package, you can precede the function with the following notation: `package::function()`. | ||
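As a minimal sketch of the `package::function()` notation, the example below uses `median()` from the `stats` package, chosen here only because it ships with R and needs no installation:

```r
# Call median() explicitly from the stats package
stats::median(c(1, 3, 5))

# With the prefix, R uses the stats version even if another loaded
# package defines its own median()
values <- c(10, 2, 8)
stats::median(values)
```

The same pattern applies to `dplyr::select()` and `MASS::select()` once those packages are installed.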
|
||
### Check Your Understanding | ||
|
||
After installing the `dplyr` package, you encounter this error: | ||
|
||
{width=100%} | ||
|
||
```{r Packages, echo=FALSE} | ||
quiz( | ||
question("Why did this happen?", | ||
answer("The function `select()` wasn't installed"), | ||
answer("The package `dplyr` wasn't loaded after installation", correct=TRUE), | ||
answer("The arguments given to `select()` are invalid")) | ||
) | ||
``` |
Review comment: Nice, this is a much clearer way of conveying the learning goals.