Basic Data in R: Objects and Dataframes
R is "object-oriented". Basically, most things we do in R are built around objects. Objects are like the nouns of the R language. They are things and can be many types of things from a single word, to a graph, to the HTML code of a webpage, to a large data set.
In RStudio you can see all of your objects in the Workspace window.
For much more information on the topics covered in this page see:
Chapter 1: Matloff, Norman. 2011. The Art of R Programming: A Tour of Statistical Software Design. San Francisco: No Starch Press.
To create a simple object called x that contains the phrase "Hello World" (letters and words are called character strings in R), just type in the Console:
x <- "Hello World"
The <- (the assignment operator) basically takes the letters in "Hello World" and puts them into the object x.
To see what is inside x, just type x into the Console. You should get the following output:
x
## [1] "Hello World"
The [1] in the output means that "Hello World" is the first element of our object (clearly there is only one element in this example).
You can also click on x in RStudio's Workspace window and it will show you the same information.
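As a small sketch that goes slightly beyond the text above, base R can also describe an object without printing its contents. The functions class(), length(), and nchar() are standard base R; using them here is an illustrative addition, not part of the original walkthrough.

```r
# Create the object exactly as in the text
x <- "Hello World"

# What kind of object is it?
class(x)    # "character"

# How many elements does it hold? One string counts as one element.
length(x)   # 1

# How many characters are in the string (the space counts too)?
nchar(x)    # 11
```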
Numbers can be put into objects in the same way. For example:
x <- 10
x
## [1] 10
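Note that assigning 10 to x replaces whatever x held before. The sketch below illustrates this and shows that a numeric object can be used in arithmetic; the object name y is a hypothetical name chosen only for this example.

```r
x <- "Hello World"   # x starts out as a character string
x <- 10              # assigning again overwrites the old contents

class(x)             # "numeric"

# Numeric objects can be used in calculations
y <- x * 2 + 1       # y is a hypothetical object name
y                    # 21
```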
Multiple numbers or words can be put into an object in a particular order. For example, let's create an object called countryNames from this list of country names:
Country |
---|
Albania |
Botswana |
Cambodia |
To do this we type into the console:
countryNames <- c("Albania", "Botswana", "Cambodia")
countryNames
## [1] "Albania"  "Botswana" "Cambodia"
The c() (concatenate) command simply combines the three character strings into one object. This type of simple object is called a vector.
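Because a vector keeps its elements in order, individual elements can be pulled out by position with square brackets. The following is a minimal sketch using base R indexing; the object moreCountries and the added value "Denmark" are made up purely for illustration.

```r
countryNames <- c("Albania", "Botswana", "Cambodia")

# How many elements are in the vector?
length(countryNames)       # 3

# Select one element by its position
countryNames[2]            # "Botswana"

# Select several elements at once by passing a vector of positions
countryNames[c(1, 3)]      # "Albania" "Cambodia"

# c() can also extend an existing vector (hypothetical extra value)
moreCountries <- c(countryNames, "Denmark")
length(moreCountries)      # 4
```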
Note: object names like x or countryNames can't have spaces and they can't start with a number. If you want to have an object name with multiple words in it, either separate the words with a . (e.g. country.names) or capitalize the first letter of each word after the first (e.g. countryNames). The second approach is often called 'camel case' (or 'camel back') because the capital letters resemble a camel's humps.
Here is an example with the numbers 1, 10, and 423:
someNumbers <- c(1, 10, 423)
someNumbers
## [1]   1  10 423
Notice that you don't need to put "" around numbers the way you do with character strings.
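The quotation marks matter because they decide how R stores the values. As a hedged sketch, the example below compares numbers with and without quotes; the object names quotedNumbers and mixed are hypothetical and used only here.

```r
someNumbers <- c(1, 10, 423)
class(someNumbers)           # "numeric"
sum(someNumbers)             # 434

# The same digits in quotes are stored as character strings
quotedNumbers <- c("1", "10", "423")
class(quotedNumbers)         # "character"

# sum(quotedNumbers) would stop with an error, because R cannot add strings.
# as.numeric() converts the strings back into numbers.
sum(as.numeric(quotedNumbers))   # 434

# A vector holds one type only: mixing numbers and strings gives strings
mixed <- c(1, "Albania")
class(mixed)                 # "character"
```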
Another type of R object, and probably the one we will use the most in this class, is called a dataframe. Dataframes are what we generally think data sets look like. For example:
Country | Population (in millions) |
---|---|
Albania | 2.8 |
Botswana | 2.0 |
Cambodia | 14.8 |
One way to think of this data frame is as two equal-length vector objects attached to each other. So, we can create this dataframe by first creating the two vector objects Country and Population.
Country <- c("Albania", "Botswana", "Cambodia")
Population <- c(2.8, 2, 14.8)
Then we attach them using the data.frame command:
myData <- data.frame(Country, Population, stringsAsFactors = FALSE)
If you are curious about what stringsAsFactors means, see the data types wiki page (under development).
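The "equal length" part of this description matters in practice. As a rough sketch, the example below tries to build a data frame from vectors of different lengths; shortPopulation is a hypothetical vector that is deliberately one value short, and try() is used only so the error does not halt the script.

```r
Country <- c("Albania", "Botswana", "Cambodia")

# Hypothetical vector with only two values for three countries
shortPopulation <- c(2.8, 2)

# data.frame() refuses to combine 3 rows with 2 rows and raises an error;
# try() captures the error instead of stopping
result <- try(data.frame(Country, shortPopulation), silent = TRUE)
inherits(result, "try-error")   # TRUE
```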
We can check to see what the dataframe looks like:
myData
##    Country Population
## 1  Albania        2.8
## 2 Botswana        2.0
## 3 Cambodia       14.8
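Printing works well for a small data frame like this one, but for larger ones it can help to ask for a summary instead. The sketch below uses the base R functions nrow(), ncol(), names(), and str(); they are an illustrative addition to the walkthrough.

```r
# Rebuild the data frame from the text so the example is self-contained
Country <- c("Albania", "Botswana", "Cambodia")
Population <- c(2.8, 2, 14.8)
myData <- data.frame(Country, Population, stringsAsFactors = FALSE)

nrow(myData)    # 3 rows, one per country
ncol(myData)    # 2 columns
names(myData)   # "Country" "Population"

# str() gives a compact overview of each column and its type
str(myData)
```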
Luckily, we don't normally need to build dataframes like this since we can usually import data directly into R as data frames (see the Importing Data, Basic page). But it is useful to know what dataframes are and how they work.
You can isolate particular columns of data frames with the component selector $. If we wanted to just see the countries' populations in the myData object we would type:
myData$Population
## [1]  2.8  2.0 14.8
This is important for many tasks later in the course.
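A column selected with $ is just a vector, so anything that works on a vector works on it. The following is a minimal sketch of a few such tasks, assuming the myData object built above; the new column name PopulationThousands is hypothetical.

```r
# Rebuild the data frame from the text so the example is self-contained
Country <- c("Albania", "Botswana", "Cambodia")
Population <- c(2.8, 2, 14.8)
myData <- data.frame(Country, Population, stringsAsFactors = FALSE)

# Summarise a column (populations are in millions)
mean(myData$Population)     # about 6.53
max(myData$Population)      # 14.8

# Use one column to pick values from another:
# which countries have more than 10 million people?
myData$Country[myData$Population > 10]   # "Cambodia"

# $ can also create a new column (hypothetical column name)
myData$PopulationThousands <- myData$Population * 1000
names(myData)
```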