Basic Data in R: Objects and Dataframes

christophergandrud edited this page Sep 3, 2012 · 3 revisions

Objects

R is "object-oriented": most of what we do in R is built around objects. Objects are like the nouns of the R language. An object can hold many kinds of things, from a single word, to a graph, to the HTML code of a webpage, to a large data set.

In RStudio you can see all of your objects in the Workspace window.

For much more information on the topics covered in this page see:

Chapter 1: Matloff, Norman. 2011. The Art of R Programming: A Tour of Statistical Software Design. San Francisco: No Starch Press.

Examples

Strings and Numbers

To create a simple object called x that contains the sentence "Hello World" (letters and words are called character strings in R), just type in the Console:

x <- "Hello World"

The assignment operator <- takes the character string "Hello World" and puts it into the object x.

To see what is inside x, just type x into the Console. You should get the following output:

x
## [1] "Hello World"

The [1] in the output means that "Hello World" is the first element of our object (clearly there is only one element in this example).

You can also click on x in RStudio's Workspace window and it will show you the same information.

Numbers can be put into objects in the same way. For example:

x <- 10
x
## [1] 10
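As a small sketch of what these assignments do, the base R function class() reports what kind of thing an object holds. Note that assigning to x a second time replaces its old contents. The object name y is only an illustrative choice and is not part of the example above.

```r
# Put a character string into x and check its type
x <- "Hello World"
class(x)
## [1] "character"

# Assigning again overwrites the old contents of x
x <- 10
class(x)
## [1] "numeric"

# Numeric objects can be used in arithmetic (y is a hypothetical name)
y <- x * 2
y
## [1] 20
```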

Vectors

Multiple numbers or words can be put into an object in a particular order. For example, let's create an object called countryNames from this list of country names:

Country
Albania
Botswana
Cambodia

To do this we type into the console:

countryNames <- c("Albania", "Botswana", "Cambodia")
countryNames
## [1] "Albania"  "Botswana" "Cambodia"

The c() (concatenate) command simply combines the three character strings into one object. This type of simple object is called a vector.

Note: object names like x or countryNames can't contain spaces and can't start with a number. If you want an object name with multiple words in it, either separate the words with a . (e.g. country.names) or capitalize the first letter of each word after the first (e.g. countryNames). The second approach is often called 'camel back' (or 'camel case') because the capital letters in the middle of the name resemble a camel's humps.
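The naming rules can be checked directly in the Console. The sketch below uses the base R function make.names(), which turns an arbitrary string into a syntactically valid name. The example strings are made up for illustration.

```r
# Both naming styles create valid object names
country.names <- c("Albania", "Botswana", "Cambodia")
countryNames <- c("Albania", "Botswana", "Cambodia")

# The two objects hold exactly the same contents
identical(country.names, countryNames)
## [1] TRUE

# make.names() shows how R would repair an invalid name
make.names("country names")  # a space is not allowed
## [1] "country.names"
make.names("2countries")     # a name cannot start with a number
## [1] "X2countries"
```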

Here is an example with the numbers 1, 10, and 423:

someNumbers <- c(1, 10, 423)
someNumbers
## [1]   1  10 423

Notice that you don't need to put "" around numbers the way you do with character strings.
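Once a vector exists, a few base R tools are handy for working with it. The following is a minimal sketch using the two vectors from this section. The object name mixed is hypothetical and only there to show what happens when numbers and character strings are combined.

```r
countryNames <- c("Albania", "Botswana", "Cambodia")
someNumbers <- c(1, 10, 423)

# How many elements does the vector hold?
length(countryNames)
## [1] 3

# Square brackets pick out elements by position (counting from 1)
countryNames[2]
## [1] "Botswana"

# Arithmetic is applied to every element of a numeric vector
someNumbers * 2
## [1]   2  20 846

# Mixing numbers and strings in c() turns everything into character strings
mixed <- c(1, "Albania")
class(mixed)
## [1] "character"
```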

Dataframes

Another type of R object, and probably the one we will use the most in this class, is called a dataframe. Dataframes look like what we generally think of as data sets. For example:

Country Population (in millions)
Albania 2.8
Botswana 2.0
Cambodia 14.8

One way to think of this dataframe is as two equal-length vectors attached to each other. So, we can create this dataframe by first creating the two vector objects Country and Population (with population recorded in millions).

Country <- c("Albania", "Botswana", "Cambodia")

Population <- c(2.8, 2, 14.8)

Then we attach them using the data.frame command:

myData <- data.frame(Country, Population, stringsAsFactors = FALSE)

If you are curious about what stringsAsFactors means, see the data types wiki page (under development).

We can check to see what the dataframe looks like:

myData
##    Country Population
## 1  Albania        2.8
## 2 Botswana        2.0
## 3 Cambodia       14.8

Luckily, we don't normally need to build dataframes like this since we can usually import data directly into R as data frames (see the Importing Data, Basic page). But it is useful to know what dataframes are and how they work.
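Whether a dataframe is built by hand or imported, it helps to check its shape before working with it. The sketch below rebuilds myData and inspects it with the base R functions nrow(), ncol(), names() and str().

```r
Country <- c("Albania", "Botswana", "Cambodia")
Population <- c(2.8, 2, 14.8)
myData <- data.frame(Country, Population, stringsAsFactors = FALSE)

# Number of rows (observations) and columns (variables)
nrow(myData)
## [1] 3
ncol(myData)
## [1] 2

# Column names come from the names of the vectors we combined
names(myData)
## [1] "Country"    "Population"

# str() prints a compact summary: each column's name, type and first values
str(myData)
```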

You can isolate particular columns of data frames with the component selector $. If we wanted to just see the countries' populations in the myData object we would type:

myData$Population
## [1]  2.8  2.0 14.8

Selecting individual columns this way is important for many tasks later in the course.
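As a rough illustration of why this matters, a column selected with $ is an ordinary vector, so it can be passed to other functions or used to pick out rows. The sketch below assumes the myData object built above. The cutoff of 2.5 million is an arbitrary value chosen for the example.

```r
Country <- c("Albania", "Botswana", "Cambodia")
Population <- c(2.8, 2, 14.8)
myData <- data.frame(Country, Population, stringsAsFactors = FALSE)

# Summarise a single column
mean(myData$Population)
## [1] 6.533333
max(myData$Population)
## [1] 14.8

# Use one column to select values from another:
# names of countries with more than 2.5 million people
myData$Country[myData$Population > 2.5]
## [1] "Albania"  "Cambodia"
```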