Importing Data, Basic

christophergandrud edited this page Jul 11, 2012 · 18 revisions

From a .csv file

The easiest way to get data from a spreadsheet program like Excel into R is to save it as a .csv file. CSV stands for comma-separated values: each row of the data set is a line of text, and the values within a row are separated by commas, which programs like Excel know how to turn into rows and columns.

Note: the first row of the .csv file should contain the variable names.

If your data is saved on your computer as a .csv file, you can use the read.csv command from the utils package (loaded by default) to open it in R as a data frame. For example, to load a file named myFile.csv from R's working directory:

    # Create myData data frame from myFile.csv
    myData <- read.csv("myFile.csv")
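As a minimal, self-contained sketch of the note about variable names, the following writes a tiny .csv file to a temporary location and reads it back. The file contents, the country and gdp column names, and the tmpFile object are all made up for illustration. stringsAsFactors = FALSE is set explicitly so the result does not depend on your R version's default.

```r
# Write a small example .csv file (hypothetical data) to a temporary file
tmpFile <- tempfile(fileext = ".csv")
writeLines(c("country,gdp",
             "A,1.5",
             "B,2.5"),
           tmpFile)

# Read it back; the first row supplies the variable names
myData <- read.csv(tmpFile, stringsAsFactors = FALSE)

# Inspect the variable names and the type of each column
names(myData)
str(myData)
```

If your own file has no header row, read.csv's header = FALSE argument tells R not to treat the first row as variable names.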

Also, the foreign package allows you to import data stored in formats that are 'foreign' to R, such as Stata .dta or SPSS .sav files. (You do not need it for .csv files; read.csv is part of utils.)
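Here is a rough sketch of reading a Stata file with foreign, assuming the package is installed (it normally ships with R). Because no real .dta file is available here, the example first creates one with write.dta; the exampleData data frame and its id and score columns are hypothetical.

```r
library(foreign)

# Write a small example data frame (hypothetical data) to a Stata .dta file
tmpFile <- tempfile(fileext = ".dta")
exampleData <- data.frame(id = c(1, 2, 3), score = c(10.5, 20, 30.25))
write.dta(exampleData, tmpFile)

# Read the .dta file back into R as a data frame
myData <- read.dta(tmpFile)
myData
```

With your own data you would skip the write.dta step and pass the path of your .dta file to read.dta.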

From the non-secure web (http)

If your file is stored in .csv format on a non-secure website (like your Dropbox Public folder), you can use getURL from the RCurl package to download it as text. Then use read.csv (from the utils package, loaded by default) to load that text into R as a data frame. For example:

    # Load required packages 
    library(RCurl)

    # Create an object for the URL where your data is stored.
    url <- "http://myFile.csv"

    # Use getURL from RCurl to download the file.
    myData <- getURL(url)

    # Finally, parse the downloaded text as .csv to create a data frame.
    # text = is needed because myData holds the file's contents, not a file name.
    myData <- read.csv(text = myData)

From the secure web (https)

Data stored in .csv files on secure sites like GitHub (these have URLs that start with https) can also be downloaded and turned into R data frames. The process is similar to that for http sites: download the file's text with getURL, then pass it to read.csv through textConnection (in base R), which lets read.csv treat the downloaded string as the file's contents rather than as a file name. For example:

    # Load required packages 
    library(RCurl)

    # Create an object for the URL where your data is stored.
    url <- "https://myFile.csv"

    # Use getURL from RCurl to download the file.
    myData <- getURL(url)

    # Finally let R know that the file is in .csv format so that it can create a data frame.
    myData <- read.csv(textConnection(myData))   
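To see what textConnection contributes without needing a network connection, here is a small sketch in which a hard-coded string (made-up data, named csvText for illustration) stands in for the text that getURL would return. It also uses read.csv's text argument, which is an alternative way of doing the same thing.

```r
# Stand-in for the text returned by getURL(url); the data is made up
csvText <- "id,value\n1,10\n2,20\n3,30"

# Wrap the string in a connection so read.csv reads it like a file
fromConnection <- read.csv(textConnection(csvText))

# Alternative: hand the string to read.csv through its text argument
fromText <- read.csv(text = csvText)

# Check whether the two approaches produce the same data frame
all.equal(fromConnection, fromText)
```

Without textConnection or text =, read.csv would interpret the string as the name of a file to open.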

When you use getURL with an https URL you might get an error message like:

    Error in function (type, msg, asError = TRUE)  : 
    SSL certificate problem, verify that the CA cert is OK. Details:
    error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

There is a simple solution: just add ssl.verifypeer = FALSE, which tells getURL not to verify the site's SSL certificate (so only use it with sites you trust). In this example you would type:

    myData <- getURL(url, ssl.verifypeer = FALSE)                

For more details see this blog post.