Importing Data, Basic
The easiest way to get data from a spreadsheet program like Excel into R is to save it as a .csv file. CSV stands for comma-separated values: the data set is stored as plain text with the values separated by commas, which programs like Excel know how to turn into rows and columns.
Note: the first row of the .csv file should contain the variable names.
If your data is stored in a .csv file saved on your computer, you can use the read.csv command from the utils package (loaded by default) to open it in R as a data frame. For example, to load a file with the name myFile.csv:
# Create myData data frame from myFile.csv
myData <- read.csv("myFile.csv")
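As a self-contained illustration of the header-row requirement, the sketch below writes a small .csv file to a temporary location and reads it back. The column names (country, gdp) and the values are made up for the example.

```r
# Write a tiny example file; the first row holds the variable names.
# (country and gdp are hypothetical columns used only for illustration.)
tmpFile <- tempfile(fileext = ".csv")
writeLines(c("country,gdp", "A,1.5", "B,2.7"), tmpFile)

# read.csv treats the first row as column names by default (header = TRUE)
exampleData <- read.csv(tmpFile)

# Inspect the variable names and column types that R inferred
names(exampleData)
str(exampleData)
```

If a file has no header row, read.csv(tmpFile, header = FALSE) reads the first row as data instead.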
Also, the foreign package allows you to import data stored in formats that are 'foreign' to R, such as Stata .dta files. (Reading .csv files does not require it, since read.csv is part of utils.)
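A minimal sketch of reading a Stata file with foreign follows. Since no .dta file is assumed to exist on your machine, it first writes a small made-up data frame to a temporary file with write.dta and then reads it back with read.dta. The data and the file name are assumptions for illustration only.

```r
library(foreign)

# Hypothetical data written to a temporary Stata file for illustration
tmpDta <- tempfile(fileext = ".dta")
write.dta(data.frame(id = 1:3, score = c(90, 85, 77)), tmpDta)

# read.dta returns the contents of the .dta file as a data frame
stataData <- read.dta(tmpDta)
head(stataData)
```

With a real file you would skip the write.dta step and pass the path of your own .dta file to read.dta.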
If your file is stored in .csv format and hosted on the web at a non-secure (http) site (like your Dropbox Public folder), you can use getURL from the RCurl package to download the document as text. Then use read.csv from the utils package to load it into R as a data frame. For example:
# Load required packages
library(RCurl)
# Create an object for the URL where your data is stored.
url <- "http://myFile.csv"
# Use getURL from RCurl to download the file.
myData <- getURL(url)
# Finally, parse the downloaded text as .csv to create a data frame.
# getURL returns the file contents as a string, so pass it via the text argument.
myData <- read.csv(text = myData)
Data stored in .csv files on secure sites like GitHub (these have URLs that start with https) can also be downloaded and turned into R data frames. The process is similar to that for http sites: getURL returns the file contents as a single character string, so wrap that string in textConnection (this is in base R) to let read.csv read it as if it were a file. For example:
# Load required packages
library(RCurl)
# Create an object for the URL where your data is stored.
url <- "https://myFile.csv"
# Use getURL from RCurl to download the file.
myData <- getURL(url)
# Finally let R know that the file is in .csv format so that it can create a data frame.
myData <- read.csv(textConnection(myData))
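To see what the textConnection step does without needing a network connection, the sketch below uses a hard-coded string in place of the text that getURL would return. The string's contents are invented for the example, and read.csv's text argument is shown as an alternative way to parse the same string.

```r
# Stand-in for the character string returned by getURL (hypothetical contents)
csvText <- "id,score\n1,90\n2,85\n"

# Wrap the string in a connection so read.csv reads it like a file
fromConnection <- read.csv(textConnection(csvText))

# Alternative: pass the string through read.csv's text argument
fromText <- read.csv(text = csvText)

# Compare the two results
all.equal(fromConnection, fromText)
```

Passing the raw string as the first argument instead, as in read.csv(csvText), would make R look for a file with that name rather than parse the text.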
When you use getURL with an https URL you might get an error message like:
Error in function (type, msg, asError = TRUE) :
SSL certificate problem, verify that the CA cert is OK. Details:
error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
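There is a simple workaround: add ssl.verifypeer = FALSE to the getURL call. Note that this turns off verification of the server's certificate, so use it only with sources you trust. In this example you would type: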
myData <- getURL(url, ssl.verifypeer = FALSE)
For more details see this blog post.