Summarising Data
We need tools to describe, or summarise, our data, because a whole data set, or even all the values of a single variable, is almost impossible to understand just by looking at it. R has many tools for summarising data.
## summary
The `summary` function in base R is one of the most basic tools for summarising our variables. To summarise every variable in a data frame, we simply pass the data frame object to `summary`. To summarise the `cars` data set, which comes with R and is available without loading anything, we type:
```r
summary(cars)
##      speed           dist
##  Min.   : 4.0   Min.   :  2
##  1st Qu.:12.0   1st Qu.: 26
##  Median :15.0   Median : 36
##  Mean   :15.4   Mean   : 43
##  3rd Qu.:19.0   3rd Qu.: 56
##  Max.   :25.0   Max.   :120
```
Using `summary` on a data frame gives the following statistics for each of the two variables in `cars`:
- `Min.`: the minimum value,
- `1st Qu.`: the value separating the bottom 25% of the data from the top 75%,
- `Median`: the median (middle) value,
- `Mean`: the mean (average) value,
- `3rd Qu.`: the value separating the top 25% of the data from the bottom 75%,
- `Max.`: the maximum value.
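To see where these numbers come from, each statistic can also be computed with its own base R function. The sketch below does this for the `speed` column; it assumes the default `quantile()` method (type 7), which is also what `summary()` uses for numeric vectors. The variable name `manual` is just an illustrative choice.

```r
# Compute each statistic reported by summary() with its own base R function.
speed <- cars$speed

manual <- c(
  "Min."    = min(speed),
  "1st Qu." = unname(quantile(speed, 0.25)),  # 25th percentile
  "Median"  = median(speed),
  "Mean"    = mean(speed),
  "3rd Qu." = unname(quantile(speed, 0.75)),  # 75th percentile
  "Max."    = max(speed)
)
print(manual)

# The values agree with summary(); as.numeric() strips the special class
# and names that summary() attaches, so only the numbers are compared.
all.equal(as.numeric(summary(speed)), as.numeric(manual))
```

`unname()` is used on the `quantile()` results because `quantile()` labels its output (for example `25%`), and that label would otherwise be glued onto the name given in `c()`.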
To summarise a single variable, we can select it with the `$` operator. For example, to summarise only the `speed` variable, we type:
```r
summary(cars$speed)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##     4.0    12.0    15.0    15.4    19.0    25.0
```
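The result of `summary()` on a numeric vector is a named numeric vector, so individual statistics can be pulled out by name instead of being read off the printed output. A minimal sketch in base R (the name `speed_summary` is illustrative):

```r
# Store the summary so its values can be reused in later calculations.
speed_summary <- summary(cars$speed)

speed_summary[["Mean"]]  # the mean speed

# Difference between the quartiles, i.e. the interquartile range.
speed_summary[["3rd Qu."]] - speed_summary[["1st Qu."]]

# [[ ]] selects a column by name, giving the same result as cars$speed.
summary(cars[["speed"]])
```

The interquartile range computed here should match base R's `IQR(cars$speed)`, since both rely on the same default quantile method.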
The `summary` function can be used on almost any object and gives us a quick sense of what the object contains; what it reports depends on the kind of object it is given.
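This works because `summary` is a generic function: R chooses a method based on the class of the object. The sketch below contrasts two cases in base R; the `colours` factor is made-up illustration data, and the regression of `dist` on `speed` is only an example of a fitted model.

```r
# A factor: summary() counts how many times each level occurs.
colours <- factor(c("red", "blue", "red", "green", "red"))
summary(colours)

# A fitted linear model: summary() returns a "summary.lm" object with
# coefficients, standard errors and R-squared.
fit <- lm(dist ~ speed, data = cars)
fit_summary <- summary(fit)
class(fit_summary)
fit_summary$r.squared
```

Levels of a factor are sorted alphabetically by default, so the counts appear as `blue`, `green`, `red`.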
Graphs are often a more effective way to summarise our data. Some graphical summary techniques are discussed in the Basic Graphics and ggplot2 pages of this wiki.