christophergandrud edited this page Jul 8, 2012 · 2 revisions

A whole data set, or even all the values of a single variable, is almost impossible to make sense of just by looking at the raw values, so we need tools to describe or summarise it. R has many tools for summarising data.

## summary

The summary command in base R is one of the most basic tools for summarising our variables. To summarise every variable in a data frame we can simply pass the data frame object to the summary command. To summarise the cars data set, which comes with R and is available as soon as R starts, we just type:

summary(cars)
##      speed           dist    
##  Min.   : 4.0   Min.   :  2  
##  1st Qu.:12.0   1st Qu.: 26  
##  Median :15.0   Median : 36  
##  Mean   :15.4   Mean   : 43  
##  3rd Qu.:19.0   3rd Qu.: 56  
##  Max.   :25.0   Max.   :120  

Using summary on a data frame tells us a number of things about each variable. For the two variables in the cars data frame it reports:

  • Min.: the minimum value,

  • Max.: the maximum value,

  • Median: the median (middle) value,

  • Mean: the mean (average) value,

  • 1st Qu.: the value separating the bottom 25% of the data from the top 75%,

  • 3rd Qu.: the value separating the top 25% of the data from the bottom 75%.
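Each of the statistics listed above can also be computed on its own with a base R function. The following is a minimal sketch using the speed variable; the object name `speed` is only a convenience introduced for this example.

```r
# cars ships with base R (the datasets package), so nothing needs to be loaded.
speed <- cars$speed   # illustrative name, not part of the original page

min(speed)      # minimum value
max(speed)      # maximum value
median(speed)   # median (middle) value
mean(speed)     # mean (average) value

# quantile() returns cut points; 0.25 and 0.75 are the 1st and 3rd quartiles.
quantile(speed, probs = c(0.25, 0.75))
```

These calls should agree with the speed column of the summary output shown earlier, apart from how the numbers are rounded for printing.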

If we only want to summarise one variable we can select it with the $ operator. Imagine we only want to summarise the speed variable. We simply type:

summary(cars$speed)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     4.0    12.0    15.0    15.4    19.0    25.0 
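The $ operator is not the only way to pick out a single variable. As a rough sketch, the snippet below shows two base R alternatives that should give the same summary; the names `by_dollar` and `by_bracket` are made up for the illustration.

```r
# Both of these extract the speed column as a plain numeric vector.
by_dollar  <- cars$speed
by_bracket <- cars[["speed"]]
identical(by_dollar, by_bracket)   # checks that the two extractions match

# with() evaluates the expression inside the data frame,
# so the column can be referred to by name alone.
with(cars, summary(speed))
```

Which form to use is mostly a matter of taste; `[[` is handy when the variable name is stored in a character string.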

The summary command can be used on almost any object. What it reports depends on the kind of object it is given, so it is a quick way to get a sense of what an object is.
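To illustrate how summary behaves on objects other than numeric variables, here is a small sketch. The factor `colours` and the model `fit` are hypothetical objects created only for this example; they are not part of the cars data set.

```r
# A factor: summary() counts the observations in each level.
colours <- factor(c("red", "blue", "red", "green"))   # made-up example data
summary(colours)

# A logical vector: summary() counts the FALSE and TRUE values.
summary(cars$speed > 15)

# A fitted linear model: summary() reports coefficients, residuals,
# R-squared and related quantities.
fit <- lm(dist ~ speed, data = cars)
summary(fit)
```

In each case the output is tailored to the type of object, which is why summary is often a sensible first thing to try on something unfamiliar.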

## Graphical Summaries

Graphs are often a more effective way to summarise our data. Some graphical summary techniques are discussed in the Basic Graphics and ggplot2 pages of this wiki.