Summarizing Numerical Data

Author: K.C. Cupido
Affiliation: St. Francis Xavier University

This script provides a quick demonstration of summarizing numerical data, both with descriptive statistics and visually.

Pixar Data

The data we will be looking at contains information about Pixar films released prior to 2021, as provided by Wikipedia. The following variables were collected for each film:

  • film: The name of the film.
  • release_date: The date the film premiered.
  • run_time: The length of the film in minutes.
  • rotten_tomatoes: Score from the review-aggregation website Rotten Tomatoes; scored out of 100.
  • box_office_worldwide: Worldwide box office gross in U.S. dollars.
  • budget: Movie budget in U.S. dollars.

The data can be found in the following table:

film                 release_date  run_time  rotten_tomatoes  box_office_worldwide       budget
Toy Story            1995-11-22          81              100           373,554,033   30,000,000
A Bug's Life         1998-11-25          95               92           363,258,859  120,000,000
Toy Story 2          1999-11-24          92              100           497,374,776   90,000,000
Monsters, Inc.       2001-11-02          92               96           632,316,649  115,000,000
Finding Nemo         2003-05-30         100               99           871,014,978   94,000,000
The Incredibles      2004-11-05         115               97           631,606,713   92,000,000
Cars                 2006-06-09         117               74           461,983,149  120,000,000
Ratatouille          2007-06-29         111               96           623,726,085  150,000,000
WALL-E               2008-06-27          98               95           521,311,860  180,000,000
Up                   2009-05-29          96               98           735,099,082  175,000,000
Toy Story 3          2010-06-18         103               98         1,066,969,703  200,000,000
Cars 2               2011-06-24         106               40           559,852,396  200,000,000
Brave                2012-06-22          93               78           538,983,207  185,000,000
Monsters University  2013-06-21         104               80           743,559,607  200,000,000
Inside Out           2015-06-19          95               98           857,611,174  175,000,000
The Good Dinosaur    2015-11-25          93               76           332,207,671  175,000,000
Finding Dory         2016-06-17          97               94         1,028,570,889  200,000,000
Cars 3               2017-06-16         102               69           383,930,656  175,000,000
Coco                 2017-11-22         105               97           807,082,196  175,000,000
Incredibles 2        2018-06-15         118               93         1,242,805,359  200,000,000
Toy Story 4          2019-06-21         100               97         1,073,394,593  200,000,000
Onward               2020-03-06         102               88           141,950,121  175,000,000
Soul                 2020-12-25         100               96           135,435,315  175,000,000

Descriptive Statistics

The following functions are useful for summarizing numerical data: mean, median, min, max, range, sd, var, and fivenum. Note that range returns the minimum and maximum rather than their difference.

Try them on the Pixar data.
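As a starting point, here is a minimal sketch that applies each function to the Rotten Tomatoes scores. The data frame name pixar and the helper vector scores are assumptions made for this example; the values are copied from the table above.

```r
# Rotten Tomatoes scores copied from the table above, in release order.
# `pixar` is an assumed name for the data frame; the text does not name it.
pixar <- data.frame(
  rotten_tomatoes = c(100, 92, 100, 96, 99, 97, 74, 96, 95, 98, 98, 40,
                      78, 80, 98, 76, 94, 69, 97, 93, 97, 88, 96)
)

scores <- pixar$rotten_tomatoes

mean(scores)     # arithmetic mean
median(scores)   # middle value of the sorted scores
min(scores)      # smallest score
max(scores)      # largest score
range(scores)    # minimum and maximum as a length-2 vector
sd(scores)       # sample standard deviation
var(scores)      # sample variance (the square of sd)
fivenum(scores)  # minimum, lower hinge, median, upper hinge, maximum
```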

Basic Plots

R has a generic plot function that is useful for creating quick visualizations of a data set. If you pass one numeric variable to this function, it plots each observation against its index, that is, its position in the data frame. Passing two numeric variables creates a scatterplot, with the first variable on the x-axis and the second on the y-axis.
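The two cases can be sketched as follows. The data frame name pixar is an assumption for this example, and only the two columns needed here are entered, using the values from the table above.

```r
# Run times and Rotten Tomatoes scores copied from the table, in release order.
# `pixar` is an assumed name for the data frame.
pixar <- data.frame(
  run_time = c(81, 95, 92, 92, 100, 115, 117, 111, 98, 96, 103, 106,
               93, 104, 95, 93, 97, 102, 105, 118, 100, 102, 100),
  rotten_tomatoes = c(100, 92, 100, 96, 99, 97, 74, 96, 95, 98, 98, 40,
                      78, 80, 98, 76, 94, 69, 97, 93, 97, 88, 96)
)

# One numeric variable: each run time is plotted against its index (row position).
plot(pixar$run_time)

# Two numeric variables: a scatterplot with run time on the x-axis
# and Rotten Tomatoes score on the y-axis.
plot(pixar$run_time, pixar$rotten_tomatoes)
```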

There are some arguments we can add to the plot function to make our visualizations more informative:

  • main = "Title of the Graph"
  • xlab = "x-axis label"
  • ylab = "y-axis label"
  • xlim = c(0, 100) (sets the range of values shown on the x-axis)
  • ylim = c(0, 100) (sets the range of values shown on the y-axis)
  • col = "colour" (a colour name such as "blue"; colors() lists the built-in names)
  • pch = number (the point shape; see ?points for the available symbols)
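To see these arguments working together, here is one possible scatterplot of run time against Rotten Tomatoes score. The data frame name pixar, the title and label wording, and the particular colour and point shape are all choices made for this sketch.

```r
# Values copied from the table above; `pixar` is an assumed data frame name.
pixar <- data.frame(
  run_time = c(81, 95, 92, 92, 100, 115, 117, 111, 98, 96, 103, 106,
               93, 104, 95, 93, 97, 102, 105, 118, 100, 102, 100),
  rotten_tomatoes = c(100, 92, 100, 96, 99, 97, 74, 96, 95, 98, 98, 40,
                      78, 80, 98, 76, 94, 69, 97, 93, 97, 88, 96)
)

plot(pixar$run_time, pixar$rotten_tomatoes,
     main = "Pixar Films: Run Time and Rotten Tomatoes Score",
     xlab = "Run time (minutes)",
     ylab = "Rotten Tomatoes score (out of 100)",
     xlim = c(80, 120),   # x-axis runs from 80 to 120 minutes
     ylim = c(0, 100),    # y-axis covers the full score scale
     col = "blue",        # draw the points in blue
     pch = 19)            # 19 is a solid circle
```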

We can even add text to our plots by incorporating the text function.
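For instance, we might label the lowest-scoring film on a scatterplot. In this sketch the data frame name pixar and the helper variable low are assumptions; text is called after plot so that the label is drawn on top of the existing graph.

```r
# Values copied from the table above; `pixar` is an assumed data frame name.
pixar <- data.frame(
  film = c("Toy Story", "A Bug's Life", "Toy Story 2", "Monsters, Inc.",
           "Finding Nemo", "The Incredibles", "Cars", "Ratatouille",
           "WALL-E", "Up", "Toy Story 3", "Cars 2", "Brave",
           "Monsters University", "Inside Out", "The Good Dinosaur",
           "Finding Dory", "Cars 3", "Coco", "Incredibles 2",
           "Toy Story 4", "Onward", "Soul"),
  run_time = c(81, 95, 92, 92, 100, 115, 117, 111, 98, 96, 103, 106,
               93, 104, 95, 93, 97, 102, 105, 118, 100, 102, 100),
  rotten_tomatoes = c(100, 92, 100, 96, 99, 97, 74, 96, 95, 98, 98, 40,
                      78, 80, 98, 76, 94, 69, 97, 93, 97, 88, 96)
)

plot(pixar$run_time, pixar$rotten_tomatoes,
     xlab = "Run time (minutes)",
     ylab = "Rotten Tomatoes score (out of 100)")

# Row index of the film with the lowest Rotten Tomatoes score.
low <- which.min(pixar$rotten_tomatoes)

# Write that film's name just above its point (pos = 3 means "above").
text(pixar$run_time[low], pixar$rotten_tomatoes[low],
     labels = pixar$film[low], pos = 3)
```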

Barplots

To draw a basic bar graph, we can use the barplot function and specify which variable should determine the height of the bars. For example, passing the Rotten Tomatoes scores to barplot quickly plots the score for each of the Pixar movies.
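A minimal sketch of such a call, assuming the scores are stored in a data frame named pixar (a name chosen for this example):

```r
# Rotten Tomatoes scores copied from the table above, in release order.
# `pixar` is an assumed name for the data frame.
pixar <- data.frame(
  rotten_tomatoes = c(100, 92, 100, 96, 99, 97, 74, 96, 95, 98, 98, 40,
                      78, 80, 98, 76, 94, 69, 97, 93, 97, 88, 96)
)

# One bar per film, in the order the films appear in the data frame;
# the height of each bar is that film's score.
barplot(pixar$rotten_tomatoes)
```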

There are some arguments we can add to the barplot function to make our bar graph more informative:

  • horiz = TRUE (draw the bars horizontally)
  • names.arg = variable to label the bars
  • las = 2 (rotates the axis labels so they are perpendicular to the axis)
  • cex.names = 0.5 (scales the font size of the bar labels)
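Putting these together, one way to draw a labelled horizontal bar graph is sketched below. The data frame name pixar and the title and axis label text are assumptions for this example. The full argument name names.arg is used for the bar labels.

```r
# Values copied from the table above; `pixar` is an assumed data frame name.
pixar <- data.frame(
  film = c("Toy Story", "A Bug's Life", "Toy Story 2", "Monsters, Inc.",
           "Finding Nemo", "The Incredibles", "Cars", "Ratatouille",
           "WALL-E", "Up", "Toy Story 3", "Cars 2", "Brave",
           "Monsters University", "Inside Out", "The Good Dinosaur",
           "Finding Dory", "Cars 3", "Coco", "Incredibles 2",
           "Toy Story 4", "Onward", "Soul"),
  rotten_tomatoes = c(100, 92, 100, 96, 99, 97, 74, 96, 95, 98, 98, 40,
                      78, 80, 98, 76, 94, 69, 97, 93, 97, 88, 96)
)

barplot(pixar$rotten_tomatoes,
        names.arg = pixar$film,   # label each bar with the film name
        horiz = TRUE,             # draw the bars horizontally
        las = 2,                  # axis labels perpendicular to their axis
        cex.names = 0.5,          # shrink the bar labels to half size
        main = "Rotten Tomatoes Scores of Pixar Films",
        xlab = "Score (out of 100)")
```

Long film names may still be clipped by the default plot margins; shrinking cex.names further is one simple remedy.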

Histograms

We can create histograms of the data using the hist function. For example, passing the Rotten Tomatoes scores to hist quickly creates a histogram of the scores for the Pixar movies.
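A minimal sketch of such a call, again assuming a data frame named pixar that holds the scores from the table above:

```r
# Rotten Tomatoes scores copied from the table above, in release order.
# `pixar` is an assumed name for the data frame.
pixar <- data.frame(
  rotten_tomatoes = c(100, 92, 100, 96, 99, 97, 74, 96, 95, 98, 98, 40,
                      78, 80, 98, 76, 94, 69, 97, 93, 97, 88, 96)
)

# R chooses the bins automatically and counts the films falling in each one.
hist(pixar$rotten_tomatoes)
```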

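There are some arguments we can add to the hist function to make our graph more informative:

  • breaks = number (suggested number of bars; R treats this as a suggestion and may adjust it to get round break points)
  • labels = TRUE (adds the count above each bar)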
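A sketch using both arguments is shown below. The data frame name pixar, the value 10 for breaks, and the title and label text are choices made for this example.

```r
# Rotten Tomatoes scores copied from the table above, in release order.
# `pixar` is an assumed name for the data frame.
pixar <- data.frame(
  rotten_tomatoes = c(100, 92, 100, 96, 99, 97, 74, 96, 95, 98, 98, 40,
                      78, 80, 98, 76, 94, 69, 97, 93, 97, 88, 96)
)

hist(pixar$rotten_tomatoes,
     breaks = 10,     # ask for roughly 10 bars; R may pick a nearby number
     labels = TRUE,   # print the count above each bar
     main = "Rotten Tomatoes Scores of Pixar Films",
     xlab = "Score (out of 100)")
```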

Boxplots

The boxplot function can be used to (unsurprisingly) create boxplots. Keeping with our Rotten Tomatoes example, we can pass the scores to boxplot.
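A minimal sketch, assuming the scores are stored in a data frame named pixar (a name chosen for this example):

```r
# Rotten Tomatoes scores copied from the table above, in release order.
# `pixar` is an assumed name for the data frame.
pixar <- data.frame(
  rotten_tomatoes = c(100, 92, 100, 96, 99, 97, 74, 96, 95, 98, 98, 40,
                      78, 80, 98, 76, 94, 69, 97, 93, 97, 88, 96)
)

# The box spans the lower and upper hinges, with a line at the median.
boxplot(pixar$rotten_tomatoes)
```

With these scores, the value of 40 (Cars 2) lies beyond the lower whisker, so it is drawn as a separate point.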

There are some arguments we can add to the boxplot function to make our graph more informative:

  • horizontal = TRUE (draw the boxplot horizontally)
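As an illustration, the boxplot can be drawn horizontally with a title and axis label. The data frame name pixar and the title and label text are assumptions for this sketch.

```r
# Rotten Tomatoes scores copied from the table above, in release order.
# `pixar` is an assumed name for the data frame.
pixar <- data.frame(
  rotten_tomatoes = c(100, 92, 100, 96, 99, 97, 74, 96, 95, 98, 98, 40,
                      78, 80, 98, 76, 94, 69, 97, 93, 97, 88, 96)
)

boxplot(pixar$rotten_tomatoes,
        horizontal = TRUE,   # scores run along the x-axis
        main = "Rotten Tomatoes Scores of Pixar Films",
        xlab = "Score (out of 100)")
```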