| film | release_date | run_time | rotten_tomatoes | box_office_worldwide | budget |
|---|---|---|---|---|---|
| Toy Story | 1995-11-22 | 81 | 100 | 373,554,033 | 30,000,000 |
| A Bug's Life | 1998-11-25 | 95 | 92 | 363,258,859 | 120,000,000 |
| Toy Story 2 | 1999-11-24 | 92 | 100 | 497,374,776 | 90,000,000 |
| Monsters, Inc. | 2001-11-02 | 92 | 96 | 632,316,649 | 115,000,000 |
| Finding Nemo | 2003-05-30 | 100 | 99 | 871,014,978 | 94,000,000 |
| The Incredibles | 2004-11-05 | 115 | 97 | 631,606,713 | 92,000,000 |
| Cars | 2006-06-09 | 117 | 74 | 461,983,149 | 120,000,000 |
| Ratatouille | 2007-06-29 | 111 | 96 | 623,726,085 | 150,000,000 |
| WALL-E | 2008-06-27 | 98 | 95 | 521,311,860 | 180,000,000 |
| Up | 2009-05-29 | 96 | 98 | 735,099,082 | 175,000,000 |
| Toy Story 3 | 2010-06-18 | 103 | 98 | 1,066,969,703 | 200,000,000 |
| Cars 2 | 2011-06-24 | 106 | 40 | 559,852,396 | 200,000,000 |
| Brave | 2012-06-22 | 93 | 78 | 538,983,207 | 185,000,000 |
| Monsters University | 2013-06-21 | 104 | 80 | 743,559,607 | 200,000,000 |
| Inside Out | 2015-06-19 | 95 | 98 | 857,611,174 | 175,000,000 |
| The Good Dinosaur | 2015-11-25 | 93 | 76 | 332,207,671 | 175,000,000 |
| Finding Dory | 2016-06-17 | 97 | 94 | 1,028,570,889 | 200,000,000 |
| Cars 3 | 2017-06-16 | 102 | 69 | 383,930,656 | 175,000,000 |
| Coco | 2017-11-22 | 105 | 97 | 807,082,196 | 175,000,000 |
| Incredibles 2 | 2018-06-15 | 118 | 93 | 1,242,805,359 | 200,000,000 |
| Toy Story 4 | 2019-06-21 | 100 | 97 | 1,073,394,593 | 200,000,000 |
| Onward | 2020-03-06 | 102 | 88 | 141,950,121 | 175,000,000 |
| Soul | 2020-12-25 | 100 | 96 | 135,435,315 | 175,000,000 |
Summarizing Numerical Data
This script provides a quick demonstration of summarizing numerical data, both with descriptive statistics and visually.
Pixar Data
The data we will be looking at contains information about Pixar films released prior to 2021, as provided by Wikipedia. The following variables were collected for each film:
- film: The name of the film.
- release_date: The date the film premiered.
- run_time: The length of the film in minutes.
- rotten_tomatoes: Score from the review-aggregation website Rotten Tomatoes; scored out of 100.
- box_office_worldwide: Box office gross amount in U.S. dollars worldwide.
- budget: Movie budget in U.S. dollars.
The data can be found in the table at the top of this document.
Descriptive Statistics
The following functions are useful for summarizing numerical data: mean, median, min, max, range, sd, var, fivenum
Try them here:
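As a minimal sketch, the functions listed above can be applied to the Rotten Tomatoes scores. The data frame name `pixar` is an assumption made for this example; the values are copied from the rotten_tomatoes column of the table.

```r
# Rotten Tomatoes scores copied from the table, in release order.
# `pixar` is a hypothetical name chosen for this sketch.
pixar <- data.frame(
  rotten_tomatoes = c(100, 92, 100, 96, 99, 97, 74, 96, 95, 98, 98, 40,
                      78, 80, 98, 76, 94, 69, 97, 93, 97, 88, 96)
)

mean(pixar$rotten_tomatoes)     # arithmetic mean
median(pixar$rotten_tomatoes)   # middle value of the sorted scores
min(pixar$rotten_tomatoes)      # smallest score
max(pixar$rotten_tomatoes)      # largest score
range(pixar$rotten_tomatoes)    # smallest and largest score together
sd(pixar$rotten_tomatoes)       # sample standard deviation
var(pixar$rotten_tomatoes)      # sample variance (the square of sd)
fivenum(pixar$rotten_tomatoes)  # min, lower hinge, median, upper hinge, max
```

Note that range returns two values (the minimum and the maximum) rather than their difference; `diff(range(pixar$rotten_tomatoes))` gives the spread as a single number.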
Basic Plots
There is a generic plot function in R which is useful for creating quick visualizations of a data set. If you pass one numeric variable into this function, R plots each observation's value against its position (index) in the data frame. Passing two numeric variables into this function will create a scatterplot, with the first variable corresponding to the x-axis and the second corresponding to the y-axis.
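The two cases described above might look as follows. This is a sketch that assumes the table has been stored in a data frame called `pixar` (a hypothetical name); only the two columns used here are recreated so that the example runs on its own.

```r
# Two columns of the table, recreated in release order.
# The data frame name `pixar` is an assumption.
pixar <- data.frame(
  run_time = c(81, 95, 92, 92, 100, 115, 117, 111, 98, 96, 103, 106,
               93, 104, 95, 93, 97, 102, 105, 118, 100, 102, 100),
  rotten_tomatoes = c(100, 92, 100, 96, 99, 97, 74, 96, 95, 98, 98, 40,
                      78, 80, 98, 76, 94, 69, 97, 93, 97, 88, 96)
)

# One numeric variable: each run time is plotted against its row index.
plot(pixar$run_time)

# Two numeric variables: run time on the x-axis, score on the y-axis.
plot(pixar$run_time, pixar$rotten_tomatoes)
```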
There are some arguments we can add to the plot function to make our visualizations more informative:
- main = "Title of the Graph"
- xlab = "x-axis label"
- ylab = "y-axis label"
- xlim = c(0, 100) (sets the range of values shown on the x-axis)
- ylim = c(0, 100) (sets the range of values shown on the y-axis)
- col = "colour" (run colors() to list the available colour names)
- pch = number (see ?points for the available point shapes)
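Putting these arguments together, a customized scatterplot could be sketched as below. The data frame name `pixar`, the title, the axis labels, the limits, the colour, and the point shape are all illustrative choices, not values taken from the original script.

```r
# Two columns of the table, recreated in release order.
# The data frame name `pixar` is an assumption.
pixar <- data.frame(
  run_time = c(81, 95, 92, 92, 100, 115, 117, 111, 98, 96, 103, 106,
               93, 104, 95, 93, 97, 102, 105, 118, 100, 102, 100),
  rotten_tomatoes = c(100, 92, 100, 96, 99, 97, 74, 96, 95, 98, 98, 40,
                      78, 80, 98, 76, 94, 69, 97, 93, 97, 88, 96)
)

plot(pixar$run_time, pixar$rotten_tomatoes,
     main = "Pixar Films: Run Time vs. Rotten Tomatoes Score",
     xlab = "Run time (minutes)",
     ylab = "Rotten Tomatoes score (out of 100)",
     xlim = c(80, 120),   # show run times from 80 to 120 minutes
     ylim = c(0, 100),    # show the full 0 to 100 score scale
     col = "blue",        # draw the points in blue
     pch = 16)            # 16 is a filled circle
```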
We can even add text to an existing plot by calling the text function after plot.
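For instance, the points of a scatterplot can be labelled with film names. The sketch below uses only the first six films from the table to keep the labels readable; the object name `pixar_first6` and the axis limits are assumptions made for illustration.

```r
# First six rows of the table; `pixar_first6` is a hypothetical name.
pixar_first6 <- data.frame(
  film = c("Toy Story", "A Bug's Life", "Toy Story 2",
           "Monsters, Inc.", "Finding Nemo", "The Incredibles"),
  run_time = c(81, 95, 92, 92, 100, 115),
  rotten_tomatoes = c(100, 92, 100, 96, 99, 97)
)

# Draw the scatterplot first, leaving some room around the points.
plot(pixar_first6$run_time, pixar_first6$rotten_tomatoes,
     xlim = c(75, 125), ylim = c(90, 102), pch = 16)

# text(x, y, labels) writes each label at the given coordinates.
# pos = 3 places the label just above its point; cex = 0.7 shrinks the font.
text(pixar_first6$run_time, pixar_first6$rotten_tomatoes,
     labels = pixar_first6$film, pos = 3, cex = 0.7)
```

Because text draws on the plot that is currently open, it has to be called after plot, not before.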
Barplots
To draw a basic bar graph, we can use the barplot function and specify the variable that gives the heights of the bars. For example, we can use the following code to quickly plot the Rotten Tomatoes score for each of the Pixar movies:
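A minimal sketch of such a call, assuming the table is stored in a data frame named `pixar` (a hypothetical name; only the score column is recreated here):

```r
# Rotten Tomatoes scores copied from the table, in release order.
# The data frame name `pixar` is an assumption.
pixar <- data.frame(
  rotten_tomatoes = c(100, 92, 100, 96, 99, 97, 74, 96, 95, 98, 98, 40,
                      78, 80, 98, 76, 94, 69, 97, 93, 97, 88, 96)
)

# One bar per film, in row order; the bar height is the score.
barplot(pixar$rotten_tomatoes)
```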
There are some arguments we can add to the barplot function to make our bar graph more informative:
- horiz = TRUE (draw the bars horizontally)
- names.arg = variable to label the bars
- las = 2 (rotates the labels so they are perpendicular to the axis)
- cex.names = 0.5 (change the label font size)
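Combined, these arguments might be used as in the sketch below. It passes the film names through names.arg, which is the full name of the labelling argument in barplot. The data frame name `pixar` is an assumption, and the par(mar = ...) call is an extra step not covered in the list above, added only so that the film names have room in the left margin.

```r
# Film names and scores copied from the table, in release order.
# The data frame name `pixar` is an assumption.
pixar <- data.frame(
  film = c("Toy Story", "A Bug's Life", "Toy Story 2", "Monsters, Inc.",
           "Finding Nemo", "The Incredibles", "Cars", "Ratatouille",
           "WALL-E", "Up", "Toy Story 3", "Cars 2", "Brave",
           "Monsters University", "Inside Out", "The Good Dinosaur",
           "Finding Dory", "Cars 3", "Coco", "Incredibles 2",
           "Toy Story 4", "Onward", "Soul"),
  rotten_tomatoes = c(100, 92, 100, 96, 99, 97, 74, 96, 95, 98, 98, 40,
                      78, 80, 98, 76, 94, 69, 97, 93, 97, 88, 96)
)

# Margins are given as bottom, left, top, right (in lines of text).
# The wider left margin is an addition to make room for the film names.
par(mar = c(5, 8, 4, 2))

barplot(pixar$rotten_tomatoes,
        names.arg = pixar$film,  # label each bar with its film name
        horiz = TRUE,            # draw the bars horizontally
        las = 2,                 # labels perpendicular to the axis
        cex.names = 0.5)         # halve the label font size
```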
Histograms
We can create histograms of the data using the hist function. For example, we can use the following code to quickly create a histogram of the Rotten Tomatoes scores for the Pixar movies:
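A minimal sketch of such a call, assuming the table is stored in a data frame named `pixar` (a hypothetical name; only the score column is recreated here):

```r
# Rotten Tomatoes scores copied from the table, in release order.
# The data frame name `pixar` is an assumption.
pixar <- data.frame(
  rotten_tomatoes = c(100, 92, 100, 96, 99, 97, 74, 96, 95, 98, 98, 40,
                      78, 80, 98, 76, 94, 69, 97, 93, 97, 88, 96)
)

# R chooses the bins automatically and counts the scores in each one.
hist(pixar$rotten_tomatoes)
```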
There are some arguments we can add to the hist function to make our graph more informative:
- breaks = number (suggested number of bars; R may adjust it so the break points fall on round values)
- labels = TRUE (add the counts above the bars)
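These two arguments might be used as in the sketch below. The data frame name `pixar`, the value 10 for breaks, and the title and axis label are illustrative assumptions. When breaks is a single number, R treats it as a suggestion, so the histogram may not contain exactly that many bars.

```r
# Rotten Tomatoes scores copied from the table, in release order.
# The data frame name `pixar` is an assumption.
pixar <- data.frame(
  rotten_tomatoes = c(100, 92, 100, 96, 99, 97, 74, 96, 95, 98, 98, 40,
                      78, 80, 98, 76, 94, 69, 97, 93, 97, 88, 96)
)

hist(pixar$rotten_tomatoes,
     breaks = 10,     # ask for roughly ten bins
     labels = TRUE,   # print each bin's count above its bar
     main = "Rotten Tomatoes Scores of Pixar Films",
     xlab = "Rotten Tomatoes score")
```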
Boxplots
The boxplot function can be used to (unsurprisingly) create boxplots. Keeping with our Rotten Tomatoes examples:
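A minimal sketch of such a call, assuming the table is stored in a data frame named `pixar` (a hypothetical name; only the score column is recreated here):

```r
# Rotten Tomatoes scores copied from the table, in release order.
# The data frame name `pixar` is an assumption.
pixar <- data.frame(
  rotten_tomatoes = c(100, 92, 100, 96, 99, 97, 74, 96, 95, 98, 98, 40,
                      78, 80, 98, 76, 94, 69, 97, 93, 97, 88, 96)
)

# The box spans the lower and upper hinges, with a line at the median.
# Scores far from the box are drawn as separate points.
boxplot(pixar$rotten_tomatoes)
```

With these data, the score of 40 (Cars 2) lies more than 1.5 box lengths below the lower hinge, so it is drawn as an individual point rather than being included in the whisker.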
There are some arguments we can add to the boxplot function to make our graph more informative:
- horizontal = TRUE (draw the boxplot horizontally)
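A horizontal boxplot could be sketched as follows. The data frame name `pixar`, the title, and the axis label are assumptions made for illustration; main and xlab are the same labelling arguments introduced for plot.

```r
# Rotten Tomatoes scores copied from the table, in release order.
# The data frame name `pixar` is an assumption.
pixar <- data.frame(
  rotten_tomatoes = c(100, 92, 100, 96, 99, 97, 74, 96, 95, 98, 98, 40,
                      78, 80, 98, 76, 94, 69, 97, 93, 97, 88, 96)
)

boxplot(pixar$rotten_tomatoes,
        horizontal = TRUE,   # scores run along the x-axis
        main = "Rotten Tomatoes Scores of Pixar Films",
        xlab = "Rotten Tomatoes score")
```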