| speed | dist |
|---|---|
| 4 | 2 |
| 4 | 10 |
| 7 | 4 |
| 7 | 22 |
| 8 | 16 |
| 9 | 10 |
| 10 | 18 |
| 10 | 26 |
| 10 | 34 |
| 11 | 17 |
| 11 | 28 |
| 12 | 14 |
| 12 | 20 |
| 12 | 24 |
| 12 | 28 |
| 13 | 26 |
| 13 | 34 |
| 13 | 34 |
| 13 | 46 |
| 14 | 26 |
| 14 | 36 |
| 14 | 60 |
| 14 | 80 |
| 15 | 20 |
| 15 | 26 |
| 15 | 54 |
| 16 | 32 |
| 16 | 40 |
| 17 | 32 |
| 17 | 40 |
| 17 | 50 |
| 18 | 42 |
| 18 | 56 |
| 18 | 76 |
| 18 | 84 |
| 19 | 36 |
| 19 | 46 |
| 19 | 68 |
| 20 | 32 |
| 20 | 48 |
| 20 | 52 |
| 20 | 56 |
| 20 | 64 |
| 22 | 66 |
| 23 | 54 |
| 24 | 70 |
| 24 | 92 |
| 24 | 93 |
| 24 | 120 |
| 25 | 85 |
Simple Linear Regression
Cars Data
Consider a simple example of how the speed of a car affects its stopping distance. To examine this relationship, we will use the cars dataset, which ships with R as a built-in dataset.
These data were recorded in the 1920s and contain two variables:
- speed: the speed of cars (in mph)
- dist: the distance that it takes the car to stop (in ft)
Because we want to know how far a car travels before coming to a complete stop at a given speed, a natural starting point is to plot the data and look for patterns.
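A scatterplot of this kind could be produced with something like the following sketch. It assumes only base R graphics; the axis labels, title, and plotting symbol are illustrative choices, not taken from the original page.

```r
# `cars` ships with base R (datasets package), so nothing needs to be loaded.
str(cars)  # 50 observations of two numeric variables: speed and dist

# Scatterplot of stopping distance against speed.
# Labels, title, and point style are illustrative assumptions.
plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)",
     ylab = "Stopping distance (ft)",
     main = "Stopping distance vs. speed",
     pch  = 19)
```

The formula interface `dist ~ speed` puts the response on the vertical axis and the predictor on the horizontal axis, which matches the way the model is written later.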
Perhaps intuitively, from this plot we observe a positive relationship between the two variables; as the speed of the cars increases, the cars travel further before coming to a complete stop. Now let's see if we can develop a regression model to explain the relationship.
To put this into a regression framework, we will denote the predictor variable speed as \(X\), which will be used to explain the response variable dist, which will be denoted \(Y\).
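With this notation, the simple linear regression model is usually written as below. The symbols \(\beta_0\) (intercept), \(\beta_1\) (slope), and \(\varepsilon\) (random error) are the conventional ones and are an assumption here, since the original text does not define them at this point.

\[
Y = \beta_0 + \beta_1 X + \varepsilon
\]

For the cars data, \(\beta_1\) would describe the average change in stopping distance (in ft) associated with a 1 mph increase in speed.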
Linear regression models are straightforward to fit in R with the lm function. In general, we can think of the syntax of the lm function as response ~ predictor. For the cars data, a single call to lm with the formula dist ~ speed fits a simple linear regression model using the method of least squares.
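A minimal sketch of that call is shown here. The object name `cars_lm` is a hypothetical choice for illustration; any valid name would work.

```r
# Fit dist (response) on speed (predictor) by ordinary least squares.
# `cars_lm` is an illustrative name, not one taken from the original text.
cars_lm <- lm(dist ~ speed, data = cars)

# Estimated intercept and slope.
coef(cars_lm)

# Fuller report: standard errors, t-tests, R-squared, and so on.
summary(cars_lm)
```

Passing `data = cars` lets the formula refer to the column names directly, rather than writing `cars$dist ~ cars$speed`.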
Once we have fit the model to the data, we can use the abline function to add a line of best fit to our plot.
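One way this might look is sketched below. The model is refit so the block runs on its own; the object name `cars_lm`, the axis labels, and the line colour and width are illustrative assumptions.

```r
# Refit the model so this block is self-contained.
cars_lm <- lm(dist ~ speed, data = cars)

# Redraw the scatterplot; abline() adds to an existing plot.
plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)",
     ylab = "Stopping distance (ft)",
     pch  = 19)

# Given a fitted lm object, abline() draws the line defined by its
# intercept and slope. Colour and width are illustrative choices.
abline(cars_lm, col = "red", lwd = 2)
```

Note that abline only works this way for a model with a single predictor, where the fit is fully described by one intercept and one slope.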