Inferential Statistics
This script will provide a quick demonstration of conducting inferential statistics.
Basketball Data
The following dataset contains information about the 494 basketball players that played in the NBA during the 2018-2019 season.
Population Distribution
In lecture, we have defined parameters as characteristics which describe the population that we are interested in studying. If we have access to the entire population, we can simply calculate the value of these parameters using techniques we have previously seen in lecture. For example, suppose that we are interested in summarizing the heights (in inches) of the players that played in the 2018-2019 NBA season. As demonstrated previously, we can use R to visualize the population distribution and to summarize the population characteristics.
As a review of our previous coding sessions, let’s try to do the following:
- Create a visualization to summarize the heights of the NBA players.
- Calculate the mean and standard deviation of these heights.
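One way to approach the two review tasks above is sketched here. The data frame name `nba` and the column name `height` are assumptions, and the values are simulated stand-ins rather than the real 2018-2019 data, so substitute the course dataset and its actual column name before drawing conclusions.

```r
# Hypothetical stand-in for the course dataset: 494 simulated heights (inches).
# The name `nba`, the column `height`, and the simulation settings are assumptions.
set.seed(1)
nba <- data.frame(height = round(rnorm(494, mean = 79, sd = 3.5)))

# Visualize the population distribution of heights
hist(nba$height,
     main = "Heights of NBA players (simulated stand-in)",
     xlab = "Height (inches)")

# Summarize the population: mean and standard deviation
pop_mean <- mean(nba$height)
pop_sd <- sd(nba$height)  # note: sd() uses the n - 1 denominator

pop_mean
pop_sd
```

Because `sd()` divides by n - 1, it differs very slightly from a standard deviation computed with n in the denominator; with 494 observations the difference is negligible.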
Random Sampling
As we do not typically have access to the entire population when conducting a study, we rely on sampling to help us to understand the world around us. We can use the sample_n() command from the dplyr R package to randomly sample individuals from a dataset. To use this function, we simply need to input the name of the dataset we are sampling from and the sample size. For example, we can use the following code to produce a random sample of 25 NBA players:
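A minimal sketch of that call, assuming a data frame named `nba` (a hypothetical name, filled here with simulated stand-in values rather than the real dataset):

```r
library(dplyr)

# Hypothetical stand-in population: 494 simulated players.
# `player_id` and `height` are assumed column names.
set.seed(1)
nba <- data.frame(player_id = 1:494,
                  height = round(rnorm(494, mean = 79, sd = 3.5)))

# Randomly sample 25 rows; sampling is without replacement by default
nba_sample <- sample_n(nba, 25)

# The sample mean is an estimate of the population mean
mean(nba_sample$height)
```

Calling `set.seed()` before sampling makes the draw reproducible; without it, each run returns a different sample of 25 players.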
Now that we have the ability to randomly sample observations from a population, we are ready to use R for inferential statistics. The t.test function can be used to produce confidence intervals and conduct hypothesis tests for the population mean.
Arguments to supply to the t.test function, in addition to the numeric vector of sample data:
- conf.level = confidence level of the interval estimate.
- alternative = one of “two.sided” (default), “greater” or “less”.
- mu = value of the mean under the null hypothesis.
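The arguments listed above might be combined as in the following sketch. The vector `heights` stands in for the heights of a random sample of 25 players (simulated values, not real data), and the null value of 78 inches is purely hypothetical.

```r
# Simulated stand-in for the heights (inches) of 25 sampled players
set.seed(1)
heights <- round(rnorm(25, mean = 79, sd = 3.5))

# 95% confidence interval for the population mean height
ci <- t.test(heights, conf.level = 0.95)
ci$conf.int

# Two-sided hypothesis test of H0: mu = 78 (hypothetical null value)
test <- t.test(heights, mu = 78, alternative = "two.sided")
test$statistic  # t statistic
test$p.value    # p-value for the two-sided alternative
```

Setting `alternative = "greater"` or `alternative = "less"` instead runs a one-sided test, and the reported confidence interval then becomes one-sided as well.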