Simulating Probability Models

Author: K.C. Cupido

Affiliation: St. Francis Xavier University

Simulating random variables is important in applied statistics for several reasons. Because we use probability models to mimic variation in the world, simulation helps us see how averages and variability play out in the long run. We can also use simulations to approximate sampling distributions of data. Further, since most relationships we are interested in studying involve some randomness, simulations are a convenient way to represent uncertainty in forecasts. This script introduces some basic ideas and tools for performing simulations in R.

Simulating Probability Models

According to Statistics Canada, 51.4% of babies born in Canada are boys, and 48.6% are girls. Further, St. Martha’s Hospital in Antigonish, Nova Scotia reports an average of around 350 births per year. Suppose we are interested in modelling the number of girls that will be born in a year. We can model this with the binomial distribution using the rbinom function, treating each birth as an independent trial in which a "success" is a girl. To use this function, we need to supply the number of simulated observations we would like to generate, the number of trials (here, 350 births), and the probability of success on each trial (here, 0.486).

For example, the following code will show us what could happen in 350 births at St. Martha’s Hospital:
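A minimal sketch of such a call, using the figures given above. The variable names and the seed are assumptions added for illustration and reproducibility; they are not part of the original script.

```r
# Simulate the number of girls among 350 births (one simulated year).
set.seed(1)        # assumption: arbitrary seed, only so the result is repeatable

n_births <- 350    # average births per year at the hospital
p_girl   <- 0.486  # probability that a given birth is a girl

# One draw from a Binomial(350, 0.486) distribution
n_girls <- rbinom(1, size = n_births, prob = p_girl)
n_girls
```

Each run with a different seed (or no seed) gives a different count, which is exactly the year-to-year variation the model is meant to describe.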

To get a sense of the distribution of what could happen, let’s simulate this 1000 times:
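One way this might look is sketched below. The seed, the variable names, and the choice of a histogram as the summary plot are assumptions for illustration.

```r
# Simulate 1000 years of 350 births each and look at the spread of outcomes.
set.seed(1)      # assumption: arbitrary seed for reproducibility

n_sims <- 1000
n_girls_sims <- rbinom(n_sims, size = 350, prob = 0.486)

# Numerical summary of the simulated counts
summary(n_girls_sims)

# Histogram of the simulated number of girls per year
hist(n_girls_sims,
     main = "Simulated number of girls in 350 births",
     xlab = "Number of girls")
```

The simulated counts should centre near the binomial mean, 350 × 0.486 ≈ 170, with most years falling within a couple of dozen births of that value.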

Now let’s suppose that the weight of babies born at the hospital is normally distributed with a mean of 7.5 lbs and a standard deviation of 1.5 lbs. We can simulate this with the rnorm function, where we need to supply the number of simulated observations we would like to generate, along with the mean and standard deviation of the normal distribution.

Here is code to generate the weight of one randomly chosen baby:
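A minimal sketch, assuming the normal model stated above. The variable name and seed are illustrative, not from the original script.

```r
# Simulate the weight (in lbs) of one randomly chosen baby.
set.seed(1)   # assumption: arbitrary seed for reproducibility

one_weight <- rnorm(1, mean = 7.5, sd = 1.5)
one_weight
```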

Suppose we select 10 babies at random. What can we say about their weight?
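One way to explore this question is to simulate 10 weights and summarise them. This is a sketch under the same normal model; the variable names, the seed, and the choice of summaries (mean, standard deviation, range) are assumptions for illustration.

```r
# Simulate the weights (in lbs) of 10 randomly chosen babies.
set.seed(1)   # assumption: arbitrary seed for reproducibility

weights <- rnorm(10, mean = 7.5, sd = 1.5)
weights

# Simple summaries of this one sample
mean(weights)   # sample average weight
sd(weights)     # sample standard deviation
range(weights)  # lightest and heaviest baby in the sample
```

With only 10 babies, the sample mean and standard deviation will typically differ somewhat from 7.5 and 1.5, and will change from sample to sample.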

To simulate the distribution of the weight 1000 times:
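The sketch below assumes that the quantity of interest is the average weight of the 10 sampled babies, and repeats that sample 1000 times with replicate. That interpretation, along with the variable names and the seed, is an assumption rather than something the text states explicitly.

```r
# Approximate the sampling distribution of the mean weight of 10 babies.
set.seed(1)   # assumption: arbitrary seed for reproducibility

n_sims <- 1000
avg_weights <- replicate(n_sims, mean(rnorm(10, mean = 7.5, sd = 1.5)))

# Centre and spread of the simulated sample means
mean(avg_weights)
sd(avg_weights)

# Histogram of the 1000 simulated averages
hist(avg_weights,
     main = "Simulated average weight of 10 babies",
     xlab = "Average weight (lbs)")
```

Under this model the sample means should centre near 7.5 lbs with a standard deviation close to 1.5 / sqrt(10) ≈ 0.47 lbs, noticeably less variable than individual weights.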