Computing distributional quantities using R

Accessing the Program

All of the quantities associated with the distributions that we have looked at in class so far can be calculated using a hand calculator. In practice, however, these quantities are more often calculated by computer. Almost any statistical computing package will calculate these probabilities. If you want to try these calculations by computer, I suggest a package called R. It is free, and it makes calculating what we need very convenient. A Windows version is available. This program will open a variety of windows. One of these will be labeled "console"; you type what I list below in this window. A Linux version is also available for free, and a version is available on eden. These versions are command-line; typing R and hitting carriage return will get you a > as a prompt. You type the commands below after this prompt, and hit carriage return. Source code is also available.

Particular distributions

Discrete Distributions

To get the cdf for the binomial, type pbinom(, followed by the number of successes at or below which the cumulative probability is desired, the number of trials, and the success probability for each trial, separated by commas, and then ). For example, to calculate the probability of getting 3 or fewer heads from a set of 5 coin tosses, do pbinom(3,5,.5), and to calculate the probability of getting 5 or fewer 1s on 10 rolls of a fair die, do pbinom(5,10,1/6).
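As a quick check on the two calls above, here is a small sketch you can paste at the prompt. The variable names are my own, chosen only for illustration. The sum over dbinom is there to show that the cdf is just the PMF added up, and lower.tail = FALSE is the standard argument for getting the upper tail instead.

```r
# P(3 or fewer heads in 5 tosses of a fair coin)
p_heads <- pbinom(3, 5, 0.5)
print(p_heads)                       # 0.8125, that is, 26/32

# The same probability, obtained by adding up the PMF over 0, 1, 2, 3
p_heads_sum <- sum(dbinom(0:3, 5, 0.5))

# P(5 or fewer 1s on 10 rolls of a fair die)
p_ones <- pbinom(5, 10, 1/6)
print(p_ones)

# Upper tail: P(more than 3 heads in 5 tosses) = 6/32
p_upper <- pbinom(3, 5, 0.5, lower.tail = FALSE)
print(p_upper)                       # 0.1875
```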

We can also do calculations for the negative binomial. This job is complicated because R defines the negative binomial variable differently than does our textbook. According to R, a negative binomial variable is the number of failures that occur before the r-th success. The text defined the variable as the number of trials, both successes and failures, that occur up to and including the r-th success. The two differ by r, so subtract r from the number of trials before handing it to R. For example, if each trial has success probability .6, then to calculate the probability that it takes 8 or fewer trials to get three successes, use pnbinom(8-3,3,.6).
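Because this conversion is easy to get wrong, it may help to check it against the binomial. The sketch below uses variable names of my own choosing. It relies on the fact that needing 8 or fewer trials to reach 3 successes is the same event as getting at least 3 successes in the first 8 trials.

```r
r        <- 3     # number of successes required
p        <- 0.6   # success probability on each trial
n_trials <- 8     # textbook-style argument: total number of trials

# R counts failures only, so pass (trials - successes) as the first argument
p_nb <- pnbinom(n_trials - r, r, p)

# Cross-check: P(at least r successes in n_trials trials) via the binomial
p_check <- pbinom(r - 1, n_trials, p, lower.tail = FALSE)

print(p_nb)
print(all.equal(p_nb, p_check))   # TRUE
```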

You can also do calculations for the hypergeometric distribution; in this case, use phyper, with four arguments: the number of items of the desired kind occurring in a certain number of draws, the number of that kind of item in the urn, the number of items of the other kind present in the urn, and the number of items drawn. So, for example, if a box has 20 red and 30 blue tickets, and you draw 11, and you want the probability that 5 or fewer of these drawn tickets are red, you can do phyper(5,20,30,11).
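The argument order for phyper is easy to mix up, so here is a sketch that checks the ticket example against the counting formula, using choose() for the binomial coefficients. The variable names are mine, for illustration only.

```r
n_red   <- 20   # tickets of the desired kind in the box
n_blue  <- 30   # tickets of the other kind in the box
n_drawn <- 11   # number of tickets drawn

# P(5 or fewer of the drawn tickets are red)
p_hyper <- phyper(5, n_red, n_blue, n_drawn)

# The same probability from the counting formula, summed over k = 0, ..., 5
k <- 0:5
p_count <- sum(choose(n_red, k) * choose(n_blue, n_drawn - k)) /
  choose(n_red + n_blue, n_drawn)

print(p_hyper)
print(all.equal(p_hyper, p_count))   # TRUE
```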

Continuous Distributions

We can obtain the CDF for the exponential, gamma, Weibull, and beta distributions using pexp(x,rate), pgamma(x,shape,rate), pweibull(x,shape,scale), and pbeta(x,a,b). Note that the Weibull scale parameter is the inverse of the second parameter for the Weibull that we introduced in class. Having to pay close attention to the parameterization is a common problem when working with the distributions related to the exponential.
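One way to make sure you have the parameterization right is to compare R's answer with a closed form you already know. The sketch below does this for a few cases. The particular numbers, and the name lambda for the second Weibull parameter from class, are assumptions of mine for illustration; the only thing taken from the note above is that R's scale is one over that parameter.

```r
x    <- 1.5
rate <- 2

# Exponential: the cdf is 1 - exp(-rate * x)
p_exp <- pexp(x, rate)
p_exp_closed <- 1 - exp(-rate * x)

# Gamma with shape 1 is the exponential with the same rate
p_gam <- pgamma(x, 1, rate)

# Weibull: R's cdf is 1 - exp(-(x / scale)^shape).
# lambda is a hypothetical class-style parameter; R wants scale = 1 / lambda.
shape  <- 2
lambda <- 0.5
p_wei <- pweibull(x, shape, 1 / lambda)
p_wei_closed <- 1 - exp(-(x * lambda)^shape)

# Beta(1, 1) is the uniform on (0, 1), so its cdf at 0.3 is 0.3
p_bet <- pbeta(0.3, 1, 1)

print(c(p_exp, p_gam, p_wei, p_bet))
```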

You might also want to get access to the gamma, and maybe beta, functions directly. You can do this with gamma(k) and beta(a,b).
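A short sketch of these two functions, using values where the answer is known exactly: the gamma function at a positive integer k is (k-1)!, gamma(1/2) is the square root of pi, and the beta function can be written in terms of the gamma function. The variable names are mine.

```r
g5 <- gamma(5)        # (5 - 1)! = 24
print(g5)             # 24

g_half <- gamma(0.5)  # equals sqrt(pi)

# beta(a, b) = gamma(a) * gamma(b) / gamma(a + b)
b23 <- beta(2, 3)     # 1! * 2! / 4! = 1/12
b23_from_gamma <- gamma(2) * gamma(3) / gamma(5)
print(b23)
```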

Getting help

You can get documentation (which, given the parameterization issues noted above, you really should read if you'll be using the functions) by typing ? followed by the function name; for example, for the cdf of the gamma distribution, type ?pgamma.

Getting the density or PMF and quantiles

You can get the density or PMF, as appropriate, for each of these distributions by substituting d for p at the beginning of each of these calls; keep the root part the same.
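For instance, here is a sketch of the d versions of two of the calls used earlier, with variable names of my own. For a discrete distribution the d function gives the probability of exactly that value; for a continuous one it gives the height of the density.

```r
# PMF: P(exactly 3 heads in 5 tosses) = choose(5, 3) / 2^5
pmf_3 <- dbinom(3, 5, 0.5)
print(pmf_3)                      # 0.3125

# The PMF is also the jump in the cdf at that value
pmf_3_diff <- pbinom(3, 5, 0.5) - pbinom(2, 5, 0.5)

# Density of the exponential with rate 2 at x = 1: rate * exp(-rate * x)
dens_exp <- dexp(1, 2)
dens_exp_closed <- 2 * exp(-2)
print(dens_exp)
```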

You can get the quantiles for each of these distributions by substituting q for p at the beginning of each of these calls; keep the root part the same. The first argument, which was the potential data value at which the CDF was evaluated, should then be replaced by a number strictly between 0 and 1, determining the quantile.
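As a sketch of how the q functions behave (the variable names are mine): for a continuous distribution the quantile function undoes the cdf, and for a discrete one it returns the smallest value whose cdf is at least the requested probability.

```r
# Median of the exponential with rate 2: log(2) / rate
med_exp <- qexp(0.5, 2)
med_exp_closed <- log(2) / 2

# Round trip: the cdf evaluated at the 0.9 quantile gives back 0.9
round_trip <- pexp(qexp(0.9, 2), 2)

# Discrete case: for 5 fair coin tosses, P(X <= 3) = 0.8125 < 0.9
# and P(X <= 4) = 0.96875 >= 0.9, so the 0.9 quantile is 4
q_binom <- qbinom(0.9, 5, 0.5)
print(q_binom)                    # 4
```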
