WSU STAT 360
Class Session 4 Summary and Notes Autumn 2000
Today we practiced using Excel to work problems in statistics and probability. The main points covered were:
Some of these activities were complex and perhaps incomprehensible today. Yet, we will cover all of these subjects again as we perform example calculations in class over the next few weeks.
The notes from last year, which you will find below, have some interesting material regarding how new probability distributions are developed from ones that we already know. This was a subject that I touched on only briefly in last week's notes. There is also an extended discussion of Stirling's approximation and of using it to calculate the probability of polio cases in a city.
Main points regarding joint probability density
I managed to describe what a joint probability density is, but I didn't manage to explain why it is important.
Suppose we toss a fair coin twice, and let X1 and X2 denote the outcomes of the first and second tosses. Then

P(X1=H, X2=T) = probability of a tail following a head.

This two-dimensional distribution is trivial to calculate. However, note that the joint probability is the product of the probabilities of the separate tosses. That is...

P(X1=H, X2=T) = P(H)*P(T) = (1/2)*(1/2) = 1/4

This is a characteristic of events that are independent of one another. If the two events were not independent of one another, then the joint probability would have to account for the degree to which the events could occur together.
Similarly, for two rolls of a fair die,

P(X1=j, X2=k) = probability of rolling "j" and then "k". For example, P(X1=1, X2=6) = P(1)*P(6) = (1/6)*(1/6) = 1/36.
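If you want to check this arithmetic by brute force, here is a small Python sketch (my own illustration; our class work was in Excel) that enumerates all 36 equally likely outcomes of the two rolls:

    from fractions import Fraction

    # Two independent rolls of a fair die: the joint probability of any
    # particular pair (j, k) is the product of the marginals, (1/6)*(1/6).
    p = Fraction(1, 6)
    joint = {(j, k): p * p for j in range(1, 7) for k in range(1, 7)}

    print(joint[(1, 6)])        # 1/36
    print(sum(joint.values()))  # 1, as any probability distribution must total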
Here is an interesting application of a joint distribution. Suppose we sample k individuals at a time from a population. The population itself has a Cdf that is known to us. In each such sample there is a largest element that we refer to as the maximum; call it m. Is there a way to calculate the probability density function of this maximum value?
This seems like a difficult question at first, but it is not difficult at all. The key observation is that the maximum is at most m exactly when every item in the sample is at most m. We can write down that probability as follows.

P(X1<=m, X2<=m, ..., Xk<=m) = probability that all k items are <= m.

Because the k draws are independent,

P(X1<=m, X2<=m, ..., Xk<=m) = Cdf(m)*Cdf(m)*...*Cdf(m)

or, equivalently,

P(X1<=m, X2<=m, ..., Xk<=m) = Cdf(m)^k

Is this cool, or what? This is the Cdf of the maximum; differentiate it and you get the density, k*Cdf(m)^(k-1)*pdf(m). The distribution obtained here is the distribution of the extreme value, and it is of great importance in engineering work. Often we design things for the maximum credible badness that can happen, and the distribution of the extreme value is useful for calculating how big the badness could be.
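Here is a quick simulation (a Python sketch of my own, assuming a Uniform(0,1) population so that Cdf(m) = m) that you can use to convince yourself the formula is right:

    import random

    # P(max of k draws <= m) should equal Cdf(m)^k.  For a Uniform(0,1)
    # population, Cdf(m) = m, so the theoretical answer is m**k.
    k, trials, m = 5, 100_000, 0.9
    hits = sum(max(random.random() for _ in range(k)) <= m
               for _ in range(trials))
    print(hits / trials)  # empirical estimate, about 0.59
    print(m ** k)         # theory: 0.9**5 = 0.59049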
Notes: The following points are minor, but I needed to fill space somehow.
Answers to selected homework problems only
Hey, if I put all the problems on this page, then where would you find any challenge?
Problem 3.6
=====================================
  y     p(y)     y*p(y)    y^2*p(y)
=====================================
  0     0.1022   0         0
  1     0.3633   0.3633    0.3633
  2     0.3814   0.7628    1.5256
  3     0.1387   0.4161    1.2483
  4     0.0144   0.0576    0.2304
-------------------------------------
 Sums   1.0000   1.5998    3.3676
=====================================

3.6.a  p(3)+p(4) = 0.1531
3.6.b  p(0)+p(1) = 0.4655
3.6.c  E(y) = 1.5998
       Var(y) = E(y^2) - E(y)^2 = 3.3676 - 1.5998^2 = 0.8082
       StdDev = 0.899
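If you would rather let the machine do the tallying, the following Python fragment (my own check; the class work was in Excel) reproduces the summary numbers from the tabulated p(y):

    # Recompute the Problem 3.6 summary quantities from p(y).
    p = {0: 0.1022, 1: 0.3633, 2: 0.3814, 3: 0.1387, 4: 0.0144}
    mean = sum(y * py for y, py in p.items())        # E(y)   = 1.5998
    e_y2 = sum(y * y * py for y, py in p.items())    # E(y^2) = 3.3676
    var = e_y2 - mean ** 2                           # Var(y) ~ 0.8082
    print(mean, var, var ** 0.5)                     # StdDev ~ 0.899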
Problem: How to calculate binomial coefficients when they involve truly enormous numbers.
Tim's calculator has demonstrated that it can calculate just about anything up to 10^1000, which is something like 10^900 times the number of protons in the universe. By the way, 10^100 is a number called a googol, and the googolplex, 10 raised to a googol, is larger still; those are about the largest numbers I know of with names of their own. Whoever named them hadn't reckoned with Tim's (and Sherrie's, as it turns out) calculator(s). Not all calculating machines are capable of handling huge numbers, so I thought I'd show you how to use Stirling's approximation to calculate truly awful binomial factors. The essence of Stirling's approximation is that the ratio...
2.50662827 * N^(N+1/2) * e^(-N) / N! -> 1 as N -> infinity

(the constant 2.50662827 is sqrt(2*pi)). Thus the numerator in the above limit makes a good approximation for the factorial in the denominator. The table below shows how this approximation fares for the numbers from 1! to 15!. The agreement is quite good, and the approximation gets even better, in terms of relative error, at larger values. I suspect Tim's calculator uses this approximation for large factorials.
Stirling's Approximation and Factorial Comparison
==========================
 N    Factorial    Stirling
==========================
 1    1            0.922137
 2    2            1.919004
 3    6            5.83621
 4    24           23.50618
 5    120          118.0192
 6    720          710.0782
 7    5040         4980.396
 8    40320        39902.4
 9    362880       359536.9
10    3628800      3598696
11    39916800     39615625
12    4.79E+08     4.76E+08
13    6.23E+09     6.19E+09
14    8.72E+10     8.67E+10
15    1.31E+12     1.3E+12
==========================
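Anyone with a computer handy can extend the table; here is a Python sketch (my own, for those without Tim's calculator) that generates it:

    import math

    # Compare N! with Stirling's approximation sqrt(2*pi)*N^(N+1/2)*e^(-N).
    for N in range(1, 16):
        stirling = math.sqrt(2 * math.pi) * N ** (N + 0.5) * math.exp(-N)
        print(N, math.factorial(N), stirling)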
An impressive example: calculating the odds on the number of polio cases in a medium-sized town.
Suppose the town has 40,000 people and that the probability of contracting the dread polio in a season was normally 0.0003 before 1954 (the actual rate was 0.000023 in 1998). The probability that, say, 10 people contract polio in the season thus involves calculating C(40000, 10) and 0.9997^39990. These are pretty awful to contemplate. However, using Stirling's approximation, the binomial coefficient C(N, N-n) becomes...

0.39894228 * [N/((N-n)*n)]^(1/2) * N^N / [(N-n)^(N-n) * n^n]

(the constant 0.39894228 is 1/sqrt(2*pi)), and the binomial factor...

C(N, N-n) * p^n * q^(N-n)

becomes...

0.39894228 * [N/((N-n)*n)]^(1/2) * N^N * [q/(N-n)]^(N-n) * [p/n]^n
The binomial factor is still beyond normal calculation, but because we have forged it into a product of powers, we are now in a position to take the logarithm, which reduces the product to a sum of terms. We add the terms together and then take the antilogarithm. Does everyone remember how to do such a calculation using logarithms and antilogarithms? This is a potentially interesting project: to wit, show the derivation of these results in detail.
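For those who would rather let a program keep track of the logarithms, here is a Python sketch of the calculation (my own translation of the formula above into log space, with a function name of my own invention; in class we would do this in Excel):

    import math

    # P(x = n) for a binomial(N, p), computed from the Stirling form above:
    # 0.39894228 * sqrt(N/((N-n)*n)) * N^N * (q/(N-n))^(N-n) * (p/n)^n.
    # Taking logs turns the product of huge and tiny powers into a modest sum.
    def binom_prob_stirling(N, n, p):
        q = 1.0 - p
        m = N - n
        log_prob = (-0.5 * math.log(2 * math.pi)
                    + 0.5 * (math.log(N) - math.log(m) - math.log(n))
                    + N * math.log(N)
                    + m * (math.log(q) - math.log(m))
                    + n * (math.log(p) - math.log(n)))
        return math.exp(log_prob)   # the antilogarithm

    print(binom_prob_stirling(40000, 10, 0.0003))  # about 0.1057, as in the table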
The table that follows shows the binomial factors for this exact problem. I have also calculated the cumulative probability. Note that we have reached a cumulative probability of 1.0 by about the 21st success, whereas by definition we reach 1.0 exactly at the 40,000th success. There is a little round-off error in working with such enormous factors, but this is not a serious problem because the probability of additional successes beyond 20 or so is minuscule. By the way, rather than calculate numbers like these, people would instead use an approximation to the binomial distribution that makes use of the standard normal distribution (see the paragraph below). This calculation was meant only to show off.
================================================================
     p        q       N      n    (N-n)     P(x=n)   Cumulative
================================================================
 0.0003   0.9997   40000     1   39999    7.98E-05    7.98E-05
 0.0003   0.9997   40000     2   39998    0.00046     0.00054
 0.0003   0.9997   40000     3   39997    0.001817    0.002357
 0.0003   0.9997   40000     4   39996    0.005415    0.007773
 0.0003   0.9997   40000     5   39995    0.012946    0.020719
 0.0003   0.9997   40000     6   39994    0.025825    0.046544
 0.0003   0.9997   40000     7   39993    0.04419     0.090734
 0.0003   0.9997   40000     8   39992    0.066195    0.15693
 0.0003   0.9997   40000     9   39991    0.088167    0.245097
 0.0003   0.9997   40000    10   39990    0.105711    0.350808
 0.0003   0.9997   40000    11   39989    0.11524     0.466048
 0.0003   0.9997   40000    12   39988    0.11517     0.581217
 0.0003   0.9997   40000    13   39987    0.106254    0.687471
 0.0003   0.9997   40000    14   39986    0.091031    0.778502
 0.0003   0.9997   40000    15   39985    0.072792    0.851294
 0.0003   0.9997   40000    16   39984    0.054571    0.905865
 0.0003   0.9997   40000    17   39983    0.038505    0.94437
 0.0003   0.9997   40000    18   39982    0.02566     0.97003
 0.0003   0.9997   40000    19   39981    0.0162      0.98623
 0.0003   0.9997   40000    20   39980    0.009716    0.995946
 0.0003   0.9997   40000    21   39979    0.00555     1.001496
 0.0003   0.9997   40000    22   39978    0.003026    1.004522
 0.0003   0.9997   40000    23   39977    0.001578    1.0061
 0.0003   0.9997   40000    24   39976    0.000789    1.006889
 0.0003   0.9997   40000    25   39975    0.000378    1.007267
 0.0003   0.9997   40000    26   39974    0.000175    1.007442
 0.0003   0.9997   40000    27   39973    7.76E-05    1.007519
 0.0003   0.9997   40000    28   39972    3.32E-05    1.007552
 0.0003   0.9997   40000    29   39971    1.37E-05    1.007566
 0.0003   0.9997   40000    30   39970    5.49E-06    1.007572
 0.0003   0.9997   40000    31   39969    2.13E-06    1.007574
 0.0003   0.9997   40000    32   39968    7.96E-07    1.007574
================================================================
The easier way to handle the binomial distribution in this situation is to approximate it as a normal distribution with mean value Np and standard deviation (Npq)^(1/2). As another potential project assignment, show that the distribution in the table above behaves very nearly this way.
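Here is a sketch of what that project might look like in Python (my own outline; expect approximate agreement rather than exact, since the normal curve is symmetric while the binomial here is slightly skewed):

    import math

    # Normal approximation: mean Np = 12, standard deviation sqrt(Npq) ~ 3.464.
    N, p = 40000, 0.0003
    q = 1 - p
    mu, sigma = N * p, math.sqrt(N * p * q)
    for n in (6, 10, 12, 16, 20):
        dens = (math.exp(-0.5 * ((n - mu) / sigma) ** 2)
                / (sigma * math.sqrt(2 * math.pi)))
        print(n, round(dens, 6))  # compare with P(x=n) in the table above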
Link forward to the next set of class notes for Friday, September 29, 2000