Chapter 3:

The Normal Distribution

In Chapter 2, we spent a lot of time plotting distributions and calculating numbers to represent the distributions

This raises the obvious question:

Why Bother?

Answer: because once we know (or assume) the shape of the distribution and have calculated the relevant statistics, we are then able to make certain inferences about values of the variable

In the current chapter, we will show how this works using the Normal Distribution

Why the Normal Distribution?

As Galton observed in the 19th century, many of the things we measure turn out to be normally distributed, at least approximately so

That is, usually most of the observations cluster around the mean, with progressively fewer observations out towards the extremes

Thus, if we don't know how some variable is distributed, our best guess is normality

A Cautionary Note

Although many variables are approximately normally distributed, it is not the case that all variables are normally distributed

As examples, consider the following:

Values of a die roll
Outcomes of a coin flip

We will encounter some of these critters (i.e. distributions) later in the course

The Relation Between Histograms and Line Graphs

Any Histogram:

Can be represented as a line graph:

Example: Pop Quiz #1

Why line graphs?

Line graphs make it easier to talk of the "area under the curve" between two points where:

area = proportion (or percent) = probability

That is, we could ask what proportion of our class scored between 7 & 9 on the quiz

If we assume that the total area under the curve equals one, then the area between 7 & 9 equals the proportion of our class that scored between 7 & 9 and also indicates our best guess concerning the probability that some new data point would fall between 7 & 9

The problem is that in order to calculate the area under a curve, you must either:

use calculus (find the integral), or
use a table that specifies the areas associated with given values of your variable

The good news is that a table does exist, thereby allowing you to avoid calculus. The bad news is that in order to use it you must:

assume that your variable is normally distributed
use your mean and standard deviation to convert your data into z-scores, so that the new distribution has a mean of 0 and a standard deviation of 1; this is the standard normal distribution, or N(0,1)
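
To make the "calculus" route concrete, here is a minimal sketch that approximates the area under a normal curve numerically instead of looking it up. The function names `normal_pdf` and `area_under_curve`, the trapezoid rule, and the step count are illustrative choices, not something taken from the course.

```python
import math

def normal_pdf(x, mean=0.0, sd=1.0):
    """Height of the normal curve at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def area_under_curve(a, b, mean=0.0, sd=1.0, steps=10_000):
    """Approximate the area between a and b with the trapezoid rule."""
    width = (b - a) / steps
    # End points count half; interior points count fully.
    total = 0.5 * (normal_pdf(a, mean, sd) + normal_pdf(b, mean, sd))
    for i in range(1, steps):
        total += normal_pdf(a + i * width, mean, sd)
    return total * width

# Area under the standard normal curve between the mean (0) and 1
area_0_to_1 = area_under_curve(0.0, 1.0)
print(round(area_0_to_1, 4))
```

A table of areas saves you from repeating this kind of computation by hand.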

The Standard Normal Distribution

         Mean to    Larger     Smaller
   z        z       Portion    Portion
 .....   ........   ........   ........
  .98     .3365      .8365      .1635
  .99     .3389      .8389      .1611
 1.00     .3413      .8413      .1587
 1.01     .3438      .8438      .1562
 .....   ........   ........   ........
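
As a rough check on how the three columns relate, the sketch below recomputes the rows shown above from the standard normal cumulative area. It relies on the standard identity between that area and the error function (`math.erf`); the helper names `phi` and `table_row` are made up for this illustration.

```python
import math

def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def table_row(z):
    """Return (mean_to_z, larger_portion, smaller_portion) for z >= 0."""
    larger = phi(z)            # everything below z
    smaller = 1.0 - larger     # everything above z
    mean_to_z = larger - 0.5   # area between the mean (0) and z
    return mean_to_z, larger, smaller

for z in (0.98, 0.99, 1.00, 1.01):
    mean_to_z, larger, smaller = table_row(z)
    print(f"{z:4.2f}  {mean_to_z:.4f}  {larger:.4f}  {smaller:.4f}")
```

Note that the larger and smaller portions always sum to 1, and the "mean to z" area is the larger portion minus one half.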

Converting data into Z-scores

It would be too much work to provide a table of area values for every possible mean and standard deviation

Instead, a table was created for the standard normal distribution, and the dataset of interest is converted to a standard normal before using the table

How do we get our mean equal to zero? Simple, subtract the mean from each data point

What about the standard deviation? Well, if we divide all values by a constant, we divide the standard deviation by a constant. Thus, to make the standard deviation 1, we just divide each new value by the standard deviation

In computational form then,

$$z = \frac{X - \bar{X}}{s}$$

where z is the z-score for the value of X we enter into the above equation, $\bar{X}$ is the mean, and $s$ is the standard deviation
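
The two steps described above (subtract the mean, then divide by the standard deviation) can be sketched as follows. The scores are hypothetical values made up for illustration, and the use of the population standard deviation (`statistics.pstdev`) is an assumption; substitute `statistics.stdev` if the course divides by n - 1.

```python
import statistics

# Hypothetical quiz scores, made up purely for illustration
scores = [5.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5]

mean = statistics.mean(scores)
sd = statistics.pstdev(scores)  # assumption: population SD (divide by n)

# Step 1: subtracting the mean shifts the distribution so its mean is 0
centered = [x - mean for x in scores]

# Step 2: dividing by the SD rescales it so its SD is 1
z_scores = [c / sd for c in centered]

print(statistics.mean(z_scores))    # very close to 0
print(statistics.pstdev(z_scores))  # very close to 1
```

The shape of the distribution is unchanged by these two steps; only its location and scale move.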

Once we have calculated a z-score, we can then look at the z table in Appendix Z to find the area we are interested in relevant to that value

As we'll see, the z table actually provides a number of areas relevant to any specific z-score

What percent of students scored better than 9.2 out of 10 on the quiz, given that the mean was 7.6 and the standard deviation was 1.6?

First we convert 9.2 into a z-score: z = (9.2 - 7.6) / 1.6 = 1.00

Because we are interested in the area greater than z = 1, we look at the "smaller portion" part of the z table and find the value .1587

Thus, 15.87% of the students scored better than 9.2 on the quiz
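
The quiz example above can also be checked without the table. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) in place of Appendix Z; the variable names are illustrative.

```python
from statistics import NormalDist

mean, sd = 7.6, 1.6   # quiz mean and standard deviation from the example
x = 9.2               # score of interest

# Convert the raw score to a z-score
z = (x - mean) / sd

# Area above z under the standard normal curve (the "smaller portion")
smaller_portion = 1.0 - NormalDist().cdf(z)

print(round(z, 2), round(smaller_portion, 4))
```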

What percent of students scored between 7 & 9 on the quiz?
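
One way to approach this question is to convert both endpoints to z-scores and subtract the area below the lower one from the area below the upper one. The sketch below does this with `statistics.NormalDist` standing in for the z table. It assumes the same mean (7.6) and standard deviation (1.6) as the previous example and treats the scores as continuous and normally distributed.

```python
from statistics import NormalDist

mean, sd = 7.6, 1.6    # same quiz statistics as the previous example
low, high = 7.0, 9.0   # range of interest

# Convert both endpoints to z-scores
z_low = (low - mean) / sd    # below the mean, so negative
z_high = (high - mean) / sd  # above the mean, so positive

# Area between the two z-scores under the standard normal curve
standard_normal = NormalDist()
proportion = standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

print(round(z_low, 3), round(z_high, 3), round(proportion, 4))
```

With a printed table, the same answer comes from adding the two "mean to z" areas, because the endpoints fall on opposite sides of the mean.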

Java Applet (Z-scores)

Here's a nice link to a Java Applet that demonstrates the calculation of z-scores. The results of the calculations are graphically mapped onto a normal distribution curve.
