Wednesday, December 21, 2011

Statistics Chapter 8


Chapter 8

Power




Chapter 4 flashback ...

                            H0 true           H0 false
    Reject H0               Type I error      Correct
    Fail to reject H0       Correct           Type II error
    • A Type I error is made when we reject the null hypothesis when it is really true
    • The probability of making a Type I error is denoted as α (alpha)
    • A Type II error is made when we fail to reject a null hypothesis that is really false
    • The probability of making a Type II error is denoted as β (beta)
    In this chapter, you'll often see these outcomes represented with distributions
    To make these representations clear, let's first consider the situation where H0 is, in fact, true:
    Now assume that H0 is false (i.e., that some "treatment" has an effect on our dependent variable, shifting the mean to the right)
    Thus, power can be defined as follows:
    Assuming some manipulation affects the dependent variable, power is the probability that the sample mean will be sufficiently different from the mean under H0 to allow us to reject H0; that is, power = 1 - β
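    To make that definition concrete, here is a minimal simulation sketch (the numbers are hypothetical, not from the notes): draw many samples from the "treated" population, test each one against H0, and count how often H0 is rejected; that proportion is the power.

        # Hypothetical values: H0 says mu = 50, the treatment really shifts it to 55,
        # sigma = 10, N = 25, alpha = .05 two-tailed.  A z-test keeps the sketch short;
        # power is estimated as the proportion of simulated samples that reject H0.
        import math, random

        mu0, mu1, sigma, N = 50.0, 55.0, 10.0, 25
        z_crit = 1.96                              # two-tailed critical value for alpha = .05
        se = sigma / math.sqrt(N)                  # standard error of the mean

        trials, rejections = 10000, 0
        for _ in range(trials):
            sample_mean = random.gauss(mu1, se)    # a sample mean drawn under H1
            z = (sample_mean - mu0) / se           # test statistic computed as if H0 held
            if abs(z) > z_crit:
                rejections += 1

        print("estimated power:", rejections / trials)   # roughly .70 for these values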
    As such, the power of an experiment depends on three (or four) factors:

Alpha:

    As the critical value for alpha is moved to the left (for example, if one used an alpha of 0.10 instead of 0.05), beta would decrease and power would increase ... but the probability of making a Type I error would also increase

The difference between the means of H0 and H1:

    The further that the H1 distribution is shifted away from the H0 distribution, the more power (and the lower the beta) an experiment will have

Standard error of the mean:

    The smaller the standard error of the mean (i.e., the less the two distributions overlap), the greater the power. As suggested by the CLT, the standard error of the mean is a function of the population variance and N (specifically, σM = σ / √N). Thus, of all the factors mentioned, the only one we can really control is N
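    A quick numeric sketch of that point (the standard deviation of 15 is just a placeholder value):

        # Standard error of the mean shrinks as N grows, so the H0 and H1
        # sampling distributions overlap less and power goes up.
        import math

        sigma = 15.0                               # hypothetical population standard deviation
        for N in (10, 20, 50, 100):
            print(N, round(sigma / math.sqrt(N), 2))
        # prints: 10 4.74, 20 3.35, 50 2.12, 100 1.5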

Effect Size (d)

    Most power calculations use a term called effect size, which is actually a measure of the degree to which the H0 and H1 distributions overlap
    As such, effect size is sensitive to both the difference between the means under H0 and H1, and the standard deviation of the parent populations
    Specifically: d = (μ1 - μ0) / σ
    In English then, d is the number of standard deviations separating the mean of H0 and the mean of H1
    Note: N has not been incorporated in the above formula. You'll see why shortly

Estimating the Effect Size

    As d forms the basis of all calculations of power, the first step in these calculations is to estimate d
    Since we do not typically know how big the effect will be a priori, we must make an educated guess on the basis of:
    • Prior research
    • An assessment of the size of effect that would be important
    • Rule of thumb:
        • small effect: d = .20
        • medium effect: d = .50
        • large effect: d = .80

Bringing N back into the picture:

    The calculation of d took into account 1) the difference between the means of H0 and H1 and 2) the standard deviation of the population
    However, it did not take into account the third variable that affects the overlap of the two distributions: N
    This was done purposefully so that we have one term that represents the relevant variables we, as experimenters, can do nothing about (d), and another representing the variable we can do something about: N
    The statistic we use to recombine these factors is called delta (δ) and is computed as follows: δ = d × f(N)
    where the specific f(N) (a function of the sample size) differs depending on the type of t-test you are computing the power for

Power Calcs for One Sample t

    In the context of a one sample t-test, the f(N) alluded to above is simply √N, so δ = d√N
    Thus, when calculating the power associated with a one sample t, you must go through the following steps:
    1) Estimate d, or calculate it using: d = (μ1 - μ0) / σ
    2) Calculate δ using: δ = d√N
    3) Go to the power table, and find the power associated with the calculated δ given the level of α you plan to use (or used) for the t-test
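    If you want to check the table entries, they can be approximated with the normal distribution; a minimal sketch (the function name and the default two-tailed alpha = .05 cutoff of 1.96 are my own choices, not from the notes):

        # Approximate power for a one-sample t: delta = d * sqrt(N), then
        # power ~ P(Z > z_crit - delta) plus the (usually negligible) other tail.
        import math

        def normal_cdf(x):
            return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

        def power_one_sample_t(d, N, z_crit=1.96):
            delta = d * math.sqrt(N)               # step 2: delta = d * sqrt(N)
            return normal_cdf(delta - z_crit) + normal_cdf(-delta - z_crit)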

Examples:

    Say I find a new stats textbook and, after looking at it, I think it will raise the average mark of the class by about 8 points. From previous classes, I am able to estimate the population standard deviation as 15. If I now test out the new text by using it with 20 new students, what is my power to reject the null hypothesis (that the new students' marks are the same as the old students' marks)?
    How many new students would I have to test to bring my power up to .90?
    Note: Don't worry about the bit on "noncentrality parameters" in the book
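    A worked sketch of this example using the numbers above (the power table may differ slightly from the normal approximation; the 1.2816 below is the z-score cutting off the upper 10% of the normal curve, which is what power = .90 requires):

        import math

        d = 8 / 15                                 # estimated effect size, about 0.53
        delta = d * math.sqrt(20)                  # about 2.39
        # power for delta of about 2.39 at alpha = .05 (two-tailed) is roughly .66 to .67

        # To reach power = .90 we need delta of about 1.96 + 1.2816 = 3.24, so:
        N_needed = (3.24 / d) ** 2                 # about 37 students
        print(round(delta, 2), math.ceil(N_needed))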

Power Calcs for Independent Samples t

    When an independent samples t-test is used, the power calculations use the same computation for calculating d, but the calculations of δ are different because of a different f(N)
    When sample sizes are equal, you do the following:
    1) Estimate d, or calculate it using: d = (μ1 - μ0) / σ
    2) Calculate δ using: δ = d√(N/2)
    where N is the number of subjects in one of the samples
    3) Go to the power table, and find the power associated with the calculated δ given the level of α you plan to use (or used) for the t-test
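    As a minimal sketch (the function name is mine), the only change from the one-sample case is the f(N) term:

        import math

        def delta_independent_t(d, n_per_group):
            # equal-n independent samples: delta = d * sqrt(n / 2)
            return d * math.sqrt(n_per_group / 2)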

More Examples:

    Assume I am going to run two groups of 18 subjects through a stop-smoking study. One group will receive the treatment of interest, the other will not. I expect the treatment to have a medium effect, but I have nothing to go on other than that. Assuming there really is a medium effect, what is my power to detect it?
    How many subjects would I need to run to increase my power to 0.80?
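    Worked through as a sketch (d = .50 is the "medium effect" rule of thumb; the 0.84 below is the z-score cutting off the upper 20% of the normal curve, which is what power = .80 requires):

        import math

        d = 0.50                                   # assumed medium effect
        delta = d * math.sqrt(18 / 2)              # = 1.5
        # power for delta = 1.5 at alpha = .05 (two-tailed) is roughly .32

        # To reach power = .80 we need delta of about 1.96 + 0.84 = 2.80, so:
        n_per_group = 2 * (2.80 / d) ** 2          # about 63 subjects per group
        print(delta, math.ceil(n_per_group))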

Unequal N

    Power calculations for independent samples t-tests become slightly more complicated when Ns are unequal.
    The proper way to deal with the situation is to do everything the same as above except to use the harmonic mean of the two Ns (N1 & N2) in the place where you enter N
    The harmonic mean of the two Ns is denoted Nh and computed as follows: Nh = (2 × N1 × N2) / (N1 + N2)
    So, as a final example, reconsider the power of my smoking study if I had run 24 subjects in my stop smoking group, but only 12 in my control group.
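    A final worked sketch for the unequal-N case (same assumed medium effect as above):

        import math

        n1, n2, d = 24, 12, 0.50
        n_h = 2 * n1 * n2 / (n1 + n2)              # harmonic mean = 16
        delta = d * math.sqrt(n_h / 2)             # about 1.41
        # power for delta of about 1.41 at alpha = .05 (two-tailed) is roughly .29,
        # a bit lower than the .32 the same 36 subjects gave when split 18 and 18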
