Chapter 8
Power
Chapter 4 flashback ...
                     H0 true         H0 false
Reject H0            Type I error    Correct
Fail to reject H0    Correct         Type II error
- A Type I error is rejecting the null hypothesis when it is really true
- The probability of making a Type I error is denoted as $\alpha$ (alpha)
- A Type II error is failing to reject a null hypothesis that is really false
- The probability of making a Type II error is denoted as $\beta$ (beta)
- In this chapter, you'll often see these outcomes represented with distributions
To make these representations clear, let's first consider the situation where H0 is, in fact, true:
Now assume that H0 is false (i.e., that some "treatment" has an effect on our dependent variable, shifting the mean to the right)
Thus, power can be defined as follows:
Assuming some manipulation affects the dependent variable, power is the probability that the sample mean will be sufficiently different from the mean under H0 to allow us to reject H0. Equivalently, power = $1 - \beta$
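This definition can be checked by brute force. The sketch below is a minimal simulation, assuming a one-tailed z-test with a known population standard deviation (a simplification of the t-tests used later in the chapter). The function name `simulate_power` and the numbers (H0 mean 100, true mean 105, sigma 15, N = 25) are made up for illustration. It repeatedly draws samples from a population in which H0 is false and counts how often the sample mean lands beyond the cutoff set under H0.

```python
import random
from statistics import NormalDist


def simulate_power(mu0, mu1, sigma, n, alpha=0.05, n_sims=4000, seed=1):
    """Estimate power as the share of simulated samples that reject H0.

    Assumes a one-tailed (upper) z-test with known sigma.
    """
    rng = random.Random(seed)
    se = sigma / n ** 0.5  # standard error of the mean
    # Sample means above this cutoff fall in the rejection region under H0.
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    rejections = 0
    for _ in range(n_sims):
        # Draw one sample from the population in which H0 is false.
        sample = [rng.gauss(mu1, sigma) for _ in range(n)]
        if sum(sample) / n > cutoff:
            rejections += 1
    return rejections / n_sims


# Hypothetical numbers: H0 mean 100, true mean 105, sigma 15, N = 25.
print(simulate_power(100, 105, 15, 25))
```

With these made-up values the estimate should land near 0.5. Setting `mu1` equal to `mu0` turns the same function into an estimate of the Type I error rate, which should sit near alpha.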
As such, the power of an experiment depends on three (or four) factors:
Alpha:
- As the cutoff defined by alpha is moved to the left (for example, if one used an alpha of 0.10 instead of 0.05), beta would decrease and power would increase ... but the probability of making a Type I error would also increase
Standard error of the mean:
- The smaller the standard error of the mean (i.e., the less the two distributions overlap), the greater the power. As suggested by the CLT, the standard error of the mean is a function of the population variance and N ($\sigma_{\bar{X}} = \sigma / \sqrt{N}$). Thus, of all the factors mentioned, the only one we can really control is N
Effect Size (d)
- Most power calculations use a term called effect size, which is actually a measure of the degree to which the H0 and H1 distributions overlap
As such, effect size is sensitive to both the difference between the means under H0 and H1, and the standard deviation of the parent populations
Specifically:
$d = \frac{\mu_1 - \mu_0}{\sigma}$
where $\mu_0$ is the mean under H0, $\mu_1$ is the mean under H1, and $\sigma$ is the standard deviation of the parent populations
- In English then, d is the number of standard deviations separating the mean of H0 and the mean of H1
Note: N has not been incorporated in the above formula. You'll see why shortly
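To see how alpha, N, and effect size each move power, the following sketch computes d and an approximate power value directly from the normal curve. It assumes a one-tailed test with a known population standard deviation. The helper names `effect_size` and `approx_power` and all of the numbers are illustrative and do not come from the text.

```python
from statistics import NormalDist


def effect_size(mu0, mu1, sigma):
    """d: how many standard deviations separate the H0 and H1 means."""
    return (mu1 - mu0) / sigma


def approx_power(mu0, mu1, sigma, n, alpha=0.05):
    """Approximate power of a one-tailed (upper) test on a sample mean.

    Assumes sigma is known, so the normal curve stands in for t.
    """
    se = sigma / n ** 0.5  # standard error of the mean
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se  # rejection boundary
    # Power is the area of the H1 sampling distribution beyond the cutoff.
    return 1 - NormalDist(mu1, se).cdf(cutoff)


# Hypothetical values: H0 mean 100, H1 mean 105, sigma 15.
print("d =", round(effect_size(100, 105, 15), 2))
print("baseline     :", round(approx_power(100, 105, 15, n=25), 3))
print("larger alpha :", round(approx_power(100, 105, 15, n=25, alpha=0.10), 3))
print("larger N     :", round(approx_power(100, 105, 15, n=50), 3))
print("larger effect:", round(approx_power(100, 110, 15, n=25), 3))
```

Each of the last three lines should print a value larger than the baseline, matching the three factors listed above.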
Estimating the Effect Size
- As d forms the basis of all calculations of power, the first step in these calculations is to estimate d
Since we do not typically know how big the effect will be a priori, we must make an educated guess on the basis of:
- Prior research
- An assessment of the size of effect that would be important
- Rule of thumb:
  - small effect d=.20
  - medium effect d=.50
  - large effect d=.80
Bringing N back into the picture:
- The calculation of d took into account 1) the difference between the means of H0 and H1 and 2) the standard deviation of the population
However, it did not take into account the third variable that affects the overlap of the two distributions: N
This was done purposefully so that we have one term that represents the relevant variables we, as experimenters, can do nothing about (d) and another representing the variable we can do something about: N
The statistic we use to recombine these factors is called delta ($\delta$) and is computed as follows:
$\delta = d \, f(N)$
- where the specific function of N, $f(N)$, differs depending on the type of t-test you are computing the power for
Power Calcs for One Sample t
- In the context of a one sample t-test, the function of N, $f(N)$, alluded to above is simply $\sqrt{N}$, so that $\delta = d\sqrt{N}$
Thus, when calculating the power associated with a one sample t, you must go through the following steps:
1) Estimate d, or calculate it using: $d = \frac{\mu_1 - \mu_0}{\sigma}$
2) Calculate $\delta$ using: $\delta = d\sqrt{N}$
3) Go to the power table, and find the power associated with the calculated $\delta$ given the level of $\alpha$ you plan to use (or used) for the t-test
Examples:
- Say I find a new stats textbook and, after looking at it, I think it will raise the average mark of the class by about 8 points. From previous classes, I am able to estimate the population standard deviation as 15. If I now test out the new text by using it with 20 new students, what is my power to reject the null hypothesis (that the new students' marks are the same as the old students' marks)?
How many new students would I have to test to bring my power up to .90?
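The textbook example can be worked through in code. This sketch follows the three steps above but swaps the power table for a normal approximation (two-tailed, alpha = .05), so its results may differ slightly from table lookups. `power_from_delta` and `delta_for_power` are hypothetical helper names, not functions from any particular library.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def power_from_delta(delta, alpha=0.05):
    """Two-tailed power for a given delta (normal approximation to the table)."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    return (1 - Z.cdf(z_crit - delta)) + Z.cdf(-z_crit - delta)


def delta_for_power(power, alpha=0.05):
    """Delta needed for a target power (ignores the negligible far tail)."""
    return Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)


# Step 1: effect size from the expected gain (8 points) and sigma (15).
d = 8 / 15
# Step 2: for a one-sample t, delta = d * sqrt(N), with N = 20 students.
delta = d * math.sqrt(20)
# Step 3: approximate the power that the table would give for that delta.
power = power_from_delta(delta)
print(f"d = {d:.2f}, delta = {delta:.2f}, power = {power:.2f}")

# Sample size for power = .90: solve delta = d * sqrt(N) for N, round up.
n_needed = math.ceil((delta_for_power(0.90) / d) ** 2)
print("N needed for power .90:", n_needed)
```

Under these assumptions d is about 0.53, delta about 2.39, and power roughly two thirds. The N needed for power of .90 comes out in the high 30s, and a table lookup with rounded delta values may land one student higher or lower.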
Note: Don't worry about the bit on "noncentrality parameters" in the book
Power Calcs for Independent Samples t
- When an independent t-test is used, the power calculations use the same computation for calculating d, but the calculations of $\delta$ are different because of a different $f(N)$
When sample sizes are equal, you do the following:
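1) Estimate d, or calculate it using: $d = \frac{\mu_1 - \mu_2}{\sigma}$, where $\mu_1$ and $\mu_2$ are the two population means
2) Calculate $\delta$ using: $\delta = d\sqrt{\frac{N}{2}}$
- where N is the number of subjects in one of the samples
3) Go to the power table, and find the power associated with the calculated $\delta$ given the level of $\alpha$ you plan to use (or used) for the t-test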
More Examples:
- Assume I am going to run two groups of 18 subjects through a non-smoking study. One group will receive the treatment of interest, the other will not. I expect the treatment to have a medium effect, but I have nothing to go on other than that. Assuming there really is a medium effect, what is my power to detect it?
How many subjects would I need to run to increase my power to 0.80?
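A sketch of this two-group example, assuming equal Ns, delta = d * sqrt(N / 2), and a two-tailed alpha of .05. As before, a normal approximation replaces the power table, and the function names `power_two_sample` and `n_per_group_for_power` are made up for illustration.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate two-tailed power for an independent-samples t, equal Ns.

    Uses delta = d * sqrt(N / 2) and a normal approximation to the table.
    """
    delta = d * math.sqrt(n_per_group / 2)
    z_crit = Z.inv_cdf(1 - alpha / 2)
    return (1 - Z.cdf(z_crit - delta)) + Z.cdf(-z_crit - delta)


def n_per_group_for_power(d, power, alpha=0.05):
    """Approximate per-group N needed to reach the target power."""
    delta = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)
    # Solve delta = d * sqrt(N / 2) for N and round up to a whole subject.
    return math.ceil(2 * (delta / d) ** 2)


# Medium effect (d = .50) with 18 subjects per group.
print("power with 18 per group:", round(power_two_sample(0.5, 18), 2))
# Per-group N needed to reach power = .80.
print("N per group for power .80:", n_per_group_for_power(0.5, 0.80))
```

With 18 subjects per group delta is 1.5, which gives a power of only about one third. Reaching .80 takes a little over 60 subjects in each group under this approximation.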
Unequal N
- Power calculations for independent samples t-tests become slightly more complicated when Ns are unequal.
The proper way to deal with the situation is to do everything the same as above except to use the harmonic mean of the two Ns (N1 & N2) in the place where you enter N
The harmonic mean of two Ns is denoted $\bar{N}_h$ and computed as follows:
$\bar{N}_h = \frac{2 N_1 N_2}{N_1 + N_2}$
- So, as a final example, reconsider the power of my smoking study if I had run 24 subjects in my stop smoking group, but only 12 in my control group.
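The unequal-N case can be sketched the same way: compute the harmonic mean of the two group sizes and use it wherever N would go. This assumes a medium effect (d = .50) carried over from the earlier example, a two-tailed alpha of .05, and a normal approximation in place of the power table. `harmonic_n` and `power_unequal_n` are hypothetical names.

```python
import math
from statistics import NormalDist


def harmonic_n(n1, n2):
    """Harmonic mean of two sample sizes: 2 * N1 * N2 / (N1 + N2)."""
    return 2 * n1 * n2 / (n1 + n2)


def power_unequal_n(d, n1, n2, alpha=0.05):
    """Approximate two-tailed power with unequal Ns.

    Plugs the harmonic mean into delta = d * sqrt(N / 2); a normal
    approximation stands in for the power table.
    """
    z = NormalDist()
    delta = d * math.sqrt(harmonic_n(n1, n2) / 2)
    z_crit = z.inv_cdf(1 - alpha / 2)
    return (1 - z.cdf(z_crit - delta)) + z.cdf(-z_crit - delta)


# 24 subjects in the stop smoking group, 12 controls, medium effect (d = .50).
print("harmonic N:", harmonic_n(24, 12))
print("power:", round(power_unequal_n(0.5, 24, 12), 2))
```

The harmonic mean here is 16, lower than the 18 per group of the balanced design with the same total of 36 subjects, so the estimated power drops.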