Chapter 8

Power

Chapter 4 flashback ...

                     H₀ true         H₀ false
Reject H₀            Type I error    Correct
Fail to reject H₀    Correct         Type II error
A Type I error is rejecting the null hypothesis when it is really true
The probability of making a Type I error is denoted as α (alpha)
A Type II error is failing to reject a null hypothesis that is really false
The probability of making a Type II error is denoted as β (beta)

In this chapter, you'll often see these outcomes represented with distributions

To make these representations clear, let's first consider the situation where H₀ is, in fact, true:

Now assume that H₀ is false (i.e., that some "treatment" has an effect on our dependent variable, shifting the mean to the right)

Thus, power can be defined as follows:

Assuming some manipulation affects the dependent variable, power is the probability that the sample mean will be sufficiently different from the mean under H₀ to allow us to reject H₀
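One way to make this definition concrete is to simulate it: draw many samples from the shifted (H₁) distribution and count how often H₀ gets rejected. The sketch below is only an illustration. It assumes a two-tailed z-test with a known population standard deviation, and the function name and all the numbers (an H₀ mean of 70, a true mean of 78, a standard deviation of 15, 20 subjects) are hypothetical.

```python
import random
from statistics import NormalDist

def simulate_power(mu0, mu1, sigma, n, alpha=0.05, reps=4000, seed=1):
    """Estimate power as the share of simulated samples that reject H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed cutoff
    se = sigma / n ** 0.5                         # standard error of the mean
    rejections = 0
    for _ in range(reps):
        # Draw one sample of size n from the distribution under H1
        sample_mean = sum(rng.gauss(mu1, sigma) for _ in range(n)) / n
        if abs(sample_mean - mu0) / se > z_crit:
            rejections += 1
    return rejections / reps

# Hypothetical values: H0 mean 70, true mean 78, sigma 15, 20 subjects
print(simulate_power(70, 78, 15, 20))
# With no real effect, the rejection rate should sit near alpha
print(simulate_power(70, 70, 15, 20))
```

The first call estimates power; the second estimates the Type I error rate, since there H₀ is actually true.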

As such, the power of an experiment depends on three (or four) factors:

Alpha:

As the cutoff set by alpha is moved to the left (for example, if one used an alpha of 0.10 instead of 0.05), beta would decrease and power would increase ... but the probability of making a Type I error would increase
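The trade-off can be seen numerically. The following is a minimal sketch that assumes a two-tailed test and uses the normal curve to get power; the function name and the size of the shift (2 standard errors) are made up for illustration.

```python
from statistics import NormalDist

def power_two_tailed(shift_in_se, alpha):
    """Power when the H1 mean sits `shift_in_se` standard errors from the H0 mean."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Area of the H1 distribution beyond either cutoff
    return (1 - z.cdf(z_crit - shift_in_se)) + z.cdf(-z_crit - shift_in_se)

shift = 2.0  # hypothetical shift of the mean, in standard errors
for alpha in (0.01, 0.05, 0.10):
    power = power_two_tailed(shift, alpha)
    print(f"alpha={alpha:.2f}  power={power:.3f}  beta={1 - power:.3f}")
```

Each step up in alpha buys more power, and the price is exactly the larger alpha.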

The further that H₁ is shifted away from H₀, the more power (and lower beta) an experiment will have

Standard error of the mean:

The smaller the standard error of the mean (i.e., the less the two distributions overlap), the greater the power. As suggested by the CLT, the standard error of the mean is a function of the population variance and N. Thus, of all the factors mentioned, the only one we can really control is N
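To see how N drives the standard error, here is a small sketch using the usual formula, standard error = σ/√N. The standard deviation of 15 is a hypothetical value chosen for illustration.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean for a population SD `sigma` and sample size `n`."""
    return sigma / math.sqrt(n)

sigma = 15  # hypothetical population standard deviation
for n in (5, 20, 80):
    print(n, round(standard_error(sigma, n), 2))
```

Note that N has to be quadrupled to cut the standard error in half.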

Effect Size (d)

Most power calculations use a term called effect size, which is a measure of the degree to which the H₀ and H₁ distributions overlap

As such, effect size is sensitive to both the difference between the means under H₀ and H₁, and the standard deviation of the parent populations

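Specifically:

d = (μ₁ − μ₀) / σ

where μ₀ and μ₁ are the means under H₀ and H₁, and σ is the standard deviation of the parent population

In English then, d is the number of standard deviations separating the mean of H₀ and the mean of H₁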

Note: N has not been incorporated in the above formula. You'll see why shortly
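As a quick sketch, taking d as the difference between the H₁ and H₀ means divided by the population standard deviation (the reading given in the "In English" sentence), the calculation looks like this. The means and standard deviations are hypothetical.

```python
def effect_size(mu1, mu0, sigma):
    """Number of standard deviations separating the H1 mean from the H0 mean."""
    return (mu1 - mu0) / sigma

# Hypothetical: an 8-point shift with a population SD of 15
print(round(effect_size(78, 70, 15), 2))
# Same shift, but a noisier population (SD of 30) gives a smaller d
print(round(effect_size(78, 70, 30), 2))
```

Nothing in the function depends on how many subjects are run, which is the point of the note above.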

Estimating the Effect Size

As d forms the basis of all calculations of power, the first step in these calculations is to estimate d

Since we do not typically know how big the effect will be a priori, we must make an educated guess on the basis of:

Prior research
An assessment of the size of effect that would be important
Rule of thumb:

small effect d=.20

medium effect d=.50

large effect d=.80

Bringing N back into the picture:

The calculation of d took into account 1) the difference between the means of H₀ and H₁ and 2) the standard deviation of the population

However, it did not take into account the third variable that affects the overlap of the two distributions: N

This was done purposefully so that we have one term that represents the relevant variables we, as experimenters, can do nothing about (d) and another representing the variable we can do something about: N

The statistic we use to recombine these factors is called delta (δ) and is computed as follows:

δ = d × f(N)

where the specific f(N), a function of the sample size, differs depending on the type of t-test you are computing the power for

Power Calcs for One Sample t

In the context of a one sample t-test, the function of N alluded to above is simply √N

Thus, when calculating the power associated with a one sample t, you must go through the following steps:

1) Estimate d, or calculate it using: d = (μ₁ − μ₀) / σ

2) Calculate δ using: δ = d√N

3) Go to the power table, and find the power associated with the calculated δ given the level of α you plan to use (or used) for the t-test
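The three steps above can be sketched in Python. Treat this as a sketch under assumptions rather than the book's method: it assumes δ = d√N for the one sample case, a two-tailed test, and it uses the normal curve in place of the printed power table, so values may differ slightly from the table. The function name one_sample_power is made up for illustration.

```python
import math
from statistics import NormalDist

def one_sample_power(d, n, alpha=0.05):
    """Return (delta, power) for a one sample test with effect size d and n subjects."""
    delta = d * math.sqrt(n)              # step 2: combine d with N
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)     # two-tailed cutoff for the chosen alpha
    # Step 3, with the normal curve standing in for the power table
    power = (1 - z.cdf(z_crit - delta)) + z.cdf(-z_crit - delta)
    return delta, power

# Hypothetical: a medium effect (d = .50) tested with 16 subjects
delta, power = one_sample_power(0.50, 16)
print(round(delta, 2), round(power, 2))
```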

Examples:

Say I find a new stats textbook and, after looking at it, I think it will raise the average mark of the class by about 8 points. From previous classes, I am able to estimate the population standard deviation as 15. If I now test out the new text by using it with 20 new students, what is my power to reject the null hypothesis (that the new students' marks are the same as the old students' marks)?

How many new students would I have to test to bring my power up to .90?
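Both textbook questions can be worked through with a short sketch. It assumes δ = d√N, a two-tailed test at α = .05, and the normal curve in place of the power table; the helper names are hypothetical. Under these assumptions it gives a power of roughly .66 with 20 students, and about 37 students for a power of .90. A printed table with rounded δ values can give a slightly different N.

```python
import math
from statistics import NormalDist

Z = NormalDist()

def power_from_delta(delta, alpha=0.05):
    """Two-tailed power for a given delta (normal approximation to the table)."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    return (1 - Z.cdf(z_crit - delta)) + Z.cdf(-z_crit - delta)

def delta_for_power(power, alpha=0.05):
    """Delta needed for a target power, ignoring the negligible far tail."""
    return Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)

d = 8 / 15                    # expected gain of 8 points, SD of 15
delta = d * math.sqrt(20)     # 20 new students
print(round(d, 3), round(delta, 2), round(power_from_delta(delta), 2))

# Work backwards: delta = d * sqrt(N), so N = (delta / d) squared
needed_delta = delta_for_power(0.90)
n_needed = math.ceil((needed_delta / d) ** 2)
print(round(needed_delta, 2), n_needed)
```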

Note: Don't worry about the bit on "noncentrality parameters" in the book

Power Calcs for Independent Samples t

When an independent t-test is used, the power calculations use the same computation for calculating d, but the calculations of δ are different because of a different function of N

When sample sizes are equal, you do the following:

1) Estimate d, or calculate it using: d = (μ₁ − μ₂) / σ, where μ₁ and μ₂ are the means of the two populations

2) Calculate δ using: δ = d√(N/2)

where N is the number of subjects in one of the samples

3) Go to the power table, and find the power associated with the calculated δ given the level of α you plan to use (or used) for the t-test

More Examples:

Assume I am going to run two groups of 18 subjects through a non-smoking study. One group will receive the treatment of interest, the other will not. I expect the treatment to have a medium effect, but I have nothing to go on other than that. Assuming there really is a medium effect, what is my power to detect it?

How many subjects would I need to run to increase my power to 0.80?
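Here is a sketch of both smoking-study questions. It assumes δ = d√(N/2) with N subjects per group, a medium effect of d = .50 (the rule of thumb value), a two-tailed test at α = .05, and the normal curve in place of the power table. The function names are hypothetical. Under these assumptions, power with 18 per group comes out at roughly .32, and a power of .80 needs about 63 subjects per group.

```python
import math
from statistics import NormalDist

Z = NormalDist()

def independent_power(d, n_per_group, alpha=0.05):
    """Return (delta, power) for two equal-sized independent groups."""
    delta = d * math.sqrt(n_per_group / 2)
    z_crit = Z.inv_cdf(1 - alpha / 2)
    power = (1 - Z.cdf(z_crit - delta)) + Z.cdf(-z_crit - delta)
    return delta, power

def n_per_group_for_power(d, power, alpha=0.05):
    """Subjects needed in EACH group, ignoring the negligible far tail."""
    delta = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)
    # delta = d * sqrt(N / 2), so N = 2 * (delta / d) squared
    return math.ceil(2 * (delta / d) ** 2)

delta, power = independent_power(0.50, 18)
print(round(delta, 2), round(power, 2))
print(n_per_group_for_power(0.50, 0.80))
```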

Unequal N

Power calculations for independent samples t-tests become slightly more complicated when Ns are unequal.

The proper way to deal with the situation is to do everything the same as above except to use the harmonic mean of the two Ns (N₁ and N₂) in the place where you enter N

The harmonic mean of two Ns is denoted N̄ₕ and computed as follows:

N̄ₕ = 2N₁N₂ / (N₁ + N₂)

So, as a final example, reconsider the power of my smoking study if I had run 24 subjects in my stop smoking group, but only 12 in my control group.
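A sketch of this final example follows. It assumes the harmonic mean of two Ns is 2N₁N₂/(N₁ + N₂), that it simply replaces N in δ = d√(N/2), a medium effect of d = .50, a two-tailed test at α = .05, and the normal curve in place of the power table. The function names are made up for illustration.

```python
import math
from statistics import NormalDist

def harmonic_mean_n(n1, n2):
    """Harmonic mean of two sample sizes."""
    return 2 * n1 * n2 / (n1 + n2)

def unequal_n_power(d, n1, n2, alpha=0.05):
    """Return (harmonic N, delta, power) for two independent groups of unequal size."""
    n_h = harmonic_mean_n(n1, n2)
    delta = d * math.sqrt(n_h / 2)
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    power = (1 - z.cdf(z_crit - delta)) + z.cdf(-z_crit - delta)
    return n_h, delta, power

# 24 subjects in the stop-smoking group, 12 in the control group
n_h, delta, power = unequal_n_power(0.50, 24, 12)
print(n_h, round(delta, 2), round(power, 2))
```

The harmonic mean here is 16, lower than the 18 per group you would get by splitting the same 36 subjects evenly, so the unbalanced design has somewhat less power.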
