When you use an ANOVA and find a significant F, all that tells you is that the means are not all equal
It does not say which means differ from which
The purpose of this chapter is to describe a number of different ways of testing which means are different
Before describing the tests, it is necessary to consider two different ways of thinking about error and how they are relevant to doing multiple comparisons
Error Rate per Comparison (PC)
The error rate per comparison is the probability of making a Type I error on any single comparison (i.e., alpha)
Error Rate Familywise (FW)
The collection of comparisons we do is described as the "family"
The familywise error rate is the probability that at least one of these comparisons will include a Type I error
Assuming that α′ is the per comparison error rate and c is the number of comparisons, then the familywise error rate is:

FW = 1 - (1 - α′)^c

For example, with two comparisons each at α′ = .05:

FW = 1 - (0.95)^2 = 1 - 0.9025 = 0.0975
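The computation above can be sketched in a few lines of Python (a sketch; the formula assumes the comparisons are independent):

```python
# Familywise error rate for c independent comparisons, each run at
# a per-comparison alpha (alpha_pc):  FW = 1 - (1 - alpha_pc)**c
def familywise_error(alpha_pc: float, c: int) -> float:
    return 1 - (1 - alpha_pc) ** c

# Two comparisons at alpha = .05, as in the example above:
print(round(familywise_error(0.05, 2), 4))   # 0.0975
```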
The basic problem, then, is that if we are doing many comparisons, we want to somehow control our familywise error so that we don't end up concluding that differences exist when they really do not
The various tests we will talk about differ in terms of how they do this
They will also be categorized as being either "a priori" or "post hoc"
A priori: A priori tests are comparisons the experimenter planned before collecting the data, based on theory or the logic of the design.
Post hoc: Post hoc tests are comparisons the experimenter has decided to test after collecting the data, looking at the means, and noting which means "seem" different.
Steve: Significant F issue
- See page 351 for a very complete description of the Morphine Tolerance study by Siegel (1975)
Highlights:
- paw lick latency as a measure of pain resistance
- tolerance to morphine develops quickly
- notion of a compensatory mechanism
- this mechanism very context dependent
Source      df    SS        MS       F
Treatment    4    3497.60   874.40   27.33
Within      35    1120.00    32.00
Total       39    4617.60
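As a quick check on the summary table, the MS and F columns follow directly from the SS and df columns (a sketch, using the values above):

```python
# Recovering the rest of the summary table: MS = SS / df, and
# F = MS_treat / MS_within.
ss_treat, df_treat = 3497.60, 4
ss_within, df_within = 1120.00, 35

ms_treat = ss_treat / df_treat      # 874.40
ms_within = ss_within / df_within   # 32.00
F = ms_treat / ms_within            # about 27.33, matching the table
print(ms_treat, ms_within, F)
```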
A Priori Comparisons
Group Mc-M versus Group M-M
Must first understand the notion of a linear combination of means:

L = a1*M1 + a2*M2 + ... + ak*Mk = Σ aj*Mj

To make a linear combination a linear contrast, we simply impose the restriction that Σ aj = 0
So, we select our values of aj in a way that defines the contrast we are interested in
For example, say we had three means and we want to compare the first two (aj = 1, -1, 0)
You can basically make any contrast you want as long as Σ aj = 0
Of course, the trick then is testing if the contrast is significant:
SS for contrasts:

SScontrast = n*L^2 / Σ aj^2

which is tested with F = SScontrast / MSerror
Contrasts are always assumed to have 1 df (with the error df equal to that of MSerror)
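To make the formulas concrete, here is a sketch that computes the F for a single contrast. The particular contrast shown (Mc-M versus M-M from the morphine example, with n = 8 per group, since dfwithin = 35 across 5 groups) is just an illustration:

```python
# SS for a linear contrast: SS = n * L**2 / sum(a_j**2), where
# L = sum(a_j * mean_j) and the a_j must sum to zero.  The contrast is
# tested with F = SS_contrast / MS_error on 1 and df_error df.
def contrast_F(means, a, n, ms_error):
    assert abs(sum(a)) < 1e-12, "coefficients must sum to zero"
    L = sum(c * m for c, m in zip(a, means))
    ss = n * L ** 2 / sum(c ** 2 for c in a)
    return ss / ms_error

# Mc-M versus M-M from the morphine example (n = 8, MS_error = 32):
means = [4, 10, 11, 24, 29]           # M-S, M-M, S-S, S-M, Mc-M
F = contrast_F(means, [0, -1, 0, 0, 1], n=8, ms_error=32)
print(F)   # 45.125 -- well past the critical F(1, 35) = 4.12
```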
When we run the overall ANOVA, we find a SStreat of 11.67
Contrast 1:
But what about when it gets more complicated, say you have seven means, and you want to compare the average of the first 2 with the average of the last 5?
The trick: think of those sets of means as forming 2 groups, Group A (means 1 & 2) and Group B (the rest). Now, write out each mean, and before all of the Group A means, put the number of Group B means; then before all the Group B means, put the number of Group A means. Then, give one of the groups a plus sign, the other a minus: (5, 5, -2, -2, -2, -2, -2)
If you wanted to compare the first three means with the last 4, it would be: (4, 4, 4, -3, -3, -3, -3)
Know about the unequal n stuff, but don't worry about it (you will only be asked to do equal n)
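The coefficient-building trick above can be sketched as a tiny helper (equal n assumed, as in the notes):

```python
# Coefficients for comparing the average of the first n_a means (Group A)
# with the average of the last n_b means (Group B): put the size of Group B
# in front of every Group A mean, the size of Group A in front of every
# Group B mean, and give one group a minus sign.
def group_contrast(n_a: int, n_b: int):
    return [n_b] * n_a + [-n_a] * n_b

print(group_contrast(2, 5))   # [5, 5, -2, -2, -2, -2, -2]
print(group_contrast(3, 4))   # [4, 4, 4, -3, -3, -3, -3]
```

Note that the coefficients always sum to zero, so the result is a legitimate contrast.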
Critical F(1,35) = 4.12 (about 5.5 with alpha .01)
See the text (p. 359) for a detailed description of how the SSs for these contrasts were calculated
Note: With 4 contrasts, FW error ≈ 0.20; you could reduce this by using a lower per comparison level of alpha, or by doing fewer comparisons
Orthogonal Comparisons
For example, if you find that mean1 is greater than the average of means 2 & 3, that tells us nothing about whether mean4 is bigger than mean5; those contrasts are independent
However, if you find that mean1 is greater than the average of means 2 & 3, then chances are that mean1 is greater than mean2; those two contrasts would not be independent
When members of a set of contrasts are independent of one another (for each pair of contrasts, Σ aj*bj = 0), they are termed "orthogonal contrasts"
The total SSs for a complete set of orthogonal contrasts always equals SStreat
This is a nice property as it is like the SStreat is being decomposed into a set of independent chunks, each of relevance to the experimenter
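The independence check can be sketched directly from the coefficients (a sketch, assuming equal n; the example contrasts echo the ones discussed above):

```python
# Two contrasts are orthogonal (with equal n) when the sum of the products
# of their coefficients is zero: sum(a_j * b_j) == 0.
def orthogonal(a, b):
    return sum(x * y for x, y in zip(a, b)) == 0

a = [2, -1, -1, 0, 0]    # mean1 vs the average of means 2 & 3
b = [0, 0, 0, 1, -1]     # mean4 vs mean5
c = [1, -1, 0, 0, 0]     # mean1 vs mean2
print(orthogonal(a, b))  # True  -- independent
print(orthogonal(a, c))  # False -- overlapping information
```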
Bonferroni t (Dunn's test)
One way to control this is to try hard to limit the number of comparisons (perhaps using contrasts instead of a bunch of t-tests)
Another way is to reduce your per comparison level of alpha to compensate for the inflation caused by doing multiple tests
If you want to continue using the tables we have, then you can only reduce alpha in crude steps (e.g., from .05 to .01)
In many cases, that may be overkill (e.g., three comparisons)
Dunn�s test allows you to do this same thing in a more precise manner
If that comparison is a planned t-test, then t′ simply equals your obtained t and has the same degrees of freedom as that t
If that comparison is a linear contrast, then t′ equals the square root of the F associated with that contrast, and has degrees of freedom equal to that of MSerror
Now compare each obtained t′ with the critical t value, which is really the critical t associated with a per comparison alpha equal to the desired familywise error divided by the number of comparisons
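The adjustment itself is just a division, as the following sketch shows:

```python
# Dunn's (Bonferroni) adjustment: run each of c comparisons at
# alpha_fw / c so the familywise error stays at (or just below) alpha_fw.
def bonferroni_alpha(alpha_fw: float, c: int) -> float:
    return alpha_fw / c

# Three comparisons with a desired familywise alpha of .05:
a = bonferroni_alpha(0.05, 3)
print(a)                    # .05 / 3, about .0167
print(1 - (1 - a) ** 3)     # resulting familywise error, just under .05
```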
* Don't worry about multi-stage Bonferronis
Post-Hoc Comparisons
However, there are situations in which the experimenter really is not sure what outcome(s) to expect
In these situations, the correct thing to do is one of a number of post-hoc comparison procedures depending on the experimental context, and how liberal versus conservative the experimenter wishes to be
We will talk about the following procedures:
- Fisher's Least Significant Difference Procedure
- The Newman-Keuls Test
- Tukey's Test
- The Ryan Procedure
- The Scheffé Test
- Dunnett's Test
Fisher's Least Significant Difference Procedure
In fact, this procedure is no different from the a priori t-test described earlier EXCEPT that it requires that the F test (from the ANOVA) be significant prior to computing the t values
The requirement of a significant overall F ensures that the familywise error for the complete null hypothesis (i.e., that all the means are equal) will remain at alpha
However, it does nothing to control for inflation of the family-wise error when performing the actual comparisons
This is OK if you have only three means (see text for a description of why)
But if you have more than three, then the LSD test is very liberal (i.e., high probability of Type I errors), too liberal for most situations
The Studentized Range Statistic (q)
Most of the remaining procedures are based on the studentized range statistic q; thus, it is important to understand it first
The mathematical definition of it is:

q = (Mlargest - Msmallest) / sqrt(MSerror / n)

Note that this statistic is very similar to the t statistic; in fact, for two means, q = t*sqrt(2)
For the morphine example, the ordered means are:

M-S  M-M  S-S  S-M  Mc-M
 4    10   11   24   29

So, comparing the largest and smallest:

q = (29 - 4) / sqrt(32/8) = 25/2 = 12.5

Note how large the critical q from the table is; that is because it controls for the number of means in the range (as Steve will hopefully explain)
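That computation can be sketched as follows (n = 8 per group and MSerror = 32, from the summary table):

```python
import math

# Studentized range statistic:
#   q = (largest mean - smallest mean) / sqrt(MS_error / n)
def q_stat(means, ms_error, n):
    return (max(means) - min(means)) / math.sqrt(ms_error / n)

means = [4, 10, 11, 24, 29]      # morphine example
print(q_stat(means, 32, 8))      # (29 - 4) / 2 = 12.5
```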
Newman-Keuls Test
How to:
Step 1: Arrange the means in ascending order
Step 2: Compute the critical width Wr = q(r, dferror) * sqrt(MSerror / n) for each range size r (the number of means spanned by a comparison)
Step 3: Compare the difference between the largest and smallest means to the W for the full range; then work inward, comparing each smaller range to its own Wr
Step 4: Once a range is found to be nonsignificant, do not test any ranges contained within it
Example
Step 1: Order the means:

M-S  M-M  S-S  S-M  Mc-M
 4    10   11   24   29

Step 2: With MSerror = 32, n = 8, and dferror = 35, sqrt(MSerror/n) = 2, so (using q values from the studentized range table at alpha = .05):

W2 = 2.87 × 2 = 5.74
W3 = 3.46 × 2 = 6.92
W4 = 3.81 × 2 = 7.62
W5 = 4.07 × 2 = 8.14

Steps 3 & 4: Table the differences between each pair of ordered means and compare each difference to the Wr for its range:

       M-M   S-S   S-M   Mc-M
M-S     6*    7*   20*   25*
M-M           1    14*   19*
S-S                13*   18*
S-M                       5

(* = difference exceeds the Wr for that range)
Step 5: Summarize by underlining; means sharing a common underline do not differ significantly:

M-S   M-M   S-S   S-M   Mc-M
 4     10    11    24    29
---   --------    ---------
In words then, these results suggest the following. First, the rats who received morphine on all occasions are acting the same as those who received saline on all occasions, suggesting that a tolerance developed very quickly.
Those rats who received morphine 3 times, but then only saline on the test trial, are significantly more sensitive to pain than those who received saline all the time or morphine all the time. This suggests that a compensatory mechanism was operating, making the rats hypersensitive to pain when not opposed by morphine.
Finally, those rats who received morphine in their cage three times before receiving it in the testing context seem as non-sensitive to pain as those who received morphine for the first time at test, both groups being significantly less sensitive to pain than the S-S or M-M groups. This suggests the compensatory mechanism is very context specific and does not operate when the context is changed.
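The stepwise logic above can be sketched in Python (not required for the course). The q values are approximate table values for dferror = 35 at alpha = .05, and for brevity this sketch tests every pair rather than implementing the stop-testing-inside-a-nonsignificant-range rule (which does not change the outcome for these data):

```python
import math

# Newman-Keuls sketch for the morphine data.  The q critical values below
# are assumed studentized-range table entries (df_error = 35, alpha = .05);
# check them against your own table.
q_crit = {2: 2.87, 3: 3.46, 4: 3.81, 5: 4.07}
means = {"M-S": 4, "M-M": 10, "S-S": 11, "S-M": 24, "Mc-M": 29}
ms_error, n = 32, 8
unit = math.sqrt(ms_error / n)          # sqrt(32/8) = 2.0

ordered = sorted(means.items(), key=lambda kv: kv[1])
for i in range(len(ordered)):
    for j in range(i + 1, len(ordered)):
        r = j - i + 1                   # number of means spanned
        diff = ordered[j][1] - ordered[i][1]
        w = q_crit[r] * unit            # critical width for this range
        sig = "sig" if diff > w else "ns"
        print(ordered[i][0], ordered[j][0], diff, f"W{r}={w:.2f}", sig)
```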
The problem with the Newman-Keuls is that it doesn't control FW error very well (i.e., it tends to be fairly liberal, too liberal for the taste of many)
When several of the population means are actually equal, more than one "null" range can be tested, and each of those tests runs at alpha. So, as the number of means increases, FW error can increase considerably and is typically around 0.10 for most experiments (four or five means)
Tukey's Test
The real difference is that instead of comparing the difference between a pair of means to a q value tied to the range of those means, the q of the largest range is always used (qHSD = qmax)
So in the morphine rats example, we would compare each difference to the critical difference of 8.14 (based on q5), producing the following results:
       M-M   S-S   S-M   Mc-M
M-S     6     7    20*   25*
M-M           1    14*   19*
S-S                13*   18*
S-M                       5

(* = difference exceeds 8.14)
M-S M-M S-S S-M Mc-M
4 10 11 24 29
------------------- ------------
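Because Tukey's test uses one critical width for every pair, the sketch is even simpler than for the Newman-Keuls (again, q(5, 35) ≈ 4.07 at alpha = .05 is an assumed table value):

```python
import math

# Tukey HSD sketch: a single critical width, q_max * sqrt(MS_error / n),
# is used for every pairwise difference.
ms_error, n = 32, 8
hsd = 4.07 * math.sqrt(ms_error / n)    # 8.14, as in the notes
means = {"M-S": 4, "M-M": 10, "S-S": 11, "S-M": 24, "Mc-M": 29}
names = list(means)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        diff = abs(means[names[i]] - means[names[j]])
        print(names[i], names[j], diff, "sig" if diff > hsd else "ns")
```

Note that the 6 and 7 differences that the Newman-Keuls called significant now fall short of 8.14, which is exactly the conservative shift Tukey's test buys.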
The Ryan Procedure
The Ryan procedure adjusts the per-range alpha level so that FW error is held at alpha while retaining more power than Tukey's test
However, given that using the procedure requires either specialized tables or a statistical software package, you will never be required to actually do it
Thus, get the general idea, but don't worry about details
The Scheffé Test
As before, a linear contrast is always described by the equation L = Σ aj*Mj (with Σ aj = 0), and its F is computed in the usual way
The Scheffé test simply evaluates that F against a critical value of (k - 1) × F(k - 1, dferror) rather than F(1, dferror)
Doing this will hold FW error constant for all possible linear contrasts
However, there is a cost: the Scheffé is the least powerful of all the post-hoc procedures (i.e., very conservative)
Moral: Don't use it when you only want to compare pairs of means or when you can justify comparisons a priori
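To see how conservative this is for the morphine design, compare the Scheffé critical value with the ordinary contrast critical value (F(4, 35) ≈ 2.64 at alpha = .05 is an assumed table value; check your own table):

```python
# Scheffé critical value: (k - 1) * F(k - 1, df_error), with k groups.
k = 5
f_crit = 2.64                     # assumed tabled F(4, 35) at alpha = .05
scheffe_crit = (k - 1) * f_crit
print(scheffe_crit)               # 10.56, versus F(1, 35) = 4.12 for an
                                  # ordinary a priori contrast
```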
Dunnett's Test
Dunnett's test is used when you only want to compare each treatment group against a single control group
Note, this situation is somewhat different from the previous post-hoc tests in that it is somewhat a priori (i.e., the "position" of the control condition can vary; Steve will explain, hopefully)
This allows Dunnett's test to be more powerful; FW error can be controlled in less stringent ways
All that is really involved is using a different t table when looking up the critical t, called td
However, when using this test, the most typical thing to do is to calculate a critical difference; that is, when the difference between any two means exceeds this value, those means are significantly different
M-S M-M S-S S-M Mc-M
4 10 11 24 29
We get the td value from the table with k = 5 and dferror = 35, producing a value of 2.56
Critical Difference = td × sqrt(2*MSerror/n) = 2.56 × sqrt(2 × 32 / 8) = 7.24
So, assuming the S-S group is the control group, any mean that is more than 7.24 units from it is considered significantly different
That is, the S-M and Mc-M groups
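The critical-difference computation and the comparisons against the S-S control can be sketched as (td = 2.56 is the tabled value quoted above):

```python
import math

# Dunnett's critical difference: t_d * sqrt(2 * MS_error / n), with t_d
# from Dunnett's table (k = 5, df_error = 35 gives t_d = 2.56 per the notes).
t_d, ms_error, n = 2.56, 32, 8
crit_diff = t_d * math.sqrt(2 * ms_error / n)
print(round(crit_diff, 2))   # 7.24

control = 11                 # the S-S (control) group mean
for name, m in {"M-S": 4, "M-M": 10, "S-M": 24, "Mc-M": 29}.items():
    print(name, "sig" if abs(m - control) > crit_diff else "ns")
```

Only S-M and Mc-M exceed the critical difference, matching the conclusion above.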
Nonetheless, read the "comparison of the alternative procedures" and "which test?" sections of the text to make sure you have a good feel for this
Make sure you understand the distinction between a priori versus post hoc tests, and the distinction between the tests within each category