Chapter 11
Simple Analysis of Variance
T-Test reminder:
- During PsyB07 we talked about analyses that could be conducted to test whether pairs of means were significantly different.
We could now look at the exam marks for those students and compare the means of the two groups using a "between-subjects" (or independent samples) t-test.
The critical point of the previous example is the following:
- The basic logic for testing whether or not two means are different is to examine the size of the differences between the groups (which we assume is due to caffeine), relative to the differences within the groups (which we assume is due to random variation .. or error)
The measure of the effect (or treatment) is assessed by examining the variance (or difference) between the groups
- This exact logic underlies virtually all statistical tests, including analysis of variance, an analysis that allows us to compare multiple means simultaneously.
Other Review Tidbits
Central Limit Theorem: For any given sampling distribution, as n increases (1) the sampling distribution becomes more normal, (2) the mean of the sampling distribution approaches μ, and (3) the variance of the sampling distribution approaches σ²/n (and therefore the standard deviation approaches σ/√n)
For next week's quiz you should be able to (1) compute means, variances and standard deviations, (2) do an independent samples t-test, and (3) be able to demonstrate your understanding of sampling distributions and the central limit theorem.
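As a concrete refresher on skill (2), a pooled-variance independent-samples t-test can be written in a few lines. The exam-mark numbers below are made up for illustration, not data from the caffeine study:

```python
import math
from statistics import mean, variance  # variance() is the sample (n-1) variance

def independent_t(group1, group2):
    """Pooled-variance independent-samples t statistic."""
    n1, n2 = len(group1), len(group2)
    # Pool the two sample variances, weighting each by its degrees of freedom
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    # Standard error of the difference between the two means
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / se

# Hypothetical exam marks for a caffeine and a no-caffeine group
caffeine = [78, 82, 75, 90, 85]
no_caffeine = [70, 68, 74, 72, 71]
t = independent_t(caffeine, no_caffeine)
```

The obtained t would then be compared against the critical value of t with n1 + n2 - 2 degrees of freedom.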
Analysis of Variance (ANOVA) - Why?
In the caffeine study, for example, we were interested in only one variable, caffeine, and we examined two levels of that variable: no caffeine versus some caffeine.
Alternately, we might want to test different dosages of caffeine where each dosage would now be considered a "level" of caffeine.
As you'll see, as we look at more complicated ANOVAs (and the experimental designs associated with them) we may even be interested in multiple variables, each of which may have more than two levels.
For example, we might want to simultaneously consider the effect of caffeine (perhaps several different dose levels) and gender (generally just two levels) on test performance.
ANOVA - What?
Part I - A pictorial attempt at the logic
- To be filled in!
When it gets tricky:
ANOVA - Why?
Part 2 - A statistical attempt at the logic
- The textbook presents the logic in a more verbal/statistical manner, and it can't hurt to think of this in as many different ways as possible, so, in that style:
First of all, use of analysis of variance assumes that these groups have (1) data that is approximately normally distributed, (2) approximately equal variances, and (3) that the observations that make up each group are independent.
Given the first two assumptions, only the means can be different across the groups - thus, if the variable we are interested in is having an effect on performance, we assume it will do so by affecting the mean performance level.
Let's have some data, shall we?
"Error" estimate (MSwithin): One estimate we can generate makes no assumptions about the veracity (truth or falsity) of the null hypothesis
Specifically, the variance within each group provides an estimate of σ²
Given that we have assumed that all groups have the same variance (each of which provides an estimate of σ²), our best estimate of σ² would be the mean of the group variances.
Treatment estimate (MStreat): Alternatively, if we assume the null hypothesis is true (i.e., that there is no difference between the groups), then another way to estimate the population variance is to use the variance of the means across the groups
Recall that, by the central limit theorem, the variance of our sample means equals the population variance divided by n, where n equals the number of subjects in each group
Now, the logic!
The treatment will not affect MSwithin, only MStreat; therefore, by comparing these two estimates of the population variance, we can assess whether the treatment is having an effect:
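A minimal sketch of the two estimates just described, using hypothetical equal-n data (the group scores are invented for illustration):

```python
from statistics import mean, variance

def two_estimates(groups):
    """Return two estimates of the population variance, assuming equal n per group."""
    n = len(groups[0])
    # "Error" estimate: the mean of the within-group variances
    # (valid whether or not the null hypothesis is true)
    ms_within = mean(variance(g) for g in groups)
    # Treatment estimate: by the CLT, the variance of the group means
    # estimates sigma^2 / n under H0, so n * var(means) estimates sigma^2
    ms_treat = n * variance([mean(g) for g in groups])
    return ms_within, ms_treat

# Hypothetical scores for three equal-sized groups
groups = [[10, 12, 11, 13], [9, 11, 10, 12], [14, 16, 15, 17]]
ms_within, ms_treat = two_estimates(groups)
f_ratio = ms_treat / ms_within
```

When the null hypothesis is true the two estimates should be roughly equal (ratio near 1); a treatment effect inflates only the treatment estimate.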
Memory Reminder
- This is just to remind you (when you are reviewing your notes for an exam or whatever) that we spent a class with the following demo.
- I "randomly" sampled three groups of 5 people from the class and asked them their age. Since these groups all came from the same population, we have no reason to expect a difference in their mean ages. We took that data and calculated MSwithin and MStreat. As expected, these two estimates of σ² were approximately equal.
- I then added twenty years to each of the subjects in the third group (under the pretense that they were a night-school class). Now when we calculated MSwithin and MStreat, we found that MStreat got much larger whereas MSwithin didn't change.
- The purpose of that demo was to show that MStreat is sensitive to differences in the means whereas MSwithin is not, so the ratio of the two can be used to assess whether there is a difference between the means or not.
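The class demo can be simulated directly; the age range and seed here are arbitrary choices, not the actual classroom data:

```python
import random
from statistics import mean, variance

random.seed(1)
# Three groups of 5 "ages" sampled from the same population
groups = [[random.randint(18, 30) for _ in range(5)] for _ in range(3)]

def estimates(groups):
    n = len(groups[0])
    ms_within = mean(variance(g) for g in groups)
    ms_treat = n * variance([mean(g) for g in groups])
    return ms_within, ms_treat

before = estimates(groups)
# Pretend group 3 is a night-school class: add 20 years to each member
groups[2] = [age + 20 for age in groups[2]]
after = estimates(groups)
# MS_treat jumps because one group mean moved; MS_within is unchanged
# because adding a constant leaves each within-group variance alone
```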
Taking a Step Backwards
(Steve: note divergences from the text)
- In the computations we did for our demo and in our "Logic of ANOVA" section, we went straight to calculating estimates of MSwithin and MStreat.
In fact, there is nothing wrong with doing it this way and we could go on to calculate the ratio of these two things (which we will call an F-ratio).
However, for reasons that I will try to convince you are important, when we actually do ANOVAs we usually don't go straight to mean squared estimates, but instead take the following steps:
- Calculate SSwithin, SStreat, and SStotal
- Calculate dfwithin, dftreat, and dftotal
- By dividing each SS by its relevant df, we then arrive at MSwithin and MStreat (and MStotal)
- Then we divide MStreat by MSwithin to get our F-ratio, which we then use for hypothesis testing
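Those SS, df, MS, and F steps can be sketched end to end as one small function; the three-group data here is hypothetical, and this version also handles unequal group sizes:

```python
from statistics import mean

def one_way_anova(groups):
    """One-way ANOVA computed via the SS -> df -> MS -> F steps."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k, n_total = len(groups), len(all_scores)

    # Step 1: sums of squares
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    ss_treat = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = ss_total - ss_treat

    # Step 2: degrees of freedom
    df_treat, df_within = k - 1, n_total - k

    # Step 3: mean squares; Step 4: F-ratio
    ms_treat = ss_treat / df_treat
    ms_within = ss_within / df_within
    return ms_treat / ms_within

# Hypothetical scores for three groups
f = one_way_anova([[2, 4, 3], [6, 8, 7], [10, 12, 11]])
```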
Step 1 - Sums of Squares (SS)
- Recall that when calculating the variance, we talked about two formulas that could be used:
Thus, the sum of squares is simply a measure of the sum of the squared deviations of observations from some mean:
Back to our caffeine data:
SSwithin
- To get SSwithin, the first thing we need to do is calculate a ΣX² for each group.
Once we have them, we then calculate the sum of squares for each group using the computational formula:
SSwithin = SS1+SS2+SS3 = 781.67+538.92+656.92 = 1977.50
SStreat
- Calculating SStreat is easier since we only have to worry about the means. Basically, all we need are our three means and the squares of those means
SStotal
- The sum of squares total is simply the sum of squares of all of the data points, ignoring the fact that there are separate groups at all.
An easy way to get this is to just add up the ΣX and the ΣX² for the groups:
Degrees of Freedom
- OK, so now we have our three sums of squares; step two is to figure out the appropriate degrees of freedom for each.
Mean Squared Estimates
F-ratio
ANOVA source (or summary) tables
- Once all these values have been calculated, they are typically presented in what is called an analysis of variance source (or summary) table
The table for our data would look like this:
Source | df | SS | MS | F |
Treatment | 2 | 477.72 | 238.86 | 3.99 |
Within | 33 | 1977.50 | 59.92 | |
Total | 35 | 2455.22 |
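The internal arithmetic of a source table is easy to check: each MS is its SS divided by its df, the treatment and within rows sum to the total row, and F is MStreat over MSwithin. Using the caffeine table values above:

```python
# Values from the caffeine source table
ss_treat, df_treat = 477.72, 2
ss_within, df_within = 1977.50, 33

ms_treat = ss_treat / df_treat      # mean square for treatment
ms_within = ss_within / df_within   # mean square within (error)
f_ratio = ms_treat / ms_within

ss_total = ss_treat + ss_within     # SS rows add up to the total row
df_total = df_treat + df_within     # so do the df
```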
OK, now what?
If there is really no effect of caffeine on performance, what is the probability of observing an F-ratio as large as 3.99?
More specifically, is that probability less than our chosen level of alpha (e.g., .05)?
Note: Ignore sections 11.5 (computer solutions) and 11.6 (derivations of the analysis of variance)
Unequal Sample Size
- When the sample sizes in the various conditions are not equal, then the calculation of the SStreat must be done differently to take into account this inequality.
As an example, consider the stoned rats example described in the textbook on page 319
Rats are given either 0, 0.1, 0.5, 1, or 2 micrograms (μg) of THC and then their activity level is assessed. Does THC affect level of activity?
The entire dataset is presented on page 320 of the text. For our purposes, we need only know the following:
Thus, the grand mean is 45.96
SSwithin
However, since we already have the variances (which are simply SS/(n-1)), we could get the SSs by multiplying each variance by n-1
SSwithin = 10095.67
SStreat
We are going to calculate SStreat a little differently from before. Specifically, we are going to take each group mean, subtract the grand mean from it, square that difference, then multiply the result by the n for that group. We will then sum up all these weighted squared deviations.
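That weighted procedure can be sketched as follows. The group scores below are invented for illustration (they are not the textbook's THC data); note that the grand mean is the mean of all observations, not the mean of the group means:

```python
from statistics import mean

def ss_treat_unequal(groups):
    """SS_treat for unequal n: sum of n_j * (group mean_j - grand mean)^2."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)  # mean of every observation
    return sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

# Hypothetical unequal-n groups (n = 3, 4, and 2)
groups = [[50, 54, 52], [40, 44, 42, 46], [60, 62]]
ss_treat = ss_treat_unequal(groups)
```

With equal n this reduces to the earlier calculation, since every squared deviation gets the same weight.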
SStotal
The Rest - Source Table
From here, we can do the rest in a source table:
Source | df | SS | MS | F |
Treatment | 4 | 4192.55 | 1048.14 | 4.36 |
Within | 42 | 10095.67 | 240.37 | |
Total | 46 | 14288.22 |
- Assuming an alpha of .05, we compare the obtained F of 4.36 with the critical value of F(4, 42); since the obtained F exceeds the critical value, we reject the null hypothesis and conclude that THC affects activity level.
Violation of Assumptions
- The textbook discusses this issue in detail and offers a couple of solutions (including some really nasty formulae) for what to do when the variances of the groups are not homogeneous.
Transformations
Read through that section just to get an idea of the possibilities and why (and when) they might be used. Don't worry about the details.
Magnitude of Experimental Effect
We will discuss two ways of measuring this: Eta-Squared (η²) and Omega-Squared (ω²).
Eta-Squared (η²)
So, according to the PRE (proportional reduction in error) logic, if we had no idea which group a score was in, our best estimate of the value of that score would be the mean, and the error of the estimate would be reflected by SStotal.
However, when we know the group a subject is in, now our best estimate of their value would be the group mean, and the error in that estimate would be reflected by SSwithin.
Therefore, by knowing the treatment group a subject is in, we have reduced the error by an amount equal to SStotal - SSwithin. Since SStotal equals SSwithin plus SStreat, the difference between SStotal and SSwithin equals SStreat.
So, we have reduced the estimate of error by an amount equal to SStreat
If we then express this reduction in error in proportional form (and call it η²), we arrive at the following equation:
η² = SStreat / SStotal
Omega-Squared (ω²)
Applying it to the stoned rats data:
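Both measures follow directly from the source-table values. Here they are computed for the stoned rats table; the ω² formula is the usual one-way correction for η²'s positive bias:

```python
# Source-table values from the stoned rats example
ss_treat, ss_total = 4192.55, 14288.22
ms_error, k = 240.37, 5  # k = number of groups

# Eta-squared: proportion of total variability attributable to treatment
eta_sq = ss_treat / ss_total

# Omega-squared: corrects eta-squared's tendency to overestimate the effect
omega_sq = (ss_treat - (k - 1) * ms_error) / (ss_total + ms_error)
```

So roughly 29% (η²) or, less optimistically, 22% (ω²) of the variability in activity level is associated with THC dose.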
Power for One-Way ANOVAs
- Type I error is the probability of rejecting the null hypothesis when it is really true
Type II error is the probability of failing to reject a null hypothesis that is really false
The probability of making a type II error is denoted as β (beta)
Power is the probability of rejecting the null hypothesis, when it is indeed false (i.e., the probability of concluding the means come from different distributions when they really do) .. Power = 1 - β
- Assuming H0 is in fact true, then in terms of distributions:
- (Note: these distributions are very poor depictions of the shape of the F distribution; it is much more positively skewed)
So, even when H0 is in fact false (i.e., when the samples you have really do come from distributions with different means), there is still a chance that when you run an ANOVA, you will fail to detect a significant difference between the means.
Thus, power reflects your probability of finding a significant difference between the means when the samples really do come from distributions with different means.
See textbook example (p. 336) about the stoned rats experiment if this is still unclear to you
The text goes into some detail about estimating the mean of the F distribution if H0 really is false leading to equations like:
Calculating Power
Then you can calculate power via the following steps:
Step 1: Calculate φ′ (phi-prime)
Step 2: Convert φ′ to φ
Step 3: Get the associated β value from the noncentral F (ncF) table
Example
Group 1: 45 Group 2: 65 Group 3: 85
Further, say we expect the MSerror, based on previous studies, to be about 500
OK, we are planning to run 10 subjects per group, how much power would we have to reject the null if it were really false in the manner we expect?
Step 1: Calculate φ′
Step 2: Convert φ′ to φ
Step 3: Get the associated β value from the ncF table
Example Continued
Power = 1 - β = 1 - 0.11 = 0.89
Let's say we would have been happy with a power of around .70 to .75 - how many fewer subjects per group could we run?
Working backwards, if we still assume 2 & 20 dfs, then a β of .26 (power of 0.74) occurs when φ = 1.80
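Steps 1 and 2 for this planning example can be sketched as follows; step 3 (reading β from the ncF table) still has to be done by hand, since the table itself is in the textbook:

```python
import math

# Expected group means and error variance (from the planning example above)
means = [45, 65, 85]
ms_error = 500.0   # expected MS_error, based on previous studies
n = 10             # planned subjects per group

k = len(means)
grand = sum(means) / k
# Step 1: phi-prime, an effect size that does not depend on sample size
phi_prime = math.sqrt(sum((m - grand) ** 2 for m in means) / k / ms_error)
# Step 2: scale by sqrt(n) to get phi for the noncentral F table
phi = phi_prime * math.sqrt(n)
# Step 3 (manual): enter the ncF table with phi, df_treat = k - 1,
# and df_within = k * (n - 1) to read off beta; power = 1 - beta
```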