Chapter 11
Simple Analysis of Variance
T-Test reminder:
- During PsyB07 we talked about analyses that could be conducted to test whether pairs of means were significantly different.
We could now look at the exam marks for those students and compare the means of the two groups using a "between-subjects" (or independent samples) t-test.
The critical point of the previous example is the following:
- The basic logic for testing whether or not two means are different is to examine the size of the differences between the groups (which we assume is due to caffeine), relative to the differences within the groups (which we assume is due to random variation .. or error)
The measure of the effect (or treatment) is assessed by examining the variance (or difference) between the groups
- This exact logic underlies virtually all statistical tests, including analysis of variance, an analysis that allows us to compare multiple means simultaneously.
Other Review Tidbits
Central Limit Theorem: For any given sampling distribution, as n increases (1) the sampling distribution becomes more normal, (2) the mean of the sampling distribution approaches μ, and (3) the variance of the sampling distribution approaches σ²/n (and therefore the standard deviation approaches σ/√n)
For next week's quiz you should be able to (1) compute means, variances and standard deviations, (2) do an independent samples t-test, and (3) be able to demonstrate your understanding of sampling distributions and the central limit theorem.
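As a concrete refresher on skill (2), a pooled-variance independent-samples t-test can be written in a few lines. The exam-mark numbers below are made up for illustration, not data from the caffeine study:

```python
import math
from statistics import mean, variance  # variance() is the sample (n-1) variance

def independent_t(group1, group2):
    """Pooled-variance independent-samples t statistic."""
    n1, n2 = len(group1), len(group2)
    # Pool the two sample variances, weighting each by its degrees of freedom
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    # Standard error of the difference between the two means
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / se

# Hypothetical exam marks for a caffeine and a no-caffeine group
caffeine = [78, 82, 75, 90, 85]
no_caffeine = [70, 68, 74, 72, 71]
t = independent_t(caffeine, no_caffeine)
```

The obtained t would then be compared against the critical value of t with n1 + n2 - 2 degrees of freedom.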
Analysis of Variance (ANOVA) - Why?
In the caffeine study, for example, we were interested in only one variable, caffeine, and we examined two levels of that variable: no caffeine versus some caffeine.
Alternately, we might want to test different dosages of caffeine where each dosage would now be considered a "level" of caffeine.
As you'll see, as we look at more complicated ANOVAs (and the experimental designs associated with them) we may even be interested in multiple variables, each of which may have more than two levels.
For example, we might want to simultaneously consider the effect of caffeine (perhaps several different dose levels) and gender (generally just two levels) on test performance.
ANOVA - What?
Part I - A pictorial attempt at the logic
- To be filled in!
When it gets tricky:
ANOVA - Why?
Part 2 - A statistical attempt at the logic
- The textbook presents the logic in a more verbal/statistical manner, and it can't hurt to think of this in as many different ways as possible, so, in that style:
First of all, use of analysis of variance assumes that these groups have (1) data that is approximately normally distributed, (2) approximately equal variances, and (3) that the observations that make up each group are independent.
Given the first two assumptions, only the means can be different across the groups - thus, if the variable we are interested in is having an effect on performance, we assume it will do so by affecting the mean performance level.
Let's have some data, shall we?
"Error" estimate (MSwithin): One estimate we can generate makes no assumptions about the veracity (truth or falsity) of the null hypothesis
Specifically, the variance within each group provides an estimate of σ²
Given that we have assumed that all groups have the same variance (each of which provides an estimate of σ²), our best estimate of σ² would be the mean of the group variances.
Treatment estimate (MStreat): Alternatively, if we assume the null hypothesis is true (i.e., that there is no difference between the groups), then another way to estimate the population variance is to use the variance of the means across the groups
Recall that, by the central limit theorem, the variance of our sample means equals the population variance divided by n, where n equals the number of subjects in each group
Now, the logic!
The treatment will not affect MSwithin, only MStreat; therefore, by comparing these two estimates of the population variance, we can assess whether the treatment is having an effect:
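A minimal sketch of the two estimates just described, using hypothetical equal-n data (the group scores are invented for illustration):

```python
from statistics import mean, variance

def two_estimates(groups):
    """Return two estimates of the population variance, assuming equal n per group."""
    n = len(groups[0])
    # "Error" estimate: the mean of the within-group variances
    # (valid whether or not the null hypothesis is true)
    ms_within = mean(variance(g) for g in groups)
    # Treatment estimate: by the CLT, the variance of the group means
    # estimates sigma^2 / n under H0, so n * var(means) estimates sigma^2
    ms_treat = n * variance([mean(g) for g in groups])
    return ms_within, ms_treat

# Hypothetical scores for three equal-sized groups
groups = [[10, 12, 11, 13], [9, 11, 10, 12], [14, 16, 15, 17]]
ms_within, ms_treat = two_estimates(groups)
f_ratio = ms_treat / ms_within
```

When the null hypothesis is true the two estimates should be roughly equal (ratio near 1); a treatment effect inflates only the treatment estimate.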
Memory Reminder
- This is just to remind you (when you are reviewing your notes for an exam or whatever) that we spent a class with the following demo.
- I "randomly" sampled three groups of 5 people from the class and asked them their age. Since these groups all came from the same population, we have no reason to expect a difference in their mean ages. We took that data and calculated MSwithin and MStreat. As expected, these two estimates of σ² were approximately equal.
- I then added twenty years to each of the subjects in the third group (under the pretense that they were a night-school class). Now when we calculated MSwithin and MStreat, we found that MStreat got much larger whereas MSwithin didn't change.
- The purpose of that demo was to show that MStreat is sensitive to differences in the means whereas MSwithin is not, so the ratio of the two can be used to assess whether there is a difference between the means or not.
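The class demo can be simulated directly; the age range and seed here are arbitrary choices, not the actual classroom data:

```python
import random
from statistics import mean, variance

random.seed(1)
# Three groups of 5 "ages" sampled from the same population
groups = [[random.randint(18, 30) for _ in range(5)] for _ in range(3)]

def estimates(groups):
    n = len(groups[0])
    ms_within = mean(variance(g) for g in groups)
    ms_treat = n * variance([mean(g) for g in groups])
    return ms_within, ms_treat

before = estimates(groups)
# Pretend group 3 is a night-school class: add 20 years to each member
groups[2] = [age + 20 for age in groups[2]]
after = estimates(groups)
# MS_treat jumps because one group mean moved; MS_within is unchanged
# because adding a constant leaves each within-group variance alone
```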
Taking a Step Backwards
(Steve: note divergences from the text)
- In the computations we did for our demo and in our "Logic of ANOVA" section, we went straight to calculating estimates of MSwithin and MStreat.
In fact, there is nothing wrong with doing it this way and we could go on to calculate the ratio of these two things (which we will call an F-ratio).
However, for reasons that I will try to convince you are important, when we actually do ANOVAs we usually don't go straight to mean squared estimates, but instead take the following steps:
- Calculate SSwithin, SStreat, and SStotal
- Calculate dfwithin, dftreat, and dftotal
- By dividing each SS by its relevant df, we then arrive at MSwithin and MStreat (and MStotal)
- Then we divide MStreat by MSwithin to get our F-ratio, which we then use for hypothesis testing
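Those SS, df, MS, and F steps can be sketched end to end as one small function; the three-group data here is hypothetical, and this version also handles unequal group sizes:

```python
from statistics import mean

def one_way_anova(groups):
    """One-way ANOVA computed via the SS -> df -> MS -> F steps."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k, n_total = len(groups), len(all_scores)

    # Step 1: sums of squares
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    ss_treat = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = ss_total - ss_treat

    # Step 2: degrees of freedom
    df_treat, df_within = k - 1, n_total - k

    # Step 3: mean squares; Step 4: F-ratio
    ms_treat = ss_treat / df_treat
    ms_within = ss_within / df_within
    return ms_treat / ms_within

# Hypothetical scores for three groups
f = one_way_anova([[2, 4, 3], [6, 8, 7], [10, 12, 11]])
```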
Step 1 - Sums of Squares (SS)
- Recall that when calculating the variance, we talked about two formulas that could be used:
Thus, the sum of squares is simply a measure of the sum of the squared deviations of observations from some mean:
Back to our caffeine data:
SSwithin
- To get SSwithin, the first thing we need to do is calculate a ΣX² for each group.
Once we have them, we then calculate the sum of squares for each group using the computational formula:
SSwithin = SS1+SS2+SS3 = 781.67+538.92+656.92 = 1977.50
SStreat
- Calculating SStreat is easier since we only have to worry about the means. Basically, all we need are our three means and the squares of those means
SStotal
- The sum of squares total is simply the sum of squares of all of the data points, ignoring the fact that there are separate groups at all.
An easy way to get this is to just add up the ΣX and the ΣX² for the groups:
Degrees of Freedom
- OK, so now we have our three sums of squares; step two is to figure out the appropriate degrees of freedom for each.
Mean Squared Estimates
F-ratio
ANOVA source (or summary) tables
- Once all these values have been calculated, they are typically presented in what is called an analysis of variance source (or summary) table
The table for our data would look like this:
Source | df | SS | MS | F |
Treatment | 2 | 477.72 | 238.86 | 3.99 |
Within | 33 | 1977.50 | 59.92 | |
Total | 35 | 2455.22 |
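The internal arithmetic of a source table is easy to check: each MS is its SS divided by its df, the treatment and within rows sum to the total row, and F is MStreat over MSwithin. Using the caffeine table values above:

```python
# Values from the caffeine source table
ss_treat, df_treat = 477.72, 2
ss_within, df_within = 1977.50, 33

ms_treat = ss_treat / df_treat      # mean square for treatment
ms_within = ss_within / df_within   # mean square within (error)
f_ratio = ms_treat / ms_within

ss_total = ss_treat + ss_within     # SS rows add up to the total row
df_total = df_treat + df_within     # so do the df
```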
OK, now what?
If there is really no effect of caffeine on performance, what is the probability of observing an F-ratio as large as 3.99?
More specifically, is that probability less than our chosen level of alpha (e.g., .05)?
Note: Ignore sections 11.5 (computer solutions) and 11.6 (derivations of the analysis of variance)
Unequal Sample Size
- When the sample sizes in the various conditions are not equal, then the calculation of the SStreat must be done differently to take into account this inequality.
As an example, consider the stoned rats example described in the textbook on page 319
Rats are given either 0, 0.1, 0.5, 1, or 2 micrograms (μg) of THC and then their activity level is assessed. Does THC affect level of activity?
The entire dataset is presented on page 320 of the text. For our purposes, we need only know the following:
Thus, the grand mean is 45.96
SSwithin
However, since we already have the variances (which are simply SS/(n-1)), we could get the SSs by multiplying each variance by n-1
SSwithin = 10095.67
SStreat
We are going to calculate SStreat a little differently from before. Specifically, we are going to take each group mean, subtract the grand mean from it, square that difference, then multiply the result by the n for that group. We will then sum up all these weighted squared deviations.
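That weighted procedure can be sketched as follows. The group scores below are invented for illustration (they are not the textbook's THC data); note that the grand mean is the mean of all observations, not the mean of the group means:

```python
from statistics import mean

def ss_treat_unequal(groups):
    """SS_treat for unequal n: sum of n_j * (group mean_j - grand mean)^2."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)  # mean of every observation
    return sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

# Hypothetical unequal-n groups (n = 3, 4, and 2)
groups = [[50, 54, 52], [40, 44, 42, 46], [60, 62]]
ss_treat = ss_treat_unequal(groups)
```

With equal n this reduces to the earlier calculation, since every squared deviation gets the same weight.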
SStotal
The Rest - Source Table
From here, we can do the rest in a source table:
Source | df | SS | MS | F |
Treatment | 4 | 4192.55 | 1048.14 | 4.36 |
Within | 42 | 10095.67 | 240.37 | |
Total | 46 | 14288.22 |
- Assuming an alpha of .05, we compare the obtained F of 4.36 with the critical value of F(4, 42); since the obtained F exceeds the critical value, we reject the null hypothesis and conclude that THC affects activity level.
Violation of Assumptions
- The textbook discusses this issue in detail and offers a couple of solutions (including some really nasty formulae) for what to do when the variances of the groups are not homogeneous.
Transformations
Read through that section just to get an idea of the possibilities and why (and when) they might be used. Don't worry about the details.
Magnitude of Experimental Effect
We will discuss two ways of measuring this: Eta-Squared (η²) and Omega-Squared (ω²).
Eta-Squared (η²)
So, according to the PRE (proportional reduction in error) logic, if we had no idea which group a score was in, our best estimate of the value of that score would be the mean, and the error of the estimate would be reflected by SStotal.
However, when we know the group a subject is in, now our best estimate of their value would be the group mean, and the error in that estimate would be reflected by SSwithin.
Therefore, by knowing the treatment group a subject is in, we have reduced the error by an amount equal to SStotal - SSwithin. Since SStotal equals SSwithin plus SStreat, the difference between SStotal and SSwithin equals SStreat.
So, we have reduced the estimate of error by an amount equal to SStreat
If we then express this reduction in error in proportional form (and call it η²), we arrive at the following equation:
η² = SStreat / SStotal
Omega-Squared (ω²)
Applying it to the stoned rats data:
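Both measures follow directly from the source-table values. Here they are computed for the stoned rats table; the ω² formula is the usual one-way correction for η²'s positive bias:

```python
# Source-table values from the stoned rats example
ss_treat, ss_total = 4192.55, 14288.22
ms_error, k = 240.37, 5  # k = number of groups

# Eta-squared: proportion of total variability attributable to treatment
eta_sq = ss_treat / ss_total

# Omega-squared: corrects eta-squared's tendency to overestimate the effect
omega_sq = (ss_treat - (k - 1) * ms_error) / (ss_total + ms_error)
```

So roughly 29% (η²) or, less optimistically, 22% (ω²) of the variability in activity level is associated with THC dose.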
Power for One-Way ANOVAs
- Type I error is the probability of rejecting the null hypothesis when it is really true
Type II error is the probability of failing to reject a null hypothesis that is really false
The probability of making a type II error is denoted as β (beta)
Power is the probability of rejecting the null hypothesis, when it is indeed false (i.e., the probability of concluding the means come from different distributions when they really do) .. Power = 1 - β
- Assuming H0 is in fact true, then in terms of distributions:
- (Note: these distributions are very poor depictions of the shape of the F distribution; it is much more positively skewed)
So, even when H0 is in fact false (i.e., when the samples you have really do come from distributions with different means), there is still a chance that when you run an ANOVA, you will fail to detect a significant difference between the means.
Thus, power reflects your probability of finding a significant difference between the means when the samples really do come from distributions with different means.
See textbook example (p. 336) about the stoned rats experiment if this is still unclear to you
The text goes into some detail about estimating the mean of the F distribution if H0 really is false leading to equations like:
Calculating Power
Then you can calculate power via the following steps:
Step 1: Calculate φ′ (phi-prime)
Step 2: Convert φ′ to φ
Step 3: Get the associated β value from the noncentral F (ncF) table
Example
Group 1: 45 Group 2: 65 Group 3: 85
Further, say we expect the MSerror, based on previous studies, to be about 500
OK, we are planning to run 10 subjects per group, how much power would we have to reject the null if it were really false in the manner we expect?
Step 1: Calculate φ′
Step 2: Convert φ′ to φ
Step 3: Get the associated β value from the ncF table
Example Continued
Power = 1 - β = 1 - 0.11 = 0.89
Let's say we would have been happy with a power of around .70 to .75 - how many fewer subjects per group could we run?
Working backwards, if we still assume 2 & 20 dfs, then a β of .26 (power of 0.74) occurs when φ = 1.80
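Steps 1 and 2 for this planning example can be sketched as follows; step 3 (reading β from the ncF table) still has to be done by hand, since the table itself is in the textbook:

```python
import math

# Expected group means and error variance (from the planning example above)
means = [45, 65, 85]
ms_error = 500.0   # expected MS_error, based on previous studies
n = 10             # planned subjects per group

k = len(means)
grand = sum(means) / k
# Step 1: phi-prime, an effect size that does not depend on sample size
phi_prime = math.sqrt(sum((m - grand) ** 2 for m in means) / k / ms_error)
# Step 2: scale by sqrt(n) to get phi for the noncentral F table
phi = phi_prime * math.sqrt(n)
# Step 3 (manual): enter the ncF table with phi, df_treat = k - 1,
# and df_within = k * (n - 1) to read off beta; power = 1 - beta
```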