Wednesday, December 21, 2011

Statistics Chapter 6


Chapter 6

Categorical Data and Chi-Square




Reminder about hypothesis testing:

  • Assume what you believe (H1) is wrong
    • Construct H0 and accept it as a default
  • Show that some event is of sufficiently low probability given H0 ***
  • Reject H0
  • *** In order to do this, we need to know the distribution associated with H0, because we use that distribution as the basis for our probability calculation.

Z-Test:

    Use when we have acquired some dataset, then want to ask questions concerning the probability of certain specific data values (e.g., do certain values seem extreme?)
    In this case, the distribution associated with H0 is described by X̄ and S² because the data points reflect a continuous variable that is normally distributed.
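As a minimal sketch of this kind of question, assuming a normally distributed variable (the specific score, mean, and SD below are made-up values for illustration, not from the notes):

```python
# Sketch of a one-sample z-test step using only the standard library.
# The score (130), mean (100), and SD (15) are made-up example values.
import math

def z_score(x, mean, sd):
    return (x - mean) / sd

def normal_cdf(z):
    # Standard normal CDF via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

z = z_score(130, 100, 15)
p_upper = 1 - normal_cdf(z)   # P(Z >= z) given H0
print(round(z, 2), round(p_upper, 4))   # 2.0 0.0228
```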

Binomial Test:

    Use when we know the probability that some two-alternative event will occur (assuming H0), and want to ask whether some specific observed outcome seems bizarre, given this probability
    In this case, the distribution associated with H0 can be derived using the binomial formula, p(X) = C(N, X) p^X (1 - p)^(N - X), because the data reflect a discrete two-alternative variable.
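A sketch of this logic in Python, using the batter example that comes up again below (assuming H0: P(Hit) = .30 and asking how likely 6 or more hits in 10 at bats would be):

```python
# Sketch of a binomial test: probability of an outcome at least this
# extreme, given H0 that P(success) = .30 on each of 10 trials.
from math import comb

def binom_pmf(k, n, p):
    # P(exactly k successes in n trials)
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_upper = sum(binom_pmf(k, 10, 0.30) for k in range(6, 11))
print(round(p_upper, 4))   # 0.0473
```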

Chi-Square (χ²) Test:

    The chi-square test is a general-purpose test for use with discrete variables
    It has a number of uses, including the detection of bizarre outcomes given some a priori probability, for both binomial and multinomial situations
    In addition, it allows us to go beyond questions of bizarreness, and move into the question of whether pairs of variables are related. For example:
             Legalize   Do Not Legalize
    Female       9             23
    Male         9              7
    It does so by mapping the discrete variables onto a continuous distribution assuming H0, the chi-square distribution

The Chi-Square Distribution:

    Let's reconsider a simple binomial problem. Say we have a batter who hits .300 [i.e., P(Hit) = 0.30], and we want to know whether it is abnormal for him to go 6 for 10 (i.e., 6 hits in 10 at bats)
    Hopefully, you know how to do this using a binomial test
    A different way is to put the values into a contingency table as follows,

                 Hits   Outs
    Observed       6      4
    Expected?

    then consider the distribution of the following formula given H0:

    χ² = Σ [(O - E)² / E]

In-Class Example:

    Attempt   Expected (E)   Observed (O)   (O - E)² / E
       1         3 / 7            /
       2         3 / 7            /
       3         3 / 7            /
       4         3 / 7            /
       5         3 / 7            /
       6         3 / 7            /
       7         3 / 7            /
       8         3 / 7            /
       9         3 / 7            /
      10         3 / 7            /
    Note that while the observed values are discrete, the derived score is continuous.
    If we calculated enough of these derived scores, we could plot a frequency distribution which would be a chi-square distribution with 1 degree of freedom, or χ²(1).
    Given this distribution and appropriate tables, we can then find the probability associated with any particular χ² value.
Continuing the Baseball Example:
                 Hits   Outs
    Observed       6      4
    Expected       3      7

    χ² = (6 - 3)²/3 + (4 - 7)²/7
       = 9/3 + 9/7
       = 3.00 + 1.29
       = 4.29
    So, if the probability of obtaining a χ² of 4.29 or greater is less than α, then the observed outcome can be considered bizarre (i.e., the result of something other than a .300 hitter getting lucky).
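The calculation above can be sketched in a couple of lines of Python:

```python
# Chi-square goodness-of-fit for the 6-for-10 example:
# observed 6 hits / 4 outs vs. expected 3 / 7 for a .300 hitter.
observed = [6, 4]
expected = [3, 7]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_sq, 2))   # 4.29
```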

The χ² Table and Degrees of Freedom:

    There is one hitch to using the chi-square distribution when testing hypotheses ... the chi-square distribution is different for different numbers of degrees of freedom (df)
    This means that in order to provide the areas associated with all values of χ² for some number of df, we would need a complete table like the z-table for each level of df
    Instead of doing that, the χ² table only shows critical values, as Steve will now illustrate using the funky new overhead thingy
    Our example question has 1 df. Assuming we are using an α level of .05, the critical χ² value for rejecting the null is 3.84
    Thus, since our obtained χ² value of 4.29 is greater than 3.84, we can reject H0 and assume that hitting 6 of 10 reflects more than just chance performance

Going a Step Further:

    Suppose we complicate the previous example by taking walks and hit by pitches into account. That is, suppose the average batter gets a hit with a probability of 0.28, gets walked with a probability of .08, gets hit by a pitch (HBP) with a probability of .02, and gets out the rest of the time
    Now we ask, can you reject H0 (that this batter is typical of the average batter) given the following outcomes from 50 at bats?
                 Hit   Walk   HBP   Out
    Observed      12      3     8    27
    Expected
      1) Calculate the expected values (Np)
      2) Calculate χ²
      3) Figure out the appropriate df (C - 1)
      4) Find the critical χ² and compare the obtained χ² to it
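The four steps above can be sketched as follows (only the comparison against the tabled critical value is left to the reader):

```python
# Sketch of the four steps for the 50-at-bat example
# (probabilities .28, .08, .02, and .62 for Hit, Walk, HBP, Out).
observed = [12, 3, 8, 27]
probs = [0.28, 0.08, 0.02, 0.62]
n = 50

expected = [n * p for p in probs]                  # step 1: Np
chi_sq = sum((o - e) ** 2 / e
             for o, e in zip(observed, expected))  # step 2
df = len(observed) - 1                             # step 3: C - 1
# step 4: compare chi_sq to the tabled critical value
# (for df = 3 and alpha = .05, that value is 7.81)
print([round(e, 2) for e in expected], round(chi_sq, 2), df)
```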

Using Chi-Square to test Independence:

    So far, all the tests have been to assess whether some observation or set of observations seems out-of-line with some expected distribution
    However, the logic of the chi-square test can be extended to examine the issue of whether two variables are independent (i.e., not systematically related) or dependent (i.e., systematically related)
    Consider the following data set again:
             Legalize   Do Not Legalize
    Female       9             23
    Male         9              7
    Are the variables of gender and opinion concerning the legalization of marijuana independent?
             Legalize   Do Not Legalize   Total
    Female       9             23           32
    Male         9              7           16
    Total       18             30           48
    From the marginal totals we can calculate:
          P(Female) = 32/48 = 0.667
          P(Male) = 16/48 = 0.333
          P(Legalize) = 18/48 = 0.375
          P(Do Not Legalize) = 30/48 = 0.625
    If these two variables are independent, then by the multiplicative law, we expect that:
          P(Female, Legalize) = P(Female) x P(Legalize)
          = .667 x .375
          = .25
          EV(Female, Legalize) = Np
          = 48 x .25 = 12
    If we do this for all four cells, we get:

             Legalize          Do Not Legalize     Total
    Female    9 (Expect: 12)    23 (Expect: 20)      32
    Male      9 (Expect:  6)     7 (Expect: 10)      16
    Total    18                 30                    48
    Are the observed values different enough from the expected values to reject the notion that the differences are due to chance variation?

    χ² = (9 - 12)²/12 + (23 - 20)²/20 + (9 - 6)²/6 + (7 - 10)²/10
       = 0.75 + 0.45 + 1.50 + 0.90
       = 3.60
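A sketch of the whole independence test, computing each expected count directly from the marginal totals:

```python
# Chi-square test of independence for the legalization data:
# expected counts come from the marginals, then the usual sum.
table = [[9, 23],   # Female: Legalize, Do Not Legalize
         [9, 7]]    # Male

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(table):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / n   # N x P(row) x P(col)
        chi_sq += (obs - exp) ** 2 / exp
print(round(chi_sq, 2))
```

Note that 32/48 is 0.667 (not 0.750), so the expected counts come out to 12, 20, 6, and 10, and the obtained χ² is 3.60.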

Degrees of Freedom for Two-Variable Contingency Tables

    The df associated with 2 variable contingency tables can be calculated using the formula:
            df = (C-1)(R-1)
    where C is the number of columns and R is the number of rows
    This gives the seemingly odd result that a 2x2 table has 1 df, just like the simple binomial version of the chi-square test
    However, as Steve will now show, this actually makes sense
    Thus, to finish our previous example, the critical χ² with α equal to .05 and 1 df equals 3.84. Since our obtained χ² of 3.60 is smaller than that, we cannot reject H0; in this sample, the difference in opinions concerning the legalization of marijuana across males and females is not large enough to rule out chance variation

Assumptions of Chi-Square:

    Independence of observations
    Chi-square analyses are only valid when the actual observations within the cells are independent
    This independence of observations is a different issue from whether the variables themselves are independent; the latter is what the chi-square test assesses
    You know your observations are not independent when the grand total is larger than the number of subjects
    Example: The activity level of 5 rats was tested over 4 days, producing these values:

               Activity
    Low   Medium   High
     3       7      10
    Normality
    Use of the chi-square distribution for finding critical values assumes that the expected values (i.e., Np) are normally distributed
    This assumption breaks down when the expected values are small (specifically, the distribution of Np becomes more and more positively skewed as Np gets small)
    Thus, one should be cautious using the chi-square test when the expected values are small
    How small? This is debatable but if expected values are as low as 5, you should be worried
    Inclusion of Non-Occurrences
    The chi-square test assumes that all outcomes (occurrences and non-occurrences) are considered in the contingency table
    As an example of a failure to include a non-occurrence, see page 142 of the text

A Tale of Tails

    We only reject H0 when values of the obtained χ² are larger than the critical χ²
    This suggests that the χ² test is always one-tailed and, in terms of the rejection region, it is
    In a different sense, however, the test is actually multiple-tailed
    Reconsider the following "marking scheme" example:
    Option 1   Option 2   Option 3
       38         57          5
    If we do not specify how we expect the results to fall out, then any outcome with a high enough χ² can be used to reject H0
    However, if we specify our outcome, we are allowed to increase our α; in the example, we can increase α to 0.30 if we specified, in advance, the exact ordering that was observed
Measures of Association
    The chi-square test only tells us whether two variables are independent; it does not say anything about the magnitude of the dependency if one is found to exist
    Stealing from the book, consider the following two cases, both of which produce a significant χ², but which imply different strengths of relation

              Smoking Behaviour
              Nonsmoker   Smoker
    Male         400        100
    Female       350        150

              Primary Food Shopper
              Yes    No
    Male      400   100
    Female    100   400

Cramer's Phi (φC) - a measure of association

    There are a number of ways to quantify the strength of a relation (see the sections in the text on the contingency coefficient, Phi, & odds ratios), but the two most relevant to psychologists are Cramer's Phi and Kappa
    Cramer's Phi can be used with any contingency table and is calculated as:

    φC = sqrt( χ² / (N(k - 1)) ), where k is the smaller of the number of rows and columns

    Values of φC range from 0 to 1. The φC for the tables on the previous page are 0.12 and 0.60 respectively, indicating a much stronger relation in the second example
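The two φC values quoted above can be reproduced directly from the cell counts:

```python
# Cramer's phi for the smoking and food-shopper tables:
# chi-square is computed from the cells, then scaled by N(k - 1).
from math import sqrt

def chi_square(table):
    row_t = [sum(r) for r in table]
    col_t = [sum(c) for c in zip(*table)]
    n = sum(row_t)
    chi_sq = sum((obs - row_t[i] * col_t[j] / n) ** 2
                 / (row_t[i] * col_t[j] / n)
                 for i, r in enumerate(table)
                 for j, obs in enumerate(r))
    return chi_sq, n

def cramers_phi(table):
    chi_sq, n = chi_square(table)
    k = min(len(table), len(table[0]))   # smaller of R and C
    return sqrt(chi_sq / (n * (k - 1)))

smoking = [[400, 100], [350, 150]]
shopping = [[400, 100], [100, 400]]
print(round(cramers_phi(smoking), 2), round(cramers_phi(shopping), 2))
```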

Kappa (k) - a measure of agreement

    Often, in psychology, we will ask some "judge" to categorize things into specific categories
    For example, imagine a beer brewing competition where we asked a judge to categorize beers as Yucky, OK, or Yummy
    Obviously, we are eventually interested in knowing something about the beers after they are categorized
    However, one issue that arises is the judges' ability to tell the difference between the beers
    One way around this is to get two judges and show that a given beer is reliably rated across the judges (i.e., that both judges tend to categorize things in a similar way)
    Such a finding would suggest that the judges are sensitive to some underlying quality of the beers as opposed to just guessing
                       Judge 1 (Steve)
                     Yuck!   OK   Yummy
    Judge 2  Yuck!     3      2     3
             OK        1     15     2
             Yummy     0      1     3
    Note that if you just looked at the proportion of decisions that Judge 2 and I agreed on, it looks like we are doing OK:
          P(Agree) = 21/30 = 0.70 or 70%
    There is a problem here, however, because both judges are biased to judge a beer as OK such that even if they were guessing, the agreement would seem high because both would guess OK on a lot of trials and would therefore agree a lot
    Solution:
    κ = (Σ O_diagonal - Σ E_diagonal) / (N - Σ E_diagonal)

    where the O's are the observed counts of agreement (the diagonal cells) and the E's are the counts of agreement expected by chance (row total x column total / N for each diagonal cell)
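Applied to the judges table above, a sketch of the kappa calculation looks like this:

```python
# Cohen's kappa for the beer-judging table: observed agreement
# (the diagonal) corrected for agreement expected by chance.
ratings = [[3, 2, 3],    # Judge 2: Yuck!, split by Judge 1's rating
           [1, 15, 2],   # Judge 2: OK
           [0, 1, 3]]    # Judge 2: Yummy

row_t = [sum(r) for r in ratings]
col_t = [sum(c) for c in zip(*ratings)]
n = sum(row_t)

obs_agree = sum(ratings[i][i] for i in range(3))            # 21 of 30
exp_agree = sum(row_t[i] * col_t[i] / n for i in range(3))  # chance
kappa = (obs_agree - exp_agree) / (n - exp_agree)
print(round(kappa, 2))
```

Kappa comes out well below the raw 70% agreement, reflecting the correction for both judges' bias toward "OK".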
