Test of Independence
The chi square test for independence
Two variables are independent if, for all cases in the sample, the classification of a case into a particular category of one variable has no effect on the probability that the case will fall into any particular category of the second variable.
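In probability terms, this definition can be written as below. The symbols \(A\) and \(B\) for the two variables, and \(a_{i}\), \(b_{j}\) for their categories, are notation introduced here for illustration and are not taken from the source.

\[ P(A = a_{i} \text{ and } B = b_{j}) = P(A = a_{i}) \cdot P(B = b_{j}) \quad \text{for every pair of categories } (i, j) \]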
To conduct a chi square test, the variables must first be organized into a bivariate table, also known as a contingency table.
A contingency table is used to investigate whether two traits or characteristics are related. Each observation is classified according to two criteria. We use the usual hypothesis testing procedure.
Your data must meet the following requirements:
We can use the chi-square statistic to formally test for a relationship between two nominal-scaled variables. To put it another way: is one variable independent of the other?
Ford Motor Company runs an assembly plant in Dearborn, Michigan. The plant operates three shifts per day, 5 days a week. The quality control manager wishes to compare the quality level on the three shifts. Vehicles are classified by quality level (acceptable, unacceptable) and shift (day, afternoon, night). Is there a difference in the quality level on the three shifts? That is, is the quality of the product related to the shift when it was manufactured? Or is the quality of the product independent of the shift on which it was manufactured?
A sample of 100 drivers, who were stopped for speeding violations, was classified by gender and whether or not the drivers were wearing a seat-belt when stopped. For this sample, is wearing a seat-belt related to gender?
Does a male released from federal prison make a different adjustment to civilian life if he returns to his hometown or if he goes elsewhere to live? The two variables are adjustment to civilian life and place of residence. Note that both variables are measured on the nominal scale.
The idea of independence can be seen in bivariate tables, also known as contingency tables.
You need to compute a test statistic:
Formula 1 for Chi Square (obtained - test statistic):
\[ \chi^{2} = \sum \frac{(O - E)^{2}}{E} \]
The Chi Square statistic is the sum, over all cells, of the squared difference between the obtained value and the expected value, with each squared difference divided by that cell's expected frequency.
The expected frequency for the cell in row \(i\) and column \(j\) is:
\[ E_{i,j} = \frac{(Row_{i}\ Total) * (Column_{j}\ Total)}{Grand\ Total} \]
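The two formulas above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library. The function names `expected_counts` and `chi_square` and the 2x2 table of counts are hypothetical, chosen for this sketch and not taken from the examples in the text.

```python
def expected_counts(observed):
    """Expected frequency of every cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Sum of (O - E)^2 / E over all cells of the table."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical observed counts (illustration only, not data from the text).
observed = [[10, 20],
            [30, 40]]

expected = expected_counts(observed)
statistic = chi_square(observed)
print(expected)
print(round(statistic, 4))
```

The table is stored as a list of rows, so the same functions work for any number of rows and columns, provided no expected frequency is zero.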
The following are situations where we can use the Chi-Square test:
The null hypothesis (\(H_{0}\)) and alternative hypothesis (\(H_{1}\)) of the Chi-Square Test of Independence can be expressed in two different but equivalent ways:
\(H_{0}\): “[Variable 1] is independent of [Variable 2]”
\(H_{1}\): “[Variable 1] is not independent of [Variable 2]”
\(H_{0}\): “[Variable 1] is not associated with [Variable 2]”
\(H_{1}\): “[Variable 1] is associated with [Variable 2]”
Therefore, when we reject the null hypothesis, we conclude that the two variables are associated. The test does not tell us how this association occurs between categories, and an association by itself does not show that one variable causes the other.
Suppose that the state legislature is considering a bill to lower the legal drinking age to 18. A political scientist is interested in whether there is a relationship between party affiliation and attitude toward the bill. A random sample of 150 registered Republicans and 200 registered Democrats are asked their opinion about the proposed bill.
Calculating the Expected Value for a Particular Cell
Ex: \(Cell_{11}\ =\ \frac{150 * 130}{350}\ =\ 55.7\)
Numbers in Black are obtained (\(f_{o}\))
Numbers in Purple are expected (\(f_{e}\))
The calculated value for the chi square statistic is compared to the critical value found in Table.
Note: The distribution of the Chi Square Statistic is not normal and the critical values are only on one side. If the obtained values are close to the expected values, then the chi square statistic will approach 0. As the obtained values move further from the expected values, the value of chi square will increase. This is reflected in the values found in Table.
The Degrees of Freedom for the Chi Square Test of Independence is the (number of rows minus 1) times the (number of columns minus 1).
\(\chi^{2}=5.62+0.27+3.11+4.22+0.20+2.33=15.75\)
In our study, we had two rows (Republicans and Democrats) and three columns (For, Undecided, Against). Therefore, the degrees of freedom for our study is (2-1)(3-1) = 1(2) = 2. Using an \(\alpha\) of .05, the critical value from Table would be 5.991. Since our calculated chi square of 15.75 exceeds this critical value, we reject the Null Hypothesis and conclude that there is a relationship between political party and opinion on lowering the drinking age.
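The arithmetic of this decision can be checked with a short Python sketch. The per-cell contributions, table dimensions, and significance level are the ones quoted in the text. The closed form used for the critical value is an assumption that holds only when the degrees of freedom equal 2, where the upper-tail probability of the chi square distribution is \(e^{-x/2}\). For other degrees of freedom, a table or a statistics library would be needed.

```python
import math

# Per-cell contributions (O - E)^2 / E as listed in the text.
contributions = [5.62, 0.27, 3.11, 4.22, 0.20, 2.33]
chi_square_obtained = sum(contributions)

# Two rows (party) and three columns (opinion).
rows, columns = 2, 3
degrees_of_freedom = (rows - 1) * (columns - 1)

alpha = 0.05
# Assumption: valid only for df = 2, where P(X > x) = exp(-x / 2),
# so the critical value solves exp(-x / 2) = alpha.
critical_value = -2 * math.log(alpha)

reject_null = chi_square_obtained > critical_value
print(f"chi-square = {chi_square_obtained:.2f}, df = {degrees_of_freedom}")
print(f"critical value = {critical_value:.3f}, reject H0: {reject_null}")
```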
In the sample dataset, respondents were asked their gender and whether or not they were a cigarette smoker. There were three answer choices: Nonsmoker, Past smoker, and Current smoker. Suppose we want to test for an association between smoking behavior (nonsmoker, current smoker, or past smoker) and gender (male or female) using a Chi-Square Test of Independence (we’ll use α = 0.05).
Hypothesis:
\(H_{0}\): Gender is not associated with Smoking
\(H_{1}\): Gender is associated with Smoking
BEFORE THE TEST
RUNNING THE TEST
Syntax
NOTE: Refer to Tab 6 for output tables
Decision
Since the p-value is greater than our chosen significance level (\(\alpha\) = 0.05), we do not reject the null hypothesis. Rather, we conclude that there is not enough evidence to suggest an association between gender and smoking.
Based on the results, we can state the following:
- No association was found between gender and smoking behavior (\(\chi^{2}(2)\) = 3.171, \(p\) = 0.205).
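The reported p-value can be reproduced by hand as a sanity check. The sketch below uses only the Python standard library and relies on an assumption that applies specifically to 2 degrees of freedom: the upper-tail probability of the chi square distribution is then \(e^{-x/2}\). The variable names are illustrative.

```python
import math

# Values reported in the text for the gender-by-smoking test.
chi_square_obtained = 3.171
degrees_of_freedom = 2
alpha = 0.05

# Assumption: closed form valid only for df = 2, where P(X > x) = exp(-x / 2).
p_value = math.exp(-chi_square_obtained / 2)

reject_null = p_value < alpha
print(f"p = {p_value:.3f}, reject H0: {reject_null}")
```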
Solution