# Controlling for a Variable in Chi-Square Analysis

Suppose you obtain the following table, which relates the number of drownings per day to the amount of ice cream eaten per day (the sample is 200 days chosen at random from a given year):

| Ice cream eaten \ Drownings | Few | Many | Total |
|---|---|---|---|
| Little | 60 | 40 | 100 |
| Lots | 40 | 60 | 100 |
| Total | 100 | 100 | 200 |

chi-square = 8, df = 1, p < 0.005

| Ice cream eaten \ Drownings | Few | Many | Total | Proportion |
|---|---|---|---|---|
| Little | 60 | 40 | 100 | .50 |
| Lots | 40 | 60 | 100 | .50 |
| Total | 100 | 100 | 200 | |
| Proportion | .50 | .50 | 1.00 | |

#### Calculating Expected Frequencies (Assuming Independence)

| Ice cream eaten \ Drownings | Few | Many | Total |
|---|---|---|---|
| Little | .5 × .5 × 200 = 50 | .5 × .5 × 200 = 50 | 100 |
| Lots | .5 × .5 × 200 = 50 | .5 × .5 × 200 = 50 | 100 |
| Total | 100 | 100 | 200 |
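Each expected count is the row proportion times the column proportion times the grand total, which is the same as row total × column total ÷ grand total. A minimal Python sketch of that calculation (the helper name `expected_counts` is our own, hypothetical choice):

```python
def expected_counts(observed):
    """Expected cell counts under independence.

    Computed as row total * column total / grand total, which equals
    row proportion * column proportion * grand total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: ice cream eaten (Little, Lots); columns: drownings (Few, Many).
observed = [[60, 40], [40, 60]]
expected = expected_counts(observed)
print(expected)  # [[50.0, 50.0], [50.0, 50.0]]
```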

#### Differences (Observed Minus Expected)

| Ice cream eaten \ Drownings | Few | Many | Total |
|---|---|---|---|
| Little | 10 | -10 | 0 |
| Lots | -10 | 10 | 0 |
| Total | 0 | 0 | 0 |

#### Squared Differences

| Ice cream eaten \ Drownings | Few | Many | Total |
|---|---|---|---|
| Little | 100 | 100 | 200 |
| Lots | 100 | 100 | 200 |
| Total | 200 | 200 | 400 |

#### Squared Differences Divided by Expected

| Ice cream eaten \ Drownings | Few | Many | Total |
|---|---|---|---|
| Little | 2 | 2 | 4 |
| Lots | 2 | 2 | 4 |
| Total | 4 | 4 | 8 |
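The three intermediate tables (differences, squared differences, and squared differences divided by expected) can be reproduced cell by cell. A short sketch, taking the observed and expected counts from the tables above:

```python
# Rows: ice cream eaten (Little, Lots); columns: drownings (Few, Many).
observed = [[60, 40], [40, 60]]
expected = [[50, 50], [50, 50]]

# Observed minus expected, cell by cell.
differences = [[o - e for o, e in zip(orow, erow)]
               for orow, erow in zip(observed, expected)]
# Square each difference.
squared = [[d * d for d in row] for row in differences]
# Divide each squared difference by its expected count.
contributions = [[s / e for s, e in zip(srow, erow)]
                 for srow, erow in zip(squared, expected)]
# The chi-square statistic is the sum of all cell contributions.
chi_square = sum(sum(row) for row in contributions)

print(differences)    # [[10, -10], [-10, 10]]
print(squared)        # [[100, 100], [100, 100]]
print(contributions)  # [[2.0, 2.0], [2.0, 2.0]]
print(chi_square)     # 8.0
```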

So the chi-square value is the sum of the four cells, 8, with (2 − 1) × (2 − 1) = 1 degree of freedom. Looking this up in a chi-square table, we find that the p-value (the significance) is < 0.005, well below the usual 0.05 cutoff, so the association is significant.
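Instead of a printed table, the p-value for 1 degree of freedom can be computed directly. This sketch relies on the fact that a chi-square variable with 1 df is the square of a standard normal variable, so only the standard library's `math.erfc` is needed. The function name is hypothetical, and the formula does not apply to other degrees of freedom.

```python
import math


def chi_square_p_value_df1(statistic):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    With 1 df the statistic is Z**2 for a standard normal Z, so
    P(X >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2))


rows, cols = 2, 2
df = (rows - 1) * (cols - 1)
p = chi_square_p_value_df1(8.0)
print(df, round(p, 4))  # 1 0.0047
```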

So ice cream consumption is related to drownings. The question is: why? Your theory is that the relationship exists because of a third variable, the temperature of the day. On hot days, people buy more ice cream, and they also go swimming, which results in some drownings. In other words, temperature is a common cause of both:

Ice cream eaten ← Temperature → Drownings

To test this theory, you need to control for temperature. According to the theory, if you consider only hot days, there should be no relationship between ice cream and drownings. Similarly, if you consider only cold days, there should again be no relationship. The reason is that, according to the theory, the only connection between ice cream and drownings runs through temperature.

So you create two separate tables relating ice cream to drownings, one for hot days and one for cold days:

HOT DAYS

| Ice cream eaten \ Drownings | Few | Many | Total |
|---|---|---|---|
| Little | 10 | 20 | 30 |
| Lots | 20 | 50 | 70 |
| Total | 30 | 70 | 100 |

Chi-square = 0.227; df = 1; p ≈ 0.63

COLD DAYS

| Ice cream eaten \ Drownings | Few | Many | Total |
|---|---|---|---|
| Little | 50 | 20 | 70 |
| Lots | 20 | 10 | 30 |
| Total | 70 | 30 | 100 |

Chi-square = 0.227; df = 1; p ≈ 0.63
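Both stratum statistics can be recomputed with the same procedure. The sketch below uses the plain Pearson statistic without a continuity correction, which is what reproduces the 0.227 reported above; `chi_square_2x2` is a hypothetical helper, and the two strata are named by their dominant cells rather than by temperature.

```python
import math


def chi_square_2x2(observed):
    """Pearson chi-square (no continuity correction) and its df = 1 p-value."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected under independence
            stat += (o - e) ** 2 / e
    # With 1 df, P(X >= x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# Rows: ice cream eaten (Little, Lots); columns: drownings (Few, Many).
mostly_little_few = [[50, 20], [20, 10]]  # stratum dominated by little ice cream / few drownings
mostly_lots_many = [[10, 20], [20, 50]]   # stratum dominated by lots of ice cream / many drownings

for table in (mostly_little_few, mostly_lots_many):
    stat, p = chi_square_2x2(table)
    print(round(stat, 3), round(p, 2))  # 0.227 0.63 for both strata
```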

Notice that if you add the corresponding cells of the two tables, you obtain the original table for all days. Notice also that for each table, the chi-square test is non-significant. This supports your theory: when you control for temperature, the apparent relationship between ice cream and drownings disappears. The only reason you thought there was a relationship at all is that you were mixing hot and cold days together.
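The additivity is easy to confirm by summing the two stratum tables cell by cell; a short sketch:

```python
# Rows: ice cream eaten (Little, Lots); columns: drownings (Few, Many).
stratum_a = [[50, 20], [20, 10]]
stratum_b = [[10, 20], [20, 50]]

# Add corresponding cells of the two temperature strata.
pooled = [[a + b for a, b in zip(row_a, row_b)]
          for row_a, row_b in zip(stratum_a, stratum_b)]
print(pooled)  # [[60, 40], [40, 60]], the original all-days table
```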

To test your theory further, you also have to check each individual link in the model. For example, the theory says that temperature determines ice cream consumption, which implies that a chi-square test of the relationship between temperature and ice cream eaten will be significant. The theory also says that temperature determines drownings, so you need to check that those two variables are significantly associated as well.
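Both links can be tested from the stratified tables alone: collapsing each temperature stratum to its ice cream totals gives a temperature × ice cream table, and collapsing it to its drowning totals gives a temperature × drownings table. A sketch under the same assumptions as before (the helper name is hypothetical; the order in which the strata are listed does not change the statistic):

```python
import math


def chi_square_2x2(observed):
    """Pearson chi-square (no continuity correction) and its df = 1 p-value."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            stat += (o - e) ** 2 / e
    return stat, math.erfc(math.sqrt(stat / 2))


# One ice cream x drownings table per temperature level.
strata = [[[50, 20], [20, 10]], [[10, 20], [20, 50]]]

# Temperature x ice cream: each stratum's row totals (Little, Lots).
temp_by_icecream = [[sum(row) for row in s] for s in strata]
# Temperature x drownings: each stratum's column totals (Few, Many).
temp_by_drownings = [[sum(col) for col in zip(*s)] for s in strata]

for table in (temp_by_icecream, temp_by_drownings):
    stat, p = chi_square_2x2(table)
    print(table, stat, p < 0.05)  # [[70, 30], [30, 70]] 32.0 True, for both links
```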

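So let me summarize. If the theory has either of the following forms, where B is the variable that links A and C,

- A ← B → C (B causes both A and C, as temperature does here)
- A → B → C (A causes B, which in turn causes C)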

then to test it you do the following steps:

1. Confirm that A and C are significantly associated.
2. Confirm that A and B are significantly associated.
3. Confirm that B and C are significantly associated.
4. Confirm that A and C are NOT significantly associated WHEN you control for B (i.e., when you hold B constant by examining the A and C relationship separately within each level of B).
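The four steps can be bundled into a single check. This is a sketch, not a standard routine: it assumes A, B, and C are each dichotomous, takes one A-by-C table per level of B, uses an assumed significance level of 0.05, and reuses the df = 1 p-value formula, which is valid only for 2 × 2 tables. All names are hypothetical.

```python
import math

ALPHA = 0.05  # assumed significance level


def chi_square_2x2(observed):
    """Pearson chi-square (no continuity correction) and its df = 1 p-value."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            stat += (o - e) ** 2 / e
    return stat, math.erfc(math.sqrt(stat / 2))


def check_third_variable_theory(strata, alpha=ALPHA):
    """Run the four checks for a theory in which B links A and C.

    `strata` holds two 2x2 tables, one per level of B; in each table
    rows are the levels of A and columns are the levels of C.
    """
    # Step 1 table: A x C, ignoring B (sum the strata cell by cell).
    pooled = [[sum(s[i][j] for s in strata) for j in range(2)] for i in range(2)]
    # Step 2 table: B x A (each stratum's row totals).
    a_by_b = [[sum(row) for row in s] for s in strata]
    # Step 3 table: B x C (each stratum's column totals).
    b_by_c = [[sum(col) for col in zip(*s)] for s in strata]

    def significant(table):
        return chi_square_2x2(table)[1] < alpha

    return {
        "1. A and C associated": significant(pooled),
        "2. A and B associated": significant(a_by_b),
        "3. B and C associated": significant(b_by_c),
        "4. A and C not associated within B": not any(significant(s) for s in strata),
    }


# A = ice cream eaten, C = drownings, one table per temperature level (B).
strata = [[[50, 20], [20, 10]], [[10, 20], [20, 50]]]
results = check_third_variable_theory(strata)
for step, passed in results.items():
    print(step, passed)  # every step prints True for this example
```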