MCB422: Genetics Problem Solving

Chi can be hard to swallow!

The chi square test is designed to test the statistical significance of an experimental outcome. We often use the coin-flipping example, but probability applies to many other situations. For example, let’s say you are interested in the inheritance of small vs. large antlers in a population of Canadian moose. You notice that large antlers appear to be dominant to small antlers (let’s assume antler size is encoded by a single gene). You’d like to test whether this gene is inherited in a Mendelian fashion. In a cross between two heterozygotes, you expect a 3:1 ratio of large-antlered to small-antlered offspring, and you set out to test this expectation. Before you even count your actual numbers, you state a null hypothesis for your experiment.

The null hypothesis is a statistical term for the standard against which your model or idea will be compared. In a genetic cross, the null hypothesis is that the gene(s) under scrutiny are inherited in a Mendelian fashion and that any difference between your counts and the expected numbers is small enough to have arisen by chance. Therefore, you should ALWAYS draw your cross before doing the experiment (make sure you know the parental genotypes) and write down what you expect. Then, and only then, count a predetermined number of offspring and make your comparisons.
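To make the "draw your cross first" step concrete, here is a minimal Python sketch that enumerates a one-gene Punnett square and tallies the phenotypes. The allele symbols `A` (large, dominant) and `a` (small, recessive) and the function name `punnett_phenotypes` are hypothetical, chosen only for illustration; the sketch assumes a single gene with complete dominance, as in the moose example.

```python
from collections import Counter
from itertools import product


def punnett_phenotypes(parent1: str, parent2: str) -> Counter:
    """Tally offspring phenotypes for one gene with complete dominance.

    Hypothetical allele symbols: "A" = large antlers (dominant),
    "a" = small antlers (recessive). Each parent is a two-letter genotype.
    """
    counts = Counter()
    # Each parent passes on one of its two alleles with equal probability,
    # so every gamete pairing is one equally likely box of the Punnett square.
    for allele1, allele2 in product(parent1, parent2):
        phenotype = "large" if "A" in (allele1, allele2) else "small"
        counts[phenotype] += 1
    return counts


# Heterozygote x heterozygote: write down the expectation before counting.
print(punnett_phenotypes("Aa", "Aa"))  # Counter({'large': 3, 'small': 1})
```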

OK, you have somehow engineered matings between Canadian moose that are heterozygous for antler size. When you count up their offspring, this is what you find:

Moose with large antlers: 134
Moose with small antlers: 66

Based on these numbers, it looks pretty close to Mendelian inheritance, right? It may look that way, but we can’t jump to conclusions. We must first determine whether the deviation from the expected numbers is significant. Judging significance by eye is very subjective, so statisticians came up with the chi square (χ²) test. The chi square test lets the experimenter decide whether something other than chance is likely to have shaped the results. At the end of the chi square test, we will either retain or reject the null hypothesis. If we retain the null hypothesis, we can say these results could have been caused by chance alone (most scientists prefer to call this an “inconclusive result”). If we reject the null hypothesis, we can say that it is unlikely that chance alone produced these numbers and, in our moose scenario, that some non-Mendelian force is probably at work. Once you have rejected the null hypothesis, you are allowed to entertain (though you have not proven) an alternate hypothesis that would better explain the numbers you saw.

So without getting really involved in the derivation of the chi square test, here is what statisticians came up with:

$$\chi^2 = \sum_{\text{all classes}} \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

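With 200 offspring in total, a 3:1 ratio predicts 3/4 × 200 = 150 large-antlered and 1/4 × 200 = 50 small-antlered moose. Applying the equation to these values gives a chi square value: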

$$\chi^2 = \frac{(134 - 150)^2}{150} + \frac{(66 - 50)^2}{50} \approx 1.7 + 5.1 = 6.8$$
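The same arithmetic can be written as a short Python sketch. The function names `expected_counts` and `chi_square` are hypothetical helpers, not part of any library; the predicted ratio is passed as whole-number parts (3 and 1 for a 3:1 ratio).

```python
def expected_counts(total: int, ratio: list[float]) -> list[float]:
    """Split the total number of offspring according to the predicted ratio."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


def chi_square(observed: list[float], expected: list[float]) -> float:
    """Sum of (observed - expected)^2 / expected over all phenotypic classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one entry per class")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [134, 66]                                # large, small antlers
expected = expected_counts(sum(observed), [3, 1])
print(expected)                                     # [150.0, 50.0]
print(round(chi_square(observed, expected), 2))     # 6.83
```

Without rounding each term first, the total is about 6.83; the 6.8 above comes from rounding 1.71 and 5.12 before adding. The conclusion is the same either way.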

WOW! A chi square value of 6.8!!!! Ummm, what does that mean? Before we draw any great conclusions, we need something to compare this value against, and a chi square table gives us the groundbreaking answer. First, though, you’ll need to understand what the numbers in a chi square table mean. The first column of the table lists the degrees of freedom. Without getting too complicated, the degrees of freedom are the number of possible outcomes (phenotypic classes) minus one. In our experiment, we had two possible outcomes (large or small antlers), which gives us one degree of freedom. Degrees of freedom are calculated the same way for a dihybrid cross, except that there are usually more phenotypic classes: a 9:3:3:1 ratio has four classes and therefore 3 degrees of freedom, while a modified ratio with three classes (such as 9:3:4) has 2.
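A tiny Python sketch of the degrees-of-freedom rule, assuming each predicted phenotypic class is one entry in the ratio list (the function name is hypothetical). The comment shows why it is "minus one": once the total is fixed, the last class count is no longer free to vary.

```python
def degrees_of_freedom(ratio: list[int]) -> int:
    """Number of phenotypic classes minus one."""
    # With the total number of offspring fixed, only len(ratio) - 1 class
    # counts can vary freely; the last one is whatever is left over
    # (e.g. 200 total and 134 large forces 66 small).
    return len(ratio) - 1


print(degrees_of_freedom([3, 1]))        # monohybrid 3:1 -> 1
print(degrees_of_freedom([9, 3, 3, 1]))  # dihybrid 9:3:3:1 -> 3
print(degrees_of_freedom([9, 3, 4]))     # modified dihybrid 9:3:4 -> 2
```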

The top row of the chi square table lists another value: α (alpha), the significance level. You may know you need to look at the 0.05 column, but why? α is the threshold your P-value is judged against. P stands for probability, and the P-value is the probability, assuming the null hypothesis is true, that the test outcome would be as extreme as or more extreme than the one observed. The smaller the P-value, the stronger the evidence against your null hypothesis. By convention, we always use α = 0.05. By doing this, we require the data to give evidence against the null hypothesis so strong that it would occur no more than 5% of the time when the null hypothesis is true. I think the best way to understand α and the P-value is with a bell curve. When you measure a population of outcomes, you often end up with a bell curve:

This graph represents the possible outcomes of an experiment repeated many times, in our case a Mendelian cross. The peak of the bell curve represents progeny numbers that match the Mendelian ratio exactly. Outcomes where the progeny deviate from the ratio radiate out from this central point. Notice that there are still many cases with slight deviations from the Mendelian numbers, but they become rarer the farther you go from the center. The shaded area under the graph contains the exact Mendelian ratios plus the modest deviations that chance produces routinely; with α = 0.05, it holds 95% of all outcomes. The non-shaded tails hold the remaining 5% (that 5% is α): outliers that are unlikely, though not impossible, when the null hypothesis is true. The boundary between the shaded and non-shaded regions corresponds to the critical value listed in the chi square table under α = 0.05, and your P-value is the area of the curve lying farther from the center than your own result. If your chi square value is less than the critical value, your result lies within the shaded area (P > 0.05) and you retain the null hypothesis. If your chi square value is greater than the critical value, your result lies in the non-shaded region (P < 0.05) and you reject the null hypothesis.
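For 1 degree of freedom, the bell-curve picture can be checked directly: a chi square value with 1 df is the square of a standard normal deviate, so the critical value is the two-tailed normal cutoff (about 1.96 standard deviations, leaving 2.5% in each tail) squared. This Python sketch uses only the standard library (`statistics.NormalDist` and `math.erfc`); the function names are hypothetical, and the shortcut holds only for 1 df. For more degrees of freedom, use a chi square table or a statistics package.

```python
import math
from statistics import NormalDist

ALPHA = 0.05


def critical_value_1df(alpha: float) -> float:
    """Chi square cutoff for 1 degree of freedom.

    With 1 df, chi square is the square of a standard normal value, so the
    cutoff is the two-tailed normal cutoff squared.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return z ** 2


def p_value_1df(chi2: float) -> float:
    """Probability, under the null, of a 1-df chi square at least this large."""
    # P(|Z| >= sqrt(chi2)) = erfc(sqrt(chi2) / sqrt(2)) = erfc(sqrt(chi2 / 2))
    return math.erfc(math.sqrt(chi2 / 2))


print(round(critical_value_1df(ALPHA), 2))  # 3.84
print(p_value_1df(6.83))                    # a little under 0.01
```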

In case you haven’t figured it out by now, we can reject the null hypothesis in our experiment on the Mendelian inheritance of large vs. small antlers. The critical value for 1 degree of freedom at α = 0.05 is 3.84. Our chi square value, 6.8, is greater than 3.84, so it is unlikely that Mendelian inheritance plus chance alone produced the numbers we have seen. In other words, the null hypothesis does a poor job of explaining the data, and this allows us to entertain an alternate hypothesis. HOWEVER, this test alone does not allow us to prove that alternate hypothesis!
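The decision rule can be sketched in Python as a lookup in a chi square table. The three critical values below (α = 0.05, 1 to 3 degrees of freedom) are the standard tabulated ones; the dictionary and function names are hypothetical.

```python
# Critical chi square values at alpha = 0.05, as listed in standard tables.
CRITICAL_AT_05 = {1: 3.841, 2: 5.991, 3: 7.815}


def decide(chi2: float, df: int) -> str:
    """Compare a chi square value with the tabulated cutoff for its df."""
    cutoff = CRITICAL_AT_05[df]  # KeyError for df outside this small table
    if chi2 > cutoff:
        return "reject the null hypothesis"
    return "retain the null hypothesis"


print(decide(6.8, 1))  # reject the null hypothesis
```

The same value would be retained at 3 degrees of freedom (6.8 < 7.815), which is why counting the phenotypic classes correctly matters.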

There are several caveats of the chi square test that we should acknowledge. First of all, the P-value is a probability, so it can only state that it is XXX unlikely that the null hypothesis generated these data. Even when we can say “results this extreme would occur only 1% of the time if the null hypothesis were true,” that 1% can still happen; so as good little scientists, we can never be 100% sure of an outcome. Secondly, this test does not measure how plausibly your alternate hypothesis could generate these data; it only indicates there is “room” to entertain alternatives. Lastly, the chi square test in no way indicates how many alternate hypotheses could explain the data; yours is only one interpretation, and there could be thousands!
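The first caveat can be seen in a rough simulation: even when a 3:1 Mendelian model is exactly true, some crosses land in the tails by chance and get rejected. This Python sketch assumes 200 offspring per cross, a fixed random seed for repeatability, and the 3.841 cutoff for 1 df; the function name and trial count are arbitrary choices for illustration. The rejection rate should come out close to α = 0.05, though not exactly, because offspring counts are whole numbers.

```python
import random


def simulate_false_rejections(trials: int = 5000, offspring: int = 200,
                              cutoff: float = 3.841, seed: int = 1) -> float:
    """Fraction of truly Mendelian (3:1) crosses whose chi square exceeds the cutoff."""
    rng = random.Random(seed)
    exp_large, exp_small = offspring * 0.75, offspring * 0.25
    rejections = 0
    for _ in range(trials):
        # The null is true by construction: each offspring is large with p = 3/4.
        large = sum(rng.random() < 0.75 for _ in range(offspring))
        small = offspring - large
        chi2 = ((large - exp_large) ** 2 / exp_large
                + (small - exp_small) ** 2 / exp_small)
        if chi2 > cutoff:
            rejections += 1  # a "significant" result produced by chance alone
    return rejections / trials


print(simulate_false_rejections())
```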

So now you have a lovely new hammer. As is so often the case, having a hammer does not turn everything into a nail. If you predict that all offspring must be blue and you get a single red one, it's over. There is no need to determine the odds of that outcome: by the statement of the hypothesis, they were ZERO. The role of chi square is to measure how close a set of observed numbers comes to a set of predictions. When deciding whether to use chi square, ask yourself, "Am I seeing what I expected to see? How close was my prediction to the somewhat rough reality I found?" These are questions in the realm of chi square.
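The blue/red point can be built into a check that runs before any chi square arithmetic. In this Python sketch (the function name and phenotype labels are hypothetical), a phenotype with an expected count of zero that shows up even once refutes the hypothesis outright; chi square could not help here anyway, since it divides by the expected count.

```python
def evaluate(observed: dict[str, int], expected: dict[str, float]) -> str:
    """Refute outright on any 'impossible' phenotype; otherwise compute chi square."""
    for phenotype, count in observed.items():
        # A phenotype predicted never to occur (expected 0) needs no statistics.
        if count > 0 and expected.get(phenotype, 0) == 0:
            return f"hypothesis refuted: {phenotype} was predicted never to occur"
    # Every remaining class has a positive expected count, so division is safe.
    chi2 = sum(
        (observed.get(phenotype, 0) - exp) ** 2 / exp
        for phenotype, exp in expected.items()
        if exp > 0
    )
    return f"chi square = {chi2:.2f}; compare with the table"


print(evaluate({"blue": 99, "red": 1}, {"blue": 100}))
print(evaluate({"large": 134, "small": 66}, {"large": 150, "small": 50}))
```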