Bio/statistics handout 15: Hypothesis testing
My purpose in this handout is to elaborate on the issues raised in the fourth exercise in Handout 13. By way of reminder, here is a paraphrase of the situation: You repeat some experiment a large number, N, of times, and each time you record the value of a certain key measurement. Label these values as {x1, …, xN}.
A good theoretical understanding of both the experimental protocol and the biology should provide you with a hypothetical probability function, x → p(x), that gives the probability that any given measurement has its value in any given interval [a, b] ⊂ (−∞, ∞). Here, a < b, and a = −∞ and b = ∞ are allowed. For example, if you think that the variations in the values of xj are due to various small, unrelated, random factors, then you might propose that p(x) is a Gaussian, thus a function that has the form
p(x) = (1/(σ√(2π))) e^(−(x−μ)²/(2σ²))
(15.1)
for some suitable choice of μ and σ. In any event, let's suppose that you have some reason to believe that a particular p(x) should determine the probabilities for the value of any given measurement.
Here is the issue on the table:

Is it likely or not that N experiments will obtain a given sequence {x1, …, xN} if the probability of any one measurement is really determined by p(x)?
(15.2)
If the experimental sequence {x1, …, xN} is 'unlikely' for your chosen version of p(x), this suggests that your understanding of the experiment is less than adequate. There are various ways to measure the likelihood of any given set of measurements. What follows describes some very common ones.
Testing the mean: Let μ denote the mean for p(x) and let σ denote its standard deviation. Thus, μ = ∫ x p(x) dx and σ² = ∫ (x−μ)² p(x) dx. If N is large, then the Central Limit Theorem can be used to make the following prediction:
Let {z1, …, zN} denote the result of N measurements where any one value is unrelated to any other, and where the probability of each is actually determined by the proposed function p(x). Let

z ≡ (1/N) ∑1≤j≤N zj .
(15.3)
According to the Central Limit Theorem, the probability that |z − μ| ≥ R σ/√N is approximately

2·(1/√(2π)) ∫R∞ e^(−t²/2) dt
(15.4)
when N is very large. Note that this last expression is less than

2·(1/(R√(2π))) e^(−R²/2) .
(15.5)
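As a quick numerical sanity check, the two-sided tail probability in (15.4) can be written using the complementary error function as erfc(R/√2) and compared against the bound in (15.5). Here is a minimal Python sketch; the function names are my own, not standard notation:

```python
import math

def tail_exact(R):
    # The two-sided Gaussian tail probability in (15.4):
    # 2 * (1/sqrt(2*pi)) * integral from R to infinity of exp(-t^2/2) dt
    return math.erfc(R / math.sqrt(2.0))

def tail_bound(R):
    # The upper bound in (15.5): 2/(R*sqrt(2*pi)) * exp(-R^2/2)
    return 2.0 / (R * math.sqrt(2.0 * math.pi)) * math.exp(-R * R / 2.0)

for R in (1.0, 2.0, 3.0):
    print(R, tail_exact(R), tail_bound(R))
```

For R = 2, for instance, the exact tail probability is about 0.0455 while the bound gives about 0.054, so the bound is close for moderately large R.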
Meanwhile, we have our experimental 'mean', this being

x̄ = (1/N) ∑1≤j≤N xj .
(15.6)
If we set R = √N |μ − x̄|/σ, then (15.5) gives an experimental upper bound for the probability that N measurements determine an experimental mean that is farther from μ than x̄ is. For example, if the number obtained by (15.4) is less than the conventional significance threshold of 0.05, and N is very large, then the P-value of our experimental mean x̄ is probably 'significant' and suggests that our understanding of our experiment is inadequate.
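The mean test just described fits in a few lines of code. The following Python sketch (the function name and argument layout are my own choices) computes the bound (15.5) with R = √N |μ − x̄|/σ as in the text:

```python
import math

def mean_test_bound(xs, mu, sigma):
    """Upper bound, via (15.5), on the probability that N independent
    measurements drawn from p(x) (mean mu, standard deviation sigma)
    give a sample mean at least as far from mu as the observed one."""
    N = len(xs)
    xbar = sum(xs) / N                         # the experimental mean (15.6)
    R = math.sqrt(N) * abs(mu - xbar) / sigma  # R as chosen in the text
    return 2.0 / (R * math.sqrt(2.0 * math.pi)) * math.exp(-R * R / 2.0)
```

For example, 100 measurements with sample mean 0.1, tested against a model with μ = 0 and σ = 1, give R = 1 and a bound of roughly 0.48, so such a deviation is entirely unremarkable.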
Testing the variance: If our experimental mean is reasonably close to μ, we can then go on to test whether the variation of the xj about the mean is a likely or unlikely occurrence. Here is one way to do this: Let f(x) denote the function x → (x−μ)². This is a random variable on (−∞, ∞) that takes its values in [0, ∞). Its mean, μ2, as determined by p(x), is
μ2 = ∫ (x−μ)² p(x) dx .
(15.7)
Meanwhile, the square of its standard deviation, (σ2)², is

(σ2)² = ∫ ((x−μ)² − μ2)² p(x) dx = ∫ (x−μ)⁴ p(x) dx − μ2² .
(15.8)
Now, we have just done N identical versions of the same experiment, so we have N different values {(x1−μ)², (x2−μ)², …, (xN−μ)²}. The plan is to use the Central Limit Theorem to again estimate a P-value, this time for the average of these N values:

s² ≡ (1/N) ∑1≤j≤N (xj−μ)² .
(15.9)
In particular, according to the Central Limit Theorem, if the variations in the values of xj are really determined by p(x), then the probability that s² differs from μ2 by more than R σ2/√N should be less than the expression in (15.4) when N is very large, thus less than

2·(1/(R√(2π))) e^(−R²/2) .
(15.10)
This said, when N is large, the P-value of our experimentally determined squared standard deviation, s², from (15.9) is less than the expression in (15.10) with

R = √N |s² − μ2|/σ2 .
(15.11)
To elaborate: Our hypothetical probability function p(x) gives the expected squared standard deviation, this being the expression in (15.7). The Central Limit Theorem then says that our measured squared standard deviation, this being the expression in (15.9), should approach the predicted one from (15.7) as N → ∞, and it gives an approximate probability, this being the expression obtained by using R from (15.11) in (15.10), for a measured squared standard deviation to differ from the theoretical one.
In particular, if the expression in (15.10) is less than the conventional significance threshold of 0.05 when using R from (15.11), then there is a significant chance that our theoretically determined p(x) is not correct.
Testing the higher moments: As I remarked at the outset, if we believe that the measured variation in the data {xj} is due to many small, unrelated, randomly varying factors, then we might predict that the distribution of these numbers is governed by a Gaussian function as depicted in (15.1). If we have no theoretical basis for a particular choice for μ and σ in (15.1), then the obvious thing to do is to choose μ to equal x̄ as given in (15.6) and to choose σ to equal s, with the latter's square given in (15.9).
These choices render the previous two tests moot. Even so, we can still test whether our data is consistent with this Gaussian. For this purpose, note that the probabilities predicted by the Gaussian are very small for values of x that are far from the mean. Thus, we might be led to consider whether an average of the form

(1/N) ∑1≤j≤N (xj−μ)^m
(15.12)

for some integer m > 2 is close to the value predicted were the Gaussian probability controlling things. Note that an average as in (15.12) gives heavy weight to points that are very far from the mean. An average as in (15.12) is called an m'th order moment.
The strategy here is much like that used previously. The assignment x → (x−μ)^m defines a random variable whose mean and standard deviation are given by

μm = ∫ (x−μ)^m p(x) dx   and   σm² = ∫ ((x−μ)^m − μm)² p(x) dx .
(15.13)
For example, in the case of the Gaussian in (15.1), these integrals can be computed exactly. In this regard, μm = 0 when m is odd, and when m = 2k is even, then

μ2k = (2k−1)!! σ^(2k) ,   where (2k−1)!! = 1·3·5···(2k−1) .
(15.14)

Meanwhile, σm² = μ2m − μm².
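The claim in (15.14) is easy to check numerically by comparing the double factorial formula against a direct numerical integration of (15.13) for the Gaussian (15.1). A sketch follows; the integration scheme, window, and step count are arbitrary choices of mine:

```python
import math

def gaussian_moment(k, sigma):
    # (15.14): the (2k)-th central moment of the Gaussian is (2k-1)!! * sigma^(2k)
    return math.prod(range(1, 2 * k, 2)) * sigma ** (2 * k)

def numeric_moment(m, sigma, half_width=12.0, steps=100000):
    # Midpoint-rule approximation of the integral in (15.13) for the
    # Gaussian (15.1), over the interval [-half_width, half_width].
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * h
        p = math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))
        total += (x ** m) * p * h
    return total
```

With σ = 1 this reproduces μ2 = 1, μ4 = 3, μ6 = 15, and gives approximately zero for the odd moments.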
According to the Central Limit Theorem, if the Gaussian in (15.1) is controlling things, then the probability that the sum in (15.12) differs from μm by more than R σm/√N is well approximated at large N by a number that is smaller than the number in (15.10). Thus, the P-value of our expression in (15.12) should be smaller than (15.10) using

R = √N |(1/N) ∑1≤j≤N (xj−μ)^m − μm| / σm .
(15.15)

As before, if N is large and the computed P-value is small, our theoretical prediction is in trouble.
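Putting the pieces together for a Gaussian model, the following Python sketch (again, all names are my own) computes the bound (15.10) for the m'th moment test, with μm and σm taken from (15.13)-(15.14) and R from (15.15):

```python
import math

def moment_test_bound(xs, mu, sigma, m):
    """P-value bound for the m'th moment test against the Gaussian (15.1)."""
    N = len(xs)
    # Gaussian central moments: 0 for odd powers j, (j-1)!! * sigma^j for even j.
    def model_moment(j):
        return 0.0 if j % 2 else math.prod(range(1, j, 2)) * sigma ** j
    mu_m = model_moment(m)
    sigma_m = math.sqrt(model_moment(2 * m) - mu_m ** 2)  # sigma_m^2 = mu_2m - mu_m^2
    avg = sum((x - mu) ** m for x in xs) / N              # the average in (15.12)
    R = math.sqrt(N) * abs(avg - mu_m) / sigma_m          # R from (15.15)
    return 2.0 / (R * math.sqrt(2.0 * math.pi)) * math.exp(-R * R / 2.0)
```

For instance, 100 values alternating between +1 and −1 have fourth-moment average 1, against the Gaussian prediction μ4 = 3 when σ = 1, which yields a bound of roughly 0.05.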
Note in this regard that the value of N that makes the prediction of the Central Limit Theorem accurate may well depend on the choice of the power m that we use in (15.12) and (15.15). Large values of m might require taking N very large before the Central Limit Theorem's prediction is reasonable. This just means that having one or two values of xj that are far from the mean when N = 100 might give a very small P-value if we take m = 1000000. But if we take N super large, then the number of xj that are this far from the mean must grow with N to make things significant.
Exercises:

1. Suppose that we expect that the x-coordinate of bacteria in our rectangular petri dish should be any value between −1 and 1 with equal probability, in spite of our having coated the x = 1 wall of the dish with a specific chemical. We observe the positions of 900 bacteria in our dish and so obtain 900 values, {x1, …, x900}, for the x-coordinates.
a) Suppose the average, x̄ = (1/900) ∑1≤k≤900 xk, is 0.01. Use the Central Limit Theorem to obtain a theoretical upper bound, based on our model of a uniform probability function, for the probability that an average of 900 x-coordinates differs from 0 by more than 0.01.
b) Suppose that the average of the squares, s² = (1/900) ∑1≤k≤900 (xk)², equals 0.36. Use the Central Limit Theorem to obtain a theoretical upper bound, based on our model of a uniform probability function, for the probability that an average of the squares of 900 x-coordinates is greater than or equal to 0.36. (Note that I am not asking that it differ by a certain amount from the square of the standard deviation for the uniform probability function. If you compute the latter, you will be wrong by a factor of 2.)