Bio/statistics Handout 7:
More about Bayesian statistics
The term 'Bayesian statistics' has different meanings for different people. Roughly, Bayesian statistics reverses 'causes' and 'effects' so as to make an educated guess about the causes given the known effects. The goal is to deduce a probability function on the set of possible causes granted that we have the probabilities of the various effects. Take note of the words 'educated guess': there are situations where the Bayesian strategy seems reasonable, and others where it doesn't.
a) A problem for Bayesians: There is a sample space of interest, S, with a known function (i.e., random variable) f to another finite set, W. A probability function for the set W is in hand, but what is needed is one for the set S.

Here is a simple situation that exemplifies this: Flip two distinct coins, coin #1 and coin #2. Move to the right one step (x → x+1) for each heads that appears and to the left one step (x → x−1) for each tails. Let W denote the set of possible positions after two flips, thus W = {−2, 0, 2}. Meanwhile, the sample space is S = {HH, HT, TH, TT}. We can do this experiment many times and so generate numbers Q(−2), Q(0) and Q(2) that give the respective frequencies with which −2, 0 and 2 are the resulting positions.

How can we use these frequencies to determine the probability of getting heads on coin #1, and also the probability of getting heads on coin #2? In this regard, we don't want to assume that these coins are fair, nor do we want to assume that the probability of heads for coin #1 is the same as that for coin #2.
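For concreteness, here is a minimal Python simulation sketch of this experiment. The heads-probabilities p1 = 0.7 and p2 = 0.4 are hypothetical illustrative values, not anything given above.

```python
import random

def simulate_Q(p1, p2, n_trials=100_000, seed=0):
    """Estimate Q(-2), Q(0), Q(2): the frequencies of the final position
    after flipping coin #1 (heads-probability p1) and coin #2 (p2),
    stepping +1 for each heads and -1 for each tails."""
    rng = random.Random(seed)
    counts = {-2: 0, 0: 0, 2: 0}
    for _ in range(n_trials):
        step1 = 1 if rng.random() < p1 else -1
        step2 = 1 if rng.random() < p2 else -1
        counts[step1 + step2] += 1
    return {r: c / n_trials for r, c in counts.items()}

# Two coins with different, hypothetical biases:
print(simulate_Q(p1=0.7, p2=0.4))
```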
b) A second problem: A six-sided die, hidden from view, is rolled twice and the resulting pair of numbers (each either 1, 2, …, 6) is added to obtain a single number, thus an integer that can be as small as 2 or as large as 12. We are told what this sum is, but not the two integers that appeared. If this is done many times, how can the relative frequencies for the various values of the sum be used to determine a probability function for the sample space?
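As with the coins, the sum-frequencies can be generated by simulation. Here is a sketch under an assumed, hypothetical set of face-probabilities; only the sums are tallied, mimicking the fact that the individual rolls stay hidden.

```python
import random

def simulate_sum_frequencies(p, n_trials=100_000, seed=0):
    """Roll a die with face-probabilities p[1..6] twice, record only the
    sum (the individual rolls stay hidden), and tally the frequencies."""
    rng = random.Random(seed)
    faces, weights = list(p.keys()), list(p.values())
    counts = {n: 0 for n in range(2, 13)}
    for _ in range(n_trials):
        a, b = rng.choices(faces, weights=weights, k=2)
        counts[a + b] += 1
    return {n: c / n_trials for n, c in counts.items()}

# A hypothetical biased die; any six probabilities summing to 1 will do.
print(simulate_sum_frequencies({1: 0.1, 2: 0.15, 3: 0.2, 4: 0.2, 5: 0.2, 6: 0.15}))
```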
c) Meet the typical Bayesian: To set the stage, remember that if P is any given probability function on S, then P induces one on W by the rule we saw in Handout 3. Indeed, if the latter is denoted by Pf, the rule is that Pf(r) is the probability as measured by P of the subset of points in S where f has value r. Thus,

Pf(r) = ∑{s∈S: f(s) = r} P(s).     (7.1)
This last equation can be written in terms of conditional probabilities as follows:

Pf(r) = ∑s∈S P(r|s) P(s),     (7.2)

where P(r|s) is the conditional probability that f = r given that you are at the point s ∈ S. Of course, this just says that P(r|s) is one if f(s) = r and zero otherwise.
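In code, the rule (7.1) is one short loop. Here is a sketch; the name push_forward is mine, chosen for illustration, and the fair-coin inputs at the end are just an example.

```python
def push_forward(P, f):
    """The rule (7.1): P_f(r) is the total P-probability of {s in S : f(s) = r}."""
    Pf = {}
    for s, prob in P.items():
        Pf[f(s)] = Pf.get(f(s), 0.0) + prob
    return Pf

# Illustration with the coin walk: S = {HH, HT, TH, TT}, f = (#heads - #tails).
P = {"HH": 0.25, "HT": 0.25, "TH": 0.25, "TT": 0.25}   # two fair, independent coins
f = lambda s: s.count("H") - s.count("T")
print(push_forward(P, f))   # {2: 0.25, 0: 0.5, -2: 0.25}
```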
The problem faced by statisticians is to deduce P, or a reasonable approximation, given only knowledge of some previously determined probability function, Q, on the set W. In effect, we want to find a probability function P on S whose corresponding Pf is the known function Q.
Your typical Bayesian will derive a guess for P using the following strategy:

Step 1: Imagine that there is some conditional probability, Q(s|r), that gives the probability of obtaining any given s from S granted that the value of f is r. If such a suite of conditional probabilities were available, then one could take

Pguess(s) = ∑r∈W Q(s|r) Q(r).     (7.3)

The problem is that the points in W are the values of a function of the points in S, not vice-versa. Thus, there is often no readily available Q(s|r).
Step 2: A Bayesian is not deterred by this state of affairs. Rather, the Bayesian plows ahead by using what we have, which is P(r|s). We know its values in all cases; it is 1 when f(s) = r and zero otherwise. Why not, asks the Bayesian, take

Q(s|r) = (1/Z(r)) P(r|s),     (7.4)

where Z(r) is the number of points in S on which f has value r. This is to say that

Z(r) = ∑s∈S P(r|s).     (7.5)

To explain the appearance of Z(r), remember that a conditional probability of the form P(A|B) is a probability function in its own right on the sample space S. Thus, P(S|B) must be 1 since S is the whole sample space. This would not necessarily be the case for Q(S|r) were the factor of 1/Z(r) absent.
Step 3: To summarize: Our typical Bayesian takes the following as a good guess for the probability function on S:

Pguess(s) = ∑r∈W (1/Z(r)) P(r|s) Q(r).     (7.6)

Note that, disentangling the definitions, there is really no summation involved in (7.6) because there is just one value of r that makes P(r|s) non-zero for any given s, this being the value r = f(s). Thus, (7.6) is a very roundabout way of saying that

Pguess(s) = (1/Z(f(s))) Q(f(s)).     (7.7)

This is our Bayesian's guess for the probability function on S.
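Disentangled this way, the recipe is easy to implement. Here is a sketch of (7.7); the function name bayes_guess is mine, not notation from the handout.

```python
def bayes_guess(S, f, Q):
    """The Bayesian guess (7.7): Pguess(s) = Q(f(s)) / Z(f(s)), where
    Z(r) counts the points of S on which f takes the value r, as in (7.5)."""
    Z = {}
    for s in S:
        Z[f(s)] = Z.get(f(s), 0) + 1
    return {s: Q[f(s)] / Z[f(s)] for s in S}

# Example: the coin walk with observed frequencies Q on W = {-2, 0, 2}.
S = ["HH", "HT", "TH", "TT"]
f = lambda s: s.count("H") - s.count("T")
print(bayes_guess(S, f, {-2: 0.25, 0: 0.5, 2: 0.25}))
```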
d) A first example: Consider the problem in Part a with flipping coin #1 and coin #2. As noted there, W is the set of possible positions after the two coin flips, thus W = {−2, 0, 2}, and the sample space is S = {HH, HT, TH, TT}. Suppose first that our two coins have the same probability for heads, some number q ∈ (0, 1); thus T has probability 1−q on either coin. Then the true probabilities for the elements of S are q², q(1−q), q(1−q) and (1−q)² in the order they appear above. These probability assignments give Ptrue on S. With these true probabilities, the frequencies of appearance of the three elements in W are (1−q)², 2q(1−q) and q². These numbers are therefore the probabilities given by Q.
Let's now see what the Bayesian would find for Pguess. For this purpose, note that the only non-zero values of P(r|s) that appear in the relevant version of (7.6) are

• P(−2|TT) = 1,
• P(0|HT) = P(0|TH) = 1,
• P(2|HH) = 1.     (7.8)

Thus, Z(±2) = 1 and Z(0) = 2. Plugging this into (7.7) finds

Pguess(HH) = q², Pguess(TT) = (1−q)² and Pguess(HT) = Pguess(TH) = q(1−q).     (7.9)

Thus, the Bayesian guess for the probabilities is the true probability.
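This agreement can be confirmed numerically; a quick check, with the arbitrary illustrative value q = 0.3:

```python
q = 0.3   # an arbitrary illustrative heads-probability in (0, 1)

Q = {-2: (1 - q)**2, 0: 2*q*(1 - q), 2: q**2}      # the observed frequencies
Z = {-2: 1, 0: 2, 2: 1}                            # from (7.8)

Pguess = {"HH": Q[2]/Z[2], "HT": Q[0]/Z[0], "TH": Q[0]/Z[0], "TT": Q[-2]/Z[-2]}
Ptrue  = {"HH": q**2, "HT": q*(1 - q), "TH": q*(1 - q), "TT": (1 - q)**2}
assert all(abs(Pguess[s] - Ptrue[s]) < 1e-12 for s in Ptrue)   # exact agreement
```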
e) A second example: Let us now change the rules in the coin flip game and consider the case where the first flip uses a fair coin (probability ½ for either H or T), and the second uses a biased coin, with probability q for H and thus 1−q for T. In this case, the true probability on S is given by

Ptrue(HH) = ½q, Ptrue(HT) = ½(1−q), Ptrue(TH) = ½q, and Ptrue(TT) = ½(1−q).     (7.10)

The frequencies of appearance of the three positions in W are now ½(1−q), ½, ½q. These three numbers define the probability function Q. As the conditional probabilities in (7.8) do not change, we can employ them in (7.6) to find the Bayesian guess:

Pguess(HH) = ½q, Pguess(HT) = ¼, Pguess(TH) = ¼, and Pguess(TT) = ½(1−q).     (7.11)

Thus, the Bayesian guess goes bad when q deviates from ½. Roughly speaking, the Bayesian guess cannot distinguish between those points in the sample space that give the same value for f.
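A quick numerical check of (7.10) and (7.11), again with the illustrative value q = 0.3 (any q ≠ ½ shows the same discrepancy):

```python
q = 0.3   # illustrative; any q != 1/2 shows the failure

Q = {-2: 0.5*(1 - q), 0: 0.5, 2: 0.5*q}           # frequencies from Ptrue in (7.10)
Pguess = {"HH": Q[2], "HT": Q[0]/2, "TH": Q[0]/2, "TT": Q[-2]}   # Z(2)=Z(-2)=1, Z(0)=2
Ptrue  = {"HH": 0.5*q, "HT": 0.5*(1 - q), "TH": 0.5*q, "TT": 0.5*(1 - q)}

for s in ("HH", "HT", "TH", "TT"):
    print(s, Pguess[s], Ptrue[s])   # HT and TH disagree: 0.25 vs 0.35, and 0.25 vs 0.15
```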
f) Something traumatic: Let me show you something that is strange about the Bayesian's guess in (7.11). Suppose we ask for the probability as computed by Pguess that H appears on the first coin. According to our rules of probability,

Pguess(coin #1 = H) = Pguess(HH) + Pguess(HT) = ½q + ¼.     (7.12)

This is also the probability Pguess(coin #2 = H) since Pguess(HT) = Pguess(TH). Now, note that

Pguess(HH) ≠ Pguess(coin #1 = H) Pguess(coin #2 = H)     (7.13)

unless q = ½, since the left-hand side is ½q and the right is (½q + ¼)². Thus, the Bayesian finds that the event coin #1 = H is not independent of the event coin #2 = H!! (Remember that events B and B′ are deemed independent when P(B∩B′) = P(B)P(B′).)
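The failure of (7.13) is easy to exhibit numerically:

```python
q = 0.3   # illustrative; any q != 1/2 works

Pguess = {"HH": 0.5*q, "HT": 0.25, "TH": 0.25, "TT": 0.5*(1 - q)}   # from (7.11)
p1_heads = Pguess["HH"] + Pguess["HT"]   # (7.12): q/2 + 1/4 = 0.4
p2_heads = Pguess["HH"] + Pguess["TH"]   # the same number, by symmetry

print(Pguess["HH"], p1_heads * p2_heads)   # 0.15 vs 0.16: the events are dependent
```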
g) Rolling dice: Consider here the case where the die is rolled twice and the resulting two integers are added. The sample space, S, consists of the 36 pairs of the form (a, b) where a and b are integers from the set {1, …, 6}. The random variable (a.k.a. function on S) is the function that assigns a+b to any given (a, b) ∈ S. Thus, the set of possible outcomes is W = {2, …, 12}.
Suppose, for the sake of argument, that the true probabilities for rolling 1, 2, …, 6 on any given throw of the die are some fixed numbers p_1, p_2, …, p_6 (each positive, and summing to 1). Were this the case, then the true probability, Ptrue, for any given pair (a, b) in S is

Ptrue(a, b) = p_a p_b.     (7.14)

If the die has these probabilities, then the probabilities that result for the outcomes, the function Q on W, are as follows:

Q(2) = p_1², Q(3) = 2p_1p_2, Q(4) = 2p_1p_3 + p_2², Q(5) = 2p_1p_4 + 2p_2p_3,
Q(6) = 2p_1p_5 + 2p_2p_4 + p_3², Q(7) = 2p_1p_6 + 2p_2p_5 + 2p_3p_4, Q(8) = 2p_2p_6 + 2p_3p_5 + p_4²,
Q(9) = 2p_3p_6 + 2p_4p_5, Q(10) = 2p_4p_6 + p_5², Q(11) = 2p_5p_6, Q(12) = p_6².     (7.15)
Now, given that we have Q as just given, here is what the Bayesian finds for the probabilities of some of the elements in the sample space S:

• Pguess(1,1) = Q(2),
• Pguess(2,1) = Pguess(1,2) = Q(3)/2,
• Pguess(3,1) = Pguess(1,3) = Pguess(2,2) = Q(4)/3,
• Pguess(4,1) = Pguess(1,4) = Pguess(3,2) = Pguess(2,3) = Q(5)/4,
• Pguess(5,1) = Pguess(1,5) = Pguess(4,2) = Pguess(2,4) = Pguess(3,3) = Q(6)/5,
• Pguess(6,1) = Pguess(1,6) = Pguess(5,2) = Pguess(2,5) = Pguess(4,3) = Pguess(3,4) = Q(7)/6,
• Pguess(6,2) = Pguess(2,6) = Pguess(5,3) = Pguess(3,5) = Pguess(4,4) = Q(8)/5,
• ···· etc.     (7.16)

Here the divisor in each line is Z(r), the number of pairs in S whose entries sum to the given value r.
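Here is a sketch that carries out the whole computation; the face-probabilities p below are hypothetical stand-ins for whichever values one has in mind in (7.14).

```python
from itertools import product

# Hypothetical face-probabilities p[1..6]; substitute any positive values summing to 1.
p = {1: 0.10, 2: 0.15, 3: 0.20, 4: 0.20, 5: 0.20, 6: 0.15}

S = list(product(range(1, 7), repeat=2))                 # the 36 pairs (a, b)
Ptrue = {(a, b): p[a] * p[b] for (a, b) in S}            # independent rolls, as in (7.14)

Q, Z = {}, {}                                            # Q on W = {2,...,12} and Z(r)
for (a, b) in S:
    Q[a + b] = Q.get(a + b, 0.0) + Ptrue[(a, b)]
    Z[a + b] = Z.get(a + b, 0) + 1

Pguess = {(a, b): Q[a + b] / Z[a + b] for (a, b) in S}   # the guess (7.7)
print(Pguess[(2, 2)], Ptrue[(2, 2)])   # pairs with equal sums get averaged together
```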
Exercises:
1. a) Complete the table in (7.16) by computing the values of Pguess on the remaining pairs in S.

b) According to Pguess, is the event that the first roll comes up 1 independent of the event that the second roll comes up 6? Justify your answer.

2. Compute the mean and standard deviation for the random variable a+b, first using Ptrue from (7.14) and then using Pguess.
3. Consider now the same sample space for rolling a die twice, but now suppose that the die is fair, and so each number has probability 1/6 of turning up on any given roll.
a) Compute the mean and standard deviation of the random variable a+b.
b) Compute the mean and standard deviation for the random variable a·b.

c) Are the random variables a+b and a·b independent? In this regard, remember that two random variables, f and g, are said to be independent when P(f = r and g = s) = P(f = r)P(g = s) for all pairs (r, s) where r is a possible value of f and s is a possible value of g. Justify your answer.