Bio/statistics Handout 7:
More about Bayesian statistics
The term 'Bayesian statistics' has different meanings for different people. Roughly, Bayesian statistics reverses 'causes' and 'effects' so as to make an educated guess about the causes given the known effects. The goal is to deduce a probability function on the set of possible causes granted that we have the probabilities of the various effects. Take note of the words 'educated guess': there are situations where the Bayesian strategy seems reasonable, and others where it doesn't.
a) A problem for Bayesians: There is a sample space of interest, S, with a known function (i.e., random variable) f to another finite set, W. A probability function for the set W is in hand, but what is needed is one for the set S.

Here is a simple situation that exemplifies this: Flip two distinct coins, coin #1 and coin #2. Move to the right one step (x → x+1) for each heads that appears and to the left one step (x → x−1) for each tails. Let W denote the set of possible positions after two flips, thus W = {−2, 0, 2}. Meanwhile, the sample space is S = {HH, HT, TH, TT}. We can do this experiment many times and so generate numbers Q(−2), Q(0) and Q(2) that give the respective frequencies with which −2, 0 and 2 are the resulting positions.

How can we use these frequencies to determine the probability of getting heads on coin #1, and also the probability of getting heads on coin #2? In this regard, we don't want to assume that these coins are fair, nor do we want to assume that the probability of heads for coin #1 is the same as that for coin #2.
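For concreteness, here is a minimal Python simulation sketch of this experiment. The heads-probabilities p1 = 0.7 and p2 = 0.4 are hypothetical illustrative values, not anything given above.

```python
import random

def simulate_Q(p1, p2, n_trials=100_000, seed=0):
    """Estimate Q(-2), Q(0), Q(2): the frequencies of the final position
    after flipping coin #1 (heads-probability p1) and coin #2 (p2),
    stepping +1 for each heads and -1 for each tails."""
    rng = random.Random(seed)
    counts = {-2: 0, 0: 0, 2: 0}
    for _ in range(n_trials):
        step1 = 1 if rng.random() < p1 else -1
        step2 = 1 if rng.random() < p2 else -1
        counts[step1 + step2] += 1
    return {r: c / n_trials for r, c in counts.items()}

# Two coins with different, hypothetical biases:
print(simulate_Q(p1=0.7, p2=0.4))
```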
b) A second problem: A six-sided die, hidden from view, is rolled twice and the resulting pair of numbers (each either 1, 2, …, 6) is added to obtain a single number, thus an integer that can be as small as 2 or as large as 12. We are told what this sum is, but not the two integers that appeared. If this is done many times, how can the relative frequencies for the various values of the sum be used to determine a probability function for the sample space?
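As with the coins, the sum-frequencies can be generated by simulation. Here is a sketch under an assumed, hypothetical set of face-probabilities; only the sums are tallied, mimicking the fact that the individual rolls stay hidden.

```python
import random

def simulate_sum_frequencies(p, n_trials=100_000, seed=0):
    """Roll a die with face-probabilities p[1..6] twice, record only the
    sum (the individual rolls stay hidden), and tally the frequencies."""
    rng = random.Random(seed)
    faces, weights = list(p.keys()), list(p.values())
    counts = {n: 0 for n in range(2, 13)}
    for _ in range(n_trials):
        a, b = rng.choices(faces, weights=weights, k=2)
        counts[a + b] += 1
    return {n: c / n_trials for n, c in counts.items()}

# A hypothetical biased die; any six probabilities summing to 1 will do.
print(simulate_sum_frequencies({1: 0.1, 2: 0.15, 3: 0.2, 4: 0.2, 5: 0.2, 6: 0.15}))
```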
c) Meet the typical Bayesian: To set the stage, remember that if P is any given probability function on S, then P induces one on W by the rule we saw in Handout 3. Indeed, if the latter is denoted by Pf, the rule is that Pf(r) is the probability as measured by P of the subset of points in S where f has value r. Thus,

Pf(r) = ∑{s∈S: f(s) = r} P(s).     (7.1)
This last equation can be written in terms of conditional probabilities as follows:

Pf(r) = ∑s∈S P(r|s) P(s),     (7.2)

where P(r|s) is the conditional probability that f = r given that you are at the point s ∈ S. Of course, this just says that P(r|s) is one if f(s) = r and zero otherwise.
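In code, the rule (7.1) is one short loop. Here is a sketch; the name push_forward is mine, chosen for illustration, and the fair-coin inputs at the end are just an example.

```python
def push_forward(P, f):
    """The rule (7.1): P_f(r) is the total P-probability of {s in S : f(s) = r}."""
    Pf = {}
    for s, prob in P.items():
        Pf[f(s)] = Pf.get(f(s), 0.0) + prob
    return Pf

# Illustration with the coin walk: S = {HH, HT, TH, TT}, f = (#heads - #tails).
P = {"HH": 0.25, "HT": 0.25, "TH": 0.25, "TT": 0.25}   # two fair, independent coins
f = lambda s: s.count("H") - s.count("T")
print(push_forward(P, f))   # {2: 0.25, 0: 0.5, -2: 0.25}
```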
The problem faced by statisticians is to deduce P, or a reasonable approximation, given only knowledge of some previously determined probability function, Q, on the set W. In effect, we want to find a probability function P on S whose corresponding Pf is the known function Q.
Your typical Bayesian will derive a guess for P using the following strategy:

Step 1: Imagine that there is some conditional probability, Q(s|r), that gives the probability of obtaining any given s from S granted that the value of f is r. If such a suite of conditional probabilities were available, then one could take

Pguess(s) = ∑r∈W Q(s|r) Q(r).     (7.3)

The problem is that the points in W are the values of a function of the points in S, not vice-versa. Thus, there is often no readily available Q(s|r).
Step 2: A Bayesian is not deterred by this state of affairs. Rather, the Bayesian plows ahead by using what we have, which is P(r|s). We know its values in all cases; it is 1 when f(s) = r and zero otherwise. Why not, asks the Bayesian, take

Q(s|r) = (1/Z(r)) P(r|s),     (7.4)

where Z(r) is the number of points in S on which f has value r. This is to say that

Z(r) = ∑s∈S P(r|s).     (7.5)

To explain the appearance of Z(r), remember that a conditional probability of the form P(A|B) is a probability function in its own right on the sample space S. Thus, P(S|B) must be 1 since S is the whole sample space. This would not necessarily be the case for Q(S|r) were the factor of 1/Z(r) absent.
Step 3: To summarize: Our typical Bayesian takes the following as a good guess for the probability function on S:

Pguess(s) = ∑r∈W (1/Z(r)) P(r|s) Q(r).     (7.6)

Note that, disentangling the definitions, there is really no summation involved in (7.6) because there is just one value of r that makes P(r|s) non-zero for any given s, this being the value r = f(s). Thus, (7.6) is a very roundabout way of saying that

Pguess(s) = (1/Z(f(s))) Q(f(s)).     (7.7)

This is our Bayesian's guess for the probability function on S.
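Disentangled this way, the recipe is easy to implement. Here is a sketch of (7.7); the function name bayes_guess is mine, not notation from the handout.

```python
def bayes_guess(S, f, Q):
    """The Bayesian guess (7.7): Pguess(s) = Q(f(s)) / Z(f(s)), where
    Z(r) counts the points of S on which f takes the value r, as in (7.5)."""
    Z = {}
    for s in S:
        Z[f(s)] = Z.get(f(s), 0) + 1
    return {s: Q[f(s)] / Z[f(s)] for s in S}

# Example: the coin walk with observed frequencies Q on W = {-2, 0, 2}.
S = ["HH", "HT", "TH", "TT"]
f = lambda s: s.count("H") - s.count("T")
print(bayes_guess(S, f, {-2: 0.25, 0: 0.5, 2: 0.25}))
```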
d) A first example: Consider the problem in Part a with flipping coin #1 and coin #2. As noted there, W is the set of possible positions after the two coin flips, thus W = {−2, 0, 2}, and the sample space is S = {HH, HT, TH, TT}. Suppose first that our two coins have the same probability for heads, some number q ∈ (0, 1); thus T has probability 1−q on either coin. Then the true probabilities for the elements of S are q², q(1−q), q(1−q) and (1−q)² in the order they appear above. These probability assignments give Ptrue on S. With these true probabilities, the frequencies of appearance of the three elements in W are (1−q)², 2q(1−q) and q². These numbers are therefore the probabilities given by Q.
Let's now see what the Bayesian would find for Pguess. For this purpose, note that the only non-zero values of P(r|s) that appear in the relevant version of (7.6) are

• P(−2|TT) = 1,
• P(0|HT) = P(0|TH) = 1,
• P(2|HH) = 1.     (7.8)

Thus, Z(±2) = 1 and Z(0) = 2. Plugging this into (7.7) finds

Pguess(HH) = q², Pguess(TT) = (1−q)² and Pguess(HT) = Pguess(TH) = q(1−q).     (7.9)

Thus, the Bayesian guess for the probabilities is the true probability.
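This agreement can be confirmed numerically; a quick check, with the arbitrary illustrative value q = 0.3:

```python
q = 0.3   # an arbitrary illustrative heads-probability in (0, 1)

Q = {-2: (1 - q)**2, 0: 2*q*(1 - q), 2: q**2}      # the observed frequencies
Z = {-2: 1, 0: 2, 2: 1}                            # from (7.8)

Pguess = {"HH": Q[2]/Z[2], "HT": Q[0]/Z[0], "TH": Q[0]/Z[0], "TT": Q[-2]/Z[-2]}
Ptrue  = {"HH": q**2, "HT": q*(1 - q), "TH": q*(1 - q), "TT": (1 - q)**2}
assert all(abs(Pguess[s] - Ptrue[s]) < 1e-12 for s in Ptrue)   # exact agreement
```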
e) A second example: Let us now change the rules in the coin flip game and consider the case where the first flip uses a fair coin (probability ½ for either H or T), and the second uses a biased coin, with probability q for H and thus 1−q for T. In this case, the true probability on S is given by

Ptrue(HH) = ½q, Ptrue(HT) = ½(1−q), Ptrue(TH) = ½q, and Ptrue(TT) = ½(1−q).     (7.10)

The frequencies of appearance of the three positions in W are now ½(1−q), ½, ½q. These three numbers define the probability function Q. As the conditional probabilities in (7.8) do not change, we can employ them in (7.6) to find the Bayesian guess:

Pguess(HH) = ½q, Pguess(HT) = ¼, Pguess(TH) = ¼, and Pguess(TT) = ½(1−q).     (7.11)

Thus, the Bayesian guess goes bad when q deviates from ½. Roughly speaking, the Bayesian guess cannot distinguish between those points in the sample space that give the same value for f.
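A quick numerical check of (7.10) and (7.11), again with the illustrative value q = 0.3 (any q ≠ ½ shows the same discrepancy):

```python
q = 0.3   # illustrative; any q != 1/2 shows the failure

Q = {-2: 0.5*(1 - q), 0: 0.5, 2: 0.5*q}           # frequencies from Ptrue in (7.10)
Pguess = {"HH": Q[2], "HT": Q[0]/2, "TH": Q[0]/2, "TT": Q[-2]}   # Z(2)=Z(-2)=1, Z(0)=2
Ptrue  = {"HH": 0.5*q, "HT": 0.5*(1 - q), "TH": 0.5*q, "TT": 0.5*(1 - q)}

for s in ("HH", "HT", "TH", "TT"):
    print(s, Pguess[s], Ptrue[s])   # HT and TH disagree: 0.25 vs 0.35, and 0.25 vs 0.15
```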
f) Something traumatic: Let me show you something that is strange about the Bayesian's guess in (7.11). Suppose we ask for the probability as computed by Pguess that H appears on the first coin. According to our rules of probability,

Pguess(coin #1 = H) = Pguess(HH) + Pguess(HT) = ½q + ¼.     (7.12)

This is also the probability Pguess(coin #2 = H) since Pguess(HT) = Pguess(TH). Now, note that

Pguess(HH) ≠ Pguess(coin #1 = H) Pguess(coin #2 = H)     (7.13)

unless q = ½, since the left-hand side is ½q and the right is (½q + ¼)². Thus, the Bayesian finds that the event coin #1 = H is not independent of the event coin #2 = H!! (Remember that events B and B′ are deemed independent when P(B∩B′) = P(B)P(B′).)
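The failure of (7.13) is easy to exhibit numerically:

```python
q = 0.3   # illustrative; any q != 1/2 works

Pguess = {"HH": 0.5*q, "HT": 0.25, "TH": 0.25, "TT": 0.5*(1 - q)}   # from (7.11)
p1_heads = Pguess["HH"] + Pguess["HT"]   # (7.12): q/2 + 1/4 = 0.4
p2_heads = Pguess["HH"] + Pguess["TH"]   # the same number, by symmetry

print(Pguess["HH"], p1_heads * p2_heads)   # 0.15 vs 0.16: the events are dependent
```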
g) Rolling dice: Consider here the case where the die is rolled twice and the resulting two integers are added. The sample space, S, consists of the 36 pairs of the form (a, b) where a and b are integers from the set {1, …, 6}. The random variable (a.k.a. function on S) is the function that assigns a+b to any given (a, b) ∈ S. Thus, the set of possible outcomes is W = {2, …, 12}.
Suppose, for the sake of argument, that the true probabilities for rolling 1, 2, …, 6 on any given throw of the die are some fixed numbers p_1, p_2, …, p_6 (each positive, and summing to 1). Were this the case, then the true probability, Ptrue, for any given pair (a, b) in S is

Ptrue(a, b) = p_a p_b.     (7.14)

If the die has these probabilities, then the probabilities that result for the outcomes, the function Q on W, are as follows:

Q(2) = p_1², Q(3) = 2p_1p_2, Q(4) = 2p_1p_3 + p_2², Q(5) = 2p_1p_4 + 2p_2p_3,
Q(6) = 2p_1p_5 + 2p_2p_4 + p_3², Q(7) = 2p_1p_6 + 2p_2p_5 + 2p_3p_4, Q(8) = 2p_2p_6 + 2p_3p_5 + p_4²,
Q(9) = 2p_3p_6 + 2p_4p_5, Q(10) = 2p_4p_6 + p_5², Q(11) = 2p_5p_6, Q(12) = p_6².     (7.15)
Now, given that we have Q as just given, here is what the Bayesian finds for the probabilities of some of the elements in the sample space S:

• Pguess(1,1) = Q(2),
• Pguess(2,1) = Pguess(1,2) = Q(3)/2,
• Pguess(3,1) = Pguess(1,3) = Pguess(2,2) = Q(4)/3,
• Pguess(4,1) = Pguess(1,4) = Pguess(3,2) = Pguess(2,3) = Q(5)/4,
• Pguess(5,1) = Pguess(1,5) = Pguess(4,2) = Pguess(2,4) = Pguess(3,3) = Q(6)/5,
• Pguess(6,1) = Pguess(1,6) = Pguess(5,2) = Pguess(2,5) = Pguess(4,3) = Pguess(3,4) = Q(7)/6,
• Pguess(6,2) = Pguess(2,6) = Pguess(5,3) = Pguess(3,5) = Pguess(4,4) = Q(8)/5,
• ···· etc.     (7.16)

Here the divisor in each line is Z(r), the number of pairs in S whose entries sum to the given value r.
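Here is a sketch that carries out the whole computation; the face-probabilities p below are hypothetical stand-ins for whichever values one has in mind in (7.14).

```python
from itertools import product

# Hypothetical face-probabilities p[1..6]; substitute any positive values summing to 1.
p = {1: 0.10, 2: 0.15, 3: 0.20, 4: 0.20, 5: 0.20, 6: 0.15}

S = list(product(range(1, 7), repeat=2))                 # the 36 pairs (a, b)
Ptrue = {(a, b): p[a] * p[b] for (a, b) in S}            # independent rolls, as in (7.14)

Q, Z = {}, {}                                            # Q on W = {2,...,12} and Z(r)
for (a, b) in S:
    Q[a + b] = Q.get(a + b, 0.0) + Ptrue[(a, b)]
    Z[a + b] = Z.get(a + b, 0) + 1

Pguess = {(a, b): Q[a + b] / Z[a + b] for (a, b) in S}   # the guess (7.7)
print(Pguess[(2, 2)], Ptrue[(2, 2)])   # pairs with equal sums get averaged together
```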
Exercises:
1. a) Complete the table in (7.16) by computing the values of Pguess on the remaining pairs in S.

b) According to Pguess, is the event that the first roll comes up 1 independent of the event that the second roll comes up 6? Justify your answer.

2. Compute the mean and standard deviation for the random variable a+b, first using Ptrue from (7.14) and then using Pguess.
3. Consider now the same sample space for rolling a die twice, but now suppose that the die is fair, and so each number has probability 1/6 of turning up on any given roll.
a) Compute the mean and standard deviation of the random variable a+b.
b) Compute the mean and standard deviation for the random variable a·b.

c) Are the random variables a+b and a·b independent? In this regard, remember that two random variables, f and g, are said to be independent when P(f = r and g = s) = P(f = r)P(g = s) for all pairs (r, s) where r is a possible value of f and s is a possible value of g. Justify your answer.