Bio/statistics handout 13: Eigenvalues in biology
a) An
example from genetics:
Suppose
we have a fixed size N population of cells that reproduce by fission. The model here is that there are N cells that
divide at the start, thus producing 2N cells. After one unit of time, half of these die and
half survive, so there are N cells to fission at the end of 1 unit of
time. These N survivors divide to
produce 2N cells and so start the next run of the cycle. In particular, there are always N surviving
cells at the end of one unit of time and then 2N just at the start of the next
as each of these N cells splits in half.
Now, suppose that at the beginning, t = 0, there is, for each n ∈ {0, 1, …, N}, a given probability, which I'll call pn(0), for n of the N initial cells to carry a certain trait. Suppose that this trait (red color as opposed to blue) is neutral with respect to the cell's survival. In other words, the probability is 1/2 for any given red cell to survive to reproduce, and the probability is 1/2 for any given blue cell to survive to reproduce.
Here is the key question: Given the initial probabilities, p0(0), p1(0), …, pN(0), what are the corresponding probabilities after some t generations? That is, what are the values of p0(t), p1(t), …, pN(t), where pn(t) denotes the probability that n cells of generation t carry the red color?
To solve this, we can use our tried and true recourse to conditional probabilities by noting that

pn(t) = Prob(n red survivors | 0 red parents) p0(t−1) + Prob(n red survivors | 1 red parent) p1(t−1)
+ Prob(n red survivors | 2 red parents) p2(t−1) + ··· ,
(13.1)

where Prob(n red survivors | m red parents) is the conditional probability that there are n red cells at the end of a cycle given that there were m such cells at the end of the previous cycle. If the ambient environmental conditions don't change, one would expect these conditional probabilities to be independent of time. As I explain below, they are, in fact, computable from what we are given about this problem. In any event, let me use the shorthand A to denote the square matrix of size N+1 whose entry in row n and column m is Prob(n red survivors | m red parents). In this regard, note that n and m run from 0 to N, not from 1 to N+1. Let p(t) denote the vector in R^(N+1) whose n'th entry is pn(t). Then (13.1) reads

p(t) = A p(t−1) .
(13.2)
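The recursion (13.2) is easy to run numerically. Here is a minimal sketch, using a made-up 3 × 3 matrix of conditional probabilities (so N = 2 and the indices n, m run over {0, 1, 2}); the entries of A below are illustrative assumptions, not the Anm computed later in this handout:

```python
# Iterate p(t) = A p(t-1) for a small, made-up transition matrix.
# A[n][m] = Prob(n red survivors | m red parents); each column sums to 1.
A = [
    [0.7, 0.2, 0.1],   # row n = 0
    [0.2, 0.6, 0.2],   # row n = 1
    [0.1, 0.2, 0.7],   # row n = 2
]

def step(A, p):
    """One unit of time: return the vector A p."""
    return [sum(A[n][m] * p[m] for m in range(len(p))) for n in range(len(A))]

p = [1.0, 0.0, 0.0]    # at t = 0, surely zero red cells
for t in range(5):
    p = step(A, p)
print(p)               # probabilities after 5 generations
print(sum(p))          # still a probability vector
```

Because each column of A sums to 1, the entries of p(t) stay non-negative and sum to 1 at every step, as befits probabilities.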
This last equation would be easy to solve if we knew that p(0) was an eigenvector of the matrix A. That is, if it were the case that

A p(0) = λ p(0) with λ some real number.
(13.3)

Indeed, were this the case, then the t = 1 version of (13.2) would read p(1) = λ p(0). We could then use the t = 2 version of (13.2) to compute p(2), and we would find that p(2) = λ A p(0) = λ^2 p(0). Continuing in this vein, we would find that p(t) = λ^t p(0), and our problem would be solved.
Now, it is unlikely that p(0) is going to be an eigenvector. However, even if p(0) is only a linear combination of eigenvectors,

p(0) = c1 v1 + c2 v2 + ··· ,
(13.4)

we could still solve for p(t). Indeed, let λk denote the eigenvalue for any given vk. Thus, A vk = λk vk. Granted that we know these eigenvalues and eigenvectors, we can plug (13.4) into the t = 1 version of (13.2) to find

p(1) = c1 λ1 v1 + c2 λ2 v2 + ··· .
(13.5)

We could then plug this into the t = 2 version of (13.2) to find that

p(2) = c1 (λ1)^2 v1 + c2 (λ2)^2 v2 + ··· ,
(13.6)

and so on. In general, this gives

p(t) = c1 (λ1)^t v1 + c2 (λ2)^t v2 + ··· .
(13.7)
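The closed form (13.7) can be checked against direct iteration of (13.2). The sketch below uses an illustrative 2 × 2 Markov matrix whose eigendata can be found by hand (none of these numbers come from the handout): A = [[0.9, 0.2], [0.1, 0.8]] has eigenvalues λ1 = 1 and λ2 = 0.7, with eigenvectors v1 = (2, 1) and v2 = (1, −1), and p(0) = (1, 0) decomposes as (1/3) v1 + (1/3) v2.

```python
# Closed form (13.7) versus direct iteration of p(t) = A p(t-1).
A = [[0.9, 0.2],
     [0.1, 0.8]]            # columns sum to 1: a Markov matrix
v1, l1 = (2.0, 1.0), 1.0    # A v1 = 1.0 * v1
v2, l2 = (1.0, -1.0), 0.7   # A v2 = 0.7 * v2
c1, c2 = 1.0 / 3.0, 1.0 / 3.0   # p(0) = (1, 0) = c1 v1 + c2 v2

def closed_form(t):
    """p(t) = c1 (l1)^t v1 + c2 (l2)^t v2, as in (13.7)."""
    return [c1 * l1**t * v1[i] + c2 * l2**t * v2[i] for i in range(2)]

def iterate(t):
    """Apply A to p(0) = (1, 0) a total of t times."""
    p = [1.0, 0.0]
    for _ in range(t):
        p = [A[0][0] * p[0] + A[0][1] * p[1],
             A[1][0] * p[0] + A[1][1] * p[1]]
    return p

for t in range(6):
    a, b = closed_form(t), iterate(t)
    assert all(abs(a[i] - b[i]) < 1e-12 for i in range(2))
print(closed_form(10))   # the (0.7)^t term dies off; p(t) -> (2/3, 1/3)
```

Note the design: the λ1 = 1 term supplies the limiting probabilities, while the |λ2| < 1 term decays geometrically, which is the general pattern for Markov matrices.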
b) Transition/Markov matrices
The matrix A that appears in (13.2) has entries that are conditional probabilities, and this has the following implications:

• All entries are non-negative.
• A0,m + A1,m + ··· + AN,m = 1 for all m ∈ {0, 1, …, N}.
(13.8)

The last line above asserts that the probability is 1 of there being some number, either 0 or 1 or 2 or … or N, of red cells in the subsequent generation given m red cells in the initial generation.
A square matrix with these properties is called a transition matrix, or sometimes a Markov matrix. When A is such a matrix, the equation in (13.2) is called a Markov process.
Although we are interested in the eigenvalues of A, it is amusing to note that the transpose matrix, A^T, has an eigenvalue equal to 1, with the corresponding eigenvector being proportional to the vector e with each entry equal to 1. Indeed, the entry in the m'th row and n'th column of A^T is Anm, this being the entry of A in the n'th row and m'th column. Thus, for a vector a with entries (a0, a1, …, aN), the m'th entry of A^T a is

(A^T a)m = A0,m a0 + A1,m a1 + ··· + AN,m aN .
(13.9)

If the lower line in (13.8) is used and if each ak is 1, then each entry of A^T e is also 1. Thus, A^T e = e.
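The claim A^T e = e is just the statement that every column of A sums to 1, which takes a few lines to verify numerically (the matrix below is again a made-up example):

```python
# For a Markov matrix A, the all-ones vector e satisfies A^T e = e,
# because the m'th entry of A^T e is the m'th column sum of A.
A = [[0.5,  0.3, 0.0],
     [0.25, 0.4, 0.5],
     [0.25, 0.3, 0.5]]   # each column sums to 1

e = [1.0, 1.0, 1.0]
At_e = [sum(A[n][m] * e[n] for n in range(3)) for m in range(3)]
print(At_e)  # -> [1.0, 1.0, 1.0]
```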
As a 'cultural' aside, what follows is the story on Anm in the example from Part a. First, Anm = 0 if n is larger than 2m, since m parent cells can spawn at most 2m survivors. For n ≤ 2m, consider that you have 2N cells of which 2m are red, and you ask for the probability that a choice of N from the 2N cells results in n red ones. This is a counting problem much like those discussed in Part c of Handout 9, although more complicated. The answer here is:

Anm = C(2m, n) C(2N−2m, N−n) / C(2N, N) when 0 ≤ n ≤ 2m,
(13.8)

where C(a, b) denotes the binomial coefficient counting the ways to choose b items from a.
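As a sanity check on this formula, each column of A must sum to 1, as the Markov property requires; the Vandermonde identity guarantees this, and Python's math.comb confirms it for a small illustrative population (N = 3 here is an arbitrary choice):

```python
from math import comb

N = 3  # population size; 2N cells exist just after division

def A_entry(n, m):
    """Prob(n red survivors | m red parents): choose N survivors out of
    2N cells, 2m of which are red (math.comb returns 0 when n > 2m)."""
    return comb(2 * m, n) * comb(2 * N - 2 * m, N - n) / comb(2 * N, N)

# Each column of the (N+1) x (N+1) matrix sums to 1.
for m in range(N + 1):
    col_sum = sum(A_entry(n, m) for n in range(N + 1))
    assert abs(col_sum - 1.0) < 1e-12
print("all column sums equal 1")
```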
c) Another protein folding example
Here is another model for protein folding. As you may recall from previous handouts, a protein is made of segments tied end to end as a chain. Each segment is one of 20 amino acids. The protein is made by the cell in a large and complex molecule called a ribosome. The segments are attached one after the other in the ribosome, and so the chain grows, link by link. Only a few segments are in the ribosome at any given time; the rest stick out as the protein grows. As soon as a joint between two segments is free of the ribosome, it can bend if it is not somehow stabilized by surrounding molecules. Suppose that the bend at a joint can be in one of n directions, as measured relative to the direction of the previously made segment. A simplistic hypothesis has the direction of the bend of any given segment influenced mostly by the bend direction of the previously made segment.
To model this, let me introduce, for k, j ∈ {1, …, n}, the conditional probability Ak,j that a given segment has bend direction k when the previously made segment has bend direction j. Next, agree to label the segments along the protein by the integers in the set {1, …, N}. If z denotes such an integer, let pk(z) denote the probability that the z'th segment is bent in the k'th direction relative to the angle of the (z−1)'st segment. Then we have

pk(z) = ∑1≤j≤n Akj pj(z−1) .
(13.9)
I can now introduce the vector p(z) in R^n whose k'th component is pk(z), and also the square matrix A with the components Akj. Then (13.9) is the equation

p(z) = A p(z−1) .
(13.10)
Note that, as in the previous case, the matrix A is a Markov matrix. This is to say that each Akj is non-negative, because the entries are conditional probabilities, and

A1,j + A2,j + ··· + An,j = 1
(13.11)

because the segment must bend at some angle or other.
Here is an explicit example: Take n = 3, so that A is a 3 × 3 matrix. In particular, take

A = [the 3 × 3 matrix displayed in the original is not reproduced here].
(13.12)

As it turns out, A has the following eigenvectors:

v1 = (1/√3)(1, 1, 1), together with two further eigenvectors v2 and v3 [not reproduced here],
(13.13)
with corresponding eigenvalues λ1 = 1, and λ2 and λ3, whose values [not reproduced here] have absolute value less than 1. Note that the vectors in (13.13) are mutually orthogonal and have norm 1, so they define an orthonormal basis for R^3. Thus, any given z version of p(z) can be written as a linear combination of the vectors in (13.13). Doing so, we write

p(z) = c1(z) v1 + c2(z) v2 + c3(z) v3
(13.14)
where each ck(z) is some real number. In this regard, there is one point to make straight away, which is that c1(z) must equal 1/√3 when the entries of p(z) represent probabilities that sum to 1. To explain, keep in mind that the basis in (13.13) is an orthonormal basis, and this implies that v1·p(z) = c1(z). However, since each entry of v1 is equal to 1/√3, this dot product is (1/√3)(p1(z) + p2(z) + p3(z)) and so equals 1/√3 when p1 + p2 + p3 = 1, as is the case when these coefficients are probabilities.
In any event, if you plug the expression in (13.14) into the left side of (13.10) and use the analogous z−1 version on the right side, you will find that the resulting equation holds if and only if the coefficients obey

c1(z) = c1(z−1) , c2(z) = λ2 c2(z−1) and c3(z) = λ3 c3(z−1).
(13.15)

Note that the equality here between c1(z) and c1(z−1) is heartening inasmuch as both of them are supposed to equal 1/√3. Anyway, continue by iterating (13.15), writing the z−1 versions of ck in terms of the z−2 versions, then the latter in terms of the z−3 versions, and so on, to find that

c1(z) = c1(0), c2(z) = (λ2)^z c2(0) and c3(z) = (λ3)^z c3(0).
(13.16)
Thus, we see from (13.16) that

p(z) = (1/3, 1/3, 1/3) + (λ2)^z c2(0) v2 + (λ3)^z c3(0) v3 .
(13.17)
By the way, take note of how the probabilities for the three possible fold directions come closer and closer to being equal as z increases even if the initial z = 0 probabilities were drastically skewed to favor one or the other of the three directions. For example, suppose that
p(0) = [a drastically skewed initial vector; not reproduced here].
(13.18)

As a result, c1(0) = 1/√3, while c2(0) and c3(0) take the values [not reproduced here] determined by this p(0). Thus, we have

p(z) = (1/3, 1/3, 1/3) + (λ2)^z c2(0) v2 + (λ3)^z c3(0) v3 .
(13.19)

In particular, by z = 4, the three directions differ in probability by less than 1%.
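This approach to equal probabilities is easy to watch numerically. Since the matrix in (13.12) is not reproduced above, the sketch below uses a stand-in: the doubly stochastic 3 × 3 matrix A = (1/4)I + (1/4)J (with J the all-ones matrix), whose rows and columns both sum to 1 and whose eigenvalues are 1, 1/4 and 1/4:

```python
# Stand-in doubly stochastic matrix: A = 0.25*J + 0.25*I, eigenvalues 1, 1/4, 1/4.
A = [[0.50, 0.25, 0.25],
     [0.25, 0.50, 0.25],
     [0.25, 0.25, 0.50]]

def step(p):
    """One segment further along the chain: p(z) = A p(z-1)."""
    return [sum(A[k][j] * p[j] for j in range(3)) for k in range(3)]

p = [1.0, 0.0, 0.0]   # drastically skewed initial bend probabilities
for z in range(4):
    p = step(p)
spread = max(p) - min(p)
print(p, spread)      # by z = 4 the three directions differ by well under 1%
assert spread < 0.01
```

With the decay factor 1/4 standing in for λ2 and λ3, the deviation of p(z) from (1/3, 1/3, 1/3) shrinks by a factor of 4 per segment, so the spread at z = 4 is (1/4)^4 ≈ 0.4%.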
1. Multiply the matrix A in (13.12) against the vector p(z) in (13.17) and verify that the result is equal to p(z+1), as defined by replacing z by z+1 in (13.17).
2. Let A denote the 2 × 2 matrix [not reproduced here]. The vectors v1 = [not reproduced] and v2 = [not reproduced] are eigenvectors of A.

a) Compute the eigenvalues corresponding to v1 and v2.

b) Suppose z ∈ {0, 1, …} and p(z+1) = A p(z) for all z. Find p(z) if p(0) = [a first given vector].

c) Find p(z) if p(0) = [a second given vector].

d) Find p(z) in the case that p(z+1) = [a given vector] + A p(z) for all z and p(0) = [a given vector].
3. Suppose that you have a model to explain your data that predicts the probability of a certain measurement having any prescribed value. Suppose that this probability function has mean 1 and standard deviation 2.

a) Give an upper bound for the probability of a measurement being greater than 5.

b) Suppose that you average some very large number, N, of measurements that are taken in unrelated, but identical, versions of the same experimental set up. Write down a Gaussian probability function that you can use to estimate the probability that the value of this average is greater than 5. In particular, give a numerical estimate using this Gaussian function for N = 100.

c) Let's say that you will agree to throw out your proposed model if it predicts that the probability of finding an average value greater than your measured average for 100 measurements is less than [a threshold not reproduced here]. If your measured average is 1.6 for 100 experiments, should you junk your model?
4. Suppose that you repeat the same experiment 100 times and each time record the value of a certain key measurement. Let {x1, …, x100} denote the values of the measurement in these 100 experiments. Suppose that ∑1≤k≤100 xk = 0 and ∑1≤k≤100 (xk)^2 = 1. Suppose, in addition, that five of the xk obey xk > 3. The purpose of this problem is to walk you through a method for obtaining an upper bound for the likelihood of having five measurements that far from the mean. In this regard, suppose that you make the hypothesis that the spread of the measured numbers {xk}1≤k≤100 is determined by the Gaussian probability function with mean 0 and standard deviation equal to 1.
a) Let p denote the probability, using this Gaussian probability function, for a measurement x with x ≥ 3. Explain why p ≤ (3/√(2π)) e^(−9/2) ≤ 0.014.
b) Use the binomial probability function to explain why the probability of finding five or more values of k out of 100 that have xk > 3 is equal to

∑5≤k≤100 C(100, k) p^k (1−p)^(100−k) .
c) Let k ≥ 5 and let z → ƒk(z) = z^k (1−z)^(100−k). Show that ƒk is an increasing function of z when z < k/100. (Hint: Take its derivative with respect to z.)
d) Since 0.014 ≤ 0.05 = 5/100, use the result from Part c to conclude that the probability of finding 5 or more of the xk's out of 100 with xk > 3 is less than

∑5≤k≤100 C(100, k) (0.014)^k (0.986)^(100−k) .
e) We saw in Equation 9.21 of Handout 9 that the terms in the preceding sum get ever smaller as k increases. Use a calculator to show that the k = 9 term, and thus all higher k terms, are smaller than 1.1×10^(−5). Then use a calculator to estimate the k = 5, 6, 7 and 8 terms to prove that the sum in Part d is no greater than 0.015.
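The numerical claims in Parts d and e are quick to check, with Python's math.comb standing in for the calculator (p = 0.014 as in Part d):

```python
from math import comb

p = 0.014

def term(k):
    """k'th term of the binomial sum in Part d."""
    return comb(100, k) * p**k * (1 - p)**(100 - k)

assert term(9) < 1.1e-5                                 # the k = 9 term is tiny...
assert all(term(k + 1) < term(k) for k in range(9, 99)) # ...and later terms keep shrinking
total = sum(term(k) for k in range(5, 101))
print(total)   # the full tail, comfortably below the 0.015 bound
assert total <= 0.015
```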