Bio/statistics Handout 8

Dimensions and coordinates in a scientific context

a) Coordinates: Here is a hypothetical situation to think about: Suppose a cell has genes labeled {1, 2, 3}. The levels of the corresponding products then determine a vector in R³. This space has natural coordinates, x₁, x₂ and x₃, that measure the respective levels of the products of gene 1, gene 2 and gene 3. However, this might not be the most useful coordinate system. In particular, if some subset of genes is often turned on at the same time and in the same relative amounts, it might be better to change to a basis where that subset gives one of the basis vectors. Suppose, for the sake of argument, that it is usually the case that the level of the product from gene 2 is three times that of gene 1, while the level of the product of gene 3 is half that of gene 1. This is to say that one usually finds x₂ = 3x₁ and x₃ = ½x₁. Then it might make sense to switch from the standard coordinate basis,

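$$\vec{e}_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad \vec{e}_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \quad \vec{e}_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix},$$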

(8.1)

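to the coordinate system that uses a basis $\vec{v}_1$, $\vec{v}_2$ and $\vec{v}_3$ where

$$\vec{v}_1 = \begin{pmatrix} 1 \\ 3 \\ 1/2 \end{pmatrix}, \quad \vec{v}_2 = \vec{e}_2, \quad \vec{v}_3 = \vec{e}_3 .$$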

(8.2)

To explain, suppose I measure some values for x₁, x₂ and x₃. This then gives a vector,

$$\vec{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = x_1 \vec{e}_1 + x_2 \vec{e}_2 + x_3 \vec{e}_3 .$$

(8.3)

Now, I can also write this vector in terms of the basis in (8.2) as

$$\vec{x} = c_1 \vec{v}_1 + c_2 \vec{v}_2 + c_3 \vec{v}_3 .$$

(8.4)

With $\vec{v}_1$, $\vec{v}_2$ and $\vec{v}_3$ as in (8.2), the coordinates c₁, c₂ and c₃ that appear in (8.4) are

c₁ = x₁, c₂ = x₂ – 3x₁, and c₃ = x₃ – ½x₁ .

(8.5)

As a consequence, the coordinate c₂ describes the deviation of x₂ from its usual value of 3x₁. Meanwhile, the coordinate c₃ describes the deviation of x₃ from its usual value of ½x₁.
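Here is a minimal Python sketch of this change of coordinates, offered only as a numerical check. It assumes that the first basis vector in (8.2) is (1, 3, 1/2), which is what the stated pattern (gene 2 at three times, and gene 3 at half, the level of gene 1) implies. The function names and the sample measurement are made up for illustration.

```python
from fractions import Fraction

def to_deviation_coords(x1, x2, x3):
    """Coordinates (c1, c2, c3) of (x1, x2, x3) in the basis
    v1 = (1, 3, 1/2), v2 = e2, v3 = e3 (assumed form of (8.2))."""
    c1 = x1
    c2 = x2 - 3 * x1               # deviation of x2 from 3*x1
    c3 = x3 - Fraction(1, 2) * x1  # deviation of x3 from x1/2
    return c1, c2, c3

def from_deviation_coords(c1, c2, c3):
    """Rebuild (x1, x2, x3) as c1*v1 + c2*e2 + c3*e3."""
    v1 = (Fraction(1), Fraction(3), Fraction(1, 2))
    return (c1 * v1[0], c1 * v1[1] + c2, c1 * v1[2] + c3)

# Hypothetical measurement, close to the usual pattern (2, 6, 1).
x = (Fraction(2), Fraction(61, 10), Fraction(9, 10))
c = to_deviation_coords(*x)
print(c)  # c2 and c3 come out as the small deviations 1/10 and -1/10

# Expanding in the new basis must give back the original measurement.
assert from_deviation_coords(*c) == x
```

Exact fractions are used here so that the round trip back to (x₁, x₂, x₃) can be compared without rounding error; ordinary floats would serve equally well for real data.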

Here is another example: Suppose now that there are again three genes with the levels of their corresponding products denoted as x₁, x₂, and x₃. Now suppose that it is usually the case that these levels are correlated in that x₃ is generally very close to 2x₂ + x₁. Any given set of measured values for these products again determines a column vector as in (8.3). A useful basis in this case would be one where the coordinates c₁, c₂ and c₃ are

c₁ = x₁, c₂ = x₂, and c₃ = x₃ – 2x₂ – x₁.

(8.6)

Thus, c₃ again measures the deviation from the expected values. The basis with this property is that where

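$$\vec{v}_1 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \quad \vec{v}_2 = \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix}, \quad \vec{v}_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} .$$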

(8.7)

This is to say that if $\vec{v}_1$, $\vec{v}_2$ and $\vec{v}_3$ are as depicted in (8.7), and if $\vec{x}$ is then expanded in this basis as $c_1 \vec{v}_1 + c_2 \vec{v}_2 + c_3 \vec{v}_3$, then c₁, c₂ and c₃ are given by (8.6).
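For a direct check, assume that the three basis vectors in (8.7) are (1, 0, 1), (0, 1, 2) and (0, 0, 1); these are the vectors that reproduce (8.6). The expansion can then be written out component by component:

$$c_1 \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} + c_2 \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix} + c_3 \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} c_1 \\ c_2 \\ c_1 + 2c_2 + c_3 \end{pmatrix} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} .$$

Matching the top two components gives c₁ = x₁ and c₂ = x₂. The bottom component then reads x₃ = x₁ + 2x₂ + c₃, which is the same as c₃ = x₃ – 2x₂ – x₁.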

b) A systematic approach: If you are asking how I know to take the basis in (8.7) to get the coordinate relations in (8.6), here is the answer: Suppose that you have coordinates x₁, x₂ and x₃ and you desire new coordinates, c₁, c₂ and c₃ that are related to the x’s by a linear transformation:

$$\begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} = A \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix},$$

(8.8)

where A is an invertible, 3×3 matrix. In this regard, I am supposing that you have determined already the matrix A and are simply looking now to find the vectors $\vec{v}_1$, $\vec{v}_2$ and $\vec{v}_3$ that allow you to write $\vec{x} = c_1 \vec{v}_1 + c_2 \vec{v}_2 + c_3 \vec{v}_3$ with c₁, c₂ and c₃ given by (8.8). As explained in the linear algebra text, the vectors to take are:

$$\vec{v}_1 = A^{-1} \vec{e}_1, \quad \vec{v}_2 = A^{-1} \vec{e}_2, \quad \vec{v}_3 = A^{-1} \vec{e}_3 .$$

(8.9)

To explain why (8.9) holds, take the equation $\vec{x} = c_1 \vec{v}_1 + c_2 \vec{v}_2 + c_3 \vec{v}_3$ and act on both sides by the linear transformation A. According to (8.8), the left hand side, $A\vec{x}$, is the vector whose top component is c₁, middle component is c₂ and bottom component is c₃. This is to say that $A\vec{x} = c_1 \vec{e}_1 + c_2 \vec{e}_2 + c_3 \vec{e}_3$. Meanwhile, the right hand side of the resulting equation is $c_1 A\vec{v}_1 + c_2 A\vec{v}_2 + c_3 A\vec{v}_3$. Thus,

$$c_1 \vec{e}_1 + c_2 \vec{e}_2 + c_3 \vec{e}_3 = c_1 A\vec{v}_1 + c_2 A\vec{v}_2 + c_3 A\vec{v}_3 .$$

(8.10)

Now, the two sides of (8.10) are supposed to be equal for all possible values of c₁, c₂ and c₃. In particular, they are equal when c₁ = 1 and c₂ = c₃ = 0. For these choices, the equality in (8.10) asserts that $\vec{e}_1 = A\vec{v}_1$; this is equivalent to the leftmost equality in (8.9), as can be seen by applying $A^{-1}$ to both sides. Likewise, setting c₁ = c₃ = 0 and c₂ = 1 in (8.10) gives the equivalent of the middle equality in (8.9); and setting c₁ = c₂ = 0 and c₃ = 1 in (8.10) gives the equivalent of the rightmost equality in (8.9).
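The recipe in (8.9) amounts to reading off the columns of the inverse of A. Here is a minimal Python sketch of that recipe, applied to the matrix A that encodes the coordinates in (8.6). The helper names `invert` and `basis_from_A` are hypothetical, and the Gauss-Jordan routine is just one convenient way to invert a small matrix by machine, not a method prescribed in this handout.

```python
from fractions import Fraction

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination (exact arithmetic)."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero pivot.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not invertible")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so that the pivot equals 1.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Clear the pivot column in every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    # The right-hand half of the augmented matrix is now the inverse.
    return [row[n:] for row in aug]

def basis_from_A(A):
    """Return the vectors A^-1 e_1, ..., A^-1 e_n, i.e. the columns of A^-1."""
    A_inv = invert(A)
    n = len(A)
    return [tuple(A_inv[i][j] for i in range(n)) for j in range(n)]

# The matrix A that encodes (8.6): c1 = x1, c2 = x2, c3 = x3 - 2*x2 - x1.
A = [[1, 0, 0],
     [0, 1, 0],
     [-1, -2, 1]]
basis = basis_from_A(A)
for k, vec in enumerate(basis, start=1):
    print(f"basis vector {k}:", [str(entry) for entry in vec])
```

For this A the three vectors come out as (1, 0, 1), (0, 1, 2) and (0, 0, 1). Feeding in the matrix for (8.5), whose rows are (1, 0, 0), (-3, 1, 0) and (-1/2, 0, 1), gives (1, 3, 1/2) as the first basis vector.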

c) Dimensions: What follows is an example of how the notion of dimension arises in a scientific context. Consider the situation in Part a, above, where the levels are such that, very nearly, x₂ ~ 3x₁ and x₃ ~ ½x₁. This is to say that when we use the coordinates c₁, c₂ and c₃ in (8.5), then |c₂| and |c₃| are typically very small. In this case, a reasonably accurate model for the behavior of the three gene system can be had by simply assuming that c₂ and c₃ are always measured to be identically zero. As such, the value of the coordinate c₁ describes the system to great accuracy. Since only one coordinate is needed to describe the system, it is said to be ‘1-dimensional’.

A second example is the system that is described by c₁, c₂ and c₃ as depicted in (8.6). If it is always the case that x₃ is very close to 2x₂ + x₁, then the system can be described with good accuracy with c₃ set equal to zero. This done, one need only specify the values of c₁ and c₂ to describe the system. As there are two coordinates needed, this system would be deemed ‘2-dimensional’.

In general, a time dependent phenomenon is deemed ‘n-dimensional’ when n coordinates are required to describe its behavior to some acceptable level of accuracy. Of course, it is typically the case that the value of n depends on the desired level of accuracy.
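To see how n can depend on the accuracy, here is a minimal Python sketch that counts how many coordinates ever exceed a chosen tolerance. The function name `effective_dimension`, the tolerances and the sample data are all made up for illustration. The counting rule itself is only one simple reading of ‘acceptable level of accuracy’; it is an assumption of this sketch rather than a definition.

```python
def effective_dimension(samples, tolerance):
    """Count the coordinates whose absolute value exceeds `tolerance`
    in at least one sample. Coordinates that always stay within the
    tolerance are treated as zero and do not count."""
    n_coords = len(samples[0])
    return sum(
        1 for k in range(n_coords)
        if any(abs(sample[k]) > tolerance for sample in samples)
    )

# Made-up (c1, c2, c3) measurements, for illustration only.
samples = [
    (2.0, 0.10, -0.02),
    (1.5, -0.20, 0.01),
    (3.1, 0.15, 0.03),
]

print(effective_dimension(samples, tolerance=0.5))    # 1
print(effective_dimension(samples, tolerance=0.05))   # 2
print(effective_dimension(samples, tolerance=0.005))  # 3
```

With a coarse tolerance only c₁ matters and the data look 1-dimensional; tightening the tolerance brings c₂, and then c₃, back into play.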

Exercises:

1. Suppose that four genes have corresponding products with levels x₁, x₂, x₃ and x₄ where x₄ is always very close to x₁ + 4x₂ while x₃ is always very close to 2x₁ + x₂. Find a new set of basis vectors for R⁴ and corresponding coordinates c₁, c₂, c₃ and c₄ with the following property: The values of x₁, x₂, x₃ and x₄ for this four gene system are the points in the (c₁, c₂, c₃, c₄) coordinate system where c₃ and c₄ are nearly zero.

2. Suppose that two genes are either ‘on’ or ‘off’, so that there are, effectively, just four states for the two gene system, {++, +-, -+, --}, where ++ means that both genes are on; +- means that the first is on and the second is off; etc. Assume that these four states have respective probabilities , , , .

a) Is the event that the first gene is on independent from the event that the second gene is on?

Now suppose that these two genes jointly influence the levels of two different products. The levels of the first product are given by {3, 2, 1, 0} in the respective states ++, +-, -+, --. The levels of the second are {4, 2, 3, 1} in these same states.

b) View the levels of the two products as random variables on the sample space S that consists of {++, +-, -+, --} with the probabilities as stated. Write down the means and standard deviations for these two random variables.

c) Compute the correlation matrix in Equation 3.6 of Handout 3 for these two random variables to prove that they are not independent.