Bio/statistics Handout 8

Dimensions and coordinates in a scientific context

 

 

a)  Coordinates:  Here is a hypothetical situation to think about:  Suppose a cell has genes labeled {1, 2, 3}.  The levels of the corresponding products determine a vector in R3.  This space has natural coordinates, x1, x2 and x3, that measure the respective levels of the products of gene 1, gene 2 and gene 3.  However, this might not be the most useful coordinate system.  In particular, if some subset of genes is often turned on at the same time and in the same relative amounts, it might be better to change to a basis where that subset gives one of the basis vectors.  Suppose, for the sake of argument, that it is usually the case that the level of the product from gene 2 is three times that of gene 1, while the level of the product of gene 3 is half that of gene 1.  This is to say that one usually finds x2 = 3x1 and x3 = (1/2)x1.  Then it might make sense to switch from the standard coordinate basis,

 

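$$\vec{e}_1 = \begin{pmatrix}1\\0\\0\end{pmatrix},\quad \vec{e}_2 = \begin{pmatrix}0\\1\\0\end{pmatrix},\quad \vec{e}_3 = \begin{pmatrix}0\\0\\1\end{pmatrix},$$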

(8.1)

to the coordinate system that uses a basis $\vec{v}_1$, $\vec{v}_2$ and $\vec{v}_3$ where

 

$$\vec{v}_1 = \begin{pmatrix}1\\3\\1/2\end{pmatrix},\quad \vec{v}_2 = \vec{e}_2,\quad \vec{v}_3 = \vec{e}_3 .$$

(8.2)

            To explain, suppose I measure some values for x1, x2 and x3.  This then gives a vector,

 

$$\vec{x} = \begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix} = x_1\vec{e}_1 + x_2\vec{e}_2 + x_3\vec{e}_3 .$$

(8.3)

Now, I can also write this vector in terms of the basis in (8.2) as

 

$$\vec{x} = c_1\vec{v}_1 + c_2\vec{v}_2 + c_3\vec{v}_3 .$$

(8.4)

With $\vec{v}_1$, $\vec{v}_2$ and $\vec{v}_3$ as in (8.2), the coordinates c1, c2 and c3 that appear in (8.4) are

 

c1 = x1,   c2 = x2 – 3x1,   and   c3 = x3 – (1/2)x1  .

(8.5)

As a consequence, the coordinate c2 describes the deviation of x2 from its usual value of 3x1.  Meanwhile, the coordinate c3 describes the deviation of x3 from its usual value of (1/2)x1. 
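
The change of coordinates in this first example can be checked numerically.  What follows is a minimal sketch, not part of the original handout:  It assumes the usual pattern x2 = 3x1 and x3 = (1/2)x1 described above, so that the first new basis vector has entries (1, 3, 1/2) and the other two are the standard second and third basis vectors.  The function names and the sample measurement are made up for illustration.

```python
from fractions import Fraction

# Assumed basis from (8.2): V1 encodes the usual pattern x2 = 3*x1, x3 = x1/2.
V1 = (Fraction(1), Fraction(3), Fraction(1, 2))
V2 = (Fraction(0), Fraction(1), Fraction(0))
V3 = (Fraction(0), Fraction(0), Fraction(1))


def to_new_coordinates(x1, x2, x3):
    """Apply (8.5): c1 = x1, c2 = x2 - 3*x1, c3 = x3 - x1/2."""
    x1, x2, x3 = Fraction(x1), Fraction(x2), Fraction(x3)
    return (x1, x2 - 3 * x1, x3 - x1 / 2)


def from_new_coordinates(c1, c2, c3):
    """Rebuild the measured vector as c1*V1 + c2*V2 + c3*V3."""
    return tuple(c1 * a + c2 * b + c3 * d for a, b, d in zip(V1, V2, V3))


# Made-up measurement that is close to the usual pattern.
x = (Fraction(2), Fraction(13, 2), Fraction(9, 10))
c = to_new_coordinates(*x)

print(c)                              # c2 and c3 are the small deviations
print(from_new_coordinates(*c) == x)  # the expansion recovers the measurement
```

Exact fractions are used here only so that the round trip back to (x1, x2, x3) can be compared without rounding error.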

            Here is another example:  Suppose now that there are again three genes with the levels of their corresponding products denoted as x1, x2, and x3.  Now suppose that it is usually the case that these levels are correlated in that x3 is generally very close to 2x2 + x1.  Any given set of measured values for these products now determines a column vector $\vec{x}$ as in (8.3).  A useful basis in this case would be one where the coordinates c1, c2 and c3 are

 

c1 = x1,     c2 = x2,  and   c3 = x3 – 2x2 – x1. 

(8.6)

Thus, c3 again measures the deviation from the expected values.  The basis with this property is that where

 

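$$\vec{v}_1 = \begin{pmatrix}1\\0\\1\end{pmatrix},\quad \vec{v}_2 = \begin{pmatrix}0\\1\\2\end{pmatrix},\quad \vec{v}_3 = \begin{pmatrix}0\\0\\1\end{pmatrix} .$$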

(8.7)

This is to say that if $\vec{v}_1$, $\vec{v}_2$ and $\vec{v}_3$ are as depicted in (8.7), and if $\vec{x}$ is then expanded in this basis as $c_1\vec{v}_1 + c_2\vec{v}_2 + c_3\vec{v}_3$, then c1, c2 and c3 are given by (8.6).
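
As a quick check of this claim, here is the expansion written out component by component.  It is a sketch that takes the basis vectors of (8.7) to have entries (1, 0, 1), (0, 1, 2) and (0, 0, 1), which are the vectors that the relations in (8.6) force:

```latex
c_1\begin{pmatrix}1\\0\\1\end{pmatrix}
+ c_2\begin{pmatrix}0\\1\\2\end{pmatrix}
+ c_3\begin{pmatrix}0\\0\\1\end{pmatrix}
= \begin{pmatrix}c_1\\ c_2\\ c_1 + 2c_2 + c_3\end{pmatrix}
= \begin{pmatrix}x_1\\ x_2\\ x_3\end{pmatrix}
```

Reading off the components gives c1 = x1 and c2 = x2, and then x3 = c1 + 2c2 + c3 gives c3 = x3 – 2x2 – x1, in agreement with (8.6).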

 

b)  A systematic approach:  If you are asking how I know to take the basis in (8.7) to get the coordinate relations in (8.6), here is the answer:   Suppose that you have coordinates x1, x2 and x3 and you desire new coordinates, c1, c2 and c3 that are related to the x’s by a linear transformation:

 

$$\vec{c} = \begin{pmatrix}c_1\\c_2\\c_3\end{pmatrix} = A\,\vec{x} ,$$

(8.8)

where A is an invertible, 3×3 matrix.  In this regard, I am supposing that you have determined already the matrix A and are simply looking now to find the vectors $\vec{v}_1$, $\vec{v}_2$ and $\vec{v}_3$ that allow you to write $\vec{x} = c_1\vec{v}_1 + c_2\vec{v}_2 + c_3\vec{v}_3$ with c1, c2 and c3 given by (8.8).  As explained in the linear algebra text, the vectors to take are:

 

$$\vec{v}_1 = A^{-1}\vec{e}_1,\quad \vec{v}_2 = A^{-1}\vec{e}_2,\quad \vec{v}_3 = A^{-1}\vec{e}_3 .$$

(8.9)

            To explain why (8.9) holds, take the equation $\vec{x} = c_1\vec{v}_1 + c_2\vec{v}_2 + c_3\vec{v}_3$ and act on both sides by the linear transformation A.  According to (8.8), the left hand side, $A\vec{x}$, is the vector $\vec{c}$ whose top component is c1, middle component is c2 and bottom component is c3.  This is to say that $A\vec{x} = c_1\vec{e}_1 + c_2\vec{e}_2 + c_3\vec{e}_3$.  Meanwhile, because A is linear, the right hand side of the resulting equation is $c_1 A\vec{v}_1 + c_2 A\vec{v}_2 + c_3 A\vec{v}_3$.  Thus,

 

$$c_1\vec{e}_1 + c_2\vec{e}_2 + c_3\vec{e}_3 = c_1 A\vec{v}_1 + c_2 A\vec{v}_2 + c_3 A\vec{v}_3 .$$

(8.10)

Now, the two sides of (8.10) are supposed to be equal for all possible values of c1, c2 and c3.  In particular, they are equal when c1 = 1 and c2 = c3 = 0.  For these choices, the equality in (8.10) asserts that $\vec{e}_1 = A\vec{v}_1$; acting on both sides by $A^{-1}$ gives the leftmost equality in (8.9).  Likewise, setting c1 = c3 = 0 and c2 = 1 in (8.10) gives the equivalent of the middle equality in (8.9); and setting c1 = c2 = 0 and c3 = 1 in (8.10) gives the equivalent of the rightmost equality in (8.9).
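
The recipe in (8.9) says that the new basis vectors are the columns of the inverse of A.  Here is a minimal sketch of that computation for the second example of Part a, where the matrix A is read off from (8.6).  The helper names `invert` and `new_basis` are made up for this illustration, and the inverse is computed by ordinary Gauss-Jordan elimination with exact fractions rather than by any particular library routine.

```python
from fractions import Fraction


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with exact fractions."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    rows = [
        [Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not invertible")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        scale = rows[col][col]
        rows[col] = [v / scale for v in rows[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [row[n:] for row in rows]


def new_basis(matrix):
    """Return the vectors A^{-1} e_k of (8.9), i.e. the columns of the inverse."""
    inverse = invert(matrix)
    n = len(inverse)
    return [tuple(inverse[i][k] for i in range(n)) for k in range(n)]


# A encodes (8.6): c1 = x1, c2 = x2, c3 = x3 - 2*x2 - x1.
A = [[1, 0, 0],
     [0, 1, 0],
     [-1, -2, 1]]

for k, v in enumerate(new_basis(A), start=1):
    print(f"basis vector {k}:", tuple(int(entry) for entry in v))
```

The three vectors printed have entries (1, 0, 1), (0, 1, 2) and (0, 0, 1), which is the basis that the relations in (8.6) require.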

 

c)  Dimensions:  What follows is an example of how the notion of dimension arises in a scientific context.  Consider the situation in Part a, above, where the system is such that the levels of x2 and x3 are very nearly x2 ~ 3x1 and x3 ~ (1/2)x1.  This is to say that when we use the coordinates c1, c2 and c3 in (8.5), then |c2| and |c3| are typically very small.  In this case, a reasonably accurate model for the behavior of the three gene system can be had by simply assuming that c2 and c3 are always measured to be identically zero.  As such, the value of the coordinate c1 describes the system to great accuracy.  Since only one coordinate is needed to describe the system, it is said to be ‘1-dimensional’. 

            A second example is the system that is described by c1, c2 and c3 as depicted in (8.6).  If it is always the case that x3 is very close to 2x2 + x1, then the system can be described with good accuracy with c3 set equal to zero.  This done, one need only specify the values of c1 and c2 to describe the system.  As there are two coordinates needed, this system would be deemed ‘2-dimensional’. 

            In general, a time dependent phenomenon is deemed ‘n-dimensional’ when n coordinates are required to describe its behavior to some acceptable level of accuracy.  Of course, it is typically the case that the value of n depends on the desired level of accuracy.
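
To see how the value of n can depend on the desired accuracy, here is a rough sketch.  It uses one simple criterion that is an assumption of this illustration and not a definition from the handout:  A coordinate counts as ‘needed’ if its magnitude exceeds a chosen tolerance in at least one measurement.  This criterion only makes sense for coordinates, such as c2 and c3 above, that measure deviations from a usual value.  The function name and the readings are made up.

```python
def effective_dimension(samples, tolerance):
    """Count coordinates whose magnitude exceeds `tolerance` in at least one sample."""
    n_coords = len(samples[0])
    return sum(
        1
        for k in range(n_coords)
        if max(abs(sample[k]) for sample in samples) > tolerance
    )


# Made-up (c1, c2, c3) readings, for illustration only; not data from the handout.
readings = [
    (2.0, 0.30, -0.02),
    (5.1, -0.25, 0.01),
    (3.4, 0.10, 0.03),
]

# A looser tolerance needs fewer coordinates; a stricter one needs more.
for tol in (0.5, 0.05, 0.005):
    print(tol, effective_dimension(readings, tol))
```

With these made-up readings, the count rises from 1 to 2 to 3 as the tolerance is tightened from 0.5 to 0.05 to 0.005.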

 

 

Exercises:

 

1.      Suppose that four genes have corresponding products with levels x1, x2, x3 and x4 where x4 is always very close to x1 + 4x2 while x3 is always very close to 2x1 + x2.  Find a new set of basis vectors for R4 and corresponding coordinates c1, c2, c3 and c4 with the following property:  The values of x1, x2, x3 and x4 for this four gene system are the points in the (c1, c2, c3, c4) coordinate system where c3 and c4 are nearly zero.

 

2.      Suppose that two genes are either ‘on’ or ‘off’, so that there are, effectively, just four states for the two gene system, {++, +-, -+, --}, where ++ means that both genes are on; +- means that the first is on and the second is off; etc.   Assume that these four states have respective probabilities , ,  , . 

a)  Is the event that the first gene is on independent from the event that the second gene is on?  

Now suppose that these two genes jointly influence the levels of two different products.  The levels of the first product are given by {3, 2, 1, 0} in the respective states ++, +-, -+, --.  The levels of the second are {4, 2, 3, 1} in these same states.

b)  View the levels of the two products as random variables on the sample space S that consists of {++, +-, -+, --} with the probabilities as stated.  Write down the means and standard deviations for these two random variables.

c)  Compute the correlation matrix in Equation 3.6 of Handout 3 for these two random variables to prove that they are not independent.