Bio/statistics Handout 8
Dimensions and coordinates in a scientific context
a) Coordinates: Here is a hypothetical situation to think about. Suppose a cell has genes labeled {1, 2, 3}, and the levels of the corresponding gene products are recorded as a vector in R3. This space has natural coordinates, x1, x2 and x3, that measure the respective levels of the products of gene 1, gene 2 and gene 3. However, this might not be the most useful coordinate system. In particular, if some subsets of genes are often turned on at the same time and in fixed relative amounts, it might be better to change to a basis where that subset gives one of the basis vectors. Suppose, for the sake of argument, that the level of the product from gene 2 is usually three times that of gene 1, while the level of the product of gene 3 is half that of gene 1. This is to say
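that one usually finds x2 = 3x1 and x3 = (1/2)x1. Then it might make sense to switch from the standard coordinate basis,

$$\mathbf{e}_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad \mathbf{e}_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \quad \mathbf{e}_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \tag{8.1}$$

to the coordinate system that uses a basis v1, v2 and v3 where

$$\mathbf{v}_1 = \begin{pmatrix} 1 \\ 3 \\ \tfrac{1}{2} \end{pmatrix}, \quad \mathbf{v}_2 = \mathbf{e}_2, \quad \mathbf{v}_3 = \mathbf{e}_3. \tag{8.2}$$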
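To explain, suppose I measure some values for x1, x2 and x3. This then gives a vector,

$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + x_3\mathbf{e}_3. \tag{8.3}$$

Now, I can also write this vector in terms of the basis in (8.2) as

$$\mathbf{x} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3. \tag{8.4}$$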
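With v1, v2 and v3 as in (8.2), the coordinates c1, c2 and c3 that appear in (8.4) are

$$c_1 = x_1, \quad c_2 = x_2 - 3x_1, \quad c_3 = x_3 - \tfrac{1}{2}x_1. \tag{8.5}$$

(To see this, compare components in (8.3) and (8.4): the first gives x1 = c1, the second x2 = 3c1 + c2, and the third x3 = (1/2)c1 + c3.) As a consequence, the coordinate c2 describes the deviation of x2 from its usual value of 3x1. Meanwhile, the coordinate c3 describes the deviation of x3 from its usual value of (1/2)x1.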
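As a quick check of (8.5), here is a minimal Python sketch (an illustration, not part of the handout's method) that computes c1, c2 and c3 from a measured vector and then confirms that c1 v1 + c2 v2 + c3 v3 reproduces that vector. The function names coords_85 and combine and the sample measurement are hypothetical; exact fractions are used so that the factor 1/2 introduces no rounding.

```python
from fractions import Fraction as F

# Basis (8.2): v1 = (1, 3, 1/2), v2 = e2, v3 = e3
V1 = (F(1), F(3), F(1, 2))
V2 = (F(0), F(1), F(0))
V3 = (F(0), F(0), F(1))

def coords_85(x):
    """Coordinates (c1, c2, c3) from equation (8.5)."""
    x1, x2, x3 = x
    return (x1, x2 - 3 * x1, x3 - F(1, 2) * x1)

def combine(c, basis):
    """Return c1*v1 + c2*v2 + c3*v3 as a tuple."""
    return tuple(sum(ck * vk[i] for ck, vk in zip(c, basis)) for i in range(3))

# A hypothetical measurement (not data from the handout).
x = (F(4), F(13), F(3))
c = coords_85(x)
print(c)                               # c1 = 4, c2 = 1, c3 = 1: small deviations
print(combine(c, (V1, V2, V3)) == x)   # True: expansion (8.4) recovers x
```

Here x2 = 13 is close to 3x1 = 12 and x3 = 3 is close to (1/2)x1 = 2, so c2 and c3 come out small, as the discussion above anticipates.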
Here is another example. Suppose now that there are again three genes with the levels of their corresponding products denoted as x1, x2 and x3, and suppose that these levels are usually correlated in that x3 is generally very close to 2x2 + x1. Any given set of measured values for these products now determines a column vector as in (8.3). A useful basis in this case would be one where the coordinates c1, c2 and c3 are

$$c_1 = x_1, \quad c_2 = x_2, \quad c_3 = x_3 - 2x_2 - x_1. \tag{8.6}$$
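Thus, c3 again measures a deviation, namely that of x3 from its expected value 2x2 + x1. The basis with this property is the one where

$$\mathbf{v}_1 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \quad \mathbf{v}_2 = \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix}, \quad \mathbf{v}_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}. \tag{8.7}$$

This is to say that if v1, v2 and v3 are as depicted in (8.7), and if x is then expanded in this basis as x = c1 v1 + c2 v2 + c3 v3, then c1, c2 and c3 are given by (8.6).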
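The same kind of check works for the second example. The minimal Python sketch below (hypothetical names coords_86, combine and BASIS_87; the measurement is made up for illustration) computes the coordinates (8.6) and confirms that expanding them in the basis (8.7) returns the original vector.

```python
# Basis (8.7): v1 = (1, 0, 1), v2 = (0, 1, 2), v3 = (0, 0, 1)
BASIS_87 = ((1, 0, 1), (0, 1, 2), (0, 0, 1))

def coords_86(x):
    """Coordinates (c1, c2, c3) from equation (8.6)."""
    x1, x2, x3 = x
    return (x1, x2, x3 - 2 * x2 - x1)

def combine(c, basis):
    """Return c1*v1 + c2*v2 + c3*v3 as a tuple."""
    return tuple(sum(ck * vk[i] for ck, vk in zip(c, basis)) for i in range(3))

x = (2, 5, 13)                     # hypothetical: x3 is close to 2*x2 + x1 = 12
c = coords_86(x)
print(c)                           # (2, 5, 1)
print(combine(c, BASIS_87) == x)   # True
```

The third component of c1 v1 + c2 v2 + c3 v3 is c1 + 2c2 + c3, which is exactly x3 once c3 = x3 - 2x2 - x1 is substituted.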
b) A systematic approach: If you are asking how I know to take the basis in (8.7) to get the coordinate relations in (8.6), here is the answer. Suppose that you have coordinates x1, x2 and x3 and you desire new coordinates c1, c2 and c3 that are related to the x’s by a linear transformation:

$$\begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} = A \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}, \tag{8.8}$$

where A is an invertible 3×3 matrix. In this regard, I am supposing that you have already determined the matrix A and are now simply looking to find the vectors v1, v2 and v3 that allow you to write x = c1 v1 + c2 v2 + c3 v3 with c1, c2 and c3 given by (8.8). As explained in the linear algebra text, the vectors to take are:

$$\mathbf{v}_1 = A^{-1}\mathbf{e}_1, \quad \mathbf{v}_2 = A^{-1}\mathbf{e}_2, \quad \mathbf{v}_3 = A^{-1}\mathbf{e}_3. \tag{8.9}$$

That is, v1, v2 and v3 are the first, second and third columns of A⁻¹.
To explain why (8.9) holds, take the equation x = c1 v1 + c2 v2 + c3 v3 and act on both sides by the linear transformation A. According to (8.8), the left-hand side, A x, is the vector whose top component is c1, middle component is c2 and bottom component is c3. This is to say that A x = c1 e1 + c2 e2 + c3 e3. Meanwhile, since A is linear, the right-hand side of the resulting equation is c1 A v1 + c2 A v2 + c3 A v3. Thus,

$$c_1\mathbf{e}_1 + c_2\mathbf{e}_2 + c_3\mathbf{e}_3 = c_1 A\mathbf{v}_1 + c_2 A\mathbf{v}_2 + c_3 A\mathbf{v}_3. \tag{8.10}$$
Now, the two sides of (8.10) are supposed to be equal for all possible values of c1, c2 and c3. In particular, they are equal when c1 = 1 and c2 = c3 = 0. For these choices, the equality in (8.10) asserts that e1 = A v1; applying A⁻¹ to both sides gives the leftmost equality in (8.9). Likewise, setting c1 = c3 = 0 and c2 = 1 in (8.10) gives e2 = A v2, the equivalent of the middle equality in (8.9); and setting c1 = c2 = 0 and c3 = 1 in (8.10) gives e3 = A v3, the equivalent of the rightmost equality in (8.9).
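The argument can be checked on the first example from Part a. Reading (8.5) as c = A x gives the matrix A_85 below, which is a transcription of (8.5) rather than a matrix printed in the handout. This short Python sketch (hypothetical names A_85, V and matvec) multiplies A_85 against each basis vector in (8.2) and confirms that A v_k = e_k, which is what (8.10) says when one c_k equals 1 and the others are 0.

```python
from fractions import Fraction as F

# Matrix A encoding (8.5): c1 = x1, c2 = x2 - 3*x1, c3 = x3 - (1/2)*x1
A_85 = [[F(1), F(0), F(0)],
        [F(-3), F(1), F(0)],
        [F(-1, 2), F(0), F(1)]]

# Basis (8.2): v1 = (1, 3, 1/2), v2 = e2, v3 = e3
V = [(F(1), F(3), F(1, 2)), (F(0), F(1), F(0)), (F(0), F(0), F(1))]

def matvec(A, v):
    """Multiply the matrix A (a list of rows) by the vector v."""
    return tuple(sum(a * b for a, b in zip(row, v)) for row in A)

for k, v in enumerate(V):
    e = tuple(F(int(i == k)) for i in range(3))   # standard basis vector e_{k+1}
    print(k + 1, matvec(A_85, v) == e)            # A v_k = e_k, as in (8.10)
```

Each of the three printed lines ends in True, so the basis (8.2) is exactly the one that (8.9) produces from the matrix of (8.5).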
c) Dimensions: What follows is an example of how the notion of dimension arises in a scientific context. Consider the situation in Part a, above, where the system is such that the levels of x2 and x3 are very nearly x2 ≈ 3x1 and x3 ≈ (1/2)x1. This is to say that when we use the coordinates c1, c2 and c3 in (8.5), then |c2| and |c3| are typically very small. In this case, a reasonably accurate model for the behavior of the three-gene system can be had by simply assuming that c2 and c3 are always measured to be identically zero. As such, the value of the coordinate c1 describes the system to great accuracy. Since only one coordinate is needed to describe the system, it is said to be ‘1-dimensional’.
A second example is the system that is described by c1, c2 and c3 as depicted in (8.6). If it is always the case that x3 is very close to 2x2 + x1, then the system can be described with good accuracy with c3 set equal to zero. This done, one need only specify the values of c1 and c2 to describe the system. As two coordinates are needed, this system would be deemed ‘2-dimensional’.
In general, a time-dependent phenomenon is deemed ‘n-dimensional’ when n coordinates are required to describe its behavior to some acceptable level of accuracy. Of course, the value of n typically depends on the desired level of accuracy.
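The dependence of n on the desired accuracy can be illustrated with a small Python sketch for the second example. The handout does not fix a rule for deciding when a coordinate is negligible, so the sketch assumes a simple one: a coordinate counts as needed if its absolute value exceeds a tolerance tol in at least one sample. The function effective_dimension and the three sample measurements are hypothetical and are not data from the handout.

```python
def coords_86(x):
    """Coordinates (8.6): c1 = x1, c2 = x2, c3 = x3 - 2*x2 - x1."""
    x1, x2, x3 = x
    return (x1, x2, x3 - 2 * x2 - x1)

def effective_dimension(samples, tol):
    """Count the coordinates that cannot be set to zero within tolerance tol.

    Assumed criterion: a coordinate is negligible when its absolute value
    stays at or below tol in every sample."""
    coords = [coords_86(x) for x in samples]
    return sum(1 for k in range(3)
               if max(abs(c[k]) for c in coords) > tol)

# Hypothetical measurements (x1, x2, x3) with x3 close to 2*x2 + x1.
samples = [(1.0, 2.0, 5.1), (2.0, 1.0, 3.9), (0.5, 3.0, 6.6)]
print(effective_dimension(samples, tol=0.5))    # 2: c3 can be dropped
print(effective_dimension(samples, tol=0.01))   # 3: c3 is no longer negligible
```

With the looser tolerance the system comes out 2-dimensional, as in the discussion of (8.6); tightening the tolerance below the typical size of |c3| (about 0.1 in these samples) makes it 3-dimensional.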
Exercises:
1. Suppose that four genes have corresponding products with levels x1, x2, x3 and x4 where x4 is always very close to x1+ 4x2 while x3 is always very close to 2x1 + x2. Find a new set of basis vectors for R4 and corresponding coordinates c1, c2, c3 and c4 with the following property: The values of x1, x2, x3 and x4 for this four gene system are the points in the (c1, c2, c3, c4) coordinate system where c3 and c4 are nearly zero.
2. Suppose that two genes are either ‘on’ or ‘off’, so that there are, effectively, just four states for the two-gene system, {++, +-, -+, --}, where ++ means that both genes are on; +- means that the first is on and the second is off; etc. Assume that these four states
have respective probabilities
,
,
,
.
a) Is the event that the first gene is on independent of the event that the second gene is on?
Now suppose that these two genes jointly influence the levels of two different products. The levels of the first product are given by {3, 2, 1, 0} in the respective states ++, +-, -+, --. The levels of the second are {4, 2, 3, 1} in these same states.
b) View the levels of the two products as random variables on the sample space S that consists of {++, +-, -+, --} with the probabilities as stated. Write down the mean and standard deviation of each of these two random variables.
c) Compute the correlation matrix in Equation 3.6 of Handout 3 for these two random
variables to prove that they are not independent.