10. Differential equations
10.1 Ordinary differential equations
Many of the ideas of
linear algebra which we have studied in the context of Rn or Cn are applicable in a much
wider context. Mathematicians introduced
the abstract notion of a ‘vector space’, or what is a synonym, a ‘linear
space’, to describe this greater context.
Rather than look at vector spaces in the abstract, we shall look at some
examples in this and the next section which are important in the theory of
differential equations.
To start the story, let us
introduce the notion of a ‘smooth’ function:
This is a function on the line, R, that can be differentiated as often as desired. The set of all such functions is traditionally denoted C∞. For example, f(t) = 1, g(t) = t and h(t) = e^t
are all functions in the set C∞. Indeed, all derivatives of f vanish, all but
the first of g vanish, and the n’th derivative of h is equal to h. On the other hand, f(t) = |t| is not in C∞
since it is not differentiable at t = 0.
We have introduced C∞
as an example of a ‘vector space’. Here
is the point: If f and g are two
functions in C∞, then so is the function t → f(t) + g(t). Moreover, if c is any real number and if f ∈ C∞, then
the function t → c f(t) is also in C∞.
Thus, one can add functions in C∞ to get a new function
in C∞, and one can multiply a function in C∞
by a real number to get a new function in C∞. For example, 1 ∈ C∞ and
cos(3t) ∈ C∞,
as is f(t) ≡ 1 + cos(3t). Likewise, t and also
5t and −3.414t are in C∞.
Addition of vectors and
multiplication of vectors by scalars are the basic operations that we studied on Rn, and here we see a huge set, C∞,
that admits these same two basic operations.
In this regard, any set with these two operations, addition and
multiplication by scalars, is what is properly called a ‘vector space’
or, equivalently, a ‘linear space’.
Many of the same notions
that we introduced in the context of vectors in Rn have very precise counterparts in the context of
our linear space C∞.
What follows are some examples of particular relevance to what we will
do in the subsequent subsections.
•
Subspaces: Any polynomial function, t → a_n t^n + a_{n−1} t^{n−1} + ··· + a_0, is infinitely
differentiable, and so is in C∞. Here, each a_k is a real (or
complex) number. The set of all
polynomials forms a subset, P ⊂ C∞, with two important
properties: First, if f(t) and g(t) are
in P, then so is the function t → f(t) + g(t).
Second, if c is a real number and f(t) is a polynomial, then the
function t → c f(t) is a polynomial. Thus, the
sum of two elements in P is also in P and the product of a real number with an
element in P is in P.
If
you recall, a subset V ⊂ Rn of vectors was called a ‘subspace’ if it had the
analogous two properties: Sums of
vectors in V are in V, and any real number times any vector in V is in V. It is for this reason that a set, such as P,
is called a ‘subspace’ of C∞.
Thus, a subspace is a subset that has the two salient properties of a
vector space.
•
Linear
independence: Can you find two constants, c_1
and c_2, that are not both zero and such that the function t → c_1 + c_2 t is zero for all values of t? A moment’s reflection should tell you that
if c_1 + c_2 t is zero for all values of t, then c_1
must be zero (try setting t = 0), and also c_2 must be zero (then set
t = 1).
If
you recall, a set, {v_1, . . . , v_k}, of vectors in Rn was said to be ‘linearly independent’ in the case
that c_1 = c_2 = ··· = c_k = 0 are the only
values for a collection of constants {c_1, . . ., c_k} that
make c_1 v_1 + c_2 v_2 + ··· + c_k v_k
= 0.
By
the same token, functions {f_1, . . ., f_k} ⊂ C∞ are said to be linearly independent in the case that
c_1 = c_2 = ··· = c_k = 0 is the only choice for constants {c_1, …, c_k}
that makes the function t → c_1 f_1(t) + ··· + c_k f_k(t) equal to zero for all t. Note that
this sum is supposed to vanish for all choices of t, not just some
choices. For example, the functions 1
and t are linearly independent, even though the function t → 1 + t is zero at t = −1.
To
get a feeling for this notion, do you think that the functions {1, t, . . . , t^n}
for any given non-negative integer n form a linearly independent set? If you said yes, then you are correct. Here is why:
Suppose t → p(t) ≡ c_0 + c_1 t + ··· + c_n t^n is
zero for all t, where c_0, …, c_n are all
constants. If such is the case, then p(0)
must be zero, and so c_0 is zero.
Also, the derivative of p(t) must be the zero function (since p is), and
this derivative is c_1 + 2c_2 t + ··· + n c_n t^{n−1}. In particular, p′(0) must be zero and so c_1
= 0 too. Continuing in this vein with
the higher derivatives finds each successive c_k = 0.
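By the way, this argument is easy to check with a computer algebra system. Here is a minimal sketch assuming the sympy package (the degree n = 4 below is an arbitrary choice of mine): impose that p and all of its derivatives vanish at t = 0 and solve for the coefficients.

```python
import sympy as sp

t = sp.symbols('t')
n = 4                                            # any non-negative integer works
c = sp.symbols(f'c0:{n + 1}')                    # coefficients c0, ..., cn
p = sum(c[k] * t**k for k in range(n + 1))       # p(t) = c0 + c1 t + ... + cn t^n

# p vanishes for all t exactly when p(0), p'(0), p''(0), ... all vanish.
equations = [sp.diff(p, t, k).subs(t, 0) for k in range(n + 1)]
print(sp.solve(equations, c))                    # {c0: 0, c1: 0, ..., c4: 0}
```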
A
collection of functions that is not linearly independent is called ‘linearly
dependent’.
•
Linear
transformations: If f(t) ∈ C∞, then
we can define a new function that we will denote as Df by taking the derivative
of f. Thus,
(Df)(t) = f′(t) = (df/dt)(t).
For example, D(sin(t)) =
cos(t).
Because
we can take as many derivatives of f as we like, the function t → (Df)(t) is a smooth
function. Moreover, D has two important
properties:
a) D(f + g) = Df + Dg no
matter the choice for f and g.
b) D(cf) = cDf if c is a constant.
If
you recall, a transformation of Rn with the analogous two properties was called a
‘linear transformation’. By analogy, we
call D a linear transformation of C∞. Equivalently, we say that D is ‘linear’.
Here
is another example: Set D^2 f
to denote the function t → f″(t). Then
D^2 is also linear. In
general, so is D^n, where D^n f takes the n’th
derivative. Furthermore, so is the
transformation that sends f to the function a_n D^n f + a_{n−1} D^{n−1} f
+ ··· + a_0 f in the case that the collection a_0, …, a_n
are constants. In fact, such is the case
even if each a_k is a fixed function of t. In this regard, be sure to define this
transformation so that the same collection {t → a_k(t)} is used
as f varies in the set C∞.
•
The kernel
of a linear transformation: The kernel
of a linear transformation such as D is the set of functions, f ∈ C∞, such
that (Df)(t) = 0 for all values of t.
This is to say that Df is the zero function. In the case of D, a function has everywhere
zero first derivative if and only if it is constant, so ker(D) consists of the
constant functions.
For
a second example, consider D^2.
What is the kernel of D^2?
Well, a function whose second derivative is zero everywhere must have
constant first derivative. Thus, D^2 f
= 0 if and only if f′ = c_1 with c_1 a constant. But a function with constant first
derivative must have the form f = c_0 + c_1 t, where c_0
is also constant. Arguing in this manner
finds that the kernel of D^2 consists of all functions of the form {c_0 + c_1 t}
where c_0 and c_1 can be any pair of constants.
Note
that the kernel of a linear transformation is always a linear subspace.
•
The image of
a linear transformation: A function,
say t → g(t),
is said to be in the ‘image’ of D if there is a smooth function f that obeys
(Df)(t) = g(t) at all t. Thus, g is in
the image of D if g has an anti-derivative that is a smooth function. Now, every function in C∞ has an anti-derivative,
and the anti-derivative is smooth if the original function is. To explain, if f′ = g and I can take as many
derivatives as I like of g, then I can take as many as I like of f and all are
smooth. In particular, because f′ = g,
there is a first derivative. Moreover,
taking n + 1 derivatives of f for any n ≥ 1 is the same as taking n derivatives of g.
With
the preceding understood, the image of D is the whole of C∞. Indeed, you give me any g(t) and I’ll take
the corresponding function f to be
t → ∫_0^t g(s) ds .
By
the way, I hope that it strikes you as passing strange that I have exhibited a
linear transformation from C∞ to itself whose image is the
whole of C∞ but whose kernel is non-zero. A little thought should convince you that a
linear transformation of Rn whose image is Rn must have trivial kernel. This novel phenomenon is a manifestation of
the fact that C∞ is what is rightfully called an ‘infinite
dimensional’ space. More is said on this
below.
•
Basis, span
and dimension: As I argued previously, the
kernel of D^2 consists of all functions of the form c_0 + c_1 t
where c_0 and c_1 are constants. Thus, the kernel of D^2 consists of
linear combinations of the two functions, 1 and t. These are then said to ‘span’ the kernel of D^2,
and as they are linearly independent, they are also said to give a ‘basis’ for
the kernel of D^2. As this
basis has two elements, the kernel of D^2 is said to be
2-dimensional.
As
another example, the subspace, P_3, of polynomials of degree three or
less consists of all functions of the form t → c_0 + c_1 t
+ c_2 t^2 + c_3 t^3 where each c_k
is a constant. Now, the functions {1, t,
t^2, t^3} are linearly independent, and they span P_3
in the sense that every element in P_3 is a linear combination from
this set. Since there are four of the
functions involved, the subspace P_3 is said to be
4-dimensional.
In
general, if V is a subspace, n ∈ {0, 1, … } and {f_1, . . ., f_n}
is a set of linearly independent functions that span V, then V is said to be
n-dimensional. To be precise here, a set
{f_1, … , f_n} of functions in V, whether linearly
independent or not, is said to span V if any given function g(t) ∈ V can be written as g(t)
= c_1 f_1(t) + ··· + c_n f_n(t) where
each c_k is a constant.
A
subspace such as the space of all polynomials is said to be infinite dimensional
if it has arbitrarily large subsets of linearly independent functions. It is in this sense that C∞
itself is infinite dimensional.
Any
linear operator on C∞ that takes any given f(t) to some linear
combination of it and its derivatives is an example of a ‘linear differential
operator’. In fact, the general form
for a linear differential operator is a linear transformation, f → Tf, of C∞
that sends any given f(t) to the function
(Tf)(t)
= a_n(t) (D^n f)(t) + a_{n−1}(t) (D^{n−1} f)(t)
+ ··· + a_0(t) f(t) ,
where each a_k is some smooth
function. If each a_k is constant, then T is said to be a ‘constant
coefficient’ differential operator.
Granted that a_n ≠ 0, then T is said to have ‘order
n’. For example, D^2 is
such an operator of order 2. Here is
another:
(Tf)(t)
= f″(t) + 3f′(t) − 2f(t) .
There is more arcane
vocabulary to learn here. If T is a
linear differential operator and if one is asked to ‘find the general solution
to the homogeneous equation for T’, then one is being asked to find the kernel
of T, thus all functions f(t) such that (Tf)(t) = 0 at every t. On the other hand, if g(t) is some given
function and one is asked to solve the ‘inhomogeneous equation Tf = g’, this
means you should find all functions f such that (Tf)(t) = g(t).
Here is an example: Suppose that you are asked to find all
solutions to the inhomogeneous equation D^2 f = e^t. You would answer: The general solution has the form f(t) = e^t
+ c_0 + c_1 t, where c_0 and c_1 are
constants.
By the way, this last
example illustrates an important fact:
Fact 10.1.1: If T is a given differential operator, g(t) a given function, and f0 some solution to the inhomogeneous equation Tf = g, then any other solution to this equation has
the form f(t) = f0(t) + h(t) where
h is a function from the kernel of
T. That
is, Th = 0.
This fact has a mundane proof: If f is also a solution, then T(f – f0)
= Tf – Tf0 = g – g = 0, so it is necessarily the case that f – f0
is in the kernel of T. Even so, Fact
10.1.1 is quite useful, since it means that once you find the kernel of T, then
you need only find a single inhomogeneous solution to know them all.
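To see Fact 10.1.1 in action on the example above, here is a minimal sympy sketch (assuming the sympy package; C1 and C2 are the arbitrary constants the solver introduces):

```python
import sympy as sp

t = sp.symbols('t')
f = sp.Function('f')

# Kernel of D^2: the solutions of f'' = 0 are c0 + c1*t.
print(sp.dsolve(sp.Eq(f(t).diff(t, 2), 0)))           # f(t) = C1 + C2*t

# One inhomogeneous solution plus the kernel gives every solution of D^2 f = e^t.
print(sp.dsolve(sp.Eq(f(t).diff(t, 2), sp.exp(t))))   # f(t) = C1 + C2*t + exp(t)
```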
The
task of finding an element in the kernel of a generic differential operator, or
solving an associated inhomogeneous equation can be quite daunting. Often, there is no nice, closed form
algebraic expression for elements in the kernel, or for the solution to the
inhomogeneous equation. Even so, there
are some quite general ‘existence’ theorems that tell us when, and how many,
solutions to expect. For example,
consider the following:
Fact 10.1.2: Suppose
that T is a differential operator
that has the form
(Tf)(t) =
(D^n f)(t) + a_{n−1}(t) (D^{n−1} f)(t) + ··· + a_1(t) (Df)(t) + a_0(t) f(t)
where a_0, … , a_{n−1}
are smooth functions. Then, the kernel of T has dimension n. Moreover,
if g(t) is any given function, then
there exists some f(t) such that
Tf = g.
Of course, this doesn’t tell us what the
solution to the equation Tf = g looks like, but it does tell us that there is a
solution whether or not we can find it explicitly.
Unfortunately,
the proof of this fact takes us beyond where we can go in this course, so you
will just have to take it on faith until you take a more advanced mathematics
course.
Although
it is no simple matter to write down the kernel of your generic differential
operator, the situation is rather different if the operator has constant
coefficients. In this case, the kernel
can be found in a more or less explicit form.
To elaborate, let’s suppose that the operator in question, T, has the
form
T
= D^n + a_{n−1} D^{n−1} + ··· + a_1 D + a_0 ,
where each a_k is now a
constant. Our goal now is to find all
functions f(t) such that (Tf)(t) = 0.
Thus, f(t) must solve
(D^n f)(t) + a_{n−1} (D^{n−1} f)(t) + ··· + a_1 (Df)(t) + a_0 f(t) = 0.
Consider first the case where n = 1, in which
case we are looking for functions f(t) that obey the equation f′ + a_0 f
= 0. We can write this equation as
df/f = −a_0 dt,
and integrate both sides to find that
ln(f(t)) = −a_0 t + c, where c can be any constant. Thus, the general solution is
f(t)
= b e^{−a_0 t}
where b ∈ R .
Thus, the kernel is 1-dimensional as
predicted by Fact 10.1.2. As described
below, such exponential functions also play a key role in the n > 1
cases.
To
analyze the n > 1 cases, let us recall Fact 7.5.2: The polynomial
λ → p(λ) = λ^n + a_{n−1} λ^{n−1} + ··· + a_0
always factorizes as
p(λ) = (λ − λ_n)···(λ − λ_1) ,
where each λ_k is a complex number. In this regard, keep in mind that a given
complex number can appear as more than one λ_k. Also, keep in mind that if a given λ_k is complex, then its complex
conjugate appears as some λ_j with j ≠ k.
In any event, the following summarizes the n > 1 story:
Fact 10.1.3: In the case that the
numbers {λ_1, …, λ_n} are distinct, the kernel of T consists of linear combinations with constant coefficients of the real
and imaginary parts of the collection {e^{λ_k t}}_{1≤k≤n}. To be
more explicit, write each λ_k as
λ_k = a_k + i b_k with a_k and b_k real. In the case where the {λ_k} are distinct, the kernel of T
is spanned by the functions in the set
{e^{a_k t} cos(b_k t),
e^{a_k t} sin(b_k t)}_{1≤k≤n} . In the
general case, introduce m_k to
denote the number of times a given λ_k appears in the set {λ_j}_{1≤j≤n}. Then
the kernel of T is spanned by the
collection {p_k(t) e^{a_k t} cos(b_k t), p_k(t) e^{a_k t} sin(b_k t)} where
p_k(t) can be any polynomial of degree
from zero up to m_k − 1.
For
example, consider the case where T = D^2 + 2D − 3. The resulting version of p(λ) is the polynomial λ^2 + 2λ − 3, and the latter
factorizes as (λ + 3)(λ − 1). According to Fact 10.1.3, the kernel of T is
spanned by {e^{−3t}, e^t}.
You can check yourself that both are in the kernel. They are also linearly independent. Indeed, you can see this because e^t
gets very large as t → ∞ while e^{−3t} goes to zero as t → ∞. Because Fact 10.1.2 tells us that the kernel
is 2-dimensional, we therefore know that they must span the kernel also.
Another
example is the case that T = D^2 + 1.
The corresponding polynomial is the function p(λ) = λ^2 + 1. This one factorizes as (λ + i)(λ − i). Thus, its roots are ±i, and so Fact 10.1.3
asserts that the kernel is spanned by {cos(t), sin(t)}. Since the second derivative of cos(t) is
−cos(t), it is certainly the case that D^2 cos(t) + cos(t) = 0. Likewise, sin(t) is in the kernel of D^2 + 1
since the second derivative of sin(t) is −sin(t). These are linearly independent, as can be seen
from the following argument: If c_1
cos(t) + c_2 sin(t) = 0 for all t with c_1 and c_2
constant, then this is true at t = 0.
But, at t = 0, cos(t) = 1 and sin(t) = 0, so c_1 = 0. But then c_2 = 0 also (set t = π/2). According to Fact 10.1.2, the dimension of
the kernel of D^2 + 1 is 2, so {cos(t), sin(t)} must span the kernel.
Here
is a third example: Take T = D^3 − 3D^2 + 3D − 1. In this case, the corresponding polynomial p(λ) is (λ − 1)^3. There is only one root here, λ = 1, and it appears with
multiplicity 3. According to Fact
10.1.3, the kernel should be generated by the collection of functions {e^t,
t e^t, t^2 e^t}.
This is to say that every element in the kernel has the schematic form
f(t)
= c_1 e^t + c_2 t e^t + c_3 t^2 e^t
= (c_1 + c_2 t + c_3 t^2) e^t
where c_1, c_2 and c_3
are constants. You are invited to take
the prescribed derivatives to verify that (Tf)(t) = 0 for all t. Even so, here is what might be an easier way
to do this: First, exploit the
factorization of p(λ) as (λ − 1)^3 to audaciously write
Tf
= (D − 1)(D − 1)(D − 1)f .
Now, note that (D − 1)f = (c_2 + 2c_3 t) e^t. This being the case,
then (D − 1)(D − 1)f = 2c_3 e^t. Finally, (D − 1)(D − 1)(D − 1)f = 2c_3 (D − 1)e^t,
and this is zero because the derivative of e^t is e^t.
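Rather than taking the derivatives by hand, you can let a computer algebra system check all three examples at once. This is a small sketch assuming sympy; the helper T below is my notation, not the text's:

```python
import sympy as sp

t = sp.symbols('t')

def T(f, coeffs):
    # Apply the constant coefficient operator sum_k coeffs[k] * D^k to f.
    return sp.simplify(sum(a * sp.diff(f, t, k) for k, a in enumerate(coeffs)))

print(T(sp.exp(-3*t), [-3, 2, 1]), T(sp.exp(t), [-3, 2, 1]))   # D^2 + 2D - 3
print(T(sp.cos(t), [1, 0, 1]), T(sp.sin(t), [1, 0, 1]))        # D^2 + 1
print(T(t**2 * sp.exp(t), [-1, 3, -3, 1]))                     # D^3 - 3D^2 + 3D - 1
# Every line prints 0, confirming that each function lies in the relevant kernel.
```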
By
the way, these three examples illustrate two important points, and also
indicate how to prove Fact 10.1.3. These
two points are discussed first, and then the proof of Fact 10.1.3 is sketched.
•
If {λ_1, λ_2, …} is any finite or
infinite collection of real numbers with no two the same, then the functions in
the corresponding collection
{e^{λ_1 t}, e^{λ_2 t}, ··· }
are linearly independent. This is
to say that if {c_1, c_2, ···, c_k} are any
finite collection
of constants and if
c_1 e^{λ_1 t} + c_2 e^{λ_2 t} + ··· + c_k e^{λ_k t} = 0 ,
then c_1 = c_2
= ··· = c_k = 0. To prove
that such is the case, just consider the largest number from the collection {λ_1, . . . , λ_k}. Call it λ. Then,
as t →
∞, all of the other terms in the sum of exponential functions are very much
smaller than e^{λt}, and so its corresponding constant must be
zero. This understood, go to the next
largest number from {λ_1, . . . , λ_k} and make the same
argument. Continue sequentially
until all λ’s are
accounted for.
•
If {λ_1 = a_1 + i b_1, λ_2 = a_2 + i b_2, …} is any finite or
infinite collection of complex numbers with no two the same, then the functions
in the collection
{e^{a_1 t} cos(b_1 t), e^{a_1 t} sin(b_1 t), e^{a_2 t} cos(b_2 t), e^{a_2 t} sin(b_2 t), ··· }
are linearly independent
in the sense that no linear combination of any finite subset from this
collection will vanish at all t unless the constants involved are all
zero. Indeed, the fact that functions
with different a’s are linearly independent is argued just as in the previous point, by
looking at how they grow as t → ∞.
The argument in the general case is more involved and so will not be
presented.
What
follows are some remarks that are meant to indicate how to proceed with a
rigorous proof of Fact 10.1.3 in the general case. To start the story, remember that the
operator T is D^n + a_{n−1} D^{n−1} + ··· + a_0,
and so determines a corresponding polynomial p(λ) = λ^n + a_{n−1} λ^{n−1} + ··· + a_0. Suppose that some real number, r, is a root
of this polynomial. Thus, p(r) = 0. Since the derivative of e^{rt} is r e^{rt},
so D^k e^{rt} = r^k e^{rt} for any given
non-negative integer k. As a consequence,
T(e^{rt}) = r^n e^{rt} + a_{n−1} r^{n−1} e^{rt}
+ ··· + a_0 e^{rt} = p(r) e^{rt} = 0 for all t. Thus, we see that each real root of p(λ) determines a
corresponding exponential function in the kernel of T.
Now
suppose that η is a complex root of p(λ). In
this regard, remember that the complex conjugate η̄ is also a root of
p. Also, recall from Section 9.2 that
the derivative of the complex number valued function t → e^{ηt} is η e^{ηt}. Thus, D^k e^{ηt} = η^k e^{ηt} for any given
non-negative integer k. Now, write η = a + ib, where a and b are real, and remember
that e^{at} cos(bt) = (1/2)(e^{ηt} + e^{η̄t}). Thus, the k’th
derivative of e^{at} cos(bt) is (1/2)(η^k e^{ηt} + η̄^k e^{η̄t}). As a consequence,
T(e^{at} cos(bt)) = (1/2)(p(η) e^{ηt} + p(η̄) e^{η̄t}) = 0 for all values of t.
Since e^{at} sin(bt) = (1/(2i))(e^{ηt} − e^{η̄t}), the same sort of argument proves that T(e^{at} sin(bt)) = 0 for all t as
well. This then proves that every
complex conjugate pair {η, η̄} of roots of p(λ) determines a corresponding pair, {e^{at} cos(bt), e^{at} sin(bt)}, of linearly
independent functions in the kernel of T.
Having
digested the contents of the preceding two paragraphs, you are led inevitably
to the conclusions of Fact 10.1.3 in the case that p(λ) has n distinct
roots. Of course, this is predicated on
your acceptance of the assertion in Fact 10.1.2 that the kernel of T is
n-dimensional. It is also predicated on
your acceptance of the conclusions in the second point three paragraphs back
about linear independence.
The
argument for the case when some real or complex number occurs more than once in
the collection {λ_1, . . ., λ_n} is based on the
following observation: The derivative of
t^k e^{ηt} is k t^{k−1} e^{ηt} + η t^k e^{ηt}. As a consequence, t^k e^{ηt} is a solution to the
inhomogeneous equation
(D − η)f = k t^{k−1} e^{ηt} .
By the same token,
(D − η)(D − η)(t^k e^{ηt}) = k(k−1) t^{k−2} e^{ηt} .
Now, if we just iterate these last
observations, we find that acting sequentially q times by (D − η) on t^k e^{ηt} gives
(D − η)^q (t^k e^{ηt}) = k(k−1)···(k−q+1) t^{k−q}
e^{ηt} if q ≤ k, and (D − η)^q (t^k e^{ηt}) = 0 if q > k.
With
the preceding in mind, suppose that some given real or complex number, η, is a root of p(λ) that occurs some q times
in the collection {λ_1, . . ., λ_n}. Let us renumber this list so that the last q
of them are the ones that are equal to η. If
we are willing to take the audacious step of factorizing the operator T by writing
T = (D − λ_1)···(D − λ_{n−q})(D − η)^q ,
we see that (Tf)(t) = 0 if f(t) is any linear
combination from the set {e^{ηt}, t e^{ηt}, … , t^{q−1} e^{ηt}}. Indeed, this is because we have learned from
the preceding paragraph that any such linear combination is already sent to
zero by the factor (D − η)^q. As before, the real
and imaginary parts of any such linear combination must also be sent to zero by
T. Thus, since the real part of t^k e^{ηt} is t^k e^{at} cos(bt) and the imaginary part
is t^k e^{at} sin(bt), we are led to Fact 10.1.3 for the cases
when the collection of roots of p(λ) contains repeated values.
By the way, having just
read the preceding two paragraphs, you now have every right to be nervous about
‘factorizing T’ by manipulating D as if it were just a ‘number’ or a variable
like λ
rather than the much more subtle object that says ‘take the derivative of
whatever is in front of me’. You will have
to trust me when I say that this sort of outrageous move can be justified in a
very rigorous way.
When
using differential equation solutions to predict the future from present data,
one can run into a problem of the following sort: Find all solutions to the differential
equation D^n f + a_{n−1} D^{n−1} f + ··· + a_0 f
= 0 where the value of f and certain of its derivatives are prescribed at fixed
times. For example, find all solutions
to D^2 f + 2Df + 2f = 0 that obey f(0) = 1 and f(π/2) = 2. This sort of
problem is solved by first using Fact 10.1.3 to write the most general
solution, and then searching for those that obey the given fixed time
conditions. In the example just given,
an appeal to Fact 10.1.3 finds that the general solution has the form
f(t)
= a e^{−t} cos(t) + b e^{−t} sin(t)
where a and b can be any constants. This understood, then the condition f(0) = 1
requires that a = 1 but does not constrain b at all. Meanwhile, the condition that f(π/2) = 2 demands that b = 2e^{π/2}. Therefore, the solution to this particular
constrained differential equation problem is f(t) = e^{−t} cos(t) + 2e^{π/2} e^{−t} sin(t).
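Here is a minimal sympy sketch of this computation (assuming sympy; its ics argument imposes the two fixed time conditions, and the printed form may be arranged differently):

```python
import sympy as sp

t = sp.symbols('t')
f = sp.Function('f')
ode = sp.Eq(f(t).diff(t, 2) + 2*f(t).diff(t) + 2*f(t), 0)

# Impose f(0) = 1 and f(pi/2) = 2; dsolve then solves for the two constants.
sol = sp.dsolve(ode, ics={f(0): 1, f(sp.pi/2): 2})
print(sol)   # f(t) = (cos(t) + 2*exp(pi/2)*sin(t))*exp(-t), up to rearrangement
```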
A
second example using the same equation D^2 f + 2Df + 2f = 0 asks for all
solutions with f′(0) = 0. This condition
reads −a + b = 0. Thus, all solutions to
this constrained problem have the form f(t) = a e^{−t} (cos(t) +
sin(t)) where a is any constant.
With
regard to these constrained problems:
Conditions that are imposed on f or its derivatives at t = 0 are
usually called ‘initial conditions’.
Here are some key notions to remember from the discussion in 10.1:
• The space C∞ as a vector space.
• Linear dependence and linear independence for a set of functions from C∞.
• The formula for the general solution of the equation D^n f + a_{n−1} D^{n−1} f + ··· + a_0 f = 0 in the case that each a_k is a constant.
• How to find the solution to D^n f + a_{n−1} D^{n−1} f + ··· + a_0 f = 0 that obeys some constraints on the values of f and certain of its derivatives at certain prescribed times.
1. Which of the following are subspaces of C∞?
a) All continuous functions from R to R.
b) All f ∈ C∞ such that f(0) + f′(0) = 0.
c) All f ∈ C∞ such that f + f′ = 0.
d) All f ∈ C∞ such that f(0) = 1.
2. Which of the following subsets of C∞ consists of linearly independent functions?
a) 1, t, t^2, t^3 e^t.
b) 1 + t, 1 − t, t^2, 1 + t + t^2.
c) sin(t), e^t, e^t sin(t).
d) sin(t), cos(t), sin(t + c) for a fixed constant c.
3. Which of the following maps are linear?
a) T: C∞ → R given by T(f) = f(0).
b) T: C∞ → C∞ given by T(f) = f^2 + f′.
c) T: C∞ → R^2 given by T(f) = (f(0), f(1)).
d) T: C∞ → R given by T(f) = ∫_0^1 f(t) dt.
4. Find a basis for the kernel of T: C∞ → C∞ given by T(f) = f″ + f′ − 12f and then find a
smooth function that obeys the three conditions T(f) = 0, f(0) = 0 and f′(0) = 1.
5. Find a basis for the kernel of T: C∞ → C∞ given by T(f) = f″ + 2f′ + 2f and find a
smooth function that obeys the three conditions T(f) = 0, f(0) = 1 and f(1) = 1.
6. Find a basis for the kernel of T: C∞ → C∞ given by T(f) = f″ + 6f′ + 9f and find a
smooth function that obeys the three conditions T(f) = 0, f′(0) = 1 and f(1) = 0.
7. Find a basis for the kernel of T: C∞ → C∞ given by T(f) = f″ + f(0).
8. Find a basis for the image of T: C∞ → C∞ given by T(f) = f(0) + f′(0)t + (f(0) + f′(0))t^2.
9. Explain why the
equation t f′(t) = 1 has no solutions in C∞.
10. Let T(f) = t^2 f′(t) + 2t f(t).
a) Suppose that T(f) = 0. If g(t) = t^2 f(t), explain why g′(t) = 0.
b) Explain how to use the conclusion from a) to prove that kernel(T) = {0}.
c) Explain why the constant function 1 is not in the image of T.
10.2 Fourier series
In the preceding section, we looked at spaces of functions that behaved much like vectors in Rn, but we did not look at any analogues of the concepts of length, angle or dot product. In this section, we will discuss an example where these analogues are introduced and play a central role.
To set the stage, recall that if a and b are real numbers with a < b, then [a, b] denotes the interval in R of points t with a ≤ t ≤ b. Note that the end points of the interval are included.
Now
introduce the notation C[-π, π] to denote the collection of all
continuous functions from the interval [-π, π] to R. For example, t, sin(t), |t|, and 1/(4 − t)
are in C[-π,
π]. The last of these illustrates
the fact that we only care about the values when t has values between −π
and π. Since 4 > π, the
fact that 1/(4 − t) blows up as t → 4 has no bearing on its appearance in the space
C[-π, π]. On the other hand, 1/(2 − t)
is not in C[-π,
π] since it is not defined at the point t = 2, which is in the interval
between −π and π. Here is a
completely bounded and well defined function that is not in C[-π,
π]: The function f(t) that is
defined to be 1 where t > 0, 0 at t = 0 and −1 where t < 0. The jump discontinuity of f as t crosses zero
precludes its membership in C[-π, π].
As with C∞, the collection C[-π, π] is a linear space. Indeed, if t → f(t) and t → g(t) are in C[-π, π], then so is the function t → f(t) + g(t), as is t → r f(t) in the case that r is a real number.
We now define the analog of a dot product on C[-π, π]. For this purpose, let f(t) and g(t) be any two continuous functions that are defined where −π ≤ t ≤ π. Their dot product is then denoted by ⟨f, g⟩, a number that is computed by doing the integral
⟨f, g⟩
≡ (1/π) ∫_{-π}^{π} f(t) g(t) dt .
I hope to convince you that this has all of the salient features of the dot product on Rn. For example:
• ⟨f, g⟩ = ⟨g, f⟩.
• If r is a real number, then ⟨r f, g⟩ = r ⟨f, g⟩.
• If f, g and h are any three functions in C[-π, π], then ⟨f + g, h⟩ = ⟨f, h⟩ + ⟨g, h⟩.
• If f is not the constant function 0, then ⟨f, f⟩ > 0.
I’ll leave it to you to verify the first three. To verify the fourth, notice first that
⟨f, f⟩
= (1/π) ∫_{-π}^{π} f(t)^2 dt .
Now, the function t → f(t)^2 is non-negative, so the integral for ⟨f, f⟩ computes the area under the graph in the (t, y) plane of the function y = f(t)^2. Now, as f(t)^2 is non-zero at some point (since f is not the constant function 0), this graph rises above the axis at some point. Since f is continuous, it rises nearly as much above the axis at nearby points as well. Thus, there is some area under the graph, so ⟨f, f⟩ > 0.
For example, if you remember how to integrate t sin(t), you will find that the dot product between the functions t and sin(t) is
⟨t, sin(t)⟩
= (1/π) ∫_{-π}^{π} t sin(t) dt = 2.
(If you forgot how to integrate t sin(t), here is a hint: Think about integration by parts.)
For another example, the dot product between the constant function 1 and the function sin(t) is given by
⟨1, sin(t)⟩
= (1/π) ∫_{-π}^{π} sin(t) dt =
(1/π) (−cos(π) + cos(−π)) = 0 .
By analogy with the case of vectors
in Rn, we say that a pair of functions f and g from
C[-π, π] are ‘orthogonal’ in the case that ⟨f, g⟩ = 0. Thus, 1 and sin(t) are orthogonal, but t and
sin(t) are not.
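These dot products are easy to check numerically as well. Here is a small sketch assuming the numpy and scipy packages (the helper name dot is mine):

```python
import numpy as np
from scipy.integrate import quad

def dot(f, g):
    # The dot product <f, g> = (1/pi) times the integral of f*g over [-pi, pi].
    value, _ = quad(lambda t: f(t) * g(t), -np.pi, np.pi)
    return value / np.pi

print(dot(lambda t: t, np.sin))     # 2.0, so t and sin(t) are not orthogonal
print(dot(lambda t: 1.0, np.sin))   # 0.0, so 1 and sin(t) are orthogonal
```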
Just as we defined the length of a vector in Rn using the dot product, so we define the length of any given function f ∈ C[-π, π] to be
⟨f, f⟩^{1/2} .
The length of f is denoted here and elsewhere as || f ||, and this number is called the ‘norm’ of f. By analogy with the case of Rn, we define the distance between functions f and g from C[-π, π] to be
||
f − g || = ⟨f − g, f − g⟩^{1/2} .
Thus, the square of the distance between f and g is equal to
(1/π) ∫_{-π}^{π} (f(t) − g(t))^2 dt .
According to this definition of distance, f is close to g when f(t) is close to g(t) across [-π, π] as a whole. However, be forewarned that this definition doesn’t require that f(t) be close to g(t) at every t; only that they be suitably close for ‘most’ values of t. You will see this in the third and fourth examples below.
Here are some examples of norms and distances:
•
The constant function 1 has norm
|| 1 || = √2, since || 1 ||^2 is 1/π times the length of
[-π, π].
• The square of the norm of the function t is
⟨t, t⟩
= (1/π) ∫_{-π}^{π} t^2 dt = (2/3) π^2 .
Thus, the norm
of t is || t || = √(2/3) π.
• Let R be a positive real number. Then the distance between the function f(t) = t and the function g(t) = t + e^{−R|t|} is the square root of
(1/π) ∫_{-π}^{π} e^{−2R|t|} dt = (1/(πR)) (1 − e^{−2Rπ}) .
Note in particular that the larger the value of R, the smaller the distance, and as R → ∞, the distance in question limits to zero. Even so, |f(0) − g(0)| = 1 no matter how large R.
• Let R be a positive real number. Then the distance between the function f(t) = t and the function g(t) = t + R^{1/4} e^{−R|t|} is the square root of
(1/(π√R)) (1 − e^{−2Rπ}) .
Note that in this variation of the previous example, the distance between f and g again limits to zero as R → ∞, even though |f(0) − g(0)| = R^{1/4} now blows up as R → ∞. The point here and in the previous example is that two functions in C[-π, π] can be close and still have widely different values at some t. As remarked previously, their values need only be suitably close at most t ∈ [-π, π]. (I can’t criticize you for thinking that this phenomenon illustrates a serious defect in our notion of distance. The fact is that for some uses, other notions of distance are necessary for this very reason.)
Granted now that we have a notion of dot product for the linear space C[-π, π], we can introduce the notion of an ‘orthonormal’ set of functions. This notion is the analog of the notion of orthonormality that we used for vectors in Rn. In particular, a finite or infinite collection {f_1, f_2, …} of functions is deemed ‘orthonormal’ in the case that
|| f_k || = 1 for all k and ⟨f_j, f_k⟩ = 0 for all unequal j and k.
For example, the constant function
1/√2 and the function (√3/(√2 π)) t comprise a two element orthonormal set. Indeed, the computations done previously for
the norms of 1 and t justify the assertion that these two functions both have
norm 1. Meanwhile, to see that these two
functions are orthogonal, first note that the dot product between 1 and t is 1/π
times the integral of
t from −π to π. Then note that
the latter integral is zero since it is the difference between the values of
t^2/2 at t = π and t = −π. Here is another example: The set
{1/√2, (√3/(√2 π)) t, (3√5/(2√2 π^2)) (t^2 − (1/3) π^2)}
is also orthonormal.
You most probably will recognize the following facts as C[-π, π] analogs of assertions that hold for vectors in Rn:
• If {f_1, f_2, …, f_N} is an orthonormal set, then they are linearly independent and so form a basis for their span.
• If h and g are orthogonal functions in C[-π, π], then || h ± g ||^2 = || h ||^2 + || g ||^2.
• Suppose that V is a subspace of C[-π, π] and that f ∈ C[-π, π]. If g is in V and if f − g is orthogonal to all functions in V, then || f − g || ≤ || f − h || for all h in V. Moreover, this inequality is an equality only in the case that h = g.
• If {f_1, . . ., f_N} is an orthonormal basis for a subspace V ⊂ C[-π, π] and if f is any function in C[-π, π], then the function in V we call proj_V(f) that is given by
proj_V(f)(t) = ⟨f, f_1⟩ f_1(t) + ··· + ⟨f, f_N⟩ f_N(t)
is the closest function in V to f. Moreover, f − proj_V(f) is orthogonal to each element in V.
•
If V ⊂
C[-π, π] is a finite
dimensional subspace, then V has an
orthonormal basis.
The arguments for these last facts are essentially identical to those that prove the Rn analogs. For example, to prove the first point, assume that g(t) ≡ c_1 f_1(t) + ··· + c_N f_N(t) is zero for all t ∈ [-π, π], where c_1, . . . , c_N are constants. Now take the dot product of g with f_1 to find 0 = ⟨f_1, g⟩ = c_1 ⟨f_1, f_1⟩ + c_2 ⟨f_1, f_2⟩ + ··· + c_N ⟨f_1, f_N⟩. Because of the orthonormality, this equality boils down to 0 = c_1·1 + c_2·0 + ··· + c_N·0, so c_1 = 0. Take the dot product of g with f_2 to find that c_2 is zero, then f_3, etc.
As a second example, here is how to prove the fourth point: The first thing to note is that it suffices to prove that f − proj_V(f) is orthogonal to every function in V. Indeed, if this is the case, then the version of the second point above with h = f − proj_V(f) and g any function in V proves that proj_V(f) is the closest function in V to f. In any event, f − proj_V(f) is orthogonal to every function in V if and only if it is orthogonal to every basis function, that is, each of f_1, . . ., f_N. Computing the dot product of f with any given f_k finds ⟨f_k, f⟩, and this is precisely the same as the dot product of f_k with proj_V(f). Thus, the dot product of any given f_k with f − proj_V(f) is zero.
With
regard to the final point, you won’t be surprised to learn that the
Gram-Schmidt algorithm that we used in the case of Rn to
find an orthonormal basis works just fine in the case of C[-π, π]. For example, the linear span of the functions
1 and t^2 is a 2-dimensional subspace of C[-π, π]. Indeed, if c_1 + c_2 t^2
is zero for all t with c_1 and c_2 constant, then it is
zero at t = 0 and so c_1 = 0.
It is also zero at t = 1, and so c_2 = 0 as well. To find an orthonormal basis, we first
divide the constant function 1 by its norm to get a function with norm 1. The latter is 1/√2. Next, we note that t^2
− ⟨1/√2, t^2⟩ (1/√2)
= t^2 − (1/3) π^2 is orthogonal to 1/√2. Thus, we get an
orthonormal basis for the span of {1, t^2} by using 1/√2 as the first basis
element, and using for the second the function that you get by dividing the
function t^2 − (1/3)π^2 by its norm, that is, by the square root of 1/π times the integral from −π
to π of (t^2 − (1/3)π^2)^2.
Left unsaid in the final point above is whether any given infinite dimensional subspace of C[-π, π] has an orthonormal basis. The answer depends to some extent on how this question is interpreted. In any event, the next fact asserts that C[-π, π] itself has an infinite orthonormal basis. Moreover, this basis ‘spans’ C[-π, π] in a certain sense that is explained below. The fact is that C[-π, π] has many such bases, but only the most commonly used one is presented below.
Fact 10.2.1: The
collection {1/√2, cos(t), sin(t), cos(2t), sin(2t), cos(3t), sin(3t), ··· }
is an orthonormal set of functions in C[-π,
π].
This fact is proved by verifying that the following integrals have the asserted values:
• ⟨1/√2, 1/√2⟩ = (1/π) ∫_{-π}^{π} (1/2) dt = 1.
• ⟨1/√2, cos(nt)⟩ = (1/π) ∫_{-π}^{π} (1/√2) cos(nt) dt = 0 for any n ≥ 1.
• ⟨1/√2, sin(nt)⟩ = (1/π) ∫_{-π}^{π} (1/√2) sin(nt) dt = 0 for any n ≥ 1.
• ⟨cos(nt), cos(nt)⟩ = (1/π) ∫_{-π}^{π} cos^2(nt) dt = 1 for any n ≥ 1.
• ⟨sin(nt), sin(nt)⟩ = (1/π) ∫_{-π}^{π} sin^2(nt) dt = 1 for any n ≥ 1.
• ⟨cos(nt), sin(mt)⟩ = (1/π) ∫_{-π}^{π} cos(nt) sin(mt) dt = 0 for any n and m.
• ⟨cos(nt), cos(mt)⟩ = (1/π) ∫_{-π}^{π} cos(nt) cos(mt) dt = 0 for any n ≠ m ≥ 1.
• ⟨sin(nt), sin(mt)⟩ = (1/π) ∫_{-π}^{π} sin(nt) sin(mt) dt = 0 for any n ≠ m ≥ 1.
To explain the sense in which the basis in Fact 10.2.1 spans C[-π, π], let me introduce, for each positive integer N, the subspace T_N ⊂ C[-π, π] that is given by the span of
{1/√2, cos(t), sin(t), ··· , cos(Nt), sin(Nt)} .
If f is any given function in C[-π, π], one can then define the projection of f onto T_N. This is the function
proj_{T_N} f ≡ a_0
+ a_1 cos(t) + b_1 sin(t) + ··· + a_N
cos(Nt) + b_N sin(Nt) ,
where
a_0
= (1/2π) ∫_{-π}^{π} f(t) dt, a_k
= (1/π) ∫_{-π}^{π} cos(kt) f(t) dt, and
b_k = (1/π) ∫_{-π}^{π} sin(kt) f(t) dt .
With this notation set, here is what I mean by ‘span’:
Fact 10.2.2: Let f be any function in C[-π, π]. Then lim_{N→∞} || f − proj_{T_N} f || =
0.
Moreover, if the
derivative of f is defined and
continuous, then lim_{N→∞} (proj_{T_N} f)(t) = f(t) if t lies
strictly between −π and π.
This assertion also holds at t = π and at t = −π in the
case that f(π) = f(−π). In any event, whether f is or is not differentiable, the infinite
series
2a_0^2 +
a_1^2 + b_1^2 + ··· + a_k^2
+ b_k^2 + ··· is
convergent and its limit is || f ||^2 = (1/π) ∫_{-π}^{π} f(t)^2 dt.
By virtue of Fact 10.2.2, one often sees a given function f ∈ C[-π, π] written as
f(t)
= a_0
+ ∑_{k≥1} (a_k cos(kt) + b_k sin(kt)) ,
where the coefficients {a_k, b_k} are given just prior to Fact 10.2.2. Such a representation of f is called its ‘Fourier series’ after the mathematician who first introduced it, Jean-Baptiste-Joseph Fourier. (Fourier was born in 1768 and lived until 1830.)
In any event, the Fourier series for a given function f exhibits f as a sum of trigonometric functions, and Fact 10.2.2 asserts the rather remarkable claim that every continuous function on the interval [-π, π] can be suitably approximated by such a sum.
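If you would rather not do the coefficient integrals by hand, here is a minimal sympy sketch for the function f(t) = t (assuming sympy; sympy may arrange the signs in the answer differently):

```python
import sympy as sp

t = sp.symbols('t')
k = sp.symbols('k', integer=True, positive=True)
f = t   # the function whose Fourier coefficients we want

a0 = sp.integrate(f, (t, -sp.pi, sp.pi)) / (2*sp.pi)
ak = sp.integrate(f * sp.cos(k*t), (t, -sp.pi, sp.pi)) / sp.pi
bk = sp.integrate(f * sp.sin(k*t), (t, -sp.pi, sp.pi)) / sp.pi
print(a0, sp.simplify(ak), sp.simplify(bk))   # 0, 0 and 2*(-1)**(k + 1)/k
```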
The proof
of Fact 10.2.2 is subtle and, but for the next remark, goes beyond what we will
cover in this course. If the series a_0
+ a_1 cos(t) + b_1 sin(t) + ··· is
convergent at each t with limit f(t), then the convergence of the infinite
series 2a_0^2 + a_1^2
+ b_1^2 + ··· is an automatic consequence of the fact that
the collection {1/√2, cos(t), sin(t), ···} is an orthonormal set of
functions. To see why, take some large
integer N and write
f
= proj_{T_N} f + (f − proj_{T_N} f) .
Now, as discussed earlier, the two terms on the right hand side of this equation are orthogonal. This then means that
|| f ||^2 =
|| proj_{T_N} f ||^2 + || f − proj_{T_N} f ||^2 .
By virtue of the fact that {1/√2, cos(t), sin(t), ···}
is orthonormal, the first term on the right hand side of this last
equation is 2a_0^2 + a_1^2 + b_1^2
+ ··· + a_N^2 + b_N^2. As a consequence, we see that
||
f ||^2 = 2a_0^2 + a_1^2 + b_1^2
+ ··· + a_N^2 + b_N^2 + || f − proj_{T_N} f ||^2 .
Thus, under the assumption that the limit as N → ∞ of the far right term above is zero, we then have our derivation of the asserted limit for the infinite sum 2a_0^2 + a_1^2 + b_1^2 + ··· .
Here are some examples:
• t
= 2 ∑_{k≥1} ((−1)^{k+1}/k) sin(kt) .
• t^2 =
(1/3) π^2 + 4 ∑_{k≥1} ((−1)^k/k^2) cos(kt) .
• e^t =
(1/π)(e^π − e^{−π}) [1/2 + ∑_{k≥1} (−1)^k ((1/(1+k^2)) cos(kt) − (k/(1+k^2)) sin(kt))] .
As you can see, the Fourier series of some very simple functions have infinitely many terms.
When looking at the first example above, what do you make of the fact that π is definitely not zero, but sin(kπ) is zero for all k? In particular, the asserted ‘equality’ between the right and left hand sides in the first example is definitive nonsense at t = π. Even so, this does not violate the assertion of Fact 10.2.2 because the function t obviously does not have the same value at π as it does at –π. With regards to Fact 10.2.2, the equality in the first example holds only in the following sense:
lim_{N→∞} (1/π) ∫_{-π}^{π} (t − 2 ∑_{1≤k≤N} ((−1)^{k+1}/k) sin(kt))^2 dt
= 0.
Thus, the equality in the first point holds at ‘most’ values of t in [-π, π], but not at all values of t.
Contrast
this with the equality between t and its Fourier series at t = π/2. According to Fact
10.2.2, the equality does indeed hold here, and so we obtain the following
remarkable equality:
π/4 = 1 − 1/3 + 1/5 − ··· .
Other fantastic sums can be had by evaluating the right hand side of the equality between t^2 and its Fourier series at some special cases. For example, the respective t = 0 and t = π cases yield
(1/12) π^2 = 1 − 1/4 + 1/9 − 1/16 + ··· and
(1/6) π^2 = 1 + 1/4 + 1/9 + 1/16 + ··· .
By the way, the second of these equalities is equivalent to
the assertion in Fact 10.2.2 that the value, (2/3) π^2, of || t ||^2 is equal to the
sum of the squares of the coefficients that appear in front of the various
factors of sin(kt) in the Fourier series expansion given above for t.
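Both of these sums are easy to check numerically; here is a quick numpy sketch (truncation at 200,000 terms is an arbitrary choice):

```python
import numpy as np

k = np.arange(1, 200001)
print(np.sum((-1.0)**(k + 1) / k**2), np.pi**2 / 12)   # both about 0.822467
print(np.sum(1.0 / k**2), np.pi**2 / 6)                # both about 1.644934
```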
Here are the key notions to remember from 10.2:
•
The space C[-π, π] has
a dot product whereby the dot product of any given two functions f and g is
equal to (1/π) ∫_{-π}^{π} f(t) g(t) dt . This is
denoted by ⟨f, g⟩.
• The norm of a function f is ⟨f, f⟩^{1/2}; it is positive unless f is the constant function 0.
• The distance between any two given functions f and g is the norm of f − g.
• Most constructions in Rn that use the dot product work as well here. In particular, any finite dimensional subspace has an orthonormal basis, and one can use this basis to define the projection onto the subspace.
•
There is an orthonormal basis for
C[-π, π] that consists of the constant function 1/√2
plus the collection
{cos(kt), sin(kt)}_{k=1,2,…} .
Any given function f can be depicted using this basis as
f(t)
= a_0 + ∑_{k≥1} (a_k cos(kt)
+ b_k sin(kt)),
where
a_0 = (1/2π) ∫_{-π}^{π} f(t) dt, a_k
= (1/π) ∫_{-π}^{π} cos(kt) f(t) dt and b_k = (1/π) ∫_{-π}^{π} sin(kt) f(t) dt.
• The convergence of the series above to f(t) might not occur at all values of t, but in any event, the integral from −π to π of the square of the difference between f and the series truncated after N terms tends to zero as N tends to infinity.
Exercises
1. Find an orthonormal basis for the subspace of C[-π, π] spanned by {1, e^t, e^{−t}} and then
compute the projection of the function t onto this subspace.
2. Find the Fourier series for the function |t| on the interval [-π, π].
3. If a is a real constant, find the Fourier series for cosh(at) on the interval [-π, π] and use
the result to derive a closed form formula
for ∑_{k≥1} 1/(a^2 + k^2) .
4. Let r ∈ R.
Prove that the collection
{1/√2} ∪ {cos(k(t−r)), sin(k(t−r))}_{k=1,…} is an
orthonormal basis for C[-π+r, π+r] using the dot product that assigns to any two given functions f and g the number
(1/π) ∫_{-π+r}^{π+r} f(t) g(t) dt.
5. Let a < b be
real numbers. Prove that the constant
function 1/√2
plus the collection
given by {cos((2π/(b−a)) k (t − (a+b)/2)), sin((2π/(b−a)) k (t − (a+b)/2))}_{k=1,…} is an orthonormal basis for C[a, b] if the dot product is
such as to assign to any two functions f and g the number
(2/(b−a)) ∫_a^b f(t) g(t) dt.
10.3 Partial differential equations I: The heat/diffusion equation
There are significant applications of Fourier transforms in the theory of partial differential equations. In this regard, our discussion will focus on three very special, but often met equations: The heat/diffusion equation, Laplace’s equation and the wave equation. This section studies the first of these.
The heat equation and the diffusion equation are one and the same, although they arise in different contexts. For the sake of simplicity, we call it the heat equation. Here it is:
Definition 10.3.1: The heat or diffusion equation is for a function, T(t, x), of time t and position x. The equation involves a positive constant, m, and has the form
∂T/∂t = m ∂^2T/∂x^2 .
As is plainly evident, the heat equation relates one time derivative of T to two spatial derivatives. A typical problem is one where the interest is focused only on points x in some interval [a, b] ⊂ R with T some given function of x at time zero. The task then is to solve the heat equation for T(t, x) at times t > 0 and points x ∈ [a, b]. Often, there are constraints imposed on T at the endpoints x = a and x = b that are meant to hold for all t.
Here is a sample problem: Take a = -π and b = π so that the focus is on values of x in [-π, π]. Suppose that we are told that T(0, x) = f(x) with f some given function of x for x Î [-π, π]. The task is to find the functional form of T at all times t > 0.
Before we pursue this problem, let me explain where this equation comes from. (My apologies to the graduates of Math 21a who may have seen something very much like the explanation that follows). The preceding equation is known as the heat equation because it is used with great accuracy to predict the temperature of a long, but relatively thin rod as a function of time and position, x, along the rod. Thus T(t, x) is the temperature at time t and position x. The constant m that appears measures something of the thermal conductivity of the rod.
The theoretical underpinnings of this equation are based on our understanding of the temperature of a small section of the bar as measuring the average energy in the random motions of the constituent atoms. Heat ‘flows’ from a high temperature region to a low temperature one because collisions between the constituent atoms tend to equalize their energy. In this regard, you most probably have noticed that when a fast moving object strikes a slower one (for example, in billiards), the faster one is almost always slowed by the collision while the slower one speeds up.
In any event, it is an experimental
fact that a low energy region adjacent to a high energy one will tend to gain
energy at the expense of the higher energy region. A simple way to model this in a quantitative
fashion is to postulate that the rate of flow of energy across any given slice
of the rod at any given time has the form −m ∂T/∂x, where m is a
positive constant and where the derivative is evaluated at the x-coordinate of
the slice and at the given value of t.
Note that the minus sign here is dictated by the requirement that the
flow of energy is from a high temperature region to a low temperature one.
Granted such a postulate, what
follows is an argument for an equation that predicts the temperature as a
function of time. Remembering that
temperature measures the energy in the random motions of the particles, let us do
some bookkeeping to keep track of the energy in a small width section, [x, x+dx], of the rod. Here, I take dx
> 0 but very small. Think of T(t, x)dx as measuring the energy in this section of
the rod. The time derivative of T(t, x)dx measures the net rate of energy coming
into and leaving the section of rod. The
net flow (positive or negative) of energy into our section of the bar is a sum
of two terms: One is the flow across the
left hand edge of the section, this being −m (∂T/∂x)|_x; and the other is the flow across the right
hand edge, this equal to +m (∂T/∂x)|_{x+dx}. Note the appearance of the + sign, since flow
into our region across the right hand edge is flow in the direction that makes
the bar’s coordinate decrease.
Summing these two terms finds
(∂T/∂t)(t, x) dx = m (∂T/∂x)|_{x+dx} − m (∂T/∂x)|_x .
To end the derivation, divide both sides by dx and observe that
(1/dx) ((∂T/∂x)|_{x+dx} − (∂T/∂x)|_x) ≈ ∂^2T/∂x^2
when dx is very small.
In any event, the task before us is to solve the heat equation in Definition 10.3.1 for T(t, x) at values of t ≥ 0 and x ∈ [-π, π] given that T(0, x) = f(x). To explain how this is done, introduce the space, C∞[-π, π], of infinitely differentiable functions of x ∈ [-π, π] and then view the assignment
h(x)
→ d^2h/dx^2
as defining a linear operator on this space. (The operator is, of course, linear, because
the second derivative of a sum of functions is the sum of the second
derivatives, and the second derivative of a constant times a function is equal
to the same constant times the second derivative of the function.) It is customary to call this linear operator
the ‘Laplacian’ and denote it by Δ. Our heat equation then asks for a function T
that obeys the equation
∂T/∂t = m ΔT.
As I hope you recall, we dealt with
equations of just this form in the case that T was a vector in Rn
and the operator was a linear operator from Rn
to itself. In the latter case, we were
able to find explicit solutions when the linear operator on Rn
was diagonalizable. Let me remind you of
how this went: Supposing, for the
moment, that A is a diagonalizable linear operator on Rn,
let {e_1, …, e_n} denote its set of associated
eigenvectors, a basis for Rn. Each eigenvector has its associated
eigenvalue, a real or complex number.
The eigenvalue associated to e_k is denoted here by λ_k. Now suppose that v_0 is a given
vector in Rn and suppose that we want to find the vector-valued
function of time, t → v(t), that obeys
the equation
dv/dt = Av subject to the
constraint that v(0) = v_0. We
do this by first writing v_0 in terms of the basis {e_k} as
v_0 = ∑_k a_k e_k with each a_k
a scalar. This done, then
v(t)
= ∑_k
a_k e^{λ_k t} e_k .
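As a reminder of how this goes in the finite dimensional case, here is a short numpy sketch; the particular matrix A and vector v0 are illustrative choices of mine:

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])     # diagonalizable, with eigenvalues -1 and -2
v0 = np.array([1.0, 0.0])

lam, E = np.linalg.eig(A)        # columns of E are the eigenvectors e_k
a = np.linalg.solve(E, v0)       # expand v0 = sum_k a_k e_k

def v(t):
    # v(t) = sum_k a_k e^{lambda_k t} e_k
    return (E * np.exp(lam * t)) @ a

print(v(0.0))                    # recovers v0
print(v(1.0))                    # the solution at time t = 1
```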
Our strategy for solving the heat equation in Definition 10.3.1 for a function T(t, x) of x ∈ [-π, π] subject to the initial condition T(0, x) = f(x) is the infinite dimensional analog of that just described. This understood, our first step is to find a basis for the functions on [-π, π] that consists of eigenvectors of the linear operator Δ. This might seem like a daunting task were it not for the seemingly serendipitous fact that every function in the Fourier basis
{1/√2, cos(x), sin(x), cos(2x), sin(2x), cos(3x), sin(3x), ··· }
is an eigenfunction of Δ. Indeed,
Δ(1/√2) = 0, and for each k
> 0,
Δ cos(kx) = −k^2 cos(kx) and
Δ sin(kx) = −k^2 sin(kx).
Thus, we have the following observation:
Fact 10.3.2: Let f(x) denote any given continuous function on [-π, π] with continuous derivative, and write its Fourier series as
f(x)
= a_0
+ ∑_{k≥1} (a_k cos(kx) + b_k sin(kx)) .
Then the function
T(t,
x) = a_0
+ ∑_{k≥1} e^{−m k^2 t} (a_k
cos(kx) + b_k
sin(kx))
solves the heat equation with initial condition T(0, x) = f(x) for all x ∈ (-π, π). If it is also the case that f(π) = f(-π), then it is also the case that T(t, π) = T(t, -π) for all t ≥ 0 and these are equal to f(π) at t = 0.
Here is a first example: Suppose that f(x) = π^2 − x^2. We found its Fourier series in the previous part of this chapter:
f(x)
=
(2/3) π^2 − 4 ∑_{k≥1} ((−1)^k/k^2)
cos(kx) .
In this case, the function T(t, x) given by Fact 10.3.2 is
T(t,
x) =
(2/3) π^2 − 4 ∑_{k≥1} ((−1)^k/k^2) e^{−m k^2 t}
cos(kx) .
Here is a second example: Take f(x) = e^x. From one of the examples in the previous part of this chapter, we see that the function T(t, x) that is given by Fact 10.3.2 in this case is
T(t,
x) =
(1/π)(e^π − e^{−π}) [1/2 + ∑_{k≥1} (−1)^k e^{−m k^2 t}
((1/(1+k^2)) cos(kx) − (k/(1+k^2)) sin(kx))] .
In all
fairness, I should point out that there is some tricky business here that
doesn’t arise in the finite dimensional model problem
dv/dt = Av. In particular, there are non-zero solutions
to the heat equation for x ∈ [-π,
π] whose time zero restriction is the constant function f(x) ≡ 0 for all x! Indeed, choose any point, a, that is not
in the interval [-π, π], and the function
T(t,
x) = (1/√t) e^{−(x−a)^2/(4mt)}
solves the heat equation for t > 0. Moreover, in spite of the factor 1/√t, its t → 0 limit at
points x ∈ [-π, π] is
zero. (Here is where the condition a ∉ [-π, π] is crucial.) The point is that the factor (x−a)^2/(4mt) blows up as t →
0 if x ≠ a, and so its negative exponential is tiny and converges to zero
as t → 0. This convergence is much faster than the rate
of blow up of 1/√t.
Indeed, to see that
this is so, consider that the time derivative of
T(t, x) is
(−1/(2t) + (x−a)^2/(4mt^2)) T(t, x),
which is positive when
t < (x−a)^2/(2m).
Thus,
T(t, x) is increasing with t for small t as long as x ≠
a. Therefore, since
T(t, x) is not negative, it must have a limit as t → 0 from the positive side. Since the exponential factor e^{−(x−a)^2/(4mt)} vanishes as t → 0 faster than any power of t can blow up, this limit is zero; that is, T(0, x) = 0 for every x ∈ [-π, π].
The existence of solutions such as the one just given is the manifestation of some facts about heat and diffusion that I haven’t mentioned but surely won’t surprise you if you have lived in a drafty old house: The distribution of heat in a room is not completely determined by the heat at time zero because you must take into account the heat that enters and leaves through the walls of the room. Thus, in order to completely pin down a unique solution to the heat equation, the function of x given by T(0, x) must be specified—this corresponds to the heat distribution at time zero in our hypothetical rod—but the functions T(t, π) and T(t, -π) of time must also be specified so as to pin down the amount of heat that enters and leaves the ends of our hypothetical rod.
Any specified function T(t, π) is called a boundary condition seeing as it is a condition on the solution that is imposed on the boundary of the rod. For example, specifying T(t, π) = 0 for all t tells us that the ends of the rod are kept at zero temperature. The existence of solutions to the heat equation with prescribed boundary conditions is an important subject, but one that we won’t pursue in this course.
1. Solve the heat equation for a function T(t, x) of t ≥ 0 and x ∈ [-π, π] that obeys the
initial condition T(0, x) = sin^2(x) − cos^4(x). (Rather than do the integrals for the Fourier series, take the following shortcut: Use standard trigonometric identities to write T(0, x) as a sum of sine and cosine functions.)
2. Use Fourier series to solve the heat equation for a function T(t, x) of t ≥ 0 and
x ∈ [-π, π] that obeys the initial condition T(0, x) = sinh(x). You can avoid many of the integrals by exploiting the Fourier series solution for the initial condition e^x given above.
3. Suppose that c is a constant. Prove that T(t, x) = e^{c^2 m t} e^{cx} solves
the heat equation.
4. Take the case c = 1 in the previous problem and prove that the resulting solution of
the heat equation with the initial condition T(0, x) = e^x is not the same as the one given in the text, above. (Hint: Compare the corresponding Fourier series.)
5. Use Fourier series to solve the heat equation for a function T(t, x) for t ≥ 0 and
for x ∈ [-π, π] subject to the initial condition T(0, x) = x.
6. Prove that T(t, x) = x is also a solution to the heat equation for t ≥ 0 and x ∈ [-π, π]
with the initial condition T(0,
x) = x. Prove that it is different from
the one you found in Problem 5 using Fourier series.
10.4 Partial differential equations II: The Laplace and wave equations
The discussion that follows explores some features of two other very commonly met differential equations, one called the ‘Laplace equation’ and the other called the ‘wave equation’.
The discussion starts with the Laplace equation. This equation is for a function, u, of two space variables, x and y. Here is the definition:
Definition 10.4.1: A function u that is defined on some given region in the x-y plane is said to obey the Laplace equation in the case that
∂^2u/∂x^2 + ∂^2u/∂y^2 = 0
at all points (x, y) in the given region.
Versions of this equation arise in numerous areas in the sciences. Those of you who plan to take a course about electricity and magnetism will see it. Likewise, if you study the analog of our heat/diffusion equation for a thin plate shaped like the given region in the x-y plane, you will see that time independent solutions to the heat/diffusion equation are solutions to the Laplace equation. Indeed, this is because the two dimensional version of the heat equation is for a function T(t, x, y) of time and the space coordinates x and y that obeys the equation
∂T/∂t = m (∂^2T/∂x^2 + ∂^2T/∂y^2) .
If T is an equilibrium solution to this last equation, then it depends only on the space coordinates x and y and so supplies a solution to the Laplace equation.
Here is a basic fact about the Laplace equation and its solutions: Suppose that R is a bounded region in the x-y plane whose boundary is some finite union of segments of smooth curves. Suppose in addition that f is a continuous function that is defined on the boundary of R with well defined directional derivatives. Then there is a unique solution in R to the Laplace equation that is smooth at points inside R and whose restriction to the boundary is equal to f.
To explain some of the terminology, a segment of a smooth curve is a connected part of a level set of some function, h(x, y), where the associated gradient vector is nonzero. In this regard, h is assumed to have partial derivatives to all orders with respect to the variables x and y.
If you haven’t yet taken a multivariable calculus course, this explanation and the constraints on the region most probably seem like mumbo-jumbo. If so, don’t fret because the discussion that follows concentrates exclusively on the case where the region R is the square where –π ≤ x ≤ π and –π ≤ y ≤ π. With this proviso understood, here is a formal restatement of what was just said:
Fact 10.4.2: Consider the square where both −π ≤ x ≤ π and −π ≤ y ≤ π. Suppose that f is any given continuous function that is defined on the boundary of the square. Suppose, in addition, that f has bounded y-derivative along the two vertical segments of the boundary and bounded x-derivative along the two horizontal segments of the boundary. Then there is a unique solution to the Laplace equation in the square that is smooth at points inside the square and whose restriction to the boundary is equal to f.
To see how this Fact plays out, consider first the example where k is a positive integer and where the given function f on the boundary of the square is equal to sin(ky) on the two vertical parts of the boundary, and is equal to zero on the two horizontal parts. In this case, I proceed by assuming that the solution u(x, y) has the form
u(x, y) = c(x) sin(ky)
where c is some function of x that is constrained so that
c(π) = c(-π) = 1 .
You are rightly asking why I chose this very particular form for u(x, y). I chose this form because I know that it works! Most probably, the first person (Laplace?) to try this form for u would not have given you a good answer as to why it was done. This said, you would be surprised at the number of so-called ‘brilliant’ scientific advances that owe allegiance to the ‘guess and check’ school.
Anyway, grant me the right to at least give u(x, y) = c(x) sin(ky) a try. This is a function of the form g(x) h(y), and if such a function is plugged into the Laplace equation, all the x-derivatives hit g(x) and all the y-derivatives hit h(y). In the present case, I find that my u(x, y) = c(x) sin(ky) solves the Laplace equation provided that the function c(x) solves the equation
d²c/dx² - k² c = 0 .
Except for renaming the coordinate x as t, this is precisely the sort of equation that we considered in Section 10.1. In particular, we learned in Section 10.1 that the general solution has the form
c(x) = a e^{kx} + b e^{-kx}
where a and b are constants. The question thus is as follows: Can I choose a and b so that the conditions c(π) = c(-π) = 1 hold?
This can be viewed as finding a simultaneous solution to the linear equations
a e^{kπ} + b e^{-kπ} = 1 and a e^{-kπ} + b e^{kπ} = 1 .
As we saw earlier in the course, there is a unique solution to these equations when the matrix

M = | e^{kπ}   e^{-kπ} |
    | e^{-kπ}  e^{kπ}  |

is invertible. As det(M) = e^{2kπ} - e^{-2kπ} > 0, this is indeed the case, and inverting M finds that
a = b = 1/(e^{kπ} + e^{-kπ}) .
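If you prefer a numerical check of this little linear algebra step, the following sketch of mine solves the 2 × 2 system with numpy for the hypothetical mode number k = 3 and compares the answer with the closed form above:

import numpy as np

k = 3   # hypothetical mode number, for illustration only
M = np.array([[np.exp(k * np.pi), np.exp(-k * np.pi)],
              [np.exp(-k * np.pi), np.exp(k * np.pi)]])
a, b = np.linalg.solve(M, np.array([1.0, 1.0]))
print(a, b)                                             # the two constants agree
print(1.0 / (np.exp(k * np.pi) + np.exp(-k * np.pi)))   # matches the closed form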
Thus, our solution in this case is

u(x, y) = (e^{kx} + e^{-kx}) sin(ky) / (e^{kπ} + e^{-kπ}) .
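You can also let sympy confirm both that this u solves the Laplace equation and that it equals sin(ky) on the vertical parts of the boundary. This is a sketch of mine with the arbitrary choice k = 2:

import sympy as sp

x, y = sp.symbols('x y')
k = sp.Integer(2)   # any positive integer works the same way
u = (sp.exp(k*x) + sp.exp(-k*x)) * sp.sin(k*y) / (sp.exp(k*sp.pi) + sp.exp(-k*sp.pi))
print(sp.simplify(sp.diff(u, x, 2) + sp.diff(u, y, 2)))   # 0: Laplace equation holds
print(sp.simplify(u.subs(x, sp.pi) - sp.sin(k*y)))        # 0: boundary value is sin(ky)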
As with the heat equation, the Laplace equation is a linear equation. This is to say that the sum of any two solutions is a solution, and the product of any solution and any real number is a solution. Granted this, what we just did enables us to find a solution, u(x, y), to the Laplace equation in the case that the given function f is zero on the horizontal parts of the boundary and has the Fourier series
f(y)|x=±π = ∑k=1,2,… ak sin(ky)

on the vertical parts of the boundary. Here, each ak is a constant. Indeed, the solution for this case is simply the sum of those for the cases where f was ak sin(ky):
u(x, y) = ∑k=1,2,… ak (e^{kx} + e^{-kx}) sin(ky) / (e^{kπ} + e^{-kπ}) .
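To get actual numbers from a series like this, one truncates it. Note that (e^{kx} + e^{-kx})/(e^{kπ} + e^{-kπ}) is the same as cosh(kx)/cosh(kπ), which is numerically better behaved for large k. The sketch below is my own illustration, with hypothetical coefficients ak = 1/k²:

import numpy as np

def laplace_series(x, y, coeffs):
    # truncated version of u(x, y) = sum_k a_k cosh(kx)/cosh(k*pi) sin(ky)
    total = 0.0
    for k, a_k in enumerate(coeffs, start=1):
        total += a_k * np.cosh(k * x) / np.cosh(k * np.pi) * np.sin(k * y)
    return total

coeffs = [1.0 / k**2 for k in range(1, 51)]   # hypothetical coefficients a_k = 1/k^2
print(laplace_series(0.0, 1.0, coeffs))       # the value of the truncated sum at (0, 1)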
Fourier series can also be used to write down the solution to Laplace’s equation in the most general case from Fact 10.4.2: the one where the boundary function f is non-zero at points on both the horizontal and vertical parts of the boundary of the square. You will be asked to explore some aspects of this in the exercises.
Turn now to the story for the wave equation. The simplest example is an equation for a function, u(t, x), that is defined for all values of t ∈ R and for values of x that range over some interval [a, b]. Here is the definition:
Definition 10.4.3: Suppose that a positive number, c, and numbers a < b have been specified. A function, u, of the variables t and x where t ∈ R and x ∈ (a, b) is said to obey the wave equation in the case that

∂²u/∂t² - c² ∂²u/∂x² = 0

at all values of t ∈ R and x ∈ (a, b).
The wave equation is typically augmented with boundary conditions for u at the points where x = a and x = b. To keep the story short, we will only discuss the case where u is constrained so that
u(t, a) = 0 and u(t, b) = 0 for all t.
It is often the case that one must find a solution to the wave equation subject to additional conditions that constrain the value of u and its time derivative at t = 0. These are typically of the following form: Functions f(x) and g(x) on [a, b] are given that both vanish at the endpoints. A solution u(t, x) is then sought for the wave equation subject to the boundary conditions u(t, a) = 0 = u(t, b) and to the initial conditions
u(0, x) = f(x) and (∂u/∂t)(0, x) = g(x) for all x ∈ [a, b].
The equation in Definition 10.4.3 is called the wave equation because it is used to
model the wave-like displacements (up/down) that are seen in vibrating
strings. In this regard, such a model
ignores gravity, friction and compressional effects as it postulates an
idealized, tensed string whose equilibrium configuration stretches along the
x-axis from where x = a to x = b, and whose ends are fixed during the
vibration. The constant c that appears in the wave equation determines the fundamental frequency of the vibration, c/(2(b - a)).
To elaborate, u(t, x) gives the z-coordinate of the string at time t over the point x on the x-axis. The boundary conditions u(t, a) = 0 = u(t, b) keep the ends of the string fixed during the vibration. The initial conditions specify the state of the string at time 0. For example, in the case that g ≡ 0, the string is started at time zero at rest, but with a displacement at any given x equal to f(x). As it turns out, such an idealization is quite accurate for small displacements in tautly stretched real strings. For example, the behavior of violin and other musical instrument strings is well described by the wave equation.
Somewhat more complicated versions of the wave equation are also used to model the propagation of sound waves, water waves, electromagnetic waves (such as light and radio waves), and sundry other wave-like phenomena.
The following summarizes what can be said about the existence of solutions:
Fact 10.4.4: Let f(x) and g(x) be any two given, smooth functions on an interval where a ≤ x ≤ b that are zero at the endpoints. Then there is a unique function, u(t, x), that is defined for all t and for x ∈ [a, b], and has the following properties:

• u(t, x) obeys the wave equation for all t and for all points x with a < x < b.
• u(t, a) = u(t, b) = 0 for all t.
• u(0, x) = f(x) and (∂u/∂t)(0, x) = g(x) for all x ∈ [a, b].
To keep the subsequent examples relatively simple, consider henceforth only the case where a = -π and b = π. The challenge before us is to solve the wave equation in this context where the initial conditions have u(0, x) = sin(kx) and where

(∂u/∂t)|t=0 = 0

at all x. Here, k ∈ {1, 2, …}.
With the benefit of much hindsight, I now propose looking for a solution, u(t, x), having the form u(t, x) = h(t) sin(kx). Note that this guess has the virtue of satisfying the required boundary conditions that u(t, ±π) = 0. Plugging h(t) sin(kx) into the wave equation, I find that the latter equation is obeyed if and only if the function h(t) obeys the equation
d²h/dt² + c²k² h = 0

subject to the initial conditions h(0) = 1 and (dh/dt)|t=0 = 0.
According to Fact 10.1.3, the general solution to this last equation is
h(t) = a sin(ckt) + b cos(ckt)
where a and b are constants. The conditions h(0) = 1 and (dh/dt)|t=0 = 0 require that a = 0 and b = 1.
Thus, our solution u(t, x) is
u(t, x) = cos(ckt) sin(kx) .
As you can see, this solution is periodic in time, with period equal to 2π/(ck).
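Here too, a machine check is quick. This sympy sketch of mine verifies that cos(ckt) sin(kx) obeys the wave equation, with c and k left symbolic:

import sympy as sp

t, x = sp.symbols('t x')
c, k = sp.symbols('c k', positive=True)
u = sp.cos(c*k*t) * sp.sin(k*x)
# plug u into the wave equation u_tt - c^2 u_xx
print(sp.simplify(sp.diff(u, t, 2) - c**2 * sp.diff(u, x, 2)))   # prints 0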
Now, you are invited to check that the following linearity conditions are fulfilled:
Fact 10.4.5: The sum of any
two solutions to the wave equation is also a solution, as is the product of any
solution by any real number.
Granted this, we can use our solutions for the initial conditions u(0, x) = sin(kx) and (∂u/∂t)(0, x) = 0 to write down, using Fourier series, the solution to the wave equation for the initial conditions u(0, x) = f(x) and (∂u/∂t)(0, x) = 0 in the case that f(x) has only sine functions in its Fourier series. To elaborate, suppose that f(x) has the Fourier series

f(x) = ∑k=1,2,… ak sin(kx) where each ak is a real number .
It then follows that the corresponding wave equation solution u(t, x) with the initial conditions u(0, x) = f(x) and (∂u/∂t)(0, x) = 0 is given by the sum
u(t, x) = ∑k=1,2,… ak cos(ckt) sin(kx) .
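As with the Laplace series, truncation turns this formula into numbers. The sketch below is my own illustration with hypothetical coefficients a1 = 1 and a3 = 1/9 (all others zero); it evaluates the string profile at a chosen time:

import numpy as np

def wave_series(t, x, coeffs, c=1.0):
    # truncated version of u(t, x) = sum_k a_k cos(c k t) sin(k x) on [-pi, pi]
    return sum(a_k * np.cos(c * k * t) * np.sin(k * x)
               for k, a_k in enumerate(coeffs, start=1))

coeffs = [1.0, 0.0, 1.0 / 9.0]            # hypothetical coefficients: a_1 = 1, a_3 = 1/9
xs = np.linspace(-np.pi, np.pi, 9)
print(wave_series(0.5, xs, coeffs))       # string profile at time t = 0.5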
Exercises
1. This problem explores the use of Fourier series to write down solutions to the Laplace equation on the square in the x-y plane where –π ≤ x ≤ π and –π ≤ y ≤ π.
a) Let f denote a function on the boundary of the square that is zero on the horizontal parts of the boundary, has the Fourier series ∑k=1,2,… ak sin(ky) on the x = -π part of the boundary, and the Fourier series ∑k=1,2,… bk sin(ky) on the x = π part of the boundary. Here, each ak and each bk are constant, and they are not necessarily equal. Write down the function of x and y on the square that solves the Laplace equation and equals f on the boundary.
b) Let g denote a function on the boundary of the square that is zero on the vertical parts of the boundary, has the Fourier series ∑k=1,2,… ck sin(kx) on the y = -π part of the boundary, and the Fourier series ∑k=1,2,… dk sin(kx) on the y = π part of the boundary. Here, each ck and each dk are constant, and they are not necessarily equal. Write down the function of x and y on the square that solves the Laplace equation and equals g on the boundary.
c) Let h now denote a function on the boundary of the square that has the following Fourier series on the four sides of the boundary: The series ∑k=1,2,… ak sin(ky) on the x = -π part of the boundary, the series ∑k=1,2,… bk sin(ky) on the x = π part of the boundary, the series ∑k=1,2,… ck sin(kx) on the y = -π part of the boundary, and the series ∑k=1,2,… dk sin(kx) on the y = π part of the boundary. Write down the solution to the Laplace equation on the square that equals h on the boundary.
2. a) Find the solution to Laplace’s equation on the square in the x-y plane where both
–π ≤ x ≤ π and –π ≤ y ≤ π whose restriction to the boundary is a given constant, c.
b) Use the results from Part a) and also from Problem 1 to find the temperature at the point (0, 0) in the x-y plane when the temperature at any given boundary point (x, y), with either x = ±π and –π ≤ y ≤ π, or with y = ±π and –π ≤ x ≤ π, is held equal to 1 + xy for all time. See the examples in Section 10.2 to obtain the explicit Fourier series for x and for y.
3. This problem explores the use of Fourier series to write down solutions to the wave equation for functions of t ∈ R and x ∈ [-π, π].
a) Let g(x) denote a function of x ∈ [-π, π] that vanishes at x = -π and at x = π. Suppose that g(x) has the Fourier series g(x) = ∑k=1,2,… bk sin(kx). Write down the solution, u(t, x), to the wave equation for values of t ∈ R and x ∈ [-π, π] that vanishes at x = ±π and obeys u(0, x) = 0 and (∂u/∂t)(0, x) = g(x).
b) Let f(x) denote a function of x ∈ [-π, π] that vanishes at x = -π and at x = π. Suppose that f(x) has the Fourier series f(x) = ∑k=1,2,… ak sin(kx). Let g(x) be as in Part a). Give the solution, u(t, x), to the wave equation for values of t ∈ R and x ∈ [-π, π] that vanishes at x = ±π, while obeying u(0, x) = f(x) and also (∂u/∂t)(0, x) = g(x).
4. This problem explores an approach to the wave equation that does not use what we learned about Fourier series. Suppose that f(y) and g(y) are any twice differentiable functions of one variable (here called y).
a) Use the two variable version of the chain rule to show that the function
u(t, x) = f(x+ct) + g(x-ct)
satisfies the wave equation.
b) Show that the conditions u(t, -π) = u(t, π) = 0 hold for all t if both g(y) = -f(2π - y) and f(y) = f(y + 4π) for all y. In particular, explain why this then means that

u(t, x) = f(x + ct) - f(2π - x + ct) .
c) Use the two variable chain rule again to explain why the condition (∂u/∂t)(0, x) = 0 for all x ∈ [-π, π] requires that f(y) + f(2π - y) is constant.
d) Use the preceding to write down the wave equation solution u(t, x) with initial condition u(0, x) = cos(x/2) and (∂u/∂t)(0, x) = 0 for the case where x is constrained to obey –π ≤ x ≤ π.
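Incidentally, the claim in Part a) of Problem 4 can be spot-checked symbolically. The following sympy sketch of mine differentiates f(x + ct) + g(x - ct) for abstract f and g; it is a check, not a substitute for your chain rule argument:

import sympy as sp

t, x, c = sp.symbols('t x c')
f, g = sp.Function('f'), sp.Function('g')
u = f(x + c*t) + g(x - c*t)
# plug u into the wave equation u_tt - c^2 u_xx
print(sp.simplify(sp.diff(u, t, 2) - c**2 * sp.diff(u, x, 2)))   # prints 0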