10. Differential equations
10.1 Ordinary differential equations
Many of the ideas of
linear algebra which we have studied in the context of Rn or Cn are applicable in a much
wider context. Mathematicians introduced
the abstract notion of a ‘vector space’, or what is a synonym, a ‘linear
space’, to describe this greater context.
Rather than look at vector spaces in the abstract, we shall look at some
examples in this and the next section which are important in the theory of
differential equations.
To start the story, let us
introduce the notion of a ‘smooth’ function:
This is a function on the line, R, that can be differentiated as often as desired. The set of all such functions is traditionally denoted C∞. For example, f(t) = 1, g(t) = t and h(t) = e^t
are all functions in the set C∞. Indeed, all derivatives of f vanish, all but
the first of g vanish, and the n’th derivative of h is equal to h. On the other hand, f(t) = |t| is not in C∞
since it is not differentiable at t = 0.
We have introduced C∞
as an example of a ‘vector space’. Here
is the point: If f and g are two
functions in C∞, then so is the function t → f(t) + g(t). Moreover, if c is any real number and if f ∈ C∞, then
the function t → c f(t) is also in C∞.
Thus, one can add functions in C∞ to get a new function
in C∞, and one can multiply a function in C∞
by a real number to get a new function in C∞. For example, 1 ∈ C∞ and
cos(3t) ∈ C∞,
as is f(t) ≡ 1 + cos(3t). Likewise, t and also
5t and −3.414t are in C∞.
Addition of vectors and
multiplication of vectors by scalars are the basic operations that we studied on Rn, and here we see a huge set, C∞,
that admits these same two basic operations.
In this regard, any set with these two operations, addition and
multiplication by scalars, is what is properly called a ‘vector space’
or, equivalently, a ‘linear space’.
Many of the same notions
that we introduced in the context of vectors in Rn have very precise counterparts in the context of
our linear space C∞.
What follows are some examples of particular relevance to what we will
do in the subsequent subsections.
•
Subspaces: Any polynomial function, t → a_n t^n + a_{n−1} t^{n−1} + ··· + a_0, is infinitely
differentiable, and so is in C∞. Here, each a_k is a real (or
complex) number. The set of all
polynomials forms a subset, P ⊂ C∞, with two important
properties: First, if f(t) and g(t) are
in P, then so is the function t → f(t) + g(t).
Second, if c is a real number and f(t) is a polynomial, then the
function t → c f(t) is a polynomial. Thus, the
sum of two elements in P is also in P and the product of a real number with an
element in P is in P.
If
you recall, a subset V ⊂ Rn of vectors was called a ‘subspace’ if it had the
analogous two properties: Sums of
vectors in V are in V, and any real number times any vector in V is in V. It is for this reason that a set, such as P,
is called a ‘subspace’ of C∞.
Thus, a subspace is a subset that has the two salient properties of a
vector space.
•
Linear
independence: Can you find two constants, c_1
and c_2, that are not both zero and such that the function t → c_1 + c_2 t is zero for all values of t? A moment’s reflection should tell you that
if c_1 + c_2 t is zero for all values of t, then c_1
must be zero (try setting t = 0), and also c_2 must be zero (then set
t = 1).
If
you recall, a set, {v_1, . . . , v_k}, of vectors in Rn was said to be ‘linearly independent’ in the case
that c_1 = c_2 = ··· = c_k = 0 are the only
values for a collection of constants {c_1, . . ., c_k} that
make c_1 v_1 + c_2 v_2 + ··· + c_k v_k
= 0.
By
the same token, functions {f_1, . . ., f_k} ⊂ C∞ are said to be linearly independent in the case that
c_1 = c_2 = ··· = c_k = 0 is the only choice for constants {c_1, …, c_k}
that makes the function t → c_1 f_1(t) + ··· + c_k f_k(t) equal to zero for all t. Note that
this sum is supposed to vanish for all choices of t, not just some
choices. For example, the functions 1
and t are linearly independent, even though the function t → 1 + t is zero at t = −1.
To
get a feeling for this notion, do you think that the functions {1, t, . . . , t^n}
for any given non-negative integer n form a linearly independent set? If you said yes, then you are correct. Here is why:
Suppose t → p(t) ≡ c_0 + c_1 t + ··· + c_n t^n is
zero for all t, where c_0, …, c_n are all
constants. If such is the case, then p(0)
must be zero, and so c_0 is zero.
Also, the derivative of p(t) must be the zero function (since p is), and
this derivative is c_1 + 2c_2 t + ··· + n c_n t^{n−1}. In particular, p′(0) must be zero and so c_1
= 0 too. Continuing in this vein with
the higher derivatives finds each successive c_k = 0.
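By the way, this argument is easy to check with a computer algebra system. Here is a minimal sketch assuming the sympy package (the degree n = 4 below is an arbitrary choice of mine): impose that p and all of its derivatives vanish at t = 0 and solve for the coefficients.

```python
import sympy as sp

t = sp.symbols('t')
n = 4                                            # any non-negative integer works
c = sp.symbols(f'c0:{n + 1}')                    # coefficients c0, ..., cn
p = sum(c[k] * t**k for k in range(n + 1))       # p(t) = c0 + c1 t + ... + cn t^n

# p vanishes for all t exactly when p(0), p'(0), p''(0), ... all vanish.
equations = [sp.diff(p, t, k).subs(t, 0) for k in range(n + 1)]
print(sp.solve(equations, c))                    # {c0: 0, c1: 0, ..., c4: 0}
```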
A
collection of functions that is not linearly independent is called ‘linearly
dependent’.
•
Linear
transformations: If f(t) ∈ C∞, then
we can define a new function that we will denote as Df by taking the derivative
of f. Thus,
(Df)(t) = f′(t) = (df/dt)(t).
For example, D(sin(t)) =
cos(t).
Because
we can take as many derivatives of f as we like, the function t → (Df)(t) is a smooth
function. Moreover, D has two important
properties:
a) D(f + g) = Df + Dg no
matter the choice for f and g.
b) D(cf) = cDf if c is a constant.
If
you recall, a transformation of Rn with the analogous two properties was called a
‘linear transformation’. By analogy, we
call D a linear transformation of C∞. Equivalently, we say that D is ‘linear’.
Here
is another example: Set D^2 f
to denote the function t → f″(t). Then
D^2 is also linear. In
general, so is D^n, where D^n f takes the n’th
derivative. Furthermore, so is the
transformation that sends f to the function a_n D^n f + a_{n−1} D^{n−1} f
+ ··· + a_0 f in the case that the collection a_0, …, a_n
are constants. In fact, such is the case
even if each a_k is a fixed function of t. In this regard, be sure to define this
transformation so that the same collection {t → a_k(t)} is used
as f varies in the set C∞.
•
The kernel
of a linear transformation: The kernel
of a linear transformation such as D is the set of functions, f ∈ C∞, such
that (Df)(t) = 0 for all values of t.
This is to say that Df is the zero function. In the case of D, a function has everywhere
zero first derivative if and only if it is constant, so ker(D) consists of the
constant functions.
For
a second example, consider D^2.
What is the kernel of D^2?
Well, a function whose second derivative is zero everywhere must have
constant first derivative. Thus, D^2 f
= 0 if and only if f′ = c_1 with c_1 a constant. But a function with constant first
derivative must have the form f = c_0 + c_1 t, where c_0
is also constant. Arguing in this manner
finds that the kernel of D^2 consists of all functions of the form {c_0 + c_1 t}
where c_0 and c_1 can be any pair of constants.
Note
that the kernel of a linear transformation is always a linear subspace.
•
The image of
a linear transformation: A function,
say t → g(t),
is said to be in the ‘image’ of D if there is a smooth function f that obeys
(Df)(t) = g(t) at all t. Thus, g is in
the image of D if g has an anti-derivative that is a smooth function. Now, every function in C∞ has an anti-derivative,
and the anti-derivative is smooth if the original function is. To explain, if f′ = g and I can take as many
derivatives as I like of g, then I can take as many as I like of f and all are
smooth. In particular, because f′ = g,
there is a first derivative. Moreover,
taking n + 1 derivatives of f for any n ≥ 1 is the same as taking n derivatives of g.
With
the preceding understood, the image of D is the whole of C∞. Indeed, you give me any g(t) and I’ll take
the corresponding function f to be
t → ∫_0^t g(s) ds .
By
the way, I hope that it strikes you as passing strange that I have exhibited a
linear transformation from C∞ to itself whose image is the
whole of C∞ but whose kernel is non-zero. A little thought should convince you that a
linear transformation of Rn whose image is Rn must have trivial kernel. This novel phenomenon is a manifestation of
the fact that C∞ is what is rightfully called an ‘infinite
dimensional’ space. More is said on this
below.
•
Basis, span
and dimension: As I argued previously, the
kernel of D^2 consists of all functions of the form c_0 + c_1 t
where c_0 and c_1 are constants. Thus, the kernel of D^2 consists of
linear combinations of the two functions, 1 and t. These are then said to ‘span’ the kernel of D^2,
and as they are linearly independent, they are also said to give a ‘basis’ for
the kernel of D^2. As this
basis has two elements, the kernel of D^2 is said to be
2-dimensional.
As
another example, the subspace, P_3, of polynomials of degree three or
less consists of all functions of the form t → c_0 + c_1 t
+ c_2 t^2 + c_3 t^3 where each c_k
is a constant. Now, the functions {1, t,
t^2, t^3} are linearly independent, and they span P_3
in the sense that every element in P_3 is a linear combination from
this set. Since there are four of the
functions involved, the subspace P_3 is said to be
4-dimensional.
In
general, if V is a subspace, n ∈ {0, 1, … } and {f_1, . . ., f_n}
is a set of linearly independent functions that span V, then V is said to be
n-dimensional. To be precise here, a set
{f_1, … , f_n} of functions in V, whether linearly
independent or not, is said to span V if any given function g(t) ∈ V can be written as g(t)
= c_1 f_1(t) + ··· + c_n f_n(t) where
each c_k is a constant.
A
subspace such as the space of all polynomials is said to be infinite dimensional
if it has arbitrarily large subsets of linearly independent functions. It is in this sense that C∞
itself is infinite dimensional.
Any
linear operator on C∞ that takes any given f(t) to some linear
combination of it and its derivatives is an example of a ‘linear differential
operator’. In fact, the general form
for a linear differential operator is a linear transformation, f → Tf, of C∞
that sends any given f(t) to the function
(Tf)(t)
= a_n(t) (D^n f)(t) + a_{n−1}(t) (D^{n−1} f)(t)
+ ··· + a_0(t) f(t) ,
where each a_k is some smooth
function. If each a_k is constant, then T is said to be a ‘constant
coefficient’ differential operator.
Granted that a_n ≠ 0, then T is said to have ‘order
n’. For example, D^2 is
such an operator of order 2. Here is
another:
(Tf)(t)
= f″(t) + 3f′(t) − 2f(t) .
There is more arcane
vocabulary to learn here. If T is a
linear differential operator and if one is asked to ‘find the general solution
to the homogeneous equation for T’, then one is being asked to find the kernel
of T, thus all functions f(t) such that (Tf)(t) = 0 at every t. On the other hand, if g(t) is some given
function and one is asked to solve the ‘inhomogeneous equation Tf = g’, this
means you should find all functions f such that (Tf)(t) = g(t).
Here is an example: Suppose that you are asked to find all
solutions to the inhomogeneous equation D^2 f = e^t. You would answer: The general solution has the form f(t) = e^t
+ c_0 + c_1 t, where c_0 and c_1 are
constants.
By the way, this last
example illustrates an important fact:
Fact 10.1.1: If T is a given differential operator, g(t) a given function, and f0 some solution to the inhomogeneous equation Tf = g, then any other solution to this equation has
the form f(t) = f0(t) + h(t) where
h is a function from the kernel of
T. That
is, Th = 0.
This fact has a mundane proof: If f is also a solution, then T(f – f0)
= Tf – Tf0 = g – g = 0, so it is necessarily the case that f – f0
is in the kernel of T. Even so, Fact
10.1.1 is quite useful, since it means that once you find the kernel of T, then
you need only find a single inhomogeneous solution to know them all.
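To see Fact 10.1.1 in action on the example above, here is a minimal sympy sketch (assuming the sympy package; C1 and C2 are the arbitrary constants the solver introduces):

```python
import sympy as sp

t = sp.symbols('t')
f = sp.Function('f')

# Kernel of D^2: the solutions of f'' = 0 are c0 + c1*t.
print(sp.dsolve(sp.Eq(f(t).diff(t, 2), 0)))           # f(t) = C1 + C2*t

# One inhomogeneous solution plus the kernel gives every solution of D^2 f = e^t.
print(sp.dsolve(sp.Eq(f(t).diff(t, 2), sp.exp(t))))   # f(t) = C1 + C2*t + exp(t)
```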
The
task of finding an element in the kernel of a generic differential operator, or
solving an associated inhomogeneous equation can be quite daunting. Often, there is no nice, closed form
algebraic expression for elements in the kernel, or for the solution to the
inhomogeneous equation. Even so, there
are some quite general ‘existence’ theorems that tell us when, and how many,
solutions to expect. For example,
consider the following:
Fact 10.1.2: Suppose
that T is a differential operator
that has the form
(Tf)(t) =
(D^n f)(t) + a_{n−1}(t) (D^{n−1} f)(t) + ··· + a_1(t) (Df)(t) + a_0(t) f(t)
where a_0, … , a_{n−1}
are smooth functions. Then, the kernel of T has dimension n. Moreover,
if g(t) is any given function, then
there exists some f(t) such that
Tf = g.
Of course, this doesn’t tell us what the
solution to the equation Tf = g looks like, but it does tell us that there is a
solution whether or not we can find it explicitly.
Unfortunately,
the proof of this fact takes us beyond where we can go in this course, so you
will just have to take it on faith until you take a more advanced mathematics
course.
Although
it is no simple matter to write down the kernel of your generic differential
operator, the situation is rather different if the operator has constant
coefficients. In this case, the kernel
can be found in a more or less explicit form.
To elaborate, let’s suppose that the operator in question, T, has the
form
T
= D^n + a_{n−1} D^{n−1} + ··· + a_1 D + a_0 ,
where each a_k is now a
constant. Our goal now is to find all
functions f(t) such that (Tf)(t) = 0.
Thus, f(t) must solve
(D^n f)(t) + a_{n−1} (D^{n−1} f)(t) + ··· + a_1 (Df)(t) + a_0 f(t) = 0.
Consider first the case where n = 1, in which
case we are looking for functions f(t) that obey the equation f′ + a_0 f
= 0. We can write this equation as
df/f = −a_0 dt,
and integrate both sides to find that
ln(f(t)) = −a_0 t + c, where c can be any constant. Thus, the general solution is
f(t)
= b e^{−a_0 t}
where b ∈ R .
Thus, the kernel is 1-dimensional as
predicted by Fact 10.1.2. As described
below, such exponential functions also play a key role in the n > 1
cases.
To
analyze the n > 1 cases, let us recall Fact 7.5.2: The polynomial
λ → p(λ) = λ^n + a_{n−1} λ^{n−1} + ··· + a_0
always factorizes as
p(λ) = (λ − λ_n)···(λ − λ_1) ,
where each λ_k is a complex number. In this regard, keep in mind that a given
complex number can appear as more than one λ_k. Also, keep in mind that if a given λ_k is complex, then its complex
conjugate appears as some λ_j with j ≠ k.
In any event, the following summarizes the n > 1 story:
Fact 10.1.3: In the case that the
numbers {λ_1, …, λ_n} are distinct, the kernel of T consists of linear combinations with constant coefficients of the real
and imaginary parts of the collection {e^{λ_k t}}_{1≤k≤n}. To be
more explicit, write each λ_k as
λ_k = a_k + i b_k with a_k and b_k real. In the case where the {λ_k} are distinct, the kernel of T
is spanned by the functions in the set
{e^{a_k t} cos(b_k t),
e^{a_k t} sin(b_k t)}_{1≤k≤n} . In the
general case, introduce m_k to
denote the number of times a given λ_k appears in the set {λ_j}_{1≤j≤n}. Then
the kernel of T is spanned by the
collection {p_k(t) e^{a_k t} cos(b_k t), p_k(t) e^{a_k t} sin(b_k t)} where
p_k(t) can be any polynomial of degree
from zero up to m_k − 1.
For
example, consider the case where T = D^2 + 2D − 3. The resulting version of p(λ) is the polynomial λ^2 + 2λ − 3, and the latter
factorizes as (λ + 3)(λ − 1). According to Fact 10.1.3, the kernel of T is
spanned by {e^{−3t}, e^t}.
You can check yourself that both are in the kernel. They are also linearly independent. Indeed, you can see this because e^t
gets very large as t → ∞ while e^{−3t} goes to zero as t → ∞. Because Fact 10.1.2 tells us that the kernel
is 2-dimensional, we therefore know that they must span the kernel also.
Another
example is the case that T = D^2 + 1.
The corresponding polynomial is the function p(λ) = λ^2 + 1. This one factorizes as (λ + i)(λ − i). Thus, its roots are ±i, and so Fact 10.1.3
asserts that the kernel is spanned by {cos(t), sin(t)}. Since the second derivative of cos(t) is
−cos(t), it is certainly the case that D^2 cos(t) + cos(t) = 0. Likewise, sin(t) is in the kernel of D^2 + 1
since the second derivative of sin(t) is −sin(t). These are linearly independent, as can be seen
from the following argument: If c_1
cos(t) + c_2 sin(t) = 0 for all t with c_1 and c_2
constant, then this is true at t = 0.
But, at t = 0, cos(t) = 1 and sin(t) = 0, so c_1 = 0. But then c_2 = 0 also (set t = π/2). According to Fact 10.1.2, the dimension of
the kernel of D^2 + 1 is 2, so {cos(t), sin(t)} must span the kernel.
Here
is a third example: Take T = D^3 − 3D^2 + 3D − 1. In this case, the corresponding polynomial p(λ) is (λ − 1)^3. There is only one root here, λ = 1, and it appears with
multiplicity 3. According to Fact
10.1.3, the kernel should be generated by the collection of functions {e^t,
t e^t, t^2 e^t}.
This is to say that every element in the kernel has the schematic form
f(t)
= c_1 e^t + c_2 t e^t + c_3 t^2 e^t
= (c_1 + c_2 t + c_3 t^2) e^t
where c_1, c_2 and c_3
are constants. You are invited to take
the prescribed derivatives to verify that (Tf)(t) = 0 for all t. Even so, here is what might be an easier way
to do this: First, exploit the
factorization of p(λ) as (λ − 1)^3 to audaciously write
Tf
= (D − 1)(D − 1)(D − 1)f .
Now, note that (D − 1)f = (c_2 + 2c_3 t) e^t. This being the case,
then (D − 1)(D − 1)f = 2c_3 e^t. Finally, (D − 1)(D − 1)(D − 1)f = 2c_3 (D − 1)e^t,
and this is zero because the derivative of e^t is e^t.
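Rather than taking the derivatives by hand, you can let a computer algebra system check all three examples at once. This is a small sketch assuming sympy; the helper T below is my notation, not the text's:

```python
import sympy as sp

t = sp.symbols('t')

def T(f, coeffs):
    # Apply the constant coefficient operator sum_k coeffs[k] * D^k to f.
    return sp.simplify(sum(a * sp.diff(f, t, k) for k, a in enumerate(coeffs)))

print(T(sp.exp(-3*t), [-3, 2, 1]), T(sp.exp(t), [-3, 2, 1]))   # D^2 + 2D - 3
print(T(sp.cos(t), [1, 0, 1]), T(sp.sin(t), [1, 0, 1]))        # D^2 + 1
print(T(t**2 * sp.exp(t), [-1, 3, -3, 1]))                     # D^3 - 3D^2 + 3D - 1
# Every line prints 0, confirming that each function lies in the relevant kernel.
```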
By
the way, these three examples illustrate two important points, and also
indicate how to prove Fact 10.1.3. These
two points are discussed first, and then the proof of Fact 10.1.3 is sketched.
•
If {λ_1, λ_2, …} is any finite or
infinite collection of real numbers with no two the same, then the functions in
the corresponding collection
{e^{λ_1 t}, e^{λ_2 t}, ··· }
are linearly independent. This is
to say that if {c_1, c_2, ···, c_k} are any
finite collection
of constants and if
c_1 e^{λ_1 t} + c_2 e^{λ_2 t} + ··· + c_k e^{λ_k t} = 0 ,
then c_1 = c_2
= ··· = c_k = 0. To prove
that such is the case, just consider the largest number from the collection {λ_1, . . . , λ_k}. Call it λ. Then,
as t →
∞, all of the other terms in the sum of exponential functions are very much
smaller than e^{λt}, and so its corresponding constant must be
zero. This understood, go to the next
largest number from {λ_1, . . . , λ_k} and make the same
argument. Continue sequentially
until all λ’s are
accounted for.
•
If {λ_1 = a_1 + i b_1, λ_2 = a_2 + i b_2, …} is any finite or
infinite collection of complex numbers with no two the same, then the functions
in the collection
{e^{a_1 t} cos(b_1 t), e^{a_1 t} sin(b_1 t), e^{a_2 t} cos(b_2 t), e^{a_2 t} sin(b_2 t), ··· }
are linearly independent
in the sense that no linear combination of any finite subset from this
collection will vanish at all t unless the constants involved are all
zero. Indeed, the fact that functions
with different a’s are linearly independent is argued just as in the previous point, by
looking at how they grow as t → ∞.
The argument in the general case is more involved and so will not be
presented.
What
follows are some remarks that are meant to indicate how to proceed with a
rigorous proof of Fact 10.1.3 in the general case. To start the story, remember that the
operator T is D^n + a_{n−1} D^{n−1} + ··· + a_0,
and so determines a corresponding polynomial p(λ) = λ^n + a_{n−1} λ^{n−1} + ··· + a_0. Suppose that some real number, r, is a root
of this polynomial. Thus, p(r) = 0. Since the derivative of e^{rt} is r e^{rt},
so D^k e^{rt} = r^k e^{rt} for any given
non-negative integer k. As a consequence,
T(e^{rt}) = r^n e^{rt} + a_{n−1} r^{n−1} e^{rt}
+ ··· + a_0 e^{rt} = p(r) e^{rt} = 0 for all t. Thus, we see that each real root of p(λ) determines a
corresponding exponential function in the kernel of T.
Now
suppose that η is a complex root of p(λ). In
this regard, remember that the complex conjugate η̄ is also a root of
p. Also, recall from Section 9.2 that
the derivative of the complex number valued function t → e^{ηt} is η e^{ηt}. Thus, D^k e^{ηt} = η^k e^{ηt} for any given
non-negative integer k. Now, write η = a + ib, where a and b are real, and remember
that e^{at} cos(bt) = (1/2)(e^{ηt} + e^{η̄t}). Thus, the k’th
derivative of e^{at} cos(bt) is (1/2)(η^k e^{ηt} + η̄^k e^{η̄t}). As a consequence,
T(e^{at} cos(bt)) = (1/2)(p(η) e^{ηt} + p(η̄) e^{η̄t}) = 0 for all values of t.
Since e^{at} sin(bt) = (1/(2i))(e^{ηt} − e^{η̄t}), the same sort of argument proves that T(e^{at} sin(bt)) = 0 for all t as
well. This then proves that every
complex conjugate pair {η, η̄} of roots of p(λ) determines a corresponding pair, {e^{at} cos(bt), e^{at} sin(bt)}, of linearly
independent functions in the kernel of T.
Having
digested the contents of the preceding two paragraphs, you are led inevitably
to the conclusions of Fact 10.1.3 in the case that p(λ) has n distinct
roots. Of course, this is predicated on
your acceptance of the assertion in Fact 10.1.2 that the kernel of T is
n-dimensional. It is also predicated on
your acceptance of the conclusions in the second point three paragraphs back
about linear independence.
The
argument for the case when some real or complex number occurs more than once in
the collection {λ_1, . . ., λ_n} is based on the
following observation: The derivative of
t^k e^{ηt} is k t^{k−1} e^{ηt} + η t^k e^{ηt}. As a consequence, t^k e^{ηt} is a solution to the
inhomogeneous equation
(D − η)f = k t^{k−1} e^{ηt} .
By the same token,
(D − η)(D − η)(t^k e^{ηt}) = k(k−1) t^{k−2} e^{ηt} .
Now, if we just iterate these last
observations, we find that acting sequentially q times by (D − η) on t^k e^{ηt} gives
(D − η)^q (t^k e^{ηt}) = k(k−1)···(k−q+1) t^{k−q}
e^{ηt} if q ≤ k, and (D − η)^q (t^k e^{ηt}) = 0 if q > k.
With
the preceding in mind, suppose that some given real or complex number, η, is a root of p(λ) that occurs some q times
in the collection {λ_1, . . ., λ_n}. Let us renumber this list so that the last q
of them are the ones that are equal to η. If
we are willing to take the audacious step of factorizing the operator T by writing
T = (D − λ_1)···(D − λ_{n−q})(D − η)^q ,
we see that (Tf)(t) = 0 if f(t) is any linear
combination from the set {e^{ηt}, t e^{ηt}, … , t^{q−1} e^{ηt}}. Indeed, this is because we have learned from
the preceding paragraph that any such linear combination is already sent to
zero by the factor (D − η)^q. As before, the real
and imaginary parts of any such linear combination must also be sent to zero by
T. Thus, since the real part of t^k e^{ηt} is t^k e^{at} cos(bt) and the imaginary part
is t^k e^{at} sin(bt), we are led to Fact 10.1.3 for the cases
when the collection of roots of p(λ) contains repeated values.
By the way, having just
read the preceding two paragraphs, you now have every right to be nervous about
‘factorizing T’ by manipulating D as if it were just a ‘number’ or a variable
like λ
rather than the much more subtle object that says ‘take the derivative of
whatever is in front of me’. You will have
to trust me when I say that this sort of outrageous move can be justified in a
very rigorous way.
When
using differential equation solutions to predict the future from present data,
one can run into a problem of the following sort: Find all solutions to the differential
equation D^n f + a_{n−1} D^{n−1} f + ··· + a_0 f
= 0 where the value of f and certain of its derivatives are prescribed at fixed
times. For example, find all solutions
to D^2 f + 2Df + 2f = 0 that obey f(0) = 1 and f(π/2) = 2. This sort of
problem is solved by first using Fact 10.1.3 to write the most general
solution, and then searching for those that obey the given fixed time
conditions. In the example just given,
an appeal to Fact 10.1.3 finds that the general solution has the form
f(t)
= a e^{−t} cos(t) + b e^{−t} sin(t)
where a and b can be any constants. This understood, then the condition f(0) = 1
requires that a = 1 but does not constrain b at all. Meanwhile, the condition that f(π/2) = 2 demands that b = 2e^{π/2}. Therefore, the solution to this particular
constrained differential equation problem is f(t) = e^{−t} cos(t) + 2e^{π/2} e^{−t} sin(t).
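Here is a minimal sympy sketch of this computation (assuming sympy; its ics argument imposes the two fixed time conditions, and the printed form may be arranged differently):

```python
import sympy as sp

t = sp.symbols('t')
f = sp.Function('f')
ode = sp.Eq(f(t).diff(t, 2) + 2*f(t).diff(t) + 2*f(t), 0)

# Impose f(0) = 1 and f(pi/2) = 2; dsolve then solves for the two constants.
sol = sp.dsolve(ode, ics={f(0): 1, f(sp.pi/2): 2})
print(sol)   # f(t) = (cos(t) + 2*exp(pi/2)*sin(t))*exp(-t), up to rearrangement
```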
A
second example using the same equation D^2 f + 2Df + 2f = 0 asks for all
solutions with f′(0) = 0. This condition
reads −a + b = 0. Thus, all solutions to
this constrained problem have the form f(t) = a e^{−t} (cos(t) +
sin(t)) where a is any constant.
With
regard to these constrained problems:
Conditions that are imposed on f or its derivatives at t = 0 are
usually called ‘initial conditions’.
Here are some key notions to remember from the discussion in 10.1:
• The space C∞ as a vector space.
• Linear dependence and linear independence for a set of functions from C∞.
• The formula for the general solution of the equation D^n f + a_{n−1} D^{n−1} f + ··· + a_0 f = 0 in the case that each a_k is a constant.
• How to find the solution to D^n f + a_{n−1} D^{n−1} f + ··· + a_0 f = 0 that obeys some constraints on the values of f and certain of its derivatives at certain prescribed times.
1. Which of the following are subspaces of C∞?
a) All continuous functions from R to R.
b) All f ∈ C∞ such that f(0) + f′(0) = 0.
c) All f ∈ C∞ such that f + f′ = 0.
d) All f ∈ C∞ such that f(0) = 1.
2. Which of the following subsets of C∞ consists of linearly independent functions?
a) 1, t, t^2, t^3 e^t.
b) 1 + t, 1 − t, t^2, 1 + t + t^2.
c) sin(t), e^t, e^t sin(t).
d) sin(t), cos(t), sin(t + c) for a fixed constant c.
3. Which of the following maps are linear?
a) T: C∞ → R given by T(f) = f(0).
b) T: C∞ → C∞ given by T(f) = f^2 + f′.
c) T: C∞ → R^2 given by T(f) = (f(0), f(1)).
d) T: C∞ → R given by T(f) = ∫_0^1 f(t) dt.
4. Find a basis for the kernel of T: C∞ → C∞ given by T(f) = f″ + f′ − 12f and then find a
smooth function that obeys the three conditions T(f) = 0, f(0) = 0 and f′(0) = 1.
5. Find a basis for the kernel of T: C∞ → C∞ given by T(f) = f″ + 2f′ + 2f and find a
smooth function that obeys the three conditions T(f) = 0, f(0) = 1 and f(1) = 1.
6. Find a basis for the kernel of T: C∞ → C∞ given by T(f) = f″ + 6f′ + 9f and find a
smooth function that obeys the three conditions T(f) = 0, f′(0) = 1 and f(1) = 0.
7. Find a basis for the kernel of T: C∞ → C∞ given by T(f) = f″ + f(0).
8. Find a basis for the image of T: C∞ → C∞ given by T(f) = f(0) + f′(0)t + (f(0) + f′(0))t^2.
9. Explain why the
equation t f′(t) = 1 has no solutions in C∞.
10. Let T(f) = t^2 f′(t) + 2t f(t).
a) Suppose that T(f) = 0. If g(t) = t^2 f(t), explain why g′(t) = 0.
b) Explain how to use the conclusion from a) to prove that kernel(T) = {0}.
c) Explain why the constant function 1 is not in the image of T.
10.2 Fourier series
In the preceding section, we looked at spaces of functions that behaved much like vectors in Rn, but we did not look at any analogues of the concepts of length, angle or dot product. In this section, we will discuss an example where these analogues are introduced and play a central role.
To set the stage, recall that if a and b are real numbers with a < b, then [a, b] denotes the interval in R of points t with a ≤ t ≤ b. Note that the end points of the interval are included.
Now
introduce the notation C[-π, π] to denote the collection of all
continuous functions from the interval [-π, π] to R. For example, t, sin(t), |t|, and 1/(4 − t)
are in C[-π,
π]. The last of these illustrates
the fact that we only care about the values when t has values between −π
and π. Since 4 > π, the
fact that 1/(4 − t) blows up as t → 4 has no bearing on its appearance in the space
C[-π, π]. On the other hand, 1/(2 − t)
is not in C[-π,
π] since it is not defined at the point t = 2, which is in the interval
between −π and π. Here is a
completely bounded and well defined function that is not in C[-π,
π]: The function f(t) that is
defined to be 1 where t > 0, 0 at t = 0 and −1 where t < 0. The jump discontinuity of f as t crosses zero
precludes its membership in C[-π, π].
As with C∞, the collection C[-π, π] is a linear space. Indeed, if t → f(t) and t → g(t) are in C[-π, π], then so is the function t → f(t) + g(t), as is t → r f(t) in the case that r is a real number.
We now define the analog of a dot product on C[-π, π]. For this purpose, let f(t) and g(t) be any two continuous functions that are defined where −π ≤ t ≤ π. Their dot product is then denoted by ⟨f, g⟩, a number that is computed by doing the integral
⟨f, g⟩
≡ (1/π) ∫_{-π}^{π} f(t) g(t) dt .
I hope to convince you that this has all of the salient features of the dot product on Rn. For example:
• ⟨f, g⟩ = ⟨g, f⟩.
• If r is a real number, then ⟨r f, g⟩ = r ⟨f, g⟩.
• If f, g and h are any three functions in C[-π, π], then ⟨f + g, h⟩ = ⟨f, h⟩ + ⟨g, h⟩.
• If f is not the constant function 0, then ⟨f, f⟩ > 0.
I’ll leave it to you to verify the first three. To verify the fourth, notice first that
⟨f, f⟩
= (1/π) ∫_{-π}^{π} f(t)^2 dt .
Now, the function t → f(t)^2 is non-negative, so the integral for ⟨f, f⟩ computes the area under the graph in the (t, y) plane of the function y = f(t)^2. Now, as f(t)^2 is non-zero at some point (since f is not the constant function 0), this graph rises above the axis at some point. Since f is continuous, it rises nearly as much above the axis at nearby points as well. Thus, there is some area under the graph, so ⟨f, f⟩ > 0.
For example, if you remember how to integrate t sin(t), you will find that the dot product between the functions t and sin(t) is
⟨t, sin(t)⟩
= (1/π) ∫_{-π}^{π} t sin(t) dt = 2.
(If you forgot how to integrate t sin(t), here is a hint: Think about integration by parts.)
For another example, the dot product between the constant function 1 and the function sin(t) is given by
⟨1, sin(t)⟩
= (1/π) ∫_{-π}^{π} sin(t) dt =
(1/π) (−cos(π) + cos(−π)) = 0 .
By analogy with the case of vectors
in Rn, we say that a pair of functions f and g from
C[-π, π] are ‘orthogonal’ in the case that ⟨f, g⟩ = 0. Thus, 1 and sin(t) are orthogonal, but t and
sin(t) are not.
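These dot products are easy to check numerically as well. Here is a small sketch assuming the numpy and scipy packages (the helper name dot is mine):

```python
import numpy as np
from scipy.integrate import quad

def dot(f, g):
    # The dot product <f, g> = (1/pi) times the integral of f*g over [-pi, pi].
    value, _ = quad(lambda t: f(t) * g(t), -np.pi, np.pi)
    return value / np.pi

print(dot(lambda t: t, np.sin))     # 2.0, so t and sin(t) are not orthogonal
print(dot(lambda t: 1.0, np.sin))   # 0.0, so 1 and sin(t) are orthogonal
```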
Just as we defined the length of a vector in Rn using the dot product, so we define the length of any given function f ∈ C[-π, π] to be
⟨f, f⟩^{1/2} .
The length of f is denoted here and elsewhere as || f ||, and this number is called the ‘norm’ of f. By analogy with the case of Rn, we define the distance between functions f and g from C[-π, π] to be
||
f − g || = ⟨f − g, f − g⟩^{1/2} .
Thus, the square of the distance between f and g is equal to
(1/π) ∫_{-π}^{π} (f(t) − g(t))^2 dt .
According to this definition of distance, f is close to g when f(t) is close to g(t) across [-π, π] as a whole. However, be forewarned that this definition doesn’t require that f(t) be close to g(t) at every t; only that they be suitably close for ‘most’ values of t. You will see this in the third and fourth examples below.
Here are some examples of norms and distances:
•
The constant function 1 has norm
|| 1 || = √2, since || 1 ||^2 is 1/π times the length of
[-π, π].
• The square of the norm of the function t is
⟨t, t⟩
= (1/π) ∫_{-π}^{π} t^2 dt = (2/3) π^2 .
Thus, the norm
of t is || t || = √(2/3) π.
• Let R be a positive real number. Then the distance between the function f(t) = t and the function g(t) = t + e^{−R|t|} is the square root of
(1/π) ∫_{-π}^{π} e^{−2R|t|} dt = (1/(πR)) (1 − e^{−2Rπ}) .
Note in particular that the larger the value of R, the smaller the distance, and as R → ∞, the distance in question limits to zero. Even so, |f(0) − g(0)| = 1 no matter how large R.
• Let R be a positive real number. Then the distance between the function f(t) = t and the function g(t) = t + R^{1/4} e^{−R|t|} is the square root of
(1/(π√R)) (1 − e^{−2Rπ}) .
Note that in this variation of the previous example, the distance between f and g again limits to zero as R → ∞, even though |f(0) − g(0)| = R^{1/4} now blows up as R → ∞. The point here and in the previous example is that two functions in C[-π, π] can be close and still have widely different values at some t. As remarked previously, their values need only be suitably close at most t ∈ [-π, π]. (I can’t criticize you for thinking that this phenomenon illustrates a serious defect in our notion of distance. The fact is that for some uses, other notions of distance are necessary for this very reason.)
Granted now that we have a notion of dot product for the linear space C[-π, π], we can introduce the notion of an ‘orthonormal’ set of functions. This notion is the analog of the notion of orthonormality that we used for vectors in Rn. In particular, a finite or infinite collection {f_1, f_2, …} of functions is deemed ‘orthonormal’ in the case that
|| f_k || = 1 for all k and ⟨f_j, f_k⟩ = 0 for all unequal j and k.
For example, the constant function
1/√2 and the function (√3/(√2 π)) t comprise a two element orthonormal set. Indeed, the computations done previously for
the norms of 1 and t justify the assertion that these two functions both have
norm 1. Meanwhile, to see that these two
functions are orthogonal, first note that the dot product between 1 and t is 1/π
times the integral of
t from −π to π. Then note that
the latter integral is zero since it is the difference between the values of
t^2/2 at t = π and t = −π. Here is another example: The set
{1/√2, (√3/(√2 π)) t, (3√5/(2√2 π^2)) (t^2 − (1/3) π^2)}
is also orthonormal.
You most probably will recognize the following facts as C[-π, π] analogs of assertions that hold for vectors in Rn:
• If {f_1, f_2, …, f_N} is an orthonormal set, then they are linearly independent and so form a basis for their span.
• If h and g are orthogonal functions in C[-π, π], then || h ± g ||^2 = || h ||^2 + || g ||^2.
• Suppose that V is a subspace of C[-π, π] and that f ∈ C[-π, π]. If g is in V and if f − g is orthogonal to all functions in V, then || f − g || ≤ || f − h || for all h in V. Moreover, this inequality is an equality only in the case that h = g.
• If {f_1, . . ., f_N} is an orthonormal basis for a subspace V ⊂ C[-π, π] and if f is any function in C[-π, π], then the function in V we call proj_V(f) that is given by
proj_V(f)(t) = ⟨f, f_1⟩ f_1(t) + ··· + ⟨f, f_N⟩ f_N(t)
is the closest function in V to f. Moreover, f − proj_V(f) is orthogonal to each element in V.
•
If V ⊂
C[-π, π] is a finite
dimensional subspace, then V has an
orthonormal basis.
The arguments for these last facts are essentially identical to those that prove the Rn analogs. For example, to prove the first point, assume that g(t) ≡ c_1 f_1(t) + ··· + c_N f_N(t) is zero for all t ∈ [-π, π], where c_1, . . . , c_N are constants. Now take the dot product of g with f_1 to find 0 = ⟨f_1, g⟩ = c_1 ⟨f_1, f_1⟩ + c_2 ⟨f_1, f_2⟩ + ··· + c_N ⟨f_1, f_N⟩. Because of the orthonormality, this equality boils down to 0 = c_1·1 + c_2·0 + ··· + c_N·0, so c_1 = 0. Take the dot product of g with f_2 to find that c_2 is zero, then f_3, etc.
As a second example, here is how to prove the fourth point: The first thing to note is that it suffices to prove that f − proj_V(f) is orthogonal to every function in V. Indeed, if this is the case, then the version of the second point above with h = f − proj_V(f) and g any function in V proves that proj_V(f) is the closest function in V to f. In any event, f − proj_V(f) is orthogonal to every function in V if and only if it is orthogonal to every basis function, that is, each of f_1, . . ., f_N. Computing the dot product of f with any given f_k finds ⟨f_k, f⟩, and this is precisely the same as the dot product of f_k with proj_V(f). Thus, the dot product of any given f_k with f − proj_V(f) is zero.
With
regard to the final point, you won’t be surprised to learn that the
Gram-Schmidt algorithm that we used in the case of Rn to
find an orthonormal basis works just fine in the case of C[-π, π]. For example, the linear span of the functions
1 and t^2 is a 2-dimensional subspace of C[-π, π]. Indeed, if c_1 + c_2 t^2
is zero for all t with c_1 and c_2 constant, then it is
zero at t = 0 and so c_1 = 0.
It is also zero at t = 1, and so c_2 = 0 as well. To find an orthonormal basis, we first
divide the constant function 1 by its norm to get a function with norm 1. The latter is 1/√2. Next, we note that t^2
− ⟨1/√2, t^2⟩ (1/√2)
= t^2 − (1/3) π^2 is orthogonal to 1/√2. Thus, we get an
orthonormal basis for the span of {1, t^2} by using 1/√2 as the first basis
element, and using for the second the function that you get by dividing the
function t^2 − (1/3)π^2 by its norm, that is, by the square root of 1/π times the integral from −π
to π of (t^2 − (1/3)π^2)^2.
Left unsaid in the final point above is whether any given infinite dimensional subspace of C[-π, π] has an orthonormal basis. The answer depends to some extent on how this question is interpreted. In any event, the next fact asserts that C[-π, π] itself has an infinite orthonormal basis. Moreover, this basis ‘spans’ C[-π, π] in a certain sense that is explained below. The fact is that C[-π, π] has many such bases, but only the most commonly used one is presented below.
Fact 10.2.1: The
collection {1/√2, cos(t), sin(t), cos(2t), sin(2t), cos(3t), sin(3t), ··· }
is an orthonormal set of functions in C[-π,
π].
This fact is proved by verifying that the following integrals have the asserted values:
• ⟨1/√2, 1/√2⟩ = (1/π) ∫_{-π}^{π} (1/2) dt = 1.
• ⟨1/√2, cos(nt)⟩ = (1/π) ∫_{-π}^{π} (1/√2) cos(nt) dt = 0 for any n ≥ 1.
• ⟨1/√2, sin(nt)⟩ = (1/π) ∫_{-π}^{π} (1/√2) sin(nt) dt = 0 for any n ≥ 1.
• ⟨cos(nt), cos(nt)⟩ = (1/π) ∫_{-π}^{π} cos^2(nt) dt = 1 for any n ≥ 1.
• ⟨sin(nt), sin(nt)⟩ = (1/π) ∫_{-π}^{π} sin^2(nt) dt = 1 for any n ≥ 1.
• ⟨cos(nt), sin(mt)⟩ = (1/π) ∫_{-π}^{π} cos(nt) sin(mt) dt = 0 for any n and m.
• ⟨cos(nt), cos(mt)⟩ = (1/π) ∫_{-π}^{π} cos(nt) cos(mt) dt = 0 for any n ≠ m ≥ 1.
• ⟨sin(nt), sin(mt)⟩ = (1/π) ∫_{-π}^{π} sin(nt) sin(mt) dt = 0 for any n ≠ m ≥ 1.
To explain the sense in which the basis in Fact 10.2.1 spans C[-π, π], let me introduce, for each positive integer N, the subspace T_N ⊂ C[-π, π] that is given by the span of
{1/√2, cos(t), sin(t), ··· , cos(Nt), sin(Nt)} .
If f is any given function in C[-π, π], one can then define the projection of f onto T_N. This is the function
proj_{T_N} f ≡ a_0
+ a_1 cos(t) + b_1 sin(t) + ··· + a_N
cos(Nt) + b_N sin(Nt) ,
where
a_0
= (1/2π) ∫_{-π}^{π} f(t) dt, a_k
= (1/π) ∫_{-π}^{π} cos(kt) f(t) dt, and
b_k = (1/π) ∫_{-π}^{π} sin(kt) f(t) dt .
With this notation set, here is what I mean by ‘span’:
Fact 10.2.2: Let f be any function in C[-π, π]. Then lim_{N→∞} || f − proj_{T_N} f || =
0.
Moreover, if the
derivative of f is defined and
continuous, then lim_{N→∞} (proj_{T_N} f)(t) = f(t) if t lies
strictly between −π and π.
This assertion also holds at t = π and at t = −π in the
case that f(π) = f(−π). In any event, whether f is or is not differentiable, the infinite
series
2a_0^2 +
a_1^2 + b_1^2 + ··· + a_k^2
+ b_k^2 + ··· is
convergent and its limit is || f ||^2 = (1/π) ∫_{-π}^{π} f(t)^2 dt.
By virtue of Fact 10.2.2, one often sees a given function f ∈ C[-π, π] written as
f(t)
= a_0
+ ∑_{k≥1} (a_k cos(kt) + b_k sin(kt)) ,
where the coefficients {a_k, b_k} are given just prior to Fact 10.2.2. Such a representation of f is called its ‘Fourier series’ after the mathematician who first introduced it, Jean-Baptiste-Joseph Fourier. (Fourier was born in 1768 and lived until 1830.)
In any event, the Fourier series for a given function f exhibits f as a sum of trigonometric functions, and Fact 10.2.2 asserts the rather remarkable claim that every continuous function on the interval [-π, π] can be suitably approximated by such a sum.
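If you would rather not do the coefficient integrals by hand, here is a minimal sympy sketch for the function f(t) = t (assuming sympy; sympy may arrange the signs in the answer differently):

```python
import sympy as sp

t = sp.symbols('t')
k = sp.symbols('k', integer=True, positive=True)
f = t   # the function whose Fourier coefficients we want

a0 = sp.integrate(f, (t, -sp.pi, sp.pi)) / (2*sp.pi)
ak = sp.integrate(f * sp.cos(k*t), (t, -sp.pi, sp.pi)) / sp.pi
bk = sp.integrate(f * sp.sin(k*t), (t, -sp.pi, sp.pi)) / sp.pi
print(a0, sp.simplify(ak), sp.simplify(bk))   # 0, 0 and 2*(-1)**(k + 1)/k
```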
The proof
of Fact 10.2.2 is subtle and, but for the next remark, goes beyond what we will
cover in this course. If the series a_0
+ a_1 cos(t) + b_1 sin(t) + ··· is
convergent at each t with limit f(t), then the convergence of the infinite
series 2a_0^2 + a_1^2
+ b_1^2 + ··· is an automatic consequence of the fact that
the collection {1/√2, cos(t), sin(t), ···} is an orthonormal set of
functions. To see why, take some large
integer N and write
f
= proj_{T_N} f + (f − proj_{T_N} f) .
Now, as discussed earlier, the two terms on the right hand side of this equation are orthogonal. This then means that
|| f ||^2 =
|| proj_{T_N} f ||^2 + || f − proj_{T_N} f ||^2 .
By virtue of the fact that {1/√2, cos(t), sin(t), ···}
is orthonormal, the first term on the right hand side of this last
equation is 2a_0^2 + a_1^2 + b_1^2
+ ··· + a_N^2 + b_N^2. As a consequence, we see that
||
f ||^2 = 2a_0^2 + a_1^2 + b_1^2
+ ··· + a_N^2 + b_N^2 + || f − proj_{T_N} f ||^2 .
Thus, under the assumption that the limit as N → ∞ of the far right term above is zero, we then have our derivation of the asserted limit for the infinite sum 2a_0^2 + a_1^2 + b_1^2 + ··· .
Here are some examples:
• t
= 2 ∑_{k≥1} ((−1)^{k+1}/k) sin(kt) .
• t^2 =
(1/3) π^2 + 4 ∑_{k≥1} ((−1)^k/k^2) cos(kt) .
• e^t =
(1/π)(e^π − e^{−π}) [1/2 + ∑_{k≥1} (−1)^k ((1/(1+k^2)) cos(kt) − (k/(1+k^2)) sin(kt))] .
As you can see, the Fourier series of some very simple functions have infinitely many terms.
When looking at the first example above, what do you make of the fact that π is definitely not zero, but sin(kπ) is zero for all k? In particular, the asserted ‘equality’ between the right and left hand sides in the first example is definitive nonsense at t = π. Even so, this does not violate the assertion of Fact 10.2.2 because the function t obviously does not have the same value at π as it does at –π. With regards to Fact 10.2.2, the equality in the first example holds only in the following sense:
lim_{N→∞} (1/π) ∫_{-π}^{π} (t − 2 ∑_{1≤k≤N} ((−1)^{k+1}/k) sin(kt))^2 dt
= 0.
Thus, the equality in the first point holds at ‘most’ values of t in [-π, π], but not at all values of t.
Contrast
this with the equality between t and its Fourier series at t = π/2. According to Fact
10.2.2, the equality does indeed hold here, and so we obtain the following
remarkable equality:
π/4 = 1 − 1/3 + 1/5 − ··· .
Other fantastic sums can be had by evaluating the right hand side of the equality between t^2 and its Fourier series at some special cases. For example, the respective t = 0 and t = π cases yield
(1/12) π^2 = 1 − 1/4 + 1/9 − 1/16 + ··· and
(1/6) π^2 = 1 + 1/4 + 1/9 + 1/16 + ··· .
By the way, the second of these equalities is equivalent to
the assertion in Fact 10.2.2 that the value, (2/3) π^2, of || t ||^2 is equal to the
sum of the squares of the coefficients that appear in front of the various
factors of sin(kt) in the Fourier series expansion given above for t.
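Both of these sums are easy to check numerically; here is a quick numpy sketch (truncation at 200,000 terms is an arbitrary choice):

```python
import numpy as np

k = np.arange(1, 200001)
print(np.sum((-1.0)**(k + 1) / k**2), np.pi**2 / 12)   # both about 0.822467
print(np.sum(1.0 / k**2), np.pi**2 / 6)                # both about 1.644934
```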
Here are the key notions to remember from 10.2:
•
The space C[-π, π] has
a dot product whereby the dot product of any given two functions f and g is
equal to (1/π) ∫_{-π}^{π} f(t) g(t) dt . This is
denoted by ⟨f, g⟩.
• The norm of a function f is ⟨f, f⟩^{1/2}; it is positive unless f is the constant function 0.
• The distance between any two given functions f and g is the norm of f − g.
• Most constructions in Rn that use the dot product work as well here. In particular, any finite dimensional subspace has an orthonormal basis, and one can use this basis to define the projection onto the subspace.
•
There is an orthonormal basis for
C[-π, π] that consists of the constant function 1/√2
plus the collection
{cos(kt), sin(kt)}_{k=1,2,…} .
Any given function f can be depicted using this basis as
f(t)
= a_0 + ∑_{k≥1} (a_k cos(kt)
+ b_k sin(kt)),
where
a_0 = (1/2π) ∫_{-π}^{π} f(t) dt, a_k
= (1/π) ∫_{-π}^{π} cos(kt) f(t) dt and b_k = (1/π) ∫_{-π}^{π} sin(kt) f(t) dt.
• The convergence of the series above to f(t) might not occur at all values of t, but in any event, the integral from −π to π of the square of the difference between f and the series truncated after N terms tends to zero as N tends to infinity.
Exercises
1. Find an orthonormal basis for the subspace of C[-π, π] spanned by {1, e^t, e^{−t}} and then
compute the projection of the function t onto this subspace.
2. Find the Fourier series for the function |t| on the interval [-π, π].
3. If a is a real constant, find the Fourier series for cosh(at) on the interval [-π, π] and use
the result to derive a closed form formula
for ∑_{k≥1} 1/(a^2 + k^2) .
4. Let r ∈ R.
Prove that the collection
{1/√2} ∪ {cos(k(t−r)), sin(k(t−r))}_{k=1,…} is an
orthonormal basis for C[-π+r, π+r] using the dot product that assigns to any two given functions f and g the number
(1/π) ∫_{-π+r}^{π+r} f(t) g(t) dt.
5. Let a < b be
real numbers. Prove that the constant
function 1/√2
plus the collection
given by {cos((2π/(b−a)) k (t − (a+b)/2)), sin((2π/(b−a)) k (t − (a+b)/2))}_{k=1,…} is an orthonormal basis for C[a, b] if the dot product is
such as to assign to any two functions f and g the number
(2/(b−a)) ∫_a^b f(t) g(t) dt.
10.3 Partial differential equations I: The heat/diffusion equation
There are significant applications of Fourier transforms in the theory of partial differential equations. In this regard, our discussion will focus on three very special, but often met equations: The heat/diffusion equation, Laplace’s equation and the wave equation. This section studies the first of these.
The heat equation and the diffusion equation are one and the same, although they arise in different contexts. For the sake of simplicity, we call it the heat equation. Here it is:
Definition 10.3.1: The heat or diffusion equation is for a function, T(t, x), of time t and position x. The equation involves a positive constant, m, and has the form
∂T/∂t = m ∂^2T/∂x^2 .
As is plainly evident, the heat equation relates one time derivative of T to two spatial derivatives. A typical problem is one where the interest is focused only on points x in some interval [a, b] ⊂ R with T some given function of x at time zero. The task then is to solve the heat equation for T(t, x) at times t > 0 and points x ∈ [a, b]. Often, there are constraints imposed on T at the endpoints x = a and x = b that are meant to hold for all t.
Here is a sample problem: Take a = -π and b = π so that the focus is on values of x in [-π, π]. Suppose that we are told that T(0, x) = f(x) with f some given function of x for x Î [-π, π]. The task is to find the functional form of T at all times t > 0.
Before we pursue this problem, let me explain where this equation comes from. (My apologies to the graduates of Math 21a who may have seen something very much like the explanation that follows). The preceding equation is known as the heat equation because it is used with great accuracy to predict the temperature of a long, but relatively thin rod as a function of time and position, x, along the rod. Thus T(t, x) is the temperature at time t and position x. The constant m that appears measures something of the thermal conductivity of the rod.
The theoretical underpinnings of this equation are based on our understanding of the temperature of a small section of the bar as measuring the average energy in the random motions of the constituent atoms. Heat ‘flows’ from a high temperature region to a low temperature one because collisions between the constituent atoms tend to equalize their energy. In this regard, you most probably have noticed that when a fast moving object strikes a slower one (for example, in billiards), the faster one is almost always slowed by the collision while the slower one speeds up.
In any event, it is an experimental
fact that a low energy region adjacent to a high energy one will tend to gain
energy at the expense of the higher energy region. A simple way to model this in a quantitative
fashion is to postulate that the rate of flow of energy across any given slice
of the rod at any given time has the form −m ∂T/∂x, where m is a
positive constant and where the derivative is evaluated at the x-coordinate of
the slice and at the given value of t.
Note that the minus sign here is dictated by the requirement that the
flow of energy is from a high temperature region to a low temperature one.
Granted such a postulate, what
follows is an argument for an equation that predicts the temperature as a
function of time. Remembering that
temperature measures the energy in the random motions of the particles, let us do
some bookkeeping to keep track of the energy in a small width section, [x, x+dx], of the rod. Here, I take dx
> 0 but very small. Think of T(t, x)dx as measuring the energy in this section of
the rod. The time derivative of T(t, x)dx measures the net rate of energy coming
into and leaving the section of rod. The
net flow (positive or negative) of energy into our section of the bar is a sum
of two terms: One is the flow across the
left hand edge of the section, this being −m (∂T/∂x)|_x; and the other is the flow across the right
hand edge, this equal to +m (∂T/∂x)|_{x+dx}. Note the appearance of the + sign, since flow
into our region across the right hand edge is flow in the direction that makes
the bar’s coordinate decrease.
Summing these two terms finds
(∂T/∂t)(t, x) dx = m (∂T/∂x)|_{x+dx} − m (∂T/∂x)|_x .
To end the derivation, divide both sides by dx and observe that
(1/dx) ((∂T/∂x)|_{x+dx} − (∂T/∂x)|_x) ≈ ∂^2T/∂x^2
when dx is very small.
In any event, the task before us is to solve the heat equation in Definition 10.3.1 for T(t, x) at values of t ≥ 0 and x ∈ [-π, π] given that T(0, x) = f(x). To explain how this is done, introduce the space, C∞[-π, π], of infinitely differentiable functions of x ∈ [-π, π] and then view the assignment
h(x)
→ d^2h/dx^2
as defining a linear operator on this space. (The operator is, of course, linear, because
the second derivative of a sum of functions is the sum of the second
derivatives, and the second derivative of a constant times a function is equal
to the same constant times the second derivative of the function.) It is customary to call this linear operator
the ‘Laplacian’ and denote it by Δ. Our heat equation then asks for a function T
that obeys the equation
∂T/∂t = m ΔT.
As I hope you recall, we dealt with
equations of just this form in the case that T was a vector in Rn
and the operator was a linear operator from Rn
to itself. In the latter case, we were
able to find explicit solutions when the linear operator on Rn
was diagonalizable. Let me remind you of
how this went: Supposing, for the
moment, that A is a diagonalizable linear operator on Rn,
let {e_1, …, e_n} denote its set of associated
eigenvectors, a basis for Rn. Each eigenvector has its associated
eigenvalue, a real or complex number.
The eigenvalue associated to e_k is denoted here by λ_k. Now suppose that v_0 is a given
vector in Rn and suppose that we want to find the vector-valued
function of time, t → v(t), that obeys
the equation
dv/dt = Av subject to the
constraint that v(0) = v_0. We
do this by first writing v_0 in terms of the basis {e_k} as
v_0 = ∑_k a_k e_k with each a_k
a scalar. This done, then
v(t)
= ∑_k
a_k e^{λ_k t} e_k .
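As a reminder of how this goes in the finite dimensional case, here is a short numpy sketch; the particular matrix A and vector v0 are illustrative choices of mine:

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])     # diagonalizable, with eigenvalues -1 and -2
v0 = np.array([1.0, 0.0])

lam, E = np.linalg.eig(A)        # columns of E are the eigenvectors e_k
a = np.linalg.solve(E, v0)       # expand v0 = sum_k a_k e_k

def v(t):
    # v(t) = sum_k a_k e^{lambda_k t} e_k
    return (E * np.exp(lam * t)) @ a

print(v(0.0))                    # recovers v0
print(v(1.0))                    # the solution at time t = 1
```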
Our strategy for solving the heat equation in Definition 10.3.1 for a function T(t, x) of x ∈ [-π, π] subject to the initial condition T(0, x) = f(x) is the infinite dimensional analog of that just described. This understood, our first step is to find a basis for the functions on [-π, π] that consists of eigenvectors of the linear operator Δ. This might seem like a daunting task were it not for the seemingly serendipitous fact that every function in the Fourier basis
{1/√2, cos(x), sin(x), cos(2x), sin(2x), cos(3x), sin(3x), ··· }
is an eigenfunction of Δ. Indeed,
Δ(1/√2) = 0, and for each k
> 0,
Δ cos(kx) = −k^2 cos(kx) and
Δ sin(kx) = −k^2 sin(kx).
Thus, we have the following observation:
Fact 10.3.2: Let f(x) denote any given continuous function on [-π, π] with continuous derivative, and write its Fourier series as
f(x)
= a_0
+ ∑_{k≥1} (a_k cos(kx) + b_k sin(kx)) .
Then the function
T(t,
x) = a_0
+ ∑_{k≥1} e^{−m k^2 t} (a_k
cos(kx) + b_k
sin(kx))
solves the heat equation with initial condition T(0, x) = f(x) for all x ∈ (-π, π). If it is also the case that f(π) = f(-π), then it is also the case that T(t, π) = T(t, -π) for all t ≥ 0 and these are equal to f(π) at t = 0.
Here is a first example: Suppose that f(x) = π^2 − x^2. We found its Fourier series in the previous part of this chapter:
f(x)
=
(2/3) π^2 − 4 ∑_{k≥1} ((−1)^k/k^2)
cos(kx) .
In this case, the function T(t, x) given by Fact 10.3.2 is
T(t,
x) =
(2/3) π^2 − 4 ∑_{k≥1} ((−1)^k/k^2) e^{−m k^2 t}
cos(kx) .
Here is a second example: Take f(x) = e^x. From one of the examples in the previous part of this chapter, we see that the function T(t, x) that is given by Fact 10.3.2 in this case is
T(t,
x) =
(1/π)(e^π − e^{−π}) [1/2 + ∑_{k≥1} (−1)^k e^{−m k^2 t}
((1/(1+k^2)) cos(kx) − (k/(1+k^2)) sin(kx))] .
In all
fairness, I should point out that there is some tricky business here that
doesn’t arise in the finite dimensional model problem
dv/dt = Av. In particular, there are non-zero solutions
to the heat equation for x ∈ [-π,
π] whose time zero restriction is the constant function f(x) ≡ 0 for all x! Indeed, choose any point, a, that is not
in the interval [-π, π], and the function
T(t,
x) = (1/√t) e^{−(x−a)^2/(4mt)}
solves the heat equation for t > 0. Moreover, in spite of the factor 1/√t, its t → 0 limit at
points x ∈ [-π, π] is
zero. (Here is where the condition a ∉ [-π, π] is crucial.) The point is that the factor (x−a)^2/(4mt) blows up as t →
0 if x ≠ a, and so its negative exponential is tiny and converges to zero
as t → 0. This convergence is much faster than the rate
of blow up of 1/√t.
Indeed, to see that
this is so, consider that the time derivative of
T(t, x) is
(−1/(2t) + (x−a)^2/(4mt^2)) T(t, x),
which is positive when
t < (x−a)^2/(2m).
Thus,
T(t, x) is increasing with t for small t as long as x ≠
a. Therefore, since
T(t, x) is not negative, it must have a limit as t → 0 from the positive side. Since the exponential factor e^{−(x−a)^2/(4mt)} vanishes as t → 0 faster than any power of t can blow up, this limit is zero; that is, T(0, x) = 0 for every x ∈ [-π, π].
The existence of solutions such as the one just given is the manifestation of some facts about heat and diffusion that I haven’t mentioned but surely won’t surprise you if you have lived in a drafty old house: The distribution of heat in a room is not completely determined by the heat at time zero because you must take into account the heat that enters and leaves through the walls of the room. Thus, in order to completely pin down a unique solution to the heat equation, the function of x given by T(0, x) must be specified—this corresponds to the heat distribution at time zero in our hypothetical rod—but the functions T(t, π) and T(t, -π) of time must also be specified so as to pin down the amount of heat that enters and leaves the ends of our hypothetical rod.
Any specified function T(t, π) is called a boundary condition seeing as it is a condition on the solution that is imposed on the boundary of the rod. For example, specifying T(t, π) = 0 for all t tells us that the ends of the rod are kept at zero temperature. The existence of solutions to the heat equation with prescribed boundary conditions is an important subject, but one that we won’t pursue in this course.
1. Solve the heat equation for a function T(t, x) of t ≥ 0 and x ∈ [-π, π] that obeys the
initial condition T(0, x) = sin^2(x) − cos^4(x). (Rather than do the integrals for the Fourier series, take the following shortcut: Use standard trigonometric identities to write T(0, x) as a sum of sine and cosine functions.)
2. Use Fourier series to solve the heat equation for a function T(t, x) of t ≥ 0 and
x ∈ [-π, π] that obeys the initial condition T(0, x) = sinh(x). You can avoid many of the integrals by exploiting the Fourier series solution for the initial condition e^x given above.
3. Suppose that c is a constant. Prove that T(t, x) = e^{c^2 m t} e^{cx} solves
the heat equation.
4. Take the case c = 1 in the previous problem and prove that the resulting solution of
the heat equation with the initial condition T(0, x) = e^x is not the same as the one given in the text, above. (Hint: Compare the corresponding Fourier series.)
5. Use Fourier series to solve the heat equation for a function T(t, x) for t ≥ 0 and
for x ∈ [-π, π] subject to the initial condition T(0, x) = x.
6. Prove that T(t, x) = x is also a solution to the heat equation for t ≥ 0 and x ∈ [-π, π]
with the initial condition T(0,
x) = x. Prove that it is different from
the one you found in Problem 5 using Fourier series.
10.4 Partial differential equations II: The Laplace and wave equations
The discussion that follows explores some features of two other very commonly met differential equations, one called the ‘Laplace equation’ and the other called the ‘wave equation’.
The discussion starts with the Laplace equation. This equation is for a function, u, of two space variables, x and y. Here is the definition:
Definition 10.4.1: A function u that is defined on some given region in the x-y plane is said to obey the Laplace equation in the case that
∂^2u/∂x^2 + ∂^2u/∂y^2 = 0
at all points (x, y) in the given region.
Versions of this equation arise in numerous areas in the sciences. Those of you who plan to take a course about electricity and magnetism will see it. Likewise, if you study the analog of our heat/diffusion equation for a thin plate shaped like the given region in the x-y plane, you will see that time independent solutions to the heat/diffusion equation are solutions to the Laplace equation. Indeed, this is because the two dimensional version of the heat equation is for a function T(t, x, y) of time and the space coordinates x and y that obeys the equation
∂T/∂t = m (∂^2T/∂x^2 + ∂^2T/∂y^2) .
If T is an equilibrium solution to this last equation, then it depends only on the space coordinates x and y and so supplies a solution to the Laplace equation.
Here is a basic fact about the Laplace equation and its solutions: Suppose that R is a bounded region in the x-y plane whose boundary is some finite union of segments of smooth curves. Suppose in addition that f is a continuous function that is defined on the boundary of R with well defined directional derivatives. Then there is a unique solution in R to the Laplace equation that is smooth at points inside R and whose restriction to the boundary is equal to f.
To explain some of the terminology, a segment of a smooth curve is a connected part of a level set of some function, h(x, y), where the associated gradient vector is nonzero. In this regard, h is assumed to have partial derivatives to all orders with respect to the variables x and y.
If you haven’t yet taken a multivariable calculus course, this explanation and the constraints on the region most probably seem like mumbo-jumbo. If so, don’t fret because the discussion that follows concentrates exclusively on the case where the region R is the square where –π ≤ x ≤ π and –π ≤ y ≤ π. With this proviso understood, here is a formal restatement of what was just said:
Fact 10.4.2: Consider the square where both −π ≤ x ≤ π and −π ≤ y ≤ π. Suppose that f is any given continuous function that is defined on the boundary of the square. Suppose, in addition, that f has bounded y-derivative along the two vertical segments of the boundary and bounded x-derivative along the two horizontal segments of the boundary. Then there is a unique solution to the Laplace equation in the square that is smooth at points inside the square and whose restriction to the boundary is equal to f.
To see how this Fact plays out, consider first the example where k is a positive integer and where the given function f on the boundary of the square is equal to sin(ky) on the two vertical parts of the boundary, and is equal to zero on the two horizontal parts. In this case, I proceed by assuming that the solution u(x, y) has the form
u(x, y) = c(x) sin(ky)
where c is some function of x that is constrained so that
c(π) = c(-π) = 1 .
You are rightly asking why I chose this very particular form for u(x, y). I chose this form because I know that it works! Most probably, the first person (Laplace?) to try this form for u would not have given you a good answer as to why it was done. This said, you would be surprised at the number of so-called ‘brilliant’ scientific advances that owe allegiance to the ‘guess and check’ school.
Anyway, grant me the right to at least give u(x, y) = c(x) sin(ky) a try. This is a function of the form g(x) h(y), and if such a function is plugged into the Laplace equation, all the x-derivatives hit g(x) and all the y-derivatives hit h(y). In the present case, I find that my u(x, y) = c(x) sin(ky) solves the Laplace equation provided that the function c(x) solves the equation
d²c/dx² - k² c = 0 .
Except for renaming the coordinate x as t, this is precisely the sort of equation that we considered in Section 10.1. In particular, we learned in Section 10.1 that the general solution has the form
c(x) = a e^{kx} + b e^{-kx}
where a and b are constants. The question thus is as follows: Can I choose a and b so that the conditions c(π) = c(-π) = 1 hold?
This can be viewed as finding a simultaneous solution to the linear equations
a e^{kπ} + b e^{-kπ} = 1 and a e^{-kπ} + b e^{kπ} = 1 .
As we saw earlier in the course, there is a unique solution to these equations when the matrix

M = | e^{kπ}   e^{-kπ} |
    | e^{-kπ}  e^{kπ}  |

is invertible. As det(M) = e^{2kπ} - e^{-2kπ} > 0, this is indeed the case, and inverting M finds that
a = b = 1/(e^{kπ} + e^{-kπ}) .
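If you prefer a numerical check of this little linear algebra step, the following sketch of mine solves the 2 × 2 system with numpy for the hypothetical mode number k = 3 and compares the answer with the closed form above:

import numpy as np

k = 3   # hypothetical mode number, for illustration only
M = np.array([[np.exp(k * np.pi), np.exp(-k * np.pi)],
              [np.exp(-k * np.pi), np.exp(k * np.pi)]])
a, b = np.linalg.solve(M, np.array([1.0, 1.0]))
print(a, b)                                             # the two constants agree
print(1.0 / (np.exp(k * np.pi) + np.exp(-k * np.pi)))   # matches the closed form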
Thus, our solution in this case is

u(x, y) = (e^{kx} + e^{-kx}) sin(ky) / (e^{kπ} + e^{-kπ}) .
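You can also let sympy confirm both that this u solves the Laplace equation and that it equals sin(ky) on the vertical parts of the boundary. This is a sketch of mine with the arbitrary choice k = 2:

import sympy as sp

x, y = sp.symbols('x y')
k = sp.Integer(2)   # any positive integer works the same way
u = (sp.exp(k*x) + sp.exp(-k*x)) * sp.sin(k*y) / (sp.exp(k*sp.pi) + sp.exp(-k*sp.pi))
print(sp.simplify(sp.diff(u, x, 2) + sp.diff(u, y, 2)))   # 0: Laplace equation holds
print(sp.simplify(u.subs(x, sp.pi) - sp.sin(k*y)))        # 0: boundary value is sin(ky)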
As with the heat equation, the Laplace equation is a linear equation. This is to say that the sum of any two solutions is a solution, and the product of any solution and any real number is a solution. Granted this, what we just did enables us to find a solution, u(x, y), to the Laplace equation in the case that the given function f is zero on the horizontal parts of the boundary and has the Fourier series
f(y)|x=±π = ∑k=1,2,… ak sin(ky)

on the vertical parts of the boundary. Here, each ak is a constant. Indeed, the solution for this case is simply the sum of those for the cases where f was ak sin(ky):
u(x, y) = ∑k=1,2,… ak (e^{kx} + e^{-kx}) sin(ky) / (e^{kπ} + e^{-kπ}) .
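To get actual numbers from a series like this, one truncates it. Note that (e^{kx} + e^{-kx})/(e^{kπ} + e^{-kπ}) is the same as cosh(kx)/cosh(kπ), which is numerically better behaved for large k. The sketch below is my own illustration, with hypothetical coefficients ak = 1/k²:

import numpy as np

def laplace_series(x, y, coeffs):
    # truncated version of u(x, y) = sum_k a_k cosh(kx)/cosh(k*pi) sin(ky)
    total = 0.0
    for k, a_k in enumerate(coeffs, start=1):
        total += a_k * np.cosh(k * x) / np.cosh(k * np.pi) * np.sin(k * y)
    return total

coeffs = [1.0 / k**2 for k in range(1, 51)]   # hypothetical coefficients a_k = 1/k^2
print(laplace_series(0.0, 1.0, coeffs))       # the value of the truncated sum at (0, 1)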
Fourier series can also be used to write down the solution to Laplace’s equation in the most general case from Fact 10.4.2: the one where the boundary function f is non-zero at points on both the horizontal and vertical parts of the boundary of the square. You will be asked to explore some aspects of this in the exercises.
Turn now to the story for the wave equation. The simplest example is an equation for a function, u(t, x), that is defined for all values of t ∈ R and for values of x that range over some interval [a, b]. Here is the definition:
Definition 10.4.3: Suppose that a positive number, c, and numbers a < b have been specified. A function, u, of the variables t and x where t ∈ R and x ∈ (a, b) is said to obey the wave equation in the case that

∂²u/∂t² - c² ∂²u/∂x² = 0

at all values of t ∈ R and x ∈ (a, b).
The wave equation is typically augmented with boundary conditions for u at the points where x = a and x = b. To keep the story short, we will only discuss the case where u is constrained so that
u(t, a) = 0 and u(t, b) = 0 for all t.
It is often the case that one must find a solution to the wave equation subject to additional conditions that constrain the value of u and its time derivative at t = 0. These are typically of the following form: Functions f(x) and g(x) on [a, b] are given that both vanish at the endpoints. A solution u(t, x) is then sought for the wave equation subject to the boundary conditions u(t, a) = 0 = u(t, b) and to the initial conditions
u(0, x) = f(x) and (∂u/∂t)(0, x) = g(x) for all x ∈ [a, b].
The equation in Definition 10.4.3 is called the wave equation because it is used to
model the wave-like displacements (up/down) that are seen in vibrating
strings. In this regard, such a model
ignores gravity, friction and compressional effects as it postulates an
idealized, tensed string whose equilibrium configuration stretches along the
x-axis from where x = a to x = b, and whose ends are fixed during the
vibration. The constant c that appears in the wave equation determines the fundamental frequency of the vibration, c/(2(b - a)).
To elaborate, u(t, x) gives the z-coordinate of the string at time t over the point x on the x-axis. The boundary conditions u(t, a) = 0 = u(t, b) keep the ends of the string fixed during the vibration. The initial conditions specify the state of the string at time 0. For example, in the case that g ≡ 0, the string is started at time zero at rest, but with a displacement at any given x equal to f(x). As it turns out, such an idealization is quite accurate for small displacements in tautly stretched real strings. For example, the behavior of violin and other musical instrument strings is well described by the wave equation.
Somewhat more complicated versions of the wave equation are also used to model the propagation of sound waves, water waves, electromagnetic waves (such as light and radio waves), and sundry other wave-like phenomena.
The following summarizes what can be said about the existence of solutions:
Fact 10.4.4: Let f(x) and g(x) be any two given, smooth functions on an interval where a ≤ x ≤ b that are zero at the endpoints. Then there is a unique function, u(t, x), that is defined for all t and for x ∈ [a, b], and has the following properties:

• u(t, x) obeys the wave equation for all t and for all points x with a < x < b.
• u(t, a) = u(t, b) = 0 for all t.
• u(0, x) = f(x) and (∂u/∂t)(0, x) = g(x) for all x ∈ [a, b].
To keep the subsequent examples relatively simple, consider henceforth only the case where a = -π and b = π. The challenge before us is to solve the wave equation in this context where the initial conditions have u(0, x) = sin(kx) and where

(∂u/∂t)|t=0 = 0

at all x. Here, k ∈ {1, 2, …}.
With the benefit of much hindsight, I now propose looking for a solution, u(t, x), having the form u(t, x) = h(t) sin(kx). Note that this guess has the virtue of satisfying the required boundary conditions that u(t, ±π) = 0. Plugging h(t) sin(kx) into the wave equation, I find that the latter equation is obeyed if and only if the function h(t) obeys the equation
d²h/dt² + c²k² h = 0

subject to the initial conditions h(0) = 1 and (dh/dt)|t=0 = 0.
According to Fact 10.1.3, the general solution to this last equation is
h(t) = a sin(ckt) + b cos(ckt)
where a and b are constants. The conditions h(0) = 1 and (dh/dt)|t=0 = 0 require that a = 0 and b = 1.
Thus, our solution u(t, x) is
u(t, x) = cos(ckt) sin(kx) .
As you can see, this solution is periodic in time, with period equal to 2π/(ck).
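Here too, a machine check is quick. This sympy sketch of mine verifies that cos(ckt) sin(kx) obeys the wave equation, with c and k left symbolic:

import sympy as sp

t, x = sp.symbols('t x')
c, k = sp.symbols('c k', positive=True)
u = sp.cos(c*k*t) * sp.sin(k*x)
# plug u into the wave equation u_tt - c^2 u_xx
print(sp.simplify(sp.diff(u, t, 2) - c**2 * sp.diff(u, x, 2)))   # prints 0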
Now, you are invited to check that the following linearity conditions are fulfilled:
Fact 10.4.5: The sum of any
two solutions to the wave equation is also a solution, as is the product of any
solution by any real number.
Granted this, we can use our solutions for the initial conditions u(0, x) = sin(kx) and (∂u/∂t)(0, x) = 0 to write down, using Fourier series, the solution to the wave equation for the initial conditions u(0, x) = f(x) and (∂u/∂t)(0, x) = 0 in the case that f(x) has only sine functions in its Fourier series. To elaborate, suppose that f(x) has the Fourier series

f(x) = ∑k=1,2,… ak sin(kx) where each ak is a real number .
It then follows that the corresponding wave equation solution u(t, x) with the initial conditions u(0, x) = f(x) and (∂u/∂t)(0, x) = 0 is given by the sum
u(t, x) = ∑k=1,2,… ak cos(ckt) sin(kx) .
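As with the Laplace series, truncation turns this formula into numbers. The sketch below is my own illustration with hypothetical coefficients a1 = 1 and a3 = 1/9 (all others zero); it evaluates the string profile at a chosen time:

import numpy as np

def wave_series(t, x, coeffs, c=1.0):
    # truncated version of u(t, x) = sum_k a_k cos(c k t) sin(k x) on [-pi, pi]
    return sum(a_k * np.cos(c * k * t) * np.sin(k * x)
               for k, a_k in enumerate(coeffs, start=1))

coeffs = [1.0, 0.0, 1.0 / 9.0]            # hypothetical coefficients: a_1 = 1, a_3 = 1/9
xs = np.linspace(-np.pi, np.pi, 9)
print(wave_series(0.5, xs, coeffs))       # string profile at time t = 0.5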
Exercises
1. This problem explores the use of Fourier series to write down solutions to the Laplace equation on the square in the x-y plane where –π ≤ x ≤ π and –π ≤ y ≤ π.
a) Let f denote a function on the boundary of the square that is zero on the horizontal parts of the boundary, has the Fourier series ∑k=1,2,… ak sin(ky) on the x = -π part of the boundary, and the Fourier series ∑k=1,2,… bk sin(ky) on the x = π part of the boundary. Here, each ak and each bk are constant, and they are not necessarily equal. Write down the function of x and y on the square that solves the Laplace equation and equals f on the boundary.
b) Let g denote a function on the boundary of the square that is zero on the vertical parts of the boundary, has the Fourier series ∑k=1,2,… ck sin(kx) on the y = -π part of the boundary, and the Fourier series ∑k=1,2,… dk sin(kx) on the y = π part of the boundary. Here, each ck and each dk are constant, and they are not necessarily equal. Write down the function of x and y on the square that solves the Laplace equation and equals g on the boundary.
c) Let h now denote a function on the boundary of the square that has the following Fourier series on the four sides of the boundary: The series ∑k=1,2,… ak sin(ky) on the x = -π part of the boundary, the series ∑k=1,2,… bk sin(ky) on the x = π part of the boundary, the series ∑k=1,2,… ck sin(kx) on the y = -π part of the boundary, and the series ∑k=1,2,… dk sin(kx) on the y = π part of the boundary. Write down the solution to the Laplace equation on the square that equals h on the boundary.
2. a) Find the solution to Laplace’s equation on the square in the x-y plane where both
–π ≤ x ≤ π and –π ≤ y ≤ π whose restriction to the boundary is a given constant, c.
b) Use the results from Part a) and also from Problem 1 to find the temperature at the point (0, 0) in the x-y plane when the temperature at any given boundary point (x, y), with either x = ±π and –π ≤ y ≤ π, or with y = ±π and –π ≤ x ≤ π, is held equal to 1 + xy for all time. See the examples in Section 10.2 to obtain the explicit Fourier series for x and for y.
3. This problem explores the use of Fourier series to write down solutions to the wave equation for functions of t ∈ R and x ∈ [-π, π].
a) Let g(x) denote a function of x ∈ [-π, π] that vanishes at x = -π and at x = π. Suppose that g(x) has the Fourier series g(x) = ∑k=1,2,… bk sin(kx). Write down the solution, u(t, x), to the wave equation for values of t ∈ R and x ∈ [-π, π] that vanishes at x = ±π and obeys u(0, x) = 0 and (∂u/∂t)(0, x) = g(x).
b) Let f(x) denote a function of x ∈ [-π, π] that vanishes at x = -π and at x = π. Suppose that f(x) has the Fourier series f(x) = ∑k=1,2,… ak sin(kx). Let g(x) be as in Part a). Give the solution, u(t, x), to the wave equation for values of t ∈ R and x ∈ [-π, π] that vanishes at x = ±π, while obeying u(0, x) = f(x) and also (∂u/∂t)(0, x) = g(x).
4. This problem explores an approach to the wave equation that does not use what we learned about Fourier series. Suppose that f(y) and g(y) are any twice differentiable functions of one variable (here called y).
a) Use the two variable version of the chain rule to show that the function
u(t, x) = f(x+ct) + g(x-ct)
satisfies the wave equation.
b) Show that the conditions u(t, -π) = u(t, π) = 0 hold for all t if both g(y) = -f(2π - y) and f(y) = f(y + 4π) for all y. In particular, explain why this then means that

u(t, x) = f(x + ct) - f(2π - x + ct) .
c) Use the two variable chain rule again to explain why the condition (∂u/∂t)(0, x) = 0 for all x ∈ [-π, π] requires that f(y) + f(2π - y) is constant.
d) Use the preceding to write down the wave equation solution u(t, x) with initial condition u(0, x) = cos(x/2) and (∂u/∂t)(0, x) = 0 for the case where x is constrained to obey –π ≤ x ≤ π.
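Incidentally, the claim in Part a) of Problem 4 can be spot-checked symbolically. The following sympy sketch of mine differentiates f(x + ct) + g(x - ct) for abstract f and g; it is a check, not a substitute for your chain rule argument:

import sympy as sp

t, x, c = sp.symbols('t x c')
f, g = sp.Function('f'), sp.Function('g')
u = f(x + c*t) + g(x - c*t)
# plug u into the wave equation u_tt - c^2 u_xx
print(sp.simplify(sp.diff(u, t, 2) - c**2 * sp.diff(u, x, 2)))   # prints 0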