Ordinary Differential Equations (II)

According to the ‘What’s Physics All About?’ title in the Usborne Children’s Books series, physics is all about ‘discovering why things fall to the ground, how sound travels through walls and how many wonderful inventions exist thanks to physics.’

The Encyclopædia Britannica rephrases that definition of physics somewhat and identifies physics with ‘the science that deals with the structure of matter and the interactions between the fundamental constituents of the observable universe.’

[…]

Now, if I had to define physics at this very moment, I’d say that physics is all about solving differential equations and complex integration. Let’s be honest: is there any page in any physics textbook that does not have any ∫ or ∂ symbols on it?

When everything is said and done, I guess that’s the Big Lie behind all these popular books, including Penrose’s Road to Reality. You need to learn how to write and speak in the language of physics to appreciate them and, for all practical purposes, the language of physics is math. Period.

I am also painfully aware of the fact that the types of differential equations I had to study as a student in economics (even at the graduate or Master’s level) are just a tiny fraction of what’s out there. The variety of differential equations that can be solved is truly intimidating and, because each and every type comes with its own step-by-step methodology, it’s not easy to remember what needs to be done.

Worse, I actually find it quite difficult to remember what ‘type’ this or that equation actually is. In addition, one often needs to reduce or rationalize the equation or – more complicated – substitute variables to get the equation in a form to which a certain method can then be applied. To top it all off, there’s also the intimidating fact that – despite all these mathematical acrobatics – the vast majority of differential equations actually cannot be solved analytically. Hence, in order to penetrate that area of darkness, one has to resort to numerical approaches, which I have yet to learn (the oldest of such numerical methods was apparently invented by the great Leonhard Euler, an 18th-century mathematician and physicist from Switzerland).
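Just to get a feel for what such a numerical approach does, here’s a minimal sketch of Euler’s method in Python (the test equation y’ = y and the step size are arbitrary choices of mine, and any language would do, of course): we simply march forward in little steps of size h, replacing the derivative by the slope at the point where we currently are.

```python
# Forward Euler for y' = f(x, y): a minimal sketch of the oldest numerical method
# mentioned above. The test equation y' = y (exact solution e^x) and the step
# size h are my own choices, just to illustrate the idea.
import math

def euler(f, x0, y0, h, n_steps):
    """March n_steps of size h from (x0, y0), using y_{k+1} = y_k + h*f(x_k, y_k)."""
    x, y = x0, y0
    for _ in range(n_steps):
        y += h * f(x, y)
        x += h
    return x, y

f = lambda x, y: y          # the right-hand side of y' = y
x_end, y_approx = euler(f, 0.0, 1.0, h=0.01, n_steps=100)
print(y_approx, math.exp(x_end))   # ~2.7048 vs e = 2.71828...: a smaller h gives a better match
```

Halving the step size roughly halves the error, which is exactly why this simple scheme is only the starting point for the fancier numerical methods.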

So where am I actually in this mathematical Wonderland?

I’ve looked at ordinary differential equations only so far, i.e. equations involving one dependent variable (usually written as y) and one independent variable (usually written as x or t), and at equations of the first order only. So that means that (a) we don’t have any ∂ symbols in these differential equations (let me use the DE abbreviation from now on) but just the differential symbol d (so that’s what makes them ordinary DEs, as opposed to partial DEs), and that (b) the highest-order derivative in them is the first derivative only (i.e. y’ = dy/dx). Hence, the only ‘lower-order derivative’ is the function y itself (remember that there’s this somewhat odd mathematical ‘convention’ identifying a function with the zeroth derivative of itself).

Such first-order DEs will usually not be linear and, even if they look linear, don’t jump to conclusions, because the term linear (first-order) differential equation is very specific: it means that the (first) derivative and the function itself appear in a linear combination. To be more specific, the term linear differential equation (for the first-order case) is reserved for DEs of the form

a1(t) y'(t) + a0(t) y(t) = q(t).

So, besides y(t) and y'(t) – whose functional form we don’t know because (don’t forget!) finding y(t) is the objective of solving these DEs 🙂 – we have three other arbitrary functions of the independent variable t here, namely a1(t), a0(t) and q(t). Now, these functions may or may not be linear functions of t (they’re probably not) but that doesn’t matter: the important thing – to qualify as ‘linear’ – is (1) that y(t) and y'(t), i.e. the dependent variable and its derivative, appear in a linear combination with these ‘coefficients’ a1(t) and a0(t) (which, I repeat, may be constants but, more likely, will be functions of t themselves), and (2) that, on the other side of the equation, we’ve got this q(t) function, which also may or – more likely – may not be a constant.

Are you still with me? [If not, read again. :-)]

This type of equation – of which the example in my previous post was a specimen – can be solved by introducing a so-called integrating factor. Now, I won’t explain that here – not because the explanation is too easy (it’s not), but because it’s pretty standard and, much more importantly, because it’s too lengthy to copy here. [If you’re looking for an ‘easy’ explanation, I’d recommend Paul’s Online Math Notes once again.]

So I’ll continue with my ‘typology’ of first-order DEs. However, I’ll do so only after noting that, before letting that integrating factor loose (OK, let me say something about it: in essence, the integrating factor is some function λ(x) which we’ll multiply with the whole equation and which, because of a clever choice of λ(x) obviously, helps us to solve the equation), you’ll have to rewrite these linear first-order DEs as y'(t) + (a0(t)/a1(t)) y(t) = q(t)/a1(t) (so just divide both sides by this a1(t) function) or, using the more prevalent notation x for the independent variable (instead of t) and equating a0(x)/a1(x) with F(x) and q(x)/a1(x) with G(x), as:

dy/dx + F(x) y = G(x), or y' + F(x) y = G(x)

So, that’s one ‘type’ of first-order differential equations: linear DEs. [We’re only dealing with first-order DEs here but let me note that the general form of a linear DE of the nth order is a_n(x) y^(n) + a_(n–1)(x) y^(n–1) + … + a_1(x) y’ + a_0(x) y = q(x), and that most standard texts on higher-order DEs focus on linear DEs only, so they are important – even if they are only a tiny fraction of the DE universe.]
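For the impatient (or the lazy, like me), here’s a minimal sketch of how a computer algebra system deals with such a linear first-order DE. I am using Python’s sympy here, and the particular F(x) = 2 and G(x) = x are just an arbitrary example of mine; behind the scenes, it’s that integrating factor at work.

```python
# A minimal sympy sketch (my own choice of tool) for a linear first-order DE
# y' + F(x)*y = G(x), here with F(x) = 2 and G(x) = x as an arbitrary example.
# Behind the scenes this is the integrating-factor method: multiply by
# mu(x) = exp(∫F(x)dx) so that the left-hand side becomes (mu*y)'.
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

ode = sp.Eq(y(x).diff(x) + 2*y(x), x)   # y' + 2y = x
sol = sp.dsolve(ode, y(x))              # general solution with one constant C1
print(sol)                              # something like Eq(y(x), C1*exp(-2*x) + x/2 - 1/4)

# The integrating factor itself, for comparison:
mu = sp.exp(sp.integrate(2, x))         # mu(x) = exp(2x)
print(sp.simplify((mu*y(x)).diff(x) - mu*(y(x).diff(x) + 2*y(x))))  # 0: (mu*y)' = mu*(y' + 2y)
```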

The second major ‘exam-type’ of DEs which you’ll encounter is the category of so-called separable DEs. Separable (first-order) differential equations are equations of the form:

P(x) dx + Q(y) dy = 0, which can also be written as G(y) y' = F(x)

or dy/dx = F(x)/G(y)

The notion of ‘separable’ refers to the fact that we can neatly separate out the terms involving y and x respectively, in order to then bring them on the left- and right-hand side of the equation respectively (cf. the G(y) y' = F(x) form), which is what we’ll need to do to solve the equation.

I’ve been rather vague on that ‘integrating factor’ we use to solve linear equations – for the obvious reason that it’s not all that simple – but, in contrast, solving separable equations is very straightforward. We don’t need to use an integrating factor or substitute something. We actually don’t need any mathematical acrobatics here at all! We can just ‘separate’ the variables indeed and integrate both sides.

Indeed, if we write the equation as G(y)y’ = G(y)[dy/dx] = F(x), we can integrate both sides over x, but use the fact that ∫G(y)[dy/dx]dx = ∫G(y)dy. So the equation becomes ∫G(y)dy = ∫F(x)dx, and so we’re actually integrating a function of y over y on the left-hand side, and the other function (of x) over x on the right-hand side. We then get an implicit function with y and x as variables and, usually, we can solve that implicit equation and find y in terms of x (i.e. we can solve the implicit equation for y(x) – which is the solution for our problem). [I do say ‘usually’ here. That means: not always. In fact, for most implicit functions, there’s no formula which defines them explicitly. But that’s OK and I won’t dwell on that.]

So that’s what’s meant by ‘separation’ of variables: we put all the things with y on one side, and all the things with x on the other, and then we integrate both sides. Sort of. 🙂
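Because solving separable equations is so straightforward, a tiny example may be all that’s needed here. The G(y) = y and F(x) = x below (i.e. dy/dx = x/y) are arbitrary choices of mine, and so is the use of sympy:

```python
# Separable DE sketch: G(y)*dy/dx = F(x) with G(y) = y and F(x) = x (my example),
# i.e. dy/dx = x/y. Integrating both sides gives y**2/2 = x**2/2 + c, an implicit
# solution which we can then solve for y(x).
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

ode = sp.Eq(y(x).diff(x), x/y(x))
print(sp.dsolve(ode, y(x)))   # two branches: y(x) = -sqrt(x**2 + C1) and y(x) = sqrt(x**2 + C1)
```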

OK. You’re with me. In fact, you’re ahead of me and you’ll say: Hey! Hold it! P(x)dx + Q(y)dy is a linear combination as well, isn’t it? So we can look at this as a linear DE as well, isn’t it? And so why wouldn’t we use the other method – the one with that factor thing?

Well… No. Go back and read again. We’ve got a linear combination of the differentials dx and dy here, but so that’s obviously not a linear combination of the derivative y’ and the function y. In addition, the coefficient in front of dy is a function of y, i.e. a function of the dependent variable, not a function of x, so it’s not like these a_n(x) coefficients which we would need to see in order to qualify the DE as a linear one. So it’s not linear. It’s separable. Period.

[…] Oh. I see. But are these non-linear things allowed really?

Of course. Linear differential equations are only a tiny little fraction of the DE universe: first, we can have these ‘coefficients’, which can be – and usually will be – a function of both x and y, and then, secondly, the various terms in the DE do not need to constitute a nice linear combination. In short, most DEs are not linear – in the context-specific definitional sense of the word ‘linear’ that is (sorry for my poor English). 🙂

[…] OK. Got it. Please carry on.

That brings us to the third type of first-order DEs: these are the so-called exact DEs. Exact DEs have the same ‘shape’ as separable equations but the ‘coefficients’ of the dx and dy terms are a function of both x and y indeed. In other words, we can write them as:

P(x, y) dx + Q(x, y) dy = 0, or as A(x, y) dx + B(x, y) dy = 0,

or, as you will also see it, dy/dx = M(x, y)/N(x, y) (use whatever letter you want).

However, in order to solve this type of equation, an additional condition needs to be fulfilled, namely that ∂P/∂y = ∂Q/∂x (or ∂A/∂y = ∂B/∂x if you use the other representation). Indeed, if that condition is fulfilled – which you have to verify by checking these derivatives for the case at hand – then this equation is a so-called exact equation and, then… Well… Then we can find some function U(x, y) of which P(x, y) and Q(x, y) are the partial derivatives, so we’ll have ∂U(x, y)/∂x = P(x, y) and ∂U(x, y)/∂y = Q(x, y). [As for that condition we need to impose, it’s quite logical if you note that ∂P(x, y)/∂y = U_xy and ∂Q(x, y)/∂x = U_yx and remember that such second-order cross-partials are equal to each other, i.e. U_xy = U_yx.]

We can then find U(x, y), of course, by integrating P or Q. And then we just write that dU = P(x, y) dx + Q(x, y) dy = U_x dx + U_y dy = 0 and, because we’ve got the functional form of U, we’ll get, once again, an implicit function in y and x, which we may or may not be able to solve for y(x).
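Here’s a minimal sketch of that recipe with sympy, for an exact equation of my own choosing (P = 2xy + 3x² and Q = x² + 2y): check the condition, then build U(x, y) by integrating P over x and adding whatever part of Q is still missing.

```python
# Exact DE sketch: P(x,y)dx + Q(x,y)dy = 0 with P = 2*x*y + 3*x**2 and
# Q = x**2 + 2*y (an arbitrary example of mine). First check dP/dy == dQ/dx,
# then recover the potential U(x, y) with U_x = P and U_y = Q, so that the
# implicit solution is U(x, y) = constant.
import sympy as sp

x, y = sp.symbols('x y')
P = 2*x*y + 3*x**2
Q = x**2 + 2*y

print(sp.diff(P, y) == sp.diff(Q, x))       # True: the exactness condition holds

U = sp.integrate(P, x)                      # x**3 + x**2*y; the 'constant' of integration may depend on y
U = U + sp.integrate(Q - sp.diff(U, y), y)  # add the missing y-only part: y**2
print(U)                                    # x**3 + x**2*y + y**2, so the solution is U(x, y) = C
```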

Are you still with me? [If not, read again. :-)]

So, we’ve got three different types of first-order DEs here: linear, separable, and exact. Are there any other types? Well… Yes.

Yes of course! Just write down any random equation with a first-order derivative in it – don’t think: just do it – and then look at what you’ve jotted down and compare its form with the form of the equations above: the probability that it will not fit into any of the three mentioned categories is ‘rather high’, as the Brits would say – euphemistically. 🙂

That being said, it’s also quite probable that a good substitution of the variable could make it ‘fit’. In addition, we have not exhausted our typology of first-order DEs as yet and, hence, we’ve not exhausted our repertoire of methods to solve them either.

For example, if we find that the conditions for exactness for the equation P(x, y) dx + Q(x, y) dy = 0 are not fulfilled, we may still be able to solve that equation if another condition turns out to be true: if the functions P(x, y) and Q(x, y) happen to be homogeneous, i.e. if they both satisfy P(ax, ay) = a^r P(x, y) and Q(ax, ay) = a^r Q(x, y) (i.e. they are both homogeneous functions of degree r), then we can use the substitution v(x) = y/x (i.e. y = vx) and transform the equation into a separable one, which we can then solve for v.

Indeed, the substitution yields dv/dx = [F(v) – v]/x, and so that’s nicely separable. Once we’ve solved the equation for v, we can then find y by substituting y/x back for v (i.e. y = vx). I’ll refer to the Wikipedia article on homogeneous functions for the proof that, if P(x, y) and Q(x, y) are homogeneous indeed, we can write the differential equation as:

dy/dx = M(x, y)/N(x, y) = F(y/x) or, in short, y’ = F(y/x)
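A quick, hand-picked example may help here: take dy/dx = (x + y)/x, which is indeed of the y’ = F(y/x) form with F(v) = 1 + v (both numerator and denominator are homogeneous of degree 1). The sketch below just lets sympy confirm that the v = y/x route leads to a solution of the form y = vx:

```python
# Homogeneous first-order DE sketch: dy/dx = (x + y)/x, i.e. y' = F(y/x) with
# F(v) = 1 + v (my example). The v = y/x substitution gives x*dv/dx = 1, which
# is separable, so v = log(x) + C1 and y = v*x; sympy should agree.
import sympy as sp

x = sp.symbols('x', positive=True)
y = sp.Function('y')

ode = sp.Eq(y(x).diff(x), (x + y(x))/x)
print(sp.dsolve(ode, y(x)))   # something like Eq(y(x), x*(C1 + log(x)))
```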

[…]

Hmm… OK. What’s next? That condition of homogeneity which we are imposing here is quite restrictive too, isn’t it?

It is: the vast majority of M(x, y) and N(x, y) functions will not be homogeneous and so then we’re stuck once again. But don’t worry, the mathematician’s repertoire of substitutions is vast, and so there’s plenty of other stuff out there which we can try – if we’d remember it at least 🙂 .

Indeed, another nice example of a type of equation which can be made separable through the use of a substitution is the class of equations of the form y’ = G(ax + by), which can be rewritten as a separable equation by substituting v = ax + by. If we do this substitution, we can rewrite the equation – after some re-arranging of the terms at least – as dv/dx = a + b G(v), and so that’s, once again, an equation which is separable and, hence, solvable. Tick! 🙂
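To convince yourself that this works, take G(v) = v² with a = b = 1 (my own example), so y’ = (x + y)². The substitution gives dv/dx = 1 + v², hence v = tan(x + C) and y = tan(x + C) – x. The little sympy check below just verifies that this candidate solution does satisfy the original equation:

```python
# Substitution sketch for y' = G(a*x + b*y): take G(v) = v**2 with a = b = 1
# (my example), so y' = (x + y)**2. With v = x + y the equation becomes
# dv/dx = 1 + v**2, which is separable and gives v = tan(x + C), i.e.
# y = tan(x + C) - x. Below, sympy verifies that claim.
import sympy as sp

x, C = sp.symbols('x C')
y_sol = sp.tan(x + C) - x        # candidate solution obtained from the substitution
lhs = sp.diff(y_sol, x)          # y'
rhs = (x + y_sol)**2             # (x + y)**2 = tan(x + C)**2
print(sp.simplify(lhs - rhs))    # 0, so the substitution indeed works
```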

Finally, we can also solve DEs which come in the form of a so-called Bernoulli equation through another clever substitution. A Bernoulli equation is a non-linear differential equation in the form:

y’ + F(x) y = G(x) y^n

The problem here is, obviously, that exponent n on the right-hand side of the equation (i.e. the exponent of y), which makes the equation very non-linear indeed. However, it turns out that, if one substitutes v = y^(1–n), we are back in the linear situation and so we can then use the method for the linear case (i.e. the use of an integrating factor). [If you want to try this without consulting a math textbook, then don’t forget that v’ will be equal to v’ = (1–n)y^(–n)y’ (so y^(–n)y’ = v’/(1–n)), and also that you’ll need to rewrite the equation as y^(–n)y’ + F(x) y^(1–n) = G(x) before doing that substitution. Of course, also remember that, after the substitution, you’ll still have to solve the linear equation, so then you need to know how to use that integrating factor. Good luck! :-)]
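Here’s a small sketch of such a Bernoulli equation with F(x) = 1, G(x) = x and n = 3 (an arbitrary example of mine), letting sympy do both the solving and the checking:

```python
# Bernoulli sketch: y' + F(x)*y = G(x)*y**n with F(x) = 1, G(x) = x and n = 3
# (an arbitrary example of mine). The v = y**(1 - n) = y**(-2) substitution turns
# this into the linear equation v' - 2*v = -2*x; below I simply let sympy solve
# the original equation and check the result.
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

ode = sp.Eq(y(x).diff(x) + y(x), x*y(x)**3)
sols = sp.dsolve(ode, y(x))
sols = sols if isinstance(sols, list) else [sols]   # dsolve may return one Eq or a list of branches
print(sols)
print([sp.checkodesol(ode, s)[0] for s in sols])    # each entry should come out True
```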

OK. I understand you’ve had enough by now. So what’s next? Well, frankly, this is not so bad as far as first-order differential equations go. I actually covered a lot of terrain here, although Mathews and Walker go much, much further (so don’t worry: I know what to do in the days ahead!).

The thing now is to get good at solving these things, and to understand how to model physical systems using such equations. But so that’s something which is supposed to be fun: it should be all about “discovering why things fall to the ground, how sound travels through walls and how many wonderful inventions exist thanks to physics” indeed.

Too bad that, in order to do that, one has to make quite a detour!

Post Scriptum: The term ‘homogeneous’ is quite confusing: there is also the concept of linear homogeneous differential equations and it’s not the same thing as a homogeneous first-order differential equation. I find it one of the most striking examples of how the same word can mean entirely different things even in mathematics. What’s the difference?

Well… A homogeneous first-order DE is actually not linear. See above: a homogeneous first-order DE is an equation of the form dy/dx = M(x, y)/N(x, y). In addition, there’s another requirement, which is as important as the form of the DE, and that is that M(x, y) and N(x, y) should be homogeneous functions, i.e. they should have that F(ax, ay) = a^r F(x, y) property. In contrast, a linear homogeneous DE is, in the first place, a linear DE, so its general form must be L(y) = a_n(x) y^(n) + a_(n–1)(x) y^(n–1) + … + a_1(x) y’ + a_0(x) y = q(x) (so L(y) must be a linear combination whose terms have coefficients which may be constants but, more often than not, will be functions of the variable x). In addition, it must be homogeneous, and this means – in this context at least – that q(x) is equal to zero (so q(x) is equal to the constant 0). So we’ve got L(y) = 0 or, if we’d use the y’ + F(x) y = G(x) formulation, we have y’ + F(x) y = 0 (so that G(x) function in the more general form of a linear first-order DE is equal to zero).

So is this yet another type of differential equation? No. A linear homogeneous DE is, in the first place, linear, 🙂 so we can solve it with that method I mentioned above already, i.e. we should introduce an integrating factor. An integrating factor is a new function λ(x), which helps us – after we’ve multiplied the whole equation with this λ(x) – to solve the equation. However, while the procedure is not difficult at all, its explanation is rather lengthy and, hence, I’ll skip that and just refer my imaginary readers here to the Web.

But, now that we’re here, let me quickly complete my typology of first-order DEs and introduce a generalization of the (first) notion of homogeneity, and that’s isobaric differential equations.

An isobaric DE is an equation which has the same general form as the homogeneous (first-order) DE, so an isobaric DE looks like dy/dx = F(x, y), but we have a more general condition than homogeneity applying to F(x, y), namely the property of isobarity (which is another word with multiple meanings but let us not be bothered by that). An isobaric function F(x, y) satisfies the following equality: F(ax, a^r y) = a^(r–1) F(x, y), and it can be shown that the isobaric differential equation dy/dx = F(x, y), i.e. a DE of this form with F(x, y) being isobaric, becomes separable when using the y = vx^r substitution.

OK. You’ll say: So what? Well… Nothing much I guess. 🙂

Let me wrap up by noting that we also have the so-called Clairaut equations as yet another type of first-order DEs. Clairaut equations are first-order DEs in the form y – xy’ = F(y’). When we differentiate both sides, we get y”(F'(y’) + x) = 0.

Now, this equation holds if (i) y” = 0 or (ii) F'(y’) + x = 0 (or both, obviously). Solving (i), so solving for y” = 0, yields a family of (infinitely many) straight lines y = ax + b as the general solution (plugging such a line back into the original equation shows that b must equal F(a), so it’s really a one-parameter family y = ax + F(a)), while solving (ii) yields only one solution, the so-called singular solution, whose graph is the envelope of the graphs of the general solution. The graph below shows these solutions for the square and cube functional forms respectively (so the solutions for y – xy’ = [y’]² and y – xy’ = [y’]³ respectively).

[Graphs: solutions of the Clairaut equation for f(t) = t² and f(t) = t³]

For the F(y’) = [y’]² functional form, you have a parabola (i.e. the graph of a quadratic function indeed) as the envelope of all of the straight lines. As for the F(y’) = [y’]³ function, well… I am not sure. It reminds me of those plastic French curves we used as little kids to make all kinds of silly drawings. It also reminds me of those drawings we had to make in high school on engineering graph paper using an expensive 0.1 or 0.05 mm pen. 🙂
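If you want to reproduce that kind of picture yourself, here’s a minimal matplotlib sketch for the F(y’) = [y’]² case (the plotting ranges and colors are arbitrary choices of mine): the general solution is the one-parameter family of lines y = ax + a², and the singular solution is their envelope y = –x²/4, which is that parabola indeed.

```python
# Clairaut sketch for F(y') = (y')**2: the general solution is the family of
# straight lines y = a*x + a**2 (one line per value of a), and the singular
# solution is their envelope y = -x**2/4, a parabola. The ranges are arbitrary.
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-4, 4, 200)
for a in np.linspace(-2, 2, 17):
    plt.plot(x, a*x + a**2, color='steelblue', linewidth=0.7)   # general solutions (lines)
plt.plot(x, -x**2/4, color='red', linewidth=2.0)                # singular solution (envelope)
plt.ylim(-5, 5)
plt.title("Clairaut equation y - x*y' = (y')**2: lines and their envelope")
plt.show()
```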

In any case, we’ve got quite a collection of first-order DEs now – linear, separable, exact, homogeneous, Bernoulli-type, isobaric, Clairaut-type, … – and so I think I should really stop now. Remember I haven’t started talking about higher-order DEs (e.g. second-order DEs) as yet, and I haven’t talked about partial differential equations either, and so you can imagine that the universe of differential equations is much, much larger than what this brief overview here suggests. Expect much more to come as I dig into it!

Post Scriptum 2: There is a second thing I wanted to jot down somewhere, and this post may be the appropriate place. Let me ask you something: have you never wondered why the same long S symbol (i.e. the summation or integration symbol ∫) is used to denote both definite and indefinite integrals? I did. I mean the following: when we write ∫f(x)dx or ∫[a, b] f(x)dx, we refer to two very different things, don’t we? Things that, at first sight, have nothing to do with each other.

Huh? 

Well… Think about it. When we write ∫f(x)dx, then we actually refer to infinitely many functions F1(x), F2(x), F3(x), etcetera (we generally write them as F(x) + c, because they differ by a constant only) which all belong to the same ‘family’ because they all have the same derivative, namely that function f(x) in the integrand. So we have F1'(x) = F2'(x) = F3'(x) = … = F'(x) = f(x). The graphs of these functions cover the whole plane, and we can say all kinds of things about them, but it is not obvious that these functions can be related to some sum, finite or infinite. Indeed, when we look for those functions by solving, for example, an integral such as ∫(xe^(6x) + x^5/3 + √x)dx, we use a lot of rules and various properties of functions (this one will involve integration by parts for example) but nothing of that reminds us, not even remotely, of doing some kind of finite or infinite sum.

On the other hand, ∫[a, b] f(x)dx, i.e. the definite integral of f(x) over the interval [a, b], yields a real number with a very specific meaning: it’s the area between point a and point b under the graph y = f(x), and the long S symbol (i.e. the summation symbol ∫) is particularly appropriate because the expression ∫[a, b] f(x)dx stands for an infinite sum indeed. That’s why Leibniz chose the symbol back in 1675!

Let me give an example here. Let x be the distance which an object has traveled since we started observing it. Now, that distance is equal to an infinite sum which we can write as ∑v(t)Δt. What we do here amounts to multiplying the speed v at time t, i.e. v(t), with (the length of) the time interval Δt over an infinite number of little time intervals, and then we sum all those products to get the total distance. If we use the differential notation (d) for infinitesimally small quantities (dv, dx, dt etcetera), then this distance x will be equal to the sum of all little distances dx = v(t)dt. So we have an infinite sum indeed which, using the long S (i.e. Leibniz’s summation symbol), we can write as ∑v(t)dt = ∑dx = ∫[0, t]v(t)dt = ∫[0, t]dx = x(t).

The illustration below gives an idea of how this works. The black curve is the v(t) function, so velocity (vertical axis) as a function of time (horizontal axis). Don’t worry about the function going negative: negative velocity would mean that we allow our object to reverse direction. As you can see, the value of v(t) is the (approximate) height of each of these rectangles (note that we take irregular partitions here, but that doesn’t matter), and then just imagine that the time intervals Δt (i.e. the width of the rectangular areas) become smaller and smaller – infinitesimally small in fact.

[Figure: Riemann sum – rectangles of height v(t) over an irregular partition of the time axis]
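If you’d like to see those rectangles converge, here’s a minimal numerical sketch (the velocity profile v(t) = 9.8t, i.e. free fall without drag, and the interval [0, 2] are my own choices): as the partition gets finer, the sum ∑v(t)Δt creeps up to the exact area 19.6.

```python
# Riemann-sum sketch of the picture above: approximate the distance
# x(T) = ∫[0, T] v(t) dt by summing v(t)*Δt over finer and finer partitions.
# The velocity profile v(t) = 9.8*t (free fall without drag, my choice) keeps
# the exact answer simple: x(T) = 4.9*T**2.
import numpy as np

def riemann_sum(v, T, n):
    """Left Riemann sum of v over [0, T] with n equal sub-intervals."""
    t = np.linspace(0.0, T, n, endpoint=False)   # left endpoints of the sub-intervals
    dt = T / n
    return np.sum(v(t) * dt)

v = lambda t: 9.8 * t
for n in (10, 100, 1000):
    print(n, riemann_sum(v, 2.0, n))   # approaches the exact value 4.9*2**2 = 19.6
```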

I guess I don’t need to be more explicit here. The point is that we have such an infinite-sum interpretation for the definite integral only, not for an indefinite one. So why would we use the same summation symbol ∫ for the indefinite integral? Why wouldn’t we use some other symbol for it (because it is something else, isn’t it?)? Or, if we wouldn’t want to introduce any new symbols (because we’ve got quite a bunch already here), then why wouldn’t we combine the common inverse function symbol (i.e. f-1) and the differentiation operator D, Dx or d/dx, so we would write D-1f(x) or Dx-1f(x) instead of ∫f(x)dx? If we would do that, we would write the Fundamental Theorem of Calculus, which you obviously know (as you need it to solve definite integrals), as:

∫[a, b] f(x)dx = [D-1f(x)] from x = a to x = b = F(b) – F(a)

You have seen this formula, haven’t you? Except for the D-1f(x) notation of course. This Theorem tells us that, to solve the definite integral on the left-hand side, we should just (i) take an antiderivative of f(x) (and it really doesn’t matter which one because the constant c will appear twice in the F(b) – F(a) expression, as c – c = 0 to be precise, and, hence, this constant just vanishes, regardless of its value), (ii) plug in the values a and b, and (iii) subtract one from the other (i.e. F(a) from F(b), not the other way around—otherwise we’ll have the sign of the integral wrong), and there we are: we’ve got the answer—for our definite integral that is.

But so I am not using the standard ∫ symbol for the antiderivative above. I am using… well… a new symbol, D-1, which, in my view, makes it clear what we have to do, and that is to find an antiderivative of f(x) so we can solve that definite integral. [Note that, if we’d want to keep track of what variable we’re integrating over (in case we’d be dealing with partial differential equations for instance, or if it would not be sufficiently clear from the context), we should use the Dx-1 notation, rather than just D-1.]

OK. You may think this is hairsplitting. What’s in a name after all? Or in a symbol in this case? Well… In math, you need to make sure that your notations make perfect sense and that you don’t write things that may be confusing.

That being said, there’s actually a very good reason to re-use the long S symbol for indefinite integrals also.

Huh? Why? You just said the definite and indefinite integral are two very different things and so that’s why you’d rather see that new D-1f(x) notation instead of ∫f(x)dx !? 

Well… Yes and no. You may or may not remember from your high school course in calculus or analysis that, in order to get to that fundamental theorem of calculus, we need the following ‘intermediate’ result: IF we define a function F(x) in some interval [a, b] as F(x) = ∫[a, x] f(t)dt (so a ≤ x ≤ b and a ≤ t ≤ x) — so, in other words, we’ve got a definite integral here with some fixed value a as the lower boundary but with the variable x itself as the upper boundary (so we have x instead of the fixed value b, and b now only serves as the upper limit of the interval over which we’re defining this new function F(x) here) — THEN it’s easy to show that the derivative of this F(x) function will be equal to f(x), so we’ll find that F'(x) = f(x).

In other words, F(x) = ∫[a, x] f(t)dt is, quite obviously, one of the (infinitely many) antiderivatives of f(x), and if you’d wonder which one, well… That obviously depends on the value of a that we’d be picking. So there actually is a pretty straightforward relationship between the definite and indefinite integral: we can find an antiderivative F(x) + c of a function f(x) by evaluating a definite integral from some fixed point a to the variable x itself, as illustrated below.

[Illustration: the relation between the definite and the indefinite integral]

Now, remember that we just need one antiderivative to solve a definite integral, not the whole family, and which one we’ll get will depend on that value a (or x0, as that fixed point is referred to in the formula used in the illustration above), so it will depend on what choice we make there for the lower boundary. Indeed, you can work that out for yourself by just solving ∫[x0, x] f(t)dt for two different values of x0 (i.e. a and b in the example below):

[Formulas: ∫[a, x] f(t)dt = F(x) – F(a) and ∫[b, x] f(t)dt = F(x) – F(b), two antiderivatives of f(x) which differ by the constant F(b) – F(a) only]

The point is that we can get all of the antiderivatives of f(x) through that definite integral: it just depends on a judicious choice of x0 but so you’ll get the same family of functions F(x) + c. Hence, it is logical to use the same summation symbol, but with no bounds mentioned, to designate the whole family of antiderivatives. So, writing the Fundamental Theorem of Calculus as

∫[a, b] f(x)dx = F(b) – F(a), with ∫f(x)dx = F(x) + c

instead of that alternative with the D-1f(x) notation does make sense. 🙂

Let me wrap up this conversation by noting that the above-mentioned ‘intermediate’ result (I mean F(x) = ∫[a, x] f(t)dt with F'(x) = f(x) here) is actually not ‘intermediate’ at all: it is equivalent to the fundamental theorem of calculus itself (indeed, the author of the Wikipedia article on the fundamental theorem of calculus presents the expression above as a ‘corollary’ to the F(x) = ∫[a, x] f(t)dt result, which he or she presents as the theorem itself). So, if you’ve been able to prove the ‘intermediate’ result, you’ve also proved the theorem itself. One can easily see that by verifying the identities below:

[Identities: with F(x) = ∫[a, x] f(t)dt, we have F(a) = ∫[a, a] f(t)dt = 0 and F(b) = ∫[a, b] f(t)dt, so ∫[a, b] f(t)dt = F(b) – F(a)]

Huh? Is this legal? It is. Just jot down a graph with some function f(t) and the values a, x and b, and you’ll see it all makes sense. 🙂

An easy piece: Ordinary Differential Equations (I)

Although Richard Feynman’s iconic Lectures on Physics are best read together, as an integrated textbook that is, smart publishers bundled some of the lectures in two separate publications: Six Easy Pieces and Six Not-So-Easy Pieces. Well… Reading Penrose has been quite exhausting so far and, hence, I feel like doing an easy piece here – just for a change. 🙂

In addition, I am half-way through this graduate-level course on Complex Variables and Applications (from McGraw-Hill’s Brown—Churchill Series) but I feel that I will gain much more from the remaining chapters (which are focused on applications) if I’d just branch off for a while and first go through another classic graduate-level course dealing with math, but perhaps with some more emphasis on physics. A quick check reveals that Mathematical Methods of Physics, written by Jon Mathews and R.L. Walker, will probably fit the bill. This textbook is used as the text for a graduate course at the University of Chicago and, in addition, Mathews and Walker were colleagues of Feynman and, hence, their course should dovetail nicely with Feynman’s Lectures: that’s why I bought it when I saw this 2004 reprint for the Indian subcontinent in a bookshop in Delhi. [As for Feynman’s Lectures, I wouldn’t recommend these Lectures if you want to know more about quantum mechanics, but for classical mechanics and electromagnetism/electrodynamics they’re still great.]

So here we go: Chapter 1, on Differential Equations.

Of course, I mean ordinary differential equations, so things with one dependent and one independent variable only, as opposed to partial differential equations, which have partial derivatives (i.e. terms with ∂ symbols in them, as opposed to the d used in dy and dx) because there’s more than one independent variable. We’ll need to get into partial differential equations soon enough, if only because wave equations are partial differential equations, but let’s start with the start.

While I thought I knew a thing or two about differential equations from my graduate-level courses in economics, I’ve discovered many new things already. One of them is the concept of a slope field, or a direction field. Below are the examples I took from Paul’s Online Notes in Mathematics (http://tutorial.math.lamar.edu/Classes/DE/DirectionFields.aspx), which is a source I warmly recommend (the author’s full name is Paul Dawkins, and he developed these notes for Lamar University, Texas):

[Three direction-field graphs from Paul’s Online Notes: direction field examples 1, 2 and 3]

These things are great: they helped me to understand what a differential equation actually is. So what is it then? Well, let’s take the example of the first graph. That example models the following situation: we have a falling object with mass m (so the force of gravity acts on it) but its fall gets slowed down because of air resistance. So we have two forces FG and FA acting on the object, as depicted below:

[Diagram: the forces FG and FA acting on the mass m]

Now, the force of gravity is proportional to the mass m of the falling object, with the factor of proportionality equal to the acceleration due to gravity of course. So we have FG = mg with g = 9.8 m/s². [Note that forces are measured in newtons and 1 N = 1 (kg)(m)/(s²).]

The force due to air resistance has a negative sign because it acts like a brake and, hence, it has the opposite direction of the gravity force. The example assumes that it is proportional to the velocity v of the object, which seems reasonable enough: if it goes faster and faster, the air will slow it down more and more so we have FA = —γv, with v = v(t) the velocity of the object and γ some (positive) constant representing the factor of proportionality for this force. [In fact, the force due to air resistance is usually referred to as the drag, and it is proportional to the square of the velocity, but so let’s keep it simple here.]

Now, when things are being modeled like this, I find that the most difficult thing is to keep track of what depends on what exactly. For example, it is obvious that, in this example, the total force on the object will also depend on the velocity and so we have a force here which we should write as a function of both time and velocity. Newton’s Law of Motion (the Second Law to be precise, i.e. ma = m(dv/dt) = F) thus becomes

m(dv/dt) = F(t, v) = mg – γv(t).

Note the peculiarity of this F(t, v) function: in the end, we will want to write v(t) as an explicit function of t, but so here we write F as a function with two separate arguments t and v. So what depends on what here? What does this equation represent really?

Well… The equation does surely not represent one or the other implicit function: an implicit function, such as x² + y² = 1 for example (i.e. the unit circle), is still a function: it associates one of the variables (usually referred to as the value) to the other variables (the arguments). But, surely, we have that too here? No. If anything, a differential equation represents a family of functions, just like an indefinite integral.

Indeed, you’ll remember that an indefinite integral ∫f(x)dx represents all functions F(x) for which F'(x) = dF(x)/dx = f(x). These functions are, for a very obvious reason, referred to as the anti-derivatives of f(x) and it turns out that all these antiderivatives differ from each other by a constant only, so we can write ∫f(x)dx = F(x) + c, and so the graphs of all the antiderivatives of a given function are, quite simply, vertical translations of each other, i.e. their vertical location depends on the value of c. I don’t want to anticipate too much, but so we’ll have something similar here, except that our ‘constant’ will usually appear in a somewhat more complicated format such as, in this example, as v(t) = 50 + ce^(–0.196t). So we also have a family of primitive functions v(t) here, which differ from each other by the constant c (and, hence, are ‘indefinite’ so to say), but so when we would graph this particular family of functions, their vertical distance will not only depend on c but also on t. But let us not run before we can walk.

The thing to note – and to always remember when you’re looking at a differential equation – is that the equation itself represents a world of possibilities, or parallel universes if you want :-), but, also, that it’s in only one of them that things are actually happening. That’s why differential equations usually have an infinite number of general (or possible) solutions but only one of these will be the actual solution, and which one that is will depend on the initial conditions, i.e. where we actually start from: is the object at rest when we start looking, is it in equilibrium, or is it somewhere in-between?

What we know for sure is that, at any one point of time t, this object can only have one velocity, and, because it’s also pretty obvious that, in the real world, t is the independent variable and v the dependent one (the velocity of our object depends on time, not the other way around), we can thus write v = v(t) = du/dt indeed. [The variable u = u(t) is the vertical position of the object and its velocity is, obviously, the rate of change of this vertical position, i.e. the derivative with regard to time.]

So that’s the first thing you should note about these direction fields: we’re trying to understand what is going on with these graphs and so we identify the dependent variable with the y axis and the independent variable with the x axis, in line with the general convention that such graphs will usually depict a y = y(x) relationship. In this case, we’re interested in the velocity of the object (not its position), and so v = v(t) is the variable on the y axis of that first graph.

Now, there’s a world of possibilities out there indeed, but let’s suppose we start watching when the object is at rest, i.e. we have v(t) = v(0) = 0 and so that’s depicted by the origin point. Let’s also make it all more real by assigning the values m = 2 kg and γ = 0.392 to m and γ in Newton’s formula. [In case you wonder where this value for γ comes from, note that it is 1/25 of the 9.8 value for g, and so it’s just a number to make sure the solution for v(t) is a ‘nice’ number, i.e. an integer instead of some decimal. In any case, I am taking this example from Paul’s Online Notes and I won’t try to change it.]

So we start at point zero with zero velocity but so now we’ve got the force F with us. 🙂 Hence, the object’s velocity v(t) will not stay zero. As the clock ticks, its movement will respect Newton’s Law, i.e. m(dv/dt) = F(t, v), which is m(dv/dt) = mg – γv(t) in this case. Now, if we plug in the above-mentioned values for m and γ (as well as the 9.8 approximation for g), we get dv(t)/dt = 9.8 – 0.196v(t) (we brought m over to the other side, and so then it becomes 1/m on the right-hand side).

Now, let’s insert some values into this equation. Let’s first take the value v(0) = 0, i.e. our point of departure. We obviously get dv/dt = 9.8 – 0.196·0 = 9.8 (so that’s close to 10 but not quite).

Let’s take another value for v(0). If v(0) would be equal to 30 m/s (this means that the object is already moving at a speed of 30 m/s when we start watching), then we’d get a value for dv/dt of 3.92, which is much less – but so that reflects the fact that, at such speed, air resistance is counteracting gravity.

Let’s take yet another value for v(0). Let’s take 100 now for example: we get dv/dt = – 9.8.

Ooops! What’s that? Minus 9.8? A negative value for dv/dt? Yes. It indicates that, at such high speed, air resistance is actually slowing down the object. [Of course, if that’s the case, then you may wonder how it got to go so fast in the first place but so that’s none of our own business: maybe it’s an object that got launched up into the air instead of something that was dropped out of an airplane. Note that a speed of 100 m/s is 360 km/h so we’re not talking any supersonic launch speeds here.]

OK. Enough of that kids’ stuff now. What’s the point?

Well, it’s these values for dv/dt (so these values of 9.8, 3.92, -9.8 etcetera) that we use for that direction field, or slope field as it’s often referred to. Note that we’re currently considering the world of possibilities, not the actual world so to say, and so we are contemplating any possible combination of v and t really.

Also note that, in this particular example that is, it’s only the value of v that determines the value of dv/dt, not the value of t. So, if, at some other point in time (e.g. t = 3), we’d be imagining the same velocities for our object, i.e. 0 m/s, 30 m/s or 100 m/s, we’d get the same values 9.8, 3.92 and -9.8 for dv/dt. So the little red arrows which represent the direction field all have the same magnitude and the same direction for equal values of v(t). [That’s also the case in the second graph above, but not for the third graph, which presents a far more general case: think of a changing electromagnetic field for instance. A second footnote to be made here concerns the length – or magnitude – of these arrows: they obviously depend on the scale we’re using but so they do reflect the values for dv/dt we calculated.]

So that slope field, or direction field, i.e. all of these little red arrows, represents the fact that the world of possibilities, or all parallel universes which may exist out there, have one thing in common: they all need to respect Newton or, at the very least, his m(dv/dt) = mg – γv(t) equation which, in this case, is dv(t)/dt = 9.8 – 0.196v(t). So, wherever we are in this (v, t) space, we look at the nearest arrow and it will tell us how our speed v will change as a function of t.
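For those who want to draw these little red arrows themselves, here’s a minimal matplotlib sketch of the direction field for dv/dt = 9.8 – 0.196v (the grid spacing and the arrow scaling are arbitrary choices of mine; the dashed line is the v = 50 m/s equilibrium we’ll get to in a moment):

```python
# Direction-field sketch for dv/dt = 9.8 - 0.196*v, the falling-object equation
# used above (this just imitates the red-arrow picture; the grid and the arrow
# scaling are my own choices).
import numpy as np
import matplotlib.pyplot as plt

t, v = np.meshgrid(np.linspace(0, 10, 21), np.linspace(0, 100, 21))
dvdt = 9.8 - 0.196 * v                    # slope dv/dt at each (t, v) point
# draw unit arrows with slope dv/dt (1 in the t-direction, dv/dt in the v-direction)
norm = np.sqrt(1 + dvdt**2)
plt.quiver(t, v, 1/norm, dvdt/norm, color='red', angles='xy')
plt.axhline(50, color='black', linestyle='--')   # the equilibrium speed v = 50 m/s
plt.xlabel('t (s)')
plt.ylabel('v (m/s)')
plt.show()
```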

As you can see from the graph, the slope of these little arrows (i.e. dv/dt) is negative above the v(t) = 50 m/s line, and positive underneath it, and so we should not be surprised that, when we try to calculate at what speed dv/dt would be equal to zero (we do this by writing 9.8 – 0.196v(t) = 0), we find that this is the case if and only if v(t) = 9.8/0.196 = 50 indeed. So that looks like the stable situation: indeed, you’ll remember that derivatives reflect the rate of change, and so when dv/dt = 0, it means the object won’t change speed.

Now, the dynamics behind the graph are obviously clear: above the v(t) = 50 m/s line, the object will be slowing down, and underneath it, it will be speeding up. At the 50 m/s line itself, the gravity and air resistance forces will balance each other and the object’s speed will be constant – that is until it hits the earth of course :-).

So now we can have a look at these blue lines on the graph. If you understood something of the lengthy story about the red arrows above, then you’ll also understand, intuitively at least, that the blue lines on this graph represent the various solutions to the differential equation. Huh? Well. Yes.

The blue lines show how the velocity of the object will gradually converge to 50 m/s, and that the actual path being followed will depend on our actual starting point, which may be zero, less than 50 m/s, or equal or more than 50 m/s. So these blue lines still represent the world of possibilities, or all of the possible parallel universes, but so one of them – and one of them only – will represent the actual situation. Whatever that actual situation is (i.e. whatever point we start at when t = 0), the dynamics at work will make sure the speed converges to 50 m/s, so that’s the longer-term equilibrium for this situation. [Note that all is relative of course: if the object is being dropped out of a plane at an altitude of two or three km only, then ‘longer-term’ means like a minute or so, after which time the object will hit the ground and so then the equilibrium speed is obviously zero. :-)]

OK. I must assume you’re fine with the intuitive interpretation of these blue curves now. But so what are they really, beyond this ‘intuitive’ interpretation? Well, they are the solutions to the differential equation really and, because these solutions are found through an integration process indeed, they are referred to as the integral curves. I have to refer my imaginary reader here to Paul’s Notes (or any other math course) for how exactly that integration process works (it’s not as easy as you might think) but the equation for these blue curves is

v(t) = 50 + ce^(–0.196t)

In this equation, we have Euler’s number e (so that’s the irrational number e = 2.718281… etcetera) and also a constant c which depends on the initial conditions indeed. The graph below shows some of these curves for various values of c. You can calculate some more yourself of course. For example, if we start at the origin indeed, so if we have zero speed at t = 0, then we have v(0) = 50 + ce^(–0.196·0) = 50 + ce^0 = 50 + c and, hence, c = –50 will represent that initial condition. [And, yes, please do note the similarity with the graphs of the antiderivatives (i.e. the indefinite integral) of a given function, because the c in that v(t) function is, effectively, the result of an integration process.]

[Graph: the integral curves v(t) = 50 + ce^(–0.196t) for various values of c]
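As a quick sanity check on these integral curves, here’s a short sympy computation (my own choice of tool) solving m(dv/dt) = mg – γv with the values from the text and the initial condition v(0) = 0:

```python
# Check on the integral curves above: solve 2*dv/dt = 2*9.8 - 0.392*v with the
# initial condition v(0) = 0. I use exact rationals so that sympy prints a clean
# answer; the numbers themselves are the ones from the text.
import sympy as sp

t = sp.symbols('t')
v = sp.Function('v')

g = sp.Rational(98, 10)           # 9.8
gamma = sp.Rational(392, 1000)    # 0.392

ode = sp.Eq(2*v(t).diff(t), 2*g - gamma*v(t))
print(sp.dsolve(ode, v(t), ics={v(0): 0}))
# v(t) = 50 - 50*exp(-49*t/250), i.e. 50 - 50*exp(-0.196*t), so c = -50 indeed
```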

So that’s it really: the secret behind differential equations has been unlocked. There’s nothing more to it.

Well… OK. Of course we still need to learn how to actually solve these differential equations, and we’ll also have to learn how to solve partial differential equations, including equations with complex numbers as well obviously, and so on and so on. Even those other two ‘simple’ situations depicted above (see the two other graphs) are obviously more intimidating already (the second graph involves three equilibrium solutions – one stable, one unstable and one semi-stable – while the third graph shows that not all situations have equilibrium solutions). However, I am sure I’ll get through it: it has been great fun so far, and what I read so far (i.e. this morning) is surely much easier to digest than all the things I wrote about in my other posts. 🙂

In addition, the example did involve two forces, and so it resembles classical electrodynamics, in which we also have two forces, the electric and magnetic force, which generate force fields that influence each other. However, despite all the complexities, it is fair to say that, when push comes to shove, understanding Maxwell’s equations is a matter of understanding a particular set of partial differential equations. But I won’t dwell on that now. My next post might consist of a brief refresher on all of that but I will probably first want to move on a bit with that course of Mathews and Walker. I’ll keep you posted on progress. 🙂

Post scriptum:

My imaginary reader will obviously note that this direction field looks very much like a vector field. In fact, it obviously is a vector field. Remember that a vector field assigns a vector to each point, and so a vector field in the plane is visualized as a collection of arrows indeed, with a given magnitude and direction attached to a point in the plane. As Wikipedia puts it: ‘vector fields are often used to model the strength and direction of some force, such as the electrostatic, magnetic or gravitational force.’ And so, yes, in the example above, we’re indeed modeling a force obeying Newton’s law: the change in the velocity of the object (i.e. the factor a = dv/dt in the F = ma equation) is proportional to the force (which is a force combining gravity and drag in this example), and the factor of proportionality is the inverse of the object’s mass (a = F/m and, hence, the greater its mass, the less a body accelerates under a given force). [Note that the latter remark just underscores the fact that Newton’s formula shows that mass is nothing but a measure of the object’s inertia, i.e. its resistance to being accelerated or to having its direction of motion changed.]

A second post scriptum point to be made, perhaps, is my remark that solving that dv(t)/dt = 9.8 – 0.196v(t) equation is not as easy as it may look. Let me qualify that remark: it actually is an easy differential equation, but don’t make the mistake of just putting an integral sign in front and writing something like ∫(0.196v + v’) dv = ∫9.8 dv, to then solve it as 0.098 v² + v = 9.8v + c, which is equivalent to 0.098 v² – 8.8 v + c = 0. That’s nonsensical because it does not give you v as an implicit or explicit function of t and so it’s a useless approach: it just yields a quadratic function in v which may or may not have any physical interpretation.

So should we, perhaps, use t as the variable of integration on one side and, hence, write something like ∫(0.196v + v’) dv = ∫9.8 dt? We then find 0.098 v² + v = 9.8t + c, and so that looks good, doesn’t it? No. It doesn’t. That’s worse than that other quadratic expression in v (I mean the one which didn’t have t in it), and a lot worse, because it’s not only meaningless but wrong, very wrong. Why? Well, you’re using a different variable of integration (v versus t) on both sides of the equation and you can’t do that: you have to apply the same operation to both sides of the equation, whether that’s multiplying it with some factor or bringing one of the terms over to the other side (which actually amounts to subtracting the same term from both sides) or integrating both sides: we have to integrate both sides over the same variable indeed.

But – hey! – you may remember that’s what we do when differential equations are separable, isn’t it? And so that’s the case here, isn’t it? We’ve got all the y’s on one side and all the x’s on the other side of the equation here, don’t we? And so then we surely can integrate one side over y and the other over x, isn’t it? Well… No. And yes. For a differential equation to be separable, all the x‘s and all the y’s must be nicely separated on both sides of the equation indeed, but all the y’s in the differential equation (so not just one of them) must be part of the product with the derivative. Remember, a separable equation is an equation in the form of B(y)(dy/dx) = A(x), with B(y) some function of y indeed, and A(x) some function of x, but so the whole B(y) function is multiplied with dy/dx, not just one part of it. If, and only if, the equation can be written in this form, we can (a) integrate both sides over x but (b) also use the fact that ∫[B(y)dy/dx]dx = ∫B(y)dy. So, it looks like we’re effectively integrating one part (or one side) of the equation over the dependent variable y here, and the other over x, but the condition for being allowed to do so is that the whole B(y) function can be written as a factor in a product involving the dy/dx derivative. Is that clear? I guess not. 😦 But then I need to move on.

The lesson here is that we always have to make sure that we write the differential equation in its ‘proper form’ before we do the integration, and we should note that the ‘proper form’ usually depends on the method we’re going to select to solve the equation: if we can’t write the equation in its proper form, then we can’t apply the method. […] Oh… […] But so how do we solve that equation then? Well… It’s done using a so-called integrating factor but, just as I did in the text above already, I’ll refer you to a standard course on that, such as Paul’s Notes indeed, because otherwise my posts would become even longer than they already are, and I would have even fewer imaginary readers. 🙂