The Hamiltonian for a two-state system: the ammonia example

Ammonia, i.e. NH3, is a colorless gas with a strong smell. Its serves as a precursor in the production of fertilizer, but we also know it as a cleaning product, ammonium hydroxide, which is NH3 dissolved in water. It has a lot of other uses too. For example, its use in this post, is to illustrate a two-state system. 🙂 We’ll apply everything we learned in our previous posts and, as I  mentioned when finishing the last of those rather mathematical pieces, I think the example really feels like a reward after all of the tough work on all of those abstract concepts – like that Hamiltonian matrix indeed – so I hope you enjoy it. So… Here we go!

The geometry of the NH3 molecule can be described by thinking of it as a trigonal pyramid, with the nitrogen atom (N) at its apex, and the three hydrogen atoms (H) at the base, as illustrated below. [Feynman’s illustration is slightly misleading, though, because it may give the impression that the hydrogen atoms are bonded together somehow. That’s not the case: the hydrogen atoms share their electron with the nitrogen, thereby completing the outer shell of both atoms. This is referred to as a covalent bond. You may want to look it up, but it is of no particular relevance to what follows here.]


Here, we will only worry about the spin of the molecule about its axis of symmetry, as shown above, which is either in one direction or in the other, obviously. So we’ll discuss the molecule as a two-state system. So we don’t care about its translational (i.e. linear) momentum, its internal vibrations, or whatever else that might be going on. It is one of those situations illustrating that the spin vector, i.e. the vector representing angular momentum, is an axial vector: the first state, which is denoted by | 1 〉 is not the mirror image of state | 2 〉. In fact, there is a more sophisticated version of the illustration above, which usefully reminds us of the physics involved.

dipoleIt should be noted, however, that we don’t need to specify what the energy barrier really consists of: moving the center of mass obviously requires some energy, but it is likely that a ‘flip’ also involves overcoming some electrostatic forces, as shown by the reversal of the electric dipole moment in the illustration above. In fact, the illustration may confuse you, because we’re usually thinking about some net electric charge that’s spinning, and so the angular momentum results in a magnetic dipole moment, that’s either ‘up’ or ‘down’, and it’s usually also denoted by the very same μ symbol that’s used below. As I explained in my post on angular momentum and the magnetic moment, it’s related to the angular momentum J through the so-called g-number. In the illustration above, however, the μ symbol is used to denote an electric dipole moment, so that’s different. Don’t rack your brain over it: just accept there’s an energy barrier, and it requires energy to get through it. Don’t worry about its details!

Indeed, in quantum mechanics, we abstract away from such nitty-gritty, and so we just say that we have base states | i 〉 here, with i equal to 1 or 2. One or the other. Now, in our post on quantum math, we introduced what Feynman only half-jokingly refers to as the Great Law of Quantum Physics: | = ∑ | i 〉〈 i | over all base states i. It basically means that we should always describe our initial and end states in terms of base states. Applying that principle to the state of our ammonia molecule, which we’ll denote by | ψ 〉, we can write:


You may – in fact, you should – mechanically apply that | = ∑ | i 〉〈 i | substitution to | ψ 〉 to get what you get here, but you should also think about what you’re writing. It’s not an easy thing to interpret, but it may help you to think of the similarity of the formula above with the description of a vector in terms of its base vectors, which we write as A = Ax·e+ Ay·e2 + Az·e3. Just substitute the Acoefficients for Ci and the ebase vectors for the | i 〉 base states, and you may understand this formula somewhat better. It also explains why the | ψ 〉 state is often referred to as the | ψ 〉 state vector: unlike our  A = ∑ Ai·esum of base vectors, our | 1 〉 C1 + | 2 〉 Csum does not have any geometrical interpretation but… Well… Not all ‘vectors’ in math have a geometric interpretation, and so this is a case in point.

It may also help you to think of the time-dependency. Indeed, this formula makes a lot more sense when realizing that the state of our ammonia molecule, and those coefficients Ci, depend on time, so we write: ψ = ψ(t) and C= Ci(t). Hence, if we would know, for sure, that our molecule is always in state | 1 〉, then C1 = 1 and C2 = 0, and we’d write: | ψ 〉 = | 1 〉 = | 1 〉 1 + | 2 〉 0. [I am always tempted to insert a little dot (·), and change the order of the factors, so as to show we’re talking some kind of product indeed – so I am tempted to write | ψ 〉 = C1·| 1 〉 C1 + C2·| 2 〉 C2, but I note that’s not done conventionally, so I won’t do it either.]  

Why this time dependency? It’s because we’ll allow for the possibility of the nitrogen to push its way through the pyramid – through the three hydrogens, really – and flip to the other side. It’s unlikely, because it requires a lot of energy to get half-way through (we’ve got what we referred to as an energy barrier here), but it may happen and, as we’ll see shortly, it results in us having to think of the the ammonia molecule as having two separate energy levels, rather than just one. We’ll denote those energy levels as E0 ± A. However, I am getting ahead of myself here, so let me get back to the main story.

To fully understand the story, you should really read my previous post on the Hamiltonian, which explains how those Ci coefficients, as a function of time, can be determined. They’re determined by a set of differential equations (i.e. equations involving a function and the derivative of that function) which we wrote as:


 If we have two base states only – which is the case here – then this set of equations is:

set - two-base

Two equations and two functions – C= C1(t) and C= C2(t) – so we should be able to solve this thing, right? Well… No. We don’t know those Hij coefficients. As I explained in my previous post, they also evolve in time, so we should write them as Hij(t) instead of Hij tout court, and so it messes the whole thing up. We have two equations and six functions really. There is no way we can solve this! So how do we get out of this mess?

Well… By trial and error, I guess. 🙂 Let us just assume the molecule would behave nicely—which we know it doesn’t, but so let’s push the ‘classical’ analysis as far as we can, so we might get some clues as to how to solve this problem. In fact, our analysis isn’t ‘classical’ at all, because we’re still talking amplitudes here! However, you’ll agree the ‘simple’ solution would be that our ammonia molecule doesn’t ‘tunnel’. It just stays in the same spin direction forever. Then H12 and H21 must be zero (think of the U12(t + Δt, t) and U21(t + Δt, t) functions) and H11 and H22 are equal to… Well… I’d love to say they’re equal to 1 but… Well… You should go through my previous posts: these Hamiltonian coefficients are related to probabilities but… Well… Same-same but different, as they say in Asia. 🙂 They’re amplitudes, which are things you use to calculate probabilities. But calculating probabilities involve normalization and other stuff, like allowing for interference of amplitudes, and so… Well… To make a long story short, if our ammonia molecule would stay in the same spin direction forever, then H11 and H22  are not one but some constant. In any case, the point is that they would not change in time (so H11(t) = H11  and H22(t ) = H22), and, therefore, our two equations would reduce to:


So the coefficients are now proper coefficients, in the sense that they’ve got some definite value, and so we have two equations and two functions only now, and so we can solve this. Indeed, remembering all of the stuff we wrote on the magic of exponential functions (more in particular, remembering that d[ex]/dx), we can understand the proposed solution:


As Feynman notes: “These are just the amplitudes for stationary states with the energies E= H11 and E= H22.” Now let’s think about that. Indeed, I find the term ‘stationary’ state quite confusing, as it’s ill-defined. In this context, it basically means that we have a wavefunction that is determined by (i) a definite (i.e. unambiguous, or precise) energy level and (ii) that there is no spatial variation. Let me refer you to my post on the basics of quantum math here. We often use a sort of ‘Platonic’ example of the wavefunction indeed:

a·ei·θ ei·(ω·t − k ∙x) = a·e(i/ħ)·(E·t − px)

So that’s a wavefunction assuming the particle we’re looking at has some well-defined energy E and some equally well-defined momentum p. Now, that’s kind of ‘Platonic’ indeed, because it’s more like an idea, rather than something real. Indeed, a wavefunction like that means that the particle is everywhere and nowhere, really—because its wavefunction is spread out all of over space. Of course, we may think of the ‘space’ as some kind of confined space, like a box, and then we can think of this particle as being ‘somewhere’ in that box, and then we look at the temporal variation of this function only – which is what we’re doing now: we don’t consider the space variable x at all. So then the equation reduces to a·e–(i/ħ)·(E·t), and so… Well… Yes. We do find that our Hamiltonian coefficient Hii is like the energy of the | i 〉 state of our NH3 molecule, so we write: H11 = E1, and H22 = E2, and the ‘wavefunctions’ of our Cand Ccoefficients can be written as:

  • Ca·e(i/ħ)·(H11·t) a·e(i/ħ)·(E1·t), with H11 = E1, and
  • C= a·e(i/ħ)·(H22·t) a·e(i/ħ)·(E2·t), with H22 = E2.

But can we interpret Cand  Cas proper amplitudes? They are just coefficients in these equations, aren’t they? Well… Yes and no. From what we wrote in previous posts, you should remember that these Ccoefficients are equal to 〈 i | ψ 〉, so they are the amplitude to find our ammonia molecule in one state or the other.

Back to Feynman now. He adds, logically but brilliantly:

We note, however, that for the ammonia molecule the two states |1〉 and |2〉 have a definite symmetry. If nature is at all reasonable, the matrix elements H11 and H22 must be equal. We’ll call them both E0, because they correspond to the energy the states would have if H11 and H22 were zero.”

So our Cand Camplitudes then reduce to:

  • C〈 1 | ψ 〉 = a·e(i/ħ)·(E0·t)
  • C=〈 2 | ψ 〉 = a·e(i/ħ)·(E0·t)

We can now take the absolute square of both to find the probability for the molecule to be in state 1 or in state 2:

  • |〈 1 | ψ 〉|= |a·e(i/ħ)·(E0·t)|a
  • |〈 2 | ψ 〉|= |a·e(i/ħ)·(E0·t)|a

Now, the probabilities have to add up to 1, so a+ a= 1 and, therefore, the probability to be in either in state 1 or state 2 is 0.5, which is what we’d expect.

Note: At this point, it is probably good to get back to our | ψ 〉 = | 1 〉 C1 + | 2 〉 Cequation, so as to try to understand what it really says. Substituting the a·e(i/ħ)·(E0·t) expression for C1 and C2 yields:

| ψ 〉 = | 1 〉 a·e(i/ħ)·(E0·t) + | 2 〉 a·e(i/ħ)·(E0·t) = [| 1 〉 + | 2 〉] a·e(i/ħ)·(E0·t)

Now, what is this saying, really? In our previous post, we explained this is an ‘open’ equation, so it actually doesn’t mean all that much: we need to ‘close’ or ‘complete’ it by adding a ‘bra’, i.e. a state like 〈 χ |, so we get a 〈 χ | ψ〉 type of amplitude that we can actually do something with. Now, in this case, our final 〈 χ | state is either 〈 1 | or 〈 2 |, so we write:

  • 〈 1 | ψ 〉 = [〈 1 | 1 〉 + 〈 1 | 2 〉]·a·e(i/ħ)·(E0·t) = [1 + 0]·a·e(i/ħ)·(E0·t)· = a·e(i/ħ)·(E0·t)
  • 〈 2 | ψ 〉 = [〈 2 | 1 〉 + 〈 2 | 2 〉]·a·e(i/ħ)·(E0·t) = [0 + 1]·a·e(i/ħ)·(E0·t)· = a·e(i/ħ)·(E0·t)

Note that I finally added the multiplication dot (·) because we’re talking proper amplitudes now and, therefore, we’ve got a proper product too: we multiply one complex number with another. We can now take the absolute square of both to find the probability for the molecule to be in state 1 or in state 2:

  • |〈 1 | ψ 〉|= |a·e(i/ħ)·(E0·t)|a
  • |〈 2 | ψ 〉|= |a·e(i/ħ)·(E0·t)|a

Unsurprisingly, we find the same thing: these probabilities have to add up to 1, so a+ a= 1 and, therefore, the probability to be in state 1 or state 2 is 0.5. So the notation and the logic behind makes perfect sense. But let me get back to the lesson now.

The point is: the true meaning of a ‘stationary’ state here, is that we have non-fluctuating probabilities. So they are and remain equal to some constant, i.e. 1/2 in this case. This implies that the state of the molecule does not change: there is no way to go from state 1 to state 2 and vice versa. Indeed, if we know the molecule is in state 1, it will stay in that state. [Think about what normalization of probabilities means when we’re looking at one state only.]

You should note that these non-varying probabilities are related to the fact that the amplitudes have a non-varying magnitude. The phase of these amplitudes varies in time, of course, but their magnitude is and remains aalways. The amplitude is not being ‘enveloped’ by another curve, so to speak.

OK. That should be clear enough. Sorry I spent so much time on this, but this stuff on ‘stationary’ states comes back again and again and so I just wanted to clear that up as much as I can. Let’s get back to the story.

So we know that, what we’re describing above, is not what ammonia does really. As Feynman puts it: “The equations [i.e. the Cand Cequations above] don’t tell us what what ammonia really does. It turns out that it is possible for the nitrogen to push its way through the three hydrogens and flip to the other side. It is quite difficult; to get half-way through requires a lot of energy. How can it get through if it hasn’t got enough energy? There is some amplitude that it will penetrate the energy barrier. It is possible in quantum mechanics to sneak quickly across a region which is illegal energetically. There is, therefore, some [small] amplitude that a molecule which starts in |1〉 will get to the state |2. The coefficients H12 and H21 are not really zero.”

He adds: “Again, by symmetry, they should both be the same—at least in magnitude. In fact, we already know that, in general, Hij must be equal to the complex conjugate of Hji.”

His next step, then, is to interpreted as either a stroke of genius or, else, as unexplained. 🙂 He invokes the symmetry of the situation to boldly state that H12 is some real negative number, which he denotes as −A, which – because it’s a real number (so the imaginary part is zero) – must be equal to its complex conjugate H21. So then Feynman does this fantastic jump in logic. First, he keeps using the E0 value for H11 and H22, motivating that as follows: “If nature is at all reasonable, the matrix elements H11 and H22 must be equal, and we’ll call them both E0, because they correspond to the energy the states would have if H11 and H22 were zero.” Second, he uses that minus A value for H12 and H21. In short, the two equations and six functions are now reduced to:


Solving these equations is rather boring. Feynman does it as follows:


Now, what does these equations actually mean? It depends on those a and b coefficients. Looking at the solutions, the most obvious question to ask is: what if a or b are zero? If b is zero, then the second terms in both equations is zero, and so C1 and C2 are exactly the same: two amplitudes with the same temporal frequency ω = (E− A)/ħ. If a is zero, then C1 and C2 are the same too, but with opposite sign: two amplitudes with the same temporal frequency ω = (E+ A)/ħ. Squaring them – in both cases (i.e. for a = 0 or b = 0) – yields, once again, an equal and constant probability for the spin of the ammonia molecule to in the ‘up’ or ‘down’ or ‘down’. To be precise, we We can now take the absolute square of both to find the probability for the molecule to be in state 1 or in state 2:

  • For b = 0: |〈 1 | ψ 〉|= |(a/2)·e(i/ħ)·(E− A)·t|a2/4 = |〈 2 | ψ 〉|
  • For a = 0: |〈 1 | ψ 〉|=|(b/2)·e(i/ħ)·(E+ A)·t|= b2/4 = |〈 2 | ψ 〉|(the minus sign in front of b/2 is squared away)

So we get two stationary states now. Why two instead of one? Well… You need to use your imagination a bit here. They actually reflect each other: they’re the same as the one stationary state we found when assuming our nitrogen atom could not ‘flip’ from one position to the other. It’s just that the introduction of that possibility now results in a sort of ‘doublet’ of energy levels. But so we shouldn’t waste our time on this, as we want to analyze the general case, for which the probabilities to be in state 1 or state 2 do vary in time. So that’s when a and b are non-zero.

To analyze it all, we may want to start with equating t to zero. We then get:


This leads us to conclude that a = b = 1, so our equations for C1(t) and C2(t) can now be written as:


Remembering our rules for adding and subtracting complex conjugates (eiθ + e–iθ = 2cosθ and eiθ − e–iθ = 2sinθ), we can re-write this as:


Now these amplitudes are much more interesting. Their temporal variation is defined by Ebut, on top of that, we have an envelope here: the cos(A·t/ħ) and sin(A·t/ħ) factor respectively. So their magnitude is no longer time-independent: both the phase as well as the amplitude now vary with time. What’s going on here becomes quite obvious when calculating and plotting the associated probabilities, which are

  • |C1(t)|= cos2(A·t/ħ), and
  • |C2(t)|= sin2(A·t/ħ)

respectively (note that the absolute square of i is equal to 1, not −1). The graph of these functions is depicted below.


As Feynman puts it: “The probability sloshes back and forth.” Indeed, the way to think about this is that, if our ammonia molecule is in state 1, then it will not stay in that state. In fact, one can be sure the nitrogen atom is going to flip at some point in time, with the probabilities being defined by that fluctuating probability density function above. Indeed, as time goes by, the probability to be in state 2 increases, until it will effectively be in state 2. And then the cycle reverses.

Our | ψ 〉 = | 1 〉 C1 + | 2 〉 Cequation is a lot more interesting now, as we do have a proper mix of pure states now: we never really know in what state our molecule will be, as we have these ‘oscillating’ probabilities now, which we should interpret carefully.

The point to note is that the a = 0 and b = 0 solutions came with precise temporal frequencies: (E− A)/ħ and (E0 + A)/ħ respectively, which correspond to two separate energy levels: E− A and E0 + A respectively, with |A| = H12 = H21. So everything is related to everything once again: allowing the nitrogen atom to push its way through the three hydrogens, so as to flip to the other side, thereby breaking the energy barrier, is equivalent to associating two energy levels to the ammonia molecule as a whole, thereby introducing some uncertainty, or indefiniteness as to its energy, and that, in turn, gives us the amplitudes and probabilities that we’ve just calculated.

Note that the probabilities “sloshing back and forth”, or “dumping into each other” – as Feynman puts it – is the result of the varying magnitudes of our amplitudes, going up and down and, therefore, their absolute square varies too.

So… Well… That’s it as an introduction to a two-state system. There’s more to come. Ammonia is used in the ammonia maser. Now that is something that’s interesting to analyze—both from a classical as well as from a quantum-mechanical perspective. Feynman devotes a full chapter to it, so I’d say… Well… Have a look. 🙂

Post scriptum: I must assume this analysis of the NH3 molecule, with the nitrogen ‘flipping’ across the hydrogens, triggers a lot of questions, so let me try to answer some. Let me first insert the illustration once more, so you don’t have to scroll up:


The first thing that you should note is that the ‘flip’ involves a change in the center of mass position. So that requires energy, which is why we associate two different energy levels with the molecule: E+ A and E− A. However, as mentioned above, we don’t care about the nitty-gritty here: the energy barrier is likely to combine a number of factors, including electrostatic forces, as evidenced by the flip in the electric dipole moment, which is what the μ symbol here represents! Just note that the two energy levels are separated by an amount that’s equal to 2·A, rather than A and that, once again, it becomes obvious now why Feynman would prefer the Hamiltonian to be called the ‘energy matrix’, as its coefficients do represent specific energy levels, or differences between them! Now, that assumption yielded the following wavefunctions for C= 〈 1 | ψ 〉 and C= 〈 2 | ψ 〉:

  • C= 〈 1 | ψ 〉 = (1/2)·e(i/ħ)·(E− A)·t + (1/2)·e(i/ħ)·(E+ A)·t
  • C= 〈 2 | ψ 〉 = (1/2)·e(i/ħ)·(E− A)·t – (1/2)·e(i/ħ)·(E+ A)·t

Both are composite waves. To be precise, they are the sum of two component waves with a temporal frequency equal to ω= (E− A)/ħ and ω= (E+ A)/ħ respectively. [As for the minus sign in front of the second term in the wave equation for C2, −1 = e±iπ, so + (1/2)·e(i/ħ)·(E+ A)·t and – (1/2)·e(i/ħ)·(E+ A)·t are the same wavefunction: they only differ because their relative phase is shifted by ±π.]

Now, writing things this way, rather than in terms of probabilities, makes it clear that the two base states of the molecule themselves are associated with two different energy levels, so it is not like one state has more energy than the other. It’s just that the possibility of going from one state to the other requires an uncertainty about the energy, which is reflected by the energy doublet  E± A in the wavefunction of the base states. Now, if the wavefunction of the base states incorporates that energy doublet, then it is obvious that the state of the ammonia molecule, at any point in time, will also incorporate that energy doublet.

This triggers the following remark: what’s the uncertainty really? Is it an uncertainty in the energy, or is it an uncertainty in the wavefunction? I mean: we have a function relating the energy to a frequency. Introducing some uncertainty about the energy is mathematically equivalent to introducing uncertainty about the frequency. Think of it: two energy levels implies two frequencies, and vice versa. More in general, introducing n energy levels, or some continuous range of energy levels ΔE, amounts to saying that our wave function doesn’t have a specific frequency: it now has n frequencies, or a range of frequencies Δω = ΔE/ħ. Of course, the answer is: the uncertainty is in both, so it’s in the frequency and in the energy and both are related through the wavefunction. So… In a way, we’re chasing our own tail.

Having said that, the energy may be uncertain, but it is real. It’s there, as evidenced by the fact that the ammonia molecule behaves like an atomic oscillator: we can excite it in exactly the same way as we can excite an electron inside an atom, i.e. by shining light on it. The only difference is the photon energies: to cause a transition in an atom, we use photons in the optical or ultraviolet range, and they give us the same radiation back. To cause a transition in an ammonia molecule, we only need photons with energies in the microwave range. Here, I should quickly remind you of the frequencies and energies involved. visible light is radiation in the 400–800 terahertz range and, using the E = h·f equation, we can calculate the associated energies of a photon as 1.6 to 3.2 eV. Microwave radiation – as produced in your microwave oven – is typically in the range of 1 to 2.5 gigahertz, and the associated photon energy is 4 to 10 millionths of an eV. Having illustrated the difference in terms of the energies involved, I should add that masers and lasers are based on the same physical principle: LASER and MASER stand for Light/Micro-wave Amplification by Stimulated Emission of Radiation, respectively.

So… How shall I phrase this? There’s uncertainty, but the way we are modeling that uncertainty matters. So yes, the uncertainty in the frequency of our wavefunction and the uncertainty in the energy are mathematically equivalent, but the wavefunction has a meaning that goes much beyond that. [You may want to reflect on that yourself.]

Finally, another question you may have is why would Feynman take minus A (i.e. −A) for H12 and H21. Frankly, my first thought on this was that it should have something to do with the original equation for these Hamiltonian coefficients, which also has a minus sign: Uij(t + Δt, t) = δij + Kij(t)·Δt = δij − (i/ħ)·Hij(t)·Δt. For i ≠ j, this reduces to:

Uij(t + Δt, t) = + Kij(t)·Δt = − (i/ħ)·Hij(t)·Δt

However, the answer is: it really doesn’t matter. One could write: H12 and H21 = +A, and we’d find the same equations. We’d just switch the indices 1 and 2, and the coefficients a and b. But we get the same solutions. You can figure that out yourself. Have fun with it !

Oh ! And please do let me know if some of the stuff above would trigger other questions. I am not sure if I’ll be able to answer them, but I’ll surely try, and good question always help to ensure we sort of ‘get’ this stuff in a more intuitive way. Indeed, when everything is said and done, the goal of this blog is not simply re-produce stuff, but to truly ‘get’ it, as good as we can. 🙂

Traveling fields: the wave equation and its solutions

We’ve climbed a big mountain over the past few weeks, post by post, 🙂 slowly gaining height, and carefully checking out the various routes to the top. But we are there now: we finally fully understand how Maxwell’s equations actually work. Let me jot them down once more:

Maxwell's equations

As for how real or unreal the E and B fields are, I gave you Feynman’s answer to it, so… Well… I can’t add to that. I should just note, or remind you, that we have a fully equivalent description of it all in terms of the electric and magnetic (vector) potential Φ and A, and so we can ask the same question about Φ and A. They explain real stuff, so they’re real in that sense. That’s what Feynman’s answer amounts to, and I am happy with it. 🙂

What I want to do here is show how we can get from those equations to some kind of wave equation: an equation that describes how a field actually travels through space. So… Well… Let’s first look at that very particular wave function we used in the previous post to prove that electromagnetic waves propagate with speed c, i.e. the speed of light. The fields were very simple: the electric field had a y-component only, and the magnetic field a z-component only. Their magnitudes, i.e. their magnitude where the field had reached, as it fills the space traveling outwards, were given in terms of J, i.e. the surface current density going in the positive y-direction, and the geometry of the situation is illustrated below.


sheet of charge

The fields were, obviously, zero where the fields had not reached as they were traveling outwards. And, yes, I know that sounds stupid. But… Well… It’s just to make clear what we’re looking at here. 🙂

We also showed how the wave would look like if we would turn off its First Cause after some time T, so if the moving sheet of charge would no longer move after time T. We’d have the following pulse traveling through space, a rectangular shape really:

wavefrontWe can imagine more complicated shapes for the pulse, like the shape shown below. J goes from one unit to two units at time t = t1 and then to zero at t = t2. Now, the illustration on the right shows the electric field as a function of x at the time t shown by the arrow. We’ve seen this before when discussing waves: if the speed of travel of the wave is equal to c, then x is equal to x = c·t, and the pattern is as shown below indeed: it mirrors what happened at the source x/c seconds ago. So we write:

equation 2


This idea of using the retarded time t’ = tx/c in the argument of a wave function f – or, what amounts to the same, using x − c/t – is key to understanding wave functions. I’ve explained this in very simple language in a post for my kids and, if you don’t get this, I recommend you check it out. What we’re doing, basically, is converting something expressed in time units into something expressed in distance units, or vice versa, using the velocity of the wave as the scale factor, so time and distance are both expressed in the same unit, which may be seconds, or meter.

To see how it works, suppose we add some time Δt to the argument of our wave function f, so we’re looking at f[x−c(t+Δt)] now, instead of f(x−ct). Now, f[x−c(t+Δt)] = f(x−ct−cΔt), so we’ll get a different value for our function—obviously! But it’s easy to see that we can restore our wave function F to its former value by also adding some distance Δx = cΔt to the argument. Indeed, if we do so, we get f[x+Δx−c(t+Δt)] = f(x+cΔt–ct−cΔt) = f(x–ct). You’ll say: t − x/c is not the same as x–ct. It is and it isn’t: any function of x–ct is also a function of t − x/c, because we can write:


Here, I need to add something about the direction of travel. The pulse above travel in the positive x-direction, so that’s why we have x minus ct in the argument. For a wave traveling in the negative x-direction, we’ll have a wave function y = F(x+ct). In any case, I can’t dwell on this, so let me move on.

Now, Maxwell’s equations in free or empty space, where are there no charges nor currents to interact with, reduce to:

Maxwell in free space

Now, how can we relate this set of complicated equations to a simple wave function? Let’s do the exercise for our simple Ey and Bz wave. Let’s start by writing out the first equation, i.e. ·E = 0, so we get:


Now, our wave does not vary in the y and z direction, so none of the components, including Ey and Edepend on y or z. It only varies in the x-direction, so ∂Ey/∂y and ∂Ez/∂z are zero. Note that the cross-derivatives ∂Ey/∂z and ∂Ez/∂y are also zero: we’re talking a plane wave here, the field varies only with x. However, because ·E = 0, ∂Ex/∂x must be zero and, hence, Ex must be zero.

Huh? What? How is that possible? You just said that our field does vary in the x-direction! And now you’re saying it doesn’t it? Read carefully. I know it’s complicated business, but it all makes sense. Look at the function: we’re talking Ey, not Ex. Ey does vary as a function of x, but our field does not have an x-component, so Ex = 0. We have no cross-derivative ∂Ey/∂x in the divergence of E (i.e. in ·E = 0).

Huh? What? Let me put it differently. E has three components: Ex, Ey and Ez, and we have three space coordinates: x, y and z, so we have nine cross-derivatives. What I am saying is that all derivatives with respect to y and z are zero. That still leaves us with three derivatives: ∂Ex/∂x, ∂Ey/∂x, and ∂Ey/∂x. So… Because all derivatives in respect to y and z are zero, and because of the ·E = 0 equation, we know that ∂Ex/∂x must be zero. So, to make a long story short, I did not say anything about ∂Ey/∂x or ∂Ez/∂x. These may still be whatever they want to be, and they may vary in more or in less complicated ways. I’ll give an example of that in a moment.

Having said that, I do agree that I was a bit quick in writing that, because ∂Ex/∂x = 0, Ex must be zero too. Looking at the math only, Ex is not necessarily zero: it might be some non-zero constant. So… Yes. That’s a mathematical possibility. The static field from some charged condenser plate would be an example of a constant Ex field. However, the point is that we’re not looking at such static fields here: we’re talking dynamics here, and we’re looking at a particular type of wave: we’re talking a so-called plane wave. Now, the wave front of a plane wave is… Well… A plane. 🙂 So Ex is zero indeed. It’s a general result for plane waves: the electric field of a plane wave will always be at right angles to the direction of propagation.

Hmm… I can feel your skepticism here. You’ll say I am arbitrarily restricting the field of analysis… Well… Yes. For the moment. It’s not a reasonable restriction though. As I mentioned above, the field of a plane wave may still vary in both the y- and z-directions, as shown in the illustration below (for which the credit goes to Wikipedia), which visualizes the electric field of circularly polarized light. In any case, don’t worry too much about. Let’s get back to the analysis. Just note we’re talking plane waves here. We’ll talk about non-plane waves i.e. incoherent light waves later. 🙂

circular polarization

So we have plane waves and, therefore, a so-called transverse E field which we can resolve in two components: Eand Ez. However, we wanted to study a very simply Efield only. Why? Remember the objective of this lesson: it’s just to show how we go from Maxwell’s equations to the wave function, and so let’s keep the analysis simple as we can for now: we can make it more general later. In fact, if we do the analysis now for non-zero Eand zero Ez, we can do a similar analysis for non-zero Eand zero Ey, and the general solution is going to be some superposition of two such fields, so we’ll have a non-zero Eand Ez. Capito? 🙂 So let me write out Maxwell’s second equation, and use the results we got above, so I’ll incorporate the zero values for the derivatives with respect to y and z, and also the assumption that Ez is zero. So we get:

f3[By the way: note that, out of the nine derivatives, the curl involves only the (six) cross-derivatives. That’s linked to the neat separation between the curl and the divergence operator. Math is great! :-)]

Now, because of the flux rule (×E = –∂B/∂t), we can (and should) equate the three components of ×E above with the three components of –∂B/∂t, so we get:


[In case you wonder what it is that I am trying to do, patience, please! We’ll get where we want to get. Just hang in there and read on.] Now, ∂Bx/∂t = 0 and ∂By/∂t = 0 do not necessarily imply that Bx and Bare zero: there might be some magnets and, hence, we may have some constant static field. However, that’s a matter of choosing a reference point or, more simply, assuming that empty space is effectively empty, and so we don’t have magnets lying around and so we assume that Bx and Bare effectively zero. [Again, we can always throw more stuff in when our analysis is finished, but let’s keep it simple and stupid right now, especially because the Bx = B= 0 is entirely in line with the Ex = E= 0 assumption.]

The equations above tell us what we know already: the E and B fields are at right angles to each other. However, note, once again, that this is a more general result for all plane electromagnetic waves, so it’s not only that very special caterpillar or butterfly field that we’re looking at it. [If you didn’t read my previous post, you won’t get the pun, but don’t worry about it. You need to understand the equations, not the silly jokes.]

OK. We’re almost there. Now we need Maxwell’s last equation. When we write it out, we get the following monstrously looking set of equations:


However, because of all of the equations involving zeroes above 🙂 only ∂Bz/∂x is not equal to zero, so the whole set reduced to only simple equation only:


Simplifying assumptions are great, aren’t they? 🙂 Having said that, it’s easy to be confused. You should watch out for the denominators: a ∂x and a ∂t are two very different things. So we have two equations now involving first-order derivatives:

  1. ∂Bz/∂t = −∂Ey/∂x
  2. c2∂Bz/∂x = −∂Ey/∂t

So what? Patience, please! 🙂 Let’s differentiate the first equation with respect to x and the second with respect to t. Why? Because… Well… You’ll see. Don’t complain. It’s simple. Just do it. We get:

  1. ∂[∂Bz/∂t]/∂x = −∂2Ey/∂x2
  2. ∂[−c2∂Bz/∂x]/∂t = −∂2Ey/∂x2

So we can equate the left-hand sides of our two equations now, and what we get is a differential equation of the second order that we’ve encountered already, when we were studying wave equations. In fact, it is the wave equation for one-dimensional waves:

f7In case you want to double-check, I did a few posts on this, but, if you don’t get this, well… I am sorry. You’ll need to do some homework. More in particular, you’ll need to do some homework on differential equations. The equation above is basically some constraint on the functional form of Ey. More in general, if we see an equation like:


then the function ψ(x, t) must be some function


So any function ψ like that will work. You can check it out by doing the necessary derivatives and plug them into the wave equation. [In case you wonder how you should go about this, Feynman actually does it for you in his Lecture on this topic, so you may want to check it there.]

In fact, the functions f(x − c/t) and g(x + c/t) themselves will also work as possible solutions. So we can drop one or the other, which amounts to saying that our ‘shape’ has to travel in some direction, rather than in both at the same time. 🙂 Indeed, from all of my explanations above, you know what f(x − c/t) represents: it’s a wave that travels in the positive x-direction. Now, it may be periodic, but it doesn’t have to be periodic. The f(x − c/t) function could represent any constant ‘shape’ that’s traveling in the positive x-direction at speed c. Likewise, the g(x + c/t) function could represent any constant ‘shape’ that’s traveling in the negative x-direction at speed c. As for super-imposing both…

Well… I suggest you check that post I wrote for my son, Vincent. It’s on the math of waves, but it doesn’t have derivatives and/or differential equations. It just explains how superimposition and all that works. It’s not very abstract, as it revolves around a vibrating guitar string. So, if you have trouble with all of the above, you may want to read that first. 🙂 The bottom line is that we can get any wavefunction we want by superimposing simple sinusoidals that are traveling in one or the other direction, and so that’s what’s the more general solution really says. Full stop. So that’s what’s we’re doing really: we add very simple waves to get very more complicated waveforms. 🙂

Now, I could leave it at this, but then it’s very easy to just go one step further, and that is to assume that Eand, therefore, Bare not zero. It’s just a matter of super-imposing solutions. Let me just give you the general solution. Just look at it for a while. If you understood all that I’ve said above, 20 seconds or so should be sufficient to say: “Yes, that makes sense. That’s the solution in two dimensions.” At least, I hope so! 🙂

General solution two dimensions

OK. I should really stop now. But… Well… Now that we’ve got a general solution for all plane waves, why not be even bolder and think about what we could possibly say about three-dimensional waves? So then Eand, therefore, Bwould not necessarily be zero either. After all, light can behave that way. In fact, light is likely to be non-polarized and, hence, Eand, therefore, Bare most probably not equal to zero!

Now, you may think the analysis is going to be terribly complicated. And you’re right. It would be if we’d stick to our analysis in terms of x, y and z coordinates. However, it turns out that the analysis in terms of vector equations is actually quite straightforward. I’ll just copy the Master here, so you can see His Greatness. 🙂

waves in three dimensions

But what solution does an equation like (20.27) have? We can appreciate it’s actually three equations, i.e. one for each component, and so… Well… Hmm… What can we say about that? I’ll quote the Master on this too:

“How shall we find the general wave solution? The answer is that all the solutions of the three-dimensional wave equation can be represented as a superposition of the one-dimensional solutions we have already found. We obtained the equation for waves which move in the x-direction by supposing that the field did not depend on y and z. Obviously, there are other solutions in which the fields do not depend on x and z, representing waves going in the y-direction. Then there are solutions which do not depend on x and y, representing waves travelling in the z-direction. Or in general, since we have written our equations in vector form, the three-dimensional wave equation can have solutions which are plane waves moving in any direction at all. Again, since the equations are linear, we may have simultaneously as many plane waves as we wish, travelling in as many different directions. Thus the most general solution of the three-dimensional wave equation is a superposition of all sorts of plane waves moving in all sorts of directions.”

It’s the same thing once more: we add very simple waves to get very more complicated waveforms. 🙂

You must have fallen asleep by now or, else, be watching something else. Feynman must have felt the same. After explaining all of the nitty-gritty above, Feynman wakes up his students. He does so by appealing to their imagination:

“Try to imagine what the electric and magnetic fields look like at present in the space in this lecture room. First of all, there is a steady magnetic field; it comes from the currents in the interior of the earth—that is, the earth’s steady magnetic field. Then there are some irregular, nearly static electric fields produced perhaps by electric charges generated by friction as various people move about in their chairs and rub their coat sleeves against the chair arms. Then there are other magnetic fields produced by oscillating currents in the electrical wiring—fields which vary at a frequency of 6060 cycles per second, in synchronism with the generator at Boulder Dam. But more interesting are the electric and magnetic fields varying at much higher frequencies. For instance, as light travels from window to floor and wall to wall, there are little wiggles of the electric and magnetic fields moving along at 186,000 miles per second. Then there are also infrared waves travelling from the warm foreheads to the cold blackboard. And we have forgotten the ultraviolet light, the x-rays, and the radiowaves travelling through the room.

Flying across the room are electromagnetic waves which carry music of a jazz band. There are waves modulated by a series of impulses representing pictures of events going on in other parts of the world, or of imaginary aspirins dissolving in imaginary stomachs. To demonstrate the reality of these waves it is only necessary to turn on electronic equipment that converts these waves into pictures and sounds.

If we go into further detail to analyze even the smallest wiggles, there are tiny electromagnetic waves that have come into the room from enormous distances. There are now tiny oscillations of the electric field, whose crests are separated by a distance of one foot, that have come from millions of miles away, transmitted to the earth from the Mariner II space craft which has just passed Venus. Its signals carry summaries of information it has picked up about the planets (information obtained from electromagnetic waves that travelled from the planet to the space craft).

There are very tiny wiggles of the electric and magnetic fields that are waves which originated billions of light years away—from galaxies in the remotest corners of the universe. That this is true has been found by “filling the room with wires”—by building antennas as large as this room. Such radiowaves have been detected from places in space beyond the range of the greatest optical telescopes. Even they, the optical telescopes, are simply gatherers of electromagnetic waves. What we call the stars are only inferences, inferences drawn from the only physical reality we have yet gotten from them—from a careful study of the unendingly complex undulations of the electric and magnetic fields reaching us on earth.

There is, of course, more: the fields produced by lightning miles away, the fields of the charged cosmic ray particles as they zip through the room, and more, and more. What a complicated thing is the electric field in the space around you! Yet it always satisfies the three-dimensional wave equation.”

So… Well… That’s it for today, folks. 🙂 We have some more gymnastics to do, still… But we’re really there. Or here, I should say: on top of the peak. What a view we have here! Isn’t it beautiful? It took us quite some effort to get on top of this thing, and we’re still trying to catch our breath as we struggle with what we’ve learned so far, but it’s really worthwhile, isn’t it? 🙂

Differential equations revisited: the math behind oscillators

When wrapping up my previous post, I said that I might be tempted to write something about how to solve these differential equations. The math behind them is pretty essential indeed. So let’s revisit the oscillator from a formal-mathematical point of view.

Modeling the problem

The simplest equation we used was the one for a hypothetical ‘ideal’ oscillator without friction and without any external driving force. The equation for a mechanical oscillator (i.e. a mass on a spring) is md2x/dt2 = –kx. The k in this equation is a factor of proportionality: the force pulling back is assumed to be proportional to the amount of stretch, and the minus sign is there because the force is pulling back indeed. As for the equation itself, it’s just Newton’s Law: the mass times the acceleration equals the force: ma = F.

You’ll remember we preferred to write this as d2x/dt2 = –(k/m)x = –ω02x with ω0= k/m. You’ll also remember that ωis an angular frequency, which we referred to as the natural frequency of the oscillator (because it determines the natural motion of the spring indeed). We also gave the general solution to the differential equation: x(t) = x0cos(ω0t + Δ). That solution basically states that, if we just let go of that spring, it will oscillate with frequency ω0 and some (maximum) amplitude x0, the value of which depends on the initial conditions. As for the Δ term, that’s just a phase shift depending on where x is when we start counting time: if x would happen to pass through the equilibrium point at time t = 0, then Δ would be π/2. So Δ allows us to shift the beginning of time, so to speak.

In my previous posts, I just presented that general equation as a fait accompli, noting that a cosine (or sine) function does indeed have that ‘nice’ property of come back to itself with a minus sign in front after taking the derivative two times: d2[cos(ω0t)]/dt2 = –ω02cos(ω0t). We could also write x(t) as a sine function because the sine and cosine function are basically the same except for a phase shift: x0cos(ω0t + Δ) = x0sin(ω0t + Δ + π/2).

Now, the point to note is that the sine or cosine function actually has two properties that are ‘nice’ (read ‘essential’ in the context of this discussion):

  1. Sinusoidal functions are periodic functions and so that’s why they represent an oscillation–because that’s something periodic too!
  2. Sinusoidal functions come back to themselves when we derive them two times and so that’s why it effectively solves our second-order differential equation.

However, in my previous post, I also mentioned in passing that sinusoidal functions share that second property with exponential functions: d2et/dt= d[det/dt]/dt = det/dt = et. So, if it we would not have had that minus sign in our differential equation, our solution would have been some exponential function, instead of a sine or a cosine function. So what’s going on here?

Solving differential equations using exponentials

Let’s scrap that minus sign and assume our problem would indeed be to solve the d2x/dt2 = ω02x equation. So we know we should use some exponential function, but we have that coefficient ω02. Well… That’s actually easy to deal with: we know that, when deriving an exponential function, we should bring the exponent down as a coefficient: d[eω0t]/dt = ω0eω0t. If we do it two times, we get d2[eω0t]/dt2 = ω02eω0t, so we can immediately see that eω0is a solution indeed.

But it’s not the only one: e–ω0t is a solution too: d2[e–ω0t]/dt2 = (–ω0)(–ω0)e–ω0t = ω02e–ω0t. So e–ω0solves the equation too. It is easy to see why: ω02 has two square roots–one positive, and one negative.

But we have more: in fact, every linear combination c1eω0+ c2e–ω0is also a solution to that second-order differential equation. Just check it by writing it all out: you’ll find that d2[c1eω0+ c2e–ω0t]/dt2 = ω02[c1eω0+c2e–ω0t] and so, yes, we have a whole family of functions here, that are all solutions to our differential equation.

Now, you may or may not remember that we had the same thing with first-order differential equations: we would find a whole family of functions, but only one would be the actual solution or the ‘real‘ solution I should say. So what’s the real solution here?

Well… That depends on the initial conditions: we need to know the value of x at time t= 0 (or some other point t = t1). And that’s not enough: we have two coefficients (cand c2), and, therefore, we need one more initial condition (it takes two equations to solve for two variables). That could be another value for x at some other point in time (e.g. t2) but, when solving problems like this, you’ll usually get the other ‘initial condition’ expressed in terms of the first derivative, so that’s in terms of dx/dt = v. For example, it is not illogical to assume that the initial velocity v0 would be zero. Indeed, we can imagine we pull or push the spring and then let it go. In fact, that’s what we’ve been assuming here all along in our example! Assuming that v0 = 0 is equivalent to writing that

d[c1eω0+ c2e–ω0t]/dt = 0 for t = 0

⇒ ω0c1 – ω0c2 = 0 (e= 1) ⇔  c1 = c2

Now we need the other initial condition. Let’s assume the initial value of x is equal to x0 = 2 (it’s just an example: we could take any value, including negative values). Then we get:

c1eω0+ c2e–ω0t = 2 for t = 0 ⇔ c1 + c= 2 (again, note that e= 1)

Combining the two gives us the grand result that c1 = c= 1 and, hence, the ‘real’ or actual solution is x = eω0e–ω0t. The graph below plots that function for ω= 1 and ω= 0.5 respectively. We could take other values for ω0 but, whatever the value, we’ll always get an exponential function like the ones below. It basically graphs what we expect to happen: the mass just accelerates away from its equilibrium point. Indeed, the differential equation is just a description of an accelerating object. Indeed, the e–ω0t term quickly goes to zero, and then it’s the eω0term that rockets that object sky-high – literally. [Note that the acceleration is actually not constant: the force is equal to kx and, hence, the force (and, therefore, the acceleration) actually increases as the mass goes further and further away from its equilibrium point. Also note that if the initial position would have been minus 2, i.e. x= –2, then the object would accelerate away in the other direction, i.e. downwards. Just check it to make sure you understand the equations.]

graph 2 graph

The point to note is our general solution. More formally, and more generally, we get it as follows:

  • If we have a linear second-order differential equation ax” + bx’ + cx = 0 (because of the zero on the right-hand side, we call such equation homogeneous, so it’s quite a mouthful: a linear and homogeneous DE of the second order), then we can find an exponential function ert that will be a solution for it.
  • If such function is a solution, then plugging in it yields ar2ert + brert + cert = 0 or (ar2 + br + c)ert = 0.
  • Now, we can read that as a condition, and the condition amounts to ar2 + br + c = 0. So that’s a quadratic equation we need to solve for r to find two specific solutions r1 and r2, which, in turn, will then yield our general solution:

 x(t) = c1er1+ c2er2t

Note that the general solution is based on the principle of superposition: any linear combination of two specific solutions will be a solution as well. I am mentioning this here because we’ll use that principle more than once.

Complex roots

The steps as described above implicitly assume that the quadratic equation above (i.e. ar2ert + brert + cert = 0), which is better known as the characteristic equation, does yield two real and distinct roots r1 and r2. In fact, it amounts to assuming that that exponential ert is a real-valued exponential function. We know how to find these real roots from our high school math classes: r = (–b ± [b– 4ac]1/2)/2a. However, what happens if the discrimant b– 4ac is negative?

If the disciminant is negative, we will still have two roots, but they will be complex roots. In fact, we can write these two complex roots as r = α ± βi, with i the imaginary unit. Hence, the two complex roots are each other’s complex conjugate and our er1and er2t can be written as:

er1= e(α+βi)t and er2e(α–βi)t

Also, the general solution based on these two particular solutions will be c1e(α+βi)t + c2e(α–βi)t.

[You may wonder why complex roots have to be complex conjugates from each other. Indeed, that’s not so obvious from the raw r = (–b ± [b– 4ac]1/2)/2a formula. But you can re-write it as r = –b/2a ± [b– 4ac]1/2)/2a and, if b– 4ac is negative, as r = –b/2a ± [(−b2+4ac)1/2/2a]. So that gives you the α and β and shows that the two roots are, in effect, each other’s complex conjugate.]

We should briefly pause here to think about what we are doing here really: if we allow r to be complex, then what we’re doing really is allow a complex-valued function (to be precise: we’re talking the complex exponential functions e(λ±μi)t, or any linear combination of the two) of a real variable (the time variable t) to be part of our ‘solution set’ as well.

Now, we’ve analyzed complex exponential functions before–long time ago: you can check out some of my posts last year (November 2013). In fact, we analyzed even more complex – in fact, I should say more complicated rather than more complex here: complex numbers don’t need to be complicated! 🙂 – because we were talking complex-valued functions of complex variables there! That’s not the case here: the argument t (i.e. the input into our function) is real, not complex, but the output – or the function itself – is complex-valued. Now, any complex exponential e(α+βi)t can be written as eαteiβt, and so that’s easy enough to understand:

1. The first factor (i.e. eαt) is just a real-valued exponential function and so we should be familiar with that. Depending on the value of α (negative or positive: see the graph below), it’s a factor that will create an envelope for our function. Indeed, when α is negative, the damping will cause the oscillation to stop after a while. When α is positive, we’ll have a solution resembling the second graph below: we have an amplitude that’s getting bigger and bigger, despite the friction factor (that’s obviously possible only because we keep reinforcing the movement, so we’re not switching off the force in that case). When α is equal to zero, then eαt is equal to unity and so the amplitude will not change as the spring goes up and down over time: we have no friction in that case.

graph 4


2. The second factor (i.e. eiβt) is our periodic function. Indeed, eiβt is the same as eiθ and so just remember Euler’s formula to see what it is really:

eiθ = cos(θ) + isin(θ)

The two graphs below represent the idea: as the phase θ = ωt + Δ (the angular frequency or velocity times the time is equal to the phase, plus or minus some phase shift) goes round and round and round (i.e. increases with time), the two components of eiθ, i.e. the real and imaginary part eiθ, oscillate between –1 and 1 because they are both sinusoidal functions (cosine and sine respectively). Now, we could amplify the amplitude by putting another (real) factor in front (a magnitude different than 1) and write reiθ = r·cos(θ) + r·sin(θ) but that wouldn’t change the nature of this thing.

euler13 slkL9

But so how does all of this relate to that other ‘general’ solution which we’ve found for our oscillator, i.e. the one we got without considering these complex-valued exponential functions as solutions. Indeed, what’s the relation between that x = x0cos(ω0t + Δ) equation and that rather frightening c1e(α+βi)t + c2e(α–βi)t equation? Perhaps we should look at x = x0cos(ω0t + Δ) as the real part of that monster? Yes and no. More no than yes actually. Actually… No. We are not going to have some complex exponential and then forget about the imaginary part. What we will do, though, is to find that general solution – i.e. a family of complex-valued functions – but then we’ll only consider those functions for which the imaginary part is zero, so that’s the subset of real-valued functions only.

I guess this must sound like Chinese. Let’s go step by step.

Using complex roots to find real-valued functions

If we re-write d2x/dt2 = –ω02x in the more general ax” + bx’ + cx = 0 form, then we get x” + ω02x = 0 and so the discriminant b– 4ac is equal to –4ω02, and so that’s a negative number. So we need to go for these complex roots. However, before solving this, let’s first restate what we’re actually doing. We have a differential equation that, ultimately, depends on a real variable (the time variable t), but so now we allow complex-valued functions er1e(α+βi)t and er2e(α–βi)t as solutions. To be precise: these are complex-valued functions x of the real variable t.

That being said, it’s fine to note that real numbers are a subset of the complex numbers and so we can just shrug our shoulders and say all that we’re doing is switch to complex-valued functions because we got stuck with that negative determinant and so we had to allow for complex roots. However, in the end, we do want a real-valued solution x(t). So our x(t) = c1e(α+βi)t + c2e(α–βi)t has to be a real-valued function, not a complex-valued function.

That means that we have to take a subset of the family of functions that we’ve found. In other words, the imaginary part of  c1e(α+βi)t + c2e(α–βi)t has to be zero. How can it be zero? Well… It basically means that c1e(α+βi)t and c2e(α–βi)t have to be complex conjugates.

OK… But how do we do that? We need to find a way to write that c1e(α+βi)t + c2e(α–βi)t sum in a more manageable ζ + η form. We can do that by using Euler’s formula once again to re-write those two complex exponentials as follows:

  • e(α+βi)t = eαteiβt = eαt[cos(βt) + isin(βt)]
  • e(α–βi)t = eαte–iβt = eαt[cos(–βt) + isin(–βt)] = eαt[cos(βt) – isin(βt)]

Note that, for the e(α–βi)t expression, we’ve used the fact that cos(–θ) = cos(θ) and that sin(–θ) = –sin(θ). Also note that α and β are real numbers, so they do not have an imaginary part–unlike cand c2, which may or may not have an imaginary part (i.e. they could be pure real numbers, but they could be complex as well).

We can then re-write that c1e(α+βi)t + c2e(α–βi)t sum as:

c1e(α+βi)t + c2e(α–βi)t = c1eαt[cos(βt) + isin(βt)] + c2eαt[cos(βt) – isin(βt)]

= (c1 + c2)eαtcos(βt) + (c1 – c2)ieαtsin(βt)

So what? Well, we want that imaginary part in our solution to disappear and so it’s easy to see that the imaginary part will indeed disappear if c1 – c2 = 0, i.e. if c1 = c= c. So we have a fairly general real-valued solution x(t) = 2c·eαtcos(βt) here, with c some real number. [Note that c has to be some real number because, if we would assume that cand c(and, therefore, c) would be equal complex numbers, then the c1 – c2 factor would also disappear, but then we would have a complex c1 + c2 sum in front of the eαtcos(βt) factor, so that would defeat the purpose of finding real-valued function as a solution because (c1 + c2)eαtcos(βt) would still be complex! […] Are you still with me? :-)]

So, OK, we’ve got the solution and so that should be it, isn’t it? Well… No. Wait. Not yet. Because these coefficients  c1 and c2 may be complex, there’s another solution as well. Look at that formula above. Let us suppose that c1 would be equal to some (real) number c divided by i (so c= c/i), and that cwould be its opposite, so c= –c(i.e. minus c1). Then we would have two complex numbers consisting of an imaginary part only: c= c/i and c= –c= –c/i, and they would be each other’s complex conjugate. Indeed, note that 1/i = i–1= –i and so we can write c= –c·and c= c·i. Then we’d get the following for that c1e(α+βi)t + c2e(α–βi)t sum:

 (c1 + c2)eαtcos(βt) + (c1 – c2)ieαtsin(βt)

= (c/i – c/i)eαtcos(βt) + (c/i + c/i)ieαtsin(βt) = 2c·eαtsin(βt)

So, while cand c2 are complex, our grand result is a real-valued function once again or – to be precise – another family of real-valued functions (that’s because c can take on any value).

Are we done? Yes. There are no other possibilities. So now we just need to remember to apply the principle of superposition: any (real) linear combination of 2c·eαtcos(μt) and 2c·eαtsin(μt) will also be a (real-valued) solution, so the general (real-valued) solution for our problem is:

x(t) = a·2c·eαtcos(βt) + b·2c·eαtsin(βt) = Aeαtcos(βt) + Beαtsin(βt)

eαt[Acos(βt) + Bsin(βt)]

So what do we have here? Well, the first factor is, once again, an ‘envelope’ function: depending on the value of α, (i) negative, (ii) positive or (iii) zero, we have an oscillation that (i) damps out, (ii) goes out of control, or (iii) keeps oscillating in the same steady way forever.

The second part is equivalent to our ‘general’ x(t) = x0cos(ω0t + Δ) solution. Indeed, that x(t) = x0cos(ω0t + Δ) solution is somewhat less ‘general’ than the one above because it does not have the eαt factor. However, x(t) = x0cos(ω0t + Δ) solution is equivalent to the Acos(βt) + Bsin(βt) factor. How’s that? We can show how they are related by using the trigonometric formula for adding angles: cos(α + β) = cos(α)cos(β) – sin(α)sin(β). Indeed, we can write:

x0cos(ω0t + Δ) = x0cos(Δ)cos(ω0t) – x0sin(Δ)sin(ω0t) = Acos(βt) + Bsin(βt)

with A = x0cos(Δ), B = – x0sin(Δ) and, finally, μ = ω0

Are you convinced now? If not… Well… Nothing much I can do, I feel. In that case, I can only encourage you to do a full ‘work-out’ by reading the excellent overview of all possible situations in Paul’s Online MathNotes (

Feynman’s treatment of second-order differential equations

Feynman takes a somewhat different approach in his Lectures. He solves them in a much more general way. At first, I thought his treatment was too confusing and, hence, I would not have mentioned it. However, I like the logic behind, even if his approach is somewhat more messy in terms of notations and all that. Let’s first look at the differential equation once again. Let’s take a system with a friction factor that’s proportional to the speed: Ff = –c·dx/dt. [See my previous post for some comments on that assumption: the assumption is, generally speaking, too much of a simplification but it makes for a ‘nice’ linear equation and so that’s why physicists present it that way.] To ease the math, c is usually written as c = mγ. Hence, γ = c/m is the friction per unit of mass. That makes sense, I’d think. In addition, we need to remember that ω02 = k/m, so k = mω02. Our differential equation then becomes m·d2x/dt2 = –γm·dx/dt – kx (mass times acceleration is the sum of the forces) or m·d2x/dt2 + γm·dx/dt + mω02·x = 0. Dividing the mass factor away gives us an even simpler form:

d2x/dt2 + γdx/dt + ω02x = 0

You’ll remember this differential equation from the previous post: we used it to calculate the (stored) energy and the Q of a mechanical oscillator. However, we didn’t show you how. You now understand why: the stuff above is not easy–the length of the arguments involved is why I am devoting an entire post to it!

Now, instead of assuming some exponential ert as a solution, real- or complex-valued, Feynman assumes a much more general complex-valued function as solution: he substitutes x for x = Aeiαt, with A a complex number as well so we can write A as A = A0eiΔ. That more general assumption allows for the inclusion of a phase shift straight from the start. Indeed, we can write x as x = A0eiΔeiαt = = A0ei(αt+Δ). Does that look complicated? It probably does, because we also have to remember that α is a complex number! So we’ve got a very general complex-valued exponential function indeed here!

However, let’s not get ahead of ourselves and follow Feynman. So he plugs in that complex-valued x = Aeiαt and we get:

(–α+ iγα + ω02)Aeiαt = 0

So far, so good. The logic now is more or less the same as the logic we developed above. We’ve got two factors here: (1) a quadratic equation –αiγα + ω02 (with one complex coefficient iγ) and (2) a complex exponential function Aeiαt. The second factor (Aeiαt) cannot be zero, because that’s x and we assume our oscillator is not standing still. So it’s the first factor (i.e. the quadratic equation in α with a complex coefficient iγ) which has to be zero. So we solve for the roots α and find

α = –iγ/(–2) ± [(–(iγ)2–4ω02)1/2/(-2)] = iγ/2 ± [(γ2–4ω02)1/2/(-2)]

= iγ/2 ± (ω0– γ2/4)1/2 iγ/2 ± ωγ

[We get this by bringing i and –2 inside of the square root expression. It’s not very straightforward but you should be able to figure it out.]

So that’s an interesting expression: the imaginary part of α is iγ/2 and its real part is (ω0– γ2/4)1/2, which we denoted as ωγ in the expression above. [Note that we assume there’s no problem with the square root expression: γ2/4 should be smaller than ω02 so ωγ is supposed to be some real positive number.] And so we’ve got the two solutions xand x2:

x= Aei(iγ/2 + ωγ)t =  Ae–γt/2+iωγ= Ae–γt/2eiωγ

x= Bei(iγ/2 – ωγ)t =  Be–γt/2–iωγ= Be–γt/2e–iωγt

Note, once again, that A and B can be any (complex) number and that, because of the principle of superposition, any linear combination of these two solutions will also be a solution. So the general solution is

x = Ae–γt/2eiωγ+ Be–γt/2e–iωγ= e–γt/2(Aeiωγ+ Be–iωγt) 

Now, we recognize the shape of this: a (real-valued) envelope function e–γt/2 and then a linear combination of two exponentials. But so we want something real-valued in the end so, once again, we need to impose the condition that Aeiωγand Be–iωγare complex conjugates of each other. Now, we can see that eiωγand e–iωγare complex conjugates but what does this say about A and B? Well… The complex conjugate of a product is the product of the complex conjugates of the factors involved: (z1z2)* = (z1*)(z1*). That implies that B has to be the complex conjugate of A: B = A*. So the final (real-valued) solution becomes:

x = e–γt/2(Aeiωγ+ A*e–iωγt) 

Now, I’ll leave it to you to prove that the second factor in the product above (Aeiωγ+ A*e–iωγt) is a real-valued function of the real variable t. It should be the same as x0cos(Δ)cos(ω0t) – x0sin(Δ)sin(ω0t), and that gives you a graph like the one below. However, I can readily imagine that, by now, you’re just thinking: Oh well… Whatever! 🙂


So the difference between Feynman’s approach and the one I presented above (which is the one you’ll find in most textbooks) is the assumption in terms of the specific solution: instead of substituting x for ert, with allowing r to take on complex values, Feynman substitutes x for Aeiαt, and allows both A and α  to take on complex values. It makes the calculations more complicated but, when everything is said and done, I think Feynman’s approach is more consistent because more encompassing. However, that’s subject to taste, and I gather, from comments on the Web, that many people think that this chapter in Feynman’s Lectures is not his best. So… Well… I’ll leave it to you to make the final judgment.

Note: The one critique that is relevant, in regard to Feynman’s treatment of the matter, is that he devotes quite a bit of time and space to explain how these oscillatory or periodic displacements can be viewed as being the real part of a complex exponential. Indeed, cos(ωt) is the real part of eiωt. But so that’s something different than (1) expanding the realm of possible solutions to a second-order differential equation from real-valued functions to complex-valued functions in order to (2) then, once we’ve found the general solution, consider only real-valued functions once again as ‘allowable’ solutions to that equation. I think that’s the gist of the matter really. It took me a while to fully ‘get’ this. I hope this post helps you to understand it somewhat quicker than I did. 🙂


I guess the only thing that I should do now is to work some examples. However, I’ll refer you Paul’s Online Math Notes for that once again (see the reference above). Indeed, it is about time I end my rather lengthy exposé (three posts on the same topic!) on oscillators and resonance. I hope you enjoyed it, although I can readily imagine that it’s hard to appreciate the math involved.

It is not easy indeed: I actually struggled with it, despite the fact that I think I understand complex analysis somewhat. However, the good thing is that, once we’re through it, we can really solve a lot of problems. As Feynman notes: “Linear (differential) equations are so important that perhaps fifty percent of the time we are solving linear equations in physics and engineering.” So, bearing in that mind, we should move on to the next.

A not so easy piece: introducing the wave equation (and the Schrödinger equation)

The title above refers to a previous post: An Easy Piece: Introducing the wave function.

Indeed, I may have been sloppy here and there – I hope not – and so that’s why it’s probably good to clarify that the wave function (usually represented as Ψ – the psi function) and the wave equation (Schrödinger’s equation, for example – but there are other types of wave equations as well) are two related but different concepts: wave equations are differential equations, and wave functions are their solutions.

Indeed, from a mathematical point of view, a differential equation (such as a wave equation) relates a function (such as a wave function) with its derivatives, and its solution is that function or – more generally – the set (or family) of functions that satisfies this equation. 

The function can be real-valued or complex-valued, and it can be a function involving only one variable (such as y = y(x), for example) or more (such as u = u(x, t) for example). In the first case, it’s a so-called ordinary differential equation. In the second case, the equation is referred to as a partial differential equation, even if there’s nothing ‘partial’ about it: it’s as ‘complete’ as an ordinary differential equation (the name just refers to the presence of partial derivatives in the equation). Hence, in an ordinary differential equation, we will have terms involving dy/dx and/or d2y/dx2, i.e. the first and second derivative of y respectively (and/or higher-order derivatives, depending on the degree of the differential equation), while in partial differential equations, we will see terms involving ∂u/∂t and/or ∂u2/∂x(and/or higher-order partial derivatives), with ∂ replacing d as a symbol for the derivative.

The independent variables could also be complex-valued but, in physics, they will usually be real variables (or scalars as real numbers are also being referred to – as opposed to vectors, which are nothing but two-, three- or more-dimensional numbers really). In physics, the independent variables will usually be x – or let’s use r = (x, y, z) for a change, i.e. the three-dimensional space vector – and the time variable t. An example is that wave function which we introduced in our ‘easy piece’.

Ψ(r, t) = Aei(p·r – Et)ħ

[If you read the Easy Piece, then you might object that this is not quite what I wrote there, and you are right: I wrote Ψ(r, t) = Aei(p/ħr – ωt). However, here I am just introducing the other de Broglie relation (i.e. the one relating energy and frequency): E = hf =ħω and, hence, ω = E/ħ. Just re-arrange a bit and you’ll see it’s the same.]

From a physics point of view, a differential equation represents a system subject to constraints, such as the energy conservation law (the sum of the potential and kinetic energy remains constant), and Newton’s law of course: F = d(mv)/dt. A differential equation will usually also be given with one or more initial conditions, such as the value of the function at point t = 0, i.e. the initial value of the function. To use Wikipedia’s definition: “Differential equations arise whenever a relation involving some continuously varying quantities (modeled by functions) and their rates of change in space and/or time (expressed as derivatives) is known or postulated.”

That sounds a bit more complicated, perhaps, but it means the same: once you have a good mathematical model of a physical problem, you will often end up with a differential equation representing the system you’re looking at, and then you can do all kinds of things, such as analyzing whether or not the actual system is in an equilibrium and, if not, whether it will tend to equilibrium or, if not, what the equilibrium conditions would be. But here I’ll refer to my previous posts on the topic of differential equations, because I don’t want to get into these details – as I don’t need them here.

The one thing I do need to introduce is an operator referred to as the gradient (it’s also known as the del operator, but I don’t like that word because it does not convey what it is). The gradient – denoted by ∇ – is a shorthand for the partial derivatives of our function u or Ψ with respect to space, so we write:

∇ = (∂/∂x, ∂/∂y, ∂/∂z)

You should note that, in physics, we apply the gradient only to the spatial variables, not to time. For the derivative in regard to time, we just write ∂u/∂t or ∂Ψ/∂t.

Of course, an operator means nothing until you apply it to a (real- or complex-valued) function, such as our u(x, t) or our Ψ(r, t):

∇u = ∂u/∂x and ∇Ψ = (∂Ψ/∂x, ∂Ψ/∂y, ∂Ψ/∂z)

As you can see, the gradient operator returns a vector with three components if we apply it to a real- or complex-valued function of r, and so we can do all kinds of funny things with it combining it with the scalar or vector product, or with both. Here I need to remind you that, in a vector space, we can multiply vectors using either (i) the scalar product, aka the dot product (because of the dot in its notation: ab) or (ii) the vector product, aka as the cross product (yes, because of the cross in its notation: b).

So we can define a whole range of new operators using the gradient and these two products, such as the divergence and the curl of a vector field. For example, if E is the electric field vector (I am using an italic bold-type E so you should not confuse E with the energy E, which is a scalar quantity), then div E = ∇•E, and curl E =∇×E. Taking the divergence of a vector will yield some number (so that’s a scalar), while taking the curl will yield another vector. 

I am mentioning these operators because you will often see them. A famous example is the set of equations known as Maxwell’s equations, which integrate all of the laws of electromagnetism and from which we can derive the electromagnetic wave equation:

(1) ∇•E = ρ/ε(Gauss’ law)

(2) ∇×E = –∂B/∂t (Faraday’s law)

(3) ∇•B = 0

(4) c2∇×B =  j+  ∂E/∂t  

I should not explain these but let me just remind you of the essentials:

  1. The first equation (Gauss’ law) can be derived from the equations for Coulomb’s law and the forces acting upon a charge q in an electromagnetic field: F = q(E + v×B) – with B the magnetic field vector (F is also referred to as the Lorentz force: it’s the combined force on a charged particle caused by the electric and magnetic fields; v the velocity of the (moving) charge;  ρ the charge density (so charge is thought of as being distributed in space, rather than being packed into points, and that’s OK because our scale is not the quantum-mechanical one here); and, finally, ε0 the electric constant (some 8.854×10−12 farads per meter).
  2. The second equation (Faraday’s law) gives the electric field associated with a changing magnetic field.
  3. The third equation basically states that there is no such thing as a magnetic charge: there are only electric charges.
  4. Finally, in the last equation, we have a vector j representing the current density: indeed, remember than magnetism only appears when (electric) charges are moving, so if there’s an electric current. As for the equation itself, well… That’s a more complicated story so I will leave that for the post scriptum.

We can do many more things: we can also take the curl of the gradient of some scalar, or the divergence of the curl of some vector (both have the interesting property that they are zero), and there are many more possible combinations – some of them useful, others not so useful. However, this is not the place to introduce differential calculus of vector fields (because that’s what it is).

The only other thing I need to mention here is what happens when we apply this gradient operator twice. Then we have an new operator ∇•∇ = ∇which is referred to as the Laplacian. In fact, when we say ‘apply ∇ twice’, we are actually doing a dot product. Indeed, ∇ returns a vector, and so we are going to multiply this vector once again with a vector using the dot product rule: a= ∑aib(so we multiply the individual vector components and then add them). In the case of our functions u and Ψ, we get:

∇•(∇u) =∇•∇u = (∇•∇)u = ∇u =∂2u/∂x2

∇•(∇Ψ) = ∇Ψ = ∂2Ψ/∂x+ ∂2Ψ/∂y+ ∂2Ψ/∂z2

Now, you may wonder what it means to take the derivative (or partial derivative) of a complex-valued function (which is what we are doing in the case of Ψ) but don’t worry about that: a complex-valued function of one or more real variables,  such as our Ψ(x, t), can be decomposed as Ψ(x, t) =ΨRe(x, t) + iΨIm(x, t), with ΨRe and ΨRe two real-valued functions representing the real and imaginary part of Ψ(x, t) respectively. In addition, the rules for integrating complex-valued functions are, to a large extent, the same as for real-valued functions. For example, if z is a complex number, then dez/dz = ez and, hence, using this and other very straightforward rules, we can indeed find the partial derivatives of a function such as Ψ(r, t) = Aei(p·r – Et)ħ with respect to all the (real-valued) variables in the argument.

The electromagnetic wave equation  

OK. That’s enough math now. We are ready now to look at – and to understand – a real wave equation – I mean one that actually represents something in physics. Let’s take Maxwell’s equations as a start. To make it easy – and also to ensure that you have easy access to the full derivation – we’ll take the so-called Heaviside form of these equations:

Heaviside form of Maxwell's equations

This Heaviside form assumes a charge-free vacuum space, so there are no external forces acting upon our electromagnetic wave. There are also no other complications such as electric currents. Also, the c2 (i.e. the square of the speed of light) is written here c2 = 1/με, with μ and ε the so-called permeability (μ) and permittivity (ε) respectively (c0, μand ε0 are the values in a vacuum space: indeed, light travels slower elsewhere (e.g. in glass) – if at all).

Now, these four equations can be replaced by just two, and it’s these two equations that are referred to as the electromagnetic wave equation(s):

electromagnetic wave equation

The derivation is not difficult. In fact, it’s much easier than the derivation for the Schrödinger equation which I will present in a moment. But, even if it is very short, I will just refer to Wikipedia in case you would be interested in the details (see the article on the electromagnetic wave equation). The point here is just to illustrate what is being done with these wave equations and why – not so much howIndeed, you may wonder what we have gained with this ‘reduction’.

The answer to this very legitimate question is easy: the two equations above are second-order partial differential equations which are relatively easy to solve. In other words, we can find a general solution, i.e. a set or family of functions that satisfy the equation and, hence, can represent the wave itself. Why a set of functions? If it’s a specific wave, then there should only be one wave function, right? Right. But to narrow our general solution down to a specific solution, we will need extra information, which are referred to as initial conditions, boundary conditions or, in general, constraints. [And if these constraints are not sufficiently specific, then we may still end up with a whole bunch of possibilities, even if they narrowed down the choice.]

Let’s give an example by re-writing the above wave equation and using our function u(x, t) or, to simplify the analysis, u(x, t) – so we’re looking at a plane wave traveling in one dimension only:

Wave equation for u

There are many functional forms for u that satisfy this equation. One of them is the following:

general solution for wave equation

This resembles the one I introduced when presenting the de Broglie equations, except that – this time around – we are talking a real electromagnetic wave, not some probability amplitude. Another difference is that we allow a composite wave with two components: one traveling in the positive x-direction, and one traveling in the negative x-direction. Now, if you read the post in which I introduced the de Broglie wave, you will remember that these Aei(kx–ωt) or Be–i(kx+ωt) waves give strange probabilities. However, because we are not looking at some probability amplitude here – so it’s not a de Broglie wave but a real wave (so we use complex number notation only because it’s convenient but, in practice, we’re only considering the real part), this functional form is quite OK.

That being said, the following functional form, representing a wave packet (aka a wave train) is also a solution (or a set of solutions better):

Wave packet equation

Huh? Well… Yes. If you really can’t follow here, I can only refer you to my post on Fourier analysis and Fourier transforms: I cannot reproduce that one here because that would make this post totally unreadable. We have a wave packet here, and so that’s the sum of an infinite number of component waves that interfere constructively in the region of the envelope (so that’s the location of the packet) and destructively outside. The integral is just the continuum limit of a summation of n such waves. So this integral will yield a function u with x and t as independent variables… If we know A(k) that is. Now that’s the beauty of these Fourier integrals (because that’s what this integral is). 

Indeed, in my post on Fourier transforms I also explained how these amplitudes A(k) in the equation above can be expressed as a function of u(x, t) through the inverse Fourier transform. In fact, I actually presented the Fourier transform pair Ψ(x) and Φ(p) in that post, but the logic is same – except that we’re inserting the time variable t once again (but with its value fixed at t=0):

Fourier transformOK, you’ll say, but where is all of this going? Be patient. We’re almost done. Let’s now introduce a specific initial condition. Let’s assume that we have the following functional form for u at time t = 0:

u at time 0

You’ll wonder where this comes from. Well… I don’t know. It’s just an example from Wikipedia. It’s random but it fits the bill: it’s a localized wave (so that’s a a wave packet) because of the very particular form of the phase (θ = –x2+ ik0x). The point to note is that we can calculate A(k) when inserting this initial condition in the equation above, and then – finally, you’ll say – we also get a specific solution for our u(x, t) function by inserting the value for A(k) in our general solution. In short, we get:



u final form

As mentioned above, we are actually only interested in the real part of this equation (so that’s the e with the exponent factor (note there is no in it, so it’s just some real number) multiplied with the cosine term).

However, the example above shows how easy it is to extend the analysis to a complex-valued wave function, i.e. a wave function describing a probability amplitude. We will actually do that now for Schrödinger’s equation. [Note that the example comes from Wikipedia’s article on wave packets, and so there is a nice animation which shows how this wave packet (be it the real or imaginary part of it) travels through space. Do watch it!]

Schrödinger’s equation

Let me just write it down:

Schrodinger's equation

That’s it. This is the Schrödinger equation – in a somewhat simplified form but it’s OK.

[…] You’ll find that equation above either very simple or, else, very difficult depending on whether or not you understood most or nothing at all of what I wrote above it. If you understood something, then it should be fairly simple, because it hardly differs from the other wave equation.

Indeed, we have that imaginary unit (i) in front of the left term, but then you should not panic over that: when everything is said and done, we are working here with the derivative (or partial derivative) of a complex-valued function, and so it should not surprise us that we have an i here and there. It’s nothing special. In fact, we had them in the equation above too, but they just weren’t explicit. The second difference with the electromagnetic wave equation is that we have a first-order derivative of time only (in the electromagnetic wave equation we had 2u/∂t2, so that’s a second-order derivative). Finally, we have a -1/2 factor in front of the right-hand term, instead of c2. OK, so what? It’s a different thing – but that should not surprise us: when everything is said and done, it is a different wave equation because it describes something else (not an electromagnetic wave but a quantum-mechanical system).

To understand why it’s different, I’d need to give you the equivalent of Maxwell’s set of equations for quantum mechanics, and then show how this wave equation is derived from them. I could do that. The derivation is somewhat lengthier than for our electromagnetic wave equation but not all that much. The problem is that it involves some new concepts which we haven’t introduced as yet – mainly some new operators. But then we have introduced a lot of new operators already (such as the gradient and the curl and the divergence) so you might be ready for this. Well… Maybe. The treatment is a bit lengthy, and so I’d rather do in a separate post. Why? […] OK. Let me say a few things about it then. Here we go:

  • These new operators involve matrix algebra. Fine, you’ll say. Let’s get on with it. Well… It’s matrix algebra with matrices with complex elements, so if we write a n×m matrix A as A = (aiaj), then the elements aiaj (i = 1, 2,… n and j = 1, 2,… m) will be complex numbers.
  • That allows us to define Hermitian matrices: a Hermitian matrix is a square matrix A which is the same as the complex conjugate of its transpose.
  • We can use such matrices as operators indeed: transformations acting on a column vector X to produce another column vector AX.
  • Now, you’ll remember – from your course on matrix algebra with real (as opposed to complex) matrices, I hope – that we have this very particular matrix equation AX = λX which has non-trivial solutions (i.e. solutions X ≠ 0) if and only if the determinant of A-λI is equal to zero. This condition (det(A-λI) = 0) is referred to as the characteristic equation.
  • This characteristic equation is a polynomial of degree n in λ and its roots are called eigenvalues or characteristic values of the matrix A. The non-trivial solutions X ≠ 0 corresponding to each eigenvalue are called eigenvectors or characteristic vectors.

Now – just in case you’re still with me – it’s quite simple: in quantum mechanics, we have the so-called Hamiltonian operator. The Hamiltonian in classical mechanics represents the total energy of the system: H = T + V (total energy H = kinetic energy T + potential energy V). Here we have got something similar but different. 🙂 The Hamiltonian operator is written as H-hat, i.e. an H with an accent circonflexe (as they say in French). Now, we need to let this Hamiltonian operator act on the wave function Ψ and if the result is proportional to the same wave function Ψ, then Ψ is a so-called stationary state, and the proportionality constant will be equal to the energy E of the state Ψ. These stationary states correspond to standing waves, or ‘orbitals’, such as in atomic orbitals or molecular orbitals. So we have:

E\Psi=\hat H \Psi

I am sure you are no longer there but, in fact, that’s it. We’re done with the derivation. The equation above is the so-called time-independent Schrödinger equation. It’s called like that not because the wave function is time-independent (it is), but because the Hamiltonian operator is time-independent: that obviously makes sense because stationary states are associated with specific energy levels indeed. However, if we do allow the energy level to vary in time (which we should do – if only because of the uncertainty principle: there is no such thing as a fixed energy level in quantum mechanics), then we cannot use some constant for E, but we need a so-called energy operator. Fortunately, this energy operator has a remarkably simple functional form:

\hat{E} \Psi = i\hbar\dfrac{\partial}{\partial t}\Psi = E\Psi  Now if we plug that in the equation above, we get our time-dependent Schrödinger equation  

i \hbar \frac{\partial}{\partial t}\Psi = \hat H \Psi

OK. You probably did not understand one iota of this but, even then, you will object that this does not resemble the equation I wrote at the very beginning: i(u/∂t) = (-1/2)2u.

You’re right, but we only need one more step for that. If we leave out potential energy (so we assume a particle moving in free space), then the Hamiltonian can be written as:

\hat{H} = -\frac{\hbar^2}{2m}\nabla^2

You’ll ask me how this is done but I will be short on that: the relationship between energy and momentum is being used here (and so that’s where the 2m factor in the denominator comes from). However, I won’t say more about it because this post would become way too lengthy if I would include each and every derivation and, remember, I just want to get to the result because the derivations here are not the point: I want you to understand the functional form of the wave equation only. So, using the above identity and, OK, let’s be somewhat more complete and include potential energy once again, we can write the time-dependent wave equation as:

 i\hbar\frac{\partial}{\partial t}\Psi(\mathbf{r},t) = -\frac{\hbar^2}{2m}\nabla^2\Psi(\mathbf{r},t) + V(\mathbf{r},t)\Psi(\mathbf{r},t)

Now, how is the equation above related to i(u/∂t) = (-1/2)2u? It’s a very simplified version of it: potential energy is, once again, assumed to be not relevant (so we’re talking a free particle again, with no external forces acting on it) but the real simplification is that we give m and ħ the value 1, so m = ħ = 1. Why?

Well… My initial idea was to do something similar as I did above and, hence, actually use a specific example with an actual functional form, just like we did for that the real-valued u(x, t) function. However, when I look at how long this post has become already, I realize I should not do that. In fact, I would just copy an example from somewhere else – probably Wikipedia once again, if only because their examples are usually nicely illustrated with graphs (and often animated graphs). So let me just refer you here to the other example given in the Wikipedia article on wave packets: that example uses that simplified i(u/∂t) = (-1/2)2u equation indeed. It actually uses the same initial condition:

u at time 0

However, because the wave equation is different, the wave packet behaves differently. It’s a so-called dispersive wave packet: it delocalizes. Its width increases over time and so, after a while, it just vanishes because it diffuses all over space. So there’s a solution to the wave equation, given this initial condition, but it’s just not stable – as a description of some particle that is (from a mathematical point of view – or even a physical point of view – there is no issue).

In any case, this probably all sounds like Chinese – or Greek if you understand Chinese :-). I actually haven’t worked with these Hermitian operators yet, and so it’s pretty shaky territory for me myself. However, I felt like I had picked up enough math and physics on this long and winding Road to Reality (I don’t think I am even halfway) to give it a try. I hope I succeeded in passing the message, which I’ll summarize as follows:

  1. Schrödinger’s equation is just like any other differential equation used in physics, in the sense that it represents a system subject to constraints, such as the relationship between energy and momentum.
  2. It will have many general solutions. In other words, the wave function – which describes a probability amplitude as a function in space and time – will have many general solutions, and a specific solution will depend on the initial conditions.
  3. The solution(s) can represent stationary states, but not necessary so: a wave (or a wave packet) can be non-dispersive or dispersive. However, when we plug the wave function into the wave equation, it will satisfy that equation.

That’s neither spectacular nor difficult, is it? But, perhaps, it helps you to ‘understand’ wave equations, including the Schrödinger equation. But what is understanding? Dirac once famously said: “I consider that I understand an equation when I can predict the properties of its solutions, without actually solving it.”

Hmm… I am not quite there yet, but I am sure some more practice with it will help. 🙂

Post scriptum: On Maxwell’s equations

First, we should say something more about these two other operators which I introduced above: the divergence and the curl. First on the divergence.

The divergence of a field vector E (or B) at some point r represents the so-called flux of E, i.e. the ‘flow’ of E per unit volume. So flux and divergence both deal with the ‘flow’ of electric field lines away from (positive) charges. [The ‘away from’ is from positive charges indeed – as per the convention: Maxwell himself used the term ‘convergence’ to describe flow towards negative charges, but so his ‘convention’ did not survive. Too bad, because I think convergence would be much easier to remember.]

So if we write that ∇•ρ/ε0, then it means that we have some constant flux of E because of some (fixed) distribution of charges.

Now, we already mentioned that equation (2) in Maxwell’s set meant that there is no such thing as a ‘magnetic’ charge: indeed, ∇•B = 0 means there is no magnetic flux. But, of course, magnetic fields do exist, don’t they? They do. A current in a wire, for example, i.e. a bunch of steadily moving electric charges, will induce a magnetic field according to Ampère’s law, which is part of equation (4) in Maxwell’s set: c2∇×B =  j0, with j representing the current density and ε0 the electric constant.

Now, at this point, we have this curl: ∇×B. Just like divergence (or convergence as Maxwell called it – but then with the sign reversed), curl also means something in physics: it’s the amount of ‘rotation’, or ‘circulation’ as Feynman calls it, around some loop.

So, to summarize the above, we have (1) flux (divergence) and (2) circulation (curl) and, of course, the two must be related. And, while we do not have any magnetic charges and, hence, no flux for B, the current in that wire will cause some circulation of B, and so we do have a magnetic field. However, that magnetic field will be static, i.e. it will not change. Hence, the time derivative ∂B/∂t will be zero and, hence, from equation (2) we get that ∇×E = 0, so our electric field will be static too. The time derivative ∂E/∂t which appears in equation (4) also disappears and we just have c2∇×B =  j0. This situation – of a constant magnetic and electric field – is described as electrostatics and magnetostatics respectively. It implies a neat separation of the four equations, and it makes magnetism and electricity appear as distinct phenomena. Indeed, as long as charges and currents are static, we have:

[I] Electrostatics: (1) ∇•E = ρ/εand (2) ∇×E = 0

[II] Magnetostatics: (3) c2∇×B =  jand (4) ∇•B = 0

The first two equations describe a vector field with zero curl and a given divergence (i.e. the electric field) while the third and fourth equations second describe a seemingly separate vector field with a given curl but zero divergence. Now, I am not writing this post scriptum to reproduce Feynman’s Lectures on Electromagnetism, and so I won’t say much more about this. I just want to note two points:

1. The first point to note is that factor cin the c2∇×B =  jequation. That’s something which you don’t have in the ∇•E = ρ/εequation. Of course, you’ll say: So what? Well… It’s weird. And if you bring it to the other side of the equation, it becomes clear that you need an awful lot of current for a tiny little bit of magnetic circulation (because you’re dividing by c , so that’s a factor 9 with 16 zeroes after it (9×1016):  an awfully big number in other words). Truth be said, it reveals something very deep. Hmm? Take a wild guess. […] Relativity perhaps? Well… Yes!

It’s obvious that we buried v somewhere in this equation, the velocity of the moving charges. But then it’s part of j of course: the rate at which charge flows through a unit area per second. But – Hey! – velocity as compared to what? What’s the frame of reference? The frame of reference is us obviously or – somewhat less subjective – the stationary charges determining the electric field according to equation (1) in the set above: ∇•E = ρ/ε0. But so here we can ask the same question: stationary in what reference frame? As compared to the moving charges? Hmm… But so how does it work with relativity? I won’t copy Feynman’s 13th Lecture here, but so, in that lecture, he analyzes what happens to the electric and magnetic force when we look at the scene from another coordinate system – let’s say one that moves parallel to the wire at the same speed as the moving electrons, so – because of our new reference frame – the ‘moving electrons’ now appear to have no speed at all but, of course, our stationary charges will now seem to move.

What Feynman finds – and his calculations are very easy and straightforward – is that, while we will obviously insert different input values into Maxwell’s set of equations and, hence, get different values for the E and B fields, the actual physical effect – i.e. the final Lorentz force on a (charged) particle – will be the same. To be very specific, in a coordinate system at rest with respect to the wire (so we see charges move in the wire), we find a ‘magnetic’ force indeed, but in a coordinate system moving at the same speed of those charges, we will find an ‘electric’ force only. And from yet another reference frame, we will find a mixture of E and B fields. However, the physical result is the same: there is only one combined force in the end – the Lorentz force F = q(E + v×B) – and it’s always the same, regardless of the reference frame (inertial or moving at whatever speed – relativistic (i.e. close to c) or not).

In other words, Maxwell’s description of electromagnetism is invariant or, to say exactly the same in yet other words, electricity and magnetism taken together are consistent with relativity: they are part of one physical phenomenon: the electromagnetic interaction between (charged) particles. So electric and magnetic fields appear in different ‘mixtures’ if we change our frame of reference, and so that’s why magnetism is often described as a ‘relativistic’ effect – although that’s not very accurate. However, it does explain that cfactor in the equation for the curl of B. [How exactly? Well… If you’re smart enough to ask that kind of question, you will be smart enough to find the derivation on the Web. :-)]

Note: Don’t think we’re talking astronomical speeds here when comparing the two reference frames. It would also work for astronomical speeds but, in this case, we are talking the speed of the electrons moving through a wire. Now, the so-called drift velocity of electrons – which is the one we have to use here – in a copper wire of radius 1 mm carrying a steady current of 3 Amps is only about 1 m per hour! So the relativistic effect is tiny  – but still measurable !

2. The second thing I want to note is that  Maxwell’s set of equations with non-zero time derivatives for E and B clearly show that it’s changing electric and magnetic fields that sort of create each other, and it’s this that’s behind electromagnetic waves moving through space without losing energy. They just travel on and on. The math behind this is beautiful (and the animations in the related Wikipedia articles are equally beautiful – and probably easier to understand than the equations), but that’s stuff for another post. As the electric field changes, it induces a magnetic field, which then induces a new electric field, etc., allowing the wave to propagate itself through space. I should also note here that the energy is in the field and so, when electromagnetic waves, such as light, or radiowaves, travel through space, they carry their energy with them.

Let me be fully complete here, and note that there’s energy in electrostatic fields as well, and the formula for it is remarkably beautiful. The total (electrostatic) energy U in an electrostatic field generated by charges located within some finite distance is equal to:

Energy of electrostatic field

This equation introduces the electrostatic potential. This is a scalar field Φ from which we can derive the electric field vector just by applying the gradient operator. In fact, all curl-free fields (such as the electric field in this case) can be written as the gradient of some scalar field. That’s a universal truth. See how beautiful math is? 🙂