Re-visiting electron orbitals (II)

I’ve talked about electron orbitals in a couple of posts already – including a fairly recent one, which is why I put the (II) after the title. However, I just wanted to tie up some loose ends here – and do some more thinking about the concept of a definite energy state. What is it really? We know the wavefunction for a definite energy state can always be written as:

ψ(x, t) = e^(−i·(E/ħ)·t)·ψ(x)

Well… In fact, we should probably formally prove that but… Well… Let us just explore this formula in a more intuitive way – for the time being, that is – using those electron orbitals we’ve derived.

First, let me note that ψ(x, t) and ψ(x) are very different functions and, therefore, the choice of the same symbol for both (the Greek psi) is – in my humble opinion – not very fortunate, but then… Well… It is the choice of physicists – as copied in textbooks all over – and so we’ll just have to live with it. Of course, we can appreciate why they chose to use the same symbol – ψ(x) is like a time-independent wavefunction now, so that’s nice – but… Well… You should note that it is not so obvious to write some function as the product of two other functions. To be complete, I’ll be a bit more explicit here: if some function in two variables – say F(x, y) – can be written as the product of two functions in one variable – say f(x) and g(y), so we can write F as F(x, y) = f(x)·g(y) – then we say F is a separable function. For a full overview of what that means, click on this link. And note that mathematicians do choose different symbols for the functions F, f, and g. It would probably be interesting to explore what the conditions for separability actually imply in terms of properties of… Well… The wavefunction and its arguments, i.e. the space and time variables. But… Well… That’s stuff for another post. 🙂
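By the way, the separability condition is easy to check numerically: for a separable F(x, t) = f(x)·g(t), and any two points x1, x2 and times t1, t2, we must have F(x1, t1)·F(x2, t2) = F(x1, t2)·F(x2, t1). The little sketch below – with ħ = 1 and a made-up envelope ψ(x) = e^(−x²), so those are illustration choices only, nothing physical – verifies this for our definite-energy wavefunction:

```python
import cmath

# Toy check that a definite-energy wavefunction is separable:
# psi(x, t) = exp(-i*(E/hbar)*t) * phi(x).  For any separable
# F(x, t) = f(x)*g(t), we must have F(x1,t1)*F(x2,t2) == F(x1,t2)*F(x2,t1).
# Units with hbar = 1 and the envelope phi(x) = exp(-x**2) are
# assumptions for illustration only.

E = 2.5  # arbitrary energy value

def phi(x):
    return cmath.exp(-x**2)  # hypothetical time-independent part

def psi(x, t):
    return cmath.exp(-1j * E * t) * phi(x)

lhs = psi(0.3, 1.1) * psi(0.7, 0.2)
rhs = psi(0.3, 0.2) * psi(0.7, 1.1)
print(abs(lhs - rhs) < 1e-12)  # separability criterion holds
```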

Secondly, note that the momentum variable (p) – i.e. the p in our elementary wavefunction a·e^(i·(p·x−E·t)/ħ) – has sort of vanished: ψ(x) is a function of the position only. Now, you may think it should be somewhere there – that, perhaps, we can write something like ψ(x) = ψ[x, p(x)]. But… No. The momentum variable has effectively vanished. Look at Feynman’s solutions for the electron orbitals of a hydrogen atom:

[The Grand Equation: ψn,l,m = Yl,m(θ, φ)·Fn,l(ρ)]

The Yl,m(θ, φ) and Fn,l(ρ) functions here are functions of the (polar) coordinates ρ, θ, φ only. So that’s the position only (these coordinates are polar or spherical coordinates, so ρ is the radial distance, θ is the polar angle, and φ is the azimuthal angle). There’s no idea whatsoever of any momentum in one or the other spatial direction here. I find that rather remarkable. Let’s see how it all works with a simple example.

The functions below are the Yl,m(θ, φ) functions for l = 1. Note the symmetry: if we swap θ and φ for −θ and −φ respectively, we get the other function: 2^(−1/2)·sin(−θ)·e^(i·(−φ)) = −2^(−1/2)·sinθ·e^(−i·φ).


To get the probabilities, we need to take the absolute square of the whole thing, including e^(−i·(E/ħ)·t), but we know |e^(i·δ)|² = 1 for any value of δ. Why? Because the absolute square of any complex number is the product of the number with its complex conjugate, so |e^(i·δ)|² = e^(i·δ)·e^(−i·δ) = e^(i·0) = 1. So we only have to look at the absolute square of the Yl,m(θ, φ) and Fn,l(ρ) functions here. The Fn,l(ρ) function is a real-valued function, so its absolute square is just what it is: some real number (I gave you the formula for the ak coefficients in my post on it, and you shouldn’t worry about them: they’re real too). In contrast, the Yl,m(θ, φ) functions are complex-valued – most of them are, at least. Unsurprisingly, we find the probabilities are also symmetric:

P = |−2^(−1/2)·sinθ·e^(i·φ)|² = (−2^(−1/2)·sinθ·e^(i·φ))·(−2^(−1/2)·sinθ·e^(−i·φ))

= (2^(−1/2)·sinθ·e^(−i·φ))·(2^(−1/2)·sinθ·e^(i·φ)) = |2^(−1/2)·sinθ·e^(−i·φ)|² = (1/2)·sin²θ
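If you want to double-check the algebra, a quick numerical sanity check – just sampling random angles, nothing more – confirms both facts: |e^(i·δ)|² = 1 for any δ, and the absolute square of the Y1,±1 functions reduces to (1/2)·sin²θ, with φ dropping out entirely:

```python
import cmath, math, random

random.seed(0)
for _ in range(100):
    delta = random.uniform(-10, 10)
    # |e^(i*delta)|^2 = 1 for any real delta
    assert abs(abs(cmath.exp(1j * delta))**2 - 1) < 1e-12
    theta = random.uniform(0, math.pi)
    phi = random.uniform(0, 2 * math.pi)
    # Y(1,+1) = -2**-0.5 * sin(theta) * e^(i*phi); its absolute square
    # should reduce to (1/2)*sin(theta)**2, independent of phi
    y = -2**-0.5 * math.sin(theta) * cmath.exp(1j * phi)
    assert abs(abs(y)**2 - 0.5 * math.sin(theta)**2) < 1e-12
print("checks passed")
```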

Of course, for m = 0, the probability is just cos²θ. The graphs below are the polar graphs for the cos²θ and (1/2)·sin²θ functions respectively.

[Illustration: polar graphs of cos²θ and (1/2)·sin²θ]

These polar graphs are not so easy to interpret, so let me say a few words about them. The points that are plotted combine (a) some radial distance from the center – which I wrote as P because this distance is, effectively, a probability – with (b) the polar angle θ (so that’s one of the three coordinates). To be precise, the plot gives us, for a given ρ, all of the (θ, P) combinations. It works as follows. To calculate the probability for some ρ and θ (note that φ can be any angle), we must take the absolute square of that ψn,l,m = Yl,m(θ, φ)·Fn,l(ρ) product. Hence, we must calculate |Yl,m(θ, φ)·Fn,l(ρ)|² = |Fn,l(ρ)|²·cos²θ for m = 0, and (1/2)·|Fn,l(ρ)|²·sin²θ for m = ±1. Hence, the value of ρ determines the value of Fn,l(ρ), and that Fn,l(ρ) value then determines the size of the polar graph. The three graphs below – P = cos²θ, P = (1/2)·cos²θ and P = (1/4)·cos²θ – illustrate the idea.

[Illustration: comparative polar graphs of P = cos²θ, P = (1/2)·cos²θ and P = (1/4)·cos²θ]

Note that we’re measuring θ from the z-axis here, as we should. So that gives us the right orientation of this volume, as opposed to the other polar graphs above, which measured θ from the x-axis. So… Well… We’re getting there, aren’t we? 🙂
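To make the polar graph idea concrete, here’s a small sketch that just tabulates a few (θ, P) pairs for the m = ±1 curve P = (1/2)·sin²θ – the same points you’d plot on the polar grid:

```python
import math

# Sample the polar curve P(theta) = (1/2)*sin(theta)**2 that we just
# derived for m = ±1 (phi can be anything, so it does not appear).
for deg in (0, 30, 60, 90, 120, 150, 180):
    theta = math.radians(deg)
    P = 0.5 * math.sin(theta)**2
    print(f"theta = {deg:3d} deg -> P = {P:.3f}")
# The maximum sits at theta = 90 deg, i.e. in the equatorial plane,
# which is why the m = ±1 lobes hug the xy-plane.
```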

Now you’ll have two or three – or even more – obvious questions. The first one is: where is the third lobe? That’s a good question. Most illustrations will represent the p-orbitals as follows:

[Illustration: the conventional three-lobe picture of the p-orbitals]

Three lobes. Well… Frankly, I am not quite sure here, but the equations speak for themselves: the probabilities only depend on ρ and θ. Hence, the azimuthal angle φ can be anything. So you just need to rotate those P = (1/2)·sin²θ and P = cos²θ curves about the z-axis. In case you wonder how to do that, the illustration below may inspire you.

[Illustration: rotating a polar curve about the z-axis to get a surface of revolution]

The second obvious question is about the size of those lobes. That 1/2 factor must surely matter, right? Well… We still have that Fn,l(ρ) factor, of course, but you’re right: that factor does not depend on the value for m: it’s the same for m = 0 or ±1. So… Well… Those representations above – with the three lobes, all of the same volume – may not be accurate. I found an interesting site – Atom in a Box – with an app that visualizes the atomic orbitals in a fun and exciting way. Unfortunately, it’s for Mac and iPhone only – but this YouTube video shows how it works. I encourage you to explore it. In fact, I need to explore it – but what I’ve seen on that YouTube video (I don’t have a Mac nor an iPhone) suggests the three-lobe illustrations may effectively be wrong: there’s some asymmetry here – which we’d expect, because those p-orbitals are actually supposed to be asymmetric! In fact, the most accurate pictures may well be the ones below. I took them from Wikimedia Commons. The author explains the use of the color codes as follows: “The depicted rigid body is where the probability density exceeds a certain value. The color shows the complex phase of the wavefunction, where blue means real positive, red means imaginary positive, yellow means real negative and green means imaginary negative.” I must assume he refers to the signs of a and b when writing a complex number as a + i·b.

The third obvious question is related to the one above: we should get some cloud, right? Not some rigid body or some surface. Well… I think you can answer that question yourself now, based on what the author of the illustration above wrote: if we change the cut-off value for the probability, then we’ll get a different shape. So you can play with that and, yes, it’s some cloud, and that’s what the mentioned app visualizes. 🙂

The fourth question is the most obvious of all. It’s the question I started this post with: what are those definite energy states? We have uncertainty, right? So how does that play out? Now that is a question I’ll try to tackle in my next post. Stay tuned! 🙂

Post scriptum: Let me add a few remarks here so as to – hopefully – contribute to an even better interpretation of what’s going on here. As mentioned, the key to understanding is, obviously, the following basic functional form:

ψ(r, t) = e^(−i·(E/ħ)·t)·ψ(r)

Wikipedia refers to the e^(−i·(E/ħ)·t) factor as a time-dependent phase factor which, as you can see, we can separate out because we are looking at a definite energy state here. Note the minus sign in the exponent – which reminds us of the minus sign in the exponent of the elementary wavefunction, which we wrote as:

a·e^(−i·θ) = a·e^(−i·[(E/ħ)·t − (p/ħ)·x]) = a·e^(i·[(p/ħ)·x − (E/ħ)·t]) = a·e^(−i·(E/ħ)·t)·e^(i·(p/ħ)·x)

We know this elementary wavefunction is problematic in terms of interpretation because its absolute square gives us some constant probability P(x, t) = |a·e^(−i·[(E/ħ)·t − (p/ħ)·x])|² = a². In other words, at any point in time, our electron is equally likely to be anywhere in space. That is not consistent with the idea of our electron being somewhere at some point in time.
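That constant probability is easy to see numerically too. The sketch below – with ħ = 1 and arbitrary values for a, p and E, which are illustration choices only – shows the absolute square doesn’t budge, whatever x and t we plug in:

```python
import cmath, itertools

# The elementary wavefunction a*e^(i*((p/hbar)*x - (E/hbar)*t)) has a
# constant absolute square a**2, whatever x and t.  hbar = 1 and the
# values of a, p and E below are arbitrary illustration choices.
a, p, E = 0.8, 1.7, 2.5

def psi(x, t):
    return a * cmath.exp(1j * (p * x - E * t))

for x, t in itertools.product((0.0, 1.0, -3.2), (0.0, 0.5, 7.1)):
    assert abs(abs(psi(x, t))**2 - a**2) < 1e-12
print("P(x, t) =", round(abs(psi(1.0, 1.0))**2, 6), "everywhere")
```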

The other question is: what reference frame do we use to measure E and p? Indeed, the value of E and p = (px, py, pz) depends on our reference frame: from the electron’s own point of view, it has no momentum whatsoever: p = 0. Fortunately, we do have a point of reference here: the nucleus of our hydrogen atom. And our own position, of course, because you should note, indeed, that both the subject and the object of the observation are necessary to define the Cartesian x, y, z – or, more relevant in this context, the polar r = ρ, θ, φ – coordinates.

This, then, defines some finite or infinite box in space in which the (linear) momentum (p) of our electron vanishes, and then we just need to solve Schrödinger’s diffusion equation to find the solutions for ψ(r). These solutions are more conveniently written in terms of the radial distance ρ, the polar angle θ, and the azimuthal angle φ:

[The Grand Equation: ψn,l,m = Yl,m(θ, φ)·Fn,l(ρ)]

The functions below are the Yl,m(θ, φ) functions for l = 1.


The interesting thing about these Yl,m(θ, φ) functions is the e^(i·φ) and/or e^(−i·φ) factor. Indeed, note the following:

  1. Because the sinθ and cosθ factors are real-valued, they only define some envelope for the ψ(r) function.
  2. In contrast, the e^(i·φ) and/or e^(−i·φ) factors define some phase shift.
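The second point is easy to verify: multiplying any amplitude by e^(i·φ) rotates it in the complex plane but leaves its absolute square alone. A quick check (the base amplitude below is an arbitrary illustration value):

```python
import cmath, math

# A phase factor e^(i*phi) shifts where the real and imaginary
# parts of an amplitude peak, but leaves its absolute square untouched.
base = 0.6 + 0.3j          # some arbitrary amplitude (assumption)
for deg in (0, 45, 90, 210):
    shifted = base * cmath.exp(1j * math.radians(deg))
    assert abs(abs(shifted)**2 - abs(base)**2) < 1e-12
print("probability is phase-independent:", round(abs(base)**2, 4))
```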

Let’s have a look at the physicality of the situation, which is depicted below.


The nucleus of our hydrogen atom is at the center. The polar angle is measured from the z-axis, and we know we only have an amplitude there for m = 0, so let’s look at what that cosθ factor does. If θ = 0°, the amplitude is just what it is, but when θ > 0°, then |cosθ| < 1 and, therefore, the probability P = |Fn,l(ρ)|²·cos²θ will diminish. Hence, for the same radial distance (ρ), we are less likely to find the electron at some angle θ > 0° than on the z-axis itself. Now that makes sense, obviously. You can work out the argument for m = ±1 yourself, I hope. [The axis of symmetry will be different, obviously!]

[Illustration: the amplitude envelope as a function of the polar angle θ]

In contrast, the e^(i·φ) and/or e^(−i·φ) factors work very differently. These just give us a phase shift, as illustrated below. A re-set of our zero point for measuring time, so to speak, and the e^(i·φ) and/or e^(−i·φ) factor effectively disappears when we’re calculating probabilities, which is consistent with the fact that this angle clearly doesn’t influence the magnitude of the amplitude fluctuations.

[Illustration: the phase shift produced by the e^(±i·φ) factor]

So… Well… That’s it, really. I hope you enjoyed this! 🙂


Re-visiting electron orbitals

One of the pieces I barely gave a glance when reading Feynman’s Lectures over the past few years, was the derivation of the non-spherical electron orbitals for the hydrogen atom. It just looked like a boring piece of math – and I thought the derivation of the s-orbitals – the spherically symmetrical ones – was interesting enough already. To some extent, it is – but there is so much more to it. When I read it now, the derivation of those p-, d-, f-, etc. orbitals brings all of the weirdness of quantum mechanics together and, while doing so, also provides for a deeper understanding of all of the ideas and concepts we’re trying to get used to. In addition, Feynman’s treatment of the matter is actually much shorter than what you’ll find in other textbooks, because… Well… As he puts it, he takes a shortcut. So let’s try to follow the bright mind of our Master as he walks us through it.

You’ll remember – if not, check it out again – that we found the spherically symmetric solutions for Schrödinger’s equation for our hydrogen atom. Just to be sure, Schrödinger’s equation is a differential equation – a condition we impose on the wavefunction for our electron – and so we need to find the functional form for the wavefunctions that describe the electron orbitals. [Quantum math is so confusing that it’s often good to regularly think of what it is that we’re actually trying to do. :-)] In fact, that functional form gives us a whole bunch of solutions – or wavefunctions – which are defined by three quantum numbers: n, l, and m. The parameter n corresponds to an energy level (En), l is the orbital (quantum) number, and m is the z-component of the angular momentum. But that doesn’t say much. Let’s go step by step.

First, we derived those spherically symmetric solutions – which are referred to as s-states – assuming this was a state with zero (orbital) angular momentum, which we write as l = 0. [As you know, Feynman does not incorporate the spin of the electron in his analysis, which is, therefore, approximate only.] Now what exactly is a state with zero angular momentum? When everything is said and done, we are effectively trying to describe some electron orbital here, right? So that’s an amplitude for the electron to be somewhere, but then we also know it always moves. So, when everything is said and done, the electron is some circulating negative charge, right? So there is always some angular momentum and, therefore, some magnetic moment, right?

Well… If you google this question on Physics Stack Exchange, you’ll get a lot of mumbo jumbo telling you that you shouldn’t think of the electron actually orbiting around. But… Then… Well… A lot of that mumbo jumbo is contradictory. For example, one of the academics writing there does note that, while we shouldn’t think of an electron as some particle, the orbital is still a distribution which gives you the probability of actually finding the electron at some point (x, y, z). So… Well… It is some kind of circulating charge – as a point, as a cloud or as whatever. The only reasonable answer – in my humble opinion – is that l = 0 probably means there is no net circulating charge, so the movement in this or that direction must balance the movement in the other. One may note, in this regard, that the phenomenon of electron capture in nuclear reactions suggests electrons do travel through the nucleus for at least part of the time, which is entirely consistent with the wavefunctions for s-states – shown below – which tell us that the most probable (x, y, z) position for the electron is right at the center, so that’s where the nucleus is. [In contrast, the probability to find the electron right at the center vanishes for the other orbitals (p, d, etcetera).]

[Illustration: the radial wavefunctions for the first three s-states]

In fact, now that I’ve shown this graph, I should quickly explain it. The three graphs are the spherically symmetric wavefunctions for the first three energy levels. For the first energy level – which is conventionally written as n = 1, not as n = 0 – the amplitude approaches zero rather quickly. For n = 2 and n = 3, there are zero-crossings: the curve passes the r-axis. Feynman calls these zero-crossings radial nodes. To be precise, the number of zero-crossings for these s-states is n − 1, so there’s none for n = 1, one for n = 2, two for n = 3, etcetera.

Now, why is the amplitude – apparently – some real-valued function here? That’s because we’re actually not looking at ψ(r, t) here but at the ψ(r) function which appears in the following break-up of the actual wavefunction ψ(r, t):

ψ(r, t) = e^(−i·(E/ħ)·t)·ψ(r)

So ψ(r) is more of an envelope function for the actual wavefunction, which varies both in space as well as in time. It’s good to remember that: I would have used another symbol, because ψ(r, t) and ψ(r) are two different beasts, really – but then physicists want you to think, right? And Mr. Feynman would surely want you to do that, so why not inject some confusing notation from time to time? 🙂 So for n = 3, for example, ψ(r) goes from positive to negative and then to positive, and these areas are separated by radial nodes. Feynman put it on the blackboard like this:

[Feynman’s blackboard sketch of the radial nodes]

I am just inserting it to compare this concept of radial nodes with the concept of a nodal plane, which we’ll encounter when discussing p-states in a moment, but I can already tell you what they are now: those p-states are symmetrical in one direction only, as shown below, and so we have a nodal plane instead of a radial node. But so I am getting ahead of myself here… 🙂

[Illustration: the nodal planes of the p-states]

Before going back to where I was, I just need to add one more thing. 🙂 Of course, you know that we’ll take the square of the absolute value of our amplitude to calculate a probability (or the absolute square – as we abbreviate it), so you may wonder why the sign is relevant at all. Well… I am not quite sure either but there’s this concept of orbital parity which you may have heard of. The orbital parity tells us what will happen to the sign if we calculate the value for ψ for −r rather than for r. If ψ(−r) = ψ(r), then we have an even function – or even orbital parity. Likewise, if ψ(−r) = −ψ(r), then we call the function odd – and so we’ll have an odd orbital parity. The orbital parity is always equal to (−1)^l = ±1. The exponent is that angular quantum number l, and +1, or + tout court, means even, and −1 or just − means odd. The angular quantum number for those p-states is l = 1, so that works with the illustration of the nodal plane.
🙂 As said, it’s not hugely important but I might as well mention it in passing – especially because we’ll re-visit the topic of symmetries a few posts from now. 🙂

OK. I said I would talk about states with some angular momentum (so l ≠ 0) and so it’s about time I start doing that. As you know, our orbital angular momentum is measured in units of ħ (just like the total angular momentum J, which we’ve discussed ad nauseam already). We also know that if we’d measure its component along any direction – any direction really, but physicists will usually make sure that the z-axis of their reference frame coincides with it, so we call it the z-axis 🙂 – then we will find that it can only have one of a discrete set of values m·ħ = l·ħ, (l−1)·ħ, …, −(l−1)·ħ, −l·ħ. Hence, l just takes the role of our good old quantum number j here, and m·ħ is just like Jz. Likewise, I’d like to introduce l as the equivalent of J, so we can easily talk about the angular momentum vector. And now that we’re here, why not write m in bold type too, and say that m is the z-component itself – i.e. the whole vector quantity, so that’s the direction and the magnitude.

Now, we do need to note one crucial difference between j and l, or between J and l: our j could be an integer or a half-integer. In contrast, l must be some integer. Why? Well… If m can be zero, and the values of m must be separated by a full unit, then l must be 0, 1, 2, 3, etcetera. 🙂 If this simple answer doesn’t satisfy you, I’ll refer you to Feynman’s, which is also short but more elegant than mine. 🙂 Now, you may or may not remember that the quantum-mechanical equivalent of the magnitude of a vector quantity such as l is to be calculated as √[l·(l+1)]·ħ, so if l = 1, that magnitude will be √2·ħ ≈ 1.4142·ħ, so that’s – as expected – larger than the maximum value for m, which is +1. As you know, that leads us to think of that z-component m as a projection of l. Paraphrasing Feynman, the limited set of values for m implies that the angular momentum is always “cocked” at some angle. For l = 1, that angle is either +45° or, else, −45°, as shown below.

[Illustration: the “cocked” angles for l = 1]

What if l = 2? The magnitude of l is then equal to √[2·(2+1)]·ħ = √6·ħ ≈ 2.4495·ħ. How do we relate that to those “cocked” angles? The values of m now range from −2 to +2, with a unit distance in-between. The illustration below shows the angles. [I didn’t mention ħ any more in that illustration because, by now, we should know it’s our unit of measurement – always.]

[Illustration: the “cocked” angles for l = 2]

Note we’ve got a bigger circle here (the radius is about 2.45 here, as opposed to a bit more than 1.4 for l = 1). Also note that it’s not a nice cake with perfectly equal pieces. From the graph, it’s obvious that the formula for the angle is the following: θ = sin⁻¹[m/√(l·(l+1))]. It’s simple but intriguing. Needless to say, the sin⁻¹ function is the inverse sine, also known as the arcsine. I’ve calculated the values for all m for l = 1, 2, 3, 4 and 5 below. The most interesting values are the angles for m = 1 and m = l. As the graphs underneath show, for m = 1, the values start approaching the zero angle for very large l, so there’s not much difference any more between m = ±1 and m = 0 for large values of l. What about the m = l case? Well… Believe it or not, if l becomes really large, then these angles do approach 90°. If you don’t remember how to calculate limits, then just calculate θ for some huge value for l and m. For l = m = 1,000,000, for example, you should find that θ = 89.9427…°. 🙂

[Table: the “cocked” angles for all m, for l = 1 to 5]
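That angle formula is easy to play with. The sketch below computes θ = sin⁻¹[m/√(l·(l+1))] and reproduces the 45° result for l = m = 1 as well as the 89.9427…° value for l = m = 1,000,000:

```python
import math

def cocked_angle(m, l):
    """Angle (in degrees) of the angular momentum vector above the
    xy-plane, from sin(theta) = m / sqrt(l*(l+1))."""
    return math.degrees(math.asin(m / math.sqrt(l * (l + 1))))

print(round(cocked_angle(1, 1), 4))                    # 45.0
print(round(cocked_angle(2, 2), 4))                    # the m = l case for l = 2
print(round(cocked_angle(1_000_000, 1_000_000), 4))    # 89.9427
```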

[Graph: the m = 1 and m = l angles as a function of l]

Isn’t this fascinating? I’ve actually never seen this in a textbook – so it might be an original contribution. 🙂 OK. I need to get back to the grind: Feynman’s derivation of non-symmetrical electron orbitals. Look carefully at the illustration below. If m is really the projection of some angular momentum that’s “cocked”, either at a zero-degree angle or, alternatively, at ±45° (for the l = 1 situation we show here) – a projection on the z-axis, that is – then the value of m (+1, 0 or −1) does actually correspond to some idea of the orientation of the space in which our electron is circulating. For m = 0, that space – think of some torus or whatever other space in which our electron might circulate – would have some alignment with the z-axis. For m = ±1, there is no such alignment.

[Illustration: the m = 0 situation]

The interpretation is tricky, however, and the illustration on the right-hand side above is surely too much of a simplification: an orbital is definitely not like a planetary orbit. It doesn’t even look like a torus. In fact, the illustration in the bottom right corner, which shows the probability density, i.e. the space in which we are actually likely to find the electron, is a picture that is much more accurate – and it surely does not resemble a planetary orbit or some torus. However, despite that, the idea that, for m = 0, we’d have some alignment of the space in which our electron moves with the z-axis is not wrong. Feynman expresses it as follows:

“Suppose m is zero, then there can be some non-zero amplitude to find the electron on the z-axis at some distance r. We’ll call this amplitude Fl(r).”

You’ll say: so what? And you’ll also say that the illustration in the bottom right corner suggests the electron is actually circulating around the z-axis, rather than through it. Well… No. That illustration does not show any circulation. It only shows a probability density. No suggestion of any actual movement or circulation. So the idea is valid: if m = 0, then the implication is that, somehow, the space of circulation of current around the direction of the angular momentum vector (J), as per the well-known right-hand rule, will include the z-axis. So the idea of that electron orbiting through the z-axis for m = 0 is essentially correct, and the corollary is… Well… I’ll talk about that in a moment.

But… Well… So what? What’s so special about that Fl(r) amplitude? What can we do with that? Well… If we would find a way to calculate Fl(r), then we know everything. Huh? Everything? Yes. The reasoning here is quite complicated, so please bear with me as we go through it.

The first thing you need to accept, is rather weird. The thing we said about the non-zero amplitudes to find the electron somewhere on the z-axis for the m = 0 state – which, using Dirac’s bra-ket notation, we’ll write as |l, m = 0〉 – has a very categorical corollary:

The amplitude to find an electron whose state m is not equal to zero on the z-axis (at some non-zero distance r) is zero. We can only find an electron on the z-axis if the z-component of its angular momentum (m) is zero.

Now, I know this is hard to swallow, especially when looking at those 45° angles for J in our illustrations, because these suggest the actual circulation of current may also include at least part of the z-axis. But… Well… No. Why not? Well… I have no good answer here except for the usual one which, I admit, is quite unsatisfactory: it’s quantum mechanics, not classical mechanics. So we have to look at the m vectors, which are pointed along the z-axis itself for m = ±1 and, hence, the circulation we’d associate with those momentum vectors (even if they’re the z-component only) is around the z-axis. Not through or on it. I know it’s a really poor argument, but it’s consistent with our picture of the actual electron orbitals – that picture in terms of probability densities, which I copy below. For m = −1, we have the yz-plane as the nodal plane between the two lobes of our distribution, so no amplitude to find the electron on the z-axis (nor would we find it on the y-axis, as you can see). Likewise, for m = +1, we have the xz-plane as the nodal plane. Both nodal planes include the z-axis and, therefore, there’s zero probability on that axis.

[Illustration: probability densities for the p-orbitals]

In addition, you may also want to note the 45° angle we associate with m = ±1 does sort of demarcate the lobes of the distribution by defining a three-dimensional cone and… Well… I know these arguments are rather intuitive, and so you may refuse to accept them. In fact, to some extent, I refuse to accept them myself. 🙂 Indeed, let me say this loud and clear: I really want to understand this in a better way!

But… Then… Well… Such better understanding may never come. Feynman’s warning, just before he starts explaining the Stern-Gerlach experiment and the quantization of angular momentum, rings very true here: “Understanding of these matters comes very slowly, if at all. Of course, one does get better able to know what is going to happen in a quantum-mechanical situation—if that is what understanding means—but one never gets a comfortable feeling that these quantum-mechanical rules are “natural.” Of course they are, but they are not natural to our own experience at an ordinary level.” So… Well… What can I say?

It is now time to pull the rabbit out of the hat. To understand what we’re going to do next, you need to remember that our amplitudes – or wavefunctions – are always expressed with regard to a specific frame of reference, i.e. some specific choice of an x-, y– and z-axis. If we change the reference frame – say, to some new set of x’-, y’– and z’-axes – then we need to re-write our amplitudes (or wavefunctions) in terms of the new reference frame. In order to do so, one should use a set of transformation rules. I’ve written several posts on that – including a very basic one, which you may want to re-read (just click the link here).

Look at the illustration below. We want to calculate the amplitude to find the electron at some point in space. Our reference frame is the x, y, z frame and the polar coordinates (or spherical coordinates, I should say) of our point are the radial distance r, the polar angle θ (theta), and the azimuthal angle φ (phi). [The illustration below – which I copied from Feynman’s exposé – uses a capital letter for phi, but I stick to the more usual or more modern convention here.]

change of reference frame

In case you wonder why we’d use polar coordinates rather than Cartesian coordinates… Well… I need to refer you to my other post on the topic of electron orbitals, i.e. the one in which I explain how we get the spherically symmetric solutions: if you have radial (central) fields, then it’s easier to solve stuff using polar coordinates – although you wouldn’t think so if you think of that monster equation that we’re actually trying to solve here:

[Schrödinger’s equation for the hydrogen atom, written in polar coordinates]

It’s really Schrödinger’s equation for the situation on hand (i.e. a hydrogen atom, with a radial or central Coulomb field because of its positively charged nucleus), but re-written in terms of polar coordinates. For the detail, see the mentioned post. Here, you should just remember we got the spherically symmetric solutions assuming the derivatives of the wavefunction with respect to θ and φ – so that’s the ∂ψ/∂θ and ∂ψ/∂φ in the equation above – were zero. So now we don’t assume these partial derivatives to be zero: we’re looking for states with an angular dependence, as Feynman puts it somewhat enigmatically. […] Yes. I know. This post is becoming very long, and so you are getting impatient. Look at the illustration with the (r, θ, φ) point, and let me quote Feynman on the line of reasoning now:

“Suppose we have the atom in some |l, m〉 state, what is the amplitude to find the electron at the angles θ and φ and the distance r from the origin? Put a new z-axis, say z’, at that angle (see the illustration above), and ask: what is the amplitude that the electron will be at the distance r along the new z’-axis? We know that it cannot be found along z’ unless its z’-component of angular momentum, say m’, is zero. When m’ is zero, however, the amplitude to find the electron along z’ is Fl(r). Therefore, the result is the product of two factors. The first is the amplitude that an atom in the state |l, m〉 along the z-axis will be in the state |l, m’ = 0〉 with respect to the z’-axis. Multiply that amplitude by Fl(r) and you have the amplitude ψl,m(r) to find the electron at (r, θ, φ) with respect to the original axes.”

So what is he telling us here? Well… He’s going a bit fast here. 🙂 Worse, I think he may actually not have chosen the right words here, so let me try to rephrase it. We’ve introduced the Fl(r) function above: it was the amplitude, for m = 0, to find the electron on the z-axis at some distance r. But so here we’re obviously in the x’, y’, z’ frame and so Fl(r) is the amplitude for m’ = 0: it’s the amplitude to find the electron at some distance r along the z’-axis. Of course, for this amplitude to be non-zero, we must be in the |l, m’ = 0〉 state, but are we? Well… |l, m’ = 0〉 actually gives us the amplitude for that. So we’re going to multiply two amplitudes here:

Fl(r)·|l, m’ = 0〉

So this amplitude is the product of two amplitudes as measured in the x’, y’, z’ frame. Note it’s symmetric: we may also write it as |l, m’ = 0〉·Fl(r). We now need to sort of translate that into an amplitude as measured in the x, y, z frame. To go from x, y, z to x’, y’, z’, we first rotated around the z-axis by the angle φ, and then rotated around the new y’-axis by the angle θ. Now, the order of rotation matters: you can easily check that by taking a non-symmetrical object in your hand and doing those rotations in the two different sequences: check what happens to the orientation of your object. Hence, to go back we should first rotate about the y’-axis by the angle −θ, so our z’-axis folds into the old z-axis, and then rotate about the z-axis by the angle −φ.
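You can also check the order-matters point with plain coordinate rotation matrices – note these are just the ordinary 3×3 geometry matrices, not the quantum transformation matrices for amplitudes, which are more complicated beasts:

```python
import math

# Rotations about different axes don't commute: Rz(phi) @ Ry(theta)
# differs from Ry(theta) @ Rz(phi), even for plain coordinate rotations.

def Rz(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def Ry(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

theta, phi = math.radians(30), math.radians(60)
AB = matmul(Rz(phi), Ry(theta))
BA = matmul(Ry(theta), Rz(phi))
differs = any(abs(AB[i][j] - BA[i][j]) > 1e-9
              for i in range(3) for j in range(3))
print("order matters:", differs)
```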

Now, we will denote the transformation matrices that correspond to these rotations as Ry’(−θ) and Rz(−φ) respectively. These transformation matrices are complicated beasts. They are surely not the easy rotation matrices that you can use for the coordinates themselves. You can click this link to see what they look like for l = 1. For larger l, there are other formulas, which Feynman derives in another chapter of his Lectures on quantum mechanics. But let’s move on. Here’s the grand result:

The amplitude for our wavefunction ψl,m(r) – which denotes the amplitude for (1) the atom to be in the state that’s characterized by the quantum numbers l and m and – let’s not forget – (2) finding the electron at r – note the bold type: r = (x, y, z) – would be equal to:

ψl,m(r) = 〈l, m|Rz(−φ) Ry’(−θ)|l, m’ = 0〉·Fl(r)

Well… Hmm… Maybe. […] That’s not how Feynman writes it. He writes it as follows:

ψl,m(r) = 〈l, 0|Ry(θ) Rz(φ)|lm〉·Fl(r)

I am not quite sure what I did wrong. Perhaps the two expressions are equivalent. Or perhaps – is it possible at all? – Feynman made a mistake? I’ll find out. [P.S: I re-visited this point in the meanwhile: see the P.S. to this post. :-)] The point to note is that we have some combined rotation matrix Ry(θ) Rz(φ). The elements of this matrix are algebraic functions of θ and φ, which we will write as Yl,m(θ, φ), so we write:

a·Yl,m(θ, φ) = 〈l, 0|Ry(θ) Rz(φ)|lm〉

Or a·Yl,m(θ, φ) = 〈l, m|Rz(−φ) Ry’(−θ)|lm’ = 0〉, if Feynman had it wrong and my line of reasoning above were correct – which is obviously not so likely. Hence, the ψl,m(r) function is now written as:

ψl,m(r) = a·Yl,m(θ, φ)·Fl(r)

The a coefficient is, as usual, a normalization coefficient so as to make sure the surface under the probability density function is 1. As mentioned above, we get these Yl,m(θ, φ) functions from combining those rotation matrices. For l = 1, and m = −1, 0, +1, they are:

spherical harmonics

A more complete table is given below:

spherical harmonics 2

So, yes, we’re done. Those equations above give us those wonderful shapes for the electron orbitals, as illustrated below (credit for the illustration goes to an interesting site of the UC Davis school).

electron orbitals

But… Hey! Wait a moment! We only have these Yl,m(θ, φ) functions here. What about Fl(r)?

You’re right. We’re not quite there yet, because we don’t have a functional form for Fl(r). Not yet, that is. Unfortunately, that derivation is another lengthy development – and that derivation is really just tedious math. Hence, I will refer you to Feynman for that. 🙂 Let me just insert one more thing before giving you The Grand Equation, and that’s an explanation of how we get those nice graphs. They are so-called polar graphs. There is a nice and easy article on them on the website of the University of Illinois, but I’ll summarize it for you. Polar graphs use a polar coordinate grid, as opposed to the Cartesian (or rectangular) coordinate grid that we’re used to. It’s shown below.

The origin is now referred to as the pole – like in North or South Pole indeed. 🙂 The straight lines from the pole (like the diagonals, for example, or the axes themselves, or any line in-between) measure the distance from the pole which, in this case, goes from 0 to 10, and we can connect the equidistant points by a series of circles – as shown in the illustration also. These lines from the pole are defined by some angle – which we’ll write as θ to make things easy 🙂 – which just goes from 0 to 2π = 0 and then round and round and round again. The rest is simple: you’re just going to graph a function, or an equation – just like you’d graph y = ax + b in the Cartesian plane – but it’s going to be a polar equation. Referring back to our p-orbitals, we’ll want to graph the cos²θ = ρ equation, for example, because that’s going to show us the shape of that probability density function for l = 1 and m = 0. So our graph is going to connect the (θ, ρ) points for which the angle (θ) and the distance from the pole (ρ) satisfy the cos²θ = ρ equation. There is a really nice widget on the WolframAlpha site that produces those graphs for you. I used it to produce the graph below, which shows the 1.1547·cos²θ = ρ graph (the 1.1547 coefficient is the normalization coefficient a). Now, you’ll wonder why this is a curve, or a curved line. That widget even calculates its length: it’s about 6.374743 units long. So why don’t we have a surface or a volume here? We didn’t specify any value for ρ, did we? No, we didn’t. The widget calculates those values from the equation. So… Yes. It’s a valid question: where’s the distribution? We were talking about some electron cloud or something, right?
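By the way, we can reproduce that length ourselves – this is just my own numerical cross-check, not part of the original argument – using the polar arc-length formula, i.e. by integrating ds = √[ρ² + (dρ/dθ)²]·dθ over the full 0 to 2π range:

```python
import math

# Arc length of the polar curve rho = 1.1547*cos^2(theta), via Simpson's rule.
a = 1.1547  # the normalization coefficient from the text

def integrand(theta):
    rho = a * math.cos(theta) ** 2
    drho = -a * math.sin(2 * theta)  # d(rho)/d(theta) = -2a*cos(theta)*sin(theta)
    return math.hypot(rho, drho)     # sqrt(rho^2 + drho^2)

n = 10000                 # even number of Simpson subintervals
h = 2 * math.pi / n
s = integrand(0) + integrand(2 * math.pi)
for i in range(1, n):
    s += (4 if i % 2 else 2) * integrand(i * h)
length = s * h / 3

print(round(length, 2))  # ≈ 6.37, in line with the widget's 6.374743
```

So the widget isn’t doing anything mysterious: it just integrates along the curve.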

Right. To get that cloud – those probability densities really – we need that Fl(r) function. Our cos²θ = ρ curve is, once again, just some kind of envelope function: it marks a space but doesn’t fill it, so to speak. 🙂 In fact, I should now give you the complete description, which has all of the possible states of the hydrogen atom – everything! No separate pieces anymore. Here it is. It also includes n. It’s The Grand Equation:

The ak coefficients in the formula for ρFn,l(ρ) are the solutions to the equation below, which I copied from Feynman’s text on it all. I’ll also refer you to the same text to see how you actually get solutions out of it, and what they then actually represent. 🙂

We’re done. Finally!
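As a quick sanity check on those Yl,m(θ, φ) functions, we can verify numerically that the l = 1 harmonics integrate to 1 over the sphere. Note that I am assuming the standard physics normalization here (so the functions below may differ from the table above by that overall a coefficient):

```python
import math

# |Y(1,m)|^2 in the standard physics normalization:
#   Y(1,0)  = sqrt(3/(4*pi)) * cos(theta)
#   Y(1,±1) = ∓ sqrt(3/(8*pi)) * sin(theta) * exp(±i*phi)
def abs2_Y(m, theta):
    if m == 0:
        return (3 / (4 * math.pi)) * math.cos(theta) ** 2
    # for m = ±1, |exp(±i*phi)| = 1, so phi drops out of |Y|^2
    return (3 / (8 * math.pi)) * math.sin(theta) ** 2

def norm(m, n=2000):
    # integrate |Y|^2 * sin(theta) over theta in [0, pi] (trapezoid rule),
    # then multiply by 2*pi for the (trivial) phi integration
    h = math.pi / n
    total = 0.0
    for i in range(n + 1):
        w = 0.5 if i in (0, n) else 1.0
        total += w * abs2_Y(m, i * h) * math.sin(i * h)
    return 2 * math.pi * total * h

for m in (-1, 0, 1):
    print(m, round(norm(m), 4))  # each integral comes out as 1.0
```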

I hope you enjoyed this. Look at what we’ve achieved. We had this differential equation (a simple diffusion equation, really, albeit in the complex space), and then we have a central Coulomb field and the rather simple concept of quantized (i.e. non-continuous or discrete) angular momentum. Now see what magic comes out of it! We literally constructed the atomic structure out of it, and it’s all wonderfully elegant and beautiful.

Now, I think that’s amazing, and if you’re reading this, then I am sure you’ll find it as amazing as I do.

Note: I did a better job in explaining the intricacies of actually representing those orbitals in a later post. I recommend you have a look at it by clicking the link here.

Post scriptum on the transformation matrices:

You must find the explanation for that 〈l, 0|Ry(θ) Rz(φ)|lm〉·Fl(r) product highly unsatisfactory, and it is. 🙂 I just wanted to make you think – rather than just superficially read through it. First note that Fl(r)·|lm’ = 0〉 is not a product of two amplitudes: it is the product of an amplitude with a state. A state is a vector in a rather special vector space – a Hilbert space (just a nice word to throw around, isn’t it?). The point is: a state vector is written as some linear combination of base states. Something inside of me tells me we may look at the three p-states as base states, but I need to look into that.

Let’s first calculate the Ry(θ) Rz(φ) matrix to see if we get those formulas for the angular dependence of the amplitudes. It’s the product of the Ry(θ) and Rz(φ) matrices, which I reproduce below.

Note that this product is non-commutative because… Well… Matrix products generally are non-commutative. 🙂 So… Well… There they are: the second row gives us those functions, so I am wrong, obviously, and Dr. Feynman is right. Of course, he is. He is always right – especially because his Lectures have gone through so many revised editions that all errors must be out by now. 🙂

However, let me – just for fun – also calculate my Rz(−φ) Ry’(−θ) product. I can do so in two steps: first I calculate Rz(φ) Ry’(θ), and then I substitute the angles φ and θ for –φ and –θ, remembering that cos(–α) = cos(α) and sin(–α) = –sin(α). I might have made a mistake, but I got this:

The functions look the same but… Well… No. The eiφ and e−iφ factors are in the wrong place (it’s just one minus sign – but it’s crucially different). And then these functions should not be in a column. That doesn’t make sense when you write it all out. So Feynman’s expression is, of course, fully correct. But so how do we interpret that 〈l, 0|Ry(θ) Rz(φ)|lm〉 expression then? This amplitude probably answers the following question:

Given that our atom is in the |lm〉 state, what is the amplitude for it to be in the 〈l, 0| state in the x’, y’, z’ frame?

That makes sense – because we did start out with the assumption that our atom was in the |lm〉 state, so… Yes. Think about it some more and you’ll see it all makes sense: we can – and should – multiply this amplitude with the Fl(r) amplitude.

OK. Now we’re really done with this. 🙂

Note: As for the 〈 | and | 〉 symbols to denote a state, note that there’s not much difference: both are state vectors, but a state vector that’s written as an end state – so that’s like 〈 Φ | – is a 1×3 vector (so that’s a row vector), while a vector written as | Φ 〉 is a 3×1 vector (so that’s a column vector). So that’s why 〈l, 0|Ry(θ) Rz(φ)|lm〉 does give us some number. We’ve got a (1×3)·(3×3)·(3×1) matrix product here – but so it gives us what we want: a 1×1 amplitude. 🙂

Schrödinger’s equation: the original approach

Of course, your first question when seeing the title of this post is: what’s original, really? Well… The answer is simple: it’s the historical approach, and it’s original because it’s actually quite intuitive. Indeed, Lecture no. 16 in Feynman’s third Volume of Lectures on Physics is like a trip down memory lane as Feynman himself acknowledges, after presenting Schrödinger’s equation using that very rudimentary model we developed in our previous post:

“We do not intend to have you think we have derived the Schrödinger equation but only wish to show you one way of thinking about it. When Schrödinger first wrote it down, he gave a kind of derivation based on some heuristic arguments and some brilliant intuitive guesses. Some of the arguments he used were even false, but that does not matter; the only important thing is that the ultimate equation gives a correct description of nature.”

So… Well… Let’s have a look at it. 🙂 We were looking at some electron we described in terms of its location at one or the other atom in a linear array (think of it as a line). We did so by defining base states |n〉 = |xn〉, noting that the state of the electron at any point in time could then be written as:

|φ〉 = ∑ |xn〉·Cn(t) = ∑ |xn〉〈xn|φ〉 over all n

The Cn(t) = 〈xn|φ〉 coefficient is the amplitude for the electron to be at xn at t. Hence, the Cn(t) amplitudes vary with t as well as with xn. We’ll re-write them as Cn(t) = C(xn, t) = C(xn). Note that the latter notation does not explicitly show the time dependence. The Hamiltonian equation we derived in our previous post is now written as:

iħ·(∂C(xn)/∂t) = E0C(xn) − AC(xn+b) − AC(xn−b)

Note that, as part of our move from the Cn(t) to the C(xn) notation, we write the time derivative dCn(t)/dt now as ∂C(xn)/∂t, so we use the partial derivative symbol now (∂). Of course, the other partial derivative will be ∂C(x)/∂x as we move from the count variable xn to the continuous variable x, but let’s not get ahead of ourselves here. The solution we found for our C(xn) functions was the following wavefunction:

C(xn) = a·ei·(k∙xn−ω·t) = a·e−i∙ω·t·ei∙k∙xn = a·e−i·(E/ħ)·t·ei∙k∙xn

We also found the following relationship between E and k:

E = E0 − 2A·cos(kb)
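We can check this dispersion relation numerically: plug the wavefunction into the Hamiltonian equation and verify that both sides agree – but only if ħω = E0 − 2A·cos(kb). The values below are made up for the sake of illustration:

```python
import math
import cmath

hbar, E0, A, b, k = 1.0, 2.0, 0.5, 1.0, 0.7  # arbitrary illustrative values
E = E0 - 2 * A * math.cos(k * b)             # the dispersion relation
w = E / hbar

def C(x, t):
    # the trial wavefunction C(xn, t) = a*exp(i*(k*xn - w*t)), with a = 1
    return cmath.exp(1j * (k * x - w * t))

x, t, dt = 3.0, 1.5, 1e-6
dCdt = (C(x, t + dt) - C(x, t - dt)) / (2 * dt)   # numerical time derivative
lhs = 1j * hbar * dCdt                            # i*hbar*dC/dt
rhs = E0 * C(x, t) - A * C(x + b, t) - A * C(x - b, t)
print(abs(lhs - rhs))  # ≈ 0: the difference equation is satisfied
```

Change E away from E0 − 2A·cos(kb) and the two sides no longer match.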

Now, even Feynman struggles a bit with the definition of E0 and k here, and their relationship with E, which is graphed below.


Indeed, he first writes, as he starts developing the model, that E0 is, physically, the energy the electron would have if it couldn’t leak away from one of the atoms, but then he also adds: “It represents really nothing but our choice of the zero of energy.”

This is all quite enigmatic because we cannot just do whatever we want when discussing the energy of a particle. As I pointed out in one of my previous posts, when discussing the energy of a particle in the context of the wavefunction, we generally consider it to be the sum of three different energy concepts:

  1. The particle’s rest energy m0c², which de Broglie referred to as internal energy (Eint), and which includes the rest mass of the ‘internal pieces’, as Feynman puts it (now we call those ‘internal pieces’ quarks), as well as their binding energy (i.e. the quarks’ interaction energy).
  2. Any potential energy it may have because of some field (i.e. if it is not traveling in free space), which we usually denote by U. This field can be anything—gravitational, electromagnetic: it’s whatever changes the energy of the particle because of its position in space.
  3. The particle’s kinetic energy, which we write in terms of its momentum p: m·v²/2 = m²·v²/(2m) = (m·v)²/(2m) = p²/(2m).

It’s obvious that we cannot just “choose” the zero point here: the particle’s rest energy is its rest energy, and its velocity is its velocity. So it’s not quite clear what the E0 in our model really is. As far as I am concerned, it represents the average energy of the system really, so it’s just like the E0 for our ammonia molecule, or the E0 for whatever two-state system we’ve seen so far. In fact, when Feynman writes that we can “choose our zero of energy so that E0 − 2A = 0″ (so the minimum of that curve above is at the zero of energy), he actually makes some assumption in regard to the relative magnitude of the various amplitudes involved.

We should probably think about it in this way: −(i/ħ)·E0 is the amplitude for the electron to just stay where it is, while i·A/ħ is the amplitude to go somewhere else—and note we’ve got two possibilities here: the electron can go to |xn+1〉,  or, alternatively, it can go to |xn−1〉. Now, amplitudes can be associated with probabilities by taking the absolute square, so I’d re-write the E0 − 2A = 0 assumption as:

E0 = 2A ⇔ |−(i/ħ)·E0|² = |(i/ħ)·2A|²

Hence, in my humble opinion, Feynman’s assumption that E0 − 2A = 0 has nothing to do with ‘choosing the zero of energy’. It’s more like a symmetry assumption: we’re basically saying it’s as likely for the electron to stay where it is as it is to move to the next position. It’s an idea I need to develop somewhat further, as Feynman seems to just gloss over these little things. For example, I am sure it is not a coincidence that the EI, EII, EIII and EIV energy levels we found when discussing the hyperfine splitting of the hydrogen ground state also add up to 0. In fact, you’ll remember we could actually measure those energy levels (EI = EII = EIII = A ≈ 9.23×10⁻⁶ eV, and EIV = −3A ≈ −27.7×10⁻⁶ eV), so saying that we can “choose” some zero energy point is plain nonsense. The question just doesn’t arise. In any case, as I have to continue the development here, I’ll leave this point for further analysis in the future. So… Well… Just note this E0 − 2A = 0 assumption, as we’ll need it in a moment.

The second assumption we’ll need concerns the variation in k. As you know, we can only get a wave packet if we allow for uncertainty in k which, in turn, translates into uncertainty for E. We write:

ΔE = Δ[E0 − 2A·cos(kb)]

Of course, we’d need to interpret the Δ as a variance (σ2) or a standard deviation (σ) so we can apply the usual rules – i.e. var(a) = 0, var(aX) = a2·var(X), and var(aX ± bY) = a2·var(X) + b2·var(Y) ± 2ab·cov(X, Y) – to be a bit more precise about what we’re writing here, but you get the idea. In fact, let me quickly write it out:

var[E0 − 2A·cos(kb)] = var(E0) + 4A²·var[cos(kb)] ⇔ var(E) = 4A²·var[cos(kb)]

Now, you should check my post scriptum to my page on the Essentials, to see what the probability density function of the cosine of a randomly distributed variable looks like, and then you should go online to find a formula for its variance, and then you can work it all out yourself, because… Well… I am not going to do it for you. What I want to do here is just show how Feynman gets Schrödinger’s equation out of all of these simplifications.

So what’s the second assumption? Well… As the graph shows, our k can take any value between −π/b and +π/b, and therefore, the kb argument in our cosine function can take on any value between −π and +π. In other words, kb could be any angle. However, as Feynman puts it—we’ll be assuming that kb is ‘small enough’, so we can use the small-angle approximations whenever we see the cos(kb) and/or sin(kb) functions. So we write: sin(kb) ≈ kb and cos(kb) ≈ 1 − (kb)2/2 = 1 − k2b2/2. Now, that assumption led to another grand result, which we also derived in our previous post. It had to do with the group velocity of our wave packet, which we calculated as:

v = dω/dk = (2Ab²/ħ)·k

Of course, we should interpret our k here as “the typical k“. Huh? Yes… That’s how Feynman refers to it, and I have no better term for it. It’s some kind of ‘average’ of the Δk interval, obviously, but… Well… Feynman does not give us any exact definition here. Of course, if you look at the graph once more, you’ll say that, if the typical kb has to be “small enough”, then its expected value should be zero. Well… Yes and no. If the typical kb is zero, or if k is zero, then v is zero, and then we’ve got a stationary electron, i.e. an electron with zero momentum. However, because we’re doing what we’re doing (that is, we’re studying “stuff that moves”—as I put it disrespectfully in a few of my posts, so as to distinguish it from our analyses of “stuff that doesn’t move”, like our two-state systems, for example), our “typical k” should not be zero here. OK… We can now calculate what’s referred to as the effective mass of the electron, i.e. the mass that appears in the classical kinetic energy formula: K.E. = m·v²/2. Now, there are two ways to do that, and both are somewhat tricky in their interpretation:

1. Using both the E0 − 2A = 0 as well as the “small kb” assumption, we find that E = E0 − 2A·(1 − k²b²/2) = A·k²b². Using that for the K.E. in our formula yields:

meff = 2A·k²b²/v² = 2A·k²b²/[(2Ab²/ħ)·k]² = ħ²/(2Ab²)

2. We can use the classical momentum formula (p = m·v), and then the 2nd de Broglie equation, which tells us that each wavenumber (k) is to be associated with a value for the momentum (p) using the p = ħk relation (so p is proportional to k, with ħ as the factor of proportionality). So we can now calculate meff as meff = ħk/v. Substituting again for what we’ve found above gives us the same:

meff = ħ·k/v = ħ·k/[(2Ab²/ħ)·k] = ħ²/(2Ab²)
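With some arbitrary numbers (my own, just for illustration), it’s easy to check that the two routes are consistent – as long as kb is small, so the E ≈ A·k²b² approximation holds:

```python
hbar, A, b, k = 1.0, 0.8, 0.5, 0.01   # illustrative values; k*b = 0.005 is 'small'

v = (2 * A * b**2 / hbar) * k         # group velocity v = dw/dk
E = A * k**2 * b**2                   # energy in the small-kb approximation

m1 = 2 * E / v**2                     # route 1: from K.E. = m*v^2/2
m2 = hbar * k / v                     # route 2: from p = m*v and p = hbar*k
m_expected = hbar**2 / (2 * A * b**2)

print(m1, m2, m_expected)  # all three ≈ 2.5
```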

Of course, we’re not supposed to know the de Broglie relations at this point in time. 🙂 But, now that you’ve seen them anyway, note how we have two formulas for the momentum:

  • The classical formula (p = m·v) tells us that the momentum is proportional to the classical velocity of our particle, and m is then the factor of proportionality.
  • The quantum-mechanical formula (p = ħk) tells us that the (typical) momentum is proportional to the (typical) wavenumber, with Planck’s constant (ħ) as the factor of proportionality. Combining both combines the classical and quantum-mechanical perspective of a moving particle:

m·v = ħ·k

I know… It’s an obvious equation but… Well… Think of it. It’s time to get back to the main story now. Remember we were trying to find Schrödinger’s equation? So let’s get on with it. 🙂

To do so, we need one more assumption. It’s the third major simplification and, just like the others, the assumption is obvious at first, but not on second thought. 😦 So… What is it? Well… It’s easy to see that, in our meff = ħ²/(2Ab²) formula, all depends on the value of 2Ab². So, just like we should wonder what happens with that kb factor in the argument of our sine or cosine function if b goes to zero—i.e. if we’re letting the lattice spacing go to zero, so we’re moving from a discrete to a continuous analysis now—we should also wonder what happens with that 2Ab² factor! Well… Think about it. Wouldn’t it be reasonable to assume that the effective mass of our electron is determined by some property of the material, or the medium (so that’s the silicon in our previous post) and, hence, that it’s constant really? Think of it: we’re not changing the fundamentals really—we just have some electron roaming around in some medium and all that we’re doing now is bringing those xn closer together. Much closer. It’s only logical, then, that our amplitude to jump from xn±1 to xn would also increase, no? So what we’re saying is that 2Ab² is some constant which we write as ħ²/meff or, what amounts to the same, that A·b² = ħ²/(2·meff).

Of course, you may raise two objections here:

  1. The A·b² = ħ²/(2·meff) assumption establishes a very particular relation between A and b, as we can write A as A = [ħ²/(2meff)]·b⁻² now. So we’ve got like a y = 1/x² relation here. Where the hell does that come from?
  2. We were talking some real stuff here: a crystal lattice with atoms that, in reality, do have some spacing, so that corresponds to some real value for b. So that spacing gives some actual physical significance to those xn values.

Well… What can I say? I think you should re-read that quote of Feynman when I started this post. We’re going to get Schrödinger’s equation – i.e. the ultimate prize for all of the hard work that we’ve been doing so far – but… Yes. It’s really very heuristic, indeed! 🙂 But let’s get on with it now! We can re-write our Hamiltonian equation as:

iħ·(∂C(xn)/∂t) = E0C(xn) − AC(xn+b) − AC(xn−b)

= (E0 − 2A)C(xn) + A[2C(xn) − C(xn+b) − C(xn−b)] = A[2C(xn) − C(xn+b) − C(xn−b)]

Now, I know your brain is about to melt down but, fiddling with this equation as we’re doing right now, Schrödinger recognized a formula for the second-order derivative of a function. I’ll just jot it down, and you can google it so as to double-check where it comes from:

second derivative

Just substitute f(x) for C(xn) in the second part of our equation above, and you’ll see we can effectively write that 2C(xn) − C(xn+b) − C(xn−b) factor as:

formula 1
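You can convince yourself of that formula numerically with any smooth function – f(x) = sin(x) is my own arbitrary pick here – by letting the spacing b shrink:

```python
import math

# Check: [f(x+b) - 2*f(x) + f(x-b)] / b^2  ->  f''(x)  as b -> 0.
# Equivalently, the 2*C(xn) - C(xn+b) - C(xn-b) factor above is -b^2 times
# the second derivative, up to terms that vanish with b.
def f(x):
    return math.sin(x)

x = 1.3
for b in (0.1, 0.01, 0.001):
    second_deriv = (f(x + b) - 2 * f(x) + f(x - b)) / b**2
    print(b, second_deriv)  # converges to f''(x) = -sin(1.3) ≈ -0.9636
```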

We’re done. We just keep iħ·(∂C(xn)/∂t) on the left-hand side now and multiply the expression above by A, to get what we wanted to get, and that’s – YES! – Schrödinger’s equation:

Schrodinger 2

Whatever your objections to this ‘derivation’, it is the correct equation. For a particle in free space, we just write m instead of meff, but it’s exactly the same. I’ll now give you Feynman’s full quote, which is quite enlightening:

“We do not intend to have you think we have derived the Schrödinger equation but only wish to show you one way of thinking about it. When Schrödinger first wrote it down, he gave a kind of derivation based on some heuristic arguments and some brilliant intuitive guesses. Some of the arguments he used were even false, but that does not matter; the only important thing is that the ultimate equation gives a correct description of nature. The purpose of our discussion is then simply to show you that the correct fundamental quantum mechanical equation [i.e. Schrödinger’s equation] has the same form you get for the limiting case of an electron moving along a line of atoms. We can think of it as describing the diffusion of a probability amplitude from one point to the next along the line. That is, if an electron has a certain amplitude to be at one point, it will, a little time later, have some amplitude to be at neighboring points. In fact, the equation looks something like the diffusion equations which we have used in Volume I. But there is one main difference: the imaginary coefficient in front of the time derivative makes the behavior completely different from the ordinary diffusion such as you would have for a gas spreading out along a thin tube. Ordinary diffusion gives rise to real exponential solutions, whereas the solutions of Schrödinger’s equation are complex waves.”

So… That says it all, I guess. Isn’t it great to be where we are? We’ve really climbed a mountain here. And I think the view is gorgeous. 🙂

Oh—just in case you’d think I did not give you Schrödinger’s equation, let me write it in the form you’ll usually see it:

schrodinger 3

Done! 🙂

A Royal Road to quantum physics?

It is said that, when Ptolemy asked Euclid to quickly explain geometry to him, Euclid told the King that there was no ‘Royal Road’ to it, by which he meant that it’s just difficult and takes a lot of time to understand.

Physicists will tell you the same about quantum physics. So, I know that, at this point, I should just study Feynman’s third Lectures Volume and shut up for a while. However, before I get lost while playing with state vectors, S-matrices, eigenfunctions, eigenvalues and what have you, I’ll try that Royal Road anyway, building on my previous digression on Hamiltonian mechanics.

So… What was that about? Well… If you understood anything from my previous post, it should be that both the Lagrangian and Hamiltonian function use the equations for kinetic and potential energy to derive the equations of motion for a system. The key difference between the Lagrangian and Hamiltonian approach is that the Lagrangian approach yields one differential equation – which has to be solved to yield a functional form for x as a function of time – while the Hamiltonian approach yields two differential equations – which have to be solved to yield a functional form for both position (x) and momentum (p). In other words, Lagrangian mechanics is a model that focuses on the position variable(s) only, while, in Hamiltonian mechanics, we also keep track of the momentum variable(s). Let me briefly explain the procedure again, so we’re clear on it:

1. We write down a function referred to as the Lagrangian function. The function is L = T – V with T and V the kinetic and potential energy respectively. T has to be expressed as a function of velocity (v) and V has to be expressed as a function of position (x). You’ll say: of course! However, it is an important point to note, otherwise the following step doesn’t make sense. So we take the equations for kinetic and potential energy and combine them to form a function L = L(x, v).

2. We then calculate the so-called Lagrangian equation, in which we use that function L. To be precise: what we have to do is calculate its partial derivatives and insert these in the following equation:


It should be obvious now why I stressed we should write L as a function of velocity and position, i.e. as L = L(x, v). Otherwise those partial derivatives don’t make sense. As to where this equation comes from, don’t worry about it: I am not explaining why it works – I didn’t do so here, and I didn’t do so in my previous post either. What we’re doing here is just explaining how it goes, not why.

3. If we’ve done everything right, we should get a second-order differential equation which, as mentioned above, we should then solve for x(t). That’s what ‘solving’ a differential equation is about: find a functional form that satisfies the equation.

Let’s now look at the Hamiltonian approach.

1. We write down a function referred to as the Hamiltonian function. It looks similar to the Lagrangian, except that we sum kinetic and potential energy, and that T has to be expressed as a function of the momentum p. So we have a function H = T + V = H(x, p).

2. We then calculate the so-called Hamiltonian equations, which is a set of two equations, rather than just one equation. [We have two for the one-dimensional situation that we are modeling here: it’s a different story (i.e. we will have more equations) if we’d have more degrees of freedom of course.] It’s the same as in the Lagrangian approach: it’s just a matter of calculating partial derivatives, and insert them in the equations below. Again, note that I am not explaining why this Hamiltonian hocus-pocus actually works. I am just saying how it works.

Hamiltonian equations

3. If we’ve done everything right, we should get two first-order differential equations which we should then solve for x(t) and p(t). Now, solving a set of equations may or may not be easy, depending on your point of view. If you wonder how it’s done, there’s excellent stuff on the Web that will show you how (such as, for instance, Paul’s Online Math Notes).
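To make this a bit more concrete, here’s a minimal numerical sketch – my own toy example, not from any of the texts mentioned – for a harmonic oscillator with H = p²/(2m) + k·x²/2. The two Hamiltonian equations are dx/dt = ∂H/∂p = p/m and dp/dt = −∂H/∂x = −k·x, and we can integrate them step by step and compare with the known analytical solution x(t) = x0·cos(ω·t), with ω = √(k/m):

```python
import math

m, k = 1.0, 4.0          # mass and spring constant (illustrative values)
x, p = 1.0, 0.0          # initial position x0 = 1 and momentum p0 = 0
dt, steps = 1e-4, 10000  # integrate up to t = 1

for _ in range(steps):
    p += -k * x * dt     # dp/dt = -∂H/∂x = -k*x   (semi-implicit Euler step)
    x += (p / m) * dt    # dx/dt =  ∂H/∂p = p/m

omega = math.sqrt(k / m)
print(round(x, 3), round(math.cos(omega * 1.0), 3))  # numerical vs analytic x(1)
```

The semi-implicit (symplectic) Euler stepping is a deliberate choice: it respects the phase-space structure of Hamilton’s equations, so the energy stays bounded instead of drifting.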

Now, I mentioned in my previous post that the Hamiltonian approach to modeling mechanics is very similar to the approach that’s used in quantum mechanics and that it’s therefore the preferred approach in physics. I also mentioned that, in classical physics, position and momentum are also conjugate variables, and I also showed how we can calculate the momentum as a conjugate variable from the Lagrangian: p = ∂L/∂v. However, I did not dwell on what conjugate variables actually are in classical mechanics. I won’t do that here either. Just accept that conjugate variables, in classical mechanics, are also defined as pairs of variables. They’re not related through some uncertainty relation, like in quantum physics, but they’re related because they can both be obtained as the derivatives of a function which I haven’t introduced as yet. That function is referred to as the action, but… Well… Let’s resist the temptation to digress any further here. If you really want to know what action is–in physics, that is… 🙂 Well… Google it, I’d say. What you should take home from this digression is that position and momentum are also conjugate variables in classical mechanics.

Let’s now move on to quantum mechanics. You’ll see that the ‘similarity’ in approach is… Well… Quite relative, I’d say. 🙂

Position and momentum in quantum mechanics

As you know by now (I wrote at least a dozen posts on this), the concept of position and momentum in quantum mechanics is very different from that in classical physics: we do not have x(t) and p(t) functions which give a unique, precise and unambiguous value for x and p when we assign a value to the time variable and plug it in. No. What we have in quantum physics is some weird wave function, denoted by the Greek letters φ (phi) or ψ (psi) or, using Greek capitals, Φ and Ψ. To be more specific, the psi usually denotes the wave function in the so-called position space (so we write ψ = ψ(x)), and the phi will usually denote the wave function in the so-called momentum space (so we write φ = φ(p)). That sounds more complicated than it is, obviously, but I just wanted to respect terminology here. Finally, note that the ψ(x) and φ(p) wave functions are related through the Uncertainty Principle: x and p are conjugate variables, and we have this ΔxΔp ≥ ħ/2 inequality, in which the Δ is some standard deviation from some mean value. I should not go into more detail here: you know that by now, don’t you?

While the argument of these functions is some real number, the wave functions themselves are complex-valued, so they have a real and an imaginary part. I’ve also illustrated that a couple of times already but, just to make sure, take a look at the animation below, so you know what we are sort of talking about:

  1. The A and B situations represent a classical oscillator: we know exactly where the red ball is at any point in time.
  2. The C to H situations give us a complex-valued amplitude, with the blue oscillation as the real part, and the pink oscillation as the imaginary part.

QuantumHarmonicOscillatorAnimation

So we have such a wave function both for x and p. Note that the animation above suggests we’re only looking at the wave function for x but–trust me–we have a similar one for p, and they’re related indeed. [To see how exactly, I’d advise you to go through the proof of the so-called Kennard inequality.] So… What do we do with that?

The position and momentum operators

When we want to know where a particle actually is, or what its momentum is, we need to do something with this wave function ψ or φ. Let’s focus on the position variable first. While the wave function itself is said to have ‘no physical interpretation’ (frankly, I don’t know what that means: I’d think everything has some kind of interpretation (and what’s physical and non-physical?), but let’s not get lost in philosophy here), we know that the square of the absolute value of the probability amplitude yields a probability density. So |ψ(x)|² gives us a probability density function or, to put it simply, the probability to find our ‘particle’ (or ‘wavicle’ if you want) at point x. Let’s now do something more sophisticated and write down the expected value of x, which is usually denoted by 〈x〉 (although that invites confusion with Dirac’s bra-ket notation, but don’t worry about it):

expected value of x

Don’t panic. It’s just an integral. Look at it. ψ* is just the complex conjugate (i.e. a – ib if ψ = a + ib) and you will (or should) remember that the product of a complex number with its (complex) conjugate gives us the square of its absolute value: ψ*ψ = |ψ(x)|². What about that x? Can we just insert that there, in-between ψ* and ψ? Good question. The answer is: yes, of course! That x is just some real number and we can put it anywhere. However, it’s still a good question because, while multiplication of complex numbers is commutative (hence, z1z2 = z2z1), the order of our operators – which we will introduce soon – can often not be changed without consequences, so it is something to note.

For the rest, that integral above is quite obvious and it should really not puzzle you: we just multiply a value with its probability of occurring and integrate over the whole domain to get an expected value 〈x〉. Nothing wrong here. Note that we get some real number. [You’ll say: of course! However, I always find it useful to check that when looking at those things mixing complex-valued functions with real-valued variables or arguments. A quick check on the dimensions of what we’re dealing with helps greatly in understanding what we’re doing.]
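By the way, nothing stops us from checking this integral numerically. Here’s a quick sketch in Python (my own illustration – the Gaussian wave function and its parameters are made-up values, just for the exercise): we discretize ψ(x), check the normalization, and compute 〈x〉 = ∫ψ*·x·ψ dx as a simple sum.

```python
import numpy as np

# Hypothetical Gaussian wave packet centred at x0 = 1.5 (made-up values)
x0, sigma = 1.5, 0.4
x = np.linspace(-10, 10, 10001)
dx = x[1] - x[0]

# psi happens to be real here, but we keep the conjugate to mirror the integral
psi = (2 * np.pi * sigma**2) ** -0.25 * np.exp(-(x - x0) ** 2 / (4 * sigma**2))

norm = np.sum(np.conj(psi) * psi).real * dx            # integral of |psi|^2, should be ~1
x_expected = np.sum(np.conj(psi) * x * psi).real * dx  # <x> = integral of psi* x psi

print(norm, x_expected)  # ≈ 1.0 and ≈ 1.5
```

As expected, the sum gives back the centre of the wave packet: the expected value is just probability-weighted averaging, nothing more.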

So… You’ve surely heard about the position and momentum operators already. Is that, then, what it is? Doing some integral on some function to get an expected value? Well… No. But there’s a relation. However, let me first make a remark on notation, because that can be quite confusing. The position operator is usually written with a hat on top of the variable – like ẑ – but the editor tool for this blog doesn’t offer a hat for every letter and, hence, I’ll use the bold letters x and p to denote the operators. Don’t confuse it with me using a bold letter for vectors though! Now, back to the story.

Let’s first give an example of an operator you’re already familiar with in order to understand what an operator actually is. To put it simply: an operator is an instruction to do something with a function. For example: ∂/∂t is an instruction to differentiate some function with regard to the variable t (which usually stands for time). The ∂/∂t operator is obviously referred to as a differentiation operator. When we put a function behind, e.g. f(x, t), we get ∂f(x, t)/∂t, which is just another function in x and t.

So we have the same here: x in itself is just an instruction: you need to put a function behind in order to get some result. So you’ll see it as xψ. In fact, it would be useful to use brackets probably, like x[ψ], especially because I can’t put those hats on the letters here, but I’ll stick to the usual notation, which does not use brackets.

Likewise, we have a momentum operator: p = –iħ∂/∂x. […] Let it sink in. [..]

What’s this? Don’t worry about it. I know: that looks like a very different animal than that x operator. I’ll explain later. Just note, for the moment, that the momentum operator (also) involves a (partial) derivative and, hence, we refer to it as a differential operator (as opposed to differentiation operator). The instruction p = –iħ∂/∂x basically means: differentiate the function with regard to x and multiply the result with –iħ (i.e. minus the product of the reduced Planck constant and the imaginary unit i). Nothing wrong with that. Just calculate a derivative and multiply with a tiny imaginary (complex) number.

Now, back to the position operator x. As you can see, that’s a very simple operator–much simpler than the momentum operator in any case. The position operator applied to ψ yields, quite simply, the xψ(x) factor in the integrand above. So we just get a new function xψ(x) when we apply x to ψ, of which the values are simply the product of x and ψ(x). Hence, we write xψ = xψ.

Really? Is it that simple? Yes. For now at least. 🙂

Back to the momentum operator. Where does that come from? That story is not so simple. [Of course not. It can’t be. Just look at it.] Because we have to avoid talking about eigenvalues and all that, my approach to the explanation will be quite intuitive. [As for ‘my’ approach, let me note that it’s basically the approach as used in the Wikipedia article on it. :-)] Just stay with me for a while here.

Let’s assume ψ is given by ψ = ei(kx–ωt). So that’s a nice periodic function, albeit complex-valued. Now, we know that functional form doesn’t make all that much sense because it corresponds to the particle being everywhere, because the square of its absolute value is some constant. In fact, we know it doesn’t even respect the normalization condition: all probabilities have to add up to 1. However, that being said, we also know that we can superimpose an infinite number of such waves (all with different k and ω) to get a more localized wave train, and then re-normalize the result to make sure the normalization condition is met. Hence, let’s just go along with this idealized example and see where it leads.

We know the wave number k (i.e. its ‘frequency in space’, as it’s often described) is related to the momentum p through the de Broglie relation: p = ħk. [Again, you should think about a whole bunch of these waves and, hence, some spread in k corresponding to some spread in p, but just go along with the story for now and don’t try to make it even more complicated.] Now, if we differentiate with regard to x, and then substitute, we get ∂ψ/∂x = ∂ei(kx–ωt)/∂x = ikei(kx–ωt) = ikψ, or


So what is this? Well… On the left-hand side, we have the (partial) derivative of a complex-valued function (ψ) with regard to x. Now, that derivative is, more likely than not, also some complex-valued function. And if you don’t believe me, just look at the right-hand side of the equation, where we have that i and ψ. In fact, the equation just shows that, when we take that derivative, we get our original function ψ but multiplied by ip/ħ. Hey! We’ve got a differential equation here, don’t we? Yes. And the solution for it is… Well… The natural exponential. Of course! That should be no surprise because we started out with a natural exponential as functional form! So that’s not the point. What is the point, then? Well… If we multiply both sides by –iħ (note that (–iħ)·(i/ħ) = 1), we get:

–iħ·(∂ψ/∂x) = pψ

[If you’re confused about the –i, remember that 1/i = –i.] So… We’ve got pψ on the right-hand side now. So… Well… That’s like xψ, isn’t it? Yes. 🙂 If we define the momentum operator as p = –iħ·(∂/∂x), then we get pψ = pψ. So that’s the same thing as for the position operator. It’s just that p is… Well… A more complex operator, as it has that –iħ factor in it. And, yes, of course it also involves an instruction to differentiate, which also sets it apart from the position operator, which is just an instruction to multiply the function with its argument.
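If you want to see that pψ = pψ business ‘in action’, here is a small numerical check (a sketch of my own – the 0.33 nm wavelength is just the example value we have been using): we apply –iħ∂/∂x to a sampled plane wave and verify that we get ħk = p times the original function back.

```python
import numpy as np

hbar = 1.0545718e-34      # reduced Planck constant, J·s
k = 2 * np.pi / 0.33e-9   # wave number for a 0.33 nm de Broglie wavelength

x = np.linspace(0, 5e-9, 20001)
psi = np.exp(1j * k * x)  # plane wave (the time factor drops out anyway)

# p psi = -i hbar d(psi)/dx, approximated with a numerical gradient
p_psi = -1j * hbar * np.gradient(psi, x)

# Away from the grid edges, p_psi / psi should be the constant hbar*k = p
p_value = (p_psi[100] / psi[100]).real
print(p_value, hbar * k)  # both ≈ 2.0e-24 kg·m/s
```

So the operator really does pull the momentum p = ħk out of the wave function, as a multiplicative factor.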

I am sure you’ll find this funny – perhaps even fishy – business. And, yes, I have the same questions: what does it all mean? I can’t answer that here. As for now, just accept that this position and momentum operator are what they are, and that I can’t do anything about that. But… I hear you sputter: what about their interpretation? Well… Sorry… I could say that the functions xψ and pψ are so-called linear maps but that is not likely to help you much in understanding what these operators really do. You – and I for sure 🙂 – will indeed have to go through that story of eigenvalues to get to a somewhat deeper understanding of what these operators actually are. That’s just how it is. As for now, I just have to move on. Sorry for letting you down here. 🙂

Energy operators

Now that we sort of ‘understand’ those position and momentum operators (or their mathematical form at least), it’s time to introduce the energy operators. Indeed, in quantum mechanics, we’ve also got an operator for (a) kinetic energy and (b) potential energy. These operators are also denoted with a hat above the T and V symbol. All quantum-mechanical operators are like that, it seems. However, because of the limitations of the editor tool here, I’ll also use a bold T and V respectively. Now, I am sure you’ve had enough of these operators, so let me just jot them down:

  1. V = V, so that’s just an instruction to multiply a function with V = V(x, t). That’s easy enough because that’s just like the position operator.
  2. As for T, that’s more complicated. It involves that momentum operator p, which was also more complicated, remember? Let me just give you the formula:

T = p·p/2m = p²/2m.

So we multiply the operator p with itself here. What does that mean? Well… Because the operator involves a derivative, it means we have to take the derivative twice and… No! Well… Let me correct myself: yes and no. 🙂 That p·p product is, strictly speaking, a dot product between two vectors, and so it’s not just a matter of differentiating twice. Now that we are here, we may just as well extend the analysis a bit and assume that we also have a y and z coordinate, so we’ll have a position vector r = (x, y, z). [Note that r is a vector here, not an operator. !?! Oh… Well…] Extending the analysis to three (or more) dimensions means that we should replace the differentiation operator by the so-called gradient or del operator: ∇ = (∂/∂x, ∂/∂y, ∂/∂z). And now that dot product p·p will, among other things, yield another operator which you’re surely familiar with: the Laplacian. Let me remind you of it:


Hence, we can write the kinetic energy operator T as:

Kinetic energy operator

I quickly copied this formula from Wikipedia, which doesn’t have the limitation of the WordPress editor tool, and so you see it now the way you should see it, i.e. with the hat notation. 🙂
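To make the kinetic energy operator a bit less abstract, here is a one-dimensional numerical sketch (my own, with an illustrative wave number, so nothing here comes from the formulas above except the operator itself): we approximate the second derivative by central differences and check that applying T = –(ħ²/2m)·∂²/∂x² to a plane wave just multiplies it by ħ²k²/2m, i.e. the classical kinetic energy p²/2m.

```python
import numpy as np

hbar = 1.0545718e-34  # reduced Planck constant, J·s
m_e = 9.1093837e-31   # electron mass, kg
k = 1.0e10            # wave number, 1/m (illustrative value)

x = np.linspace(0, 1e-8, 40001)
dx = x[1] - x[0]
psi = np.exp(1j * k * x)

# Second derivative by central differences (interior points only);
# d2psi[i] approximates psi''(x[i+1])
d2psi = (psi[2:] - 2 * psi[1:-1] + psi[:-2]) / dx**2
T_psi = -hbar**2 / (2 * m_e) * d2psi

# For a plane wave, T psi = (hbar^2 k^2 / 2m) psi
T_value = (T_psi[100] / psi[101]).real
print(T_value, hbar**2 * k**2 / (2 * m_e))
```

Both numbers come out around 6×10⁻¹⁹ J (a few eV), so the operator does indeed return the kinetic energy as a multiplicative factor when it acts on a plane wave.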


In case you’re despairing, hang on! We’re almost there. 🙂 We can, indeed, now define the Hamiltonian operator that’s used in quantum mechanics. While the Hamiltonian function was the sum of the potential and kinetic energy functions in classical physics, in quantum mechanics we add the two energy operators. You’ll grumble and say: that’s not the same as adding energies. And you’re right: adding operators is not the same as adding energy functions. Of course it isn’t. 🙂 But just stick to the story, please, and stop criticizing. [Oh – just in case you wonder where that minus sign in the kinetic energy operator comes from: i² = –1, of course.]

Adding the two operators together yields the following:

Hamiltonian operator

So. Yes. That’s the famous Hamiltonian operator.

OK. So what?

Yes…. Hmm… What do we do with that operator? Well… We apply it to the function and so we write Hψ = … Hmm…

Well… What? 

Well… I am not writing this post just to give some definitions of the type of operators that are used in quantum mechanics and then just do obvious stuff by writing it all out. No. I am writing this post to illustrate how things work.

OK. So how does it work then? 

Well… It turns out that, in quantum mechanics, we have similar equations as in classical mechanics. Remember that I just wrote down the set of (two) differential equations when discussing Hamiltonian mechanics? Here I’ll do the same. The Hamiltonian operator appears in an equation which you’ve surely heard of and which, just like me, you’d love to understand – and then I mean: understand it fully, completely, and intuitively. […] Yes. It’s the Schrödinger equation:

schrodinger 1

Note, once again, I am not saying anything about where this equation comes from. It’s like jotting down that Lagrange equation, or the set of Hamiltonian equations: I am not saying anything about the why of all this hocus pocus. I am just saying how it goes. So we’ve got another differential equation here, and we have to solve it. If we write it all out using the above definition of the Hamiltonian operator, we get:

Schrodinger 2

If you’re still with me, you’ll immediately wonder about that μ. Well… Don’t. It’s the mass really, but the so-called reduced mass. Don’t worry about it. Just google it if you want to know more about this concept of a ‘reduced’ mass: it’s a fine point which doesn’t matter here really. The point is the grand result.

But… So… What is the grand result? What are we looking at here? Well… Just as I said above: that Schrödinger equation is a differential equation, just like those equations we got when applying the Lagrangian and Hamiltonian approach to modeling a dynamic system in classical mechanics, and, hence, just like what we (were supposed to) do there, we have to solve it. 🙂 Of course, it looks much more daunting than our Lagrangian or Hamiltonian differential equations, because we’ve got complex-valued functions here, and you’re probably scared of that iħ factor too. But you shouldn’t be. When everything is said and done, we’ve got a differential equation here that we need to solve for ψ. In other words, we need to find functional forms for ψ that satisfy the above equation. That’s it. Period.
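In fact, ‘solving it’ is something a computer can illustrate nicely. The sketch below is my own (in natural units where ħ = m = ω = 1, so nothing here is from the post itself): it discretizes the time-independent version of the equation for a harmonic oscillator potential V(x) = x²/2, turns the Hamiltonian into a matrix, and checks that its lowest eigenvalues come out very close to the exact energy levels En = n + 1/2.

```python
import numpy as np

# Natural units (hbar = m = omega = 1): exact harmonic oscillator
# energy levels are E_n = n + 1/2.
N = 2000
x = np.linspace(-10, 10, N)
dx = x[1] - x[0]

# Kinetic part: -1/2 d^2/dx^2 as a tridiagonal finite-difference matrix
T = -0.5 * (np.diag(np.ones(N - 1), 1)
            - 2 * np.diag(np.ones(N))
            + np.diag(np.ones(N - 1), -1)) / dx**2
# Potential part: V(x) = x^2 / 2 on the diagonal
V = np.diag(0.5 * x**2)

H = T + V  # the Hamiltonian matrix
energies = np.linalg.eigvalsh(H)
print(energies[:4])  # ≈ [0.5, 1.5, 2.5, 3.5]
```

So ‘solving the Schrödinger equation’ amounts, numerically at least, to diagonalizing the Hamiltonian: the allowed energies pop out as eigenvalues, and the eigenvectors are the (sampled) wave functions.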

So what do these solutions look like? Well, they look like those complex-valued oscillating things in the very first animation above. Let me copy them again:


So… That’s it then? Yes. I won’t say anything more about it here, because (1) this post has become way too long already, and so I won’t dwell on the solutions of that Schrödinger equation, and because (2) I do feel it’s about time I really start doing what it takes, and that’s to work on all of the math that’s necessary to actually do all that hocus-pocus. 🙂

Post scriptum: As for understanding the Schrödinger equation “fully, completely, and intuitively”, I am not sure that’s actually possible. But I am trying hard and so let’s see. 🙂 I’ll tell you after I mastered the math. But something inside of me tells me there’s indeed no Royal Road to it. 🙂

Post scriptum 2 (dated 16 November 2015): I’ve added this post scriptum, more than a year after writing all of the above, because I now realize how immature it actually is. If you really want to know more about quantum math, then you should read my more recent posts, like the one on the Hamiltonian matrix. It’s not that anything that I write above is wrong—it isn’t. But… Well… It’s just that I feel that I’ve jumped the gun. […] But then that’s probably not a bad thing. 🙂

Re-visiting the matter wave (II)

My previous post was, once again, littered with formulas – even if I had not intended it to be that way: I want to convey some kind of understanding of what an electron – or any particle at the atomic scale – actually is – with the minimum number of formulas necessary.

We know particles display wave behavior: when an electron beam encounters an obstacle or a slit that is somewhat comparable in size to its wavelength, we’ll observe diffraction, or interference. [I have to insert a quick note on terminology here: the terms diffraction and interference are often used interchangeably, but there is a tendency to use interference when we have more than one wave source and diffraction when there is only one wave source. However, I’ll immediately add that distinction is somewhat artificial. Do we have one or two wave sources in a double-slit experiment? There is one beam but the two slits break it up in two and, hence, we would call it interference. If it’s only one slit, there is also an interference pattern, but the phenomenon will be referred to as diffraction.]

We also know that the wavelength we are talking about here is not the wavelength of some electromagnetic wave, like light. It’s the wavelength of a de Broglie wave, i.e. a matter wave: such a wave is represented by an (oscillating) complex number – so we need to keep track of a real and an imaginary part – representing a so-called probability amplitude Ψ(x, t) whose modulus squared (│Ψ(x, t)│²) is the probability of actually detecting the electron at point x at time t. [The purists will say that complex numbers can’t oscillate – but I am sure you get the idea.]

You should read the phrase above twice: we cannot know where the electron actually is. We can only calculate probabilities (and, of course, compare them with the probabilities we get from experiments). Hence, when the wave function tells us the probability is greatest at point x at time t, then we may be lucky when we actually probe point x at time t and find it there, but it may also not be there. In fact, the probability of finding it exactly at some point x at some definite time t is zero. That’s just a characteristic of such probability density functions: we need to probe some region Δx in some time interval Δt.

If you think that is not very satisfactory, there’s actually a very common-sense explanation that has nothing to do with quantum mechanics: our scientific instruments do not allow us to go beyond a certain scale anyway. Indeed, the resolution of the best electron microscopes, for example, is some 50 picometer (1 pm = 1×10–12 m): that’s small (and resolutions get higher by the year), but so it implies that we are not looking at points – as defined in math that is: so that’s something with zero dimension – but at pixels of size Δx = 50×10–12 m.

The same goes for time. Time is measured by atomic clocks nowadays but even these clocks do ‘tick’, and these ‘ticks’ are discrete. Atomic clocks take advantage of the property of atoms to resonate at extremely consistent frequencies. I’ll say something more about resonance soon – because it’s very relevant for what I am writing about in this post – but, for the moment, just note that, for example, Caesium-133 (which was used to build the first atomic clock) oscillates at 9,192,631,770 cycles per second. In fact, the International Bureau of Weights and Measures re-defined the (time) second in 1967 to correspond to “the duration of 9,192,631,770 periods of the radiation corresponding to the transition between the two hyperfine levels of the ground state of the Caesium-133 atom at rest at a temperature of 0 K.”

Don’t worry about it: the point to note is that when it comes to measuring time, we also have an uncertainty. Now, when using this Caesium-133 atomic clock, this uncertainty would be in the range of ±1×10–10 seconds, because that’s the duration of one ‘tick’, i.e. one period of the radiation (1/9,192,631,770 of a second). However, there are other things we may want to measure time for: some of the unstable baryons have lifetimes in the range of a few picoseconds only (1 ps = 1×10–12 s) and the really unstable ones – known as baryon resonances – have lifetimes in the 1×10–22 to 1×10–24 s range. This we can only measure because they leave some trace after these particle collisions in particle accelerators and, because we have some idea about their speed, we can calculate their lifetime from the (limited) distance they travel before disintegrating. The thing to remember is that for time also, we have to make do with time pixels instead of time points, so there is a Δt as well. [In case you wonder what baryons are: they are particles consisting of three quarks, and the proton and the neutron are the most prominent (and most stable) representatives of this family of particles.]
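That ‘lifetime from the distance they travel’ calculation is simple enough to sketch (the numbers below are illustrative, not from any actual measurement): the proper lifetime is the track length divided by the particle’s speed, corrected for relativistic time dilation.

```python
import math

c = 2.99792458e8  # speed of light, m/s

def lifetime_from_track(track_length_m, beta):
    """Proper lifetime from a measured track length: tau = L / (gamma * v),
    with beta = v/c; gamma accounts for relativistic time dilation."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return track_length_m / (gamma * beta * c)

# Illustrative numbers: a 2.1 mm track at beta = 0.99 implies tau of about 1 ps
print(lifetime_from_track(2.1e-3, 0.99))  # ≈ 1e-12 s
```

The same logic, with femtometer-scale distances, takes you down to those 10–22 to 10–24 s lifetimes of the resonances.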

So what’s the size of an electron? Well… It depends. We need to distinguish two very different things: (1) the size of the area where we are likely to find the electron, and (2) the size of the electron itself. Let’s start with the latter, because that’s the easiest question to answer: there is a so-called classical electron radius re, which is also known as the Thomson scattering length, which has been calculated as:

re = (1/4πε0)·(e²/mec²) = 2.8179403267(27)×10–15 m

As for the constants in this formula, you know these by now: the speed of light c, the electron charge e, its mass me, and the permittivity of free space ε0. For whatever it’s worth (because you should note that, in quantum mechanics, electrons do not have a size: they are treated as point-like particles, so they have a point charge and zero dimension), that’s small. It’s in the femtometer range (1 fm = 1×10–15 m). You may or may not remember that the size of a proton is in the femtometer range as well – 1.7 fm to be precise – and we had a femtometer size estimate for quarks as well: 0.7 fm. So we have the rather remarkable result that the much heavier proton (its rest mass is 938 MeV/c² as opposed to only 0.511 MeV/c² for the electron, so the proton is about 1836 times heavier) is 1.65 times smaller than the electron. That’s something to be explored later: for the moment, we’ll just assume the electron wiggles around a bit more – exactly because it’s lighter. Here you just have to note that this ‘classical’ electron radius does measure something: it’s something ‘hard’ and ‘real’ because it scatters, absorbs or deflects photons (and/or other particles). In one of my previous posts, I explained how particle accelerators probe things at the femtometer scale, so I’ll refer you to that post (End of the Road to Reality?) and move on to the next question.

The question concerning the area where we are likely to detect the electron is more interesting in light of the topic of this post (the nature of these matter waves). It is given by that wave function and, from my previous post, you’ll remember that we’re talking the nanometer scale here (1 nm = 1×10–9 m), so that’s a million times larger than the femtometer scale. Indeed, we’ve calculated a de Broglie wavelength of 0.33 nanometer for relatively slow-moving electrons (electrons in orbit), and the slits used in single- or double-slit experiments with electrons are also nanotechnology. In fact, now that we are here, it’s probably good to look at those experiments in detail.

The illustration below relates the actual experimental set-up of a double-slit experiment performed in 2012 to Feynman’s 1965 thought experiment. Indeed, in 1965, the nanotechnology you need for this kind of experiment was not yet available, although the phenomenon of electron diffraction had already been confirmed experimentally in 1927 in the famous Davisson-Germer experiment. [It’s famous not only because electron diffraction was a weird thing to contemplate at the time but also because it confirmed the de Broglie hypothesis only a few years after Louis de Broglie had advanced it!] But so here is the experiment which Feynman thought would never be possible because of technology constraints:

Electron double-slit set-up

The inset in the upper-left corner shows the two slits: they are each 50 nanometer wide (50×10–9 m) and 4 micrometer tall (4×10–6 m). [The thing in the middle of the slits is just a little support. Please do take a few seconds to contemplate the technology behind this feat: 50 nm is 50 millionths of a millimeter. Try to imagine dividing one millimeter in ten, and then one of these tenths in ten again, and again, and once again, again, and again. You just can’t imagine that, because our mind is used to addition/subtraction and – to some extent – to multiplication/division: our mind can’t really deal with exponentiation – because it’s not an everyday phenomenon.] The second inset (in the upper-right corner) shows the mask that can be moved to close one or both slits partially or completely.

Now, 50 nanometer is 150 times larger than the 0.33 nanometer range we got for ‘our’ electron, but it’s small enough to show diffraction and/or interference. [In fact, in this experiment (done by Bach, Pope, Liou and Batelaan from the University of Nebraska-Lincoln less than two years ago indeed), the beam consisted of electrons with an (average) energy of 600 eV and a de Broglie wavelength of 50 picometer. So that’s like the electrons used in electron microscopes. 50 pm is 6.6 times smaller than the 0.33 nm wavelength we calculated for our low-energy (70 eV) electron – but then the energy and the fact these electrons are guided in electromagnetic fields explain the difference.] Let’s go to the results.
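You can easily verify that 50 picometer figure yourself from the de Broglie relation λ = h/p, with p = √(2mE) for a non-relativistic electron. A quick sketch (my own check, not from the paper):

```python
import math

h = 6.62607015e-34    # Planck's constant, J·s
m_e = 9.1093837e-31   # electron mass, kg
eV = 1.602176634e-19  # joule per electronvolt

def de_broglie(energy_eV):
    """Non-relativistic de Broglie wavelength: lambda = h / sqrt(2 m E)."""
    p = math.sqrt(2 * m_e * energy_eV * eV)  # momentum, kg·m/s
    return h / p

print(de_broglie(600))  # ≈ 5.0e-11 m, i.e. about 50 pm, matching the experiment
```

So a 600 eV electron does indeed come out at about 50 pm.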

The illustration below shows the predicted pattern next to the observed pattern for the two scenarios:

  1. We first close slit 2, let a lot of electrons go through slit 1, and so we get a pattern described by the probability density function P1 = │Φ1│². Here we see no interference but a typical diffraction pattern: the intensity follows a more or less normal (i.e. Gaussian) distribution. We then close slit 1 (and open slit 2 again), again let a lot of electrons through, and get a pattern described by the probability density function P2 = │Φ2│². So that’s how we get P1 and P2.
  2. We then open both slits, let a whole bunch of electrons through, and get the pattern described by the probability density function P12 = │Φ12│² = │Φ1 + Φ2│², which we get not from adding the probabilities P1 and P2 (hence, P12 ≠ P1 + P2) – as one would expect if electrons would behave like particles – but from adding the probability amplitudes. We have interference, rather than diffraction.

Predicted interference effect

But so what exactly is interfering? Well… The electrons. But that can’t be, can it?

The electrons are obviously particles, as evidenced from the impact they make – one by one – as they hit the screen as shown below. [If you want to know what screen, let me quote the researchers: “The resulting patterns were magnified by an electrostatic quadrupole lens and imaged on a two-dimensional microchannel plate and phosphorus screen, then recorded with a charge-coupled device camera. […] To study the build-up of the diffraction pattern, each electron was localized using a “blob” detection scheme: each detection was replaced by a blob, whose size represents the error in the localization of the detection scheme. The blobs were compiled together to form the electron diffraction patterns.” So there you go.]

Electron blobs

Look carefully at how this interference pattern becomes ‘reality’ as the electrons hit the screen one by one. And then say it: WAW ! 

Indeed, as predicted by Feynman (and any other physics professor at the time), even if the electrons go through the slits one by one, they will interfere – with themselves so to speak. [In case you wonder if these electrons really went through one by one, let me quote the researchers once again: “The electron source’s intensity was reduced so that the electron detection rate in the pattern was about 1 Hz. At this rate and kinetic energy, the average distance between consecutive electrons was 2.3×10⁶ meters. This ensures that only one electron is present in the 1 meter long system at any one time, thus eliminating electron-electron interactions.” You don’t need to be a scientist or engineer to understand that, do you?]

While this is very spooky, I have not seen any better way to describe the reality of the de Broglie wave: the particle is not some point-like thing but a matter wave, as evidenced from the fact that it does interfere with itself when forced to move through two slits – or through one slit, as evidenced by the diffraction patterns built up in this experiment when closing one of the two slits: the electrons went through one by one as well!

But so how does it relate to the characteristics of that wave packet which I described in my previous post? Let me sum up the salient conclusions from that discussion:

  1. The wavelength λ of a wave packet is calculated directly from the momentum by using de Broglie‘s second relation: λ = h/p. In this case, the wavelength of the electrons averaged 50 picometer. That’s relatively small as compared to the width of the slit (50 nm) – a thousand times smaller actually! – but, as evidenced by the experiment, it’s small enough to show the ‘reality’ of the de Broglie wave.
  2. From a math point of view (but, of course, Nature does not care about our math), we can decompose the wave packet in a finite or infinite number of component waves. Such decomposition is referred to, in the first case (a finite number of component waves, i.e. discrete calculus), as a Fourier analysis, or, in the second case, as a Fourier transform. A Fourier transform maps our (continuous) wave function, Ψ(x), to a (continuous) wave function in the momentum space, which we noted as φ(p). [In fact, we noted it as Φ(p) but I don’t want to create confusion with the Φ symbol used in the experiment, which is actually the wave function in space, so Ψ(x) is Φ(x) in the experiment – if you know what I mean.] The point to note is that uncertainty about momentum is related to uncertainty about position. In this case, we’ll have pretty standard electrons (so not much variation in momentum), and so the location of the wave packet in space should be fairly precise as well.
  3. The group velocity of the wave packet (vg) – i.e. the envelope in which our Ψ wave oscillates – equals the speed of our electron (v), but the phase velocity (i.e. the speed of our Ψ wave itself) is superluminal: we showed it’s equal to vp = E/p = c²/v = c/β, with β = v/c, i.e. the ratio of the speed of our electron and the speed of light. Hence, the phase velocity will always be superluminal but will approach c as the speed of our particle approaches c. For slow-moving particles, we get astonishing values for the phase velocity, like more than a hundred times the speed of light for the electron we looked at in our previous post. That’s weird but it does not contradict relativity: if it helps, one can think of the wave packet as a modulation of an incredibly fast-moving ‘carrier wave’.
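Point 3 is easy to check with a few lines of code (my own sketch – the electron speed of 0.5% of c is just an illustrative value, not a number from this post): the group velocity equals the particle speed, while the phase velocity c²/v shoots past the speed of light.

```python
import math

c = 2.99792458e8  # speed of light, m/s

def phase_and_group_velocity(v):
    """For a de Broglie wave: group velocity = particle speed v,
    phase velocity = E/p = c^2/v (superluminal for v < c)."""
    beta = v / c
    return v, c**2 / v, 1.0 / beta  # (v_g, v_p, v_p in units of c)

# Illustrative slow electron: v = 0.5% of c
v_g, v_p, v_p_over_c = phase_and_group_velocity(0.005 * c)
print(v_p_over_c)  # 200, i.e. a phase velocity of 200 times c
```

Note that v_p·v_g = c² always: the slower the particle, the faster its ‘carrier wave’.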

Is any of this relevant? Does it help you to imagine what the electron actually is? Or what that matter wave actually is? Probably not. You will still wonder: What does it look like? What is it in reality?

That’s hard to say. If the experiment above does not convey any ‘reality’ according to you, then perhaps the illustration below will help. It’s one I have used in another post too (An Easy Piece: Introducing Quantum Mechanics and the Wave Function). I took it from Wikipedia, and it represents “the (likely) space in which a single electron on the 5d atomic orbital of an atom would be found.” The solid body shows the places where the electron’s probability density (so that’s the squared modulus of the probability amplitude) is above a certain value – so it’s basically the area where the likelihood of finding the electron is higher than elsewhere. The hue on the colored surface shows the complex phase of the wave function.


So… Does this help? 

You will wonder why the shape is so complicated (but it’s beautiful, isn’t it?) but that has to do with quantum-mechanical calculations involving quantum-mechanical quantities such as spin and other machinery which I don’t master (yet). I think there’s always a bit of a gap between ‘first principles’ in physics and the ‘model’ of a real-life situation (like a real-life electron in this case), but it’s surely the case in quantum mechanics! That being said, when looking at the illustration above, you should be aware of the fact that you are actually looking at a 3D representation of the wave function of an electron in orbit. 

Indeed, wave functions of electrons in orbit are somewhat less random than – let’s say – the wave function of one of those baryon resonances I mentioned above. As mentioned in my Not So Easy Piece, in which I introduced the Schrödinger equation (i.e. one of my previous posts), they are solutions of a second-order partial differential equation – known as the Schrödinger wave equation indeed – which basically incorporates one key condition: these solutions – which are (atomic or molecular) ‘orbitals’ indeed – have to correspond to so-called stationary states or standing waves. Now what’s the ‘reality’ of that? 

The illustration below comes from Wikipedia once again (Wikipedia is an incredible resource for autodidacts like me indeed) and so you can check the article (on stationary states) for more details if needed. Let me just summarize the basics:

  1. A stationary state is called stationary because the system remains in the same ‘state’ independent of time. That does not mean the wave function is stationary. On the contrary, the wave function changes as a function of both time and space – Ψ = Ψ(x, t), remember? – but it represents a so-called standing wave.
  2. Each of these possible states corresponds to an energy state, which is given through the de Broglie relation: E = hf. So the energy of the state is proportional to the oscillation frequency of the (standing) wave, and Planck’s constant is the factor of proportionality. From a formal point of view, that’s actually the one and only condition we impose on the ‘system’, and so it immediately yields the so-called time-independent Schrödinger equation, which I briefly explained in the above-mentioned Not So Easy Piece (but I will not write it down here because it would only confuse you even more). Just look at these so-called harmonic oscillators below:


A and B represent a harmonic oscillator in classical mechanics: a ball with some mass m (mass is a measure for inertia, remember?) on a spring oscillating back and forth. In case you’d wonder what the difference is between the two: both the amplitude and the frequency of the movement are different. 🙂 A spring and a ball?

It represents a simple system. A harmonic oscillation is basically a resonance phenomenon: springs, electric circuits,… anything that swings, moves or oscillates (including large-scale things such as bridges and what have you – Feynman even discusses resonance phenomena in the atmosphere in his Lectures (Vol. I-23)) has some natural frequency ω0, also referred to as the resonance frequency, at which it oscillates naturally indeed: that means it requires (relatively) little energy to keep it going. How much energy it takes exactly to keep them going depends on the frictional forces involved: because the springs in A and B keep going, there’s obviously no friction involved at all. [In physics, we say there is no damping.] However, both springs do have a different k (that’s the key characteristic of a spring in Hooke’s Law, which describes how springs work), and the mass m of the ball might be different as well. Now, one can show that the period of this ‘natural’ movement will be equal to t0 = 2π/ω0 = 2π·(m/k)^1/2 or, equivalently, that ω0 = (k/m)^1/2. So we’ve got an A and a B situation which differ in k and m. Let’s go to the so-called quantum oscillator, illustrations C to H.
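Before we do, the two classical relations above are easy to check numerically. Here’s a minimal sketch in Python (the k and m values are arbitrary, just for illustration):

```python
import math

def natural_frequency(k, m):
    """Natural (resonance) angular frequency of a mass-spring system: omega_0 = sqrt(k/m)."""
    return math.sqrt(k / m)

def period(k, m):
    """Period of the natural oscillation: T = 2*pi/omega_0 = 2*pi*sqrt(m/k)."""
    return 2 * math.pi * math.sqrt(m / k)

# Two hypothetical oscillators, like A and B above, differing in k (and possibly m):
omega_A = natural_frequency(k=1.0, m=1.0)
omega_B = natural_frequency(k=4.0, m=1.0)   # a stiffer spring oscillates faster
print(omega_B / omega_A)                    # 2.0
print(round(period(1.0, 1.0), 3))           # 6.283, i.e. 2*pi
```

A four times stiffer spring doubles the frequency: that’s the square root at work.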

C to H in the illustration are six possible solutions to the Schrödinger Equation for this situation. The horizontal axis is position (time runs through the animation) – but we could switch the two independent variables easily: as I said a number of times already, time and space are interchangeable in the argument representing the phase (θ) of a wave provided we use the right units (e.g. light-seconds for distance and seconds for time): θ = ωt – kx. Apart from the nice animation, the other great thing about these illustrations – and the main difference with resonance frequencies in the classical world – is that they show both the real part (blue) as well as the imaginary part (red) of the wave function as a function of space (plotted along the horizontal axis) and time (the animation).

Is this ‘real’ enough? If it isn’t, I know of no way to make it any more ‘real’. Indeed, that’s key to understanding the nature of matter waves: we have to come to terms with the idea that these strange fluctuating mathematical quantities actually represent something. What? Well… The spooky thing that leads to the above-mentioned experimental results: electron diffraction and interference. 

Let’s explore this quantum oscillator some more. Another key difference between natural frequencies in atomic physics (so the atomic scale) and resonance phenomena in ‘the big world’ is that there is more than one possibility: each of the six possible states above corresponds to a solution and an energy state indeed, which is given through the de Broglie relation: E = hf. However, in order to be fully complete, I have to mention that, while G and H are also solutions to the wave equation, they are actually not stationary states. The illustration below – which I took from the same Wikipedia article on stationary states – shows why. For stationary states, all observable properties of the state (such as the probability that the particle is at location x) are constant. For non-stationary states, the probabilities themselves fluctuate as a function of time (and space, obviously), so the observable properties of the system are not constant. These solutions are solutions to the time-dependent Schrödinger equation and, hence, they are, obviously, time-dependent solutions.

StationaryStatesAnimation

We can find these time-dependent solutions by superimposing two stationary states, so we have a new wave function ΨN which is the sum of two others: ΨN = Ψ1 + Ψ2. [If you include the normalization factor (as you should, to make sure all probabilities add up to 1), it’s actually ΨN = 2^(–1/2)·(Ψ1 + Ψ2).] So G and H above still represent a state of a quantum harmonic oscillator (with a specific energy level proportional to h), but they are not standing waves.
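To see the difference between a stationary state and such a superposition numerically, here is a small sketch (in Python, with toy spatial functions and made-up energy levels E1 and E2 – the point is only the e^(–iEt/ħ) phase factors): for a stationary state, the probability density at a fixed point does not change in time; for the superposition ΨN, it does.

```python
import numpy as np

hbar = 1.0
E1, E2 = 1.0, 2.0                              # hypothetical energy levels

def psi1(x): return np.exp(-x**2 / 2)          # toy spatial parts (not normalized)
def psi2(x): return x * np.exp(-x**2 / 2)

def Psi(x, t, psi, E):
    """Stationary state: the spatial part times the e^{-iEt/hbar} phase factor."""
    return psi(x) * np.exp(-1j * E * t / hbar)

x = 0.7
# Stationary state: the probability density does not change in time...
p0 = abs(Psi(x, 0.0, psi1, E1))**2
p1 = abs(Psi(x, 1.0, psi1, E1))**2
# ...but for Psi_N = (Psi1 + Psi2)/sqrt(2) it oscillates at frequency (E2 - E1)/hbar
q0 = abs((Psi(x, 0.0, psi1, E1) + Psi(x, 0.0, psi2, E2)) / np.sqrt(2))**2
q1 = abs((Psi(x, 1.0, psi1, E1) + Psi(x, 1.0, psi2, E2)) / np.sqrt(2))**2
print(bool(np.isclose(p0, p1)), bool(np.isclose(q0, q1)))   # True False
```

The cross term between the two phase factors is what makes the density of the superposition ‘breathe’ in time – that’s exactly what the G and H animations show.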

Let’s go back to our electron traveling in a more or less straight path. What’s the shape of the solution for that one? It could be anything. Well… Almost anything. As said, the only condition we can impose is that the envelope of the wave packet – its ‘general’ shape so to say – should not change. That is because we should not have dispersion – as illustrated below. [Note that this illustration only represents the real or the imaginary part – not both – but you get the idea.]


That being said, if we exclude dispersion (because a real-life electron traveling in a straight line doesn’t just disappear – as do dispersive wave packets), then, inside of that envelope, the weirdest things are possible – in theory that is. Indeed, Nature does not care much about our Fourier transforms. So the example below, which shows a theoretical wave packet (again, the real or imaginary part only) based on some theoretical distribution of the wave numbers of the (infinite number) of component waves that make up the wave packet, may or may not represent our real-life electron. However, if our electron has any resemblance to real-life, then I would expect it to not be as well-behaved as the theoretical one that’s shown below.

example of wave packet

The shape above is usually referred to as a Gaussian wave packet, because of the nice normal (Gaussian) probability density functions that are associated with it. But we can also imagine a ‘square’ wave packet: a somewhat weird shape but – in terms of the math involved – as consistent as the smooth Gaussian wave packet, in the sense that we can demonstrate that the wave packet is made up of an infinite number of waves with an angular frequency ω that is linearly related to their wave number k, so the dispersion relation is ω = ak + b. [Remember we need to impose that condition to ensure that our wave packet will not dissipate (or disperse or disappear – whatever term you prefer).] That’s shown below: a Fourier analysis of a square wave.

Square wave packet
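A minimal numerical illustration of that Fourier synthesis (a sketch in Python, not tied to any particular physical units): summing the odd harmonics sin(nx)/n with the right weights does indeed reproduce the flat top of the square wave.

```python
import numpy as np

def square_wave_partial_sum(x, n_terms):
    """Fourier synthesis of a square wave: (4/pi) * sum of sin(n*x)/n over odd n."""
    s = np.zeros_like(x, dtype=float)
    for n in range(1, 2 * n_terms, 2):       # odd harmonics 1, 3, 5, ...
        s += (4 / np.pi) * np.sin(n * x) / n
    return s

# On (0, pi) the series converges to +1 (away from the jumps, where Gibbs ringing lives)
x = np.linspace(0.2, np.pi - 0.2, 200)
approx = square_wave_partial_sum(x, 1000)
print(bool(np.max(np.abs(approx - 1.0)) < 0.02))   # True
```

Near the jumps the partial sums keep overshooting (the Gibbs phenomenon), which is why the check stays away from the edges.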

While we can construct many theoretical shapes of wave packets that respect the ‘no dispersion!’ condition, we cannot know which one will actually represent that electron we’re trying to visualize. Worse, if push comes to shove, we don’t know if these matter waves (so these wave packets) actually consist of component waves (or time-independent stationary states or whatever).

[…] OK. Let me finally admit it: while I am trying to explain to you the ‘reality’ of these matter waves, we actually don’t know how real these matter waves actually are. We cannot ‘see’ or ‘touch’ them indeed. All that we know is that (i) assuming their existence, and (ii) assuming these matter waves are more or less well-behaved (e.g. that actual particles will be represented by a composite wave characterized by a linear dispersion relation between the angular frequencies and the wave numbers of its (theoretical) component waves) allows us to do all that arithmetic with these (complex-valued) probability amplitudes. More importantly, all that arithmetic with these complex numbers actually yields (real-valued) probabilities that are consistent with the probabilities we obtain through repeated experiments. So that’s what’s real and ‘not so real’, I’d say.

Indeed, the bottom-line is that we do not know what goes on inside that envelope. Worse, according to the commonly accepted Copenhagen interpretation of the Uncertainty Principle (and tons of experiments have been done to try to overthrow that interpretation – all to no avail), we never will.

A not so easy piece: introducing the wave equation (and the Schrödinger equation)

The title above refers to a previous post: An Easy Piece: Introducing the wave function.

Indeed, I may have been sloppy here and there – I hope not – and so that’s why it’s probably good to clarify that the wave function (usually represented as Ψ – the psi function) and the wave equation (Schrödinger’s equation, for example – but there are other types of wave equations as well) are two related but different concepts: wave equations are differential equations, and wave functions are their solutions.

Indeed, from a mathematical point of view, a differential equation (such as a wave equation) relates a function (such as a wave function) with its derivatives, and its solution is that function or – more generally – the set (or family) of functions that satisfies this equation. 

The function can be real-valued or complex-valued, and it can be a function involving only one variable (such as y = y(x), for example) or more (such as u = u(x, t) for example). In the first case, it’s a so-called ordinary differential equation. In the second case, the equation is referred to as a partial differential equation, even if there’s nothing ‘partial’ about it: it’s as ‘complete’ as an ordinary differential equation (the name just refers to the presence of partial derivatives in the equation). Hence, in an ordinary differential equation, we will have terms involving dy/dx and/or d²y/dx², i.e. the first and second derivative of y respectively (and/or higher-order derivatives, depending on the degree of the differential equation), while in partial differential equations, we will see terms involving ∂u/∂t and/or ∂²u/∂x² (and/or higher-order partial derivatives), with ∂ replacing d as a symbol for the derivative.

The independent variables could also be complex-valued but, in physics, they will usually be real variables (or scalars, as real numbers are also referred to – as opposed to vectors, which are nothing but two-, three- or more-dimensional numbers really). In physics, the independent variables will usually be x – or let’s use r = (x, y, z) for a change, i.e. the three-dimensional space vector – and the time variable t. An example is that wave function which we introduced in our ‘easy piece’.

Ψ(r, t) = A·e^(i(p·r – E·t)/ħ)

[If you read the Easy Piece, then you might object that this is not quite what I wrote there, and you are right: I wrote Ψ(r, t) = A·e^(i((p/ħ)·r – ωt)). However, here I am just introducing the other de Broglie relation (i.e. the one relating energy and frequency): E = hf = ħω and, hence, ω = E/ħ. Just re-arrange a bit and you’ll see it’s the same.]
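If you want to convince yourself that the two notations really are the same, here is a small Python sketch (the momentum and energy values are arbitrary, just to have numbers to plug in): with ω = E/ħ substituted, the two functional forms coincide.

```python
import numpy as np

hbar = 1.0545718e-34      # reduced Planck constant, J*s

def Psi_pE_form(x, t, p, E, A=1.0):
    """A*exp(i*(p*x - E*t)/hbar): the form used in this post (one spatial dimension)."""
    return A * np.exp(1j * (p * x - E * t) / hbar)

def Psi_k_omega_form(x, t, p, omega, A=1.0):
    """A*exp(i*((p/hbar)*x - omega*t)): the form used in the Easy Piece."""
    return A * np.exp(1j * ((p / hbar) * x - omega * t))

# De Broglie: E = hbar*omega, i.e. omega = E/hbar; with that, the two forms coincide
p, E = 1e-27, 1e-20       # hypothetical momentum and energy values
x, t = 1e-9, 1e-15
print(bool(np.isclose(Psi_pE_form(x, t, p, E),
                      Psi_k_omega_form(x, t, p, E / hbar))))   # True
```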

From a physics point of view, a differential equation represents a system subject to constraints, such as the energy conservation law (the sum of the potential and kinetic energy remains constant), and Newton’s law of course: F = d(mv)/dt. A differential equation will usually also be given with one or more initial conditions, such as the value of the function at point t = 0, i.e. the initial value of the function. To use Wikipedia’s definition: “Differential equations arise whenever a relation involving some continuously varying quantities (modeled by functions) and their rates of change in space and/or time (expressed as derivatives) is known or postulated.”

That sounds a bit more complicated, perhaps, but it means the same: once you have a good mathematical model of a physical problem, you will often end up with a differential equation representing the system you’re looking at, and then you can do all kinds of things, such as analyzing whether or not the actual system is in an equilibrium and, if not, whether it will tend to equilibrium or, if not, what the equilibrium conditions would be. But here I’ll refer to my previous posts on the topic of differential equations, because I don’t want to get into these details – as I don’t need them here.

The one thing I do need to introduce is an operator referred to as the gradient (it’s also known as the del operator, but I don’t like that word because it does not convey what it is). The gradient – denoted by ∇ – is a shorthand for the partial derivatives of our function u or Ψ with respect to space, so we write:

∇ = (∂/∂x, ∂/∂y, ∂/∂z)

You should note that, in physics, we apply the gradient only to the spatial variables, not to time. For the derivative in regard to time, we just write ∂u/∂t or ∂Ψ/∂t.

Of course, an operator means nothing until you apply it to a (real- or complex-valued) function, such as our u(x, t) or our Ψ(r, t):

∇u = ∂u/∂x (our u has just one spatial variable) and ∇Ψ = (∂Ψ/∂x, ∂Ψ/∂y, ∂Ψ/∂z)

As you can see, the gradient operator returns a vector with three components if we apply it to a real- or complex-valued function of r, and so we can do all kinds of funny things with it combining it with the scalar or vector product, or with both. Here I need to remind you that, in a vector space, we can multiply vectors using either (i) the scalar product, aka the dot product (because of the dot in its notation: a•b) or (ii) the vector product, aka the cross product (yes, because of the cross in its notation: a×b).
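As a small aside, we don’t have to do these derivatives by hand to get a feel for them. The sketch below (Python, with a made-up function u = x² + y·z) computes the gradient numerically and checks it against the exact partial derivatives (2x, z and y):

```python
import numpy as np

# Sample a function u(x, y, z) = x^2 + y*z on a grid and apply the gradient numerically
x = np.linspace(-1.0, 1.0, 41)
X, Y, Z = np.meshgrid(x, x, x, indexing="ij")
u = X**2 + Y * Z

# np.gradient returns one array per spatial axis: (du/dx, du/dy, du/dz)
du_dx, du_dy, du_dz = np.gradient(u, x, x, x)

# Central differences are exact for polynomials of degree <= 2, so only the grid
# edges along x (where one-sided differences are used on x^2) need excluding:
print(bool(np.allclose(du_dx[1:-1], 2 * X[1:-1])),
      bool(np.allclose(du_dy, Z)),
      bool(np.allclose(du_dz, Y)))          # True True True
```

The three returned arrays together are exactly the vector (∂u/∂x, ∂u/∂y, ∂u/∂z) evaluated at every grid point.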

So we can define a whole range of new operators using the gradient and these two products, such as the divergence and the curl of a vector field. For example, if E is the electric field vector (I am using an italic bold-type E so you should not confuse E with the energy E, which is a scalar quantity), then div E = ∇•E, and curl E = ∇×E. Taking the divergence of a vector will yield some number (so that’s a scalar), while taking the curl will yield another vector. 

I am mentioning these operators because you will often see them. A famous example is the set of equations known as Maxwell’s equations, which integrate all of the laws of electromagnetism and from which we can derive the electromagnetic wave equation:

(1) ∇•E = ρ/ε0 (Gauss’ law)

(2) ∇×E = –∂B/∂t (Faraday’s law)

(3) ∇•B = 0

(4) c²∇×B = j/ε0 + ∂E/∂t

I should not explain these but let me just remind you of the essentials:

  1. The first equation (Gauss’ law) can be derived from the equations for Coulomb’s law and the force acting upon a charge q in an electromagnetic field: F = q(E + v×B), with B the magnetic field vector (F is also referred to as the Lorentz force: it’s the combined force on a charged particle caused by the electric and magnetic fields), v the velocity of the (moving) charge, ρ the charge density (so charge is thought of as being distributed in space, rather than being packed into points, and that’s OK because our scale is not the quantum-mechanical one here) and, finally, ε0 the electric constant (some 8.854×10⁻¹² farad per meter).
  2. The second equation (Faraday’s law) gives the electric field associated with a changing magnetic field.
  3. The third equation basically states that there is no such thing as a magnetic charge: there are only electric charges.
  4. Finally, in the last equation, we have a vector j representing the current density: indeed, remember that magnetism only appears when (electric) charges are moving, so if there’s an electric current. As for the equation itself, well… That’s a more complicated story, so I will leave that for the post scriptum.

We can do many more things: we can also take the curl of the gradient of some scalar, or the divergence of the curl of some vector (both have the interesting property that they are zero), and there are many more possible combinations – some of them useful, others not so useful. However, this is not the place to introduce differential calculus of vector fields (because that’s what it is).

The only other thing I need to mention here is what happens when we apply this gradient operator twice. Then we have a new operator ∇•∇ = ∇², which is referred to as the Laplacian. In fact, when we say ‘apply ∇ twice’, we are actually doing a dot product. Indeed, ∇ returns a vector, and so we are going to multiply this vector once again with a vector using the dot product rule: a•b = ∑aᵢbᵢ (so we multiply the individual vector components and then add them). In the case of our functions u and Ψ, we get:

∇•(∇u) = ∇•∇u = (∇•∇)u = ∇²u = ∂²u/∂x²

∇•(∇Ψ) = ∇²Ψ = ∂²Ψ/∂x² + ∂²Ψ/∂y² + ∂²Ψ/∂z²
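Again, a quick numerical check may help to see the Laplacian is nothing mysterious. The sketch below (Python, with a toy Gaussian standing in for a wave function) approximates the three second derivatives by central finite differences and compares their sum with the known analytic result for this particular function, ∇²Ψ = (r² – 3)·Ψ:

```python
import numpy as np

h = 1e-3   # finite-difference step

def psi(x, y, z):
    return np.exp(-(x**2 + y**2 + z**2) / 2)   # toy real-valued wave function

def laplacian(f, x, y, z, h=h):
    """del^2 f = d2f/dx2 + d2f/dy2 + d2f/dz2, by central finite differences."""
    return ((f(x+h, y, z) - 2*f(x, y, z) + f(x-h, y, z))
          + (f(x, y+h, z) - 2*f(x, y, z) + f(x, y-h, z))
          + (f(x, y, z+h) - 2*f(x, y, z) + f(x, y, z-h))) / h**2

x0, y0, z0 = 0.3, -0.5, 0.8
numeric = laplacian(psi, x0, y0, z0)
exact = (x0**2 + y0**2 + z0**2 - 3) * psi(x0, y0, z0)   # analytic result for this Gaussian
print(bool(abs(numeric - exact) < 1e-5))    # True
```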

Now, you may wonder what it means to take the derivative (or partial derivative) of a complex-valued function (which is what we are doing in the case of Ψ) but don’t worry about that: a complex-valued function of one or more real variables, such as our Ψ(x, t), can be decomposed as Ψ(x, t) = ΨRe(x, t) + iΨIm(x, t), with ΨRe and ΨIm two real-valued functions representing the real and imaginary part of Ψ(x, t) respectively. In addition, the rules for differentiating complex-valued functions are, to a large extent, the same as for real-valued functions. For example, if z is a complex number, then de^z/dz = e^z and, hence, using this and other very straightforward rules, we can indeed find the partial derivatives of a function such as Ψ(r, t) = A·e^(i(p·r – E·t)/ħ) with respect to all the (real-valued) variables in the argument.

The electromagnetic wave equation  

OK. That’s enough math now. We are ready now to look at – and to understand – a real wave equation – I mean one that actually represents something in physics. Let’s take Maxwell’s equations as a start. To make it easy – and also to ensure that you have easy access to the full derivation – we’ll take the so-called Heaviside form of these equations:

Heaviside form of Maxwell's equations

This Heaviside form assumes a charge-free vacuum space, so there are no external forces acting upon our electromagnetic wave. There are also no other complications such as electric currents. Also, the c² (i.e. the square of the speed of light) is written here as c² = 1/(με), with μ and ε the so-called permeability (μ) and permittivity (ε) respectively (c0, μ0 and ε0 are the values in a vacuum space: indeed, light travels slower elsewhere (e.g. in glass) – if at all).

Now, these four equations can be replaced by just two, and it’s these two equations that are referred to as the electromagnetic wave equation(s):

electromagnetic wave equation

The derivation is not difficult. In fact, it’s much easier than the derivation for the Schrödinger equation which I will present in a moment. But, even if it is very short, I will just refer to Wikipedia in case you would be interested in the details (see the article on the electromagnetic wave equation). The point here is just to illustrate what is being done with these wave equations and why – not so much how. Indeed, you may wonder what we have gained with this ‘reduction’.

The answer to this very legitimate question is easy: the two equations above are second-order partial differential equations which are relatively easy to solve. In other words, we can find a general solution, i.e. a set or family of functions that satisfy the equation and, hence, can represent the wave itself. Why a set of functions? If it’s a specific wave, then there should only be one wave function, right? Right. But to narrow our general solution down to a specific solution, we will need extra information – so-called initial conditions, boundary conditions or, in general, constraints. [And if these constraints are not sufficiently specific, then we may still end up with a whole bunch of possibilities, even if they narrow down the choice.]

Let’s give an example by re-writing the above wave equation using our function u(x, t) – so, to simplify the analysis, we’re looking at a plane wave traveling in one dimension only:

Wave equation for u

There are many functional forms for u that satisfy this equation. One of them is the following:

general solution for wave equation

This resembles the one I introduced when presenting the de Broglie equations, except that – this time around – we are talking about a real electromagnetic wave, not some probability amplitude. Another difference is that we allow a composite wave with two components: one traveling in the positive x-direction, and one traveling in the negative x-direction. Now, if you read the post in which I introduced the de Broglie wave, you will remember that these A·e^(i(kx–ωt)) or B·e^(–i(kx+ωt)) waves give strange probabilities. However, because we are not looking at some probability amplitude here – it’s not a de Broglie wave but a real wave, so we use complex-number notation only because it’s convenient but, in practice, we’re only considering the real part – this functional form is quite OK.
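We can verify that claim numerically. The sketch below (Python, with arbitrary values for k and the amplitudes) takes the real part of such a two-component wave, approximates ∂²u/∂t² and ∂²u/∂x² by finite differences, and checks that the wave equation ∂²u/∂t² = c²·∂²u/∂x² is indeed satisfied:

```python
import numpy as np

c = 1.0
k = 2.0
omega = c * k          # the dispersion relation for the EM wave equation
A, B = 1.0, 0.5        # arbitrary amplitudes for the two components

def u(x, t):
    """Two counter-propagating components; we keep the real part (it's a real wave)."""
    return (A * np.exp(1j * (k * x - omega * t))
          + B * np.exp(-1j * (k * x + omega * t))).real

h = 1e-4
x0, t0 = 0.37, 1.2
u_tt = (u(x0, t0 + h) - 2 * u(x0, t0) + u(x0, t0 - h)) / h**2   # d2u/dt2
u_xx = (u(x0 + h, t0) - 2 * u(x0, t0) + u(x0 - h, t0)) / h**2   # d2u/dx2
print(bool(abs(u_tt - c**2 * u_xx) < 1e-4))    # True
```

The essential ingredient is ω = c·k: with any other relation between ω and k, the check would fail.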

That being said, the following functional form, representing a wave packet (aka a wave train), is also a solution (or, better, a set of solutions):

Wave packet equation

Huh? Well… Yes. If you really can’t follow here, I can only refer you to my post on Fourier analysis and Fourier transforms: I cannot reproduce that one here because that would make this post totally unreadable. We have a wave packet here, and so that’s the sum of an infinite number of component waves that interfere constructively in the region of the envelope (so that’s the location of the packet) and destructively outside. The integral is just the continuum limit of a summation of n such waves. So this integral will yield a function u with x and t as independent variables… If we know A(k) that is. Now that’s the beauty of these Fourier integrals (because that’s what this integral is). 

Indeed, in my post on Fourier transforms I also explained how these amplitudes A(k) in the equation above can be expressed as a function of u(x, t) through the inverse Fourier transform. In fact, I actually presented the Fourier transform pair Ψ(x) and Φ(p) in that post, but the logic is the same – except that we’re inserting the time variable t once again (but with its value fixed at t = 0):

Fourier transform

OK, you’ll say, but where is all of this going? Be patient. We’re almost done. Let’s now introduce a specific initial condition. Let’s assume that we have the following functional form for u at time t = 0:

u at time 0

You’ll wonder where this comes from. Well… I don’t know. It’s just an example from Wikipedia. It’s random but it fits the bill: it’s a localized wave (so that’s a wave packet) because of the very particular form of the exponent (–x² + ik0x: the real part gives the localized envelope, the imaginary part the oscillation). The point to note is that we can calculate A(k) when inserting this initial condition in the equation above, and then – finally, you’ll say – we also get a specific solution for our u(x, t) function by inserting the value for A(k) in our general solution. In short, we get:



u final form

As mentioned above, we are actually only interested in the real part of this equation (so that’s the e with the exponent factor (note there is no i in it, so it’s just some real number) multiplied with the cosine term).

However, the example above shows how easy it is to extend the analysis to a complex-valued wave function, i.e. a wave function describing a probability amplitude. We will actually do that now for Schrödinger’s equation. [Note that the example comes from Wikipedia’s article on wave packets, and so there is a nice animation which shows how this wave packet (be it the real or imaginary part of it) travels through space. Do watch it!]

Schrödinger’s equation

Let me just write it down:

Schrodinger's equation

That’s it. This is the Schrödinger equation – in a somewhat simplified form but it’s OK.

[…] You’ll find that equation above either very simple or, else, very difficult, depending on whether you understood most of what I wrote above it, or nothing at all. If you understood something, then it should be fairly simple, because it hardly differs from the other wave equation.

Indeed, we have that imaginary unit (i) in front of the left term, but then you should not panic over that: when everything is said and done, we are working here with the derivative (or partial derivative) of a complex-valued function, and so it should not surprise us that we have an i here and there. It’s nothing special. In fact, we had them in the equation above too, but they just weren’t explicit. The second difference with the electromagnetic wave equation is that we have a first-order derivative with respect to time only (in the electromagnetic wave equation we had ∂²u/∂t², so that’s a second-order derivative). Finally, we have a –1/2 factor in front of the right-hand term, instead of c². OK, so what? It’s a different thing – but that should not surprise us: when everything is said and done, it is a different wave equation because it describes something else (not an electromagnetic wave but a quantum-mechanical system).

To understand why it’s different, I’d need to give you the equivalent of Maxwell’s set of equations for quantum mechanics, and then show how this wave equation is derived from them. I could do that. The derivation is somewhat lengthier than for our electromagnetic wave equation but not all that much. The problem is that it involves some new concepts which we haven’t introduced as yet – mainly some new operators. But then we have introduced a lot of new operators already (such as the gradient and the curl and the divergence) so you might be ready for this. Well… Maybe. The treatment is a bit lengthy, and so I’d rather do it in a separate post. Why? […] OK. Let me say a few things about it then. Here we go:

  • These new operators involve matrix algebra. Fine, you’ll say. Let’s get on with it. Well… It’s matrix algebra with matrices with complex elements, so if we write an n×m matrix A as A = (aij), then the elements aij (i = 1, 2,…, n and j = 1, 2,…, m) will be complex numbers.
  • That allows us to define Hermitian matrices: a Hermitian matrix is a square matrix A which is the same as the complex conjugate of its transpose.
  • We can use such matrices as operators indeed: transformations acting on a column vector X to produce another column vector AX.
  • Now, you’ll remember – from your course on matrix algebra with real (as opposed to complex) matrices, I hope – that we have this very particular matrix equation AX = λX which has non-trivial solutions (i.e. solutions X ≠ 0) if and only if the determinant of A – λI is equal to zero. This condition (det(A – λI) = 0) is referred to as the characteristic equation.
  • This characteristic equation is a polynomial of degree n in λ and its roots are called eigenvalues or characteristic values of the matrix A. The non-trivial solutions X ≠ 0 corresponding to each eigenvalue are called eigenvectors or characteristic vectors.
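The bullet points above are easy to illustrate with a small numerical example (Python; the matrix itself is made up for the occasion): a 2×2 Hermitian matrix, its characteristic equation, and its (real) eigenvalues and eigenvectors.

```python
import numpy as np

# A 2x2 matrix with complex elements, equal to the complex conjugate of its transpose:
A = np.array([[2.0, 1 - 1j],
              [1 + 1j, 3.0]])
print(bool(np.allclose(A, A.conj().T)))      # True: A is Hermitian

# Solve A X = lambda X. The characteristic equation det(A - lambda*I) = 0 works out
# to lambda^2 - 5*lambda + 4 = 0, with roots (eigenvalues) 1 and 4 - both real,
# as they must be for a Hermitian matrix:
eigenvalues, eigenvectors = np.linalg.eigh(A)
print(np.round(eigenvalues, 6))              # [1. 4.]

# Each eigenvector X satisfies A X = lambda X:
X = eigenvectors[:, 0]
print(bool(np.allclose(A @ X, eigenvalues[0] * X)))   # True
```

That real-valuedness of the eigenvalues is exactly why Hermitian operators are used for observable quantities such as energy.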

Now – just in case you’re still with me – it’s quite simple: in quantum mechanics, we have the so-called Hamiltonian operator. The Hamiltonian in classical mechanics represents the total energy of the system: H = T + V (total energy H = kinetic energy T + potential energy V). Here we have got something similar but different. 🙂 The Hamiltonian operator is written as H-hat, i.e. an H with an accent circonflexe (as they say in French). Now, we need to let this Hamiltonian operator act on the wave function Ψ and if the result is proportional to the same wave function Ψ, then Ψ is a so-called stationary state, and the proportionality constant will be equal to the energy E of the state Ψ. These stationary states correspond to standing waves, or ‘orbitals’, such as in atomic orbitals or molecular orbitals. So we have:

E\Psi=\hat H \Psi

I am sure you are no longer there but, in fact, that’s it. We’re done with the derivation. The equation above is the so-called time-independent Schrödinger equation. It’s called that not because the wave function is time-independent (it is), but because the Hamiltonian operator is time-independent: that obviously makes sense because stationary states are associated with specific energy levels indeed. However, if we do allow the energy level to vary in time (which we should do – if only because of the uncertainty principle: there is no such thing as a fixed energy level in quantum mechanics), then we cannot use some constant for E, but we need a so-called energy operator. Fortunately, this energy operator has a remarkably simple functional form:

\hat{E} \Psi = i\hbar\dfrac{\partial}{\partial t}\Psi = E\Psi  Now if we plug that in the equation above, we get our time-dependent Schrödinger equation  

i \hbar \frac{\partial}{\partial t}\Psi = \hat H \Psi
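Just to make this a bit less abstract, here is a sketch of what ‘solving HΨ = EΨ’ amounts to in practice. It is my own toy example, not part of the derivation above: I simply discretize the Hamiltonian of a harmonic oscillator (my choice of potential, with ħ = m = ω = 1) on a grid, so the operator becomes a matrix and the stationary states become its eigenvectors. The eigenvalues then come out close to the well-known energy ladder En = (n + 1/2)·ħω.

```python
import numpy as np

# Discretize H = -(1/2) d^2/dx^2 + V(x) on a grid (hbar = m = 1), with the
# harmonic potential V(x) = x^2/2, and solve H psi = E psi as a matrix problem.
N, L = 1000, 20.0
x = np.linspace(-L/2, L/2, N)
dx = x[1] - x[0]

# Second derivative by central differences, written as an N x N matrix
D2 = (np.diag(np.full(N, -2.0))
    + np.diag(np.ones(N - 1), 1)
    + np.diag(np.ones(N - 1), -1)) / dx**2

H = -0.5 * D2 + np.diag(0.5 * x**2)
E = np.linalg.eigvalsh(H)        # the eigenvalues are the allowed energy levels

# The lowest levels come out close to the exact ladder E_n = n + 1/2:
print(bool(np.allclose(E[:4], [0.5, 1.5, 2.5, 3.5], atol=1e-3)))    # True
```

So ‘quantization’ here is nothing but the discrete spectrum of a (Hermitian) matrix eigenvalue problem.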

OK. You probably did not understand one iota of this but, even then, you will object that this does not resemble the equation I wrote at the very beginning: i·(∂u/∂t) = –(1/2)·∇²u.

You’re right, but we only need one more step for that. If we leave out potential energy (so we assume a particle moving in free space), then the Hamiltonian can be written as:

\hat{H} = -\frac{\hbar^2}{2m}\nabla^2

You’ll ask me how this is done but I will be short on that: the relationship between energy and momentum is being used here (and so that’s where the 2m factor in the denominator comes from). However, I won’t say more about it because this post would become way too lengthy if I would include each and every derivation and, remember, I just want to get to the result because the derivations here are not the point: I want you to understand the functional form of the wave equation only. So, using the above identity and, OK, let’s be somewhat more complete and include potential energy once again, we can write the time-dependent wave equation as:

 i\hbar\frac{\partial}{\partial t}\Psi(\mathbf{r},t) = -\frac{\hbar^2}{2m}\nabla^2\Psi(\mathbf{r},t) + V(\mathbf{r},t)\Psi(\mathbf{r},t)

Now, how is the equation above related to i·(∂u/∂t) = –(1/2)·∇²u? It’s a very simplified version of it: potential energy is, once again, assumed to be not relevant (so we’re talking about a free particle again, with no external forces acting on it) but the real simplification is that we give m and ħ the value 1, so m = ħ = 1. Why?

Well… My initial idea was to do something similar as I did above and, hence, actually use a specific example with an actual functional form, just like we did for the real-valued u(x, t) function. However, when I look at how long this post has become already, I realize I should not do that. In fact, I would just copy an example from somewhere else – probably Wikipedia once again, if only because their examples are usually nicely illustrated with graphs (and often animated graphs). So let me just refer you here to the other example given in the Wikipedia article on wave packets: that example uses that simplified i·(∂u/∂t) = –(1/2)·∇²u equation indeed. It actually uses the same initial condition:

u at time 0

However, because the wave equation is different, the wave packet behaves differently. It’s a so-called dispersive wave packet: it delocalizes. Its width increases over time and so, after a while, it just vanishes because it diffuses all over space. So there’s a solution to the wave equation, given this initial condition, but it’s just not stable – as a description of some particle that is (from a mathematical point of view – or even a physical point of view – there is no issue).
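That dispersion is easy to reproduce numerically. The sketch below (Python; the grid size and the times are arbitrary choices) evolves the same e^(–x² + ik0x) initial condition under the free-particle equation i·(∂u/∂t) = –(1/2)·∂²u/∂x², using the fact that in Fourier space each e^(ikx) component just picks up a phase e^(–ik²t/2), and checks that the width of the packet indeed increases over time:

```python
import numpy as np

# Free-particle Schroedinger equation i du/dt = -(1/2) d2u/dx2 (m = hbar = 1),
# evolved in Fourier space: mode k just picks up the phase factor e^{-i k^2 t / 2}.
N, L = 2048, 80.0
x = np.linspace(-L/2, L/2, N, endpoint=False)
k = 2 * np.pi * np.fft.fftfreq(N, d=L/N)   # angular wave numbers of the grid

k0 = 2.0
u0 = np.exp(-x**2 + 1j * k0 * x)           # the Gaussian initial condition from the text

def width(u):
    """R.m.s. width of the probability density |u|^2."""
    p = np.abs(u)**2
    p /= p.sum()
    mean = (p * x).sum()
    return np.sqrt((p * (x - mean)**2).sum())

def evolve(u, t):
    """Exact free-particle evolution on the periodic grid, via FFT."""
    return np.fft.ifft(np.fft.fft(u) * np.exp(-1j * k**2 * t / 2))

print(bool(width(u0) < width(evolve(u0, 1.0)) < width(evolve(u0, 2.0))))   # True: it spreads
```

The different components travel at different speeds (ω = k²/2 is not linear in k), and that’s exactly why the packet delocalizes.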

In any case, this probably all sounds like Chinese – or Greek if you understand Chinese :-). I actually haven’t worked with these Hermitian operators yet, and so it’s pretty shaky territory for me. However, I felt like I had picked up enough math and physics on this long and winding Road to Reality (I don’t think I am even halfway) to give it a try. I hope I succeeded in passing on the message, which I’ll summarize as follows:

  1. Schrödinger’s equation is just like any other differential equation used in physics, in the sense that it represents a system subject to constraints, such as the relationship between energy and momentum.
  2. It will have many general solutions. In other words, the wave function – which describes a probability amplitude as a function in space and time – will have many general solutions, and a specific solution will depend on the initial conditions.
  3. The solution(s) can represent stationary states, but not necessarily so: a wave (or a wave packet) can be non-dispersive or dispersive. However, when we plug the wave function into the wave equation, it will satisfy that equation.

That’s neither spectacular nor difficult, is it? But, perhaps, it helps you to ‘understand’ wave equations, including the Schrödinger equation. But what is understanding? Dirac once famously said: “I consider that I understand an equation when I can predict the properties of its solutions, without actually solving it.”

Hmm… I am not quite there yet, but I am sure some more practice with it will help. 🙂

Post scriptum: On Maxwell’s equations

First, we should say something more about these two other operators which I introduced above: the divergence and the curl. Let’s start with the divergence.

The divergence of a field vector E (or B) at some point r represents the so-called flux of E, i.e. the ‘flow’ of E per unit volume. So flux and divergence both deal with the ‘flow’ of electric field lines away from (positive) charges. [The ‘away from’ is from positive charges indeed – as per the convention: Maxwell himself used the term ‘convergence’ to describe flow towards negative charges, but so his ‘convention’ did not survive. Too bad, because I think convergence would be much easier to remember.]

So if we write that ∇•E = ρ/ε0, then it means that we have some constant flux of E because of some (fixed) distribution of charges.
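A little symbolic check may help here (a sketch of my own, using Python’s sympy, with all physical constants set to one): away from the charge itself, ρ = 0 and the divergence of the Coulomb field indeed vanishes, so all of the flux comes from the charge:

```python
import sympy as sp

x, y, z = sp.symbols('x y z', real=True)
r = sp.sqrt(x**2 + y**2 + z**2)

# Coulomb field of a unit point charge at the origin (constants set to 1): E = r_vec / r^3
Ex, Ey, Ez = x / r**3, y / r**3, z / r**3

div_E = sp.simplify(sp.diff(Ex, x) + sp.diff(Ey, y) + sp.diff(Ez, z))
print(div_E)   # 0: no flux 'source' anywhere except at the charge itself
```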

Now, we already mentioned that equation (2) in Maxwell’s set meant that there is no such thing as a ‘magnetic’ charge: indeed, ∇•B = 0 means there is no magnetic flux. But, of course, magnetic fields do exist, don’t they? They do. A current in a wire, for example, i.e. a bunch of steadily moving electric charges, will induce a magnetic field according to Ampère’s law, which is part of equation (4) in Maxwell’s set: c²∇×B = j/ε0, with j representing the current density and ε0 the electric constant.

Now, at this point, we have this curl: ∇×B. Just like divergence (or convergence as Maxwell called it – but then with the sign reversed), curl also means something in physics: it’s the amount of ‘rotation’, or ‘circulation’ as Feynman calls it, around some loop.

So, to summarize the above, we have (1) flux (divergence) and (2) circulation (curl) and, of course, the two must be related. And, while we do not have any magnetic charges and, hence, no flux for B, the current in that wire will cause some circulation of B, and so we do have a magnetic field. However, that magnetic field will be static, i.e. it will not change. Hence, the time derivative ∂B/∂t will be zero and, hence, from equation (3) – Faraday’s law, ∇×E = −∂B/∂t – we get that ∇×E = 0, so our electric field will be static too. The time derivative ∂E/∂t which appears in equation (4) also disappears and we just have c²∇×B = j/ε0. This situation – of a constant electric and magnetic field – is described as electrostatics and magnetostatics respectively. It implies a neat separation of the four equations, and it makes magnetism and electricity appear as distinct phenomena. Indeed, as long as charges and currents are static, we have:
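Here, too, a quick symbolic sketch may help (Python’s sympy again, with constants dropped – my own illustration): the magnetic field circling a wire along the z-axis has zero curl everywhere except where the current actually flows, while inside a region of uniform current density the curl is a constant vector along the wire:

```python
import sympy as sp

x, y, z = sp.symbols('x y z', real=True)

def curl(F):
    """Curl of a vector field F = (Fx, Fy, Fz) in Cartesian coordinates."""
    Fx, Fy, Fz = F
    return (sp.diff(Fz, y) - sp.diff(Fy, z),
            sp.diff(Fx, z) - sp.diff(Fz, x),
            sp.diff(Fy, x) - sp.diff(Fx, y))

# outside a wire along the z-axis: B ~ (-y, x, 0) / (x^2 + y^2)
curl_out = [sp.simplify(c) for c in curl((-y / (x**2 + y**2), x / (x**2 + y**2), 0))]
print(curl_out)   # [0, 0, 0]: no circulation 'source' where there is no current

# inside a uniform current density: B ~ (-y, x, 0)
curl_in = [sp.simplify(c) for c in curl((-y, x, 0))]
print(curl_in)    # [0, 0, 2]: constant curl along the direction of the current
```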

[I] Electrostatics: (1) ∇•E = ρ/εand (2) ∇×E = 0

[II] Magnetostatics: (3) c2∇×B =  jand (4) ∇•B = 0

The first two equations describe a vector field with zero curl and a given divergence (i.e. the electric field) while the third and fourth equations describe a seemingly separate vector field with a given curl but zero divergence (i.e. the magnetic field). Now, I am not writing this post scriptum to reproduce Feynman’s Lectures on Electromagnetism, and so I won’t say much more about this. I just want to note two points:

1. The first point to note is that factor c² in the c²∇×B = j/ε0 equation. That’s something which you don’t have in the ∇•E = ρ/ε0 equation. Of course, you’ll say: so what? Well… It’s weird. And if you bring it to the other side of the equation, it becomes clear that you need an awful lot of current for a tiny little bit of magnetic circulation (because you’re dividing by c², so that’s a factor 9 with 16 zeroes after it (9×10¹⁶): an awfully big number in other words). Truth be said, it reveals something very deep. Hmm? Take a wild guess. […] Relativity perhaps? Well… Yes!

It’s obvious that we buried v somewhere in this equation, the velocity of the moving charges. But then it’s part of j of course: the rate at which charge flows through a unit area. But – Hey! – velocity as compared to what? What’s the frame of reference? The frame of reference is us obviously or – somewhat less subjectively – the stationary charges determining the electric field according to equation (1) in the set above: ∇•E = ρ/ε0. But so here we can ask the same question: stationary in what reference frame? As compared to the moving charges? Hmm… But so how does it work with relativity? I won’t copy Feynman’s 13th Lecture here but, in that lecture, he analyzes what happens to the electric and magnetic force when we look at the scene from another coordinate system – let’s say one that moves parallel to the wire at the same speed as the moving electrons, so – because of our new reference frame – the ‘moving electrons’ now appear to have no speed at all but, of course, our stationary charges will now seem to move.

What Feynman finds – and his calculations are very easy and straightforward – is that, while we will obviously insert different input values into Maxwell’s set of equations and, hence, get different values for the E and B fields, the actual physical effect – i.e. the final Lorentz force on a (charged) particle – will be the same. To be very specific, in a coordinate system at rest with respect to the wire (so we see charges move in the wire), we find a ‘magnetic’ force indeed, but in a coordinate system moving at the same speed as those charges, we will find an ‘electric’ force only. And from yet another reference frame, we will find a mixture of E and B fields. However, the physical result is the same: there is only one combined force in the end – the Lorentz force F = q(E + v×B) – and it’s always the same, regardless of the reference frame (at rest or moving at whatever constant speed – relativistic (i.e. close to c) or not).

In other words, Maxwell’s description of electromagnetism is invariant or, to say exactly the same in yet other words, electricity and magnetism taken together are consistent with relativity: they are part of one physical phenomenon: the electromagnetic interaction between (charged) particles. So electric and magnetic fields appear in different ‘mixtures’ if we change our frame of reference, and so that’s why magnetism is often described as a ‘relativistic’ effect – although that’s not very accurate. However, it does explain that c² factor in the equation for the curl of B. [How exactly? Well… If you’re smart enough to ask that kind of question, you will be smart enough to find the derivation on the Web. :-)]

Note: Don’t think we’re talking astronomical speeds here when comparing the two reference frames. It would also work for astronomical speeds but, in this case, we are talking about the speed of the electrons moving through a wire. Now, the so-called drift velocity of electrons – which is the one we have to use here – in a copper wire of radius 1 mm carrying a steady current of 3 Amps is only about 1 m per hour! So the relativistic effect is tiny – but still measurable!
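For what it’s worth, the arithmetic is easy to check. The sketch below uses v = I/(n·q·A), with a textbook value for copper’s free-electron density (n ≈ 8.5×10²⁸ per m³ – my assumption, not a number from this post): the result comes out at a fraction of a metre per hour, so that order of magnitude indeed.

```python
import math

I = 3.0        # current (A)
radius = 1e-3  # wire radius (m)
n = 8.5e28     # free-electron density of copper (1/m^3), a textbook value
q = 1.602e-19  # elementary charge (C)

A = math.pi * radius**2   # cross-sectional area of the wire (m^2)
v = I / (n * q * A)       # drift velocity (m/s)
print(f"{v:.2e} m/s = {v * 3600:.2f} m per hour")
```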

2. The second thing I want to note is that Maxwell’s set of equations with non-zero time derivatives for E and B clearly shows that it’s changing electric and magnetic fields that sort of create each other, and it’s this that’s behind electromagnetic waves moving through space without losing energy. They just travel on and on. The math behind this is beautiful (and the animations in the related Wikipedia articles are equally beautiful – and probably easier to understand than the equations), but that’s stuff for another post. As the electric field changes, it induces a magnetic field, which then induces a new electric field, etc., allowing the wave to propagate itself through space. I should also note here that the energy is in the field and so, when electromagnetic waves, such as light, or radio waves, travel through space, they carry their energy with them.
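That mutual creation is easy to see in a toy simulation. The sketch below (my own, in normalized units with c = 1, using the standard staggered-grid update from finite-difference time-domain methods) starts with a pulse in E only; the coupled updates then generate B, and the initial pulse splits into two pulses that travel off in opposite directions instead of just dying out:

```python
import numpy as np

C = 0.5              # Courant number (c*dt/dx), below 1 for stability
N, STEPS = 400, 150
Ez = np.zeros(N)     # electric field, sampled at integer grid points
Hy = np.zeros(N)     # magnetic field, sampled halfway between them

i = np.arange(N)
Ez[:] = np.exp(-((i - 200) / 10.0) ** 2)   # initial Gaussian pulse in E, no B yet

for _ in range(STEPS):
    Hy[:-1] += C * (Ez[1:] - Ez[:-1])   # dB/dt driven by the spatial change of E
    Ez[1:] += C * (Hy[1:] - Hy[:-1])    # dE/dt driven by the spatial change of B

# the pulse has left its starting point and split into two travelling pulses
print(abs(Ez[200]), Ez.max())
```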

Let me be fully complete here, and note that there’s energy in electrostatic fields as well, and the formula for it is remarkably beautiful. The total (electrostatic) energy U in an electrostatic field generated by charges located within some finite distance is equal to:

Energy of electrostatic field

This equation introduces the electrostatic potential. This is a scalar field Φ from which we can derive the electric field vector just by applying the gradient operator (with a minus sign, as per the convention: E = −∇Φ). In fact, all curl-free fields (such as the electric field in this case) can be written as the gradient of some scalar field. That’s a universal truth. See how beautiful math is? 🙂
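To illustrate (a sympy sketch of my own once more, with constants set to one, and using the usual sign convention E = −∇Φ): the potential Φ = 1/r of a unit point charge gives back the Coulomb field, and the curl of that gradient field is identically zero, as it has to be:

```python
import sympy as sp

x, y, z = sp.symbols('x y z', real=True)
r = sp.sqrt(x**2 + y**2 + z**2)
Phi = 1 / r   # electrostatic potential of a unit point charge (constants set to 1)

# E = -grad(Phi): this gives back the Coulomb field (x, y, z)/r^3
E_field = [-sp.diff(Phi, s) for s in (x, y, z)]
coulomb_check = [sp.simplify(c - s / r**3) for c, s in zip(E_field, (x, y, z))]
print(coulomb_check)   # [0, 0, 0]: E is the Coulomb field indeed

# and a gradient field is always curl-free
curl_E = [sp.simplify(sp.diff(E_field[2], y) - sp.diff(E_field[1], z)),
          sp.simplify(sp.diff(E_field[0], z) - sp.diff(E_field[2], x)),
          sp.simplify(sp.diff(E_field[1], x) - sp.diff(E_field[0], y))]
print(curl_E)          # [0, 0, 0]: zero curl, as expected
```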