The Strange Theory of Light and Matter (III)

This is my third and final post on Feynman’s popular little booklet: The Strange Theory of Light and Matter, also known as Feynman’s Lectures on Quantum Electrodynamics (QED).

The origin of this short lecture series is quite moving: the death of Alix G. Mautner, a good friend of Feynman’s. She was always curious about physics but her career was in English literature and so she did not master the math. Hence, Feynman introduces this 1985 publication by writing: “Here are the lectures I really prepared for Alix, but unfortunately I can’t tell them to her directly, now.”

Alix Mautner died from a brain tumor, and it is her husband, Leonard Mautner, who sponsored the QED lecture series at UCLA, which Ralph Leighton transcribed and published as the booklet that we’re talking about here. Feynman himself died a few years later, at the relatively young age of 69. Tragic coincidence: he died of cancer too. Despite all this weirdness, Feynman’s QED never quite got the same iconic status as, let’s say, Stephen Hawking’s Brief History of Time. I wonder why, but the answer to that question is probably in the realm of chaos theory. 🙂 I actually just saw the movie on Stephen Hawking’s life (The Theory of Everything), and I noted another strange coincidence: Jane Wilde, Hawking’s first wife, also has a PhD in literature. It strikes me that, while the movie documents that Jane Wilde gave Hawking three children, after which he divorced her to marry his nurse, Elaine, the movie does not mention that he separated from Elaine too, and that he has some kind of ‘working relationship’ with Jane again.

Hmm… What to say? I should get back to quantum mechanics here or, to be precise, to quantum electrodynamics.

One reason why Feynman’s Strange Theory of Light and Matter did not sell like Hawking’s Brief History of Time might well be that, in some places, the text is not entirely accurate. Why? Who knows? It would make for an interesting PhD thesis in History of Science. Unfortunately, I have no time for such a PhD thesis. Hence, I must assume that Richard Feynman simply didn’t have much time or energy left to correct some of the writing of Ralph Leighton, who transcribed and edited these four short lectures a few years before Feynman’s death. Indeed, when everything is said and done, Ralph Leighton is not a physicist and, hence, I think he did compromise – just a little bit – on accuracy for the sake of readability. Ralph Leighton’s father, Robert Leighton, an eminent physicist who worked with Feynman, would probably have done a much better job.

I feel that one should not compromise on accuracy, even when trying to write something reader-friendly. That’s why I am writing this blog, and why I am writing three posts specifically on this little booklet. Indeed, while I’d warmly recommend that little book on QED as an excellent non-mathematical introduction to the weird world of quantum mechanics, I’d also say that, while Ralph Leighton’s story is great, it’s also, in some places, not entirely accurate indeed.

So… Well… I want to do better than Ralph Leighton here. Nothing more. Nothing less. 🙂 Let’s go for it.

I. Probability amplitudes: what are they?

The greatest achievement of that little QED publication is that it manages to avoid any reference to wave functions and other complicated mathematical constructs: all of the complexity of quantum mechanics is reduced to three basic events or actions and, hence, three basic amplitudes which are represented as ‘arrows’—literally.

Now… Well… You may or may not know that a (probability) amplitude is actually a complex number, but it’s not so easy to intuitively understand the concept of a complex number. In contrast, everyone easily ‘gets’ the concept of an ‘arrow’. Hence, from a pedagogical point of view, representing complex numbers by some ‘arrow’ is truly a stroke of genius.

Whatever we call it, a complex number or an ‘arrow’, a probability amplitude is something with (a) a magnitude and (b) a phase. As such, it resembles a vector, but it’s not quite the same, if only because we’ll impose some restrictions on the magnitude. But I shouldn’t get ahead of myself. Let’s start with the basics.

A magnitude is a real, non-negative number, like a length, but you should not associate it with some spatial dimension in physical space: it’s just a number. As for the phase, we could associate that concept with some direction but, again, you should just think of it as a direction in a mathematical space, not in the real (physical) space.

Let me insert a parenthesis here. If I say the ‘real’ or ‘physical’ space, I mean the space in which the electrons and photons and all other real-life objects that we’re looking at exist and move. That’s a non-mathematical definition. In fact, in math, the real space is defined as a coordinate space, with sets of real numbers (vectors) as coordinates, so… Well… That’s a mathematical space only, not the ‘real’ (physical) space. So the real (vector) space is not real. 🙂 The mathematical real space may, or may not, accurately describe the real (physical) space. Indeed, you may have heard that physical space is curved because of the presence of massive objects, which means that the real coordinate space will actually not describe it very accurately. I know that’s a bit confusing but I hope you understand what I mean: if mathematicians talk about the real space, they do not mean the real space. They refer to a vector space, i.e. a mathematical construct. To avoid confusion, I’ll use the term ‘physical space’ rather than ‘real’ space in the future. So I’ll let the mathematicians get away with using the term ‘real space’ for something that isn’t real actually. 🙂

End of digression. Let’s discuss these two mathematical concepts – magnitude and phase – somewhat more in detail.

A. The magnitude

Let’s start with the magnitude or ‘length’ of our arrow. We know that we have to square these lengths to find some probability, i.e. some real number between 0 and 1. Hence, the length of our arrows cannot be larger than one. That’s the restriction I mentioned already, and this ‘normalization’ condition reinforces the point that these ‘arrows’ do not have any spatial dimension (not in any real space anyway): they represent the values of a function. To be specific, they represent the values of a wavefunction.

If we were talking about complex numbers instead of ‘arrows’, we’d say the absolute value of the complex number cannot be larger than one. We’d also say that, to find the probability, we should take the absolute square of the complex number, so that’s the square of the magnitude or absolute value of the complex number indeed. We cannot just square the complex number: it has to be the square of the absolute value.

Why? Well… Just write it out. [You can skip this section if you’re not interested in complex numbers, but I would recommend you try to understand it. It’s not that difficult. Indeed, if you’re reading this, you most likely understand something of complex numbers and, hence, you should be able to work your way through it. Just remember that a complex number is like a two-dimensional number, which is why it’s sometimes written in bold-face rather than in regular font. However, I should immediately add that this convention is usually not followed.] The square of a complex number z = a + bi is equal to z² = a² + 2abi – b², while the square of its absolute value (i.e. the absolute square) is |z|² = [√(a² + b²)]² = a² + b². So you can immediately see that the square and the absolute square of a complex number are two very different things indeed: it’s not only the 2abi term, but there’s also the minus sign in the first expression, because of the i² = –1 factor. In case of doubt, always remember that the square of a complex number may actually yield a negative number, as evidenced by the definition of the imaginary unit itself: i² = –1.
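For readers who like to check such things, here is a minimal Python sketch of that difference. Note that Python writes the imaginary unit as j (so 3 + 4i becomes 3 + 4j); that j is just Python syntax and has nothing to do with Feynman’s junction number j further below. The function name is a hypothetical helper introduced only for this illustration.

```python
def square_and_absolute_square(a: float, b: float) -> tuple:
    """Return (z squared, absolute square of z) for z = a + bi."""
    z = complex(a, b)
    square = z * z                               # a² – b² + 2abi: a complex number
    absolute_square = (z * z.conjugate()).real   # a² + b²: always real and >= 0
    return square, absolute_square


sq, abs_sq = square_and_absolute_square(3.0, 4.0)
print(sq)        # (-7+24j): a² – b² = 9 – 16 = –7, and 2ab = 24
print(abs_sq)    # 25.0
print(1j * 1j)   # (-1+0j): i² = –1
```

Using z times its complex conjugate, rather than abs(z) squared, avoids the detour through a square root and gives exactly a² + b².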

End of digression. Feynman and Leighton manage to avoid any reference to complex numbers in that short series of four lectures and, hence, all they need to do is explain how one squares a length. Kids learn how to do that when making a square out of a rectangular sheet of paper: they’ll fold one corner of the paper until it meets the opposite edge, forming a triangle first. They’ll then cut or tear off the extra paper, and then unfold. Done. [I could note that the folding is a 90 degree rotation of the original length (or width, I should say) which, in mathematical terms, is equivalent to multiplying that length by the imaginary unit (i). But I am sure the kids involved would think I am crazy if I’d say this. 🙂] So let me get back to Feynman’s arrows.

B. The phase

Feynman and Leighton’s second pedagogical stroke of genius is the metaphor of the ‘stopwatch’ and the ‘stopwatch hand’ for the variable phase. Indeed, although I think it’s worth explaining why z = a + bi = r·cosφ + i·r·sinφ in the illustration below can be written as z = r·e^(iφ) = |z|·e^(iφ), understanding Euler’s representation of a complex number as a complex exponential requires swallowing a very substantial piece of math and, if you’d want to do that, I’ll refer you to one of my posts on complex numbers.
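Without swallowing all of the math, you can at least check Euler’s representation numerically. The sketch below, in Python, uses the standard cmath module; the arrow length of 0.5 and the 60-degree phase are arbitrary example values.

```python
import cmath
import math

r = 0.5                    # example arrow length (magnitude)
phi = math.radians(60)     # example phase (stopwatch hand angle), in radians

cartesian = complex(r * math.cos(phi), r * math.sin(phi))   # r·cosφ + i·r·sinφ
exponential = r * cmath.exp(1j * phi)                       # r·e^(iφ)

# Both forms describe the same arrow (up to floating-point rounding)
print(cmath.isclose(cartesian, exponential))   # True

# cmath.polar recovers the magnitude |z| and the phase φ from a + bi
magnitude, phase = cmath.polar(cartesian)
print(magnitude, math.degrees(phase))          # approximately 0.5 and 60.0
```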


The metaphor of the stopwatch represents a periodic function. To be precise, it represents a sinusoid, i.e. a smooth repetitive oscillation. Now, the stopwatch hand represents the phase of that function, i.e. the φ angle in the illustration above. That angle is a function of time: the speed with which the stopwatch turns is related to some frequency, i.e. the number of oscillations per unit of time (i.e. per second).

You should now wonder: what frequency? What oscillations are we talking about here? Well… As we’re talking photons and electrons here, we should distinguish the two:

  1. For photons, the frequency is given by Planck’s energy-frequency relation, which relates the energy (E) of a photon (1.5 to 3.5 eV for visible light) to its frequency (ν). It’s a simple proportional relation, with Planck’s constant (h) as the proportionality constant: E = hν, or ν = E/h.
  2. For electrons, we have the de Broglie relation, which looks similar to the Planck relation (E = hf, or f = E/h) but, as you know, it’s something different. Indeed, these so-called matter waves are not so easy to interpret because there actually is no precise frequency f. In fact, the matter wave representing some particle in space will consist of a potentially infinite number of waves, all superimposed one over another, as illustrated below.


For the sake of accuracy, I should mention that the animation above has its limitations: the wavetrain is complex-valued and, hence, has a real as well as an imaginary part, so it’s something like the blob underneath. Two functions in one, so to speak: the imaginary part follows the real part with a phase difference of 90 degrees (or π/2 radians). Indeed, if the wavefunction is a regular complex exponential r·e^(iφ) = r·cos(φ) + i·r·sin(φ), then its imaginary part is r·sin(φ) = r·cos(φ – π/2), which proves the point: we have two functions in one here. 🙂 I am actually just repeating what I said before already: the probability amplitude, or the wavefunction, is a complex number. You’ll usually see it written as Ψ (psi) or Φ (phi). Here also, using boldface would usefully remind the reader that we’re talking about something ‘two-dimensional’ (in mathematical space, that is), but this convention is usually not followed.
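A quick numerical sanity check of that phase relation, as a Python sketch: for a few arbitrary phase values it confirms that the imaginary part r·sin(φ) equals the real part shifted by a quarter turn, r·cos(φ – π/2). It also shows why the direction of the shift matters: r·sin(φ – π/2) gives –r·cos(φ), not r·cos(φ). The helper name is hypothetical.

```python
import math


def imaginary_part_lags_real_part(phases, r=1.0, tol=1e-12):
    """Check that r·sin(φ) == r·cos(φ – π/2) for every φ in phases."""
    return all(
        math.isclose(r * math.sin(phi), r * math.cos(phi - math.pi / 2), abs_tol=tol)
        for phi in phases
    )


sample_phases = [0.0, 0.3, 1.2, 2.5, 4.0, 6.0]
print(imaginary_part_lags_real_part(sample_phases))   # True

# Watch the sign: shifting the sine the other way flips the cosine
phi = 1.2
print(math.isclose(math.sin(phi - math.pi / 2), -math.cos(phi)))   # True
```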

[Illustration: photon wave]

In any case… Back to frequencies. The point to note is that, when it comes to analyzing electrons (or any other matter-particle), we’re dealing with a range of frequencies f really (or, what amounts to the same, a range of wavelengths λ) and, hence, we should write Δf = ΔE/h, which is just one of the many expressions of the Uncertainty Principle in quantum mechanics.

Now, that’s just one of the complications. Another difficulty is that matter-particles, such as electrons, have some rest mass, and so that enters the energy equation as well (literally). Last but not least, one should distinguish between the group velocity and the phase velocity of matter waves. As you can imagine, that makes for a very complicated relationship between ‘the’ wavelength and ‘the’ frequency. In fact, what I write above should make it abundantly clear that there’s no such thing as the wavelength, or the frequency: it’s a range really, related to the fundamental uncertainty in quantum physics. I’ll come back to that, and so you shouldn’t worry about it here. Just note that the stopwatch metaphor doesn’t work very well for an electron!

In his memorial lectures for Alix Mautner, Feynman avoids all these complications. Frankly, I think that’s a missed opportunity because I do not think it’s all that incomprehensible. In fact, I write all that follows because I do want you to understand the basics of waves. It’s not difficult. High-school math is enough here. Let’s go for it.

One turn of the stopwatch corresponds to one cycle. One cycle covers 360 degrees or, to use a more natural unit, 2π radians, and a frequency of 1 Hz means one such cycle per second. [Why is the radian a more natural unit? Because it measures an angle by the length of the arc it cuts out of the unit circle, rather than in arbitrary 1/360 cuts of a full circle. Indeed, remember that the circumference of the unit circle is 2π.] So our frequency ν (expressed in cycles per second) corresponds to a so-called angular frequency ω = 2πν. From this formula, it should be obvious that ω is measured in radians per second.

We can also link this formula to the period of the oscillation, T, i.e. the duration of one cycle. T = 1/ν and, hence, ω = 2π/T. It’s all nicely illustrated below. [And, yes, it’s an animation from Wikipedia: nice and simple.]


The easy math above now allows us to formally write the phase of a wavefunction – let’s denote the wavefunction as φ (phi), and the phase as θ (theta) – as a function of time (t) using the angular frequency ω. So we can write: θ = ωt = 2π·ν·t. Now, the wave travels through space, and the two illustrations above (i.e. the one with the super-imposed waves, and the one with the complex wave train) would usually represent a wave shape at some fixed point in time. Hence, the horizontal axis is not t but x. Hence, we can and should write the phase not only as a function of time but also of space. So how do we do that? Well… If the hypothesis is that the wave travels through space at some fixed speed c, then its frequency ν will also determine its wavelength λ. It’s a simple relationship: c = λν (the number of oscillations per second times the length of one wavelength should give you the distance traveled per second, so that’s, effectively, the wave’s speed).

Now that we’ve expressed the frequency in radians per second, we can also express how the phase changes in space, in radians per unit distance. That’s what the wavenumber does: think of it as the spatial frequency of the wave. We denote the wavenumber by k, and write: k = 2π/λ. [Just do a numerical example when you have difficulty following. For example, if you’d assume the wavelength is 5 units of distance (say, 5 meters), which is a typical VHF radio wavelength (ν = (3×10⁸ m/s)/(5 m) = 0.6×10⁸ Hz = 60 MHz), then that would correspond to (2π radians)/(5 m) ≈ 1.2566 radians per meter. Of course, we can also express the wavenumber in oscillations per unit distance. In that case, we’d have to divide k by 2π, because one cycle corresponds to 2π radians. So we get the reciprocal of the wavelength: 1/λ. In our example, 1/λ is, of course, 1/5 = 0.2, so that’s a fifth of a full cycle per meter. You can also think of it as the number of waves (or wavelengths) per meter: if the wavelength is λ, then one can fit 1/λ waves in a meter.]


Now, from the ω = 2πν, c = λν and k = 2π/λ relations, it’s obvious that k = 2π/λ = 2π/(c/ν) = (2πν)/c = ω/c. To sum it all up, frequencies and wavelengths, in time and in space, are all related through the speed of propagation of the wave c. More specifically, they’re related as follows:

c = λν = ω/k
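The VHF example and the c = λν = ω/k relation can be checked with a few lines of Python. This is a minimal sketch: it uses the exact SI value of c rather than the rounded 3×10⁸ m/s used above, and the function name wave_parameters is just an illustrative helper.

```python
import math

C = 299_792_458.0   # speed of light in m/s (exact SI value)


def wave_parameters(wavelength_m, speed=C):
    """Frequency, angular frequency, period and wavenumber of a wave."""
    nu = speed / wavelength_m           # ν, in cycles per second (Hz)
    omega = 2 * math.pi * nu            # ω = 2πν, in radians per second
    period = 1 / nu                     # T = 1/ν, in seconds
    k = 2 * math.pi / wavelength_m      # k = 2π/λ, in radians per meter
    return nu, omega, period, k


nu, omega, period, k = wave_parameters(5.0)   # the 5-meter VHF example
print(f"nu = {nu / 1e6:.1f} MHz")             # nu = 60.0 MHz
print(f"k = {k:.4f} rad/m")                   # k = 1.2566 rad/m
print(f"k/2π = {k / (2 * math.pi):.1f} waves per meter")   # k/2π = 0.2 waves per meter
print(math.isclose(omega / k, C))             # True: c = ω/k
print(math.isclose(2 * math.pi / period, omega))   # True: ω = 2π/T
```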

From that, it’s easy to see that k = ω/c, which we’ll use in a moment. Now, it’s obvious that the periodicity of the wave implies that we can find the same phase by going one oscillation (or a whole number of oscillations) back or forward in time, or in space. In fact, we can also find the same phase by letting both time and space vary. However, if we want to do that, we should note that adding a little time has the same effect on the phase as subtracting a little distance. Hence, to get back to the same phase, we should either (a) go forward in both time and space or, alternatively, (b) go back in both, with Δx = cΔt. In other words, if we want to get the same phase, then time and space sort of substitute for each other. Let me quote Feynman on this: “This is easily seen by considering the mathematical behavior of a(t – r/c). Evidently, if we add a little time Δt, we get the same value for a(t – r/c) as we would have if we had subtracted a little distance: Δr = –cΔt.” The variable a stands for the acceleration of an electric charge here, causing an electromagnetic wave, but the same logic is valid for the phase, with a minor twist though: we’re talking about a nice periodic function here, and so we need to put the angular frequency in front. Hence, the rate of change of the phase with respect to time is measured by the angular frequency ω. In short, we write:

θ = ω(t–x/c) = ωt–kx

Hence, we can re-write the wavefunction, in terms of its phase, as follows:

φ(θ) = φ[θ(x, t)] = φ[ωt–kx]

Note that, if the wave were traveling in the ‘other’ direction (i.e. in the negative x-direction), we’d write φ(θ) = φ[kx+ωt]. Time runs in one direction only, of course, so it’s the sign of the kx term that tells us the direction of travel: for a wave moving in the positive x-direction, the minus sign has to be there because of the logic involved in adding time and subtracting distance. You can work out an example (with a sine or cosine wave, for example) for yourself.
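Here is one way to work out such an example, as a small Python sketch. The angular frequency ω = 2 and the unit wave speed (c = 1) are arbitrary choices made only to keep the numbers simple. The point is that the phase ωt – kx does not change if we add a little time Δt and also move a distance cΔt forward, while the phase kx + ωt of the wave traveling in the other direction stays the same if we move backward instead.

```python
import math

C = 1.0            # wave speed (arbitrary units, assumed for simplicity)
OMEGA = 2.0        # angular frequency ω (arbitrary example value)
K = OMEGA / C      # wavenumber k, from c = ω/k


def phase_forward(x, t):
    """Phase of a wave traveling in the positive x-direction: ωt – kx."""
    return OMEGA * t - K * x


def phase_backward(x, t):
    """Phase of a wave traveling in the negative x-direction: kx + ωt."""
    return K * x + OMEGA * t


x0, t0, dt = 0.7, 1.3, 0.25

# Forward wave: add Δt and move forward by cΔt, and the phase is unchanged
print(math.isclose(phase_forward(x0 + C * dt, t0 + dt), phase_forward(x0, t0)))   # True

# Backward wave: add Δt and move back by cΔt instead
print(math.isclose(phase_backward(x0 - C * dt, t0 + dt), phase_backward(x0, t0)))  # True

# A cosine wave built on that phase shows the same value at both points
print(math.isclose(math.cos(phase_forward(x0 + C * dt, t0 + dt)),
                   math.cos(phase_forward(x0, t0))))                               # True
```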

So what, you’ll say? Well… Nothing. I just hope you agree that all of this isn’t rocket science: it’s just high-school math. But it shows you what that stopwatch really is and, hence, I – but who am I? – would have put at least one or two footnotes on this in a text like Feynman’s QED.

Now, let me make a much longer and more serious digression:

Digression 1: on relativity and spacetime

As you can see from the argument (or phase) of that wave function φ(θ) = φ[θ(x, t)] = φ[ωt–kx] = φ[–k(x–ct)], any wave equation establishes a deep relation between the wave itself (i.e. the ‘thing’ we’re describing) and space and time. In fact, that’s what the whole wave equation is all about! So let me say a few things more about that.

Because you know a thing or two about physics, you may ask: when we’re talking time, whose time are we talking about? Indeed, if we’re talking photons going from A to B, these photons will be traveling at or near the speed of light and, hence, their clock, as seen from our (inertial) frame of reference, doesn’t move. Likewise, according to the photon, our clock seems to be standing still.

Let me put the issue to bed immediately: we’re looking at things from our point of view. Hence, we’re obviously using our clock, not theirs. Having said that, the analysis is actually fully consistent with relativity theory. Why? Well… What do you expect? If it wasn’t, the analysis would obviously not be valid. 🙂 To illustrate that it’s consistent with relativity theory, I can mention, for example, that the (probability) amplitude for a photon to travel from point A to B depends on the spacetime interval, which is invariant. Here, A and B are four-dimensional points in spacetime, involving both spatial as well as time coordinates: A = (xA, yA, zA, tA) and B = (xB, yB, zB, tB). And so the ‘distance’ – as measured through the spacetime interval – is invariant.

Now, having said that, we should draw some attention to the intimate relationship between space and time which, let me remind you, results from the absoluteness of the speed of light. Indeed, one will always measure the speed of light c as being equal to 299,792,458 m/s, always and everywhere. It does not depend on your reference frame (inertial or moving). That’s why the constant c anchors all laws in physics, and why we can write what we write above, i.e. include both distance (x) and time (t) in the wave function φ = φ(x, t) = φ[ωt–kx] = φ[–k(x–ct)]. The k and ω are related through the ω/k = c relationship: the speed of light links the frequency in time (ν = ω/2π = 1/T) with the frequency in space (i.e. the wavenumber or spatial frequency k). There is only one degree of freedom here: the frequency, in space or in time, it doesn’t matter, because ω and k are not independent. [As noted above, the relationship between the frequency in time and in space is not so obvious for electrons, or for matter waves in general: for those matter-waves, we need to distinguish group and phase velocity, and so we don’t have a unique frequency.]

Let me make another small digression within the digression here. Thinking about travel at the speed of light invariably leads to paradoxes. In previous posts, I explained the mechanism of light emission: a photon is emitted – one photon only – when an electron jumps back to its ground state after being excited. Hence, we may imagine a photon as a transient electromagnetic wave, something like what’s pictured below. Now, the decay time of this transient oscillation (τ) is measured in nanoseconds, i.e. billionths of a second (1 ns = 1×10⁻⁹ s): the decay time for sodium light, for example, is some 30 ns only.

[Illustration: decay time]

However, because of the tremendous speed of light, that still makes for a wavetrain that’s some ten meters long, at least (30×10⁻⁹ s times 3×10⁸ m/s is nine meters, but you should note that the decay time measures the time for the oscillation to decay by a factor 1/e, so the oscillation itself lasts longer than that). Those nine or ten meters cover some 15 to 17 million oscillations (the wavelength of sodium light is about 600 nm and, hence, 10 meters fit almost 17 million oscillations indeed). Now, how can we reconcile the image of a photon as a ten-meter long wavetrain with the image of a photon as a point particle?
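The back-of-the-envelope numbers above can be reproduced in Python. This sketch takes the 30 ns decay time quoted above and uses about 589 nm for the wavelength of sodium light (the yellow sodium D line), which the text rounds to 600 nm.

```python
C = 299_792_458.0       # speed of light, m/s
TAU = 30e-9             # decay time quoted above for sodium light, s
WAVELENGTH = 589e-9     # sodium D line, about 589 nm, in m

train_length = TAU * C                       # distance light covers in one decay time
oscillations_in_train = train_length / WAVELENGTH
oscillations_in_10_m = 10.0 / WAVELENGTH

print(f"{train_length:.2f} m")                           # 8.99 m
print(f"{oscillations_in_train / 1e6:.1f} million")      # 15.3 million
print(f"{oscillations_in_10_m / 1e6:.1f} million")       # 17.0 million
```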

The answer to that question is paradoxical: from our perspective, anything traveling at the speed of light – including this nine or ten meter ‘long’ photon – will have zero length because of the relativistic length contraction effect. Length contraction? Yes. I’ll let you look it up, because… Well… It’s not easy to grasp. Indeed, of the three measurable effects on objects moving at relativistic speeds – i.e. (1) an increase of the mass (the energy needed to further accelerate particles in particle accelerators increases dramatically at speeds nearer to c), (2) time dilation, i.e. a slowing down of the (internal) clock (because of their relativistic speeds when entering the Earth’s atmosphere, the measured half-life of muons is five times that when at rest), and (3) length contraction – length contraction is probably the most paradoxical of all.

Let me end this digression with yet another short note. I said that one will always measure the speed of light c as being equal to 299,792,458 m/s, always and everywhere and, hence, that it does not depend on your reference frame (inertial or moving). Well… That’s true and not true at the same time. I actually need to nuance that statement a bit in light of what follows: an individual photon does have an amplitude to travel faster or slower than c, and when discussing matter waves (such as the wavefunction that’s associated with an electron), we can have phase velocities that are faster than light! However, when calculating those amplitudes, c is a constant.

That doesn’t make sense, you’ll say. Well… What can I say? That’s how it is unfortunately. I need to move on and, hence, I’ll end this digression and get back to the main story line. Part I explained what probability amplitudes are—or at least tried to do so. Now it’s time for part II: the building blocks of all of quantum electrodynamics (QED).

II. The building blocks: P(A to B), E(A to B) and j

The three basic ‘events’ (and, hence, amplitudes) in QED are the following:

1. P(A to B)

P(A to B) is the (probability) amplitude for a photon to travel from point A to B. However, I should immediately note that A and B are points in spacetime. Therefore, we associate them not only with some specific (x, y, z) position in space, but also with some specific time t. Now, quantum-mechanical theory gives us an easy formula for P(A to B): it depends on the so-called (spacetime) interval between the two points A and B, i.e. I = Δr² – Δt² = (x₂–x₁)² + (y₂–y₁)² + (z₂–z₁)² – (t₂–t₁)². The point to note is that the spacetime interval takes both the distance in space as well as the ‘distance’ in time into account. As I mentioned already, this spacetime interval does not depend on our reference frame and, hence, it’s invariant (as long as we’re talking reference frames that move with constant speed relative to each other). Also note that we should measure time and distance in equivalent units when using that Δr² – Δt² formula for I. So we either measure distance in light-seconds or, else, we measure time in units that correspond to the time that’s needed for light to travel one meter. If no equivalent units are adopted, the formula is I = Δr² – c²·Δt².
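A minimal Python sketch of the invariance claim: it computes I = Δr² – c²·Δt² for a hypothetical pair of events A and B, applies a standard Lorentz boost along the x-axis (the textbook formulas, not something taken from the QED booklet), and checks that the interval comes out the same in the moving frame. The separation and the 0.6c relative speed are arbitrary example values.

```python
import math

C = 299_792_458.0   # speed of light, m/s


def interval(dx, dy, dz, dt, c=C):
    """Spacetime interval I = Δr² – c²·Δt², with the sign convention used above."""
    return dx**2 + dy**2 + dz**2 - (c * dt) ** 2


def boost_x(dx, dt, v, c=C):
    """Lorentz boost along x: the same separation seen from a frame moving at speed v."""
    gamma = 1 / math.sqrt(1 - (v / c) ** 2)
    return gamma * (dx - v * dt), gamma * (dt - v * dx / c**2)


# Hypothetical separation between events A and B: 400,000 km along x, 100,000 km along y, 2 s apart
dx, dy, dz, dt = 4.0e8, 1.0e8, 0.0, 2.0

i_ours = interval(dx, dy, dz, dt)
dx_moving, dt_moving = boost_x(dx, dt, v=0.6 * C)   # y and z are unchanged by a boost along x
i_moving = interval(dx_moving, dy, dz, dt_moving)

print(math.isclose(i_ours, i_moving, rel_tol=1e-9))   # True: the interval is invariant
print(dx_moving == dx, dt_moving == dt)               # False False: Δx and Δt themselves do change
```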

Now, in quantum theory, anything is possible and, hence, not only do we allow for crooked paths, but we also allow the time difference to differ from the time you’d expect a photon to need to travel along some curve (whose length we’ll denote by l), i.e. l/c. Hence, our photon may actually travel slower or faster than the speed of light c! There is one lucky break, however, that makes it all come out alright: it’s easy to show that the amplitudes associated with the odd paths and strange timings generally cancel each other out. [That’s what the QED booklet shows.] What remains are the paths that correspond to the so-called ‘light-like’ intervals in spacetime or, importantly, those that are very near to them. The net result is that light – even one single photon – effectively uses a (very) small core of space as it travels, as evidenced by the fact that even one single photon interferes with itself when traveling through a slit or a small hole!

[If you now wonder what it means for a photon to interfere with itself, let me just give you the easy explanation: it may change its path. We assume it was traveling in a straight line – if only because it left the source at some point in time and then arrived at the slit obviously – but it no longer travels in a straight line after going through the slit. So that’s what we mean here.]

2. E(A to B)

E(A to B) is the (probability) amplitude for an electron to travel from point A to B. The formula for E(A to B) is much more complicated, and it’s the one I want to discuss somewhat more in detail in this post. It depends on some complex number j (see the next remark) and some real number n.

3. j

Finally, an electron could emit or absorb a photon, and the amplitude associated with this event is denoted by j, for junction number. It’s the same number j as the one mentioned when discussing E(A to B) above.

Now, this junction number is often referred to as the coupling constant or the fine-structure constant. However, the truth is, as I pointed out in my previous post, that these numbers are related, but they are not quite the same: α is the square of j, so we have α = j². There is also one more, related, number: the gauge parameter, which is denoted by g (despite the g notation, it has nothing to do with gravitation). The value of g is the square root of 4πε0α, so g² = 4πε0α. I’ll come back to this. Let me first make a digression on the fine-structure constant. It will be awfully long: so long that it’s actually part of the ‘core’ of this post.

Digression 2: on the fine-structure constant, Planck units and the Bohr radius

The value for j is approximately –0.08542454.

How do we know that?

The easy answer to that question is: physicists measured it. In fact, they usually publish the measured value as the square of j, which is that fine-structure constant α. Its value is published (and updated) by the US National Institute of Standards and Technology. To be precise, the currently accepted value of α is 7.29735257×10⁻³. In case you doubt, just check that square root:

j = –0.08542454 ≈ –√0.00729735257 = –√α

As noted in Feynman’s (or Leighton’s) QED, older and/or more popular books will usually mention 1/α as the ‘magical’ number, so the ‘special’ number you may have seen is the inverse fine-structure constant, which is about 137, but not quite:

1/α = 137.035999074 ± 0.000000044

I am adding the standard uncertainty just to give you an idea of how precise these measurements are. 🙂 About 0.32 parts per billion (just divide the uncertainty by 137.035999074). So that’s the number that excites popular writers, including Leighton. Indeed, as Leighton puts it:
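The arithmetic in this paragraph, and the square-root check above, can be reproduced in a few lines of Python. The only inputs are the 1/α value and the standard uncertainty quoted above.

```python
import math

INV_ALPHA = 137.035999074            # 1/α, as quoted above
INV_ALPHA_UNCERTAINTY = 0.000000044  # standard uncertainty, as quoted above

alpha = 1 / INV_ALPHA
j = -math.sqrt(alpha)                # junction number: j = –√α
relative_uncertainty = INV_ALPHA_UNCERTAINTY / INV_ALPHA

print(f"alpha = {alpha:.8e}")                                  # alpha = 7.29735257e-03
print(f"j = {j:.8f}")                                          # j = -0.08542454
print(f"{relative_uncertainty * 1e9:.2f} parts per billion")   # 0.32 parts per billion
```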

“Where does this number come from? Nobody knows. It’s one of the greatest damn mysteries of physics: a magic number that comes to us with no understanding by man. You might say the “hand of God” wrote that number, and “we don’t know how He pushed his pencil.” We know what kind of a dance to do experimentally to measure this number very accurately, but we don’t know what kind of dance to do on the computer to make this number come out, without putting it in secretly!”

Is it Leighton, or did Feynman really say this? Not sure. While the fine-structure constant is a very special number, it’s not the only ‘special’ number. In fact, we derive it from other ‘magical’ numbers. To be specific, I’ll show you how we derive it from the fundamental properties – as measured, of course – of the electron. So, in fact, I should say that we do know how to make this number come out, which makes me doubt whether Feynman really said what Leighton said he said. 🙂

So we can derive α from some other numbers. That brings me to the more complicated answer to the question as to what the value of j really is: j’s value is the electron charge expressed in Planck units, which I’ll denote by –e_P:

j = –e_P

[You may want to reflect on this, and quickly verify on the Web. The Planck unit of electric charge, expressed in coulomb, is about 1.87555×10⁻¹⁸ C. If you multiply that by j = –e_P, so by –0.08542454, you get the right answer: the electron charge is about –0.160217×10⁻¹⁸ C.]
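Here is that verification as a Python sketch. It assumes the usual definition of the Planck charge, q_P = √(4πε0ħc), and uses SI/CODATA 2018 values for e, h, c and ε0 (the first three are exact by definition since 2019); it then expresses the electron charge in Planck units.

```python
import math

E_CHARGE = 1.602176634e-19           # elementary charge, C (exact)
H = 6.62607015e-34                   # Planck constant, J·s (exact)
HBAR = H / (2 * math.pi)             # reduced Planck constant ħ
C = 299_792_458.0                    # speed of light, m/s (exact)
EPS0 = 8.8541878128e-12              # electric constant ε0, F/m (CODATA 2018)

planck_charge = math.sqrt(4 * math.pi * EPS0 * HBAR * C)   # q_P = √(4πε0ħc)
e_p = E_CHARGE / planck_charge       # electron charge (magnitude) in Planck units
j = -e_p                             # j = –e_P

print(f"q_P = {planck_charge:.5e} C")          # q_P = 1.87555e-18 C
print(f"j = {j:.8f}")                          # j = -0.08542454
print(f"j * q_P = {j * planck_charge:.6e} C")  # j * q_P = -1.602177e-19 C
print(f"e_P² = {e_p**2:.8e}")                  # e_P² = 7.29735257e-03, i.e. α
```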

Now that is strange.

Why? Well… For starters, when doing all those quantum-mechanical calculations, we like to think of j as a dimensionless number: a coupling constant. But so here we do have a dimension: electric charge.

Let’s look at the basics. If j is –√α, and it’s also equal to –e_P, then the fine-structure constant must also be equal to the square of the electron charge e_P, so we can write:

α = e_P²

You’ll say: yes, so what? Well… I am pretty sure that, if you’ve ever seen a formula for α, it’s surely not this simple j = –e_P or α = e_P² formula. What you’ve seen, most likely, is one or more of the following expressions:

[Illustration: fine-structure constant formula]

That’s a pretty impressive collection of physical constants, isn’t it? 🙂 They’re all different but, somehow, when we combine them in one or the other ratio (we have no fewer than five different expressions here, as each identity is a separate expression, and I could give you a few more!), we get the very same number: α. Now that is what I call strange. Truly strange. Incomprehensibly weird!

You’ll say… Well… Those constants must all be related… Of course! That’s exactly the point I am making here. They are, but look how different they are: m_e measures mass, r_e measures distance, e is a charge, and so these are all very different numbers with very different dimensions. Yet, somehow, they are all related through this α number. Frankly, I do not know of any other expression that better illustrates some kind of underlying unity in Nature than the one with those five identities above.

Let’s have a closer look at those constants. You know most of them already. The only constants you may not have seen before are μ0, $R_K$ and, perhaps, $r_e$ as well as $m_e$. However, these can easily be defined as simple functions of the constants that you did see before, so let me quickly do that:

  1. The μ0 constant is the so-called magnetic constant. It’s similar to ε0 and it’s referred to as the magnetic permeability of the vacuum. So it’s just like the (electric) permittivity of the vacuum (i.e. the electric constant ε0), and the only reason why this blog hasn’t mentioned this constant before is that I haven’t really discussed magnetic fields so far: I only talked about the electric field vector. In any case, you know that the electric and magnetic force are part and parcel of the same phenomenon (i.e. the electromagnetic interaction between charged particles) and, hence, they are closely related. To be precise, $\mu_0\varepsilon_0 = 1/c^2 = c^{-2}$. So that shows the first and second expression for α are, effectively, fully equivalent. [Just in case you’d doubt that $\mu_0\varepsilon_0 = 1/c^2$, let me give you the values: $\mu_0 = 4\pi\cdot10^{-7}$ N/A², and $\varepsilon_0 = 10^{7}/(4\pi c^2)$ C²/(N·m²). Just plug them in, and you’ll see it’s bang on. Moreover, note that the ampere (A) is equal to one coulomb per second (C/s), so even the units come out alright. 🙂 Of course they do!]
  2. The ke constant is the Coulomb constant and, from its definition ke = 1/4πε0, it’s easy to see how those two expressions are, in turn, equivalent with the third expression for α.
  3. The $R_K$ constant is the so-called von Klitzing constant. Huh? Yes. I know. I am pretty sure you’ve never ever heard of that one before. Don’t worry about it. It’s, quite simply, equal to $R_K = h/e^2$. Hence, substituting (and don’t forget that $h = 2\pi\hbar$) will demonstrate the equivalence of the fourth expression for α.
  4. Finally, the $r_e$ factor is the classical electron radius, which is usually written as a function of $m_e$, i.e. the electron mass: $r_e = e^2/4\pi\varepsilon_0 m_e c^2$. Note that this implies that $r_e m_e = e^2/4\pi\varepsilon_0 c^2$. In words: the product of the electron mass and the classical electron radius is equal to some constant involving the electron charge (e), the electric constant (ε0), and c (the speed of light).
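The figure with the five expressions is missing from this copy, but the list above pins them down well enough to check numerically. The sketch below assumes the five are $e^2/4\pi\varepsilon_0\hbar c$, $\mu_0 c e^2/4\pi\hbar$, $k_e e^2/\hbar c$, $\mu_0 c/2R_K$ and $m_e c r_e/\hbar$ (a reconstruction from the list, not a copy of the original figure), and it uses CODATA 2018 values as inputs:

```python
import math

# CODATA 2018 values in SI units (assumed inputs, not from the post)
e = 1.602176634e-19            # elementary charge, C
h = 6.62607015e-34             # Planck constant, J*s
hbar = h / (2 * math.pi)       # reduced Planck constant, J*s
c = 299_792_458.0              # speed of light, m/s
eps0 = 8.8541878128e-12        # electric constant, F/m
mu0 = 1.25663706212e-6         # magnetic constant, N/A^2
m_e = 9.1093837015e-31         # electron mass, kg

k_e = 1 / (4 * math.pi * eps0)                  # Coulomb constant (item 2)
R_K = h / e**2                                  # von Klitzing constant, ohm (item 3)
r_e = e**2 / (4 * math.pi * eps0 * m_e * c**2)  # classical electron radius, m (item 4)

# Assumed reconstruction of the five identities for alpha
alphas = {
    "e^2/(4 pi eps0 hbar c)": e**2 / (4 * math.pi * eps0 * hbar * c),
    "mu0 c e^2/(4 pi hbar)": mu0 * c * e**2 / (4 * math.pi * hbar),
    "k_e e^2/(hbar c)": k_e * e**2 / (hbar * c),
    "mu0 c/(2 R_K)": mu0 * c / (2 * R_K),
    "m_e c r_e/hbar": m_e * c * r_e / hbar,
}

for name, value in alphas.items():
    print(f"{name:24s} {value:.10f}   1/alpha = {1 / value:.3f}")

# Item 1: mu0 * eps0 = 1/c^2, so this product should be (very nearly) 1
print("mu0*eps0*c^2 =", mu0 * eps0 * c**2)
```

All five lines should print the same value, with 1/α ≈ 137.036, and the last line confirms the $\mu_0\varepsilon_0 = 1/c^2$ relation from item 1.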

I am sure you’re under some kind of ‘formula shock’ now. But you should just take a deep breath and read on. The point to note is that all these very different things are all related through α.

So, again, what is that α really? Well… A strange number indeed. It’s dimensionless (so we don’t measure it in kg, m/s, eV·s or whatever) and it pops up everywhere. [Of course, you’ll say: “What’s everywhere? This is the first time I’ve heard of it!” :-)]

Well… Let me start by explaining the term itself. The fine structure in the name refers to the splitting of the spectral lines of atoms. That’s a very fine structure indeed. 🙂 We also have a so-called hyperfine structure. Both are illustrated below for the hydrogen atom. The labels in the illustration (n, J, I, and so on) are quantum numbers used in the quantum-mechanical explanation of the emission spectrum, which is also depicted below, but note that the illustration gives you the so-called Balmer series only, i.e. the colors in the visible light spectrum (there are many more ‘colors’ in the high-energy ultraviolet and the low-energy infrared range).



To be precise: (1) n is the principal quantum number: here it takes the values 1 or 2, and we could say these are the principal shells; (2) the S, P, D,… orbitals (which are usually written in lower case: s, p, d, f, g, h and i) correspond to the (orbital) angular momentum quantum number l = 0, 1, 2,…, so we could say it’s the subshell; (3) the J values correspond to the total angular momentum quantum number j, which combines the orbital angular momentum l with the electron’s spin (so j = l ± 1/2 for l > 0), not to be confused with the magnetic quantum number m, which goes from –l to +l; (4) the fourth quantum number is the spin angular momentum s. I’ve copied another diagram below so you see how it works, more or less, that is.

[Figure: hydrogen spectrum]

Now, our fine-structure constant is related to these quantum numbers. How exactly is a bit of a long story, and so I’ll just copy Wikipedia’s summary on this: “The gross structure of line spectra is the line spectra predicted by the quantum mechanics of non-relativistic electrons with no spin. For a hydrogenic atom, the gross structure energy levels only depend on the principal quantum number n. However, a more accurate model takes into account relativistic and spin effects, which break the degeneracy of the energy levels and split the spectral lines. The scale of the fine structure splitting relative to the gross structure energies is on the order of $(Z\alpha)^2$, where Z is the atomic number and α is the fine-structure constant.” There you go. You’ll say: so what? Well… Nothing. If you aren’t amazed by that, you should stop reading this.

It is an ‘amazing’ number, indeed, and, hence, it does qualify for being “one of the greatest damn mysteries of physics”, as Feynman and/or Leighton put it. Having said that, I would not go as far as to write that it’s “a magic number that comes to us with no understanding by man.” In fact, I think Feynman/Leighton could have done a much better job when explaining what it’s all about. So, yes, I hope to do better than Leighton here and, as he’s still alive, I actually hope he reads this. 🙂

The point is: α is not the only weird number. What’s particular about it, as a physical constant, is that it’s dimensionless, because it relates a number of other physical constants in such a way that the units fall away. Having said that, the Planck and Boltzmann constants are at least as weird.

So… What is this all about? Well… You’ve probably heard about the so-called fine-tuning problem in physics and, if you’re like me, your first reaction will be to associate fine-tuning with fine-structure. However, the two terms have nothing in common, except for four letters. 🙂 OK. Well… I am exaggerating here. The two terms are actually related, to some extent at least, but let me explain how.

The term fine-tuning refers to the fact that all the parameters or constants in the so-called Standard Model of physics are, indeed, related to each other in the way they are. We can’t sort of just turn the knob of one and change it, because everything falls apart then. So, in essence, the fine-tuning problem in physics is more like a philosophical question: why is the value of all these physical constants and parameters exactly what it is? So it’s like asking: could we change some of the ‘constants’ and still end up with the world we’re living in? Or, if it would be some different world, what would it look like? What if one of the constants above was some other number? What if ke or ε0 was some other number? In short, and in light of those expressions for α, we may rephrase the question as: why is α what it is?

Of course, that’s a question one shouldn’t try to answer before answering some other, more fundamental, question: how many degrees of freedom are there really? Indeed, we just saw that ke and ε0 are intimately related through some equation, and other constants and parameters are related too. So the question is like: what are the ‘dependent’ and the ‘independent’ variables in this so-called Standard Model?

There is no easy answer to that question. In fact, one of the reasons why I find physics so fascinating is that one cannot easily answer such questions. There are the obvious relationships, of course. For example, the $k_e = 1/4\pi\varepsilon_0$ relationship, and the context in which these two constants are used (Coulomb’s Law), does, indeed, strongly suggest that both constants are actually part and parcel of the same thing. Identical, I’d say. Likewise, the $\mu_0\varepsilon_0 = 1/c^2$ relation also suggests there’s only one degree of freedom here, just like there’s only one degree of freedom in that ω/k = c relationship (if we set a value for ω, we have k, and vice versa). But… Well… I am not quite sure how to phrase this, but… What physical constants could be ‘variables’ indeed?

It’s pretty obvious that the various formulas for α cannot answer that question: you could stare at them for days and weeks and months and years really, but I’d suggest you use your time to read more of Feynman’s real Lectures instead. 🙂 One point that may help to come to terms with this question – to some extent, at least – is what I casually mentioned above already: the fine-structure constant is equal to the square of the electron charge expressed in Planck units: $\alpha = e_P^2$.

Now, that’s very remarkable because Planck units are some kind of ‘natural units’ indeed (for the detail, see my previous post: among other things, it explains what these Planck units really are) and, therefore, it is quite tempting to think that we’ve actually got only one degree of freedom here: α itself. All the rest should follow from it.


It should… But… Does it?

The answer is: yes and no. To be frank, it’s more no than yes because, as I noted a couple of times already, the fine-structure constant relates a lot of stuff but it’s surely not the only significant number in the Universe. For starters, I said that our E(A to B) formula has two ‘variables’:

  1. We have that complex number j, which, as mentioned, is equal to the electron charge expressed in Planck units. [In case you wonder why $-e_P \approx -0.08542455$ is said to be an amplitude, i.e. a complex number or an ‘arrow’… Well… Complex numbers include the real numbers and, hence, –0.08542455 is both real and complex. When combining ‘arrows’ or, to be precise, when multiplying some complex number with –0.08542455, we will (a) shrink the original arrow to about 8.5% of its original value (8.542455% to be precise) and (b) rotate it over an angle of plus or minus 180 degrees. In other words, we’ll reverse its direction. Hence, using Euler’s notation for complex numbers, we can write: $-1 = e^{i\pi} = e^{-i\pi}$ and, hence, $-0.085 = 0.085\cdot e^{i\pi} = 0.085\cdot e^{-i\pi}$. So, in short, yes, j is a complex number, or an ‘arrow’, if you prefer that term.]
  2. We also have some real number n in the E(A to B) formula. So what’s the n? Well… Believe it or not, it’s the electron mass! Isn’t that amazing?
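The ‘shrink and reverse’ behaviour described in item 1 can be seen with Python’s standard `cmath` module. This is only an illustration; the 45-degree starting arrow is an arbitrary choice, not something from the text:

```python
import cmath
import math

j = -0.08542455                       # the real-valued 'arrow' from the text

arrow = cmath.rect(1.0, math.pi / 4)  # arbitrary unit arrow at 45 degrees
product = j * arrow

print(abs(product))                        # length: about 0.0854, i.e. 8.5% of 1
print(math.degrees(cmath.phase(product)))  # about -135: turned by 180 degrees

# Euler: e^{i pi} and e^{-i pi} are both -1 (up to floating-point rounding)
print(cmath.exp(1j * math.pi), cmath.exp(-1j * math.pi))
```

The arrow ends up at 225 degrees, which `cmath.phase` reports as –135 degrees because it returns angles between –180 and +180 degrees.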

You’ll say: “Well… Hmm… I suppose so.” But then you may – and actually should – also wonder: the electron mass? In what units? Planck units again? And are we talking relativistic mass (i.e. its total mass, including the equivalent mass of its kinetic energy) or its rest mass only? And we were talking α here, so can we relate it to α too, just like the electron charge?

These are all very good questions. Let’s start with the second one. We’re talking rather slow-moving electrons here, so the relativistic mass (m) and its rest mass (m0) are more or less the same. Indeed, the Lorentz factor γ in the m = γm0 equation is very close to 1 for electrons moving at their typical speed. So… Well… That question doesn’t matter very much. Really? Yes. OK. Because you’re doubting, I’ll quickly show it to you. What is their ‘typical’ speed?

We know we shouldn’t attach too much importance to the concept of an electron in orbit around some nucleus (we know it’s not like some planet orbiting around some star) and, hence, to the concept of speed or velocity (velocity is speed with direction) when discussing an electron in an atom. The concept of momentum (i.e. velocity combined with mass or energy) is much more relevant. There’s a very easy mathematical relationship that gives us some clue here: the Uncertainty Principle. In fact, we’ll use the Uncertainty Principle to relate the momentum of an electron (p) to the so-called Bohr radius r (think of it as the size of a hydrogen atom) as follows: p ≈ ħ/r. [I’ll come back to this in a moment, and show you why this makes sense.]

Now we also know its kinetic energy (K.E.) is $mv^2/2$, which we can write as $p^2/2m$. Substituting our $p \approx \hbar/r$ conjecture, we get K.E. $= mv^2/2 = \hbar^2/2mr^2$. This is equivalent to $m^2v^2 = \hbar^2/r^2$ (just multiply both sides by 2m). From that, we get $v = \hbar/mr$. Now, one of the many relations we can derive from the formulas for the fine-structure constant is $r_e = \alpha^2 r$. [I haven’t shown you that yet, but I will shortly. It’s a really amazing expression. However, for now, just accept it as a simple formula for interim use in this digression.] Hence, $r = r_e/\alpha^2$. The $r_e$ factor in this expression is the so-called classical electron radius. So we can now write $v = \hbar\alpha^2/mr_e$. Let’s now throw c in: $v/c = \alpha^2\hbar/mcr_e$. However, from that fifth expression for α, we know that $mcr_e/\hbar = \alpha$, i.e. $\hbar/mcr_e = 1/\alpha$, so we get $v/c = \alpha^2/\alpha = \alpha$. We have another amazing result here: the v/c ratio for an electron (i.e. its speed expressed as a fraction of the speed of light) is equal to that fine-structure constant α. So that’s about 1/137, i.e. less than 1% of the speed of light. Now… I’ll leave it to you to calculate the Lorentz factor γ but… Well… It’s obvious that it will be very close to 1. 🙂 Hence, the electron’s speed – however we want to visualize that – doesn’t matter much indeed, so we should not worry about relativistic corrections in the formulas.
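The estimate above can be reproduced in a few lines. This sketch assumes CODATA 2018 values for ħ, c, the electron mass, the Bohr radius r and the classical electron radius $r_e$ (none of these inputs are from the post), and it uses the same rough p ≈ ħ/r argument as the text:

```python
import math

# CODATA 2018 values (assumed inputs)
hbar = 6.62607015e-34 / (2 * math.pi)  # reduced Planck constant, J*s
c = 299_792_458.0                      # speed of light, m/s
m_e = 9.1093837015e-31                 # electron mass, kg
a0 = 5.29177210903e-11                 # Bohr radius r, m
r_e = 2.8179403262e-15                 # classical electron radius, m

alpha = math.sqrt(r_e / a0)            # from r_e = alpha^2 * r
v = hbar / (m_e * a0)                  # rough speed from p ~ hbar/r
beta = v / c                           # speed as a fraction of c
gamma = 1 / math.sqrt(1 - beta**2)     # Lorentz factor

print(f"alpha = {alpha:.6f}, v/c = {beta:.6f}")
print(f"v = {v:.3e} m/s, gamma = {gamma:.6f}")
```

With v ≈ 2.19×10⁶ m/s, γ exceeds 1 by less than 3 parts in 100,000, which is why the text can ignore relativistic corrections here.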

Let’s now look at the question in regard to the Planck units. If you know nothing at all about them, I would advise you to read what I wrote about them in my previous post. Let me just note we get those Planck units by equating no less than five fundamental physical constants to 1, notably (1) the speed of light, (2) Planck’s (reduced) constant, (3) Boltzmann’s constant, (4) Coulomb’s constant and (5) Newton’s constant (i.e. the gravitational constant). Hence, we have a set of five equations here ($c = \hbar = k_B = k_e = G = 1$), and so we can solve that to get the five base Planck units, i.e. the Planck length unit, the Planck time unit, the Planck mass unit, the Planck charge unit and, finally (oft forgotten), the Planck temperature unit, as well as derived units such as the Planck energy unit. Of course, you should note that all mass and energy units are directly related because of the mass-energy equivalence relation $E = mc^2$, which simplifies to E = m if c is equated to 1. [I could also say something about the relation between temperature and (kinetic) energy, but I won’t, as it would only further confuse you.]

Now, you may or may not remember that the Planck time and length units are unimaginably small, but that the Planck mass unit is actually quite sizable—at the atomic scale, that is. Indeed, the Planck mass is something huge, like the mass of an eyebrow hair, or a flea egg. Is that huge? Yes. Because if you’d want to pack it in a Planck-sized particle, it would make for a tiny black hole. 🙂 No kidding. That’s the physical significance of the Planck mass and the Planck length and, yes, it’s weird. 🙂

Let me give you some values. First, the Planck mass itself: it’s about 2.1765×10−8 kg. Again, if you think that’s tiny, think again. From the E = mc2 equivalence relationship, we get that this is equivalent to 2 giga-joule, approximately. Just to give an idea, that’s like the monthly electricity consumption of an average American family. So that’s huge indeed! 🙂 [Many people think that nuclear energy involves the conversion of mass into energy, but the story is actually more complicated than that. In any case… I need to move on.]
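The Planck-mass arithmetic can be checked as follows, assuming CODATA 2018 values for ħ, c and G and the standard definition $m_P = \sqrt{\hbar c/G}$. The kilowatt-hour conversion is only there to make the household comparison concrete:

```python
import math

# CODATA 2018 values (assumed inputs)
hbar = 6.62607015e-34 / (2 * math.pi)  # reduced Planck constant, J*s
c = 299_792_458.0                      # speed of light, m/s
G = 6.67430e-11                        # gravitational constant, m^3/(kg*s^2)

m_P = math.sqrt(hbar * c / G)   # Planck mass, kg
E_P = m_P * c**2                # equivalent energy, J
E_P_kWh = E_P / 3.6e6           # 1 kWh = 3.6e6 J

print(f"m_P = {m_P:.4e} kg")
print(f"E_P = {E_P:.3e} J, or about {E_P_kWh:.0f} kWh")
```

That comes to roughly 2×10⁹ J, or a bit over 540 kWh.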

Let me now give you the electron mass expressed in the Planck mass unit:

  1. Measured in our old-fashioned super-sized SI kilogram unit, the electron mass is $m_e = 9.1\times10^{-31}$ kg.
  2. The Planck mass is $m_P = 2.1765\times10^{-8}$ kg.
  3. Hence, the electron mass expressed in Planck units is $m_{eP} = m_e/m_P = (9.1\times10^{-31}\ \text{kg})/(2.1765\times10^{-8}\ \text{kg}) = 4.181\times10^{-23}$.

We can, once again, write that as some function of the fine-structure constant. More specifically, we can write:

$$m_{eP} = \frac{\alpha}{r_{eP}} = \frac{\alpha}{\alpha^2 r_P} = \frac{1}{\alpha r_P}$$

So… Well… Yes: yet another amazing formula involving α.

In this formula, we have $r_{eP}$ and $r_P$, which are the (classical) electron radius and the Bohr radius expressed in Planck (length) units respectively. So you can see what’s going on here: we have all kinds of numbers here expressed in Planck units: a charge, a radius, a mass,… And we can relate all of them to the fine-structure constant.

Why? Who knows? I don’t. As Leighton puts it: that’s just the way “God pushed His pencil.” 🙂

Note that the beauty of natural units ensures that we get the same number for the (equivalent) energy of an electron. Indeed, from the $E = mc^2$ relation, we know the mass of an electron can also be written as 0.511 MeV/c². Hence, the equivalent energy is 0.511 MeV (so that’s, quite simply, the same number but without the 1/c² factor). Now, the Planck energy $E_P$ (in eV) is 1.22×10²⁸ eV, so we get $E_{eP} = E_e/E_P$ = (0.511×10⁶ eV)/(1.22×10²⁸ eV) ≈ 4.19×10⁻²³. So that’s the same as the electron mass expressed in Planck units, up to the rounding of the input values: with more precise values (0.51099895 MeV and 1.22089×10²⁸ eV, against 9.1093837×10⁻³¹ kg and 2.176434×10⁻⁸ kg), both ratios come out at about 4.1855×10⁻²³. Isn’t that nice? 🙂
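A small sketch comparing the two ratios, first with the rounded inputs from the text and then with more precise CODATA 2018 values (the precise values are an assumption added here, not taken from the post):

```python
# Rounded inputs used in the text
m_e_kg, m_P_kg = 9.1e-31, 2.1765e-8
E_e_eV, E_P_eV = 0.511e6, 1.22e28
print(m_e_kg / m_P_kg)   # about 4.181e-23
print(E_e_eV / E_P_eV)   # about 4.189e-23: differs only through input rounding

# More precise (CODATA 2018) inputs
m_e_kg, m_P_kg = 9.1093837015e-31, 2.176434e-8
E_e_eV, E_P_eV = 0.51099895000e6, 1.220890e28
mass_ratio = m_e_kg / m_P_kg
energy_ratio = E_e_eV / E_P_eV
print(f"{mass_ratio:.4e} {energy_ratio:.4e}")
```

With the precise inputs, the mass ratio and the energy ratio agree to about six significant digits, as they must, since $E = mc^2$ applies to both the electron and the Planck mass.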

Now, are all these numbers dimensionless, just like α? The answer to that question is complicated. Yes, and… Well… No:

  1. Yes. They’re dimensionless because they measure something in natural units, i.e. Planck units, and, hence, that’s some kind of relative measure indeed so… Well… Yes, dimensionless.
  2. No. They’re not dimensionless because they do measure something, like a charge, a length, or a mass, and when you choose some kind of relative measure, you still need to define some gauge, i.e. some kind of standard measure. So there’s some ‘dimension’ involved there.

So what’s the final answer? Well… The Planck units are not dimensionless. All we can say is that they are closely related, physically. I should also add that we’ll use the electron charge and mass (expressed in Planck units) in our amplitude calculations as a simple (dimensionless) number between zero and one. So the correct answer to the question as to whether these numbers have any dimension is: expressing some quantities in Planck units sort of normalizes them, so we can use them directly in dimensionless calculations, like when we multiply and add amplitudes.

Hmm… Well… I can imagine you’re not very happy with this answer but it’s the best I can do. Sorry. I’ll let you further ponder that question. I need to move on.  

Note that 4.181×10⁻²³ is still a very small number (22 zeros after the decimal point!), even if it’s like 46 million times larger than the electron mass measured in our conventional SI unit (i.e. 9.1×10⁻³¹ kg). Does such a small number make any sense? The answer is: yes, it does. When we finally start discussing that E(A to B) formula (I’ll give it to you in a moment), you’ll see that a very small number for n makes a lot of sense.

Before diving into it all, let’s first see if that formula for alpha, the fine-structure constant, still makes sense with $m_e$ expressed in Planck units. Just to make sure. 🙂 To do that, we need to use the fifth (last) expression for α, i.e. the one with $r_e$ in it. Now, in my previous post, I also gave some formula for $r_e$: $r_e = e^2/4\pi\varepsilon_0 m_e c^2$, which we can rewrite as $r_e m_e = e^2/4\pi\varepsilon_0 c^2$. If we substitute that expression for $r_e m_e$ in the formula for α, we can calculate α from the electron charge, which indicates that neither the electron radius nor its mass is some random God-given variable, or “some magic number that comes to us with no understanding by man”, as Feynman – well… Leighton, I guess – puts it. No. They are magic numbers alright, one related to another through the equally ‘magic’ number α, but I do feel we actually can create some understanding here.

At this point, I’ll digress once again, and insert some quick back-of-the-envelope argument from Feynman’s very serious Caltech Lectures on Physics, in which, as part of the introduction to quantum mechanics, he calculates the so-called Bohr radius from Planck’s constant h. Let me quickly explain: the Bohr radius is, roughly speaking, the size of the simplest atom, i.e. an atom with one electron (so that’s hydrogen really). So it’s not the classical electron radius $r_e$. However, both are also related to that ‘magical number’ α. To be precise, if we write the Bohr radius as r, then $r_e = \alpha^2 r \approx 0.000053\ldots \times r$, which we can rewrite as:

$$\alpha = \sqrt{r_e/r} = (r_e/r)^{1/2}$$

So that’s yet another amazing formula involving the fine-structure constant. In fact, it’s the formula I used as an ‘interim’ expression to calculate the relative speed of electrons. I just used it without any explanation there, but I am coming back to it here. Alpha again…

Just think about it for a while. In case you’d still doubt the magic of that number, let me write what we’ve discovered so far:

(1) α is the square of the electron charge expressed in Planck units: $\alpha = e_P^2$.

(2) α is the square root of the ratio of (a) the classical electron radius and (b) the Bohr radius: $\alpha = \sqrt{r_e/r}$. You’ll see this more often written as $r_e = \alpha^2 r$. Also note that this is an equation that does not depend on the units, in contrast to equation 1 (above), and 4 and 5 (below), which require you to switch to Planck units. It’s the square root of a ratio and, hence, the units don’t matter. They fall away.

(3) α is the (relative) speed of an electron: α = v/c. [The relative speed is the speed as measured against the speed of light. Note that the ‘natural’ unit of speed in the Planck system of units is equal to c. Indeed, if you divide one Planck length by one Planck time unit, you get (1.616×10⁻³⁵ m)/(5.391×10⁻⁴⁴ s) ≈ 2.998×10⁸ m/s, i.e. c. However, this is another equation, just like (2), that does not depend on the units: we can express v and c in whatever unit we want, as long as we’re consistent and express both in the same units.]

(4) Finally – I’ll show you in a moment – α is also equal to the product of (a) the electron mass (which I’ll simply write as $m_e$ here) and (b) the classical electron radius $r_e$ (if both are expressed in Planck units): $\alpha = m_e\cdot r_e$. Now I think that’s, perhaps, the most amazing of all the expressions for α. If you don’t think that’s amazing, I’d really suggest you stop trying to study physics. 🙂

Note that, from (2) and (4), we find that:

(5) The electron mass (in Planck units) is equal to $m_e = \alpha/r_e = \alpha/\alpha^2 r = 1/\alpha r$. So that gives us an expression, using α once again, for the electron mass as a function of the Bohr radius r expressed in Planck units.

Finally, we can also substitute (1) in (5) to get:

(6) The electron mass (in Planck units) is equal to $m_e = \alpha/r_e = e_P^2/r_e$. Using the Bohr radius, we get $m_e = 1/\alpha r = 1/e_P^2 r$.
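Relations (1), (2), (4), (5) and (6) can all be checked at once. The sketch below assumes CODATA 2018 values, builds the Planck units from the usual definitions ($l_P = \sqrt{\hbar G/c^3}$, $t_P = \sqrt{\hbar G/c^5}$, $m_P = \sqrt{\hbar c/G}$, $q_P = \sqrt{4\pi\varepsilon_0\hbar c}$), and takes α itself from $e^2/4\pi\varepsilon_0\hbar c$:

```python
import math

# CODATA 2018 inputs in SI units (assumptions, not values from the post)
e = 1.602176634e-19            # elementary charge, C
h = 6.62607015e-34             # Planck constant, J*s
hbar = h / (2 * math.pi)       # reduced Planck constant, J*s
c = 299_792_458.0              # speed of light, m/s
eps0 = 8.8541878128e-12        # electric constant, F/m
G = 6.67430e-11                # gravitational constant, m^3/(kg*s^2)
m_e = 9.1093837015e-31         # electron mass, kg
r_e = 2.8179403262e-15         # classical electron radius, m
a0 = 5.29177210903e-11         # Bohr radius r, m

# Planck units from c = hbar = G = k_e = 1
l_P = math.sqrt(hbar * G / c**3)
t_P = math.sqrt(hbar * G / c**5)
m_P = math.sqrt(hbar * c / G)
q_P = math.sqrt(4 * math.pi * eps0 * hbar * c)

# The same quantities expressed in Planck units (pure numbers)
e_P = e / q_P
me_P = m_e / m_P
re_P = r_e / l_P
a0_P = a0 / l_P

alpha = e**2 / (4 * math.pi * eps0 * hbar * c)

print("alpha                  ", alpha)
print("(1) e_P^2              ", e_P**2)
print("(2) sqrt(r_e/r)        ", math.sqrt(r_e / a0))
print("(4) m_e * r_e          ", me_P * re_P)
print("(5) m_e vs 1/(alpha r) ", me_P, 1 / (alpha * a0_P))
print("(6) m_e vs e_P^2/r_e   ", me_P, e_P**2 / re_P)
print("l_P/t_P =", l_P / t_P, "m/s (should equal c)")
```

Note that G drops out of relations (4) to (6): the Planck length and the Planck mass only enter through their product, which is ħ/c.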

So… As you can see, this fine-structure constant really links ALL of the fundamental properties of the electron: its charge, its radius, its distance to the nucleus (i.e. the Bohr radius), its velocity, its mass (and, hence, its energy),… In short,


Now that should answer the question in regard to the degrees of freedom we have here, doesn’t it? It looks like we’ve got only one degree of freedom here. Indeed, if we’ve got some value for α, then we have the electron charge, and from the electron charge, we can calculate the Bohr radius r (as I will show below), and if we have r, we have $m_e$ and $r_e$. And then we can also calculate v, which gives us its momentum (mv) and its kinetic energy ($mv^2/2$). In short,


Isn’t that amazing? Hmm… You should reserve your judgment as for now, and carefully go over all of the formulas above and verify my statement. If you do that, you’ll probably struggle to find the Bohr radius from the charge (i.e. from α). So let me show you how you do that, because it will also show you why you should, indeed, reserve your judgment. In other words, I’ll show you why alpha does NOT give us everything! The argument below will, finally, prove some of the formulas that I didn’t prove above. Let’s go for it:

1. If we assume that (a) an electron takes some space – which I’ll denote by r 🙂 – and (b) that it has some momentum p because of its mass m and its velocity v, then the ΔxΔp = ħ relation (i.e. the Uncertainty Principle in its roughest form) suggests that the order of magnitude of r and p should be related in the very same way. Hence, let’s just boldly write r ≈ ħ/p and see what we can do with that. So we equate Δx with r and Δp with p. As Feynman notes, this is really more like a ‘dimensional analysis’ (he obviously means something very ‘rough’ with that) and so we don’t care about factors like 2 or 1/2. [Indeed, note that the more precise formulation of the Uncertainty Principle is $\sigma_x\sigma_p \geq \hbar/2$.] In fact, we didn’t even bother to define r very rigorously. We just don’t care about precise statements at this point. We’re only concerned about orders of magnitude. [If you’re appalled by the rather rude approach, I am sorry for that, but just try to go along with it.]

2. From our discussions on energy, we know that the kinetic energy is $mv^2/2$, which we can write as $p^2/2m$ so we get rid of the velocity factor. [Why? Because we can’t really imagine what it is anyway. As I said a couple of times already, we shouldn’t think of electrons as planets orbiting around some star. That model doesn’t work.] So… What’s next? Well… Substituting our p ≈ ħ/r conjecture, we get K.E. $= \hbar^2/2mr^2$. So that’s a formula for the kinetic energy. Next is potential.

3. Unfortunately, the discussion on potential energy is a bit more complicated. You’ll probably remember that we had an easy and very comprehensible formula for the energy that’s needed (i.e. the work that needs to be done) to bring two charges together from a large distance (i.e. infinity). Indeed, we derived that formula directly from Coulomb’s Law (and Newton’s law of force) and it’s $U = q_1q_2/4\pi\varepsilon_0 r_{12}$. [If you think I am going too fast, sorry, please check for yourself by reading my other posts.] Now, we’re talking about the size of an atom here, so one charge is the proton (+e) and the other is the electron (–e), so the potential energy is U = P.E. $= -e^2/4\pi\varepsilon_0 r$, with r the ‘distance’ between the proton and the electron, so that’s the Bohr radius we’re looking for!

[In case you’re struggling a bit with those minus signs when talking potential energy – I am not ashamed to admit I did! – let me quickly help you here. It has to do with our reference point: the reference point for measuring potential energy is at infinity, and it’s zero there (that’s just our convention). Now, to separate the proton and the electron, we’d have to do quite a lot of work. To use an analogy: imagine we’re somewhere deep down in a cave, and we have to climb back to the zero level. You’ll agree that’s likely to involve some sweat, don’t you? Hence, the potential energy associated with us being down in the cave is negative. Likewise, if we write the potential energy between the proton and the electron as U(r), and the potential energy at the reference point as U(∞) = 0, then the work to be done to separate the charges, i.e. the potential difference U(∞) – U(r), will be positive. So U(∞) – U(r) = 0 – U(r) > 0 and, hence, U(r) < 0. If you still don’t ‘get’ this, think of the electron being in some (potential) well, i.e. below the zero level, and so its potential energy is less than zero. Huh? Sorry. I have to move on. :-)]

4. We can now write the total energy (which I’ll denote by E, but don’t confuse it with the electric field vector!) as

$$E = \text{K.E.} + \text{P.E.} = \frac{\hbar^2}{2mr^2} - \frac{e^2}{4\pi\varepsilon_0 r}$$

Now, the electron (whatever it is) is, obviously, in some kind of equilibrium state. Why is that obvious? Well… Otherwise our hydrogen atom wouldn’t or couldn’t exist. 🙂 Hence, it’s in some kind of energy ‘well’ indeed, at the bottom. Such equilibrium point ‘at the bottom’ is characterized by its derivative (with respect to whatever variable) being equal to zero. Now, the only ‘variable’ here is r (all the other symbols are physical constants), so we have to solve for dE/dr = 0. Writing it all out yields:

$$\frac{dE}{dr} = -\frac{\hbar^2}{mr^3} + \frac{e^2}{4\pi\varepsilon_0 r^2} = 0 \iff r = \frac{4\pi\varepsilon_0\hbar^2}{me^2}$$

You’ll say: so what? Well… We’ve got a nice formula for the Bohr radius here, and we got it in no time! 🙂 But the analysis was rough, so let’s check if it’s any good by putting the values in:

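$$r = \frac{4\pi\varepsilon_0\hbar^2}{me^2}$$

$$= \frac{[1/(9\times10^{9})\ \mathrm{C^2/N{\cdot}m^2}]\cdot(1.055\times10^{-34}\ \mathrm{J{\cdot}s})^2}{(9.1\times10^{-31}\ \mathrm{kg})\cdot(1.6\times10^{-19}\ \mathrm{C})^2}$$

$$\approx 53\times10^{-12}\ \mathrm{m} = 53\ \text{picometers (pm)}$$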

So what? Well… Double-check it on the Internet: the Bohr radius is, effectively, about 53 trillionths of a meter indeed! So we’re right on the spot! 

[In case you wonder about the units, note that mass is a measure of inertia: one kg is the mass of an object which, subject to a force of 1 newton, will accelerate at the rate of 1 m/s per second. Hence, we write F = m·a, which is equivalent to m = F/a. Hence, the kg, as a unit, is equivalent to 1 N/(m/s²) = 1 N·s²/m. If you make this substitution, we get r in the unit we want to see: [(C²/N·m²)·(N²·m²·s²)]/[(N·s²/m)·C²] = m.]
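The same unit bookkeeping can be automated with a tiny dimensional-analysis helper. This is a hypothetical sketch (the `combine` helper and the base-unit dictionaries are made up for illustration): every unit is a map from SI base units (kg, m, s, A) to exponents, and the coulomb is written as A·s rather than going through N·s²/m as in the text:

```python
def combine(*terms):
    """Multiply units raised to powers; each term is (unit_dict, power)."""
    result = {}
    for unit, power in terms:
        for base, exp in unit.items():
            result[base] = result.get(base, 0) + exp * power
    # Drop base units whose exponents cancelled out
    return {base: exp for base, exp in result.items() if exp != 0}

# SI base units
kg, m, s, A = {"kg": 1}, {"m": 1}, {"s": 1}, {"A": 1}

C = combine((A, 1), (s, 1))                    # coulomb = ampere * second
N = combine((kg, 1), (m, 1), (s, -2))          # newton = kg*m/s^2
J = combine((N, 1), (m, 1))                    # joule = newton * metre
eps0_unit = combine((C, 2), (N, -1), (m, -2))  # C^2/(N*m^2)
hbar_unit = combine((J, 1), (s, 1))            # J*s

# r = 4*pi*eps0*hbar^2/(m*e^2); the factor 4*pi is dimensionless
r_unit = combine((eps0_unit, 1), (hbar_unit, 2), (kg, -1), (C, -2))
print(r_unit)   # {'m': 1}
```

Everything except one power of the metre cancels, which is the same conclusion as the bracketed unit check above.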

Moreover, if we take that value for r and put it in the (total) energy formula above, we’d find that the energy of the electron is –13.6 eV. [Don’t forget to convert from joule to electronvolt when doing the calculation!] Now you can check that on the Internet too: 13.6 eV is exactly the amount of energy that’s needed to ionize a hydrogen atom (i.e. the energy that’s needed to kick the electron out of that energy well)!
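The ‘find the bottom of the well’ step can also be done by brute force: scan the rough energy formula over a range of radii and compare the minimum with the closed-form result. A minimal sketch, assuming CODATA 2018 inputs and an arbitrary 0.01 pm grid from 1 to 200 pm:

```python
import math

# CODATA 2018 values (assumed inputs)
hbar = 6.62607015e-34 / (2 * math.pi)  # reduced Planck constant, J*s
m_e = 9.1093837015e-31                 # electron mass, kg
e = 1.602176634e-19                    # elementary charge, C
eps0 = 8.8541878128e-12                # electric constant, F/m
k = e**2 / (4 * math.pi * eps0)        # e^2/(4 pi eps0), J*m

def energy(r):
    """Rough total energy hbar^2/(2 m r^2) - e^2/(4 pi eps0 r), in joules."""
    return hbar**2 / (2 * m_e * r**2) - k / r

# Brute-force scan of r from 1 pm to 200 pm in 0.01 pm steps
radii = [i * 1e-14 for i in range(100, 20_001)]
r_min = min(radii, key=energy)

r_formula = hbar**2 / (m_e * k)   # same as 4 pi eps0 hbar^2/(m e^2)
E_min_eV = energy(r_formula) / e  # convert J to eV

print(f"scan: {r_min * 1e12:.2f} pm, formula: {r_formula * 1e12:.2f} pm")
print(f"E at minimum: {E_min_eV:.2f} eV")
```

Both radii come out at about 52.9 pm, and the energy at the minimum at about –13.6 eV, i.e. the ionization energy of hydrogen with the sign flipped.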

Wow! Isn’t it great that such simple calculations yield such great results? 🙂 [Of course, you’ll note that the omission of the 1/2 factor in the Uncertainty Principle was quite strategic. :-)] Using the $r = 4\pi\varepsilon_0\hbar^2/m_e e^2$ formula for the Bohr radius, you can now easily check the $r_e = \alpha^2 r$ formula. You should find what we jotted down already: the classical electron radius is equal to $r_e = e^2/4\pi\varepsilon_0 m_e c^2$. To be precise, $r_e$ = (53×10⁻⁶)·(53×10⁻¹² m) ≈ 2.8×10⁻¹⁵ m. Now that’s again something you should check on the Internet. Guess what? […] It’s right on the spot again. 🙂

We can now also check that $\alpha = m_e\cdot r_e$ formula: $\alpha = m_e\cdot r_e$ = 4.181×10⁻²³ times… Hey! Wait! We have to express $r_e$ in Planck units as well, of course! Now, (2.81794×10⁻¹⁵ m)/(1.616×10⁻³⁵ m) ≈ 1.7438×10²⁰. So now we get 4.181×10⁻²³ times 1.7438×10²⁰ = 7.29×10⁻³ = 0.00729 ≈ 1/137. Bingo! We got the magic number once again. 🙂

So… Well… Doesn’t that confirm we actually do have it all with α?

Well… Yes and no… First, you should note that I had to use ħ (Planck’s reduced constant) in that calculation of the Bohr radius. Moreover, the other physical constants (most notably c and the Coulomb constant) were actually there as well, ‘in the background’ so to speak, because one needs them to derive the formulas we used above. And then we have the equations themselves, of course, most notably that Uncertainty Principle… So… Well…

It’s not like God gave us one number only (α) and that all the rest flows out of it. We have a whole bunch of ‘fundamental’ relations and ‘fundamental’ constants here.

Having said that, that statement does not diminish the magic of alpha.

Hmm… Now you’ll wonder: how many? How many constants do we need in all of physics?

Well… I’d say, you should not only ask about the constants: you should also ask about the equations: how many equations do we need in all of physics? [Just for the record, I had to smile when the Hawking of the movie says that he’s actually looking for one formula that sums up all of physics. Frankly, that’s a nonsensical statement. Hence, I think the real Hawking never said anything like that. Or, if he did, that it was one of those statements one needs to interpret very carefully.]

But let’s look at a few constants indeed. For example, if we have c, h and α, then we know the ratio $e^2/\varepsilon_0 = 2\alpha hc$ and, hence, once we have the electric charge e, we also have the electric constant $\varepsilon_0 = e^2/2\alpha hc$. From that, we get Coulomb’s constant ke, because ke is defined as 1/4πε0… But…
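For what it’s worth, this is how the electric constant is obtained in practice today: since the 2019 revision of the SI, e, h and c have exact defined values, and ε0 follows from a measured α via $\varepsilon_0 = e^2/2\alpha hc$. A minimal sketch, using the CODATA 2018 value of α as an assumed input:

```python
import math

e = 1.602176634e-19       # elementary charge, C (exact in the 2019 SI)
h = 6.62607015e-34        # Planck constant, J*s (exact in the 2019 SI)
c = 299_792_458.0         # speed of light, m/s (exact)
alpha = 7.2973525693e-3   # fine-structure constant (measured, CODATA 2018)

eps0 = e**2 / (2 * alpha * h * c)   # from alpha = e^2/(2 eps0 h c)
k_e = 1 / (4 * math.pi * eps0)      # Coulomb constant

print(f"eps0 = {eps0:.6e} F/m")
print(f"k_e  = {k_e:.6e} N*m^2/C^2")
```

This gives ε0 ≈ 8.854×10⁻¹² F/m and ke ≈ 8.988×10⁹ N·m²/C², the familiar ‘9×10⁹’ used in the Bohr radius calculation above.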

Hey! Wait a minute! How do we know that ke = 1/4πε0? Well… From experiment. But… Yes? That means 1/4π is some fundamental proportionality coefficient too, isn’t it?

Wow! You’re smart. That’s a good and valid remark. In fact, we use the so-called reduced Planck constant ħ in a number of calculations, and so that involves a 2π factor too (ħ = h/2π). Hence… Well… Yes, perhaps we should consider 2π as some fundamental constant too! And, then, well… Now that I think of it, there’s a few other mathematical constants out there, like Euler’s number e, for example, which we use in complex exponentials.


I am joking, right? I am not saying that 2π and Euler’s number are fundamental ‘physical’ constants, am I? [Note that it’s a bit of a nuisance that we’re also using the symbol e for Euler’s number, but we’re not talking about the electron charge here: we’re talking about that 2.71828… number that’s used in so-called ‘natural’ exponentials and logarithms.]

Well… Yes and no. They’re mathematical constants indeed, rather than physical, but… Well… I hope you get my point. What I want to show here is that it’s quite hard to say what’s fundamental and what isn’t. We can actually pick and choose a bit among all those constants and all those equations. As one physicist puts it: it depends on how we slice it. The one thing we know for sure is that a great many things are related, in a physical way (α connects all of the fundamental properties of the electron, for example) and/or in a mathematical way (2π connects not only the circumference of the unit circle with the radius but quite a few other constants as well!), but… Well… What to say? It’s a tough discussion and I am not smart enough to give you an unambiguous answer. From what I gather on the Internet, when looking at the whole Standard Model (including the strong force, the weak force and the Higgs field), we’ve got a few dozen physical ‘fundamental’ constants, and then a few mathematical ones as well.

That’s a lot, you’ll say. Yes. At the same time, it’s not an awful lot. Whatever number it is, it does raise a very fundamental question: why are they what they are? That brings us back to that ‘fine-tuning’ problem. Now, I can’t make this post too long (it’s way too long already), so let me just conclude this discussion by copying Wikipedia on that question, because what it has on this topic is not so bad:

“Some physicists have explored the notion that if the physical constants had sufficiently different values, our Universe would be so radically different that intelligent life would probably not have emerged, and that our Universe therefore seems to be fine-tuned for intelligent life. The anthropic principle states a logical truism: the fact of our existence as intelligent beings who can measure physical constants requires those constants to be such that beings like us can exist.”

I like this. But the article then adds the following, which I do not like so much, because I think it’s a bit too ‘frivolous’:

“There are a variety of interpretations of the constants’ values, including that of a divine creator (the apparent fine-tuning is actual and intentional), or that ours is one universe of many in a multiverse (e.g. the many-worlds interpretation of quantum mechanics), or even that, if information is an innate property of the universe and logically inseparable from consciousness, a universe without the capacity for conscious beings cannot exist.”

Hmm… As said, I am quite happy with the logical truism: we are here because alpha (and a whole range of other stuff) is what it is, and we can measure alpha (and a whole range of other stuff) as what it is, because… Well… Because we’re here. Full stop. As for the ‘interpretations’, I’ll let you think about that for yourself. 🙂

I need to get back to the lesson. Indeed, this was just a ‘digression’. My post was about the three fundamental events or actions in quantum electrodynamics, and so I was talking about that E(A to B) formula. However, I had to do that digression on alpha to ensure you understand what I want to write about that. So let me now get back to it. End of digression. 🙂

The E(A to B) formula

Indeed, I must assume that, with all these digressions, you are truly despairing now. Don’t. We’re there! We’re finally ready for the E(A to B) formula! Let’s go for it.

We’ve now got those two numbers measuring the electron charge and the electron mass in Planck units respectively. They’re fundamental indeed and so let’s loosen up on notation and just write them as e and m respectively. Let me recap:

1. The value of e is approximately –0.08542455, and it corresponds to the so-called junction number j, which is the amplitude for an electron-photon coupling. When multiplying it with another amplitude (to find the amplitude for an event consisting of two sub-events, for example), it corresponds to a ‘shrink’ to less than one-tenth (something like 8.5% indeed, corresponding to the magnitude of e) and a ‘rotation’ (or a ‘turn’) over 180 degrees, as mentioned above.

Please note what’s going on here: we have a physical quantity, the electron charge (expressed in Planck units), and we use it in a quantum-mechanical calculation as a dimensionless (complex) number, i.e. as an amplitude. So… Well… That’s what physicists mean when they say that the charge of some particle (usually the electric charge but, in quantum chromodynamics, it will be the ‘color’ charge of a quark) is a ‘coupling constant’.

2. We also have m, the electron mass, and we’ll use it in the same way, i.e. as some dimensionless amplitude. As compared to j, it’s a very tiny number: approximately 4.181×10⁻²³. So if you look at it as an amplitude, indeed, then it corresponds to an enormous ‘shrink’ (but no turn) of the amplitude(s) that we’ll be combining it with.
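If you want to check these two numbers yourself, here’s a small Python sketch. It assumes the CODATA 2018 values for α, the electron mass and the Planck mass. With those values, m comes out at about 4.185×10⁻²³, within roughly 0.1% of the 4.181×10⁻²³ I use in this post.

```python
import math

# Assumed inputs: CODATA 2018 values.
ALPHA = 7.297_352_5693e-3        # fine-structure constant α
M_ELECTRON = 9.109_383_7015e-31  # electron rest mass (kg)
M_PLANCK = 2.176_434e-8          # Planck mass (kg)

# Junction number j: the electron charge in Planck units, i.e. -sqrt(α).
# As an amplitude it means a shrink to |j| plus a 180° turn (the minus sign).
j = -math.sqrt(ALPHA)

# Electron mass in Planck units: a dimensionless ratio.
m = M_ELECTRON / M_PLANCK

print(f"j ≈ {j:.6f}  (shrink to {abs(j):.1%}, turn of 180°)")
print(f"m ≈ {m:.3e}")
```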

So… Well… How do we do it?

Well… At this point, Leighton goes a bit off-track. Just a little bit. 🙂 From what he writes, it’s obvious that he assumes the frequency (or, what amounts to the same, the de Broglie wavelength) of an electron is just like the frequency of a photon. Frankly, I just can’t imagine why and how Feynman let this happen. It’s wrong. Plain wrong. As I mentioned in my introduction already, an electron traveling through space is not like a photon traveling through space.

For starters, an electron is much slower (because it’s a matter-particle: hence, it’s got mass). Secondly, the de Broglie wavelength and/or frequency of an electron is not like that of a photon. For example, if we take an electron and a photon having the same energy, let’s say 1 eV (that corresponds to infrared light), then the de Broglie wavelength of the electron will be 1.23 nano-meter (i.e. 1.23 billionths of a meter). Now that’s about one thousand times smaller than the wavelength of our 1 eV photon, which is about 1240 nm. You’ll say: how is that possible? If they have the same energy, then the f = E/h and ν = E/h should give the same frequency and, hence, the same wavelength, no?

Well… No! Not at all! Because an electron, unlike the photon, has a rest mass indeed, amounting to no less than 0.511 MeV/c² (note the rather particular MeV/c² unit: it’s from the E = mc² formula), one should use a different energy value! Indeed, we should include the rest mass energy, which is 0.511 MeV. So, almost all of the energy here is rest mass energy! There’s also another complication. For the photon, there is an easy relationship between the wavelength and the frequency: it has no mass and, hence, all its energy is kinetic, or movement so to say, and so we can use that ν = E/h relationship to calculate its frequency ν: it’s equal to ν = E/h = (1 eV)/(4.13567×10⁻¹⁵ eV·s) ≈ 0.242×10¹⁵ Hz = 242 terahertz (1 THz = 10¹² oscillations per second). Now, knowing that light travels at the speed of light, we can check the result by calculating the wavelength using the λ = c/ν relation. Let’s do it: (2.998×10⁸ m/s)/(242×10¹² Hz) ≈ 1240 nm. So… Yes, done!
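The photon calculation above is easy to script. This is just a sketch of the two relations ν = E/h and λ = c/ν, assuming the CODATA 2018 value of h in eV·s.

```python
# Photon check: ν = E/h, then λ = c/ν (CODATA 2018 values assumed).
H_EV = 4.135_667_696e-15   # Planck constant in eV·s
C_LIGHT = 299_792_458.0    # speed of light (m/s)


def photon_frequency_hz(energy_ev: float) -> float:
    """ν = E/h: all of a photon's energy goes into its frequency."""
    return energy_ev / H_EV


def photon_wavelength_m(energy_ev: float) -> float:
    """λ = c/ν, since a photon travels at c."""
    return C_LIGHT / photon_frequency_hz(energy_ev)


nu = photon_frequency_hz(1.0)
lam = photon_wavelength_m(1.0)
print(f"ν ≈ {nu / 1e12:.0f} THz, λ ≈ {lam * 1e9:.0f} nm")  # ν ≈ 242 THz, λ ≈ 1240 nm
```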

But so we’re talking photons here. For the electron, the story is much more complicated. That wavelength I mentioned was calculated using the other of the two de Broglie relations: λ = h/p. So that uses the momentum of the electron which, as you know, is the product of its mass (m) and its velocity (v): p = mv. You can amuse yourself and check if you find the same wavelength (1.23 nm): you should! From the other de Broglie relation, f = E/h, you can also calculate its frequency: for an electron moving at non-relativistic speeds, it’s about 0.123×10²¹ Hz, so that’s like 500,000 times the frequency of the photon we were looking at! When multiplying the frequency and the wavelength, we should get its speed. However, that’s where we get in trouble. Here’s the problem with matter waves: they have a so-called group velocity and a so-called phase velocity. The idea is illustrated below: the green dot travels with the wave packet (and, hence, its velocity corresponds to the group velocity) while the red dot travels with the oscillation itself, and so that’s the phase velocity. [You should also remember, of course, that the matter wave is some complex-valued wavefunction, so we have both a real as well as an imaginary part oscillating and traveling through space.]
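Here’s a rough Python sketch of that electron calculation, so you can compare it with the photon numbers. It assumes CODATA 2018 values, treats the 1 eV as kinetic energy, and uses the non-relativistic momentum p = √(2m·K.E.), which is fine at this energy; the frequency, however, uses the total energy, rest mass energy included.

```python
import math

# CODATA 2018 values in SI units (assumed inputs).
H = 6.626_070_15e-34          # Planck constant (J·s)
EV = 1.602_176_634e-19        # one electronvolt in joule
M_E = 9.109_383_7015e-31      # electron rest mass (kg)
C = 299_792_458.0             # speed of light (m/s)


def electron_wavelength_m(kinetic_ev: float) -> float:
    """λ = h/p, with the non-relativistic momentum p = sqrt(2·m·K.E.)."""
    p = math.sqrt(2 * M_E * kinetic_ev * EV)
    return H / p


def electron_frequency_hz(kinetic_ev: float) -> float:
    """f = E/h, where E includes the rest-mass energy m·c² (about 0.511 MeV)."""
    total_energy_j = M_E * C**2 + kinetic_ev * EV
    return total_energy_j / H


lam_e = electron_wavelength_m(1.0)
f_e = electron_frequency_hz(1.0)
f_photon = EV / H   # frequency of a 1 eV photon, for comparison
print(f"λ ≈ {lam_e * 1e9:.2f} nm")
print(f"f ≈ {f_e:.3e} Hz, about {f_e / f_photon:.2e} times the 1 eV photon's frequency")
```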

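[Illustration: a wave packet, with a green dot moving at the group velocity and a red dot moving at the phase velocity]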

To be precise, the phase velocity will be superluminal. Indeed, using the usual relativistic formulas, we can write that p = γm₀v and E = γm₀c², with v the (classical) velocity of the electron and c what it always is, i.e. the speed of light. Hence, λ = h/(γm₀v) and f = γm₀c²/h, and so λf = c²/v. Because v is (much) smaller than c, we get a superluminal velocity. However, that’s the phase velocity indeed, not the group velocity, which corresponds to v. OK… I need to end this digression.
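The λf = c²/v result is easy to verify numerically. The sketch below (CODATA 2018 values assumed, 1 eV of kinetic energy again) computes γ, v, λ = h/p and f = E/h relativistically, and shows that λ·f matches c²/v and is larger than c.

```python
import math

# CODATA 2018 values in SI units (assumed inputs).
H = 6.626_070_15e-34       # Planck constant (J·s)
EV = 1.602_176_634e-19     # one electronvolt in joule
M0 = 9.109_383_7015e-31    # electron rest mass (kg)
C = 299_792_458.0          # speed of light (m/s)


def velocities(kinetic_ev: float) -> tuple[float, float]:
    """Return (v, λ·f) for an electron with the given kinetic energy."""
    rest_j = M0 * C**2
    total_j = rest_j + kinetic_ev * EV          # E = γ·m0·c²
    gamma = total_j / rest_j
    v = C * math.sqrt(1 - 1 / gamma**2)         # classical (group) velocity
    p = gamma * M0 * v                          # p = γ·m0·v
    wavelength = H / p                          # λ = h/p
    frequency = total_j / H                     # f = E/h
    return v, wavelength * frequency            # phase velocity λ·f


v, v_phase = velocities(1.0)
print(f"v ≈ {v:.3e} m/s, λf ≈ {v_phase:.3e} m/s, c²/v ≈ {C**2 / v:.3e} m/s")
```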

So what? Well, to make a long story short, the ‘amplitude framework’ for electrons is different. Hence, the story that I’ll be telling here is different from what you’ll read in Feynman’s QED. I will use his drawings, though, and his concepts. Indeed, despite my misgivings above, the conceptual framework is sound, and so the corrections to be made are relatively minor.

So… We’re looking at E(A to B), i.e. the amplitude for an electron to go from point A to B in spacetime, and I said the conceptual framework is exactly the same as that for a photon. Hence, the electron can follow any path really. It may go in a straight line and travel at a speed that’s consistent with what we know of its momentum (p), but it may also follow other paths. So, just like the photon, we’ll have some so-called propagator function, which gives you amplitudes based on the distance in space as well as on the distance in ‘time’ between two points. Now, Ralph Leighton identifies that propagator function with the propagator function for the photon, i.e. P(A to B), but that’s wrong: it’s not the same.

The propagator function for an electron depends on its mass and its velocity, and/or on the combination of both (like its momentum p = mv and/or its kinetic energy: K.E. = mv²/2 = p²/2m). So we have a different propagator function here. However, I’ll use the same symbol for it: P(A to B).

So, the bottom line is that, because of the electron’s mass (which, remember, is a measure for inertia), momentum and/or kinetic energy (which, remember, are conserved in physics), the straight line is definitely the most likely path, but (big but!), just like the photon, the electron may follow some other path as well.

So how do we formalize that? Let’s first associate an amplitude P(A to B) with an electron traveling from point A to B in a straight line and in a time that’s consistent with its velocity. Now, as mentioned above, the P here stands for propagator function, not for photon, so we’re talking a different P(A to B) here than that P(A to B) function we used for the photon. Sorry for the confusion. 🙂 The left-hand diagram below then shows what we’re talking about: it’s the so-called ‘one-hop flight’, and so that’s what the P(A to B) amplitude is associated with.

[Diagram: one-hop, two-hop and three-hop flights of an electron from A to B]

Now, the electron can follow other paths. For photons, we said the amplitude depended on the spacetime interval I: when negative or positive (i.e. paths that are not associated with the photon traveling in a straight line and/or at the speed of light), the contribution of those paths to the final amplitudes (or ‘final arrow’, as it was called) was smaller.

For an electron, we have something similar, but it’s modeled differently. We say the electron could take a ‘two-hop flight’ (via point C or C’), or a ‘three-hop flight’ (via D and E) from point A to B. Now, it makes sense that these paths should be associated with amplitudes that are much smaller. Now that’s where that n-factor comes in. We just put some real number n in the formula for the amplitude for an electron to go from A to B via C, which we write as:

$$P(\text{A to C}) \cdot n^2 \cdot P(\text{C to B})$$

Note what’s going on here. We multiply two amplitudes, P(A to C) and P(C to B), which is OK, because that’s what the rules of quantum mechanics tell us: if an ‘event’ consists of two sub-events, we need to multiply the amplitudes (not the probabilities) in order to get the amplitude that’s associated with both sub-events happening. However, we add an extra factor: n². Note that it must be some very small number because we have lots of alternative paths and, hence, they should not be very likely! So what’s the n? And why n² instead of just n?

Well… Frankly, I don’t know. Ralph Leighton boldly equates n to the mass of the electron. Now, because he obviously means the mass expressed in Planck units, that’s the same as saying n is the electron’s energy (again, expressed in Planck’s ‘natural’ units), so n should be that number m = mₑ/m_P = Eₑ/E_P ≈ 4.181×10⁻²³ (with m_P and E_P the Planck mass and the Planck energy). However, I couldn’t find any confirmation on the Internet, or elsewhere, of the suggested n = m identity, so I’ll assume n = m indeed, but… Well… Please check for yourself. It seems the answer is to be found in a mathematical theory that helps physicists to actually calculate j and n from experiment. It’s referred to as perturbation theory, and it’s the next thing on my study list. As for now, however, I can’t help you much. I can only note that the equation makes sense.

Of course, it does: inserting a tiny little number n, close to zero, ensures that those other amplitudes don’t contribute too much to the final ‘arrow’. And it also makes a lot of sense to associate it with the electron’s mass: if mass is a measure of inertia, then it should be some factor reducing the amplitude that’s associated with the electron following such a crooked path. So let’s go along with it, and see what comes out of it.

A three-hop flight is even weirder and uses that n² factor two times:

$$P(\text{A to E}) \cdot n^2 \cdot P(\text{E to D}) \cdot n^2 \cdot P(\text{D to B})$$

So we have an (n²)² = n⁴ factor here, which is good, because two intermediate hops should be much less likely than one. So what do we get? Well… (4.181×10⁻²³)⁴ ≈ 305×10⁻⁹². Pretty tiny, huh? 🙂 Of course, any point in space is a potential hop for the electron’s flight from point A to B and, hence, there are a lot of paths and a lot of amplitudes (or ‘arrows’ if you want), which, again, is consistent with a very tiny value for n indeed.
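Just to see how fast those extra factors kill the multi-hop contributions, here’s a tiny sketch. It hypothetically follows Leighton’s n = m, which, as I said, I could not confirm, and only computes the extra n² factors, not the propagators themselves.

```python
# Hypothetical, following Leighton's n = m, which the post treats as unconfirmed.
N = 4.181e-23   # n, taken equal to the electron mass in Planck units


def hop_factor(hops: int, n: float = N) -> float:
    """Extra factor for a flight with the given number of hops: one n² per intermediate point."""
    return (n**2) ** (hops - 1)


for hops in (1, 2, 3):
    print(f"{hops}-hop flight: extra factor {hop_factor(hops):.3e}")
```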

So, to make a long story short, E(A to B) will be a giant sum (i.e. some kind of integral indeed) of a lot of different ways an electron can go from point A to B. It will be a series of terms P(A to B) + P(A to C)∗n²∗P(C to B) + P(A to E)∗n²∗P(E to D)∗n²∗P(D to B) + … for all possible intermediate points C, D, E, and so on.

What about the j? The junction number or coupling constant. How does that show up in the E(A to B) formula? Well… Those alternative paths with hops here and there are actually the easiest bit of the whole calculation. Apart from taking some strange path, electrons can also emit and/or absorb photons during the trip. In fact, they do that constantly. Indeed, the image of an electron ‘in orbit’ around the nucleus is that of an electron exchanging so-called ‘virtual’ photons constantly, as illustrated below. So our image of an electron absorbing and then emitting a photon (see the diagram on the right-hand side) is really like the tiny tip of a giant iceberg: most of what’s going on is underneath! So that’s where our junction number j comes in, i.e. the charge (e) of the electron.

So, when you hear that a coupling constant is actually equal to the charge, then this is what it means: you should just note it’s the charge expressed in Planck units. But it’s a deep connection, isn’t it? When everything is said and done, a charge is something physical, but here, in these amplitude calculations, it just shows up as some dimensionless negative number, used in multiplications and additions of amplitudes. Isn’t that remarkable?

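[Diagrams: an electron exchanging virtual photons (left); an electron absorbing and then emitting a photon (right)]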

The situation becomes even more complicated when more than one electron is involved. For example, two electrons can go in a straight line from points 1 and 2 to points 3 and 4 respectively, but there are two ways in which this can happen, and they might exchange photons along the way, as shown below. If there are two alternative ways in which one event can happen, you know we have to add amplitudes, rather than multiply them. Hence, the formula for E(A to B) becomes even more complicated.


Moreover, a single electron may first emit and then absorb a photon itself, so there’s no need for other particles to be there to have lots of j factors in our calculation. In addition, that photon may briefly disintegrate into an electron and a positron, which then annihilate each other to again produce a photon: in case you wondered, that’s what those little loops in those diagrams depicting the exchange of virtual photons are supposed to represent. So, every single junction (i.e. every emission and/or absorption of a photon) involves a multiplication with that junction number j, so if there are two couplings involved, we have a j² factor, and so that’s 0.08542455² = α ≈ 0.0073. Four couplings imply a factor of 0.08542455⁴ ≈ 0.000053.
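Here’s a quick sketch of those coupling factors, assuming the CODATA 2018 value of α and j = −√α. Note that an odd number of couplings gives a negative factor: each j also turns the arrow over 180 degrees.

```python
import math

ALPHA = 7.297_352_5693e-3   # fine-structure constant (CODATA 2018, assumed input)
j = -math.sqrt(ALPHA)       # junction number

# Each emission or absorption of a photon multiplies the amplitude by j once.
for couplings in range(2, 7):
    print(f"{couplings} couplings: factor {j**couplings:+.2e}")
```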

Just as an example, I copy two diagrams involving four, five or six couplings indeed. They all have some ‘incoming’ photon, because Feynman uses them to explain something else (the so-called magnetic moment of the electron), but it doesn’t matter: the same illustrations can serve multiple purposes.

[Diagrams: electron paths with four, five or six couplings, each with an incoming photon]

Now, it’s obvious that the contributions of the alternatives with many couplings add almost nothing to the final amplitude – just like the ‘many-hop’ flights add almost nothing – but… Well… As tiny as these contributions are, they are all there, and so they all have to be accounted for. So… Yes. You can easily appreciate how messy it all gets, especially in light of the fact that there are so many points that can serve as a ‘hop’ or a ‘coupling’ point!

So… Well… Nothing. That’s it! I am done! I realize this has been another long and difficult story, but I hope you appreciated it and that it shed some light on what’s really behind those simplified stories of what quantum mechanics is all about. It’s all weird and, admittedly, not so easy to understand, but I wouldn’t say an understanding is really beyond the reach of us, common mortals. 🙂

Post scriptum: When you’ve reached here, you may wonder: so where’s the final formula then for E(A to B)? Well… I have no easy formula for you. From what I wrote above, it should be obvious that we’re talking some really awful-looking integral and, because it’s so awful, I’ll let you find it yourself. 🙂

I should also note another reason why I am reluctant to identify n with m. The formulas in Feynman’s QED are definitely not the standard ones. The more standard formulations will use the gauge coupling parameter about which I talked already. I sort of discussed it, indirectly, in my first comments on Feynman’s QED, when I criticized some other part of the book, notably its explanation of the phenomenon of diffraction of light, which basically boiled down to: “When you try to squeeze light too much [by forcing it to go through a small hole], it refuses to cooperate and begins to spread out”, because “there are not enough arrows representing alternative paths.”

Now that raises a lot of questions, and very sensible ones, because that simplification is nonsensical. Not enough arrows? That statement doesn’t make sense. We can subdivide space into as many paths as we want, and probability amplitudes don’t take up any physical space. We can cut up space into smaller and smaller pieces (so we analyze more paths within the same space). The consequence, in terms of arrows, is that the directions of our arrows won’t change but their lengths will be much smaller, as we’re analyzing many more paths. That’s because of the normalization constraint. However, when adding them all up (a lot of very tiny ones, or a smaller bunch of bigger ones), we’ll still get the same ‘final’ arrow. That’s because the direction of those arrows depends on the length of the path, and the length of the path doesn’t change simply because we suddenly decide to use some other ‘gauge’.
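To illustrate that argument, here’s a toy sketch in Python. The two-dimensional geometry (source, slit width, detector position, wavelength) is entirely made up, and giving every path the same weight 1/N is a simplifying assumption of mine. Each arrow turns by k·L, with L the length of the path through a point of the slit. The point is only that slicing the slit ten or a hundred times more finely gives many more, much shorter arrows, but practically the same final arrow.

```python
import cmath
import math


def final_arrow(n_paths: int, wavelength: float = 1.0, width: float = 5.0,
                source=(-50.0, 0.0), detector=(50.0, 3.0)) -> complex:
    """Add up n_paths arrows through a slit (at x = 0, |y| <= width/2).

    Each arrow turns by k·L, where L is the source -> slit point -> detector path length,
    and has length 1/n_paths (the normalization), so finer slicing gives smaller arrows.
    """
    k = 2 * math.pi / wavelength
    total = 0j
    for i in range(n_paths):
        y = -width / 2 + (i + 0.5) * width / n_paths   # midpoint of the i-th slice
        path_length = math.dist(source, (0.0, y)) + math.dist((0.0, y), detector)
        total += cmath.exp(1j * k * path_length) / n_paths
    return total


for n in (1_000, 10_000, 100_000):
    arrow = final_arrow(n)
    print(f"{n:>7} paths: |arrow| = {abs(arrow):.6f}, angle = {cmath.phase(arrow):.6f} rad")
```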

Indeed, the real question is: what’s a ‘small’ hole? What’s ‘small’ and what’s ‘large’ in quantum electrodynamics? Now, I gave an intuitive answer to that question in that post of mine, but it’s much more accurate than Feynman’s, or Leighton’s. The answer to that question is: there’s some kind of natural ‘gauge’, and it’s related to the wavelength. So the wavelength of a photon, or an electron, in this case, comes with some kind of scale indeed. That’s why the fine-structure constant is often written in yet another form:

$$\alpha = \frac{2\pi r_e}{\lambda_e} = r_e k_e$$

λₑ and kₑ are the Compton wavelength and wavenumber of the electron (so kₑ is not the Coulomb constant here). The Compton wavelength is the de Broglie wavelength of the electron. [You’ll find that Wikipedia defines it as “the wavelength that’s equivalent to the wavelength of a photon whose energy is the same as the rest-mass energy of the electron”, but that’s a very confusing definition, I think.]
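A quick numerical check of this form of α, as a sketch assuming the CODATA 2018 values for the classical electron radius rₑ and the Compton wavelength λₑ = h/(mₑc):

```python
import math

# CODATA 2018 values (assumed inputs).
R_E = 2.817_940_3262e-15          # classical electron radius r_e (m)
LAMBDA_E = 2.426_310_238_67e-12   # electron Compton wavelength λ_e = h/(m_e·c) (m)
ALPHA = 7.297_352_5693e-3         # fine-structure constant, for comparison

k_e = 2 * math.pi / LAMBDA_E      # Compton wavenumber (not Coulomb's constant!)
alpha_from_wavelength = 2 * math.pi * R_E / LAMBDA_E
alpha_from_wavenumber = R_E * k_e

print(f"α ≈ {alpha_from_wavelength:.10f}, 1/α ≈ {1 / alpha_from_wavelength:.3f}")
```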

The point to note is that the spatial dimension in both the analysis of photons as well as of matter waves, especially in regard to studying diffraction and/or interference phenomena, is related to the frequencies, wavelengths and/or wavenumbers of the wavefunctions involved. There’s a certain ‘gauge’ involved indeed, i.e. some measure that is relative, like the gauge pressure illustrated below. So that’s where that gauge parameter g comes in. And the fact that it’s yet another number that’s closely related to that fine-structure constant is… Well… Again… That alpha number is a very magic number indeed… 🙂


Post scriptum (5 October 2015):

Much stuff in physics is quite ‘magical’, but it’s never ‘too magical’. I mean: there’s always an explanation. So there is a very logical explanation for the above-mentioned deep connection between the charge of an electron, its energy and/or mass, its various radii (or physical dimensions) and the coupling constant too. I wrote a piece about that, much later than when I wrote the piece above. I would recommend you read that piece too. It’s a piece in which I do take the magic out of ‘God’s number’. Understanding it involves a deep understanding of electromagnetism, however, and that requires some effort. It’s surely worth the effort, though.

Fields and charges (II)

My previous post was, perhaps, too full of formulas, without offering much reflection. Let me try to correct that here by tying up a few loose ends. The first loose end is about units. Indeed, I haven’t been very clear about that, so let me be somewhat more precise on that now.

[Note: In case you’re not interested in units, you can skip the first part of this post. However, please do look at the section on the electric constant ε₀ and, most importantly, the section on natural units (especially Planck units), as I will touch upon the topic of gauge coupling parameters there and, hence, on quantum mechanics. Also, the third and last part, on the theoretical contradictions inherent in the idea of point charges, may be of interest to you.]

The field energy integrals

When we wrote down that u = ε₀E²/2 formula for the energy density of an electric field (see my previous post on fields and charges for more details), we noted that the 1/2 factor was there to avoid double-counting. Indeed, those volume integrals we use to calculate the energy over all space (i.e. U = ∫u dV) count the energy that’s associated with a pair of charges (or, to be precise, charge elements) twice and, hence, they have a 1/2 factor in front. Indeed, as Feynman notes, there is no convenient way, unfortunately, of writing an integral that keeps track of the pairs so that each pair is counted just once. In fact, I’ll have to come back to that assumption of there being ‘pairs’ of charges later, as that’s another loose end in the theory.

[Figures: the two volume integrals for the energy U, each with a 1/2 factor in front]

Now, we also said that the ε₀ factor in the second integral (i.e. the one with the vector dot product E·E = |E||E|cos(0) = E²) is there to make the units come out alright. Now, what does that mean, really? I’ll explain. Let me first make a few obvious remarks:

  1. Densities are always measured per unit volume, and the unit volume is the cubic meter (m³). That’s, obviously, an astronomical unit at the atomic or molecular scale.
  2. For historical reasons, the conventional unit of charge is not the so-called elementary charge +e (i.e. the charge of a proton), but the coulomb. Hence, the charge density ρ is expressed in coulomb per cubic meter (C/m³). The coulomb is a rather astronomical unit too, at least at the atomic or molecular scale: 1 e ≈ 1.6022×10⁻¹⁹ C. [I am rounding here to four digits after the decimal point.]
  3. Energy is in joule (J) and that’s, once again, a rather astronomical unit at the lower end of the scales. Indeed, theoretical physicists prefer to use the electronvolt (eV), which is the energy gained (or lost) when an electron (so that’s a charge of –e, i.e. minus e) moves across a potential difference of one volt. But we’ll stick to the joule for now, not the eV, because the joule is the SI unit that’s used when defining most electrical units, such as the ampere, the watt and… Yes. The volt. Let’s start with that one.

The volt

The volt unit (V) measures both potential (energy) as well as potential difference (in both cases, we mean electric potential only, of course). Now, from all that you’ve read so far, it should be obvious that potential (energy) can only be measured with respect to some reference point. In physics, the reference point is infinity, which is so far away from all charges that there is no influence there. Hence, any charge we’d bring there (i.e. at infinity) will just stay where it is and not be attracted or repelled by anything. We say the potential there is zero: Φ(∞) = 0. The choice of that reference point allows us, then, to define positive or negative potential: the potential near positive charges will be positive and, vice versa, the potential near negative charges will be negative. Likewise, the potential difference between the positive and negative terminal of a battery will be positive.

So you should just note that we measure both potential and potential difference in volt and, hence, let’s now answer the question of what a volt really is. The answer is quite straightforward: the potential at some point r = (x, y, z) measures the work done per unit of charge when bringing a positive charge from infinity to that point. In SI units, the unit of charge is the coulomb, so it’s only natural that we define one volt as one joule per coulomb:

1 volt = 1 joule/coulomb (1 V = 1 J/C).

Also note the following:

  1. One joule is the energy transferred (or work done) when applying a force of one newton over a distance of one meter, so one volt can also be measured in newton·meter per coulomb: 1 V = 1 J/C = 1 N·m/C.
  2. One joule can also be written as 1 J = 1 V·C.

It’s quite easy to see why that energy = volt-coulomb product makes sense: higher voltage will be associated with higher energy, and the same goes for higher charge. Indeed, the so-called ‘static’ on our body is usually associated with potential differences of thousands of volts (I am not kidding), but the charges involved are extremely small, because the ability of our body to store electric charge is minimal (i.e. the capacitance (aka capacity) of our body). Hence, the shock involved in the discharge is usually quite small: it is measured in milli-joules (mJ), indeed.
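To put some numbers on that, here’s a rough sketch. Both inputs are assumptions for illustration: a body capacitance of 100 pF (the value used in the ‘human body model’ for electrostatic-discharge testing) and a static potential of 10 kV. Note that the energy stored is ½·Q·V rather than Q·V, because the voltage builds up from zero as the charge accumulates; C is the capacitance, measured in farad, which I discuss in the next section.

```python
def charge_coulomb(capacitance_f: float, voltage_v: float) -> float:
    """Q = C·V."""
    return capacitance_f * voltage_v


def stored_energy_joule(capacitance_f: float, voltage_v: float) -> float:
    """U = ½·C·V² = ½·Q·V: the ½ because V rises from zero while the charge builds up."""
    return 0.5 * capacitance_f * voltage_v**2


C_BODY = 100e-12   # assumed body capacitance: 100 pF
V_STATIC = 10e3    # assumed static potential: 10 kV

q = charge_coulomb(C_BODY, V_STATIC)
u = stored_energy_joule(C_BODY, V_STATIC)
print(f"Q ≈ {q * 1e6:.1f} µC, U ≈ {u * 1e3:.1f} mJ")   # Q ≈ 1.0 µC, U ≈ 5.0 mJ
```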

The farad

The remark on ‘static’ brings me to another unit which I should mention in passing: the farad. It measures the capacitance (formerly known as the capacity) of a capacitor (formerly known as a condenser). A condenser consists, quite simply, of two separated conductors: it’s usually illustrated as consisting of two plates or of thin foils (e.g. aluminum foil) separated by an insulating film (e.g. waxed paper), but one can also discuss the capacity of a single body, like our human body, or a charged sphere. In both cases, however, the idea is the same: we have a ‘positive’ charge on one side (+q), and a ‘negative’ charge on the other (–q). In case of a single object, we imagine the ‘other’ charge to be some other large object (the Earth, for instance, but it can also be a car or whatever object that could potentially absorb the charge on our body) or, in case of the charged sphere, we could imagine some other sphere of much larger radius. The farad will then measure the capacity of one or both conductors to store charge.

Now, you may think we don’t need another unit here if that’s the definition: we could just express the capacity of a condenser in terms of its maximum ‘load’, couldn’t we? So that’s so many coulomb before the thing breaks down, when the waxed paper fails to separate the two opposite charges on the aluminum foil, for example. No. It’s not like that. It’s true we cannot continue to increase the charge without consequences. However, what we want to measure with the farad is another relationship. Because of the opposite charges on both sides, there will be a potential difference, i.e. a voltage difference. Indeed, a capacitor is like a little battery in many ways: it will have two terminals. Now, it is fairly easy to show that the potential difference (i.e. the voltage) between the two plates will be proportional to the charge. Think of it as follows: if we double the charges, we’re doubling the fields, right? So then we need to do twice the amount of work to carry the unit charge (against the field) from one plate to the other. Now, because the distance is the same, that means the potential difference must be twice what it was.

Now, while we have a simple proportionality here between the voltage and the charge, the coefficient of proportionality will depend on the type of conductors, their shape, the distance and the type of insulator (aka dielectric) between them, and so on and so on. Now, what’s being measured in farad is that coefficient of proportionality, which we could denote by C_Q (the proportionality coefficient for the charge), C_V (the proportionality coefficient for the voltage) or, because we should make a choice between the two, quite simply, as C. Indeed, we can either write Q = C_Q·V or, alternatively, V = C_V·Q, with C_Q = 1/C_V. As Feynman notes, “someone originally wrote the equation of proportionality as Q = CV”, so that’s what it will be: the capacitance (aka capacity) of a capacitor (aka condenser) is the ratio of the electric charge Q (on each conductor) to the potential difference V between the two conductors. So we know that’s a constant typical of the type of condenser we’re talking about. Indeed, the capacitance is the constant of proportionality defining the linear relationship between the charge and the voltage (so doubling the charge means doubling the voltage), and so we can write:

C = Q/V

Now, the charge is measured in coulomb, and the voltage is measured in volt, so the unit in which we will measure C is coulomb per volt (C/V), which is also known as the farad (F):

1 farad = 1 coulomb/volt (1 F = 1 C/V)

[Note the confusing use of the same symbol C for both the unit of charge (coulomb) and the proportionality coefficient! I am sorry about that, but that’s the convention!]

To be precise, I should add that the proportionality is generally there, but there are exceptions. More specifically, the way the charge builds up (and the way the field builds up, at the edges of the capacitor, for instance) may cause the capacitance to vary a little bit as it is being charged (or discharged). In that case, capacitance will be defined in terms of incremental changes: C = dQ/dV.

Let me conclude this section by giving you two formulas, which are also easily proved, but I will just give you the result:

  1. The capacity of a parallel-plate condenser is $C = \varepsilon_0 A/d$. In this formula, we have, once again, that ubiquitous electric constant $\varepsilon_0$ (think of it as just another coefficient of proportionality), and then $A$, i.e. the area of the plates, and $d$, i.e. the separation between the two plates.
  2. The capacity of a charged sphere of radius $r$ (so we’re talking about the capacity of a single conductor here) is $C = 4\pi\varepsilon_0 r$. This may remind you of the formula for the surface area of a sphere ($A = 4\pi r^2$), but note we’re not squaring the radius: it’s just a linear relationship with $r$.
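As a quick numerical sanity check of the two formulas above, here is a minimal Python sketch. The plate area, plate separation and sphere radius are arbitrary illustrative values (my assumptions, not from the text), and $\varepsilon_0$ is the CODATA 2018 value.

```python
import math

EPS0 = 8.8541878128e-12  # electric constant, C/(V·m) (CODATA 2018)

def parallel_plate_capacitance(area_m2: float, gap_m: float) -> float:
    """C = eps0 * A / d for an ideal parallel-plate condenser in vacuum."""
    return EPS0 * area_m2 / gap_m

def sphere_capacitance(radius_m: float) -> float:
    """C = 4 * pi * eps0 * r for an isolated conducting sphere."""
    return 4 * math.pi * EPS0 * radius_m

# Illustrative values: 1 m² plates 1 mm apart, and a sphere the size of the Earth.
c_plates = parallel_plate_capacitance(1.0, 1e-3)
c_earth = sphere_capacitance(6.371e6)
print(f"plates: {c_plates:.3e} F")              # about 8.854e-09 F
print(f"Earth-sized sphere: {c_earth:.3e} F")   # about 7.089e-04 F
```

Even an Earth-sized sphere stays below a millifarad, which shows what a large unit the farad is.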

I am not giving you these two formulas to show off or fill the pages, but because they’re so ubiquitous and you’ll need them. In fact, I’ll need the second formula in this post when talking about the other ‘loose end’ that I want to discuss.

Other electrical units

From your high school physics classes, you know the ampere and the watt, of course:

  1. The ampere is the unit of current, so it measures the quantity of charge moving or circulating per second. Hence, one ampere is one coulomb per second: 1 A = 1 C/s.
  2. The watt measures power. Power is the rate of energy conversion or transfer with respect to time. One watt is one joule per second: 1 W = 1 J/s = 1 N·m/s. Also note that we can write power as the product of current and voltage: 1 W = (1 A)·(1 V) = (1 C/s)·(1 J/C) = 1 J/s.

Now, because electromagnetism is such a well-developed theory and, more importantly, because it has so many engineering and household applications, there are many other units out there, such as:

  • The ohm (Ω): that’s the unit of electrical resistance. Let me quickly define it: the ohm is defined as the resistance between two points of a conductor when a (constant) potential difference (V) of one volt, applied to these points, produces a current (I) of one ampere. So resistance (R) is another proportionality coefficient: R = V/I, and 1 ohm (Ω) = 1 volt/ampere (V/A). [Again, note the (potential) confusion caused by the use of the same symbol (V) for voltage (i.e. the difference in potential) as well as its unit (volt).] Now, note that it’s often useful to write the relationship as V = R·I, so that gives the potential difference as the product of the resistance and the current.
  • The weber (Wb) and the tesla (T): these are the units of magnetic flux (i.e. the flux of the magnetic field through some surface) and magnetic flux density (i.e. the magnitude of the field vector B itself: one tesla = one weber per square meter) respectively. So these have to do with the field vector B, rather than E, and so we won’t talk about them here.
  • The henry (H): that’s the unit of (electromagnetic) inductance. It’s also linked to the magnetic effect. Indeed, from Maxwell’s equations, we know that a changing electric current will cause the magnetic field to change. Now, a changing magnetic field causes circulation of E. Hence, we can make the unit charge go around in some loop (we’re talking circulation of E indeed, not flux). The related energy, i.e. the work done per unit charge as it travels (once) around that loop, is – quite confusingly! – referred to as electromotive force (emf). [The term is quite confusing because we’re not talking force but energy, i.e. work, and, as you know by now, energy is force times distance, so energy and force are related but not the same.] To ensure you know what we’re talking about, let me note that emf is measured in volts, so that’s in joule per coulomb: 1 V = 1 J/C. Back to the henry now. If the rate of change of current in a circuit (e.g. the armature winding of an electric motor) is one ampere per second, and the resulting electromotive force (remember: emf is energy per coulomb) is one volt, then the inductance of the circuit is one henry. Hence, 1 H = 1 V/(1 A/s) = 1 V·s/A.
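To make these unit definitions concrete, here is a tiny Python sketch with hypothetical helper names (`voltage_from_ohm`, `power` and `induced_emf` are mine, not from any library) and made-up component values. It only restates V = R·I, P = V·I and the definition of the henry (1 V of emf for a current changing at 1 A/s).

```python
def voltage_from_ohm(resistance_ohm: float, current_a: float) -> float:
    """V = R·I: potential difference (volt) across a resistor."""
    return resistance_ohm * current_a

def power(voltage_v: float, current_a: float) -> float:
    """P = V·I: (J/C)·(C/s) gives watt = J/s."""
    return voltage_v * current_a

def induced_emf(inductance_h: float, di_dt: float) -> float:
    """Magnitude of the emf (volt) across an inductance L when the current changes at dI/dt (A/s)."""
    return inductance_h * di_dt

# Hypothetical example: a 10 Ω resistor carrying 2 A.
v = voltage_from_ohm(10.0, 2.0)   # 20.0 V
p = power(v, 2.0)                 # 40.0 W
# Definition of the henry: 1 H with a current changing at 1 A/s gives 1 V.
emf = induced_emf(1.0, 1.0)       # 1.0 V
print(v, p, emf)                  # prints: 20.0 40.0 1.0
```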

The concept of impedance

You’ve probably heard about the so-called impedance of a circuit. That’s a complex concept, literally, because it’s a complex-valued ratio. I should refer you to the Web for more details, but let me try to summarize it because, while it’s complex, that doesn’t mean it’s complicated. 🙂 In fact, I think it’s rather easy to grasp after all you’ve gone through already. 🙂 So let’s give it a try.

When we have a simple direct current (DC), then we have a very straightforward definition of resistance (R), as mentioned above: it’s a simple ratio between the voltage (as measured in volt) and the current (as measured in ampere). Now, with alternating current (AC) circuits, it becomes more complicated, and so then it’s the concept of impedance that kicks in. Just like resistance, impedance also sort of measures the ‘opposition’ that a circuit presents to a current when a voltage is applied, but we have a complex ratio, literally: it’s a ratio with a magnitude and a direction, or a phase as it’s usually referred to. Hence, one will often write the impedance (denoted by Z) in polar form, using Euler’s formula:

$$Z = |Z|e^{i\theta}$$

Now, if you don’t know anything about complex numbers, you should just skip all of what follows and go straight to the next section. However, if you do know what a complex number is (it’s an ‘arrow’, basically, and if θ is a variable, then it’s a rotating arrow, or a ‘stopwatch hand’, as Feynman calls it in his more popular Lectures on QED), then you may want to carry on reading.

The illustration below (credit goes to Wikipedia, once again) is, probably, the most generic view of an AC circuit that one can jot down. If we apply an alternating voltage, both the voltage and the current will go up and down. However, the current will generally be out of phase with the voltage (in an inductive circuit, it lags the voltage), and the phase factor θ tells us by how much. Hence, using complex-number notation, we write:

$$\mathbf{V} = \mathbf{I}Z = \mathbf{I} * |Z|e^{i\theta}$$


Now, while that resembles the V = R·I formula I mentioned when discussing resistance, you should note the bold-face type for V and I, and the ∗ symbol I am using here for multiplication. First the ∗ symbol: that’s a convention Feynman adopts in the above-mentioned popular account of quantum mechanics. I like it, because it makes it very clear we’re not talking a vector cross product A×B here, but a product of two complex numbers. Now, that’s also why I write V and I in bold-face: they have a phase too and, hence, we can write them as:

  • $\mathbf{V} = |V|e^{i(\omega t + \theta_V)}$
  • $\mathbf{I} = |I|e^{i(\omega t + \theta_I)}$

This works out as follows:

$$\mathbf{I}Z = |I|e^{i(\omega t + \theta_I)} * |Z|e^{i\theta} = |I||Z|e^{i(\omega t + \theta_I + \theta)} = |V|e^{i(\omega t + \theta_V)}$$

Indeed, because the equation must hold for all t, we can equate the magnitudes and phases and, hence, we get: $|V| = |I||Z|$ and $\theta_V = \theta_I + \theta$. But voltage and current are something real, aren’t they? Not some complex number? You’re right. The complex notation is used mainly to simplify the calculus, but it’s only the real part of those complex-valued functions that counts. [In any case, because we limit ourselves to complex exponentials here, the imaginary part (which is the sine, as opposed to the real part, which is the cosine) is the same as the real part, but with a lag of its own (π/2 or 90 degrees, to be precise). Indeed, when writing Euler’s formula out ($e^{i\theta} = \cos\theta + i\sin\theta$), you should always remember that the sine and cosine functions are basically the same function: they differ only in the phase, as is evident from the trigonometric identity $\sin(\theta + \pi/2) = \cos\theta$.]
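Python’s built-in complex numbers make the phasor bookkeeping above easy to check. The sketch below assumes a series R-L circuit with Z = R + iωL (my choice of example, not the circuit in the illustration) and arbitrary values R = 3 Ω and ωL = 4 Ω, so that |Z| = 5 Ω and θ = atan(4/3) ≈ 53.13°.

```python
import cmath
import math

R = 3.0                      # resistance, ohm (illustrative)
omega = 2 * math.pi * 50     # angular frequency for 50 Hz, rad/s
L = 4.0 / omega              # inductance chosen so that omega*L = 4 ohm
Z = complex(R, omega * L)    # assumed series R-L impedance: Z = R + i*omega*L

I_amp, theta_I = 2.0, 0.0    # current amplitude (A) and phase (rad)

def phasor(amplitude: float, phase: float, t: float) -> complex:
    """Rotating arrow amplitude * e^{i(omega*t + phase)}."""
    return amplitude * cmath.exp(1j * (omega * t + phase))

t = 0.0013                   # any instant works: V = I*Z holds for all t
I_t = phasor(I_amp, theta_I, t)
V_t = I_t * Z                # complex multiplication: magnitudes multiply, phases add

print(abs(Z), math.degrees(cmath.phase(Z)))   # |Z| ≈ 5 ohm, theta ≈ 53.13 degrees
print(abs(V_t))                               # |V| = |I||Z| ≈ 10 V
print(cmath.phase(V_t / I_t))                 # theta_V - theta_I = theta ≈ 0.9273 rad
print(V_t.real)                               # the physically measured voltage at time t
```

A positive θ here means the current lags the voltage, as expected for an inductive load; with a capacitor instead of the inductor, θ would be negative and the current would lead.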

Now, that should be more than enough in terms of an introduction to the units used in electromagnetic theory. Hence, let’s move on.

The electric constant ε0

Let’s now look at that energy density formula once again. When looking at that $u = \varepsilon_0 E^2/2$ formula, you may think that its unit should be the square of the unit in which we measure field strength. How do we measure field strength? It’s defined as the force on a unit charge ($E = F/q$), so it should be newton per coulomb (N/C). Because the coulomb can also be expressed in newton·meter/volt (1 V = 1 J/C = 1 N·m/C and, hence, 1 C = 1 N·m/V), we can express field strength not only in newton/coulomb but also in volt per meter: 1 N/C = 1 N·V/(N·m) = 1 V/m. So how do we get from N²/C² and/or V²/m² to J/m³?

Well… Let me first note there’s no issue in terms of units with that ρΦ formula in the first integral for U: [ρ]·[Φ] = (C/m³)·V = ((N·m/V)/m³)·V = (N·m)/m³ = J/m³. No problem whatsoever. It’s only that second expression for U, with $u = \varepsilon_0 E^2/2$ in the integrand, that triggers the question. Here, we just need to accept that we need that $\varepsilon_0$ factor to make the units come out alright. Indeed, just like other physical constants (such as c, G, or h, for example), it has a dimension: its unit is either C²/(N·m²) or, what amounts to the same, C/(V·m). So the units come out alright indeed if, and only if, we multiply the N²/C² and/or V²/m² units with the dimension of $\varepsilon_0$:

  1. (N²/C²)·(C²/(N·m²)) = N/m² = (N·m)·(1/m³) = J/m³
  2. (V²/m²)·(C/(V·m)) = V·C/m³ = (V·N·m/V)/m³ = N·m/m³ = J/m³
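The unit bookkeeping above can also be done mechanically. This is a minimal sketch, assuming we represent each unit as a plain dict of exponents over the SI base units kg, m, s and A; the helper names (`mul`, `pow_`, `div`) are hypothetical. It checks that C²/(N·m²) and C/(V·m) are the same unit, that N/C equals V/m, and that multiplying the unit of $\varepsilon_0$ by the square of the field unit gives J/m³.

```python
# Units as dicts mapping SI base units to exponents, e.g. newton = kg·m·s⁻².
def mul(*units):
    """Multiply units by adding exponents; drop base units whose exponent cancels."""
    total = {}
    for u in units:
        for base, exp in u.items():
            total[base] = total.get(base, 0) + exp
    return {base: exp for base, exp in total.items() if exp != 0}

def pow_(u, n):
    """Raise a unit to an integer power by scaling every exponent."""
    return {base: n * exp for base, exp in u.items()}

def div(a, b):
    """Divide units: multiply a by b raised to the power -1."""
    return mul(a, pow_(b, -1))

KG, M, S, A = {"kg": 1}, {"m": 1}, {"s": 1}, {"A": 1}
N = mul(KG, M, pow_(S, -2))       # newton = kg·m/s²
C = mul(A, S)                     # coulomb = A·s
J = mul(N, M)                     # joule = N·m
V = div(J, C)                     # volt = J/C

eps0_unit = div(pow_(C, 2), mul(N, pow_(M, 2)))   # C²/(N·m²)
eps0_alt = div(C, mul(V, M))                      # C/(V·m)
field_NC = div(N, C)                              # N/C
field_Vm = div(V, M)                              # V/m
energy_density = div(J, pow_(M, 3))               # J/m³

print(eps0_unit == eps0_alt)                                # True
print(field_NC == field_Vm)                                 # True
print(mul(eps0_unit, pow_(field_NC, 2)) == energy_density)  # True
```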


But so that’s the units only. The electric constant also has a numerical value:

$$\varepsilon_0 = 8.854187817\ldots\times10^{-12}\ \mathrm{C/(V\cdot m)} \approx 8.8542\times10^{-12}\ \mathrm{C/(V\cdot m)}$$

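This numerical value of $\varepsilon_0$ is as important as its unit to ensure both expressions for U yield the same result. Indeed, as you may or may not remember from the second of my two posts on vector calculus, if we have a curl-free field C (that means $\nabla\times\mathbf{C} = \mathbf{0}$ everywhere, which is the case when talking electrostatics only, as we are doing here), then we can always find some scalar field ψ such that $\mathbf{C} = \nabla\psi$. Here we have $\mathbf{E} = -\nabla\Phi$, so the field itself does not contain $\varepsilon_0$: the $\varepsilon_0$ factor comes in through Gauss’ Law, $\nabla\cdot\mathbf{E} = \rho/\varepsilon_0$, which gives $\rho = -\varepsilon_0\nabla^2\Phi$. It is that factor, not the minus sign, that links the ρΦ integrand to the $\varepsilon_0 E^2/2$ integrand, and the quantity that plays the role of a ‘flux’ here is $\varepsilon_0\mathbf{E} = -\varepsilon_0\nabla\Phi$.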

It’s just like the vector equation for heat flow: $\mathbf{h} = -\kappa\nabla T$. Indeed, we also have a constant of proportionality here, which is referred to as the thermal conductivity. Likewise, the electric constant $\varepsilon_0$ is also referred to as the permittivity of the vacuum (or of free space), for similar reasons obviously!
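The claim that the ρΦ integral and the $\varepsilon_0 E^2/2$ integral give the same U can be checked numerically. The sketch below assumes a ball of radius R carrying a charge Q spread uniformly through its volume (my example, not from the text), with arbitrary values Q = 1 nC and R = 1 cm, and uses a hand-written Simpson rule so it only needs the standard library. Both integrals should reproduce the known result $U = 3Q^2/20\pi\varepsilon_0 R$.

```python
import math

EPS0 = 8.8541878128e-12      # C/(V·m)
Q, R = 1e-9, 0.01            # illustrative: 1 nC spread uniformly over a 1 cm ball

rho = 3 * Q / (4 * math.pi * R**3)   # uniform charge density inside the ball

def phi(r):
    """Potential of a uniformly charged ball for r <= R (zero at infinity)."""
    return Q / (8 * math.pi * EPS0 * R) * (3 - r**2 / R**2)

def e_field(r):
    """Magnitude of E: grows linearly inside the ball, falls off as 1/r² outside."""
    if r <= R:
        return Q * r / (4 * math.pi * EPS0 * R**3)
    return Q / (4 * math.pi * EPS0 * r**2)

def simpson(f, a, b, n=1000):
    """Composite Simpson rule on [a, b] with an even number n of intervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + k * h) for k in range(1, n, 2))
    s += 2 * sum(f(a + k * h) for k in range(2, n, 2))
    return s * h / 3

# (1) U = 1/2 ∫ rho·Phi dV, over the ball only, since rho = 0 outside it
u_rho_phi = 0.5 * simpson(lambda r: rho * phi(r) * 4 * math.pi * r**2, 0, R)

# (2) U = (eps0/2) ∫ E² dV over all space, using spherical shells 4πr²dr.
#     The outer part [R, ∞) is mapped onto (0, 1/R] with s = 1/r (dr = -ds/s²);
#     we start just above s = 0 to avoid dividing by zero.
inside = simpson(lambda r: e_field(r)**2 * 4 * math.pi * r**2, 0, R)
outside = simpson(lambda s: e_field(1 / s)**2 * 4 * math.pi / s**4, 1e-9, 1 / R)
u_field = 0.5 * EPS0 * (inside + outside)

u_exact = 3 * Q**2 / (20 * math.pi * EPS0 * R)
print(u_rho_phi, u_field, u_exact)   # all three agree to many digits
```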

Natural units

You may wonder whether we can’t find some better units, so we don’t need the rather horrendous 8.8542×10⁻¹² C/(V·m) factor (I am rounding to four digits after the decimal point). The answer is: yes, it’s possible. In fact, there are several systems of units in which the electric constant (and the magnetic constant, which we’ll introduce later) is either set to 1 or absorbed into the definition of the unit of charge. The best-known are the so-called Gaussian and Lorentz-Heaviside units.

In the Gaussian system, the unit of charge is what is now referred to as the statcoulomb (statC), which is also known as the franklin (Fr) or the electrostatic unit of charge (esu), but I’ll refer you to the Wikipedia article on it in case you’d want to find out more about it. You should just note the definition of this unit is problematic in other ways. Indeed, it’s not so easy to define ‘natural units’ in physics, because there are quite a few ‘fundamental’ relations and/or laws in physics and, hence, equating this or that constant to one usually has implications for other constants. In addition, one should note that many choices that made sense as ‘natural’ units in the 19th century seem arbitrary now. For example:

  1. Why would we select the charge of the electron or the proton as the unit charge (+1 or –1) if we now assume that protons (and neutrons) consist of quarks, whose charges are +2/3 or –1/3 of that unit?
  2. What unit would we choose as the unit for mass, knowing that, despite all of the simplification that took place as a result of the generalized acceptance of the quark model, we’re still stuck with quite a few elementary particles whose mass would be a ‘candidate’ for the unit mass? Do we choose the electron, the u quark, or the d quark?

Therefore, the approach to ‘natural units’ has not been to redefine mass or charge or temperature, but the physical constants themselves. Obvious candidates are, of course, c and ħ, i.e. the speed of light and the (reduced) Planck constant. [You may wonder why physicists would select ħ, rather than h, as a ‘natural’ unit, but I’ll let you think about that. The answer is not so difficult.] That can be done without too much difficulty indeed, and so one can equate some more physical constants with one. The next candidate is the so-called Boltzmann constant ($k_B$). While this constant is not so well known, it does pop up in a great many equations, including those that led Planck to propose his quantum of action, i.e. h (see my post on Planck’s constant). When we do that, i.e. when we equate c, ħ and $k_B$ with one ($c = \hbar = k_B = 1$), we still have a great many choices, so we need to impose further constraints. The next is to equate the gravitational constant with one, so then we have $c = \hbar = k_B = G = 1$.

Now, it turns out that the ‘solution’ of this ‘set’ of four equations ($c = \hbar = k_B = G = 1$) does, effectively, lead to ‘new’ values for most of our SI base units, most notably length, time, mass and temperature. These ‘new’ units are referred to as Planck units. You can look up their values yourself, and I’ll let you appreciate the ‘naturalness’ of the new units yourself. They are rather weird. The Planck length and time are often described as the smallest measurable units of length and time and, hence, they are related to the so-called limits of quantum theory. Likewise, the Planck temperature is a related limit in quantum theory: it’s often described as the highest temperature that current physical theory can meaningfully deal with. To be frank, it’s hard to imagine what the scale of the Planck length, time and temperature really means. In contrast, the scale of the Planck mass is something we actually can imagine (it is said to correspond to the mass of an eyebrow hair, or a flea egg) but, again, its physical significance is not so obvious: Nature’s maximum allowed mass for point-like particles, or the mass capable of holding a single elementary charge. That triggers the question: do point-like charges really exist? I’ll come back to that question. But first I’ll conclude this little digression on units by introducing the so-called fine-structure constant, of which you’ve surely heard before.

The fine-structure constant

I wrote that the ‘set’ of equations $c = \hbar = k_B = G = 1$ gave us Planck units for most of our SI base units. It turns out that these four equations do not lead to a ‘natural’ unit for electric charge. We need to equate a fifth constant with one to get that. That fifth constant is Coulomb’s constant (often denoted as $k_e$) and, yes, it’s the constant that appears in Coulomb’s Law indeed, as well as in some other pretty fundamental equations in electromagnetics, such as the field caused by a point charge q: $E = q/4\pi\varepsilon_0 r^2$. Hence, $k_e = 1/4\pi\varepsilon_0$. So if we equate $k_e$ with one, then $\varepsilon_0$ will, obviously, be equal to $1/4\pi$.

To make a long story short, adding this fifth equation to our set of four also gives us a Planck charge, and I’ll give you its value: it’s about 1.8755×10⁻¹⁸ C. As I mentioned that the elementary charge is 1 e ≈ 1.6022×10⁻¹⁹ C, it’s easy to see that the Planck charge corresponds to some 11.7 times the charge of the proton. In fact, let’s be somewhat more precise and round, once again, to four digits after the decimal point: the $q_P/e$ ratio is about 11.7062. Conversely, we can also say that the elementary charge, as expressed in Planck units, is about 1/11.7062 ≈ 0.08542455. In fact, we’ll use that ratio in a moment in some other calculation, so please jot it down.
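The Planck charge is $q_P = \sqrt{4\pi\varepsilon_0\hbar c}$, i.e. the charge unit you get once c, ħ and $k_e$ are all set to one. A quick Python check, using CODATA 2018 values for the constants (slightly more digits than the text uses), reproduces the numbers in the paragraph above.

```python
import math

# CODATA 2018 values (SI units)
EPS0 = 8.8541878128e-12      # electric constant, C/(V·m)
HBAR = 1.054571817e-34       # reduced Planck constant, J·s
C_LIGHT = 299_792_458.0      # speed of light, m/s
E_CHARGE = 1.602176634e-19   # elementary charge, C

# Planck charge: the unit of charge when c = ħ = k_e = 1 (with k_e = 1/4πε0)
q_planck = math.sqrt(4 * math.pi * EPS0 * HBAR * C_LIGHT)

ratio = q_planck / E_CHARGE       # number of elementary charges in one Planck charge
e_in_planck_units = 1 / ratio     # the elementary charge expressed in Planck units

print(f"q_P = {q_planck:.4e} C")               # about 1.8755e-18 C
print(f"q_P / e = {ratio:.4f}")                # about 11.7062
print(f"e / q_P = {e_in_planck_units:.8f}")    # about 0.08542455
```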

0.08542455? That’s a bit of a weird number, isn’t it? You’re right. And trying to write it in terms of the charge of a u or d quark doesn’t make it any better. Also, note that the first four significant digits (8542) correspond to the first four digits after the decimal point of our $\varepsilon_0$ constant (8.8542×10⁻¹²). So what’s the physical significance here? Some other limit of quantum theory?

Frankly, I did not find anything on that, but the obvious thing to do is to relate it to what is referred to as the fine-structure constant, which is denoted by α. This physical constant is dimensionless, and can be defined in various ways, but all of them are some kind of ratio of a bunch of these physical constants we’ve been talking about:

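$$\alpha = \frac{e^2}{4\pi\varepsilon_0\hbar c} = \frac{\mu_0 e^2 c}{4\pi\hbar} = \frac{k_e e^2}{\hbar c} = \frac{c\mu_0}{2R_K} = \frac{r_e m_e c}{\hbar}$$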

The only constants you have not seen before are $\mu_0$, $R_K$ and, perhaps, $r_e$ as well as $m_e$. However, these can be defined as a function of the constants that you did see before:

  1. The $\mu_0$ constant is the so-called magnetic constant. It’s similar to $\varepsilon_0$ and it’s referred to as the magnetic permeability of the vacuum. So it’s just like the (electric) permittivity of the vacuum (i.e. the electric constant $\varepsilon_0$) and the only reason why you haven’t heard of it before is because we haven’t discussed magnetic fields so far. In any case, you know that the electric and magnetic force are part and parcel of the same phenomenon (i.e. the electromagnetic interaction between charged particles) and, hence, they are closely related. To be precise, $\mu_0 = 1/\varepsilon_0 c^2$. That shows the first and second expression for α are, effectively, fully equivalent.
  2. Now, from the definition of $k_e = 1/4\pi\varepsilon_0$, it’s easy to see how those two expressions are, in turn, equivalent with the third expression for α.
  3. The $R_K$ constant is the so-called von Klitzing constant, but don’t worry about it: it’s, quite simply, equal to $R_K = h/e^2$. Hence, substituting (and don’t forget that $h = 2\pi\hbar$) will demonstrate the equivalence of the fourth expression for α.
  4. Finally, the $r_e$ factor is the classical electron radius, which is usually written as a function of $m_e$, i.e. the electron mass: $r_e = e^2/4\pi\varepsilon_0 m_e c^2$. This very same equation implies that $r_e m_e = e^2/4\pi\varepsilon_0 c^2$. So… Yes. It’s all the same really.

Let’s calculate its (rounded) value in the old units first, using the third expression:

  • The $e^2$ constant is (roughly) equal to (1.6022×10⁻¹⁹ C)² = 2.5670×10⁻³⁸ C². Coulomb’s constant $k_e = 1/4\pi\varepsilon_0$ is about 8.9876×10⁹ N·m²/C². Hence, the numerator $k_e e^2$ ≈ 23.0715×10⁻²⁹ N·m².
  • The (rounded) denominator is ħc = (1.05457×10⁻³⁴ N·m·s)(2.998×10⁸ m/s) = 3.162×10⁻²⁶ N·m².
  • Hence, we get $\alpha = k_e e^2/\hbar c \approx 7.297\times10^{-3} = 0.007297$.
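Here is a short Python check that the equivalent expressions for α referred to in the list above really give one and the same number. It uses CODATA 2018 values and derives $\mu_0$, $k_e$, $R_K$ and $r_e$ from the relations given in that list ($\mu_0 = 1/\varepsilon_0 c^2$, $k_e = 1/4\pi\varepsilon_0$, $R_K = h/e^2$, $r_e = e^2/4\pi\varepsilon_0 m_e c^2$). Because these are derived rather than measured independently, the agreement is by construction; the point is to see the definitions at work.

```python
import math

# CODATA 2018 values (SI units)
C_LIGHT = 299_792_458.0        # speed of light, m/s
EPS0 = 8.8541878128e-12        # electric constant, C/(V·m)
HBAR = 1.054571817e-34         # reduced Planck constant, J·s
E_CHARGE = 1.602176634e-19     # elementary charge, C
M_E = 9.1093837015e-31         # electron mass, kg

# Derived constants, using the relations from the list above
H = 2 * math.pi * HBAR                       # h = 2πħ
MU0 = 1 / (EPS0 * C_LIGHT**2)                # μ0 = 1/(ε0c²)
K_E = 1 / (4 * math.pi * EPS0)               # Coulomb's constant
R_K = H / E_CHARGE**2                        # von Klitzing constant
R_E = E_CHARGE**2 / (4 * math.pi * EPS0 * M_E * C_LIGHT**2)  # classical electron radius

alphas = {
    "e²/(4πε0ħc)": E_CHARGE**2 / (4 * math.pi * EPS0 * HBAR * C_LIGHT),
    "μ0e²c/(4πħ)": MU0 * E_CHARGE**2 * C_LIGHT / (4 * math.pi * HBAR),
    "k_e·e²/(ħc)": K_E * E_CHARGE**2 / (HBAR * C_LIGHT),
    "cμ0/(2R_K)": C_LIGHT * MU0 / (2 * R_K),
    "r_e·m_e·c/ħ": R_E * M_E * C_LIGHT / HBAR,
}
for name, value in alphas.items():
    # every line should show α ≈ 7.297353e-03 and 1/α ≈ 137.036
    print(f"{name:<14} α ≈ {value:.6e}   1/α ≈ {1 / value:.3f}")
```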

Note that this number is, effectively, dimensionless. Now, the interesting thing is that if we calculate α using Planck units, we get an $e^2$ that is (roughly) equal to 0.08542455² ≈ 0.007297! Now, because all of the other constants are equal to 1 in Planck’s system of units, that’s equal to α itself. So… Yes! The two values for α are one and the same in the two systems of units and, as you might have guessed, the fine-structure constant does not depend on our units of measurement precisely because it is dimensionless. So what does it correspond to?

Now that would take me a very long time to explain, but let me try to summarize what it’s all about. In my post on quantum electrodynamics (QED) – so that’s the theory of light and matter basically and, most importantly, how they interact – I wrote about the three basic events in that theory, and how they are associated with a probability amplitude, so that’s a complex number, or an ‘arrow’, as Feynman puts it: something with (a) a magnitude and (b) a direction. We had to take the absolute square of these amplitudes in order to calculate the probability (i.e. some real number between 0 and 1) of the event actually happening. These three basic events or actions were:

  1. A photon travels from point A to B. To keep things simple and stupid, Feynman denoted this amplitude by P(A to B), and please note that the P stands for photon, not for probability. I should also note that we have an easy formula for P(A to B): it depends on the so-called space-time interval between the two points A and B, i.e. $I = \Delta r^2 - \Delta t^2 = (x_2-x_1)^2 + (y_2-y_1)^2 + (z_2-z_1)^2 - (t_2-t_1)^2$. Hence, the space-time interval takes both the distance in space as well as the ‘distance’ in time into account.
  2. An electron travels from point A to B: this was denoted by E(A to B) because… Well… You guessed it: the E of electron. The formula for E(A to B) was much more complicated, but the two key elements in the formula were some complex number j (see below), and some other (real) number n.
  3. Finally, an electron could emit or absorb a photon, and the amplitude associated with this event was denoted by j, for junction.
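The space-time interval is easy to compute. This minimal sketch assumes units in which c = 1 (e.g. time in seconds and distance in light-seconds) and uses the same sign convention as the text, I = Δr² − Δt²; the function name is my own. It does not implement Feynman’s P(A to B) formula itself, only the interval it depends on.

```python
def spacetime_interval(a, b):
    """I = Δr² − Δt² between events a and b, each given as (t, x, y, z), with c = 1."""
    dt = b[0] - a[0]
    dx, dy, dz = (b[k] - a[k] for k in range(1, 4))
    return dx**2 + dy**2 + dz**2 - dt**2

origin = (0, 0, 0, 0)
print(spacetime_interval(origin, (5, 3, 4, 0)))   # 0: a light signal can connect the two events
print(spacetime_interval(origin, (10, 3, 4, 0)))  # -75: 'time-like', Δt exceeds Δr
print(spacetime_interval(origin, (1, 3, 4, 0)))   # 24: 'space-like', Δr exceeds Δt
```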

Now, that junction number j is about –0.1. To be somewhat more precise, I should say it’s about –0.08542455.

–0.08542455? That’s a bit of a weird number, isn’t it? Hey! Didn’t we see this number somewhere else? We did, but before you scroll up, let’s first interpret this number. It looks like an ordinary (real) number, but it’s an amplitude alright, so you should interpret it as an arrow. Hence, it can be ‘combined’ (i.e. ‘added’ or ‘multiplied’) with other arrows. More in particular, when you multiply it with another arrow, it amounts to a shrink to a bit less than one-tenth (because its magnitude is about 0.085 = 8.5%), and half a turn (the minus sign amounts to a rotation of 180°). Now, in that post of mine, I wrote that I wouldn’t entertain you on the difficulties of calculating this number but… Well… We did see this number before indeed. Just scroll up to check it. We’ve got a very remarkable result here:

$$j \approx -0.08542455 = -\sqrt{0.007297} = -\sqrt{\alpha} = -e \text{ (expressed in Planck units)}$$
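To see what ‘multiplying by j’ does to an arrow, here is a small sketch with Python’s cmath module. It takes j = −√α with α ≈ 0.0072973525693 (CODATA 2018) and applies it to an arbitrary unit arrow at 30°; the 30° is just an illustrative choice.

```python
import cmath
import math

ALPHA = 0.0072973525693          # fine-structure constant (CODATA 2018)
j = -math.sqrt(ALPHA)            # junction number j = -sqrt(alpha), as in the text

arrow = cmath.rect(1.0, math.radians(30))   # an arbitrary unit arrow at 30°
result = j * arrow

print(abs(result))                          # ≈ 0.0854: shrunk to about 8.5%
print(math.degrees(cmath.phase(result)))    # ≈ -150: turned by 180° (30° + 180° = 210° ≡ -150°)
```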

So we find that our junction number j or, as it’s better known, our coupling constant in quantum electrodynamics (aka the gauge coupling parameter g) is equal to minus the square root of that fine-structure constant, and that square root is, in turn, equal to the charge of the electron expressed in the Planck unit for electric charge. Now that is a very deep and fundamental result which no one seems to be able to ‘explain’, in an ‘intuitive’ way, that is.

I should immediately add that, while we can’t explain it, intuitively, it does make sense. A lot of sense actually. Photons carry the electromagnetic force, and the electromagnetic field is caused by stationary and moving electric charges, so one would expect to find some relation between that junction number j, describing the amplitude to emit or absorb a photon, and the electric charge itself, but… An equality? Really?

Well… Yes. That’s what it is, and I look forward to trying to understand all of this better. For now, however, I should proceed with what I set out to do, and that is to tie up a few loose ends. This was one, and so let’s move to the next, which is about the assumption of point charges.

Note: More popular accounts of quantum theory say α itself is ‘the’ coupling constant, rather than its (negative) square root $-\sqrt{\alpha} = j = -e$ (expressed in Planck units). That’s correct: g or j are, technically speaking, the (gauge) coupling parameter, not the coupling constant. But that’s a little technical detail which shouldn’t bother you. The result is still what it is: very remarkable! I should also note that it’s often the value of the reciprocal (1/α) that is specified, i.e. 1/α ≈ 137.036. But so now you know what this number actually stands for. 🙂

Do point charges exist?

Feynman’s Lectures on electrostatics are interesting, among other things, because, besides highlighting the precision and successes of the theory, he also doesn’t hesitate to point out the contradictions. He notes, for example, that “the idea of locating energy in the field is inconsistent with the assumption of the existence of point charges.”


Yes. Let’s explore the point. We do assume point charges in classical physics indeed. The electric field caused by a point charge is, quite simply:

$$E = \frac{q}{4\pi\varepsilon_0 r^2}$$

Hence, the energy density u is $\varepsilon_0 E^2/2 = q^2/32\pi^2\varepsilon_0 r^4$. Now, we have that volume integral $U = (\varepsilon_0/2)\int \mathbf{E}\cdot\mathbf{E}\,dV = \int (\varepsilon_0 E^2/2)\,dV$. As Feynman notes, nothing prevents us from taking a spherical shell for the volume element dV, instead of an infinitesimal cube. This spherical shell would have the charge q at its center, an inner radius equal to r, an infinitesimal thickness dr, and, finally, a surface area $4\pi r^2$ (that’s just the general formula for the surface area of a sphere, which I also noted above). Hence, its (infinitesimally small) volume is $4\pi r^2 dr$, and our integral becomes:

$$U = \int_0^\infty \frac{q^2}{32\pi^2\varepsilon_0 r^4}\,4\pi r^2\,dr = \frac{q^2}{8\pi\varepsilon_0}\int_0^\infty \frac{dr}{r^2} = \left[-\frac{q^2}{8\pi\varepsilon_0 r}\right]_0^\infty$$

To calculate this integral, we need to take the limit of $-q^2/8\pi\varepsilon_0 r$ for (a) r tending to zero (r → 0) and for (b) r tending to infinity (r → ∞). The limit for r → ∞ is zero. That’s OK and consistent with the choice of our reference point for calculating the potential of a field. However, for r → 0 the expression goes to minus infinity, so the integral (the value at the upper limit minus the value at the lower limit) is infinite! Hence, that $U = (\varepsilon_0/2)\int \mathbf{E}\cdot\mathbf{E}\,dV$ integral basically says there’s an infinite amount of energy in the field of a point charge! How is that possible? It cannot be true, obviously.
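The divergence is easy to see numerically. Cutting the shell integral off at some small inner radius $r_{min}$ gives $U = q^2/8\pi\varepsilon_0 r_{min}$, so every time we shrink $r_{min}$ by a factor of ten, the energy goes up by a factor of ten. The sketch below assumes q is one elementary charge; the radii are arbitrary. As a side note, at the classical electron radius $r_e$ mentioned earlier, the formula gives exactly half the electron’s rest energy, $m_e c^2/2$.

```python
import math

EPS0 = 8.8541878128e-12        # C/(V·m)
Q = 1.602176634e-19            # one elementary charge, C (illustrative choice)

def field_energy_outside(r_min: float, q: float = Q) -> float:
    """(ε0/2)∫E²dV for a point charge, over all space outside radius r_min.

    Summing shells of volume 4πr²dr from r_min to infinity gives q²/(8πε0·r_min).
    """
    return q**2 / (8 * math.pi * EPS0 * r_min)

for r_min in (1e-10, 1e-15, 1e-20, 1e-35):
    print(f"r_min = {r_min:.0e} m  ->  U = {field_energy_outside(r_min):.3e} J")
# U grows without bound as r_min -> 0: the field energy of a true point charge is infinite.

R_E = 2.8179403262e-15         # classical electron radius, m (CODATA 2018)
M_E, C_LIGHT = 9.1093837015e-31, 299_792_458.0
print(field_energy_outside(R_E) / (M_E * C_LIGHT**2))   # ≈ 0.5
```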

So… Where did we go wrong?

Your first reaction may well be that this very particular approach (i.e. replacing our infinitesimal cubes by infinitesimal shells) to calculating our integral is fishy and, hence, not allowed. Maybe you’re right. Maybe not. It’s interesting to note that we run into similar problems when calculating the energy of a charged sphere. Indeed, we mentioned the formula for the capacity of a charged sphere: C = 4πε0r. Now, there’s a similarly easy formula for the energy of a charged sphere. Let’s look at how we charge a condenser:

  • We know that the potential difference between two plates of a condenser represents the work we have to do, per unit charge, to transfer a charge (Q) from one plate to the other. Hence, we can write V = ΔU/ΔQ.
  • We will, of course, want to do a differential analysis. Hence, we’ll transfer charges incrementally, one infinitesimal little charge dQ at a time, and re-write V as V = dU/dQ or, what amounts to the same: dU = V·dQ.
  • Now, we’ve defined the capacitance of a condenser as C = Q/V. [Again, don’t be confused: C stands for capacity here, measured in coulomb per volt, not for the coulomb unit.] Hence, we can re-write dU as dU = Q·dQ/C.
  • Now we have to integrate dU going from zero charge to the final charge Q. Just make a little effort here and try it. You should get the following result: $U = Q^2/2C$. [Because Q = CV, we could re-write this as $U = C^2V^2/2C = CV^2/2$, which is a form that may be more useful in some other context but not here.]
  • Using that C = 4πε0r formula, we get our grand result. The energy of a charged sphere is:

$$U = \frac{Q^2}{8\pi\varepsilon_0 r}$$
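The integration step in the list above can also be done by brute force: move the charge over in many small steps dQ and add up V·dQ = (Q/C)·dQ. The sketch below assumes an isolated sphere with an arbitrary radius of 10 cm and a final charge of 10 nC. The midpoint rule it uses happens to be exact for a voltage that grows linearly with the charge, so the sum matches $Q^2/2C$ up to rounding.

```python
import math

EPS0 = 8.8541878128e-12   # C/(V·m)

def charging_energy(q_final: float, capacitance: float, steps: int = 100_000) -> float:
    """Sum dU = V·dQ = (Q/C)·dQ while transferring the charge in small steps dQ.

    Uses the charge at the midpoint of each step, which makes the sum exact
    (up to rounding) because V grows linearly with Q.
    """
    dq = q_final / steps
    u = 0.0
    for k in range(steps):
        q_mid = (k + 0.5) * dq          # charge already transferred, midpoint of the step
        u += (q_mid / capacitance) * dq
    return u

r = 0.1                                 # sphere radius in m (illustrative)
q = 1e-8                                # final charge in C (illustrative)
c_sphere = 4 * math.pi * EPS0 * r       # C = 4πε0r
print(charging_energy(q, c_sphere))     # numerical ∫ Q dQ / C
print(q**2 / (2 * c_sphere))            # U = Q²/2C
print(q**2 / (8 * math.pi * EPS0 * r))  # U = Q²/8πε0r, the same number
```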

From that formula, it’s obvious that, if the radius of our sphere goes to zero, its energy should also go to infinity! So it seems we can’t really pack a finite charge Q in one single point. Indeed, to do that, our formula says we need an infinite amount of energy. So what’s going on here?

Nothing much. You should, first of all, remember how we got that integral: see my previous post for the full derivation indeed. It’s not that difficult. We first assumed we had pairs of charges $q_i$ and $q_j$ for which we calculated the total electrostatic energy U as the sum of the energies of all possible pairs of charges:

$$U = \sum_{\text{all pairs}} \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}}$$

And, then, we looked at a continuous distribution of charge. However, in essence, we still did the same: we counted the energy of interaction between infinitesimal charges situated at two different points (referred to as point 1 and 2 respectively), with a 1/2 factor in front so as to ensure we didn’t double-count (there’s no way to write an integral that keeps track of the pairs so that each pair is counted only once):

$$U = \frac{1}{2}\int\!\!\int \frac{\rho(1)\rho(2)}{4\pi\varepsilon_0 r_{12}}\,dV_1\,dV_2$$

Now, we reduced this double integral by a clever substitution to something that looked a bit better:

$$U = \frac{1}{2}\int \rho(1)\Phi(1)\,dV_1$$

Finally, some more mathematical tricks gave us that $U = (\varepsilon_0/2)\int \mathbf{E}\cdot\mathbf{E}\,dV$ integral.

In essence, what’s wrong in that integral above is that it actually includes the energy that’s needed to assemble the finite point charge q itself from an infinite number of infinitesimal parts. Now that energy is infinitely large. We just can’t do it: the energy required to construct a point charge is ∞.

Now that might explain the physical significance of that Planck mass! We said Nature has some kind of maximum allowable mass for point-like particles, or the mass capable of holding a single elementary charge. What’s going on is that, as we try to pile more charge on top of the charge that’s already there, we add energy. Now, energy has an equivalent mass. Indeed, the Planck charge ($q_P$ ≈ 1.8755×10⁻¹⁸ C), the Planck length ($l_P$ = 1.616×10⁻³⁵ m), the Planck energy (1.956×10⁹ J), and the Planck mass (2.1765×10⁻⁸ kg) are all related. Now things start making sense. Indeed, we said that the Planck mass is tiny but, still, it’s something we can imagine, like a flea’s egg or the mass of an eyebrow hair. The associated energy follows from $E = mc^2$, so that’s (2.1765×10⁻⁸ kg)·(2.998×10⁸ m/s)² ≈ 19.56×10⁸ kg·m²/s² = 1.956×10⁹ joule indeed.
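The Planck numbers quoted above follow from c, ħ, G and $\varepsilon_0$ alone: $m_P = \sqrt{\hbar c/G}$, $l_P = \sqrt{\hbar G/c^3}$, $E_P = m_P c^2$ and $q_P = \sqrt{4\pi\varepsilon_0\hbar c}$. This sketch uses CODATA 2018 values (G = 6.67430×10⁻¹¹ m³/(kg·s²)), so the last digit may differ slightly from the values quoted in the text. The everyday comparisons at the end rest on rough, assumed conversion factors (1 kWh = 3.6 MJ, and roughly 34 MJ per liter of petrol, which depends on the fuel).

```python
import math

# CODATA 2018 values (SI units)
HBAR = 1.054571817e-34        # J·s
C_LIGHT = 299_792_458.0       # m/s
G = 6.67430e-11               # m³/(kg·s²)
EPS0 = 8.8541878128e-12       # C/(V·m)

m_planck = math.sqrt(HBAR * C_LIGHT / G)                    # Planck mass, kg
l_planck = math.sqrt(HBAR * G / C_LIGHT**3)                 # Planck length, m
e_planck = m_planck * C_LIGHT**2                            # Planck energy, J (E = mc²)
q_planck = math.sqrt(4 * math.pi * EPS0 * HBAR * C_LIGHT)   # Planck charge, C

print(f"m_P = {m_planck:.4e} kg")        # ≈ 2.176e-08 kg
print(f"l_P = {l_planck:.4e} m")         # ≈ 1.616e-35 m
print(f"E_P = {e_planck:.4e} J")         # ≈ 1.956e+09 J
print(f"q_P = {q_planck:.4e} C")         # ≈ 1.876e-18 C
print(f"l_P³ = {l_planck**3:.1e} m³")    # ≈ 4.2e-105 m³

# Everyday comparisons (rough, assumed conversion factors):
print(e_planck / 3.6e6, "kWh")                    # ≈ 543 kWh
print(e_planck / 34e6, "liters of petrol")        # ≈ 57.5 liters at ~34 MJ per liter
```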

Now, how much energy is that? Well… That’s about 2 gigajoule, obviously, but what’s that in daily life? It’s about the energy you would get when burning some 50 to 60 liters of petrol. It’s also likely to amount, more or less, to your home electricity consumption over a month (1.956 GJ is about 540 kWh). So it’s sizable, and we’re packing all that energy into a Planck volume ($l_P^3$ ≈ 4×10⁻¹⁰⁵ m³). If we’d manage that, we might well create tiny black holes, because that’s what that little Planck volume would become if we’d pack so much energy in it. So… Well… Here I just have to refer you to more learned writers than I am. As Wikipedia notes dryly: “The physical significance of the Planck length is a topic of theoretical research. Since the Planck length is so many orders of magnitude smaller than any current instrument could possibly measure, there is no way of examining it directly.”

So… Well… That’s it for now. The point to note is that we would not have any theoretical problems if we’d assume our ‘point charge’ is actually not a point charge but some small distribution of charge itself. You’ll say: Great! Problem solved! 

Well… For now, yes. But Feynman rightly notes that assuming that our elementary charges do take up some space results in other difficulties of explanation. As we know, these difficulties are solved in quantum mechanics, but so we’re not supposed to know that when doing these classical analyses. 🙂

Fields and charges (I)

My previous posts focused mainly on photons, so this one should be focused more on matter-particles, things that have a mass and a charge. However, I will use it more as an opportunity to talk about fields and present some results from electrostatics using our new vector differential operators (see my posts on vector analysis).

Before I do so, let me note something that is obvious but… Well… Think about it: photons carry the electromagnetic force, but have no electric charge themselves. Likewise, electromagnetic fields have energy and are caused by charges, but so they also carry no charge. So… Fields act on a charge, and photons interact with electrons, but it’s only matter-particles (notably the electron and the proton, which is made of quarks) that actually carry electric charge. Does that make sense? It should. 🙂

Another thing I want to remind you of, before jumping into it all head first, are the basic units and relations that are valid always, regardless of what we are talking about. They are represented below:


Let me recapitulate the main points:

  • The speed of light is always the same, regardless of the reference frame (inertial or moving), and nothing can travel faster than light (except mathematical points, such as a point of constant phase of a wavefunction, which moves at the phase velocity).
  • This universal rule is the basis of relativity theory and the mass-energy equivalence relation E = mc2.
  • The constant speed of light also allows us to redefine the units of time and/or distance such that c = 1. For example, if we re-define the unit of distance as the distance traveled by light in one second, or the unit of time as the time light needs to travel one meter, then c = 1.
  • Newton’s laws of motion define a force as the product of a mass and its acceleration: F = m·a. Hence, mass is a measure of inertia, and the unit of force is 1 newton (N) = 1 kg·m/s².
  • The momentum of an object is the product of its mass and its velocity: p = m·v. Hence, its unit is 1 kg·m/s = 1 N·s. Therefore, the concept of momentum combines force (N) as well as time (s).
  • Energy is defined in terms of work: 1 joule (J) is the work done when applying a force of one newton over a distance of one meter: 1 J = 1 N·m. Hence, the concept of energy combines force (N) and distance (m).
  • Relativity theory establishes the relativistic energy-momentum relation $pc = Ev/c$, from which (together with $E = mc^2$) one can also derive $E^2 = p^2c^2 + m_0^2c^4$, with $m_0$ the rest mass of an object (i.e. its mass when the object would be at rest, relative to the observer, of course). When choosing time and/or distance units such that c = 1, these equations reduce to $p = Ev$ and $E^2 = p^2 + m_0^2$, while $E = mc^2$ becomes $E = m$. The mass m in $E = mc^2$ is the total mass of the object, i.e. its rest mass plus the equivalent mass of its kinetic energy.
  • The relationships above establish (a) energy and time and (b) momentum and position as complementary variables and, hence, the Uncertainty Principle can be expressed in terms of both. The Uncertainty Principle, as well as the Planck-Einstein relation and the de Broglie relation (not shown on the diagram), establish a quantum of action, h, whose dimension combines force, distance and time (h ≈ 6.626×10−34 N·m·s). This quantum of action (Wirkung) can be defined in various ways, as it pops up in more than one fundamental relation, but one of the more obvious approaches is to define h as the proportionality constant between the energy of a photon (i.e. the ‘light particle’) and its frequency: h = E/ν.
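The relativistic relations in the last bullets can be checked numerically. The sketch below is my own illustration, not something from Feynman. It assumes an electron (CODATA 2018 rest mass) moving at 0.6c and computes E = mc² and p = mv from the total mass m = γm0. It then checks that pc = Ev/c and E² = p²c² + m0²c⁴ both hold. The function name `relativistic_state` is hypothetical.

```python
import math

C = 299_792_458.0                # speed of light in m/s (exact by definition)
M0_ELECTRON = 9.1093837015e-31   # electron rest mass in kg (CODATA 2018)

def relativistic_state(m0, v):
    """Return (E, p) for a particle with rest mass m0 moving at speed v."""
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    m = gamma * m0      # total mass: rest mass plus the mass-equivalent of kinetic energy
    E = m * C ** 2      # E = mc^2
    p = m * v           # p = mv
    return E, p

v = 0.6 * C                                  # an arbitrary example speed (gamma = 1.25)
E, p = relativistic_state(M0_ELECTRON, v)

print(math.isclose(p * C, E * v / C))                                    # pc = Ev/c  → True
print(math.isclose(E ** 2, (p * C) ** 2 + (M0_ELECTRON * C ** 2) ** 2))  # E² = p²c² + m0²c⁴  → True
```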

Note that we talked about forces and energy above, but we didn’t say anything about the origin of these forces. That’s what we are going to do now, even if we’ll limit ourselves to the electromagnetic force only.


According to Wikipedia, electrostatics deals with the phenomena and properties of stationary or slow-moving electric charges with no acceleration. Feynman usually uses the term when talking about stationary charges only. If a current is involved (i.e. slow-moving charges with no acceleration), the term magnetostatics is preferred. However, the distinction does not matter all that much because – remarkably! – with stationary charges and steady currents, the electric and magnetic fields (E and B) can be analyzed as separate fields: there is no interconnection whatsoever! That shows, mathematically, as a neat separation between (1) Maxwell’s first and second equations and (2) Maxwell’s third and fourth equations:

  1. Electrostatics: (i) ∇•E = ρ/ε0 and (ii) ∇×E = 0.
  2. Magnetostatics: (iii) c²∇×B = j/ε0 and (iv) ∇•B = 0.

Electrostatics: The ρ in equation (i) is the so-called charge density, which describes the distribution of electric charges in space: ρ = ρ(x, y, z). To put it simply: ρ is the ‘amount of charge’ (which we’ll denote by Δq) per unit volume at a given point. As for ε0, that’s a constant which ensures all units are ‘compatible’. Equation (i) basically says we have some flux of E, the exact amount of which is determined by the charge density ρ or, more generally, by the charge distribution in space. As for equation (ii), i.e. ∇×E = 0, we can sort of forget about that. It means the curl of E is zero: everywhere, and always. So there’s no circulation of E. Hence, E is a so-called curl-free field, in this case at least, i.e. when only stationary charges and steady currents are involved.

Magnetostatics: The j in (iii) represents a steady current indeed, causing some circulation of B. The c² factor is related to the fact that magnetism is actually only a relativistic effect of electricity, but I can’t dwell on that here. I’ll just refer you to what Feynman writes about this in his Lectures, and warmly recommend reading it. Oh… Equation (iv), ∇•B = 0, means that the divergence of B is zero: everywhere, and always. So there’s no net flux of B out of any closed surface. None. So B is a divergence-free field.

Because of the neat separation, we’ll just forget about B and talk about E only.

The electric potential

OK. Let’s try to go through the motions as quickly as we can. As mentioned in my introduction, energy is defined in terms of work done. So we should just multiply the force and the distance, right? 1 joule = 1 newton × 1 meter, right? Well… Yes and no. In discussions like this, we talk about potential energy, i.e. energy stored in the system, so to say. That means that we’re looking at work done against the force, like when we carry a bucket of water up to the third floor or, to use a somewhat more scientific description of what’s going on, when we are separating two masses. Because we’re doing work against the force, we put a minus sign in front of our integral:

$$W = -\int_a^b \mathbf{F}\cdot d\mathbf{s}$$

Now, the electromagnetic force works pretty much like gravity, except that, when discussing gravity, we only have positive ‘charges’ (the mass of some object is always positive). In electromagnetics, we have positive as well as negative charge, and please note that two like charges repel (that’s not the case with gravity). Hence, doing work against the electromagnetic force may involve bringing like charges together or, alternatively, separating opposite charges. Which of the two it is depends on the signs of the charges involved. Fortunately, when it comes to the math of it, it doesn’t matter: we will have the same minus sign in front of our integral. The point is: we’re doing work against the force, and so that’s what the minus sign stands for. So it has nothing to do with the specifics of the law of attraction and repulsion in this case (electromagnetism as opposed to gravity) and/or the fact that electrons carry negative charge. No.

Let’s get back to the integral. Just in case you forgot, the integral sign ∫ stands for an S: the S of summa, i.e. sum in Latin, and we’re using these integrals because we’re adding an infinite number of infinitesimally small contributions to the total effort here indeed. You should recognize it, because it’s a general formula for energy or work. It is, once again, a so-called line integral, so it’s a bit different from the ∫f(x)dx stuff you learned in high school. Not very different, but different nevertheless. What’s different is that we have a vector dot product F•ds after the integral sign here, so that’s not like f(x)dx. In case you forgot, that f(x)dx product represents the area of an infinitesimally narrow rectangle, as shown below: we make the base of the rectangle smaller and smaller, so dx becomes an infinitesimal indeed. And then we add them all up and get the area under the curve. If f(x) is negative, then the contributions will be negative.


But we don’t have little rectangles here. We have two vectors, F and ds, and their vector dot product, F•ds, which will give you… Well… I am tempted to write: the tangential component of the force along the path, but that’s not quite correct. If ds were a unit vector, it would be true, because then it’s just like that h•n product I introduced in our first vector calculus class. However, ds is not a unit vector: it’s an infinitesimal vector and, hence, if we write the tangential component of the force along the path as Ft, then F•ds = |F||ds|cosθ = F·cosθ·ds = Ft·ds. So this F•ds is the tangential component multiplied by the length of an infinitesimally small segment of the curve. In short, it’s an infinitesimally small contribution to the total amount of work done indeed. You can make sense of this by looking at the geometrical representation of the situation below.

[Illustration: the force F and the path element ds along two possible paths from a to b]

I am just saying this so you know what that integral stands for. Note that we’re not adding arrows once again, like we did when calculating amplitudes or so. It’s all much more straightforward really: a vector dot product is a scalar, so it’s just some real number, just like any component of a vector (tangential, normal, in the direction of one of the coordinate axes, or in whatever direction) is not a vector but a real number. Hence, W is also just some real number. It can be positive or negative because… Well… When we’d be going down the stairs with our bucket of water, our minus sign doesn’t disappear. Indeed, our convention to put that minus sign there should obviously not depend on which points a and b we’re talking about, so we may actually be going along the direction of the force when going from a to b.

As a matter of fact, you should note that’s actually the situation which is depicted above. So then we get a negative number for W. Does that make sense? Of course it does: we’re obviously not doing any work against the force here, as we’re moving along its direction, so we’re surely not adding any (potential) energy to the system. On the contrary, we’re taking energy out of the system. Hence, we are reducing its (potential) energy and, hence, we should have a negative value for W indeed. So, just think of the minus sign as being there to ensure we add potential energy to the system when going against the force, and reduce it when going with the force.
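The sign convention can be made concrete with a small numerical line integral. This is a minimal sketch under my own assumptions: a 2D path, the midpoint rule, a hypothetical function name `work_against_force`, and an arbitrary uniform 2 N force along +x. It sums –F•ds over many small segments. Moving with the force gives a negative W, and moving against it gives a positive W.

```python
def work_against_force(force, path, n=10_000):
    """Approximate W = -∫ F·ds along the path r(t), 0 <= t <= 1 (midpoint rule, 2D).

    force: function (x, y) -> (Fx, Fy)
    path:  function t -> (x, y), with path(0) = a and path(1) = b
    """
    w = 0.0
    x0, y0 = path(0.0)
    for k in range(1, n + 1):
        x1, y1 = path(k / n)
        # evaluate F at the midpoint of the small segment ds = (x1 - x0, y1 - y0)
        fx, fy = force((x0 + x1) / 2, (y0 + y1) / 2)
        w -= fx * (x1 - x0) + fy * (y1 - y0)   # minus sign: work done against F
        x0, y0 = x1, y1
    return w

uniform = lambda x, y: (2.0, 0.0)                                            # constant 2 N force along +x
with_force = work_against_force(uniform, lambda t: (3.0 * t, 0.0))           # a = (0, 0) -> b = (3, 0)
against_force = work_against_force(uniform, lambda t: (3.0 - 3.0 * t, 0.0))  # b -> a
print(round(with_force, 9), round(against_force, 9))                         # → -6.0 6.0
```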

OK. You get this. You probably also know we’ll re-define W as a difference in potential between two points, which we’ll write as Φ(b) – Φ(a). Now that should remind you of your high school integral ∫f(x)dx once again. For a definite integral over a line segment [a, b], you’d have to find the antiderivative of f(x), which you’d write as F(x), and then you’d take the difference F(b) – F(a) too. Now, you may or may not remember that this antiderivative was actually a family of functions F(x) + k, and k could be any constant – 5/9, 6π, 3.6×10¹²⁴, 0.86, whatever! – because such a constant vanishes when taking the derivative.

Here we have the same: we can define an infinite number of functions Φ(r) + k, of which the gradient will yield… Stop! I am going too fast here. First, we need to re-write that W function above in order to ensure we’re calculating stuff in terms of the unit charge, so we write:

$$W(\text{unit}) = -\int_a^b \mathbf{E}\cdot d\mathbf{s}$$

Huh? Well… Yes. I am using the definition of the field E here really: E is the force (F) when putting a unit charge in the field. Hence, if we want the work done per unit charge, i.e. W(unit), then we have to integrate the vector dot product E•ds over the path from a to b. But so now you see what I want to do. It makes the comparison with our high school integral complete. Instead of taking a derivative with respect to one variable only, i.e. dF(x)/dx = f(x), we have a function Φ here not in one but in three variables: Φ = Φ(x, y, z) = Φ(r) and, therefore, we have to take the vector derivative (or gradient as it’s called) of Φ to get E:

∇Φ(x, y, z) = (∂Φ/∂x, ∂Φ/∂y, ∂Φ/∂z) = –E(x, y, z)

But so it’s the same principle as what you learned how to use to solve your high school integral. Now, you’ll usually see the expression above written as:

E = –∇Φ

Why so short? Well… We all just love these mysterious abbreviations, don’t we? 🙂 Jokes aside, it’s true some of those vector equations pack an awful lot of information. Just take Feynman’s advice here: “If it helps to write out the components to be sure you understand what’s going on, just do it. There is nothing inelegant about that. In fact, there is often a certain cleverness in doing just that.” So… Let’s move on.
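Following Feynman’s advice to write out the components, here is a small sketch that computes E = –∇Φ component by component with central differences. It assumes the point-charge potential Φ = q/4πε0r, which is derived further down in this post, with an arbitrary 1 nC charge at the origin. It then compares the result with Coulomb’s law. The function names are my own, and the step size h is an assumption that works for this smooth potential.

```python
import math

EPS0 = 8.8541878128e-12           # electric constant ε0 in F/m (CODATA 2018)
K = 1 / (4 * math.pi * EPS0)      # Coulomb constant 1/(4πε0)
Q = 1e-9                          # source charge at the origin, in C (arbitrary)

def phi(x, y, z):
    """Potential of the point charge Q: Φ = Q/(4πε0 r)."""
    return K * Q / math.sqrt(x * x + y * y + z * z)

def minus_gradient(f, x, y, z, h=1e-5):
    """E = -∇Φ, written out component by component (central differences)."""
    ex = -(f(x + h, y, z) - f(x - h, y, z)) / (2 * h)
    ey = -(f(x, y + h, z) - f(x, y - h, z)) / (2 * h)
    ez = -(f(x, y, z + h) - f(x, y, z - h)) / (2 * h)
    return ex, ey, ez

x, y, z = 0.3, -0.4, 1.2                 # a sample point at distance r = 1.3 m
r = math.sqrt(x * x + y * y + z * z)
ex, ey, ez = minus_gradient(phi, x, y, z)
E_coulomb = K * Q / r ** 2               # magnitude from Coulomb's law, directed away from Q
# each ratio should be (very close to) 1
print(ex / (E_coulomb * x / r), ey / (E_coulomb * y / r), ez / (E_coulomb * z / r))
```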

I should mention that we can only apply this more sophisticated version of the ‘high school trick’ because Φ and E are like temperature (T) and heat flow (h): they are fields. T is a scalar field and h is a vector field, and so that’s why we can and should apply our new trick: if we have the scalar field, we can derive the vector field. In case you want more details, I’ll just refer you to our first vector calculus class. Indeed, our so-called First Theorem in vector calculus was just about the more sophisticated version of the ‘high school trick’: if we have some scalar field ψ (like temperature or potential, for example: just substitute T or Φ for ψ in the equation below), then we’ll always find that:

$$\psi(2) - \psi(1) = \int_{(1)\ \text{along}\ \Gamma}^{(2)} \nabla\psi\cdot d\mathbf{s}$$

The Γ here is the curve between points 1 and 2, so that’s the path along which we’re going, and ∇ψ is the vector field derived from the scalar field ψ.

Let’s go back to our W integral. I should mention that it doesn’t matter what path we take: we’ll always get the same value for W. That’s why the illustration above showed two possible paths: it doesn’t matter which one we take. Again, that’s only because E is the gradient of a scalar field. To be precise, the electrostatic field is a so-called conservative vector field, which means that we can’t get energy out of the field by first carrying some charge along one path, and then carrying it back along another. You’ll probably find that’s obvious, and it is. Just note it somewhere in the back of your mind.
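Path independence can also be checked numerically. The sketch below is illustrative only. It assumes a 1 nC point charge at the origin, with everything kept in the z = 0 plane, and the two paths are my own arbitrary choices: a straight line and a quarter-ellipse, both from a = (2, 0) to b = (0, 1). For each path it computes W(unit) = –∫E•ds and compares both results with Φ(b) – Φ(a), using Φ = q/4πε0r.

```python
import math

EPS0 = 8.8541878128e-12           # electric constant ε0 in F/m (CODATA 2018)
K = 1 / (4 * math.pi * EPS0)
Q = 1e-9                          # source charge at the origin, in C (arbitrary)

def E_field(x, y):
    """Field of the point charge Q in the z = 0 plane (points away from Q for Q > 0)."""
    r3 = (x * x + y * y) ** 1.5
    return K * Q * x / r3, K * Q * y / r3

def phi(x, y):
    """Potential of Q, with zero potential at infinity."""
    return K * Q / math.hypot(x, y)

def work_per_unit_charge(path, n=20_000):
    """W(unit) = -∫ E·ds along path(t), 0 <= t <= 1, with the midpoint rule."""
    w = 0.0
    x0, y0 = path(0.0)
    for k in range(1, n + 1):
        x1, y1 = path(k / n)
        ex, ey = E_field((x0 + x1) / 2, (y0 + y1) / 2)
        w -= ex * (x1 - x0) + ey * (y1 - y0)
        x0, y0 = x1, y1
    return w

# two different paths from a = (2, 0) to b = (0, 1)
straight = lambda t: (2.0 - 2.0 * t, t)
detour = lambda t: (2.0 * math.cos(math.pi * t / 2), math.sin(math.pi * t / 2))

w_straight = work_per_unit_charge(straight)
w_detour = work_per_unit_charge(detour)
print(w_straight, w_detour, phi(0.0, 1.0) - phi(2.0, 0.0))   # all three agree
```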

So we’re done. We should just substitute ∇Φ for E, shouldn’t we? Well… Yes. Minus ∇Φ, that is. Another minus sign. Why? Well… It makes that W(unit) integral come out alright. Indeed, we want a formula like W = Φ(b) – Φ(a), not like Φ(a) – Φ(b). Look at it. We could, indeed, define E as the (positive) gradient of some scalar field ψ = –Φ, and so we could write E = ∇ψ, but then we’d find that W = –[ψ(b) – ψ(a)] = ψ(a) – ψ(b).

You’ll say: so what? Well… Nothing much. It’s just that our field vectors would point from lower to higher values of ψ, so they would be flowing uphill, so to say. Now, we don’t want that in physics. Why? It just doesn’t look good. We want our field vectors to be directed from higher potential to lower potential, always. Just think of it: heat (h) flows from higher temperature (T) to lower, and Newton’s apple falls from greater to lower height. Likewise, when putting a unit charge in the field, we want to see it move from higher to lower electric potential. Now, we can’t change the direction of E, because that’s the direction of the force, and Nature doesn’t care about our conventions, so we can’t choose the direction of the force. But we can choose our convention. So that’s why we put a minus sign in front of ∇Φ when writing E = –∇Φ. It makes everything come out alright. 🙂 That’s also why we have a minus sign in the differential heat flow equation: h = –κ∇T.

So now we have the easy W(unit) = Φ(b) – Φ(a) formula that we wanted all along. Now, note that, when we say a unit charge, we mean a plus one charge. Yes: +1. So that’s the charge of the proton (it’s denoted by e) so you should stop thinking about moving electrons around! [I am saying this because I used to confuse myself by doing that. You end up with the same formulas for W and Φ but it just takes you longer to get there, so let me save you some time here. :-)]

But… Yes? In reality, it’s electrons going through a wire, isn’t it? Not protons. Yes. But it doesn’t matter. Units are units in physics, and they’re always +1, for whatever (time, distance, charge, mass, spin, etcetera). Always. For whatever. Also note that in laboratory experiments, or particle accelerators, we often use protons instead of electrons, so there’s nothing weird about it. Finally, and most fundamentally, if we have a –e charge moving through a neutral wire in one direction, then that’s exactly the same as a +e charge moving in the other direction.

Just to make sure you get the point, let’s look at that illustration once again. We already said that we have F and, hence, E pointing from a to b and we’ll be reducing the potential energy of the system when moving our unit charge from a to b, so W was some negative value. Now, taking into account we want field lines to point from higher to lower potential, Φ(a) should be larger than Φ(b), and so… Well… Yes. It all makes sense: we have a negative difference Φ(b) – Φ(a) = W(unit), which amounts, of course, to the reduction in potential energy.

The last thing we need to take care of now is the reference point. Indeed, any Φ(r) + k function will do, so which one do we take? The approach here is to take a reference point P0 at infinity. What’s infinity? Well… Hard to say. It’s a place that’s very far away from all of the charges we’ve got lying around here. Very far away indeed. So far away we can say there is nothing there really. No charges whatsoever. 🙂 Something like that. 🙂 In any case. I need to move on. So Φ(P0) is zero and so we can finally jot down the grand result for the electric potential Φ(P) (aka the electrostatic potential):

$$\Phi(P) = -\int_{P_0}^{P} \mathbf{E}\cdot d\mathbf{s}$$

So now we can calculate all potentials, at least when we know where the charges are. I’ve shown an example below. As you can see, besides having zero potential at infinity, we will usually also have one or more equipotential surfaces with zero potential (at least when both positive and negative charges are present). One could say these zero-potential surfaces sort of ‘separate’ the positive and negative space. That’s not a very scientifically accurate description but you know what I mean.


Let me make a few final notes about the units. First, let me, once again, note that our unit charge is plus one, and it will flow from positive to negative potential indeed, as shown below, even if we know that, in an actual electric circuit, and so now I am talking about a copper wire or something similar, that means the (free) electrons will move in the other direction.

[Figure: current notation]

If you’re smart (and you are), you’ll say: what about the right-hand rule for the magnetic force? Well… We’re not discussing the magnetic force here but, because you insist, rest assured it comes out alright. Look at the illustration below of the magnetic force on a wire with a current, which is a pretty standard one.

[Figure: the magnetic force on a current-carrying wire placed near a bar magnet]

So we have a given B, because of the bar magnet, and then v, the velocity vector for the… Electrons? No. You need to be consistent. It’s the velocity vector for the unit charges, which are positive (+e). Now just calculate the force F = qv×B = ev×B using the right-hand rule for the vector cross product, as illustrated below. So v is the thumb and B is the index finger in this case. All you need to do is tilt your hand, and it comes out alright.


But… We know it’s electrons going the other way. Well… If you insist. But then you have to put a minus sign in front of the q, because we’re talking minus e (–e). So now v is in the other direction and so v×B is in the other direction indeed, but our force F = qv×B = –ev×B is not. Fortunately not, because physical reality should not depend on our conventions. 🙂 So… What’s the conclusion? Nothing much. You may or may not want to remember that, when we say that our current j flows in this or that direction, we might actually be talking about electrons (with charge minus one) flowing in the opposite direction, but it doesn’t matter. In addition, as mentioned above, in laboratory experiments or accelerators, we may actually be talking protons instead of electrons, so don’t assume electromagnetism is the business of electrons only.
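A few lines of vector arithmetic confirm that the two pictures give the same force. This is a sketch with illustrative values I picked myself: B is 0.5 T along z, and the +e unit charges drift at 1 mm/s along +x. It computes F = qv×B once for +e moving along v and once for –e moving along –v. The helper names are hypothetical.

```python
E_CHARGE = 1.602176634e-19   # elementary charge in C (exact since the 2019 SI redefinition)

def cross(a, b):
    """Vector cross product a × b of two 3-tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def magnetic_force(q, v, B):
    """Magnetic part of the Lorentz force, F = q v × B."""
    return tuple(q * comp for comp in cross(v, B))

B = (0.0, 0.0, 0.5)                          # 0.5 T along z (illustrative value)
v = (1e-3, 0.0, 0.0)                         # +e 'unit charges' drifting along +x (illustrative)
v_electrons = tuple(-comp for comp in v)     # the electrons actually drift the other way

F_unit_charges = magnetic_force(+E_CHARGE, v, B)
F_electrons = magnetic_force(-E_CHARGE, v_electrons, B)
print(F_unit_charges)   # force along -y
print(F_electrons)      # the same force: (–e)(–v) × B = e v × B
```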

To conclude this disproportionately long introduction (we’re finally ready to talk more difficult stuff), I should just make a note on the units. Electric potential is measured in volts, as you know. However, it’s obvious from all that I wrote above that it’s the difference in potential that matters really. From the definition above, it should be measured in the same unit as our unit for energy, or for work, so that’s the joule. To be precise, it should be measured in joule per unit charge. But here we have one of the very few inconsistencies in physics when it comes to units. The proton is said to be the unit charge (e), but its actual value is measured in coulomb (C). To be precise: +1 e = 1.602176565(35)×10⁻¹⁹ C. [That’s the CODATA 2010 value: since the 2019 redefinition of the SI units, e is fixed at exactly 1.602176634×10⁻¹⁹ C.] So we do not measure voltage – sorry, potential difference 🙂 – in joule but in joule per coulomb (J/C).

Now, we usually use another term for the joule/coulomb unit. You guessed it (because I said it): it’s the volt (V). One volt is one joule per coulomb: 1 V = 1 J/C. That’s not fair, you’ll say. You’re right, but the proton charge e is not a so-called SI unit. Is the coulomb an SI unit? Yes. It’s derived from the ampere (A) which, believe it or not, is actually an SI base unit: one ampere is one coulomb per second, i.e. about 6.241×10¹⁸ elementary charges per second. You may wonder how the ampere can be a base unit. Can it be expressed in terms of the kilogram, meter and second? Its definition does involve them. Before 2019, the ampere was defined through the magnetic force between two parallel current-carrying wires, and since 2019 it has been defined by fixing the numerical value of e. As you can imagine, the details are a bit complex, so I’ll refer you to the Web for that.
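The unit arithmetic in the last two paragraphs is easy to reproduce. This minimal sketch uses the exact post-2019 value of e. It counts how many elementary charges make up one coulomb, and computes the work needed to move one unit charge +e through a potential difference of 1 V.

```python
E_CHARGE = 1.602176634e-19   # elementary charge e in coulomb (exact since the 2019 SI redefinition)

# 1 A = 1 C/s, so one ampere carries 1/e elementary charges per second
charges_per_coulomb = 1.0 / E_CHARGE
print(f"{charges_per_coulomb:.4e}")   # → 6.2415e+18

# 1 V = 1 J/C: the work (in J) needed to move one unit charge +e through 1 V
work_for_one_e = E_CHARGE * 1.0
print(work_for_one_e)                 # → 1.602176634e-19
```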

The Poisson equation

I started this post by saying that I’d talk about fields and present some results from electrostatics using our ‘new’ vector differential operators, so it’s about time I do that. The first equation is a simple one. Using our E = –∇Φ formula, we can re-write the ∇•E = ρ/ε0 equation as:

∇•E = –∇•∇Φ = –∇²Φ = ρ/ε0, or: ∇²Φ = –ρ/ε0

This is a so-called Poisson equation. The ∇² operator is referred to as the Laplacian and is sometimes also written as Δ, but I don’t like that because Δ is also the symbol for a (finite) difference, as in Δq, and that’s definitely not the same thing. The formula for the Laplacian is given below. Note that it acts on a scalar field (i.e. the potential function Φ in this case).

$$\nabla^2 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}$$

As Feynman notes: “The entire subject of electrostatics is merely the study of the solutions of this one equation.” However, I should note that this doesn’t prevent Feynman from devoting at least a dozen of his Lectures to it, and they’re not the easiest ones to read. [In case you’d doubt this statement, just have a look at his lecture on electric dipoles, for example.] In short: don’t think the ‘study of this one equation’ is easy. All I’ll do is just note some of the most fundamental results of this ‘study’.
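To see the Poisson equation at work, the sketch below applies a finite-difference Laplacian to a known potential and checks that ∇²Φ = –ρ/ε0. It assumes a standard textbook result that is not derived in this post: inside a uniformly charged sphere of radius R, with zero potential at infinity, Φ(r) = ρ(3R² – r²)/(6ε0). The charge density, radius and sample point are arbitrary illustrative values.

```python
import math

EPS0 = 8.8541878128e-12   # electric constant ε0 in F/m (CODATA 2018)
RHO = 1e-6                # uniform charge density in C/m³ (arbitrary)
R = 1.0                   # radius of the charged sphere in m (arbitrary)

def phi_inside(x, y, z):
    """Potential inside a uniformly charged sphere, with Φ = 0 at infinity:
    Φ(r) = ρ(3R² - r²)/(6ε0)  (a standard textbook result, assumed here)."""
    r2 = x * x + y * y + z * z
    return RHO * (3 * R * R - r2) / (6 * EPS0)

def laplacian(f, x, y, z, h=1e-3):
    """∇²f = ∂²f/∂x² + ∂²f/∂y² + ∂²f/∂z², each term by a central second difference."""
    c = f(x, y, z)
    d2x = (f(x + h, y, z) - 2 * c + f(x - h, y, z)) / h ** 2
    d2y = (f(x, y + h, z) - 2 * c + f(x, y - h, z)) / h ** 2
    d2z = (f(x, y, z + h) - 2 * c + f(x, y, z - h)) / h ** 2
    return d2x + d2y + d2z

lap = laplacian(phi_inside, 0.2, 0.1, -0.3)   # any point inside the sphere will do
print(lap, -RHO / EPS0)                        # the two numbers agree: ∇²Φ = -ρ/ε0
```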

Also note that ∇•E is one of our ‘new’ vector differential operators indeed: it’s the vector dot product of our del operator (∇) with E. That’s something very different from, let’s say, ∇Φ. A little dot and some bold-face type make an enormous difference here. 🙂 You may or may not remember that we referred to the ∇• operator as the divergence (div) operator (see my post on that).

Gauss’ Law

Gauss’ Law is not to be confused with Gauss’ Theorem, about which I wrote elsewhere. It gives the flux of E through a closed surface S, any closed surface S really, as the sum of all charges inside the surface divided by the electric constant ε0 (but then you know that constant is just there to make the units come out alright).

$$\oint_S \mathbf{E}\cdot\mathbf{n}\,da = \frac{Q_{\text{int}}}{\varepsilon_0}$$

The derivation of Gauss’ Law is a bit lengthy, which is why I won’t reproduce it here, but you should note its derivation is based, mainly, on the fact that (a) surface areas are proportional to r2 (so if we double the distance from the source, the surface area will quadruple), and (b) the magnitude of E is given by an inverse-square law, so it decreases as 1/r2. That explains why, if the surface S describes a sphere, the number we get from Gauss’ Law is independent of the radius of the sphere. The diagram below (credit goes to Wikipedia) illustrates the idea.


The diagram can be used to show how a field and its flux can be represented. Indeed, the lines represent the flux of E emanating from a charge. Now, the total number of flux lines depends on the charge but is constant with increasing distance because the force is radial and spherically symmetric. A greater density of flux lines (lines per unit area) means a stronger field, with the density of flux lines (i.e. the magnitude of E) following an inverse-square law indeed, because the surface area of a sphere increases with the square of the radius. Hence, in Gauss’ Law, the two effects cancel out: the two factors vary with distance, but their product is a constant.
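Gauss’ Law says the flux depends only on the enclosed charge, not on the size of the surface or on where the charge sits inside it. This is easy to test numerically. The sketch makes my own simplifying assumptions: a sphere centred on the origin and a point charge on the z-axis, so that by symmetry the surface integral reduces to a sum over thin rings in the polar angle θ. The function name and the values are illustrative.

```python
import math

EPS0 = 8.8541878128e-12           # electric constant ε0 in F/m (CODATA 2018)
K = 1 / (4 * math.pi * EPS0)

def flux_through_sphere(q, d, R, n=20_000):
    """Flux of E through a sphere of radius R centred on the origin, for a point
    charge q on the z-axis at distance d from the centre (d < R: inside, d > R: outside).
    By symmetry about the z-axis the surface integral reduces to an integral over
    the polar angle θ, summed here as thin rings (midpoint rule)."""
    total = 0.0
    dtheta = math.pi / n
    for k in range(n):
        theta = (k + 0.5) * dtheta
        cos_t = math.cos(theta)
        s2 = R * R + d * d - 2 * R * d * cos_t            # squared distance from q to the ring
        e_dot_n = K * q * (R - d * cos_t) / s2 ** 1.5      # normal component of E on the ring
        ring_area = 2 * math.pi * R * R * math.sin(theta) * dtheta
        total += e_dot_n * ring_area
    return total

q = 1e-9   # 1 nC (arbitrary)
print(flux_through_sphere(q, 0.5, 1.0))   # charge inside, off-centre
print(flux_through_sphere(q, 0.5, 3.0))   # same charge, bigger sphere: same flux
print(flux_through_sphere(q, 2.0, 1.0))   # charge outside: (almost) zero
print(q / EPS0)                           # Gauss' Law: Q_int / ε0
```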

Now, if we describe the location of charges in terms of charge densities (ρ), then we can write Q_int as:

$$Q_{\text{int}} = \int_V \rho\,dV$$

Now, Gauss’ Law also applies to an infinitesimal cubical surface and, in one of my posts on vector calculus, I showed that the flux of E out of such a cube is given by ∇•E·dV. At this point, it’s probably a good idea to remind you of what this ‘new’ vector differential operator ∇•, i.e. our ‘divergence’ operator, stands for: the divergence of E (i.e. ∇• applied to E, so that’s ∇•E) represents the volume density of the flux of E out of an infinitesimal volume around a given point. Hence, it’s the flux per unit volume, as opposed to the flux out of the infinitesimal cube itself, which is the product of ∇•E and dV, i.e. ∇•E·dV.

So what? Well… Gauss’ Law applied to our infinitesimal volume gives us the following equality:

$$\nabla\cdot\mathbf{E}\,dV = \frac{\rho\,dV}{\varepsilon_0}$$

That, in turn, simplifies to:

$$\nabla\cdot\mathbf{E} = \frac{\rho}{\varepsilon_0}$$

So that’s Maxwell’s first equation once again, which is equivalent to our Poisson equation: with E = –∇Φ, ∇•E = ρ/ε0 becomes ∇²Φ = –ρ/ε0. So what are we doing here? Just listing equivalent formulas? Yes. I should also note they can be derived from Coulomb’s law of force, which is probably the one you learned in high school. So… Yes. It’s all consistent. But then that’s what we should expect, of course. 🙂

The energy in a field

All these formulas look very abstract. It’s about time we use them for something. A lot of what’s written in Feynman’s Lectures on electrostatics is applied stuff indeed: it focuses, among other things, on calculating the potential in various circumstances and for various distributions of charge. Now, funnily enough, while that ∇•E = ρ/ε0 equation is equivalent to Coulomb’s law and, obviously, much more compact to write down, Coulomb’s law is easier to start with for basic calculations. Let me first write Coulomb’s law. You’ll probably recognize it from your high school days:

$$\mathbf{F}_1 = \frac{1}{4\pi\varepsilon_0}\frac{q_1 q_2}{r_{12}^2}\,\mathbf{e}_{12} = -\mathbf{F}_2$$

F1 is the force on charge q1, and F2 is the force on charge q2. Now, q1 and q2 may attract or repel each other but, in both cases, the forces will be equal and opposite. [In case you wonder, yes, that’s basically the law of action and reaction.] The e12 vector is the unit vector from q2 to q1, not from q1 to q2, as one might expect. That’s because we’re not talking gravity here: like charges do not attract but repel. So if q1 and q2 have the same sign, the force on q1 points away from q2, i.e. along e12. Having said that, that’s basically the only peculiar thing about the equation. All the rest is standard:

  1. The force is inversely proportional to the square of the distance and so we have an inverse-square law here indeed.
  2. The force is proportional to the charge(s).
  3. Finally, we have a proportionality constant, 1/4πε0, which makes the units come out alright. You may wonder why it’s written the way it’s written, i.e. with that 4π factor, but that factor (4π or 2π) actually disappears in a number of calculations, so then we will be left with just a 1/ε0 or a 1/2ε0 factor. So don’t worry about it.

We want to calculate potentials and all that, so the first thing we’ll do is calculate the force on a unit charge. So we’ll divide that equation by q1, to calculate E(1) = F1/q1:

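$$\mathbf{E}(1) = \frac{\mathbf{F}_1}{q_1} = \frac{1}{4\pi\varepsilon_0}\frac{q_2}{r_{12}^2}\,\mathbf{e}_{12}$$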

Piece of cake. But… What’s E(1) really? Well… It’s the force on the unit charge (+e), but so it doesn’t matter whether or not that unit charge is actually there, so it’s the field E caused by a charge q2. [If that doesn’t make sense to you, think again.] So we can drop the subscripts and just write:

$$E = \frac{q}{4\pi\varepsilon_0 r^2}$$

What a relief, isn’t it? The simplest formula ever: the magnitude of the field as a simple function of the charge q and its distance (r) from the point that we’re looking at, which we’ll write as P = (x, y, z). But what origin are we using to measure x, y and z? Don’t be surprised: the origin is q.

Now that’s a formula we can use in the Φ(P) integral. Indeed, we need the antiderivative ∫(q/4πε0r²)dr. We can bring q/4πε0 out and so we’re left with ∫(1/r²)dr, which is equal to –1/r + k, and so the whole antiderivative is –q/4πε0r + k. However, the minus sign cancels out with the minus sign in front of the Φ(P) = Φ(x, y, z) integral, and so we get:

$$\Phi(P) = \Phi(x, y, z) = \frac{q}{4\pi\varepsilon_0 r}$$

You should just do the integral to check this result. It’s the same integral but with P0 (infinity) as point a and P as point b, so we have ∞ as start value and r as end value. The integral itself yields –(q/4πε0)·[1/r – 1/∞]. [The k constant falls away when subtracting Φ(P0) from Φ(P).] But 1/∞ = 0, and the minus sign in front of the integral cancels the sign of –q/4πε0, so Φ(P) – Φ(P0) = q/4πε0r, with Φ(P0) = 0. So, yes, we get the wonderfully simple result above. Also please do quickly check if it makes sense in terms of sign: the unit charge is +e, so that’s a positive charge. Hence, Φ(x, y, z) will be positive if the sign of q is also positive, but negative if q happens to be negative. So that’s OK.
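You can also let a computer ‘bring the unit charge in from infinity’. In this sketch (my own, not from the Lectures) the potential is computed as –∫E dr along a radial line. The integration starts at a large but finite distance r_far, which stands in for infinity. The integral is taken in the variable u = ln r so that the huge range is sampled evenly. As r_far grows, the result approaches q/4πε0r. The function name and values are hypothetical.

```python
import math

EPS0 = 8.8541878128e-12           # electric constant ε0 in F/m (CODATA 2018)
K = 1 / (4 * math.pi * EPS0)

def potential_from_far_away(q, r, r_far, n=100_000):
    """Φ(r) ≈ -∫ E dr' along a radial line from r' = r_far in to r' = r, with E = q/(4πε0 r'²).
    'Infinity' is replaced by the large but finite distance r_far (an assumption of this sketch).
    The integral is taken in u = ln r', so that dr' = r' du (midpoint rule)."""
    u_start, u_end = math.log(r_far), math.log(r)
    du = (u_end - u_start) / n          # negative: we are moving inwards
    total = 0.0
    for k in range(n):
        rp = math.exp(u_start + (k + 0.5) * du)
        total += K * q / rp ** 2 * rp * du
    return -total

q, r = 1e-9, 1.0                        # 1 nC, evaluated at 1 m (arbitrary)
for r_far in (1e2, 1e4, 1e6):
    print(r_far, potential_from_far_away(q, r, r_far), K * q / r)
```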

Also note that the potential – which, remember, represents the amount of work to be done when bringing a unit charge (+e) from infinity to some distance r from a charge q – is proportional to the charge q. We also know that the force and, hence, the work is proportional to the charge that we are bringing in (that’s how we calculated the work per unit charge in the first place: by dividing the total amount of work by the charge). Hence, if we’d not bring some unit charge but some other charge q2, the work done would also be proportional to q2. Now, we need to make sure we understand what we’re writing and so let’s tidy up and re-label our first charge once again as q1, and the distance r as r12, because that’s what r is: the distance between the two charges. We then have another obvious but nice result: the work done in bringing two charges together from a large distance (infinity) is

$$U = \frac{q_1 q_2}{4\pi\varepsilon_0 r_{12}}$$

Now, one of the many nice properties of fields (scalar or vector fields) and the associated energies (because that’s what we are talking about here) is that we can simply add up contributions. For example, if we’d have many charges and we’d want to calculate the potential Φ at a point which we call 1, we can use the same Φ(r) = q/4πε0r formula which we had derived for one charge only, for all charges, and then we simply add the contributions of each to get the total potential:

$$\Phi(1) = \sum_j \frac{q_j}{4\pi\varepsilon_0 r_{1j}}$$

Now that we’re here, I should, of course, also give the continuum version of this formula, i.e. the formula used when we’re talking charge densities rather than individual charges. The sum then becomes an infinite sum (i.e. an integral), and each charge qj becomes the charge ρ(2)dV2 in a little volume element dV2 at point 2, at a distance r12 from point 1. We get:

$$\Phi(1) = \int \frac{\rho(2)\,dV_2}{4\pi\varepsilon_0 r_{12}}$$

Going back to the discrete situation, we get the same type of sum when bringing multiple pairs of charges qi and qj together. Hence, the total electrostatic energy U is the sum of the energies of all possible pairs of charges:

$$U = \sum_{\text{all pairs}} \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}}$$

It’s been a while since you’ve seen any diagram or so, so let me insert one just to reassure you it’s as simple as that indeed:

[Figure: the electrostatic energy U of a system of charges]
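The pair sum translates directly into a few lines of Python. The sketch below uses hypothetical function names and arbitrary charge values. It computes U once as a sum over all unordered pairs. It also computes U as ½ Σ qiΦ(i), where Φ(i) is the potential at charge i due to all the others. That second version counts every pair twice and halves the result, which is exactly the double-counting issue discussed next. Both give the same number.

```python
import itertools
import math

EPS0 = 8.8541878128e-12           # electric constant ε0 in F/m (CODATA 2018)
K = 1 / (4 * math.pi * EPS0)

def energy_all_pairs(charges):
    """U = Σ over all pairs of q_i q_j / (4πε0 r_ij); itertools.combinations
    yields every unordered pair exactly once, so nothing is double-counted."""
    return sum(K * qi * qj / math.dist(pos_i, pos_j)
               for (qi, pos_i), (qj, pos_j) in itertools.combinations(charges, 2))

def energy_half_sum(charges):
    """U = ½ Σ_i q_i Φ(i), with Φ(i) the potential at charge i due to all the others.
    Each pair now shows up twice, once from each end, hence the ½."""
    total = 0.0
    for i, (qi, pos_i) in enumerate(charges):
        phi_i = sum(K * qj / math.dist(pos_i, pos_j)
                    for j, (qj, pos_j) in enumerate(charges) if j != i)
        total += qi * phi_i
    return total / 2

# three point charges (C) at positions (m): arbitrary illustrative values
charges = [(1e-9, (0.0, 0.0, 0.0)),
           (-2e-9, (0.1, 0.0, 0.0)),
           (3e-9, (0.0, 0.2, 0.05))]
print(energy_all_pairs(charges), energy_half_sum(charges))
```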

Now, we have to be aware of the risk of double-counting, of course. We should not be adding qiqj/4πε0rij twice. That’s why we write ‘all pairs’ under the ∑ summation sign, instead of the usual i, j subscripts. The continuum version of this equation below makes that 1/2 factor explicit:

$$U = \frac{1}{2}\iint \frac{\rho(1)\,\rho(2)}{4\pi\varepsilon_0 r_{12}}\,dV_1\,dV_2$$

Hmm… What kind of integral is that? It’s a so-called double integral because we have two variables here. Not easy. However, there’s a lucky break. We can use the continuum version of our formula for Φ(1) to get rid of the ρ(2) and dV2 variables and reduce the whole thing to a more standard ‘single’ integral. Indeed, we can write:

$$U = \frac{1}{2}\int \rho(1)\,\Phi(1)\,dV_1$$

Now, because our point (2) no longer appears, we can actually write that more elegantly as:

$$U = \frac{1}{2}\int \rho\,\Phi\,dV$$

That looks nice, doesn’t it? But do we understand it? Just to make sure, let me explain it. The potential energy of the charge ρdV is the product of this charge and the potential at the same point. The total energy is therefore the integral of ΦρdV, but then we are counting energies twice, so that’s why we need the 1/2 factor. Now, we can write this even more beautifully as:

$$U = \frac{\varepsilon_0}{2}\int \mathbf{E}\cdot\mathbf{E}\,dV$$

Isn’t this wonderful? We have an expression for the energy of a field, not in terms of the charges or the charge distribution, but in terms of the field they produce.
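The claim that the two expressions give the same energy can be checked for a concrete charge distribution. The sketch assumes a uniformly charged sphere and uses two standard results that are not derived in this post. The first is the field from Gauss’ Law, E = Qr/4πε0R³ inside and Q/4πε0r² outside. The second is the potential inside, Φ = ρ(3R² – r²)/6ε0. It computes U = ½∫ρΦ dV and U = (ε0/2)∫E•E dV with radial midpoint sums, using a large cut-off r_max in place of infinity. Both are compared with the known total 3Q²/20πε0R. The values are arbitrary.

```python
import math

EPS0 = 8.8541878128e-12                 # electric constant ε0 in F/m (CODATA 2018)
Q = 1e-9                                # total charge in C (arbitrary)
R = 0.1                                 # sphere radius in m (arbitrary)
RHO = Q / (4 / 3 * math.pi * R ** 3)    # uniform charge density inside the sphere

def E_mag(r):
    """|E| of a uniformly charged sphere (from Gauss' Law)."""
    if r < R:
        return Q * r / (4 * math.pi * EPS0 * R ** 3)
    return Q / (4 * math.pi * EPS0 * r ** 2)

def phi_inside(r):
    """Φ inside the sphere, with Φ = 0 at infinity (standard result, assumed here)."""
    return RHO * (3 * R ** 2 - r ** 2) / (6 * EPS0)

def midpoint(f, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

# U = ½ ∫ ρΦ dV: ρ = 0 outside the sphere, so only 0 < r < R contributes (dV = 4πr² dr)
U_charges = 0.5 * midpoint(lambda r: RHO * phi_inside(r) * 4 * math.pi * r ** 2, 0.0, R)

# U = (ε0/2) ∫ E·E dV over all space: the inside on a linear grid, the outside in
# u = ln r (so dr = r du) up to a large cut-off r_max that stands in for infinity
inside = midpoint(lambda r: E_mag(r) ** 2 * 4 * math.pi * r ** 2, 0.0, R)
r_max = 1e9 * R
outside = midpoint(lambda u: E_mag(math.exp(u)) ** 2 * 4 * math.pi * math.exp(u) ** 3,
                   math.log(R), math.log(r_max))
U_field = 0.5 * EPS0 * (inside + outside)

U_exact = 3 * Q ** 2 / (20 * math.pi * EPS0 * R)   # textbook result for this configuration
print(U_charges, U_field, U_exact)
```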

I am pretty sure that, by now, you must be suffering from ‘formula overload’, so you probably are just gazing at this without even bothering to try to understand. Too bad, and you should take a break then or just go do something else, like biking or so. 🙂

First, you should note that you know this E•E expression already: E•E is just the square of the magnitude of the field vector E, so E•E = E². That makes sense because we know, from what we know about waves, that the energy is always proportional to the square of an amplitude, and so we’re just writing the same here but with a little proportionality constant (ε0/2).

OK, you’ll say. But you probably still wonder what use this formula could possibly have. What is that number we get from some integration over all space? So we associate the Universe with some number and then what? Well… Isn’t that just nice? 🙂 Jokes aside, we’re actually looking at that E•E = E² product inside of the integral as representing an energy density (i.e. the energy per unit volume). We’ll denote that with a lower-case symbol and so we write:

$$u = \frac{\varepsilon_0}{2}\,\mathbf{E}\cdot\mathbf{E}$$

Just to make sure you ‘get’ what we’re talking about here: u is the energy density in the little cube dV in the rather simplistic (and, therefore, extremely useful) illustration below (which, just like most of what I write above, I got from Feynman).


Now that should make sense to you, I hope. 🙂 In any case, if you’re still with me, and if you’re not all formula-ed out, you may wonder how we get that (ε₀/2)E•E = (ε₀/2)E² expression from that (1/2)ρΦ expression. Of course, you know that E = –∇Φ, and we also have the Poisson equation ∇²Φ = –ρ/ε₀, but that doesn’t get you very far. It’s one of those examples where an easy-looking formula requires a lot of gymnastics. However, as the objective of this post is to do some of that, let me take you through the derivation.

Let’s do something with that Poisson equation first, so we’ll re-write it as ρ = –ε₀∇²Φ, and then we can substitute ρ in the integral with the ρΦ product. So we get:

$$U = -\frac{\epsilon_0}{2}\int \Phi\,\nabla^2\Phi\,dV$$

Now, you should check out those fancy formulas with our new vector differential operators which we listed in our second class on vector calculus, but, unfortunately, none of them apply. So we have to write it all out and see what we get:

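$$\nabla\cdot(\Phi\nabla\Phi) = \frac{\partial}{\partial x}\left(\Phi\frac{\partial\Phi}{\partial x}\right) + \frac{\partial}{\partial y}\left(\Phi\frac{\partial\Phi}{\partial y}\right) + \frac{\partial}{\partial z}\left(\Phi\frac{\partial\Phi}{\partial z}\right) = (\nabla\Phi)\cdot(\nabla\Phi) + \Phi\,\nabla^2\Phi$$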

Now that looks horrendous and so you’ll surely think we won’t get anywhere with that. Well… Physicists don’t despair as easily as we do, it seems, and so they do substitute it in the integral which, of course, becomes an even more monstrous expression, because we now have two volume integrals instead of one! Indeed, we get:

$$U = \frac{\epsilon_0}{2}\int (\nabla\Phi)\cdot(\nabla\Phi)\,dV - \frac{\epsilon_0}{2}\int \nabla\cdot(\Phi\nabla\Phi)\,dV$$

Now, ∇Φ is a vector field (it’s minus E, remember!), so Φ∇Φ is a vector field too, and we can then apply Gauss’ Theorem, which we mentioned in our first class on vector calculus, and which – mind you! – is a purely mathematical theorem that should not be confused with Gauss’ Law. Indeed, Gauss produced so much it’s difficult to keep track of it all. 🙂 So let me remind you of this theorem. [Why is Φ∇Φ still a vector field? Because, at every point, it’s just the vector ∇Φ multiplied by the scalar Φ.] Gauss’ Theorem basically shows how we can go from a volume integral to a surface integral:

$$\int_{\text{volume}} \nabla\cdot\mathbf{C}\,dV = \int_{\text{surface}} \mathbf{C}\cdot\mathbf{n}\,da$$

If we apply this to the second integral in our U expression, we get:

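$$\int_{\text{volume}} \nabla\cdot(\Phi\nabla\Phi)\,dV = \int_{\text{surface}} \Phi\,\nabla\Phi\cdot\mathbf{n}\,da$$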
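The step that turned one integral into two rests on the identity ∇•(Φ∇Φ) = (∇Φ)•(∇Φ) + Φ∇²Φ. A minimal sketch that checks it numerically with central finite differences, assuming an arbitrary, made-up smooth potential `phi` (any smooth function would do) and a step size of 10⁻³:

```python
import math

H = 1e-3  # finite-difference step (an arbitrary choice for this sketch)


def phi(x, y, z):
    """A hypothetical smooth potential, chosen only to test the identity."""
    return math.exp(-x * x) * math.sin(y) + 0.5 * z**3 + x * y * z


def grad(f, x, y, z, h=H):
    """Central-difference gradient of a scalar function f."""
    return (
        (f(x + h, y, z) - f(x - h, y, z)) / (2 * h),
        (f(x, y + h, z) - f(x, y - h, z)) / (2 * h),
        (f(x, y, z + h) - f(x, y, z - h)) / (2 * h),
    )


def laplacian(f, x, y, z, h=H):
    """Central-difference Laplacian of a scalar function f."""
    c = f(x, y, z)
    return (
        f(x + h, y, z) + f(x - h, y, z)
        + f(x, y + h, z) + f(x, y - h, z)
        + f(x, y, z + h) + f(x, y, z - h)
        - 6 * c
    ) / h**2


def div_phi_grad_phi(x, y, z, h=H):
    """Left-hand side: divergence of the vector field Phi * grad(Phi)."""
    def component(i, px, py, pz):
        return phi(px, py, pz) * grad(phi, px, py, pz)[i]

    return (
        (component(0, x + h, y, z) - component(0, x - h, y, z)) / (2 * h)
        + (component(1, x, y + h, z) - component(1, x, y - h, z)) / (2 * h)
        + (component(2, x, y, z + h) - component(2, x, y, z - h)) / (2 * h)
    )


def rhs(x, y, z):
    """Right-hand side: (grad Phi).(grad Phi) + Phi * laplacian(Phi)."""
    g = grad(phi, x, y, z)
    return sum(gi * gi for gi in g) + phi(x, y, z) * laplacian(phi, x, y, z)


point = (0.3, -0.7, 1.1)
print(div_phi_grad_phi(*point), rhs(*point))
```

Both sides agree up to the finite-difference error, which is of the order of h².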

So what? Where are we going with this? Relax. Be patient. What volume and surface are we talking about here? To make sure we include all charges and influences, we should integrate over all space and, hence, the surface goes to infinity. So we’re talking about a (spherical) surface of enormous radius R whose center is the origin of our coordinate system. I know that sounds ridiculous but, from a math point of view, it is just the same as bringing a charge in from infinity, which is what we did to calculate the potential. So if we don’t have any difficulty with infinite line integrals, we should not have difficulty with infinite surfaces and infinite volumes either. That’s all I can say, so… Well… Let’s do it.

Let’s look at that product Φ∇Φ•n in the surface integral. Φ is a scalar and ∇Φ is a vector, and so… Well… ∇Φ•n is a scalar too: it’s the normal component of ∇Φ = –E. [Just to make sure: in Gauss’ Theorem, n is the outward unit normal, and ∇Φ•n = |∇Φ||n|cosθ = |∇Φ|cosθ, with θ the angle between ∇Φ = –E and n. Its sign depends on the charges: for a net positive charge, E points outward, so ∇Φ = –E points inward and the dot product is negative. That sign doesn’t matter for what follows, though: all we need to know is how the magnitude behaves as R goes to infinity.]

So, we have a product of two scalars here. What happens with them if R goes to infinity? Well… The potential varies as 1/r as we’re going to infinity. That’s obvious from that Φ = (q/4πε₀)(1/r) formula: just think of q as the total charge now, which works because we assume all charges are located within some finite distance, while we’re going to infinity. What about ∇Φ•n? Well… Again assuming that we’re reasonably far away from the charges, we’re talking about the density of flux lines here (i.e. the magnitude of E) which, as shown above, follows an inverse-square law, because the surface area of a sphere increases with the square of the radius. So ∇Φ•n varies not as 1/r but as 1/r². To make a long story short, the whole product Φ∇Φ•n falls off as 1/r³ as r goes to infinity. Now, we shouldn’t forget we’re integrating over a surface here, with r = R, and so it’s R going to infinity. The area of that surface only grows as R², so the surface integral as a whole varies as R²/R³ = 1/R, and so it has to go to zero when we include all space. The volume integral still stands, however, so our formula for U now consists of one term only, i.e. the volume integral, and so we now have:

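$$U = \frac{\epsilon_0}{2}\int (\nabla\Phi)\cdot(\nabla\Phi)\,dV = \frac{\epsilon_0}{2}\int \mathbf{E}\cdot\mathbf{E}\,dV$$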

Done!
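The argument for dropping the surface integral can also be checked numerically. A rough sketch, assuming units in which q/4πε₀ = 1 and a single point charge placed off-centre at a hypothetical distance D = 1 from the origin: it integrates Φ∇Φ•n over spheres of growing radius R (n being the outward normal) and shows the result shrinking like 1/R.

```python
import math

# Units in which q / (4*pi*eps0) = 1, so Phi = 1/distance (an assumption for brevity).
D = 1.0  # hypothetical offset of the point charge from the origin, along z


def surface_term(big_r, n=4000):
    """Integral of Phi * grad(Phi) . n over a sphere of radius big_r centred on the origin."""
    dtheta = math.pi / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * dtheta
        # By symmetry about the z-axis we can take azimuth 0 and integrate over rings.
        px, pz = big_r * math.sin(theta), big_r * math.cos(theta)
        nx, nz = math.sin(theta), math.cos(theta)  # outward unit normal
        sx, sz = px, pz - D  # vector from the charge to the surface point
        dist = math.hypot(sx, sz)
        phi = 1.0 / dist
        grad_phi_dot_n = -(sx * nx + sz * nz) / dist**3  # grad(1/s) = -s/|s|^3
        ring_area = 2 * math.pi * big_r**2 * math.sin(theta) * dtheta
        total += phi * grad_phi_dot_n * ring_area
    return total


for big_r in (10.0, 100.0, 1000.0):
    # R * surface_term(R) settles near -4*pi, i.e. the term itself falls off as 1/R.
    print(big_r, surface_term(big_r), big_r * surface_term(big_r))
```

The negative sign is what we expect for a positive charge with an outward normal; the magnitude is what goes to zero.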

What’s left?

In electrostatics? Lots. Electric dipoles (like polar molecules), electrolytes, plasma oscillations, ionic crystals, electricity in the atmosphere (like lightning!), dielectrics and polarization (including condensers), ferroelectricity,… As soon as we try to apply our theory to matter, things become hugely complicated. But the theory works. Fortunately! 🙂 I have to refer you to textbooks, though, in case you’d want to know more about it. [I am sure you don’t, but then one never knows.]

What I wanted to do is to give you some feel for those vector and field equations in the electrostatic case. We now need to bring the magnetic field back into the picture and, most importantly, move to electrodynamics, in which the electric and magnetic fields do not appear as completely separate things. No! In electrodynamics, they are fully interconnected through the time derivatives ∂E/∂t and ∂B/∂t. That shows they’re part and parcel of the same thing really: electromagnetism.

But we’ll try to tackle that in future posts. Goodbye for now!

The wave-particle duality revisited

As an economist, having some knowledge of what’s around in my field (social science), I think I am well-placed to say that physics is not an easy science. Its ‘first principles’ are complicated, and I am not ashamed to say that, after more than a year of study now, I haven’t reached what I would call a ‘true understanding’ of it.

Sometimes, the teachers are to be blamed. For example, I just found out that, in regard to the question of the wave function of a photon, the answer of two nuclear scientists was plain wrong. Photons do have a de Broglie wave, and there is a fair amount of research and actual experimenting going on trying to measure it. One scientific article which I liked in particular, and which I hope to fully understand a year from now or so, is on such ‘direct measurement of the (quantum) wavefunction’. For me, it drove home the message that these idealized ‘thought experiments’ that are supposed to make autodidacts like me understand things better, are surely instructive in regard to the key point, but confusing in other respects.

A typical example of such an idealized thought experiment is the double-slit experiment with ‘special detectors’ near the slits, which may or may not detect a photon. Depending on whether or not the detectors are switched on, and on their accuracy, we get full interference (a), no interference (b), or a mixture of (a) and (b), as shown in (c) and (d).

[Illustrations: the double-slit set-up with detectors near the slits, and the resulting patterns (a) to (d).]

I took the illustrations from Feynman’s lovely little book, QED – The Strange Theory of Light and Matter, and he surely knows what he’s talking about. Having said that, the set-up raises a key question in regard to these detectors: how do they work, exactly? More importantly, how do they disturb the photons?

I googled for actual double-slit experiments with such ‘special detectors’ near the slits, but only found such experiments for electrons. One of these, a 2010 experiment by an Italian team, suggests that it’s the interaction between the detector and the electron wave that may cause the interference pattern to disappear. The idea is shown below. The electron is depicted as an incoming plane wave, which breaks up as it goes through the slits. The slit on the left has no ‘filter’ (which you may think of as a detector) and, hence, the plane wave goes through as a cylindrical wave. The slit on the right-hand side is covered by a ‘filter’ made of several layers of ‘low atomic number material’, so the electron goes through but, at the same time, the barrier creates a spherical wave as it goes through. The researchers note that “the spherical and cylindrical wave do not have any phase correlation, and so even if an electron passed through both slits, the two different waves that come out cannot create an interference pattern on the wall behind them.” [Needless to say, while being represented as ‘real’ waves here, the ‘waves’ are, in fact, complex-valued psi functions.]

[Illustration: double-slit experiment with a ‘filter’ covering one of the slits.]

In fact, to be precise, there actually still was an interference effect if the filter was thin enough. Let me quote the reason for that: “The thicker the filter, the greater the probability for inelastic scattering. When the electron suffers inelastic scattering, it is localized. This means that its wavefunction collapses and, after the measurement act, it propagates roughly as a spherical wave from the region of interaction, with no phase relation at all with other elastically or inelastically scattered electrons. If the filter is made thick enough, the interference effects cancels out almost completely.”

This, of course, doesn’t solve the mystery. The mystery, in such experiments, is that, when we put detectors in place, it is either the detector at A or the detector at B that goes off. They should never go off together (“at half strength, perhaps?”, as Feynman puts it). That’s why I used italics when writing “even if an electron passed through both slits.” The electron, or the photon in a similar set-up, is not supposed to do that. As mentioned above, the wavefunction collapses or reduces. Now that’s where these so-called ‘weak measurement’ experiments come in: they indicate the interaction doesn’t have to be that way. It’s not all or nothing: our observations need not destroy the wavefunction. So, who knows, perhaps we will be able, one day, to show that the wavefunction does go through both slits, as it should (otherwise the interference pattern cannot be explained), and then we will have resolved the paradox.

I am pretty sure that, when that’s done, physicists will also be able to relate the image of a photon as a transient electromagnetic wave (first diagram below), being emitted by an atomic oscillator for a few tens of nanoseconds only (we gave the example for sodium light, for which the decay time was 3.2×10⁻⁸ seconds) with the image of a photon as a de Broglie wave (second diagram below). I look forward to that day. I think it will come soon.

[Diagrams: a photon as a transient electromagnetic wave; a photon as a de Broglie wave.]


In the previous posts, I showed how the ‘real-world’ properties of photons and electrons emerge out of very simple mathematical notions and shapes. The basic notions are time and space. The shape is the wavefunction.

Let’s recall the story once again. Space is an infinite number of three-dimensional points (x, y, z), and time is a stopwatch hand going round and round: a cyclical thing. All points in space are connected by an infinite number of paths – straight or crooked, whatever – of which we measure the length. And then we have ‘photons’ that move from A to B, but we don’t know what is actually moving in space here. We just associate each and every possible path (in spacetime) between A and B with an amplitude: an ‘arrow’ whose length and direction depend on (1) the length of the path l (i.e. the ‘distance’ in space measured along the path, be it straight or crooked), and (2) the difference in time between the departure (at point A) and the arrival (at point B) of our photon (i.e. the ‘distance in time’ as measured by that stopwatch hand).

Now, in quantum theory, anything is possible and, hence, not only do we allow for crooked paths, but we also allow for the difference in time to differ from l/c. Hence, our photon may actually travel slower or faster than the speed of light c! There is one lucky break, however, that makes it all come out all right: the arrows associated with the odd paths and strange timings cancel each other out. Hence, what remains are the nearby paths in spacetime only, i.e. the ‘light-like’ intervals: a small core of space which our photon effectively uses as it travels through empty space. And when it encounters an obstacle, like a sheet of glass, it may or may not interact with the other elementary particle, the electron. And then we multiply and add the arrows (or amplitudes, as we call them) to arrive at a final arrow, whose squared length is what physicists want to find, i.e. the likelihood of the event that we are analyzing (such a photon going from point A to B, in empty space, through two slits, or through a sheet of glass, for example) effectively happening.
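Feynman’s QED illustrates this cancellation with light reflecting off a mirror. A minimal sketch of that idea, assuming a hypothetical geometry (source and detector one unit above a flat mirror, wavelength 0.01 units) and tracking only the path-length part of each arrow’s direction, not the timing part: it adds the arrows for all reflection points and compares the paths near the shortest one with those far from it.

```python
import cmath
import math

WAVELENGTH = 0.01       # hypothetical wavelength, in the same length units as below
SOURCE = (-1.0, 1.0)    # hypothetical source position
DETECTOR = (1.0, 1.0)   # hypothetical detector position; the mirror is the line y = 0


def path_length(x):
    """Length of the path source -> mirror point (x, 0) -> detector."""
    mirror_point = (x, 0.0)
    return math.dist(SOURCE, mirror_point) + math.dist(mirror_point, DETECTOR)


def arrow_sum(a, b, n=100_000):
    """Add the arrows exp(2*pi*i * length / wavelength) for reflection points in [a, b],
    each weighted by the strip width dx (a midpoint Riemann sum)."""
    dx = (b - a) / n
    total = 0j
    for i in range(n):
        x = a + (i + 0.5) * dx
        total += cmath.exp(2j * math.pi * path_length(x) / WAVELENGTH) * dx
    return total


whole_mirror = arrow_sum(-1.0, 1.0)
middle = arrow_sum(-0.5, 0.5)   # paths close to the shortest one (x = 0)
edges = whole_mirror - middle   # paths far from it: their arrows mostly cancel

print(abs(whole_mirror), abs(middle), abs(edges))
```

Although the ‘edge’ part of the mirror is just as long as the middle part, its arrows curl around and nearly cancel, so the final arrow is almost entirely due to the paths close to the shortest one.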

The combining of arrows leads to diffraction, refraction or – to use the more general description of what’s going on – interference patterns:

  1. Adding two identical arrows that are ‘lined up’ yields a final arrow with twice the length of either arrow alone and, hence, a square (i.e. a probability) that is four times as large. This is referred to as ‘positive’ or ‘constructive’ interference.
  2. Two arrows of the same length but with opposite direction cancel each other out and, hence, yield zero: that’s ‘negative’ or ‘destructive’ interference.
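In code, arrows are just complex numbers: adding arrows is adding complex numbers, and the probability is the squared length (absolute value) of the final arrow. A minimal sketch of the two cases above, using a hypothetical arrow of length 0.5:

```python
import cmath


def probability(*arrows):
    """Add the arrows (complex amplitudes) head to tail and square the final length."""
    final_arrow = sum(arrows)
    return abs(final_arrow) ** 2


a = cmath.rect(0.5, cmath.pi / 6)  # hypothetical arrow: length 0.5, pointing at 30 degrees

single = probability(a)        # one arrow on its own
lined_up = probability(a, a)   # two identical arrows, lined up
opposite = probability(a, -a)  # same length, opposite direction
print(single, lined_up, opposite, lined_up / single)
```

Lining up two identical arrows quadruples the probability; pointing them in opposite directions gives zero.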

Both photons and electrons are represented by wavefunctions, whose argument is the position in space (x, y, z) and time (t), and whose value is an amplitude or ‘arrow’ indeed, with a specific direction and length. But here we get a bifurcation. When photons interact with each other, their wavefunctions interact just like amplitudes: we simply add them. However, when electrons interact with each other, we have to apply a different rule: we’ll take a difference. Indeed, anything is possible in quantum mechanics and so we combine arrows (or amplitudes, or wavefunctions) in two different ways: we can either add them or, as shown below, subtract one from the other.

[Illustration: adding two arrows, and subtracting one arrow from another.]

There are actually four distinct logical possibilities, because we may also change the order of A and B in the operation, but when calculating probabilities, all we need is the square of the final arrow, so we’re interested in its final length only, not in its direction (unless we want to use that arrow in yet another calculation). And so… Well… The fundamental duality in Nature between light and matter is based on this dichotomy only: identical (elementary) particles behave in one of two ways: their wavefunctions interfere either constructively or destructively, and that’s what distinguishes bosons (i.e. force-carrying particles, such as photons) from fermions (i.e. matter-particles, such as electrons). The mathematical description is complete and respects Occam’s Razor. There is no redundancy. One cannot further simplify: every logical possibility in the mathematical description reflects a physical possibility in the real world.
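A small sketch of the two combination rules, with arrows represented as complex numbers. The arrows `a` and `b` are arbitrary, hypothetical values; the point is only that swapping the order never changes the length of the final arrow, so the four possible operations reduce to two probabilities.

```python
import cmath


def final_arrow(arrow_1, arrow_2, kind):
    """Combine two arrows: add them for bosons, subtract them for fermions."""
    if kind == "boson":
        return arrow_1 + arrow_2
    if kind == "fermion":
        return arrow_1 - arrow_2
    raise ValueError(f"unknown kind: {kind!r}")


a = cmath.rect(0.6, 0.4)  # hypothetical arrows, chosen arbitrarily
b = cmath.rect(0.3, 2.1)

# Swapping the order can flip the direction of the final arrow, never its length.
for kind in ("boson", "fermion"):
    p_ab = abs(final_arrow(a, b, kind)) ** 2
    p_ba = abs(final_arrow(b, a, kind)) ** 2
    print(kind, p_ab, p_ba)

# Two identical arrows: the 'boson' rule doubles the arrow, the 'fermion' rule gives zero.
print(abs(final_arrow(a, a, "boson")) ** 2, abs(final_arrow(a, a, "fermion")) ** 2)
```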

Having said that, there is more to an electron than just Fermi-Dirac statistics, of course. What about its charge, and this weird number, its spin?

Well… That’s what this post is about. As Feynman puts it: “So far we have been considering only spin-zero electrons and photons, fake electrons and fake photons.”

I wouldn’t call them ‘fake’, because they do behave like real photons and electrons already but… Yes. We can make them more ‘real’ by including charge and spin in the discussion. Let’s go for it.

Charge and spin

From what I wrote above, it’s clear that the dichotomy between bosons and fermions (i.e. between ‘matter-particles’ and ‘force-carriers’ or, to put it simply, between light and matter) is not based on the (electric) charge. It’s true we cannot pile atoms or molecules on top of each other because of the repulsive forces between the electron clouds—but it’s not impossible, as nuclear fusion proves: nuclear fusion is possible because the electrostatic repulsive force can be overcome, and then the nuclear force is much stronger (and, remember, no quarks are being destroyed or created: all nuclear energy that’s being released or used is nuclear binding energy).

It’s also true that the force-carriers we know best, notably photons and gluons, do not carry any (electric) charge, as shown in the table below. So that’s another reason why we might, mistakenly, think that charge somehow defines matter-particles. However, we can see that matter-particles, first, carry very different charges (positive or negative, and with very different values: 1/3, 2/3 or 1) and, second, can even be neutral, like the neutrinos. So, if there’s a relation, it’s very complex. In addition, one of the two force-carriers for the weak force, the W boson, can have positive or negative charge too, so that doesn’t make sense, does it? [I admit the weak force is a bit of a ‘special’ case, and so I should leave it out of the analysis.] The point is: the electric charge is what it is, but it’s not what defines matter. It’s just one of the possible charges that matter-particles can carry. [The other charge, as you know, is the color charge but, to confuse the picture once again, that’s a charge that can also be carried by gluons, i.e. the carriers of the strong force.]

[Table: the Standard Model of elementary particles.]

So what is it, then? Well… From the table above, you can see that the property of ‘spin’ (i.e. the third number in the top left-hand corner) matches the above-mentioned dichotomy in behavior, i.e. the two different types of interference (bosons versus fermions or, to use a heavier term, Bose-Einstein statistics versus Fermi-Dirac statistics): all matter-particles are so-called spin-1/2 particles, while all force-carriers (gauge bosons) have spin one. [Never mind the Higgs particle: that’s ‘just’ a mechanism to give (most) elementary particles some mass.]

So why is that? Why are matter-particles spin-1/2 particles and force-carriers spin-1 particles? To answer that question, we need to answer the question: what’s this spin number? And to answer that question, we first need to answer the question: what’s spin?

Spin in the classical world

In the classical world, it’s, quite simply, the momentum associated with a spinning or rotating object, which is referred to as the angular momentum. We’ve analyzed the math involved in another post, and so I won’t dwell on that here, but you should note that, in classical mechanics, we distinguish two types of angular momentum:

  1. Orbital angular momentum: that’s the angular momentum an object gets from circling in an orbit, like the Earth around the Sun.
  2. Spin angular momentum: that’s the angular momentum an object gets from spinning around its own axis, just like the Earth, in addition to orbiting around the Sun, is rotating around its own axis (which is what causes day and night, as you know).

The math involved in both is pretty similar, but it’s still useful to distinguish the two, if only because we’ll distinguish them in quantum mechanics too! Indeed, when I analyzed the math in the above-mentioned post, I showed how we represent angular momentum by a vector that’s perpendicular to the plane of rotation, with its direction given by the ubiquitous right-hand rule, as in the illustration below, which shows both the angular momentum (L) and the torque (τ) acting on a rotating mass. The formulas are given too: the angular momentum L is the vector cross product of the position vector r and the linear momentum p, while the magnitude of the torque τ is given by the product of the length of the lever arm and the applied force. An alternative approach is to define the angular velocity ω and the moment of inertia I, and we get the same result: L = Iω.
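A quick numerical check that the two approaches agree for the simplest case: a point mass on a circular orbit in the xy-plane. The mass, radius and angular velocity are hypothetical values, and `cross` is a small helper written for this sketch.

```python
def cross(u, v):
    """Vector cross product u x v of two 3-vectors."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


mass = 2.0               # hypothetical point mass, kg
r = (3.0, 0.0, 0.0)      # position vector in the orbital (xy) plane, m
omega = (0.0, 0.0, 0.5)  # angular velocity, rad/s, along z (right-hand rule)

v = cross(omega, r)                   # velocity of circular motion: v = omega x r
p = tuple(mass * vi for vi in v)      # linear momentum p = m v
L_cross = cross(r, p)                 # L = r x p

moment_of_inertia = mass * sum(ri * ri for ri in r)          # I = m r^2 for a point mass
L_inertia = tuple(moment_of_inertia * wi for wi in omega)    # L = I omega

print(L_cross, L_inertia)
```

Both give a vector along the z-axis, perpendicular to the plane of the orbit.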


Of course, the illustration above shows orbital angular momentum only and, as you know, we no longer have a ‘planetary model’ (aka the Rutherford model) of an atom. So should we be looking at spin angular momentum only?

Well… Yes and no. More yes than no, actually. But it’s ambiguous. In addition, the analogy between the concept of spin in quantum mechanics, and the concept of spin in classical mechanics, is somewhat less than straightforward. Well… It’s not straightforward at all actually. But let’s get on with it and use more precise language. Let’s first explore it for light, not because it’s easier (it isn’t) but… Well… Just because. 🙂

The spin of a photon

I talked about the polarization of light in previous posts (see, for example, my post on vector analysis): when we analyze light as a traveling electromagnetic wave (so we’re still in the classical analysis here, not talking about photons as ‘light particles’), we know that the electric field vector oscillates up and down and is, in fact, likely to rotate in the xy-plane (with z being the direction of propagation). The illustration below shows the idealized (aka limiting) case of perfectly circular polarization: if there’s polarization, it is more likely to be elliptical. The other limiting case is plane polarization: in that case, the electric field vector just goes up and down in one direction only. [In case you wonder whether ‘real’ light is polarized, it often is: there’s an easy article on that on the Physics Classroom site.]

[Illustration: left- and right-handed circular polarization, |L〉 and |R〉, with Jz = ±ħ.]

The illustration above uses Dirac’s bra-ket notation |L〉 and |R〉 to distinguish the two possible ‘states’, which are left- or right-handed polarization respectively. In case you forgot about bra-ket notations, let me quickly remind you: an amplitude is usually denoted by 〈x|s〉, in which 〈x| is the so-called ‘bra’, i.e. the final condition, and |s〉 is the so-called ‘ket’, i.e. the starting condition, so 〈x|s〉 could mean: a photon leaves at s (from source) and arrives at x. It doesn’t matter much here. We could have used any notation, as we’re just describing some state, which is either |L〉 (left-handed polarization) or |R〉 (right-handed polarization). The more intriguing extras in the illustration above, besides the formulas, are the values: ±ħ = ±h/2π. So that’s plus or minus the (reduced) Planck constant which, as you know, is a very tiny constant. I’ll come back to that. So what exactly is being represented here?

At first sight, you’ll agree it looks very much like the momentum of light (p) which, in a previous post, we calculated from the (average) energy (E) as p = E/c. Now, we know that E is related to the (angular) frequency of the light through the Planck-Einstein relation E = hν = ħω. Now, ω is the speed of light (c) times the wave number (k), so we can write: p = E/c = ħω/c = ħck/c = ħk. The wave number is the ‘spatial frequency’, expressed either in cycles per unit distance (1/λ) or, more usually, in radians per unit distance (k = 2π/λ), so we can also write p = ħk = h/λ. Whatever way we write it, we find that this momentum (p) depends on the energy and/or, what amounts to saying the same, the frequency and/or the wavelength of the light.
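The chain p = E/c = ħω/c = ħk = h/λ is easy to check numerically. A minimal sketch using the exact SI values of h and c, with sodium light at roughly 589 nm as an illustrative wavelength:

```python
import math

H = 6.62607015e-34       # Planck constant, J*s (exact SI value)
HBAR = H / (2 * math.pi)  # reduced Planck constant
C = 299_792_458.0        # speed of light, m/s (exact)
EV = 1.602176634e-19     # joules per electronvolt (exact)

wavelength = 589e-9      # sodium light, roughly 589 nm (illustrative choice)

energy = H * C / wavelength         # E = h*nu = h*c/lambda
p_from_energy = energy / C          # p = E/c
k = 2 * math.pi / wavelength        # wave number, in radians per metre
p_from_k = HBAR * k                 # p = hbar*k = h/lambda

print(f"E = {energy / EV:.3f} eV, E/c = {p_from_energy:.4e} kg*m/s, hbar*k = {p_from_k:.4e} kg*m/s")
```

Change the wavelength and the momentum changes with it: it is h/λ, not h or ħ.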

So… Well… The momentum of light is not just h or ħ, i.e. what’s written in that illustration above. So it must be something different. In addition, I should remind you this momentum was calculated from the magnetic field vector, as shown below (for more details, see my post on vector calculus once again), so it had nothing to do with polarization really.

[Illustration: radiation pressure.]

Finally, last but not least, the dimensions of ħ and p = h/λ are also different (when one is confused, it’s always good to do a dimensional analysis in physics):

  1. The dimension of Planck’s constant (both h as well as ħ = h/2π) is energy multiplied by time (J·s or eV·s) or, equivalently, momentum multiplied by distance. It’s referred to as the dimension of action in physics, and h is, effectively, the so-called quantum of action.
  2. The dimension of (linear) momentum is… Well… Let me think… Mass times velocity (mv)… But what’s the mass in this case? Light doesn’t have any mass. However, we can use the mass-energy equivalence E = mc²: a mass of 1 eV/c² corresponds to 1.7826×10⁻³⁶ kg. [10⁻³⁶? Well… Yes. An electronvolt is a very tiny measure of energy.] So we can express mass in eV/c² and, hence, p in (eV/c²)·(m/s) = eV/c units which, because c is expressed in m/s, amounts to eV·s/m.

Hmm… We can check: momentum times distance gives us the dimension of Planck’s constant again: (eV·s/m)·m = eV·s. OK. That’s good… […] But… Well… All of this nonsense doesn’t make us much smarter, does it? 🙂 Well… It may or may not be more useful to note that the dimension of action is, effectively, the same as the dimension of angular momentum. Huh? Why? Well… From our classical L = r×p formula, we find L should be expressed in m·(eV·s/m) = eV·s units, so that’s the dimension of action already. If you prefer SI units, just write momentum in kg·m/s: L is then expressed in kg·m²/s units and, because 1 newton (N) is 1 kg·m/s², the kg·m²/s unit is equivalent to the N·m·s = J·s unit. Done!
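The dimensional bookkeeping above can be mechanized by writing each unit as a tuple of exponents of (kg, m, s). This is a tiny hand-rolled sketch, not a units library; it also computes the kilogram equivalent of 1 eV/c² from the exact SI values of the electronvolt and c.

```python
# Dimensions as exponent tuples (kg, m, s).
KG, M, S = (1, 0, 0), (0, 1, 0), (0, 0, 1)


def mul(*dims):
    """Multiply physical dimensions by adding their exponents."""
    return tuple(sum(exps) for exps in zip(*dims))


def inv(dim):
    """Reciprocal of a dimension."""
    return tuple(-e for e in dim)


VELOCITY = mul(M, inv(S))
MOMENTUM = mul(KG, VELOCITY)          # p = m v
NEWTON = mul(KG, M, inv(S), inv(S))   # 1 N = 1 kg*m/s^2
JOULE = mul(NEWTON, M)                # 1 J = 1 N*m
ACTION = mul(JOULE, S)                # dimension of h: J*s
ANGULAR_MOMENTUM = mul(M, MOMENTUM)   # L = r x p

print(ACTION, mul(MOMENTUM, M), ANGULAR_MOMENTUM)  # all three are (1, 2, -1)

# Mass-energy equivalence: how many kilograms correspond to 1 eV/c^2?
EV_IN_J = 1.602176634e-19
C = 299_792_458.0
print(EV_IN_J / C**2)
```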

Having said that, all of this still doesn’t answer the question: are the linear momentum of light, i.e. our p, and those two angular momentum ‘states’, |L〉 and |R〉, related? Can we relate |L〉 and |R〉 to that L = r×p formula?

The answer is simple: no. The |L〉 and |R〉 states represent spin angular momentum indeed, while the angular momentum we would derive from the linear momentum of light using that L = r×p is orbital angular momentum. Let’s introduce the proper symbols: orbital angular momentum is denoted by L, while spin angular momentum is denoted by S. And then the total angular momentum is, quite simply, J = L + S.

L and S can both be calculated using either a vector cross product r × p (but using different values for r and p, of course) or, alternatively, using the moment of inertia tensor I and the angular velocity ω. The illustrations below (which I took from Wikipedia) show how, and also how L and S are added to yield J = L + S.



So what? Well… Nothing much. The illustrations above show that the analysis – which is entirely classical, so far – is pretty complicated. [You should note, for example, that in the S = Iω and L = Iω formulas, we don’t use the simple (scalar) moment of inertia but the moment of inertia tensor (so that’s a matrix, usually written as a boldface I, instead of the scalar I), because S (or L) and ω are not necessarily pointing in the same direction.]
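To see why the tensor matters, here’s a sketch with a hypothetical tilted ‘dumbbell’: two 1 kg point masses on a rod at 45° to the rotation axis. It builds the inertia tensor from Iᵢⱼ = Σ m(r²δᵢⱼ − rᵢrⱼ), multiplies it by ω, and cross-checks the result by summing m r×v directly. The resulting L has a component perpendicular to ω.

```python
def cross(u, v):
    """Vector cross product u x v of two 3-vectors."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def inertia_tensor(masses):
    """Inertia tensor I_ij = sum of m * (r^2 * delta_ij - r_i * r_j) over point masses."""
    tensor = [[0.0] * 3 for _ in range(3)]
    for m, r in masses:
        r2 = sum(c * c for c in r)
        for i in range(3):
            for j in range(3):
                tensor[i][j] += m * ((r2 if i == j else 0.0) - r[i] * r[j])
    return tensor


def mat_vec(a, x):
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(a[i][j] * x[j] for j in range(3)) for i in range(3))


# Hypothetical tilted dumbbell: two 1 kg masses on a rod at 45 degrees to the z-axis.
dumbbell = [(1.0, (1.0, 0.0, 1.0)), (1.0, (-1.0, 0.0, -1.0))]
omega = (0.0, 0.0, 1.0)  # spin about the z-axis, 1 rad/s

L_tensor = mat_vec(inertia_tensor(dumbbell), omega)

# Cross-check: add up m * r x v, with v = omega x r, for each mass.
L_direct = [0.0, 0.0, 0.0]
for m, r in dumbbell:
    v = cross(omega, r)
    for i, c in enumerate(cross(r, v)):
        L_direct[i] += m * c

print(L_tensor, L_direct)  # L has an x-component: it is not parallel to omega
```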

By now, you’re probably very confused and wondering what’s wiggling really. The answer for the orbital angular momentum is: it’s the linear momentum vector p. Now…

Hey! Stop! Why would that vector wiggle?

You’re right. Perhaps it doesn’t. The linear momentum p is supposed to be directed in the direction of travel of the wave, isn’t it? It is. In vector notation, we have p = ħk, and that k vector (i.e. the wavevector) points in the direction of travel of the wave indeed and so… Well… No. It’s not that simple. The wave vector is perpendicular to the surfaces of constant phase, i.e. the so-called wave fronts, as shown in the illustration below (see the direction of ek, which is a unit vector in the direction of k).

[Illustration: the wave vector k, perpendicular to the wave fronts.]

So, yes, if we’re analyzing light moving in a straight one-dimensional line only, or we’re talking about a plane wave, as illustrated below, then the orbital angular momentum vanishes.

[Illustration: a plane wave.]

But the orbital angular momentum L does not vanish when we’re looking at a real light beam, like the ones below. Real waves? Well… OK… The ones below are idealized wave shapes as well, but let’s say they are somewhat more real than a plane wave. 🙂


So what do we have here? We have wavefronts that are shaped as helices, except for the one in the middle (marked by m = 0) which is, once again, an example of a plane wave, so for that one (m = 0), we have zero orbital angular momentum indeed. But look, very carefully, at the m = ± 1 and m = ± 2 situations. For m = ± 1, we have one helical surface with a step length equal to the wavelength λ. For m = ± 2, we have two intertwined helical surfaces with the step length of each helix surface equal to 2λ. [Don’t worry too much about the second and third columns: they show a beam cross-section (so that’s not a wave front but a so-called phase front) and the (averaged) light intensity, again of a beam cross-section.] Now, we can further generalize and analyze waves composed of |m| intertwined helices with the step length of each helix surface equal to |m|λ. The Wikipedia article on OAM (orbital angular momentum of light), from which I got this illustration, gives the following formula to calculate the OAM:

$$\mathbf{L} = \epsilon_0\sum_{i=x,y,z}\int E^i\,(\mathbf{r}\times\nabla)\,A^i\,d^3r$$

The same article also notes that the quantum-mechanical equivalent of this formula, i.e. the orbital angular momentum of the photons one would associate with the not-cylindrically-symmetric waves above (i.e. all those for which m ≠ 0), is equal to:

$$L_z = m\hbar$$

So what? Well… I guess we should just accept that as a very interesting result. For example, I duly note that Lz is along the direction of propagation of the wave (as indicated by the z subscript), and I also note the very interesting fact that, apparently, Lz can be either positive or negative. Now, I am not quite sure how such a result is consistent with the idea of radiation pressure, but I am sure there must be some logical explanation to that. The other point you should note is that, once again, any reference to the energy (or to the frequency or wavelength) of our photon has disappeared. Hmm… I’ll come back to this, as I promised above already.

The thing is that this rather long digression on orbital angular momentum doesn’t help us much in trying to understand what that spin angular momentum (SAM) is all about. So, let me just copy the final conclusion of the Wikipedia article on the orbital angular momentum of light: the OAM is the component of angular momentum of light that is dependent on the field spatial distribution, not on the polarization of light.

So, again, what’s the spin angular momentum? Well… The only guidance we have is that same little drawing again and, perhaps, another illustration that’s supposed to compare SAM with OAM (underneath).

[Illustration: spin angular momentum of circularly polarized light (the same drawing as above).]

[Illustration: spin angular momentum (SAM) versus orbital angular momentum (OAM) of light.]

Now, the Wikipedia article on SAM (spin angular momentum), from which I took the illustrations above, gives a similar-looking formula for it:

$$\mathbf{S} = \epsilon_0\int \mathbf{E}\times\mathbf{A}\,d^3r$$

When I say ‘similar-looking’, I don’t mean it’s the same. [Of course not! Spin and orbital angular momentum are two different things!] So what’s different in the two formulas? Well… We don’t have any del operator (∇) in the SAM formula, and we also don’t have any position vector (r) in the integral kernel (or integrand, if you prefer that term). However, we do find both the electric field vector (E) as well as the (magnetic) vector potential (A) in the equation again. Hence, the SAM (also) takes both the electric as well as the magnetic field into account, just like the OAM. [According to the author of the article, the expression also shows that the SAM is nonzero when the light polarization is elliptical or circular, and that it vanishes if the light polarization is linear, but I think that’s much more obvious from the illustration than from the formula… However, I realize I really need to move on here, because this post is, once again, becoming way too long. So…]

OK. What’s the equivalent of that formula in quantum mechanics?

Well… In quantum mechanics, the SAM becomes a ‘quantum observable’, described by a corresponding operator which has only two eigenvalues:

$$S_z = \pm\hbar$$

So that corresponds to the two possible values for Jz, as mentioned in the illustration, and we can understand, intuitively, that these two values correspond to two ‘idealized’ photons which describe a left- and right-handed circularly polarized wave respectively.
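The sense of rotation of the electric field vector can be made concrete. A minimal sketch, assuming a wave travelling along z, observed at z = 0, with E = (a·cos ωt, b·cos(ωt + δ)): the z-component of E × dE/dt works out to −abω·sinδ, the same at every instant, so its sign tells us which way the vector rotates, and it vanishes for linear polarization (δ = 0). Which sign gets called ‘left’ or ‘right’ depends on the convention used, so the code only reports the sense of rotation.

```python
import math


def rotation_rate(ex_amp, ey_amp, delta, omega=1.0, t=0.3):
    """z-component of E x dE/dt at time t for E = (ex_amp*cos(wt), ey_amp*cos(wt + delta))."""
    ex = ex_amp * math.cos(omega * t)
    ey = ey_amp * math.cos(omega * t + delta)
    dex = -ex_amp * omega * math.sin(omega * t)
    dey = -ey_amp * omega * math.sin(omega * t + delta)
    return ex * dey - ey * dex  # > 0: E turns from x towards y; < 0: the other way


print(rotation_rate(1.0, 1.0, -math.pi / 2))  # circular, one sense of rotation
print(rotation_rate(1.0, 1.0, math.pi / 2))   # circular, the opposite sense
print(rotation_rate(1.0, 1.0, 0.0))           # linear polarization: no rotation
print(rotation_rate(1.0, 0.5, -math.pi / 2))  # elliptical: same sense, smaller rate
```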

So… Well… There we are. That’s basically all there is to say about it. So… OK. So far, so good.

But… Yes? Why do we call a photon a spin-one particle?

That has to do with convention. A so-called spin-zero particle has no degrees of freedom in regard to polarization. The implied ‘geometry’ is that a spin-zero particle is completely symmetric: no matter in what direction you turn it, it will always look the same. In short, it really behaves like a (zero-dimensional) mathematical point. As you can see from the overview of all elementary particles, it is only the Higgs boson which has spin zero. That’s why the Higgs field is referred to as a scalar field: it has no direction. In contrast, spin-one particles, like photons, are also ‘point particles’, but they do come with one or the other kind of polarization, as evident from all that I wrote above. To be specific, they are polarized in the xy-plane, and can have one of two directions. So, when rotating them, you need a full rotation of 360° if you want them to look the same again.

Now that I am here, let me exhaust the topic (to a limited extent only, of course, as I don’t want to write a book here) and mention that, in theory, we could also imagine spin-2 particles, which would look the same after half a rotation (180°). However, as you can see from the overview, none of the elementary particles has spin 2. A spin-2 particle could be like some straight stick, as that looks the same even after it is rotated 180 degrees. I am mentioning the theoretical possibility because the graviton, if it exists, is expected to be a massless spin-2 boson. [Now why do I mention this? Not sure. I guess I am just noting this to remind you of the fact that the Higgs boson is definitely not the (theoretical) graviton, and/or that we have no quantum theory for gravity.]

Oh… That’s great, you’ll say. But what about all those spin-1/2 particles in the table? You said that all matter-particles are spin 1/2 particles, and that it’s this particular property that actually makes them matter-particles. So what’s the geometry here? What kind of ‘symmetries’ do they respect?

Well… As strange as it sounds, a spin-1/2 particle needs two full rotations (2×360° = 720°) until it is again in the same state. Now, in regard to that particularity, you’ll often read something like: “There is nothing in our macroscopic world which has a symmetry like that.” Or, worse, “Common sense tells us that something like that cannot exist, that it simply is impossible.” [I won’t quote the site from which I took these quotes, because it is, in fact, the site of a very respectable research center!] Bollocks! The Wikipedia article on spin has this wonderful animation: look at how the spirals flip between clockwise and counterclockwise orientations, and note that it’s only after spinning a full 720 degrees that this ‘point’ returns to its original configuration.
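The 720°, 360° and 180° statements above can be checked with a minimal sketch. It assumes the standard quantum-mechanical rule that a rotation by an angle θ about the z-axis multiplies a state with spin component m (in units of ħ) by the phase factor e^(−imθ), and it simply takes m = s for a particle of spin s. The function names are hypothetical, chosen for illustration only, and spin 0 is left out because a scalar looks the same after any rotation.

```python
import numpy as np

def rotation_phase(s: float, theta_deg: float) -> complex:
    """Phase factor exp(-i*m*theta) for a state with spin component m = s
    along the rotation axis (units with hbar = 1)."""
    theta = np.deg2rad(theta_deg)
    return complex(np.exp(-1j * s * theta))

def angle_to_return(s: float) -> float:
    """Smallest positive rotation (in degrees) after which that phase is 1 again."""
    return 360.0 / s

for s in (0.5, 1, 2):
    print(f"spin {s}: same state again after {angle_to_return(s):.0f} degrees; "
          f"phase after one full turn = {rotation_phase(s, 360).real:+.0f}")
```

For spin 1/2, one full turn of 360° only gets you a phase factor of −1; you need a second full turn to get back to +1, which is the 720° behavior described above.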


So, yes, we can actually imagine spin-1/2 particles, and with not all that much imagination, I’d say. But… OK… This is all great fun, but we have to move on. So what’s the ‘spin’ of these spin-1/2 particles and, more in particular, what’s the concept of ‘spin’ of an electron?

The spin of an electron

When starting to read about it, I thought that the angular momentum of an electron would be easier to analyze than that of a photon. Indeed, while a photon has no mass and no electric charge, that analysis with those E and B vectors is damn complicated, even when sticking to a strictly classical analysis. For an electron, the classical picture seems to be much more straightforward—but only at first indeed. It quickly becomes equally weird, if not more.

We can look at an electron in orbit as a rotating electrically charged ‘cloud’ indeed. Now, from Maxwell’s equations (or from your high school classes even), you know that a rotating electrically charged body creates a magnetic dipole. So an electron should behave just like a tiny bar magnet. Of course, we have to make certain assumptions about the distribution of the charge in space but, in general, we can write that the magnetic dipole moment μ is equal to:

$$\boldsymbol{\mu} = -\frac{e}{2m}\,\mathbf{L}$$

In case you want to know, in detail, where this formula comes from, let me refer you to Feynman once again, but trust me – for once 🙂 – it’s quite straightforward indeed: the L in this formula is the angular momentum, which may be the spin angular momentum, the orbital angular momentum, or the total angular momentum. The e and m are, of course, the charge (in absolute value) and the mass of the electron respectively.

So that’s a good and nice-looking formula, and it’s actually correct, except for the spin angular momentum as measured in experiments. [You’ll wonder how we can measure orbital and spin angular momentum respectively, but I’ll talk about a 1921 experiment in a few minutes, and so that will give you some clue as to that mystery. :-)] To be precise, it turns out that one has to multiply the above formula for μ by a factor which is referred to as the g-factor. [And, no, it’s got nothing to do with the gravitational constant or… Well… Nothing. :-)] So, for the spin angular momentum, the formula should be:

$$\boldsymbol{\mu} = -g\,\frac{e}{2m}\,\mathbf{S}$$

Experimental physicists are constantly checking that value and they now measure it to be something like g = 2.00231930419922 ± 1.5×10⁻¹². So what’s the explanation for that g? Where does it come from? There is, in fact, a classical explanation for it, which I’ll copy hereunder (yes, from Wikipedia). This classical explanation is based on assuming that the distributions of the electric charge and the mass of the electron do not coincide:

[Excerpt from Wikipedia: the classical explanation of the electron’s g-factor]

Why do I mention this classical explanation? Well… Because, in most popular books on quantum mechanics (including Feynman’s delightful QED), you’ll read that (a) the value for g can be derived from a quantum-theoretical equation known as Dirac’s equation (or ‘Dirac theory’, as it’s referred to above) and, more importantly, that (b) physicists call the “accurate prediction of the electron g-factor” from quantum theory (i.e. ‘Dirac’s theory’ in this case) “one of the greatest triumphs” of the theory.
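To put some numbers on this, here is a minimal sketch that evaluates the magnitude of the spin magnetic moment along the field, |μz| = g·(e/2m)·(ħ/2), once with Dirac’s g = 2 and once with the measured value quoted above. The constants are CODATA 2018 values typed in by hand, and the helper name is hypothetical.

```python
# CODATA 2018 values (SI); e is the elementary charge, i.e. the electron's charge in absolute value
E_CHARGE = 1.602176634e-19   # C (exact by definition)
HBAR = 1.054571817e-34       # J*s
M_E = 9.1093837015e-31       # electron mass, kg

def spin_magnetic_moment(g: float) -> float:
    """|mu_z| = g * (e / 2m) * (hbar / 2), in J/T, for a spin component of hbar/2."""
    return g * (E_CHARGE / (2 * M_E)) * (HBAR / 2)

bohr_magneton = E_CHARGE * HBAR / (2 * M_E)          # e*hbar/2m, roughly 9.274e-24 J/T
mu_dirac = spin_magnetic_moment(2.0)                 # Dirac's prediction, g = 2
mu_measured = spin_magnetic_moment(2.00231930419922) # measured g, as quoted in the text

print(f"Bohr magneton:      {bohr_magneton:.4e} J/T")
print(f"mu with g = 2:      {mu_dirac:.4e} J/T")
print(f"measured / Dirac:   {mu_measured / mu_dirac:.8f}")
```

With g = 2 the result is exactly one Bohr magneton; the measured g makes the moment larger by only about one part in a thousand, which is why Dirac’s g = 2 counts as such a good first prediction.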

So what about it? Well… Whatever the merits of both explanations (classical or quantum-mechanical), they are surely not the reason why physicists abandoned the classical theory. So what was the reason then? What a stupid question! You know that already! The Rutherford model was, quite simply, not consistent: according to classical theory, electrons should just radiate their energy away and spiral into the nucleus. More in particular, there was yet another experiment that wasn’t consistent with classical theory, and it’s one that’s very relevant for the discussion at hand: it’s the so-called Stern-Gerlach experiment.

It was just as ‘revolutionary’ as the Michelson-Morley experiment (which failed to detect the expected difference in the speed of light caused by the Earth’s motion through the supposed aether), or the discovery of the positron in 1932. The Stern-Gerlach experiment was conceived in 1921 (and first carried out in early 1922), so that’s a few years before quantum mechanics replaced classical theory and, hence, it’s not one of those experiments confirming quantum theory. No. Quite the contrary. It was, in fact, one of the experiments that triggered the so-called quantum revolution. Let me summarize the experimental set-up below.


  • The German scientists Otto Stern and Walther Gerlach produced a beam of electrically neutral silver atoms and let it pass through a (non-uniform) magnetic field. Why silver atoms? Well… Silver atoms are easy to handle (in a lab, that is) and easy to detect with a photoplate.
  • These atoms came out of an oven (literally), in which the silver was being evaporated (yes, one can evaporate silver), so they had no special orientation in space and, so Stern and Gerlach thought, the magnetic moment (or spin) of the outer electron in these atoms would point in all possible directions in space.
  • As expected, the magnetic field did deflect the silver atoms, just like it would deflect little dipole magnets if you would shoot them through the field. However, the pattern of deflection was not the one they expected. Instead of hitting the plate all over the place (within some contour, of course), the atoms hit only the contour itself. There was nothing in the middle!
  • And… Well… It’s a long story, but I’ll make it short. There was only one possible explanation for that behavior: the magnetic moments (and, therefore, the spins) had only two possible orientations in space, so the spin component along the field could take only two values which – Surprise, surprise! – are ±ħ/2 (so that’s half the value of the spin angular momentum of photons, which explains the spin-1/2 terminology).

The two possible spin states of an electron are popularly known as ‘up’ and ‘down’.

So… What about it? Well… Together with the Pauli exclusion principle, it explains why an atomic orbital can hold two electrons, rather than only one and, as such, the behavior of the electron here is the basis of the periodic table, which explains the chemical properties of the elements. So… Yes. Quantum theory is relevant, I’d say. 🙂


This has been a terribly long post, and you may no longer remember what I promised to do. What I promised to do, is to write some more about the difference between a photon and an electron and, more in particular, I said I’d write more about their charge, and that “weird number”: their spin. I think I lived up to that promise. The summary is simple:

  1. Photons have no (electric) charge, but they do have spin. Their spin is linked to their polarization in the xy-plane (if z is the direction of propagation) and, because of the strangeness of quantum mechanics (i.e. the quantization of (quantum) observables), the value for this spin is either +ħ or −ħ, which explains why they are referred to as spin-one particles (because either value is one unit of the reduced Planck constant ħ).
  2. Electrons have both electric charge and spin. Their spin is different: because we have a ‘spinning’ charge here, it comes with a magnetic dipole moment that is proportional to it (with the g-factor of about 2 as the factor of proportionality, on top of the classical e/2m). However, again, because of the strangeness of quantum mechanics, their spin (and, hence, their dipole moment) is quantized: the spin component along any given axis can take only one of two values, ±ħ/2, which is why they are referred to as spin-1/2 particles.

So now you know everything you need to know about photons and electrons, and then I mean real photons and electrons, including their properties of charge and spin. So they’re no longer ‘fake’ spin-zero photons and electrons now. Isn’t that great? You’ve just discovered the real world! 🙂

So… I am done—for the moment, that is… 🙂 If anything, I hope this post shows that even those ‘weird’ quantum numbers are rooted in ‘physical reality’ (or in physical ‘geometry’ at least), and that quantum theory may be ‘crazy’ indeed, but that it ‘explains’ experimental results. Again, as Feynman says:

“We understand how Nature works, but not why Nature works that way. Nobody understands that. I can’t explain why Nature behaves in this particular way. You may not like quantum theory and, hence, you may not accept it. But physicists have learned to realize that whether they like a theory or not is not the essential question. Rather, it is whether or not the theory gives predictions that agree with experiment. The theory of quantum electrodynamics describes Nature as absurd from the point of view of common sense. But it agrees fully with experiment. So I hope you can accept Nature as She is—absurd.”

Frankly speaking, I am not quite prepared to accept Nature as absurd: I hope that some more familiarization with the underlying mathematical forms and shapes will make it look somewhat less absurd. Moreover, I hope that such familiarization will, in the end, make everything look just as ‘logical’ or ‘natural’ as the two ways in which amplitudes can ‘interfere’.

Post scriptum: I said I would come back to the fact that, in the analysis of orbital and spin angular momentum of a photon (OAM and SAM), the frequency or energy variable sort of ‘disappears’. So why’s that? Let’s look at those expressions for |L〉 and |R〉 once again:

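$$|L\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\ i\\ 0\end{pmatrix}e^{i(kz-\omega t)}$$

$$|R\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\ -i\\ 0\end{pmatrix}e^{i(kz-\omega t)}$$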

What’s written here really? If |L〉 and |R〉 are supposed to correspond to a spin of either +ħ or −ħ, then that product of $e^{i(kz-\omega t)}$ with the 3×1 matrix (which is a ‘column vector’ in this case) does not seem to make much sense, does it? Indeed, you’ll remember that $e^{i(kz-\omega t)}$ is just a regular wave function. To be precise, its phase is φ = kz − ωt (with z the direction of propagation of the wave), and it can be written in terms of its real and imaginary parts as $e^{i\varphi} = \cos\varphi + i\sin\varphi = a + bi$ (so a = cos φ and b = sin φ). Multiplying it with that 3×1 column vector (1, i, 0) or (1, −i, 0) just yields another 3×1 column vector. To be specific, we get:

  1. The 3×1 ‘vector’ (a + bi, –b+ai, 0) for |L〉, and
  2. The 3×1 ‘vector’ (a + bi, b–ai, 0) for |R〉.

So we have two new ‘vectors’ whose components are complex numbers. Furthermore, we can note that their ‘x’-component is the same, their ‘y’-component is each other’s opposite –b+ai = –(b–ai), and their ‘z’-component is 0.

So… Well… In regard to their ‘y’-component, I should note that’s just the result of the multiplication with i and/or −i: multiplying a complex number with i amounts to a 90° counterclockwise rotation, while multiplication with −i amounts to the same but clockwise. Hence, we must arrive at two complex numbers that are each other’s opposite. [Indeed, in complex analysis, the value $-1 = e^{i\pi} = e^{-i\pi}$ is a 180° rotation, either clockwise (φ = −π) or counterclockwise (φ = +π), of course!]
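The component arithmetic above is easy to check numerically. This is a minimal sketch that evaluates everything at one arbitrary phase φ = 0.7 (an illustrative choice, nothing physical about it) and, like the text, leaves out the 1/√2 factor.

```python
import cmath

phi = 0.7                    # arbitrary phase kz - wt, chosen only for illustration
w = cmath.exp(1j * phi)      # e^{i*phi} = cos(phi) + i*sin(phi) = a + bi
a, b = w.real, w.imag

# Multiply the wave by the column vectors (1, i, 0) and (1, -i, 0); 1/sqrt(2) left out
left = (w * 1, w * 1j, 0)
right = (w * 1, w * -1j, 0)

print(left[1], complex(-b, a))    # y-component for |L>: -b + ai
print(right[1], complex(b, -a))   # y-component for |R>:  b - ai

# Multiplying by i adds 90 degrees (pi/2) to the phase: a counterclockwise rotation
print(cmath.phase(w * 1j) - cmath.phase(w), cmath.pi / 2)

# -1 = e^{i*pi} = e^{-i*pi}, up to floating-point rounding in the imaginary part
print(cmath.exp(1j * cmath.pi), cmath.exp(-1j * cmath.pi))
```

The ‘x’-components of both vectors are just a + bi, while the ‘y’-components come out as −b + ai and b − ai, i.e. each other’s opposite.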

Hmm… Still… What does it all mean really? The truth is that it takes some more advanced math to interpret the result. To be precise, pure quantum states, such as |L〉 and |R〉 here, are represented by so-called ‘state vectors’ in a Hilbert space over complex numbers. So that’s what we’ve got here. So… Well… I can’t say much more about this right now: we’ll just need to study some more before we’ll ‘understand’ those expressions for |L〉 and |R〉. So let’s not worry about it right now. We’ll get there.
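One way to make the link with the ±ħ values a bit more concrete is a small numerical check. It assumes the standard spin-1 matrix for S_z in the Cartesian (x, y, z) basis, (S_z)jk = −iε_zjk in units of ħ. That convention is something I am bringing in here, not something derived in this post. The sketch checks that the column vectors (1, i, 0)/√2 and (1, −i, 0)/√2 are eigenvectors of that matrix with eigenvalues +1 and −1 (i.e. +ħ and −ħ), and that multiplying them by the wave factor e^(i(kz−ωt)), a plain complex number, leaves those eigenvalues unchanged, which is perhaps one way to see why the frequency ‘disappears’.

```python
import numpy as np

# Spin-1 operator S_z in the Cartesian (x, y, z) basis, in units of hbar.
# Assumed convention: (S_z)_{jk} = -i * epsilon_{zjk}
S_z = np.array([[0, -1j, 0],
                [1j,  0, 0],
                [0,   0, 0]])

L = np.array([1, 1j, 0]) / np.sqrt(2)    # polarization part of |L>
R = np.array([1, -1j, 0]) / np.sqrt(2)   # polarization part of |R>

print(np.allclose(S_z @ L, +1 * L))      # True: eigenvalue +1, i.e. +hbar
print(np.allclose(S_z @ R, -1 * R))      # True: eigenvalue -1, i.e. -hbar

wave = np.exp(1j * 0.3)                  # e^{i(kz - wt)} at an arbitrary kz - wt
print(np.allclose(S_z @ (wave * L), wave * L))  # True: the wave factor changes nothing

print(np.linalg.eigvalsh(S_z))           # approximately -1, 0, +1
```

The third eigenvalue, 0, belongs to the z-direction, i.e. the direction of propagation. A transverse light wave has no component along z, which is consistent with the photon having only the two values ±ħ.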

Just for the record, I should note that, initially, I thought the 1/√2 factor in front gave some clue as to what’s going on here: 1/√2 ≈ 0.707 is the factor that’s used to calculate the root mean square (RMS) value of a sine wave. The RMS value is a ‘special average’ one can use to calculate the energy or power (i.e. energy per time unit) of a wave. [Using the term ‘average’ is misleading, because the ordinary average of a sine wave (with unit amplitude) is 2/π ≈ 0.637 over half a cycle, and 0 over a full cycle, as you can easily see from the shape of the function. But I guess you know what I mean.]

Indeed, you’ll remember that the energy (E) of a wave is proportional to the square of its amplitude (A): E ∼ A². For example, when we have a constant current I through a resistor, the power P will be proportional to its square: P ∼ I². With a varying current (I) and voltage (V), the formula is more complicated but, for a purely resistive load (where V and I are in phase), we can simplify it using the RMS values: $P_{\text{avg}} = V_{\text{RMS}} \cdot I_{\text{RMS}}$.
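The difference between the ordinary average and the RMS value of a sine wave can be checked numerically. This minimal sketch samples a unit-amplitude sine on a fine uniform grid; the number of samples is an arbitrary choice.

```python
import numpy as np

N = 100_000
t_full = np.linspace(0.0, 2 * np.pi, N, endpoint=False)  # one full cycle
t_half = np.linspace(0.0, np.pi, N, endpoint=False)      # first half cycle

mean_full = np.sin(t_full).mean()             # ordinary average over a full cycle, ~0
mean_half = np.sin(t_half).mean()             # ordinary average over half a cycle, ~2/pi
rms = np.sqrt(np.mean(np.sin(t_full) ** 2))   # root mean square over a full cycle

print(f"mean over full cycle: {mean_full:.6f}")
print(f"mean over half cycle: {mean_half:.6f}  (2/pi = {2 / np.pi:.6f})")
print(f"RMS over full cycle:  {rms:.6f}  (1/sqrt(2) = {1 / np.sqrt(2):.6f})")
```

Squaring before averaging is what makes the RMS value useful for energy and power: it averages A², not A.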

So… Looking at that formula, should we think of h and/or ħ as some kind of ‘average’ energy, like the energy of a photon per cycle or per radian? That’s an interesting idea so let’s explore it. If the energy of a photon is equal to E = hν = hω/2π = ħω, then we can also write:

h = E/ν and/or ħ = E/ω

So, yes: h is obviously the energy of a photon per cycle and, because the phase covers 2π radians during each cycle, ħ must be the energy of the photon per radian! That’s a great result, isn’t it? It also gives a wonderfully simple interpretation to Planck’s quantum of action!

Well… No. We made at least two mistakes here. The first mistake is that, if we think of a photon as a wavetrain being radiated by an atom (which, as we calculated in another post, lasts about 3.2×10⁻⁸ seconds), the graph of its energy is going to resemble the graph of its amplitude: it’s going to die out, and each oscillation will carry less and less energy. Indeed, the decay time given here (τ = 3.2×10⁻⁸ seconds) was the time it takes for the radiation (we assumed sodium light with a wavelength of 600 nanometer) to die out by a factor 1/e. To be precise, the shape of the energy curve is $E = E_0 e^{-t/\tau}$, and so it’s an envelope resembling the A(t) curve below.

[Figure: the decaying amplitude A(t) of the wavetrain, with decay time τ]

Indeed, remember, the energy of a wave is determined not only by its frequency (or wavelength) but also by its amplitude, and so we cannot assume the amplitude of a ‘photon wave’ is going to be the same everywhere. Just for the record: note that the energy of a photon is proportional to its frequency (doubling the frequency doubles the energy) but, when linking the energy of a wave to its amplitude, we should remember that the energy is proportional to the square of the amplitude, so we write E ∼ A².

The second mistake is that both ν and ω are frequencies of the light, expressed in cycles or radians per second respectively, i.e. per time unit. So that’s not the number of cycles or radians that we should associate with the wavetrain! We should use the number of cycles (or radians) packed into one photon. We can calculate that easily from the value for the decay time τ. Indeed, for sodium light, which has a frequency of 500 THz (500×10¹² oscillations per second) and a wavelength of 600 nm (600×10⁻⁹ meter), we said the radiation lasts about 3.2×10⁻⁸ seconds (that’s actually the time it takes for the radiation’s energy to die out by a factor 1/e, so the wavetrain will actually last (much) longer, but the amplitude becomes quite small after that time), and so that makes for some 16 million oscillations, and a ‘length’ of the wavetrain of about 9.6 meter! Now, the energy of a sodium light photon is about 2 eV (h·ν ≈ 4×10⁻¹⁵ electronvolt·second times 0.5×10¹⁵ cycles/sec = 2 eV) and so we could say the average energy of each of those 16 million oscillations would be 2/(16×10⁶) eV = 0.125×10⁻⁶ eV. But, from all that I wrote above, it’s obvious that this number doesn’t mean all that much, because the wavetrain is not likely to be shaped very regularly.
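The back-of-the-envelope numbers in the previous paragraph can be reproduced in a few lines. This sketch uses the same inputs (ν = 500 THz, τ = 3.2×10⁻⁸ s) but the more precise values c = 299,792,458 m/s and h ≈ 4.1357×10⁻¹⁵ eV·s, so the photon energy comes out at about 2.07 eV rather than the rounded 2 eV.

```python
C = 299_792_458.0          # speed of light, m/s
H_EV = 4.135667696e-15     # Planck constant, eV*s

nu = 500e12                # frequency of (approximate) sodium light, Hz
tau = 3.2e-8               # decay time used in the text, s

n_oscillations = nu * tau                                 # cycles packed into tau
train_length = C * tau                                    # 'length' of the wavetrain, m
photon_energy = H_EV * nu                                 # E = h*nu, eV
energy_per_oscillation = photon_energy / n_oscillations   # naive 'average', eV

print(f"oscillations:           {n_oscillations:.3g}")
print(f"wavetrain length:       {train_length:.2f} m")
print(f"photon energy:          {photon_energy:.3f} eV")
print(f"energy per oscillation: {energy_per_oscillation:.3g} eV")
```

As argued above, the last number is not very meaningful, because the wavetrain is not a regular sequence of identical oscillations.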

So, in short, we cannot say that h is the photon energy per cycle or that ħ is the photon energy per radian! That’s not only simplistic but, worse, false. Planck’s constant is what it is: a factor of proportionality for which there is no simple ‘arithmetic’ and/or ‘geometric’ explanation. It’s just there, and we’ll need to study some more math to truly understand the meaning of those two expressions for |L〉 and |R〉.

Having said that, and having thought about it all some more, I find there’s, perhaps, a more interesting way to re-write E = hν:

h = E/ν = (λ/c)E = T·E

T? Yes. T is the period, so that’s the time needed for one oscillation: T is just the reciprocal of the frequency (T = 1/ν = λ/c). It’s a very tiny number, because we divide (1) a very small number (the wavelength of light measured in meter) by (2) a very large number (the distance, in meter, traveled by light in one second). For sodium light, T is equal to 2×10⁻¹⁵ seconds, so that’s two femtoseconds, i.e. two quadrillionths of a second.
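A quick numerical check of T = λ/c for the 600 nm sodium-light value used here, and of h = T·E. This is a minimal sketch with hand-typed constants.

```python
C = 299_792_458.0          # speed of light, m/s
H_EV = 4.135667696e-15     # Planck constant, eV*s

wavelength = 600e-9        # m, the sodium-light value used in the text
T = wavelength / C         # period T = lambda/c, in seconds
nu = 1 / T                 # frequency, Hz
E = H_EV * nu              # photon energy E = h*nu, in eV

print(f"T   = {T:.2e} s")
print(f"T*E = {T * E:.4e} eV*s")   # recovers h, as in h = T*E
```

Of course, T·E gives back h by construction; the point is only to see the orders of magnitude: a period of about two femtoseconds times a photon energy of about 2 eV.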

Now, we can think of the period as a fraction of a second, and smaller fractions are, obviously, associated with higher frequencies and, what amounts to the same, shorter wavelengths (and, hence, higher energies). However, when writing T = λ/c, we can also think of T as another kind of fraction: λ/c can also be written as the ratio of the wavelength and the distance traveled by light in one second, i.e. a light-second (remember that a light-second is a measure of length, not of time). The two fractions are the same when we express time and distance in equivalent units indeed (i.e. distance in light-seconds and time in seconds).

So that links h to both time and distance, and we may look at h as some kind of fraction or energy ‘density’ even (although the term ‘density’ in this context is not quite accurate). In the same vein, I should note that, if there’s anything that should make you think about h, it is the fact that its value depends on how we measure time and distance. For example, if we’d measure time in other units (for example, the more ‘natural’ unit defined by the time light needs to travel one meter), then we get a different unit (and numerical value) for h. And, of course, you also know we can relate energy to distance (1 J = 1 N·m). But that’s something that’s obvious from h‘s dimension (J·s), and so I shouldn’t dwell on that.

Hmm… Interesting thoughts. I think I’ll develop these things a bit further in one of my next posts. As for now, however, I’ll leave you with your own thoughts on it.

Note 1: As you’re trying to follow what I am writing above, you may have wondered whether or not the duration of the wavetrain that’s emitted by an atom is a constant, or whether or not it packs some constant number of oscillations. I’ve thought about that myself as I wrote down the following formula at some point in time:

h = (the duration of the wave)·(the energy of the photon)/(the number of oscillations in the wave)

As mentioned above, interpreting h as some kind of average energy per oscillation is not a great idea but, having said that, it would be a good exercise for you to try to answer that question (about the duration of these wavetrains, and/or the number of oscillations packed into them) yourself. There are various formulas for the Q of an atomic oscillator, but the simplest one is the one expressed in terms of the so-called classical electron radius r₀:

$$Q = \frac{3\lambda}{4\pi r_0}$$

As you can see, the Q depends on λ: longer wavelengths (so lower energy) are associated with higher Q. In fact, the relationship is directly proportional: twice the wavelength will give you twice the Q. Now, the formula for the decay time τ is also dependent on the wavelength. Indeed, τ = 2Q/ω = Qλ/πc. Combining the two formulas yields (if I am not mistaken):

$$\tau = \frac{3\lambda^2}{4\pi^2 r_0 c}$$

Hence, the decay time is proportional to the square of the wavelength. Hmm… That’s an interesting result. But I really need to learn how to be a bit shorter, and so I’ll really let you think now about what all this means or could mean.
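For what it’s worth, the two formulas above can be evaluated for the 600 nm sodium-light value. This minimal sketch uses the CODATA value of the classical electron radius and hypothetical helper names; it recovers τ ≈ 3.2×10⁻⁸ s and confirms the λ² scaling.

```python
import math

C = 299_792_458.0        # speed of light, m/s
R0 = 2.8179403262e-15    # classical electron radius r0, m (CODATA 2018)

def q_factor(wavelength: float) -> float:
    """Q = 3*lambda / (4*pi*r0)."""
    return 3 * wavelength / (4 * math.pi * R0)

def decay_time(wavelength: float) -> float:
    """tau = 2Q/omega = Q*lambda/(pi*c), which equals 3*lambda^2 / (4*pi^2*r0*c)."""
    return q_factor(wavelength) * wavelength / (math.pi * C)

lam = 600e-9
print(f"Q   = {q_factor(lam):.2e}")
print(f"tau = {decay_time(lam):.2e} s")
print(decay_time(2 * lam) / decay_time(lam))   # ~4: doubling lambda quadruples tau
```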

Note 2: If that 1/√2 factor has nothing to do with some kind of rms calculation, where does it come from? I am not sure. It’s related to state vector math, it seems, and I haven’t started that as yet. I just copy a formula from Wikipedia here, which shows the same factor in front:

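[Formula from Wikipedia: the state vector for a superposition of joint spin states of two particles, with a factor 1/√2 in front]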

The formula above is said to represent the “superposition of joint spin states for two particles”. My gut instinct tells me the 1/√2 factor has to do with the normalization condition and/or with the fact that we have to take the (absolute) square of the (complex-valued) amplitudes to get the probability.
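The normalization hunch is easy to check for the |L〉 column vector (1, i, 0)/√2. In this minimal sketch, the absolute squares of its components add up to 1 with the 1/√2 factor, and to 2 without it.

```python
import numpy as np

L = np.array([1, 1j, 0]) / np.sqrt(2)   # polarization part of |L>, with the 1/sqrt(2) factor
unnormalized = np.array([1, 1j, 0])     # the same vector without that factor

probs = np.abs(L) ** 2                  # absolute squares of the components
print(probs, probs.sum())               # roughly [0.5 0.5 0.], summing to 1
print(np.vdot(unnormalized, unnormalized).real)  # 2.0, hence the 1/sqrt(2) in front
```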

The Strange Theory of Light and Matter (II)

If we limit our attention to the interaction between light and matter (i.e. the behavior of photons and electrons only, so we’re not talking about quarks and gluons here), then the ‘crazy ideas’ of quantum mechanics can be summarized as follows:

  1. At the atomic or sub-atomic scale, we can no longer look at light as an electromagnetic wave. It consists of photons, and photons come in blobs. Hence, to some extent, photons are ‘particle-like’.
  2. At the atomic or sub-atomic scale, electrons don’t behave like particles. For example, if we send them through a slit that’s small enough, we’ll observe a diffraction pattern. Hence, to some extent, electrons are ‘wave-like’.

In short, photons aren’t waves, but they aren’t particles either. Likewise, electrons aren’t particles, but they aren’t waves either. They are neither. The weirdest thing of all, perhaps, is that, while light and matter are two very different things in our daily experience (light and matter are opposite concepts, I’d say, just like particles and waves are opposite concepts), they look pretty much the same in quantum physics: they are both represented by a wavefunction.

Let me immediately make a little note on terminology here. The term ‘wavefunction’ is a bit ambiguous, in my view, because it makes one think of a real wave, like a water wave, or an electromagnetic wave. Real waves are described by real-valued wave functions describing, for example, the motion of a ball on a spring, or the displacement of a gas (e.g. air) as a sound wave propagates through it, or – in the case of an electromagnetic wave – the strength of the electric and magnetic field.

You may have questions about the ‘reality’ of fields, but electromagnetic waves – i.e. the classical description of light – are quite ‘real’ too, even if:

  1. Light doesn’t travel in a medium (like water or air: there is no aether), and
  2. The magnitudes of the electric and magnetic fields (usually denoted by E and B) depend on your reference frame: if you calculate the fields using a moving coordinate system, you will get a different mixture of E and B. Therefore, E and B may not feel very ‘real’ when you look at them separately, but they are very real when we think of them as representing one physical phenomenon: the electromagnetic interaction between particles. So the E and B mix is, indeed, a dual representation of one reality. I won’t dwell on that, as I’ve done that in another post of mine.

How ‘real’ is the quantum-mechanical wavefunction?

The quantum-mechanical wavefunction is not like any of these real waves. In fact, I’d rather use the term ‘probability wave’ but, apparently, that’s used only by bloggers like me 🙂 and so it’s not very scientific. That’s for a good reason, because it’s not quite accurate either: the wavefunction in quantum mechanics represents probability amplitudes, not probabilities. So we should, perhaps, be consistent and term it a ‘probability amplitude wave’ – but then that’s too cumbersome obviously, so the term ‘probability wave’ may be confusing, but it’s not so bad, I think.

Amplitudes and probabilities are related as follows:

  1. Probabilities are real numbers between 0 and 1: they represent the probability of something happening, e.g. a photon moves from point A to B, or a photon is absorbed (and emitted) by an electron (i.e. a ‘junction’ or ‘coupling’, as you know).
  2. Amplitudes are complex numbers, or ‘arrows’ as Feynman calls them: they have a length (or magnitude) and a direction.
  3. We get the probabilities by taking the (absolute) square of the amplitudes.
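Feynman’s ‘arrows’ translate directly into complex numbers. This minimal sketch (with hypothetical helper names) builds arrows from a length and a direction, and follows his rule for combining them: when there is more than one way for something to happen, you first add the arrows and only then take the absolute square to get the probability.

```python
import cmath
import math

def arrow(length: float, angle_deg: float) -> complex:
    """A Feynman 'arrow': a complex amplitude with a given length and direction."""
    return cmath.rect(length, math.radians(angle_deg))

def probability(*arrows: complex) -> float:
    """Add the arrows (amplitudes) first, then take the absolute square."""
    return abs(sum(arrows)) ** 2

a = arrow(0.5, 0)
print(probability(a))                    # one arrow of length 0.5: probability 0.25
print(probability(a, arrow(0.5, 0)))     # same direction: the arrows reinforce each other
print(probability(a, arrow(0.5, 90)))    # perpendicular: 0.25 + 0.25 = 0.5
print(probability(a, arrow(0.5, 180)))   # opposite directions: the arrows cancel (~0)
```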

So, once again: photons and electrons are neither particles nor waves. So what are they? We don’t have words to describe what they are. Some use the term ‘wavicle’ but that doesn’t answer the question, because who knows what a ‘wavicle’ is? So we don’t know what they are. But we do know how they behave. As Feynman puts it, when comparing the behavior of light and then of electrons in the double-slit experiment, struggling to find language to describe what’s going on: “There is one lucky break: electrons behave just like light.”

He says so because of that wave function: the mathematical formalism is the same, for photons and for electrons. Exactly the same? […] But that’s such a weird thing to say, isn’t it? We can’t help thinking of light as waves, and of electrons as particles. They can’t be the same. They’re different, aren’t they? They are.

Scales and senses

To some extent, the weirdness can be explained because the scale of our world is not atomic or sub-atomic. Therefore, we ‘see’ things differently. Let me say a few words about the instrument we use to look at the world: our eye.

Our eye is particular. The retina has two types of receptors: the so-called cones are used in bright light, and distinguish color, but when we are in a dark room, the so-called rods become sensitive, and it is believed that they actually can detect a single photon of light. However, neural filters only allow a signal to pass to the brain when at least five photons arrive within less than a tenth of a second. A tenth of a second is, roughly, the averaging time of our eye. So, as Feynman puts it: “If we were evolved a little further so we could see ten times more sensitively, we wouldn’t have this discussion—we would all have seen very dim light of one color as a series of intermittent little flashes of equal intensity.” In other words, the ‘particle-like’ character of light would have been obvious to us.

Let me make a few more remarks here, which you may or may not find useful. The sense of ‘color’ is not something ‘out there’: colors, like red or brown, are experiences in our eye and our brain. There are ‘pigments’ in the cones (cones are the receptors that work only if the intensity of the light is high enough) and these pigments absorb the light spectrum somewhat differently, as a result of which we ‘see’ color. Different animals see different things. For example, a bee can distinguish between white paper using zinc white versus lead white, because they reflect light differently in the ultraviolet spectrum, which the bee can see but we don’t. Bees can also tell the direction of the sun without seeing the sun itself, because they are sensitive to polarized light, and the scattered light of the sky (i.e. the blue sky as we see it) is polarized. The bee can also notice flicker up to 200 oscillations per second, while we see it only up to 20, because our averaging time is about a tenth of a second, while the averaging time of the bee is much shorter. So we cannot see the quick leg movements and/or wing vibrations of bees, but the bee can!

Sometimes we can’t see any color. For example, we see the night sky in ‘black and white’ because the light intensity is very low, and so it’s our rods, not the cones, that process the signal, and so these rods can’t ‘see’ color. So those beautiful color pictures of nebulae are not artificial (although the pictures are often enhanced). It’s just that the camera that is used to take those pictures (film or, nowadays, digital) is much more sensitive than our eye. 

Regardless, color is a quality which we add to our experience of the outside world ourselves. What’s out there are electromagnetic waves with this or that wavelength (or, what amounts to the same, this or that frequency). So when critics of the exact sciences say so much is lost when looking at (visible) light as an electromagnetic wave in the range of 430 to 790 terahertz, they’re wrong. Those critics will say that physics reduces reality. That is not the case.
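Since frequency and wavelength are two ways of saying the same thing (λ = c/ν), the 430 to 790 THz range can be converted into wavelengths with a one-line formula. This is a minimal sketch with a hypothetical helper name.

```python
C = 299_792_458.0  # speed of light, m/s

def wavelength_nm(freq_thz: float) -> float:
    """lambda = c / nu, with nu given in terahertz and lambda returned in nanometres."""
    return C / (freq_thz * 1e12) * 1e9

for f in (430, 790):
    print(f"{f} THz -> {wavelength_nm(f):.0f} nm")
```

That gives roughly 700 nm (red) down to roughly 380 nm (violet), the familiar range of visible wavelengths.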

What’s going on is that our senses process the signal that they are receiving, especially when it comes to vision. As Feynman puts it: “None of the other senses involves such a large amount of calculation, so to speak, before the signal gets into a nerve that one can make measurements on. The calculations for all the rest of the senses usually happen in the brain itself, where it is very difficult to get at specific places to make measurements, because there are so many interconnections. Here, with the visual sense, we have the light, three layers of cells making calculations, and the results of the calculations being transmitted through the optic nerve.”

Hence, things like color and all of the other sensations that we have are the object of study of other sciences, including biochemistry and neurobiology, or physiology. For all we know, what’s ‘out there’ is, effectively, just ‘boring’ stuff, like electromagnetic radiation, energy and ‘elementary particles’—whatever they are. No colors. Just frequencies. 🙂

Light versus matter

If we accept the crazy ideas of quantum mechanics, then the what and the how become one and the same. Hence we can say that photons and electrons are a wavefunction somewhere in space. Photons, of course, are always traveling, because they have energy but no rest mass. Hence, all their energy is in the movement: it’s kinetic, not potential. Electrons, on the other hand, usually stick around some nucleus. And, let’s not forget, they have an electric charge, so their energy is not only kinetic but also potential.

But, otherwise, it’s the same type of ‘thing’ in quantum mechanics: a wavefunction, like those below.


Why diagrams A and B? It’s just to emphasize the difference between a real-valued wave function and those ‘probability waves’ we’re looking at here (diagrams C to H). A and B represent a mass on a spring, oscillating at more or less the same frequency but a different amplitude. The amplitude here means the displacement of the mass. The function describing the displacement of a mass on a spring (so that’s diagram A and B) is an example of a real-valued wave function: it’s a simple sine or cosine function, as depicted below. [Note that a sine and a cosine are the same function really, except for a phase difference of 90°.]

[Figure: the sine and cosine functions]

Let’s now go back to our ‘probability waves’. Photons and electrons, light and matter… The same wavefunction? Really? How can the sunlight that warms us up in the morning and makes trees grow be the same as our body, or the tree? The light-matter duality that we experience must be rooted in very different realities, mustn’t it?

Well… Yes and no. If we’re looking at one photon or one electron only, it’s the same type of wavefunction indeed. The same type… OK, you’ll say. So they belong to the same family or genus, perhaps, as they say in biology. Indeed, both of them are, obviously, referred to as ‘elementary particles’ in the so-called Standard Model of physics. But then what makes an electron and a photon distinct species? What are the differences?

There are quite a few, obviously:

1. First, as mentioned above, a photon is a traveling wave function and, because it has no rest mass, it travels at the ultimate speed, i.e. the speed of light (c). An electron usually sticks around or, if it travels through a wire, it travels at very low speeds. Indeed, you may find it hard to believe, but the drift velocity of the free electrons in a standard copper wire is measured in cm per hour, so that’s very slow indeed. And while the electrons in an electron microscope beam may be accelerated up to 70% of the speed of light, and close to c in those huge particle accelerators, you’re not likely to find an electron microscope or accelerator in Nature. In fact, you may want to remember that a simple thing like electricity going through copper wires in our houses is a relatively modern invention. 🙂
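The orders of magnitude in point 1 can be checked with two textbook formulas: the drift velocity v = I/(nqA) and the relativistic relation γ = 1 + K/(mₑc²). This is a rough sketch under assumptions of mine, not the author’s: a current of 1 A in a copper wire with a 1 mm² cross-section, one free electron per copper atom (n ≈ 8.5×10²⁸ m⁻³), and 200 keV as an illustrative accelerating energy for an electron microscope.

```python
import math

ELEMENTARY_CHARGE = 1.602e-19      # C
ELECTRON_REST_ENERGY_KEV = 511.0   # m_e c^2, in keV
N_COPPER = 8.5e28                  # free electrons per m^3 (assumption: one per atom)

def drift_velocity(current_a, area_m2, n=N_COPPER):
    """Drift velocity v = I / (n * e * A) of the free electrons in a wire, in m/s."""
    return current_a / (n * ELEMENTARY_CHARGE * area_m2)

def speed_fraction_of_c(kinetic_kev):
    """v/c for an electron with the given kinetic energy: gamma = 1 + K / (m_e c^2)."""
    gamma = 1.0 + kinetic_kev / ELECTRON_REST_ENERGY_KEV
    return math.sqrt(1.0 - 1.0 / gamma ** 2)

v = drift_velocity(1.0, 1e-6)      # 1 A through a 1 mm^2 copper wire (illustrative)
print(f"drift velocity: {v:.2e} m/s = {v * 100 * 3600:.0f} cm per hour")
print(f"200 keV electron: v = {speed_fraction_of_c(200.0):.2f} c")
```

With these inputs the drift velocity comes out at a few tens of cm per hour, and a 200 keV electron moves at roughly 70% of c, which is consistent with the figures above.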

So, yes, those oscillating wave functions in those diagrams above are likely to represent some electron, rather than a photon. To be precise, the wave functions above are examples of standing (or stationary) waves, while a photon is a traveling wave: just extend that sine and cosine function in both directions if you want to visualize it or, even better, think of a sine and cosine function in an envelope traveling through space, such as the one depicted below.

Photon wave

Indeed, while the wave function of our photon is traveling through space, it is likely to be limited in space because, when everything is said and done, our photon is not everywhere: it must be somewhere. 
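A toy version of that traveling envelope, as a sketch only: it assumes a Gaussian envelope that moves at a constant speed by construction (no dispersion), with exp(i(kx − ωt)) inside it, so that the real and imaginary parts are the cosine and sine in the picture. The speed, wavenumber and width are arbitrary illustrative values. The code tracks where the squared length of the arrow, |ψ|², is largest.

```python
import cmath
import math

C = 1.0       # propagation speed, arbitrary units (illustrative)
K = 20.0      # wavenumber of the oscillation inside the envelope (illustrative)
WIDTH = 1.0   # width of the Gaussian envelope (illustrative)

def packet(x, t):
    """Wave train: Gaussian envelope centred on C*t, times exp(i(Kx - omega*t))."""
    omega = C * K
    envelope = math.exp(-((x - C * t) / WIDTH) ** 2)
    return envelope * cmath.exp(1j * (K * x - omega * t))

def peak_position(t, xs):
    """Grid point where |psi|^2 is largest at time t."""
    return max(xs, key=lambda x: abs(packet(x, t)) ** 2)

xs = [i * 0.01 for i in range(-500, 1501)]   # grid from -5 to 15
for t in (0.0, 5.0, 10.0):
    print(f"t = {t:4.1f}: peak of |psi|^2 at x = {peak_position(t, xs):.2f}")
```

The oscillation inside changes the direction of the arrow, not its length, so |ψ|² simply follows the envelope.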

At this point, it’s good to pause and think about what is traveling through space. It’s the oscillation. But what’s the oscillation? There is no medium here, and even if there were some medium (like water or air or something like aether, which, let me remind you, isn’t there!), the medium itself would not be moving or, to be precise, it would only move up and down as the wave propagates through space, as illustrated below. To be complete, I should add that we also have longitudinal waves, like sound waves (pressure waves): in that case, the particles oscillate back and forth along the direction of wave propagation. But you get the point: the medium does not travel with the wave.


When talking about electromagnetic waves, we have no medium. The E and B vectors oscillate, but it is very wrong to assume they use ‘some core of nearby space’, as Feynman puts it. They don’t. Those field vectors represent a condition at one specific point in space (admittedly, a point along the direction of travel) but, for all we know, an electromagnetic wave travels in a straight line and, hence, we can’t talk about its diameter or so.

Still, as mentioned above, we can imagine, more or less, what E and B stand for (we can use field lines to visualize them, for instance), even if we have to take into account their relativity (calculating their values from a moving reference frame results in different mixtures of E and B). But what are those amplitudes? How should we visualize them?

The honest answer is: we can’t. They are what they are: two mathematical quantities which, taken together, form a two-dimensional vector, which we square to find a value for a real-life probability, which is something that – unlike the amplitude concept – does make sense to us. Still, that representation of a photon above (i.e. the traveling envelope with a sine and cosine inside) may help us to ‘understand’ it somehow. Again, you absolutely have to get rid of the idea that these ‘oscillations’ would somehow occupy some physical space. They don’t. The wave itself has some definite length, for sure, but that’s a measurement in the direction of travel, which is often denoted as x when discussing uncertainty in its position, for example, as in the famous Uncertainty Principle (ΔxΔp ≥ ħ/2).

You’ll say: Oh! But then, at the very least, we can talk about the ‘length’ of a photon, can’t we? So then a photon is one-dimensional at least, not zero-dimensional! The answer is yes and no. I’ve talked about this before and so I’ll be short(er) on it now. A photon is emitted by an atom when an electron jumps from one energy level to another. It thereby emits a wave train that lasts about 10⁻⁸ seconds. That’s not very long but, taking into account the rather spectacular speed of light (3×10⁸ m/s), that still makes for a wave train with a length of not less than 3 meters. […] That’s quite a length, you’ll say. You’re right. But you forget that light travels at the speed of light and, hence, we will see this length as zero because of the relativistic length contraction effect. So… Well… Let me get back to the question: if photons and electrons are both represented by a wavefunction, what makes them different?

2. A more fundamental difference between photons and electrons is how they interact with each other.

From what I’ve written above, you understand that probability amplitudes are complex numbers, or ‘arrows’, or ‘two-dimensional vectors’. [Note that all of these terms have precise mathematical definitions and so they’re actually not the same, but the difference is too subtle to matter here.] Now, there are two ways of combining amplitudes, which are referred to as ‘positive’ and ‘negative’ interference respectively. I should immediately note that there’s actually nothing ‘positive’ or ‘negative’ about the interaction: we’re just putting two arrows together, and there are two ways to do that. That’s all.

The diagrams below show you these two ways. You’ll say: there are four! However, remember that we square an arrow to get a probability. Hence, the direction of the final arrow doesn’t matter when we’re taking the square: we get the same probability. It’s the direction of the individual amplitudes that matters when combining them. So the square of A+B is the same as the square of –(A+B) = –A+(–B) = –A–B. Likewise, the square of A–B is the same as the square of –(A–B) = –A+B.

vector addition
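A quick numerical check of that sign argument, treating arrows as Python complex numbers (the two arrows a and b are arbitrary, hypothetical values): flipping both arrows flips the final arrow, but leaves its square, and hence the probability, unchanged.

```python
def probability(arrow):
    """Square of the arrow's length: Re(z)^2 + Im(z)^2."""
    return arrow.real ** 2 + arrow.imag ** 2

a = 0.3 + 0.1j     # two illustrative arrows (hypothetical values)
b = -0.1 + 0.25j

p_plus = probability(a + b)     # the 'positive' way of combining
p_minus = probability(a - b)    # the 'negative' way of combining

# Reversing both arrows reverses the final arrow, which leaves its square unchanged.
assert abs(probability(-a - b) - p_plus) < 1e-12
assert abs(probability(-a + b) - p_minus) < 1e-12
print(f"|a+b|^2 = {p_plus:.4f}, |a-b|^2 = {p_minus:.4f}")
```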

These are the only two logical possibilities for combining arrows. I’ve written ad nauseam about this elsewhere (see my post on amplitudes and statistics), and so I won’t go into too much detail here. Or, in case you want something less than a full mathematical treatment, I can also refer you to my previous post, where I talked about the ‘stopwatch’ and the ‘phase’: the convention for the stopwatch is to have its hand turn clockwise (obviously!) while, in quantum physics, the phase of a wave function will turn counterclockwise. But that’s just a convention and it doesn’t matter, because it’s the phase difference between two amplitudes that counts. To use plain language: it’s the difference in the angles of the arrows, and so that difference is just the same if we reverse the direction of both arrows (which is equivalent to putting a minus sign in front of the final arrow).

OK. Let me get back to the lesson. The point is: this logical or mathematical dichotomy distinguishes bosons (i.e. force-carrying ‘particles’, like photons, which carry the electromagnetic force) from fermions (i.e. ‘matter-particles’, such as electrons and quarks, which make up protons and neutrons). Indeed, the so-called ‘positive’ and ‘negative’ interference leads to two very different behaviors:

  1. The probability of getting a boson into a state where n identical bosons are already present is n+1 times greater than it would be if there were none before.
  2. In contrast, the probability of getting two electrons into exactly the same state is zero. 
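The two rules can be illustrated along the lines of the two-detector argument in Feynman’s Lectures (Vol. III, Ch. 4): particles from sources a and b go to detectors 1 and 2, and the ‘direct’ and ‘exchange’ routes are added for bosons and subtracted for fermions. In this sketch the two detectors are assumed to be so close together that the amplitudes to reach either one are equal; the amplitude values themselves are hypothetical. Bosons then come out twice as likely as distinguishable particles (n+1 with n = 1), and fermions not at all.

```python
def two_particle_probability(amp_a, amp_b, kind):
    """Probability that one particle arrives at detector 1 and one at detector 2.

    amp_a[d] (amp_b[d]) is the amplitude for the particle from source a (b)
    to reach detector d. 'direct' is a->1, b->2; 'exchange' is a->2, b->1.
    """
    direct = amp_a[1] * amp_b[2]
    exchange = amp_a[2] * amp_b[1]
    if kind == "boson":      # identical bosons: add the two routes
        return abs(direct + exchange) ** 2
    if kind == "fermion":    # identical fermions: subtract them
        return abs(direct - exchange) ** 2
    # distinguishable particles: the routes can be told apart, so add probabilities
    return abs(direct) ** 2 + abs(exchange) ** 2

# Detectors 1 and 2 so close together that the amplitudes are equal (hypothetical values).
amp_a = {1: 0.3 + 0.2j, 2: 0.3 + 0.2j}
amp_b = {1: -0.1 + 0.4j, 2: -0.1 + 0.4j}

p_dist = two_particle_probability(amp_a, amp_b, "distinguishable")
p_bose = two_particle_probability(amp_a, amp_b, "boson")
p_fermi = two_particle_probability(amp_a, amp_b, "fermion")
print(f"bosons vs distinguishable particles: {p_bose / p_dist:.3f} times as likely")
print(f"fermions: probability {p_fermi:.3f}")
```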

The behavior of photons makes lasers possible: we can pile zillions of photons on top of each other, and then release all of them in one powerful burst. [The ‘flickering’ of a laser beam is due to the quick succession of such light bursts. If you want to know how it works in detail, check my post on lasers.]

The behavior of electrons is referred to as Pauli’s exclusion principle: it is only because real-life electrons can have one of two spin polarizations (i.e. two opposite directions of angular momentum, which are referred to as ‘up’ or ‘down’, but they might as well have been referred to as ‘left’ or ‘right’) that we find two electrons (instead of just one) in any atomic or molecular orbital.

So, yes, while both photons and electrons can be described by a similar-looking wave function, their behavior is fundamentally different indeed. How is that possible? Adding and subtracting ‘arrows’ is a very similar operation, isn’t it?

It is and it isn’t. From a mathematical point of view, I’d say: yes. From a physics point of view, it’s obviously not very ‘similar’, as it does lead to these two very different behaviors: the behavior of photons allows for laser shows, while the behavior of electrons explains (almost) all the peculiarities of the material world, including us walking into doors. 🙂 If you want to check it out for yourself, just consult Feynman’s Lectures for more details on this or, else, re-read my posts on it.

3. Of course, there are even more differences between photons and electrons than the two key differences I mentioned above. Indeed, I’ve simplified a lot when I wrote what I wrote above. The wavefunctions of electrons in orbit around a nucleus can take very weird shapes, as shown in the illustration below, and please do google a few others if you’re not convinced. As mentioned above, they’re so-called standing waves, because they occupy a well-defined region of space only, but standing waves can look very weird. In contrast, traveling plane waves, or envelope curves like the one above, are much simpler.


In short: yes, the mathematical representation of photons and electrons (i.e. the wavefunction) is very similar, but photons and electrons are very different animals indeed.

Potentiality and interconnectedness

I guess that, by now, you agree that quantum theory is weird but, as you know, quantum theory does explain all of the stuff that couldn’t be explained before: “It works like a charm”, as Feynman puts it. In fact, he’s often quoted as having said the following:

“It is often stated that of all the theories proposed in this century, the silliest is quantum theory. Some say the only thing that quantum theory has going for it, in fact, is that it is unquestionably correct.”

Silly? Crazy? Uncommon-sensy? Truth be told, you do get used to thinking in terms of amplitudes after a while. And, when you get used to them, those ‘complex’ numbers are no longer complicated. 🙂 Most importantly, when one thinks long and hard enough about it (as I am trying to do), it somehow all starts making sense.

For example, we’ve done away with dualism by adopting a unified mathematical framework, but the distinction between bosons and fermions still stands: an ‘elementary particle’ is either this or that. There are no ‘split personalities’ here. So the dualism just pops up at a different level of description, I’d say. In fact, I’d go one step further and say it pops up at a deeper level of understanding.

But what about the other assumptions in quantum mechanics? Some of them don’t make sense, do they? Well… I struggled for quite a while with the assumption that, in quantum mechanics, anything is possible really. For example, a photon (or an electron) can take any path in space, and it can travel at any speed (including speeds that are lower or higher than that of light). The probability may be extremely low, but it’s possible.

Now that is a very weird assumption. Why? Well… Think about it. If you enjoy watching soccer, you’ll agree that flying objects (I am talking about the soccer ball here) can have amazing trajectories. Spin, lift, drag, whatever—the result is a weird trajectory, like the one below:


But, frankly, a photon taking the ‘southern’ route in the illustration below? What are the ‘wheels and gears’ there? There’s nothing sensible about that route, is there?


In fact, there are at least three issues here:

  1. First, you should note that strange curved paths in the real world (such as the trajectories of billiard or soccer balls) are possible only because there’s friction involved—between the felt of the pool table cloth and the ball, or between the balls, or, in the case of soccer, between the ball and the air. There’s no friction in the vacuum. Hence, in empty space, all things should go in a straight line only.
  2. While it’s quite amazing what’s possible, in the real world that is, in terms of ‘weird trajectories’, even the weirdest trajectories of a billiard or soccer ball can be described by a ‘nice’ mathematical function. We obviously can’t say the same of that ‘southern route’ which a photon could follow, in theory that is. Indeed, you’ll agree the function describing that trajectory cannot be ‘nice’. So even if we allow all kinds of ‘weird’ trajectories, shouldn’t we limit ourselves to ‘nice’ trajectories only? I mean: it doesn’t make sense to allow the photons traveling from your computer screen to your retina to take some trajectory to the Sun and back, does it?
  3. Finally, and most fundamentally perhaps, even if we assume that there’s some mechanism combining (a) internal ‘wheels and gears’ (such as spin or angular momentum) with (b) felt or air or whatever medium to push against, what would be the mechanism determining the choice of the photon in regard to these various paths? In Feynman’s words: How does the photon ‘make up its mind’?

Feynman answers these questions, fully or partially (I’ll let you judge), when discussing the double-slit experiment with photons:

“Saying that a photon goes this or that way is false. I still catch myself saying, “Well, it goes either this way or that way,” but when I say that, I have to keep in mind that I mean in the sense of adding amplitudes: the photon has an amplitude to go one way, and an amplitude to go the other way. If the amplitudes oppose each other, the light won’t get there—even though both holes are open.”

It’s probably worth recalling the results of that experiment here, if only to help you judge whether or not Feynman fully answers those questions above!

The set-up is shown below. We have a source S, two slits (A and B), and a detector D. The source sends photons out, one by one. In addition, we have two special detectors near the slits, which may or may not detect a photon, depending on whether or not they’re switched on as well as on their accuracy.

set-up photons

First, we close one of the slits, and we find that 1% of the photons go through the other slit and arrive at D (so that’s one photon for every 100 photons that leave S). Now, we open both slits to study interference. You know the results already:

  1. If we switch the detectors off (so we have no way of knowing where the photon went), we get interference. The interference pattern depends on the distance between A and B and varies from 0% to 4%, as shown in diagram (a) below. That’s pretty standard. As you know, classical theory can explain that too, assuming light is an electromagnetic wave. But here we have blobs of energy – photons – traveling one by one. So it’s really like that double-slit experiment with electrons, or whatever other microscopic particles (as you know, they’ve done these interference experiments with large molecules as well, and they get the same result!). We get the interference pattern by using those quantum-mechanical rules to calculate probabilities: we first add the amplitudes, and it’s only when we’re finished adding those amplitudes that we square the resulting arrow to get the final probability.
  2. If we switch those special detectors on, and if they are 100% reliable (i.e. all photons going through are being detected), then our photon suddenly behaves like a particle, instead of as a wave: it will go through one of the slits only, i.e. either through A or, alternatively, through B. So the two special detectors never go off together. Hence, as Feynman puts it, we shouldn’t think there is some “sneaky way that the photon divides in two and then comes back together again.” It’s one way or the other, and there’s no interference: the detector at D goes off 2% of the time, which is the simple sum of the probabilities for A and B (i.e. 1% + 1%).
  3. When the special detectors near A and B are not 100% reliable (and, hence, do not detect all photons going through), we have three possible final conditions: (i) A and D go off, (ii) B and D go off, and (iii) D goes off alone (none of the special detectors went off). In that case, we have a final curve that’s a mixture, as shown in diagrams (c) and (d) below. We get it using the same quantum-mechanical rules: for each final condition, we add amplitudes first and then square to get its probability; the probabilities of the three final conditions, which can be told apart, then simply add up.

double-slit photons - results
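The numbers in these three cases can be reproduced with a few lines of Python. The sketch assumes an arrow of length 0.1 per slit (so 1% with one slit open) and lets the phase difference between the two routes stand in for the position of D. As a simplifying assumption of mine, an imperfect slit detector is modelled by a probability `efficiency` that it fires, without disturbing the photon’s phase when it doesn’t.

```python
import cmath
import math

ARROW = 0.1   # one slit open: 0.1**2 = 1% of the photons reach D

def p_no_detectors(phase_difference):
    """Both slits open, no way to tell which slit: add the arrows, then square."""
    final = ARROW + ARROW * cmath.exp(1j * phase_difference)
    return abs(final) ** 2

def p_with_detectors(phase_difference, efficiency):
    """Slit detectors that fire with probability `efficiency` (simplifying assumption).

    The final conditions 'A and D', 'B and D' and 'D alone' can be told apart,
    so each is squared separately and their probabilities are added.
    """
    a_and_d = efficiency * ARROW ** 2
    b_and_d = efficiency * ARROW ** 2
    d_alone = (1.0 - efficiency) * p_no_detectors(phase_difference)
    return a_and_d + b_and_d + d_alone

for phi in (0.0, math.pi / 2, math.pi):
    print(f"phase difference {phi:.2f}: "
          f"no detectors {p_no_detectors(phi):.2%}, "
          f"perfect detectors {p_with_detectors(phi, 1.0):.2%}, "
          f"50% detectors {p_with_detectors(phi, 0.5):.2%}")
```

Without detectors the result swings between 0% and 4%; with perfect detectors it is a flat 2%; with half-reliable detectors it lands in between, which is the ‘mixture’ of case 3.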

Now, I think you’ll agree with me that Feynman doesn’t answer my (our) question in regard to the ‘weird paths’. In fact, all of the diagrams he uses assume straight or nearby paths. Let me re-insert two of those diagrams below, to show you what I mean.

Many arrows / Few arrows

So where are all the strange non-linear paths here? To make sure you get what I am saying, let me insert that illustration with the three crazy routes once again. What we’ve got above (Figures 33 and 34) is not like that. Not at all: we’ve got only straight lines there! Why? The answer to that question is easy: the crazy paths don’t matter because their amplitudes cancel each other out, and so that allows Feynman to simplify the whole situation and show all the relevant paths as straight lines only.


Now, I struggled with that for quite a while. Not because I can’t see the math or the geometry involved. No. Feynman does a great job showing why those amplitudes cancel each other out indeed (if you want a summary, see my previous post once again). My ‘problem’ is something else. It’s hard to phrase it, but let me try: why would we even allow for the logical or mathematical possibility of ‘weird paths’ (and let me again insert that stupid diagram below) if our ‘set of rules’ ensures that the truly ‘weird’ paths (like that photon traveling from your computer screen to your eye doing a detour taking it to the Sun and back) cancel each other out anyway? Does that respect Occam’s Razor? Can’t we devise some theory including ‘sensible’ paths only?
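The cancellation itself is easy to check numerically. This sketch is restricted to one simple family of bent paths, an assumption made to keep it short: each path goes in a straight line from S to a point at height y halfway to D, and from there straight to D. Each path gets a unit arrow whose direction follows its extra length, in units of the wavelength; the distance and grid spacing are illustrative. The arrows from a wide band of far-out paths add up to much less than those from the narrow band of paths near the straight line.

```python
import cmath
import math

WAVELENGTH = 1.0   # lengths are measured in wavelengths (illustrative)
L = 1000.0         # distance from S to D, in wavelengths (illustrative)
DY = 0.01          # spacing between neighbouring paths

def extra_length(y):
    """Extra length of the bent path S -> (L/2, y) -> D compared to the straight line."""
    return 2.0 * math.hypot(L / 2.0, y) - L

def arrow(y):
    """Unit arrow whose direction follows the extra length of path y (the 'stopwatch')."""
    return cmath.exp(2j * math.pi * extra_length(y) / WAVELENGTH)

def sum_arrows(y_min, y_max):
    """Add up the arrows of the paths between y_min and y_max (midpoint Riemann sum)."""
    n = int(round((y_max - y_min) / DY))
    return sum(arrow(y_min + (i + 0.5) * DY) for i in range(n)) * DY

# Paths out to y_core are at most about half a wavelength longer than the straight line.
y_core = math.sqrt(WAVELENGTH * L) / 2.0
central = sum_arrows(-y_core, y_core)
far = sum_arrows(-300.0, -100.0) + sum_arrows(100.0, 300.0)

print(f"paths with |y| < {y_core:.1f}: final arrow length {abs(central):.2f}")
print(f"paths with 100 < |y| < 300: final arrow length {abs(far):.2f}")
```

The far band contains more than six times as many paths as the central band, yet its arrows mostly curl up and cancel.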

Of course, I am just an autodidact with limited time, and I know hundreds (if not thousands) of the best scientists have thought long and hard about this question and, hence, I readily accept the answer is quite simply: no. There is no better theory. I accept that answer, ungrudgingly, not only because I think I am not as smart as those scientists but also because, as I pointed out above, one can’t really explain any path that deviates from a straight line, as there is no medium, so there are no ‘wheels and gears’. The only path that makes sense is the straight line, and that’s only because…

Well… Thinking about it… We think the straight path makes sense because we have no good theory for any of the other paths. Hmm… So, from a logical point of view, assuming that the straight line is the only reasonable path is actually pretty random too. When push comes to shove, we have no good theory for the straight line either!

You’ll say I’ve just gone crazy. […] Well… Perhaps you’re right. 🙂 But… Somehow, it starts to make sense to me. We allow for everything and then weed out the crazy paths using our interference theory, and so we end up with what we’re ending up with: some kind of vague idea of “light not really traveling in a straight line but ‘smelling’ all of the neighboring paths around it and, hence, using a small core of nearby space”, as Feynman puts it.

Hmm… It brings me back to Richard Feynman’s introduction to his wonderful little book, in which he says we should just be happy to know how Nature works and not aspire to know why it works that way. In fact, he’s basically saying that, when it comes to quantum mechanics, the ‘how’ and the ‘why’ are one and the same, so asking ‘why’ doesn’t make sense, because we know ‘how’. He compares quantum theory with the system of calculation used by the Maya priests, which was based on a system of bars and dots, which helped them to do complex multiplications and divisions, for example. He writes the following about it: “The rules were tricky, but they were a much more efficient way of getting an answer to complicated questions (such as when Venus would rise again) than by counting beans.”

When I first read this, I thought the comparison was flawed: if a common Maya Indian did not want to use the ‘tricky’ rules of multiplication and what have you (or, more likely, if he didn’t understand them), he or she could still resort to counting beans. But how do we count beans in quantum mechanics? We have no ‘simpler’ rules than those weird rules about adding amplitudes and taking the (absolute) square of complex numbers so… Well… We actually are counting beans here then:

  1. We allow for any possibility—any path: straight, curved or crooked. Anything is possible.
  2. But all those possibilities are inter-connected. Also note that every path has a mirror image: for every route ‘south’, there is a similar route ‘north’, so to say, except for the straight line, which is a mirror image of itself.
  3. And then we have some clock ticking. Time goes by. It ensures that the paths that are too far removed from the straight line cancel each other out. [Of course, you’ll ask: what is too far? But I answered that question – convincingly, I hope – in my previous post: it’s not about the ‘number of arrows’ (as suggested in the caption under that Figure 34 above), but about the frequency and, hence, the ‘wavelength’ of our photon.]
  4. And so… Finally, what’s left is a limited number of possibilities that interfere with each other, which results in what we ‘see’: light seems to use a small core of space indeed–a limited number of nearby paths.
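How wide is that ‘small core’? A back-of-envelope sketch, under assumptions of mine: the detour goes via a point at height y halfway between the screen and the eye, the small-angle approximation gives an extra path length of about 2y²/L, and the core is taken to end where that extra length reaches half a wavelength (beyond that, neighbouring arrows start to cancel). The 0.5 m screen-to-eye distance is illustrative.

```python
import math

def core_half_width(wavelength_m, distance_m):
    """Height y at which a detour via the midpoint is half a wavelength longer.

    Uses the small-angle approximation: extra length ~ 2*y**2/distance.
    """
    return math.sqrt(wavelength_m * distance_m / 4.0)

DISTANCE = 0.5   # screen-to-eye distance in metres (illustrative)
for wavelength in (400e-9, 500e-9, 700e-9):
    y = core_half_width(wavelength, DISTANCE)
    print(f"wavelength {wavelength * 1e9:.0f} nm: core half-width ~ {y * 1e3:.2f} mm")
```

For visible light this gives a core of a few tenths of a millimetre, growing with the square root of the wavelength: it is the wavelength, not the number of arrows, that sets how much ‘nearby space’ matters.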

You’ll say… Well… That still doesn’t ‘explain’ why the interference pattern disappears with those special detectors or – what amounts to the same – why the special detectors at the slits never click simultaneously.

You’re right. How do we make sense of that? I don’t know. You should try to imagine what happens for yourself. Everyone has his or her own way of ‘conceptualizing’ stuff, I’d say, and you may well be content and just accept all of the above without trying to ‘imagine’ what’s happening really when a ‘photon’ goes through one or both of those slits. In fact, that’s the most sensible thing to do. You should not try to imagine what happens and just follow the crazy calculus rules.

However, when I think about it, I do have some image in my head. The image is of one of those ‘touch-me-not’ weeds. I quickly googled one of these images, but I couldn’t quite find what I was looking for: it would be more like something that, when you touch it, curls up in a little ball. In any case… You know what I mean, I hope.


You’ll shake your head now and solemnly confirm that I’ve gone mad. Touch-me-not weeds? What’s that got to do with photons? 

Well… It’s obvious you and I cannot really imagine what a photon looks like. But I think of it as a blob of energy indeed, which is inseparable, and which effectively occupies some space (in three dimensions that is). I also think that, whatever it is, it actually does travel through both slits, because, as it interferes with itself, the interference pattern does depend on the space between the two slits as well as the width of those slits. In short, the whole ‘geometry’ of the situation matters, and so the ‘interaction’ is some kind of ‘spatial’ thing. [Sorry for my awfully imprecise language here.]

Having said that, I think it’s being detected by one detector only because only one of them can sort of ‘hook’ it, somehow. Indeed, because it’s interconnected and inseparable, it’s the whole blob that gets hooked, not just one part of it. [You may or may not imagine that the detector that’s got the best hold of it gets it, but I think that’s pushing the description too much.] In any case, the point is that a photon is surely not like a lizard dropping its tail while trying to escape. Perhaps it’s some kind of unbreakable ‘string’ indeed – and sorry for summarizing string theory so unscientifically here – but then a string oscillating in dimensions we can’t imagine (or in some dimension we can’t observe, as the Kaluza-Klein theory suggests). It’s something, for sure, and something that stores energy in some kind of oscillation, I think.

What it is, exactly, we can’t imagine, and we’ll probably never find out—unless we accept that the how of quantum mechanics is not only the why, but also the what. 🙂

Does this make sense? Probably not but, if anything, I hope it fired your imagination at least. 🙂

The Strange Theory of Light and Matter (I)

I am of the opinion that Richard Feynman’s wonderful little common-sense introduction to the ‘uncommon-sensy’ theory of quantum electrodynamics (The Strange Theory of Light and Matter), which was published only a few years before his death, should be mandatory reading for high school students.

I actually mean that: it should just be part of the general education of the first 21st century generation. Either that or, else, the Education Board should include a full-fledged introduction to complex analysis and quantum physics in the curriculum. 🙂

Having praised it (just now, as well as in previous posts), I re-read it recently during a trek in Nepal with my kids – I just grabbed the smallest book I could find the morning we left 🙂 – and, frankly, I now think Ralph Leighton, who transcribed and edited these four short lectures, could have cross-referenced it better. Moreover, there are two or three points where Feynman (or Leighton?) may have sacrificed accuracy for readability. Let me recapitulate the key points and try to improve here and there.

Amplitudes and arrows

The booklet avoids scary mathematical terms and formulas but doesn’t avoid the fundamental concepts behind them, and it doesn’t avoid the kind of ‘deep’ analysis one needs to get some kind of ‘feel’ for quantum mechanics either. So what are the simplifications?

A probability amplitude (i.e. a complex number) is, quite simply, an arrow, with a direction and a length. Thus Feynman writes: “Arrows representing probabilities from 0% to 16% [as measured by the surface of the square which has the arrow as its side] have lengths from 0 to 0.4.” That makes sense: such a geometrical approach does away, for example, with the need to talk about the absolute square (i.e. the square of the absolute value, or the squared norm) of a complex number – which is what we need to calculate probabilities from probability amplitudes. So, yes, it’s a wonderful metaphor. We have arrows and surfaces now, instead of wave functions and absolute squares of complex numbers.
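In code, Feynman’s ‘square on the arrow’ is simply the squared length of a complex number, which is the same as the sum of the squares of its two components. A small sketch, with an illustrative arrow:

```python
import math

def probability(arrow):
    """Feynman's 'area of the square on the arrow': the squared length of a complex number."""
    return abs(arrow) ** 2

z = 0.24 + 0.32j   # an illustrative arrow with components 0.24 and 0.32
print(f"length {abs(z):.2f}, probability {probability(z):.2%}")
print(f"same thing via the components: {z.real ** 2 + z.imag ** 2:.4f}")

# Going the other way: the arrow length that corresponds to a given probability.
for p in (0.0, 0.04, 0.16):
    print(f"probability {p:.0%} -> arrow length {math.sqrt(p):.1f}")
```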

The way he combines these arrows makes sense too. He even notes the difference between photons (bosons) and electrons (fermions): for bosons, we just add arrows; for fermions, we need to subtract them (see my post on amplitudes and statistics in this regard).

There is also the metaphor for the phase of a wave function, which is a stroke of genius really (I mean it): the direction of the ‘arrow’ is determined by a stopwatch hand, which starts turning when a photon leaves the light source, and stops when it arrives, as shown below.

front and back reflection amplitude

OK. Enough praise. What are the drawbacks?

The illustration above accompanies an analysis of how light is either reflected from the front surface of a sheet of glass or, else, from the back surface. Because it takes more time to bounce off the back surface (the path is associated with a greater distance), the front and back reflection arrows point in different directions indeed (the stopwatch is stopped somewhat later when the photon reflects from the back surface). Hence, the difference in phase (but that’s a term that Feynman also avoids) is determined by the thickness of the glass. Just look at it. In the upper part of the illustration above, the thickness is such that the chance of a photon reflecting off the front or back surface is 5%: we add two arrows, each with a length of 0.2, and then we square the resulting (aka final) arrow. Bingo! We get a surface measuring 0.05, or 5%.

Huh? Yes. Just look at it: if the two arrows, placed head to tail, formed a right angle, the square of the final arrow would be 0.08 or 8%, but the angle between them in the drawing is a bit less than 90°. In the lower part of the illustration, the thickness of the glass is such that the two arrows ‘line up’ and, hence, they form an arrow that’s twice the length of either arrow alone (0.2 + 0.2 = 0.4), with a square four times as large (0.16 = 16%). So… It all works like a charm, as Feynman puts it.
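The geometry can be checked with the law of cosines: two arrows of length 0.2 whose directions differ by an angle θ add up to a final arrow whose square is 0.08·(1 + cos θ). The sketch below is my own check, not something from the book. It finds the θ that gives 5%, and also reports the angle as it appears between the two arrows drawn head to tail, which is the one that is a bit less than 90°.

```python
import math

LENGTH = 0.2   # each of the two arrows has length 0.2, i.e. 4% on its own

def combined_probability(angle_deg):
    """Square of the sum of two arrows of length LENGTH whose directions differ by angle_deg."""
    theta = math.radians(angle_deg)
    return 2 * LENGTH ** 2 * (1 + math.cos(theta))   # law of cosines

def angle_for(p):
    """Difference in direction (degrees) that yields a combined probability p."""
    return math.degrees(math.acos(p / (2 * LENGTH ** 2) - 1))

print(f"lined up: {combined_probability(0):.0%}, perpendicular: {combined_probability(90):.0%}")

theta = angle_for(0.05)
print(f"5% needs directions {theta:.0f} degrees apart, "
      f"i.e. {180 - theta:.0f} degrees between the arrows drawn head to tail")
```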


But… Hey! Look at the stopwatch for the front reflection arrows in the upper and lower diagram: they point in the opposite direction of the stopwatch hand! Well… Hmm… You’re right. At this point, Feynman just notes that we need an extra rule: “When we are considering the path of a photon bouncing off the front surface of the glass, we reverse the direction of the arrow.”

He doesn’t say why. He just adds this random rule to the other rules – which most readers of this book will already know. But why this new rule? Frankly, this inconsistency – or lack of clarity – would keep me up at night. This is Feynman: there must be a reason. Why?

Initially, I suspected it had something to do with the two types of ‘statistics’ in quantum mechanics (i.e. those different rules for combining amplitudes of bosons and fermions respectively, which I mentioned above). But… No. Photons are bosons anyway, so we surely need to add, not subtract. So what is it?

[…] Feynman explains it later, much later – in the third of the four chapters of this little book, to be precise. It’s, quite simply, the result of the simplified model he uses in that first chapter. The photon can do anything really, and so there are many more arrows than just two. We actually should look at an infinite number of arrows, representing all possible paths in spacetime, and, hence, the two arrows (i.e. the one for the reflection from the front and back surface respectively) are combinations of many other arrows themselves. So how does that work?

An analysis of partial reflection (I)

The analysis in Chapter 3 of the same phenomenon (i.e. partial reflection by glass) is a simplified analysis too, but it’s much better – because there are no ‘random’ rules here. It is what Leighton promises to the reader in his introduction: “A complete description, accurate in every detail, of a framework onto which more advanced concepts can be attached without modification. Nothing has to be ‘unlearned’ later.”

Well… Accurate in every detail? Perhaps not. But it’s good, and I still warmly recommend a reading of this delightful little book to anyone who’d ask me what to read as a non-mathematical introduction to quantum mechanics. I’ll limit myself here to just some annotations.

The first drawing (a) depicts the situation:

  1. A photon from a light source is being reflected by the glass. Note that it may also go straight through, but that’s a possibility we’ll analyze separately. We first assume that the photon is effectively being reflected by the glass, and so we want to calculate the probability of that event using all these ‘arrows’, i.e. the underlying probability amplitudes.
  2. As for the geometry of the situation: while the light source and the detector seem to be positioned at some angle from the normal, that is not the case: the photon travels straight down (and up again when reflected). It’s just a limitation of the drawing. It doesn’t really matter much for the analysis: we could look at a light beam coming in at some angle, but we’re not doing that here. It’s the simplest situation possible, in terms of experimental set-up that is. I just want to be clear on that.

[Illustration: partial reflection]

Now, rather than looking at the front and back surface only (as Feynman does in Chapter 1), the glass sheet is now divided into a number of very thin sections: five, in this case, so we have six points from which the photon can be scattered into the detector at A: X1 to X6. So that makes six possible paths. That’s quite a simplification but it’s easy to see it doesn’t matter: adding more sections would result in many more arrows, but these arrows would also be much smaller, and so the final arrow would be the same.
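To see why the number of sections doesn’t matter, here’s a minimal Python sketch of a toy model. The assumptions are mine, not Feynman’s: each thin section contributes a small arrow whose length is proportional to its thickness, and whose angle is set by the round-trip distance to that section. The names `final_arrow` and `limit_arrow` and the values for thickness and strength are purely illustrative.

```python
import cmath
import math


def final_arrow(n_sections, thickness=0.3, wavelength=1.0, strength=1.0):
    """Add one small arrow per thin section of glass (toy model).

    Assumptions, not from the book: each section contributes an arrow whose
    length is proportional to its thickness, and whose angle is set by the
    round-trip distance 2*z to that section. Lengths are in wavelengths.
    """
    k = 2 * math.pi / wavelength      # wave number: radians per unit of length
    dz = thickness / n_sections       # thickness of one section
    total = 0j
    for j in range(n_sections):
        z = (j + 0.5) * dz            # depth of the middle of section j
        total += strength * dz * cmath.exp(2j * k * z)
    return total


def limit_arrow(thickness=0.3, wavelength=1.0, strength=1.0):
    """The same sum for infinitely many, infinitely thin sections (an integral)."""
    k = 2 * math.pi / wavelength
    return strength * (cmath.exp(2j * k * thickness) - 1) / (2j * k)


for n in (5, 20, 1000):
    arrow = final_arrow(n)
    print(n, round(abs(arrow), 4), round(math.degrees(cmath.phase(arrow)), 1))
print("limit", round(abs(limit_arrow()), 4))
```

In this toy model, five sections already give a final arrow within a few percent of the limit, and its angle is the same whatever the number of sections: more sections only means more, and smaller, arrows.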

The more significant simplification is that the paths are all straight paths, and that the photon is assumed to travel at the speed of light, always. If you haven’t read the booklet, you’ll say that’s obvious, but it’s not: a photon has an amplitude to go faster or slower than c but, as Feynman points out, these amplitudes cancel out over longer distances. Likewise, a photon can follow any path in space really, including terribly crooked paths, but these paths also cancel out. As Feynman puts it: “Only the paths near the straight-line path have arrows pointing in nearly the same direction, because their timings are nearly the same, and only these arrows are important, because it is from them that we accumulate a large final arrow.” That makes perfect sense, so there’s no problem with the analysis here either.
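The cancellation of the crooked paths can be illustrated with another toy sketch. The geometry is an assumption of mine: the photon goes from a source S to a detector D and bends once, halfway, at a height y, and the arrow’s angle is simply the wave number times the path length. We compare the average arrow for paths close to the straight line with the average arrow for an equally wide band of strongly bent paths.

```python
import cmath
import math


def mean_arrow(y_start, y_stop, a=10.0, wavelength=1.0, samples=2000):
    """Average arrow length for paths S -> (a, y) -> D with y in [y_start, y_stop].

    Toy geometry (an assumption): source S at (0, 0), detector D at (2a, 0),
    and every path makes one bend halfway. The arrow's angle is k times
    the path length. A result of 1.0 would mean all arrows are aligned.
    """
    k = 2 * math.pi / wavelength
    total = 0j
    for i in range(samples):
        y = y_start + (y_stop - y_start) * (i + 0.5) / samples
        length = 2 * math.hypot(a, y)   # length of the bent path
        total += cmath.exp(1j * k * length)
    return abs(total) / samples


near = mean_arrow(-1.0, 1.0)   # paths close to the straight line
far = mean_arrow(5.0, 7.0)     # an equally wide band of strongly bent paths
print(round(near, 3), round(far, 3))
```

The first number comes out close to 1 (the arrows nearly line up), while the second is small (the arrows curl around and cancel), which is the gist of Feynman’s remark about the paths near the straight line.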

So let’s have a look at those six arrows in illustration (b). They point in a slightly different direction because the paths are slightly different and, hence, the distances (and, therefore, the timings) are different too. Now, Feynman (but I think it’s Leighton really) loses himself here in a digression on monochromatic light sources. A photon is a photon: it will have some wave function with a phase that varies in time and in space and, hence, illustration (b) makes perfect sense. [I won’t quote what he writes on a ‘monochromatic light source’ because it’s quite confusing and, IMHO, not correct.]

The stopwatch metaphor has only one minor shortcoming: the hand of a stopwatch rotates clockwise (obviously!), while the phase of an actual wave function goes counterclockwise with time. That’s just convention, and I’ll come back to it when I discuss the mathematical representation of the so-called wave function, which gives you these amplitudes. However, it doesn’t change the analysis, because it’s the difference in the phase that matters when combining amplitudes, so the clock can turn in either way indeed, as long as we’re agreed on it.

At this point, I can’t resist: I’ll just throw the math in. If you don’t like it, you can just skip the section that follows.

Feynman’s arrows and the wave function

The mathematical representation of Feynman’s ‘arrows’ is the wave function:

f = f(x–ct)

Is that the wave function? Yes. It is: it’s a function whose argument is x – ct, with x the position in space, and t the time variable. As for c, that’s the speed of light. We throw it in to make the units in which we measure time and position compatible. 

Really? Yes: f is just a regular wave function. To make it look somewhat more impressive, I could use the Greek symbol Φ (phi) or Ψ (psi) for it, but it’s just what it is: a function whose value depends on position and time indeed, so we write f = f(x–ct). Let me explain the minus sign and the c in the argument.

Time and space are interchangeable in the argument, provided we measure time in the ‘right’ units, and so that’s why we multiply the time in seconds by c, so the new unit of time becomes the time that light needs to travel a distance of one meter. That also explains the minus sign in front of ct: if we add one distance unit (i.e. one meter) to the argument, we have to subtract one time unit from it – the new time unit of course, so that’s the time that light needs to travel one meter – in order to get the same value for f. [If you don’t get that x–ct thing, just think a while about this, or make some drawing of a wave function. Also note that the spacetime diagram in illustration (b) above assumes the same: time is measured in an equivalent unit as distance, so the 45° line from the south-west to the north-east, that bounces back to the north-west, represents a photon traveling at speed c in space indeed: one unit of time corresponds to one meter of travel.]
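A quick numerical check of that argument, as a minimal sketch: the function `f` below is a made-up real-valued wave that depends on x and t only through x – ct. Moving one meter further and waiting the time light needs to travel one meter gives the same value; moving without waiting does not.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def f(x, t, wavelength=1.0):
    """A hypothetical real-valued wave that depends on x and t only through x - c*t."""
    return math.cos(2 * math.pi * (x - C * t) / wavelength)


x, t = 0.1, 0.0
moved_and_waited = f(x + 1.0, t + 1.0 / C)  # one meter further, one 'light-meter' of time later
moved_only = f(x + 0.25, t)                 # a quarter wavelength further, same time
print(f(x, t), moved_and_waited, moved_only)
```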

Now I want to be a bit more aggressive. I said f is a simple function. That’s true and not true at the same time. It’s a simple function, but it gives you probability amplitudes, which are complex numbers – and you may think that complex numbers are, perhaps, not so simple. However, you shouldn’t be put off. Complex numbers are really like Feynman’s ‘arrows’ and, hence, fairly simple things indeed. They have two dimensions, so to say: an a– and a b-coordinate. [I’d say an x– and y-coordinate, because that’s what you usually see, but then I used the x symbol already for the position variable in the argument of the function, so you have to switch to a and b for a while now.]

The a– and b-coordinates are referred to as the real and imaginary part of a complex number respectively. The terms ‘real’ and ‘imaginary’ are confusing because both parts are ‘real’ – well… As real as numbers can be, I’d say. 🙂 They’re just two different directions in space: the real axis is the a-axis in coordinate space, and the imaginary axis is the b-axis. So we could write it as an ordered pair of numbers (a, b). However, we usually write it as a number itself, and we distinguish the b-coordinate from the a-coordinate by writing an i in front: (a, b) = a + ib. So our function f = f(x–ct) is a complex-valued function: it will give you two numbers (an a and a b) instead of just one when you ‘feed’ it with specific values for x and t. So we write:

f = f(x–ct) = (a, b) = a + ib

So what’s the shape of this function? Is it linear or irregular or what? We’re talking a very regular wave function here, so its shape is ‘regular’ indeed. It’s a periodic function, so it repeats itself again and again. The animations below give you some idea of such ‘regular’ wave functions. Animations A and B show a real-valued ‘wave’: a ball on a string that goes up and down, for ever and ever. Animations C to H are – believe it or not – basically the same thing, except that we have two numbers going up and down. That’s all.


The wave functions above are, obviously, confined in space, and so the horizontal axis represents the position in space. What we see, then, is how the real and imaginary parts of these wave functions vary as time goes by. [Think of the blue graph as the real part, and the imaginary part as the pinkish thing – or the other way around. It doesn’t matter.] Now, our wave function – i.e. the one that Feynman uses to calculate all those probabilities – is even more regular than those shown above: its real part is an ordinary cosine function, and its imaginary part is a sine. Let me write this in math:

f = f(x–ct) = a + ib = r(cosφ + isinφ)

It’s really the most regular wave function in the world: the very simple illustration below shows how the two components of f vary as a function in space (i.e. the horizontal axis) while we keep the time fixed, or vice versa: it could also show how the function varies in time at one particular point in space, in which case the horizontal axis would represent the time variable. It is what it is: a sine and a cosine function, with the angle φ as its argument.

[Illustration: cos and sine]

Note that a sine function is the same as a cosine function, but it just lags a bit. To be precise, the phase difference is 90°, or π/2 in radians (the radian (i.e. the length of the arc on the unit circle) is a much more natural unit to express angles, as it’s fully compatible with our distance unit and, hence, most – if not all – of our other units). Indeed, you may or may not remember the following trigonometric identities: sinφ = cos(π/2–φ) = cos(φ–π/2).

In any case, now we have some r and φ here, instead of a and b. You probably wonder where I am going with all of this. Where are the x and t variables? You’re right. Be patient! We’ll get there. I have to explain that r and φ first. Together, they are the so-called polar coordinates of Feynman’s ‘arrow’ (i.e. the amplitude). Polar coordinates are just as good as the Cartesian coordinates we’re used to (i.e. a and b). It’s just a different coordinate system. The illustration below shows how they are related to each other. If you remember anything from your high school trigonometry course, you’ll immediately agree that a is, obviously, equal to r·cosφ, and b is r·sinφ, which is what I wrote above. Just as good? Well… The polar coordinate system has some disadvantages (all of those expressions and rules we learned in vector analysis assume rectangular coordinates, and so we should watch out!) but, for our purpose here, polar coordinates are actually easier to work with, so they’re better.
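In Python, the standard `cmath` module converts between the two coordinate systems: `cmath.polar` gives (r, φ) and `cmath.rect` goes back to a + ib. The arrow 3 + 4i below is just an arbitrary example; the last line also checks the identity sinφ = cos(φ–π/2) mentioned a bit earlier.

```python
import cmath
import math

z = 3 + 4j                  # an 'arrow' with a = 3 and b = 4
r, phi = cmath.polar(z)     # polar coordinates: length r, angle phi in radians
print(r, math.degrees(phi))

a = r * math.cos(phi)       # a = r cos(phi)
b = r * math.sin(phi)       # b = r sin(phi)
print(a, b, cmath.rect(r, phi))  # cmath.rect converts polar back to a + ib

# The sine is the cosine lagging by 90 degrees: sin(phi) = cos(phi - pi/2)
print(math.isclose(math.sin(phi), math.cos(phi - math.pi / 2)))
```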


Feynman’s wave function is extremely simple because his ‘arrows’ have a fixed length, just like the stopwatch hand. They’re just turning around and around and around as time goes by. In other words, r is constant and does not depend on position and time. It’s the angle φ that’s turning and turning and turning as the stopwatch ticks while our photon is covering larger and larger distances. Hence, we need to find a formula for φ that makes it explicit how φ changes as a function in spacetime. That φ variable is referred to as the phase of the wave function. That’s a term you’ll encounter frequently and so I had better mention it. In fact, it’s generally used as a synonym for any angle, as you can see from my remark on the phase difference between a sine and cosine function.

So how do we express φ as a function of x and t? That’s where Euler’s formula comes in. Feynman calls it the most remarkable formula in mathematics – our jewel! And he’s probably right: of all the theorems and formulas, I guess this is the one we can’t do without when studying physics. I’ve written about this in another post, and repeating what I wrote there would eat up too much space, so I won’t do it and just give you that formula. A regular complex-valued wave function can be represented as a complex (natural) exponential function, i.e. an exponential function with Euler’s number e (i.e. 2.718…) as the base, and the complex number iφ as the (variable) exponent. Indeed, according to Euler’s formula, we can write:

$$f = f(x - ct) = a + ib = r(\cos\varphi + i\sin\varphi) = r\,e^{i\varphi}$$

As I haven’t explained Euler’s formula (you should really have a look at my posts on it), you should just believe me when I say that $r\,e^{i\varphi}$ is an ‘arrow’ indeed, with length r and angle φ (phi), as illustrated above, with a and b coordinates a = r·cosφ and b = r·sinφ. What you should be able to do now is to imagine how that φ angle goes round and round as time goes by, just like Feynman’s ‘arrow’ goes round and round – just like a stopwatch hand indeed, but note that our φ angle turns counterclockwise.
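Euler’s formula is easy to check numerically. The sketch below (the length r = 0.2 is an arbitrary choice) compares r·e^(iφ), computed with `cmath.exp`, with r(cosφ + isinφ), a quarter turn at a time. At 90° the arrow points along the positive b-axis, which confirms that φ turns counterclockwise.

```python
import cmath
import math

print(math.e)  # Euler's number, 2.718...

r = 0.2  # an arbitrary, fixed arrow length
for degrees in (0, 90, 180, 270):
    phi = math.radians(degrees)
    euler = r * cmath.exp(1j * phi)                   # r * e^(i phi)
    trig = r * complex(math.cos(phi), math.sin(phi))  # r * (cos phi + i sin phi)
    # At 90 degrees the arrow points along +b: the angle runs counterclockwise.
    print(degrees, euler, abs(euler - trig) < 1e-12)
```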

Fine, you’ll say – but we need a mathematical expression, don’t we? Yes, we do. We need to know how that φ angle (i.e. the variable in our $r\,e^{i\varphi}$ function) changes as a function of x and t indeed. It turns out that the φ in $r\,e^{i\varphi}$ can be substituted as follows:

$$r\,e^{i\varphi} = r\,e^{i(\omega t - kx)} = r\,e^{-ik(x - ct)}$$

Huh? Yes. The phase (φ) of the probability amplitude (i.e. the ‘arrow’) is a simple linear function of x and t indeed: φ = ωt–kx = –k(x–ct). What about all these new symbols, k and ω? The ω and k in this equation are the so-called angular frequency and the wave number of the wave. The angular frequency is just the frequency expressed in radians per second (ω = 2πν, with ν the ordinary frequency), and you should think of the wave number as the frequency in space, expressed in radians per meter (k = 2π/λ, with λ the wavelength). [I could write some more here, but I can’t make it too long, and you can easily look up stuff like this on the Web.] Now, the propagation speed c of the wave is, quite simply, the ratio of these two numbers: c = ω/k. [Again, it’s easy to show how that works, but I won’t do it here.]
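To make ω, k and c = ω/k a bit more concrete, here is a small sketch for one assumed wavelength (500 nanometers, i.e. green light; any other value would do). It also checks that the two ways of writing the phase, ωt–kx and –k(x–ct), give the same number.

```python
import math

C = 299_792_458.0             # speed of light, m/s
wavelength = 500e-9           # an assumed wavelength (green light), in meters

nu = C / wavelength           # frequency in cycles per second (Hz)
omega = 2 * math.pi * nu      # angular frequency, radians per second
k = 2 * math.pi / wavelength  # wave number, radians per meter

print(omega / k)              # the ratio omega/k gives back the propagation speed c


def phase(x, t):
    """Phase phi = omega*t - k*x of the arrow at position x and time t."""
    return omega * t - k * x


x, t = 1e-6, 2e-15
print(phase(x, t), -k * (x - C * t))  # both forms of the phase agree
```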

Now you know it all, and so it’s time to get back to the lesson.

An analysis of partial reflection (II)

Why did I digress? Well… I think that what I write above makes much more sense than Leighton’s rather convoluted description of a monochromatic light source as he tries to explain those arrows in diagram (b) above. Whatever it is, a monochromatic light source is surely not “a device that has been carefully arranged so that the amplitude for a photon to be emitted at a certain time can be easily calculated.” That’s plain nonsense. Monochromatic light is light of a specific color, so all photons have the same frequency (or, to be precise, their wave functions all have the same well-defined frequency), but these photons are not in phase. Photons are emitted by atoms, as an electron moves from one energy level to another. Now, when a photon is emitted, what actually happens is that the atom radiates a train of waves only for about 10⁻⁸ sec, so that’s about 10 billionths of a second. After 10⁻⁸ sec, some other atom takes over, and then another atom, and so on. Each atom emits one photon, whose energy is the difference between the two energy levels that the electron is jumping between. So the phase of the light that is being emitted can really only stay the same for about 10⁻⁸ sec. Full stop.
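The 10⁻⁸ sec figure also tells us how long such a wave train is. A back-of-the-envelope sketch, assuming a typical visible-light frequency of 6×10¹⁴ Hz: the train stretches over about three meters and contains a few million oscillations.

```python
C = 299_792_458.0     # speed of light, m/s
emission_time = 1e-8  # duration of one atom's wave train, from the text, in seconds
frequency = 6e14      # assumed frequency of visible light, in Hz

train_length = C * emission_time     # distance covered by the wave train, in meters
cycles = frequency * emission_time   # number of oscillations in one wave train
print(round(train_length, 2), "m,", round(cycles), "cycles")
```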

Now, what I write above on how atoms actually emit photons is a paraphrase of Feynman’s own words in his much more serious series of Lectures on Mechanics, Radiation and Heat. Therefore, I am pretty sure it’s Leighton who gets somewhat lost when trying to explain what’s happening. It’s not photons that interfere. It’s the probability amplitudes associated with the various paths that a photon can take. To be fully precise, we’re talking about the photon here, i.e. the one that ends up in the detector, and so what’s going on is that the photon is interfering with itself. Indeed, that’s exactly what the ‘craziness’ of quantum mechanics is all about: we send electrons, one by one, through two slits, and we observe an interference pattern. Likewise, we have one photon here that can go various ways, and it’s those amplitudes that interfere, so… Yes: the photon interferes with itself.

OK. Let’s get back to the lesson and look at diagram (c) now, in which the six arrows are added. As mentioned above, it would not make any difference if we divided the glass into 10 or 20 or 1000 or a zillion ‘very thin’ sections: there would be many more arrows, but they would be much smaller ones, and they would cover the same circular segment: its two endpoints would define the same arc, and the same chord on the circle that we can draw when extending that circular segment. Indeed, the six little arrows define a circle, and that’s the key to understanding what happens in the first chapter of Feynman’s QED, where he adds two arrows only, but with a reversal of the direction of the ‘front reflection’ arrow. Here there’s no confusion – Feynman (or Leighton) eloquently describes what is going on:

“There is a mathematical trick we can use to get the same answer [i.e. the same final arrow]: Connecting the arrows in order from 1 to 6, we get something like an arc, or part of a circle. The final arrow forms the chord of this arc. If we draw arrows from the center of the ‘circle’ to the tail of arrow 1 and to the head of arrow 6, we get two radii. If the radius arrow from the center to arrow 1 is turned 180° (“subtracted”), then it can be combined with the other radius arrow to give us the same final arrow! That’s what I was doing in the first lecture: these two radii are the two arrows I said represented the ‘front surface’ and ‘back surface’ reflections. They each have the famous length of 0.2.”
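The ‘mathematical trick’ in that quote is plain vector arithmetic, and a short sketch can verify it. We take six arrows of equal length, each turned 30° further than the previous one (both numbers are arbitrary), connect them head to tail, find the circle through the points, and check that the final arrow equals the radius to the head of arrow 6 plus the radius to the tail of arrow 1 turned by 180°. The helper `circumcenter` is my own, based on the standard formula for the circle through three points.

```python
import cmath
import math


def circumcenter(p, q, s):
    """Center of the circle through three points, given as complex numbers."""
    d = 2 * (p.real * (q.imag - s.imag) + q.real * (s.imag - p.imag) + s.real * (p.imag - q.imag))
    ux = (abs(p) ** 2 * (q.imag - s.imag) + abs(q) ** 2 * (s.imag - p.imag)
          + abs(s) ** 2 * (p.imag - q.imag)) / d
    uy = (abs(p) ** 2 * (s.real - q.real) + abs(q) ** 2 * (p.real - s.real)
          + abs(s) ** 2 * (q.real - p.real)) / d
    return complex(ux, uy)


length, step = 0.05, math.radians(30)  # arbitrary arrow length and turning step
arrows = [length * cmath.exp(1j * (math.pi / 2 + j * step)) for j in range(6)]

points = [0j]                          # connect the arrows head to tail
for arrow in arrows:
    points.append(points[-1] + arrow)

final = points[-1] - points[0]         # the final arrow: the chord of the arc
center = circumcenter(points[0], points[3], points[6])
radii = [abs(p - center) for p in points]  # all equal if the points lie on one circle

r_first = points[0] - center           # radius to the tail of arrow 1
r_last = points[6] - center            # radius to the head of arrow 6
combined = r_last + (-r_first)         # turning an arrow by 180 degrees = multiplying it by -1
print(abs(final), abs(combined - final), max(radii) - min(radii))
```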

That’s what’s shown in part (d) of the illustration above and, in case you’re still wondering what’s going on, the illustration below should help you to make your own drawings now.

[Illustration: circular segment]

So… That explains the phenomenon Feynman wanted to explain, which is a phenomenon that cannot be explained in classical physics. Let me copy the original here:


Partial reflection by glass—a phenomenon that cannot be explained in classical physics? Really?

You’re right to raise an objection: partial reflection by glass can, in fact, be explained by the classical theory of light as an electromagnetic wave. The assumption then is that light is effectively being reflected by both the front and back surface and the reflected waves combine or cancel out (depending on the thickness of the glass and the angle of reflection indeed) to match the observed pattern. In fact, that’s how the phenomenon was explained for hundreds of years! The point to note is that the wave theory of light collapsed as technology advanced, and experiments could be made with very weak light hitting photomultipliers. As Feynman writes: “As the light got dimmer and dimmer, the photomultipliers kept making full-sized clicks—there were just fewer of them. Light behaved as particles!”

The point is that a photon behaves like an electron when going through two slits: it interferes with itself! As Feynman notes, we do not have any ‘common-sense’ theory to explain what’s going on here. We only have quantum mechanics, and quantum mechanics is an “uncommon-sensy” theory: a “strange” or even “absurd” theory, that looks “cockeyed” and incorporates “crazy ideas”. But… It works.

Now that we’re here, I might just as well add a few more paragraphs to fully summarize this lovely publication – if only because summarizing stuff like this helps me to come to terms with understanding things better myself!

Calculating amplitudes: the basic actions

So it all boils down to calculating amplitudes: an event is divided into alternative ways of how the event can happen, and the arrows for each way are ‘added’. Now, every way an event can happen can be further subdivided into successive steps. The amplitudes for these steps are then ‘multiplied’. For example, the amplitude for a photon to go from A to C via B is the ‘product’ of the amplitude to go from A to B and the amplitude to go from B to C.
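In terms of complex numbers, the two rules translate directly into code. A minimal sketch with made-up amplitudes (none of these numbers come from the book): the steps of one way are multiplied, the alternative ways are added, and the probability is the square of the length of the final arrow.

```python
import cmath
import math


def arrow(length, angle_deg):
    """An 'arrow' (complex amplitude) with a given length and angle in degrees."""
    return length * cmath.exp(1j * math.radians(angle_deg))


# One way, in two successive steps A -> B -> C (hypothetical amplitudes): multiply.
a_to_b = arrow(0.8, 30)
b_to_c = arrow(0.5, 45)
way_1 = a_to_b * b_to_c

# A second, alternative way for the same event (also hypothetical): add.
way_2 = arrow(0.3, 200)
total = way_1 + way_2

probability = abs(total) ** 2            # square of the length of the final arrow
print(round(abs(way_1), 3), round(math.degrees(cmath.phase(way_1)), 1), round(probability, 4))
```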

I marked the terms ‘multiplied’ and ‘product’ with quotation marks, as if to say it’s not a ‘real’ product. But it is an actual multiplication: it’s the product of two complex numbers. Feynman does not explicitly compare this product to other products, such as the dot (•) or cross (×) product of two vectors, but he uses the ∗ symbol for multiplication here, which clearly distinguishes V∗W from V•W or V×W or, more simply, from the product of two ordinary numbers. [Ordinary numbers? Well… With ‘ordinary’ numbers, I mean real numbers, of course, but once you get used to complex numbers, you won’t like that term anymore, because complex numbers start feeling just as ‘real’ as other numbers – especially when you get used to the idea of those complex-valued wave functions underneath reality.]

Now, multiplying complex numbers, or ‘arrows’ using QED’s simpler language, consists of adding their angles and multiplying their lengths. Since the arrows here all have a length no larger than one (because their square cannot be larger than one, as that square is a probability, i.e. a (real) number between 0 and 1), Feynman describes successive multiplication as successive ‘shrinks and turns’ of the unit arrow. That all makes sense – very much sense.
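The ‘shrink and turn’ rule can be checked the same way. Starting from a unit arrow, the sketch below applies three hypothetical shrink-and-turn factors and confirms that the lengths multiply while the angles add (modulo 360°). With lengths no larger than one, the arrow can only shrink.

```python
import cmath
import math

# Hypothetical shrink-and-turn factors: (length, turn in degrees).
factors = [(0.9, 40), (0.5, 100), (0.7, 250)]

arrow = 1 + 0j                   # the unit arrow: length 1, angle 0
expected_length, expected_angle = 1.0, 0.0
for shrink, turn in factors:
    arrow *= shrink * cmath.exp(1j * math.radians(turn))  # one multiplication: shrink and turn
    expected_length *= shrink    # lengths multiply...
    expected_angle += turn       # ...and angles add
    print(round(abs(arrow), 4), round(math.degrees(cmath.phase(arrow)) % 360, 1))
```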

But what’s the basic action? As Feynman puts the question: “How far can we push this process of splitting events into simpler and simpler subevents? What are the smallest possible bits and pieces? Is there a limit?” He immediately answers his own question. There are three ‘basic actions’:

  1. A photon goes from one point (in spacetime) to another: this amplitude is denoted by P(A to B).
  2. An electron goes from one point to another: E(A to B).
  3. An electron emits and/or absorbs a photon: this is referred to as a ‘junction’ or a ‘coupling’, and the amplitude for this is denoted by the symbol j, i.e. the so-called junction number.

How do we find the amplitudes for these?

The amplitudes for (1) and (2) are given by so-called propagator functions, which give you the probability amplitude for a particle to travel from one place to another in a given time indeed, or to travel with a certain energy and momentum. Judging from the Wikipedia article on these functions, the subject-matter is horrendously complicated, and the formulas are too, even if Feynman says it’s ‘very simple’ – for a photon, that is. The key point to note is that any path is possible. Moreover, there are also amplitudes for photons to go faster or slower than the speed of light (c)! However, these amplitudes make smaller contributions, and cancel out over longer distances. The same goes for the crooked paths: the amplitudes cancel each other out as well.

What remains are the ‘nearby paths’. In my previous post (check the section on electromagnetic radiation), I noted that, according to classical wave theory, a light wave does not occupy any physical space: we have electric and magnetic field vectors that oscillate in a direction that’s perpendicular to the direction of propagation, but these do not take up any space. In quantum mechanics, the situation is quite different. As Feynman puts it: “When you try to squeeze light too much [by forcing it to go through a small hole, for example, as illustrated below], it refuses to cooperate and begins to spread out.” He explains this in the text below the second drawing: “There are not enough arrows representing the paths to Q to cancel each other out.”

[Illustrations: many arrows, few arrows]

Not enough arrows? We can subdivide space in as many paths as we want, can’t we? Do probability amplitudes take up space? And now that we’re asking the tougher questions, what’s a ‘small’ hole? What’s ‘small’ and what’s ‘large’ in this funny business?

Unfortunately, there’s not much of an attempt in the booklet to answer these questions. One can begin to formulate some kind of answer by thinking some more about these wave functions. To be precise, we need to start looking at their wavelength. The frequency of a typical photon (and, hence, of the wave function representing that photon) is astronomically high. For visible light, it’s in the range of 430 to 790 terahertz, i.e. 430–790×10¹² Hz. We can’t imagine such incredible numbers. Because the frequency is so high, the wavelength is unimaginably small. There’s a very simple and straightforward relation between wavelength (λ) and frequency (ν) indeed: c = λν. In words: the speed of a wave is the wavelength (i.e. the distance (in space) of one cycle) times the frequency (i.e. the number of cycles per second). So visible light has a wavelength in the range of about 380 to 700 nanometers, i.e. 380–700 billionths of a meter. A meter is a rather large unit, you’ll say, so let me express it differently: it’s less than one micrometer, and a micrometer itself is one thousandth of a millimeter. So, no, we can’t imagine that distance either.
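The relation c = λν makes the conversion a one-liner. This sketch converts the two ends of the visible range quoted above from terahertz to nanometers; the function name `wavelength_nm` is just an illustrative choice.

```python
C = 299_792_458.0  # speed of light, m/s


def wavelength_nm(frequency_thz):
    """Wavelength in nanometers for a frequency in terahertz, using c = lambda * nu."""
    frequency_hz = frequency_thz * 1e12
    return C / frequency_hz * 1e9


for nu in (430, 790):   # the edges of the visible range quoted above
    print(nu, "THz ->", round(wavelength_nm(nu)), "nm")
```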

That being said, that wavelength is there, and it does imply that some kind of scale is involved. A wavelength covers one full cycle of the oscillation: it means that, if we travel one wavelength in space, our ‘arrow’ will point in the same direction again. Both drawings above (Figures 33 and 34) suggest the space between the two blocks is less than one wavelength. It’s a bit hard to make sense of the direction of the arrows but note the following:

  1. The phase difference between (a) the ‘arrow’ associated with the straight route (i.e. the ‘middle’ path) and (b) the ‘arrow’ associated with the ‘northern’ or ‘southern’ route (i.e. the ‘highest’ and ‘lowest’ path) in Figure 33 is roughly a quarter of a full turn, i.e. 90°. [Note that the arrows for the northern and southern route to P point in the same direction, because they are associated with the same timing. The same is true for the two arrows in-between the northern/southern route and the middle path.]
  2. In Figure 34, the phase difference between the longer routes and the straight route is much smaller, only about 10°.

Now, the calculations involved in these analyses are quite complicated but you can see the explanation makes sense: the gap between the two blocks is much narrower in Figure 34 and, hence, the geometry of the situation does imply that the phase difference between the amplitudes associated with the ‘northern’ and ‘southern’ routes to Q is much smaller than the phase difference between those amplitudes in Figure 33. To be precise,

  1. The phase difference between (a) the ‘arrow’ associated with the ‘northern route’ to Q and (b) the ‘arrow’ associated with the ‘southern’ route to Q (i.e. the ‘highest’ and ‘lowest’ path) in Figure 33 is roughly three quarters of a full turn, i.e. 270°. Hence, the final arrow is very short, which means that the probability of the photon going to Q is very low. [Note that the arrows for the northern and southern route no longer point in the same direction, because they are associated with very different timings: the ‘southern route’ is shorter and, hence, faster.]
  2. In Figure 34, the phase difference between the shortest and longest route is only about 60°, so the final arrow is quite long and the probability of the photon going to Q is, accordingly, quite substantial.
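Why does a larger phase difference give a shorter final arrow? A simplified sketch: add up a handful of unit arrows whose angles are spread evenly over a given range. The even spacing and the number of arrows (seven) are my assumptions; Feynman’s drawings are not that regular. Still, it shows the trend: a 60° spread gives a final arrow almost as long as if all arrows lined up, while a 270° spread leaves very little.

```python
import cmath
import math


def final_arrow_length(spread_deg, n=7):
    """Length of the sum of n unit arrows with angles spread evenly over spread_deg,
    divided by n, so that 1.0 means all arrows point the same way."""
    total = 0j
    for j in range(n):
        angle = math.radians(spread_deg) * j / (n - 1)
        total += cmath.exp(1j * angle)
    return abs(total) / n


for spread in (0, 10, 60, 90, 270):
    print(spread, round(final_arrow_length(spread), 3))
```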

OK… What did I say here about P(A to B)? Nothing much. I basically complained about the way Feynman (or, more probably, Leighton) explained the interference or diffraction phenomenon and tried to do a better job, before tackling the actual subject: how do we get that P(A to B)?

A photon can follow any path from A to B, including the craziest ones (as shown below). Any path? Good players give a billiard ball extra spin that may make the ball move in a curved trajectory, and will also affect its collision with any other ball – but a trajectory like the one below? Why would a photon suddenly take a sharp turn left, or right, or up, or down? What’s the mechanism here? What are the ‘wheels and gears inside’ of the photon that (a) make a photon choose this path in the first place and (b) allow it to whirl, swirl and twirl like that?


We don’t know. In fact, the question may make no sense, because we don’t know what actually happens when a photon travels through space. We know it leaves as a lump of energy, and we know it arrives as a similar lump of energy. When we actually put a detector to check which path is followed – by putting special detectors at the slits in the famous double-slit experiment, for example – the interference pattern disappears. So… Well… We don’t know how to describe what’s going on: a photon is not a billiard ball, and it’s not a classical electromagnetic wave either. It is neither. The only thing that we know is that we get probabilities that match with the results of experiment if we accept these nonsensical assumptions and do all of the crazy arithmetic involved. Let me get back to the lesson.

Photons can also travel faster or slower than the speed of light (c is some 3×10⁸ meters per second but, in our special time unit, it’s equal to one). Does that violate relativity? It doesn’t, apparently, but for the reasoning behind it I must, once again, refer you to more sophisticated writing.

In any case, if the mathematicians and physicists have to take into account both of these assumptions (any path is possible, and speeds higher or lower than c are possible too!), surely they are looking at some kind of horrendous integral, aren’t they?

They are. When everything is said and done, that propagator function is some monstrous integral indeed, and I can’t explain it to you in a couple of words – if only because I am struggling with it myself. 🙂 So I will just believe Feynman when he says that, when the mathematicians and physicists are finished with that integral, we do get some simple formula which depends on the value of the so-called spacetime interval between two ‘points’ – let’s just call them 1 and 2 – in space and time. You’ve surely heard about it before: it’s denoted by s² or I (or whatever) and it’s zero if an object moves at the speed of light, which is what light is supposed to do – but so we’re dealing with a different situation here. 🙂 To be precise, I consists of two parts:

  1. The distance d between the two points (1 and 2), i.e. Δr, which is just the square root of $d^2 = \Delta r^2 = (x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2$. [This formula is just a three-dimensional version of the Pythagorean Theorem.]
  2. The ‘distance’ (or difference) in time, which is usually expressed in those ‘equivalent’ time units that we introduced above already, i.e. the time that light – traveling at the speed of light 🙂 – needs to travel one meter. We will usually see that component of I in a squared version too: $\Delta t^2 = (t_2 - t_1)^2$ or, if time is expressed in the ‘old’ unit (i.e. seconds), $c^2\Delta t^2 = c^2(t_2 - t_1)^2$.

Now, the spacetime interval itself is defined as the excess of the squared distance (in space) over the squared time difference:

$$s^2 = I = \Delta r^2 - \Delta t^2 = (x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2 - (t_2 - t_1)^2$$

We can then define time-like, space-like and light-like intervals, and these, in turn, define the so-called light cone. The spacetime interval can be negative, for example. In that case, Δt² will be greater than Δr², so there is no ‘excess’ of distance over time: it means that the time difference is large enough to allow for a cause–effect relation between the two events, and the interval is said to be time-like. In any case, that’s not the topic of this post, and I am sorry I keep digressing.
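The definition of the interval and the three kinds of intervals are easy to put in code. A minimal sketch, using the sign convention of the formula above (squared distance minus squared time difference) and measuring time in the ‘equivalent’ unit, so c = 1. The function names are illustrative.

```python
def interval(p1, p2):
    """s^2 = I = (distance in space)^2 - (difference in time)^2.

    Events are (t, x, y, z) tuples, with t in the 'equivalent' unit
    (the time light needs to travel one meter), so that c = 1.
    """
    t1, x1, y1, z1 = p1
    t2, x2, y2, z2 = p2
    dr2 = (x2 - x1) ** 2 + (y2 - y1) ** 2 + (z2 - z1) ** 2
    dt2 = (t2 - t1) ** 2
    return dr2 - dt2


def kind(s2, tol=1e-12):
    """Classify an interval using the sign convention above."""
    if abs(s2) <= tol:
        return "light-like"
    return "space-like" if s2 > 0 else "time-like"


origin = (0, 0, 0, 0)
for event in [(5, 3, 4, 0), (10, 3, 4, 0), (1, 3, 4, 0)]:
    s2 = interval(origin, event)
    print(event, s2, kind(s2))
```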

The point to note is that the formula for the propagator favors light-like intervals: they are associated with large arrows. Space- and time-like intervals, on the other hand, will contribute much smaller arrows. In addition, the arrows for space- and time-like intervals point in opposite directions, so they will cancel each other out. So, when everything is said and done, over longer distances, light does tend to travel in a straight line and at the speed of light. At least, that’s what Feynman tells us, and I tend to believe him. 🙂

But so where’s the formula? Feynman doesn’t give it, probably because it would indeed confuse us. Just google ‘propagator for a photon’ and you’ll see what I mean. He does integrate the above conclusions in that illustration (b) though. What illustration? 

Oh… Sorry. You probably forgot what I am trying to do here, but so we’re looking at that analysis of partial reflection of light by glass. Let me insert it once again so you don’t have to scroll all the way up.

[Illustration: partial reflection]

You’ll remember that Feynman divided the glass sheet into five sections and, hence, there are six points from which the photon can be scattered into the detector at A: X1 to X6. So that makes six possible paths: these paths are all straight (so Feynman leaves out all of the crooked paths), and the other assumption is that the photon traveled at the speed of light, whatever path it took (so Feynman also assumes the amplitudes for speeds higher or lower than c cancel each other out). That explains the difference in the time of emission from the light source. The longest path is the path to point X6 and then back up to the detector. If the photon had taken that path, it would have had to be emitted earlier – earlier as compared to the other possibilities, which take less time. So it would have to be emitted at T = T6. The direction of the ‘arrow’ is like one o’clock. The shorter paths are associated with shorter times (the difference between the time of arrival and departure is shorter) and so T5 is associated with an arrow in the 12 o’clock direction, T4 with 11 o’clock, and so on, till T1, which points roughly in the 9 o’clock direction.

But… What? These arrows also include the reflection, i.e. the interaction between the photon and some electron in the glass, don’t they? […] Right you are. Sorry. So… Yes. The event above involves four steps:

  1. A photon is emitted by the source at a time T = T1, T2, T3, T4, T5 or T6: we don’t know. Quantum-mechanical uncertainty. 🙂
  2. It goes from the source to one of the points X = X1, X2, X3, X4, X5 or X6 in the glass: we don’t know which one, because we don’t have a detector there.
  3. The photon interacts with an electron at that point.
  4. It makes its way back up to the detector at A.

Step 1 does not have any amplitude. It’s just the start of the event. Well… We start with the unit arrow pointing north actually, so its length is one and its direction is 12 o’clock. And so we’ll shrink and turn it, i.e. multiply it with other arrows, in the next steps.

Steps 2 and 4 are straightforward and are associated with arrows of the same length. Their direction depends on the distance traveled and/or the time of emission: it amounts to the same because we assume the speed is constant and exactly the same for the six possibilities (that speed is c = 1 obviously). But what length? Well… Some length according to that formula which Feynman didn’t give us. 🙂

So now we need to analyze the third of those three basic actions: a ‘junction’ or ‘coupling’ between an electron and a photon. At this point, Feynman embarks on a delightful story highlighting the difficulties involved in calculating that amplitude. A photon can travel following crooked paths and at devious speeds, but an electron is even worse: it can take what Feynman refers to as ‘one-hop flights’, ‘two-hop flights’, ‘three-hop flights’,… any ‘n-hop flight’ really. Each stop involves an additional amplitude, which is represented by n², with n some number that has been determined from experiment. The formula for E(A to B) then becomes a series of terms: P(A to B) + P(A to C)∗n²∗P(C to B) + P(A to D)∗n²∗P(D to E)∗n²∗P(E to B) + …

P(A to B) is the ‘one-hop flight’ here, while C, D and E are intermediate points, and P(A to C)∗n²∗P(C to B) and P(A to D)∗n²∗P(D to E)∗n²∗P(E to B) are the ‘two-hop’ and ‘three-hop’ flights respectively. Note that this calculation has to be made for all possible intermediate points C, D, E and so on. To make matters worse, the theory assumes that electrons can emit and absorb photons along the way, and then there’s a host of other problems, which Feynman tries to explain in the last chapter of his little book. […]

Hey! Stop it!


You’re talking about E(A to B) here. You’re supposed to be talking about that junction number j.

Oh… Sorry. You’re right. Well… That junction number j is about –0.1. I know that looks like an ordinary number, but it’s an amplitude, so you should interpret it as an arrow. When you multiply it with another arrow, it amounts to a shrink to one-tenth, and half a turn. Feynman also entertains us with the difficulties of calculating this number but, you’re right, I shouldn’t be trying to copy him here, if only because it’s about time I finish this post. 🙂

So let me conclude it indeed. We can apply the same transformation (i.e. we multiply with j) to each of the six arrows we’ve got so far, and the result is those six arrows next to the time axis in illustration (b). And then we combine them to get that arc, and then we apply that mathematical trick to show we get the same result as in a classical wave-theoretical analysis of partial reflection.
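For those who like to see the arrow arithmetic in action, here is a minimal Python sketch that treats each arrow as a complex number. It rests on assumptions of my own, not Feynman’s: all six arrows get the same hypothetical length (`PATH_LEN`, standing in for the length Feynman doesn’t give us), and each arrow is turned by one ‘hour’ (30°) with respect to the next one. Multiplying by j = –0.1 shrinks each arrow to one tenth and turns it half a turn; adding them head to tail gives the final arrow.

```python
import cmath
import math

J = -0.1        # junction number: shrink to one tenth and turn half a turn
PATH_LEN = 1.0  # hypothetical length of each path arrow (Feynman doesn't give it)

def clock_arrow(hour, length=1.0):
    """Arrow (as a complex number) pointing at `hour` on a clock face, 12 = north."""
    # 12 o'clock is +90 degrees; each hour turns the arrow 30 degrees clockwise
    angle = math.radians(90 - 30 * (hour % 12))
    return cmath.rect(length, angle)

# Assumption: six arrows, each turned one hour with respect to the next one
hours = [1, 12, 11, 10, 9, 8]
arrows = [J * clock_arrow(h, PATH_LEN) for h in hours]

final = sum(arrows)  # adding arrows head to tail is just complex addition
print(f"each arrow: {abs(arrows[0]):.3f}, final arrow: {abs(final):.3f}")
```

The arc in illustration (b) is just these arrows laid head to tail. Making the angle between successive arrows smaller makes the final arrow longer, up to 6 × 0.1 when all six arrows point the same way.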

Done. […] Are you happy now?

[…] You shouldn’t be. There are so many questions that have been left unanswered. For starters, Feynman never gives that formula for the length of P(A to B), so we have no clue about the length of these arrows and, hence, about that arc. If physicists know their length, it seems to have been calculated backwards – from those 0.2 arrows used in the classical wave theory of light. Feynman is actually quite honest about that, and simply writes:

“The radius of the arc [i.e. the arc that determines the final arrow] evidently depends on the length of the arrow for each section, which is ultimately determined by the amplitude S that an electron in an atom of glass scatters a photon. This radius can be calculated using the formulas for the three basic actions. […] It must be said, however, that no direct calculation from first principles for a substance as complex as glass has actually been done. In such cases, the radius is determined by experiment. For glass, it has been determined from experiment that the radius is approximately 0.2 (when the light shines directly onto the glass at right angles).”

Well… OK. I think that says enough. So we have a theory (or first principles, at least) but we don’t use them to calculate. That actually sounds a bit like metaphysics to me. 🙂 In any case… Well… Bye for now!

But… Hey! You said you’d analyze how light goes straight through the glass as well?

Yes. I did. But I don’t feel like doing that right now. I think we’ve got enough stuff to think about right now, don’t we? 🙂

Applied vector analysis (II)

We’ve covered a lot of ground in the previous post, but we’re not quite there yet. We need to look at a few more things in order to gain some kind of ‘physical’ understanding of Maxwell’s equations, as opposed to a merely ‘mathematical’ understanding. That will probably disappoint you. In fact, you probably wonder why one needs to know about Gauss’ and Stokes’ Theorems if the only objective is to ‘understand’ Maxwell’s equations.

To some extent, your skepticism is justified. It’s already quite something to get some feel for those two new operators we’ve introduced in the previous post, i.e. the divergence (div) and curl operators, denoted by ∇• and ∇× respectively. By now, you understand that these two operators act on a vector field, such as the electric field vector E, or the magnetic field vector B, or, in the example we used, the heat flow h, so we should write ∇•(a vector) and ∇×(a vector). And, as for that del operator – i.e. ∇ without the dot (•) or the cross (×) – if there’s one diagram you should be able to draw off the top of your head, it’s the one below, which shows:

  1. The heat flow vector h, whose magnitude is the thermal energy that passes, per unit time and per unit area, through an infinitesimally small isothermal surface, so we write: h = |h| = ΔJ/ΔA.
  2. The gradient vector ∇T, whose direction is opposite to that of h, and whose magnitude is proportional to h, so we can write the so-called differential equation of heat flow: h = –κ∇T.
  3. The components of the vector dot product ΔT = ∇T•ΔR = |∇T|·ΔR·cosθ.

[Diagram: the temperature drop ΔT along a small displacement ΔR]

You should also remember that we can re-write that ΔT = ∇T•ΔR = |∇T|·ΔR·cosθ equation (which we can also write as ΔT/ΔR = |∇T|·cosθ) in a more general form:

$$\frac{\Delta\psi}{\Delta R} = |\nabla\psi|\cos\theta$$

That equation says that the component of the gradient vector ∇ψ along a small displacement ΔR is equal to the rate of change of ψ in the direction of ΔR.

And then we had three important theorems, but I can imagine you don’t want to hear about them anymore. So what can we do without them? Let’s have a look at Maxwell’s equations again and explore some linkages.
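Before we do, here is a minimal Python check of that relation, under assumptions of my own: ψ is a made-up scalar field (x² + 3yz), its gradient is worked out by hand, and the rate of change along ΔR is estimated with a central finite difference. The two printed numbers should agree.

```python
import math

def psi(x, y, z):
    """A hypothetical scalar field, chosen only for illustration."""
    return x**2 + 3 * y * z

def grad_psi(x, y, z):
    """Its gradient, worked out by hand: (2x, 3z, 3y)."""
    return (2 * x, 3 * z, 3 * y)

def rate_of_change(p, direction, h=1e-6):
    """Finite-difference estimate of d(psi)/dR along `direction` at point p."""
    norm = math.sqrt(sum(d * d for d in direction))
    u = [d / norm for d in direction]
    ahead = psi(*[pi + h * ui for pi, ui in zip(p, u)])
    behind = psi(*[pi - h * ui for pi, ui in zip(p, u)])
    return (ahead - behind) / (2 * h)

p = (1.0, 2.0, -1.0)
direction = (1.0, 1.0, 0.0)
g = grad_psi(*p)
# |grad psi| * cos(theta) equals grad psi . u, with u the unit vector along Delta R
u = [d / math.sqrt(2.0) for d in direction]
component = sum(gi * ui for gi, ui in zip(g, u))
print(rate_of_change(p, direction), component)
```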

Curl-free and divergence-free fields

From what I wrote in my previous post, you should remember that:

  1. The curl of a vector field (i.e. ∇×C) represents its circulation, i.e. its (infinitesimal) rotation.
  2. Its divergence (i.e. ∇•C) represents the outward flux out of an (infinitesimal) volume around the point we’re considering.
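To make these two definitions a bit more tangible, here is a rough numerical sketch in Python (central finite differences, standard library only). The two sample fields are my own illustrative choices: `swirl`, which just goes round and round the z-axis, and `source`, which just flows outward from the origin. The first one has curl but no divergence, and the second one has divergence but no curl.

```python
def partial(f, p, i, h=1e-5):
    """Central-difference estimate of df/dx_i at point p (f returns a number)."""
    lo, hi = list(p), list(p)
    lo[i] -= h
    hi[i] += h
    return (f(hi) - f(lo)) / (2 * h)

def div(F, p):
    """Divergence: dFx/dx + dFy/dy + dFz/dz."""
    return sum(partial(lambda q, i=i: F(q)[i], p, i) for i in range(3))

def curl(F, p):
    """Curl, component by component."""
    def d(comp, var):  # dF_comp / dx_var
        return partial(lambda q: F(q)[comp], p, var)
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))

def swirl(q):
    """Pure rotation around the z-axis."""
    return (-q[1], q[0], 0.0)

def source(q):
    """Pure outflow from the origin."""
    return (q[0], q[1], q[2])

p = (0.3, -1.2, 0.7)
print("swirl: ", div(swirl, p), curl(swirl, p))
print("source:", div(source, p), curl(source, p))
```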

Back to Maxwell’s equations:

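$$\begin{aligned}
(1)\quad & \nabla\cdot\mathbf{E} = \frac{\rho}{\varepsilon_0}\\
(2)\quad & \nabla\times\mathbf{E} = -\frac{\partial\mathbf{B}}{\partial t}\\
(3)\quad & \nabla\cdot\mathbf{B} = 0\\
(4)\quad & c^2\,\nabla\times\mathbf{B} = \frac{\partial\mathbf{E}}{\partial t} + \frac{\mathbf{j}}{\varepsilon_0}
\end{aligned}$$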

Let’s start at the bottom, i.e. with equation (4). It says that a changing electric field (i.e. ∂E/∂t ≠ 0) and/or a (steady) electric current (j/ε0) will cause some circulation of B, i.e. the magnetic field. It’s important to note that (a) the electric field has to change and/or (b) electric charges (positive or negative) have to move in order to cause some circulation of B: a steady electric field will not result in any magnetic effects.

This brings us to the first and easiest of all the circumstances we can analyze: the static case. In that case, the time derivatives ∂E/∂t and ∂B/∂t are zero, and Maxwell’s equations reduce to:

  1. ∇•E = ρ/ε0. In this equation, we have ρ, which represents the so-called charge density, which describes the distribution of electric charges in space: ρ = ρ(x, y, z). To put it simply: ρ is the ‘amount of charge’ (which we’ll denote by Δq) per unit volume at a given point. Hence, if we consider a small volume (ΔV) located at point (x, y, z) in space (an infinitesimally small volume, in fact, as usual), then we can write: Δq = ρ(x, y, z)ΔV. [As for ε0, you already know this is a constant which ensures all units are ‘compatible’.] This equation basically says we have some flux of E, the exact amount of which is determined by the charge density ρ or, more generally, by the charge distribution in space.
  2. ∇×E = 0. That means that the curl of E is zero: everywhere, and always. So there’s no circulation of E. We call this a curl-free field.
  3. ∇•B = 0. That means that the divergence of B is zero: everywhere, and always. So there’s no flux of B. None. We call this a divergence-free field.
  4. c²∇×B = j/ε0. So here we have steady current(s) causing some circulation of B, the exact amount of which is determined by the (total) current j. [What about that c² factor? Well… We talked about that before: magnetism is, basically, a relativistic effect, and so that’s where that factor comes from. I’ll just refer you to what Feynman writes about this in his Lectures, and warmly recommend reading it, because it’s really quite interesting: it gave me at least a much deeper understanding of what it’s all about, and so I hope it will help you as much.]
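To see what the first two equations mean in numbers, here is a sketch for the simplest charge distribution I could think of: a single point charge at the origin (an illustrative choice, in SI units). Integrating E over a sphere around the charge should give q/ε0, which is equation (1) turned into a flux statement by Gauss’ Theorem. Integrating E around a closed loop should give zero, which is equation (2) turned into a circulation statement by Stokes’ Theorem. Both integrals are crude midpoint sums, so expect agreement to a few decimals, not more.

```python
import math

EPS0 = 8.8541878128e-12        # vacuum permittivity, F/m
K = 1 / (4 * math.pi * EPS0)

def coulomb_E(q, r):
    """Electric field of a point charge q at the origin, evaluated at point r."""
    d = math.sqrt(sum(c * c for c in r))
    return tuple(K * q * c / d**3 for c in r)

def flux_through_sphere(q, R, n=200):
    """Midpoint-rule surface integral of E . n over a sphere of radius R."""
    total = 0.0
    dth, dph = math.pi / n, 2 * math.pi / n
    for i in range(n):
        th = (i + 0.5) * dth
        for k in range(n):
            ph = (k + 0.5) * dph
            nrm = (math.sin(th) * math.cos(ph), math.sin(th) * math.sin(ph), math.cos(th))
            E = coulomb_E(q, tuple(R * c for c in nrm))
            dA = R * R * math.sin(th) * dth * dph
            total += sum(e * c for e, c in zip(E, nrm)) * dA
    return total

def circulation_around_circle(q, center, radius, n=2000):
    """Line integral of E . dl around a circle in the plane z = center[2]."""
    total = 0.0
    dt = 2 * math.pi / n
    for k in range(n):
        t = (k + 0.5) * dt
        r = (center[0] + radius * math.cos(t), center[1] + radius * math.sin(t), center[2])
        dl = (-radius * math.sin(t) * dt, radius * math.cos(t) * dt, 0.0)
        total += sum(e * d for e, d in zip(coulomb_E(q, r), dl))
    return total

q = 1e-9  # a 1 nC test charge (illustrative)
print(flux_through_sphere(q, 0.5), q / EPS0)                # flux of E = q / eps0
print(circulation_around_circle(q, (1.0, 0.0, 0.2), 0.3))   # no circulation: ~0
```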

Now you’ll say: why bother with all these difficult mathematical constructs if we’re going to consider curl-free and divergence-free fields only. Well… B is not curl-free, and E is not divergence-free. To be precise:

  1. E is a field with zero curl and a given divergence, and
  2. B is a field with zero divergence and a given curl.

Yeah, but why can’t we analyze fields that have both curl and divergence? The answer is: we can, and we will, but we have to start somewhere, and so we start with an easier analysis first.

Electrostatics and magnetostatics

The first thing you should note is that, in the static case (i.e. when charges and currents are static), there is no interdependence between E and B. The two fields are not interconnected, so to say. Therefore, we can neatly separate them into two pairs:

  1. Electrostatics: (1) ∇•E = ρ/ε0 and (2) ∇×E = 0.
  2. Magnetostatics: (1) ∇×B = j/(c²ε0) and (2) ∇•B = 0.

Now, I won’t go through all of the particularities involved. In fact, I’ll refer you to a real physics textbook on that (like Feynman’s Lectures indeed). My aim here is to use these equations to introduce some more math and to gain a better understanding of vector calculus – an understanding that goes, in fact, beyond the math (i.e. a ‘physical’ understanding, as Feynman terms it).

At this point, I have to introduce two additional theorems. They are nice and easy to understand (although not so easy to prove, and so I won’t):

Theorem 1: If we have a vector field – let’s denote it by C – and we find that its curl is zero everywhere, then C must be the gradient of something. In other words, there must be some scalar field ψ (psi) such that C is equal to the gradient of ψ. It’s easier to write this down as follows:

If ∇×C = 0, there is a ψ such that C = ∇ψ.

Theorem 2: If we have a vector field – let’s denote it by D, just to introduce yet another letter – and we find that its divergence is zero everywhere, then D must be the curl of some vector field A. So we can write:

If ∇•D = 0, there is an A such that D = ∇×A.

We can apply this to the situation at hand:

  1. For E, there is some scalar potential Φ such that E = –∇Φ. [Note that we could absorb the minus sign into Φ, but we leave it there as a reminder that the situation is similar to that of heat flow. It’s a matter of convention really: E ‘flows’ from higher to lower potential.]
  2. For B, there is a so-called vector potential A such that B = ∇×A.

The whole game is then to compute Φ and A everywhere. We can then take the gradient of Φ, and the curl of A, to find the electric and magnetic field respectively, at every single point in space. In fact, most of Feynman’s second Volume of his Lectures is devoted to that, so I’ll refer you to that if you’re interested. As said, my goal here is just to introduce the basics of vector calculus, so you gain a better understanding of physics, i.e. an understanding which goes beyond the math.
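Here is a minimal sketch of that ‘game’ in Python, for two cases where I know the answer: the potential Φ of a point charge, and a vector potential A = ½B0×r for a uniform field B0 (both are my own illustrative choices, in SI units). The gradient and curl are taken with central finite differences, so this only illustrates E = –∇Φ and B = ∇×A; it is not how one would compute potentials for real charge distributions.

```python
import math

EPS0 = 8.8541878128e-12          # vacuum permittivity, F/m
K = 1 / (4 * math.pi * EPS0)

def grad(f, p, h=1e-6):
    """Central-difference gradient of a scalar function f at point p."""
    g = []
    for i in range(3):
        hi, lo = list(p), list(p)
        hi[i] += h
        lo[i] -= h
        g.append((f(hi) - f(lo)) / (2 * h))
    return g

def curl(F, p, h=1e-6):
    """Central-difference curl of a vector function F at point p."""
    def d(comp, var):
        hi, lo = list(p), list(p)
        hi[var] += h
        lo[var] -= h
        return (F(hi)[comp] - F(lo)[comp]) / (2 * h)
    return [d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1)]

q = 1e-9  # a 1 nC point charge at the origin (illustrative)

def phi(r):
    """Scalar potential of the point charge."""
    return K * q / math.sqrt(sum(c * c for c in r))

B0 = (0.0, 0.0, 2.0)  # a uniform magnetic field along z, in tesla (illustrative)

def A(r):
    """One possible vector potential for B0: A = (1/2) B0 x r."""
    return [0.5 * (B0[1] * r[2] - B0[2] * r[1]),
            0.5 * (B0[2] * r[0] - B0[0] * r[2]),
            0.5 * (B0[0] * r[1] - B0[1] * r[0])]

point = [0.3, 0.4, 0.0]                          # 0.5 m away from the charge
E_from_phi = [-g for g in grad(phi, point)]      # E = -grad(phi)
E_coulomb = [K * q * c / 0.5**3 for c in point]  # Coulomb's law, directly
B_from_A = curl(A, [1.0, -2.0, 0.5])             # B = curl(A)
print(E_from_phi, E_coulomb)
print(B_from_A)
```

Note that A is not unique: adding the gradient of any scalar field to it leaves B unchanged, because the curl of a gradient is always zero.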


We’re almost done. Electrodynamics is, of course, much more complicated than the static case, but I don’t have the intention to go too much in detail here. The important thing is to see the linkages in Maxwell’s equations. I’ve highlighted them below:

[Diagram: Maxwell’s equations, with the linkages between E and B highlighted]

I know this looks messy, but it’s actually not so complicated. The interactions between the electric and magnetic field are governed by equations (2) and (4), so equations (1) and (3) are just ‘statics’. Something needs to trigger it all, of course. I assume it’s an electric current (that’s the arrow marked by [0]).

Indeed, equation (4), i.e. c²∇×B = ∂E/∂t + j/ε0, implies that a changing electric current (an accelerating electric charge, for instance) will cause the circulation of B to change. More specifically, leaving the ∂E/∂t term aside for a moment, we can write: ∂[c²∇×B]/∂t = ∂[j/ε0]/∂t. However, as the circulation of B changes, the magnetic field B itself must be changing. Hence, we have a non-zero time derivative of B (∂B/∂t ≠ 0). But, then, according to equation (2), i.e. ∇×E = –∂B/∂t, we’ll have some circulation of E. That’s the dynamics marked by the red arrows [1].

Now, assuming that ∂B/∂t is not constant (because that electric charge accelerates and decelerates, for example), the time derivative ∂E/∂t will be non-zero too (∂E/∂t ≠ 0). But so that feeds back into equation (4), according to which a changing electric field will cause the circulation of B to change. That’s the dynamics marked by the yellow arrows [2].
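If you want to see that feedback loop at work, the standard numerical trick is the so-called finite-difference time-domain (FDTD) or Yee scheme. It is not what Feynman does, and the sketch below is heavily simplified: one dimension only, an E field along x and a B field along y, normalised units (c = 1, ε0 = 1), and a hypothetical current pulse in one cell playing the role of the trigger [0]. Each time step alternates between equation (2) and equation (4), and the pulse should travel at the speed of light, i.e. one cell per unit of time.

```python
import math

# Normalised units (an assumption for this sketch): c = 1, eps0 = 1, cell size 1
N = 400              # number of grid cells along z
DT = 0.5             # time step; c*DT/dz = 0.5 < 1 keeps the scheme stable
SRC = 100            # cell carrying a hypothetical current pulse j_x(t)
T0, WIDTH = 20.0, 6.0

def current(t):
    """Gaussian current pulse: the 'something' that triggers it all."""
    return math.exp(-((t - T0) / WIDTH) ** 2)

def run(steps):
    Ex = [0.0] * N   # E_x sampled at cells i
    By = [0.0] * N   # B_y sampled half a cell further, at i + 1/2
    for n in range(steps):
        # Equation (2): dBy/dt = -dEx/dz, so the curl of E makes B change
        for i in range(N - 1):
            By[i] -= DT * (Ex[i + 1] - Ex[i])
        # Equation (4): dEx/dt = -c^2 dBy/dz - jx/eps0, so the curl of B
        # (and the current) makes E change
        for i in range(1, N - 1):
            Ex[i] -= DT * (By[i] - By[i - 1])
        Ex[SRC] -= DT * current((n + 0.5) * DT)
    return Ex

steps = 400
Ex = run(steps)
peak = max(range(N // 2, N), key=lambda i: abs(Ex[i]))
expected = SRC + (steps * DT - T0)   # distance = c * elapsed time since the pulse
print("right-moving pulse peaks at cell", peak, "- expected about", expected)
```

A second pulse travels to the left, bounces off the end of the grid and is ignored here; only the right half of the grid is searched for the peak.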

The ‘feedback loop’ is closed now: I’ve just explained how an electromagnetic field (or radiation) actually propagates through space. Below you can see one of the fancier animations you can find on the Web. The blue oscillation is supposed to represent the oscillating magnetic vector, while the red oscillation is supposed to represent the electric field vector. Note how the effect travels through space.


This is, of course, an extremely simplified view. To be precise, it assumes that the light wave (that’s what an electromagnetic wave actually is) is linearly (aka plane) polarized, as the electric (and magnetic) field oscillates along a straight line. If we choose the direction of propagation as the z-axis of our reference frame, the electric field vector will oscillate in the xy-plane. In other words, the electric field will have an x- and a y-component, which we’ll denote as Ex and Ey respectively, as shown in the diagrams below, which give various examples of linear polarization.

[Diagrams: examples of linear polarization]

Light is, of course, not necessarily plane-polarized. The animation below shows circular polarization, which is a special case of the more general elliptical polarization condition.
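The difference between linear and circular polarization is just a matter of the phase between Ex and Ey. The sketch below (my own illustrative model, looking at one point on the z-axis over one period) shows that, with the two components in phase, the tip of E stays on a line (Ey/Ex is constant), while with a 90° phase difference and equal amplitudes it traces a circle (|E| is constant). Any other combination gives an ellipse.

```python
import math

def field_xy(t, ax, ay, phase, omega=2 * math.pi):
    """E_x and E_y at z = 0 for a wave travelling along z (illustrative model)."""
    return ax * math.cos(omega * t), ay * math.cos(omega * t + phase)

ts = [k / 100 for k in range(100)]                            # one full period
linear = [field_xy(t, 1.0, 0.5, 0.0) for t in ts]             # components in phase
circular = [field_xy(t, 1.0, 1.0, -math.pi / 2) for t in ts]  # 90 degrees apart

# Linear: the tip of E stays on one line, so E_y / E_x is constant (where E_x != 0)
ratios = {round(ey / ex, 9) for ex, ey in linear if abs(ex) > 1e-6}
# Circular: the tip of E traces a circle, so |E| is constant
radii = {round(math.hypot(ex, ey), 9) for ex, ey in circular}
print(ratios, radii)
```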


The relativity of magnetic and electric fields

Allow me to make a small digression here, which has more to do with physics than with vector analysis. You’ll have noticed that we didn’t talk about the magnetic field vector anymore when discussing the polarization of light. Indeed, when discussing electromagnetic radiation, most – if not all – textbooks start by noting we have E and B vectors, but then proceed to discuss the E vector only. Where’s the magnetic field? We need to note two things here.

1. First, I need to remind you that the force on any electrically charged particle (and note we only have electric charge: there’s no such thing as a magnetic charge according to Maxwell’s third equation) consists of two components. Indeed, the total electromagnetic force (aka Lorentz force) on a charge q is:

F = q(E + v×B) = qE + q(v×B) = FE + FM

The velocity vector v is the velocity of the charge: if the charge is not moving, then there’s no magnetic force. The illustration below shows you the components of the vector cross product that, by now, you’re fully familiar with. Indeed, in my previous post, I gave you the expressions for the x, y and z components of a cross product, but there’s a geometrical definition as well:

v×B = |v||B|sin(θ)n

[Illustrations: the magnetic force on a moving charge; the right-hand rule for the cross product]

The magnetic force FM is q(v×B) = qv×B = q|v||B|sin(θ)n. The unit vector n determines the direction of the force, which is given by that right-hand rule that, by now, you also are fully familiar with: it’s perpendicular to both v and B (cf. the two 90° angles in the illustration). Just to make sure, I’ve also added the right-hand rule illustration above: check it out, as it does involve a bit of arm-twisting in this case. 🙂
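Here is the Lorentz force formula in a few lines of Python, with a cross product that follows the right-hand rule. The numbers are illustrative only: a charge moving along x through a magnetic field along y gets a magnetic force along z, perpendicular to both. Setting v = 0 makes the magnetic part vanish. Note this only shows the v×B term disappearing: in a frame moving with the charge, E and B themselves would also come in a different mixture.

```python
def cross(a, b):
    """Vector cross product a x b (right-hand rule)."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def lorentz_force(q, E, v, B):
    """F = q(E + v x B), returned as its electric and magnetic parts."""
    F_E = tuple(q * e for e in E)
    F_M = tuple(q * c for c in cross(v, B))
    return F_E, F_M

q = 1.0
no_E = (0.0, 0.0, 0.0)
B = (0.0, 3.0, 0.0)
# Charge moving along x through a field along y: magnetic force along +z
F_E, F_M = lorentz_force(q, no_E, v=(2.0, 0.0, 0.0), B=B)
# Same field, charge at rest (v = 0): no magnetic force at all
_, F_M_rest = lorentz_force(q, no_E, v=(0.0, 0.0, 0.0), B=B)
print(F_M, F_M_rest)
```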

In any case, the point to note here is that there’s only one electromagnetic force on the particle. While we distinguish between an E and a B vector, the E and B vectors depend on our reference frame. Huh? Yes. The velocity v is relative: we specify the magnetic field in a so-called inertial frame of reference here. If we were moving with the charge, the magnetic force would, quite simply, disappear, because we’d have a v equal to zero, so we’d have v×B = 0×B = 0. Of course, all other charges (i.e. all ‘stationary’ and ‘moving’ charges that were causing the field in the first place) would have different velocities as well and, hence, our E and B vectors would look very different too: they would come in a ‘different mixture’, as Feynman puts it. [If you want to know in what mixture exactly, I’ll refer you to Feynman: it’s a rather lengthy analysis (five rather dense pages, in fact), but I can warmly recommend it: in fact, you should go through it if only to test your knowledge at this point, I think.]

You’ll say: So what? That doesn’t answer the question above. Why do physicists leave out the magnetic field vector in all those illustrations?

You’re right. I haven’t answered the question. This first remark is more like a warning. Let me quote Feynman on it:

“Since electric and magnetic fields appear in different mixtures if we change our frame of reference, we must be careful about how we look at the fields E and B. […] The fields are our way of describing what goes on at a point in space. In particular, E and B tell us about the forces that will act on a moving particle. The question “What is the force on a charge from a moving magnetic field?” doesn’t mean anything precise. The force is given by the values of E and B at the charge, and the F = q(E + v×B) formula is not to be altered if the source of E or B is moving: it is the values of E and B that will be altered by the motion. Our mathematical description deals only with the fields as a function of x, y, z, and t with respect to some inertial frame.”

If you allow me, I’ll take this opportunity to insert another warning, one that’s quite specific to how we should interpret this concept of an electromagnetic wave. When we say that an electromagnetic wave ‘travels’ through space, we often tend to think of a wave traveling on a string: we’re smart enough to understand that what is traveling is not the string itself (or some part of the string) but the amplitude of the oscillation: it’s the vertical displacement (i.e. the movement that’s perpendicular to the direction of ‘travel’) that appears first at one place and then at the next and so on and so on. It’s in that sense, and in that sense only, that the wave ‘travels’. However, the problem with this comparison to a wave traveling on a string is that we tend to think that an electromagnetic wave also occupies some space in the directions that are perpendicular to the direction of travel (i.e. the x and y directions in those illustrations on polarization). Now that’s a huge misconception! The electromagnetic field is something physical, for sure, but the E and B vectors do not occupy any physical space in the x and y direction as they ‘travel’ along the z direction!

Let me conclude this digression with Feynman’s conclusion on all of this:

“If we choose another coordinate system, we find another mixture of E and B fields. However, electric and magnetic forces are part of one physical phenomenon—the electromagnetic interactions of particles. While the separation of this interaction into electric and magnetic parts depends very much on the reference frame chosen for the description, the complete electromagnetic description is invariant: electricity and magnetism taken together are consistent with Einstein’s relativity.”

2. You’ll say: I don’t give a damn about other reference frames. Answer the question. Why are magnetic fields left out of the analysis when discussing electromagnetic radiation?

The answer to that question is very mundane. When we know E (in one or the other reference frame), we also know B, and, while B is as ‘essential’ as E when analyzing how an electromagnetic wave propagates through space, the truth is that the magnitude of B is only a very tiny fraction of that of E.

Huh? Yes. That animation with these oscillating blue and red vectors is very misleading in this regard. Let me be precise here and give you the formulas:

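$$\mathbf{E} = \frac{-q}{4\pi\varepsilon_0 c^2}\,\frac{d^2\mathbf{e}_{R'}}{dt^2}$$

$$\mathbf{B} = -\frac{\mathbf{e}_{R'}\times\mathbf{E}}{c}$$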

I’ve analyzed these formulas in one of my other posts (see, for example, my first post on light and radiation), and so I won’t repeat myself too much here. However, let me recall the basics of it all. The eR′ vector is a unit vector pointing in the apparent direction of the charge. When I say ‘apparent’, I mean that this unit vector is not pointing towards the present position of the charge, but at where it was a little while ago, because this ‘signal’ can only travel from the charge to where we are now at the speed of the wave, i.e. at the speed of light c. That’s why we prime the (radial) vector R also (so we write R′ instead of R). So that unit vector wiggles up and down and, as the formula makes clear, it’s the second-order derivative of that movement which determines the electric field. That second-order derivative is the acceleration vector, and it can be substituted for the vertical component of the acceleration of the charge that caused the radiation in the first place but, again, I’ll refer you to my post on that, as it’s not the topic we want to cover here.

What we do want to look at here is that formula for B: it’s the cross product of that eR′ vector (the minus sign just reverses the direction of the whole thing) and E, divided by c. We also know that the E and eR′ vectors are at right angles to each other, so the sine factor (sinθ) is 1 (or –1) too. In other words, the magnitude of B is |E|/c = E/c, which is a very tiny fraction of E indeed (remember: c ≈ 3×10⁸ m/s).
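A quick numerical check of that B formula, with illustrative numbers of my own: a radiation field E of 100 V/m along x, and the unit vector eR′ along z. In SI units, the magnitude of B then comes out as |E|/c, and B is perpendicular to E.

```python
import math

C = 299_792_458.0  # speed of light, m/s

def cross(a, b):
    """Vector cross product a x b."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def magnitude(v):
    return math.sqrt(sum(c * c for c in v))

e_R = (0.0, 0.0, 1.0)    # unit vector towards the apparent position of the charge
E = (100.0, 0.0, 0.0)    # illustrative radiation field in V/m, perpendicular to e_R
B = tuple(-c / C for c in cross(e_R, E))   # B = -e_R' x E / c

print(B, magnitude(B), magnitude(E) / C)
```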

So… Yes, for all practical purposes, B doesn’t matter all that much when analyzing electromagnetic radiation, and so that’s why physicists will note it but then proceed and look at E only when discussing radiation. Poor B! That being said, the magnetic force may be tiny, but it’s quite interesting. Just look at its direction! Huh? Why? What’s so interesting about it? I am not talking about the direction of B here: I am talking about the direction of the force. Oh… OK… Hmm… Well…

Let me spell it out. Take the force formula: F = q(E + v×B) = qE + q(v×B). When our electromagnetic wave hits something real (I mean anything real, like a wall, or some molecule of gas), it is likely to hit some electron, i.e. an actual electric charge. Hence, the electric and magnetic field should have some impact on it. Now, as pointed out above, the magnitude of the electric force will be the most important one (by far) and, hence, it’s the electric field that will ‘drive’ that charge and, in the process, give it some velocity v, as shown below. In what direction? Don’t ask stupid questions: look at the equation. FE = qE, so the electric force will have the same direction as E.

[Illustration: radiation pressure]

But we’ve got a moving charge now and, therefore, the magnetic force comes into play as well! That force is FM = q(v×B) and its direction is given by the right-hand rule: it’s the F above, in the direction of the light beam itself. Admittedly, it’s a tiny force, as its magnitude is F = qvE/c only, but it’s there, and it’s what causes the so-called radiation pressure (or light pressure tout court). So, yes, you can start dreaming of fancy solar sailing ships (the illustration below shows one out of Star Trek) but… Well… Good luck with it! The force is very tiny indeed and, of course, don’t forget there’s light coming from all directions in space!

[Image: a solar sailing ship from Star Trek]

Jokes aside, it’s a real and interesting effect indeed, but I won’t say much more about it. Just note that we are really talking about the momentum of light here, and it’s as ‘real’ as any momentum. In an interesting analysis, Feynman calculates this momentum and, rather unsurprisingly (but please do check out how he calculates these things, as it’s quite interesting), the same 1/c factor comes into play once again: the momentum (p) that’s being delivered when light hits something real is equal to 1/c of the energy that’s being absorbed. So, if we denote the energy by W (in order to not create confusion with the E symbol we’ve used already), we can write: p = W/c.
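To get a feel for how tiny the effect is, here is the p = W/c formula applied to sunlight. The numbers are assumptions for illustration: a solar irradiance of roughly 1361 W/m² near the Earth, and a perfectly absorbing 100 m × 100 m sail (a perfectly reflecting sail would get twice as much). Momentum delivered per second is force, so the pressure is simply the irradiance divided by c.

```python
C = 299_792_458.0        # speed of light, m/s

def momentum_from_energy(W):
    """Momentum delivered when light energy W (joules) is absorbed: p = W / c."""
    return W / C

I_SUN = 1361.0           # approximate solar irradiance near the Earth, W/m^2
area = 100.0 * 100.0     # hypothetical 100 m x 100 m sail, fully absorbing, m^2

pressure = I_SUN / C     # momentum delivered per second per square metre, Pa
force = pressure * area  # newtons
print(momentum_from_energy(1.0), pressure, force)
```

That’s about 0.05 N on a sail of one hectare, so… Yes. Good luck with it, indeed.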

Now I can’t resist one more digression. We’re, obviously, fully in classical physics here and, hence, we shouldn’t mention anything quantum-mechanical here. That being said, you already know that, in quantum physics, we’ll look at light as a stream of photons, i.e. ‘light particles’ that also have energy and momentum. The formula for the energy of a photon is given by the Planck relation: E = hf. The h factor is Planck’s constant here – also quite tiny, as you know – and f is the light frequency of course. Oh – and I am switching back to the symbol E to denote energy, as it’s clear from the context I am no longer talking about the electric field here.

Now, you may or may not remember that relativity theory yields the following relations between momentum and energy:  

$$E^2 - p^2c^2 = m_0^2c^4 \qquad \text{and/or} \qquad pc = E\,\frac{v}{c}$$

In these equations, m0 stands, obviously, for the rest mass of the particle, i.e. its mass at v = 0. Now, photons have zero rest mass, but their speed is c. Hence, both equations reduce to p = E/c, so that’s the same as what Feynman found out above: p = W/c.
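Both relativistic formulas are easy to check numerically. The sketch below (illustrative values of my own: a 5×10¹⁴ Hz photon, and an electron moving at 0.6c) computes |p| from E² – p²c² = m0²c⁴, shows that it reduces to E/c when m0 = 0, and verifies pc = Ev/c for the electron.

```python
import math

C = 299_792_458.0          # speed of light, m/s
H = 6.62607015e-34         # Planck's constant, J*s

def momentum(E, m0):
    """|p| from E^2 - p^2 c^2 = m0^2 c^4."""
    return math.sqrt(E**2 - (m0 * C**2) ** 2) / C

f = 5.0e14                 # illustrative frequency of visible light, Hz
E_photon = H * f           # Planck relation E = h f
p_photon = momentum(E_photon, 0.0)
print(E_photon, p_photon, E_photon / C)      # p = E / c when m0 = 0

# Cross-check pc = E v / c for a massive particle: an electron at 0.6 c
m_e, v = 9.1093837015e-31, 0.6 * C
E_e = m_e * C**2 / math.sqrt(1 - (v / C) ** 2)   # total (relativistic) energy
print(momentum(E_e, m_e) * C, E_e * v / C)
```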

Of course, you’ll say: that’s obvious. Well… No, it’s not obvious at all. We do find the same formula for the momentum of light (p), which is great, of course, but we find it coming from very different necks of the woods. The formula for the (relativistic) momentum and energy of particles comes from a very classical analysis of particles (‘real-life’ objects with mass, a very definite position in space and whatever other properties you’d associate with billiard balls) while that other p = W/c formula comes out of a very long and tedious analysis of light as an electromagnetic wave. The two analytical frameworks couldn’t differ much more, could they? Yet, we come to the same conclusion indeed.

Physics is wonderful. 🙂

So what’s left?

Lots, of course! For starters, it would be nice to show how these formulas for E and B with eR′ in them can be derived from Maxwell’s equations. There’s no obvious relation, is there? You’re right. Yet, they do come out of the very same equations. However, for the details, I have to refer you to Feynman’s Lectures once again (to the second Volume, to be precise). Indeed, besides calculating scalar and vector potentials in various situations, a lot of what he writes there is about how to calculate these wave equations from Maxwell’s equations. But that’s not the topic of this post really. It’s, quite simply, impossible to ‘summarize’ all those arguments and derivations in a single post. The objective here was to give you some idea of what vector analysis really is in physics, and I hope you got the gist of it, because that’s what’s needed to proceed. 🙂

The other thing I left out is much more relevant to vector calculus. It’s about that del operator (∇) again: you should note that it can be used in many more combinations. In particular, it can be used in combinations involving second-order derivatives. Indeed, till now, we’ve limited ourselves to first-order derivatives only. I’ll spare you the details and just copy a table with some key results:

  1. ∇•(∇T) = div(grad T) = (∇•∇)T = ∇²T = ∂²T/∂x² + ∂²T/∂y² + ∂²T/∂z² = a scalar field
  2. (∇•∇)h = ∇²h = a vector field
  3. ∇(∇•h) = grad(div h) = a vector field
  4. ∇×(∇×h) = curl(curl h) = ∇(∇•h) – ∇²h
  5. ∇•(∇×h) = div(curl h) = 0 (always)
  6. ∇×(∇T) = curl(grad T) = 0 (always)

So we have yet another set of operators here: no fewer than six, to be precise. You may think that we can have some more, like (∇×∇), for example. But… No. A (∇×∇) operator doesn’t make sense. Just write it out and think about it. Perhaps you’ll see why. You can try to invent some more but, if you manage, you’ll see they won’t make sense either. The combinations that do make sense are listed above, all of them.

Now, while all of these combinations make (some) sense, it’s obvious that some of them are more useful than others. In particular, the first operator, ∇², appears very often in physics and, hence, has a special name: it’s the Laplacian. As you can see, it’s the divergence of the gradient of a function.

Note that the Laplace operator (∇²) can be applied to both scalar and vector functions. If we operate with it on a vector, we’ll apply it to each component of the vector function. The Wikipedia article on the Laplace operator shows how and where it’s used in physics, so I’ll refer you to that if you want to know more. Below, I’ll just write out the operator itself, as well as how we apply it to a vector:

$$\nabla^2 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}, \qquad \nabla^2\mathbf{h} = \left(\nabla^2 h_x,\ \nabla^2 h_y,\ \nabla^2 h_z\right)$$

So that covers (1) and (2) above. What about the other ‘operators’?

Let me start at the bottom. Equations (5) and (6) are just what they are: two results that you can use in some mathematical argument or derivation. Equation (4) is… Well… Similar: it’s an identity that may or may not help one when doing some derivation.

What about (3), i.e. the gradient of the divergence of some vector function? Nothing special. As Feynman puts it: “It is a possible vector field, but there is nothing special to say about it. It’s just some vector field which may occasionally come up.”
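If you want to convince yourself of identities (4), (5) and (6), and of what the Laplacian does, here is a rough numerical check in Python. It uses nested central finite differences (standard library only) and two sample fields, T and h, that are my own hypothetical choices. It checks the identities at one point only, so it’s a sanity check, not a proof.

```python
import math

H = 1e-4  # finite-difference step

def d(f, p, i):
    """Central difference of a scalar function f along axis i at point p."""
    hi, lo = list(p), list(p)
    hi[i] += H
    lo[i] -= H
    return (f(hi) - f(lo)) / (2 * H)

def grad(T):
    return lambda p: [d(T, p, i) for i in range(3)]

def div(F):
    return lambda p: sum(d(lambda q, i=i: F(q)[i], p, i) for i in range(3))

def curl(F):
    def component(k):
        return lambda q: F(q)[k]
    def c(p):
        return [d(component(2), p, 1) - d(component(1), p, 2),
                d(component(0), p, 2) - d(component(2), p, 0),
                d(component(1), p, 0) - d(component(0), p, 1)]
    return c

def laplacian(T):
    return div(grad(T))  # (1): the divergence of the gradient

# Hypothetical sample fields, chosen only to exercise the identities
def T(p):
    return p[0] ** 2 * p[1] + math.sin(p[2])

def h(p):
    return [p[1] * p[2] ** 2, p[0] ** 3, p[0] * p[1] * p[2]]

p = [0.7, -0.4, 1.1]
print("(6) curl grad T:", curl(grad(T))(p))
print("(5) div curl h:", div(curl(h))(p))
lhs = curl(curl(h))(p)
lap_h = [laplacian(lambda q, k=k: h(q)[k])(p) for k in range(3)]  # (2), per component
rhs = [g - l for g, l in zip(grad(div(h))(p), lap_h)]             # (3) minus (2)
print("(4) curl curl h:", lhs, "vs grad div h - lap h:", rhs)
print("(1) lap T:", laplacian(T)(p), "exact:", 2 * p[1] - math.sin(p[2]))
```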

So… That should conclude my little introduction to vector analysis, and so I’ll call it a day now. 🙂 I hope you enjoyed it.