The de Broglie relations, the wave equation, and relativistic length contraction

Pre-script (dated 26 June 2020): Our ideas have evolved into a full-blown realistic (or classical) interpretation of all things quantum-mechanical. So no use to read this. Read my recent papers instead. 🙂

Original post:

You know the two de Broglie relations, also known as matter-wave equations:

f = E/h and λ = h/p

You’ll find them in almost any popular account of quantum mechanics, and the writers of those popular books will tell you that f is the frequency of the ‘matter-wave’, and λ is its wavelength. In fact, to add some more weight to their narrative, they’ll usually write them in a somewhat more sophisticated form: they’ll write them using ω and k. The omega symbol (using a Greek letter always makes a big impression, doesn’t it?) denotes the angular frequency, while k is the so-called wavenumber. Now, k = 2π/λ and ω = 2π·f and, therefore, using the definition of the reduced Planck constant, i.e. ħ = h/2π, they’ll write the same relations as:

λ = h/p = 2π/k ⇔ k = 2π·p/h
f = E/h = (ω/2π)

⇒ k = p/ħ and ω = E/ħ

They’re the same thing: it’s just that working with angular frequencies and wavenumbers is more convenient, from a mathematical point of view that is: it’s why we prefer expressing angles in radians rather than in degrees (k is expressed in radians per meter, while ω is expressed in radians per second). In any case, the ‘matter wave’ – even Wikipedia uses that term now – is, of course, the amplitude, i.e. the wave-function ψ(x, t), which has a frequency and a wavelength, indeed. In fact, as I’ll show in a moment, it’s got two frequencies: one temporal, and one spatial. I am modest and, hence, I’ll admit it took me quite a while to fully distinguish the two frequencies, and so that’s why I always had trouble connecting these two ‘matter wave’ equations.
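To see that the two notations really carry the same information, here is a minimal Python sketch. The function names and the example values for E and p are my own assumptions, picked for illustration only; the value of h is the exact SI one.

```python
import math

H = 6.62607015e-34          # Planck constant in J·s (exact SI value)
HBAR = H / (2 * math.pi)    # reduced Planck constant

def ordinary_form(energy, momentum):
    """Return (f, wavelength) using f = E/h and lambda = h/p."""
    return energy / H, H / momentum

def angular_form(energy, momentum):
    """Return (omega, k) using omega = E/hbar and k = p/hbar."""
    return energy / HBAR, momentum / HBAR

# Hypothetical example values, chosen only for illustration.
E_example = 1.0e-18   # energy in joule
p_example = 1.0e-24   # momentum in kg·m/s

f, lam = ordinary_form(E_example, p_example)
omega, k = angular_form(E_example, p_example)

# The two forms differ by factors of 2*pi only.
print(omega / f, k * lam)
```

Both printed ratios come out as 2π (up to rounding), which is all the 'more sophisticated' notation adds.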

Indeed, if they represent the same thing, they must be related, right? But how exactly? It should be easy enough. The wavelength and the frequency must be related through the wave velocity, so we can write: f·λ = v, with v the velocity of the wave, which must be equal to the classical particle velocity, right? And then momentum and energy are also related. To be precise, we have the relativistic energy-momentum relationship: p·c = m_v·v·c = m_v·c²·v/c = E·v/c. So it’s just a matter of substitution. We should be able to go from one equation to the other, and vice versa. Right?

Well… No. It’s not that simple. We can start with either of the two equations but it doesn’t work. Try it. Whatever substitution you try, there’s no way you can derive one of the two equations above from the other. The fact that it’s impossible is evidenced by what we get when we multiply both equations. We get:

f·λ = (E/h)·(h/p) = E/p
v = f·λ ⇒ f·λ = v = E/p ⇔ E = v·p = v·(m·v)

⇒ E = m·v²

Huh? What kind of formula is that? E = m·v²? That’s a formula you’ve never ever seen, have you? It reminds you of the kinetic energy formula of course (K.E. = m·v²/2), but the factor 1/2 of that formula is missing here. Let’s think about it for a while. First note that this E = m·v² relation makes perfect sense if v = c. In that case, we get Einstein’s mass-energy equivalence (E = m·c²), but that’s beside the point here. The point is: if v = c, then our ‘particle’ is a photon, really, and then the E = h·f is referred to as the Planck-Einstein relation. The wave velocity is then equal to c and, therefore, f·λ = c, and so we can effectively substitute to find what we’re looking for:

E/p = (h·f)/(h/λ) = f·λ = c ⇒ E = p·c

So that’s fine: we just showed that the de Broglie relations are correct for photons. [You remember that E = p·c relation, no? If not, check out my post on it.] However, while that’s all nice, it is not what the de Broglie equations are about: we’re talking the matter-wave here, and so we want to do something more than just re-confirm that Planck-Einstein relation, which you can interpret as the limit of the de Broglie relations for v = c. In short, we’re doing something wrong here! Of course, we are. I’ll tell you what exactly in a moment: it’s got to do with the fact we’ve got two frequencies really.
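A quick numerical check helps to see what the product f·λ actually is. The sketch below is only an illustration under my own assumptions: it takes a relativistic electron (the constant M_E and the function name de_broglie_product are illustrative choices) and evaluates f·λ = E/p with E = m_v·c² and p = m_v·v.

```python
import math

C = 299_792_458.0         # speed of light in m/s (exact SI value)
H = 6.62607015e-34        # Planck constant in J·s (exact SI value)
M_E = 9.1093837015e-31    # electron rest mass in kg (CODATA 2018)

def de_broglie_product(rest_mass, v):
    """Return f*lambda for a particle with the given rest mass and velocity v."""
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    energy = gamma * rest_mass * C ** 2     # E = m_v * c^2
    momentum = gamma * rest_mass * v        # p = m_v * v
    frequency = energy / H                  # f = E/h
    wavelength = H / momentum               # lambda = h/p
    return frequency * wavelength

v = 0.1 * C
product = de_broglie_product(M_E, v)
print(product / v, product / C)
```

With this energy concept, f·λ comes out as c²/v, which is larger than c and certainly not equal to v. So the assumption f·λ = v is the suspicious step, not the two relations themselves.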

Let’s first try something else. We’ve been using the relativistic E = m_v·c² equation above. Let’s try some other energy concept: let’s substitute the E in the f = E/h by the kinetic energy and then see where we get—if anywhere at all. So we’ll use the E_kinetic = m∙v²/2 equation. We can then use the definition of momentum (p = m∙v) to write E = p²/(2m), and then we can relate the frequency f to the wavelength λ using the v = λ∙f formula once again. That should work, no? Let’s do it. We write:

E = p²/(2m)
E = h∙f = h·v/λ

⇒ λ = h·v/E = h·v/(p²/(2m)) = h·v/[m²·v²/(2m)] = h/[m·v/2] = 2∙h/p

So we find λ = 2∙h/p. That is almost right, but not quite: that factor 2 should not be there. Well… Of course you’re smart enough to see it’s just that factor 1/2 popping up once more—but as a reciprocal, this time around. 🙂 So what’s going on? The honest answer is: you can try anything but it will never work, because the f = E/h and λ = h/p equations cannot be related—or at least not so easily. The substitutions above only work if we use that E = m·v² energy concept which, you’ll agree, doesn’t make much sense—at first, at least. Again: what’s going on? Well… Same honest answer: the f = E/h and λ = h/p equations cannot be related—or at least not so easily—because the wave equation itself is not so easy.
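The factor 2 can be reproduced in a few lines. This is just a sketch of the substitution above, with hypothetical values for the mass and the velocity and function names of my own choosing.

```python
import math

H = 6.62607015e-34   # Planck constant in J·s (exact SI value)

def wavelength_via_kinetic_energy(mass, v):
    """lambda = h*v/E with E = m*v^2/2, i.e. the substitution tried above."""
    kinetic_energy = 0.5 * mass * v ** 2
    return H * v / kinetic_energy

def wavelength_de_broglie(mass, v):
    """lambda = h/p with p = m*v."""
    return H / (mass * v)

# Hypothetical non-relativistic particle, for illustration only.
mass = 9.1093837015e-31   # kg
v = 1.0e5                 # m/s

ratio = wavelength_via_kinetic_energy(mass, v) / wavelength_de_broglie(mass, v)
print(ratio)
```

The ratio is 2 whatever mass and velocity you plug in, so the mismatch is structural and not a numerical accident.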

Let’s review the basics once again.

The wavefunction

The amplitude of a particle is represented by a wavefunction. If we have no information whatsoever on its position, then we usually write that wavefunction as the following complex-valued exponential:

ψ(x, t) = a·e^{−i·[(E/ħ)·t − (p/ħ)∙x]} = a·e^{−i·(ω·t − k∙x)} = a·e^{i·(k∙x − ω·t)} = a·e^{iθ} = a·(cosθ + i·sinθ)

θ is the so-called phase of our wavefunction and, as you can see, it’s the argument of a wavefunction indeed, with temporal frequency ω and spatial frequency k (if we choose our x-axis so its direction is the same as the direction of k, then we can replace the k and x vectors by the scalars k and x, so that’s what we’re doing here). Now, we know we shouldn’t worry too much about a, because that’s just some normalization constant (remember: all probabilities have to add up to one). However, let’s quickly develop some logic here. Taking the absolute square of this wavefunction gives us the probability of our particle being somewhere in space at some point in time. So we get the probability as a function of x and t. We write:

P(x, t) = |a·e^{−i·[(E/ħ)·t − (p/ħ)∙x]}|² = a²

As all probabilities have to add up to one, we must assume we’re looking at some box in spacetime here. So, if the length of our box is Δx = x₂ − x₁, then (Δx)·a² = (x₂−x₁)·a²= 1 ⇔ Δx = 1/a². [We obviously simplify the analysis by assuming a one-dimensional space only here, but the gist of the argument is essentially correct.] So, freezing time (i.e. equating t to some point t = t₀), we get the following probability density function:

That’s simple enough. The point is: the two de Broglie equations f = E/h and λ = h/p give us the temporal and spatial frequencies in that ψ(x, t) = a·e^{−i·[(E/ħ)·t − (p/ħ)∙x]} relation. As you can see, that’s an equation that implies a much more complicated relationship between E/ħ = ω and p/ħ = k. Or… Well… Much more complicated than what one would think of at first.

To appreciate what’s being represented here, it’s good to play a bit. We’ll continue with our simple exponential above, which also illustrates how we usually analyze those wavefunctions: we either assume we’re looking at the wavefunction in space at some fixed point in time (t = t₀) or, else, at how the wavefunction changes in time at some fixed point in space (x = x₀). Of course, we know that Einstein told us we shouldn’t do that: space and time are related and, hence, we should try to think of spacetime, i.e. some ‘kind of union’ of space and time, as Minkowski famously put it. However, when everything is said and done, mere mortals like us are not so good at that, and so we’re sort of condemned to try to imagine things using the classical cut-up of things. 🙂 So we’ll just use an online graphing tool to play with that a·e^{i(k∙x−ω·t)} = a·e^{iθ} = a·(cosθ + i·sinθ) formula.

Compare the following two graphs, for example. Just imagine we either look at how the wavefunction behaves in space, with the time fixed at some point t = t₀, or, alternatively, that we look at how the wavefunction behaves in time at some fixed point in space x = x₀. As you can see, increasing k = p/ħ or increasing ω = E/ħ gives the wavefunction a higher ‘density’ in space or, alternatively, in time.

That makes sense, intuitively. In fact, when thinking about how the energy, or the momentum, affects the shape of the wavefunction, I am reminded of an airplane propeller: as it spins, faster and faster, it gives the propeller some ‘density’, in space as well as in time, as its blades cover more space in less time. It’s an interesting analogy: it helps—me, at least—to think through what that wavefunction might actually represent.
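The ‘density’ idea can be made concrete without a graphing tool. The sketch below assumes units in which ħ = 1 and uses hypothetical values for k, ω and a; the helper names are mine. It counts how many full cycles fit in a given stretch of space (or time), and it confirms that the probability density stays flat at a².

```python
import cmath
import math

def psi(x, t, k, omega, a=1.0):
    """Elementary wavefunction a*exp(i*(k*x - omega*t)), in units where hbar = 1."""
    return a * cmath.exp(1j * (k * x - omega * t))

def cycles_in_interval(rate, length):
    """Full oscillations that fit in `length`, for an angular rate k (or omega)."""
    return rate * length / (2 * math.pi)

# Hypothetical values: doubling k doubles the number of cycles per unit length.
low_k_cycles = cycles_in_interval(rate=2.0, length=10.0)
high_k_cycles = cycles_in_interval(rate=4.0, length=10.0)

# The absolute square does not depend on x, t, k or omega: it is just a squared.
density = abs(psi(x=1.3, t=0.4, k=4.0, omega=7.0, a=0.5)) ** 2

print(low_k_cycles, high_k_cycles, density)
```

The same cycles_in_interval helper works for the time axis if you pass ω and a time span instead of k and a length.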

So as to stimulate your imagination even more, you should also think of representing the real and imaginary part of that ψ = a·e^{i(k∙x−ω·t)} = a·e^{iθ} = a·(cosθ + i·sinθ) formula in a different way. In the graphs above, we just showed the sine and cosine in the same plane but, as you know, the real and the imaginary axis are orthogonal, so Euler’s formula a·e^{iθ} = a·(cosθ + i·sinθ) = a·cosθ + i·a·sinθ = Re(ψ) + i·Im(ψ) may also be graphed as follows:

The illustration above should make you think of yet another illustration you’ve probably seen like a hundred times before: the electromagnetic wave, propagating through space as the magnetic and electric field induce each other, as illustrated below. However, there’s a big difference: Euler’s formula incorporates a phase shift—remember: sinθ = cos(θ − π/2)—and you don’t have that in the graph below. The difference is much more fundamental, however: it’s really hard to see how one could possibly relate the magnetic and electric field to the real and imaginary part of the wavefunction respectively. Having said that, the mathematical similarity makes one think!

Of course, you should remind yourself of what E and B stand for: they represent the strength of the electric (E) and magnetic (B) field at some point x at some time t. So you shouldn’t think of those wavefunctions above as occupying some three-dimensional space. They don’t. Likewise, our wavefunction ψ(x, t) does not occupy some physical space: it’s some complex number—an amplitude that’s associated with each and every point in spacetime. Nevertheless, as mentioned above, the visuals make one think and, as such, do help us as we try to understand all of this in a more intuitive way.

Let’s now look at that energy-momentum relationship once again, but using the wavefunction, rather than those two de Broglie relations.

Energy and momentum in the wavefunction

I am not going to talk about uncertainty here. You know that Spiel. If there’s uncertainty, it’s in the energy or the momentum, or in both. The uncertainty determines the size of that ‘box’ (in spacetime) in which we hope to find our particle, and it’s modeled by a splitting of the energy levels. We’ll say the energy of the particle may be E₀, but it might also be some other value, which we’ll write as E_n = E₀ ± n·ħ. The thing to note is that energy levels will always be separated by some integer multiple of ħ, so ħ is, effectively, the quantum of energy for all practical (and theoretical) purposes. We then superimpose the various wavefunctions to get a wavefunction that might (or might not) resemble something like this:

Who knows? 🙂 In any case, that’s not what I want to talk about here. Let’s repeat the basics once more: if we write our wavefunction a·e^{−i·[(E/ħ)·t − (p/ħ)∙x]} as a·e^{−i·[ω·t − k∙x]}, we refer to ω = E/ħ as the temporal frequency, i.e. the frequency of our wavefunction in time (the frequency it has if we keep the position fixed), and to k = p/ħ as the spatial frequency, i.e. the frequency of our wavefunction in space (so now we stop the clock and just look at the wave in space). Now, let’s think about the energy concept first. The energy of a particle is generally thought of as consisting of three parts:

The particle’s rest energy m₀c², which de Broglie referred to as internal energy (E_int): it includes the rest mass of the ‘internal pieces’, as Feynman puts it (now we call those ‘internal pieces’ quarks), as well as their binding energy (i.e. the quarks’ interaction energy);
Any potential energy it may have because of some field (so de Broglie was not assuming the particle was traveling in free space), which we’ll denote by U, and note that the field can be anything—gravitational, electromagnetic: it’s whatever changes the energy because of the position of the particle;
The particle’s kinetic energy, which we write in terms of its momentum p: m·v²/2 = m²·v²/(2m) = (m·v)²/(2m) = p²/(2m).

So we have one energy concept here (the rest energy) that does not depend on the particle’s position in spacetime, and two energy concepts that do depend on position (potential energy) and/or how that position changes because of its velocity and/or momentum (kinetic energy). The two last bits are related through the energy conservation principle. The total energy is E = m_vc², of course—with the little subscript (v) ensuring the mass incorporates the equivalent mass of the particle’s kinetic energy.

So what? Well… In my post on quantum tunneling, I drew attention to the fact that different potentials, so different potential energies (indeed, as our particle travels from one region to another, the field is likely to vary), have no impact on the temporal frequency. Let me re-visit the argument, because it’s an important one. Imagine two different regions in space that differ in potential, because the field has a larger or smaller magnitude there, or points in a different direction, or whatever: just different fields, which correspond to different values for U₁ and U₂, i.e. the potential in region 1 versus region 2. Now, the different potential will change the momentum: the particle will accelerate or decelerate as it moves from one region to the other, so we also have a different p₁ and p₂. Having said that, the internal energy doesn’t change, so we can write the corresponding amplitudes, or wavefunctions, as:

ψ₁(θ₁) = Ψ₁(x, t) = a·e^{−iθ₁} = a·e^{−i[(E_int + p₁²/(2m) + U₁)·t − p₁∙x]/ħ}
ψ₂(θ₂) = Ψ₂(x, t) = a·e^{−iθ₂} = a·e^{−i[(E_int + p₂²/(2m) + U₂)·t − p₂∙x]/ħ}

Now how should we think about these two equations? We are definitely talking different wavefunctions. However, their temporal frequencies ω₁ = [E_int + p₁²/(2m) + U₁]/ħ and ω₂ = [E_int + p₂²/(2m) + U₂]/ħ must be the same. Why? Because of the energy conservation principle (or its equivalent in quantum mechanics, I should say): the temporal frequency f or ω, i.e. the time-rate of change of the phase of the wavefunction, does not change: all of the change in potential, and the corresponding change in kinetic energy, goes into changing the spatial frequency, i.e. the wave number k or the wavelength λ, as potential energy becomes kinetic or vice versa. The sum of the potential and kinetic energy doesn’t change, indeed. So the energy remains the same and, therefore, the temporal frequency does not change. In fact, we need this quantum-mechanical equivalent of the energy conservation principle to calculate how the momentum and, hence, the spatial frequency of our wavefunction, changes. We do so by boldly equating ω₁ and ω₂. The common 1/ħ factor drops out, and so we write:

ω₁ = ω₂ ⇔ E_int + p₁²/(2m) + U₁ = E_int + p₂²/(2m) + U₂

⇔ p₁²/(2m) − p₂²/(2m) = U₂ − U₁ ⇔ p₂² = (2m)·[p₁²/(2m) − (U₂ − U₁)]

⇔ p₂ = (p₁² − 2m·ΔU)^{1/2}

We played with this in a previous post, assuming that p₁² is larger than 2m·ΔU, so as to get a positive number on the right-hand side of the equation for p₂², so then we can confidently take the positive square root of that (p₁² − 2m·ΔU) expression to calculate p₂. For example, when the potential difference ΔU = U₂ − U₁ is negative, so ΔU < 0, then we’re safe and sure to get some real positive value for p₂.

Having said that, we also contemplated the possibility that p₂²= p₁² – 2m·ΔU was negative, in which case p₂ has to be some pure imaginary number, which we wrote as p₂= i·p’ (so p’ (read: p prime) is a real positive number here). We could work with that: it resulted in an exponentially decreasing factor e^{−p’·x/ħ} that ended up ‘killing’ the wavefunction in space. However, its limited existence still allowed particles to ‘tunnel’ through potential energy barriers, thereby explaining the quantum-mechanical tunneling phenomenon.

This is rather weird—at first, at least. Indeed, one would think that, because of the E/ħ = ω equation, any change in energy would lead to some change in ω. But no! The total energy doesn’t change, and the potential and kinetic energy are like communicating vessels: any change in potential energy is associated with a change in p, and vice versa. It’s a really funny thing. It helps to think it’s because the potential depends on position only, and so it should not have an impact on the temporal frequency of our wavefunction. Of course, it’s equally obvious that the story would change drastically if the potential would change with time, but… Well… We’re not looking at that right now. In short, we’re assuming energy is being conserved in our quantum-mechanical system too, and so that implies what’s described above: no change in ω, but we obviously do have changes in p whenever our particle goes from one region in space to another, and the potentials differ. So… Well… Just remember: the energy conservation principle implies that the temporal frequency of our wave function doesn’t change. Any change in potential, as our particle travels from one place to another, plays out through the momentum.
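The momentum rule p₂ = (p₁² − 2m·ΔU)^{1/2} derived above, including the imaginary case, can be sketched in a few lines of Python. I assume natural units (ħ = 1) and hypothetical values for p₁, m and ΔU; the function names are mine. The complex square root from the standard cmath module takes care of the case in which p₂² is negative.

```python
import cmath
import math

HBAR = 1.0   # natural units, an assumption to keep the numbers readable

def momentum_in_region_2(p1, mass, delta_u):
    """p2 = sqrt(p1^2 - 2*m*dU); a pure imaginary result signals a barrier."""
    return cmath.sqrt(p1 ** 2 - 2.0 * mass * delta_u)

def region_2_factor(p2, x):
    """Spatial part exp(i*p2*x/hbar) of the wavefunction in region 2."""
    return cmath.exp(1j * p2 * x / HBAR)

# Hypothetical numbers: the potential drops (dU < 0), so the particle speeds up.
p2_allowed = momentum_in_region_2(p1=1.0, mass=1.0, delta_u=-1.5)

# Hypothetical numbers: the potential step exceeds the kinetic energy.
p2_barrier = momentum_in_region_2(p1=1.0, mass=1.0, delta_u=1.0)

# In the barrier, p2 = i*p' and the oscillation turns into exp(-p'*x/hbar).
decay = abs(region_2_factor(p2_barrier, x=2.0))

print(p2_allowed, p2_barrier, decay)
```

Note that the temporal frequency never enters this calculation: only the spatial part of the wavefunction responds to the change in potential.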

Now that we know that, let’s look at those de Broglie relations once again.

Re-visiting the de Broglie relations

As mentioned above, we usually think in one dimension only: we either freeze time or, else, we freeze space. If we do that, we can derive some funny new relationships. Let’s first simplify the analysis by re-writing the argument of the wavefunction as:

θ = E·t − p·x

Of course, you’ll say: the argument of the wavefunction is not equal to E·t − p·x: it’s (E/ħ)·t − (p/ħ)∙x. Moreover, θ should have a minus sign in front. Well… Yes, you’re right. We should put that 1/ħ factor in front, but we can change units, and so let’s just measure both E as well as p in units of ħ here. We can do that. No worries. And, yes, the minus sign should be there (Nature chose a clockwise direction for θ) but that doesn’t matter for the analysis hereunder.

The E·t − p·x expression reminds one of those invariant quantities in relativity theory. But let’s be precise here. We’re thinking about those so-called four-vectors here, which we wrote as p_μ = (E, p_x, p_y, p_z) = (E, p) and x_μ = (t, x, y, z) = (t, x) respectively. [Well… OK… You’re right. We wrote those four-vectors as p_μ = (E, p_x·c, p_y·c, p_z·c) = (E, p·c) and x_μ = (c·t, x, y, z) = (c·t, x). So what we write is true only if we measure time and distance in equivalent units so we have c = 1. So… Well… Let’s do that and move on.] In any case, the invariants we looked at back then were not E·t − p·x·c or c·t − x (the latter is a nonsensical expression anyway: you cannot subtract a vector from a scalar), but p_μ² = p_μp_μ = E² − (p·c)² = E² − p²·c² = E² − (p_x² + p_y² + p_z²)·c² and x_μ² = x_μx_μ = (c·t)² − x² = c²·t² − (x² + y² + z²) respectively. [Remember p_μp_μ and x_μx_μ are four-vector dot products, so they have that +−−− signature, unlike the p² and x² or a·b dot products, which are just a simple sum of the squared components.] So… Well… To be fair, E·t − p·x is the four-vector dot product p_μx_μ and, with c = 1, it is an invariant quantity too, but that fact alone does not tell us how E and p relate to each other. Let’s try something else.

Let’s re-simplify by equating ħ as well as c to one again, so we write: ħ = c = 1. [You may wonder if it is possible to ‘normalize’ both physical constants simultaneously, but the answer is yes. The Planck unit system is an example.] Then our relativistic energy-momentum relationship can be re-written as E/p = 1/v. [If c were not one, we’d write: E·β = p·c, with β = v/c. So we’d get E/p = c/β. We referred to β as the relative velocity of our particle: it was the velocity, but measured as a ratio of the speed of light. So here it’s the same, except that we use the velocity symbol v now for that ratio.]

Now think of a particle moving in free space, i.e. without any fields acting on it, so we don’t have any potential changing the spatial frequency of the wavefunction of our particle, and let’s also assume we choose our x-axis such that it’s the direction of travel, so the position vector (x) can be replaced by a simple scalar (x). Finally, we will also choose the origin of our x-axis such that x = 0 when t = 0, so we write: x(t = 0) = 0. It’s obvious then that, if our particle is traveling in spacetime with some velocity v, then the ratio of its position x and the time t that it’s been traveling will always be equal to v = x/t. Hence, for that very special position in spacetime (t, x = v·t) – so we’re talking the actual position of the particle in spacetime here – we get: θ = E·t − p·x = E·t − p·v·t = E·t − m·v·v·t = (E − m∙v²)·t. So… Well… There we have the m∙v² factor.

The question is: what does it mean? How do we interpret this? I am not sure. When I first jotted this thing down, I thought of choosing a different reference potential: some negative value such that it ensures that the sum of kinetic, rest and potential energy is zero, so I could write E = 0 and then the wavefunction would reduce to ψ(t) = e^{−i·m∙v²·t}. Feynman refers to that as ‘choosing the zero of our energy scale such that E = 0’, and you’ll find this in many other works too. However, it’s not that simple. Free space is free space: if there’s no change in potential from one region to another, then the concept of some reference point for the potential becomes meaningless. There is only rest energy and kinetic energy, then. The total energy reduces to E = m (because we chose our units such that c = 1 and, therefore, E = mc² = m·1² = m) and so our wavefunction reduces to:

ψ(t) = a·e^{−i·m·(1 − v²)·t}

We can’t reduce this any further. The mass is the mass: it’s a measure for inertia, as measured in our inertial frame of reference. And the velocity is the velocity, of course—also as measured in our frame of reference. We can re-write it, of course, by substituting t for t = x/v, so we get:

ψ(x) = a·e^{−i·m·(1/v − v)·x}

For both functions, we get constant probabilities, but a wavefunction that’s ‘denser’ for higher values of m. The (1 − v²) and (1/v − v) factors are different, however: these factors become smaller for higher v, so our wavefunction becomes less dense for higher v. In fact, for v = 1 (so for travel at the speed of light, i.e. for photons), we get that ψ(t) = ψ(x) = a·e⁰ = a. [You should use the graphing tool once more, and you’ll see the imaginary part, i.e. the sine of the a·(cosθ + i·sinθ) expression, just vanishes, as sinθ = 0 for θ = 0.]
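If you have no graphing tool at hand, a small table does the job too. The sketch below tabulates the two factors for a few hypothetical velocities, with v expressed as a fraction of c as above; the function names are mine.

```python
import math

def temporal_factor(v):
    """The (1 - v^2) factor multiplying m*t in the exponent of psi(t)."""
    return 1.0 - v ** 2

def spatial_factor(v):
    """The (1/v - v) factor multiplying m*x in the exponent of psi(x)."""
    return 1.0 / v - v

# Hypothetical sample velocities, as fractions of the speed of light.
velocities = [0.1, 0.5, 0.9, 1.0]
table = [(v, temporal_factor(v), spatial_factor(v)) for v in velocities]

for v, in_time, in_space in table:
    print(f"v = {v:.1f}   1 - v^2 = {in_time:.4f}   1/v - v = {in_space:.4f}")
```

Both factors shrink as v grows and both are zero at v = 1. The spatial factor is just the temporal one divided by v, which is what substituting t = x/v does.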

The wavefunction and relativistic length contraction

Are exercises like this useful? As mentioned above, these constant probability wavefunctions are a bit nonsensical, so you may wonder why I wrote what I wrote. There may be no real conclusion, indeed: I was just fiddling around a bit, and playing with equations and functions. I feel stuff like this helps me to understand what that wavefunction actually is somewhat better. If anything, it does illustrate that idea of the ‘density’ of a wavefunction, in space or in time. What we’ve been doing by substituting x for x = v·t or t for t = x/v is showing how, when everything is said and done, the mass and the velocity of a particle are the actual variables determining that ‘density’ and, frankly, I really like that ‘airplane propeller’ idea as a pedagogic device. In fact, I feel it may be more than just a pedagogic device, and so I’ll surely re-visit it—once I’ve gone through the rest of Feynman’s Lectures, that is. 🙂

That brings me to what I added in the title of this post: relativistic length contraction. You’ll wonder why I am bringing that into a discussion like this. Well… Just play a bit with those (1 − v²) and (1/v − v) factors. As mentioned above, they decrease the density of the wavefunction. In other words, it’s like space is being ‘stretched out’. Also, it can’t be a coincidence we find the same (1 − v²) factor in the relativistic length contraction formula: L = L₀·√(1 − v²), in which L₀ is the so-called proper length (i.e. the length in the stationary frame of reference) and v is the (relative) velocity of the moving frame of reference. Of course, we also find it in the relativistic mass formula: m = m_v = m₀/√(1−v²). In fact, things become much more obvious when substituting m for m₀/√(1−v²) in that ψ(t) = a·e^{−i·m·(1 − v²)·t} function. We get:

ψ(t) = a·e^{−i·m·(1 − v²)·t} = a·e^{−i·m₀·√(1−v²)·t}

Well… We’re surely getting somewhere here. What if we go back to our original ψ(x, t) = a·e^{−i·[(E/ħ)·t − (p/ħ)∙x]} function? Using natural units once again, that’s equivalent to:

ψ(x, t) = a·e^{−i·(m·t − p∙x)} = a·e^{−i·[(m₀/√(1−v²))·t − (m₀·v/√(1−v²))∙x]}

= a·e^{−i·[m₀/√(1−v²)]·(t − v∙x)}

Interesting! We’ve got a wavefunction that’s a function of x and t, but with the rest mass (or rest energy) and velocity as parameters! Now that really starts to make sense. Look at the (blue) graph for that 1/√(1−v²) factor: it goes from one (1) to infinity (∞) as v goes from 0 to 1 (remember we ‘normalized’ v: it’s a ratio between 0 and 1 now). So that’s the factor that comes into play for t. For x, it’s the red graph, which has the same shape but goes from zero (0) to infinity (∞) as v goes from 0 to 1.

Now that makes sense: the ‘density’ of the wavefunction, in time and in space, increases as the velocity v increases. In space, that should correspond to the relativistic length contraction effect: it’s like space is contracting, as the velocity increases and, therefore, the length of the object we’re watching contracts too. For time, the reasoning is a bit more complicated: it’s our time that becomes more dense and, therefore, our clock that seems to tick faster.
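The algebra behind these substitutions is easy to check numerically. The sketch below assumes natural units (ħ = c = 1) and hypothetical values for m₀, v and t; the function names are mine. It evaluates the phase m·t − p·x at the particle’s own position x = v·t and compares it with m₀·√(1−v²)·t.

```python
import math

def lorentz_factor(v):
    """1/sqrt(1 - v^2), with v expressed as a fraction of c."""
    return 1.0 / math.sqrt(1.0 - v * v)

def phase(rest_mass, v, x, t):
    """m*t - p*x with m = m0/sqrt(1 - v^2) and p = m*v (hbar = c = 1)."""
    m = lorentz_factor(v) * rest_mass
    p = m * v
    return m * t - p * x

def phase_at_particle(rest_mass, v, t):
    """The reduced form m0*sqrt(1 - v^2)*t, valid at x = v*t only."""
    return rest_mass * math.sqrt(1.0 - v * v) * t

# Hypothetical values, for illustration only.
m0, v, t = 1.0, 0.6, 2.0
full = phase(m0, v, x=v * t, t=t)
reduced = phase_at_particle(m0, v, t)
print(full, reduced)
```

Both numbers agree, so the three ways of writing the exponent are one and the same thing once we follow the particle.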

[…]

I know I need to explore this further—if only so as to assure you I have not gone crazy. Unfortunately, I have no time to do that right now. Indeed, from time to time, I need to work on other stuff besides this physics ‘hobby’ of mine.

Post scriptum 1: As for the E = m·v² formula, I also have a funny feeling that it might be related to the fact that, in quantum mechanics, both the real and imaginary part of the oscillation actually matter. You’ll remember that we’d represent any oscillator in physics by a complex exponential, because it eased our calculations. So instead of writing A = A₀·cos(ωt + Δ), we’d write: A = A₀·e^{i(ωt + Δ)} = A₀·cos(ωt + Δ) + i·A₀·sin(ωt + Δ). When calculating the energy or intensity of a wave, however, we couldn’t just take the square of the complex amplitude of the wave – remembering that E ∼ A². No! We had to get back to the real part only, i.e. the cosine or the sine only. Now the mean (or average) value of the squared cosine function (or a squared sine function), over one or more cycles, is 1/2, so the mean of A² = A₀²·cos²(ωt + Δ) is equal to (1/2)·A₀². I am not sure, and it’s probably a long shot, but one might be able to show that, if the imaginary part of the oscillation would actually matter – which is obviously the case for our matter-wave – then the two halves add up, as 1/2 + 1/2 is obviously equal to 1. I mean: try to think of an image with a mass attached to two springs, rather than one only. Does that make sense? 🙂 […] I know: I am just freewheeling here. 🙂

Post scriptum 2: The other thing that this E = m·v² equation makes me think of is – curiously enough – an eternally expanding spring. Indeed, the kinetic energy of a mass on a spring and the potential energy that’s stored in the spring always add up to some constant, and the average potential and kinetic energy are equal to each other. To be precise: 〈K.E.〉 + 〈P.E.〉 = (1/4)·k·A² + (1/4)·k·A² = k·A²/2. It means that, on average, the total energy of the system is twice the average kinetic energy (or potential energy). You’ll say: so what? Well… I don’t know. Can we think of a spring that expands eternally, with the mass on its end not gaining or losing any speed? In that case, v is constant, and the total energy of the system would, effectively, be equal to E_total = 2·〈K.E.〉 = 2·m·v²/2 = m·v².
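The averages quoted for the mass on a spring can be verified by brute force. The sketch below samples one full period of an ideal harmonic oscillator; the spring constant, mass and amplitude are hypothetical values, and the function name is mine.

```python
import math

def average_energies(k_spring, mass, amplitude, samples=10_000):
    """Average kinetic and potential energy over one period of x = A*cos(w*t)."""
    omega = math.sqrt(k_spring / mass)
    period = 2 * math.pi / omega
    kinetic_sum = 0.0
    potential_sum = 0.0
    for i in range(samples):
        t = period * i / samples
        x = amplitude * math.cos(omega * t)            # position
        v = -amplitude * omega * math.sin(omega * t)   # velocity
        kinetic_sum += 0.5 * mass * v * v
        potential_sum += 0.5 * k_spring * x * x
    return kinetic_sum / samples, potential_sum / samples

# Hypothetical oscillator, for illustration only.
k_spring, mass, amplitude = 3.0, 2.0, 0.5
avg_kinetic, avg_potential = average_energies(k_spring, mass, amplitude)
print(avg_kinetic, avg_potential, avg_kinetic + avg_potential)
```

Each average comes out as k·A²/4 and their sum as k·A²/2, so the total is indeed twice the average kinetic energy. Whether that says anything about E = m·v² is, of course, another matter.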

Post scriptum 3: That substitution I made above – substituting x for x = v·t – is kinda weird. Indeed, if that E = m∙v² equation makes any sense, then E − m∙v² = 0, of course, and, therefore, θ = E·t − p·x = E·t − p·v·t = E·t − m·v·v·t = (E − m∙v²)·t = 0·t = 0. So the argument of our wavefunction is 0 and, therefore, we get a·e⁰ = a for our wavefunction. It basically means our particle is where it is. 🙂

Post scriptum 4: This post scriptum – no. 4 – was added later, much later. On 29 February 2016, to be precise. The solution to the ‘riddle’ above is actually quite simple. We just need to make a distinction between the group and the phase velocity of our complex-valued wave. The solution came to me when I was writing a little piece on Schrödinger’s equation. I noticed that we do not find that weird E = m∙v² formula when substituting ψ for ψ = e^{i(kx − ωt)} in Schrödinger’s equation for a free particle, i.e. in:

∂ψ/∂t = (iħ/2m)·∇²ψ

Let me quickly go over the logic. To keep things simple, we’ll just assume one-dimensional space, so ∇²ψ = ∂²ψ/∂x². The time derivative on the left-hand side is ∂ψ/∂t = −iω·e^{i(kx − ωt)}. The second-order derivative on the right-hand side is ∂²ψ/∂x² = (ik)·(ik)·e^{i(kx − ωt)} = −k²·e^{i(kx − ωt)}. The e^{i(kx − ωt)} factor on both sides cancels out and, hence, equating both sides gives us the following condition:

−iω = −(iħ/2m)·k² ⇔ ω = (ħ/2m)·k²

Substituting ω = E/ħ and k = p/ħ yields:

E/ħ = (ħ/2m)·p²/ħ² = m²·v²/(2m·ħ) = m·v²/(2ħ) ⇔ E = m·v²/2
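If you don’t trust the derivatives, you can let the computer take them. The sketch below assumes the free-particle form ∂ψ/∂t = (iħ/2m)·∂²ψ/∂x², which is what the condition −iω = −(iħ/2m)·k² above implies, and uses natural units (ħ = m = 1) with a hypothetical wavenumber. It approximates both derivatives with finite differences and measures how badly a given ω violates the equation; all names are mine.

```python
import cmath

HBAR = 1.0   # natural units (assumption)
MASS = 1.0   # hypothetical mass

def psi(x, t, k, omega):
    """Elementary wavefunction exp(i*(k*x - omega*t))."""
    return cmath.exp(1j * (k * x - omega * t))

def schrodinger_residual(k, omega, x=0.3, t=0.2, h=1e-4):
    """|d(psi)/dt - (i*hbar/2m) * d2(psi)/dx2| from central finite differences."""
    dpsi_dt = (psi(x, t + h, k, omega) - psi(x, t - h, k, omega)) / (2 * h)
    d2psi_dx2 = (psi(x + h, t, k, omega) - 2 * psi(x, t, k, omega)
                 + psi(x - h, t, k, omega)) / h ** 2
    return abs(dpsi_dt - 1j * HBAR / (2 * MASS) * d2psi_dx2)

k = 2.0
residual_half = schrodinger_residual(k, HBAR * k ** 2 / (2 * MASS))   # E = p^2/(2m)
residual_full = schrodinger_residual(k, HBAR * k ** 2 / MASS)         # E = p^2/m
print(residual_half, residual_full)
```

The first residual is zero up to discretization error, while the second one is of order one: only ω = ħ·k²/(2m) solves the equation.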

In short: the E = m·v²/2 is the correct formula. It must be, because… Well… Because Schrödinger’s equation is a formula we surely shouldn’t doubt, right? So the only logical conclusion is that we must be doing something wrong when multiplying the two de Broglie equations. To be precise: our v = f·λ equation must be wrong. Why? Well… It’s just something one shouldn’t apply to our complex-valued wavefunction. The ‘correct’ velocity formula for the complex-valued wavefunction should have that 1/2 factor, so we’d write 2·f·λ = v to make things come out alright. But where would this formula come from? The period of cosθ + isinθ is the period of the sine and cosine function: cos(θ+2π) + isin(θ+2π) = cosθ + isinθ, so T = 2π and f = 1/T = 1/2π do not change.

But so that’s a mathematical point of view. From a physical point of view, it’s clear we got two oscillations for the price of one: one ‘real’ and one ‘imaginary’—but both are equally essential and, hence, equally ‘real’. So the answer must lie in the distinction between the group and the phase velocity when we’re combining waves. Indeed, the group velocity of a sum of waves is equal to v_g = dω/dk. In this case, we have:

v_g = d[E/ħ]/d[p/ħ] = dE/dp

We can now use the kinetic energy formula to write E as E = m·v²/2 = p·v/2. Now, v and p are related through m (p = m·v, so v = p/m). So we should write this as E = m·v²/2 = p²/(2m). Substituting E and p = m·v in the equation above then gives us the following:

dω/dk = d[p²/(2m)]/dp = 2p/(2m) = p/m = v_g = v

However, for the phase velocity, we can just use the v_p = ω/k formula, which gives us that 1/2 factor:

v_p = ω/k = (E/ħ)/(p/ħ) = E/p = (m·v²/2)/(m·v) = v/2

Bingo! Riddle solved! 🙂 Isn’t it nice that our formula for the group velocity also applies to our complex-valued wavefunction? I think that’s amazing, really! But I’ll let you think about it. 🙂
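Here is the same result in numbers. The sketch takes the dispersion relation ω = ħ·k²/(2m), uses natural units (ħ = 1) and hypothetical values for k and m, and computes the group velocity with a numerical derivative; the function names are mine.

```python
import math

HBAR = 1.0   # natural units (assumption)

def omega(k, mass):
    """Dispersion relation omega = hbar*k^2/(2m) of the free particle."""
    return HBAR * k * k / (2.0 * mass)

def phase_velocity(k, mass):
    """v_p = omega/k."""
    return omega(k, mass) / k

def group_velocity(k, mass, dk=1e-6):
    """v_g = d(omega)/dk, approximated by a central difference."""
    return (omega(k + dk, mass) - omega(k - dk, mass)) / (2.0 * dk)

# Hypothetical values, for illustration only.
k, mass = 3.0, 2.0
classical_velocity = HBAR * k / mass   # v = p/m
print(group_velocity(k, mass), phase_velocity(k, mass), classical_velocity)
```

The group velocity matches the classical velocity p/m, and the phase velocity is exactly half of it.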

An introduction to virtual particles

In one of my posts on the rules of quantum math, I introduced the propagator function, which gives us the amplitude for a particle to go from one place to another. It looks like this:

〈 r₂ | r₁ 〉 = e^{(i/ħ)·(p∙r₁₂)}/r₁₂

The r₁ and r₂ vectors are, obviously, position vectors describing (1) where the particle is right now, so the initial state is written as |r₁〉, and (2) where it might go, so the final state is |r₂〉. Now we can combine this with the analysis in my previous post to think about what might happen when an electron sort of ‘jumps’ from one state to another. It’s a rather funny analysis, but it will give you some feel of what these so-called ‘virtual’ particles might represent.

Let’s first look at the shape of that function. The e^{(i/ħ)·(p∙r₁₂)} function in the numerator is now familiar to you. Note the r₁₂ in the argument, i.e. the vector pointing from r₁ to r₂. The p∙r₁₂ dot product equals |p|∙|r₁₂|·cosθ = p·r₁₂·cosθ, with θ the angle between p and r₁₂. If the two vectors point in the same direction (θ = 0), then cosθ is equal to 1. If the angle is π/2, then it’s 0, and the function reduces to 1/r₁₂. So the angle θ, through the cosθ factor, sort of scales the spatial frequency. Let me try to give you some idea of what this looks like by assuming p and r₁₂ point in the same direction, so we’re looking at the space in the direction of the momentum only and |p|∙|r₁₂|·cosθ = p·r₁₂. Now, we can look at the p/ħ factor as a scaling factor, and measure the distance x in units defined by that scale, so we write: x = p·r₁₂/ħ. The whole function, including the denominator, then reduces to (ħ/p)·e^{i∙x}/x = (ħ/p)·cos(x)/x + i·(ħ/p)·sin(x)/x, and we just need to take the absolute square of this to get the probability. All of the graphs are drawn hereunder: I’ll let you analyze them. [Note that the graphs do not include the ħ/p factor, which you may look at as yet another scaling factor.] You’ll see – I hope! – that it all makes perfect sense: the probability quickly drops off with distance, both in the positive as well as in the negative x-direction, while going to infinity when very near, i.e. for very small x. [Note that the absolute square, using cos(x)/x and sin(x)/x, yields the same graph as squaring 1/x, obviously!]
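To make that last remark concrete, here is a small sketch that evaluates the scaled function e^{i∙x}/x at a few sample points and compares its absolute square with 1/x². The function name and the sample values of x are mine, chosen for illustration only, and the ħ/p factor is left out, as in the graphs.

```python
import cmath

def scaled_propagator(x):
    """e^{i·x}/x, i.e. the propagator without the ħ/p scaling factor."""
    return cmath.exp(1j * x) / x

for x in (0.5, 1.0, 2.0, -3.0):
    amp = scaled_propagator(x)
    prob = abs(amp) ** 2        # absolute square = real part² + imaginary part²
    # The last two columns should agree: |e^{ix}/x|² = 1/x²
    print(x, amp.real, amp.imag, prob, 1.0 / x**2)
```

The real and imaginary parts oscillate, but the cos² and sin² terms always add up to one, which is why only the 1/x² envelope survives.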

Now, this propagator function is not dependent on time: it’s only the momentum that enters the argument. Of course, we assume p to be some positive real number. Of course?

This is where Feynman starts an interesting conversation. In the previous post, we studied a model in which we had two protons, and one electron jumping from one to another, as shown below.

This model told us the equilibrium state is a stable ionized hydrogen molecule (so that’s an H₂⁺ molecule), with an interproton distance that’s equal to 1 Ångstrom – so that’s like twice the size of a hydrogen atom (which we simply write as H) – and an energy that’s 2.72 eV less than the energy of a hydrogen atom and a proton (so that’s not an H₂⁺ molecule but a system consisting of a separate hydrogen atom and a proton). The why and how of that equilibrium state is illustrated below. [For more details, see my previous post.]

Now, the model implies there is a sort of attractive force pulling the two protons together even when the protons are at larger distances than 1 Å. One can see that from the graph indeed. Now, we would not associate any molecular orbital with those distances, as the system is, quite simply, not a molecule but a separate hydrogen atom and a proton. Nevertheless, the amplitude A is non-zero, and so we have an electron jumping back and forth.

We know how that works from our post on tunneling: particles can cross an energy barrier and tunnel through. One of the weird things we had to consider when a particle crosses such a potential barrier is that the momentum factor p in its wavefunction was some pure imaginary number, which we wrote as p = i·p’. We then re-wrote that wavefunction as a·e^{−i·θ} = a·e^{−i·[(E/ħ)∙t − (i·p’/ħ)·x]} = a·e^{−i·(E/ħ)∙t}·e^{i²·p’·x/ħ} = a·e^{−i·(E/ħ)∙t}·e^{−p’·x/ħ}. The e^{−p’·x/ħ} factor in this formula is a real-valued exponential function that sort of ‘kills’ our wavefunction as we move across the potential barrier, which is what is illustrated below: if the distance is too large, then the amplitude for tunneling goes to zero.

From a mathematical point of view, the analysis of our electron jumping back and forth is very similar. However, there are differences too. We can’t really analyze this in terms of a potential barrier in space. The barrier is the potential energy of the electron itself: it’s happy when it’s bound, because its energy then contributes to a reduction of the total energy of the hydrogen atomic system that is equal to the ionization energy, or the Rydberg energy as it’s called, which is no less than 13.6 eV (which, as mentioned, is pretty big at the atomic level). Well… We can take that propagator function (1/r)·e^{(i/ħ)·p∙r} (note the argument has no minus sign: it can be quite tricky!), and just fill in the value for the momentum of the electron.

Huh? What momentum? It’s got no momentum to spare. On the contrary, it wants to stay with the proton, so it has no energy whatsoever to escape. Well… Not in quantum mechanics. In quantum mechanics it can use all its potential energy and convert it into kinetic energy, so it can get away from its proton and convert the energy that’s being released into kinetic energy.

But there is no release of energy! The energy is negative!

Exactly! You’re right. So we boldly write: K.E. = m·v²/2 = p²/(2m) = −13.6 eV, and, because we’re working with complex numbers, we can take the square root of a negative number, using the definition of the imaginary unit: i = √(−1), so we get a purely imaginary value for the momentum p, which we write as:

p = ±i·√(2m·E_H)

The sign of p is chosen so it makes sense: our electron should go in one direction only. It’s going to be the plus sign. [If you’d take the negative root, you’d get a nonsensical propagator function.] To make a long story short, our propagator function becomes:

(1/r)·e^{(i/ħ)·i·√(2m·E_H)∙r} = (1/r)·e^{(i²/ħ)·√(2m·E_H)∙r} = (1/r)·e^{−(√(2m·E_H)/ħ)∙r}

Of course, from a mathematical point of view, that’s the same function as e^{−p’·x/ħ}: it’s a real-valued exponential function that quickly dies. But it’s an amplitude alright, and it’s just like an amplitude for tunneling indeed: if the distance is too large, then the amplitude goes to zero. The final cherry on the cake, of course, is to write:

A ∼ (1/r)·e^{−√(2m·E_H)/ħ∙r}

Well… No. It gets better. This amplitude is an amplitude for an electron bond between the two protons which, as we know, lowers the energy of the system. By how much? Well… By A itself. Now we know that work or energy is an integral or antiderivative of force over distance, so force is the derivative of energy with respect to the distance. So we can just take the derivative of the expression above to get the force. I’ll leave that to you as an exercise: don’t forget to use the product rule! 🙂
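It may help to put a number on how fast that exponential dies. The sketch below computes the decay constant √(2m·E_H)/ħ for an electron with E_H = 13.6 eV, and then checks a product-rule derivative of (1/r)·e^{−κ∙r} against a finite difference. The proportionality constant in A ∼ … is not given in the text, so it is simply set to 1 here, and the 2 Å sample distance and the function names are arbitrary choices of mine.

```python
import math

HBAR = 1.054571817e-34      # reduced Planck constant, J·s
M_E = 9.1093837015e-31      # electron mass, kg
EV = 1.602176634e-19        # joules per electronvolt
E_H = 13.6 * EV             # ionization energy used in the text

kappa = math.sqrt(2.0 * M_E * E_H) / HBAR    # decay constant, 1/m
print(1.0 / kappa * 1e10)   # decay length in ångström, roughly 0.53

def amplitude(r):
    """A(r) up to an unknown constant factor (set to 1 here): e^{-κr}/r."""
    return math.exp(-kappa * r) / r

def slope(r):
    """dA/dr from the product rule applied to (1/r)·e^{-κr}."""
    return -(1.0 / r**2 + kappa / r) * math.exp(-kappa * r)

r = 2.0e-10                 # sample separation of 2 Å (arbitrary)
h = 1.0e-16                 # finite-difference step, m
numeric = (amplitude(r + h) - amplitude(r - h)) / (2.0 * h)
print(slope(r), numeric)    # the two values should agree closely
```

The decay length comes out at about half an ångström, so the amplitude is already heavily suppressed at separations of a few ångström.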

So are we done? No. First, we didn’t talk about virtual particles yet! Let me do that now. However, first note that we should add one more effect in our two-proton-one-electron system: the Coulomb field (ε) caused by the bare proton will cause the hydrogen atom to take on an induced electric dipole moment (μ), so we should integrate that in our energy equation. Feynman shows how, but I won’t bother you with that here. Let’s talk about those virtual particles. What are they?

Well… There are various definitions, but Feynman’s definition is this one:

“There is an exchange of a virtual electron when–as here–the electron has to jump across a space where it would have a negative energy. More specifically, a ‘virtual exchange’ means that the phenomenon involves a quantum-mechanical interference between an exchanged state and a non-exchanged state.”

You’ll say: what’s virtual about it? The electron does go from one place to another, doesn’t it? Well… Yes and no. We can’t observe it while it’s supposed to be doing that. Our analysis just tells us it seems to be useful to distinguish two different states and analyze all in terms of those differential equations. Who knows what’s really going on? What’s actual and what’s virtual? We just have some ‘model’ here: a model for the interaction between a hydrogen atom and a proton. It explains the attraction between them in terms of a sort of continuous exchange of an electron, but is it real?

The point is: in physics, it’s assumed that the Coulomb interaction, i.e. all of electrostatics really, comes from the exchange of virtual photons: one electron, or proton, emits a photon, and then another absorbs it in the reverse of the same reaction. Furthermore, it is assumed that the amplitude for doing so is like that formula we found for the amplitude to exchange a virtual electron, except that the rest mass of a photon is zero, and so the formula reduces to 1/r. Such a simple relationship makes sense, of course, because that’s how the electrostatic potential varies in space!

That, in essence, is all there is to the quantum-mechanical theory of electromagnetism, which Feynman refers to as the ‘particle point of view’.

So… Yes. It’s that simple. Yes! For a change! 🙂

Post scriptum: Feynman’s Lecture on virtual particles is actually focused on a model for the nuclear forces. Most of it is devoted to a discussion of the virtual ‘pion’, or π-meson, which was then, when Feynman wrote his Lectures, supposed to mediate the force between two nucleons. However, this theory is clearly outdated: nuclear forces are described by quantum chromodynamics. So I’ll just skip the Yukawa theory here. It’s actually kinda strange that Yukawa’s theory, which he proposed in 1935, was the theory for nuclear forces for such a long time. Hence, it’s surely all very interesting from a historical point of view.

The math behind the maser

Pre-script (dated 26 June 2020): I have come to the conclusion one does not need all this hocus-pocus to explain masers or lasers (and two-state systems in general): classical physics will do. So no use to read this. Read my papers instead. 🙂

Original post:

As I skipped the mathematical arguments in my previous post so as to focus on the essential results only, I thought it would be good to complement that post by looking at the math once again, so as to ensure we understand what it is that we’re doing. So let’s do that now. We start with the easy situation: free space.

The two-state system in free space

We started with an ammonia molecule in free space, i.e. we assumed there were no external force fields, like a gravitational or an electromagnetic force field. Hence, the picture was as simple as the one below: the nitrogen atom could be ‘up’ or ‘down’ with regard to its spin around its axis of symmetry.

It’s important to note that this ‘up’ or ‘down’ direction is defined in regard to the molecule itself, i.e. not in regard to some external reference frame. In other words, the reference frame is that of the molecule itself. For example, if I flip the illustration above – like below – then we’re still talking the same states, i.e. the molecule is still in state 1 in the image on the left-hand side and it’s still in state 2 in the image on the right-hand side.

We then modeled the uncertainty about its state by associating two different energy levels with the molecule: E₀ + A and E₀ − A. The idea is that the nitrogen atom needs to tunnel through a potential barrier to get to the other side of the plane of the hydrogens, and that requires energy. At the same time, we’ll show the two energy levels are effectively associated with an ‘up’ or ‘down’ direction of the electric dipole moment of the molecule. So that resembles the two spin states of an electron, which we associated with the +ħ/2 and −ħ/2 energies respectively. So if E₀ would be zero (we can always take another reference point, remember?), then we’ve got the same thing: two energy levels that are separated by some definite amount: that amount is 2A for the ammonia molecule, and ħ when we’re talking quantum-mechanical spin. I should make a last note here, before I move on: note that these energies only make sense in the presence of some external field, because the + and − signs in the E₀ + A and E₀ − A and +ħ/2 and −ħ/2 expressions make sense only with regard to some external direction defining what’s ‘up’ and what’s ‘down’ really. But I am getting ahead of myself here. Let’s go back to free space: no external fields, so what’s ‘up’ or ‘down’ is completely random here. 🙂

Now, we also know an energy level can be associated with a complex-valued wavefunction, or an amplitude as we call it. To be precise, we can associate it with the generic a·e^{−(i/ħ)·(E·t − p∙x)} expression which you know so well by now. Of course, as the reference frame is that of the molecule itself, its momentum is zero, so the p∙x term in the a·e^{−(i/ħ)·(E·t − p∙x)} expression vanishes and the wavefunction reduces to a·e^{−i·ω·t} = a·e^{−(i/ħ)·E·t}, with ω = E/ħ. In other words, the energy level determines the temporal frequency, or the temporal variation (as opposed to the spatial frequency or variation), of the amplitude.

We then had to find the amplitudes C₁(t) = 〈 1 | ψ 〉 and C₂(t) = 〈 2 | ψ 〉, so that’s the amplitude to be in state 1 or state 2 respectively. In my post on the Hamiltonian, I explained why the dynamics of a situation like this can be represented by the following set of differential equations:

iħ·[dC₁/dt] = H₁₁·C₁ + H₁₂·C₂
iħ·[dC₂/dt] = H₂₁·C₁ + H₂₂·C₂

As mentioned, the C₁ and C₂ functions evolve in time, and so we should write them as C₁ = C₁(t) and C₂ = C₂(t) respectively. In fact, our Hamiltonian coefficients may also evolve in time, which is why it may be very difficult to solve those differential equations! However, as I’ll show below, one usually assumes they are constant, and then one makes informed guesses about them so as to find a solution that makes sense.

Now, I should remind you here of something you surely know: if C₁ and C₂ are solutions to this set of differential equations, then the superposition principle tells us that any linear combination a·C₁ + b·C₂ will also be a solution. So we need one or more extra conditions, usually some starting condition, which we can combine with a normalization condition, so we can get some unique solution that makes sense.

The H_ij coefficients are referred to as Hamiltonian coefficients and, as shown in the mentioned post, the H₁₁ and H₂₂ coefficients are related to the amplitude of the molecule staying in state 1 and state 2 respectively, while the H₁₂ and H₂₁ coefficients are related to the amplitude of the molecule going from state 1 to state 2 and vice versa. Because of the perfect symmetry of the situation here, it’s easy to see that H₁₁ should equal H₂₂, and that H₁₂ and H₂₁ should also be equal to each other. Indeed, Nature doesn’t care what we call state 1 or 2 here: as mentioned above, we did not define the ‘up’ and ‘down’ direction with respect to some external direction in space, so the molecule can have any orientation and, hence, switching the i and j indices should not make any difference. So that’s one clue, at least, that we can use to solve those equations: the perfect symmetry of the situation and, hence, the perfect symmetry of the Hamiltonian coefficients (in this case, at least!).

The other clue is to think about the solution if we’d not have two states but one state only. In that case, we’d need to solve iħ·[dC₁(t)/dt] = H₁₁·C₁(t). That’s simple enough, because you’ll remember that the exponential function is its own derivative. To be precise, we write: d(a·e^{iωt})/dt = a·d(e^{iωt})/dt = a·iω·e^{iωt}, and please note that a can be any complex number: we’re not necessarily talking a real number here! In fact, we’re likely to talk complex coefficients, and we multiply with some other complex number (iω) anyway here! So if we write iħ·[dC₁/dt] = H₁₁·C₁ as dC₁/dt = −(i/ħ)·H₁₁·C₁ (remember: i⁻¹ = 1/i = −i), then it’s easy to see that the C₁ = a·e^{–(i/ħ)·H₁₁·t} function is the general solution for this differential equation. Let me write it out for you, just to make sure:

dC₁/dt = d[a·e^{–(i/ħ)H₁₁t}]/dt = a·d[e^{–(i/ħ)H₁₁t}]/dt = –a·(i/ħ)·H₁₁·e^{–(i/ħ)H₁₁t}

= –(i/ħ)·H₁₁·a·e^{–(i/ħ)H₁₁t} = −(i/ħ)·H₁₁·C₁

Of course, that reminds us of our generic a·e^{−(i/ħ)·E₀·t} wavefunction: we only need to equate H₁₁ with E₀ and we’re done! Hence, in a one-state system, the Hamiltonian coefficient is, quite simply, equal to the energy of the system. In fact, that’s a result that can be generalized, as we’ll see below, and so that’s why Feynman says the Hamiltonian ought to be called the energy matrix.
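For those who prefer a numerical check over the algebra, the following sketch plugs C₁ = a·e^{–(i/ħ)·H₁₁·t} into iħ·[dC₁/dt] = H₁₁·C₁ and compares both sides. Everything numerical here is an assumption of mine for illustration: natural units with ħ = 1, an arbitrary value for H₁₁, and an arbitrary complex coefficient a.

```python
import cmath

HBAR = 1.0          # natural units, an assumption for readability
H11 = 2.5           # arbitrary energy value
a = 0.6 - 0.8j      # arbitrary complex coefficient

def c1(t):
    """One-state solution C1(t) = a·e^{-(i/ħ)·H11·t}."""
    return a * cmath.exp(-1j * H11 * t / HBAR)

t, dt = 0.7, 1e-6
# Left-hand side iħ·dC1/dt, with the derivative taken as a central difference
lhs = 1j * HBAR * (c1(t + dt) - c1(t - dt)) / (2.0 * dt)
rhs = H11 * c1(t)
print(abs(lhs - rhs))    # small: only finite-difference error remains
```

Changing a to any other complex number leaves the agreement intact, which is the point made above: the coefficient is not fixed by the differential equation itself.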

In fact, we actually may have two states that are entirely uncoupled, i.e. a system in which there is no dependence of C₁ on C₂ and vice versa. In that case, the two equations reduce to:

iħ·[dC₁/dt] = H₁₁·C₁ and iħ·[dC₂/dt] = H₂₂·C₂

These do not form a coupled system and, hence, their solutions are independent:

C₁(t) = a·e^{–(i/ħ)·H₁₁·t} and C₂(t) = b·e^{–(i/ħ)·H₂₂·t}

The symmetry of the situation suggests we should equate a and b, and then the normalization condition says that the probabilities have to add up to one, so |C₁(t)|² + |C₂(t)|² = 1, so we’ll find that a = b = 1/√2.

OK. That’s simple enough, and this story has become quite long, so we should wrap it up. The two ‘clues’ – about symmetry and about the Hamiltonian coefficients being energy levels – lead Feynman to suggest that the Hamiltonian matrix for this particular case should be equal to:

H₁₁ = E₀ and H₁₂ = −A
H₂₁ = −A and H₂₂ = E₀

Why? Well… It’s just one of Feynman’s clever guesses, and it yields probability functions that make sense, i.e. they actually describe something real. That’s all. 🙂 I am only half-joking, because it’s a trial-and-error process indeed and, as I’ll explain in a separate section in this post, one needs to be aware of the various approximations involved when doing this stuff. So let’s be explicit about the reasoning here:

We know that H₁₁ = H₂₂ = E₀ if the two states would be identical. In other words, if we’d have only one state, rather than two – i.e. if H₁₂ and H₂₁ would be zero – then we’d just plug that in. So that’s what Feynman does. So that’s what we do here too! 🙂
However, H₁₂ and H₂₁ are not zero, of course, and so we assume there’s some amplitude to go from one position to the other by tunneling through the energy barrier and flipping to the other side. Now, we need to assign some value to that amplitude and so we’ll just assume that the energy that’s needed for the nitrogen atom to tunnel through the energy barrier and flip to the other side is equal to A. So we equate H₁₂ and H₂₁ with −A.

Of course, you’ll wonder: why minus A? Why wouldn’t we try H₁₂ = H₂₁ = A? Well… I could say that a particle usually loses potential energy as it moves from one place to another, but… Well… Think about it. Once it’s through, it’s through, isn’t it? And so then the energy is just E₀ again. Indeed, if there’s no external field, the + or − sign is quite arbitrary. So what do we choose? The answer is: when considering our molecule in free space, it doesn’t matter. Using +A or −A yields the same probabilities. Indeed, let me give you the amplitudes we get for H₁₁ = H₂₂ = E₀ and H₁₂ = H₂₁ = −A:

C₁(t) = 〈 1 | ψ 〉 = (1/2)·e^{−(i/ħ)·(E₀ − A)·t} + (1/2)·e^{−(i/ħ)·(E₀ + A)·t} = e^{−(i/ħ)·E₀·t}·cos[(A/ħ)·t]
C₂(t) = 〈 2 | ψ 〉 = (1/2)·e^{−(i/ħ)·(E₀ − A)·t} – (1/2)·e^{−(i/ħ)·(E₀ + A)·t} = i·e^{−(i/ħ)·E₀·t}·sin[(A/ħ)·t]

[In case you wonder how we go from those exponentials to a simple sine and cosine factor, remember that the sum of complex conjugates, i.e. e^{iθ} + e^{−iθ}, reduces to 2·cosθ, while e^{iθ} − e^{−iθ} reduces to 2·i·sinθ.]

Now, it’s easy to see that, if we’d have used +A rather than −A, we would have gotten something very similar:

C₁(t) = 〈 1 | ψ 〉 = (1/2)·e^{−(i/ħ)·(E₀ + A)·t} + (1/2)·e^{−(i/ħ)·(E₀ − A)·t} = e^{−(i/ħ)·E₀·t}·cos[(A/ħ)·t]
C₂(t) = 〈 2 | ψ 〉 = (1/2)·e^{−(i/ħ)·(E₀ + A)·t} – (1/2)·e^{−(i/ħ)·(E₀ − A)·t} = −i·e^{−(i/ħ)·E₀·t}·sin[(A/ħ)·t]

So we get a minus sign in front of our C₂(t) function, because cos(–α) = cos(α) but sin(–α) = −sin(α). However, the associated probabilities are exactly the same. For both, we get the same P₁(t) and P₂(t) functions:

P₁(t) = |C₁(t)|² = cos²[(A/ħ)·t]
P₂(t) = |C₂(t)|² = sin²[(A/ħ)·t]

[Remember: the absolute square of i and −i is |i|² = (√1)² = +1 and |−i|² = |−1|²·|i|² = +1 respectively, so the i and −i in the two C₂(t) formulas disappear.]
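The claim that the sign of A does not affect the probabilities is easy to verify with a few lines of code. The sketch below uses the closed-form amplitudes given above; the value of E₀ is arbitrary (it drops out of the probabilities anyway), A is set to 0.5×10⁻⁴ eV as a representative value, and the function name and the `h12_sign` parameter are my own labels.

```python
import cmath
import math

HBAR = 6.582119569e-16    # reduced Planck constant, eV·s

def amplitudes(t, e0, a, h12_sign=-1):
    """C1(t), C2(t) for H11 = H22 = e0 and H12 = H21 = h12_sign·a,
    with the molecule starting in state 1 (closed-form solution)."""
    phase = cmath.exp(-1j * e0 * t / HBAR)
    c1 = phase * math.cos(a * t / HBAR)
    c2 = -h12_sign * 1j * phase * math.sin(a * t / HBAR)
    return c1, c2

E0, A = 1.0, 0.5e-4       # eV; E0 is an arbitrary illustrative value
for t in (0.0, 5e-12, 10e-12, 20e-12):
    minus = amplitudes(t, E0, A, h12_sign=-1)
    plus = amplitudes(t, E0, A, h12_sign=+1)
    p_minus = [abs(c) ** 2 for c in minus]
    p_plus = [abs(c) ** 2 for c in plus]
    print(t, p_minus, p_plus)   # same probabilities, summing to 1
```

The C₂ amplitudes themselves differ by a factor of −1 between the two choices, but that factor has absolute value 1 and so never shows up in P₂(t).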

You’ll remember the graph:

Of course, you’ll say: that plus or minus sign in front of C₂(t) should matter somehow, shouldn’t it? Well… Think about it. Taking the absolute square of some complex number – or some complex function, in this case! – amounts to multiplying it with its complex conjugate. Because the complex conjugate of a product is the product of the complex conjugates, it’s easy to see what happens: the e^{−(i/ħ)·E₀·t} factor in C₁(t) = e^{−(i/ħ)·E₀·t}·cos[(A/ħ)·t] and C₂(t) = ±i·e^{−(i/ħ)·E₀·t}·sin[(A/ħ)·t] gets multiplied by e^{+(i/ħ)·E₀·t} and, hence, doesn’t matter: e^{−(i/ħ)·E₀·t}·e^{+(i/ħ)·E₀·t} = e⁰ = 1. The cosine factor in C₁(t) = e^{−(i/ħ)·E₀·t}·cos[(A/ħ)·t] is real, and so its complex conjugate is the same. Now, the ±i·sin[(A/ħ)·t] factor in C₂(t) = ±i·e^{−(i/ħ)·E₀·t}·sin[(A/ħ)·t] is a pure imaginary number, and so its complex conjugate is its opposite. For some reason, we’ll find similar solutions for all of the situations we’ll describe below: the factor determining the probability will either be real or, else, a pure imaginary number. Hence, from a math point of view, it really doesn’t matter if we take +A or −A as our real factor for those H₁₂ and H₂₁ coefficients. We just need to be consistent in our choice, and I must assume that, in order to be consistent, Feynman likes to think of our nitrogen atom borrowing some energy from the system and, hence, temporarily changing its energy by an amount that’s equal to −A. If you have a better interpretation, please do let me know! 🙂

OK. We’re done with this section… Except… Well… I have to show you how we got those C₁(t) and C₂(t) functions, no? Let me copy Feynman here:

Note that the ‘trick’ involving the addition and subtraction of the differential equations is a trick we’ll use quite often, so please do have a look at it. As for the value of the a and b coefficients – which, as you can see, we’ve equated to 1 in our solutions for C₁(t) and C₂(t) – we get those because of the following starting condition: we assume that at t = 0, the molecule will be in state 1. Hence, we assume C₁(0) = 1 and C₂(0) = 0. In other words: we assume that we start out on that P₁(t) curve in that graph with the probability functions above, so the C₁(0) = 1 and C₂(0) = 0 starting condition is equivalent to P₁(0) = 1 and P₂(0) = 0. Plugging that in gives us a/2 + b/2 = 1 and a/2 − b/2 = 0, which is possible only if a = b = 1.

Of course, you’ll say: what if we’d choose to start out with state 2, so our starting condition is P₁(0) = 0 and P₂(0) = 1? Then a = 1 and b = −1, and we get the solution we got when equating H₁₂ and H₂₁ with +A, rather than with −A. So you can think about that symmetry once again: when we’re in free space, then it’s quite arbitrary what we call ‘up’ or ‘down’.
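If you don’t want to take the closed-form solution on faith, you can also integrate the two coupled equations numerically. The following is a rough sketch only: it assumes H₁₁ = H₂₂ = E₀ and H₁₂ = H₂₁ = −A as stated above, works in units where ħ = 1, uses arbitrary values E₀ = 2 and A = 1, and applies a standard fourth-order Runge-Kutta scheme (my choice, not Feynman’s) with the C₁(0) = 1 and C₂(0) = 0 starting condition.

```python
import math

def derivs(c1, c2, e0, a, hbar=1.0):
    """Right-hand sides of iħ·dC1/dt = E0·C1 − A·C2 and iħ·dC2/dt = −A·C1 + E0·C2."""
    dc1 = (e0 * c1 - a * c2) / (1j * hbar)
    dc2 = (-a * c1 + e0 * c2) / (1j * hbar)
    return dc1, dc2

def evolve(t_end, e0=2.0, a=1.0, steps=2000):
    """Fourth-order Runge-Kutta integration starting from C1 = 1, C2 = 0."""
    c1, c2 = 1.0 + 0.0j, 0.0 + 0.0j
    dt = t_end / steps
    for _ in range(steps):
        k1 = derivs(c1, c2, e0, a)
        k2 = derivs(c1 + 0.5 * dt * k1[0], c2 + 0.5 * dt * k1[1], e0, a)
        k3 = derivs(c1 + 0.5 * dt * k2[0], c2 + 0.5 * dt * k2[1], e0, a)
        k4 = derivs(c1 + dt * k3[0], c2 + dt * k3[1], e0, a)
        c1 += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        c2 += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return c1, c2

t = 1.0
c1, c2 = evolve(t)
print(abs(c1) ** 2, math.cos(t) ** 2)   # P1 versus cos²[(A/ħ)·t]
print(abs(c2) ** 2, math.sin(t) ** 2)   # P2 versus sin²[(A/ħ)·t]
```

With ħ = 1 and A = 1, the argument (A/ħ)·t is just t, so the numerical probabilities should track cos²(t) and sin²(t).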

So… Well… That’s all great. I should, perhaps, just add one more note, and that’s on that A/ħ value. We calculated it in the previous post, because we wanted to actually calculate the period of those P₁(t) and P₂(t) functions. Because we’re talking the square of a cosine and a sine respectively, the period is equal to π, rather than 2π, so we wrote: (A/ħ)·T = π ⇔ T = π·ħ/A. Now, the separation between the two energy levels E₀ + A and E₀ − A, so that’s 2A, has been measured as being equal, more or less, to 2A ≈ 10⁻⁴ eV.

How does one measure that? As mentioned above, I’ll show you, in a moment, that, when applying some external field, the plus and minus signs do matter, and the separation between those two energy levels E₀ + A and E₀ − A will effectively represent something physical. More in particular, we’ll have transitions from one energy level to another and that corresponds to electromagnetic radiation being emitted or absorbed, and so there’s a relation between the energy and the frequency of that radiation. To be precise, we can write 2A = h·f₀. The frequency of the radiation that’s being absorbed or emitted is 23.79 GHz, which corresponds to microwave radiation with a wavelength of λ = c/f₀ = 1.26 cm. Hence, 2·A ≈ 25×10⁹ Hz times 4×10⁻¹⁵ eV·s = 10⁻⁴ eV, indeed, and, therefore, we can write: T = π·ħ/A ≈ 3.14 × 6.6×10⁻¹⁶ eV·s divided by 0.5×10⁻⁴ eV, so that’s 40×10⁻¹² seconds = 40 picoseconds. That’s 40 trillionths of a second. So that’s very short, and surely much shorter than the time that’s associated with, say, a freely emitting sodium atom, which is of the order of 3.2×10⁻⁸ seconds. You may think that makes sense, because the photon energy is so much lower: a sodium light photon is associated with an energy equal to E = h·f = 500×10¹² Hz times 4×10⁻¹⁵ eV·s = 2 eV, so that’s 20,000 times 10⁻⁴ eV.

There’s a funny thing, however. An oscillation of a frequency of 500 tera-hertz that lasts 3.2×10⁻⁸ seconds is equivalent to 500×10¹² Hz times 3.2×10⁻⁸ s ≈ 16 million cycles. However, an oscillation of a frequency of 23.79 giga-hertz that only lasts 40×10⁻¹² seconds is equivalent to 23.79×10⁹ Hz times 40×10⁻¹² s ≈ 1000×10⁻³ = 1! One cycle only? We’re surely not talking resonance here!
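The arithmetic above is easy to redo with more digits. The sketch below takes the 23.79 GHz frequency and the sodium figures from the text, uses standard values for the Planck constant, and computes 2A, the period T = π·ħ/A, and the number of cycles; the variable names are mine.

```python
import math

H_EV = 4.135667696e-15              # Planck constant, eV·s
HBAR_EV = H_EV / (2.0 * math.pi)    # reduced Planck constant, eV·s

f0 = 23.79e9                        # transition frequency quoted in the text, Hz
two_a = H_EV * f0                   # 2A = h·f0, in eV
period = math.pi * HBAR_EV / (two_a / 2.0)   # T = π·ħ/A, in seconds
print(two_a)                        # about 1e-4 eV
print(period * 1e12)                # in picoseconds, about 42
print(f0 * period)                  # cycles of radiation per period T

sodium_cycles = 500e12 * 3.2e-8     # sodium figures from the text
print(sodium_cycles)                # about 16 million
```

Note that the ‘one cycle’ result is not a numerical coincidence: substituting A = h·f₀/2 into T = π·ħ/A gives T = 1/f₀ exactly, so f₀·T equals 1 whatever the frequency is. That may be worth keeping in mind when thinking about the puzzle flagged here.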

So… Well… I am just flagging it here. We’ll have to do some more thinking about that later. [I’ve added an addendum that may or may not help us in this regard. :-)]

The two-state system in a field

As mentioned above, when there is no external force field, the ‘up’ or ‘down’ direction of the nitrogen atom is defined with regard to its spin around its axis of symmetry, so with regard to the molecule itself. However, when we apply an external electromagnetic field, as shown below, we do have some external reference frame.

Now, the external reference frame – i.e. the physics of the situation, really – may make it more convenient to define the whole system using another set of base states, which we’ll refer to as I and II, rather than 1 and 2. Indeed, you’ve seen the picture below: it shows a state selector, or a filter as we called it. In this case, there’s a filtering according to whether our ammonia molecule is in state I or, alternatively, state II. It’s like a Stern-Gerlach apparatus splitting an electron beam according to the spin state of the electrons, which is ‘up’ or ‘down’ too, but in a totally different way than our ammonia molecule. Indeed, the ‘up’ and ‘down’ spin of an electron has to do with its magnetic moment and its angular momentum. However, there are a lot of similarities here, and so you may want to compare the two situations indeed, i.e. the electron beam in an inhomogeneous magnetic field versus the ammonia beam in an inhomogeneous electric field.

Now, when reading Feynman, as he walks us through the relevant Lecture on all of this, you get the impression that it’s the I and II states only that have some kind of physical or geometric interpretation. That’s not the case. Of course, the diagram of the state selector above makes it very obvious that these new I and II base states make very much sense in regard to the orientation of the field, i.e. with regard to external space, rather than with respect to the position of our nitrogen atom vis-à-vis the hydrogens. But… Well… Look at the image below: the direction of the field (which we denote by ε because we’ve been using the E for energy) obviously matters when defining the old ‘up’ and ‘down’ states of our nitrogen atom too!

In other words, our previous | 1 〉 and | 2 〉 base states acquire a new meaning too: it obviously matters whether or not the electric dipole moment of the molecule is in the same or, conversely, in the opposite direction of the field. To be precise, the presence of the electromagnetic field suddenly gives the energy levels that we’d associate with these two states a very different physical interpretation.

Indeed, from the illustration above, it’s easy to see that the electric dipole moment of this particular molecule in state 1 is in the direction opposite to the field and, therefore, temporarily ignoring the amplitude to flip over (so we do not think of A for just a brief little moment), the energy that we’d associate with state 1 would be equal to E₀ + με. Likewise, the energy we’d associate with state 2 is equal to E₀ − με. Indeed, you’ll remember that the (potential) energy of an electric dipole is equal to the vector dot product of the electric dipole moment μ and the field vector ε, but with a minus sign in front so as to get the sign for the energy right. So the energy is equal to −μ·ε = −|μ|·|ε|·cosθ, with θ the angle between both vectors. Now, the illustration above makes it clear that state 1 and 2 are defined for θ = π and θ = 0 respectively. [And, yes! Please do note that state 1 is the highest energy level, because it’s associated with the highest potential energy: the electric dipole moment μ of our ammonia molecule will – obviously! – want to align itself with the electric field ε! Just think of what it would imply to turn the molecule in the field!]
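The sign bookkeeping is the part that is easy to get wrong, so here is a tiny sketch of the −|μ|·|ε|·cosθ formula for the two orientations. The magnitudes are set to 1 in arbitrary units and E₀ to zero, purely for illustration; the function name is mine.

```python
import math

def dipole_energy(mu, eps, theta):
    """Potential energy −μ·ε·cosθ of a dipole of magnitude mu in a field of magnitude eps."""
    return -mu * eps * math.cos(theta)

E0 = 0.0            # reference energy, arbitrary
mu, eps = 1.0, 1.0  # illustrative magnitudes in arbitrary units

state_1 = E0 + dipole_energy(mu, eps, math.pi)  # dipole against the field (θ = π)
state_2 = E0 + dipole_energy(mu, eps, 0.0)      # dipole along the field (θ = 0)
print(state_1, state_2)   # E0 + με and E0 − με respectively
```

State 1 comes out higher than state 2, in line with the remark that the dipole wants to align itself with the field.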

Therefore, using the same hunches as the ones we used in the free space example, Feynman suggests that, when some external electric field is involved, we should use the following Hamiltonian matrix:

H₁₁ = E₀ + με and H₁₂ = −A
H₂₁ = −A and H₂₂ = E₀ − με

So we’ll need to solve a similar set of differential equations with this Hamiltonian now. We’ll do that later and, as mentioned above, it will be more convenient to switch to another set of base states, or another ‘representation’ as it’s referred to. But… Well… Let’s not get too much ahead of ourselves: I’ll say something about that before we’ll start solving the thing, but let’s first look at that Hamiltonian once more.

When I say that Feynman uses the same clues here, then… Well… That’s true and not true. You should note that the diagonal elements in the Hamiltonian above are not the same: E₀ + με ≠ E₀ − με. So we’ve lost that symmetry of free space which, from a math point of view, was reflected in those identical H₁₁ = H₂₂ = E₀ coefficients.

That should be obvious from what I write above: state 1 and state 2 are no longer those 1 and 2 states we described when looking at the molecule in free space. Indeed, the | 1 〉 and | 2 〉 states are still ‘up’ or ‘down’, but the illustration above also makes it clear we’re defining state 1 and state 2 not only with respect to the molecule’s spin around its own axis of symmetry but also vis-à-vis some direction in space. To be precise, we’re defining state 1 and state 2 here with respect to the direction of the electric field ε. Now that makes a really big difference in terms of interpreting what’s going on.

In fact, the ‘splitting’ of the energy levels because of that amplitude A is now something physical too, i.e. something that goes beyond just modeling the uncertainty involved. In fact, we’ll find it convenient to distinguish two new energy levels, which we’ll write as E_I = E₀ + A and E_II = E₀ − A respectively. They are, of course, related to those new base states | I 〉 and | II 〉 that we’ll want to use. So the E₀ + A and E₀ − A energy levels themselves will acquire some physical meaning, and especially the separation between them, i.e. the value of 2A. Indeed, E_I = E₀ + A and E_II = E₀ − A will effectively represent an ‘upper’ and a ‘lower’ energy level respectively.

But, again, I am getting ahead of myself. Let’s first, as part of working towards a solution for our equations, look at what happens if and when we’d switch to another representation indeed.

Switching to another representation

Let me remind you of what I wrote in my post on quantum math in this regard. The actual state of our ammonia molecule – or any quantum-mechanical system really – is always to be described in terms of a set of base states. For example, if we have two possible base states only, we’ll write:

| φ 〉 = | 1 〉 C₁ + | 2 〉 C₂

You’ll say: why? Our molecule is obviously always in either state 1 or state 2, isn’t it? Well… Yes and no. That’s the mystery of quantum mechanics: it is and it isn’t. As long as we don’t measure it, there is an amplitude for it to be in state 1 and an amplitude for it to be in state 2. So we can only make sense of its state by actually calculating 〈 1 | φ 〉 and 〈 2 | φ 〉 which, unsurprisingly are equal to 〈 1 | φ 〉 = 〈 1 | 1 〉 C₁ + 〈 1 | 2 〉 C₂ = C₁(t) and 〈 2 | φ 〉 = 〈 2 | 1 〉 C₁ + 〈 2 | 2 〉 C₂ = C₂(t) respectively, and so these two functions give us the probabilities P₁(t) and P₂(t) respectively. So that’s Schrödinger’s cat really: the cat is dead or alive, but we don’t know until we open the box, and we only have a probability function – so we can say that it’s probably dead or probably alive, depending on the odds – as long as we do not open the box. It’s as simple as that.

Now, the ‘dead’ and ‘alive’ conditions are, obviously, the ‘base states’ in Schrödinger’s rather famous example, and we can write them as | DEAD 〉 and | ALIVE 〉. You’d agree it would be difficult to find another representation. For example, it doesn’t make much sense to say that we’ve rotated the two base states over 90 degrees and we now have two new states equal to (1/√2)·| DEAD 〉 – (1/√2)·| ALIVE 〉 and (1/√2)·| DEAD 〉 + (1/√2)·| ALIVE 〉 respectively. There’s no direction in space in regard to which we’re defining those two base states: dead is dead, and alive is alive.

The situation really resembles our ammonia molecule in free space: there’s no external reference against which to define the base states. However, as soon as some external field is involved, we do have a direction in space and, as mentioned above, our base states are now defined with respect to a particular orientation in space. That implies two things. The first is that we should no longer say that our molecule will always be in either state 1 or state 2. There’s no reason for it to be perfectly aligned with or against the field. Its orientation can be anything really, and so its state is likely to be some combination of those two pure base states | 1 〉 and | 2 〉.

The second thing is that we may choose another set of base states, and specify the very same state in terms of the new base states. So, assuming we choose some other set of base states | I 〉 and | II 〉, we can write the very same state | φ 〉 = | 1 〉 C₁ + | 2 〉 C₂ as:

| φ 〉 = | I 〉 C_I + | II 〉 C_II

It’s really like what you learned about vectors in high school: one can go from one set of base vectors to another by a transformation, such as, for example, a rotation, or a translation. It’s just that, just like in high school, we need some direction in regard to which we define our rotation or our translation.

For state vectors, I showed how a rotation of base states worked in one of my posts on two-state systems. To be specific, we had the following relation between the two representations:

The (1/√2) factor is there because of the normalization condition, and the two-by-two matrix equals the transformation matrix for a rotation of a state filtering apparatus about the y-axis, over an angle equal to (minus) 90 degrees, which we wrote as:

The y-axis? What y-axis? What state filtering apparatus? Just relax. Think about what you’ve learned already. The orientations are shown below: the S apparatus separates ‘up’ and ‘down’ states along the z-axis, while the T-apparatus does so along an axis that is tilted, about the y-axis, over an angle equal to α, or φ, as it’s written in the table above.

Of course, we don’t really introduce an apparatus at this or that angle. We just introduced an electromagnetic field, which re-defined our | 1 〉 and | 2 〉 base states and, therefore, through the rotational transformation matrix, also defines our | I 〉 and | II 〉 base states.

[…] You may have lost me by now, and so then you’ll want to skip to the next section. That’s fine. Just remember that the representations in terms of | I 〉 and | II 〉 base states or in terms of | 1 〉 and | 2 〉 base states are mathematically equivalent. Having said that, if you’re reading this post, and you want to understand it, truly (because you want to truly understand quantum mechanics), then you should try to stick with me here. 🙂 Indeed, there’s a zillion things you could think about right now, but you should stick to the math now. Using that transformation matrix, we can relate the C_I and C_II coefficients in the | φ 〉 = | I 〉 C_I + | II 〉 C_II expression to the C₁ and C₂ coefficients in the | φ 〉 = | 1 〉 C₁ + | 2 〉 C₂ expression. Indeed, we wrote:

C_I= 〈 I | ψ 〉 = (1/√2)·(C₁− C₂)
C_II= 〈 II | ψ 〉 = (1/√2)·(C₁+ C₂)

That’s exactly the same as writing:

OK. […] Waw! You just took a huge leap, because we can now compare the two sets of differential equations:

They’re mathematically equivalent, but the mathematical behavior of the functions involved is very different. Indeed, unlike the C₁(t) and C₂(t) amplitudes, we find that the C_I(t) and C_II(t) amplitudes are stationary, i.e. the associated probabilities – which we find by taking the absolute square of the amplitudes, as usual – do not vary in time. To be precise, if you write it all out and simplify, you’ll find that the C_I(t) and C_II(t) amplitudes are equal to:

C_I(t) = 〈 I | ψ 〉 = (1/√2)·(C₁− C₂) = (1/√2)·e^{−(i/ħ)·(E₀+ A)·t} = (1/√2)·e^{−(i/ħ)·E_I·t}
C_II(t) = 〈 II | ψ 〉 = (1/√2)·(C₁+ C₂) = (1/√2)·e^{−(i/ħ)·(E₀− A)·t}= (1/√2)·e^{−(i/ħ)·E_II·t}

As the absolute square of the exponential is equal to one, the associated probabilities, i.e. |C_I(t)|² and |C_II(t)|², are, quite simply, equal to |1/√2|² = 1/2. Now, it is very tempting to say that this means that our ammonia molecule has an equal chance to be in state I or state II. In fact, while I may have said something like that in my previous posts, that’s not how one should interpret this. The chance of our molecule being exactly in state I or state II, or in state 1 or state 2 is varying with time, with the probability being ‘dumped’ from one state to the other all of the time.
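If you want to check this numerically, here is a minimal Python sketch. It assumes the free-space solution for a molecule that starts out in state 1, i.e. C₁(t) = e^{−(i/ħ)·E₀·t}·cos(A·t/ħ) and C₂(t) = i·e^{−(i/ħ)·E₀·t}·sin(A·t/ħ), which is consistent with the C_I(t) and C_II(t) expressions above. The values for ħ, E₀ and A, and the function names, are arbitrary choices of mine, made for illustration only.

```python
import cmath
import math

HBAR = 1.0  # natural units (an assumption of this sketch)
E0 = 5.0    # arbitrary average energy (assumption)
A = 1.0     # arbitrary flip amplitude (assumption)


def amplitudes_12(t):
    """C1(t) and C2(t) for a molecule that is in state 1 at t = 0."""
    common = cmath.exp(-1j * E0 * t / HBAR)
    c1 = common * math.cos(A * t / HBAR)
    c2 = common * 1j * math.sin(A * t / HBAR)
    return c1, c2


def amplitudes_I_II(t):
    """Same state, expressed in the | I > and | II > representation."""
    c1, c2 = amplitudes_12(t)
    c_I = (c1 - c2) / math.sqrt(2)
    c_II = (c1 + c2) / math.sqrt(2)
    return c_I, c_II


for t in (0.0, 0.4, 1.3, 2.7):
    c1, c2 = amplitudes_12(t)
    c_I, c_II = amplitudes_I_II(t)
    # P1 and P2 slosh back and forth; P_I and P_II stay put.
    print(f"t={t:.1f}  P1={abs(c1)**2:.3f}  P2={abs(c2)**2:.3f}  "
          f"P_I={abs(c_I)**2:.3f}  P_II={abs(c_II)**2:.3f}")
```

The P₁ and P₂ columns vary with time, while the P_I and P_II columns should stay at one half throughout, which is all that ‘stationary’ means here.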

I mean… The electric dipole moment can point in any direction, really. So saying that our molecule has a 50/50 chance of being in state 1 or state 2 makes no sense. Likewise, saying that our molecule has a 50/50 chance of being in state I or state II makes no sense either. Indeed, the state of our molecule is specified by the | φ 〉 = | I 〉 C_I + | II 〉 C_II = | 1 〉 C₁ + | 2 〉 C₂ equations, and neither of these two expressions is a stationary state. They mix two frequencies, because they mix two energy levels.

Having said that, we’re talking quantum mechanics here and, therefore, an external inhomogeneous electric field will effectively split the ammonia molecules according to their state. The situation is really like what a Stern-Gerlach apparatus does to a beam of electrons: it will split the beam according to the electron’s spin, which is either ‘up’ or, else, ‘down’, as shown in the graph below:

The graph for our ammonia molecule, shown below, is very similar. The vertical axis measures the same: energy. And the horizontal axis measures με, which increases with the strength of the electric field ε. So we see a similar ‘splitting’ of the energy of the molecule in an external electric field.

How should we explain this? It is very tempting to think that the presence of an external force field causes the electrons, or the ammonia molecule, to ‘snap into’ one of the two possible states, which are referred to as state I and state II respectively in the illustration of the ammonia state selector below. But… Well… Here we’re entering the murky waters of actually interpreting quantum mechanics, for which (a) we have no time, and (b) we are not qualified. So you should just believe, or take for granted, what’s being shown here: an inhomogeneous electric field will split our ammonia beam according to their state, which we define as I and II respectively, and which are associated with the energy E₀+ A and E₀− A respectively.


As mentioned above, you should note that these two states are stationary. The Hamiltonian equations which, as they always do, describe the dynamics of this system, imply that the amplitude to go from state I to state II, or vice versa, is zero. To make sure you ‘get’ that, I reproduce the associated Hamiltonian matrix once again:

Of course, that will change when we start our analysis of what’s happening in the maser. Indeed, we will have some non-zero H_I,II and H_II,I amplitudes in the resonant cavity of our ammonia maser, in which we’ll have an oscillating electric field and, as a result, induced transitions from state I to II and vice versa. However, that’s for later. While I’ll quickly insert the full picture diagram below, you should, for the moment, just think about those two stationary states and those two zeroes. 🙂

Capito? If not… Well… Start reading this post again, I’d say. 🙂

Intermezzo: on approximations

At this point, I need to say a few things about all of the approximations involved, because it can be quite confusing indeed. So let’s take a closer look at those energy levels and the related Hamiltonian coefficients. In fact, in his Lectures, Feynman shows us that we can always have a general solution for the Hamiltonian equations describing a two-state system whenever we have constant Hamiltonian coefficients. That general solution – which, mind you, is derived assuming Hamiltonian coefficients that do not depend on time – can always be written in terms of two stationary base states, i.e. states with a definite energy and, hence, a constant probability. The equations, and the two definite energy levels are:

That yields the following values for the energy levels for the stationary states:

Now, that’s very different from the E_I= E₀+ A and E_II= E₀− A energy levels for those stationary states we had defined in the previous section: those stationary states had no square root, and no μ²ε², in their energy. In fact, that sort of answers the question: if there’s no external field, then that μ²ε² factor is zero, and the square root in the expression becomes ±√A²= ±A. So then we’re back to our E_I= E₀+ A and E_II= E₀− A formulas. The whole point, however, is that we will actually have an electric field in that cavity. Moreover, it’s going to be a field that varies in time, which we’ll write:

Now, part of the confusion in Feynman’s approach is that he constantly switches between representing the system in terms of the I and II base states and the 1 and 2 base states respectively. For a good understanding, we should compare with our original representation of the dynamics in free space, for which the Hamiltonian was the following one:

That matrix can easily be related to the new one we’re going to have to solve, which is equal to:

The interpretation is easy if we look at that illustration again:

If the direction of the electric dipole moment μ is opposite to the direction of the field ε, as it is for state 1, then the associated energy is equal to the (negative) dot product of the two vectors, i.e. −μ·ε = −|μ|·|ε|·cosθ = −|μ|·|ε|·cos(π) = +με. Conversely, for state 2, we find −|μ|·|ε|·cos(0) = −με for the energy that’s associated with the dipole moment. You can and should think about the physics involved here, because they make sense! Thinking of amplitudes, you should note that the +με and −με terms effectively change the H₁₁ and H₂₂ coefficients, so they change the amplitude to stay in state 1 or state 2 respectively. That, of course, will have an impact on the associated probabilities, and so that’s why we’re talking of induced transitions now.

Having said that, the Hamiltonian matrix above keeps the −A for H₁₂ and H₂₁, so the matrix captures spontaneous transitions too!
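For those who like to see numbers, here is a small Python sketch that checks the square-root formula against the matrix itself. It assumes the Hamiltonian has H₁₁ = E₀ + με, H₂₂ = E₀ − με and H₁₂ = H₂₁ = −A, as described above, and it simply verifies that E = E₀ ± √(A² + μ²ε²) makes the determinant of H − E·1 vanish. The function names and the numerical values are mine, and the weak-field line is just the first-order Taylor expansion of the square root.

```python
import math


def energy_levels(E0, A, mu_eps):
    """Definite energies E_I and E_II for a constant field (mu_eps = mu*epsilon)."""
    root = math.sqrt(A ** 2 + mu_eps ** 2)
    return E0 + root, E0 - root


def char_poly(E, E0, A, mu_eps):
    """det(H - E*1) for H = [[E0 + mu_eps, -A], [-A, E0 - mu_eps]]."""
    return (E0 + mu_eps - E) * (E0 - mu_eps - E) - (-A) * (-A)


E0, A = 0.0, 1.0  # arbitrary units (assumption)

for mu_eps in (0.0, 0.1, 10.0):
    E_I, E_II = energy_levels(E0, A, mu_eps)
    weak = E0 + A + mu_eps ** 2 / (2 * A)  # first-order expansion, good for mu_eps << A
    strong = E0 + mu_eps                   # limit for mu_eps >> A
    print(f"mu_eps={mu_eps:5.1f}  E_I={E_I:.5f}  E_II={E_II:.5f}  "
          f"weak-field={weak:.5f}  strong-field={strong:.5f}  "
          f"det={char_poly(E_I, E0, A, mu_eps):.1e}")
```

With no field the levels fall back to E₀ ± A, and for a field that is large compared to A they creep towards E₀ ± με, which is the asymptotic behavior discussed in the text.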

Still… You may wonder why Feynman doesn’t use those E_I and E_II formulas with the square root because that would give us some exact solution, wouldn’t it? The answer to that question is: maybe it would, but would you know how to solve those equations? We’ll have a varying field, remember? So our Hamiltonian H₁₁ and H₂₂ coefficients will no longer be constant, but time-dependent. As you’re going to see, it takes Feynman three pages to solve the whole thing using the +με and −με approximation. So just imagine how complicated it would be using that square root expression! [By the way, do have a look at those asymptotic curves in that illustration showing the splitting of energy levels above, so you see what that approximation looks like.]

So that’s the real answer: we need to simplify somehow, so as to get any solutions at all!

Of course, it’s all quite confusing because, after Feynman first notes that, for strong fields, the A² in that square root is small as compared to μ²ε², thereby justifying the use of the simplified E_I= E₀+ με = H₁₁ and E_II= E₀− με = H₂₂ coefficients, he continues and bluntly uses the very same square root expression to explain how that state selector works, saying that the electric field in the state selector will be rather weak and, hence, that με will be much smaller than A, so one can use the following approximation for the square root in the expressions above:

The energy expressions then reduce to:

And then we can calculate the force on the molecules as:

So the electric field in the state selector is weak, but the electric field in the cavity is supposed to be strong, and so… Well… That’s it, really. The bottom line is that we’ve a beam of ammonia molecules that are all in state I, and it’s what happens with that beam then, that is being described by our new set of differential equations:

Solving the equations

As all molecules in our ammonia beam are described in terms of the | I 〉 and | II 〉 base states – as evidenced by the fact that we say all molecules that enter the cavity are state I – we need to switch to that representation. We do that by using that transformation above, so we write:

C_I= 〈 I | ψ 〉 = (1/√2)·(C₁− C₂)
C_II= 〈 II | ψ 〉 = (1/√2)·(C₁+ C₂)

Keeping these ‘definitions’ of C_I and C_II in mind, you should then add the two differential equations, divide the result by the square root of 2, and you should get the following new equation:

Please! Do it and verify the result! You want to learn something here, no? 🙂

Likewise, subtracting the two differential equations, we get:

We can re-write this as:

Now, the problem is that the Hamiltonian constants here are not constant. To be precise, the electric field ε varies in time. We wrote:

So H_I,II and H_II,I, which are equal to με, are not constant: we’ve got Hamiltonian coefficients that are a function of time themselves. […] So… Well… We just need to get on with it and try to finally solve this thing. Let me just copy Feynman as he grinds through this:

This is only the first step in the process. Feynman just takes two trial functions, which are really similar to the very general C₁ = a·e^{−(i/ħ)·H₁₁·t} function we presented when only one equation was involved, or – if you prefer a set of two equations – those C_I(t) = a·e^{−(i/ħ)·E_I·t} and C_II(t) = b·e^{−(i/ħ)·E_II·t} equations above. The difference is that the coefficients in front, i.e. γ_I and γ_II, are not some (complex) constants, but functions of time themselves. The next step in the derivation is as follows:

One needs to do a bit of gymnastics here as well to follow what’s going on, but please do check and you’ll see it works. Feynman derives another set of differential equations here, and they specify these γ_I = γ_I(t) and γ_II = γ_II(t) functions. These equations are written in terms of the frequency of the field, i.e. ω, and the resonant frequency ω₀, which we mentioned above when calculating that 23.79 GHz frequency from the 2A = h·f₀ equation. So ω₀ is the same molecular resonance frequency but expressed as an angular frequency, so ω₀ = 2π·f₀ = 2A/ħ. He then proceeds to simplify, using assumptions one should check. He then continues:

That gives us what we presented in the previous post:

So… Well… What to say? I explained those probability functions in my previous post, indeed. We’ve got two probabilities here:

P_I= cos²[(με₀/ħ)·t]
P_II= sin²[(με₀/ħ)·t]

So that’s just like the P₁= cos²[(A/ħ)·t] and P₂= sin²[(A/ħ)·t] probabilities we found for spontaneous transitions. But so here we are talking induced transitions.

As you can see, the frequency and, hence, the period, depend on the strength, or magnitude, of the electric field, i.e. the ε₀ constant in the ε = 2ε₀cos(ω·t) expression. The natural unit for measuring time would be the period once again, which we can easily calculate as (με₀/ħ)·T = π ⇔ T = π·ħ/με₀.
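To make that a bit more tangible, the following Python sketch evaluates the two probabilities and the period T = π·ħ/με₀. Please note that the value I plug in for με₀ is purely hypothetical (the text does not give one), and the function names are mine too.

```python
import math

HBAR_EV_S = 6.582119569e-16  # reduced Planck constant in eV·s


def induced_probabilities(t, mu_eps0):
    """P_I and P_II at time t (in seconds) for a field strength mu_eps0 (in eV)."""
    phase = mu_eps0 * t / HBAR_EV_S
    return math.cos(phase) ** 2, math.sin(phase) ** 2


def induced_period(mu_eps0):
    """Period of the probability functions: (mu_eps0/hbar)*T = pi."""
    return math.pi * HBAR_EV_S / mu_eps0


MU_EPS0 = 1.0e-6  # eV, a made-up value for illustration only

T = induced_period(MU_EPS0)
for fraction in (0.0, 0.25, 0.5, 1.0):
    p_I, p_II = induced_probabilities(fraction * T, MU_EPS0)
    print(f"t = {fraction:.2f}·T  P_I = {p_I:.3f}  P_II = {p_II:.3f}")
```

Halfway through the period the molecule is certain to be found in state II, and after a full period it is back in state I. Doubling με₀ halves T, which is the dependence on the field strength mentioned above.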

Now, we had that T = π·ħ/A expression above, which allowed us to calculate the period of the spontaneous transition frequency, which we found was like 40 picoseconds, i.e. 40×10⁻¹² seconds. Now, the T = π·ħ/με₀ expression is very similar: it allows us to calculate the expected, average, or mean time for an induced transition. In fact, if we write T_induced = π·ħ/με₀ and T_spontaneous = π·ħ/A, then we can take the ratio to find:

T_induced/T_spontaneous = [π·ħ/με₀]/[π·ħ/A] = A/με₀

This A/με₀ ratio is greater than one, so T_induced/T_spontaneous is greater than one, which, in turn, means that the presence of our electric field – which, let me remind you, dances to the beat of the resonant frequency – causes a slower transition than we would have had if the oscillating electric field were not present.

But – Hey! – that’s the wrong comparison! Remember all molecules enter in a stationary state, as they’ve been selected so as to ensure they’re in state I. So there is no such thing as a spontaneous transition frequency here! They’re all polarized, so to speak, and they would remain that way if there was no field in the cavity. So if there was no oscillating electric field, they would never transition. Nothing would happen! Well… In terms of our particular set of base states, of course! Why? Well… Look at the Hamiltonian coefficients H_I,II= H_II,I= με: these coefficients are zero if ε is zero. So… Well… That says it all.

So that’s what it’s all about: induced emission and, as I explained in my previous post, because all molecules enter in state I, i.e. the upper energy state, literally, they all ‘dump’ a net amount of energy equal to 2A into the cavity on the occasion of their first transition. The molecules then keep dancing, of course, and so they absorb and emit the same amount as they go through the cavity, but… Well… We’ve got a net contribution here, which is not only enough to maintain the cavity oscillations, but actually also provides a small excess of power that can be drawn from the cavity as microwave radiation of the same frequency.

As Feynman notes, an exact description of what actually happens requires an understanding of the quantum mechanics of the field in the cavity, i.e. quantum field theory, which I haven’t studied yet. But… Well… That’s for later, I guess. 🙂

Post scriptum: The sheer length of this post shows we’re not doing something that’s easy here. Frankly, I feel the whole analysis is still quite obscure, in the sense that – despite looking at this thing again and again – it’s hard to sort of interpret what’s going on, in a physical sense that is. But perhaps one shouldn’t try that. I’ve quoted Feynman’s view on how easy or how difficult it is to ‘understand’ quantum mechanics a couple of times already, so let me do it once more:

“Because atomic behavior is so unlike ordinary experience, it is very difficult to get used to, and it appears peculiar and mysterious to everyone—both to the novice and to the experienced physicist. Even the experts do not understand it the way they would like to, and it is perfectly reasonable that they should not, because all of direct, human experience and human intuition applies to large objects.”

So… Well… I’ll grind through the remaining Lectures now – I am halfway through Volume III now – and then re-visit all of this. Despite Feynman’s warning, I want to understand it the way I like to, even if I don’t quite know what way that is right now. 🙂

Addendum: As for those cycles and periods, I noted a couple of times already that the Planck-Einstein equation E = h·f can usefully be re-written as E/f = h, as it gives a physical interpretation to the value of the Planck constant. In fact, I said h is the energy that’s associated with one cycle, regardless of the frequency of the radiation involved. Indeed, the energy of a photon divided by the number of cycles per second, should give us the energy per cycle, no?

Well… Yes and no. Planck’s constant h and the frequency f are both expressed referencing the time unit. However, if we say that a sodium atom emits one photon only as its electron transitions from a higher energy level to a lower one, and if we say that involves a decay time of the order of 3.2×10⁻⁸ seconds, then what we’re saying really is that a sodium light photon will ‘pack’ like 16 million cycles, which is what we get when we multiply the number of cycles per second (i.e. the mentioned frequency of 500×10¹² Hz) by the decay time (i.e. 3.2×10⁻⁸ seconds): (500×10¹² Hz)·(3.2×10⁻⁸ s) = 16×10⁶ cycles, indeed. So the energy per cycle is 2.068 eV (i.e. the photon energy) divided by 16×10⁶, so that’s 0.129×10⁻⁶ eV. Unsurprisingly, that’s what we get when we divide h by 3.2×10⁻⁸ s: (4.13567×10⁻¹⁵ eV·s)/(3.2×10⁻⁸ s) = 1.29×10⁻⁷ eV. We’re just putting some values into the E/(f·T) = h/T equation here.
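The arithmetic is easy to reproduce. The little Python sketch below just plugs in the values quoted above (the variable names are mine) and confirms that dividing the photon energy by the number of cycles amounts to the same thing as dividing h by the decay time.

```python
import math

H_EV_S = 4.13567e-15  # Planck constant in eV·s, as used in the text
F_SODIUM = 500e12     # frequency of sodium light in Hz, as used in the text
DECAY_TIME = 3.2e-8   # decay time in seconds, as used in the text

cycles = F_SODIUM * DECAY_TIME             # number of cycles 'packed' in the photon
photon_energy = H_EV_S * F_SODIUM          # E = h·f, in eV
energy_per_cycle = photon_energy / cycles  # E/(f·T), in eV
h_over_T = H_EV_S / DECAY_TIME             # h/T, in eV

print(f"cycles           = {cycles:.3e}")
print(f"photon energy    = {photon_energy:.4f} eV")
print(f"energy per cycle = {energy_per_cycle:.4e} eV")
print(f"h / T            = {h_over_T:.4e} eV")
print("same thing?", math.isclose(energy_per_cycle, h_over_T))
```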

The logic for that 2A = h·f₀ is the same. The frequency of the radiation that’s being absorbed or emitted is 23.79 GHz, so the photon energy is (23.79×10⁹ Hz)·(4.13567×10⁻¹⁵ eV·s) ≈ 1×10⁻⁴ eV. Now, we calculated the transition period T as T = π·ħ/A ≈ (π·6.582×10⁻¹⁶ eV·s)/(0.5×10⁻⁴ eV) ≈ 41.4×10⁻¹² seconds. Now, an oscillation of a frequency of 23.79 gigahertz that only lasts 41.4×10⁻¹² seconds is an oscillation of one cycle only. The consequence is that, when we continue this style of reasoning, we’d have a photon that packs all of its energy into one cycle!

Let’s think about what this implies in terms of the density in space. The wavelength of our microwave radiation is 1.25×10⁻²m, so we’ve got a ‘density’ of 1×10⁻⁴eV/1.25×10⁻²m = 0.8×10⁻²eV/m = 0.008 eV/m. The wavelength of our sodium light is 0.6×10⁻⁶m, so we get a ‘density’ of 1.29×10⁻⁷eV/0.6×10⁻⁶m = 2.15×10⁻¹eV/m = 0.215 eV/m. So the energy ‘density’ of our sodium light is 26.875 times that of our microwave radiation. 🙂
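Again, this is easy to check. The Python sketch below redoes the microwave numbers and the two ‘densities’ from the 23.79 GHz and 500×10¹² Hz frequencies, without rounding the intermediate results. The variable names are mine, and I calculate both wavelengths as c/f rather than using the rounded values, so the ratio comes out somewhat different from the one quoted above.

```python
import math

C_LIGHT = 299_792_458.0    # speed of light in m/s
H_EV_S = 4.13567e-15       # Planck constant in eV·s
F_MASER = 23.79e9          # ammonia resonance frequency in Hz
F_SODIUM = 500e12          # sodium light frequency in Hz
SODIUM_DECAY_TIME = 3.2e-8 # seconds

# Microwave photon: 2A = h·f0, and T = pi·hbar/A
photon_energy = H_EV_S * F_MASER
A = photon_energy / 2
hbar = H_EV_S / (2 * math.pi)
period = math.pi * hbar / A
cycles = F_MASER * period  # how many cycles fit in one period T

# Energy per cycle divided by the wavelength, for both types of radiation
wavelength_maser = C_LIGHT / F_MASER
wavelength_sodium = C_LIGHT / F_SODIUM
density_maser = photon_energy / wavelength_maser  # one cycle only
density_sodium = (H_EV_S / SODIUM_DECAY_TIME) / wavelength_sodium
ratio = density_sodium / density_maser

print(f"photon energy  = {photon_energy:.3e} eV")
print(f"period T       = {period:.3e} s, i.e. {cycles:.2f} cycle(s)")
print(f"wavelengths    = {wavelength_maser:.3e} m and {wavelength_sodium:.3e} m")
print(f"densities      = {density_maser:.3e} eV/m and {density_sodium:.3e} eV/m")
print(f"sodium/maser   = {ratio:.1f}")
```

Because T = π·ħ/A is the same thing as 1/f₀ when 2A = h·f₀, the number of cycles comes out as exactly one, which is the point made above.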

Frankly, I am not quite sure if calculations like this make much sense. In fact, when talking about energy densities, I should review my posts on the Poynting vector. However, they may help you think things through. 🙂

Re-visiting uncertainty…

I re-visited the Uncertainty Principle a couple of times already, but here I really want to get to the bottom of the thing. What’s uncertain? The energy? The time? The wavefunction itself? These questions are not easily answered, and I need to warn you: you won’t get too much wiser when you’re finished reading this. I just felt like freewheeling a bit. [Note that the first part of this post repeats what you’ll find on the Occam page, or my post on Occam’s Razor. But these posts do not analyze uncertainty, which is what I will be trying to do here.]

Let’s first think about the wavefunction itself. It’s tempting to think it actually is the particle, somehow. But it isn’t. So what is it then? Well… Nobody knows. In my previous post, I said I like to think it travels with the particle, but that doesn’t make much sense either. It’s like a fundamental property of the particle. Like the color of an apple. But where is that color? In the apple, in the light it reflects, in the retina of our eye, or is it in our brain? If you know a thing or two about how perception actually works, you’ll tend to agree the quality of color is not in the apple. When everything is said and done, the wavefunction is a mental construct: when learning physics, we start to think of a particle as a wavefunction, but they are two separate things: the particle is reality, the wavefunction is imaginary.

But that’s not what I want to talk about here. It’s about that uncertainty. Where is the uncertainty? You’ll say: you just said it was in our brain. No. I didn’t say that. It’s not that simple. Let’s look at the basic assumptions of quantum physics:

Quantum physics assumes there’s always some randomness in Nature and, hence, we can measure probabilities only. We’ve got randomness in classical mechanics too, but this is different. This is an assumption about how Nature works: we don’t really know what’s happening. We don’t know the internal wheels and gears, so to speak, or the ‘hidden variables’, as one interpretation of quantum mechanics would say. In fact, the most commonly accepted interpretation of quantum mechanics says there are no ‘hidden variables’.
However, as Shakespeare has one of his characters say: there is a method in the madness, and the pioneers– I mean Werner Heisenberg, Louis de Broglie, Niels Bohr, Paul Dirac, etcetera – discovered that method: all probabilities can be found by taking the square of the absolute value of a complex-valued wavefunction (often denoted by Ψ), whose argument, or phase (θ), is given by the de Broglie relations ω = E/ħ and k = p/ħ. The generic functional form of that wavefunction is:

Ψ = Ψ(x, t) = a·e^{−iθ} = a·e^{−i(ω·t − k∙x)} = a·e^{−i·[(E/ħ)·t − (p/ħ)∙x]}

That should be obvious by now, as I’ve written more than a dozen posts on this. 🙂 I still have trouble interpreting this, however, and I am not ashamed, because the Great Ones I just mentioned have trouble with that too. It’s not that complex exponential. That e^{−iθ} is a very simple periodic function, consisting of two sine waves rather than just one, as illustrated below. [It’s a sine and a cosine, but they’re the same function: there’s just a phase difference of 90 degrees.]

No. To understand the wavefunction, we need to understand those de Broglie relations, ω = E/ħ and k = p/ħ, and then, as mentioned, we need to understand the Uncertainty Principle. We need to understand where it comes from. Let’s try to go as far as we can by making a few remarks:

Adding or subtracting two terms in math, (E/ħ)·t − (p/ħ)∙x, implies the two terms should have the same dimension: we can only add apples to apples, and oranges to oranges. We shouldn’t mix them. Now, the (E/ħ)·t and (p/ħ)·x terms are actually dimensionless: they are pure numbers. So that’s even better. Just check it: energy is expressed in newton·meter (energy, or work, is force over distance, remember?) or electronvolts (1 eV = 1.6×10⁻¹⁹ J = 1.6×10⁻¹⁹ N·m); Planck’s constant, as the quantum of action, is expressed in J·s or eV·s; and the unit of (linear) momentum is 1 N·s = 1 kg·m/s. E/ħ gives a number expressed per second, and p/ħ a number expressed per meter. Therefore, multiplying E/ħ and p/ħ by t and x respectively gives us a dimensionless number indeed.
It’s also an invariant number, which means we’ll always get the same value for it, regardless of our frame of reference. As mentioned above, that’s because the four-vector product p_μx_μ= E·t − p∙x is invariant: it doesn’t change when analyzing a phenomenon in one reference frame (e.g. our inertial reference frame) or another (i.e. in a moving frame).
Now, Planck’s quantum of action h, or ħ – h and ħ only differ in their dimension: h is measured in cycles per second, while ħ is measured in radians per second: both assume we can at least measure one cycle – is the quantum of energy really. Indeed, if “energy is the currency of the Universe”, and it’s real and/or virtual photons who are exchanging it, then it’s good to know the currency unit is h, i.e. the energy that’s associated with one cycle of a photon. [In case you want to see the logic of this, see my post on the physical constants c, h and α.]
It’s not only time and space that are related, as evidenced by the fact that t − x itself is an invariant four-vector: E and p are related too, of course! They are related through the classical velocity of the particle that we’re looking at: E/p = c²/v and, therefore, we can write: E·β = p·c, with β = v/c, i.e. the relative velocity of our particle, measured as a ratio of the speed of light. Now, I should add that the t − x four-vector is invariant only if we measure time and space in equivalent units. Otherwise, we have to write c·t − x. If we do that, so our unit of distance becomes c meter, rather than one meter, or our unit of time becomes the time that is needed for light to travel one meter, then c = 1, and the E·β = p·c becomes E·β = p, which we also write as β = p/E: the ratio of the momentum and the energy of our particle is its (relative) velocity.
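The two claims in these remarks, i.e. the invariance of E·t − p·x and the E·β = p relation, can be checked with a few lines of Python. This is a sketch only: it works in units where c = 1, it uses the standard Lorentz transformation for a boost along the x-axis, and the rest mass, the velocities and the event coordinates are arbitrary values I picked for illustration.

```python
import math


def energy_momentum(m0, beta):
    """Relativistic energy and momentum (c = 1) of a particle moving at velocity beta."""
    gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
    return gamma * m0, gamma * m0 * beta


def boost(a0, a1, u):
    """Lorentz boost (c = 1) of the (time-like, space-like) pair (a0, a1) with velocity u."""
    gamma = 1.0 / math.sqrt(1.0 - u ** 2)
    return gamma * (a0 - u * a1), gamma * (a1 - u * a0)


M0, BETA = 1.0, 0.6  # arbitrary rest mass and particle velocity (assumptions)
T_EVENT, X_EVENT = 2.0, 0.5  # arbitrary point in spacetime (assumption)
U = 0.3              # velocity of the second reference frame (assumption)

E, p = energy_momentum(M0, BETA)
phase = E * T_EVENT - p * X_EVENT

E_b, p_b = boost(E, p, U)
t_b, x_b = boost(T_EVENT, X_EVENT, U)
phase_boosted = E_b * t_b - p_b * x_b

print(f"E·β = {E * BETA:.6f}, p = {p:.6f}")
print(f"E·t − p·x = {phase:.6f} in the first frame")
print(f"E·t − p·x = {phase_boosted:.6f} in the second frame")
```

The energy, the momentum, the time and the position all change when we switch frames, but the E·t − p·x combination does not.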

Combining all of the above, we may want to assume that we are measuring energy and momentum in terms of the Planck constant, i.e. the ‘natural’ unit for both. In addition, we may also want to assume that we’re measuring time and distance in equivalent units. Then the equation for the phase of our wavefunctions reduces to:

θ = (ω·t − k ∙x) = E·t − p·x

Now, θ is the argument of a wavefunction, and we can always re-scale such argument by multiplying or dividing it by some constant. It’s just like writing the argument of a wavefunction as v·t–x or (v·t–x)/v = t –x/v with v the velocity of the waveform that we happen to be looking at. [In case you have trouble following this argument, please check the post I did for my kids on waves and wavefunctions.] Now, the energy conservation principle tells us the energy of a free particle won’t change. [Just to remind you, a ‘free particle’ means it’s in a ‘field-free’ space, so our particle is in a region of uniform potential.] So we can, in this case, treat E as a constant, and divide E·t − p·x by E, so we get a re-scaled phase for our wavefunction, which I’ll write as:

φ = (E·t − p·x)/E = t − (p/E)·x = t − β·x

Alternatively, we could also look at p as some constant, as there is no variation in potential energy that will cause a change in momentum, and the related kinetic energy. We’d then divide by p and we’d get (E·t − p·x)/p = (E/p)·t − x = t/β − x, which amounts to the same, as we can always re-scale by multiplying it with β, which would again yield the same t − β·x argument.

The point is, if we measure energy and momentum in terms of the Planck unit (I mean: in terms of the Planck constant, i.e. the quantum of energy), and if we measure time and distance in ‘natural’ units too, i.e. we take the speed of light to be unity, then our Platonic wavefunction becomes as simple as:

Φ(φ) = a·e^{−iφ} = a·e^{−i(t − β·x)}

This is a wonderful formula, but let me first answer your most likely question: why would we use a relative velocity? Well… Just think of it: when everything is said and done, the whole theory of relativity and, hence, the whole of physics, is based on one fundamental and experimentally verified fact: the speed of light is absolute. In whatever reference frame, we will always measure it as 299,792,458 m/s. That’s obvious, you’ll say, but it’s actually the weirdest thing ever if you start thinking about it, and it explains why those Lorentz transformations look so damn complicated. In any case, this fact legitimately establishes c as some kind of absolute measure against which all speeds can be measured. Therefore, it is only natural indeed to express a velocity as some number between 0 and 1. Now that amounts to expressing it as the β = v/c ratio.

Let’s now go back to that Φ(φ) = a·e^{−iφ} = a·e^{−i(t − β·x)} wavefunction. Its temporal frequency ω is equal to one, and its spatial frequency k is equal to β = v/c. It couldn’t be simpler but, of course, we’ve got this remarkably simple result because we re-scaled the argument of our wavefunction using the energy and momentum itself as the scale factor. So, yes, we can re-write the wavefunction of our particle in a particularly elegant and simple form using the only information that we have when looking at quantum-mechanical stuff: energy and momentum, because that’s what everything reduces to at that level.

So… Well… We’ve pretty much explained what quantum physics is all about here. You just need to get used to that complex exponential: e^{−iφ} = cos(−φ) + i·sin(−φ) = cos(φ) − i·sin(φ). It would have been nice if Nature had given us a simple sine or cosine function. [Remember the sine and cosine function are actually the same, except for a phase difference of 90 degrees: sin(φ) = cos(π/2 − φ) = cos(φ − π/2). So we can always go from one to the other by shifting the origin of our axis.] But… Well… As we’ve shown so many times already, a real-valued wavefunction doesn’t explain the interference we observe, be it interference of electrons or whatever other particles or, for that matter, the interference of electromagnetic waves themselves, which, as you know, we also need to look at as a stream of photons, i.e. light quanta, rather than as some kind of infinitely flexible aether that’s undulating, like water or air.

However, the analysis above does not include uncertainty. That’s as fundamental to quantum physics as de Broglie’s equations, so let’s think about that now.

Introducing uncertainty

Our information on the energy and the momentum of our particle will be incomplete: we’ll write E = E₀ ± σ_E, and p = p₀ ± σ_p. Huh? No ΔE or Δp? Well… It’s the same, really, but I am a bit tired of using the Δ symbol, so I am using the σ symbol here, which denotes a standard deviation of some density function. It underlines the probabilistic, or statistical, nature of our approach.

The simplest model is that of a two-state system, because it involves two energy levels only: E = E₀ ± A, with A some constant. Large or small, it doesn’t matter. All is relative anyway. 🙂 We explained the basics of the two-state system using the example of an ammonia molecule, i.e. an NH₃ molecule, so it consists of one nitrogen and three hydrogen atoms. We had two base states in this system: ‘up’ or ‘down’, which we denoted as base state | 1 〉 and base state | 2 〉 respectively. This ‘up’ and ‘down’ had nothing to do with the classical or quantum-mechanical notion of spin, which is related to the magnetic moment. No. It’s much simpler than that: the nitrogen atom could be either beneath or, else, above the plane of the hydrogens, as shown below, with ‘beneath’ and ‘above’ being defined in regard to the molecule’s direction of rotation around its axis of symmetry.

In any case, for the details, I’ll refer you to the post(s) on it. Here I just want to mention the result. We wrote the amplitude to find the molecule in either one of these two states as:

C₁ = 〈 1 | ψ 〉 = (1/2)·e^{−(i/ħ)·(E₀ − A)·t} + (1/2)·e^{−(i/ħ)·(E₀ + A)·t}
C₂ = 〈 2 | ψ 〉 = (1/2)·e^{−(i/ħ)·(E₀ − A)·t} − (1/2)·e^{−(i/ħ)·(E₀ + A)·t}

That gave us the following probabilities:

If our molecule can be in two states only, and it starts off in one, then the probability that it will remain in that state will gradually decline, while the probability that it flips into the other state will gradually increase.
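To make those probabilities concrete, here is a minimal Python sketch that simply takes the absolute square of the two amplitudes C₁ and C₂ written above. The numerical values for E₀ and A, and the choice ħ = 1, are assumptions for illustration only: they are not the actual values for an ammonia molecule.

```python
import cmath
import math

HBAR = 1.0   # assumption: natural units, so that ħ = 1
E0 = 10.0    # hypothetical average energy
A = 1.0      # hypothetical half-splitting between the two energy levels

def amplitudes(t):
    """Return (C1, C2) for a molecule that starts in state 1 at t = 0."""
    low = cmath.exp(-1j * (E0 - A) * t / HBAR)
    high = cmath.exp(-1j * (E0 + A) * t / HBAR)
    return 0.5 * (low + high), 0.5 * (low - high)

def probabilities(t):
    """Return (P1, P2), i.e. the absolute squares of the two amplitudes."""
    c1, c2 = amplitudes(t)
    return abs(c1) ** 2, abs(c2) ** 2

for t in (0.0, math.pi / 4, math.pi / 2):
    p1, p2 = probabilities(t)
    print(f"t = {t:.3f}  P1 = {p1:.3f}  P2 = {p2:.3f}  sum = {p1 + p2:.3f}")
```

With A = ħ = 1, the molecule should have flipped completely at t = π/2, and the two probabilities should add up to one at every point in time.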

Now, the point you should note is that we get these time-dependent probabilities only because we’re introducing two different energy levels: E₀ + A and E₀ − A. [Note they are separated by an amount equal to 2·A, as I’ll use that information later.] If we had one energy level only – which amounts to saying that we know it, and that it’s something definite – then we’d just have one wavefunction, which we’d write as:

a·e^{−iθ} = a·e^{−(i/ħ)·(E₀·t − p·x)} = a·e^{−(i/ħ)·(E₀·t)}·e^{(i/ħ)·(p·x)}

Note that we can always split our wavefunction in a ‘time’ and a ‘space’ part, which is quite convenient. In fact, because our ammonia molecule stays where it is, it has no momentum: p = 0. Therefore, its wavefunction reduces to:

a·e^{−iθ} = a·e^{−(i/ħ)·(E₀·t)}

As simple as it can be. 🙂 The point is that a wavefunction like this, i.e. a wavefunction that’s defined by a definite energy, will always yield a constant and equal probability, both in time as well as in space. That’s just the math of it: |a·e^{−iθ}|² = a². Always! If you want to know why, you should think of Euler’s formula and Pythagoras’ Theorem: cos²θ + sin²θ = 1. Always! 🙂
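If you don’t trust the algebra, a few lines of Python will do the check for you. This is just a sketch: the values for a, E₀ and p are made up, and I am assuming units in which ħ = 1.

```python
import cmath

HBAR = 1.0        # assumption: natural units, ħ = 1
AMPLITUDE = 0.7   # hypothetical value for the coefficient a
E0 = 3.0          # hypothetical definite energy
P = 1.2           # hypothetical definite momentum

def psi(x, t):
    """Elementary wavefunction a·e^{-(i/ħ)·(E0·t - p·x)}."""
    return AMPLITUDE * cmath.exp(-1j * (E0 * t - P * x) / HBAR)

def probability(x, t):
    """Absolute square of the wavefunction at position x and time t."""
    return abs(psi(x, t)) ** 2

for x, t in [(0.0, 0.0), (1.5, 0.2), (-4.0, 10.0)]:
    print(f"x = {x:5.1f}, t = {t:5.1f}: |psi|^2 = {probability(x, t):.4f}")
```

Whatever x and t you plug in, the result should be a², because the complex exponential has unit modulus.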

That constant probability is annoying, because our nitrogen atom never ‘flips’, and we know it actually does, thereby overcoming an energy barrier: it’s a phenomenon that’s referred to as ‘tunneling’, and it’s real! The probabilities in that graph above are real! Also, if our wavefunction would represent some moving particle, it would imply that the probability to find it somewhere in space is the same all over space, which implies our particle is everywhere and nowhere at the same time, really.

So, in quantum physics, this problem is solved by introducing uncertainty. Introducing some uncertainty about the energy, or about the momentum, is mathematically equivalent to saying that we’re actually looking at a composite wave, i.e. the sum of a finite or potentially infinite set of component waves. So we have the same ω = E/ħ and k = p/ħ relations, but we apply them to n energy levels, or to some continuous range of energy levels ΔE. It amounts to saying that our wave function doesn’t have a specific frequency: it now has n frequencies, or a range of frequencies Δω = ΔE/ħ. In our two-state system, n = 2, obviously! So we’ve two energy levels only and so our composite wave consists of two component waves only.

We know what that does: it ensures our wavefunction is being ‘contained’ in some ‘envelope’. It becomes a wavetrain, or a kind of beat note, as illustrated below:

[The animation comes from Wikipedia, and shows the difference between the group and phase velocity: the green dot shows the group velocity, while the red dot travels at the phase velocity.]

So… OK. That should be clear enough. Let’s now apply these thoughts to our ‘reduced’ wavefunction

Φ(φ) = a·e^{−iφ} = a·e^{−i(t − β·x)}

Thinking about uncertainty

Frankly, I tried to fool you above. If the functional form of the wavefunction is a·e^{−(i/ħ)·(E·t − p·x)}, then we can measure E and p in whatever unit we want, including h or ħ, but we cannot re-scale the argument of the function, i.e. the phase θ, without changing the functional form itself. I explained that in that post for my kids on wavefunctions, in which I explained we may represent the same electromagnetic wave by two different functional forms:

F(ct−x) = G(t−x/c)

So F and G represent the same wave, but they are different wavefunctions. In this regard, you should note that the argument of F is expressed in distance units, as we multiply t with the speed of light (so it’s like our time unit is 299,792,458 m now), while the argument of G is expressed in time units, as we divide x by the distance traveled in one second. But F and G are different functional forms. Just do an example and take a simple sine function: you’ll agree that sin(θ) ≠ sin(θ/c) in general (θ = 0 being the obvious exception). Re-scaling changes the frequency, or the wavelength, and it does so quite drastically in this case. 🙂 Likewise, you can see that a·e^{−i(φ/E)} = a·[e^{−iφ}]^{1/E}, so that’s a very different function. In short, we were a bit too adventurous above. Now, while we can drop the 1/ħ in the a·e^{−(i/ħ)·(E·t − p·x)} function when measuring energy and momentum in units that are numerically equal to ħ, we’ll just revert to our original wavefunction for the time being, which equals
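The difference between ‘the same wave’ and ‘the same function’ is easy to see numerically. The sketch below uses a simple sine wave with a made-up angular frequency (an assumption, not something from the post for my kids): F takes an argument in meters, G takes an argument in seconds.

```python
import math

C = 299_792_458.0             # speed of light in m/s
OMEGA = 2 * math.pi * 1.0e6   # hypothetical angular frequency (rad/s)

def G(u):
    """Waveform whose argument u = t - x/c is expressed in seconds."""
    return math.sin(OMEGA * u)

def F(s):
    """The same wave, but with an argument s = c·t - x expressed in meters."""
    return math.sin(OMEGA * s / C)

t, x = 3.0e-7, 50.0           # an arbitrary point in time (s) and space (m)
same_wave = math.isclose(F(C * t - x), G(t - x / C), abs_tol=1e-9)

u = 1.0e-7                    # now feed the *same number* to both functions
print(same_wave, F(u), G(u))
```

So F(ct − x) and G(t − x/c) agree at every point in spacetime, but F(u) and G(u) give very different values for the same number u: the wave is the same, the functional form is not.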

Ψ(θ) = a·e^{−iθ} = a·e^{−i·[(E/ħ)·t − (p/ħ)·x]}

Let’s now introduce uncertainty once again. The simplest situation is that we have two closely spaced energy levels. In theory, the difference between the two can be as small as ħ, so we’d write: E = E₀ ± ħ/2. [Remember what I said about the ± A: it means the difference is 2A.] However, we can generalize this and write: E = E₀ ± n·ħ/2, with n = 1, 2, 3,… This does not imply any greater uncertainty – we still have two states only – but just a larger difference between the two energy levels.

Let’s also simplify by looking at the ‘time part’ of our equation only, i.e. a·e^{−i·(E/ħ)·t}. It doesn’t mean we don’t care about the ‘space part’: it just means that we’re only looking at how our function varies in time and so we just ‘fix’ or ‘freeze’ x. Now, the uncertainty is in the energy really but, from a mathematical point of view, we’ve got an uncertainty in the argument of our wavefunction, really. This uncertainty in the argument is, obviously, equal to:

(E/ħ)·t = [(E₀ ± n·ħ/2)/ħ]·t = (E₀/ħ ± n/2)·t = (E₀/ħ)·t ± (n/2)·t

So we can write:

a·e^{−i·(E/ħ)·t} = a·e^{−i·[(E₀/ħ)·t ± (n/2)·t]} = a·e^{−i·(E₀/ħ)·t}·e^{∓i·(n/2)·t}

This is valid for any value of t. What the expression says is that, from a mathematical point of view, introducing uncertainty about the energy is equivalent to introducing uncertainty about the wavefunction itself. It may be equal to a·e^{−i·(E₀/ħ)·t}·e^{i·(n/2)·t}, but it may also be equal to a·e^{−i·(E₀/ħ)·t}·e^{−i·(n/2)·t}. For n = 1, the phases of the e^{−i·t/2} and e^{i·t/2} factors are separated by a distance equal to t.

So… Well…

[…]

Hmm… I am stuck. How is this going to lead me to the ΔE·Δt ≥ ħ/2 principle? To anyone out there: can you help? 🙂

[…]

The thing is: you won’t get the Uncertainty Principle by staring at that formula above. It’s a bit more complicated. The idea is that we have some distribution of the observables, like energy and momentum, and that implies some distribution of the associated frequencies, i.e. ω for E, and k for p. The Wikipedia article on the Uncertainty Principle gives you a formal derivation of the Uncertainty Principle, using the so-called Kennard formulation of it. You can have a look, but it involves a lot of formalism—which is what I wanted to avoid here!

I hope you get the idea though. It’s like statistics. First, we assume we know the population, and then we describe that population using all kinds of summary statistics. But then we reverse the situation: we don’t know the population but we do have sample information, which we also describe using all kinds of summary statistics. Then, based on what we find for the sample, we calculate the estimated statistics for the population itself, like the mean value and the standard deviation, to name the most important ones. So it’s a bit the same here, except that, in quantum mechanics, there may not be any real value underneath: the mean and the standard deviation represent something fuzzy, rather than something precise.

Hmm… I’ll leave you with these thoughts. We’ll develop them further as we dig into all of this much deeper over the coming weeks. 🙂

Post scriptum: I know you expect something more from me, so… Well… Think about the following. If we have some uncertainty about the energy E, we’ll have some uncertainty about the momentum p according to that β = p/E. [By the way, please think about this relationship: it says, all other things being equal (such as the inertia, i.e. the mass, of our particle), that more energy will all go into more momentum. More specifically, note that ∂p/∂E = β according to this equation. In fact, if we include the mass of our particle, i.e. its inertia, as potential energy, then we might say that (1−β)·E is the potential energy of our particle, as opposed to its kinetic energy.] So let’s try to think about that.

Let’s denote the uncertainty about the energy as ΔE. As should be obvious from the discussion above, it can be anything: it can mean two separate energy levels E = E₀ ± A, or a potentially infinite set of values. However, even if the set is infinite, we know the various energy levels need to be separated by ħ, at least. So if the set is infinite, it’s going to be a countably infinite set, like the set of natural numbers, or the set of integers. But let’s stick to our example of two values E = E₀ ± A only, with A = ħ so E₀ + ΔE = E₀ ± ħ and, therefore, ΔE = ± ħ. That implies Δp = Δ(β·E) = β·ΔE = ± β·ħ.

Hmm… This is a bit fishy, isn’t it? We said we’d measure the momentum in units of ħ, but so here we say the uncertainty in the momentum can actually be a fraction of ħ. […] Well… Yes. Now, the momentum is the product of the mass, as measured by the inertia of our particle to accelerations or decelerations, and its velocity. If we assume the inertia of our particle, or its mass, to be constant – so we say it’s a property of the object that is not subject to uncertainty, which, I admit, is a rather dicey assumption (if all other measurable properties of the particle are subject to uncertainty, then why not its mass?) – then we can also write: Δp = Δ(m·v) = Δ(m·β) = m·Δβ. [Note that we’re not only assuming that the mass is not subject to uncertainty, but also that the velocity is non-relativistic. If not, we couldn’t treat the particle’s mass as a constant.] But let’s be specific here: what we’re saying is that, if ΔE = ± ħ, then Δv = Δβ will be equal to Δβ = Δp/m = ± (β/m)·ħ. The point to note is that we’re no longer sure about the velocity of our particle. Its (relative) velocity is now:

β ± Δβ = β ± (β/m)·ħ

But, because velocity is the ratio of distance over time, this introduces an uncertainty about time and distance. Indeed, if its velocity is β ± (β/m)·ħ, then, over some time T, it will travel some distance X = [β ± (β/m)·ħ]·T. Likewise, if we have some distance X, then our particle will need a time equal to T = X/[β ± (β/m)·ħ].

You’ll wonder what I am trying to say because… Well… If we’d just measure X and T precisely, then all the uncertainty is gone and we know if the energy is E₀ + ħ or E₀ − ħ. Well… Yes and no. The uncertainty is fundamental – at least that’s what quantum physicists believe – so our uncertainty about the time and the distance we’re measuring is equally fundamental: we can have either of the two values X = [β ± (β/m)·ħ]·T or T = X/[β ± (β/m)·ħ], whenever or wherever we measure. So we have a ΔX and ΔT that are equal to ± [(β/m)·ħ]·T and X/[± (β/m)·ħ] respectively. We can relate this to ΔE and Δp:

ΔX = (1/m)·T·Δp
ΔT = X/[(β/m)·ΔE]

You’ll grumble: this still doesn’t give us the Uncertainty Principle in its canonical form. Not at all, really. I know… I need to do some more thinking here. But I feel I am getting somewhere. 🙂 Let me know if you see where, and if you think you can get any further. 🙂

The thing is: you’ll have to read a bit more about Fourier transforms and why and how variables like time and energy, or position and momentum, are so-called conjugate variables. As you can see, energy and time, and position and momentum, are obviously linked through the E·t and p·x products in the E₀·t − p·x sum. That says a lot, and it helps us to understand, in a more intuitive way, why the ΔE·Δt and Δp·Δx products should obey the relation they are obeying, i.e. the Uncertainty Principle, which we write as ΔE·Δt ≥ ħ/2 and Δp·Δx ≥ ħ/2. But proving it involves more than just staring at that Ψ(θ) = a·e^{−iθ} = a·e^{−i·[(E/ħ)·t − (p/ħ)·x]} relation.

Having said that, it helps to think about how that E·t − p·x sum works. For example, think about two particles, a and b, with different velocity and mass, but with the same momentum, so p_a = p_b ⇔ m_a·v_a = m_b·v_b ⇔ m_a/v_b = m_b/v_a. The spatial frequency of the wavefunction would be the same for both but the temporal frequency would be different, because their energy incorporates the rest mass and, hence, because m_a ≠ m_b, we also know that E_a ≠ E_b. So… It all works out but, yes, I admit it’s all very strange, and it takes a long time and a lot of reflection to advance our understanding.
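Here is a small numerical illustration of that two-particle example. It is a sketch only: I am assuming natural units (c = ħ = 1), the standard relativistic energy-momentum relation E² = (p·c)² + (m₀·c²)², and two made-up rest masses.

```python
import math

C = 1.0      # assumption: natural units, c = 1
HBAR = 1.0   # assumption: natural units, ħ = 1

def energy(rest_mass, momentum):
    """Total relativistic energy from E² = (p·c)² + (m0·c²)²."""
    return math.sqrt((momentum * C) ** 2 + (rest_mass * C ** 2) ** 2)

def frequencies(rest_mass, momentum):
    """Return (k, omega) from the de Broglie relations k = p/ħ and ω = E/ħ."""
    return momentum / HBAR, energy(rest_mass, momentum) / HBAR

p = 1.0                               # shared (hypothetical) momentum
k_a, omega_a = frequencies(1.0, p)    # particle a: hypothetical rest mass 1
k_b, omega_b = frequencies(2.0, p)    # particle b: hypothetical rest mass 2

print(k_a, k_b)           # the spatial frequencies are identical
print(omega_a, omega_b)   # the temporal frequencies differ
```

The heavier particle moves more slowly for the same momentum, but its wavefunction oscillates faster in time, because its rest mass adds to its total energy.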

Occam’s Razor

The analysis of a two-state system (i.e. the rather famous example of an ammonia molecule ‘flipping’ its spin direction from ‘up’ to ‘down’, or vice versa) in my previous post is a good opportunity to think about Occam’s Razor once more. What are we doing? What does the math tell us?

In the example we chose, we didn’t need to worry about space. It was all about time: an evolving state over time. We also knew the answers we wanted to get: if there is some probability for the system to ‘flip’ from one state to another, we know it will, at some point in time. We also want probabilities to add up to one, so we knew the graph below had to be the result we would find: if our molecule can be in two states only, and it starts off in one, then the probability that it will remain in that state will gradually decline, while the probability that it flips into the other state will gradually increase, which is what is depicted below.

However, the graph above is only a Platonic idea: we don’t bother to actually verify what state the molecule is in. If we did, we’d have to ‘re-set’ our t = 0 point, and start all over again. The wavefunction would collapse, as they say, because we’ve made a measurement. However, having said that, yes, in the physicist’s Platonic world of ideas, the probability functions above make perfect sense. They are beautiful. You should note, for example, that P₁ (i.e. the probability to be in state 1) and P₂ (i.e. the probability to be in state 2) add up to 1 all of the time, so we don’t need to integrate over a cycle or something: so it’s all perfect!

These probability functions are based on ideas that are even more Platonic: interfering amplitudes. Let me explain.

Quantum physics is based on the idea that these probabilities are determined by some wavefunction, a complex-valued amplitude that varies in time and space. It’s a two-dimensional thing, and then it’s not. It’s two-dimensional because it combines a sine and cosine, i.e. a real and an imaginary part, but the argument of the sine and the cosine is the same, and the sine and cosine are the same function, except for a phase shift equal to π/2. We write:

a·e^{−iθ} = a·cos(−θ) + i·a·sin(−θ) = a·cosθ − i·a·sinθ

The minus sign is there because it turns out that Nature measures angles, i.e. our phase, clockwise, rather than counterclockwise, so that’s not as per our mathematical convention. But that’s a minor detail, really. [It should give you some food for thought, though.] For the rest, the related graph is as simple as the formula:

Now, the phase of this wavefunction is written as θ = (ω·t − k∙x). Hence, ω determines how this wavefunction varies in time, and the wavevector k tells us how this wave varies in space. The young Frenchman Comte Louis de Broglie noted the mathematical similarity between the ω·t − k∙x expression and Einstein’s four-vector product p_μx_μ = E·t − p∙x, which remains invariant under a Lorentz transformation. He also understood that the Planck-Einstein relation E = ħ·ω actually defines the energy unit and, therefore, that any frequency, any oscillation really, in space or in time, is to be expressed in terms of ħ.

[To be precise, the fundamental quantum of energy is h = ħ·2π, because that’s the energy of one cycle. To illustrate the point, think of the Planck-Einstein relation. It gives us the energy of a photon with frequency f: E_γ = h·f. If we re-write this equation as E_γ/f = h, and we do a dimensional analysis, we get: h = E_γ/f ⇔ 6.626×10⁻³⁴ joule·second = [x joule]/[f cycles per second] ⇔ h = 6.626×10⁻³⁴ joule per cycle. It’s only because we are expressing ω and k as angular frequencies (i.e. in radians per second or per meter, rather than in cycles per second or per meter) that we have to think of ħ = h/2π rather than h.]
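The h versus ħ bookkeeping is easy to check with a few lines of Python. The value of h below is the exact SI value; the photon frequency is just a made-up example (roughly visible light).

```python
import math

H = 6.62607015e-34          # Planck constant in J·s (exact SI value)
HBAR = H / (2 * math.pi)    # reduced Planck constant in J·s

f = 5.0e14                  # hypothetical photon frequency in cycles per second
omega = 2 * math.pi * f     # the same frequency in radians per second

E_from_h = H * f            # E = h·f: energy per cycle times cycles per second
E_from_hbar = HBAR * omega  # E = ħ·ω: energy per radian times radians per second

print(E_from_h, E_from_hbar)
```

Both expressions give the same photon energy: the factor 2π that we take out of h is the factor 2π that we put into ω.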

Louis de Broglie connected the dots between some other equations too. He was fully familiar with the equations determining the phase and group velocity of composite waves, or a wavetrain that actually might represent a wavicle traveling through spacetime. In short, he boldly equated ω with ω = E/ħ and k with k = p/ħ, and all came out alright. It made perfect sense!

I’ve written enough about this. What I want to write about here is how this also makes sense for the situation at hand: a simple two-state system that depends on time only. So its phase is θ = ω·t = (E₀/ħ)·t. What’s E₀? It is the total energy of the system, including the equivalent energy of the particle’s rest mass and any potential energy that may be there because of the presence of one or the other force field. What about kinetic energy? Well… We said it: in this case, there is no translational or linear momentum, so p = 0. So our Platonic wavefunction reduces to:

a·e^{−iθ} = a·e^{−(i/ħ)·(E₀·t)}

Great! […] But… Well… No! The problem with this wavefunction is that it yields a constant probability. To be precise, when we take the absolute square of this wavefunction – which is what we do when calculating a probability from a wavefunction − we get P = a², always. The ‘normalization’ condition (so that’s the condition that probabilities have to add up to one) implies that P₁ = P₂ = a² = 1/2. Makes sense, you’ll say, but the problem is that this doesn’t reflect reality: these probabilities do not evolve over time and, hence, our ammonia molecule never ‘flips’ its spin direction from ‘up’ to ‘down’, or vice versa. In short, our wavefunction does not explain reality.

The problem is not unlike the problem we’d had with a similar function relating the momentum and the position of a particle. You’ll remember it: we wrote it as a·e^{−iθ} = a·e^{(i/ħ)·(p·x)}. [Note that we can write a·e^{−iθ} = a·e^{−(i/ħ)·(E₀·t − p·x)} = a·e^{−(i/ħ)·(E₀·t)}·e^{(i/ħ)·(p·x)}, so we can always split our wavefunction in a ‘time’ and a ‘space’ part.] But then we found that this wavefunction also yielded a constant and equal probability all over space, which implies our particle is everywhere (and, therefore, nowhere, really).

In quantum physics, this problem is solved by introducing uncertainty. Introducing some uncertainty about the energy, or about the momentum, is mathematically equivalent to saying that we’re actually looking at a composite wave, i.e. the sum of a finite or infinite set of component waves. So we have the same ω = E/ħ and k = p/ħ relations, but we apply them to n energy levels, or to some continuous range of energy levels ΔE. It amounts to saying that our wave function doesn’t have a specific frequency: it now has n frequencies, or a range of frequencies Δω = ΔE/ħ.

We know what that does: it ensures our wavefunction is being ‘contained’ in some ‘envelope’. It becomes a wavetrain, or a kind of beat note, as illustrated below:

[The animation also shows the difference between the group and phase velocity: the green dot shows the group velocity, while the red dot travels at the phase velocity.]

This begs the following question: what’s the uncertainty really? Is it an uncertainty in the energy, or is it an uncertainty in the wavefunction? I mean: we have a function relating the energy to a frequency. Introducing some uncertainty about the energy is mathematically equivalent to introducing uncertainty about the frequency. Of course, the answer is: the uncertainty is in both, so it’s in the frequency and in the energy and both are related through the wavefunction. So… Well… Yes. In some way, we’re chasing our own tail. 🙂

However, the trick does the job, and perfectly so. Let me summarize what we did in the previous post: we had the ammonia molecule, i.e. an NH₃ molecule, with the nitrogen ‘flipping’ across the hydrogens from time to time, as illustrated below:

This ‘flip’ requires energy, which is why we associate two energy levels with the molecule, rather than just one. We wrote these two energy levels as E₀ + A and E₀ − A. That assumption solved all of our problems. [Note that we don’t specify what the energy barrier really consists of: moving the center of mass obviously requires some energy, but it is likely that a ‘flip’ also involves overcoming some electrostatic forces, as shown by the reversal of the electric dipole moment in the illustration above.] To be specific, it gave us the following wavefunctions for the amplitude to be in the ‘up’ or ‘1’ state versus the ‘down’ or ‘2’ state respectively:

C₁ = (1/2)·e^{−(i/ħ)·(E₀ − A)·t} + (1/2)·e^{−(i/ħ)·(E₀ + A)·t}
C₂ = (1/2)·e^{−(i/ħ)·(E₀ − A)·t} − (1/2)·e^{−(i/ħ)·(E₀ + A)·t}

Both are composite waves. To be precise, they are the sum of two component waves with a temporal frequency equal to ω₁ = (E₀ − A)/ħ and ω₂ = (E₀ + A)/ħ respectively. [As for the minus sign in front of the second term in the wave equation for C₂, −1 = e^{±iπ}, so + (1/2)·e^{−(i/ħ)·(E₀ + A)·t} and − (1/2)·e^{−(i/ħ)·(E₀ + A)·t} are the same wavefunction: they only differ because their relative phase is shifted by ±π.] So the so-called base states of the molecule themselves are associated with two different energy levels: it’s not like one state has more energy than the other.

You’ll say: so what?

Well… Nothing. That’s it really. That’s all I wanted to say here. The absolute square of those two wavefunctions gives us those time-dependent probabilities above, i.e. the graph we started this post with. So… Well… Done!

You’ll say: where’s the ‘envelope’? Oh! Yes! Let me tell you. The C₁(t) and C₂(t) equations can be re-written as:

C₁(t) = e^{−(i/ħ)·E₀·t}·(1/2)·[e^{(i/ħ)·A·t} + e^{−(i/ħ)·A·t}]
C₂(t) = e^{−(i/ħ)·E₀·t}·(1/2)·[e^{(i/ħ)·A·t} − e^{−(i/ħ)·A·t}]

Now, remembering our rules for adding and subtracting complex conjugates (e^{iθ} + e^{−iθ} = 2·cosθ and e^{iθ} − e^{−iθ} = 2i·sinθ), we can re-write this as:

C₁(t) = e^{−(i/ħ)·E₀·t}·cos(A·t/ħ)
C₂(t) = i·e^{−(i/ħ)·E₀·t}·sin(A·t/ħ)

So there we are! We’ve got wave equations whose temporal variation is basically defined by E₀ but, on top of that, we have an envelope here: the cos(A·t/ħ) and sin(A·t/ħ) factor respectively. So their magnitude is no longer time-independent: both the phase as well as the amplitude now vary with time. The associated probabilities are the ones we plotted:

|C₁(t)|²= cos²[(A/ħ)·t], and
|C₂(t)|²= sin²[(A/ħ)·t].
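The ‘envelope’ form of the two amplitudes, i.e. C₁(t) = e^{−(i/ħ)·E₀·t}·cos(A·t/ħ) and C₂(t) = i·e^{−(i/ħ)·E₀·t}·sin(A·t/ħ), can be compared numerically with the original sum of two component waves. The sketch below does that; the values for E₀ and A, and the choice ħ = 1, are illustrative assumptions only.

```python
import cmath
import math

HBAR = 1.0   # assumption: natural units, ħ = 1
E0 = 10.0    # hypothetical average energy
A = 1.0      # hypothetical half-splitting between the two levels

def c_sum(t):
    """(C1, C2) written as the sum and difference of two component waves."""
    low = cmath.exp(-1j * (E0 - A) * t / HBAR)
    high = cmath.exp(-1j * (E0 + A) * t / HBAR)
    return 0.5 * (low + high), 0.5 * (low - high)

def c_envelope(t):
    """(C1, C2) written as a common phase factor times a real envelope."""
    carrier = cmath.exp(-1j * E0 * t / HBAR)
    return carrier * math.cos(A * t / HBAR), 1j * carrier * math.sin(A * t / HBAR)

# Largest difference between the two forms over a range of sample times.
max_error = 0.0
for step in range(200):
    t = 0.05 * step
    (s1, s2), (e1, e2) = c_sum(t), c_envelope(t)
    max_error = max(max_error, abs(s1 - e1), abs(s2 - e2))

print(max_error)
```

The two forms should agree up to rounding errors, which confirms that the slowly varying cosine and sine factors are what make the magnitudes, and hence the probabilities, depend on time.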

So, to summarize it all once more, allowing the nitrogen atom to push its way through the three hydrogens, so as to flip to the other side, thereby breaking the energy barrier, is equivalent to associating two energy levels to the ammonia molecule as a whole, thereby introducing some uncertainty, or indefiniteness as to its energy, and that, in turn, gives us the amplitudes and probabilities that we’ve just calculated. [And you may want to note here that the probabilities “sloshing back and forth”, or “dumping into each other” – as Feynman puts it – is the result of the varying magnitudes of our amplitudes, so that’s the ‘envelope’ effect. It’s only because the magnitudes vary in time that their absolute square, i.e. the associated probability, varies too.]

So… Well… That’s it. I think this and all of the previous posts served as a nice introduction to quantum physics. More in particular, I hope this post made you appreciate that the mathematical framework is not as horrendous as it often seems to be.

When thinking about it, it’s actually all quite straightforward, and it surely respects Occam’s principle of parsimony in philosophical and scientific thought, also known as Occam’s Razor: “When trying to explain something, it is vain to do with more what can be done with less.” So the math we need is the math we need, really: nothing more, nothing less. As I’ve said a couple of times already, Occam would have loved the math behind QM: the physics call for the math, and the math becomes the physics.

That’s what makes it beautiful. 🙂

Post scriptum:

One might think that the addition of a term in the argument in itself would lead to a beat note and, hence, a varying probability but, no! We may look at e^{−(i/ħ)·(E₀ + A)·t} as a product of two amplitudes:

e^{−(i/ħ)·(E₀ + A)·t} = e^{−(i/ħ)·E₀·t}·e^{−(i/ħ)·A·t}

But, when writing this all out, one just gets a cos(α·t+β·t) − i·sin(α·t+β·t), whose absolute square |cos(α·t+β·t) − i·sin(α·t+β·t)|² = 1. However, writing e^{−(i/ħ)·(E₀ + A)·t} as a product of two amplitudes in itself is interesting. We multiply amplitudes when an event consists of two sub-events. For example, the amplitude for some particle to go from s to x via some point a is written as:

〈 x | s 〉_{via a} = 〈 x | a 〉〈 a | s 〉

Having said that, the graph of the product is uninteresting: the real and imaginary part of the wavefunction are a simple sine and cosine function, and their absolute square is constant, as shown below.

Adding two waves with very different frequencies – A is a fraction of E₀ – gives a much more interesting pattern, like the one below, which shows an e^{−iαt} + e^{−iβt} = cos(αt) − i·sin(αt) + cos(βt) − i·sin(βt) = cos(αt) + cos(βt) − i·[sin(αt) + sin(βt)] pattern for α = 1 and β = 0.1.
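The contrast between multiplying and adding two complex exponentials can be sketched in a few lines of Python, using the α = 1 and β = 0.1 values mentioned above. The closed-form expression 2 + 2·cos((α − β)·t) for the absolute square of the sum is my own addition: it follows from writing out |e^{−iαt} + e^{−iβt}|².

```python
import cmath
import math

ALPHA = 1.0   # frequency of the first component (value used in the text)
BETA = 0.1    # frequency of the second component (value used in the text)

def product(t):
    """Product of the two amplitudes: e^{-iαt}·e^{-iβt}."""
    return cmath.exp(-1j * ALPHA * t) * cmath.exp(-1j * BETA * t)

def total(t):
    """Sum of the two amplitudes: e^{-iαt} + e^{-iβt}."""
    return cmath.exp(-1j * ALPHA * t) + cmath.exp(-1j * BETA * t)

def expected_sum_square(t):
    """Closed form for |e^{-iαt} + e^{-iβt}|²."""
    return 2.0 + 2.0 * math.cos((ALPHA - BETA) * t)

for t in (0.0, 1.0, 2.0, 3.0):
    print(f"t = {t:.1f}  |product|^2 = {abs(product(t)) ** 2:.3f}  "
          f"|sum|^2 = {abs(total(t)) ** 2:.3f}")
```

The absolute square of the product stays at one, while the absolute square of the sum swings between 0 and 4 at the difference frequency α − β.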

That doesn’t look like a beat note, does it? The graphs below, which use 0.5 and 0.01 for β respectively, are not typical beat notes either.

We get our typical ‘beat note’ only when we’re looking at a wave traveling in space, so then we involve the space variable x again, and the relations that come with it, i.e. a phase velocity v_p = ω/k = (E/ħ)/(p/ħ) = E/p = c²/v (read: all component waves travel at the same speed), and a group velocity v_g = dω/dk = v (read: the composite wave or wavetrain travels at the classical speed of our particle, so it travels with the particle, so to speak). That’s what I’ve shown numerous times already, but I’ll insert one more animation here, just to make sure you see what we’re talking about. [Credit for the animation goes to another site, one on acoustics, actually!]
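Those two velocity formulas can be checked numerically. The sketch below assumes natural units (c = ħ = 1), a made-up rest mass, and the relativistic dispersion relation ω(k) = √[(ħ·k·c)² + (m₀·c²)²]/ħ; the group velocity is estimated with a simple central difference rather than an exact derivative.

```python
import math

C = 1.0      # assumption: natural units, c = 1
HBAR = 1.0   # assumption: natural units, ħ = 1
M0 = 1.0     # hypothetical rest mass

def omega(k):
    """Dispersion relation from E² = (p·c)² + (m0·c²)² with E = ħ·ω, p = ħ·k."""
    return math.sqrt((HBAR * k * C) ** 2 + (M0 * C ** 2) ** 2) / HBAR

def phase_velocity(k):
    """v_p = ω/k."""
    return omega(k) / k

def group_velocity(k, dk=1e-6):
    """v_g = dω/dk, approximated by a central difference."""
    return (omega(k + dk) - omega(k - dk)) / (2 * dk)

k = 0.75                                               # hypothetical wavenumber
v_classical = (HBAR * k) * C ** 2 / (HBAR * omega(k))  # v = p·c²/E

print(v_classical, group_velocity(k), phase_velocity(k))
```

For these values the classical velocity is 0.6·c, the group velocity comes out the same (up to the accuracy of the finite difference), and the phase velocity equals c²/v, so the product of the two velocities is c².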

So what’s left? Nothing much. The only thing you may want to do is to continue thinking about that wavefunction. It’s tempting to think it actually is the particle, somehow. But it isn’t. So what is it then? Well… Nobody knows, really, but I like to think it does travel with the particle. So it’s like a fundamental property of the particle. We need it every time when we try to measure something: its position, its momentum, its spin (i.e. angular momentum) or, in the example of our ammonia molecule, its orientation in space. So the funny thing is that, in quantum mechanics,

We can measure probabilities only, so there’s always some randomness. That’s how Nature works: we don’t really know what’s happening. We don’t know the internal wheels and gears, so to speak, or the ‘hidden variables’, as one interpretation of quantum mechanics would say. In fact, the most commonly accepted interpretation of quantum mechanics says there are no ‘hidden variables’.
But then, as Polonius famously put it, there is a method in this madness, and the pioneers – I mean Werner Heisenberg, Louis de Broglie, Niels Bohr, Paul Dirac, etcetera – discovered it. All probabilities can be found by taking the square of the absolute value of a complex-valued wavefunction (often denoted by Ψ), whose argument, or phase (θ), is given by the de Broglie relations ω = E/ħ and k = p/ħ:

θ = (ω·t − k ∙x) = (E/ħ)·t − (p/ħ)·x

That should be obvious by now, as I’ve written dozens of posts on this by now. 🙂 I still have trouble interpreting this, however—and I am not ashamed, because the Great Ones I just mentioned have trouble with that too. But let’s try to go as far as we can by making a few remarks:

Adding two terms in math implies the two terms should have the same dimension: we can only add apples to apples, and oranges to oranges. We shouldn’t mix them. Now, the (E/ħ)·t and (p/ħ)·x terms are actually dimensionless: they are pure numbers. So that’s even better. Just check it: energy is expressed in newton·meter (force over distance, remember?) or electronvolts (1 eV = 1.6×10⁻¹⁹ J = 1.6×10⁻¹⁹ N·m); Planck’s constant, as the quantum of action, is expressed in J·s or eV·s; and the unit of (linear) momentum is 1 N·s = 1 kg·m/s. E/ħ gives a number expressed per second, and p/ħ a number expressed per meter. Therefore, multiplying it by t and x respectively gives us a dimensionless number indeed.
It’s also an invariant number, which means we’ll always get the same value for it. As mentioned above, that’s because the four-vector product p_μx_μ= E·t − p∙x is invariant: it doesn’t change when analyzing a phenomenon in one reference frame (e.g. our inertial reference frame) or another (i.e. in a moving frame).
Now, Planck’s quantum of action h or ħ (they only differ by a factor 2π: h is the action associated with one full cycle, while ħ is the action associated with one radian) is the quantum of energy really. Indeed, if “energy is the currency of the Universe”, and it’s real and/or virtual photons that are exchanging it, then it’s good to know the currency unit is h, i.e. the energy that’s associated with one cycle of a photon.
It’s not only time and space that are related, as evidenced by the fact that t − x itself is an invariant four-vector: E and p are related too, of course! They are related through the classical velocity of the particle that we’re looking at: E/p = c²/v and, therefore, we can write: E·β = p·c, with β = v/c, i.e. the relative velocity of our particle, as measured as a ratio of the speed of light. Now, I should add that the t − x four-vector is invariant only if we measure time and space in equivalent units. Otherwise, we have to write c·t − x. If we do that, so our unit of distance becomes c meter, rather than one meter, or our unit of time becomes the time that is needed for light to travel one meter, then c = 1, and the E·β = p·c becomes E·β = p, which we also write as β = p/E: the ratio of the momentum and the energy of our particle is its (relative) velocity.

θ = (ω·t − k∙x) = (E·t − p∙x)/ħ

Now, θ is the argument of a wavefunction, and we can always re-scale such argument by multiplying or dividing it by some constant. It’s just like writing the argument of a wavefunction as v·t–x or (v·t–x)/v = t –x/v with v the velocity of the waveform that we happen to be looking at. [In case you have trouble following this argument, please check the post I did for my kids on waves and wavefunctions.] Now, the energy conservation principle tells us the energy of a free particle won’t change. [Just to remind you, a ‘free particle’ means it is present in a ‘field-free’ space, so our particle is in a region of uniform potential.] You see what I am going to do now: we can, in this case, treat E as a constant, and divide E·t − p·x by E, so we get a re-scaled phase for our wavefunction, which I’ll write as:

φ = (E·t − p·x)/E = t − (p/E)·x = t − β·x

Now that’s the argument of a wavefunction with the argument expressed in distance units. Alternatively, we could also look at p as some constant, as there is no variation in potential energy that will cause a change in momentum, i.e. in kinetic energy. We’d then divide by p and we’d get (E·t − p·x)/p = (E/p)·t − x = t/β − x, which amounts to the same, as we can always re-scale by multiplying it with β, which would then yield the same t − β·x argument.

Φ(φ) = a·e^−iφ = a·e^{−i(t − β·x)}

Of course, the analysis above does not include uncertainty. Our information on the energy and the momentum of our particle will be incomplete: we’ll write E = E₀ ± σ_E, and p = p₀ ± σ_p. [I am a bit tired of using the Δ symbol, so I am using the σ symbol here, which denotes a standard deviation of some density function. It underlines the probabilistic, or statistical, nature of our approach.] But, including that, we’ve pretty much explained what quantum physics is about here.

You just need to get used to that complex exponential: e^−iφ = cos(−φ) + i·sin(−φ) = cos(φ) − i·sin(φ). Of course, it would have been nice if Nature had given us a simple sine or cosine function. [Remember the sine and cosine function are actually the same, except for a phase difference of 90 degrees: sin(φ) = cos(π/2 − φ) = cos(φ − π/2). So we can always go from one to the other by shifting the origin of our axis.] But… Well… As we’ve shown so many times already, a real-valued wavefunction doesn’t explain the interference we observe, be it interference of electrons or whatever other particles or, for that matter, the interference of electromagnetic waves themselves, which, as you know, we also need to look at as a stream of photons, i.e. light quanta, rather than as some kind of infinitely flexible aether that’s undulating, like water or air.
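
If you want to convince yourself of that identity, a few lines of Python will do. This is only a sketch: the `phasor` helper and the value of φ are arbitrary choices of mine.

```python
import cmath
import math

def phasor(phi, a=1.0):
    """The elementary wavefunction a·e^(−iφ) as a complex number."""
    return a * cmath.exp(-1j * phi)

phi = 0.75   # arbitrary phase, in radians
z = phasor(phi)

print(z.real, math.cos(phi))     # real part: cos(φ)
print(z.imag, -math.sin(phi))    # imaginary part: −sin(φ)
print(abs(z))                    # the modulus stays equal to a
print(math.sin(phi), math.cos(math.pi / 2 - phi))  # sine as a shifted cosine
```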

So… Well… Just accept that e^−iφ is a very simple periodic function, consisting of two sine waves rather than just one, as illustrated below.

And then you need to think of stuff like this (the animation is taken from Wikipedia), but then with a projection of the sine of those phasors too. It’s all great fun, so I’ll let you play with it now. 🙂

Quantum math: states as vectors, and apparatuses as operators

Pre-script (dated 26 June 2020): Our ideas have evolved into a full-blown realistic (or classical) interpretation of all things quantum-mechanical. In addition, I note the dark force has amused himself by removing some material. So no use to read this. Read my recent papers instead. 🙂

Original post:

I actually wanted to write about the Hamiltonian matrix. However, I realize that, before I can serve the plat de résistance, we need to review or introduce some more concepts and ideas. It all revolves around the same theme: working with states is like working with vectors, but so you need to know how exactly. Let’s go for it. 🙂

In my previous posts, I repeatedly said that a set of base states is like a coordinate system. A coordinate system allows us to describe (i.e. uniquely identify) vectors in an n-dimensional space: we associate a vector with a set of real numbers, like x, y and z, for example. Likewise, we can describe any state in terms of a set of complex numbers – amplitudes, really – once we’ve chosen a set of base states. We referred to this set of base states as a ‘representation’. For example, if our set of base states is +S, 0S and −S, then any state φ can be defined by the amplitudes C₊ = 〈 +S | φ 〉, C₀ = 〈 0S | φ 〉, and C₋ = 〈 −S | φ 〉.

We have to choose some representation (but we are free to choose which one) because, as I demonstrated when doing a practical example (see my description of muon decay in my post on how to work with amplitudes), we’ll usually want to calculate something like the amplitude to go from one state to another – which we denoted as 〈 χ | φ 〉 – and we’ll do that by breaking it up. To be precise, we’ll write that amplitude 〈 χ | φ 〉 – i.e. the amplitude to go from state φ to state χ (you have to read this thing from right to left, like Hebrew or Arabic) – as the following sum:

〈 χ | φ 〉 = ∑〈 χ | i 〉〈 i | φ 〉 (summing over all i)

So that’s a sum over a complete set of base states (that’s why I write all i under the summation symbol ∑). We discussed this rule in our presentation of the ‘Laws’ of quantum math.

Now we can play with this. As χ can be defined in terms of the chosen set of base states too, it’s handy to know that 〈 χ | i 〉 and 〈 i | χ 〉 are each other’s complex conjugates – we write this as: 〈 χ | i 〉 = 〈 i | χ 〉* – so if we have one, we have the other (we can also write: 〈 i | χ 〉* = 〈 χ | i 〉). In other words, if we have all C_i = 〈 i | φ 〉 and all D_i = 〈 i | χ 〉, i.e. the ‘components’ of both states in terms of our base states, then we can calculate 〈 χ | φ 〉 as:

〈 χ | φ 〉 = ∑ D_i*C_i = ∑〈 χ | i 〉〈 i | φ 〉,

provided we make sure we do the summation over a complete set of base states. For example, if we’re looking at the angular momentum of a spin-1/2 particle, like an electron or a proton, then we’ll have two base states, +ħ/2 and −ħ/2, so then we’ll have only two terms in our sum, but the spin number (j) of a cobalt nucleus is 7/2, so if we were looking at the angular momentum of a cobalt nucleus, we’d have eight (2·j + 1) base states and, hence, eight terms when doing the sum. So it’s very much like working with vectors, indeed, and that’s why states are often referred to as state vectors. So now you know that term too. 🙂
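
To make the sum a bit more tangible, here is a minimal sketch in Python. The amplitudes C_i and D_i are made-up numbers for a two-state (spin-1/2) system, and the function names are mine, not standard ones.

```python
import math

def amplitude(D, C):
    """<chi|phi> = sum over i of D_i* · C_i, with D_i = <i|chi>, C_i = <i|phi>."""
    return sum(d.conjugate() * c for d, c in zip(D, C))

def number_of_base_states(j):
    """Number of base states (2·j + 1) for spin number j."""
    return int(2 * j + 1)

# Made-up, normalized amplitudes for a spin-1/2 particle.
C = [0.6, 0.8j]                              # components of phi
D = [1 / math.sqrt(2), 1 / math.sqrt(2)]     # components of chi

amp = amplitude(D, C)
prob = abs(amp) ** 2      # probability to go from phi to chi
print(amp, prob)
print(number_of_base_states(0.5), number_of_base_states(3.5))
```

Note that the complex conjugate is taken of the D_i only, not of the C_i.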

However, the similarities run even deeper, and we’ll explore all of them in this post. You may or may not remember that your math teacher actually also defined ordinary vectors in three-dimensional space in terms of base vectors e_i, defined as: e₁ = [1, 0, 0], e₂ = [0, 1, 0] and e₃ = [0, 0, 1]. You may also remember that the units along the x, y and z-axis didn’t have to be the same – we could, for example, measure in cm along the x-axis, but in inches along the z-axis, even if that’s not very convenient to calculate stuff – but that it was very important to ensure that the base vectors were a set of orthogonal vectors. In any case, we’d choose our set of orthogonal base vectors and write all of our vectors as:

A = A_x·e₁ + A_y·e₂ + A_z·e₃

That’s simple enough. In fact, one might say that the equation above actually defines coordinates. However, there’s another way of defining them. We can write A_x, A_y, and A_z as vector dot products, aka scalar vector products (as opposed to cross products, or vector products tout court). Check it:

A_x = A·e₁, A_y = A·e₂, and A_z = A·e₃.

This actually allows us to re-write the vector dot product A·B in a way you probably haven’t seen before. Indeed, you’d usually calculate A·B as |A|∙|B|·cosθ = A∙B·cosθ (A and B are the magnitudes of the vectors A and B respectively) or, quite simply, as A_xB_x + A_yB_y + A_zB_z. However, using the dot products above, we can now also write it as:

B·A = ∑(B·e_i)(e_i·A) (summing over i = 1, 2, 3)

We deliberately wrote B·A instead of A∙B because, while the mathematical similarity with the

〈 χ | φ 〉 = ∑〈 χ | i 〉〈 i | φ 〉

equation is obvious, B·A = A·B but 〈 χ | φ 〉 ≠ 〈 φ | χ 〉. Indeed, 〈 χ | φ 〉 and 〈 φ | χ 〉 are complex conjugates – so 〈 χ | φ 〉 = 〈 φ | χ 〉* – but they’re not equal. So we’ll have to watch the order when working with those amplitudes. That’s because we’re working with complex numbers instead of real numbers. Indeed, it’s only because the A·B dot product involves real numbers, whose complex conjugate is the same, that we have that commutativity in the real vector space. Apart from that – so apart from having to carefully check the order of our products – the correspondence is complete.
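
The difference between the two cases shows up immediately when you try it with numbers. The sketch below uses arbitrary example vectors and amplitudes; the helper names are my own.

```python
import math

BASIS = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]   # e1, e2, e3

def dot(u, v):
    """Ordinary real dot product."""
    return sum(a * b for a, b in zip(u, v))

def dot_via_basis(B, A):
    """B·A written as the sum over i of (B·e_i)(e_i·A)."""
    return sum(dot(B, e) * dot(e, A) for e in BASIS)

def amplitude(bra, ket):
    """<bra|ket> = sum over i of bra_i* · ket_i."""
    return sum(b.conjugate() * k for b, k in zip(bra, ket))

A = (1.0, 2.0, 3.0)
B = (4.0, -1.0, 0.5)
print(dot(B, A), dot(A, B), dot_via_basis(B, A))   # all three are equal

chi = [0.6, 0.8j]                                  # made-up components <i|chi>
phi = [1j / math.sqrt(2), 1 / math.sqrt(2)]        # made-up components <i|phi>
chi_phi = amplitude(chi, phi)
phi_chi = amplitude(phi, chi)
print(chi_phi, phi_chi)   # complex conjugates of each other, not equal
```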

Let me mention another similarity here. As mentioned above, our base vectors e_i had to be orthogonal. We can write this condition as:

e_i·e_j = δ_ij, with δ_ij= 0 if i ≠ j, and 1 if i = j.

Now, our first quantum-mechanical rule says the same:

〈 i | j 〉 = δ_ij, with δ_ij= 0 if i ≠ j, and 1 if i = j.

So our set of base states also has to be ‘orthogonal’, which is the term you’ll find in physics textbooks, although – as evidenced from our discussion on the base states for measuring angular momentum – one should not try to give any geometrical interpretation here: +ħ/2 and −ħ/2 (so that’s spin ‘up’ and ‘down’ respectively) are not ‘orthogonal’ in any geometric sense, indeed. It’s just that pure states, i.e. base states, are separate, which we write as: 〈 ‘up’ | ‘down’ 〉 = 〈 ‘down’ | ‘up’ 〉 = 0 and 〈 ‘up’ | ‘up’ 〉 = 〈 ‘down’ | ‘down’ 〉 = 1. It just means they are different base states, and so it’s one or the other. For our +S, 0S and −S example, we’d have nine such amplitudes, and we can organize them in a little matrix:

〈 +S | +S 〉 = 1   〈 +S | 0S 〉 = 0   〈 +S | −S 〉 = 0
〈 0S | +S 〉 = 0   〈 0S | 0S 〉 = 1   〈 0S | −S 〉 = 0
〈 −S | +S 〉 = 0   〈 −S | 0S 〉 = 0   〈 −S | −S 〉 = 1

In fact, just like we defined the base vectors e_i as e₁ = [1, 0, 0], e₂ = [0, 1, 0] and e₃ = [0, 0, 1] respectively, we may say that the matrix above, which states exactly the same as the 〈 i | j 〉 = δ_ij rule, can serve as a definition of what base states actually are. [Having said that, it’s obvious we like to believe that base states are more than just mathematical constructs: we’re talking reality here. The angular momentum as measured in the x-, y- or z-direction, or in whatever direction, is more than just a number.]

OK. You get this. In fact, you’re probably getting impatient because this is too simple for you. So let’s take another step. We showed that the 〈 χ | φ 〉 = ∑〈 χ | i 〉〈 i | φ 〉 and B·A = ∑(B·e_i)(e_i·A) are structurally equivalent – from a mathematical point of view, that is – but B and A are separate vectors, while 〈 χ | φ 〉 is just a complex number. Right?

Well… No. We can actually analyze the bra and the ket in the 〈 χ | φ 〉 bra-ket as separate pieces too. Moreover, we’ll show they are actually state vectors too, even if the bra, i.e. 〈 χ |, and the ket, i.e. | φ 〉, are ‘unfinished pieces’, so to speak. Let’s be bold. Let’s just cut the 〈 χ | φ 〉 = ∑〈 χ | i 〉〈 i | φ 〉 by writing:

| φ 〉 = ∑ | i 〉〈 i | φ 〉

Huh?

Yes. That’s the power of Dirac’s bra-ket notation: we can just drop symbols left or right. It’s quite incredible. But, of course, the question is: so what does this actually mean? Well… Don’t rack your brain. I’ll tell you. We define | φ 〉 as a state vector because we define | i 〉 as a (base) state vector. Look at it this way: we wrote the 〈 +S | φ 〉, 〈 0S | φ 〉 and 〈 −S | φ 〉 amplitudes as C₊, C₀, C₋, respectively, so we can write the equation above as:

| φ 〉 = ∑ | i 〉 C_i

So we’ve got a sum of products here, and it’s just like A = A_x·e₁ + A_y·e₂ + A_z·e₃. Just substitute the A_i coefficients for C_i and the e_i base vectors for the | i 〉 base states. We get:

| φ 〉 = |+S〉 C₊ + |0S〉 C₀ + |−S〉 C₋

Of course, you’ll wonder what those terms mean: what does it mean to ‘multiply’ C₊ (remember: C₊ is some complex number) by |+S〉? Be patient. Just wait. You’ll understand when we do some examples, so when you start working with this stuff. You’ll see it all makes sense—later. 🙂

Of course, we’ll have a similar equation for | χ 〉, and so if we write 〈 i | χ 〉 as D_i, then we can write | χ 〉 = ∑ | i 〉〈 i | χ 〉 as | χ 〉 = ∑ | i 〉 D_i.

So what? Again: be patient. We know that 〈 χ | i 〉 = 〈 i | χ 〉*, so our second equation above becomes:

〈 χ | = ∑ D_j*〈 j | (summing over all j)

You’ll have two questions now. The first is the same as the one above: what does it mean to ‘multiply’, let’s say, D₀* (i.e. the complex conjugate of D₀, so if D₀= a + ib, then D₀* = a − ib) with 〈0S|? The answer is the same: be patient. 🙂 Your second question is: why do I use another symbol for the index here? Why j instead of i? Well… We’ll have to re-combine stuff, so it’s better to keep things separate by using another symbol for the same index. 🙂

In fact, let’s re-combine stuff right now, in exactly the same way as we took it apart: we just write the two things right next to each other. We get the following:

〈 χ | φ 〉 = ∑ D_j*〈 j | i 〉C_i (summing over all i and j) = ∑ D_i*C_i (summing over all i)

What? Is that it? So we went through all of this hocus-pocus just to find the same equation as we started out with?

Yes. I had to take you through this so you get used to juggling all those symbols, because that’s what we’ll do in the next post. Just think about it and give yourself some time. I know you’ve probably never ever handled such exercise in symbols before – I haven’t, for sure! – but it all makes sense: we cut and paste. It’s all great! 🙂 [Oh… In case you wonder about the transition from the sum involving i and j to the sum involving i only, think about the Kronecker expression: 〈 j | i 〉 = δ_ij, with δ_ij= 0 if i ≠ j, and 1 if i = j, so most of the terms are zero.]
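
The collapse of the double sum mentioned in the bracketed note above can be checked by brute force. This is a minimal sketch with made-up amplitudes for a three-state system; the function names are my own.

```python
import math

def kronecker(i, j):
    """<j|i> for orthonormal base states: 1 if i equals j, 0 otherwise."""
    return 1 if i == j else 0

def double_sum(D, C):
    """Sum over all i and j of D_j* · <j|i> · C_i."""
    return sum(D[j].conjugate() * kronecker(j, i) * C[i]
               for i in range(len(C)) for j in range(len(D)))

def single_sum(D, C):
    """Sum over i of D_i* · C_i."""
    return sum(d.conjugate() * c for d, c in zip(D, C))

# Made-up, normalized amplitudes for a spin-one (three-state) system.
C = [0.5, 0.5j, 1 / math.sqrt(2)]
D = [1 / math.sqrt(3)] * 3

print(double_sum(D, C), single_sum(D, C))
```

Out of the nine terms in the double sum, only the three with i = j survive.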

To summarize the whole discussion, note that the expression above is completely analogous with the B·A = B_xA_x + B_yA_y + B_zA_z formula. The only difference is that we’re talking complex numbers here, so we need to watch out. We have to watch the order of stuff, and we can’t use the D_i numbers themselves: we have to use their complex conjugates D_i*. But, for the rest, we’re all set! 🙂 If we’ve got a set of base states, then we can define any state in terms of a set of ‘coordinates’ or ‘coefficients’ – i.e. the C_i or D_i numbers for the φ or χ example above – and we can then calculate the amplitude to go from one state to another as:

〈 χ | φ 〉 = ∑ D_i*C_i

In case you’d get confused, just take the original equation:

〈 χ | φ 〉 = ∑〈 χ | i 〉〈 i | φ 〉

The two equations are fully equivalent.

[…]

So we just went through all of the shit above so as to show that structural similarity with vector spaces?

Yes. It’s important. You just need to remember that we may have two, three, four, five,… or even an infinite number of base states depending on the situation we’re looking at, and what we’re trying to measure. I am sorry I had to take you through all of this. However, there’s more to come, and so you need this baggage. We’ll take the next step now, and that is to introduce the concept of an operator.

Look at the middle term in that expression above. Let me copy it:

∑ D_j*〈 j | i 〉C_i (summing over all i and j)

We’ve got three factors in each term of that double sum (a double sum is a sum involving two indices, which is what we have here: i and j). When we have two indices like that, one thinks of matrices. That’s easy to do here, because we represented that 〈 i | j 〉 = δ_ij equation as a matrix too! To be precise, we presented it as the identity matrix, and a simple substitution allows us to re-write our equation above as:

〈 χ | φ 〉 = [D₊* D₀* D₋*]·[δ_ij]·[C₊ C₀ C₋]ᵀ, i.e. a row of D_j* values, times the 3×3 identity matrix, times a column of C_i values.

I must assume you’re shaking your head in disbelief now: we’ve expanded a simple amplitude into a product of three matrices now. Couldn’t we just stick to that sum, i.e that vector dot product ∑ D_i*C_i? What’s next? Well… I am afraid there’s a lot more to come. For starters, we’ll take that idea of ‘putting something in the middle’ to the next level by going back to our Stern-Gerlach filters and whatever other apparatus we can think of. Let’s assume that, instead of some filter S or T, we’ve got something more complex now, which we’ll denote by A. [Don’t confuse it with our vectors: we’re talking an apparatus now, so you should imagine some beam of particles, polarized or not, entering it, going through, and coming out.]

We’ll stick to the symbols we used already, and so we’ll just assume a particle enters into the apparatus in some state φ, and that it comes out in some state χ. Continuing the example of spin-one particles, and assuming our beam has not been filtered – so, using lingo, we’d say it’s unpolarized – we’d say there’s a probability of 1/3 for being either in the ‘plus’, ‘zero’, or ‘minus’ state with respect to whatever representation we’d happen to be working with, and the related amplitudes would be 1/√3. In other words, we’d say that φ is defined by C₊ = 〈 +S | φ 〉, C₀ = 〈 0S | φ 〉, and C₋ = 〈 −S | φ 〉, with C₊ = C₀ = C₋ = 1/√3. In fact, using that | φ 〉 = |+S〉 C₊ + |0S〉 C₀ + |−S〉 C₋ expression we invented above, we’d write: | φ 〉 = (1/√3)|+S〉 + (1/√3)|0S〉 + (1/√3)|−S〉 or, using ‘matrices’ (just a row and a column, really):

However, you don’t need to worry about that now. The new big thing is the following expression:

〈 χ | A | φ〉

It looks simple enough: φ to A to χ. Right? Well… Yes and no. The question is: what do you do with this? How would we take its complex conjugate, for example? And if we know how to do that, would it be equal to 〈 φ | A | χ〉?

You guessed it: we’ll have to take it apart, but how? We’ll do this using another fantastic abstraction. Remember how we took Dirac’s 〈 χ | φ 〉 bra-ket apart by writing | φ 〉 = ∑ | i 〉〈 i | φ 〉? We just dropped the 〈 χ left and right in our 〈 χ | φ 〉 = ∑〈 χ | i 〉〈 i | φ 〉 expression. We can go one step further now, and drop the φ 〉 left and right in our | φ 〉 = ∑ | i 〉〈 i | φ 〉 expression. We get the following wonderful thing:

| = ∑ | i 〉〈 i | over all base states i

With characteristic humor, Feynman calls this ‘The Great Law of Quantum Mechanics’ and, frankly, there’s actually more than one grain of truth in this. 🙂

Now, if we apply this ‘Great Law’ to our 〈 χ | A | φ〉 expression – we should apply it twice, actually – we get:

〈 χ | A | φ 〉 = ∑〈 χ | i 〉〈 i | A | j 〉〈 j | φ 〉 (summing over all i and j)

As Feynman points out, it’s easy to add another apparatus in series. We just write:

〈 χ | BA | φ 〉 = ∑〈 χ | i 〉〈 i | B | j 〉〈 j | A | k 〉〈 k | φ 〉 (summing over all i, j and k)

Just put a | bar between B and A and apply the same trick. The | bar is really like a factor 1 in multiplication. However, that’s all great fun but it doesn’t solve our problem. Our ‘Great Law’ allows us to sort of ‘resolve’ our apparatus A in terms of base states, as we now have 〈 i | A | j 〉 in the middle, rather than 〈 χ | A | φ〉 but, again, how do we work with that?

Well… The answer will surprise you. Rather than trying to break this thing up, we’ll say that the apparatus A is actually being described, or defined, by the nine 〈 i | A | j 〉 amplitudes. [There are nine for this example, but four only for the example involving spin-1/2 particles, of course.] We’ll call those amplitudes, quite simply, the matrix of amplitudes, and we’ll often denote it by A_ij.
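
Once you accept that an apparatus is just a matrix of amplitudes, the sums become plain bookkeeping. The sketch below assumes two entirely hypothetical spin-one apparatuses: A, which only lets the +S state through, and B, which swaps the +S and 0S states. The function names are mine.

```python
import math

def matrix_element(D, A, C):
    """<chi|A|phi> = sum over i, j of D_i* · A_ij · C_j."""
    n = len(C)
    return sum(D[i].conjugate() * A[i][j] * C[j]
               for i in range(n) for j in range(n))

def matmul(B, A):
    """Matrix product BA, i.e. (BA)_ik = sum over j of B_ij · A_jk."""
    n = len(A)
    return [[sum(B[i][j] * A[j][k] for j in range(n)) for k in range(n)]
            for i in range(n)]

def series_element(D, B, A, C):
    """<chi|BA|phi> as the triple sum over i, j and k."""
    n = len(C)
    return sum(D[i].conjugate() * B[i][j] * A[j][k] * C[k]
               for i in range(n) for j in range(n) for k in range(n))

C = [1 / math.sqrt(3)] * 3       # phi: equal amplitudes for +S, 0S, −S
D_plus = [1.0, 0.0, 0.0]         # chi = +S
D_zero = [0.0, 1.0, 0.0]         # chi = 0S

A = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]   # hypothetical: passes +S only
B = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]   # hypothetical: swaps +S and 0S

through_A = matrix_element(D_plus, A, C)
through_BA = series_element(D_zero, B, A, C)
via_product = matrix_element(D_zero, matmul(B, A), C)
print(through_A, through_BA, via_product)
```

Putting two apparatuses in series amounts to multiplying their matrices, with the one the particle meets first (A) on the right.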

Now, I wanted to talk about operators here. The idea of an operator comes up when we’re creative again, and when we drop the 〈 χ | state from the 〈 χ | A | φ〉 expression. We write:

| ψ 〉 = A | φ 〉

So now we think of the particle entering the ‘apparatus’ A in the state ϕ and coming out of A in some state ψ (‘psi’). We can generalize this and think of it as an ‘operator’, which Feynman intuitively defines as follows:

“The symbol A is neither an amplitude, nor a vector; it is a new kind of thing called an operator. It is something which “operates on” a state to produce a new state.”

But… Wait a minute! | ψ 〉 is not the same as 〈 χ |. Why can we do that substitution? We can only do it because any state ψ and χ are related through that other ‘Law’ of quantum math:

Combining the two shows our ‘definition’ of an operator is OK. We should just note that it’s an ‘open’ equation until it is completed with a ‘bra’, i.e. a state like 〈 χ |, so as to give the 〈 χ | ψ〉 = 〈 χ | A | φ〉 type of amplitude that actually means something. In practical terms, that means our operator or our apparatus doesn’t mean much as long as we don’t measure what comes out, so then we choose some set of base states, i.e. a representation, which allows us to describe the final state, i.e. 〈 χ |.
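
In terms of components, ‘operating on a state’ is nothing but a matrix acting on a column of amplitudes, and ‘completing it with a bra’ is the sum we had before. A minimal sketch, with a made-up 2×2 matrix and made-up amplitudes for a two-state system:

```python
import math

def apply_operator(A, C):
    """Components of |psi> = A|phi>: psi_i = sum over j of A_ij · C_j."""
    return [sum(A[i][j] * C[j] for j in range(len(C))) for i in range(len(A))]

def amplitude(D, C):
    """<chi|psi> = sum over i of D_i* · psi_i."""
    return sum(d.conjugate() * c for d, c in zip(D, C))

A = [[0, 1j], [-1j, 0]]                      # hypothetical apparatus
C = [0.6, 0.8j]                              # components of phi
D = [1 / math.sqrt(2), 1 / math.sqrt(2)]     # components of chi

psi = apply_operator(A, C)                   # the 'open' result A|phi>
closed = amplitude(D, psi)                   # <chi|psi>

# The same number computed directly as <chi|A|phi>.
direct = sum(D[i].conjugate() * A[i][j] * C[j]
             for i in range(2) for j in range(2))
print(psi, closed, direct)
```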

[…]

Well… Folks, that’s it. I know this was mighty abstract, but the next posts should bring things back to earth again. I realize it’s only by working examples and doing exercises that one can get some kind of ‘feel’ for this kind of stuff, so that’s what we’ll have to go through now. 🙂

Taking the magic out of God’s number: some additional reflections

Note: I have published a paper that is very coherent and fully explains this so-called God-given number. There is nothing magical about it. It is just a scaling constant. Check it out: The Meaning of the Fine-Structure Constant. No ambiguity. No hocus-pocus.

Jean Louis Van Belle, 23 December 2018

Original post:

In my previous post, I explained why the fine-structure constant α is not a ‘magical’ number, even if it relates all fundamental properties of the electron: its mass, its energy, its charge, its radius, its photon scattering cross-section (i.e. the Bohr radius, or the size of the atom really) and, finally, the coupling constant for photon-electron interactions. The key to such understanding of α was the model of an electron as a tiny ball of charge. As such, we have two energy formulas for it. One is the energy that’s needed to assemble the charge from infinitely dispersed infinitesimal charges, which we denoted as U_elec. The other formula is the energy of the field of the tiny ball of charge, which we denoted as E_elec.

The formula for E_elec is calculated using the formula for the field momentum of a moving charge and, using the m = E/c² mass-energy equivalence relationship, is equivalent to the electromagnetic mass. We went through the derivation in our previous post, so let me just jot down the result:

E_elec = (2/3)·(e²/a)

The second formula depends on what ball of charge we’re thinking of, because the formulas for a charged sphere and a spherical shell of charge are different: both have the same structure as the relationship above (so the energy is also proportional to the square of the electron charge and inversely proportional to the radius a), but the constant of proportionality is different. For a sphere of charge, we write:

U_elec = (3/5)·(e²/a)

For a spherical shell of charge we write:

U_elec = (1/2)·(e²/a)

To compare the formulas, you need to note that the square of the electron charge e in the formula for the field energy is equal to e² = q_e²/4πε₀ = k_e·q_e². So we multiply the square of the actual electron charge by the Coulomb constant k_e = 1/4πε₀. As you can see, the three formulas have exactly the same form then. It’s just the proportionality constant that’s different: it’s 2/3, 3/5 and 1/2 respectively. It’s interesting to quickly reflect on the dimensions here: k_e ≈ 9×10⁹ N·m²/C², so e² is expressed in N·m². That makes the units come out alright, as we divide by a (so that’s in meter) and so we get the energy in joule (which is newton·meter). In fact, now that we’re here, let’s quickly calculate the value of e²: it’s that k_e·q_e² product, so it’s equal to 2.3×10⁻²⁸ N·m². We can quickly check this value because we know that the classical electron radius is equal to:

r₀ = e²/(m_e·c²)

So we divide 2.3×10⁻²⁸ N·m² by m_e·c² ≈ 8.2×10⁻¹⁴ J, so we get r₀ ≈ 2.82×10⁻¹⁵ m. So we’re spot on! Why did I do this check? Not really to check what I wrote. It’s more to show what’s going on. We’ve got yet another formula relating the energy and the radius of an electron here, so now we have three. In fact we have more because the formula for U_elec depends on the finer details of our model for the electron (sphere versus shell, uniform versus non-uniform distribution):

E_elec = (2/3)·(e²/a): This is the formula for the energy of the field, so we may call it its external energy.
U_elec= (3/5)·(e²/a), or U_elec= (1/2)·(e²/a): This is the energy needed to assemble our electron, so we might, perhaps, call it its internal energy. The first formula assumes our electron is a uniformly charged sphere. The second assumes all charges sit on the surface of the sphere. If we drop the assumption of the charge having to be uniformly distributed, we’ll find yet another formula.
m_ec²= e²/r₀: This is the energy associated with the so-called classical electron radius (r₀) and the electron’s rest mass (m_e).
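
The numerical check of e² and r₀ above is easily reproduced, and the same few lines show how each of the three coefficients leads to a different radius if we insist the energy be equal to m_e·c². This is just a sketch: the constants are the usual CODATA values, and the dictionary of radii is my own illustration.

```python
import math

K_E = 8.9875517923e9      # Coulomb constant in N·m²/C²
Q_E = 1.602176634e-19     # electron charge in C
M_E = 9.1093837015e-31    # electron rest mass in kg
C = 299_792_458.0         # speed of light in m/s

e2 = K_E * Q_E ** 2       # e² = k_e·q_e², in N·m²
rest_energy = M_E * C ** 2
r0 = e2 / rest_energy     # classical electron radius, from m_e·c² = e²/r₀

# Radius a that each formula would need to give an energy equal to m_e·c².
coefficients = {"field (2/3)": 2 / 3, "sphere (3/5)": 3 / 5, "shell (1/2)": 1 / 2}
radii = {name: k * r0 for name, k in coefficients.items()}

print(e2, rest_energy, r0)
for name, a in radii.items():
    print(name, a)
```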

In our previous posts, we assumed the last equation was the right one. Why? Because it’s the one that’s been verified experimentally. The discrepancies between the various proportionality coefficients – i.e. the difference between 2/3 and 1, basically – are to be explained because of the binding forces within the electron, without which the electron would just ‘explode’, as the French physicist and polymath Henri Poincaré famously put it. Indeed, if the electron is a little ball of negative charge, the repulsive forces between its parts should rip it apart. So we will not say anything more about this. You can have fun yourself by googling all the various theories that try to model these binding forces. [I may do the same some day, but now I’ve got other priorities: I want to move to Feynman’s third volume of Lectures, which is devoted to quantum physics only, so I look very much forward to that.]

In this post, I just wanted to reflect once more on which constants are really fundamental and which constants are somewhat less fundamental. From all that I wrote in my previous post, I concluded there were three:

The fine-structure constant α, which is a dimensionless number.
Planck’s constant h, whose dimension is joule·second, so that’s the dimension of action.
The speed of light c, whose dimension is that of a velocity.

The three are related through the following expression:

α = e²/(ħ·c)

This is an interesting expression. Let’s first check its dimension. We already explained that e² is expressed in N·m². The ħ·c product is expressed in (N·m·s)·(m/s) = N·m² too, so the ratio is a dimensionless number indeed. Still, that N·m² is rather strange, because it means the dimension of e itself is N^(1/2)·m: what’s the square root of a force of one newton? In fact, to interpret the formula above, it’s probably better to re-write e² as e² = q_e²/4πε₀ = k_e·q_e². That shows you how the electron charge and Coulomb’s constant are related. Of course, they are part and parcel of one and the same force law: Coulomb’s law. We don’t need anything else, except for relativity theory, because we need to explain the magnetic force as well, and that we can do because magnetism is just a relativistic effect. Think of the field momentum indeed: the magnetic field comes into play only when we start to move our electron. The relativity effect is captured by c in that formula for α above. As for ħ, ħ = h/2π comes with the E = h·f equation, which links us to the electron’s Compton wavelength λ through the de Broglie relation λ = h/p.
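
For what it’s worth, here is the calculation of α from e², ħ and c, taking the relation between the three to be α = e²/(ħ·c) with e² = k_e·q_e². It is only a sketch, using the usual CODATA values for the constants.

```python
import math

K_E = 8.9875517923e9       # Coulomb constant in N·m²/C²
Q_E = 1.602176634e-19      # electron charge in C
HBAR = 1.054571817e-34     # reduced Planck constant in J·s
C = 299_792_458.0          # speed of light in m/s

e2 = K_E * Q_E ** 2        # e² in N·m²
alpha = e2 / (HBAR * C)    # dimensionless: N·m² divided by (J·s)·(m/s)

print(alpha, 1 / alpha)
```

The result should be close to the familiar 1/137.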

The point is: we should probably not look at α as a ‘fundamental physical constant’. It’s e² that’s the third fundamental constant, besides h and c. Indeed, it’s from e² that all the rest follows: the electron’s internal energy, its external energy, and its radius, and then all the rest by combining stuff with other stuff.

Now, we took the magic out of α by doing what we did in the previous posts, and that’s to combine stuff with other stuff, and so now you may think I am putting the magic back in with that formula for α, which seems to define α in terms of the three mentioned ‘fundamental’ constants. That’s not the case: this relation comes out of all of the other relationships we found, and so it’s nothing new really. It’s actually not a definition of α: it just does what it does, and that’s to relate α to the ‘fundamental’ physical constants behind.

So… No new magic. In fact, I want to close this post by taking away even more of the magic. If you read my previous post, I said that α was ‘God’s cut-off factor’ 🙂 ensuring our energy functions do not blow up, but I also said it was impossible to say why he chose 0.00729735256 as the cut-off factor. The question is actually easily answered by thinking about those two formulas we had for the internal and external energy respectively. Let’s re-write them in natural units and, temporarily, two different subscripts for α, so we write:

E_elec= α_e/r₀: This is the formula for the energy of the field.
U_elec= α_u/r₀: This is the energy needed to assemble our electron.

Both energies are determined by the above-mentioned laws, i.e. Coulomb’s Law and the theory of relativity, so α has got nothing to do with that. However, both energies have to be the same, and so α_e has to be equal to α_u. In that sense, α is, quite simply, a proportionality constant that achieves that equality. Now that explains why we can derive α from the three other constants which, as mentioned above, are probably more fundamental. In fact, we’ve got only three degrees of freedom here, so if we choose c, h and e as ‘fundamental’, then α isn’t any more.

The underlying deep question behind it all is why those two energies should be equal. Why would our electron have some internal energy if it’s elementary? The answer to that question is: because it has some non-zero radius, and it has some non-zero radius because we don’t want our formula for the field energy (or the field momentum) to blow up. Now, if it has some radius, then it has to have some internal energy.

You’ll say: that makes sense, but it doesn’t answer the question. Why would it have internal energy, with or without a zero radius? If an electron is an elementary particle, then it’s really elementary, isn’t it? And so then we shouldn’t try to ‘assemble’ it from an infinite number of infinitesimally small charges. You’re right, and here we can also note that the fact that the electron doesn’t blow up is firm evidence it’s very elementary indeed.

I should also note that Feynman actually doesn’t talk about the energy that’s needed to assemble a charge: he gets his U_elec= (1/2)·(e²/a) by calculating the external field energy for a spherical shell of charge, and he sticks to it—presumably because it’s the same field for a uniform or non-uniform sphere of charge. He only notes there has to be some radius because, if not, the formula he uses blows up, indeed. So – who knows? – perhaps he doesn’t quite believe that formula for the internal energy is relevant either.

So perhaps there is no internal energy indeed. Perhaps there’s just the energy of the field. So… Well… I can’t say much about this… Except… Well… Perhaps just one more thing. Let me note something that, I hope, you noticed as well: the k_e·q_e² is the numerator in Coulomb’s Law itself. You also know that energy equals force times distance. So if we divide both sides by r₀, we get Coulomb’s Law itself: F_elec = k_e·q_e²/r₀². The only thing is: what’s the distance? It’s one charge only, and there is no distance between one charge, is there? Well… Yes and no. I have been thinking that the requirement of the internal and external energies being equal resembles the statement that the forces between two charges are equal and opposite. That ties in with the idea of the internal energy itself: remember we were basically talking forces between infinitesimally small elements of charge within the electron itself? So r₀ is, perhaps, some average distance or so. There must be some way of thinking of it like that. But… Well… Which one exactly?

This kind of reflection may not make sense. Who knows? I obviously need to think all of this through and so this post is, indeed, just a bunch of reflections for which I will have more time later—hopefully. 🙂 Perhaps we’re all just pushing the matter too far. Perhaps we should just accept that the external energy has that 2/3 factor but that the actual energy of the electron should also include the equivalent energy of some binding force that holds the electron together. Well… In any case. That’s all I am going to do on this extremely complicated matter. It’s time to move indeed! So the point to take home here is probably just this:

When calculating the radius of an electron using classical theory, we get in trouble: not only do we find different radii, but the radii that we find do not respect the E = m_e·c² law. It’s only the m_e·c² = e²/r₀ formula that’s relativistically correct.
That suggests the electron also has some non-electric mass, which is attributed to ‘binding forces’ or ‘Poincaré stresses’, but which remains to be explained convincingly.
All of this shouldn’t surprise us: for all we know, the electron is something fuzzy. 🙂

So my next posts will focus on the ‘essentials’ preparing for Feynman’s Volume on quantum mechanics. Those ‘essentials’ will still involve some classical stuff but, as you will see, even more contradictions, that – hopefully! – will then be solved in the quantum-mechanical picture of it all. 🙂

Taking the magic out of God’s number

Jean Louis Van Belle, 23 December 2018

Original post:

I think the post scriptum to my previous post is interesting enough to separate it out as a piece of its own, so let me do that here. You’ll remember that we were trying to find some kind of a model for the electron, picturing it like a tiny little ball of charge, and then we just applied the classical energy formulas to it to see what comes out of it. The key formula is the integral that gives us the energy that goes into assembling a charge. It was the following thing:

U = (1/2)·∫∫ ρ(1)·ρ(2)/(4πε₀·r₁₂)·dV₁·dV₂ (both integrals taken over all of space)

This is a double integral which we simplified in two stages, so we’re looking at an integral within an integral really, but we can substitute the integral over the ρ(2)·dV₂ product by the formula we got for the potential, so we write that as Φ(1), and so the integral above becomes:

U = (1/2)·∫ ρ(1)·Φ(1)·dV₁

Now, this integral integrates the ρ(1)·Φ(1)·dV₁ product over all of space, so that’s over all points in space, and so we just dropped the index and wrote the whole thing as the integral of ρ·Φ·dV over all of space:

U = (1/2)·∫ ρ·Φ·dV

We then established that this integral was mathematically equivalent to the following equation:

U = (ε₀/2)·∫ E•E·dV

So this integral is actually quite simple: it just integrates E•E = E² over all of space. The illustration below shows E as a function of the distance r for a sphere of radius R filled uniformly with charge.

So the field (E) goes as r for r ≤ R and as 1/r² for r ≥ R. So, for r ≥ R, the integral will have (1/r²)² = 1/r⁴ in it. Now, you know that the integral of some function is the area under the graph of that function. Look at the 1/r⁴ function below: it blows up between 1 and 0. That’s where the problem is: there needs to be some kind of cut-off, because that integral will effectively blow up when the radius of our little sphere of charge gets ‘too small’. So that makes it clear why it doesn’t make sense to use this formula to try to calculate the energy of a point charge. It just doesn’t make sense to do that.

In fact, the need for a ‘cut-off factor’ so as to ensure our energy function doesn’t ‘blow up’ is not because of the exponent in the 1/r⁴ expression: the need is also there for any 1/r relation, as illustrated below. All 1/rⁿ functions have the same pivot point, as you can see from the simple illustration below. So, yes, we cannot go all the way to zero from there when integrating: we have to stop somewhere.
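
To see the blow-up in numbers, here’s a minimal sketch in Python. It is my own illustration, not something from Feynman: the function name `field_energy_outside`, the outer cut-off `r_max` and the choice to drop all physical constants are assumptions made just for this example. It integrates the 1/r⁴ energy density over spherical shells of volume 4π·r²·dr, starting from a radius a, and shows the result growing as 1/a when a shrinks.

```python
import math

def field_energy_outside(a, r_max=1e3, steps=100_000):
    """Integrate (1/r^4) * 4*pi*r^2 dr from a to r_max, constants dropped.

    Midpoint rule on a logarithmic grid, because the integrand varies
    over many orders of magnitude. All names here are illustrative.
    """
    log_a = math.log(a)
    h = (math.log(r_max) - log_a) / steps
    total = 0.0
    for i in range(steps):
        r = math.exp(log_a + (i + 0.5) * h)
        # dr = r * d(ln r), so each step contributes integrand * r * h
        total += (1.0 / r**4) * 4.0 * math.pi * r**2 * r * h
    return total

for a in (1.0, 0.1, 0.01):
    numeric = field_energy_outside(a)
    # Closed form of the same truncated integral: 4*pi*(1/a - 1/r_max)
    exact = 4.0 * math.pi * (1.0 / a - 1.0 / 1e3)
    print(f"a = {a:<5} numeric = {numeric:.4f}  closed form = {exact:.4f}")
```

Every time a gets ten times smaller, the energy gets (roughly) ten times larger, so there is no finite answer for a = 0.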

So what’s the ‘cut-off point’? What’s ‘too small’ a radius? Let’s look at the formula we got for our electron as a shell of charge (so the assumption here is that the charge is uniformly distributed on the surface of a sphere with radius a):

U_elect = (1/2)·e²/a

So we’ve got an even simpler formula here: it’s just a 1/r relation (a is r in this formula), not 1/r⁴. Why is that? Well… It’s just the way the math turns out: we’re integrating a 1/r⁴ energy density over shells of volume 4π·r²·dr, so the integrand goes as 1/r² and the integral as 1/r, and so that gives us this simple inversely proportional relationship between U and r, i.e. a, in this case. 🙂 I copied the detail of Feynman’s calculation in my previous post, so you can double-check it. It’s quite wonderful, really. Look at it again: we have a very simple inversely proportional relationship between the radius of our electron and its energy as a sphere of charge. We could write it as:

U_elect = α/a, with α = e²/2

Still… We need the ‘cut-off point’. Also note that, as I pointed out, we don’t necessarily need to assume that the charge in our little ball of charge (i.e. our electron) sits on the surface only: if we’d assume it’s a uniformly charged sphere of charge, we’d just get another constant of proportionality: our 1/2 factor would become a 3/5 factor, so we’d write: U_elect = (3/5)·e²/a. But we’re not interested in finding the right model here. We know the U_elect = (3/5)·e²/a gives us a value for a that is 3/5 of the classical electron radius, so it’s off by 2/5. That’s not so bad and so let’s go along with it. 🙂
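
Here’s a quick sketch of what those proportionality constants do to the radius. The assumption, and it is only an assumption for the sake of the example, is that we equate U_elect with the rest energy m_e·c² and solve for a. The constants are CODATA-style SI values and the helper name `radius_for` is made up for this illustration.

```python
import math

Q_E = 1.602176634e-19       # electron charge magnitude, in C
EPS0 = 8.8541878128e-12     # electric constant, in C^2/(N*m^2)
M_E = 9.1093837015e-31      # electron mass, in kg
C = 299792458.0             # speed of light, in m/s

# e^2 in the sense used in the text: q_e^2 / (4*pi*eps0), in J*m
e_squared = Q_E**2 / (4.0 * math.pi * EPS0)
rest_energy = M_E * C**2    # in J

def radius_for(factor):
    """Solve factor * e^2 / a = m_e * c^2 for a (illustrative helper)."""
    return factor * e_squared / rest_energy

r_classical = radius_for(1.0)   # classical electron radius e^2/(m_e c^2)
a_shell = radius_for(0.5)       # charge on the surface: U = (1/2) e^2/a
a_sphere = radius_for(0.6)      # uniformly charged sphere: U = (3/5) e^2/a

print(f"classical radius : {r_classical:.4e} m")
print(f"shell model      : {a_shell:.4e} m")
print(f"uniform sphere   : {a_sphere:.4e} m")
```

So the model changes the number in front, but not the order of magnitude: all three radii are a few femtometer at most.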

We’re going to look at the simple structure of this relation, and all of its implications. The simple equation above says that the energy of our electron is (a) proportional to the square of its charge and (b) inversely proportional to its radius. Now, that is a very remarkable result. In fact, we’ve seen something like this before, and we were astonished. We saw it when we were discussing the wonderful properties of that magical number, the fine-structure constant, which we also denoted by α. However, because we used α already, I’ll denote the fine-structure constant as α_e here, so you don’t get confused. You’ll remember that the fine-structure constant is a God-like number indeed: it links all of the fundamental properties of the electron, i.e. its charge, its radius, its distance to the nucleus (i.e. the Bohr radius), its velocity, its mass (and, hence, its energy), its de Broglie wavelength. Whatever: all these physical constants are all related through the fine-structure constant.

In my various posts on this topic, I’ve repeatedly said that, but I never showed why it’s true, and so it was a very magical number indeed. I am going to take some of the magic out now. Not too much but… Well… You can judge for yourself how much of the magic remains after I am done here. 🙂

So, at this stage of the argument, α can be anything, and α_e cannot, of course. It’s just that magical number out there, which relates everything to everything: it’s the God-given number we don’t understand, or didn’t understand, I should say. Past tense. Indeed, we’re going to get some understanding here because we know that one of the many expressions involving α_e was the following one:

m_e = α_e/r_e

This says that the mass of the electron is equal to the ratio of the fine-structure constant and the electron radius. [Note that we express everything in natural units here, so that’s Planck units. For the detail of the conversion, please see the relevant section on that in one of my posts on this and other stuff.] In fact, the U = (3/5)·e²/a and m_e = α_e/r_e relations look exactly the same, because one of the other equations involving the fine-structure constant was: α_e = e_P². So we’ve got the square of the charge here as well! Indeed, as I’ll explain in a moment, the difference between the two formulas is just a matter of units.

Now, mass is equivalent to energy, of course: it’s just a matter of units, so we can equate m_e with E_e (this amounts to expressing the energy of the electron in a kg unit—bit weird, but OK) and so we get:

E_e = α_e/r_e

So there we have: the fine-structure constant α_e is Nature’s ‘cut-off’ factor, so to speak. Why? Only God knows. 🙂 But it’s now (fairly) easy to see why all the relations involving α_e are what they are. As I mentioned already, we also know that α_e is the square of the electron charge expressed in Planck units, so we have:

α_e = e_P² and, therefore, E_e = e_P²/r_e

Now, you can check for yourself: it’s just a matter of re-expressing everything in standard SI units, and relating e_P² to e², and it should all work: you should get the E_elect = (2/3)·e²/a expression. So… Well… At least this takes some of the magic out of the fine-structure constant. It’s still a wonderful thing, but so you see that the fundamental relationship between (a) the energy (and, hence, the mass), (b) the radius and (c) the charge of an electron is not something God-given. What’s God-given are Maxwell’s equations, and so the E_e = α_e/r_e = e_P²/r_e is just one of the many wonderful things that you can get out of them.

So we found God’s ‘cut-off factor’ 🙂 It’s equal to α_e ≈ 0.0073 = 7.3×10⁻³. So 7.3 thousandths of… What? Well… Nothing. It’s just a pure number: the product of the energy and the radius of an electron (if both are expressed in Planck units, of course). And so it determines the electron charge (again, expressed in Planck units). Indeed, we write:

e_P = √α_e

Really? Yes. Just do all these formulas:

e_P = √α_e ≈ √0.0073 and, with the Planck charge being about 1.9×10⁻¹⁸ C, that’s √0.0073·(1.9×10⁻¹⁸ C) ≈ 1.6×10⁻¹⁹ C

Just re-check it with all the known decimals: you’ll see it’s bang on. Let’s look at the E_e = m_e = α_e/r_e ratio once again. What’s the meaning of it? Let’s first calculate the value of r_e and m_e, i.e. the electron radius and electron mass expressed in Planck units. The first is equal to the classical electron radius divided by the Planck length, and then the same for the mass, so we get the following thing:

r_e ≈ (2.81794×10⁻¹⁵ m)/(1.6162×10⁻³⁵ m) = 1.7435×10²⁰

m_e ≈ (9.1×10⁻³¹ kg)/(2.17651×10⁻⁸ kg) = 4.18×10⁻²³

α_e = (4.18×10⁻²³)·(1.7435×10²⁰) ≈ 0.0073
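
If you want to redo this arithmetic yourself, here’s a small sketch. The inputs are just the rounded values quoted above (plus one more decimal for the electron mass), and the variable names are mine, so take it as a check on the order of magnitude rather than a precision calculation.

```python
# Rounded SI inputs, as quoted in the text above
R_E_SI = 2.81794e-15      # classical electron radius, in m
PLANCK_LENGTH = 1.6162e-35  # in m
M_E_SI = 9.109e-31        # electron mass, in kg
PLANCK_MASS = 2.17651e-8  # in kg

# Radius and mass expressed in Planck units (pure numbers)
r_e_planck = R_E_SI / PLANCK_LENGTH
m_e_planck = M_E_SI / PLANCK_MASS

# m_e = alpha / r_e in Planck units, so alpha = m_e * r_e
alpha_estimate = m_e_planck * r_e_planck

print(f"r_e in Planck units : {r_e_planck:.4e}")
print(f"m_e in Planck units : {m_e_planck:.4e}")
print(f"alpha = m_e * r_e   : {alpha_estimate:.5f}")
print(f"1/alpha             : {1.0 / alpha_estimate:.1f}")
```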

It works like a charm, but what does it mean? Well… It’s just a ratio between two physical quantities, and the scale you use to measure those quantities matters very much. We’ve explained that the Planck mass is a rather large unit at the atomic scale and, therefore, it’s perhaps not quite appropriate to use it here. In fact, out of the many interesting expressions for α_e, I should highlight the following one:

α_e = e²/(ħ·c) ≈ (1.60217662×10⁻¹⁹ C)²/(4πε₀·[(1.054572×10⁻³⁴ N·m·s)·(2.998×10⁸ m/s)]) ≈ 0.0073 once more 🙂

Note that e² is actually equal to q_e²/4πε₀, with q_e the elementary charge in coulomb, which is what I am using in the formula. I know that’s confusing, but it is what it is. As for the units, it’s a bit tedious to write it all out, but you’ll get there. Note that ε₀ ≈ 8.8542×10⁻¹² C²/(N·m²) so… Well… All the units do cancel out, and we get a dimensionless number indeed, which is what α_e is.
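
The same calculation in a few lines of Python, for those who don’t want to punch it into a calculator. This is just a sketch: the constants are the CODATA-style SI values as I know them, and the variable names are mine.

```python
import math

Q_E = 1.602176634e-19       # elementary charge, in C
EPS0 = 8.8541878128e-12     # electric constant, in C^2/(N*m^2)
HBAR = 1.054571817e-34      # reduced Planck constant, in J*s = N*m*s
C = 299792458.0             # speed of light, in m/s

# e^2 = q_e^2 / (4*pi*eps0): units C^2 * (N*m^2/C^2) = N*m^2
e_squared = Q_E**2 / (4.0 * math.pi * EPS0)

# hbar*c has units (N*m*s)*(m/s) = N*m^2, so the ratio is dimensionless
alpha = e_squared / (HBAR * C)

print(f"alpha   = {alpha:.9f}")
print(f"1/alpha = {1.0 / alpha:.3f}")
```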

The point is: this expression links α_e to the de Broglie relation (p = h/λ), with λ the wavelength that’s associated with the electron. Of course, because of the Uncertainty Principle, we know we’re talking some wavelength range really, so we should write the de Broglie relation as Δp = h·Δ(1/λ). Now, that, in turn, allows us to try to work out the Bohr radius, which is the other ‘dimension’ we associate with an electron. Of course, now you’ll say: why would you do that? Why would you bring in the de Broglie relation here?

Well… We’re talking energy, and so we have the Planck-Einstein relation first: the energy of some particle can always be written as the product of h and some frequency f: E = h·f. The only thing that the de Broglie relation adds is the Uncertainty Principle indeed: the frequency f will be some frequency range, associated with some momentum range, and so that’s what the Uncertainty Principle really says. I can’t dwell too much on that here, because otherwise this post would become a book. 🙂 For more detail, you can check out one of my many posts on the Uncertainty Principle. In fact, the one I am referring to here has Feynman’s calculation of the Bohr radius, so I warmly recommend you check it out. The thrust of the argument is as follows:

If we assume that (a) an electron takes some space – which I’ll denote by r 🙂 – and (b) that it has some momentum p because of its mass m and its velocity v, then the ΔxΔp = ħ relation (i.e. the Uncertainty Principle in its roughest form) suggests that the order of magnitude of r and p should be related in the very same way. Hence, let’s just boldly write r ≈ ħ/p and see what we can do with that.
We know that the kinetic energy of our electron equals mv²/2, which we can write as p²/2m so we get rid of the velocity factor. Well… Substituting our p ≈ ħ/r conjecture, we get K.E. = ħ²/2mr². So that’s a formula for the kinetic energy. Next is potential.
The formula for the potential energy is U = q₁q₂/4πε₀r₁₂. Now, we’re actually talking about the size of an atom here, so one charge is the proton (+e) and the other is the electron (–e), so the potential energy is U = P.E. = –e²/4πε₀r, with r the ‘distance’ between the proton and the electron—so that’s the Bohr radius we’re looking for!
We can now write the total energy (which I’ll denote by E, but don’t confuse it with the electric field vector!) as E = K.E. + P.E. = ħ²/2mr² – e²/4πε₀r. Now, the electron (whatever it is) is, obviously, in some kind of equilibrium state. Why is that obvious? Well… Otherwise our hydrogen atom wouldn’t or couldn’t exist. 🙂 Hence, it’s in some kind of energy ‘well’ indeed, at the bottom. Such equilibrium point ‘at the bottom’ is characterized by its derivative (in respect to whatever variable) being equal to zero. Now, the only ‘variable’ here is r (all the other symbols are physical constants), so we have to solve for dE/dr = 0. Writing it all out yields: dE/dr = –ħ²/mr³ + e²/4πε₀r² = 0 ⇔ r = 4πε₀ħ²/me²
We can now put the values in: r = 4πε₀ħ²/me² = [(1/(9×10⁹) C²/N·m²)·(1.055×10⁻³⁴ J·s)²]/[(9.1×10⁻³¹ kg)·(1.6×10⁻¹⁹ C)²] = 53×10⁻¹² m = 53 picometer (pm)
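
The steps above can be sketched in code as well. What follows is my own illustration, not Feynman’s: it writes the total energy as a function of r, finds the bottom of the ‘well’ numerically with a simple ternary search (the search interval and iteration count are arbitrary choices of mine), and compares the result with the closed-form r = 4πε₀ħ²/me².

```python
import math

HBAR = 1.054571817e-34      # reduced Planck constant, in J*s
M_E = 9.1093837015e-31      # electron mass, in kg
Q_E = 1.602176634e-19       # elementary charge, in C
EPS0 = 8.8541878128e-12     # electric constant, in C^2/(N*m^2)

# e^2/(4*pi*eps0), the numerator of the Coulomb potential energy
E_SQUARED = Q_E**2 / (4.0 * math.pi * EPS0)

def total_energy(r):
    """K.E. + P.E. with the p ~ hbar/r conjecture from the text."""
    return HBAR**2 / (2.0 * M_E * r**2) - E_SQUARED / r

def minimize_energy(lo=1e-12, hi=1e-9, iterations=200):
    """Ternary search for the minimum of total_energy on [lo, hi].

    Works because the energy first falls and then rises on this interval.
    """
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if total_energy(m1) < total_energy(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

r_numeric = minimize_energy()
r_closed_form = HBAR**2 / (M_E * E_SQUARED)   # = 4*pi*eps0*hbar^2/(m*q_e^2)

print(f"numerical minimum : {r_numeric * 1e12:.2f} pm")
print(f"closed form       : {r_closed_form * 1e12:.2f} pm")
```

Both routes should land on the same 53 pm or so, which is all the argument needs.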

Done. We’re right on the spot. The Bohr radius is, effectively, about 53 trillionths of a meter indeed!

Phew!

Yes… I know… Relax. We’re almost done. You should now be able to figure out why the classical electron radius and the Bohr radius can also be related to each other through the fine-structure constant. We write:

m_e = α/r_e = α/α²r = 1/αr

So we get that α/r_e = 1/αr and, therefore, we get r_e/r = α², which explains why α is also equal to the so-called junction number, or the coupling constant, for an electron-photon coupling (see my post on the quantum-mechanical aspects of the photon-electron interaction). It gives a physical meaning to the probability (which, as you know, is the absolute square of the probability amplitude) in terms of the chance of a photon actually ‘hitting’ the electron as it goes through the atom. Indeed, the ratio of the Thomson scattering cross-section and the Bohr size of the atom should be of the same order as r_e/r, and so that’s α².
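
A quick numerical check of the r_e/r = α² relation, as a sketch in SI units rather than Planck units (constants and names are my own choices for this example):

```python
import math

HBAR = 1.054571817e-34      # reduced Planck constant, in J*s
M_E = 9.1093837015e-31      # electron mass, in kg
Q_E = 1.602176634e-19       # elementary charge, in C
EPS0 = 8.8541878128e-12     # electric constant, in C^2/(N*m^2)
C = 299792458.0             # speed of light, in m/s

e_squared = Q_E**2 / (4.0 * math.pi * EPS0)

alpha = e_squared / (HBAR * C)                 # fine-structure constant
r_classical = e_squared / (M_E * C**2)         # classical electron radius
r_bohr = HBAR**2 / (M_E * e_squared)           # Bohr radius

ratio = r_classical / r_bohr

print(f"r_e / r_Bohr = {ratio:.6e}")
print(f"alpha^2      = {alpha**2:.6e}")
```

The mass drops out of the ratio, which is why the relation holds whatever value we use for m_e.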

[Note: To be fully correct and complete, I should add that the coupling constant itself is not α² but √α = e_P. Why do we have this square root? You’re right: the fact that the probability is the absolute square of the amplitude explains one square root (√α² = α), but not two. The thing is: the photon-electron interaction consists of two things. First, the electron sort of ‘absorbs’ the photon, and then it emits another one, which has the same or a different frequency depending on whether or not the ‘collision’ was elastic. So if we denote the coupling constant as j, then the whole interaction will have a probability amplitude equal to j². In fact, the value which Feynman uses in his wonderful popular presentation of quantum mechanics (QED: The Strange Theory of Light and Matter) is −√α ≈ −0.0854. I am not quite sure why the minus sign is there. It must be something with the angles involved (the emitted photon will not be following the trajectory of the incoming photon) or, else, with the special arithmetic involved in boson-fermion interactions (we add amplitudes when bosons are involved, but subtract amplitudes when it’s fermions interacting). I’ll probably find out once I am through Feynman’s third volume of Lectures, which focuses on quantum mechanics only.]

Finally, the last bit of unexplained ‘magic’ in the fine-structure constant is that the fine-structure constant (which I’ve started to write as α again, instead of α_e) also gives us the (classical) relative speed of an electron, so that’s its speed as it orbits around the nucleus (according to the classical theory, that is), so we write

α = v/c = β

I should go through the motions here – I’ll probably do so in the coming days – but you can see we must be able to get it out somehow from all that we wrote above. See how powerful our U_elect ∼ e²/a relation really is? It links the electron’s charge, its radius and its energy, and it’s all we need to get all the rest out of it: its mass, its momentum, its speed and – through the Uncertainty Principle – the Bohr radius, which is the size of the atom.
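
One way of going through the motions, as a sketch under an explicit assumption: I take the rough p ≈ ħ/r relation from the Bohr radius argument, plug in the Bohr radius for r, and divide the resulting velocity v = p/m by c. That is my own shortcut for this example, not a derivation taken from Feynman.

```python
import math

HBAR = 1.054571817e-34      # reduced Planck constant, in J*s
M_E = 9.1093837015e-31      # electron mass, in kg
Q_E = 1.602176634e-19       # elementary charge, in C
EPS0 = 8.8541878128e-12     # electric constant, in C^2/(N*m^2)
C = 299792458.0             # speed of light, in m/s

e_squared = Q_E**2 / (4.0 * math.pi * EPS0)
alpha = e_squared / (HBAR * C)

r_bohr = HBAR**2 / (M_E * e_squared)

# Assumption: p ~ hbar / r evaluated at r = r_bohr, then v = p / m
momentum = HBAR / r_bohr
velocity = momentum / M_E
beta = velocity / C

print(f"v     = {velocity:.4e} m/s")
print(f"v/c   = {beta:.6f}")
print(f"alpha = {alpha:.6f}")
```

The velocity comes out at a bit less than 1% of the speed of light, which also justifies the non-relativistic formulas we used.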

We’ve come a long way. This is truly a milestone. We’ve taken the magic out of God’s number—to some extent at least. 🙂

You’ll have one last question, of course: if proportionality constants are all about the scale in which we measure the physical quantities on either side of an equation, is there some way the fine-structure constant would come out differently? That’s the same as asking: what if we’d measure energy in units that are equivalent to the energy of an electron, and the radius of our electron just as… Well… What if we’d equate our unit of distance with the radius of the electron, so we’d write r_e = 1? What would happen to α? Well… I’ll let you figure that one out yourself. I am tired and so I should go to bed now. 🙂

[…] OK. OK. Let me tell you. It’s not that simple here. All those relationships involving α, in one form or the other, are very deep. They relate a lot of stuff to a lot of stuff, and we can appreciate that only when doing a dimensional analysis. A dimensional analysis of the E_e = α_e/r_e = e_P²/r_e relation yields [e_P²/r_e] = C²/m on the right-hand side and [E_e] = J = N·m on the left-hand side. How can we reconcile both? The coulomb is an SI base unit, so we can’t ‘translate’ it into something with N and m. [To be fully correct, for some reason, the ampère (i.e. coulomb per second) was chosen as an SI base unit, but they’re interchangeable in regard to their place in the international system of units: they can’t be reduced.] So we’ve got a problem. Yes. That’s where we sort of ‘smuggled’ the 4πε₀ factor in when doing our calculations above. That ε₀ constant is, obviously, not ‘as fundamental’ as c or α (just think of the c⁻² = ε₀μ₀ relationship to understand what I mean here) but, still, it was necessary to make the dimensions come out alright: we need the reciprocal dimension of ε₀, i.e. (N·m²)/C², to make the dimensional analysis work. We get: (C²/m)·(N·m²)/C² = N·m = J, i.e. joule, so that’s the unit in which we measure energy or – using the E = mc² equivalence – mass, which is the aspect of energy emphasizing its inertia.

So the answer is: no. Changing units won’t change alpha. So all that’s left is to play with it now. Let’s try to do that. Let me first plot that E_e = m_e = α_e/r_e = 0.00729735256/r_e:

Unsurprisingly, we find the pivot point of this curve is at the intersection of the diagonal and the curve itself, so that’s at the (√α_e, √α_e) ≈ (0.0854, 0.0854) point, where slopes are ± 1, i.e. plus or minus unity. What does this show? Nothing much. What? I can hear you: I should be excited because… Well… Yes! Think of it. If you would have to choose a cut-off point, you’d choose this one, wouldn’t you? 🙂 Sure, you’re right. How exciting! Let me show you. Look at it! It proves that God thinks in terms of logarithms. He has chosen α such that ln(E) = ln(α/r) = lnα – lnr = 0, so lnα = lnr and, therefore, α = r. 🙂

Huh? Excuse me?

I am sorry. […] Well… I am not, of course… 🙂 I just wanted to illustrate the kind of exercise some people are tempted to do. It’s no use. The fine-structure constant is what it is: it sort of summarizes an awful lot of formulas. It basically shows what Maxwell’s equations imply in terms of the structure of an atom defined as a negative charge orbiting around some positive charge. It shows we can calculate everything as a function of something else, and that’s what the fine-structure constant tells us: it relates everything to everything. However, when everything is said and done, the fine-structure constant shows us two things:

Maxwell’s equations are complete: we can construct a complete model of the electron and the atom, which includes: the electron’s energy and mass, its velocity, its own radius, and the radius of the atom. [I might have forgotten one of the dimensions here, but you’ll add it. :-)]
God doesn’t want our equations to blow up. Our equations are all correct but, in reality, there’s a cut-off factor that ensures we don’t go to the limit with them.

So the fine-structure constant anchors our world, so to speak. In other words: of all the worlds that are possible, we live in this one.

[…] It’s pretty good as far as I am concerned. Isn’t it amazing that our mind is able to just grasp things like that? I know my approach here is pretty intuitive, and with ‘intuitive’, I mean ‘not scientific’ here. 🙂 Frankly, I don’t like the talk about physicists “looking into God’s mind.” I don’t think that’s what they’re trying to do. I think they’re just trying to understand the fundamental unity behind it all. And that’s religion enough for me. 🙂

So… What’s the conclusion? Nothing much. We’ve sort of concluded our description of the classical world… Well… Of its ‘electromagnetic sector’ at least. 🙂 That sector can be summarized in Maxwell’s equations, which describe an infinite world of possible worlds. However, God fixed three constants: h, c and α. So we live in a world that’s defined by this Trinity of fundamental physical constants. Why is it not two, or four?

My gut instinct tells me it’s because we live in three dimensions, and so there’s three degrees of freedom really. But what about time? Time is the fourth dimension, isn’t it? Yes. But time is symmetric in the ‘electromagnetic’ sector: we can reverse the arrow of time in our equations and everything still works. The arrow of time involves other theories: statistics (physicists refer to it as ‘statistical mechanics’) and the ‘weak force’ sector, which I discussed when talking about symmetries in physics. So… Well… We’re not done. God gave us plenty of other stuff to try to understand. 🙂

The classical explanation for the electron’s mass and radius

Feynman’s 28th Lecture in his series on electromagnetism is one of the more interesting but, at the same time, it’s one of the few Lectures that is clearly (out)dated. In essence, it talks about the difficulties involved in applying Maxwell’s equations to the elementary charges themselves, i.e. the electron and the proton. We already signaled some of these problems in previous posts. For example, in our post on the energy in electrostatic fields, we showed how our formulas for the field energy and/or the potential of a charge blow up when we use them to calculate the energy we’d need to assemble a point charge. What comes out is infinity: ∞. So our formulas tell us we’d need an infinite amount of energy to assemble a point charge.

Well… That’s no surprise, is it? The idea itself is impossible: how can one have a finite amount of charge in something that’s infinitely small? Something that has no size whatsoever? It’s pretty obvious we get some division by zero there. 🙂 The mathematical approach is often inconsistent. Indeed, a lot of blah-blah in physics is obviously just about applying formulas to situations that are clearly not within the relevant area of application of the formula. So that’s why I went through the trouble (in my previous post, that is) of explaining to you how we get these energy and potential formulas, and that’s by bringing charges (note the plural) together. Now, we may assume these charges are point charges, but that assumption is not so essential. What I tried to say when being so explicit was the following: yes, a charge causes a field, but the idea of a potential makes sense only when we’re thinking of placing some other charge in that field. So point charges with ‘infinite energy’ should not be a problem. Feynman admits as much when he writes:

“If the energy can’t get out, but must stay there forever, is there any real difficulty with an infinite energy? Of course, a quantity that comes out infinite may be annoying, but what really matters is only whether there are any observable physical effects.”

So… Well… Let’s see. There’s another, more interesting, way to look at an electron: let’s have a look at the field it creates. An electron – stationary or moving – will create a field in Maxwell’s world, which we know inside out now. So let’s just calculate it. In fact, Feynman calculates it for the unit charge (+1), so that’s a positron. It eases the analysis because we don’t have to drag any minus sign along. So how does it work? Well…

We’ll have an energy flux density vector – i.e. the Poynting vector S – as well as a momentum density vector g all over space. Both are related through the g = S/c² equation which, as I explained in my previous post, is probably best written as cg = S/c, because we’ve got units then, on both sides, that we can readily understand, like N/m² (so that’s force per unit area) or J/m³ (so that’s energy per unit volume). On the other hand, we’ll need something that’s written as a function of the velocity of our positron, so that’s v, and so it’s probably best to just calculate g, the momentum, which is measured in N·s or kg·(m/s²)·s (both are equivalent units for the momentum p = mv, indeed) per unit volume (so we need to add a 1/m³ to the unit). So we’ll have some integral all over space, but I won’t bother you with it. Why not? Well… Feynman uses a rather particular volume element to solve the integral, and so I want you to focus on the solution. The geometry of the situation, and the solution for g, i.e. the momentum of the field per unit volume, is what matters here.

So let’s look at that geometry. It’s depicted below. We’ve got a radial electric field, a Coulomb field really, because our charge is moving at a non-relativistic speed, so v << c and we can approximate with a Coulomb field indeed. Maxwell’s equations imply that B = v×E/c², so g = ε₀E×B is what it is in the illustration below. Note that we’d have to reverse the direction of both E and B for an electron (because it’s negative), but g would be the same. It is directed obliquely toward the line of motion and its magnitude is g = (ε₀v/c²)·E²·sinθ. Don’t worry about it: Feynman integrates this thing for you. 🙂 It’s not that difficult, but still… To solve it, he uses the fact that the fields are symmetric about the line of motion, which is indicated by the little arrow around the v-axis, with the Φ symbol next to it (it symbolizes the potential). [The ‘rather particular volume element’ is a ring around the v-axis, and it’s because of this symmetry that Feynman picks the ring. Feynman’s Lectures are not only great to learn physics: they’re a treasure trove of mathematical tricks too. :-)]

As said, I don’t want to bother you with the technicalities of the integral here. This is the result:

p = (2/3)·(e²/(a·c²))·v

What does this say? It says that the momentum of the field – i.e. the electromagnetic momentum, integrated over all of space – is proportional to the velocity v of our charge. That makes sense: when v = 0, we’ll have an electrostatic field all over space and, hence, some inertia, but it’s only when we try to move our charge that Newton’s Law comes into play: then we’ll need some force to overcome that inertia. It all works through the Poynting formula: S = E×B/μ₀. If nothing’s moving, then B = 0, and so we’ll have some E and, therefore, we’ll have field energy alright, but the energy flow will be zero. But when we move the charge, we’re moving the field, and so then B ≠ 0 and so it’s through B that the E in our S equation starts kicking in. Does that make sense? Think about it: it’s good to try to visualize things in your mind. 🙂
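
For those who do want to see the integral worked out, here’s a numerical sketch of it. It is my own reconstruction, under these assumptions: the field is the Coulomb field outside a shell of radius a, only the component of g along v survives (the sideways components cancel by symmetry, which adds a second sinθ factor), and I measure the momentum in units of e²·v/c², so that ε₀·E² becomes e²/(4π·r⁴). The function name and grid sizes are arbitrary.

```python
import math

def momentum_coefficient(a=1.0, r_max_factor=1e4, n_r=20_000, n_theta=2_000):
    """Return p / (e^2 * v / c^2) for the field outside radius a.

    Integrand: (1/(4*pi*r^4)) * sin(theta)^2, integrated over ring-shaped
    volume elements 2*pi*r^2*sin(theta)*dtheta*dr around the line of motion.
    The outer radius is cut off at r_max_factor * a.
    """
    # Angular part: integral of sin^3(theta) from 0 to pi (midpoint rule)
    d_theta = math.pi / n_theta
    angular = sum(math.sin((j + 0.5) * d_theta) ** 3
                  for j in range(n_theta)) * d_theta

    # Radial part: integral of dr / r^2 from a to r_max, on a log grid
    log_a = math.log(a)
    h = math.log(r_max_factor) / n_r
    radial = 0.0
    for i in range(n_r):
        r = math.exp(log_a + (i + 0.5) * h)
        radial += (1.0 / r**2) * r * h     # dr = r * d(ln r)

    # The factor 1/2 is (1/(4*pi)) * 2*pi from the ring element
    return 0.5 * angular * radial

for a in (1.0, 0.5):
    coeff = momentum_coefficient(a)
    print(f"a = {a}: p = {coeff:.4f} e^2 v/c^2, times a = {coeff * a:.4f}")
```

The product of the coefficient and a should come out very close to 2/3, whatever a we choose: that’s the 2/3 factor in the proportionality constant.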

The constants in the proportionality constant (2e²)/(3ac²) of our p ∼ v formula above are:

e² = q_e²/(4πε₀), with q_e the electron charge (without the minus sign) and ε₀ our ubiquitous electric constant. [Note that, unlike Feynman, I prefer to not write e in italics, so as to not confuse it with Euler’s number e ≈ 2.71828 etc. However, I know I am not always consistent in my notation. We don’t need Euler’s number in this post, so e, italicized or not, is always an expression for the electron charge, not Euler’s number. Stupid remark, perhaps, but I don’t want you to be confused.]
a is the radius of our charge—see we got away from the idea of a point charge? 🙂
c² is just c², i.e. our weird constant (the square of the speed of light) which seems to connect everything to everything. Indeed, think about stuff like this: S/g = c² = 1/(ε₀μ₀).

Now, p = mv, so that formula for p basically says that our elementary charge (as mentioned, g is the same for a positron or an electron: E and B will be reversed, but g is not) has an electromagnetic mass m_elec equal to:

m_elec = (2/3)·e²/(a·c²)

That’s an amazing result. We don’t need to give our electron any rest mass: just its charge and its movement will do! Super! So we don’t need any Higgs fields here! 🙂 The electromagnetic field will do!

Well… Maybe. Let’s explore what we’ve got here.

First, let’s compare that radius a in our formula to what’s found in experiments. Huh? Did someone ever try to measure the electron radius? Of course. There are all these scattering experiments in which electrons get fired at atoms. They can fly through or, else, hit something. Therefore, one can do some statistical analysis and determine what is referred to as a cross-section. A cross-section is denoted by the same symbol as the standard deviation: σ (sigma). In any case… So there’s something that’s referred to as the classical electron radius, and it’s equal to the so-called Thomson scattering length. Thomson scattering, as opposed to Compton scattering, is elastic scattering, so it preserves kinetic energy (unlike Compton scattering, where energy gets absorbed and changes frequencies). So… Well… I won’t go into too much detail but, yes, this is the electron radius we need. [I am saying this rather explicitly because there are two other numbers around: the so-called Bohr radius and, as you might imagine, the Compton scattering cross-section.]

The Thomson scattering length is 2.82 femtometer (so that’s 2.82×10⁻¹⁵ m), more or less that is :-), and it’s usually related to the observed electron mass m_e through the fine-structure constant α. In fact, using Planck units, we can write: r_e·m_e = α, which is an amazing formula but, unfortunately, I can’t dwell on it here. Using ordinary m, s, C and what have you units, we can write r_e as:

r_e = e²/(m_e·c²) ≈ 2.82×10⁻¹⁵ m

That’s good, because if we equate m_e and m_elec and switch m_elec and a in our formula for m_elec, we get:

a = (2/3)·e²/(m_e·c²) = (2/3)·r_e

So, frankly, we’re spot on! Well… Almost. The two numbers differ by 1/3. But who cares about a 1/3 factor indeed? We’re talking rather fuzzy stuff here – scattering cross-sections and standard deviations and all that – so… Yes. Well done! Our theory works!

Well… Maybe. Physicists don’t think so. They think the 1/3 factor is an issue. It’s sad because it really makes a lot of sense. In fact, the Dutch physicist Hendrik Lorentz – whom we know so well by now 🙂 – had also worked out that, because of the length contraction effect, our spherical charge would contract into an ellipsoid and… Well… He worked it all out, and it was not a problem: he found that the momentum was altered by the factor (1−v²/c²)^−1/2, so that’s the ubiquitous Lorentz factor γ! He got this formula in the 1890s already, so that’s long before the theory of relativity had been developed. So, many years before Planck and Einstein would come up with their stuff, Hendrik Antoon Lorentz had the correct formulas already: the mass, or everything really, all should vary with that γ-factor. 🙂

Why bother about the 1/3 factor? [I should note it’s actually referred to as the 4/3 problem in physics.] Well… The critics do have a point: if we assume that (a) an electron is not a point charge – so if we allow it to have some radius a – and (b) that Maxwell’s Laws apply, then we should go all the way. The energy that’s needed to assemble an electron should then, effectively, be the same as the value we’d get out of those field energy formulas. So what do we get when we apply those formulas? Well… Let me quickly copy Feynman as he does the calculation for an electron, not looking at it as a point particle, but as a tiny shell of charge, i.e. a sphere with all charge sitting on the surface:

Let me enlarge the formula:

U_elec = (1/2)·e²/a

Now, if we combine that with our formula for m_elec above, then we get:

U_elec = (3/4)·m_elec·c²

So that formula does not respect Einstein’s universal mass-energy equivalence formula E = mc². Now, you will agree that we really want Einstein’s mass-energy equivalence relation to be respected by all, so our electron should respect it too. 🙂 So, yes, we’ve got a problem here, and it’s referred to as the 4/3 problem (yes, the ratio got turned around).

Now, you may think it got solved in the meanwhile. Well… No. It’s still a bit of a puzzle today, and the current-day explanation is not really different from what the French scientist Henri Poincaré proposed as a ‘solution’ to the problem back in 1905-1906. He basically told Lorentz the following: “If the electron is some little ball of charge, then it should explode because of the repulsive forces inside. So there should be some binding forces there, and so that energy explains the ‘missing mass’ of the electron.” So these forces are effectively being referred to as Poincaré stresses, and the non-electromagnetic energy that’s associated with them – which, of course, has to be equal to 1/3 of the electromagnetic energy (I am sure you see why) 🙂 – adds to the total energy and all is alright now. We get:

U = mc² = (m_elec+ m_Poincaré)c²

So… Yes… Pretty ad hoc. Worse, according to the Wikipedia article on electromagnetic mass, that’s still where we are. And, no, don’t read Feynman’s overview of all of the theories that were around then (so that’s in the 1960s, or earlier). As I said, it’s the one Lecture you don’t want to waste time on. So I won’t do that either.

In fact, let me try to do something else here, and that’s to de-construct the whole argument really. 🙂 Before I do so, let me highlight the essence of what was written above. It’s quite amazing really. Think of it: we say that the mass of an electron – i.e. its inertia, or the proportionality factor in Newton’s F = m·a law of motion – is the energy in the electric and magnetic field it causes. So the electron itself is just a hook for the force law, so to say. There’s nothing there, except for the charge causing the field. But so its mass is everywhere and, hence, nowhere really. Well… I should correct that: the field strength falls off as 1/r² and, hence, the energy flow and momentum density that’s associated with it fall off as 1/r⁴, so they fall off very rapidly and so the bulk of the energy is pretty near the charge. 🙂

[Note: You’ll remember that the field that’s associated with electromagnetic radiation falls off as 1/r, not as 1/r², which is why there is an energy flux there which is never lost, which can travel independently through space. It’s not the same here, so don’t get confused.]

So that’s something to note: the m_elec = (2c⁻²/3)·(e²/a) has the radius a in it, but that radius is only the hook, so to say. That’s fine, because it is not inconsistent with the idea of the Thomson scattering cross-section, which is the area that one can hit. Now, you’ll wonder how one can hit an electron: you can readily imagine an electron beam aimed at nuclei, but how would one hit electrons? Well… You can shoot photons at them, and see if they bounce back elastically or non-elastically. The cross-section area that bounces them off elastically must be pretty ‘hard’, and the cross-section that deflects them non-elastically somewhat less so. 🙂

OK… But… Yes? Hey! How did we get that electron radius in that formula?

Good question! Brilliant, in fact! You’re right: it’s here that the whole argument falls apart really. We did a substitution. That radius a is the radius of a spherical shell of charge with an energy that’s equal to U_elec = (1/2)·(e²/a), so there’s another way of stating the inconsistency: the equivalent energy of m_elec = (2c⁻²/3)·(e²/a) is equal to E = m_elec·c² = (2/3)·(e²/a) and that’s not the same as U_elec = (1/2)·(e²/a). If we take the ratio of U_elec and m_elec·c², we get the same factor: (1/2)/(2/3) = 3/4. But… Your question is superb! Look at it: putting it the way we put it reveals the inconsistency in the whole argument. We’re mixing two things here:

We first calculate the momentum density, and the momentum, that’s caused by the unit charge, so we get some energy which I’ll denote as E_elec = m_elec·c².
Now, we then assume this energy must be equal to the energy that’s needed to assemble the unit charge from an infinite number of infinitesimally small charges, thereby also assuming the unit charge is a uniformly charged sphere of charge with radius a.
We then use this radius a to simplify our formula for E_elec = m_elec·c².

Now that is not kosher, really! First, it’s (a) a lot of assumptions, both implicit as well as explicit, and then (b) it’s, quite simply, not a legit mathematical procedure: calculating the energy in the field, or calculating the energy we need to assemble a uniformly charged sphere of radius a are two very different things.

Well… Let me put it differently. We’re using the same laws – it’s all Maxwell’s equations, really – but we should be clear about what we’re doing with them, and those two things are very different. The legitimate conclusion must be that our a is wrong. In other words, we should not assume that our electron is a spherical shell of charge. So then what? Well… We could easily imagine something else, like a uniformly or even a non-uniformly charged sphere. Indeed, if we’re just filling empty space with infinitesimally small charge ‘elements’, then we may want to think the density at the ‘center’ will be much higher, like what’s going on when planets form: the density of the inner core of our own planet Earth is more than four times the density of its surface material. [OK. Perhaps not very relevant here, but you get the idea.] Or, conversely, taking into account Poincaré’s objection, we may want to think all of the charge will be on the surface, just like on a perfect conductor, where all charge is surface charge!

Note that the field outside of a uniformly charged sphere and the field of a spherical shell of charge are exactly the same, so we would not find a different number for E_elec = m_elec·c², but we surely would find a different number for U_elec. You may want to look up some formulas here: you’ll find that the energy of a uniformly distributed sphere of charge (so we do not assume that all of the charge sits on the surface here) is equal to (3/5)·(e²/a). So we’d already have much less of a problem, because the 3/4 factor in the U_elec = (3/4)·m_elec·c² becomes a (3/5)·(3/2) = 9/10 factor. So now we have a discrepancy of some 10% only. 🙂
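If you don’t trust my fractions, here is a minimal sketch that redoes the arithmetic with exact rationals. All energies are expressed in units of e²/a, as in the text; the variable and function names are mine and purely illustrative.

```python
from fractions import Fraction

# Energies in units of e²/a, as used in the text.
U_SHELL = Fraction(1, 2)    # assembly energy of a spherical shell of charge
U_SPHERE = Fraction(3, 5)   # assembly energy of a uniformly charged sphere
E_FIELD = Fraction(2, 3)    # m_elec·c², from the field-momentum calculation


def ratio_to_field_energy(u_assembly: Fraction) -> Fraction:
    """Return U_elec / (m_elec·c²) for a given assembly energy."""
    return u_assembly / E_FIELD


shell_ratio = ratio_to_field_energy(U_SHELL)
sphere_ratio = ratio_to_field_energy(U_SPHERE)

print("shell :", shell_ratio, "inverse:", 1 / shell_ratio)
print("sphere:", sphere_ratio, "inverse:", 1 / sphere_ratio)
```

The shell gives the 3/4 ratio (or 4/3, if you turn it around), while the uniformly charged sphere gives 9/10 (or 10/9 the other way around), so the mismatch is about 10% whichever way you look at it.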

You’ll say: 10% is 10%. It’s huge in physics, as it’s supposed to be an exact science. Well… It is and it isn’t. Do you realize we haven’t even started to talk about stuff like spin? Indeed, in modern physics, we think of electrons as something that also spins around one or the other axis, so there’s energy there too, and we didn’t include that in our analysis.

In short, Feynman’s approach here is disappointing. Naive even, but then… Well… Who knows? Perhaps he didn’t do this Lecture himself. Perhaps it’s just an assistant or so. In fact, I should wonder why there are still physicists wasting time on this! I should also note that naively comparing that radius a with the classical electron radius also makes little or no sense. Unlike what you’d expect, the classical electron radius r_e and the Thomson scattering cross-section σ_e are not related like you might think they are, i.e. like σ_e = π·r_e² or σ_e = π·(r_e/2)² or σ_e = r_e² or σ_e = π·(2·r_e)² or whatever circular surface calculation rule that might make sense here. No. The Thomson scattering cross-section is equal to:

σ_e = (8π/3)·r_e² = (2π/3)·(2·r_e)² = (2/3)·π·(2·r_e)² ≈ 66.5×10⁻³⁰ m² = 66.5 (fm)²

Why? I am not sure. I must assume it’s got to do with the standard deviation and all that. The point is, we’ve got a 2/3 factor here too, so do we have a problem really? I mean… The a we got was equal to a = (2/3)·r_e, wasn’t it? It was. But, unfortunately, it doesn’t mean anything. It’s just a coincidence. In fact, looking at the Thomson scattering cross-section, instead of the Thomson scattering radius, makes the ‘problem’ a little bit worse. Indeed, applying the π·r² rule for a circular surface, we get that the radius would be equal to (8/3)^1/2·r_e ≈ 1.633·r_e, so we get something that’s much larger rather than something that’s smaller here.
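The two numbers quoted here are easy to reproduce. The sketch below assumes the rounded CODATA value for the classical electron radius (that value is my input, it is not derived here) and simply applies the (8π/3)·r_e² formula and the π·r² rule for a circular area.

```python
import math

R_E = 2.8179403e-15  # classical electron radius in metres (rounded CODATA value, assumed)

sigma_thomson = (8 * math.pi / 3) * R_E**2               # Thomson cross-section in m²
sigma_in_fm2 = sigma_thomson / 1e-30                     # 1 fm² = 1e-30 m²
equivalent_radius = math.sqrt(sigma_thomson / math.pi)   # radius of a disc with that area

print(f"sigma_e = {sigma_in_fm2:.1f} fm^2")
print(f"equivalent radius = {equivalent_radius / R_E:.3f} r_e")
```

So the cross-section comes out at about 66.5 fm², and the radius of a disc with that area is about 1.633 times r_e, which is the (8/3)^1/2 factor mentioned above.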

In any case, it doesn’t matter. The point is: these kinds of comparisons should not be taken too seriously. Indeed, when everything is said and done, we’re comparing three very different things here:

The radius that’s associated with the energy that’s needed to assemble our electron from infinitesimally small charges, and so that’s based on Coulomb’s law and the model we use for our electron: is it a shell or a sphere of charge? If it’s a sphere, do we want to think of it as something that’s of uniform or non-uniform density?
The second radius is associated with the field of an electron, which we calculate using Poynting’s formula for the energy flow and/or the momentum density. So that’s not about the internal structure of the electron but, of course, it would be nice if we could find some model of an electron that matches this radius.
Finally, there’s the radius that’s associated with elastic scattering, which is also referred to as hard scattering because it’s like the collision of two hard spheres indeed. But so that’s some value that has to be established experimentally and so it involves judicious choices because there’s probabilities and standard deviations involved.

So should we worry about the gaps between these three different concepts? In my humble opinion: no. Why? Because they’re all damn close and so we’re actually talking about the same thing. I mean: isn’t it terrific that we’ve got a model that brings the first and the second radius together with a difference of 10% only? As far as I am concerned, that shows the theory works. So what Feynman’s doing in that (in)famous chapter is some kind of ‘dimensional analysis’ which confirms rather than invalidates classical electromagnetic theory. So it shows classical theory’s strength, rather than its weakness. It actually shows our formulas do work where we wouldn’t expect them to work. 🙂

The thing is: when looking at the behavior of electrons themselves, we’ll need a different conceptual framework altogether. I am talking quantum mechanics here. Indeed, we’ll encounter other anomalies than the ones we presented above. There’s the issue of the anomalous magnetic moment of electrons, for example. Indeed, as I mentioned above, we’ll also want to think of electrons as spinning around their own axis, and so that implies some circulation of E that will generate a permanent magnetic dipole moment… […] OK, just think of some magnetic field if you don’t have a clue what I am saying here (but then you should check out my post on it). […] The point is: here too, the so-called ‘classical result’, so that’s its theoretical value, will differ from the experimentally measured value. Now, the difference here will be 0.0011614, so that’s about 0.1%, i.e. 100 times smaller than my 10%. 🙂

Personally, I think that’s not so bad. 🙂 But then physicists need to stay in business, of course. So, yes, it is a problem. 🙂

Post scriptum on the math versus the physics

The key to the calculation of the energy that goes into assembling a charge was the following integral:

This is a double integral which we simplified in two stages, so we’re looking at an integral within an integral really, but we can substitute the integral over the ρ(2)·dV₂ product by the formula we got for the potential, so we write that as Φ(1), and so the integral above becomes:

We then established that this integral was mathematically equivalent to the following equation:

What’s ‘too small’? Let’s look at the formula we got for our electron as a spherical shell of charge:

So we’ve got an even simpler formula here: it’s just a 1/r relation. Why is that? Well… It’s just the way the math turns it out. I copied the detail of Feynman’s calculation above, so you can double-check it. It’s quite wonderful, really. We have a very simple inversely proportional relationship between the radius of our electron and its energy as a sphere of charge. We could write it as:

U_elect = α/a, with α = e²/2

But – Hey! Wait a minute! We’ve seen something like this before, haven’t we? We did. We did when we were discussing the wonderful properties of that magical number, the fine-structure constant, which we also denoted by α. 🙂 However, because we used α already, I’ll denote the fine-structure constant as α_e here, so you don’t get confused. As you can see, the fine-structure constant links all of the fundamental properties of the electron: its charge, its radius, its distance to the nucleus (i.e. the Bohr radius), its velocity, and its mass (and, hence, its energy). So, at this stage of the argument, α can be anything, and α_e cannot, of course. It’s just that magical number out there, which relates everything to everything: it’s the God-given number we don’t understand. 🙂 Having said that, it seems like we’re going to get some understanding here because we know that one of the many expressions involving α_e was the following one:

m_e = α_e/r_e

This says that the mass of the electron is equal to the ratio of the fine-structure constant and the electron radius. [Note that we express everything in natural units here, so that’s Planck units. For the detail of the conversion, please see the relevant section on that in one of my posts on this and other stuff.] Now, mass is equivalent to energy, of course: it’s just a matter of units, so we can equate m_e with E_e (this amounts to expressing the energy of the electron in a kg unit: a bit weird, but OK) and so we get:

E_e = α_e/r_e

So there we have: the fine-structure constant α_e is Nature’s ‘cut-off’ factor, so to speak. Why? Only God knows. 🙂 But it’s now (fairly) easy to see why all the relations involving α_e are what they are. For example, we also know that α_e is the square of the electron charge expressed in Planck units, so we have:

α_e = e_P² and, therefore, E_e = e_P²/r_e

Now, you can check for yourself: it’s just a matter of re-expressing everything in standard SI units, and relating e_P² to e², and it should all work: you should get the U_elect = (1/2)·e²/a expression. So… Well… At least this takes some of the magic out of the fine-structure constant. It’s still a wonderful thing, but so you see that the fundamental relationship between (a) the energy (and, hence, the mass), (b) the radius and (c) the charge of an electron is not something God-given. What’s God-given are Maxwell’s equations, and so the E_e = α_e/r_e = e_P²/r_e is just one of the many wonderful things that you can get out of them. 🙂
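For those who’d rather let the computer do the checking, here is a rough numerical sketch. It rests on two assumptions of mine: rounded CODATA values for the constants, and the convention in which the Planck charge is √(4πε₀ħc), so that ħ = c = 1 turns m_e = α_e·ħ/(r_e·c) into m_e = α_e/r_e. The variable names are only illustrative.

```python
import math

# Rounded CODATA values, SI units (assumed inputs).
E_CHARGE = 1.602176634e-19     # elementary charge, C
EPSILON_0 = 8.8541878128e-12   # vacuum permittivity, F/m
HBAR = 1.054571817e-34         # reduced Planck constant, J·s
C = 299792458.0                # speed of light, m/s
M_E = 9.1093837015e-31         # electron mass, kg
R_E = 2.8179403262e-15         # classical electron radius, m

# α_e = e_P²: the electron charge expressed in units of the Planck charge, squared.
planck_charge = math.sqrt(4 * math.pi * EPSILON_0 * HBAR * C)
alpha_from_charge = (E_CHARGE / planck_charge) ** 2

# m_e = α_e/r_e in natural units reads α_e = m_e·r_e·c/ħ in SI units.
alpha_from_mass = M_E * R_E * C / HBAR

print(f"alpha from charge: {alpha_from_charge:.9f}")
print(f"alpha from mass  : {alpha_from_mass:.9f}")
print(f"1/alpha          : {1 / alpha_from_charge:.3f}")
```

Both routes land on the same number, roughly 1/137, which is all the two relations above are saying.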

Field energy and field momentum

This post goes to the heart of the E = mc² equation. It’s kinda funny, because Feynman just compresses all of it in a sub-section of his Lectures. However, as far as I am concerned, I feel it’s a very crucial section. Pivotal, I’d say, which would fit with its place in all of the 115 Lectures that make up the three volumes, which is sort of mid-way, which is where we are here. So let’s go for it. 🙂

Let’s first recall what we wrote about the Poynting vector S, which we calculate from the magnetic and electric field vectors E and B by taking their cross-product:

S = ε₀c²·E×B

This vector represents the energy flow, per unit area and per unit time, in electrodynamical situations. If E and/or B are zero (which is the case in electrostatics, for example, because we don’t have magnetic fields in electrostatics), then S is zero too, so there is no energy flow then. That makes sense, because we have no moving charges, so where would the energy go to?

I also made it clear we should think of S as something physical, by comparing it to the heat flow vector h, which we presented when discussing vector analysis and vector operators. The heat flow out of a surface element da is the area times the component of h perpendicular to da, so that’s (h•n)·da = h_n·da. Likewise, we can write (S•n)·da = S_n·da. The units of S and h are also the same: joule per second and per square meter or, using the definition of the watt (1 W = 1 J/s), in watt per square meter. In fact, if you google a bit, you’ll find that both h and S are referred to as a flux density:

The heat flow vector h is the heat flux density vector, from which we get the heat flux through an area through the (h•n)·da = h_n·da product.
The energy flow S is the energy flux density vector, from which we get the energy flux through the (S•n)·da = S_n·da product.

So that should be enough as an introduction to what I want to talk about here. Let’s first look at the energy conservation principle once again.

Local energy conservation

In a way, you can look at my previous post as being all about the equation below, which we referred to as the ‘local’ energy conservation law:

Of course, it is not the complete energy conservation law. The local energy is not only in the field. We’ve got matter as well, and so that’s what I want to discuss here: we want to look at the energy in the field as well as the energy that’s in the matter. Indeed, field energy is conserved, and then it isn’t: if the field is doing work on matter, or matter is doing work on the field, then… Well… Energy goes from one to the other, i.e. from the field to the matter or from the matter to the field. So we need to include matter in our analysis, which we didn’t do in our last post. Feynman gives the following simple example: we’re in a dark room, and suddenly someone turns on the light switch. So now the room is full of field energy—and, yes, I just mean it’s not dark anymore. :-). So that means some matter out there must have radiated its energy out and, in the process, it must have lost the equivalent mass of that energy. So, yes, we had matter losing energy and, hence, losing mass.

Now, we know that energy and momentum are related. Respecting and incorporating relativity theory, we’ve got two equivalent formulas for it:

E²− p²c² = m₀²c⁴
pc = E·(v/c) ⇔ p = v·E/c² = m·v
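If you want to convince yourself that these two formulas say the same thing, a quick numerical check helps. The sketch below assumes an electron rest mass and a velocity of 0.6·c (both arbitrary choices of mine) and uses E = mc² with the usual Lorentz factor; the function names are illustrative only.

```python
import math

C = 299792458.0  # speed of light, m/s


def lorentz_factor(v: float) -> float:
    """Return γ = (1 − v²/c²)^(−1/2)."""
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)


def energy_and_momentum(m0: float, v: float) -> tuple[float, float]:
    """Return (E, p) with E = m·c² and p = m·v, where m = γ·m₀."""
    m = m0 * lorentz_factor(v)
    return m * C**2, m * v


m0 = 9.1093837015e-31   # rest mass in kg (electron, assumed for the example)
v = 0.6 * C             # arbitrary velocity below c
E, p = energy_and_momentum(m0, v)

invariant = E**2 - (p * C) ** 2   # should equal m₀²c⁴
print(invariant, (m0 * C**2) ** 2)
print(p * C, E * v / C)           # should be equal: pc = E·(v/c)
```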

The E = mc² and m = m₀·(1−v²/c²)^−1/2 formulas connect both expressions. So we can look at it in either of two ways. We could use the energy conservation law, but Feynman prefers the conservation of momentum approach, so let’s see where he takes us. If the field has some energy (and, hence, some equivalent mass) per unit volume, and if there’s some flow, so if there’s some velocity (which there is: that’s what our previous post was all about), then it will have a certain momentum per unit volume. [Remember: momentum is mass times velocity.] That momentum will have a direction, so it’s a vector, just like p = mv. We’ll write it as g, so we define g as:

g is the momentum of the field per unit volume.

What units would we express it in? We’ve got a bit of choice here. For example, because we’re relating everything to energy here, we may want to convert our kilogram into eV/c² or J/c² units, using the mass-energy equivalence relation E = mc². Hmm… Let’s first keep the kg as a measure of inertia though. So we write: [g] = [m]·[v]/m³ = (kg·m/s)/m³. Hmm… That doesn’t show it’s energy, so let’s replace the kg with a unit that’s got newton and meter in it, cf. the F = ma law. So we write: [g] = (kg·m/s)/m³ = (kg/s)/m² = [(N·s²/m)/s]/m² = N·s/m³. Well… OK. The newton·second is the unit of momentum indeed, and we can re-write it including the joule (1 J = 1 N·m), so then we get [g] = (J·s/m⁴), so what’s that? Well… Nothing much. However, I do note it happens to be the dimension of S/c², so that’s [S/c²] = [J/(s·m²)]·(s²/m²) = (J·s/m⁴). 🙂 Let’s continue the discussion.
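Juggling units by hand is error-prone, so here is a small sketch that does the bookkeeping mechanically. It represents a dimension as a dictionary of exponents of the SI base units kg, m and s; the helper names `times` and `raised` are my own invention for this illustration, not part of any library.

```python
def times(a: dict, b: dict) -> dict:
    """Multiply two dimensions: add the exponents of each base unit."""
    out = dict(a)
    for unit, exp in b.items():
        out[unit] = out.get(unit, 0) + exp
    # Drop base units whose exponent cancelled out.
    return {unit: exp for unit, exp in out.items() if exp != 0}


def raised(a: dict, n: int) -> dict:
    """Raise a dimension to an integer power: multiply every exponent by n."""
    return {unit: exp * n for unit, exp in a.items()}


KG, M, S = {"kg": 1}, {"m": 1}, {"s": 1}
velocity = times(M, raised(S, -1))             # m/s
newton = times(KG, times(M, raised(S, -2)))    # kg·m/s²
joule = times(newton, M)                       # N·m
volume = raised(M, 3)                          # m³

momentum_density = times(times(KG, velocity), raised(volume, -1))  # [g] = (kg·m/s)/m³
poynting = times(joule, raised(times(S, raised(M, 2)), -1))        # [S] = J/(s·m²)
s_over_c2 = times(poynting, raised(velocity, -2))                  # [S/c²]
joule_second_per_m4 = times(times(joule, S), raised(M, -4))        # J·s/m⁴

print(momentum_density, s_over_c2, joule_second_per_m4)
```

All three come out as kg·m⁻²·s⁻¹, so [g], [S/c²] and J·s/m⁴ are indeed one and the same dimension.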

Now, momentum is conserved, and each component of it is conserved. So let’s look at the x-direction. We should have something like:

If you look at this carefully, you’ll probably say: “OK. I understood the thing with the dark room and light switch. Mass got converted into field energy, but what’s that second term on the right?”

Good. Smart. Right remark. Perfect. […] Let me try to answer the question. While all of the quantities above are expressed per unit volume, we’re actually looking at the same infinitesimal volume element here, so the example of the light switch is actually an example of a ‘momentum outflow’, so it’s actually an example of that second term on the right-hand side of the equation above kicking in! 🙂

Indeed, the first term just sort of reiterates the mass-energy equivalence: the energy that’s in the matter can become field energy, so to speak, in our infinitesimal volume element itself, and vice versa. But if it doesn’t, then it should get out and, hence, become ‘momentum outflow’. Does that make sense? No?

Hmm… What to say? You’ll need to look at that equation a couple of times more, I guess. But I need to move on, unfortunately. [Don’t get put off when I say things like this: I am basically talking to myself, so it means I’ll need to re-visit this myself. :-/]

Let’s look at all of the three terms:

The left-hand side (i.e. the time rate-of-change of the momentum of matter) is easy. It’s just the force on it, which we know is equal to F = q(E + v×B). Do we know that? OK… I’ll admit it. Sometimes it’s easy to forget where we are in an analysis like this, but so we’re looking at the electromagnetic force here. 🙂 As we’re talking infinitesimals here and, therefore, charge density rather than discrete charges, we should re-write this as the force per unit volume, which is ρE + j×B. [This is an interesting formula which I didn’t use before, so you should double-check it. :-)]
The first term on the right-hand side should be equally obvious, or… Well… Perhaps somewhat less so. But with all my rambling on the Uncertainty Principle and/or the wave-particle duality, it should make sense. If we scrap the second term on the right-hand side, we basically have an equation that is equivalent to the E = mc² equation. No? Sorry. Just look at it, again and again. You’ll end up understanding it. 🙂
So it’s that second term on the right-hand side. What the hell does that say? Well… I could say: it’s the local energy or momentum conservation law. If the energy or momentum doesn’t stay in, it has to go out. 🙂 But that’s not very satisfactory as an answer, of course. However, please just go along with this ‘temporary’ answer for a while.

So what is that second term on the right-hand side? As we wrote it, it’s an x-component – or, let’s put it differently, it is or was part of the x-component of the momentum density – but, frankly, we should probably allow it to go out in any direction really, as the only constraint on the left-hand side is a per second rate of change of something. Hence, Feynman suggests to equate it to something like this:

What are a, b and c? The components of some vector? Not sure. We’re stuck. This piece really requires very advanced math. In fact, as far as I know, this is the only time where Feynman says: “Sorry. This is too advanced. I’ll just give you the equation. Sorry.” So that’s what he does. He explains the philosophy of the argument, which is the following:

On the left-hand side, we’ve got the time rate-of-change of momentum, so that obeys the F = dp/dt = d(mv)/dt law, with the force F, per unit volume, being equal to F (unit volume) = ρE + j×B.
On the right-hand side, we’ve got something that can be written as:

So we’d need to find a way to express ρE + j×B in terms of E and B only – eliminating ρ and j by using Maxwell’s equations or whatever other trick – and then juggle terms and make substitutions to get it into a form that looks like the formula above, i.e. the right-hand side of that equation. But so Feynman doesn’t show us how it’s being done. He just mentions some theorem in physics, which says that the energy that’s flowing through a unit area per unit time divided by c² – so that’s E/c² per unit area and per unit time – must be equal to the momentum per unit volume in the space, so we write:

g = S/c²

He illustrates the general theorem that’s used to get the equation above by giving two examples:

OK. Two good examples. However, it’s still frustrating to not see how we get the g = S/c² in the specific context of the electromagnetic force, so let’s do a dimensional analysis at least. In my previous post, I showed that the dimension of S must be J/(m²·s), so [S/c²] = [J/(m²·s)]/(m²/s²) = [N·m/(m²·s)]·(s²/m²) = [N·s/m³]. Now, we know that the unit of mass is 1 kg = N/(m/s²). That’s just the force law: a force of 1 newton will give a mass of 1 kg an acceleration of 1 m/s per second, so 1 N = 1 kg·(m/s²). So the [N·s/m³] dimension is equal to [kg·(m/s²)·s/m³] = [kg·(m/s)]/m³, which is the dimension of momentum (p = m·v) per unit volume, indeed. So, yes, the dimensional analysis works out, and it’s also in line with the p = v·E/c² = m·v equation, but… Oh… We did a dimensional analysis already, where we also showed that [g] = [S/c²] = (J·s/m⁴). Well… In any case… It’s a bit frustrating to not see the detail here, but let us note the Grand Result once again:

The Poynting vector S gives us the energy flow as well as the momentum density g = S/c².

But what does it all mean, really? Let’s go through Einstein’s illustration of the principle. That will help us a lot. Before we do, however, I’d like to note something. I’ve always wondered a bit about that dichotomy between energy and momentum. Energy is force times distance: 1 joule is 1 newton × 1 meter indeed (1 J = 1 N·m). Momentum is force times time, as we can express it in N·s. Planck’s constant h combines all three in the dimension of action, which is force times distance times time: h ≈ 6.6×10⁻³⁴ N·m·s, indeed. I like that unity. In this regard, you should, perhaps, quickly review that post in which I explain that h is the energy per cycle, i.e. per wavelength or per period, of a photon, regardless of its wavelength. So it’s really something very fundamental.

We’ve got something similar here: energy and momentum coming together, and being shown as one aspect of the same thing: some oscillation. Indeed, just see what happens with the dimensions when we ‘distribute’ the 1/c² factor on the right-hand side over the two sides, so we write: c·g = S/c and work out the dimensions:

[c·g] = (m/s)·(N·s)/m³ = N/m² = J/m³.
[S/c] = (s/m)·(N·m)/(s·m²) = N/m² = J/m³.

Isn’t that nice? Both sides of the equation now have a dimension like ‘the force per unit area’, or ‘the energy per unit volume’. To get that, we just re-scaled g and S, by c and 1/c respectively. As far as I am concerned, this shows an underlying unity we probably tend to mask with our ‘related but different’ energy and momentum concepts. It’s like E and B: I just love it that we can write them together in our Poynting formula S = ε₀c²E×B. In fact, let me show something else here, which you should think about. You know that c² = 1/(ε₀μ₀), so we can write S also as S = E×B/μ₀. That’s nice, but what’s nice too is the following:

S/c = c·g = ε₀c·E×B = E×B/(μ₀c)
S/g = c² = 1/(ε₀μ₀)
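To see these relations at work with actual numbers, here is a minimal sketch. The field values are hypothetical: I just picked a plane-wave-like configuration with E along x, B along y and |B| = |E|/c, and the constants are rounded CODATA values. The `cross` and `scale` helpers are mine.

```python
import math

EPSILON_0 = 8.8541878128e-12   # vacuum permittivity, F/m (rounded CODATA value)
MU_0 = 1.25663706212e-6        # vacuum permeability, N/A² (rounded CODATA value)
C = 299792458.0                # speed of light, m/s


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def scale(k, a):
    """Multiply a 3-vector by a scalar."""
    return tuple(k * x for x in a)


# Hypothetical fields: E along x (V/m), B along y (T), with |B| = |E|/c.
E_field = (100.0, 0.0, 0.0)
B_field = (0.0, 100.0 / C, 0.0)

S_from_eps = scale(EPSILON_0 * C**2, cross(E_field, B_field))  # S = ε₀c²·E×B
S_from_mu = scale(1.0 / MU_0, cross(E_field, B_field))         # S = E×B/μ₀
g = scale(1.0 / C**2, S_from_eps)                              # g = S/c²

print("c² vs 1/(ε₀μ₀):", C**2, 1.0 / (EPSILON_0 * MU_0))
print("S (W/m²):", S_from_eps, S_from_mu)
print("g (N·s/m³):", g)
```

The energy flow points along z, as it should for a wave travelling in that direction, and both expressions for S agree to the precision of the constants used.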

So, once again, Feynman may feel the Poynting vector is sort of counter-intuitive when analyzing specific situations but, as far as I am concerned, I feel the Poynting vector makes things actually easier to understand. Instead of two E and B vectors, and two concepts to deal with ‘energy’ (i.e. energy and momentum), we’re sort of unifying things here. In that regard – i.e. in regard of feeling we’re talking the same thing really – I’d really highlight the S/g = c² = 1/(ε₀μ₀) equation. Indeed, the universal constant c acts just like the fine-structure constant here: it links everything to everything. 🙂

And, yes, it’s also about time we introduce the so-called principle of least action to explain things, because action, as a concept, combines force, distance and time indeed, so it’s a bit more promising than just energy, or just momentum. Having said that, you’ll see in the next section that it’s sometimes quite useful to have the choice between one formula or the other. But… Well… Enough talk. Let’s look at Einstein’s car.

Einstein’s car

Einstein’s car is a wonderful device: it rolls without any friction and it moves with a little flashlight. That’s all it needs. It’s pictured below. 🙂 So the situation is the following: the flashlight shoots some light out from one side, which is then stopped at the opposite end of the car. When the light is emitted, there must be some recoil. In fact, we know it’s going to be equal to 1/c times the energy because all we need to do is apply the pc = E·(v/c) formula for v = c, so we know that p = E/c. Of course, this momentum now needs to move Einstein’s car. It’s frictionless, so it should work, but still… The car has some mass M, and so that will determine its recoil velocity: v = p/M. We just apply the general p = mv formula here, and v is not equal to c here, of course! Of course, then the light hits the opposite end of the car and delivers the same momentum, so that stops the car again. However, it did move over some distance x = vt. So we could flash our light again and get to wherever we want to get. [Never mind the infinite accelerations involved!] So… Well… Great! Yes, but Einstein didn’t like this car when he first saw it. In fact, he still doesn’t like it, because he knows it won’t take you very far. 🙂

The problem is that we seem to be moving the center of gravity of this car by fooling around on the inside only. Einstein doesn’t like that. He thinks it’s impossible. And he’s right of course. The thing is: the center of gravity did not change. What happened here is that we’ve got some blob of energy, and so that blob has some equivalent mass (which we’ll denote by U/c²), and so that equivalent mass moved all the way from one side to the other, i.e. over the length of the car, which we denote by L. In fact, it’s stuff like this that inspired the whole theory of the field energy and field momentum, and how it interacts with matter.

What happens here is like switching the light on in the dark room: we’ve got matter doing work on the field, and so matter loses mass, and the field gains it, through its momentum and/or energy. To calculate how much, we could integrate S/c or c·g over the volume of our blob, and we’d get something in joule indeed, but there’s a simpler way here. The momentum conservation says that the momentum of our car and the momentum of our blob must be equal, so if T is the time that was needed for our blob to go to the other side – and so that’s, of course, also the time during which our car was rolling – then M·v = M·x/T must be equal to (U/c²)·c = (U/c²)·L/T. The 1/T factor on both sides cancels, so we write: M·x = (U/c²)·L. Now, what is x? Yes. In case you were wondering, that’s what we’re looking for here. 🙂 Here it is:

x = vT = vL/c = (p/M)·(L/c) = [(U/c)/M]·(L/c) = (U/c²)·(L/M)
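To get a feel for how little the car actually moves, here is a small sketch that walks through the same steps. The numbers (a 1 J flash, a 2 m long car with a mass of 1000 kg) are made up for the sake of the example, and the function name is mine.

```python
C = 299792458.0  # speed of light, m/s


def car_displacement(U: float, L: float, M: float) -> float:
    """Distance x the car rolls while a light pulse of energy U crosses its length L."""
    p = U / C    # momentum of the light pulse, from p = E/c
    v = p / M    # recoil velocity of the car, from p = M·v
    T = L / C    # transit time of the pulse (the car's own tiny motion is ignored)
    return v * T


# Hypothetical numbers: energy in J, length in m, mass in kg.
U, L, M = 1.0, 2.0, 1000.0
x = car_displacement(U, L, M)
blob_mass = U / C**2   # equivalent mass of the blob of light

print(f"x = {x:.3e} m")
print(f"M·x = {M * x:.3e} kg·m, (U/c²)·L = {blob_mass * L:.3e} kg·m")
```

The displacement is of the order of 10⁻²⁰ m for these numbers, and M·x equals (U/c²)·L, which is the balance the center-of-mass argument relies on.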

So what’s next? Well… Now we need to show that the center-of-mass actually did not move with this ‘transfer’ of the blob. I’ll leave the math to you here: it should all work out. And you can also think through the obvious questions:

Where is the energy and, hence, the mass of our blob after it stops the car? Hint: think about excited atoms and imagine they might radiate some light back. 🙂
As the car did move a little bit, we should be able to move it further and further away from its center of gravity, until the center of gravity is no longer in the car. Hint: think about batteries and energy levels going down while shooting light out. It just won’t happen. 🙂

Now, what about a blob of light going from the top to the bottom of the car? Well… That involves the conservation of angular momentum: we’ll have more mass on the bottom, but on a shorter lever-arm, so angular momentum is being conserved. It’s a very good question though, and it led Einstein to combine the center-of-gravity theorem with the angular momentum conservation theorem to explain stuff like this.

It’s all fascinating, and one can think of a great many paradoxes that, at first, seem to contradict the Grand Principles we used here, which means that they would contradict all that we have learned so far. However, a careful analysis of those paradoxes reveals that they are paradoxes indeed: propositions which sound true but are, in the end, self-contradictory. In fact, when explaining electromagnetism over his various Lectures, Feynman tasks his readers with a rather formidable paradox when discussing the laws of induction, and he solves it here, ten chapters later, after describing what we described above. You can busy yourself with it but… Well… I guess you’ve got something better to do. If so, just take away the key lesson: there’s momentum in the field, and it’s also possible to build up angular momentum in a magnetic field and, if you switch it off, the angular momentum will be given back, somehow, as it’s stored energy.

That’s also why the seemingly irrelevant circulation of S we discussed in my previous post, where we had a charge next to an ordinary magnet, and where we found that there was energy circulating around, is not so queer. The energy is there, in the circulating field, and it’s real. As real as can be. 🙂

The energy of fields and the Poynting vector

For some reason, I always thought that Poynting was a Russian physicist, like Minkowski. He wasn’t. I just looked it up. Poynting was an Englishman, born near Manchester, and he taught in Birmingham. I should have known. Poynting is a very English name, isn’t it? My confusion probably stems from the fact that it was some Russian physicist, Nikolay Umov, who first proposed the basic concepts we are going to discuss here, i.e. the speed and direction of energy itself, or its movement. And as I am double-checking, I just learned that Hermann Minkowski is generally considered to be German-Jewish, not Russian. Makes sense. With Einstein and all that. His personal life story is actually quite interesting. You should check it out. 🙂

Let’s go for it. We’ve done a few posts on the energy in the fields already, but all in the context of electrostatics. Let me first walk you through the ideas we presented there.

The basic concepts: force, work, energy and potential

1. A charge q causes an electric field E, and E‘s magnitude E is a simple function of the charge (q) and its distance (r) from the point that we’re looking at, which we usually write as P = (x, y, z). Of course, the origin of our reference frame here is q. The formula is the simple inverse-square law that you (should) know: E ∼ q/r², and the proportionality constant is just Coulomb’s constant, which I think you wrote as k_e in your high-school days and which, as you know, is there so as to make sure the units come out alright. So we could just write E = k_e·q/r². However, just to make sure it does not look like a piece of cake 🙂 physicists write the proportionality constant as 1/4πε₀, so we get:

E = (1/4πε₀)·q/r² = q/(4πε₀r²)

Now, the field is the force on any unit charge (+1) we’d bring to P. This led us to think of energy, potential energy, because… Well… You know: energy is measured by work, so that’s some force acting over some distance. The potential energy of a charge increases if we move it against the field, so we wrote:

Well… We actually gave the formula below in that post, so that’s the work done per unit charge. To interpret it, you just need to remember that F = qE, which is equivalent to saying that E is the force per unit charge.

W(unit) = −∫ E•ds (integrating from a to b)

As for the F•ds or E•ds product in the integrals, that’s a vector dot product, which we need because it’s only the tangential component of the force that’s doing work, as evidenced by the formula F•ds = |F|·|ds|·cosθ = F_t·ds, and as depicted below.

Now, this allowed us to describe the field in terms of the (electric) potential Φ and the potential differences between two points, like the points a and b in the integral above. We have to choose some reference point, of course, some P₀ defining zero potential, which is usually infinitely far away. So we wrote our formula for the work that’s being done on a unit charge, i.e. W(unit) as:

W(unit) = Φ(b) − Φ(a), with Φ(P) = −∫ E•ds (integrating from P₀ to P)

2. The world is full of charges, of course, and so we need to add all of their fields. But so now you need a bit of imagination. Let’s reconstruct the world by moving all charges out, and then we bring them back one by one. So we take q₁ now, and we bring it back into the now-empty world. Now that does not require any energy, because there’s no field to start with. However, when we take our second charge q₂, we will be doing work as we move it against the field or, if it’s an opposite charge, we’ll be taking energy out of the field. Huh? Yes. Think about it. All is symmetric. Just to make sure you’re comfortable with every step we take, let me jot down the formula for the force that’s involved. It’s just the Coulomb force of course:

F₁ = (1/4πε₀)·(q₁q₂/r²)·e₁₂ = −F₂

F₁ is the force on charge q₁, and F₂ is the force on charge q₂. Now, q₁ and q₂ may attract or repel each other but the forces will always be equal and opposite. The e₁₂ vector makes sure the directions and signs come out alright, as it’s the unit vector from q₂ to q₁ (not from q₁ to q₂, as you might expect when looking at the order of the indices). So we would need to integrate this for r going from infinity to… Well… The distance between q₁ and q₂ – wherever they end up as we put them back into the world – so that’s what’s denoted by r₁₂. Now I hate integrals too, but this is an easy one. Just note that ∫ r⁻²dr = −1/r and you’ll be able to figure out that what I’ll write now makes sense (if not, I’ll do a similar integral in a moment): the work done in bringing two charges together from a large distance (infinity) is equal to:

W = q₁q₂/(4πε₀r₁₂)

So now we should bring in q₃and then q₄, of course. That’s easy enough. Bringing the first two charges into that world we had emptied took a lot of time, but now we can automate processes. Trust me: we’ll be done in no time. 🙂 We just need to sum over all of the pairs of charges q_i and q_j. So we write the total electrostatic energy U as the sum of the energies of all possible pairs of charges:

U = Σ q_i·q_j/(4πε₀r_ij) (summing over all pairs)

Huh? Can we do that? I mean… Every new charge that we’re bringing in here changes the field, doesn’t it? It does. But it’s the magic of the superposition principle at work here. Our third charge q₃ is associated with two pairs in this formula. Think of it: we’ve got the q₁q₃ and the q₂q₃ combination, indeed. Likewise, our fourth charge q₄ is to be paired up with three charges now: q₁, q₂ and q₃. This formula takes care of it, and the ‘all pairs’ mention under the summation sign (Σ) reminds us we should watch we don’t double-count pairs: the q₁q₃ and q₃q₁ combinations, for example, count as one pair only, obviously. So, yes, we write ‘all pairs’ instead of the usual i, j subscripts. But then, yes, this formula takes care of it. We’re done!
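
If you want to see the ‘all pairs’ bookkeeping at work, here is a minimal Python sketch. The function name, the four-charge example and the 1 nC values are my own assumptions, just for illustration: the only thing taken from the text is that each unordered pair contributes q_i·q_j/(4πε₀r_ij) exactly once.

```python
import itertools
import math

EPS0 = 8.8541878128e-12             # vacuum permittivity, in C²/(N·m²)
K_E = 1.0 / (4.0 * math.pi * EPS0)  # Coulomb's constant 1/(4πε₀), in N·m²/C²


def electrostatic_energy(charges, positions):
    """Total energy U: sum of q_i*q_j/(4*pi*eps0*r_ij) over all unordered pairs."""
    particles = list(zip(charges, positions))
    total = 0.0
    # combinations(..., 2) yields every pair exactly once, so the (1, 3) and
    # (3, 1) combinations are never both counted.
    for (q_i, p_i), (q_j, p_j) in itertools.combinations(particles, 2):
        total += K_E * q_i * q_j / math.dist(p_i, p_j)
    return total


# Hypothetical example: four charges of 1 nC on the corners of a 1 m square.
charges = [1e-9, 1e-9, 1e-9, 1e-9]
positions = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
n_pairs = len(list(itertools.combinations(range(len(charges)), 2)))
U = electrostatic_energy(charges, positions)
print(f"{n_pairs} pairs, U = {U:.3e} J")
```

Four charges give six pairs: the third charge adds two of them and the fourth adds three, just like the counting above.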

Well… Not really, of course. We’ve still got some way to go before I can introduce the Poynting vector. 🙂 However, to make sure you ‘get’ the energy formula above, let me insert an extremely simple diagram so you’ve got a bit of a visual of what we’re talking about.

3. Now, let’s take a step back. We just calculated the (potential) energy of the world (U), which is great. But perhaps we should also be interested in the world’s potential Φ, rather than its potential energy U. Why? Well, we’ll want to know what happens when we bring yet another charge in—from outer space or so. 🙂 And so then it’s easier to know the world’s potential, rather than its energy, because we can calculate the field from it using the E = −∇Φ formula. So let’s de- and re-construct the world once again 🙂 but now we’ll look at what happens with the field and the potential.

We know our first charge created a field with a field strength we calculated as:

E = q/(4πε₀r²)

So, when bringing in our second charge, we can use our Φ(P) integral to calculate the potential:

Φ(P) = −∫ E•ds (integrating from ∞ to P)

[Let me make a note here, just for the record. You probably think I am being pretty childish when talking about my re-construction of the world in terms of bringing all charges out and then back in again but, believe me, there will be a lot of confusion when we’ll start talking about the energy of one charge, and that confusion can be avoided, to a large extent, when you realize that the idea (I mean the concept itself, really—not its formula) of a potential involves two charges really. Just remember: it’s the first charge that causes the field (and, of course, any charge causes a field), but calculating a potential only makes sense when we’re talking some other charge. Just make a mental note of it. You’ll be grateful to me later.]

Let’s now combine the integral and the formula for E above. Because you hate integrals as much as I do, I’ll spell it out: the antiderivative we need for the Φ(P) integral is ∫ q/(4πε₀r²)·dr. Now, let’s bring q/4πε₀ out for a while so we can focus on solving ∫(1/r²)dr. Now, ∫(1/r²)dr is equal to –1/r + k, and so the whole antiderivative is –q/4πε₀r + k. Now, we integrate from r = ∞ to r, and the potential is minus that integral, so we get –[–q/(4πε₀)]·[1/r − 1/∞] = [q/(4πε₀)]·[1/r − 0] = q/(4πε₀r). Let me present this somewhat nicer:

Φ(r) = q/(4πε₀r)

You’ll say: so what? Well… We’re done! The only thing we need to do now is add up the potentials of all of the charges in the world. So the formula for the potential Φ at a point which we’ll simply refer to as point 1, is:

Φ(1) = Σ q_j/(4πε₀r_1j) (summing over j = 2, 3, …)

Note that our index j starts at 2, otherwise it doesn’t make sense: we’d have a division by zero for the q₁/r₁₁ term. Again, it’s an obvious remark, but not thinking about it can cause a lot of confusion down the line.

4. Now, I am very sorry but I have to inform you that we’ll be talking charge densities and all that shortly, rather than discrete charges, so I have to give you the continuum version of this formula, i.e. the formula we’ll use when we’ve got charge densities rather than individual charges. That sum above then becomes an infinite sum (i.e. an integral), and q_j becomes a variable which we write as ρ(2). [That’s totally in line with our index j starting at 2, rather than at 1.] We get:

Φ(1) = ∫ ρ(2)·dV₂/(4πε₀r₁₂) (integrating over all of space)

Just look at this integral, and try to understand it: we’re integrating over all of space – so we’re integrating the whole world, really 🙂 – and the ρ(2)·dV₂product in the integral is just the charge of an infinitesimally small volume of our world. So the whole integral is just the (infinite) sum of the contributions to the potential (at point 1) of all (infinitesimally small) charges that are around indeed. Now, there’s something funny here. It’s just a mathematical thing: we don’t need to worry about double-counting here. Why? We’re not having products of volumes here. Just make a mental note of it because it will be different in a moment.

Now we’re going to look at the continuum version for our energy formula indeed. Which energy formula? That electrostatic energy formula, which said that the total electrostatic energy U is the sum of the energies of all possible pairs of charges:

U = Σ q_i·q_j/(4πε₀r_ij) (summing over all pairs)

Its continuum version is the following monster:

U = (1/2)·∫∫ ρ(1)·ρ(2)/(4πε₀r₁₂)·dV₁·dV₂ (integrating over all of space, twice)

Hmm… What kind of integral is that? We’ve got two variables here: dV₂ and dV₁. Yes. And we’ve also got a 1/2 factor now, because we do not want to double-count and, unfortunately, there is no convenient way of writing an integral like this that keeps track of the pairs. It’s a so-called double integral, but I’ll let you look up the math yourself. In any case, we can simplify this integral so you don’t need to worry about it too much. How do we simplify it? Well… Just look at that integral we got for Φ(1): we calculated the potential at point 1 by integrating the ρ(2)·dV₂product over all of space, so the integral above can be written as:

U = (1/2)·∫ ρ(1)·Φ(1)·dV₁

But so this integral integrates the ρ(1)·Φ(1)·dV₁product over all of space, so that’s over all points in space. So we can just drop the index and write the whole thing as the integral of ρ·Φ·dV over all of space:

U = (1/2)·∫ ρ·Φ·dV

5. It’s time for the hat-trick now. The equation above is mathematically equivalent to the following equation:

U = (ε₀/2)·∫ E•E·dV

Huh? Yes. Let me make two remarks here. First on the math, the E = −∇Φ formula allows you to write the integrand of the integral above as E•E = (−∇Φ)•(−∇Φ) = (∇Φ)•(∇Φ). And then you may or may not remember that, when substituting E = −∇Φ in Maxwell’s first equation (∇•E = ρ/ε₀), we got the following equality: ρ = −ε₀·∇•(∇Φ) = −ε₀·∇²Φ, so we can write ρΦ as −ε₀·Φ·∇²Φ. However, that still doesn’t show the two integrals are the same thing. The proof is actually rather involved, and so I’ll refer to that post I referred to, so you can check the proof there.
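
The 1/2 factor in the ρ·Φ form of the energy has a discrete counterpart that’s easy to check numerically: summing q_i·Φ(i) over all charges counts every pair twice, so half of it should equal the ‘all pairs’ sum. Here is a minimal sketch of that check. The function names and the three sample charges are assumptions of mine, and it says nothing about the (much harder) step to the E•E integral.

```python
import itertools
import math

K_E = 1.0 / (4.0 * math.pi * 8.8541878128e-12)  # 1/(4πε₀), in N·m²/C²


def energy_from_pairs(charges, positions):
    """U as the sum over all unordered pairs of q_i*q_j/(4*pi*eps0*r_ij)."""
    particles = list(zip(charges, positions))
    return sum(
        K_E * q_i * q_j / math.dist(p_i, p_j)
        for (q_i, p_i), (q_j, p_j) in itertools.combinations(particles, 2)
    )


def potential_at(i, charges, positions):
    """Potential at the location of charge i due to all the *other* charges."""
    return sum(
        K_E * q_j / math.dist(positions[i], p_j)
        for j, (q_j, p_j) in enumerate(zip(charges, positions))
        if j != i  # skip the q_i/r_ii term, which would be a division by zero
    )


def energy_from_potentials(charges, positions):
    """U as (1/2) * sum of q_i * Phi(i): the discrete analogue of (1/2)∫ρΦdV."""
    return 0.5 * sum(
        q_i * potential_at(i, charges, positions) for i, q_i in enumerate(charges)
    )


# Hypothetical example: three charges with mixed signs.
charges = [1e-9, -2e-9, 3e-9]
positions = [(0, 0, 0), (1, 0, 0), (0, 2, 0)]
print(energy_from_pairs(charges, positions))
print(energy_from_potentials(charges, positions))
```

Both lines should print the same number, up to rounding.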

The second remark is much more fundamental. The two integrals are mathematically equivalent, but are they also physically? What do I mean with that? Well… Look at it. The second integral implies that we can look at (ε₀/2)·E•E = ε₀E²/2 as an energy density, which we’ll denote by u, so we write:

u = (ε₀/2)·E•E = ε₀E²/2

Just to make sure you ‘get’ what we’re talking about here: u is the energy density in the little cube dV in the rather simplistic (and, therefore, extremely useful) illustration below (which, just like most of what I write above, I got from Feynman).

Now the question: what is the reality of that formula? Indeed, what we did when calculating U amounted to characterizing the Universe with some number U – and that’s kinda nice, of course! – but then what? Is u = ε₀E²/2 anything real? Well… That’s what this post is about. So we’re finished with the introduction now. 🙂

Energy density and energy flow in electrodynamics

Before giving you any more formulas, let me answer the question: there is no doubt, in the classical theory of electromagnetism at least, that the energy density u is something very real. It has to be, because energy is conserved locally, just like charge. Think of the charge conservation law. Charges cannot just disappear in space, to then re-appear somewhere else. The charge conservation law is written as ∇•j = −∂ρ/∂t, and that makes it clear it’s a local conservation law. Therefore, charges can only disappear and re-appear through some current. We write −dQ₁/dt = ∫ (j•n)·da = dQ₂/dt, and here’s the simple illustration that comes with it:

So we do not allow for any ‘non-local’ interactions here! Therefore, we say that, if energy goes away from a region, it’s because it flows away through the boundaries of that region. So that’s what the Poynting formulas are all about, and so I want to be clear on that from the outset.

Now, to get going with the discussion, I need to give you the formula for the energy density in electrodynamics. Its shape won’t surprise you:

u = ε₀E²/2 + ε₀c²B²/2

However, it’s just like the electrostatic formula: it takes quite a bit of juggling to get this from our electrodynamic equations, so, if you want to see how it’s done, I’ll refer you to Feynman. Indeed, I feel the derivation doesn’t matter all that much, because the formula itself is very intuitive: it’s really the thing everyone knows about a wave, electromagnetic or not: the energy in it is proportional to the square of its amplitude, and so that’s E•E = E² and B•B = B². Now, you also know that the magnitude of B is 1/c of that of E, so cB = E, and so that explains the extra c² factor in the second term.

The second formula is also very intuitive. Let me write it down:

∂u/∂t = −∇•S

Just look at it: u is the energy density, so that’s the amount of energy per unit volume at a given point, and so whatever flows out of that point must represent its time rate of change. As for the –∇•S expression… Well… Sorry, I can’t keep re-explaining things: the ∇• operator is the divergence, and so it gives us the magnitude of a (vector) field’s source or sink at a given point. ∇•C is a scalar, and if it’s positive in a region, then that region is a source. Conversely, if it’s negative, then it’s a sink. To be precise, the divergence represents the volume density of the outward flux of a vector field from an infinitesimal volume around a given point. So, in this case, it gives us the volume density of the flux of S. As you can see, the formula has exactly the same shape as ∇•j = −∂ρ/∂t.

So what is S? Well… Think about the more general formula for the flux out of some closed surface, which we get from integrating over the volume enclosed. It’s just Gauss’ Theorem:

∫ (C•n)·da (over the closed surface) = ∫ (∇•C)·dV (over the enclosed volume)

Just replace C by E, and think about what it meant: the flux of E was the field strength multiplied by the surface area, so it was the total flow of E. Likewise, S represents the flow of (field) energy. Let me repeat this, because it’s an important result:

S represents the flow of field energy.

Huh? What flow? Per unit area? Per second? How do you define such ‘flow’? Good question. Let’s do a dimensional analysis:

E is measured in newton per coulomb, so [E•E] = [E²] = N²/C².
B is measured in (N/C)/(m/s). [Huh? Well… Yes. I explained that a couple of times already. Just check it in my introduction to electric circuits.] So we get [B•B] = [B²] = (N²/C²)·(s²/m²) but the dimension of our c² factor is (m²/s²) so we’re left with N²/C². That’s nice, because we need to add in the same units.
Now we need to look at ε₀. That constant usually ‘fixes’ our units, but can we trust it to do the same now? Let’s see… One of the many ways in which we can express its dimension is [ε₀] = C²/(N·m²), so if we multiply that with N²/C², we find that u is expressed in N/m². Wow! That’s kinda neat. Why? Well… Just multiply with m/m and its dimension becomes N·m/m³= J/m³, so that’s joule per cubic meter, so… Yes: u has got the right unit for something that’s supposed to measure energy density!
OK. Now, we take the time rate of change of u, and so both the right and left of our ∂u/∂t = −∇•S formula are expressed in (J/m³)/s, which means that the dimension of S itself must be J/(m²·s). Just check it by writing it all out: ∇•S = ∂S_x/∂x + ∂S_y/∂y + ∂S_z/∂z, and so that’s something per meter so, to get the dimension of S itself, we need to go from cubic meter to square meter. Done! Let me highlight the grand result:

S is the energy flow per unit area and per second.

Now we’ve got its magnitude and its dimension, but what is its direction? Indeed, we’ve been writing S as a vector, but… Well… What’s its direction indeed?

Well… Hmm… I referred you to Feynman for the derivation of that u = ε₀E²/2 + ε₀c²B²/2 formula for the energy density u, and so the direction of S – I should actually say, its complete definition – comes out of that derivation as well. So… Well… I think you should just believe what I’ll be writing here for S:

S = ε₀c²·E×B

So it’s the vector cross product of E and B with ε₀c² thrown in. It’s a simple formula really, and because I didn’t drag you through the whole argument, you should just quickly do a dimensional analysis again, just to make sure I am not talking too much nonsense. 🙂 So what’s the direction? Well… You just need to apply the usual right-hand rule:

OK. We’re done! This S vector, which – let me repeat it – represents the energy flow per unit area and per second, is what is referred to as Poynting’s vector, and it’s a most remarkable thing, as I’ll show now. Let’s think about the implications of this thing.
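
If you’d rather let a few lines of Python do that dimensional analysis, here is a minimal sketch. The representation is an assumption of mine, not something from Feynman: a dimension is a tuple of exponents of (newton, coulomb, meter, second), so multiplying quantities means adding tuples. The code only checks units, not numerical values.

```python
# A dimension is a tuple of exponents of (newton, coulomb, meter, second).
def mul(*dims):
    """Multiply quantities: their exponents add up, component by component."""
    return tuple(sum(exponents) for exponents in zip(*dims))


def power(dim, n):
    """Raise a quantity to the n-th power: every exponent is multiplied by n."""
    return tuple(n * e for e in dim)


NEWTON = (1, 0, 0, 0)
COULOMB = (0, 1, 0, 0)
METER = (0, 0, 1, 0)
SECOND = (0, 0, 0, 1)
JOULE = mul(NEWTON, METER)

E_FIELD = mul(NEWTON, power(COULOMB, -1))       # N/C
SPEED = mul(METER, power(SECOND, -1))           # m/s
B_FIELD = mul(E_FIELD, power(SPEED, -1))        # (N/C)/(m/s)
EPS0 = mul(power(COULOMB, 2), power(NEWTON, -1), power(METER, -2))  # C²/(N·m²)

# The two terms of u = ε₀E²/2 + ε₀c²B²/2 (the factor 1/2 has no dimension).
u_electric = mul(EPS0, power(E_FIELD, 2))
u_magnetic = mul(EPS0, power(SPEED, 2), power(B_FIELD, 2))
energy_density = mul(JOULE, power(METER, -3))   # J/m³

# S = ε₀c²·E×B: the cross product multiplies the dimensions of E and B.
s_dimension = mul(EPS0, power(SPEED, 2), E_FIELD, B_FIELD)
energy_flux = mul(JOULE, power(METER, -2), power(SECOND, -1))  # J/(m²·s)

print(u_electric == energy_density, u_magnetic == energy_density)
print(s_dimension == energy_flux)
```

Both terms of u come out as joule per cubic meter, and S comes out as joule per square meter per second, which is what we found above.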

Poynting’s vector in electrodynamics

The S vector is actually quite similar to the heat flow vector h, which we presented when discussing vector analysis and vector operators. The heat flow out of a surface element da is the area times the component of h perpendicular to da, so that’s (h•n)·da = h_n·da. Likewise, we can write (S•n)·da = S_n·da. The units of S and h are also the same: joule per second and per square meter or, using the definition of the watt (1 W = 1 J/s), in watt per square meter. In fact, if you google a bit, you’ll find that both h and S are referred to as a flux density:

The heat flow vector h is the heat flux density vector, from which we get the heat flux through an area through the (h•n)·da = h_n·da product.
The energy flow S is the energy flux density vector, from which we get the energy flux through the (S•n)·da = S_n·da product.

The big difference, of course, is that we get h from a simpler vector equation:

h = −κ∇T ⇔ (h_x, h_y, h_z) = −κ(∂T/∂x, ∂T/∂y, ∂T/∂z)

The vector equation for S is more complicated:

S = ε₀c²·E×B

So it’s a vector product. Note that S will be zero if E = 0 and/or if B = 0. So S = 0 in electrostatics, i.e. when there are no moving charges and no currents and, hence, no magnetic field. Let’s examine Feynman’s examples.

The illustration below shows the geometry of the E, B and S vectors for a light wave. It’s neat, and totally in line with what we wrote on the radiation pressure, or the momentum of light. So I’ll refer you to that post for an explanation, and to Feynman himself, of course.
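
To get a feel for the numbers, here is a small sketch that evaluates S = ε₀c²·E×B and u = ε₀E²/2 + ε₀c²B²/2 for one snapshot of a light wave. The set-up is an assumption of mine: E along the x-axis, B along the y-axis with magnitude E/c, and a hypothetical field strength of 100 V/m. With those choices, S should point along the z-axis, and its magnitude should be u times c.

```python
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity, in C²/(N·m²)
C = 299_792_458.0         # speed of light, in m/s


def cross(a, b):
    """Cross product of two three-vectors given as (x, y, z) tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def poynting(E, B):
    """Poynting vector S = eps0 * c² * (E x B), in J/(m²·s)."""
    return tuple(EPS0 * C**2 * component for component in cross(E, B))


def energy_density(E, B):
    """u = eps0*E²/2 + eps0*c²*B²/2, in J/m³."""
    e_squared = sum(x * x for x in E)
    b_squared = sum(x * x for x in B)
    return 0.5 * EPS0 * e_squared + 0.5 * EPS0 * C**2 * b_squared


E0 = 100.0                 # hypothetical field strength, in N/C (or V/m)
E = (E0, 0.0, 0.0)         # electric field along x
B = (0.0, E0 / C, 0.0)     # magnetic field along y, with |B| = |E|/c

S = poynting(E, B)
u = energy_density(E, B)
print("S =", S)
print("u·c =", u * C)
```

So the energy moves in the direction of propagation, at the speed of light, which is what we’d expect for a light wave.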

OK. The situation here is rather simple. Feynman gives a few other examples that are not so simple, like that of a charging capacitor, which is depicted below.

The Poynting vector points inwards here, toward the axis. What does it mean? It means the energy isn’t actually coming down the wires, but from the space surrounding the capacitor.

What? I know. It’s completely counter-intuitive, at first that is. You’d think it’s the charges. But it actually makes sense. The illustration below shows how we should think of it. The charges outside of the capacitor are associated with a weak, enormously spread-out field that surrounds the capacitor. So if we bring them to the capacitor, that field gets weaker, and the field between the plates gets stronger. So the field energy which is way out moves into the space between the capacitor plates indeed, and so that’s what Poynting’s vector tells us here.

Hmm… Yes. You can be skeptical. You should be. But that’s how it works. The next illustration looks at a current-carrying wire itself. Let’s first look at the B and E vectors. You’re familiar with the magnetic field around a wire, so the B vector makes sense, but what about the electric field? Aren’t wires supposed to be electrically neutral? It’s a tricky question, and we handled it in our post on the relativity of fields. The positive and negative charges in a wire should cancel out, indeed, but then it’s the negative charges that move and, because of their movement, we have the relativistic effect of length contraction, so the volumes are different, and the positive and negative charge densities do not cancel out: the wire appears to be charged, so we do have a mix of E and B! Let me quickly give you the formula: E = λ/(2πε₀r), with λ the (apparent) charge per unit length, so it’s the same formula as for a long line of charge, or for a long uniformly charged cylinder.

So we have a non-zero E and B and, hence, a non-zero Poynting vector S, whose direction is radially inward, so there is a flow of energy into the wire, all around. What the hell? Where does it go? Well… There are a few possibilities here: the charges need kinetic energy to move, or they increase their potential energy when moving towards the terminals of our capacitor to increase the charge on the plates or, much more mundane, the energy may be radiated out again in the form of heat. It looks crazy, but that’s how it is really. In fact, the more you think about it, the more logical it all starts to sound. Energy must be conserved locally, and so it’s just field energy going in and re-appearing in some other form. So it does make sense. But, yes, it’s weird, because no one bothered to teach us this in school. 🙂

The ‘craziest’ example is the one below: we’ve got a charge and a magnet here. All is at rest. Nothing is moving… Well… I’ll correct that in a moment. 🙂 The charge (q) causes a (static) Coulomb field, while our magnet produces the usual magnetic field, whose shape we (should) recognize: it’s the usual dipole field. So E and B are not changing. But so when we calculate our Poynting vector, we see there is a circulation of S. The E×B product is not zero. So what’s going on here?

Well… There is no net change in energy with time: the energy just circulates around and around. Everything which flows into one volume flows out again. As Feynman puts it: “It is like incompressible water flowing around.” What’s the explanation? Well… Let me copy Feynman’s explanation of this ‘craziness’:

“Perhaps it isn’t so terribly puzzling, though, when you remember that what we called a “static” magnet is really a circulating permanent current. In a permanent magnet the electrons are spinning permanently inside. So maybe a circulation of the energy outside isn’t so queer after all.”

So… Well… It looks like we do need to revise some of our ‘intuitions’ here. I’ll conclude this post by quoting Feynman on it once more:

“You no doubt get the impression that the Poynting theory at least partially violates your intuition as to where energy is located in an electromagnetic field. You might believe that you must revamp all your intuitions, and, therefore have a lot of things to study here. But it seems really not necessary. You don’t need to feel that you will be in great trouble if you forget once in a while that the energy in a wire is flowing into the wire from the outside, rather than along the wire. It seems to be only rarely of value, when using the idea of energy conservation, to notice in detail what path the energy is taking. The circulation of energy around a magnet and a charge seems, in most circumstances, to be quite unimportant. It is not a vital detail, but it is clear that our ordinary intuitions are quite wrong.”

Well… That says it all, I guess. As far as I am concerned, I feel the Poynting vector makes things actually easier to understand. Indeed, the E and B vectors were quite confusing, because we had two of them, and the magnetic field is, frankly, a weird thing. Just think about the units in which we’re measuring B: (N/C)/(m/s). I can’t imagine what a unit like that could possibly represent, so I must assume you can’t either. But so now we’ve got this Poynting vector that combines both E and B, and which represents the flow of the field energy. Frankly, I think that makes a lot of sense, and it’s surely much easier to visualize than E and/or B. [Having said that, of course, you should note that E and B do have their value, obviously, if only because they represent the lines of force, and so that’s something very physical too, of course. I guess it’s a matter of taste, to some extent, but so I’d tend to soften Feynman’s comments on the supposed ‘craziness’ of S.]

In any case… The next thing I should discuss is field momentum. Indeed, if we’ve got flow, we’ve got momentum. But I’ll leave that for my next post. This topic can’t be exhausted in one post only, indeed. 🙂 So let me conclude this post. I’ll do so with a very nice illustration I got from the Wikipedia article on the Poynting vector. It shows the Poynting vector around a voltage source and a resistor, as well as what’s going on in-between. [Note that the magnetic field is given by the field vector H, which is related to B as follows: B = μ₀(H + M), with M the magnetization of the medium. B and H are obviously just proportional in empty space, with μ₀ as the proportionality constant.]

Re-visiting relativity and four-vectors: the proper time, the tensor and the four-force

Original post:

My previous post explained how four-vectors transform from one reference frame to the other. Indeed, a four-vector is not just some one-dimensional array of four numbers: it represents something, a physical vector that… Well… Transforms like a vector. 🙂 So what vectors are we talking about? Let’s see what we have:

We knew the position four-vector already, which we’ll write as x_μ= (ct, x, y, z) = (ct, x).
We also proved that A_μ= (Φ, A_x, A_y, A_z) = (Φ, A) is a four-vector: it’s referred to as the four-potential.
We also know the momentum four-vector from the Lectures on special relativity. We write it as p_μ= (E, p_x, p_y, p_z) = (E, p), with E = γm₀, p = γm₀v, and γ = (1−v²/c²)^−1/2 or, for c = 1, γ = (1−v²)^−1/2

To show that it’s not just a matter of adding some fourth t-component to a three-vector, Feynman gives the example of the four-velocity vector. We have v_x = dx/dt, v_y = dy/dt and v_z = dz/dt, but a v_μ = (d(ct)/dt, dx/dt, dy/dt, dz/dt) = (c, dx/dt, dy/dt, dz/dt) ‘vector’ is, obviously, not a four-vector. [Why obviously? The inner product v_μv_μ is not invariant.] In fact, Feynman ‘fixes’ the problem by noting that ct, x, y and z have the ‘right behavior’, but the d/dt operator doesn’t. The d/dt operator is not an invariant operator. So how does he fix it then? He tries the (1−v²/c²)^−1/2·d/dt operator and, yes, it turns out we do get a four-vector then. In fact, we get that four-velocity vector u_μ that we were looking for (note that we assume we’re using equivalent time and distance units now, so c = 1 and v/c reduces to a new variable v):

u_μ = (1−v²)^−1/2·(1, v_x, v_y, v_z) = (1/(1−v²)^1/2, v/(1−v²)^1/2)

Now how do we know this is a four-vector? How can we prove this one? It’s simple. We can get it from our p_μ = (E, p) by dividing it by m₀, which is an invariant scalar in four dimensions too. Now, it is easy to see that a division by an invariant scalar does not change the transformation properties. So just write it all out, and you’ll see that p_μ/m₀ = u_μ and, hence, that u_μ is a four-vector too. 🙂

We’ve got an interesting thing here actually: division by an invariant scalar, or applying that (1−v²/c²)^−1/2·d/dt operator, which is referred to as an invariant operator, on a four-vector will give us another four-vector. Why is that? Let’s switch to compatible time and distance units, so that c = 1, to simplify the analysis that follows.

The invariant (1−v²)^−1/2·d/dt operator and the proper time s

Why is the (1−v²)^−1/2·d/dt operator invariant? Why does it ‘fix’ things? Well… Think about the invariant spacetime interval (Δs)²= Δt²− Δx²− Δy²− Δz² going to the limit (ds)²= dt²− dx²− dy²− dz² . Of course, we can and should relate this to an invariant quantity s = ∫ ds. Just like Δs, this quantity also ‘mixes’ time and distance. Now, we could try to associate some derivative d/ds with it because, as Feynman puts it, “it should be a nice four-dimensional operation because it is invariant with respect to a Lorentz transformation.” Yes. It should be. So let’s relate ds to dt and see what we get. That’s easy enough: dx = v_x·dt, dy = v_y·dt, dz = v_z·dt, so we write:

(ds)²= dt²− v_x²·dt²− v_y²·dt²− v_z²·dt²⇔ (ds)²= dt²·(1 − v_x²− v_y²− v_z²) = dt²·(1 − v²)

and, therefore, ds = dt·(1−v²)^1/2. So our operator d/ds is equal to (1−v²)^−1/2·d/dt, and we can apply it to any four-vector, as we are sure that, as an invariant operator, it’s going to give us another four-vector. I’ll highlight the result, because it’s important:

The d/ds = (1−v²)^−1/2·d/dt operator is an invariant operator for four-vectors.

For example, if we apply it to x_μ = (t, x, y, z), we get the very same four-velocity vector u_μ:

dx_μ/ds = u_μ = p_μ/m₀
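
If you want to check this numerically, here is a minimal sketch with c = 1. All function names are my own, and the boost is the standard Lorentz transformation along the x-axis. It shows two things: the inner product u_μu_μ is equal to 1 for any velocity, while the same product for the naive (1, v_x, v_y, v_z) ‘vector’ depends on v, and a boosted u_μ is exactly the u_μ we’d build from the relativistically added velocity.

```python
import math


def gamma(v):
    """Lorentz factor (1 - v²)^(-1/2) for a velocity three-vector v, with c = 1."""
    return 1.0 / math.sqrt(1.0 - sum(c * c for c in v))


def four_velocity(v):
    """u = (1 - v²)^(-1/2) * (1, v_x, v_y, v_z)."""
    g = gamma(v)
    return (g, g * v[0], g * v[1], g * v[2])


def naive_velocity(v):
    """The 'wrong' candidate (dt/dt, dx/dt, dy/dt, dz/dt) = (1, v_x, v_y, v_z)."""
    return (1.0, v[0], v[1], v[2])


def minkowski_square(a):
    """Inner product of a four-vector with itself: a_t² - a_x² - a_y² - a_z²."""
    return a[0] ** 2 - a[1] ** 2 - a[2] ** 2 - a[3] ** 2


def boost_x(a, beta):
    """Components of four-vector a in a frame moving with velocity beta along x."""
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    return (g * (a[0] - beta * a[1]), g * (a[1] - beta * a[0]), a[2], a[3])


for v in [(0.1, 0.0, 0.0), (0.6, 0.0, 0.0), (0.3, 0.4, 0.5)]:
    print(v, minkowski_square(four_velocity(v)), minkowski_square(naive_velocity(v)))

# Boosting u for v = 0.6 into a frame moving at 0.3 should give the u that
# belongs to the relativistically added velocity (v - beta)/(1 - v*beta).
v, beta = 0.6, 0.3
boosted = boost_x(four_velocity((v, 0.0, 0.0)), beta)
expected = four_velocity(((v - beta) / (1.0 - v * beta), 0.0, 0.0))
print(boosted, expected)
```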

Now, if you’re somewhat awake, you should ask yourself: what is this s, really, and what is this operator all about? Our new function s = ∫ ds is not the distance function, as it’s got both time and distance in it. Likewise, the invariant operator d/ds = (1−v²)^−1/2·d/dt has both time and distance in it (the distance is implicit in the v² factor). Still, it is referred to as the proper time along the path of a particle. Now why is that? If it’s got distance and time in it, why don’t we call it the ‘proper distance-time’ or something?

Well… The invariant quantity s actually is the time that would be measured by a clock that’s moving along, in spacetime, with the particle. Just think of it: in the reference frame of the moving particle itself, Δx, Δy and Δz must be zero, because it’s not moving in its own reference frame. So the (Δs)²= Δt²− Δx²− Δy²− Δz² reduces to (Δs)²= Δt², and so we’re only adding time to s. Of course, this view of things implies that the proper time itself is fixed only up to some arbitrary additive constant, namely the setting of the clock at some event along the ‘world line’ of our particle, which is its path in four-dimensional spacetime. But… Well… In a way, s is the ‘genuine’ or ‘proper’ time coming with the particle’s reference frame, and so that’s why Einstein called it like that. You’ll see (later) that it plays a very important role in general relativity theory (which is a topic we haven’t discussed yet: we’ve only touched special relativity, so no gravity effects).
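
To make the s = ∫ ds idea a bit more tangible, here is a minimal sketch that just adds up ds = (1−v²)^1/2·dt along a path, with c = 1. The function name, the midpoint rule and the three sample motions are assumptions of mine: a clock at rest, a clock moving at a constant v = 0.6, and a clock that accelerates steadily from v = 0 to v = 0.8.

```python
import math


def proper_time(speed, t_start, t_end, steps=10_000):
    """Midpoint-rule estimate of s = ∫ sqrt(1 - v(t)²) dt, with c = 1.

    `speed` is a function giving the particle's speed v at coordinate time t.
    """
    dt = (t_end - t_start) / steps
    total = 0.0
    for i in range(steps):
        t_mid = t_start + (i + 0.5) * dt
        total += math.sqrt(1.0 - speed(t_mid) ** 2) * dt
    return total


s_rest = proper_time(lambda t: 0.0, 0.0, 10.0)           # clock that stays put
s_constant = proper_time(lambda t: 0.6, 0.0, 10.0)       # constant v = 0.6
s_accelerating = proper_time(lambda t: 0.08 * t, 0.0, 10.0)  # v goes from 0 to 0.8

print(s_rest, s_constant, s_accelerating)
```

The clock at rest just measures the coordinate time, while the moving clocks measure less: for v = 0.6, the factor is (1 − 0.36)^1/2 = 0.8.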

OK. I know this is simple and complicated at the same time: the math is (fairly) easy but, yes, it may be difficult to ‘understand’ this in some kind of intuitive way. But let’s move on.

The four-force vector f_μ

We know the relativistically correct equation for the motion of some charge q. It’s just Newton’s Law F = dp/dt = d(mv)/dt. The only difference is that we are not assuming that m is some constant. Instead, we use the p = γm₀v formula to get:

F = d[m₀v/(1−v²/c²)^1/2]/dt

How can we get a four-vector for the force? It turns out that we get it when applying our new invariant operator to the momentum four-vector p_μ= (E, p), so we write: f_μ= dp_μ/ds. But p_μ= m₀u_μ = m₀dx_μ/ds, so we can re-write this as f_μ= d(m₀·dx_μ/ds)/ds, which gives us a formula which is reminiscent of the Newtonian F = ma equation:

f_μ = m₀·d²x_μ/ds²

What is this thing? Well… It’s not so difficult to verify that the x, y and z-components are our old-fashioned F_x, F_y and F_z multiplied by the same (1−v²)^−1/2 factor that comes with the d/ds operator, so these are the components of F/(1−v²)^1/2. The t-component is (1−v²)^−1/2·dE/dt. Now, dE/dt is the time rate of change of energy and, hence, it’s equal to the rate of doing work on our charge, which is equal to F•v. So we can write f_μ as:
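To make the structure of the four-force a bit more tangible, here is a minimal Python sketch. It assumes c = 1, the +−−− signature, and the form f_μ = (γ·F•v, γ·F) with γ = (1−v²)^−1/2. The function names and the numbers are made up for illustration. The check at the end (f_μu_μ = 0) is a standard property for a constant rest mass; it is not something derived in this post.

```python
import math

def gamma_factor(v):
    """Lorentz factor (1 - v^2)^(-1/2) for a 3-velocity v, with c = 1."""
    return 1.0 / math.sqrt(1.0 - sum(c * c for c in v))

def four_velocity(v):
    """u_mu = (gamma, gamma*v) for a 3-velocity v."""
    g = gamma_factor(v)
    return (g,) + tuple(g * vi for vi in v)

def four_force(F, v):
    """f_mu = (gamma*F.v, gamma*F) for a 3-force F acting on a particle with velocity v."""
    g = gamma_factor(v)
    power = sum(Fi * vi for Fi, vi in zip(F, v))  # F.v = dE/dt
    return (g * power,) + tuple(g * Fi for Fi in F)

def minkowski_dot(a, b):
    """Four-vector 'dot product' with the +--- signature."""
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]

# Hypothetical numbers, chosen only for illustration.
F = (1.0, -2.0, 0.5)
v = (0.3, 0.4, 0.0)

f = four_force(F, v)
u = four_velocity(v)
print("four-force:", f)
print("f.u (should vanish up to rounding):", minkowski_dot(f, u))
```

The vanishing product is a handy sanity check on the t-component: it only works out if f_t carries the F•v term.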

The force and the tensor

We will now derive that formula which we ended the previous post with. We start with calculating the spacelike components of f_μ from the Lorentz formula F = q(E + v×B). [The terminology is nice, isn’t it? The spacelike components of the four-force vector! Now that sounds impressive, doesn’t it? But so… Well… It’s really just the old stuff we know already.] So we start with f_x = F_x/(1−v²)^1/2, and write it all out:

What a monster! But, hey! We can ‘simplify’ this by substituting stuff by (1) the t-, x-, y- and z-components of the four-velocity vector u_μ and (2) the components of our tensor F_μν = [F_ij] = [∇_iA_j − ∇_jA_i] with i, j = t, x, y, z. We’ll also pop in the diagonal F_xx = 0 element, just to make sure it’s all there. We get:

Looks better, doesn’t it? 🙂 Of course, it’s just the same, really. This is just an exercise in symbolism. Let me insert the electromagnetic tensor we defined in our previous post, just as a reminder of what that F_μν matrix actually is:

If you read my previous post, this matrix – or the concept of a tensor – has no secrets for you. Let me briefly summarize it, because it’s an important result as well. The tensor is (a generalization of) the cross-product in four-dimensional space. We take two vectors: a_μ = (a_t, a_x, a_y, a_z) and b_μ = (b_t, b_x, b_y, b_z) and then we take cross-products of their components just like we did in three-dimensional space, so we write T_ij = a_ib_j − a_jb_i. Now, it’s easy to see that this combination implies that T_ij = − T_ji and that T_ii= 0, which is why we only have six independent numbers out of the 16 possible combinations, and which is why we’ll get a so-called anti-symmetric matrix when we organize them in a matrix. In three dimensions, the very same definition of the cross-product T_ij gives us 9 combinations, and only 3 independent numbers, which is why we represented our ‘tensor’ as a vector too! In four-dimensional space we can’t do that: six things cannot be represented by a four-vector, so we need to use this matrix, which is referred to as a tensor of the second rank in four dimensions. [When you start using words like that, you’ve come a long way, really. :-)]

[…] OK. Back to our four-force. It’s easy to get a similar one-liner for f_y and f_z too, of course, as well as for f_t. But… Yes, f_t… Is it the same thing really? Let me quickly copy Feynman’s calculation for f_t:

It does: remember that v×B and v are orthogonal, and so their dot product is zero indeed. So, to make a long story short, the four equations – one for each component of the four-force vector f_μ – can be summarized in the following elegant equation:

Writing this all requires a few conventions, however. For example, F_μν is a 4×4 matrix and so u_ν has to be written as a 1×4 vector. And the formulas for the f_x and f_t components also make it clear that we want to use the +−−− signature here, so the convention for the signs in the u_νF_μν product is the same as that for the scalar product a_μb_μ. So, in short, you really need to interpret what’s being written here.
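If you want to see those conventions at work, here is a small Python sketch. It assumes the summarizing equation is f_μ = q·u_νF_μν, with the sum over ν taken with the +−−− signs, and with F_xt = E_x, F_yt = E_y, F_zt = E_z, F_zy = B_x, F_xz = B_y and F_yx = B_z. The function names and the field values are made up for illustration. The sketch compares the tensor formula with the plain Lorentz force, for c = 1.

```python
import math

SIGNATURE = (1.0, -1.0, -1.0, -1.0)  # +--- convention, index order t, x, y, z

def cross(a, b):
    """Ordinary three-dimensional cross product a x b."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def field_tensor(E, B):
    """Antisymmetric 4x4 matrix F[mu][nu], rows and columns ordered t, x, y, z."""
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    return [
        [0.0, -Ex, -Ey, -Ez],
        [Ex, 0.0, -Bz, By],
        [Ey, Bz, 0.0, -Bx],
        [Ez, -By, Bx, 0.0],
    ]

def four_force_from_tensor(q, v, E, B):
    """f_mu = q * sum over nu of (sign_nu * u_nu * F[mu][nu])."""
    g = 1.0 / math.sqrt(1.0 - sum(c * c for c in v))
    u = (g, g * v[0], g * v[1], g * v[2])
    F = field_tensor(E, B)
    return tuple(q * sum(SIGNATURE[n] * u[n] * F[m][n] for n in range(4))
                 for m in range(4))

def four_force_from_lorentz(q, v, E, B):
    """f_mu = (gamma * F.v, gamma * F) with F = q(E + v x B)."""
    g = 1.0 / math.sqrt(1.0 - sum(c * c for c in v))
    vxB = cross(v, B)
    force = tuple(q * (E[i] + vxB[i]) for i in range(3))
    power = sum(force[i] * v[i] for i in range(3))
    return (g * power,) + tuple(g * Fi for Fi in force)

# Hypothetical charge, velocity and fields, for illustration only.
q, v = 2.0, (0.1, -0.2, 0.3)
E, B = (1.0, 0.5, -2.0), (0.3, -1.0, 0.7)

print(four_force_from_tensor(q, v, E, B))
print(four_force_from_lorentz(q, v, E, B))
```

The two printed tuples should agree up to rounding, which is all the ‘elegant equation’ really says.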

A more important question, perhaps, is: what can we do with it? Well… Feynman’s evaluation of the usefulness of this formula is rather succinct: “Although it is nice to see that the equations can be written that way, this form is not particularly useful. It’s usually more convenient to solve for particle motions by using the F = q(E + v×B) = d[m₀v/(1−v²)^1/2]/dt equations, and that’s what we will usually do.”

Having said that, this formula really makes good on the promise I started my previous post with: we wanted a formula, some mathematical construct, that effectively presents the electromagnetic force as one force, as one physical reality. So… Well… Here it is! 🙂

Well… That’s it for today. Tomorrow we’ll talk about energy and about a very mysterious concept—the electromagnetic mass. That should be fun! So I’ll c u tomorrow! 🙂

Relativistic transformations of fields and the electromagnetic tensor

Original post:

We’re going to do a very interesting piece of math here. It’s going to bring a lot of things together. The key idea is to present a mathematical construct that effectively presents the electromagnetic force as one force, as one physical reality. Indeed, we’ve been saying repeatedly that electromagnetism is one phenomenon only but we’ve been writing it always as something involving two vectors: the electric field vector E and the magnetic field vector B. Of course, Lorentz’ force law F = q(E + v×B) makes it clear we’re talking one force only but… Well… There is a way of writing it all up that is much more elegant.

I have to warn you though: this post doesn’t add anything to the physics we’ve seen so far: it’s all math, really and, to a large extent, math only. So if you read this blog because you’re interested in the physics only, then you may just as well skip this post. Having said that, the mathematical concept we’re going to present is that of the tensor and… Well… You’ll have to get to know that animal sooner or later anyway, so you may just as well give it a try right now, and see whatever you can get out of this post.

The concept of a tensor further builds on the concept of the vector, which we liked so much because it allows us to write the laws of physics as vector equations, which do not change when going from one reference frame to another. In fact, we’ll see that a tensor can be described as a ‘special’ vector cross product (to be precise, we’ll show that a tensor is a ‘more general’ cross product, really). So the tensor and vector concepts are very closely related, but then… Well… If you think about it, the concept of a vector and the concept of a scalar are closely related, too! So we’re just moving up the value chain, so to speak: from scalar fields to vector fields to… Well… Tensor fields! And in quantum mechanics, we’ll introduce spinors, and so we also have spinor fields! Having said that, don’t worry about tensor fields. Let’s first try to understand tensors tout court. 🙂

So… Well… Here we go. Let me start with it all by reminding you of the concept of a vector, and why we like to use vectors and vector equations.

The invariance of physics and the use of vector equations

What’s a vector? You may think, naively, that any one-dimensional array of numbers is a vector. But… Well… No! In math, we may, effectively, refer to any one-dimensional array of numbers as a ‘vector’, perhaps, but in physics, a vector does represent something real, something physical, and so a vector is only a vector if it transforms like a vector under the transformation rules that apply when going from one frame of reference, i.e. one coordinate system, to another. Examples of vectors in three dimensions are: the velocity vector v, or the momentum vector p = m·v, or the position vector r.

Needless to say, the same can be said of scalars: mathematicians may define a scalar as just any real number, but that’s not how it works in physics. A scalar in physics refers to something real, i.e. a scalar field, like the temperature (T) inside of a block of material. In fact, think about your first vector equation: it may have been the one determining the heat flow (h), i.e. h = −κ·∇T = (−κ·∂T/∂x, −κ·∂T/∂y, −κ·∂T/∂z). It immediately shows how scalar and vector fields are intimately related.

Now, when discussing the relativistic framework of physics, we introduced vectors in four dimensions, i.e. four-vectors. The most basic four-vector is the spacetime four-vector R = (ct, x, y, z), which is often referred to as an event, but it’s just a point in spacetime, really. So it’s a ‘point’ with a time as well as a spatial dimension, so it also has t in it, besides x, y and z. It is also known as the position four-vector but, again, you should think of a ‘position’ that includes time! Of course, we can re-write R as R = (ct, r), with r = (x, y, z), so here we sort of ‘break up’ the four-vector in a scalar and a three-dimensional vector, which is something we’ll do from time to time, indeed. 🙂

We also have a displacement four-vector, which we can write as ΔR = (c·Δt, Δr). There are other four-vectors as well, including the four-velocity, the four-momentum and the four-force four-vectors, which we’ll discuss later (in the last section of this post).

So it’s just like using three-dimensional vectors in three-dimensional physics, or ‘Newtonian’ physics, I should say: the use of four-vectors is going to allow us to write the laws of physics using vector equations, but in four dimensions, rather than three, so we get the ‘Einsteinian’ physics, the real physics, so to speak – or the relativistically correct physics, I should say. And so these four-dimensional vector equations will also not change when going from one reference frame to another, and so our four-vectors will be vectors indeed, i.e. they will transform like a vector under the transformation rules that apply when going from one frame of reference, i.e. one coordinate system, to another.

What transformation? Well… In Newtonian or Galilean physics, we had translations and rotations and what have you, but what we are interested in right now are ‘Einsteinian’ transformations of coordinate systems, so these have to ensure that all of the laws of physics that we know of, including the principle of relativity, still look the same. You’ve seen these transformation rules. We don’t call them the ‘Einsteinian’ transformation rules, but the Lorentz transformation rules, because it was a Dutch physicist (Hendrik Lorentz) who first wrote them down. So these rules are very different from the Newtonian or Galilean transformation rules which everyone assumed to be valid until the Michelson-Morley experiment unequivocally established that the speed of light did not respect the Galilean transformation rules. Very different? Well… Yes. In their mathematical structure, that is. Of course, when velocities are low, i.e. non-relativistic, then they yield the same result, approximately, that is. However, I explained that in my post on special relativity, and so I won’t dwell on that here.

Let me just jot down both sets of rules assuming that the two reference frames move with respect to each other along the x-axis only, so the y- and z-components of u are zero.

The Galilean or Newtonian rules are the simple rules on the right. Going from one reference frame to another (let’s call them S and S’ respectively) is just a matter of adding or subtracting speeds: if my car goes 100 km/h, and yours goes 120 km/h, then you will see my car falling behind at a speed of (minus) 20 km/h. That’s it. We could also rotate our reference frame, and our Newtonian vector equations would still look the same. As Feynman notes, smilingly, it’s what a lot of armchair philosophers think relativity theory is all about, but so it’s got nothing to do with it. It’s plain wrong!

In any case, back to vectors and transformations. The key to the so-called invariance of the laws of physics is the use of vectors and vector operators that transform like vectors. For example, if we defined A and B as (A_x, A_y, A_z) and (B_x, B_y, B_z), then we knew that the so-called inner product A•B would look the same in all rotated coordinate systems, so we can write: A•B = A’•B’. So we know that if we have a product like that on both sides of an equation, we’re fine: the equation will have the same form in all rotated coordinate systems. Also, the gradient, i.e. our vector operator ∇ = (∂/∂_x, ∂/∂_y, ∂/∂_z), when applied to a scalar function, gave three quantities that also transform like a vector under rotation. We also defined a vector cross product, which yielded a vector (as opposed to the inner product, i.e. the vector dot product, which yields a scalar):

So how does this thing behave under a Galilean transformation? Well… You may or may not remember that we used this cross-product to define the angular momentum L, which was a cross product of the radius vector r and the momentum vector p = mv, as illustrated below. The animation also gives the torque τ, which is, loosely speaking, a measure of the turning force: it’s the cross product of r and F, i.e. the force on the lever-arm.

The components of L are:

Now, we find that these three numbers, or objects if you want, transform in exactly the same way as the components of a vector. However, as Feynman points out, that’s a matter of ‘luck’ really. It’s something ‘special’. Indeed, you may or may not remember that we distinguished axial vectors from polar vectors. L is an axial vector, while r and p are polar vectors, and so we find that, in three dimensions, the cross product of two polar vectors will always yield an axial vector. Axial vectors are sometimes referred to as pseudovectors, which suggests that they are ‘not so real’ as… Well… Polar vectors, which are sometimes referred to as ‘true’ vectors. However, it doesn’t matter when doing these Newtonian or Galilean transformations: pseudo or true, both vectors transform like vectors. 🙂

But so… Well… We’re actually getting a bit of a heads-up here: if we’d be mixing (or ‘crossing’) polar and axial vectors, or mixing axial vectors only, so if we’d define something involving L and p (rather than r and p), or something involving L and τ, then we may not be so lucky, and then we’d have to carefully examine our cross-product, or whatever other product we’d want to define, because its components may not behave like a vector.

Huh? Whatever other product we’d want to define? Why are you saying that? Well… We actually can think of other products. For example, if we have two vectors a = (a_x, a_y, a_z) and b = (b_x, b_y, b_z), then we’ll have nine possible combinations of their components, which we can write as T_ij = a_ib_j. So that’s like L_xy, L_yz and L_zx really. Now, you’ll say: “No. It isn’t. We don’t have nine combinations here. Just three numbers.” Well… Think about it: we actually do have nine L_ij combinations too here, as we can write: L_ij = r_i·p_j – r_j·p_i. It just happens that, with this definition, only three of these combinations L_ij are independent. That’s because the other six numbers are either zero or the opposite. Indeed, it’s easy to verify that L_ij = –L_ji , and L_ii = 0. So… Well… It turns out that the three components of our L = r×p ‘vector’ are actually a subset of a set of nine L_ij numbers. So… Well… Think about it. We cannot just do whatever we want with our ‘vectors’. We need to watch out.

In fact, I do not want to get too much ahead of myself, but I can already tell you that the matrix with these nine T_ij = a_ib_j combinations is what is referred to as a tensor. To be precise, it’s referred to as a tensor of the second rank in three dimensions. The ‘second rank’, aka ‘degree’ or ‘order’, refers to the fact that we’ve got two indices, and the ‘three dimensions’ is because we’re using three-dimensional vectors. We’ll soon see that the electromagnetic tensor is also of the second rank, but it’s a tensor in four dimensions. In any case, I should not get ahead of myself. Just note what I am saying here: the tensor is like a ‘new’ product of two vectors, a new type of ‘cross’ product really (because we’re mixing the components, so to say), but it doesn’t yield a vector: it yields a matrix. For three-dimensional vectors, we get a 3×3 matrix. For four-vectors, we’ll get a 4×4 matrix. And so the full truth about our angular momentum vector L, is the following:

There is a thing which we call the angular momentum tensor. It’s a 3×3 matrix, so it has nine elements which are defined as: L_ij = r_i·p_j – r_j·p_i. Because of this definition, it’s an antisymmetric tensor of the second order in three dimensions, so it’s got only three independent components.
The three independent elements are the components of our ‘vector’ L, and picking them out and calling these three components a ‘vector’ is actually a ‘trick’ that only works in three dimensions. They really just happen to transform like a vector under rotation or under whatever Galilean transformation! [By the way, do you now understand why I was saying that we can look at a tensor as a ‘more general’ cross product?]
In fact, in four dimensions, we’ll use a similar definition and define 16 elements F_ij as F_ij = ∇_iA_j − ∇_jA_i, using the two four-vectors ∇_μ and A_μ (so we have 4×4 = 16 combinations indeed), out of which only six will be independent for the very same reason: we have an antisymmetric vector combination here, F_ij = −F_ji and F_ii = 0. 🙂 However, because we cannot represent six independent things by four things, we do not get some other four-vector, and so that’s why we cannot apply the same ‘trick’ in four dimensions.
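The counting argument in the three points above is easy to check with a few lines of Python. This is just a sketch: the helper names and the sample vectors are made up for illustration. It builds T_ij = a_ib_j − a_jb_i, counts the independent elements in three and in four dimensions, and verifies that, in three dimensions, those elements are the components of r×p.

```python
from itertools import combinations

def antisymmetric_product(a, b):
    """Matrix T with T[i][j] = a_i*b_j - a_j*b_i."""
    n = len(a)
    return [[a[i] * b[j] - a[j] * b[i] for j in range(n)] for i in range(n)]

def independent_count(n):
    """Number of index pairs i < j, i.e. n(n-1)/2 independent elements."""
    return len(list(combinations(range(n), 2)))

# Hypothetical radius and momentum vectors.
r = (1.0, 2.0, 3.0)
p = (-4.0, 0.5, 2.0)
L = antisymmetric_product(r, p)

# The usual cross product r x p, component by component.
r_cross_p = (r[1] * p[2] - r[2] * p[1],
             r[2] * p[0] - r[0] * p[2],
             r[0] * p[1] - r[1] * p[0])

# L_x = L_yz, L_y = L_zx and L_z = L_xy.
print((L[1][2], L[2][0], L[0][1]) == r_cross_p)
print(independent_count(3), independent_count(4))
```

Three independent numbers fit in a three-vector. Six do not fit in a four-vector, and so that is where the ‘trick’ stops working.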

However, here I am getting way ahead of myself and so… Well… Yes. Back to the main story line. 🙂 So let’s try to move to the next level of understanding, which is… Well…

Because of guys like Maxwell and Einstein, we now know that rotations are part of the Newtonian world, in which time and space are neatly separated, and that things are not so simple in Einstein’s world, which is the real world, as far as we know, at least! Under a Lorentz transformation, the new ‘primed’ space and time coordinates are a mixture of the ‘unprimed’ ones. Indeed, the new x’ is a mixture of x and t, and the new t’ is a mixture of x and t as well. [Yes, please scroll all the way up and have a look at the transformation on the left-hand side!]

So you don’t have that under a Galilean transformation: in the Newtonian world, space and time are neatly separated, and time is absolute, i.e. it is the same regardless of the reference frame. In Einstein’s world – our world – that’s not the case: time is relative, or local as Hendrik Lorentz termed it quite appropriately, and so it’s space-time – i.e. ‘some kind of union of space and time’ as Minkowski termed it – that transforms.

So that’s why physicists use four-vectors to keep track of things. These four-vectors always have three space-like components, but they also include one so-called time-like component. It’s the only way to ensure that the laws of physics are unchanged when moving with uniform velocity. Indeed, any true law of physics we write down must be arranged so that the invariance of physics (as a “fact of Nature”, as Feynman puts it) is built in, and so that’s why we use Lorentz transformations and four-vectors.

In the mentioned post, I gave a few examples illustrating how the Lorentz rules work. Suppose we’re looking at some spaceship that is moving at half the speed of light (i.e. 0.5c) and that, inside the spaceship, some object is also moving at half the speed of light, as measured in the reference frame of the spaceship, then we get the rather remarkable result that, from our point of view (i.e. our reference frame as observer on the ground), that object is not going as fast as light, as Newton or Galileo – and most present-day armchair philosophers 🙂 – would predict (0.5c + 0.5c = c). We’d see it move at a speed equal to v = 0.8c. Huh? How do we know that? Well… We can derive a velocity formula from the Lorentz rules:

So you can just put in the numbers now: v_x = (0.5c + 0.5c)/(1 + 0.5·0.5) = 0.8c. See?

Let’s do another example. Suppose we’re looking at a light beam inside the spaceship, so something that’s traveling at speed c itself in the spaceship. How does that look to us? The Galilean transformation rules say its speed should be 1.5c, but that can’t be true of course, and the Lorentz rules save us once more: v_x = (0.5c + c)/(1 + 0.5·1) = c, so it turns out that the speed of light does not depend on the reference frame: it looks the same – both to the man in the ship as well as to the man on the ground. As Feynman puts it: “This is good, for it is, in fact, what the Einstein theory of relativity was designed to do in the first place—so it had better work!” 🙂
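Both examples can be reproduced with a one-line function. Here is a minimal sketch, assuming velocities along the x-axis only and the standard addition formula v = (u + v’)/(1 + u·v’/c²). The function name is made up for illustration.

```python
def add_velocities(u, v_prime, c=1.0):
    """Relativistic addition of two collinear velocities, in units where c defaults to 1."""
    return (u + v_prime) / (1.0 + u * v_prime / c**2)

# Object moving at 0.5c inside a spaceship that itself moves at 0.5c.
print(add_velocities(0.5, 0.5))

# Light beam inside the same spaceship.
print(add_velocities(0.5, 1.0))

# Low speeds: the result is very close to the Galilean sum.
print(add_velocities(1e-6, 2e-6))
```

The last line illustrates the remark on low velocities: for non-relativistic speeds, the correction term u·v’/c² is negligible, and we are back in Galileo’s world.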

So let’s now apply relativity to electromagnetism. Indeed, that’s what this post is all about! However, before I do so, let me re-write the Lorentz transformation rules for c = 1. We can equate the speed of light to one, indeed, when we measure time and distance in equivalent units. It’s just a matter of ditching our seconds for meters (so our time unit becomes the time that light needs to travel a distance of one meter), or ditching our meters for seconds (so our distance unit becomes the distance that light travels in one second). You should be familiar with this procedure. If not, well… Check out my posts on relativity. So here’s the same set of rules for c = 1:
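For reference, the rules for c = 1 can be written out as below. This is the standard textbook form for a frame S’ moving with velocity u along the x-axis of S, so take it as a reconstruction under that assumption:

```latex
\begin{aligned}
x' &= \frac{x - u\,t}{\sqrt{1 - u^{2}}}, \\
y' &= y, \\
z' &= z, \\
t' &= \frac{t - u\,x}{\sqrt{1 - u^{2}}}.
\end{aligned}
```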

They’re much easier to remember and work with, and so that’s good, because now we need to look at how these rules work with four-vectors and the various operations and operators we’ll be defining on them. Let’s look at that step by step.

Electrodynamics in relativistic notation

Let me copy the Universal Set of Equations and Their Solution once more:

The solution for Maxwell’s equations is given in terms of the (electric) potential Φ and the (magnetic) vector potential A. I explained that in my post on this, so I won’t repeat myself too much here either. The only point you should note is that this solution is the result of a special choice of Φ and A, which we referred to as the Lorentz gauge. We’ll touch upon this condition once more, so just make a mental note of it.

Now, E and B do not correspond to four-vectors: they depend on x, y, z and t, but they have three components only: E_x, E_y, E_z, and B_x, B_y, and B_z respectively. So we have six independent terms here, rather than four things that, somehow, we could combine into some four-vector. [Does this ring a bell? It should. :-)] Having said that, it turns out that we can combine Φ and A into a four-vector, which we’ll refer to as the four-potential and which we’ll write as:

A_μ = (Φ, A) = (Φ, A_x, A_y, A_z) = (A_t, A_x, A_y, A_z) with A_t = Φ.

So that’s a four-vector just like R = (ct, x, y, z).

How do we know that A_μ is a four-vector? Well… Here I need to say a few things about those Lorentz transformation rules and, more importantly, about the required condition of invariance under a Lorentz transformation. So, yes, here we need to dive into the math.

Four-vectors and invariance under Lorentz transformations

When you were in high-school, you learned how to rotate your coordinate frame. You also learned that the distance of a point from the origin does not change under a rotation, so you’d write r’² = x’² + y’² + z’² = r² = x² + y² + z², and you’d say that r² is an invariant quantity under a rotation. Indeed, transformations leave certain things unchanged. From the Lorentz transformation rules themselves, it is easy to see that

c²·t’² − x’² − y’² − z’² = c²·t² − x² − y² − z², or,

if c = 1, that t’² − x’² − y’² − z’² = t² − x² − y² − z²,

is an invariant under a Lorentz transformation. We found the same for the so-called spacetime interval Δs² = Δr² − c²Δt², which we write as Δs² = Δr² − Δt² as we chose our time or distance units such that c = 1. [Note that, from now on, we’ll assume that’s the case, so c = 1 everywhere. We can always change back to our old units when we’re done with the analysis.] Indeed, such invariance allowed us to define spacelike, timelike and lightlike intervals using the so-called light cone emanating from a single event and traveling in all directions.
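You do not have to take that invariance on faith. Here is a minimal Python sketch, assuming a boost along the x-axis with velocity u and c = 1. The function names and the sample event are made up for illustration. It applies the Lorentz rules to an event and compares t² − x² − y² − z² before and after.

```python
import math

def boost_x(event, u):
    """Lorentz transformation of an event (t, x, y, z) to a frame moving at velocity u along x."""
    t, x, y, z = event
    g = 1.0 / math.sqrt(1.0 - u * u)
    return (g * (t - u * x), g * (x - u * t), y, z)

def invariant(event):
    """The combination t^2 - x^2 - y^2 - z^2 (the +--- signature)."""
    t, x, y, z = event
    return t * t - x * x - y * y - z * z

# Hypothetical event and boost velocity.
event = (3.0, 1.0, -2.0, 0.5)
primed = boost_x(event, 0.6)

print("primed coordinates:", primed)
print("unprimed invariant:", invariant(event))
print("primed invariant:  ", invariant(primed))
```

The coordinates t’ and x’ are mixtures of t and x, but the two invariants printed at the end agree up to rounding.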

You should note that, for four-vectors, we do not have a simple sum of three terms. Indeed, we don’t write x² + y² + z² but t² − x² − y² − z². So we’ve got a +−−− thing here or, it’s just another convention, we could also work with a −+++ sum of terms. The convention is referred to as the metric signature, and we will use the +−−− signature here. Let’s continue the story. Now, all four-vectors a_μ = (a_t, a_x, a_y, a_z) have this property that:

a_t’² − a_x’² − a_y’² − a_z’² = a_t² − a_x² − a_y² − a_z².

[The primed quantities are, obviously, the quantities as measured in the other reference frame.] So. Well… Yes. 🙂 But… Well… Hmm… We can say that our four-potential vector is a four-vector, but so we still have to prove that. So we need to prove that Φ’² − A_x’² − A_y’² − A_z’² = Φ² − A_x² − A_y² − A_z² for our four-potential vector A_μ = (Φ, A). So… Yes… How can we do that? The proof is not so easy, but you need to go through it as it will introduce some more concepts and ideas you need to understand.

In my post on the Lorentz gauge, I mentioned that Maxwell’s equations can be re-written in terms of Φ and A, rather than in terms of E and B. The equations are:

The expressions look rather formidable, but don’t panic: just look at them. Of course, you need to be familiar with the operators that are being used here, so that’s the Laplacian ∇² and the divergence operator ∇• that’s being applied to the scalar Φ and the vector A. I can’t re-explain this. I am sorry. Just check my posts on vector analysis. You should also look at the third equation: that’s just the Lorentz gauge condition, which we introduced when deriving these equations from Maxwell’s equations. Having said that, it’s the first and second equation which describe Φ and A as a function of the charges and currents in space, and so that’s what matters here. So let’s unfold the first equation. It says the following:

In fact, if we’d be talking free or empty space, i.e. regions where there are no charges and currents, then the right-hand side would be zero and this equation would then represent a wave equation, so some potential Φ that is changing in time and moving out at the speed c. Here again, I am sorry I can’t write about this here: you’ll need to check one of my posts on wave equations. If you don’t want to do that, you should believe me when I say that, if you see an equation like this:

∂²Ψ/∂x² − (1/c²)·∂²Ψ/∂t² = 0

then the function Ψ(x, t) must be some function of the form

Ψ(x, t) = f(x − ct) + g(x + ct)

Now, that’s a function representing a wave traveling at speed c, i.e. the phase velocity. Always? Yes. Always! It’s got to do with the x − ct and/or x + ct argument in the function. But, sorry, I need to move on here.
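If you’d rather check than believe, here is a rough numerical sketch in Python. It assumes the one-dimensional wave equation ∂²Ψ/∂x² − (1/c²)·∂²Ψ/∂t² = 0 and approximates the second derivatives by central differences. The test functions, the step size and the value of c are arbitrary choices for illustration.

```python
import math

c = 2.0  # wave speed, an arbitrary illustrative value

def second_derivative(func, s, h=1e-3):
    """Central-difference approximation of the second derivative of func at s."""
    return (func(s + h) - 2.0 * func(s) + func(s - h)) / (h * h)

def wave_residual(psi, x, t):
    """Left-hand side of the wave equation, d2psi/dx2 - (1/c^2) d2psi/dt2, at (x, t)."""
    d2x = second_derivative(lambda s: psi(s, t), x)
    d2t = second_derivative(lambda s: psi(x, s), t)
    return d2x - d2t / c**2

def psi_wave(x, t):
    """A function of the form f(x - ct) + g(x + ct)."""
    return math.exp(-(x - c * t) ** 2) + math.sin(x + c * t)

def psi_other(x, t):
    """A pulse moving at 0.5c, which is not a solution for wave speed c."""
    return math.exp(-(x - 0.5 * c * t) ** 2)

print("f(x - ct) + g(x + ct):", wave_residual(psi_wave, 0.3, 0.1))
print("pulse at half speed:  ", wave_residual(psi_other, 0.3, 0.1))
```

The first residual is zero up to discretization error, while the second is clearly not: the x ± ct argument is what does the job.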

The unfolding of the equation with Φ makes it clear that we have four equations really. Indeed, the second equation is three equations: one for A_x, one for A_y, and one for A_z respectively. The four quantities on the right-hand side of these equations are ρ, j_x, j_y and j_z respectively, divided by ε₀, which is a universal constant which does not change when going from one coordinate system to another. Now, the quantities ρ, j_x, j_y and j_z transform like a four-vector. How do we know that? It’s just the charge conservation law. We used it when solving the problem of the fields around a moving wire, when we demonstrated the relativity of the electric and magnetic field. Indeed, the relevant equations were:

You can check that against the Lorentz transformation rules for c = 1. They’re exactly the same, but so we chose t = 0, so the rules are even simpler. Hence, the (ρ, j_x, j_y, j_z) vector is, effectively, a four-vector, and we’ll denote it by j_μ = (ρ, j). I now need to explain something else. [And, yes, I know this is becoming a very long story but… Well… That’s how it is.]

It’s about our operators ∇, ∇•, ∇× and ∇², so that’s the gradient, the divergence, curl and Laplacian operator respectively: they all have a four-dimensional equivalent. Of course, that won’t surprise you. 😦 Let me just jot all of them down, so we’re done with that, and then I’ll focus on the four-dimensional equivalent of the Laplacian ∇•∇ = ∇², which is referred to as the D’Alembertian, and which is denoted by □², because that’s the one we need to prove that our four-potential vector is a real four-vector. [I know: □² is a tiny symbol for a pretty monstrous thing, but I can’t help it: my editor tool is pretty limited.]

Now, we’re almost there. Just hang in for a little longer. It should be obvious that we can re-write those two equations with Φ, A, ρ and j, as:

Just to make sure, let me remind you that A_μ = (Φ, A) and that j_μ = (ρ, j). Now, our new D’Alembertian operator is just an operator – a pretty formidable operator but, still, it’s an operator, and so it doesn’t change when the coordinate system changes, so the conclusion is that, IF j_μ = (ρ, j) is a four-vector – which it is – and, therefore, transforms like a four-vector, THEN the quantities Φ, A_x, A_y, and A_z must also transform like a four-vector, which means they are (the components of) a four-vector.

So… Well… Think about it, but not too long, because it’s just an intermediate result we had to prove. So that’s done. But we’re not done here. It’s just the beginning, actually. Let me repeat our intermediate result:

A_μ = (Φ, A) is a four-vector. We call it the four-potential vector.

OK. Let’s continue. Let me first draw your attention to that expression with the D’Alembertian above. Which expression? This one:

What about it? Well… You should note that the physics of that equation is just the same as Maxwell’s equations. So it’s one equation only, but it’s got it all.

It’s quite a pleasure to re-write it in such elegant form. Why? Think about it: it’s a four-vector equation: we’ve got a four-vector on the left-hand side, and a four-vector on the right-hand side. Therefore, this equation is invariant under a transformation. So, therefore, it directly shows the invariance of electrodynamics under the Lorentz transformation.

Huh? Yes. You may think about this a little longer. 🙂

To wrap this up, I should also note that we can also express the gauge condition using our new four-vector notation. Indeed, we can write it as:

It’s referred to as the Lorentz condition and it is, effectively, a condition for invariance, i.e. it ensures that the four-vector equation above does stay in the form it is in for all reference frames. Note that we’re re-writing it using the four-dimensional equivalent of the divergence operator ∇•, but so we don’t have a dot between ∇_μ and A_μ. In fact, the notation is pretty confusing, and it’s easy to think we’re talking some gradient, rather than the divergence. So let me therefore highlight the meaning of both once again. It looks the same, but it’s two very different things: the gradient operates on a scalar, while the divergence operates on a (four-)vector. Also note the +−−− signature is only there for the gradient, not for the divergence!

You’ll wonder why they didn’t use some • or ∗ symbol, and the answer: I don’t know. I know it’s hard to keep inventing symbols for all these different ‘products’ – the ⊗ symbol, for example, is reserved for tensor products, which we won’t get into – but… Well… I think they could have done something here. 😦

In any case… Let’s move on. Before we do, please note that we can also re-write our conservation law for electric charge using our new four-vector notation. Indeed, you’ll remember that we wrote that conservation law as:

Using our new four-vector operator ∇_μ, we can re-write that as ∇_μj_μ = 0. So all of electrodynamics can be summarized in two equations only: Maxwell’s law and the charge conservation law:

OK. We’re now ready to discuss the electromagnetic tensor. [I know… This is becoming an incredibly long and incredibly complicated piece but, if you get through it, you’ll admit it’s really worth it.]

The electromagnetic tensor

The whole analysis above was done in terms of the Φ and A potentials. It’s time to get back to our field vectors E and B. We know we can easily get them from Φ and A, using the rules we mentioned as solutions:

These two equations should not look as yet another formula. They are essential, and you should be able to jot them down anytime anywhere. They should be on your kitchen door, in your toilet and above your bed. 🙂 For example, the second equation gives us the components of the magnetic field vector B:

Now, look at these equations. The x-component is equal to a couple of terms that involve only y- and z-components. The y-component is equal to something involving only x and z. Finally, the z-component only involves x and y. Interesting. Let’s define a ‘thing’ we’ll denote by F_zy and define as:

So now we can write: B_x = F_zy, B_y = F_xz, and B_z = F_yx. Now look at our equation for E. It turns out the components of E are equal to things like F_xt, F_yt and F_zt! Indeed, F_xt = ∂A_x/∂t − ∂A_t/∂x = E_x!

But… Well… No. 😦 The sign is wrong! E_x = −∂A_x/∂t−∂A_t/∂x, so we need to modify our definition of F_xt. When the t-component is involved, we’ll define our ‘F-things’ as:

So we’ve got a plus instead of a minus. It looks quite arbitrary but, frankly, you’ll have to admit it’s sort of consistent with our +−−− signature for our four-vectors and, in just a minute, you’ll see it’s fully consistent with our definition of the four-dimensional vector operator ∇_μ= (∂/∂t, −∂/∂x, −∂/∂y, −∂/∂z). So… Well… Let’s go along with it.

What about the F_xx, F_yy, F_zz and F_tt terms? Well… F_xx = ∂A_x/∂x − ∂A_x/∂x = 0, and it’s easy to see that F_yy and F_zz are zero too. But F_tt? Well… It’s a bit tricky but, applying our definitions carefully, we see that F_tt must be zero too. In any case, the F_tt = 0 will become obvious as we will be arranging these ‘F-things’ in a matrix, which is what we’ll do now. [Again: does this ring a bell? If not, it should. :-)]

Indeed, we’ve got sixteen possible combinations here, which Feynman denotes as F_μν, which is somewhat confusing, because F_μν usually denotes the 4×4 matrix representing all of these combinations. So let me use the subscripts i and j instead, and define F_ij as:

F_ij = ∇_iA_j − ∇_jA_i

with ∇_i being the t-, x-, y- or z-component of ∇_μ = (∂/∂t, −∂/∂x, −∂/∂y, −∂/∂z) and, likewise, A_i being the t-, x-, y- or z-component of A_μ = (Φ, A_x, A_y, A_z). Just check it: F_zy = −∂A_y/∂z + ∂A_z/∂y = ∂A_z/∂y − ∂A_y/∂z = B_x, for example, and F_xt = −∂Φ/∂x − ∂A_x/∂t = E_x. So the +−−− convention works. [Also note that it’s easier now to see that F_tt = ∂Φ/∂t − ∂Φ/∂t = 0.]

We can now arrange the F_ij in a matrix. This matrix is antisymmetric, because F_ij = – F_ji, and its diagonal elements are zero. [For those of you who love math: note that the diagonal elements of an antisymmetric matrix are always zero because of the F_ij = – F_ji constraint: just use k = i = j in the constraint.]

Now that matrix is referred to as the electromagnetic tensor and it’s depicted below (we plugged c back in, remember that B’s magnitude is 1/c times E’s magnitude).

So… Well… Great ! We’re done! Well… Not quite. 🙂

We can get this matrix in a number of ways. The least complicated way is, of course, just to calculate all F_ij components and then put them in a [F_ij] matrix using the i as the row number and the j as the column number. You need to watch out with the conventions though, and so i and j start on t and end on z. 🙂

The other way to do it is to write the ∇_μ = (∂/∂t, −∂/∂x, −∂/∂y, −∂/∂z) operator as a 4×1 column vector, which you then multiply with the four-vector A_μ written as a 1×4 row vector. So ∇_μA_μ is then a 4×4 matrix, which we combine with its transpose, i.e. (∇_μA_μ)^T, as shown below. So what’s written below is (∇_μA_μ) − (∇_μA_μ)^T.
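The ‘column times row, minus the transpose’ recipe can be sketched numerically. This is only an illustration: the sample potential `four_potential` is made up for the occasion, the derivatives are central-difference estimates rather than exact ones, and units are such that c = 1.

```python
# Index order is (t, x, y, z); units are such that c = 1.
SIGNS = (1.0, -1.0, -1.0, -1.0)   # the +--- signature of the four-gradient


def four_potential(t, x, y, z):
    """Hypothetical sample potential A_mu = (Phi, A_x, A_y, A_z)."""
    return (x * t, y * t, z * x, 2.0 * x * y)


def partial(func, point, i, j, h=1e-5):
    """Central-difference estimate of d(func_j)/d(coordinate i) at point."""
    up = list(point)
    down = list(point)
    up[i] += h
    down[i] -= h
    return (func(*up)[j] - func(*down)[j]) / (2.0 * h)


def em_tensor(func, point):
    """Return F_ij = nabla_i A_j - nabla_j A_i as a 4x4 nested list."""
    # D is the 'column times row' matrix: D[i][j] = nabla_i A_j.
    D = [[SIGNS[i] * partial(func, point, i, j) for j in range(4)]
         for i in range(4)]
    # Subtracting the transpose antisymmetrizes it.
    return [[D[i][j] - D[j][i] for j in range(4)] for i in range(4)]


T, X, Y, Z = range(4)
point = (0.3, 1.0, 2.0, -0.5)     # (t, x, y, z)
F = em_tensor(four_potential, point)

t, x, y, z = point
# Fields worked out by hand from E = -grad(Phi) - dA/dt and B = curl(A)
# for the sample potential above:
E_x = -t - y
B_x, B_y, B_z = x, -2.0 * y, z - t

print("F_xt vs E_x:", F[X][T], E_x)
print("F_zy vs B_x:", F[Z][Y], B_x)
print("F_xz vs B_y:", F[X][Z], B_y)
print("F_yx vs B_z:", F[Y][X], B_z)
```

Each printed pair should agree up to the small error of the finite-difference step, and the diagonal of `F` is zero by construction.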

If you google, you’ll see there’s more than one way to go about it, so I’d recommend you just go through the motions and double-check the whole thing yourself—and please do let me know if you find any mistake! In fact, the Wikipedia article on the electromagnetic tensor denotes the matrix above as F^μν, rather than as F_μν, which is the same tensor but in its so-called covariant form, but so I’ll refer you to that article as I don’t want to make things even more complicated here! As said, there’s different conventions around here, and so you need to double-check what is what really. 🙂

Where are we heading with all of this? The next thing is to look at the Lorentz transformation of these F_ij = ∇_iA_j − ∇_jA_i components, because then we know how our E and B fields transform. Before we do so, however, we should note the more general results and definitions which we obtained here:

1. The F_μν matrix (a matrix is just a multi-dimensional array, of course) is a so-called tensor. It’s a tensor of the second rank, because it has two indices in it. We think of it as a very special ‘product’ of two vectors, not unlike the vector cross product a × b, whose components were also defined by a similar combination of the components of a and b. Indeed, we wrote:

So one should think of a tensor as “another kind of cross product” or, preferably, and as Feynman puts it, as a “generalization of the cross product”.

2. In this case, the four-vectors are ∇_μ = (∂/∂t, −∂/∂x, −∂/∂y, −∂/∂z) and A_μ = (Φ, A_x, A_y, A_z). Now, you will probably say that ∇_μ is an operator, not a vector, and you are right. However, we know that ∇_μ behaves like a vector, and so this is just a special case. The point is: because the tensor is based on four-vectors, the F_μν tensor is referred to as a tensor of the second rank in four dimensions. In addition, because of the F_ij = – F_ji result, F_μν is an antisymmetric tensor of the second rank in four dimensions.

3. Now, the whole point is to examine how tensors transform. We know that the vector dot product, aka the inner product, remains invariant under a Lorentz transformation, both in three as well as in four dimensions, but what about the vector cross product, and what about the tensor? That’s what we’ll be looking at now.

The Lorentz transformation of the electric and magnetic fields

Cross products are complicated, and tensors will be complicated too. Let’s recall our example in three dimensions, i.e. the angular momentum vector L, which was a cross product of the radius vector r and the momentum vector p = mv, as illustrated below (the animation also gives the torque τ, which is, loosely speaking, a measure of the turning force).

The components of L are:

Now, this particular definition ensures that L_ij turns out to be an antisymmetric object:

So it’s a similar situation here. We have nine possible combinations, but only three independent numbers. So it’s a bit like our tensor in four dimensions: 16 combinations, but only 6 independent numbers.
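A quick sketch of that counting, assuming the definition L_ij = r_i·p_j − r_j·p_i for the angular momentum components (the vectors `r` and `p` are arbitrary example values, and the helper names are mine):

```python
def antisymmetric_product(a, b):
    """Return the matrix M_ij = a_i*b_j - a_j*b_i for two equal-length vectors."""
    n = len(a)
    return [[a[i] * b[j] - a[j] * b[i] for j in range(n)] for i in range(n)]


def independent_components(n):
    """Number of independent entries of an antisymmetric n x n matrix."""
    return n * (n - 1) // 2


r = (1.0, 2.0, 3.0)        # hypothetical radius vector (x, y, z)
p = (0.5, -1.0, 4.0)       # hypothetical momentum vector (x, y, z)
L = antisymmetric_product(r, p)

X, Y, Z = range(3)
# The usual cross-product components, read off the matrix:
L_x, L_y, L_z = L[Y][Z], L[Z][X], L[X][Y]

print(L_x, L_y, L_z)
print(independent_components(3), independent_components(4))
```

The nine entries of `L` carry only three distinct numbers (up to sign), and the same count gives six for a 4×4 antisymmetric matrix.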

Now, it so happens that these three numbers, or objects if you want, transform in exactly the same way as the components of a vector. However, as Feynman points out, that’s a matter of ‘luck’ really. In fact, Feynman points out that, when we have two vectors a = (a_x, a_y, a_z) and b = (b_x, b_y, b_z), we’ll have nine products T_ij = a_ib_j which will also form a tensor of the second rank (cf. the two indices) but which, in general, will not obey the transformation rules we got for the angular momentum tensor, which happened to be an antisymmetric tensor of the second rank in three dimensions.

To make a long story short, it’s not simple in general, and surely not here: with E and B, we’ve got six independent terms, and so we cannot represent six things by four things, so the transformation rules for E and B will differ from those for a four-vector. So what are they then?

Well… Feynman first works out the rules for the general antisymmetric vector combination G_ij = a_ib_j − a_jb_i, with a_i and b_j the t-, x-, y- or z-component of the four-vectors a_μ = (a_t, a_x, a_y, a_z) and b_μ = (b_t, b_x, b_y, b_z) respectively. The idea is to first get some general rules, and then replace G_ij = a_ib_j − a_jb_i by F_ij = ∇_iA_j − ∇_jA_i, of course! So let’s apply the Lorentz rules, which – let me remind you – are the following ones:

So we get:

The rest is all very tedious: you just need to plug these things into the various G_ij = a_ib_j− a_jb_i formulas. For example, for G’_tx, we get:

Hey! That’s just G_tx, so we find that G’_tx = G_tx! What about the rest? Well… That yields something different. Let me shorten the story by simply copying Feynman here:

So… Done!
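If you don’t feel like grinding through the algebra, a numerical spot check does the job too. The sketch below assumes the usual boost along x with c = 1, i.e. a’_t = γ(a_t − v·a_x) and a’_x = γ(a_x − v·a_t) with the y- and z-components unchanged; the two four-vectors are arbitrary example values.

```python
import math


def boost_x(a, v):
    """Lorentz-transform the four-vector a = (a_t, a_x, a_y, a_z) to a frame
    moving with velocity v along x (units with c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    a_t, a_x, a_y, a_z = a
    return (gamma * (a_t - v * a_x), gamma * (a_x - v * a_t), a_y, a_z)


def G(a, b):
    """Antisymmetric combination G_ij = a_i*b_j - a_j*b_i."""
    return [[a[i] * b[j] - a[j] * b[i] for j in range(4)] for i in range(4)]


T, X, Y, Z = range(4)
v = 0.6
gamma = 1.0 / math.sqrt(1.0 - v * v)

a = (1.0, 2.0, -1.0, 0.5)      # hypothetical four-vectors
b = (0.3, -0.7, 2.0, 1.5)

G_old = G(a, b)
G_new = G(boost_x(a, v), boost_x(b, v))

print("G'_tx:", G_new[T][X], " G_tx:", G_old[T][X])
print("G'_ty:", G_new[T][Y], " gamma*(G_ty - v*G_xy):",
      gamma * (G_old[T][Y] - v * G_old[X][Y]))
print("G'_xy:", G_new[X][Y], " gamma*(G_xy - v*G_ty):",
      gamma * (G_old[X][Y] - v * G_old[T][Y]))
print("G'_yz:", G_new[Y][Z], " G_yz:", G_old[Y][Z])
```

The tx- and yz-components come out unchanged, while the ty- and xy-components mix with each other, which is exactly the pattern that carries over to the E and B fields.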

So what?

Well… Now we just substitute. In fact, there are two alternative formulations of the Lorentz transformations of E and B. They are given below (note the units are such that c = 1):

In addition, there is a third equivalent formulation which is more practical, and also simpler, even if it puts the c’s back in. It re-defines the field components, distinguishing only two:

The ‘parallel’ components E_|| and B_|| along the x-direction (because they are parallel to the relative velocity of the S and S’ reference frames), and
The ‘perpendicular’ or ‘total transverse’ components E_⊥ and B_⊥, which are the vector sums of the y- and z-components.

So that gives us four equations only:

And, yes, we are done now. This is the Lorentz transformation of the fields. I am sure it has left you totally exhausted. Well… If not… […] It sure left me totally exhausted. 🙂
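For those who prefer numbers, here is a minimal sketch of the parallel/perpendicular form. It assumes the standard SI version of the rules: the components along x are unchanged, E’_⊥ = γ·(E + v×B)_⊥ and B’_⊥ = γ·(B − v×E/c²)_⊥. The sample field values are made up. As a sanity check, it also prints E·B and E² − c²·B², two combinations which are known to be the same in every frame (that fact is not derived in this post).

```python
import math

C = 299_792_458.0   # speed of light in m/s


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def transform_fields(E, B, v):
    """Fields seen from a frame moving with speed v along x (SI units)."""
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    vel = (v, 0.0, 0.0)
    vxB = cross(vel, B)
    vxE = cross(vel, E)
    # Parallel (x) components are unchanged; transverse ones mix E and B.
    E_new = (E[0], gamma * (E[1] + vxB[1]), gamma * (E[2] + vxB[2]))
    B_new = (B[0],
             gamma * (B[1] - vxE[1] / C ** 2),
             gamma * (B[2] - vxE[2] / C ** 2))
    return E_new, B_new


# Hypothetical example: a pure electric field in S, no magnetic field.
E = (100.0, 200.0, -50.0)    # V/m
B = (0.0, 0.0, 0.0)          # T
E2, B2 = transform_fields(E, B, 0.8 * C)

print("E' =", E2)
print("B' =", B2)
# Two combinations that should come out the same in both frames:
print("E.B        :", dot(E, B), dot(E2, B2))
print("E^2-c^2*B^2:", dot(E, E) - C**2 * dot(B, B),
      dot(E2, E2) - C**2 * dot(B2, B2))
```

Note how a field that is purely electric in S picks up a magnetic part in S’: a different mix of E and B, but the same physical situation.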

To lighten things up, let me insert an image of what the transformed field E actually looks like. The first image is the reference frame of a charge itself: we have a simple Coulomb field. The second image shows the charge flying by. Its electric field is ‘squashed up’. To be precise, it’s just like the scale of x is squashed up by a factor (1−v²/c²)^(1/2). Let me refer you to Feynman for the detail of the calculations here.

OK. So that’s it. You may wonder: what about that promise I made? Indeed, when I started this post, I said I’d present a mathematical construct that presents the electromagnetic force as one force only, as one physical reality, but so we’re back writing all of it in terms of two vectors—the electric field vector E and the magnetic field vector B. Well… What can I say? I did present the mathematical construct: it’s the electromagnetic tensor. So it’s that antisymmetric matrix really, which one can combine with a transformation matrix embodying the Lorentz transformation rules. So, I did what I promised to do. But you’re right: I am re-presenting stuff in the old style once again.

The second objection that you may have – in fact, that you should have – is that all of this has been rather tedious. And you’re right. The whole thing just re-emphasizes the value of using the four-potential vector. It’s obviously much easier to take that vector from one reference frame to another – so we just apply the Lorentz transformation rules to A_μ = (Φ, A) and get A_μ′ = (Φ′, A′) from it – and then calculate E′ and B′ from it, rather than trying to remember those equations above. However, that’s not the point, or…

Well… It is and it isn’t. We wanted to get away from those two vectors E and B, and show that electromagnetism is really one phenomenon only, and so that’s where the concept of the electromagnetic tensor came in. There were two objectives here: the first objective was to introduce you to the concept of tensors, which we’ll need in the future. The second objective was to show you that, while Lorentz’ force law – F = q(E + v×B) – makes it clear we’re talking one force only, there is a way of writing it all up that is much more elegant.

I’ve introduced the concept of tensors here, so the first objective should have been achieved. As for the second objective, I’ll discuss that in my next post, in which I’ll introduce the four-velocity vector u_μ as well as the four-force vector f_μ. It will explain the following beautiful equation of motion:

Now that looks very elegant and unified, doesn’t it? 🙂

[…] Hmm… No reaction. I know… You’re tired now, and you’re thinking: yet another way of representing the same thing? Well… Yes! So…

OK… Enough for today. Let’s follow up tomorrow.

Re-visiting the speed of light, Planck’s constant, and the fine-structure constant

Note: I have published a paper that is very coherent and fully explains what the fine-structure constant actually is. There is nothing magical about it. It’s not some God-given number. It’s a scaling constant – and then some more. But not God-given. Check it out: The Meaning of the Fine-Structure Constant. No ambiguity. No hocus-pocus.

Jean Louis Van Belle, 23 December 2018

Original post:

A brother of mine sent me a link to an article he liked. Now, because we share some interest in physics and math and other stuff, I looked at it and…

Well… I was disappointed. Despite the impressive credentials of its author – a retired physics professor – it was very poorly written. It made me realize how much badly written stuff is around, and I am glad I am no longer wasting my time on it. However, I do owe my brother some explanation of (a) why I think it was bad, and of (b) what, in my humble opinion, he should be wasting his time on. 🙂 So what is it all about?

The article talks about physicists deriving the speed of light from “the electromagnetic properties of the quantum vacuum.” Now, it’s the term ‘quantum‘, in ‘quantum vacuum’, that made me read the article.

Indeed, deriving the theoretical speed of light in empty space from the properties of the classical vacuum – aka empty space – is a piece of cake: it was done by Maxwell himself as he was figuring out his equations back in the 1860s (see my post on Maxwell’s equations and the speed of light). And then he compared it to the measured value, and he saw it was right on the mark. Therefore, saying that the speed of light is a property of the vacuum, or of empty space, is like a tautology: we may just as well put it the other way around, and say that it’s the speed of light that defines the (properties of the) vacuum!

Indeed, as I’ll explain in a moment: the speed of light determines both the magnetic as well as the electric constants μ₀ and ε₀, which are the (magnetic) permeability and the (electric) permittivity of the vacuum respectively. Both constants depend on the units we are working with (i.e. the units for electric charge, for distance, for time and for force – or for inertia, if you want, because force is defined in terms of overcoming inertia), but so they are just proportionality coefficients in Maxwell’s equations. So once we decide what units to use in Maxwell’s equations, then μ₀ and ε₀ are just proportionality coefficients which we get from c. So they are not separate constants really – I mean, they are not separate from c – and all of the ‘properties’ of the vacuum, including these constants, are in Maxwell’s equations.

In fact, when Maxwell compared the theoretical value of c with its presumed actual value, he didn’t compare c’s theoretical value with the speed of light as measured by astronomers (like that 17th century Ole Roemer, to whom our professor refers: he had a first go at it by suggesting some specific value for it based on his observations of the timing of the eclipses of one of Jupiter’s moons), but with c’s value as calculated from the experimental values of μ₀ and ε₀! So he knew very well what he was looking at. In fact, to drive home the point, it may also be useful to note that the Michelson-Morley experiment – which tested, with great precision, whether the measured speed of light depends on our motion through the supposed aether – was only done in 1887. So Maxwell had already left this world by then – very much in peace, because he had solved the mystery all 19th century physicists wanted to solve through his great unification: his set of equations covers it all, indeed: electricity, magnetism, light, and even relativity!

I think the article my brother liked so much does a very lousy job in pointing all of that out, but that’s not why I wouldn’t recommend it. It got my attention because I wondered why one would try to derive the speed of light from the properties of the quantum vacuum. In fact, to be precise, I hoped the article would tell me what the quantum vacuum actually is. Indeed, as far as I know, there’s only one vacuum – one ‘empty space’: empty is empty, isn’t it? 🙂 So I wondered: do we have a ‘quantum’ vacuum? And, if so, what is it, really?

Now, that is where the article is really disappointing, I think. The professor drops a few names (like the Max Planck Institute, the University of Paris-Sud, etcetera), and then, promisingly, mentions ‘fleeting excitations of the quantum vacuum’ and ‘virtual pairs of particles’, but then he basically stops talking about quantum physics. Instead, he wanders off to share some philosophical thoughts on the fundamental physical constants. What makes it all worse is that even those thoughts on the ‘essential’ constants are quite off the mark.

So… This post is just a ‘quick and dirty’ thing for my brother which, I hope, will be somewhat more thought-provoking than that article. More importantly, I hope that my thoughts will encourage him to try to grind through better stuff.

On Maxwell’s equations and the properties of empty space

Let me first say something about the speed of light indeed. Maxwell’s four equations may look fairly simple, but that’s only until one starts unpacking all those differential vector equations, and it’s only when going through all of their consequences that one starts appreciating their deep mathematical structure. Let me quickly copy how another blogger jotted them down: 🙂

As I showed in my above-mentioned post, the speed of light (i.e. the speed with which an electromagnetic pulse or wave travels through space) is just one of the many consequences of the mathematical structure of Maxwell’s set of equations. As such, the speed of light is a direct consequence of the ‘condition’, or the properties, of the vacuum indeed, as Maxwell suggested when he wrote that “we can scarcely avoid the inference that light consists in the transverse undulations of the same medium which is the cause of electric and magnetic phenomena”.

Of course, while Maxwell still suggests light needs some ‘medium’ here – so that’s a reference to the infamous aether theory – we now know that’s because he was a 19th century scientist, and so we’ve done away with the aether concept (because it’s a redundant hypothesis), and so now we also know there’s absolutely no reason whatsoever to try to “avoid the inference.” 🙂 It’s all OK, indeed: light is some kind of “transverse undulation” of… Well… Of what?

We analyze light as traveling fields, represented by two vectors, E and B, whose direction and magnitude vary both in space as well as in time. E and B are field vectors, and represent the electric and magnetic field respectively. An equivalent formulation – more or less, that is (see my post on the Liénard-Wiechert potentials) – for Maxwell’s equations when only one (moving) charge is involved is:

This re-formulation, which is Feynman’s preferred formula for electromagnetic radiation, is interesting in a number of ways. It clearly shows that, while we analyze the electric and magnetic field as separate mathematical entities, they’re one and the same phenomenon really, as evidenced by the B = –e_r′×E/c equation, which tells us the magnetic field from a single moving charge is always normal (i.e. perpendicular) to the electric field vector, and also that B’s magnitude is 1/c times the magnitude of E, so |B| = B = |E|/c = E/c. In short, B is fully determined by E, or vice versa: if we have one of the two fields, we have the other, so they’re ‘one and the same thing’ really – not in a mathematical sense, but in a real sense.

Also note that E and B’s magnitude is just the same if we’re using natural units, so if we equate c with 1. Finally, as I pointed out in my post on the relativity of electromagnetic fields, if we would switch from one reference frame to another, we’ll have a different mix of E and B, but that different mix obviously describes the same physical reality. More in particular, if we’d be moving with the charges, the magnetic field sort of disappears to re-appear as an electric field. So the Lorentz force F = F_electric + F_magnetic = qE + qv×B is one force really, and its ‘electric’ and ‘magnetic’ component appear the way they appear in our reference frame only. In some other reference frame, we’d have the same force, but its components would look different, even if they, obviously, would and should add up to the same. [Well… Yes and no… You know there’s relativistic corrections to be made to the forces too, but that’s a minor point, really. The force surely doesn’t disappear!]

All of this reinforces what you know already: electricity and magnetism are part and parcel of one and the same phenomenon, the electromagnetic force field, and Maxwell’s equations are the most elegant way of ‘cutting it up’. Why elegant? Well… Click the Occam tab. 🙂

Now, after having praised Maxwell once more, I must say that Feynman’s equations above have another advantage. In Maxwell’s equations, we see two constants, the magnetic and electric constant (denoted by μ₀ and ε₀ respectively), and Maxwell’s equations imply that the product of the electric and magnetic constant is the reciprocal of c²: μ₀·ε₀ = 1/c². So here we see ε₀ and c only, so no μ₀, so that makes it even more obvious that the magnetic and electric constant are related one to another through c.

[…] Let me digress briefly: why do we have c² in μ₀·ε₀ = 1/c², instead of just c? That’s related to the relativistic nature of the magnetic force: think about that B = E/c relation. Or, better still, think about the Lorentz equation F = F_electric + F_magnetic = qE + qv×B = q[E + (v/c)×(E×e_r′)]: the 1/c factor is there because the magnetic force involves some velocity, and any velocity is always relative – and here I don’t mean relative to the frame of reference but relative to the (absolute) speed of light! Indeed, it’s the v/c ratio (usually denoted by β = v/c) that enters all relativistic formulas. So the right-hand side of the μ₀·ε₀ = 1/c² equation is best written as (1/c)·(1/c), with one of the two 1/c factors accounting for the fact that the ‘magnetic’ force is a relativistic effect of the ‘electric’ force, really, and the other 1/c factor giving us the proper relationship between the magnetic and the electric constant. To drive home the point, I invite you to think about the following:

μ₀ is expressed in (V·s)/(A·m), while ε₀ is expressed in (A·s)/(V·m), so the dimension in which the μ₀·ε₀ product is expressed is [(V·s)/(A·m)]·[(A·s)/(V·m)] = s²/m², so that’s the dimension of 1/c².
Now, this dimensional analysis makes it clear that we can sort of distribute 1/c² over the two constants. All it takes is re-defining the fundamental units we use to calculate stuff, i.e. the units for electric charge, for distance, for time and for force – or for inertia, as explained above. But so we could, if we wanted, equate both μ₀ as well as ε₀ with 1/c.
Now, if we would then equate c with 1, we’d have μ₀ = ε₀ = c = 1. We’d have to define our units for electric charge, for distance, for time and for force accordingly, but it could be done, and then we could re-write Maxwell’s set of equations using these ‘natural’ units.
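The μ₀·ε₀ = 1/c² relation is easy to check with numbers. The values below are the rounded CODATA 2018 values in SI units; they are my input here, not something taken from the article under discussion.

```python
import math

MU_0 = 1.25663706212e-6       # magnetic constant, in (V·s)/(A·m)
EPSILON_0 = 8.8541878128e-12  # electric constant, in (A·s)/(V·m)
C = 299_792_458.0             # speed of light, in m/s

# c as calculated from the two constants, next to its defined value
c_from_constants = 1.0 / math.sqrt(MU_0 * EPSILON_0)
print(c_from_constants, C)

# the product itself, next to 1/c² (both in s²/m²)
print(MU_0 * EPSILON_0, 1.0 / C**2)
```

The two numbers on each line agree to the precision of the rounded inputs.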

In any case, the nitty-gritty here is less important: the point is that μ₀ and ε₀ are also related through the speed of light and, hence, they are ‘properties’ of the vacuum as well. [I may add that this is quite obvious if you look at their definition, but we’re approaching the matter from another angle here.]

In any case, we’re done with this. On to the next!

On quantum oscillations, Planck’s constant, and Planck units

The second thought I want to develop is about the mentioned quantum oscillation. What is it? Or what could it be? An electromagnetic wave is caused by a moving electric charge. What kind of movement? Whatever: the charge could move up or down, or it could just spin around some axis – whatever, really. For example, if it spins around some axis, it will have a magnetic moment and, hence, the field is essentially magnetic, but then, again, E and B are related and so it doesn’t really matter if the first cause is magnetic or electric: that’s just our way of looking at the world: in another reference frame, one that’s moving with the charges, the field would essentially be electric. So the motion can be anything: linear, rotational, or non-linear in some irregular way. It doesn’t matter: any motion can always be analyzed as the sum of a number of ‘ideal’ motions. So let’s assume we have some elementary charge in space, and it moves and so it emits some electromagnetic radiation.

So now we need to think about that oscillation. The key question is: how small can it be? Indeed, in one of my previous posts, I tried to explain some of the thinking behind the idea of the ‘Great Desert’, as physicists call it. The whole idea is based on our thinking about the limit: what is the smallest wavelength that still makes sense? So let’s pick up that conversation once again.

The Great Desert lies between the 10³² and 10⁴³ Hz scale. 10³² Hz corresponds to a photon energy of E_γ = h·f = (4×10⁻¹⁵ eV·s)·(10³² Hz) = 4×10¹⁷ eV = 400,000 tera-electronvolt (1 TeV = 10¹² eV). I use the γ (gamma) subscript in my E_γ symbol for two reasons: (1) to make it clear that I am not talking the electric field E here but energy, and (2) to make it clear we are talking ultra-high-energy gamma-rays here.
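The arithmetic, as a small sketch (the helper name `photon_energy_ev` is mine; the value of h in eV·s is the CODATA 2014 one):

```python
H_EV = 4.135667662e-15   # Planck's constant in eV·s


def photon_energy_ev(frequency_hz):
    """Planck-Einstein relation E = h·f, with E in electronvolt."""
    return H_EV * frequency_hz


E_low = photon_energy_ev(1e32)    # lower edge of the Great Desert
E_high = photon_energy_ev(1e43)   # upper edge of the Great Desert

print(E_low, "eV =", E_low / 1e12, "TeV")
print(E_high, "eV =", E_high / 1e12, "TeV")
```

With the unrounded value of h, the lower edge comes out at about 4.1×10¹⁷ eV, so the 400,000 TeV above is a rounded figure.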

In fact, γ-rays of this frequency and energy are theoretical only. Ultra-high-energy gamma-rays are defined as rays with photon energies higher than 100 TeV, which is the upper limit for very-high-energy gamma-rays, which have been observed as part of the radiation emitted by so-called gamma-ray bursts (GRBs): flashes associated with extremely energetic explosions in distant galaxies. Wikipedia refers to them as the ‘brightest’ electromagnetic events known to occur in the Universe. These rays are not to be confused with cosmic rays, which consist of high-energy protons and atomic nuclei stripped of their electron shells. Cosmic rays aren’t rays really and, because they consist of particles with a considerable rest mass, their energy is even higher. The so-called Oh-My-God particle, for example, which is the most energetic particle ever detected, had an energy of 3×10²⁰ eV, i.e. 300 million TeV. But it’s not a photon: its energy is largely kinetic energy, with the rest mass m₀ counting for a lot in the m in the E = m·c² formula. To be precise: the mentioned particle was thought to be an iron nucleus, and it packed the equivalent energy of a baseball traveling at 100 km/h!
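The baseball comparison can be checked on the back of an envelope. The baseball’s mass (0.145 kg) is my assumption, as the text only gives its speed:

```python
EV_TO_JOULE = 1.602176634e-19

particle_energy_j = 3e20 * EV_TO_JOULE   # the 3×10²⁰ eV quoted above, in joule

baseball_mass_kg = 0.145                 # assumed mass of a baseball
baseball_speed = 100 / 3.6               # 100 km/h expressed in m/s
baseball_energy_j = 0.5 * baseball_mass_kg * baseball_speed**2

print(particle_energy_j, "J for the particle")
print(baseball_energy_j, "J for the baseball")
```

Both numbers are a few tens of joule, so the comparison holds as an order-of-magnitude statement.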

But let me refer you to another source for a good discussion on these high-energy particles, so I can get back to the energy of electromagnetic radiation. When I talked about the Great Desert in that post, I did so using the Planck-Einstein relation (E = h·f), which embodies the idea of the photon being valid always and everywhere and, importantly, at every scale. I also discussed the Great Desert using real-life light being emitted by real-life atomic oscillators. Hence, I may have given the (wrong) impression that the idea of a photon as a ‘wave train’ is inextricably linked with these real-life atomic oscillators, i.e. to electrons going from one energy level to the next in some atom. Let’s explore these assumptions somewhat more.

Let’s start with the second point. Electromagnetic radiation is emitted by any accelerating electric charge, so the atomic oscillator model is an assumption that should not be essential. And it isn’t. For example, whatever is left of the nucleus after alpha or beta decay (i.e. a nuclear decay process resulting in the emission of an α- or β-particle) is likely to be in an excited state, and likely to emit a gamma-ray for about 10⁻¹² seconds, so that’s a burst that’s about 10,000 times shorter than the 10⁻⁸ seconds it takes for the energy of a radiating atom to die out. [As for the calculation of that 10⁻⁸ sec decay time – so that’s like 10 nanoseconds – I’ve talked about this before but it’s probably better to refer you to the source, i.e. one of Feynman’s Lectures.]

However, what we’re interested in is not the energy of the photon, but the energy of one cycle. In other words, we’re not thinking of the photon as some wave train here, but what we’re thinking about is the energy that’s packed into a space corresponding to one wavelength. What can we say about that?

As you know, that energy will depend both on the amplitude of the electromagnetic wave as well as its frequency. To be precise, the energy is (1) proportional to the square of the amplitude, and (2) proportional to the frequency. Let’s look at the first proportionality relation. It can be written in a number of ways, but one way of doing it is stating the following: if we know the electric field, then the amount of energy that passes per square meter per second through a surface that is normal to the direction in which the radiation is going (which we’ll denote by S – the s from surface – in the formula below), must be proportional to the average of the square of the field. So we write S ∝ 〈E²〉, and so we should think about the constant of proportionality now. Now, let’s not get into the nitty-gritty, and so I’ll just refer to Feynman for the derivation of the formula below:

S = ε₀c·〈E²〉

So the constant of proportionality is ε₀c. [Note that, in light of what we wrote above, we can also write this as S = [1/(μ₀·c)]·〈(c·B)²〉 = (c/μ₀)·〈B²〉, so that underlines once again that we’re talking one electromagnetic phenomenon only really.] So that’s a nice and rather intuitive result in light of all of the other formulas we’ve been jotting down. However, it is a ‘wave’ perspective. The ‘photon’ perspective assumes that, somehow, the amplitude is given and, therefore, the Planck-Einstein relation only captures the frequency variable: E_γ = h·f.
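The equivalence of the E- and B-versions of the formula is quickly verified with numbers. The field strength below is an arbitrary example, and the constants are the rounded CODATA 2018 values:

```python
MU_0 = 1.25663706212e-6       # magnetic constant, (V·s)/(A·m)
EPSILON_0 = 8.8541878128e-12  # electric constant, (A·s)/(V·m)
C = 299_792_458.0             # speed of light, m/s


def intensity_from_E(mean_E_squared):
    """S = eps0 * c * <E^2>, in watt per square meter."""
    return EPSILON_0 * C * mean_E_squared


def intensity_from_B(mean_B_squared):
    """S = (c / mu0) * <B^2>, in watt per square meter."""
    return (C / MU_0) * mean_B_squared


E_rms = 100.0           # hypothetical root-mean-square field, V/m
B_rms = E_rms / C       # |B| = |E|/c for the radiation field

S_E = intensity_from_E(E_rms**2)
S_B = intensity_from_B(B_rms**2)
print(S_E, S_B)
```

Both routes give the same energy flow, about 26.5 W/m² for this example field.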

Indeed, ‘more energy’ in the ‘wave’ perspective basically means ‘more photons’, but photons are photons: they have a definite frequency and a definite energy, and both are given by that Planck-Einstein relation. So let’s look at that relation by doing a bit of dimensional analysis:

Energy is measured in electronvolt or, using SI units, joule: 1 eV ≈ 1.6×10⁻¹⁹ J. Energy is force times distance: 1 joule = 1 newton·meter, which means that a larger force over a shorter distance yields the same energy as a smaller force over a longer distance. The oscillations we’re talking about here involve very tiny distances obviously. But the principle is the same: we’re talking some moving charge q, and the power – which is the time rate of change of the energy – that goes in or out at any point of time is equal to dW/dt = F·v, with W the work that’s being done by the charge as it emits radiation.
I would also like to add that, as you know, forces are related to the inertia of things. Newton’s Law basically defines a force as that which causes a mass to accelerate: F = m·a = m·(dv/dt) = d(m·v)/dt = dp/dt, with p the momentum of the object that’s involved. When charges are involved, we’ve got the same thing: a potential difference will cause some current to change, and one of the equivalents of Newton’s Law F = m·a = m·(dv/dt) in electromagnetism is V = L·(dI/dt). [I am just saying this so you get a better ‘feel’ for what’s going on.]
Planck’s constant is measured in electronvolt·seconds (eV·s) or, using SI units, in joule·seconds (J·s), so its dimension is that of (physical) action, which is energy times time: [energy]·[time]. Again, a lot of energy during a short time yields the same action as less energy over a longer time. [Again, I am just saying this so you get a better ‘feel’ for these dimensions.]
The frequency f is the number of cycles per time unit, so that’s expressed per second, i.e. in hertz (Hz) = 1/second = s⁻¹.

So… Well… It all makes sense: [x joule] = [6.626×10⁻³⁴ joule]·[1 second]×[f cycles]/[1 second]. But let’s try to deepen our understanding even more: what’s the Planck-Einstein relation really about?

To answer that question, let’s think some more about the wave function. As you know, it’s customary to express the frequency as an angular frequency ω, as used in the wave function A(x, t) = A₀·sin(kx − ωt). The angular frequency is the frequency expressed in radians per second. That’s because we need an angle in our wave function, and so we need to relate x and t to some angle. The way to think about this is as follows: one cycle takes a time T (i.e. the period of the wave) which is equal to T = 1/f. Yes: one second divided by the number of cycles per second gives you the time that’s needed for one cycle. One cycle is also equivalent to our argument ωt going around the full circle (i.e. 2π), so we write: ω·T = 2π and, therefore:

ω = 2π/T = 2π·f

Now we’re ready to play with the Planck-Einstein relation. We know it gives us the energy of one photon really, but what if we re-write our equation E_γ = h·f as E_γ/f = h? The dimensions in this equation are:

[x joule]·[1 second]/[f cycles] = [6.626×10⁻³⁴ joule]·[1 second]

⇔ x = 6.626×10⁻³⁴ joule per cycle

So that means that the energy per cycle is equal to 6.626×10⁻³⁴ joule, i.e. the value of Planck’s constant.

Let me rephrase this truly amazing result, so you appreciate it, perhaps: regardless of the frequency of the light (or our electromagnetic wave, in general) involved, the energy per cycle, i.e. per wavelength or per period, is always equal to 6.626×10⁻³⁴ joule or, using the electronvolt as the unit, 4.135667662×10⁻¹⁵ eV. So, in case you wondered, that is the true meaning of Planck’s constant!
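If you want to check the arithmetic yourself, here is a minimal Python sketch of the reasoning above. It just applies E = h·f and then divides by f again, so it shows nothing more than that the ratio E/f is the same for every frequency. The three sample frequencies and the function names are my own choices for illustration, not something taken from the article we are discussing.

```python
# Planck's constant in J·s and the electronvolt in joule (2019 SI defining values).
H_JOULE_SECOND = 6.62607015e-34
EV_IN_JOULE = 1.602176634e-19


def photon_energy_joule(frequency_hz):
    """Planck-Einstein relation: E = h·f."""
    return H_JOULE_SECOND * frequency_hz


def energy_per_cycle_joule(frequency_hz):
    """Photon energy divided by the number of cycles per second, i.e. E/f."""
    return photon_energy_joule(frequency_hz) / frequency_hz


# Illustrative frequencies (assumed values): radio, visible light, X-rays.
for f in (1.0e8, 5.0e14, 1.0e18):
    e = photon_energy_joule(f)
    per_cycle = energy_per_cycle_joule(f)
    print(f"f = {f:.1e} Hz: E = {e:.3e} J, E/f = {per_cycle:.3e} J per cycle")

# The same constant expressed in eV·s.
H_EV_SECOND = H_JOULE_SECOND / EV_IN_JOULE
print(f"h = {H_EV_SECOND:.6e} eV·s")
```

The photon energy changes by ten orders of magnitude across the three frequencies, while the E/f column stays put at the value of h.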

Now, if we have the frequency f, we also have the wavelength λ, because the velocity of the wave is the frequency times the wavelength: c = λ·f and, therefore, λ = c/f. So if we increase the frequency, the wavelength becomes smaller and smaller, and so we’re packing the same amount of energy – admittedly, 4.135667662×10⁻¹⁵ eV is a very tiny amount of energy – into a space that becomes smaller and smaller. Well… What’s tiny, and what’s small? All is relative, of course. 🙂 So that’s where the Planck scale comes in. If we pack that amount of energy into some tiny little space of the Planck dimension, i.e. a ‘length’ of 1.6162×10⁻³⁵ m, then it becomes a tiny black hole, and it’s hard to think about how that would work.

[…] Let me make a small digression here. I said it’s hard to think about black holes but, of course, it’s not because it’s ‘hard’ that we shouldn’t try it. So let me just mention a few basic facts. For starters, black holes do emit radiation! So they swallow stuff, but they also spit stuff out. More in particular, there is the so-called Hawking radiation, as Stephen Hawking predicted in 1974.

Let me quickly make a few remarks on that: Hawking radiation is basically a form of blackbody radiation, so all frequencies are there, as shown below: the distribution of the various frequencies depends on the temperature of the black body, i.e. the black hole in this case. [The black curve is the curve that Lord Rayleigh and Sir James Jeans derived in the late 19th century, using classical theory only, so that’s the one that does not correspond to experimental fact, and which led Max Planck to become the ‘reluctant’ father of quantum mechanics. In any case, that’s history and so I shouldn’t dwell on this.]

The interesting thing about blackbody radiation, including Hawking radiation, is that it reduces energy and, hence, the equivalent mass of our blackbody. So Hawking radiation reduces the mass and energy of black holes and is therefore also known as black hole evaporation. So black holes that lose more mass than they gain through other means are expected to shrink and ultimately vanish. Therefore, there are all kinds of theories that say why micro black holes, like that Planck scale black hole we’re thinking of right now, should be much larger net emitters of radiation than large black holes and, hence, why they should shrink and dissipate faster.

Hmm… Interesting… What do we do with all of this information? Well… Let’s think about it as we continue our trek on this long journey to reality over the next year or, more probably, years (plural). 🙂

The key lesson here is that space and time are intimately related because of the idea of movement, i.e. the idea of something having some velocity, and that it’s not so easy to separate the dimensions of time and distance in any hard and fast way. As energy scales become larger and, therefore, our natural time and distance units become smaller and smaller, it’s the energy concept that comes to the fore. It sort of ‘swallows’ all other dimensions, and it does lead to limiting situations which are hard to imagine. Of course, that just underscores the underlying unity of Nature, and the mysteries involved.

So… To relate all of this back to the story that our professor is trying to tell, it’s a simple story really. He’s talking about two fundamental constants basically, c and h, pointing out that c is a property of empty space, and h is related to something doing something. Well… OK. That’s really nothing new, and surely not ground-breaking research. 🙂

Now, let me finish my thoughts on all of the above by making one more remark. If you’ve read a thing or two about this – which you surely have – you’ll probably say: this is not how people usually explain it. That’s true, they don’t. Anything I’ve seen about this just associates the 10⁴³Hz scale with the 10²⁸eV energy scale, using the same Planck-Einstein relation. For example, the Wikipedia article on micro black holes writes that “the minimum energy of a microscopic black hole is 10¹⁹ GeV [i.e. 10²⁸eV], which would have to be condensed into a region on the order of the Planck length.” So that’s wrong. I want to emphasize this point because I’ve been led astray by it for years. It’s not the total photon energy, but the energy per cycle that counts. Having said that, it is correct, however, and easy to verify, that the 10⁴³Hz scale corresponds to a wavelength of the Planck scale: λ = c/f = (3×10⁸m/s)/(10⁴³ s⁻¹) = 3×10⁻³⁵m. The confusion between the photon energy and the energy per wavelength arises because of the idea of a photon: it travels at the speed of light and, hence, because of the relativistic length contraction effect, it is said to be point-like, to have no dimension whatsoever. So that’s why we think of packing all of its energy in some infinitesimally small place. But you shouldn’t think like that. The photon is dimensionless in our reference frame: in its own ‘world’, it is spread out, so it is a wave train. And it’s in its ‘own world’ that the contradictions start… 🙂
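The two numbers in this paragraph are easy to verify. The sketch below is only a quick check, assuming the round 10⁴³ Hz frequency used above and a rounded value for the Planck length; the variable names are mine. It computes the wavelength from λ = c/f and the total photon energy from E = h·f, so you can see where the 10²⁸ eV figure comes from.

```python
C = 299_792_458.0              # speed of light in m/s
H_EV_SECOND = 4.135667696e-15  # Planck's constant in eV·s
PLANCK_LENGTH_M = 1.616e-35    # Planck length in m (rounded)


def wavelength_m(frequency_hz):
    """Wavelength of a wave traveling at c: λ = c/f."""
    return C / frequency_hz


frequency_hz = 1.0e43  # the round figure used in the text
lam = wavelength_m(frequency_hz)
photon_energy_ev = H_EV_SECOND * frequency_hz

print(f"wavelength        = {lam:.3e} m")
print(f"in Planck lengths = {lam / PLANCK_LENGTH_M:.2f}")
print(f"photon energy     = {photon_energy_ev:.3e} eV")
```

So the wavelength is a bit less than two Planck lengths, and the total photon energy is of the order of 10²⁸ eV, while the energy per cycle is still just the value of h.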

OK. Done!

My third and final point is about what our professor writes on the fundamental physical constants, and more in particular on what he writes on the fine-structure constant. In fact, I could just refer you to my own post on it, but that’s probably a bit too easy for me and a bit difficult for you 🙂 so let me summarize that post and tell you what you need to know about it.

The fine-structure constant

The fine-structure constant α is a dimensionless constant which also illustrates the underlying unity of Nature, but in a way that’s much more fascinating than the two or three things the professor mentions. Indeed, it’s quite incredible how this number (α = 0.00729735…, but you’ll usually see it written as its reciprocal, which is a number that’s close to 137.036…) links charge with the relative speeds, radii, and the mass of fundamental particles and, therefore, how this number also links these concepts with each other. And, yes, the fact that it is, effectively, dimensionless, unlike h or c, makes it even more special. Let me quickly sum up everything this very same number α stands for:

(1) α is the square of the electron charge expressed in Planck units: α = e_P².

(2) α is the square root of the ratio of (a) the classical electron radius and (b) the Bohr radius: α = √(r_e/r). You’ll see this more often written as r_e = α²r. Also note that this is an equation that does not depend on the units, in contrast to equation 1 (above), and 4 and 5 (below), which require you to switch to Planck units. It’s the square root of a ratio and, hence, the units don’t matter. They fall away.

(3) α is the (relative) speed of an electron in the lowest Bohr orbit: α = v/c. [The relative speed is the speed as measured against the speed of light. Note that the ‘natural’ unit of speed in the Planck system of units is equal to c. Indeed, if you divide one Planck length by one Planck time unit, you get (1.616×10⁻³⁵ m)/(5.391×10⁻⁴⁴ s) ≈ 3×10⁸ m/s = c. However, this is another equation, just like (2), that does not depend on the units: we can express v and c in whatever unit we want, as long as we’re consistent and express both in the same units.]

(4) α is also equal to the product of (a) the electron mass (which I’ll simply write as m_e here) and (b) the classical electron radius r_e (if both are expressed in Planck units): α = m_e·r_e. Now I think that’s, perhaps, the most amazing of all of the expressions for α. [If you don’t think that’s amazing, I’d really suggest you stop trying to study physics. :-)]

Also note that, from (2) and (4), we find that:

(5) The electron mass (in Planck units) is equal to m_e = α/r_e = α/(α²r) = 1/(αr). So that gives us an expression, using α once again, for the electron mass as a function of the Bohr radius r expressed in Planck units.

Finally, we can also substitute (1) in (5) to get:

(6) The electron mass (in Planck units) is equal to m_e = α/r_e = e_P²/r_e. Using the Bohr radius, we get m_e = 1/(αr) = 1/(e_P²r).

So… As you can see, this fine-structure constant really links all of the fundamental properties of the electron: its charge, its radius, its distance to the nucleus (i.e. the Bohr radius), its velocity, its mass (and, hence, its energy),…
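You don’t have to take my word for relations (1), (2) and (4): they can be checked numerically. The sketch below is a rough check only, using CODATA 2018 values in SI units and assuming the variant of the Planck system in which Coulomb’s constant is set to 1 (so the Planck charge is √(4πε₀ħc)). The variable names are mine. Note that, in relation (4), the gravitational constant drops out when you multiply the mass and the radius in Planck units, so m_e·r_e reduces to m_e·r_e·c/ħ in SI units.

```python
import math

# CODATA 2018 values in SI units.
E_CHARGE = 1.602176634e-19    # elementary charge, C
HBAR = 1.054571817e-34        # reduced Planck constant, J·s
C = 299_792_458.0             # speed of light, m/s
EPSILON_0 = 8.8541878128e-12  # electric constant, F/m
M_ELECTRON = 9.1093837015e-31 # electron mass, kg
R_CLASSICAL = 2.8179403262e-15  # classical electron radius, m
R_BOHR = 5.29177210903e-11      # Bohr radius, m

# Standard definition of the fine-structure constant.
alpha = E_CHARGE**2 / (4 * math.pi * EPSILON_0 * HBAR * C)

# (1) square of the electron charge expressed in Planck charge units.
planck_charge = math.sqrt(4 * math.pi * EPSILON_0 * HBAR * C)
alpha_from_charge = (E_CHARGE / planck_charge) ** 2

# (2) square root of the ratio of the classical electron radius and the Bohr radius.
alpha_from_radii = math.sqrt(R_CLASSICAL / R_BOHR)

# (4) electron mass times classical electron radius, both in Planck units:
#     (m_e/m_P)·(r_e/l_P) = m_e·r_e·c/ħ, because G cancels in m_P·l_P = ħ/c.
alpha_from_mass_radius = M_ELECTRON * R_CLASSICAL * C / HBAR

for label, value in [
    ("definition", alpha),
    ("(1) e_P squared", alpha_from_charge),
    ("(2) sqrt(r_e/r)", alpha_from_radii),
    ("(4) m_e·r_e", alpha_from_mass_radius),
]:
    print(f"{label:>16}: alpha = {value:.9f}, 1/alpha = {1 / value:.4f}")
```

Relation (1) is the definition in disguise, so it agrees by construction; (2) and (4) use independently tabulated values for the radii and the mass, which makes them the more interesting checks.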

So… Why is it what it is?

Well… We all marvel at this, but what can we say about it, really? I struggle with how to interpret this, just as much – or probably much more 🙂 – as the professor who wrote the article I don’t like (because it’s so imprecise, and that’s what made me write all that I am writing here).

Having said that, it’s obvious that it points to a unity beyond these numbers and constants that I am only beginning to appreciate for what it is: deep, mysterious, and very beautiful. But so I don’t think that professor does a good job at showing how deep, mysterious and beautiful it all is. But then that’s up to you, my brother and you, my imaginary reader, to judge, of course. 🙂

[…] I forgot to mention what I mean with ‘Planck units’. Well… Once again, I should refer you to one of my other posts. But, yes, that’s too easy for me and a bit difficult for you. 🙂 So let me just note we get those Planck units by equating no fewer than five fundamental physical constants to 1, notably (1) the speed of light, (2) Planck’s (reduced) constant, (3) Boltzmann’s constant, (4) Coulomb’s constant and (5) Newton’s constant (i.e. the gravitational constant). Hence, we have a set of five equations here (c = ħ = k_B = k_e = G = 1), and so we can solve that to get the five base Planck units, i.e. the Planck length unit, the Planck time unit, the Planck mass unit, the Planck charge unit and, finally (oft forgotten), the Planck temperature unit. The Planck energy unit then follows from the mass unit: you should note that all mass and energy units are directly related because of the mass-energy equivalence relation E = mc², which simplifies to E = m if c is equated to 1. [I could also say something about the relation between temperature and (kinetic) energy, but I won’t, as it would only further confuse you.]
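To make this a bit more tangible, here is a small Python sketch that solves those five equations for the Planck units in SI terms. It uses the standard textbook expressions, with CODATA 2018 values for the constants, and it assumes the convention in which Coulomb’s constant k_e = 1/(4πε₀) is set to 1. The variable names are mine.

```python
import math

# CODATA 2018 values in SI units.
HBAR = 1.054571817e-34        # reduced Planck constant, J·s
C = 299_792_458.0             # speed of light, m/s
G = 6.67430e-11               # gravitational constant, m³/(kg·s²)
K_B = 1.380649e-23            # Boltzmann constant, J/K
EPSILON_0 = 8.8541878128e-12  # electric constant, F/m
EV_IN_JOULE = 1.602176634e-19

planck_length = math.sqrt(HBAR * G / C**3)                     # m
planck_time = planck_length / C                                # s
planck_mass = math.sqrt(HBAR * C / G)                          # kg
planck_charge = math.sqrt(4 * math.pi * EPSILON_0 * HBAR * C)  # C
planck_energy = planck_mass * C**2                             # J
planck_temperature = planck_energy / K_B                       # K

print(f"Planck length      = {planck_length:.4e} m")
print(f"Planck time        = {planck_time:.4e} s")
print(f"Planck mass        = {planck_mass:.4e} kg")
print(f"Planck charge      = {planck_charge:.4e} C")
print(f"Planck energy      = {planck_energy / EV_IN_JOULE:.3e} eV")
print(f"Planck temperature = {planck_temperature:.4e} K")
```

The Planck energy comes out at about 1.2×10²⁸ eV, i.e. the 10¹⁹ GeV figure quoted earlier, and the ratio of the length and time units is c, as it should be.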

OK. Done! 🙂

Addendum: How to think about space and time?

If you read the argument on the Planck scale and constant carefully, then you’ll note that it does not depend on the idea of an indivisible photon. However, it does depend on that Planck-Einstein relation being valid always and everywhere. Now, the Planck-Einstein relation is, in its essence, a fairly basic result from classical electromagnetic theory: it incorporates quantum theory – remember: it’s the equation that allowed Planck to solve the black-body radiation problem, and so it’s why they call Planck the (reluctant) ‘Father of Quantum Theory’ – but it’s not quantum theory.

So the obvious question is: can we make this reflection somewhat more general, so we can think of the electromagnetic force as an example only? In other words: can we apply the thoughts above to any force and any movement really?

The truth is: I haven’t advanced enough in my little study to give the equations for the other forces. Of course, we could think of gravity, and I developed some thoughts on what gravitational waves might look like, but nothing specific really. And then we have the shorter-range nuclear forces, of course: the strong force, and the weak force. The laws involved are very different. The strong force involves color charges, and the way distances work is entirely different. So it would surely be some different analysis. However, the results should be the same. Let me offer some thoughts though:

We know that the relative strength of the nuclear force is much larger, because it pulls like charges (protons) together, despite the strong electromagnetic force that wants to push them apart! So the mentioned problem of trying to ‘pack’ some oscillation in some tiny little space should be worse with the strong force. And the strong force is there, obviously, at tiny little distances!
Even gravity should become important, because if we’ve got a lot of energy packed into some tiny space, its equivalent mass will ensure the gravitational forces also become important. In fact, that’s what the whole argument was all about!
There’s also all this talk about the fundamental forces becoming one at the Planck scale. I must, again, admit my knowledge is not advanced enough to explain how that would be possible, but I must assume that, if physicists are making such statements, the argument must be fairly robust.

So… Whatever charge or whatever force we are talking about, we’ll be thinking of waves or oscillations—or simply movement, but it’s always a movement in a force field, and so there’s power and energy involved (energy is force times distance, and power is the time rate of change of energy). So, yes, we should expect the same issues in regard to scale. And so that’s what’s captured by h.

As we’re talking the smallest things possible, I should also mention that there are also other inconsistencies in the electromagnetic theory, which should (also) have their parallel for other forces. For example, the idea of a point charge is mathematically inconsistent, as I show in my post on fields and charges. Charge, any charge really, must occupy some space. It cannot all be squeezed into one dimensionless point. So the reasoning behind the Planck time and distance scale is surely valid.

In short, the whole argument about the Planck scale and those limits is very valid. However, does it imply our thinking about the Planck scale is actually relevant? I mean: it’s not because we can imagine what things might look like – they may look like those tiny little black holes, for example – that these things actually exist. GUT or string theorists obviously think they are thinking about something real. But, frankly, Feynman had a point when he said what he said about string theory, shortly before his untimely death in 1988: “I don’t like that they’re not calculating anything. I don’t like that they don’t check their ideas. I don’t like that for anything that disagrees with an experiment, they cook up an explanation – a fix-up to say, ‘Well, it still might be true.’”

It’s true that the so-called Standard Model does not look very nice. It’s not like Maxwell’s equations. It’s complicated. It’s got various ‘sectors’: the electroweak sector, the QCD sector, the Higgs sector,… So ‘it looks like it’s got too much going on’, as a friend of mine said when he looked at a new design for mountainbike suspension. 🙂 But, unlike mountainbike designs, there’s no real alternative for the Standard Model. So perhaps we should just accept it is what it is and, hence, in a way, accept Nature as we can see it. So perhaps we should just continue to focus on what’s here, before we reach the Great Desert, rather than wasting time on trying to figure out what things might look like on the other side, especially because we’ll never be able to test our theories about ‘the other side.’

On the other hand, we can see where the Great Desert sort of starts (somewhere near the 10³² Hz scale), and so it’s only natural to think it should also stop somewhere. In fact, we know where it stops: it stops at the 10⁴³ Hz scale, because everything beyond that doesn’t make sense. The question is: is there actually something there? Like fundamental strings or whatever you want to call it. Perhaps we should just stop where the Great Desert begins. And what’s the Great Desert anyway? Perhaps it’s a desert indeed, and so then there is absolutely nothing there. 🙂

Hmm… There’s not all that much one can say about it. However, when looking at the history of physics, there’s one thing that’s really striking. Most of what physicists could think of, in the sense that it made physical sense, turned out to exist. Think of anti-matter, for instance. Paul Dirac thought it might exist, that it made sense to exist, and so everyone started looking for it, and Carl Anderson found it a few years later (in 1932). In fact, it had been observed before, but people just didn’t pay attention, so they didn’t want to see it, in a way. […] OK. I am exaggerating a bit, but you know what I mean. The 1930s are full of examples like that. There was a burst of scientific creativity, as the formalism of quantum physics was being developed, and the experimental confirmations of the theory just followed suit.

In the field of astronomy, or astrophysics I should say, it was the same with black holes. No one could really imagine the existence of black holes until the 1960s or so: they were thought of as a mathematical curiosity only, a logical possibility. However, the circumstantial evidence now is quite large and so… Well… It seems a lot of what we can think of actually has some existence somewhere. 🙂

So… Who knows? […] I surely don’t. And so I need to get back to the grind and work my way through the rest of Feynman’s Lectures and the related math. However, this was a nice digression, and so I am grateful to my brother for initiating it. 🙂

Reconciling the wave-particle duality in electromagnetism

As I talked about Feynman’s equation for electromagnetic radiation in my previous post, I thought I should add a few remarks on wave-particle duality, but then I didn’t do it there, because my post would have become way too long. So let me add those remarks here. In fact, I’ve written about this before, and so I’ll just mention the basic ideas without going too much in detail. Let me first jot down the formula once again, as well as illustrate the geometry of the situation:

The gist of the matter is that light, in classical theory, is a traveling electromagnetic field caused by an accelerating electric charge and that, because light travels at speed c, it’s the acceleration at the retarded time t – r/c, i.e. a′ = a(t – r/c), that enters the formula. You’ve also seen the diagrams that accompany this formula:

The two diagrams above show that the curve of the electric field in space is a “reversed” plot of the acceleration as a function of time. As I mentioned before, that’s quite obvious from the mathematical behavior of a function with an argument like the argument above, i.e. a function F(t – r/c). When we write t – r/c, we basically measure distance units in seconds, instead of in meters. So we basically use c as the scale for both time as well as distance. I explained that in a previous post, so please have a look there if you want to see how that works.

So it’s pretty straightforward, really. However, having said that, when I see a diagram like the one above, so all of these diagrams plotting an E or B wave in space, I can’t help thinking it’s somewhat misleading: after all, we’re talking something traveling at the speed of light here and, therefore, its length – in our frame of reference – should be zero. And it is, obviously. Electromagnetic radiation comes packed in point-like, dimensionless photons: the length of something that travels at the speed of light must be zero.

Now, I don’t claim to know what’s going on exactly, but my thinking on it may not be far off the mark. We know that light is emitted and absorbed by atoms, as electrons go from one energy level to another, and the energy of the photons of light corresponds to the difference between those energy levels (i.e. a few electronvolt only, typically: it’s given by the E = h·ν relation). Therefore, we can look at a photon as a transient electromagnetic wave. It’s a very short pulse: the decay time for one such pulse of sodium light, i.e. one photon of sodium light, is 3.2×10⁻⁸ seconds. However, taking into account the frequency of sodium light (500 THz), that still makes for some 16 million oscillations, and a wave-train with a length of almost 10 meters. [Yes. Quite incredible, isn’t it?] So the photon could look like the transient wave I depicted below, except… Well… This wavetrain is traveling at the speed of light and, hence, we will not see it as a ten-meter long wave-train. Why not? Well… Because of the relativistic length contraction, it will effectively appear as a point-like particle to us.
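The 16 million oscillations and the ten meters are simple products, and a couple of lines of Python are enough to check them. This is just the arithmetic of the paragraph above, with the decay time and the (rounded) frequency taken from the text; the variable names are mine.

```python
C = 299_792_458.0      # speed of light in m/s
DECAY_TIME_S = 3.2e-8  # decay time of the sodium light pulse, from the text
FREQUENCY_HZ = 5.0e14  # frequency of sodium light (500 THz), rounded as in the text

# Number of oscillations in the pulse: cycles per second times the duration.
oscillations = FREQUENCY_HZ * DECAY_TIME_S

# Length of the wave train in the rest frame of the source: speed times duration.
train_length_m = C * DECAY_TIME_S

print(f"oscillations in the pulse: {oscillations:.2e}")
print(f"length of the wave train:  {train_length_m:.2f} m")
```

That gives 1.6×10⁷ oscillations and a wave train of about 9.6 meters, i.e. the figures quoted above.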

So relativistic length contraction is why the wave and particle duality can be easily reconciled in electromagnetism: we can think of light as an irregular beam of point-like photons indeed, as one atomic oscillator after the other releases a photon, in no particularly organized way. So we can think of photons as transient wave-trains, but we should remind ourselves that they are traveling at the speed of light, so they’ll look point-like to us.

Is such a view consistent with the results of the famous – or should I say infamous? – double-slit experiment? Well… Maybe. As I mentioned in one of my posts, it is rather remarkable that it is actually hard to find actual double-slit experiments that use actual detectors near the slits, and even harder to find such experiments involving photons! Indeed, experiments involving detectors near the slits are usually experiments with ‘real’ particles, such as electrons, for example. Now, a lot of advances have been made in the set-up of these experiments over the past five years, and one of these experiments is a 2010 experiment of an Italian team which suggests that it’s the interaction between the detector and the electron wave that may cause the interference pattern to disappear. Now that throws some doubts on the traditional explanation of the results of the double-slit experiment.

The idea is shown below. The electron is depicted as an incoming plane wave which effectively breaks up as it goes through the slits. The slit on the left has no ‘filter’ (which you may think of as a detector) and, hence, the plane wave goes through as a cylindrical wave. The slit on the right-hand side is covered by a ‘filter’ made of several layers of ‘low atomic number material’, so the electron goes through but, at the same time, the barrier creates a spherical wave as it goes through. The researchers note that “the spherical and cylindrical wave do not have any phase correlation, and so even if an electron passed through both slits, the two different waves that come out cannot create an interference pattern on the wall behind them.” [I hope I don’t have to remind you that, while being represented as ‘real’ waves here, the ‘waves’ are, obviously, complex-valued psi functions.]

In fact, to be precise, the experimenters note that there still was an interference effect if the filter was thin enough. Let me quote the reason for that: “The thicker the filter, the greater the probability for inelastic scattering. When the electron suffers inelastic scattering, it is localized. This means that its wavefunction collapses and, after the measurement act, it propagates roughly as a spherical wave from the region of interaction, with no phase relation at all with other elastically or inelastically scattered electrons. If the filter is made thick enough, the interference effects cancels out almost completely.”

This does not solve the ‘mystery’ of the double-slit experiment, but it throws doubt on how it’s usually being explained. The mystery in such experiments is that, when we put detectors, it is either the detector at A or the detector at B that goes off. They should never go off together—”at half strength, perhaps”, as Feynman puts it. But so there are doubts here now. Perhaps the electron does go through both slits at the same time! And so that’s why I used italics when writing “even if an electron passed through both slits”: the electron, or the photon in a similar set-up, is not supposed to do that according to the traditional explanation of the results of the double-slit experiment! It’s one or the other, and the wavefunction collapses or reduces as it goes through.

However, that’s where these so-called ‘weak measurement’ experiments now come in, like this 2010 experiment: it does not prove but indicates that interaction does not have to be that way. They strongly suggest that it is not all or nothing, that our observations should not necessarily destroy the wavefunction. So, who knows, perhaps we will be able, one day, to show that the wavefunction does go through both slits, as it should (otherwise the interference pattern cannot be explained), and then we will have resolved the paradox.

I am pretty sure that, when that’s done, physicists will also be able to relate the image of a photon as a transient electromagnetic wave (cf. the diagram above), being emitted by an atomic oscillator for a few tens of nanoseconds only (we gave the example for sodium light, for which the decay time was 3.2×10⁻⁸ seconds) with the image of a photon as a particle that can be represented by a complex-valued probability amplitude function (cf. the diagram below). I look forward to that day. I think it will come soon.

Here I should add two remarks. First, a lot has been said about the so-called indivisibility of a photon, but inelastic scattering implies that photons are not monolithic: the photon loses energy to the electron and, hence, its wavelength changes. Now, you’ll say: the scattered photon is not the same photon as the incident photon, and you’re right. But… Well. Think about it. It does say something about the presumed oneness of a photon.

The other remark is on the mathematics of interference. Photons are bosons and, therefore, we have to add their amplitudes to get the interference effect. So you may try to think of an amplitude function, like Ψ = (1/√(2π))·e^(iθ) or whatever, and think it’s just a matter of ‘splitting’ this function before it enters the two slits and then ‘putting it back together’, so to say, after our photon has gone through the slits. [For the detailed math of interference in quantum mechanics, see my page on essentials.] Well… No. It’s not that simple. The illustration with that plane wave entering the slits, and the cylindrical and/or spherical wave coming out, makes it obvious that something happens to our wave as it goes through the slit. As I said a couple of times already, the two-slit experiment is interesting, but the interference phenomenon – or diffraction as it’s called – involving one slit only is at least as interesting. So… Well… The analysis is not that simple. Not at all, really. 🙂

Maxwell’s equations and the speed of light

Pre-script (dated 26 June 2020): Our ideas have evolved into a full-blown realistic (or classical) interpretation of all things quantum-mechanical. In addition, I note the dark force has amused himself by removing some material, which messed up the lay-out of this post as well. So no use to read this. Read my recent papers instead. 🙂

Original post:

We know how electromagnetic waves travel through space: they do so because of the mechanism described in Maxwell’s equations: a changing magnetic field causes a changing electric field, and a changing electric field causes a (changing) magnetic field, as illustrated below.

So we need some First Cause to get it all started 🙂 i.e. some current, i.e. some moving charge, but then the electromagnetic wave travels, all by itself, through empty space, completely detached from the cause. You know that by now – indeed, you’ve heard this a thousand times before – but, if you’re reading this, you want to know how it works exactly. 🙂

In my post on the Lorentz gauge, I included a few links to Feynman’s Lectures that explain the nitty-gritty of this mechanism from various angles. However, they’re pretty horrendous to read, and so I just want to summarize them a bit—if only for myself, so as to remind myself what’s important and not. In this post, I’ll focus on the speed of light: why do electromagnetic waves – light – travel at the speed of light?

You’ll immediately say: that’s a nonsensical question. It’s light, so it travels at the speed of light. Sure, smart-arse! Let me be more precise: how can we relate the speed of light to Maxwell’s equations? That is the question here. Let’s go for it.

Feynman deals with the matter of the speed of an electromagnetic wave, and the speed of light, in a rather complicated exposé on the fields from some infinite sheet of charge that is suddenly set into motion, parallel to itself, as shown below. The situation looks – and actually is – very simple, but the math is rather messy because of the rather exotic assumptions: infinite sheets and infinite acceleration are not easy to deal with. 🙂 But so the whole point of the exposé is just to prove that the speed of propagation (v) of the electric and magnetic fields is equal to the speed of light (c), and it does a marvelous job at that. So let’s focus on that here only. So what I am saying is that I am going to leave out most of the nitty-gritty and just try to get to that v = c result as fast as I possibly can. So, fasten your seat belt, please.

Most of the nitty-gritty in Feynman’s exposé is about how to determine the direction and magnitude of the electric and magnetic fields, i.e. E and B. Now, when the nitty-gritty business is finished, the grand conclusion is that both E and B travel out in both the positive as well as the negative x-direction at some speed v and sort of ‘fill’ the entire space as they do. Now, the region they are filling extends infinitely far in both the y- and z-direction but, because they travel along the x-axis, there are no fields (yet) in the region beyond x = ± v·t (t = 0 is the moment when the sheet started moving, and it moves in the positive y-direction). As you can see, the sheet of charge fills the yz-plane, and the assumption is that its speed goes from zero to u instantaneously, or very very quickly at least. So the E and B fields move out like a tidal wave, as illustrated below, and thereby ‘fill’ the space indeed, as they move out.

The magnitude of E and B is constant, but it’s not the same constant, and part of the exercise here is to determine the relationship between the two constants. As for their direction, you can see it in the first illustration: B points in the negative z-direction for x > 0 and in the positive z-direction for x < 0, while E‘s direction is opposite to u‘s direction everywhere, so E points in the negative y-direction. As said, you should just take my word for it, because the nitty-gritty on this – which we do not want to deal with here – is all in Feynman and so I don’t want to copy that.

The crux of the argument revolves around what happens at the wavefront itself, as it travels out. Feynman relates flux and circulation there. It’s the typical thing to do: it’s at the wavefront itself that the fields change: before they were zero, and now they are equal to that constant. The fields do not change anywhere else, so there’s no changing flux or circulation business to be analyzed anywhere else. So we define two loops at the wavefront itself: Γ₁ and Γ₂. They are normal to each other (cf. the top and side view of the situation below), because the E and B fields are normal to each other. And so then we use Maxwell’s equations to check out what happens with the flux and circulation there and conclude what needs to be concluded. 🙂

We start with rectangle Γ₂. So one side is in the region where there are fields, and one side is in the region where the fields haven’t reached yet. There is some magnetic flux through this loop, and it is changing, so there is an emf around it, i.e. some circulation of E. The flux changes because the area in which B exists increases at speed v. Now, the time rate of change of the flux is, obviously, B times the rate of change of that area: in a time interval Δt, the area grows by L·v·Δt, with L the width of the rectangle, so that’s (B·L·v·Δt)/Δt = B·L·v. Now, according to Faraday’s Law (see my previous post), this will be equal to minus the line integral of E around Γ₂, which is E·L. So E·L = B·L·v and, hence, we find:

E = v·B

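Interesting! To satisfy Faraday’s equation (which is just one of Maxwell’s equations in integral rather than in differential form), E must equal B times v, with v the speed of propagation of our ‘tidal’ wave. Now let’s look at Γ₁. There we should apply the other flux rule, i.e. the one with the c² in front and without a minus sign: c²·∮ B·ds (around Γ₁) = d/dt ∫ E·n da (over the surface bounded by Γ₁).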

Now the line integral is just B·L, and the right-hand side is E·L·v, so, not forgetting that c² in front – i.e. the square of the speed of light, as you know! – we get: c²·B = E·v, or E = (c²/v)·B.

Now, the E = v·B and E = (c²/v)·B equations must both apply (we’re talking one wave and one and the same phenomenon) and, obviously, that’s only possible if v = c²/v, i.e. if v = c. So the wavefront must travel at the speed of light! Waw ! That’s fast. 🙂 Yes. […] Jokes aside, that’s the result we wanted here: we just proved that the speed of travel of an electromagnetic wave must be equal to the speed of light.
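For those who like to see the two loop arguments side by side, here is a compact restatement in LaTeX. It only repeats the reasoning above. The labels S₁ and S₂ for the surfaces bounded by Γ₁ and Γ₂ are my own notation, and I only compare magnitudes, so signs are dropped.

```latex
\begin{align*}
\text{Loop } \Gamma_2 \text{ (Faraday):}\quad
  \left|\oint_{\Gamma_2} \mathbf{E}\cdot d\mathbf{s}\right|
  &= \left|\frac{d}{dt}\int_{S_2} \mathbf{B}\cdot\mathbf{n}\,da\right|
  \;\Rightarrow\; E\,L = B\,L\,v
  \;\Rightarrow\; E = v\,B \\[4pt]
\text{Loop } \Gamma_1 \text{ (the } c^2 \text{ rule):}\quad
  c^2\left|\oint_{\Gamma_1} \mathbf{B}\cdot d\mathbf{s}\right|
  &= \left|\frac{d}{dt}\int_{S_1} \mathbf{E}\cdot\mathbf{n}\,da\right|
  \;\Rightarrow\; c^2\,B\,L = E\,L\,v
  \;\Rightarrow\; E = \frac{c^2}{v}\,B \\[4pt]
\text{Both must hold:}\quad
  v\,B &= \frac{c^2}{v}\,B
  \;\Rightarrow\; v^2 = c^2
  \;\Rightarrow\; v = c
\end{align*}
```

Note that L drops out of both lines, so the size of the loops plays no role in the result.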

As an added bonus, we also showed the mechanism of travel. It’s obvious from the equations we used to prove the result: it works through the derivatives of the fields with respect to time, i.e. ∂E/∂t and ∂B/∂t.

Done! Great! Enjoy the view!

Well… Yes and no. If you’re smart, you’ll say: we got this result because of the c² factor in that equation, so Maxwell had already put it in, so to speak. Waw! You really are a smart-arse, aren’t you? 🙂

The thing is… Well… The answer is: no. Maxwell did not put it in. Well… Yes and no. Let me explain. Maxwell’s first equation was the electric flux law ∇·E = ρ/ε₀, with ρ the charge density: the flux of E through a closed surface is proportional to the charge inside. So that’s basically another way of writing Coulomb’s Law, and ε₀ was just some constant in it, the electric constant. So it’s a constant of proportionality that depends on the unit in which we measure electric charge. The only reason that it’s there is to make the units come out alright, so if we’d measure charge not in coulomb (C) but in a unit equal to 1 C/ε₀, it would disappear. If we’d do that, our new unit would be equivalent to the charge of some 700,000 protons. You can figure that magical number yourself by checking the values of the proton charge and ε₀. 🙂

OK. And then Ampère came up with the exact laws for magnetism, and they involved current and some other constant of proportionality, and Maxwell formalized that by writing ∇×B = μ₀j, with μ₀ the magnetic constant. It’s not a flux law but a circulation law: currents cause circulation of B. We get the flux rule from it by integrating it. But currents are moving charges, and so Maxwell knew magnetism was related to the same thing: electric charge. So Maxwell knew the two constants had to be related. In fact, when putting the full set of equations together – there are four, as you know – Maxwell figured out that μ₀ times ε₀ would have to be equal to the reciprocal of c², with c the speed of propagation of the wave. So Maxwell knew that, whatever the unit of charge, we’d get two constants of proportionality, an electric and a magnetic constant, and that μ₀·ε₀ would be equal to 1/c². However, while he knew that, at the time, light and electromagnetism were considered to be separate phenomena, and so Maxwell did not say that c was the speed of light: the only thing his equations told him was that c is the speed of propagation of that ‘electromagnetic’ wave that came out of his equations.

The rest is history. In 1856, the great Wilhelm Eduard Weber – you’ve seen his name before, haven’t you? – did a whole bunch of experiments which measured the electric constant rather precisely, and Maxwell jumped on it and calculated all the rest, i.e. μ₀, and so then he took the reciprocal of the square root of μ₀·ε₀ and – Bang! – he had c, the speed of propagation of the electromagnetic wave he was thinking of. Now, c was some value of the order of 3×10⁸ m/s, and so that happened to be the same as the speed of light, which suggested that Maxwell’s c and the speed of light were actually one and the same thing!
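You can redo that last step of the calculation yourself. The little Python sketch below takes the reciprocal of the square root of μ₀·ε₀. The numerical values of the two constants are not from this post: they are the modern SI values, which I assume here for illustration, and the function name is just my own label.

```python
import math

# Assumed input values (not from the post): modern SI values of the two constants.
MU_0 = 4 * math.pi * 1e-7       # magnetic constant, in N/A^2 (classical definition)
EPSILON_0 = 8.8541878128e-12    # electric constant, in F/m


def propagation_speed(mu_0, epsilon_0):
    """Return c = 1/sqrt(mu_0 * epsilon_0), the wave speed in Maxwell's equations."""
    return 1.0 / math.sqrt(mu_0 * epsilon_0)


c = propagation_speed(MU_0, EPSILON_0)
print(f"c = {c:.4e} m/s")
```

The result is of the order of 3×10⁸ m/s, as mentioned above.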

Now, I am a smart-arse too 🙂 and, hence, when I first heard this story, I actually wondered how Maxwell could possibly know the speed of light at the time: Maxwell died many years before the Michelson-Morley experiment unequivocally established the value of the speed of light. [In case you wonder: the Michelson-Morley experiment was done in 1887.] So I checked it. The fact is that the Michelson-Morley experiment concluded that the speed of light was an absolute value and that, in the process of doing so, they got a rather precise value for it, but the value of c itself had already been established, more or less, that is, by a Danish astronomer, Ole Römer, in 1676! He did so by carefully observing the timing of the repeating eclipses of Io, one of Jupiter’s moons. Newton mentioned his results in his Principia, which he wrote in 1687, duly noting that it takes about seven to eight minutes for light to travel from the Sun to the Earth. Done! The whole story is fascinating, really, so you should check it out yourself. 🙂

In any case, to make a long story short, Maxwell was puzzled by this mysterious coincidence, but he was bold enough to point to the right conclusion, tentatively at least, and so he wrote, in his paper On Physical Lines of Force (1861–62), that “we can scarcely avoid the inference that light consists in the transverse undulations of the same medium which is the cause of electric and magnetic phenomena.”

So… Well… Maxwell still suggests light needs some medium here, so the ‘medium’ is a reference to the infamous aether theory, but that’s not the point: what he says here is what we all take for granted now: light is an electromagnetic wave. So now we know there’s absolutely no reason whatsoever to avoid the ‘inference’, but… Well… Some 160 years ago, it was quite a big deal to suggest something like that. 🙂

So that’s the full story. I hope you liked it. Don’t underestimate what you just did: understanding an argument like this is like “climbing a great peak”, as Feynman puts it. So it is “a great moment” indeed. 🙂 The only thing left is, perhaps, to explain the ‘other’ flux rule I used above. Indeed, you know Faraday’s Law: ∮ E·ds (around a loop Γ) = −d/dt ∫ B·n da (over the surface bounded by Γ).

But that other one? Well… As I explained in my previous post, Faraday’s Law is the integral form of Maxwell’s second equation: −∂B/∂t = ∇×E. The ‘other’ flux rule above – so that’s the one with the c² in front and without a minus sign – is the integral form of Maxwell’s fourth equation: c²∇×B = j/ε₀ + ∂E/∂t, taking into account that we’re talking a wave traveling in free space, so there are no charges and currents (it’s just a wave in empty space, whatever that means) and, hence, the Maxwell equation reduces to c²∇×B = ∂E/∂t. Now, I could take you through the same gymnastics as I did in my previous post but, if I were you, I’d just apply the general principle that “the same equations must yield the same solutions” and so I’d just switch E for B and vice versa in Faraday’s equation. 🙂

So we’re done… Well… Perhaps one more thing. We’ve got these flux rules above telling us that the electromagnetic wave will travel all by itself, through empty space, completely detached from its First Cause. But… […] Well… Again you may think there’s some trick here. In other words, you may think the wavefront has to remain connected to the First Cause somehow, just like the whip below is connected to some person whipping it. 🙂

There’s no such connection. The whip is not needed. 🙂 If we’d switch off the First Cause after some time T, so our moving sheet stops moving, then we’d have the pulse below traveling through empty space. As Feynman puts it: “The fields have taken off: they are freely propagating through space, no longer connected in any way with the source. The caterpillar has turned into a butterfly!”

Now, the last question is always the same: what are those fields? What’s their reality? Here, I should refer you to one of the most delightful sections in Feynman’s Lectures. It’s on the scientific imagination. I’ll just quote the introduction to it, but I warmly recommend you go and check it out for yourself: it has no formulas whatsoever, and so you should understand all of it without any problem at all. 🙂

“I have asked you to imagine these electric and magnetic fields. What do you do? Do you know how? How do I imagine the electric and magnetic field? What do I actually see? What are the demands of scientific imagination? Is it any different from trying to imagine that the room is full of invisible angels? No, it is not like imagining invisible angels. It requires a much higher degree of imagination to understand the electromagnetic field than to understand invisible angels. Why? Because to make invisible angels understandable, all I have to do is to alter their properties a little bit—I make them slightly visible, and then I can see the shapes of their wings, and bodies, and halos. Once I succeed in imagining a visible angel, the abstraction required—which is to take almost invisible angels and imagine them completely invisible—is relatively easy. So you say, “Professor, please give me an approximate description of the electromagnetic waves, even though it may be slightly inaccurate, so that I too can see them as well as I can see almost invisible angels. Then I will modify the picture to the necessary abstraction.”

I’m sorry I can’t do that for you. I don’t know how. I have no picture of this electromagnetic field that is in any sense accurate. I have known about the electromagnetic field a long time – I was in the same position 25 years ago that you are now, and I have had 25 years more of experience thinking about these wiggling waves. When I start describing the magnetic field moving through space, I speak of the E and B fields and wave my arms and you may imagine that I can see them. I’ll tell you what I see. I see some kind of vague shadowy, wiggling lines – here and there is an E and a B written on them somehow, and perhaps some of the lines have arrows on them – an arrow here or there which disappears when I look too closely at it. When I talk about the fields swishing through space, I have a terrible confusion between the symbols I use to describe the objects and the objects themselves. I cannot really make a picture that is even nearly like the true waves. So if you have some difficulty in making such a picture, you should not be worried that your difficulty is unusual.

Our science makes terrific demands on the imagination. The degree of imagination that is required is much more extreme than that required for some of the ancient ideas. The modern ideas are much harder to imagine. We use a lot of tools, though. We use mathematical equations and rules, and make a lot of pictures. What I realize now is that when I talk about the electromagnetic field in space, I see some kind of a superposition of all of the diagrams which I’ve ever seen drawn about them. I don’t see little bundles of field lines running about because it worries me that if I ran at a different speed the bundles would disappear, I don’t even always see the electric and magnetic fields because sometimes I think I should have made a picture with the vector potential and the scalar potential, for those were perhaps the more physically significant things that were wiggling.

Perhaps the only hope, you say, is to take a mathematical view. Now what is a mathematical view? From a mathematical view, there is an electric field vector and a magnetic field vector at every point in space; that is, there are six numbers associated with every point. Can you imagine six numbers associated with each point in space? That’s too hard. Can you imagine even one number associated with every point? I cannot! I can imagine such a thing as the temperature at every point in space. That seems to be understandable. There is a hotness and coldness that varies from place to place. But I honestly do not understand the idea of a number at every point.

So perhaps we should put the question: Can we represent the electric field by something more like a temperature, say like the displacement of a piece of jello? Suppose that we were to begin by imagining that the world was filled with thin jello and that the fields represented some distortion—say a stretching or twisting—of the jello. Then we could visualize the field. After we “see” what it is like we could abstract the jello away. For many years that’s what people tried to do. Maxwell, Ampère, Faraday, and others tried to understand electromagnetism this way. (Sometimes they called the abstract jello “ether.”) But it turned out that the attempt to imagine the electromagnetic field in that way was really standing in the way of progress. We are unfortunately limited to abstractions, to using instruments to detect the field, to using mathematical symbols to describe the field, etc. But nevertheless, in some sense the fields are real, because after we are all finished fiddling around with mathematical equations—with or without making pictures and drawings or trying to visualize the thing—we can still make the instruments detect the signals from Mariner II and find out about galaxies a billion miles away, and so on.

The whole question of imagination in science is often misunderstood by people in other disciplines. They try to test our imagination in the following way. They say, “Here is a picture of some people in a situation. What do you imagine will happen next?” When we say, “I can’t imagine,” they may think we have a weak imagination. They overlook the fact that whatever we are allowed to imagine in science must be consistent with everything else we know: that the electric fields and the waves we talk about are not just some happy thoughts which we are free to make as we wish, but ideas which must be consistent with all the laws of physics we know. We can’t allow ourselves to seriously imagine things which are obviously in contradiction to the known laws of nature. And so our kind of imagination is quite a difficult game. One has to have the imagination to think of something that has never been seen before, never been heard of before. At the same time the thoughts are restricted in a strait jacket, so to speak, limited by the conditions that come from our knowledge of the way nature really is. The problem of creating something which is new, but which is consistent with everything which has been seen before, is one of extreme difficulty.”

Isn’t that great? I mean: Feynman, one of the greatest physicists of all time, didn’t write what he wrote above when he was an undergrad student or so. No. He did so in 1964, when he was 45 years old, at the height of his scientific career! And it gets better, because Feynman then starts talking about beauty. What is beauty in science? Well… Just click and check what Feynman thinks about it. 🙂

Oh… Last thing. So what is the magnitude of the E and B field? Well… You can work it out yourself, but I’ll give you the answer. The geometry of the situation makes it clear that the electric field has a y-component only, and the magnetic field a z-component only. Their magnitudes are given in terms of J, i.e. the surface current density going in the positive y-direction:

Music and Math

Pre-scriptum (dated 26 June 2020): These posts on elementary math and physics have not suffered much the attack by the dark force—which is good because I still like them. While my views on the true nature of light, matter and the force or forces that act on them have evolved significantly as part of my explorations of a more realist (classical) explanation of quantum mechanics, I think most (if not all) of the analysis in this post remains valid and fun to read. In fact, I find the simplest stuff is often the best. 🙂

Original post:

I ended my previous post, on Music and Physics, by emphatically making the point that music is all about structure, about mathematical relations. Let me summarize the basics:

1. The octave is the musical unit, defined as the interval between two pitches with the higher frequency being twice the frequency of the lower pitch. Let’s denote the lower and higher pitch by a and b respectively, so we say that b‘s frequency is twice that of a.

2. We then divide the [a, b] interval (whose length is unity) in twelve equal sub-intervals, which define eleven notes in-between a and b. The pitch of the notes in-between is defined by the exponential function connecting a and b. What exponential function? The exponential function with base 2, so that’s the function y = 2^x.

Why base 2? Because of the doubling of the frequencies when going from a to b, and when going from b to b + 1, and from b + 1 to b + 2, etcetera. In music, we give a, b, b + 1, b + 2, etcetera the same name, or symbol: A, for example. Or Do. Or C. Or Re. Whatever. If we have the unit and the number of sub-intervals, all the rest follows. We just add a number to distinguish the various As, or Cs, or Gs, so we write A1, A2, etcetera. Or C1, C2, etcetera. The graph below illustrates the principle for the interval between C4 and C5. Don’t think the function is linear. It’s exponential: note the logarithmic frequency scale. To make the point, I also inserted another illustration (credit for that graph goes to another blogger).

You’ll wonder: why twelve sub-intervals? Well… That’s random. Non-Western cultures use a different number. Eight instead of twelve, for example—which is more logical, at first sight at least: eight intervals amounts to dividing the interval in two equal halves, and the halves in halves again, and then once more: so the length of the sub-interval is then 1/2·1/2·1/2 = (1/2)³ = 1/8. But why wouldn’t we divide by three, so we have 9 = 3·3 sub-intervals? Or by 27 = 3·3·3? Or by 16? Or by 5?

The answer is: we don’t know. The limited sensitivity of our ear demands that the intervals be cut up somehow. [You can do tests of the sensitivity of your ear to relative frequency differences online: it’s fun. Just try them! Some of the sites may recommend a hearing aid, but don’t take that crap.] So… The bottom line is that, somehow, mankind settled on twelve sub-intervals within our musical unit – or our sound unit, I should say. So it is what it is, and the ratio of the frequencies between two successive (semi)tones (e.g. C and C#, or E and F, as E and F are also separated by one half-step only) is 2^(1/12) = 1.059463… Hence, the pitch of each note is about 6% higher than the pitch of the previous note. OK. Next thing.

3. What’s the similarity between C1, C2, C3 etcetera? Or between A1, A2, A3 etcetera? The answer is: harmonics. The frequency of the first overtone of a string tuned at pitch A3 (i.e. 220 Hz) is equal to the fundamental frequency of a string tuned at pitch A4 (i.e. 440 Hz). Likewise, the frequency of the (pitch of the) C4 note above (which is the so-called middle C) is 261.626 Hz, while the frequency of the (pitch of the) next C note (C5) is twice that frequency: 523.251 Hz. [I should quickly clarify the terminology here: a tone consists of several harmonics, with frequencies f, 2·f, 3·f,… n·f,… The first harmonic is referred to as the fundamental, with frequency f. The second, third, etc harmonics are referred to as overtones, with frequency 2·f, 3·f, etc.]

To make a long story short: our ear is able to identify the individual harmonics in a tone, and if the frequency of the first harmonic of one tone (i.e. the fundamental) is the same frequency as the second harmonic of another, then we feel they are separated by one musical unit.
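The three points above are easy to check with a few lines of Python. The sketch below builds the twelve-step scale from C4 to C5 using the base-2 exponential, taking A4 = 440 Hz as the reference pitch. The function and variable names are my own, and the note spelling (sharps only) is just a choice made for the illustration.

```python
A4_HZ = 440.0  # reference pitch, as quoted in the text
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def equal_tempered_frequency(semitones_from_a4, reference_hz=A4_HZ):
    """Frequency of the note a given number of semitones above (or below) A4."""
    return reference_hz * 2 ** (semitones_from_a4 / 12)


def octave_from_c4():
    """Return (name, frequency) pairs for C4 up to and including C5."""
    notes = []
    for step in range(13):
        name = NOTE_NAMES[step % 12] + ("5" if step == 12 else "4")
        # C4 sits 9 semitones below A4, so step 0 corresponds to -9
        notes.append((name, equal_tempered_frequency(step - 9)))
    return notes


for name, freq in octave_from_c4():
    print(f"{name:3s} {freq:8.3f} Hz")

# The ratio between two successive semitones is always the 12th root of 2
semitone_ratio = equal_tempered_frequency(1) / equal_tempered_frequency(0)
print(f"semitone ratio = {semitone_ratio:.6f}")
```

The first and last lines of the output give the C4 and C5 frequencies quoted above, and the last line gives the 1.059463 ratio.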

Isn’t that most remarkable? Why would it be that way?

My intuition tells me I should look at the energy of the components. The energy theorem tells us that the total energy in a wave is just the sum of the energies in all of the Fourier components. Surely, the fundamental must carry most of the energy, and then the first overtone, and then the second. Really? Is that so?

Well… I checked online to see if there’s anything on that, but my quick check reveals there’s nothing much out there in terms of research: if you’d google ‘energy levels of overtones’, you’ll get hundreds of links to research on the vibrational modes of molecules, but nothing that’s related to music theory. So… Well… Perhaps this is my first truly original post! 🙂 Let’s go for it. 🙂

The energy in a wave is proportional to the square of its amplitude, and we must integrate over one period (T) of the oscillation. The illustration below should help you to understand what’s going on. The fundamental mode of the wave is an oscillation with a wavelength (λ₁) that is twice the length of the string (L). For the second mode, the wavelength (λ₂) is just L. For the third mode, we find that λ₃ = (2/3)·L. More in general, the wavelength of the n-th mode is λ_n = (2/n)·L.
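A small Python sketch may help to see the pattern in those wavelengths, and in the frequencies that go with them through the relation c = f·λ (wave speed equals frequency times wavelength). The string length and wave speed below are hypothetical values, chosen only so that the fundamental comes out at 110 Hz. The function names are my own.

```python
def mode_wavelength(n, length):
    """Wavelength of the n-th mode of a string fixed at both ends: (2/n)*L."""
    return 2.0 * length / n


def mode_frequency(n, length, wave_speed):
    """Frequency of the n-th mode, from wave_speed = frequency * wavelength."""
    return wave_speed / mode_wavelength(n, length)


# Hypothetical example values (not from the post): they give a 110 Hz fundamental.
LENGTH_M = 0.65
WAVE_SPEED_M_S = 143.0

for n in range(1, 5):
    lam = mode_wavelength(n, LENGTH_M)
    f = mode_frequency(n, LENGTH_M, WAVE_SPEED_M_S)
    print(f"mode {n}: wavelength = {lam:.4f} m, frequency = {f:.1f} Hz")
```

Whatever values you choose, the frequency of the n-th mode is n times the frequency of the first one.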

The illustration above shows that we’re talking sine waves here, differing in their frequency (or wavelength) only. [The speed of the wave (c), as it travels back and forth along the string, is constant, so frequency and wavelength are in that simple relationship: c = f·λ.] Simplifying and normalizing (i.e. choosing the ‘right’ units by multiplying scales with some proportionality constant), the energy of the first mode would be (proportional to):

What about the second and third modes? For the second mode, we have two oscillations per cycle, but we still need to integrate over the period of the first mode T = T₁, which is twice the period of the second mode: T₁ = 2·T₂. Hence, T₂ = (1/2)·T₁. Therefore, the argument of the sine wave (i.e. the x variable in the integral above) should go from 0 to 4π. However, we want to compare the energies of the various modes, so let’s substitute cleverly. We write:

The period of the third mode is equal to T₃ = (1/3)·T₁. Conversely, T₁ = 3·T₃. Hence, the argument of the sine wave should go from 0 to 6π. Again, we’ll substitute cleverly so as to make the energies comparable. We write:

Now that is interesting! For a so-called ideal string, whose motion is the sum of a sinusoidal oscillation at the fundamental frequency f, another at the second harmonic frequency 2·f, another at the third harmonic 3·f, etcetera, we find that the energies of the various modes are proportional to the values in the harmonic series 1, 1/2, 1/3, 1/4,… 1/n, etcetera. Again, Pythagoras’ conclusion was wrong (the ratios of the frequencies of individual notes do not respect simple ratios), but his intuition was right: the harmonic series ∑n⁻¹ (n = 1, 2,…,∞) is very relevant in describing natural phenomena. It gives us the respective energies of the various natural modes of a vibrating string! In the graph below, the values are represented as areas. It is all quite deep and mysterious really!

So now we know why we feel C4 and C5 have so much in common that we call them by the same name: C, or Do. It also helps us to understand why the E and A tones have so much in common: the third harmonic of the 110 Hz A2 string corresponds to the fundamental frequency of the E4 string: both are about 330 Hz (the equal-tempered E4 is 329.63 Hz, to be precise)! Hence, E and A have ‘energy in common’, so to speak, but less ‘energy in common’ than two successive E notes, or two successive A notes, or two successive C notes (like C4 and C5).
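How close is that match between the third harmonic of A2 and the E4 note, really? Here is a quick Python check, assuming the equal temperament tuning described above with A4 = 440 Hz. The cent (one hundredth of a semitone) is a standard unit for small pitch differences, which I use here for convenience. The function names are my own.

```python
import math

A4_HZ = 440.0   # reference pitch
A2_HZ = 110.0   # two octaves below A4


def equal_tempered_frequency(semitones_from_a4, reference_hz=A4_HZ):
    """Frequency of the note a given number of semitones above (or below) A4."""
    return reference_hz * 2 ** (semitones_from_a4 / 12)


def cents_between(freq_high, freq_low):
    """Pitch difference in cents: 1200 cents per octave, 100 per semitone."""
    return 1200 * math.log2(freq_high / freq_low)


third_harmonic_a2 = 3 * A2_HZ              # third harmonic of the A2 string
e4 = equal_tempered_frequency(-5)          # E4 sits 5 semitones below A4

print(f"third harmonic of A2: {third_harmonic_a2:.2f} Hz")
print(f"equal-tempered E4:    {e4:.2f} Hz")
print(f"difference:           {cents_between(third_harmonic_a2, e4):.2f} cents")
```

So the two frequencies differ by about two hundredths of a semitone only: that is the small price equal temperament pays for being able to play in every key.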

[…] Well… Sort of… In fact, the analysis above is quite appealing but – I hate to say it – it’s wrong, as I explain in my post scriptum to this post. It’s like Pythagoras’ number theory of the Universe: the intuition behind is OK, but the conclusions aren’t quite right. 🙂

Ideality versus reality

We’ve been talking ideal strings. Actual tones coming out of actual strings have a quality, which is determined by the relative amounts of the various harmonics that are present in the tone, which is not some simple sum of sinusoidal functions. Actual tones have a waveform that may resemble something like the wavefunction I presented in my previous post, when discussing Fourier analysis. Let me insert that illustration once again (and let me also acknowledge its source once more: it’s Wikipedia). The red waveform is the sum of six sine functions, with harmonically related frequencies, but with different amplitudes. Hence, the energy levels of the various modes will not be proportional to the values in that harmonic series ∑n⁻¹, with n = 1, 2,…,∞.

Das wohltemperierte Klavier

Nothing in what I wrote above is related to questions of taste like: why do I seldom select a classical music channel on my online radio station? Or why am I not into hip hop, even if my taste for music is quite similar to that of the common crowd (as evidenced from the fact that I like ‘Listeners’ Top’ hit lists)?

Not sure. It’s an unresolved topic, I guess—involving rhythm and other ‘structures’ I did not mention. Indeed, all of the above just tells us a nice story about the structure of the language of music: it’s a story about the tones, and how they are related to each other. That relation is, in essence, an exponential function with base 2. That’s all. Nothing more, nothing less. It’s remarkably simple and, at the same time, endlessly deep. 🙂 But so it is not a story about the structure of a musical piece itself, of a pop song of Ellie Goulding, for instance, or one of Bach’s preludes or fugues.

That brings me back to the original question I raised in my previous post. It’s a question which was triggered, a long time ago, when I tried to read Douglas Hofstadter’s Gödel, Escher, Bach, frustrated because my brother seemed to understand it, and I didn’t. So I put it down, and never ever looked at it again. So what is it really about that famous piece of Bach?

Frankly, I am still not sure. As I mentioned in my previous post, musicians were struggling to find a tuning system that would allow them to easily transpose musical compositions. Transposing music amounts to changing the so-called key of a musical piece, so that’s moving the whole piece up or down in pitch by some constant interval that is not equal to an octave. It’s a piece of cake now. In fact, increasing or decreasing the playback speed of a recording also amounts to transposing a piece: an increase or decrease of the playback speed by 6% will shift the pitch up or down by about one semitone. Why? Well… Go back to what I wrote above about that 12th root of 2. We’ve got the right tuning system now, and so everything is easy. Logarithms are great! 🙂
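The playback speed claim is just a logarithm, so it’s easy to verify. The Python sketch below converts a speed factor into a pitch shift in semitones, assuming that speeding up a recording scales all frequencies by the same factor. The function names are my own.

```python
import math


def pitch_shift_in_semitones(speed_factor):
    """Pitch shift (in semitones) when all frequencies are scaled by speed_factor."""
    return 12 * math.log2(speed_factor)


def speed_factor_for_semitones(semitones):
    """Inverse relation: the speed factor that shifts the pitch by that many semitones."""
    return 2 ** (semitones / 12)


print(f"+6% speed        -> {pitch_shift_in_semitones(1.06):+.4f} semitones")
print(f"-6% speed        -> {pitch_shift_in_semitones(0.94):+.4f} semitones")
print(f"exactly +1 semitone needs a speed factor of {speed_factor_for_semitones(1):.6f}")
```

So a 6% increase is just a tiny bit more than one semitone up, while a 6% decrease overshoots one semitone down by a little more: the exact factors are 2^(1/12) and its reciprocal.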

Back to Bach. Despite their admiration for the Greek ideas around aesthetics – and, most notably, their fascination with harmonic ratios! – (almost) all Renaissance musicians were struggling with the so-called Pythagorean tuning system, which was used until the 18th century and which was based on a correct observation (similar strings, under the same tension but differing in length, sound ‘pleasant’ when sounded together if – and only if – the ratio of the length of the strings is like 1:2, 2:3, 3:4, 3:5, 4:5, etcetera) but a wrong conclusion (the frequencies of musical tones should also obey the same harmonic ratios), and Bach’s so-called ‘good’ temperament tuning system was designed such that the piece could, indeed, be played in most keys without sounding… well… out of tune. 🙂

Having said that, the modern ‘equal temperament’ tuning system, which prescribes that tuning should be done such that the notes are in the above-described simple logarithmic relation to each other, had already been invented. So the true question is: why didn’t Bach embrace it? Why did he stick to ratios? Why did it take so long for the right system to be accepted?

I don’t know. If you google, you’ll find a zillion possible explanations. As far as I can see, most are rather mystical. More importantly, most of them do not mention many facts. My explanation is rather simple: while Bach was, obviously, a musical genius, he may not have understood what an exponential, or a logarithm, is all about. Indeed, a quick read of summary biographies reveals that Bach studied a wide range of topics, like Latin and Greek, and theology, of course! But math is not mentioned. He didn’t write about tuning and all that: all of his time went to writing musical masterpieces!

What the biographies do mention is that he always found other people’s tunings unsatisfactory, and that he tuned his harpsichords and clavichords himself. Now that is quite revealing, I’d say! In my view, Bach couldn’t care less about the ratios. He knew something was wrong with the Pythagorean system (or the variants that were then used, which are referred to as meantone temperament) and, as a musical genius, he probably ended up tuning by ear. [For those who’d wonder what I am talking about, let me quickly insert a Wikipedia graph illustrating the difference between the Pythagorean system (and two of these meantone variants) and the equal temperament tuning system in use today.]

So… What’s the point I am trying to make? Well… Frankly, I’d bet Bach’s own tuning was actually equal temperament, and so he should have named his masterpiece Das gleichtemperierte Klavier. Then we wouldn’t have all that ‘noise’ around it. 🙂

Post scriptum: Did you like the argument on the respective energy levels of the harmonics of an ideal string? Too bad. It’s wrong. I made a common mistake: when substituting variables in the integral, I ‘forgot’ to substitute the lower and upper bound of the interval over which I was integrating the function. The calculation below corrects the mistake, and so it does the required substitutions, for the first three modes at least. What’s going on here? Well… Nothing much… I just integrate over the length L taking a snapshot at t = 0 (as mentioned, we can always shift the origin of our independent variable, so here we do it for time and so it’s OK). Hence, the argument of our wave function sin(kx − ωt) reduces to kx, with k = 2π/λ, and λ₁ = 2L, λ₂ = L, λ₃ = (2/3)·L for the first, second and third mode respectively. [As for solving the integral of the sine squared, you can google the formula, and please do check my substitutions. They should be OK, but… Well… We never know, do we? :-)]

[…] No… This doesn’t make all that much sense either. Those integrals yield the same energy for all three modes. Something must be wrong: shorter wavelengths (i.e. higher frequencies) are associated with higher energy levels. Full stop. So the ‘solution’ above can’t be right… […] You’re right. That’s where the time aspect comes into play. We were taking a snapshot, indeed, and the mean value of the sine squared function is 1/2 = 0.5, as should be clear from Pythagoras’ theorem: cos²x + sin²x = 1. So what I was doing is like integrating a constant function over the same-length interval. So… Well… Yes: no wonder I get the same value again and again.
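If you don’t trust the integrals, a numerical check does the job too. The Python sketch below integrates sin²(k·x) over the length of the string for the first few modes, with k = 2π/λ and λ = (2/n)·L as above. I use a plain midpoint rule and set L = 1 for convenience. Both choices, as well as the function name, are my own assumptions for the illustration.

```python
import math


def snapshot_integral(n, length=1.0, steps=10_000):
    """Midpoint-rule estimate of the integral of sin^2(k_n * x) over [0, length]."""
    wavelength = 2.0 * length / n          # wavelength of the n-th mode
    k = 2 * math.pi / wavelength           # wavenumber of the n-th mode
    dx = length / steps
    # Sum the integrand at the midpoint of each small interval
    return sum(math.sin(k * (i + 0.5) * dx) ** 2 for i in range(steps)) * dx


for n in (1, 2, 3):
    print(f"mode {n}: integral = {snapshot_integral(n):.6f}")
```

All modes give L/2, i.e. the mean value of the sine squared (1/2) times the length of the interval, which is exactly the point made above.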

[…]

We need to integrate over the same time interval. You could do that, as an exercise, but there’s a more direct approach to it: the energy of a wave is directly proportional to its frequency, so we write: E ∼ f. If the frequency doubles, triples, quadruples etcetera, then its energy doubles, triples, quadruples etcetera too. But – remember – we’re talking one string only here, with a fixed wave speed c = λ·f – so f = c/λ (read: the frequency is inversely proportional to the wavelength) – and, therefore (assuming the same (maximum) amplitude), we get that the energy level of each mode is inversely proportional to the wavelength, so we find that E ∼ 1/λ.

Now, with direct or inverse proportionality relations, we can always invent some new unit that makes the relationship an identity, so let’s do that and turn it into an equation indeed. [And, yes, sorry… I apologize again to your old math teacher: he may not quite agree with the shortcut I am taking here, but he’ll be able to justify the logic behind it.] So… Remembering that λ₁ = 2L, λ₂ = L, λ₃ = (2/3)·L, etcetera, we can then write:

E₁ = (1/2)/L, E₂ = (2/2)/L, E₃ = (3/2)/L, E₄ = (4/2)/L, E₅ = (5/2)/L,…, E_n = (n/2)/L,…

That’s a really nice result, because… Well… In quantum theory, the permitted energy levels of a harmonic oscillator are equally spaced, with the interval between them equal to h·f or, what amounts to the same, ħ·ω (if you use the angular frequency to describe a wave (so that’s ω = 2π·f), then Planck’s constant (h) becomes ħ = h/2π). So here we’ve got equally spaced energy levels too, with the interval between the various energy levels equal to (1/2)/L.

You’ll say: So what? Frankly, if this doesn’t amaze you, stop reading—but if this doesn’t amaze you, you actually stopped reading a long time ago. 🙂 Look at what we’ve got here. We didn’t specify anything about that string, so we didn’t care about its materials or diameter or tension or how it was made (a wound guitar string is a terribly complicated thing!) or about whatever. Still, we know its fundamental (or normal) modes, and their frequency or nodes or energy or whatever depend on the length of the string only, with the ‘fundamental’ unit of energy being equal to the reciprocal length. Full stop. So all is just a matter of size and proportions. In other words, it’s all about structure. Absolute measurements don’t matter.

You may say: Bull****. What’s the conclusion? You still didn’t tell me anything about how the total energy of the wave is supposed to be distributed over its normal modes!

That’s true. I didn’t. Why? Well… I am not sure, really. I presented a lot of stuff here, but I did not present a clear and unambiguous answer as to how the total energy of a string is distributed over its modes. Not for actual strings, nor for ideal strings. Let me be honest: I don’t know. I really don’t. Having said that, my gut instinct that most of the energy – of, let’s say, a C4 note – should be in the primary mode (i.e. in the fundamental frequency) must be right: otherwise we would not call it a C4 note. So let’s try to make some assumptions. However, before doing so, let’s first briefly touch base with reality.

For actual strings (or actual musical sounds), I suspect the analysis can be quite complicated, as evidenced by the following illustration, which I took from one of the many interesting sites on this topic. Let me quote the author: “A flute is essentially a tube that is open at both ends. Air is blown across one end and sound comes out the other. The harmonics are all whole number multiples of the fundamental frequency (436 Hz, a slightly flat A₄ — a bit lower in frequency than is normally acceptable). Note how the second harmonic is nearly as intense as the fundamental. [My = blog writer’s 🙂 italics] This strong second harmonic is part of what makes a flute sound like a flute.”

Hmmm… What I see in the graph is a second harmonic that is actually more intense than the fundamental, so what’s that all about? So can we actually associate a specific frequency to that tone? Not sure. So we’re in trouble already.

If reality doesn’t match our thinking, what about ideality? Hmmm… What to say? As for ideal strings – or ideal flutes 🙂 – I’d venture to say that the most obvious distribution of energy over the various modes (or harmonics, when we’re talking sound) would be the Boltzmann distribution.

Huh? Yes. Have a look at one of my posts on statistical mechanics. It’s a weird thing: the distribution of molecular speeds in a gas, or the density of the air in the atmosphere, or whatever involving many particles and/or a great degree of complexity (so many, or such a degree of complexity, that only some kind of statistical approach to the problem works): all that involves Boltzmann’s Law, which basically says the distribution function will be a function of the energy levels involved: f = e^(–energy). So… Well… Yes. It’s the logarithmic scale again. It seems to govern the Universe. 🙂

Huh? Yes. That’s why I think: the distribution of the total energy of the oscillation should be some Boltzmann function, so it should depend on the energy of the modes: most of the energy will be in the lower modes, and most of the most in the fundamental. […] Hmmm… It again begs the question: how much exactly?

Well… The Boltzmann distribution strongly resembles the ‘harmonic’ distribution shown above (1, 1/2, 1/3, 1/4 etc), but it’s not quite the same. The graph below shows how they are similar and dissimilar in shape. You can experiment yourself with coefficients and all that, but your conclusion will be the same. As they say in Asia: they are “same-same but different.” 🙂 […] It’s like the ‘good’ and ‘equal’ temperament used when tuning musical instruments: the ‘good’ temperament – which is based on harmonic ratios – is good, but not good enough. Only the ‘equal’ temperament obeys the logarithmic scale and, therefore, is perfect. So, as I mentioned already, while my assumption isn’t quite right (the distribution is not harmonic, in the Pythagorean sense), the intuition behind it is OK. So it’s just like Pythagoras’ number theory of the Universe. Having said that, I’ll leave it to you to draw the correct conclusions from it. 🙂
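If you want to experiment with the two shapes yourself, a minimal sketch could look like the one below. Everything in it is an assumption made for illustration only: the number of modes, the coefficient `beta` in the exponent, and the idea of using the mode number n as the ‘energy’ of mode n (which is what E_n = (n/2)/L suggests, up to a constant). Both sets of weights are normalized so that they add up to 1.

```python
import math

def normalized(weights):
    """Scale a list of positive weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def harmonic_weights(n_modes):
    """The 'harmonic' distribution 1, 1/2, 1/3, ... (normalized)."""
    return normalized([1 / n for n in range(1, n_modes + 1)])

def boltzmann_weights(n_modes, beta=1.0):
    """Boltzmann-like weights e^(-beta*n), with the mode number n as energy (an assumption)."""
    return normalized([math.exp(-beta * n) for n in range(1, n_modes + 1)])

n_modes = 6  # illustrative choice
for n, (h, b) in enumerate(zip(harmonic_weights(n_modes), boltzmann_weights(n_modes)), start=1):
    print(f"mode {n}: harmonic = {h:.3f}, Boltzmann = {b:.3f}")
```

Both lists decrease with the mode number, but the exponential one falls off much faster, so it puts a larger share of the total in the fundamental. Changing `beta` changes how fast it falls off.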

Music and Physics

Original post:

My first working title for this post was Music and Modes. Yes. Modes. Not moods. The relation between music and moods is an interesting research topic as well but it’s not what I am going to write about. 🙂

It started with me thinking I should write something on modes indeed, because the concept of a mode of a wave, or any oscillator really, is quite central to physics, both in classical physics as well as in quantum physics (quantum-mechanical systems are analyzed as oscillators too!). But I wondered how to approach it, as it’s a rather boring topic if you look at the math only. But then I was flying back from Europe, to Asia, where I live and, as I am also playing a bit of guitar, I suddenly wanted to know why we like music. And then I thought that’s a question you may have asked yourself at some point of time too! And so then I thought I should write about modes as part of a more interesting story: a story about music—or, to be precise, a story about the physics behind music. So… Let’s go for it.

Philosophy versus physics

There is, of course, a very simple answer to the question of why we like music: we like music because it is music. If it would not be music, we would not like it. That’s a rather philosophical answer, and it probably satisfies most people. However, for someone studying physics, that answer can surely not be sufficient. What’s the physics behind it? While on the plane, I reviewed Feynman’s Lecture on sound waves, combined it with some other stuff I googled when I arrived, and then I wrote this post, which gives you a much less philosophical answer. 🙂

The observation at the center of the discussion is deceptively simple: why is it that similar strings (i.e. strings made of the same material, with the same thickness, etc), under the same tension but differing in length, sound ‘pleasant’ when sounded together if – and only if – the ratio of the length of the strings is like 1:2, 2:3, 3:4, 3:5, 4:5, etc (i.e. like whatever other ratio of two small integers)?

You probably wonder: is that the question, really? It is. The question is deceptively simple indeed because, as you will see in a moment, the answer is quite complicated. So complicated, in fact, that the Pythagoreans didn’t have any answer. Nor did anyone else for that matter—until the 18th century or so, when musicians, physicists and mathematicians alike started to realize that a string (of a guitar, or a piano, or whatever instrument Pythagoras was thinking of at the time), or a column of air (in a pipe organ or a trumpet, for example), or whatever other thing that actually creates the musical tone, actually oscillates at numerous frequencies simultaneously.

The Pythagoreans did not suspect that a string, in itself, is a rather complicated thing – something which physicists refer to as a harmonic oscillator – and that its sound, therefore, is actually produced by many frequencies, instead of only one. The concept of a pure note, i.e. a tone that is free of harmonics (i.e. free of all other frequencies, except for the fundamental frequency) also didn’t exist at the time. And if it did, they would not have been able to produce a pure tone anyway: producing pure tones – or notes, as I’ll call them, somewhat inaccurately (I should say: a pure pitch) – is remarkably complicated, and they do not exist in Nature. If the Pythagoreans had been able to produce pure tones, they would have observed that pure tones do not give any sensation of consonance or dissonance, even if their relative frequencies respect those simple ratios. Indeed, repeated experiments, in which such pure tones are being produced, have shown that human beings can’t really say whether it’s a musical sound or not: it’s just sound, and it’s neither pleasant (or consonant, we should say) nor unpleasant (i.e. dissonant).

The Pythagorean observation is valid, however, for actual (i.e. non-pure) musical tones. In short, we need to distinguish between tones and notes (i.e. pure tones): they are two very different things, and the gist of the whole argument is that musical tones coming out of one (or more) string(s) under tension are full of harmonics and, as I’ll explain in a minute, that’s what explains the observed relation between the lengths of those strings and the phenomenon of consonance (i.e. sounding ‘pleasant’) or dissonance (i.e. sounding ‘unpleasant’).

Of course, it’s easy to say what I say above: it’s 2015 now, and so we have the benefit of hindsight. Back then – so that’s more than 2,500 years ago! – the simple but remarkable fact that the lengths of similar strings should respect some simple ratio if they are to sound ‘nice’ together, triggered a fascination with number theory (in fact, the Pythagoreans actually established the foundations of what is now known as number theory). Indeed, Pythagoras felt that similar relationships should also hold for other natural phenomena! To mention just one example, the Pythagoreans believed that the orbits of the planets would also respect such simple numerical relationships, which is why they talked of the ‘music of the spheres’ (Musica Universalis).

We now know that the Pythagoreans were wrong. The proportions in the movements of the planets around the Sun do not respect simple ratios and, with the benefit of hindsight once again, it is regrettable that it took many courageous and brilliant people, such as Galileo Galilei and Copernicus, to convince the Church of that fact. 😦 Also, while Pythagoras’ observations in regard to the sounds coming out of whatever strings he was looking at were correct, his conclusions were wrong: the observation does not imply that the frequencies of musical notes should all be in some simple ratio one to another.

Let me repeat what I wrote above: the frequencies of musical notes are not in some simple relationship one to another. The frequency scale for all musical tones is logarithmic and, while that implies that we can, effectively, do some tricks with ratios based on the properties of the logarithmic scale (as I’ll explain in a moment), the so-called ‘Pythagorean’ tuning system, which is based on simple ratios, was plain wrong, even if it – or some variant of it (instead of the 3:2 ratio, musicians used the 5:4 ratio from about 1510 onwards) – was generally used until the 18th century! In short, Pythagoras was wrong indeed—in this regard at least: we can’t do much with those simple ratios.

Having said that, Pythagoras’ basic intuition was right, and that intuition is still very much what drives physics today: it’s the idea that Nature can be described, or explained (whatever that means), by quantitative relationships only. Let’s have a look at how it actually works for music.

Tones, noise and notes

Let’s first define and distinguish tones and notes. A musical tone is the opposite of noise, and the difference between the two is that musical tones are periodic waveforms, so they have a period T, as illustrated below. In contrast, noise is a non-periodic waveform. It’s as simple as that.

Now, from previous posts, you know we can write any periodic function as the sum of a potentially infinite number of simple harmonic functions, and that this sum is referred to as the Fourier series. I am just noting it here, so don’t worry about it for now. I’ll come back to it later.

You also know we have seven musical notes: Do-Re-Mi-Fa-Sol-La-Si or, more common in the English-speaking world, C-D-E-F-G-A-B. And then it starts again with C (or Do). So we have two notes, separated by an interval which is referred to as an octave (from the Latin octavus, i.e. eighth), with six notes in-between, so that’s eight notes in total. However, you also know that there are notes in-between, except between E and F and between B and C. They are referred to as semitones or half-steps. I prefer the term ‘half-step’ over ‘semitone’, because we’re talking notes really, not tones.

We have, for example, F-sharp (denoted by F#), which we can also call G-flat (denoted by Gb). It’s the same thing: a sharp # raises a note by a semitone (aka half-step), and a flat b lowers it by the same amount, so F# is Gb. That’s what’s shown below: in an octave, we have eight notes but twelve half-steps.

Let’s now look at the frequencies. The frequency scale above (expressed in oscillations per second, so that’s the hertz unit) is a logarithmic scale: frequencies double as we go from one octave to another: the frequency of the C4 note above (the so-called middle C) is 261.626 Hz, while the frequency of the next C note (C5) is double that: 523.251 Hz. [Just in case you’d want to know: the numbers 4 and 5 refer to the position of the key on a standard 88-key piano keyboard: C4 is the fourth C key on the piano.]

Now, if we equate the interval between C4 and C5 with 1 (so the octave is our musical ‘unit’), then the interval between the twelve half-steps is, obviously, 1/12. Why? Because we have 12 half-steps in our musical unit. You can also easily verify that, because of the way logarithms work, the ratio of the frequencies of two notes that are separated by one half-step (between D# and E, for example) will be equal to 2^(1/12). Likewise, the ratio of the frequencies of two notes that are separated by n half-steps is equal to 2^(n/12). [In case you’d doubt, just do an example. For instance, if we’d denote the frequency of C4 as f₀, and the frequency of C# as f₁ and so on (so the frequency of D is f₂, the frequency of C5 is f₁₂, and everything else is in-between), then we can write the f₂/f₀ ratio as f₂/f₀ = (f₂/f₁)·(f₁/f₀) = 2^(1/12)·2^(1/12) = 2^(2/12) = 2^(1/6). I must assume you’re smart enough to generalize this result yourself, and that f₁₂/f₀ is, obviously, equal to 2^(12/12) = 2¹ = 2, which is what it should be!]
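The half-step rule is easy to check numerically. The sketch below assumes the usual A4 = 440 Hz reference pitch and counts half-steps from there (C4 is 9 half-steps below A4, D4 is 7 below, C5 is 3 above); the function name `note_frequency` is just an illustrative choice.

```python
A4_HZ = 440.0  # assumed reference pitch

def note_frequency(half_steps_from_a4, reference_hz=A4_HZ):
    """Frequency of the note n half-steps above (negative: below) the reference."""
    return reference_hz * 2 ** (half_steps_from_a4 / 12)

c4 = note_frequency(-9)  # middle C
d4 = note_frequency(-7)  # two half-steps above C4
c5 = note_frequency(3)   # twelve half-steps above C4

print(f"C4 = {c4:.3f} Hz, C5 = {c5:.3f} Hz, C5/C4 = {c5 / c4:.3f}")
print(f"D4/C4 = {d4 / c4:.6f}, 2^(1/6) = {2 ** (1 / 6):.6f}")
```

The C4 and C5 values come out at the 261.626 Hz and 523.251 Hz quoted above, and the two-half-step ratio matches 2^(1/6).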

Now, because the frequencies of the various C notes are expressed as a number involving some decimal fraction (like 523.251 Hz, and the 0.251 is actually an approximation only), and because they are, therefore, a bit hard to read and/or work with, I’ll illustrate the next idea – i.e. the concept of harmonics – with the A instead of the C. 🙂

Harmonics

The lowest A on a piano is denoted by A0, and its frequency is 27.5 Hz. Lower A notes exist (we have one at 13.75 Hz, for instance) but we don’t use them, because they are near (or actually beyond) the limit of the lowest frequencies we can hear. So let’s stick to our grand piano and start with that 27.5 Hz frequency. The next A note is A1, and its frequency is 55 Hz. We then have A2, which is like the A on my (or your) guitar: its frequency is equal to 2×55 = 110 Hz. The next is A3, for which we double the frequency once again: we’re at 220 Hz now. The next one is the A in the illustration of the C scale above: A4, with a frequency of 440 Hz.

[Let me, just for the record, note that the A4 note is the standard tuning pitch in Western music. Why? Well… There’s no good reason really, except convention. Indeed, we can derive the frequency of any other note from that A4 note using our formula for the ratio of frequencies but, because of the properties of a logarithmic function, we could do the same using whatever other note really. It’s an important point: there’s no such thing as an absolute reference point in music: once we define our musical ‘unit’ (so that’s the so-called octave in Western music), and how many steps we want to have in-between (so that’s 12 steps—again, in Western music, that is), we get all the rest. That’s just how logarithms work. So music is all about structure, i.e. mathematical relationships. Again, Pythagoras’ conclusions were wrong, but his intuition was right.]

Now, the notes we are talking about here are all so-called pure tones. In fact, when I say that the A on our guitar is referred to as A2 and that it has a frequency of 110 Hz, then I am actually making a huge simplification. Worse, I am lying when I say that: when you play a string on a guitar, or when you strike a key on a piano, all kinds of other frequencies – so-called harmonics – will resonate as well, and that’s what gives the quality to the sound: it’s what makes it sound beautiful. So the fundamental frequency (aka the first harmonic) is 110 Hz alright but we’ll also have second, third, fourth, etc harmonics with frequency 220 Hz, 330 Hz, 440 Hz, etcetera. In music, the basic or fundamental frequency is referred to as the pitch of the tone and, as you can see, I often use the term ‘note’ (or pure tone) as a synonym for pitch, which is more or less OK, but not quite correct actually. [However, don’t worry about it: my sloppiness here does not affect the argument.]

What’s the physics behind? Look at the illustration below (I borrowed it from the Physics Classroom site). The thick black line is the string, and the wavelength of its fundamental frequency (i.e. the first harmonic) is twice its length, so we write λ₁ = 2·L or, the other way around, L = (1/2)·λ₁. Now that’s the so-called first mode of the string. [One often sees the term fundamental or natural or normal mode, but the adjective is not necessary really. In fact, I find it confusing, although I sometimes find myself using it too.]

We also have a second, third, etc mode, depicted below, and these modes correspond to the second, third, etc harmonic respectively.

For the second, third, etc mode, the relationship between the wavelength and the length of the string is, obviously, the following: L = (2/2)·λ₂ = λ₂, L = (3/2)·λ₃, etc. More in general, for the nth mode, L will be equal to L = (n/2)·λ_n, with n = 1, 2, etcetera. In fact, because L is supposed to be some fixed length, we should write it the other way around: λ_n = (2/n)·L.

What does it imply for the frequencies? We know that the speed of the wave – let’s denote it by c – as it travels up and down the string, is a property of the string, and it’s a property of the string only. In other words, it does not depend on the frequency. Now, the wave velocity is equal to the frequency times the wavelength, always, so we have c = f·λ. To take the example of the (classical) guitar string: its length is 650 mm, i.e. 0.65 m. Hence, the identities λ₁ = (2/1)·L, λ₂ = (2/2)·L, λ₃ = (2/3)·L etc become λ₁ = (2/1)·0.65 = 1.3 m, λ₂ = (2/2)·0.65 = 0.65 m, λ₃ = (2/3)·0.65 = 0.433.. m and so on. Now, combining these wavelengths with the above-mentioned frequencies, we get the wave velocity c = (110 Hz)·(1.3 m) = (220 Hz)·(0.65 m) = (330 Hz)·(0.433.. m) = 143 m/s.
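The arithmetic for the guitar string can be reproduced in a few lines. The sketch below uses the values from the text (L = 0.65 m, fundamental at 110 Hz); the function name `string_modes` is an illustrative choice.

```python
def string_modes(length_m, fundamental_hz, n_modes=4):
    """Return (n, wavelength, frequency, wave speed) for the first n_modes modes."""
    modes = []
    for n in range(1, n_modes + 1):
        wavelength = 2 * length_m / n   # lambda_n = (2/n)*L
        frequency = n * fundamental_hz  # n-th harmonic
        modes.append((n, wavelength, frequency, wavelength * frequency))
    return modes

for n, wavelength, frequency, speed in string_modes(0.65, 110.0):
    print(f"mode {n}: lambda = {wavelength:.3f} m, f = {frequency:.0f} Hz, c = {speed:.1f} m/s")
```

Each mode gives the same product f·λ, i.e. the 143 m/s wave speed, which is just another way of saying that the speed is a property of the string and not of the mode.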

Let me now get back to Pythagoras’ string. You should note that the frequencies of the harmonics produced by a simple guitar string are related to each other by simple whole number ratios. Indeed, the frequencies of the first and second harmonics are in a simple 2 to 1 ratio (2:1). The second and third harmonics have a 3:2 frequency ratio. The third and fourth harmonics a 4:3 ratio. The fifth and fourth harmonic 5:4, and so on and so on. They have to be. Why? Because the harmonics are simple multiples of the basic frequency. Now that is what’s really behind Pythagoras’ observation: when he was sounding similar strings with the same tension but different lengths, he was making sounds with the same harmonics. Nothing more, nothing less.

Let me be quite explicit here, because the point that I am trying to make here is somewhat subtle. Pythagoras’ string is Pythagoras’ string: he talked about similar strings. So we’re not talking some actual guitar or a piano or whatever other string instrument. The strings on (modern) string instruments are not similar, and they do not have the same tension. For example, the six strings of a guitar do not differ in length (they’re all 650 mm) but they’re different in tension. The six strings on a classical guitar also have a different diameter, and the first three strings are plain strings, as opposed to the bottom strings, which are wound. So the strings are not similar but very different indeed. To illustrate the point, I copied the values below for just one of the many commercially available guitar string sets. It’s the same for piano strings. While they are somewhat more simple (they’re all made of piano wire, which is very high quality steel wire basically), they also differ, not only in length but in diameter as well, typically ranging from 0.85 mm for the highest treble strings to 8.5 mm (so that’s ten times 0.85 mm) for the lowest bass notes.

In short, Pythagoras was not playing the guitar or the piano (or whatever other more sophisticated string instrument that the Greeks surely must have had too) when he was thinking of these harmonic relationships. The physical explanation behind his famous observation is, therefore, quite simple: musical tones that have the same harmonics sound pleasant, or consonant, we should say—from the Latin con-sonare, which, literally, means ‘to sound together’ (from sonare = to sound and con = with). And otherwise… Well… Then they do not sound pleasant: they are dissonant.

To drive the point home, let me emphasize that, when we’re plucking a string, we produce a sound consisting of many frequencies, all in one go. One can see it in practice: if you strike a lower A string on a piano – let’s say the 110 Hz A2 string – then its second harmonic (220 Hz) will make the A3 string vibrate too, because it’s got the same frequency! And then its fourth harmonic will make the A4 string vibrate too, because they’re both at 440 Hz. Of course, the strength of these other vibrations (or their amplitude we should say) will depend on the strength of the other harmonics and we should, of course, expect that the fundamental frequency (i.e. the first harmonic) will absorb most of the energy. So we pluck one string, and so we’ve got one sound, one tone only, but numerous notes at the same time!

In this regard, you should also note that the third harmonic of our 110 Hz A2 string corresponds to the fundamental frequency of the E4 tone: both are at 330 Hz, or very nearly so (the equal-tempered E4 is at about 329.6 Hz)! And, of course, the harmonics of E, such as its second harmonic (2·330 Hz = 660 Hz) correspond to higher harmonics of A too! To be specific, the second harmonic of our E string is equal to the sixth harmonic of our A2 string. If your guitar is any good, and if your strings are of reasonable quality too, you’ll actually see it: the (lower) E and A strings co-vibrate if you play the A major chord, but by striking the upper four strings only. So we’ve got energy – motion really – being transferred from the four strings you do strike to the two strings you do not strike! You’ll say: so what? Well… If you’ve got any better proof of the actuality (or reality) of various frequencies being present at the same time, please tell me! 🙂
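To see which harmonics two tones share, one can simply list them and compare. The sketch below takes the idealized values from the text (110 Hz for A2 and exactly 330 Hz for E, which is an assumption: it ignores the small equal-temperament offset); the function names and the tolerance are illustrative choices.

```python
def harmonics(fundamental_hz, count):
    """The first `count` harmonics: f, 2f, 3f, ..."""
    return [n * fundamental_hz for n in range(1, count + 1)]

def shared_harmonics(f_a, f_b, count=12, tolerance_hz=1e-9):
    """List (harmonic number of a, harmonic number of b, frequency) for matching harmonics."""
    matches = []
    for i, freq_a in enumerate(harmonics(f_a, count), start=1):
        for j, freq_b in enumerate(harmonics(f_b, count), start=1):
            if abs(freq_a - freq_b) <= tolerance_hz:
                matches.append((i, j, freq_a))
    return matches

for n_a, n_e, freq in shared_harmonics(110, 330):
    print(f"harmonic {n_a} of A = harmonic {n_e} of E = {freq} Hz")
```

Every third harmonic of A coincides with a harmonic of E, including the 660 Hz match (sixth harmonic of A, second harmonic of E) mentioned above.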

So that’s why A and E sound very well together (A, E and C#, played together, make up the so-called A major chord): our ear likes matching harmonics. And so that’s why we like musical tones, or why we define those tones as being musical! 🙂 Let me summarize it once more: musical tones are composite sound waves, consisting of a fundamental frequency and so-called harmonics (so we’ve got many notes or pure tones altogether in one musical tone). Now, when other musical tones have harmonics that are shared, and we sound those notes too, we get the sensation of harmony, i.e. the combination sounds consonant.

Now, it’s not difficult to see that we will always have such shared harmonics if we have similar strings, with the same tension but different lengths, being sounded together. In short, what Pythagoras observed has nothing much to do with notes, but with tones. Let’s go a bit further in the analysis now by introducing some more math. And, yes, I am very sorry: it’s the dreaded Fourier analysis indeed! 🙂

Fourier analysis

You know that we can decompose any periodic function into a sum of a (potentially infinite) series of simple sinusoidal functions, as illustrated below. I took the illustration from Wikipedia: the red function s₆(x) is the sum of six sine functions of different amplitudes and (harmonically related) frequencies. The so-called Fourier transform S(f) (in blue) relates the six frequencies with the respective amplitudes.

In light of the discussion above, it is easy to see what this means for the sound coming from a plucked string. Using the angular frequency notation (so we write everything using ω instead of f), we know that the normal or natural modes of oscillation have frequencies ω = 2π/T = 2πf (so that’s the fundamental frequency or first harmonic), 2ω (second harmonic), 3ω (third harmonic), and so on and so on.

Now, there’s no reason to assume that all of the sinusoidal functions that make up our tone should have the same phase: some phase shift Φ may be there and, hence, we should write our sinusoidal function not as cos(ωt), but as cos(ωt + Φ) in order to ensure our analysis is general enough. [Why not a sine function? It doesn’t matter: the cosine and sine function are the same, except for another phase shift of 90° = π/2.] Now, from our geometry classes, we know that we can re-write cos(ωt + Φ) as

cos(ωt + Φ) = [cos(Φ)cos(ωt) – sin(Φ)sin(ωt)]

We have a lot of these functions of course – one for each harmonic, in fact – and, hence, we should use subscripts, which is what we do in the formula below, which says that any function f(t) that is periodic with the period T can be written mathematically as:
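The formula itself seems to be missing at this point. Assuming it is the standard Fourier series (with a zero-frequency term a₀, and ω = 2π/T the angular frequency of the first harmonic), it would read:

```latex
f(t) = a_0 + \sum_{n=1}^{\infty} \bigl[\, a_n \cos(n\omega t) + b_n \sin(n\omega t) \,\bigr],
\qquad \omega = \frac{2\pi}{T}
```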

You may wonder: what’s that period T? It’s the period of the fundamental mode, i.e. the first harmonic. Indeed, the period of the second, third, etc harmonic will only be one half, one third etcetera of the period of the first harmonic. Indeed, T₂ = (2π)/(2ω) = (1/2)·(2π)/ω = (1/2)·T₁, and T₃ = (2π)/(3ω) = (1/3)·(2π)/ω = (1/3)·T₁, and so on. However, it’s easy to see that these functions also repeat themselves after two, three, etc periods respectively. So all is alright, and the general idea behind the Fourier analysis is further illustrated below. [Note that both the formula as well as the illustration below (which I took from Feynman’s Lectures) add a ‘zero-frequency term’ a₀ to the series. That zero-frequency term will usually be zero for a musical tone, because the ‘zero’ level of our tone will be zero indeed. Also note that the a_n and b_n coefficients are, of course, equal to a_n = cos Φ_n and b_n = –sin Φ_n, so you can relate the illustration and the formula easily.]
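Both claims (the a_n and b_n coefficients reproduce the phase-shifted cosine, and the sum of harmonics repeats with the period T of the first harmonic) can be checked numerically. The sketch below is only an illustration: the 110 Hz fundamental, the three phase shifts and the unit amplitudes are arbitrary assumptions, and the function names are made up for the occasion.

```python
import math

def phase_form(n, omega, phase, t):
    """n-th harmonic written with an explicit phase shift: cos(n*omega*t + phase)."""
    return math.cos(n * omega * t + phase)

def coefficient_form(n, omega, phase, t):
    """The same harmonic written as a_n*cos(n*omega*t) + b_n*sin(n*omega*t)."""
    a_n = math.cos(phase)
    b_n = -math.sin(phase)
    return a_n * math.cos(n * omega * t) + b_n * math.sin(n * omega * t)

def tone(t, omega, phases):
    """Sum of unit-amplitude harmonics; phases[0] belongs to the first harmonic."""
    return sum(coefficient_form(n, omega, phi, t) for n, phi in enumerate(phases, start=1))

T = 1 / 110.0               # period of the first harmonic (assumed 110 Hz tone)
omega = 2 * math.pi / T
phases = [0.0, 0.7, -1.2]   # arbitrary phase shifts, for illustration only
t = 0.0013

print(phase_form(2, omega, 0.7, t), coefficient_form(2, omega, 0.7, t))
print(tone(t, omega, phases), tone(t + T, omega, phases))
```

Each printed pair should agree up to rounding: the first pair shows the two ways of writing one harmonic, the second shows the whole tone repeating after one period T.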

You’ll say: What the heck! Why do we need the mathematical gymnastics here? It’s just to understand that other characteristic of a musical tone: its quality (as opposed to its pitch). A so-called rich tone will have strong harmonics, while a pure tone will only have the first harmonic. All other characteristics – the difference between a tone produced by a violin as opposed to a piano – are then related to the ‘mix’ of all those harmonics.

So we have it all now, except for loudness, which is, of course, related to the magnitude of the air pressure changes as our waveform moves through the air. Pitch, loudness and quality: that’s what makes a musical tone. 🙂

Dissonance

As mentioned above, if the sounds are not consonant, they’re dissonant. But what is dissonance really? What’s going on? The answer is the following: when the ratio of two frequencies is near to a simple fraction, but not exact, we get so-called beats, which our ear does not like.

Huh? Relax. The illustration below, which I copied from the Wikipedia article on piano tuning, illustrates the phenomenon. The blue wave is the sum of the red and the green wave, which are originally identical. But then the frequency of the green wave is increased, and so the two waves are no longer in phase, and the interference results in a beating pattern. Of course, our musical tone involves different frequencies and, hence, different periods T₁, T₂, T₃ etcetera, but you get the idea: the higher harmonics also oscillate with period T₁, and if the frequencies are not in some exact ratio, then we’ll have a similar problem: beats, and our ear will not like the sound.
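The beating pattern follows from a standard trigonometric identity: the sum of two sines of nearby frequencies f₁ and f₂ equals a fast oscillation at the average frequency, multiplied by a slow envelope, and the loudness swells |f₁ – f₂| times per second. The sketch below checks that identity numerically; the 440 Hz and 444 Hz values are arbitrary illustrative choices, as are the function names.

```python
import math

def two_tones(t, f1, f2):
    """Plain sum of two equal-amplitude sine waves."""
    return math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)

def envelope_form(t, f1, f2):
    """Same signal written as a slow envelope times a fast oscillation."""
    slow = 2 * math.cos(math.pi * (f1 - f2) * t)
    fast = math.sin(math.pi * (f1 + f2) * t)
    return slow * fast

def beat_frequency(f1, f2):
    """Number of loudness maxima per second."""
    return abs(f1 - f2)

f1, f2 = 440.0, 444.0  # a slightly mistuned pair (illustrative values)
for t in (0.0, 0.01, 0.0625, 0.125):
    print(f"t = {t:.4f} s: sum = {two_tones(t, f1, f2):+.4f}, envelope form = {envelope_form(t, f1, f2):+.4f}")
print(f"beats per second: {beat_frequency(f1, f2):.0f}")
```

At t = 0.125 s the envelope 2·cos(π·4·t) goes through zero, which is the first moment the two waves cancel each other.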

Of course, you’ll wonder: why don’t we like beats in tones? We can ask that, can’t we? It’s like asking why we like music, isn’t it? […] Well… It is and it isn’t. It’s like asking why our ear (or our brain) likes harmonics. We don’t know. That’s how we are wired. The ‘physical’ explanation of what is musical and what isn’t only goes so far, I guess. 😦

Pythagoras versus Bach

From all of what I wrote above, it is obvious that the frequencies of the harmonics of a musical tone are, indeed, related by simple ratios of small integers: the frequencies of the first and second harmonics are in a simple 2 to 1 ratio (2:1); the second and third harmonics have a 3:2 frequency ratio; the third and fourth harmonics a 4:3 ratio; the fifth and fourth harmonic 5:4, etcetera. That’s it. Nothing more, nothing less.

In other words, Pythagoras was observing musical tones: he could not observe the pure tones behind, i.e. the actual notes. However, aesthetics led Pythagoras, and all musicians after him – until the mid-18th century – to also think that the ratio of the frequencies of the notes within an octave should also be simple ratios. From what I explained above, it’s obvious that it should not work that way: the ratio of the frequencies of two notes separated by n half-steps is 2^(n/12), and, for most values of n, 2^(n/12) is not some simple ratio. [Why? Just take your pocket calculator and calculate the value of 2^(1/12): it’s 2^0.08333… = 1.0594630943… and so on… It’s an irrational number: there are no repeating decimals. Now, 2^(n/12) is equal to 2^(1/12)·2^(1/12)·…·2^(1/12) (n times). Why would you expect that product to be equal to some simple ratio?]

So – I said it already – Pythagoras was wrong, not only in this but also in other regards, such as when he espoused his views on the solar system, for example. Again, I am sorry to have to say that, but it is what it is: the Pythagoreans did seem to prefer mathematical ideas over physical experiment. 🙂 Having said that, musicians obviously didn’t know about any alternative to Pythagoras, and they had surely never heard about logarithmic scales at the time. So… Well… They did use the so-called Pythagorean tuning system. To be precise, they tuned their instruments by equating the frequency ratio between the first and the fifth tone in the C scale (i.e. the C and G, as they did not include the C#, D# and F# semitones when counting) with the ratio 3/2, and then they used other so-called harmonic ratios for the notes in-between.

Now, the 3/2 ratio is actually almost correct, because the actual frequency ratio is 2^(7/12) (we have seven tones, including the semitones, not five!), and so that’s 1.4983, approximately. Now, that’s pretty close to 3/2 = 1.5, I’d say. 🙂 Using that approximation (which, I admit, is fairly accurate indeed), the tuning of the other strings would then also be done assuming certain ratios should be respected, like the ones below.
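To get a feel for how close the simple ratios are to the logarithmic scale, one can compare them interval by interval. In the sketch below, the list of simple ratios is an assumption: these are commonly quoted just-intonation values, not necessarily the ones in the table referred to above. The difference is expressed in cents, a cent being 1/100 of an equal-tempered half-step (so an octave is 1200 cents).

```python
import math

# Assumed simple ratios (commonly quoted just-intonation values): name -> (half-steps, ratio)
JUST_RATIOS = {
    "minor third": (3, 6 / 5),
    "major third": (4, 5 / 4),
    "fourth": (5, 4 / 3),
    "fifth": (7, 3 / 2),
    "major sixth": (9, 5 / 3),
}

def equal_tempered_ratio(half_steps):
    """Frequency ratio of two notes n half-steps apart on the logarithmic scale."""
    return 2 ** (half_steps / 12)

def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200 * math.log2(ratio)

for name, (half_steps, just) in JUST_RATIOS.items():
    tempered = equal_tempered_ratio(half_steps)
    print(f"{name}: simple ratio = {just:.4f}, 2^({half_steps}/12) = {tempered:.4f}, "
          f"difference = {cents(tempered / just):+.2f} cents")
```

The fifth comes out at 1.4983 against 1.5, a difference of about two cents, while the thirds and the sixth are off by noticeably more.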

So it was all quite good. Having said that, good musicians, and some great mathematicians, felt something was wrong—if only because there were several so-called just intonation systems around (for an overview, check out the Wikipedia article on just intonation). More importantly, they felt it was quite difficult to transpose music using the Pythagorean tuning system. Transposing music amounts to changing the so-called key of a musical piece: what one does, basically, is moving the whole piece up or down in pitch by some constant interval that is not equal to an octave. Today, transposing music is a piece of cake—Western music at least. But that’s only because all Western music is played on instruments that are tuned using that logarithmic scale (technically, it’s referred to as the 12-tone equal temperament (12-TET) system). When you’d use one of the Pythagorean systems for tuning, a transposed piece does not sound quite right.

The first mathematician who really seemed to know what was wrong (and, hence, who also knew what to do) was Simon Stevin, who wrote a manuscript based on the ‘12th root of 2 principle’ around AD 1600. It shouldn’t surprise us: the thinking of this mathematician from Bruges would inspire John Napier’s work on logarithms. Unfortunately, while that manuscript describes the basic principles behind the 12-TET system, it didn’t get published (Stevin had to run away from Bruges, to Holland, because he was Protestant and the Spanish rulers at the time didn’t like that). Hence, musicians, while not quite understanding the math (or the physics, I should say) behind their own music, kept trying other tuning systems, as they felt it made their music sound better indeed.

One of these ‘other systems’ is the so-called ‘good’ temperament, which you surely heard about, as it’s referred to in Bach’s famous composition, Das Wohltemperierte Klavier, which he finalized in the first half of the 18th century. What is that ‘good’ temperament really? Well… It is what it is: it’s one of those tuning systems which made musicians feel better about their music for a number of reasons, all of which are well described in the Wikipedia article on it. But the main reason is that the tuning system that Bach recommended was a great deal better when it came to playing the same piece in another key. However, it still wasn’t quite right, as it wasn’t the equal temperament system (i.e. the 12-TET system) that’s in place now (in the West at least—the Indian music scale, for instance, is still based on simple ratios).

Why do I mention this piece of Bach? The reason is simple: you probably heard of it because it’s one of the main reference points in a rather famous book: Gödel, Escher, Bach: an Eternal Golden Braid. If not, then just forget about it. I am mentioning it because one of my brothers loves it. It’s on artificial intelligence. I haven’t read it, but I must assume Bach’s masterpiece is analyzed there because of its structure, not because of the tuning system that one’s supposed to use when playing it. So… Well… I’d say: don’t make that composition any more mystic than it already is. 🙂 The ‘magic’ behind it is related to what I said about A4 being the ‘reference point’ in music: since we’re using a universal logarithmic scale now, there’s no such thing as an absolute reference point any more: once we define our musical ‘unit’ (so that’s the so-called octave in Western music), and also define how many steps we want to have in-between (so that’s 12, in Western music, that is), we get all the rest. That’s just how logarithms work.

So, in short, music is all about structure, i.e. it’s all about mathematical relations, and about mathematical relations only. Again, Pythagoras’ conclusions were wrong, but his intuition was right. And, of course, it’s his intuition that gave birth to science: the simple ‘models’ he made – of how notes are supposed to be related to each other, or about our solar system – were, obviously, just the start of it all. And what a great start it was! Looking back once again, it’s rather sad conservative forces (such as the Church) often got in the way of progress. In fact, I suddenly wonder: if scientists would not have been bothered by those conservative forces, could mankind have sent people around the time that Charles V was born, i.e. around A.D. 1500 already? 🙂

Post scriptum: My example of the (lower) E and A guitar strings co-vibrating when playing the A major chord striking the upper four strings only, is somewhat tricky. The (lower) E and A strings are associated with lower pitches, and we said overtones (i.e. the second, third, fourth, etc. harmonics) are multiples of the fundamental frequency. So why is it that the lower strings co-vibrate? The answer is easy: they oscillate at the higher frequencies only. If you have a guitar: just try it. The two strings you do not pluck do vibrate, and very visibly so, but the low fundamental frequencies that come out of them when you’d strike them, are not audible. In short, they resonate at the higher frequencies only. 🙂
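If you don’t have a guitar at hand, a rough numerical sketch makes the same point. I am assuming standard tuning, A4 = 440 Hz, equal temperament, and the usual open A major fingering (so the upper four strings sound E3, A3, C#4 and E4); the 0.5% matching tolerance and the helper names are arbitrary choices of mine.

```python
A4_HZ = 440.0  # assumed reference pitch


def pitch(semitones_from_a4: int) -> float:
    """Equal-tempered frequency a given number of semitones away from A4."""
    return A4_HZ * 2 ** (semitones_from_a4 / 12)


# The two strings we do NOT strike.
open_strings = {"E2": pitch(-29), "A2": pitch(-24)}
# Notes sounded on the upper four strings (assumed fingering of A major).
chord_notes = {"E3": pitch(-17), "A3": pitch(-12), "C#4": pitch(-8), "E4": pitch(-5)}


def shared_harmonics(fundamental, targets, max_harmonic=6, tolerance=0.005):
    """List (harmonic number, note name) pairs where an overtone of the
    open string lies within the relative tolerance of a sounded note."""
    matches = []
    for n in range(2, max_harmonic + 1):
        overtone = n * fundamental
        for name, freq in targets.items():
            if abs(overtone - freq) / freq < tolerance:
                matches.append((n, name))
    return matches


for name, fundamental in open_strings.items():
    print(name, shared_harmonics(fundamental, chord_notes))
```

Under these assumptions the low E string shares its second and fourth harmonics with the chord, and the A string its second and third, which is consistent with the strings being driven at their higher frequencies only.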

The example that Feynman gives is much more straightforward: his example mentions the lower C (or A, B, etc.) notes on a piano causing vibrations in the higher C strings (or the higher A, B, etc. strings respectively). For example, striking the C2 key (and, hence, the C2 string inside the piano) will make the (higher) C3 string vibrate too. But few of us have a grand piano at home, I guess. That’s why I prefer my guitar example. 🙂

The Uncertainty Principle revisited

Pre-script (dated 26 June 2020): This post has become less relevant (even irrelevant, perhaps) because my views on all things quantum-mechanical have evolved significantly as a result of my progression towards a more complete realist (classical) interpretation of quantum physics. I keep blog posts like these mainly because I want to keep track of where I came from. I might review them one day, but I currently don’t have the time or energy for it. 🙂

Original post:

I’ve written a few posts on the Uncertainty Principle already. See, for example, my post on the energy-time expression for it (ΔE·Δt ≥ h). So why am I coming back to it once more? Not sure. I felt I left some stuff out. So I am writing this post to just complement what I wrote before. I’ll do so by explaining, and commenting on, the ‘semi-formal’ derivation of the so-called Kennard formulation of the Principle in the Wikipedia article on it.

The Kennard inequalities, σ_xσ_p ≥ ħ/2 and σ_Eσ_t ≥ ħ/2, are more accurate than the more general Δx·Δp ≥ h and ΔE·Δt ≥ h expressions one often sees, which are an early formulation of the Principle by Niels Bohr, and which Heisenberg himself used when explaining the Principle in a thought experiment picturing a gamma-ray microscope. I presented Heisenberg’s thought experiment in another post, and so I won’t repeat myself here. I just want to mention that it ‘proves’ the Uncertainty Principle using the Planck-Einstein relations for the energy and momentum of a photon:

E = hf and p = h/λ

Heisenberg’s thought experiment is not a real proof, of course. But then what’s a real proof? The mentioned ‘semi-formal’ derivation looks more impressive, because more mathematical, but it’s not a ‘proof’ either (I hope you’ll understand why I am saying that after reading my post). The main difference between Heisenberg’s thought experiment and the mathematical derivation in the mentioned Wikipedia article is that the ‘mathematical’ approach is based on the de Broglie relation. That de Broglie relation looks the same as the Planck-Einstein relation (p = h/λ) but it’s fundamentally different.

Indeed, the momentum of a photon (i.e. the p we use in the Planck-Einstein relation) is not the momentum one associates with a proper particle, such as an electron or a proton, for example (so that’s the p we use in the de Broglie relation). The momentum of a particle is defined as the product of its mass (m) and velocity (v). Photons don’t have a (rest) mass, and their velocity is absolute (c), so how do we define momentum for a photon? There are a couple of ways to go about it, but the two most obvious ones are probably the following:

We can use the classical theory of electromagnetic radiation and show that the momentum of a photon is related to the magnetic field (we usually only analyze the electric field), and the so-called radiation pressure that results from it. It yields the p = E/c formula which we need to go from E = hf to p = h/λ, using the ubiquitous relation between the frequency, the wavelength and the wave velocity (c = λf). (In case you’re interested in the detail, just look up radiation pressure.)
We can also use the mass-energy equivalence E = mc². Hence, the equivalent mass of the photon is E/c², which is relativistic mass only. However, we can multiply that mass with the photon’s velocity, which is c, thereby getting the very same value for its momentum: p = c·E/c² = E/c.
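Both routes lead to p = E/c, and combining that with E = hf and c = λf gives p = h/λ. A quick numerical sanity check, as a sketch only: the 500 nm wavelength is an arbitrary example of mine, and the constants are the exact SI values.

```python
PLANCK_H = 6.62607015e-34   # J·s
LIGHT_C = 299_792_458.0     # m/s

wavelength = 500e-9                 # m, assumed example (green light)
frequency = LIGHT_C / wavelength    # from c = λf
energy = PLANCK_H * frequency       # E = hf

momentum_from_energy = energy / LIGHT_C          # p = E/c
momentum_from_wavelength = PLANCK_H / wavelength  # p = h/λ

print(f"E     = {energy:.3e} J")
print(f"E/c   = {momentum_from_energy:.3e} kg·m/s")
print(f"h/λ   = {momentum_from_wavelength:.3e} kg·m/s")
```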

So Heisenberg’s ‘proof’ uses the Planck-Einstein relations, as it analyzes the Uncertainty Principle more as an observer effect: probing matter with light, so to say. In contrast, the mentioned derivation takes the de Broglie relation itself as the point of departure. As mentioned, the de Broglie relations look exactly the same as the Planck-Einstein relationship (E = hf and p = h/λ) but the model behind is very different. In fact, that’s what the Uncertainty Principle is all about: it says that the de Broglie frequency and/or wavelength cannot be determined exactly: if we want to localize a particle, somewhat at least, we’ll be dealing with a frequency range Δf. As such, the de Broglie relation is actually somewhat misleading at first. Let’s talk about the model behind.

A particle, like an electron or a proton, traveling through space, is described by a complex-valued wavefunction, usually denoted by the Greek letter psi (Ψ) or phi (Φ). This wavefunction has a phase, usually denoted as θ (theta) which – because we assume the wavefunction is a nice periodic function – varies as a function of time and space. To be precise, we write θ as θ = kx – ωt (or as θ = ωt – kx, which only flips the sign of the phase) for a wave traveling in the positive x-direction or, if the wave is traveling in the other direction, as θ = kx + ωt.

I’ve explained this in a couple of posts already, including my previous post, so I won’t repeat myself here. Let me just note that ω is the angular frequency, which we express in radians per second, rather than cycles per second, so ω = 2πf (one cycle covers 2π rad). As for k, that’s the wavenumber, which is often described as the spatial frequency, because it’s expressed in cycles per meter or, more often (and surely in this case), in radians per meter. Hence, if we freeze time, this number is the rate of change of the phase in space. Because one cycle is, again, 2π rad, and one cycle corresponds to the wave traveling one wavelength (i.e. λ meter), it’s easy to see that k = 2π/λ. We can use these definitions to re-write the de Broglie relations E = hf and p = h/λ as:

E = ħω and p = ħk with ħ = h/2π

What about the wave velocity? For a photon, we have c = λf and, hence, c = (2π/k)(ω/2π) = ω/k. For ‘particle waves’ (or matter waves, if you prefer that term), it’s much more complicated, because we need to distinguish between the so-called phase velocity (v_p) and the group velocity (v_g). The phase velocity is what we’re used to: it’s the product of the frequency (the number of cycles per second) and the wavelength (the distance traveled by the wave over one cycle), or the ratio of the angular frequency and the wavenumber, so we have, once again, λf = ω/k = v_p. However, this phase velocity is not the classical velocity of the particle that we are looking at. That’s the so-called group velocity, which corresponds to the velocity of the wave packet representing the particle (or ‘wavicle’, if you prefer that term), as illustrated below.

The animation below illustrates the difference between the phase and the group velocity even more clearly: the green dot travels with the ‘wavicles’, while the red dot travels with the phase. As mentioned above, the group velocity corresponds to the classical velocity of the particle (v). The phase velocity, however, is the velocity of a mathematical point, and for a matter wave that point actually travels faster than light. It is a mathematical point only, which does not carry a signal (unlike the modulation of the wave itself, i.e. the traveling ‘groups’) and, hence, it does not contradict the fundamental principle of relativity theory: the speed of light is absolute, and nothing travels faster than light (except mathematical points, as you can, hopefully, appreciate now).
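To put some numbers on this, here is a minimal sketch. It assumes that the E in E = ħω is the total relativistic energy (rest energy included), in which case v_p = ω/k = E/p and v_g = dω/dk = dE/dp; the electron speed of 0.1c is just an example value I picked.

```python
import math

LIGHT_C = 299_792_458.0          # m/s
ELECTRON_MASS = 9.1093837015e-31  # kg


def energy(p: float) -> float:
    """Total relativistic energy for momentum p (assumption: rest energy included)."""
    return math.sqrt((p * LIGHT_C) ** 2 + (ELECTRON_MASS * LIGHT_C ** 2) ** 2)


v = 0.1 * LIGHT_C                               # assumed classical velocity
gamma = 1 / math.sqrt(1 - (v / LIGHT_C) ** 2)
p = gamma * ELECTRON_MASS * v

# Phase velocity: omega/k = (E/hbar)/(p/hbar) = E/p.
phase_velocity = energy(p) / p

# Group velocity: d(omega)/dk = dE/dp, estimated with a central difference.
dp = p * 1e-4
group_velocity = (energy(p + dp) - energy(p - dp)) / (2 * dp)

print(f"v_g / v     = {group_velocity / v:.6f}")
print(f"v_p / c     = {phase_velocity / LIGHT_C:.3f}")
print(f"v_p·v_g/c²  = {phase_velocity * group_velocity / LIGHT_C ** 2:.6f}")
```

Under that assumption the group velocity reproduces the classical velocity, while the phase velocity comes out as c²/v, which is larger than c for any v below c.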

The two animations above do not represent the quantum-mechanical wavefunction, because the functions that are shown are real-valued, not complex-valued. To imagine a complex-valued wave, you should think of something like the ‘wavicle’ below or, if you prefer animations, the standing waves underneath (i.e. C to H: A and B just present the mathematical model behind, which is that of a mechanical oscillator, like a mass on a spring indeed). These representations clearly show the real as well as the imaginary part of complex-valued wave-functions.

With this general introduction, we are now ready for the more formal treatment that follows. So our wavefunction Ψ is a complex-valued function in space and time. A very general shape for it is one we used in a couple of posts already:

Ψ(x, t) ∝ e^{i(kx – ωt)} = cos(kx – ωt) + i·sin(kx – ωt)

If you don’t know anything about complex numbers, I’d suggest you read my short crash course on it in the essentials page of this blog, because I don’t have the space nor the time to repeat all of that. Now, we can use the de Broglie relationship relating the momentum of a particle with a wavenumber (p = ħk) to re-write our psi function as:

Ψ(x, t) ∝ e^{i(kx – ωt)} = e^{i(px/ħ – ωt)}

Note that I am using the ‘proportional to’ symbol (∝) because I don’t worry about normalization right now. Indeed, from all of my other posts on this topic, you know that we have to take the absolute square of all these probability amplitudes to arrive at a probability density function, describing the probability of the particle effectively being at point x in space at point t in time, and that all those probabilities, over the function’s domain, have to add up to 1. So we should insert some normalization factor.

Having said that, the problem with the wavefunction above is not normalization really, but the fact that it yields a uniform probability density function. In other words, the particle position is extremely uncertain in the sense that it could be anywhere. Let’s calculate it using a little trick: the absolute square of a complex number equals the product of itself with its (complex) conjugate. Hence, if z = re^{iθ}, then │z│² = zz* = re^{iθ}·re^{–iθ} = r²e^{iθ – iθ} = r²e⁰ = r². Now, in this case, assuming unique values for k, ω, p, which we’ll note as k₀, ω₀, p₀ (and, because we’re freezing time, we can also write t = t₀), we should write:

│Ψ(x)│² = │a₀e^{i(p₀x/ħ – ω₀t₀)}│² = │a₀e^{ip₀x/ħ}·e^{–iω₀t₀}│² = │a₀e^{ip₀x/ħ}│²·│e^{–iω₀t₀}│² = a₀²
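The same result can be checked numerically in a few lines. This is only a sketch: the values for a₀, p₀, ω₀ and t₀ are arbitrary placeholders of mine, and the point is merely that the absolute square comes out the same at every position.

```python
import cmath

HBAR = 1.054571817e-34  # J·s

a0 = 0.5          # assumed amplitude
p0 = 2.0e-24      # assumed momentum, kg·m/s
omega0 = 1.0e16   # assumed angular frequency, rad/s
t0 = 0.0          # time is frozen


def psi(x: float) -> complex:
    """Single-mode wavefunction a0·exp(i(p0·x/ħ − ω0·t0))."""
    return a0 * cmath.exp(1j * (p0 * x / HBAR - omega0 * t0))


positions = [0.0, 1e-11, 3.3e-10, 1e-9]  # metres, arbitrary sample points
densities = [abs(psi(x)) ** 2 for x in positions]

for x, density in zip(positions, densities):
    print(f"x = {x:.1e} m  ->  |psi|^2 = {density:.6f}")
```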

Note that, this time around, I did insert some normalization constant a₀ as well, so that’s OK. But so the problem is that this very general shape of the wavefunction gives us a constant as the probability for the particle being somewhere between some point a and another point b in space. More formally, we get the surface for a rectangle when we calculate the probability P[a ≤ X ≤ b] as we should calculate it, which is as follows:

P[a ≤ X ≤ b] = ∫_a^b │Ψ(x)│² dx

More specifically, because we’re talking one-dimensional space here, we get P[a ≤ X ≤ b] = (b–a)·a₀². Now, you may think that such uniform probability makes sense. For example, an electron may be in some orbital around a nucleus, and so you may think that all ‘points’ on the orbital (or within the ‘sphere’, or whatever volume it is) may be equally likely. Or, in another example, we may know an electron is going through some slit and, hence, we may think that all points in that slit should be equally likely positions. However, we know that it is not the case. Measurements show that not all points are equally likely. For an orbital, we get complicated patterns, such as the one shown below, and please note that the different colors represent different complex numbers and, hence, different probabilities.

Also, we know that electrons going through a slit will produce an interference pattern—even if they go through it one by one! Hence, we cannot associate some flat line with them: it has to be a proper wavefunction which implies, once again, that we can’t accept a uniform distribution.

In short, uniform probability density functions are not what we see in Nature. They’re non-uniform, like the (very simple) non-uniform distributions shown below. [The left-hand side shows the wavefunction, while the right-hand side shows the associated probability density function: the first two are static (i.e. they do not vary in time), while the third one shows a probability distribution that does vary with time.]

I should also note that, even if you would dare to think that a uniform distribution might be acceptable in some cases (which, let me emphasize this, it is not), an electron can surely not be ‘anywhere’. Indeed, the normalization condition implies that, if we’d have a uniform distribution and if we’d consider all of space, i.e. if we let a go to –∞ and b to +∞, then a₀² would tend to zero, which means we’d have a particle that is, literally, everywhere and nowhere at the same time.

In short, a uniform probability distribution does not make sense: we’ll generally have some idea of where the particle is most likely to be, within some range at least. I hope I made myself clear here.

Now, before I continue, I should make some other point as well. You know that the Planck constant (h or ħ) is unimaginably small: about 1×10⁻³⁴ J·s (joule-second). In fact, I’ve repeatedly made that point in various posts. However, having said that, I should add that, while it’s unimaginably small, the uncertainties involved are quite significant. Let us indeed look at the value of ħ by relating it to that σ_xσ_p ≥ ħ/2 relation.

Let’s first look at the units. The uncertainty in the position should obviously be expressed in distance units, while momentum is expressed in kg·m/s units. So that works out, because 1 joule is the energy transferred (or work done) when applying a force of 1 newton (N) over a distance of 1 meter (m). In turn, one newton is the force needed to accelerate a mass of one kg at the rate of 1 meter per second per second (this is not a typing mistake: it’s an acceleration of 1 m/s per second, so the unit is m/s²: meter per second squared). Hence, 1 J·s = 1 N·m·s = 1 kg·m/s²·m·s = kg·m²/s. Now, that’s the same dimension as the ‘dimensional product’ for momentum and distance: m·kg·m/s = kg·m²/s.

Now, these units (kg, m and s) are all rather astronomical at the atomic scale and, hence, h and ħ are usually expressed in other units, notably eV·s (electronvolt-second). However, using the standard SI units gives us a better idea of what we’re talking about. If we split the ħ = 1×10⁻³⁴ J·s value (let’s forget about the 1/2 factor for now) ‘evenly’ over σ_x and σ_p – whatever that means: all depends on the units, of course! – then both factors will have magnitudes of the order of 1×10⁻¹⁷: 1×10⁻¹⁷ m times 1×10⁻¹⁷ kg·m/s gives us 1×10⁻³⁴ J·s.

You may wonder how this 1×10⁻¹⁷ m compares to, let’s say, the classical electron radius, for example. The classical electron radius is, roughly speaking, the ‘space’ an electron seems to occupy as it scatters incoming light. The idea is illustrated below (credit for the image goes to Wikipedia, as usual). The classical electron radius – or Thomson scattering length – is about 2.818×10⁻¹⁵ m, so that’s almost 300 times our ‘uncertainty’ (1×10⁻¹⁷ m). Not bad: it means that we can effectively relate our ‘uncertainty’ in regard to the position to some actual dimension in space. In this case, we’re talking the femtometer scale (1 fm = 10⁻¹⁵ m), and so you’ve surely heard of this before.

What about the other ‘uncertainty’, the one for the momentum (1×10⁻¹⁷ kg·m/s)? What’s the typical (linear) momentum of an electron? Its mass, expressed in kg, is about 9.1×10⁻³¹ kg. We also know its relative velocity in an atom: it’s that magical number α = v/c, about which I wrote in some other posts already, so v = αc ≈ 0.0073·3×10⁸ m/s ≈ 2.2×10⁶ m/s. Now, 9.1×10⁻³¹ kg times 2.2×10⁶ m/s is about 2×10⁻²⁴ kg·m/s, so our proposed ‘uncertainty’ in regard to the momentum (1×10⁻¹⁷ kg·m/s) is about five million times larger than the typical value for it. Now that is, obviously, not so good. [Note that calculations like this are extremely rough. In fact, when one talks electron momentum, it’s usually angular momentum, which is ‘analogous’ to linear momentum, but angular momentum involves very different formulas. If you want to know more about this, check my post on it.]

Of course, now you may feel that we didn’t ‘split’ the uncertainty in a way that makes sense: those –17 exponents don’t work, obviously. So let’s take 1×10⁻²⁴ kg·m/s for σ_p, which is half of that ‘typical’ value we calculated. Then we’d have 1×10⁻¹⁰ m for σ_x (1×10⁻¹⁰ m times 1×10⁻²⁴ kg·m/s is, once again, 1×10⁻³⁴ J·s). But then that uncertainty suddenly becomes a huge number: 1×10⁻¹⁰ m is 1 angstrom. That’s the size of an atom itself! So it’s huge as compared to the pico- or femto-meter scale (1 pm = 1×10⁻¹² m, 1 fm = 1×10⁻¹⁵ m) which we’d sort of expect to see when we’re talking electrons.
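Powers of ten are easy to get wrong in back-of-the-envelope work like this, so here is a small sketch that recomputes the orders of magnitude from the rounded inputs quoted above (ħ ≈ 1×10⁻³⁴ J·s, m ≈ 9.1×10⁻³¹ kg, α ≈ 0.0073, c ≈ 3×10⁸ m/s). The variable and function names are mine.

```python
HBAR_ROUNDED = 1e-34      # J·s, rounded as in the text
ELECTRON_MASS = 9.1e-31   # kg
ALPHA = 0.0073            # v/c for the electron, as quoted
LIGHT_C = 3e8             # m/s, rounded

velocity = ALPHA * LIGHT_C
typical_momentum = ELECTRON_MASS * velocity


def position_spread(momentum_spread: float) -> float:
    """Position spread that makes the product equal to the rounded ħ."""
    return HBAR_ROUNDED / momentum_spread


even_split = 1e-17  # kg·m/s, the 'even' split of the exponent
ratio_to_typical = even_split / typical_momentum
half_typical = typical_momentum / 2

print(f"v                  = {velocity:.2e} m/s")
print(f"typical momentum   = {typical_momentum:.2e} kg·m/s")
print(f"even split / typ.  = {ratio_to_typical:.1e}")
print(f"sigma_x for typ./2 = {position_spread(half_typical):.1e} m")
```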

OK. Let me get back to the lesson. Why this digression? Not sure. I think I just wanted to show that the Uncertainty Principle involves ‘uncertainties’ that are extremely relevant: despite the unimaginable smallness of the Planck constant, these uncertainties are quite significant at the atomic scale. But back to the ‘proof’ of Kennard’s formulation. Here we need to discuss the ‘model’ we’re using. The rather simple animation below (again, credit for it has to go to Wikipedia) illustrates it wonderfully.

Look at it carefully: we start with a ‘wave packet’ that looks a bit like a normal distribution, but it isn’t, of course. We have negative and positive values, and normal distributions don’t have that. So it’s a wave alright. Of course, you should, once more, remember that we’re only seeing one part of the complex-valued wave here (the real or imaginary part—it could be either). But so then we’re superimposing waves on it. Note the increasing frequency of these waves, and also note how the wave packet becomes increasingly localized with the addition of these waves. In fact, the so-called Fourier analysis, of which you’ve surely heard before, is a mathematical operation that does the reverse: it separates a wave packet into its individual component waves.
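The localization effect can be reproduced with a toy calculation. The sketch below is not the model behind the animation: I simply assume equal-amplitude plane waves with equally spaced wavenumbers around some central value (K0 and DK are made-up numbers), freeze time, and measure the RMS spread of │Ψ│² over one spatial period of the sum.

```python
import cmath
import math

K0 = 10.0  # central wavenumber in rad/m (assumed)
DK = 1.0   # spacing between neighbouring modes in rad/m (assumed)


def packet(x: float, n_side: int) -> complex:
    """Sum of 2·n_side + 1 equal-amplitude plane waves centred on K0."""
    return sum(cmath.exp(1j * (K0 + n * DK) * x) for n in range(-n_side, n_side + 1))


def rms_width(n_side: int, samples: int = 2001) -> float:
    """RMS spread of |psi|^2 over one period [-pi/DK, pi/DK] of the sum."""
    half_period = math.pi / DK
    xs = [-half_period + 2 * half_period * i / (samples - 1) for i in range(samples)]
    weights = [abs(packet(x, n_side)) ** 2 for x in xs]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, xs)) / total
    variance = sum(w * (x - mean) ** 2 for w, x in zip(weights, xs)) / total
    return math.sqrt(variance)


# Number of component waves -> spread of the resulting packet.
widths = {2 * n + 1: rms_width(n) for n in (0, 1, 2, 4, 8)}
for count, width in widths.items():
    print(f"{count:2d} component waves: RMS width = {width:.3f} m")
```

With a single wave the density is flat over the whole period; each time more waves are added, the spread shrinks.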

So now we know the ‘trick’ for reducing the uncertainty in regard to the position: we just add waves with different frequencies. Of course, different frequencies imply different wavenumbers and, through the de Broglie relationship, we’ll also have different values for the ‘momentum’ associated with these component waves. Let’s write these various values as k_n, ω_n, and p_n respectively, with n going from 0 to N. Of course, our point in time remains frozen at t₀. So we get a wavefunction that’s, quite simply, the sum of N component waves and so we write:

Ψ(x) = ∑ a_n·e^{i(p_n·x/ħ – ω_n·t₀)} = ∑ a_n·e^{ip_n·x/ħ}·e^{–iω_n·t₀} = ∑ A_n·e^{ip_n·x/ħ}

Note that, because of the e^{–iω_n·t₀} factor, we now have complex-valued coefficients A_n = a_n·e^{–iω_n·t₀} in front. More formally, we say that A_n represents the relative contribution of the mode p_n to the overall Ψ(x) wave. Hence, we can write these coefficients A as a function of p. Because Greek letters always make more of an impression, we’ll use the Greek letter Φ (phi) for it. 🙂 Now, we can go to the continuum limit and, hence, transform that sum above into an infinite sum, i.e. an integral. So our wave function then becomes an integral over all possible modes, which we write as:

Ψ(x) = (1/√(2πħ))·∫ Φ(p)·e^{ipx/ħ} dp

Don’t worry about that new 1/√(2πħ) factor in front. That’s, once again, something that has to do with normalization and scales. It’s the integral itself you need to understand. We’ve got that Φ(p) function there, which is nothing but our A_n coefficient, but for the continuum case. In fact, these relative contributions Φ(p) are now referred to as the amplitude of all modes p, and so Φ(p) is actually another wave function: it’s the wave function in the so-called momentum space.

You’ll probably be very confused now, and wonder where I want to go with an integral like this. The point to note is simple: if we have that Φ(p) function, we can calculate (or derive, if you prefer that word) the Ψ(x) from it using that integral above. Indeed, the integral above is referred to as the Fourier transform, and it’s obviously closely related to that Fourier analysis we introduced above.

Of course, there is also an inverse transform, which looks exactly the same: it just switches the wave functions (Ψ and Φ) and variables (x and p), and then (it’s an important detail!), it has a minus sign in the exponent. Together, the two functions – as defined by each other through these two integrals – form a so-called Fourier integral pair, also known as a Fourier transform pair, and the variables involved are referred to as conjugate variables. So momentum (p) and position (x) are conjugate variables and, likewise, energy and time are also conjugate variables (but so I won’t expand on the time-energy relation here: please have a look at one of my other posts on that).

Now, I thought of copying and explaining the proof of Kennard’s inequality from Wikipedia’s article on the Uncertainty Principle (you need to click on the show button in the relevant section to see it), but then that’s pretty boring math, and simply copying stuff is not my objective with this blog. More importantly, the proof has nothing to do with physics. Nothing at all. Indeed, it just proves a general mathematical property of Fourier pairs. More specifically, it proves that, the more concentrated one function is, the more spread out its Fourier transform must be. In other words, it is not possible to arbitrarily concentrate both a function and its Fourier transform.

So, in this case, if we’d ‘squeeze’ Ψ(x), then its Fourier transform Φ(p) will ‘stretch out’, and so that’s what the proof in that Wikipedia article basically shows. In other words, there is some ‘trade-off’ between the ‘compaction’ of Ψ(x), on the one hand, and Φ(p), on the other, and so that is what the Uncertainty Principle is all about. Nothing more, nothing less.
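The trade-off is easy to see numerically. The following is a minimal sketch, not the proof: I pick a Gaussian packet (my choice, because it is the textbook case where the bound is reached), work in units where ħ = 1, and approximate the transform Φ(p) = (1/√(2πħ))·∫ Ψ(x)·e^{–ipx/ħ} dx by a plain Riemann sum. All function names and grid sizes are my own.

```python
import cmath
import math

HBAR = 1.0  # natural units (assumption, keeps the numbers readable)


def grid(lo: float, hi: float, n: int) -> list:
    """Return n evenly spaced points from lo to hi."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def spread(points, density) -> float:
    """Standard deviation of a (not necessarily normalized) density on a grid."""
    total = sum(density)
    mean = sum(v * d for v, d in zip(points, density)) / total
    variance = sum(d * (v - mean) ** 2 for v, d in zip(points, density)) / total
    return math.sqrt(variance)


def spreads(width: float, n: int = 301):
    """Return (sigma_x, sigma_p) for a Gaussian packet of the given width."""
    xs = grid(-8 * width, 8 * width, n)
    dx = xs[1] - xs[0]
    psi = [math.exp(-x * x / (4 * width ** 2)) for x in xs]

    # Momentum grid scaled by HBAR / width so that it covers the transform.
    ps = grid(-4 * HBAR / width, 4 * HBAR / width, n)
    norm = 1 / math.sqrt(2 * math.pi * HBAR)
    # Riemann-sum approximation of the Fourier integral for each p.
    phi = [norm * dx * sum(a * cmath.exp(-1j * p * x / HBAR) for a, x in zip(psi, xs))
           for p in ps]

    sigma_x = spread(xs, [a * a for a in psi])
    sigma_p = spread(ps, [abs(b) ** 2 for b in phi])
    return sigma_x, sigma_p


results = {width: spreads(width) for width in (0.5, 1.0, 2.0)}
for width, (sigma_x, sigma_p) in results.items():
    print(f"width {width}: sigma_x = {sigma_x:.4f}, sigma_p = {sigma_p:.4f}, "
          f"product = {sigma_x * sigma_p:.4f}")
```

For the Gaussian the product stays at ħ/2 whatever the width: squeezing Ψ(x) by a factor of two stretches Φ(p) by the same factor.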

But… Yes? What’s all this talk about ‘squeezing’ and ‘compaction’? We can’t change reality, can we? Well… Here we’re entering the philosophical field, of course. How do we interpret the Uncertainty Principle? It surely does look like us trying to measure something has some impact on the wavefunction. In fact, our measurement – of either position or momentum – usually makes the wavefunctions collapse: we suddenly know where the particle is and, hence, Ψ(x) seems to collapse into one point. Alternatively, we measure its momentum and, hence, Φ(p) collapses.

That’s intriguing. In fact, even more intriguing is the possibility we may only partially affect those wavefunctions with measurements that are somewhat less ‘drastic’. It seems a lot of research is focused on that (just Google for partial collapse of the wavefunction, and you’ll find tons of references).

Hmm… I need to further study the topic. The decomposition of a wave into its component waves is obviously something that works well in physics—and not only in quantum mechanics but also in much more mundane examples. Its most general application is signal processing, in which we decompose a signal (which is a function of time) into the frequencies that make it up. Hence, our wavefunction model makes a lot of sense, as it mirrors the physics involved in oscillators and harmonics obviously.

Still… I feel it doesn’t answer the fundamental question: what is our electron really? What do those wave packets represent? Physicists will say questions like this don’t matter: as long as our mathematical models ‘work’, it’s fine. In fact, if even Feynman said that nobody – including himself – truly understands quantum mechanics, then I should just be happy and move on. However, for some reason, I can’t quite accept that. I should probably focus some more on that de Broglie relationship, p = h/λ, as it’s obviously as fundamental to my understanding of the ‘model’ of reality in physics as that Fourier analysis of the wave packet. So I need to do some more thinking on that.

The de Broglie relationship is not intuitive. In fact, I am not ashamed to admit that it actually took me quite some time to understand why we can’t just re-write the de Broglie relationship (λ = h/p) as an uncertainty relation itself: Δλ = h/Δp. Hence, let me be very clear on this:

Δx = h/Δp (that’s the Uncertainty Principle) but Δλ ≠ h/Δp !

Let me quickly explain why.

If the Δ symbol expresses a standard deviation (or some other measurement of uncertainty), we can write the following:

p = h/λ ⇒ Δp = Δ(h/λ) = hΔ(1/λ) ≠ h/Δλ

So I can take h out of the brackets after the Δ symbol, because that’s one of the things that’s allowed when working with standard deviations. More in particular, one can prove the following:

The standard deviation of some constant function is 0: Δ(k) = 0
The standard deviation is invariant under changes of location: Δ(x + k) = Δ(x)
Finally, the standard deviation scales directly with the scale of the variable: Δ(kx) = |k |Δ(x).
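These three rules, and the fact that the reciprocal does not behave so nicely, can be checked on any small data set. The sketch below uses Python’s statistics.pstdev (the population standard deviation) on an arbitrary sample of my own choosing.

```python
import statistics

sample = [1.0, 2.0, 4.0, 8.0]  # arbitrary illustrative values
k = 3.0                        # arbitrary constant


def spread(values):
    """Population standard deviation, standing in for the Δ symbol."""
    return statistics.pstdev(values)


constant_spread = spread([k] * len(sample))          # Δ(k)
shifted_spread = spread([x + k for x in sample])     # Δ(x + k)
scaled_spread = spread([-k * x for x in sample])     # Δ(-k·x)
reciprocal_spread = spread([1 / x for x in sample])  # Δ(1/x)

print(f"Δ(k)      = {constant_spread}")
print(f"Δ(x + k)  = {shifted_spread:.4f}   vs Δ(x)     = {spread(sample):.4f}")
print(f"Δ(-k·x)   = {scaled_spread:.4f}   vs |k|·Δ(x) = {k * spread(sample):.4f}")
print(f"Δ(1/x)    = {reciprocal_spread:.4f}   vs 1/Δ(x)   = {1 / spread(sample):.4f}")
```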

However, it is not the case that Δ(1/x) = 1/Δx. But let’s not focus on what we cannot do with Δx: let’s see what we can do with it. Δx equals h/Δp according to the Uncertainty Principle, if we take it as an equality rather than as an inequality, that is. And then we have the de Broglie relationship: p = h/λ. Hence, Δx must equal:

Δx = h/Δp = h/[Δ(h/λ)] =h/[hΔ(1/λ)] = 1/Δ(1/λ)

That’s obvious, but so what? As mentioned, we cannot write Δx = Δλ, because there’s no rule that says that Δ(1/λ) = 1/Δλ and, therefore, h/Δp ≠ Δλ. However, what we can do is define Δλ as an interval, or a length, defined by the difference between its lower and upper bound (let’s denote those two values by λ_a and λ_b respectively). Hence, we write Δλ = λ_b – λ_a. Note that this does not assume we have a continuous range of values for λ: we can have any number of wavelengths λ_n between λ_a and λ_b, but so you see the point: we’ve got a range of values λ, discrete or continuous, defined by some lower and upper bound.

Now, the de Broglie relation associates two values p_a and p_b with λ_a and λ_b respectively: p_a = h/λ_a and p_b = h/λ_b. Hence, we can similarly define the corresponding Δp interval as p_a – p_b. Note that, because we’re taking the reciprocal, we have to reverse the order of the values here: if λ_b > λ_a, then p_a = h/λ_a > p_b = h/λ_b. Hence, writing λ₁ and λ₂ for λ_a and λ_b to keep the notation light, we get Δp = Δ(h/λ) = p_a – p_b = h/λ₁ – h/λ₂ = h(1/λ₁ – 1/λ₂) = h[λ₂ – λ₁]/λ₁λ₂. In case you have a bit of difficulty, just draw some reciprocal functions (like the ones below), and have fun connecting intervals on the horizontal axis with intervals on the vertical axis using these functions.

Now, h[λ₂ – λ₁]/λ₁λ₂ is obviously something very different than h/Δλ = h/(λ₂ – λ₁). So we can surely not equate the two and, hence, we cannot write that Δp = h/Δλ.

Having said that, the Δx = 1/Δ(1/λ) = λ₁λ₂/(λ₂ – λ₁) that emerges here is quite interesting. We’ve got a ratio here, λ₁λ₂/(λ₂ – λ₁), which shows that Δx depends only on the upper and lower bounds of the Δλ range. It does not depend on whether or not the interval is discrete or continuous.
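A small numerical sketch of that ratio, under the same assumption as in the text (the Uncertainty Principle taken as the equality Δx = h/Δp). The two wavelength intervals are made-up values of mine: both are 0.1 angstrom wide, but they sit at different wavelengths.

```python
PLANCK_H = 6.62607015e-34  # J·s


def momentum_interval(lam1: float, lam2: float) -> float:
    """Δp = h/λ1 − h/λ2 for λ2 > λ1."""
    return PLANCK_H / lam1 - PLANCK_H / lam2


def position_uncertainty(lam1: float, lam2: float) -> float:
    """Δx = λ1·λ2/(λ2 − λ1), i.e. h/Δp with h cancelled out."""
    return lam1 * lam2 / (lam2 - lam1)


# Two intervals of the same length (1e-11 m) at different wavelengths (assumed values).
shorter_waves = (1.0e-10, 1.1e-10)
longer_waves = (5.0e-10, 5.1e-10)

for lam1, lam2 in (shorter_waves, longer_waves):
    dx = position_uncertainty(lam1, lam2)
    dx_check = PLANCK_H / momentum_interval(lam1, lam2)
    print(f"λ from {lam1:.1e} to {lam2:.1e} m: Δx = {dx:.3e} m (h/Δp = {dx_check:.3e} m)")
```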

The second thing that is interesting to note is Δx depends not only on the difference between those two values (i.e. the length of the interval) but also on their value: if the length of the interval, i.e. the difference between the two wavelengths, is the same, but their values as such are higher, then we get a higher value for Δx, i.e. a greater uncertainty in the position. Again, this shows that the relation between Δλ and Δx is not straightforward. But so we knew that already, and so I’ll end this post right here and right now. 🙂

The Strange Theory of Light and Matter (II)

If we limit our attention to the interaction between light and matter (i.e. the behavior of photons and electrons only, so we’re not talking quarks and gluons here), then the ‘crazy ideas’ of quantum mechanics can be summarized as follows:

At the atomic or sub-atomic scale, we can no longer look at light as an electromagnetic wave. It consists of photons, and photons come in blobs. Hence, to some extent, photons are ‘particle-like’.
At the atomic or sub-atomic scale, electrons don’t behave like particles. For example, if we send them through a slit that’s small enough, we’ll observe a diffraction pattern. Hence, to some extent, electrons are ‘wave-like’.

In short, photons aren’t waves, but they aren’t particles either. Likewise, electrons aren’t particles, but they aren’t waves either. They are neither. The weirdest thing of all, perhaps, is that, while light and matter are two very different things in our daily experience (light and matter are opposite concepts, I’d say, just like particles and waves are opposite concepts), they look pretty much the same in quantum physics: they are both represented by a wavefunction.

Let me immediately make a little note on terminology here. The term ‘wavefunction’ is a bit ambiguous, in my view, because it makes one think of a real wave, like a water wave, or an electromagnetic wave. Real waves are described by real-valued wave functions describing, for example, the motion of a ball on a spring, or the displacement of a gas (e.g. air) as a sound wave propagates through it, or – in the case of an electromagnetic wave – the strength of the electric and magnetic field.

You may have questions about the ‘reality’ of fields, but electromagnetic waves – i.e. the classical description of light – are quite ‘real’ too, even if:

Light doesn’t travel in a medium (like water or air: there is no aether), and
The magnitudes of the electric and magnetic fields (usually denoted by E and B) depend on your reference frame: if you calculate the fields using a moving coordinate system, you will get a different mixture of E and B. Therefore, E and B may not feel very ‘real’ when you look at them separately, but they are very real when we think of them as representing one physical phenomenon: the electromagnetic interaction between particles. So the E and B mix is, indeed, a dual representation of one reality. I won’t dwell on that, as I’ve done that in another post of mine.

How ‘real’ is the quantum-mechanical wavefunction?

The quantum-mechanical wavefunction is not like any of these real waves. In fact, I’d rather use the term ‘probability wave’ but, apparently, that’s used only by bloggers like me 🙂 and so it’s not very scientific. That’s for a good reason, because it’s not quite accurate either: the wavefunction in quantum mechanics represents probability amplitudes, not probabilities. So we should, perhaps, be consistent and term it a ‘probability amplitude wave’ – but then that’s too cumbersome obviously, so the term ‘probability wave’ may be confusing, but it’s not so bad, I think.

Amplitudes and probabilities are related as follows:

Probabilities are real numbers between 0 and 1: they represent the probability of something happening, e.g. a photon moves from point A to B, or a photon is absorbed (and emitted) by an electron (i.e. a ‘junction’ or ‘coupling’, as you know).
Amplitudes are complex numbers, or ‘arrows’ as Feynman calls them: they have a length (or magnitude) and a direction.
We get the probabilities by taking the (absolute) square of the amplitudes.
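The three points above can be sketched in a few lines of Python, using the built-in complex type for the ‘arrow’. The length 0.5 and the 60° direction are made-up values, purely for illustration.

```python
import cmath

# An amplitude is a complex number: a length (magnitude) and a direction (phase).
# The length 0.5 and the angle of 60 degrees are assumed values.
amplitude = cmath.rect(0.5, cmath.pi / 3)

length = abs(amplitude)              # the length of the arrow
direction = cmath.phase(amplitude)   # the direction of the arrow, in radians

# The probability is the absolute square: a real number between 0 and 1.
probability = abs(amplitude) ** 2
print(length, direction, probability)
```

Note that the direction of the arrow drops out when we take the absolute square: only the length survives in the probability.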

So photons aren’t waves, but they aren’t particles either. Likewise, electrons aren’t particles, but they aren’t waves either. They are neither. So what are they? We don’t have words to describe what they are. Some use the term ‘wavicle’ but that doesn’t answer the question, because who knows what a ‘wavicle’ is? So we don’t know what they are. But we do know how they behave. As Feynman puts it, when comparing the behavior of light and then of electrons in the double-slit experiment—struggling to find language to describe what’s going on: “There is one lucky break: electrons behave just like light.”

He says so because of that wave function: the mathematical formalism is the same, for photons and for electrons. Exactly the same? […] But that’s such a weird thing to say, isn’t it? We can’t help thinking of light as waves, and of electrons as particles. They can’t be the same. They’re different, aren’t they? They are.

Scales and senses

To some extent, the weirdness can be explained because the scale of our world is not atomic or sub-atomic. Therefore, we ‘see’ things differently. Let me say a few words about the instrument we use to look at the world: our eye.

Our eye is particular. The retina has two types of receptors: the so-called cones are used in bright light, and distinguish color, but when we are in a dark room, the so-called rods become sensitive, and it is believed that they actually can detect a single photon of light. However, neural filters only allow a signal to pass to the brain when at least five photons arrive within less than a tenth of a second. A tenth of a second is, roughly, the averaging time of our eye. So, as Feynman puts it: “If we were evolved a little further so we could see ten times more sensitively, we wouldn’t have this discussion—we would all have seen very dim light of one color as a series of intermittent little flashes of equal intensity.” In other words, the ‘particle-like’ character of light would have been obvious to us.

Let me make a few more remarks here, which you may or may not find useful. The sense of ‘color’ is not something ‘out there’: colors, like red or brown, are experiences in our eye and our brain. There are ‘pigments’ in the cones (cones are the receptors that work only if the intensity of the light is high enough) and these pigments absorb the light spectrum somewhat differently, as a result of which we ‘see’ color. Different animals see different things. For example, a bee can distinguish between white paper using zinc white versus lead white, because they reflect light differently in the ultraviolet spectrum, which the bee can see but we can’t. Bees can also tell the direction of the sun without seeing the sun itself, because they are sensitive to polarized light, and the scattered light of the sky (i.e. the blue sky as we see it) is polarized. The bee can also notice flicker up to 200 oscillations per second, while we see it only up to 20: our averaging time is about a tenth of a second, which seems short to us, but the averaging time of the bee is much shorter. So we cannot see the quick leg movements and/or wing vibrations of bees, but the bee can!

Sometimes we can’t see any color. For example, we see the night sky in ‘black and white’ because the light intensity is very low, and so it’s our rods, not the cones, that process the signal, and so these rods can’t ‘see’ color. So those beautiful color pictures of nebulae are not artificial (although the pictures are often enhanced). It’s just that the camera that is used to take those pictures (film or, nowadays, digital) is much more sensitive than our eye.

Regardless, color is a quality which we add to our experience of the outside world ourselves. What’s out there are electromagnetic waves with this or that wavelength (or, what amounts to the same, this or that frequency). So when critics of the exact sciences say so much is lost when looking at (visible) light as an electromagnetic wave in the range of 430 to 790 terahertz, they’re wrong. Those critics will say that physics reduces reality. That is not the case.
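To see why wavelength and frequency ‘amount to the same’, here is a small Python sketch that converts the 430 to 790 terahertz range into vacuum wavelengths using λ = c/f. The helper name wavelength_nm is my own, not something from a library.

```python
c = 299_792_458.0  # speed of light in m/s (exact SI value)

def wavelength_nm(frequency_thz):
    """Vacuum wavelength in nanometers for a frequency given in terahertz."""
    frequency_hz = frequency_thz * 1e12
    return c / frequency_hz * 1e9

red_edge = wavelength_nm(430)     # low-frequency (red) end of the visible range
violet_edge = wavelength_nm(790)  # high-frequency (violet) end

print(round(red_edge), "nm to", round(violet_edge), "nm")
```

So the frequency range quoted above corresponds to wavelengths of roughly 700 nm down to roughly 380 nm.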

What’s going on is that our senses process the signal that they are receiving, especially when it comes to vision. As Feynman puts it: “None of the other senses involves such a large amount of calculation, so to speak, before the signal gets into a nerve that one can make measurements on. The calculations for all the rest of the senses usually happen in the brain itself, where it is very difficult to get at specific places to make measurements, because there are so many interconnections. Here, with the visual sense, we have the light, three layers of cells making calculations, and the results of the calculations being transmitted through the optic nerve.”

Hence, things like color and all of the other sensations that we have are the object of study of other sciences, including biochemistry and neurobiology, or physiology. For all we know, what’s ‘out there’ is, effectively, just ‘boring’ stuff, like electromagnetic radiation, energy and ‘elementary particles’—whatever they are. No colors. Just frequencies. 🙂

Light versus matter

If we accept the crazy ideas of quantum mechanics, then the what and the how become one and the same. Hence we can say that photons and electrons are a wavefunction somewhere in space. Photons, of course, are always traveling, because they have energy but no rest mass. Hence, all their energy is in the movement: it’s kinetic, not potential. Electrons, on the other hand, usually stick around some nucleus. And, let’s not forget, they have an electric charge, so their energy is not only kinetic but also potential.

But, otherwise, it’s the same type of ‘thing’ in quantum mechanics: a wavefunction, like those below.

Why diagram A and B? It’s just to emphasize the difference between a real-valued wave function and those ‘probability waves’ we’re looking at here (diagram C to H). A and B represent a mass on a spring, oscillating at more or less the same frequency but a different amplitude. The amplitude here means the displacement of the mass. The function describing the displacement of a mass on a spring (so that’s diagram A and B) is an example of a real-valued wave function: it’s a simple sine or cosine function, as depicted below. [Note that a sine and a cosine are the same function really, except for a phase difference of 90°.]
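For what it’s worth, the remark about sine and cosine is easy to check numerically. The sketch below models the mass on a spring as A·cos(ω·t + phase); the amplitude, the frequency and the function name displacement are all assumptions for illustration.

```python
import math

def displacement(t, amplitude=1.0, omega=2 * math.pi, phase=0.0):
    """Real-valued displacement A*cos(omega*t + phase) of a mass on a spring."""
    return amplitude * math.cos(omega * t + phase)

# Shifting the cosine by 90 degrees (pi/2 radians) turns it into a sine.
for t in (0.0, 0.1, 0.25, 0.4):
    shifted_cosine = displacement(t, phase=-math.pi / 2)
    sine = math.sin(2 * math.pi * t)
    print(t, shifted_cosine, sine)  # the last two columns agree
```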

Let’s now go back to our ‘probability waves’. Photons and electrons, light and matter… The same wavefunction? Really? How can the sunlight that warms us up in the morning and makes trees grow be the same as our body, or the tree? The light-matter duality that we experience must be rooted in very different realities, mustn’t it?

Well… Yes and no. If we’re looking at one photon or one electron only, it’s the same type of wavefunction indeed. The same type… OK, you’ll say. So they are the same family or genus perhaps, as they say in biology. Indeed, both of them are, obviously, being referred to as ‘elementary particles’ in the so-called Standard Model of physics. But so what makes an electron and a photon specific as a species? What are the differences?

There’re quite a few, obviously:

1. First, as mentioned above, a photon is a traveling wave function and, because it has no rest mass, it travels at the ultimate speed, i.e. the speed of light (c). An electron usually sticks around or, if it travels through a wire, it travels at very low speeds. Indeed, you may find it hard to believe, but the drift velocity of the free electrons in a standard copper wire is measured in cm per hour, so that’s very slow indeed—and while the electrons in an electron microscope beam may be accelerated up to 70% of the speed of light, and close to c in those huge accelerators, you’re not likely to find an electron microscope or accelerator in Nature. In fact, you may want to remember that a simple thing like electricity going through copper wires in our houses is a relatively modern invention. 🙂
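If you find that ‘cm per hour’ figure hard to believe, you can check the order of magnitude yourself with the textbook relation v = I/(n·e·A). The sketch below assumes a current of 1 A through a copper wire with a cross-section of 1 mm², and a free-electron density of about 8.5×10²⁸ per m³ for copper. These inputs are my assumptions, not numbers from the text above.

```python
ELEMENTARY_CHARGE = 1.602176634e-19  # in coulombs (exact SI value)

def drift_velocity(current, electron_density, cross_section):
    """Drift velocity v = I / (n * e * A), in m/s."""
    return current / (electron_density * ELEMENTARY_CHARGE * cross_section)

# Assumed, illustrative inputs.
current = 1.0              # amperes
electron_density = 8.5e28  # free electrons per cubic meter (approximate, copper)
cross_section = 1.0e-6     # square meters, i.e. 1 mm²

v = drift_velocity(current, electron_density, cross_section)
v_cm_per_hour = v * 100 * 3600
print(v_cm_per_hour)       # a few tens of centimeters per hour
```

With other currents or wire gauges the number changes, of course, but it stays in the ‘cm per hour’ ballpark for ordinary household wiring.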

So, yes, those oscillating wave functions in those diagrams above are likely to represent some electron, rather than a photon. To be precise, the wave functions above are examples of standing (or stationary) waves, while a photon is a traveling wave: just extend that sine and cosine function in both directions if you’d want to visualize it or, even better, think of a sine and cosine function in an envelope traveling through space, such as the one depicted below.

Indeed, while the wave function of our photon is traveling through space, it is likely to be limited in space because, when everything is said and done, our photon is not everywhere: it must be somewhere.

At this point, it’s good to pause and think about what is traveling through space. It’s the oscillation. But what’s the oscillation? There is no medium here, and even if there were some medium (like water or air or something like aether – which, let me remind you, isn’t there!), the medium itself would not be moving, or – I should be precise here – it would only move up and down as the wave propagates through space, as illustrated below. To be fully complete, I should add we also have longitudinal waves, like sound waves (pressure waves): in that case, the particles oscillate back and forth along the direction of wave propagation. But you get the point: the medium does not travel with the wave.

When talking electromagnetic waves, we have no medium. These E and B vectors oscillate but it is very wrong to assume they use ‘some core of nearby space’, as Feynman puts it. They don’t. Those field vectors represent a condition at one specific point (admittedly, a point along the direction of travel) in space but, for all we know, an electromagnetic wave travels in a straight line and, hence, we can’t talk about its diameter or so.

Still, as mentioned above, we can imagine, more or less, what E and B stand for (we can use field lines to visualize them, for instance), even if we have to take into account their relativity (calculating their values from a moving reference frame results in different mixtures of E and B). But what are those amplitudes? How should we visualize them?

The honest answer is: we can’t. They are what they are: two mathematical quantities which, taken together, form a two-dimensional vector, which we square to find a value for a real-life probability, which is something that – unlike the amplitude concept – does make sense to us. Still, that representation of a photon above (i.e. the traveling envelope with a sine and cosine inside) may help us to ‘understand’ it somehow. Again, you absolutely have to get rid of the idea that these ‘oscillations’ would somehow occupy some physical space. They don’t. The wave itself has some definite length, for sure, but that’s a measurement in the direction of travel, which is often denoted as x when discussing uncertainty in its position, for example, as in the famous Uncertainty Principle (ΔxΔp ≥ ħ/2).

You’ll say: Oh! But then, at the very least, we can talk about the ‘length’ of a photon, can’t we? So then a photon is one-dimensional at least, not zero-dimensional! The answer is yes and no. I’ve talked about this before and so I’ll be short(er) on it now. A photon is emitted by an atom when an electron jumps from one energy level to another. It thereby emits a wave train that lasts about 10⁻⁸ seconds. That’s not very long but, taking into account the rather spectacular speed of light (3×10⁸ m/s), that still makes for a wave train with a length of not less than 3 meters. […] That’s quite a length, you’ll say. You’re right. But you forget that light travels at the speed of light and, hence, we will see this length as zero because of the relativistic length contraction effect. So… Well… Let me get back to the question: if photons and electrons are both represented by a wavefunction, what makes them different?
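The arithmetic behind that 3-meter wave train is just length = speed × duration. Here is the one-line calculation as a Python sketch, using the rounded values quoted above (the variable names are mine).

```python
SPEED_OF_LIGHT = 3.0e8     # m/s, the rounded value used above
emission_time = 1.0e-8     # seconds, the approximate duration of the wave train

wave_train_length = SPEED_OF_LIGHT * emission_time
print(wave_train_length)   # about 3 meters
```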

2. A more fundamental difference between photons and electrons is how they interact with each other.

From what I’ve written above, you understand that probability amplitudes are complex numbers, or ‘arrows’, or ‘two-dimensional vectors’. [Note that all of these terms have precise mathematical definitions and so they’re actually not the same, but the difference is too subtle to matter here.] Now, there are two ways of combining amplitudes, which are referred to as ‘positive’ and ‘negative’ interference respectively. I should immediately note that there’s actually nothing ‘positive’ or ‘negative’ about the interaction: we’re just putting two arrows together, and there are two ways to do that. That’s all.

The diagrams below show you these two ways. You’ll say: there are four! However, remember that we square an arrow to get a probability. Hence, the direction of the final arrow doesn’t matter when we’re taking the square: we get the same probability. It’s the direction of the individual amplitudes that matters when combining them. So the square of A+B is the same as the square of –(A+B) = –A+(–B) = –A–B. Likewise, the square of A–B is the same as the square of –(A–B) = –A+B.
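The point about the four diagrams collapsing into two can be checked with ordinary complex numbers. In the sketch below, A and B are made-up arrows, and prob is just a helper name for ‘take the absolute square’.

```python
# Two made-up arrows (amplitudes), written as complex numbers.
A = complex(0.3, 0.4)
B = complex(-0.1, 0.2)

def prob(arrow):
    """Absolute square of an arrow, i.e. the probability it stands for."""
    return abs(arrow) ** 2

# Flipping the final arrow does not change the probability...
print(prob(A + B), prob(-A - B))   # same value twice
print(prob(A - B), prob(-A + B))   # same value twice

# ...but adding versus subtracting does: these are the two distinct cases.
print(prob(A + B) != prob(A - B))
```

With these particular arrows, adding gives a probability of about 0.40 and subtracting gives about 0.20, so the two ways of combining arrows really are different.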

These are the only two logical possibilities for combining arrows. I’ve written ad nauseam about this elsewhere: see my post on amplitudes and statistics, and so I won’t go into too much detail here. Or, in case you’d want something less than a full mathematical treatment, I can refer you to my previous post also, where I talked about the ‘stopwatch’ and the ‘phase’: the convention for the stopwatch is to have its hand turn clockwise (obviously!) while, in quantum physics, the phase of a wave function will turn counterclockwise. But so that’s just convention and it doesn’t matter, because it’s the phase difference between two amplitudes that counts. To use plain language: it’s the difference in the angles of the arrows, and so that difference is just the same if we reverse the direction of both arrows (which is equivalent to putting a minus sign in front of the final arrow).

OK. Let me get back to the lesson. The point is: this logical or mathematical dichotomy distinguishes bosons (i.e. force-carrying ‘particles’, like photons, which carry the electromagnetic force) from fermions (i.e. ‘matter-particles’, such as electrons and quarks, which make up protons and neutrons). Indeed, the so-called ‘positive’ and ‘negative’ interference leads to two very different behaviors:

The probability of getting a boson where there are already n present, is n+1 times stronger than it would be if there were none before.
In contrast, the probability of getting two electrons into exactly the same state is zero.
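Here is a minimal sketch of how those two rules follow from adding versus subtracting arrows, for the simplest case of two particles ending up in the same state. I assume that the ‘direct’ and the ‘exchanged’ way of getting there have the same amplitude f (a made-up number), which is the situation when the two final states coincide. This is a toy illustration in the spirit of the argument in Feynman’s Lectures, not a full treatment.

```python
# Made-up amplitude for one way of ending up in the final state.
f = complex(0.1, 0.05)

direct = f      # particle 1 goes to the state first, particle 2 second
exchanged = f   # the same, with the particles swapped (equal for identical states)

# Distinguishable particles: we could tell the two ways apart, so add probabilities.
p_distinguishable = abs(direct) ** 2 + abs(exchanged) ** 2

# Bosons: add the arrows first ('positive' interference), then square.
p_bosons = abs(direct + exchanged) ** 2

# Fermions: subtract the arrows ('negative' interference), then square.
p_fermions = abs(direct - exchanged) ** 2

print(p_bosons / p_distinguishable)  # 2, i.e. n + 1 with n = 1 boson already there
print(p_fermions)                    # 0: no two electrons in exactly the same state
```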

The behavior of photons makes lasers possible: we can pile zillions of photons on top of each other, and then release all of them in one powerful burst. [The ‘flickering’ of a laser beam is due to the quick succession of such light bursts. If you want to know how it works in detail, check my post on lasers.]

The behavior of electrons is referred to as Pauli’s exclusion principle: it is only because real-life electrons can have one of two spin polarizations (i.e. two opposite directions of angular momentum, which are referred to as ‘up’ or ‘down’, but they might as well have been referred to as ‘left’ or ‘right’) that we find two electrons (instead of just one) in any atomic or molecular orbital.

So, yes, while both photons and electrons can be described by a similar-looking wave function, their behavior is fundamentally different indeed. How is that possible? Adding and subtracting ‘arrows’ is a very similar operation, isn’t it?

It is and it isn’t. From a mathematical point of view, I’d say: yes. From a physics point of view, it’s obviously not very ‘similar’, as it does lead to these two very different behaviors: the behavior of photons allows for laser shows, while the behavior of electrons explains (almost) all the peculiarities of the material world, including us walking into doors. 🙂 If you want to check it out for yourself, just check Feynman’s Lectures for more details on this or, else, re-read my posts on it indeed.

3. Of course, there are even more differences between photons and electrons than the two key differences I mentioned above. Indeed, I’ve simplified a lot when I wrote what I wrote above. The wavefunctions of electrons in orbit around a nucleus can take very weird shapes, as shown in the illustration below—and please do google a few others if you’re not convinced. As mentioned above, they’re so-called standing waves, because they occupy a well-defined position in space only, but standing waves can look very weird. In contrast, traveling plane waves, or envelope curves like the one above, are much simpler.

In short: yes, the mathematical representation of photons and electrons (i.e. the wavefunction) is very similar, but photons and electrons are very different animals indeed.

Potentiality and interconnectedness

I guess that, by now, you agree that quantum theory is weird but, as you know, quantum theory does explain all of the stuff that couldn’t be explained before: “It works like a charm”, as Feynman puts it. In fact, he’s often quoted as having said the following:

“It is often stated that of all the theories proposed in this century, the silliest is quantum theory. Some say that the only thing that quantum theory has going for it, in fact, is that it is unquestionably correct.”

Silly? Crazy? Uncommon-sensy? Truth be told, you do get used to thinking in terms of amplitudes after a while. And, when you get used to them, those ‘complex’ numbers are no longer complicated. 🙂 Most importantly, when one thinks long and hard enough about it (as I am trying to do), it somehow all starts making sense.

For example, we’ve done away with dualism by adopting a unified mathematical framework, but the distinction between bosons and fermions still stands: an ‘elementary particle’ is either this or that. There are no ‘split personalities’ here. So the dualism just pops up at a different level of description, I’d say. In fact, I’d go one step further and say it pops up at a deeper level of understanding.

But what about the other assumptions in quantum mechanics? Some of them don’t make sense, do they? Well… I struggled for quite a while with the assumption that, in quantum mechanics, anything is possible really. For example, a photon (or an electron) can take any path in space, and it can travel at any speed (including speeds that are lower or higher than light). The probability may be extremely low, but it’s possible.

Now that is a very weird assumption. Why? Well… Think about it. If you enjoy watching soccer, you’ll agree that flying objects (I am talking about the soccer ball here) can have amazing trajectories. Spin, lift, drag, whatever—the result is a weird trajectory, like the one below:

But, frankly, a photon taking the ‘southern’ route in the illustration below? What are the ‘wheels and gears’ there? There’s nothing sensible about that route, is there?

In fact, there are at least three issues here:

First, you should note that strange curved paths in the real world (such as the trajectories of billiard or soccer balls) are possible only because there’s friction involved—between the felt of the pool table cloth and the ball, or between the balls, or, in the case of soccer, between the ball and the air. There’s no friction in the vacuum. Hence, in empty space, all things should go in a straight line only.
While it’s quite amazing what’s possible, in the real world that is, in terms of ‘weird trajectories’, even the weirdest trajectories of a billiard or soccer ball can be described by a ‘nice’ mathematical function. We obviously can’t say the same of that ‘southern route’ which a photon could follow, in theory that is. Indeed, you’ll agree the function describing that trajectory cannot be ‘nice’. So even if we’d allow all kinds of ‘weird’ trajectories, shouldn’t we limit ourselves to ‘nice’ trajectories only? I mean: it doesn’t make sense to allow the photons traveling from your computer screen to your retina to take some trajectory to the Sun and back, does it?
Finally, and most fundamentally perhaps, even when we would assume that there’s some mechanism combining (a) internal ‘wheels and gears’ (such as spin or angular momentum) with (b) felt or air or whatever medium to push against, what would be the mechanism determining the choice of the photon in regard to these various paths? In Feynman’s words: How does the photon ‘make up its mind’?

Feynman answers these questions, fully or partially (I’ll let you judge), when discussing the double-slit experiment with photons:

“Saying that a photon goes this or that way is false. I still catch myself saying, “Well, it goes either this way or that way,” but when I say that, I have to keep in mind that I mean in the sense of adding amplitudes: the photon has an amplitude to go one way, and an amplitude to go the other way. If the amplitudes oppose each other, the light won’t get there—even though both holes are open.”

It’s probably worth recalling the results of that experiment here – if only to help you judge whether or not Feynman fully answers those questions above!

The set-up is shown below. We have a source S, two slits (A and B), and a detector D. The source sends photons out, one by one. In addition, we have two special detectors near the slits, which may or may not detect a photon, depending on whether or not they’re switched on as well as on their accuracy.

First, we close one of the slits, and we find that 1% of the photons go through the other (so that’s one photon for every 100 photons that leave S). Now, we open both slits to study interference. You know the results already:

If we switch the detectors off (so we have no way of knowing where the photon went), we get interference. The interference pattern depends on the distance between A and B and varies from 0% to 4%, as shown in diagram (a) below. That’s pretty standard. As you know, classical theory can explain that too, assuming light is an electromagnetic wave. But so we have blobs of energy – photons – traveling one by one. So it’s really like that double-slit experiment with electrons, or whatever other microscopic particles (as you know, they’ve done these interference experiments with large molecules as well, and they get the same result!). We get the interference pattern by using those quantum-mechanical rules to calculate probabilities: we first add the amplitudes, and it’s only when we’re finished adding those amplitudes that we square the resulting arrow to get the final probability.
If we switch those special detectors on, and if they are 100% reliable (i.e. all photons going through are being detected), then our photon suddenly behaves like a particle, instead of as a wave: it will go through one of the slits only, i.e. either through A, or, alternatively, through B. So the two special detectors never go off together. Hence, as Feynman puts it: we shouldn’t think there is some “sneaky way that the photon divides in two and then comes back together again.” It’s one or the other way, and there’s no interference: the detector at D goes off 2% of the time, which is the simple sum of the probabilities for A and B (i.e. 1% + 1%).
When the special detectors near A and B are not 100% reliable (and, hence, do not detect all photons going through), we have three possible final conditions: (i) A and D go off, (ii) B and D go off, and (iii) D goes off alone (none of the special detectors went off). In that case, we have a final curve that’s a mixture, as shown in diagram (c) and (d) below. We get it using the same quantum-mechanical rules: we add amplitudes first, and then we square to get the probabilities.
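The three cases above can be captured in one small Python sketch. Each slit gets an arrow of length 0.1 (so that one slit alone gives 1%), the phase difference between the two arrows stands in for the position of D, and the reliability of the special detectors is a number r between 0 and 1. Modeling an unreliable detector by splitting each arrow into a ‘detected’ part of relative length √r and an ‘undetected’ part of relative length √(1 – r) is my simplifying assumption, as are all the names used below.

```python
import cmath
import math

SINGLE_SLIT_PROBABILITY = 0.01
ARROW_LENGTH = math.sqrt(SINGLE_SLIT_PROBABILITY)   # 0.1

def probability_at_D(phase_difference, reliability):
    """Probability that D goes off, for a given phase difference between the
    two paths and a given reliability (0 to 1) of the special detectors."""
    a = complex(ARROW_LENGTH, 0.0)                   # arrow for the path via A
    b = cmath.rect(ARROW_LENGTH, phase_difference)   # arrow for the path via B

    # (i) A and D go off, (ii) B and D go off: we know which way the photon
    # went, so these are distinguishable outcomes and we add probabilities.
    p_A_and_D = reliability * abs(a) ** 2
    p_B_and_D = reliability * abs(b) ** 2

    # (iii) D goes off alone: no way of knowing, so we add the arrows first
    # and square afterwards.
    undetected = math.sqrt(1.0 - reliability)
    p_D_alone = abs(undetected * a + undetected * b) ** 2

    return p_A_and_D + p_B_and_D + p_D_alone

print(probability_at_D(0.0, 0.0))       # detectors off, arrows aligned: about 4%
print(probability_at_D(math.pi, 0.0))   # detectors off, arrows opposed: about 0%
print(probability_at_D(math.pi, 1.0))   # perfect detectors: 2%, no interference
print(probability_at_D(0.0, 0.5))       # unreliable detectors: a mixture, about 3%
```

With the detectors off, the result swings between 0% and 4%. With perfect detectors, it is stuck at 2% whatever the phase difference. Anything in between gives a washed-out version of the interference pattern.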

Now, I think you’ll agree with me that Feynman doesn’t answer my (our) question in regard to the ‘weird paths’. In fact, all of the diagrams he uses assume straight or nearby paths. Let me re-insert two of those diagrams below, to show you what I mean.

So where are all the strange non-linear paths here? Let me, in order to make sure you get what I am saying here, insert that illustration with the three crazy routes once again. What we’ve got above (Figure 33 and 34) is not like that. Not at all: we’ve got only straight lines there! Why? The answer to that question is easy: the crazy paths don’t matter because their amplitudes cancel each other out, and so that allows Feynman to simplify the whole situation and show all the relevant paths as straight lines only.

Now, I struggled with that for quite a while. Not because I can’t see the math or the geometry involved. No. Feynman does a great job showing why those amplitudes cancel each other out indeed (if you want a summary, see my previous post once again). My ‘problem’ is something else. It’s hard to phrase it, but let me try: why would we even allow for the logical or mathematical possibility of ‘weird paths’ (and let me again insert that stupid diagram below) if our ‘set of rules’ ensures that the truly ‘weird’ paths (like that photon traveling from your computer screen to your eye doing a detour taking it to the Sun and back) cancel each other out anyway? Does that respect Occam’s Razor? Can’t we devise some theory including ‘sensible’ paths only?

Of course, I am just an autodidact with limited time, and I know hundreds (if not thousands) of the best scientists have thought long and hard about this question and, hence, I readily accept the answer is quite simply: no. There is no better theory. I accept that answer, ungrudgingly, not only because I think I am not so smart as those scientists but also because, as I pointed out above, one can’t explain any path that deviates from a straight line really, as there is no medium, so there are no ‘wheels and gears’. The only path that makes sense is the straight line, and that’s only because…

Well… Thinking about it… We think the straight path makes sense because we have no good theory for any of the other paths. Hmm… So, from a logical point of view, assuming that the straight line is the only reasonable path is actually pretty random too. When push comes to shove, we have no good theory for the straight line either!

You’ll say I’ve just gone crazy. […] Well… Perhaps you’re right. 🙂 But… Somehow, it starts to make sense to me. We allow for everything and then, indeed, weed out the crazy paths using our interference theory, and so we do end up with what we’re ending up with: some kind of vague idea of “light not really traveling in a straight line but ‘smelling’ all of the neighboring paths around it and, hence, using a small core of nearby space”, as Feynman puts it.

Hmm… It brings me back to Richard Feynman’s introduction to his wonderful little book, in which he says we should just be happy to know how Nature works and not aspire to know why it works that way. In fact, he’s basically saying that, when it comes to quantum mechanics, the ‘how’ and the ‘why’ are one and the same, so asking ‘why’ doesn’t make sense, because we know ‘how’. He compares quantum theory with the system of calculation used by the Maya priests, which was based on a system of bars and dots, which helped them to do complex multiplications and divisions, for example. He writes the following about it: “The rules were tricky, but they were a much more efficient way of getting an answer to complicated questions (such as when Venus would rise again) than by counting beans.”

When I first read this, I thought the comparison was flawed: if a common Maya Indian did not want to use the ‘tricky’ rules of multiplication and what have you (or, more likely, if he didn’t understand them), he or she could still resort to counting beans. But how do we count beans in quantum mechanics? We have no ‘simpler’ rules than those weird rules about adding amplitudes and taking the (absolute) square of complex numbers so… Well… We actually are counting beans here then:

We allow for any possibility—any path: straight, curved or crooked. Anything is possible.
But all those possibilities are inter-connected. Also note that every path has a mirror image: for every route ‘south’, there is a similar route ‘north’, so to say, except for the straight line, which is a mirror image of itself.
And then we have some clock ticking. Time goes by. It ensures that the paths that are too far removed from the straight line cancel each other. [Of course, you’ll ask: what is too far? But I answered that question – convincingly, I hope – in my previous post: it’s not about the ‘number of arrows’ (as suggested in the caption under that Figure 34 above), but about the frequency and, hence, the ‘wavelength’ of our photon.]
And so… Finally, what’s left is a limited number of possibilities that interfere with each other, which results in what we ‘see’: light seems to use a small core of space indeed–a limited number of nearby paths.
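The ‘bean counting’ described in the four steps above can actually be done on a computer. The toy model below is my own simplification, not Feynman’s figure: the source and the detector are two units apart, every path consists of two straight segments meeting at a midpoint at height y, and the arrow for a path turns once for every wavelength of path length. We then compare the average arrow for a bundle of paths near the straight line with the average arrow for an equally wide bundle of detours. The wavelength and the distances are arbitrary assumed values.

```python
import cmath
import math

WAVELENGTH = 0.01      # arbitrary units (assumed)
HALF_DISTANCE = 1.0    # source and detector are 2 units apart (assumed)

def path_length(y):
    """Length of a two-segment path from source to detector via height y."""
    return 2.0 * math.hypot(HALF_DISTANCE, y)

def mean_arrow_length(y_min, y_max, samples=2001):
    """Add one unit arrow per path for midpoints between y_min and y_max,
    and return the length of the final arrow divided by the number of paths."""
    total = 0j
    for i in range(samples):
        y = y_min + (y_max - y_min) * i / (samples - 1)
        phase = 2.0 * math.pi * path_length(y) / WAVELENGTH
        total += cmath.exp(1j * phase)
    return abs(total) / samples

near = mean_arrow_length(-0.05, 0.05)   # paths hugging the straight line
far = mean_arrow_length(0.95, 1.05)     # an equally wide bundle of detours

print(near)   # close to 1: these arrows point in nearly the same direction
print(far)    # close to 0: these arrows go round and round and cancel
```

If you make the wavelength smaller, the bundle of paths that still adds up gets narrower, which is the ‘small core of nearby space’ idea in numbers.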

You’ll say… Well… That still doesn’t ‘explain’ why the interference pattern disappears with those special detectors or – what amounts to the same – why the special detectors at the slits never click simultaneously.

You’re right. How do we make sense of that? I don’t know. You should try to imagine what happens for yourself. Everyone has his or her own way of ‘conceptualizing’ stuff, I’d say, and you may well be content and just accept all of the above without trying to ‘imagine’ what’s really happening when a ‘photon’ goes through one or both of those slits. In fact, that’s the most sensible thing to do. You should not try to imagine what happens and just follow the crazy calculus rules.

However, when I think about it, I do have some image in my head. The image is of one of those ‘touch-me-not’ weeds. I quickly googled one of these images, but I couldn’t quite find what I was looking for: it would be more like something that, when you touch it, curls up in a little ball. In any case… You know what I mean, I hope.

You’ll shake your head now and solemnly confirm that I’ve gone mad. Touch-me-not weeds? What’s that got to do with photons?

Well… It’s obvious you and I cannot really imagine what a photon looks like. But I think of it as a blob of energy indeed, which is inseparable, and which effectively occupies some space (in three dimensions that is). I also think that, whatever it is, it actually does travel through both slits, because, as it interferes with itself, the interference pattern does depend on the space between the two slits as well as the width of those slits. In short, the whole ‘geometry’ of the situation matters, and so the ‘interaction’ is some kind of ‘spatial’ thing. [Sorry for my awfully imprecise language here.]
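To make the point about geometry a bit more tangible, here is a small sketch using the standard textbook far-field (Fraunhofer) formula for two identical slits: a cos² interference term set by the slit separation, multiplied by a single-slit envelope set by the slit width. The function name and the numbers are mine and purely illustrative, and the far-field approximation is an assumption that the discussion above does not depend on.

```python
import math


def double_slit_intensity(sin_theta, wavelength, slit_separation, slit_width):
    """Relative far-field intensity I/I0 for two identical slits.

    Textbook Fraunhofer result:
        I/I0 = cos^2(pi*d*sin(theta)/lambda) * [sin(beta)/beta]^2,
        beta = pi*a*sin(theta)/lambda,
    with d the slit separation and a the slit width.
    """
    alpha = math.pi * slit_separation * sin_theta / wavelength
    beta = math.pi * slit_width * sin_theta / wavelength
    # The single-slit envelope tends to 1 as beta goes to 0.
    envelope = 1.0 if beta == 0.0 else (math.sin(beta) / beta) ** 2
    return math.cos(alpha) ** 2 * envelope


# Illustrative numbers: green-ish light, slits 0.1 mm apart and 0.02 mm wide.
wavelength = 500e-9
separation = 1e-4
width = 2e-5

# Direction of the first dark fringe for this slit separation.
first_dark = wavelength / (2 * separation)

print(double_slit_intensity(0.0, wavelength, separation, width))
print(double_slit_intensity(first_dark, wavelength, separation, width))
# Same direction, but with the slits twice as far apart: now a bright fringe.
print(double_slit_intensity(first_dark, wavelength, 2 * separation, width))
```

Changing the separation moves the fringes, while changing the width stretches or squeezes the envelope that modulates them, so both numbers show up in what we ‘see’ on the screen.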

Having said that, I think it’s being detected by one detector only because only one of them can sort of ‘hook’ it, somehow. Indeed, because it’s interconnected and inseparable, it’s the whole blob that gets hooked, not just one part of it. [You may or may not imagine that the detector that’s got the best hold of it gets it, but I think that’s pushing the description too much.] In any case, the point is that a photon is surely not like a lizard dropping its tail while trying to escape. Perhaps it’s some kind of unbreakable ‘string’ indeed – and sorry for summarizing string theory so unscientifically here – but then a string oscillating in dimensions we can’t imagine (or in some dimension we can’t observe, as the Kaluza-Klein theory suggests). It’s something, for sure, and something that stores energy in some kind of oscillation, I think.

What it is, exactly, we can’t imagine, and we’ll probably never find out—unless we accept that the how of quantum mechanics is not only the why, but also the what. 🙂

Does this make sense? Probably not but, if anything, I hope it fired your imagination at least. 🙂

The wavefunction

Energy and momentum in the wavefunction

Re-visiting the de Broglie relations

The wavefunction and relativistic length contraction

The two-state system in free space

The two-state system in a field

Switching to another representation

Intermezzo: on approximations

Solving the equations

Introducing uncertainty

Thinking about uncertainty

Local energy conservation

Einstein’s car

The basic concepts: force, work, energy and potential

Energy density and energy flow in electrodynamics

Poynting’s vector in electrodynamics

The invariant (1−v²)^(−1/2)·d/dt operator and the proper time s

The four-force vector f_μ

The force and the tensor

The invariance of physics and the use of vector equations

Electrodynamics in relativistic notation

Four-vectors and invariance under Lorentz transformations

The electromagnetic tensor

The Lorentz transformation of the electric and magnetic fields

On Maxwell’s equations and the properties of empty space

On quantum oscillations, Planck’s constant, and Planck units

The fine-structure constant

Addendum: How to think about space and time?
