
# Occam’s Razor

The analysis of a two-state system (i.e. the rather famous example of an ammonia molecule ‘flipping’ between its ‘up’ and ‘down’ states, or vice versa) in my previous post is a good opportunity to think about Occam’s Razor once more. What are we doing? What does the math *tell* us?

In the example we chose, we didn’t need to worry about space. It was all about *time*: an evolving *state* over time. We also knew the answers we wanted to get: if there is *some* probability for the system to ‘flip’ from one state to another, we know it *will*, at *some* point in time. We also want probabilities to add up to one, so we knew the graph below *had to be* the result we would find: if our molecule can be in two states only, and it starts off in one, then the probability that it will *remain* in that state will gradually decline, while the probability that it flips into the other state will gradually increase, which is what is depicted below.

However, the graph above is only a Platonic idea: we don’t bother to actually *verify *what state the molecule is in. If we did, we’d have to ‘re-set’ our t = 0 point, and start all over again. The wavefunction would collapse, as they say, because we’ve made a measurement. However, having said that, yes, in the physicist’s Platonic world of ideas, the probability functions above make perfect sense. They are beautiful. You should note, for example, that P_{1} (i.e. the probability to be in state 1) and P_{2} (i.e. the probability to be in state 2) add up to 1 *all of the time*, so we don’t need to integrate over a cycle or something: so it’s all *perfect!*

These probability functions are based on ideas that are even more Platonic: *interfering amplitudes*. Let me explain.

Quantum physics is based on the idea that these *probabilities* are determined by some wavefunction, a complex-valued *amplitude* that varies in time and space. It’s a two-dimensional thing, and then it’s not. It’s two-dimensional because it combines a sine and cosine, i.e. a real and an imaginary part, but the argument of the sine and the cosine is the same, and the sine and cosine are the same function, except for a phase shift equal to π/2. We write:

*a·e*^{−iθ }= *a·*cos(−θ) + *i·a·*sin(−θ) = *a·*cosθ − *i·a·*sinθ

The minus sign is there because it turns out that *Nature* measures angles, i.e. our *phase*, clockwise, rather than counterclockwise, so that’s *not *as per *our *mathematical convention. But that’s a minor detail, really. [It should give you some food for thought, though.] For the rest, the related graph is as simple as the formula:
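If you want to see this identity at work, here’s a quick numerical check (a minimal sketch in Python; the amplitude a = 2.5 and phase θ = 0.7 are arbitrary values, of course):

```python
import cmath
import math

a, theta = 2.5, 0.7  # arbitrary amplitude and phase

# a·e^(−iθ) written as a complex exponential...
z = a * cmath.exp(-1j * theta)

# ...and written out as a·cos(θ) − i·a·sin(θ)
w = complex(a * math.cos(theta), -a * math.sin(theta))

assert abs(z - w) < 1e-12
# The modulus is just a, regardless of the phase θ:
assert abs(abs(z) - a) < 1e-12
```

Note that last assertion: the modulus of *a·e*^{−iθ} is just *a*, no matter what θ is.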

Now, the *phase* of this wavefunction is written as θ = (ω·t − **k**∙**x**). Hence, ω determines how this wavefunction varies in time, and the wavevector **k** tells us how this wave varies in space. The young Frenchman *Comte* Louis de Broglie noted the mathematical similarity between the ω·t − **k**∙**x** expression and Einstein’s four-vector product p_{μ}x_{μ }= E·t − **p**∙**x**, which remains invariant under a Lorentz transformation. He also understood that the Planck-Einstein relation E = ħ·ω actually *defines* the energy unit and, therefore, that *any* frequency, *any oscillation* really, in space or in time, is to be expressed in terms of ħ.

[To be precise, the fundamental quantum of energy is h = ħ·2π, because that’s the energy of one *cycle*. To illustrate the point, think of the Planck-Einstein relation. It gives us the energy of a photon with frequency *f*: E_{γ} = h·*f*. If we re-write this equation as E_{γ}/*f* = h, and we do a dimensional analysis, we get: h = E_{γ}/*f* ⇔ 6.626×10^{−34} joule·second *= *[x joule]/[*f *cycles per second] ⇔ h* *= 6.626×10^{−34} joule *per cycle*. It’s only because we are expressing ω and k as *angular *frequencies (i.e. in *radians *per second or per meter, rather than in *cycles *per second or per meter) that we have to think of ħ = h/2π rather than h.]
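To make that ‘joule per cycle’ point concrete, here’s a small numerical illustration (a sketch; the 500 THz frequency is just a typical value for visible light):

```python
h = 6.626e-34   # Planck's constant, in joule·second (i.e. joule per cycle per second of cycles)
f = 5e14        # frequency of visible light, in cycles per second (500 THz)

# Planck-Einstein relation: the photon energy is E = h·f...
E_gamma = h * f

# ...and dividing the energy by the frequency recovers h, the energy per cycle:
assert abs(E_gamma / f - h) < 1e-45

# Expressed in electronvolt (1 eV ≈ 1.602×10⁻¹⁹ J), that's roughly 2 eV,
# which is indeed in the visible range:
E_in_eV = E_gamma / 1.602e-19
assert 2.0 < E_in_eV < 2.2
```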

Louis de Broglie connected the dots between some other equations too. He was fully familiar with the equations determining the phase and group velocity of *composite *waves, or a *wavetrain *that actually might represent a *wavicle *traveling through spacetime. In short, he boldly equated ω with ω = E/ħ and **k** with **k** = **p**/ħ, and all came out alright. It made perfect sense!

I’ve written enough about this. What I want to write about here is how all of this applies to the situation at hand: a simple two-state system that depends on time only. So its phase is θ = ω·t = (E_{0}/ħ)·t. What’s E_{0}? It is the *total* energy of the system, including the equivalent energy of the particle’s rest mass and any potential energy that may be there because of the presence of one or the other force field. What about kinetic energy? Well… We said it: in this case, there is no translational or linear momentum, so **p** = 0. So our Platonic wavefunction reduces to:

*a·e*^{−iθ }= *a·e*^{−(i/ħ)·(E0·t)}

Great! […] But… Well… No! The problem with this wavefunction is that it yields a *constant* probability. To be precise, when we take the *absolute* square of this wavefunction – which is what we do when calculating a probability from a wavefunction − we get P = *a*^{2}, *always*. The ‘normalization’ condition (i.e. the condition that probabilities have to add up to one) implies that P_{1} = P_{2} = *a*^{2} = 1/2. Makes sense, you’ll say, but the problem is that this doesn’t reflect reality: these probabilities do not evolve over time and, hence, our ammonia molecule never ‘flips’ from its ‘up’ to its ‘down’ state, or vice versa. In short, our wavefunction does *not* explain reality.
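That constant probability is easy to verify numerically. The sketch below uses arbitrary illustrative values (ħ = 1, E_{0} = 2, a = 0.5), but the result is general:

```python
import cmath

a, hbar, E0 = 0.5, 1.0, 2.0   # illustrative values only

def psi(t):
    """Single-frequency wavefunction a·e^(−(i/ħ)·E0·t)."""
    return a * cmath.exp(-1j * E0 * t / hbar)

# The absolute square — the probability — is a² at every instant:
probs = [abs(psi(t)) ** 2 for t in (0.0, 0.1, 1.0, 7.3)]
assert all(abs(p - a ** 2) < 1e-12 for p in probs)
```

No matter which t we pick, |ψ|² comes out as a² = 0.25: nothing ever ‘flips’.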

The problem is not unlike the problem we’d had with a similar function relating the *momentum *and the *position *of a particle. You’ll remember it: we wrote it as *a·e*^{−iθ }= *ae*^{(}^{i/ħ)·(p·x)}. [Note that we can write *a·e*^{−iθ }= *a*·e^{−(}^{i/ħ)·(E0·t − p·x)}^{ }= *a*·*e*^{−(}^{i}^{/ħ)·(E}^{0·t)}·*e*^{(}^{i/ħ)·(p·x)}, so we can always split our wavefunction in a ‘time’ and a ‘space’ part.] But then we found that this wavefunction also yielded a constant and equal probability all over space, which implies our particle is *everywhere* (and, therefore, *no*where, really).

In quantum physics, this problem is solved by introducing *uncertainty*. Introducing some uncertainty about the energy, or about the momentum, is mathematically equivalent to saying that we’re actually looking at a *composite *wave, i.e. the* sum* of a finite or infinite set of *component* waves. So we have the same ω = E/ħ and **k** = **p**/ħ relations, but we apply them to *n* energy levels, or to some continuous *range *of energy levels ΔE. It amounts to saying that our wave function doesn’t have a specific frequency: it now has *n* frequencies, or a *range* of frequencies Δω = ΔE/ħ.

We know what that does: it ensures our wavefunction is being ‘contained’ in some ‘envelope’. It becomes a wavetrain, or a kind of *beat* note, as illustrated below:

[The animation also shows the difference between the *group *and *phase *velocity: the green dot shows the group velocity, while the red dot travels at the phase velocity.]
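The ‘envelope’ effect is easy to reproduce numerically: just add a bunch of component waves whose (angular) frequencies span some range Δω. The sketch below (all values are arbitrary) sums 21 equal-amplitude components around a center frequency: the components interfere constructively near t = 0 and destructively away from it, so the wavetrain is ‘contained’.

```python
import cmath

omega0, domega, n = 10.0, 1.0, 21   # center frequency, spread Δω, number of components
freqs = [omega0 + domega * (k / (n - 1) - 0.5) for k in range(n)]

def packet(t):
    """Normalized sum of n equal-amplitude component waves e^(−iωt)."""
    return sum(cmath.exp(-1j * w * t) for w in freqs) / n

# Near t = 0 all components are in phase: the envelope has its maximum...
assert abs(packet(0.0)) > 0.99
# ...while away from t = 0 they interfere destructively, and the envelope is small:
assert abs(packet(20.0)) < 0.3
```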

This begs the following question: what’s the uncertainty really? Is it an uncertainty in the energy, or is it an uncertainty in the wavefunction? I mean: we have a function relating the energy to a frequency. Introducing some uncertainty about the energy is mathematically equivalent to introducing uncertainty about the frequency. Of course, the answer is: the uncertainty is *in both*, so it’s in the* *frequency *and* in the energy and both are related through the wavefunction. So… Well… Yes. In some way, we’re chasing our own tail. 🙂

However, the trick does the job, and perfectly so. Let me summarize what we did in the previous post: we had the ammonia molecule, i.e. an NH_{3} molecule, with the nitrogen ‘flipping’ across the hydrogens from time to time, as illustrated below:

This ‘flip’ requires energy, which is why we associate *two* energy levels with the molecule, rather than just one. We wrote these two energy levels as E_{0 }+ A and E_{0 }− A. *That* assumption solved all of our problems. [Note that we don’t specify what the energy barrier really consists of: moving the center of mass obviously requires some energy, but it is likely that a ‘flip’ also involves overcoming some electrostatic forces, as shown by the reversal of the *electric* dipole moment in the illustration above.] To be specific, it gave us the following wavefunctions for the *amplitude* to be in the ‘up’ or ‘1’ state versus the ‘down’ or ‘2’ state respectively:

- C_{1 }= (1/2)·*e*^{−(i/ħ)·(E0 − A)·t }+ (1/2)·*e*^{−(i/ħ)·(E0 + A)·t}
- C_{2 }= (1/2)·*e*^{−(i/ħ)·(E0 − A)·t }− (1/2)·*e*^{−(i/ħ)·(E0 + A)·t}

Both are *composite* waves. To be precise, they are the sum of two *component* waves with a *temporal* frequency equal to ω_{1 }= (E_{0 }− A)/ħ and ω_{2 }= (E_{0 }+ A)/ħ respectively. [As for the minus sign in front of the second term in the wave equation for C_{2}, −1 = *e*^{±iπ}, so + (1/2)·*e*^{−(i/ħ)·(E0 + A)·t }and − (1/2)·*e*^{−(i/ħ)·(E0 + A)·t} are the same wavefunction: they only differ because their *relative* phase is shifted by ±π.] So the so-called *base* states of the molecule *themselves* are associated with *two* different energy levels: it’s *not* like one state has more energy than the other.

You’ll say: so what?

Well… Nothing. That’s it really. That’s all I wanted to say here. The absolute square of those two wavefunctions gives us those time-dependent probabilities above, i.e. the graph we started this post with. So… Well… Done!

You’ll say: where’s the ‘envelope’? Oh! Yes! Let me tell you. The C_{1}(t) and C_{2}(t) equations can be re-written as:

- C_{1 }= *e*^{−(i/ħ)·E0·t}·(1/2)·[*e*^{(i/ħ)·A·t }+ *e*^{−(i/ħ)·A·t}]
- C_{2 }= *e*^{−(i/ħ)·E0·t}·(1/2)·[*e*^{(i/ħ)·A·t }− *e*^{−(i/ħ)·A·t}]

Now, remembering our rules for adding and subtracting complex conjugates (*e*^{iθ} + *e*^{–iθ} = 2cosθ and *e*^{iθ} − *e*^{–iθ} = 2*i*·sinθ), we can re-write this as:

- C_{1 }= *e*^{−(i/ħ)·E0·t}·cos(A·t/ħ)
- C_{2 }= *i*·*e*^{−(i/ħ)·E0·t}·sin(A·t/ħ)

So there we are! We’ve got wave equations whose *temporal* variation is basically defined by E_{0 }but, on top of that, we have an envelope here: the cos(A·t/ħ) and sin(A·t/ħ) factor respectively. So their *magnitude *is no longer time-independent: both the phase *as well as the amplitude* now vary with time. The associated probabilities are the ones we plotted:

- |C_{1}(t)|^{2 }= cos^{2}[(A/ħ)·t], and
- |C_{2}(t)|^{2 }= sin^{2}[(A/ħ)·t].
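You can check the whole chain numerically: start from the two-component sums for C_{1} and C_{2}, take the absolute squares, and verify that they equal cos² and sin² and add up to one at all times. A sketch (the values ħ = 1, E_{0} = 5 and A = 1 are arbitrary):

```python
import cmath
import math

hbar, E0, A = 1.0, 5.0, 1.0   # illustrative values only

def C1(t):
    """Amplitude to be in state 1: sum of the two component waves."""
    return (0.5 * cmath.exp(-1j * (E0 - A) * t / hbar)
            + 0.5 * cmath.exp(-1j * (E0 + A) * t / hbar))

def C2(t):
    """Amplitude to be in state 2: difference of the same two component waves."""
    return (0.5 * cmath.exp(-1j * (E0 - A) * t / hbar)
            - 0.5 * cmath.exp(-1j * (E0 + A) * t / hbar))

for t in (0.0, 0.3, 1.0, 2.7):
    P1, P2 = abs(C1(t)) ** 2, abs(C2(t)) ** 2
    assert abs(P1 - math.cos(A * t / hbar) ** 2) < 1e-12   # P1 = cos²(A·t/ħ)
    assert abs(P2 - math.sin(A * t / hbar) ** 2) < 1e-12   # P2 = sin²(A·t/ħ)
    assert abs(P1 + P2 - 1.0) < 1e-12                      # probabilities add up to one
```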

So, to summarize it all once more, allowing the nitrogen atom to push its way through the three hydrogens, so as to flip to the other side, thereby overcoming the energy barrier, is equivalent to associating *two* energy levels to the ammonia molecule as a whole, thereby introducing some *uncertainty*, or *indefiniteness* as to its energy, and *that*, in turn, gives us the amplitudes and probabilities that we’ve just calculated. [And you may want to note here that the probabilities “sloshing back and forth”, or “dumping into each other” – as Feynman puts it – is the result of the varying *magnitudes* of our amplitudes, so that’s the ‘envelope’ effect. It’s only because the magnitudes vary in time that their *absolute square*, i.e. the associated *probability*, varies too.]

So… Well… That’s it. I think this and all of the previous posts served as a nice introduction to quantum physics. More in particular, I hope this post made you appreciate that the mathematical framework is not as horrendous as it often seems to be.

When thinking about it, it’s actually all quite straightforward, and it surely respects Occam’s *principle of parsimony* in philosophical and scientific thought, also known as *Occam’s Razor*: “When trying to explain something, it is vain to do with more what can be done with less.” So the math we need is the math we need, really: nothing more, nothing less. As I’ve said a couple of times already, **Occam would have loved the math behind QM**: the physics calls for the math, and the math becomes the physics.

That’s what makes it beautiful. 🙂

**Post scriptum**:

One might think that the addition of a term in the argument in itself would lead to a beat note and, hence, a varying probability but, no! We may look at *e*^{−(i/ħ)·(E0 + A)·t }as a *product *of two amplitudes:

*e*^{−(i/ħ)·(E0 + A)·t }= *e*^{−(i/ħ)·E0·t}·*e*^{−(i/ħ)·A·t}

But, when writing this all out, one just gets cos[(α+β)·t] − *i*·sin[(α+β)·t], whose *absolute* square is cos^{2}[(α+β)·t] + sin^{2}[(α+β)·t] = 1. However, writing *e*^{−(i/ħ)·(E0 + A)·t }as a *product* of two amplitudes in itself is interesting. We *multiply* amplitudes when an event consists of two sub-events. For example, the amplitude for some particle to go from *s* to *x* via some point *a* is written as:

〈 *x* | *s* 〉_{via a} = 〈 *x* | *a* 〉〈 *a* | *s* 〉

Having said that, the graph of the *product *is uninteresting: the real and imaginary part of the wavefunction are a simple sine and cosine function, and their absolute square is constant, as shown below.

*Adding *two waves with very different frequencies – A is a *fraction *of E_{0 }– gives a much more interesting pattern, like the one below, which shows an *e*^{−iαt}+*e*^{−iβt }= cos(αt)−*i*·sin(αt)+cos(βt)−*i*·sin(βt) = cos(αt)+cos(βt)−*i*·[sin(αt)+sin(βt)] pattern for α = 1 and β = 0.1.

That doesn’t look like a beat note, does it? The graphs below, which use 0.5 and 0.01 for β respectively, are not typical beat notes either.
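Still, the ‘envelope’ is there, mathematically: e^{−iαt} + e^{−iβt} = 2·cos[(α−β)·t/2]·e^{−i(α+β)·t/2}, so the *magnitude* of the sum is 2·|cos[(α−β)·t/2]| – a slow modulation when α and β are close. A quick numerical check, using the same α = 1 and β = 0.1 as above:

```python
import cmath
import math

alpha, beta = 1.0, 0.1   # the two (angular) frequencies used in the graph

for t in (0.0, 1.0, 5.0, 12.3):
    # Sum of the two component waves...
    s = cmath.exp(-1j * alpha * t) + cmath.exp(-1j * beta * t)
    # ...whose magnitude equals the slowly modulating envelope 2·|cos((α−β)·t/2)|:
    envelope = 2.0 * abs(math.cos((alpha - beta) * t / 2.0))
    assert abs(abs(s) - envelope) < 1e-12
```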

We get our typical ‘beat note’ only when we’re looking at a wave *traveling in space*, so then we involve the space variable *x* again, and the relations that come with it, i.e. a *phase* velocity v_{p }= ω/k = (E/ħ)/(p/ħ) = E/p = *c*^{2}/v (read: all *component* waves travel at the same speed), and a *group* velocity v_{g }= dω/dk = v (read: the *composite* wave or *wavetrain* travels at the classical speed of our particle, so it travels *with* the particle, so to speak). That’s what I’ve shown numerous times already, but I’ll insert one more animation here, just to make sure you see what we’re talking about. [Credit for the animation goes to another site, one on acoustics, actually!]

So what’s left? Nothing much. The only thing you may want to do is to continue thinking about that wavefunction. It’s tempting to think it actually *is *the particle, somehow. But it isn’t. So what is it then? Well… Nobody knows, really, but I like to think it does *travel *with the particle. So it’s like a fundamental *property *of the particle. We need it every time when we try to *measure* something: its position, its momentum, its spin (i.e. angular momentum) or, in the example of our ammonia molecule, its orientation in space. So the funny thing is that, in quantum mechanics,

- We can measure *probabilities* only, so there’s always some *randomness*. That’s how Nature works: we don’t really know what’s happening. We don’t know the internal wheels and gears, so to speak, or the ‘hidden variables’, as one interpretation of quantum mechanics would say. In fact, the most commonly accepted interpretation of quantum mechanics says *there are no ‘hidden variables’*.
- But then, as Polonius famously put it, there is method in this madness, and the *pioneers* – I mean Werner Heisenberg, Louis de Broglie, Niels Bohr, Paul Dirac, etcetera – discovered it: all probabilities can be found by taking the square of the absolute value of a complex-valued wavefunction (often denoted by Ψ), whose argument, or *phase* (θ), is given by the *de Broglie* relations ω = E/ħ and **k** = **p**/ħ:

θ = (ω·t − **k**∙**x**) = (E/ħ)·t − (**p**/ħ)·**x**

That should be obvious by now, as I’ve written dozens of posts on this by now. 🙂 I still have trouble interpreting this, however—and I am not ashamed, because the Great Ones I just mentioned have trouble with that too. But let’s try to go as far as we can by making a few remarks:

- Adding two terms in math implies the two terms should have the same *dimension*: we can only add apples to apples, and oranges to oranges. We shouldn’t mix them. Now, the (E/ħ)·t and (**p**/ħ)·x terms should be pure (i.e. dimensionless) numbers. Energy is expressed in *newton·meter* (force over distance, remember?) or *electronvolts* (1 eV = 1.6×10^{−19 }J = 1.6×10^{−19 }N·m); Planck’s constant, as the quantum of *action*, is expressed in J·s or eV·s; and the unit of (linear) momentum is 1 N·s = 1 kg·m/s. So E/ħ gives a number expressed *per second*, and p/ħ a number expressed *per meter*. Therefore, multiplying them by t and x respectively gives us a dimensionless number indeed.
- It’s also an *invariant* number, which means we’ll always get the same value for it. As mentioned above, that’s because the four-vector product p_{μ}x_{μ }= E·t − **p**∙**x** is invariant: it doesn’t change when analyzing a phenomenon in one reference frame (e.g. our inertial reference frame) or another (i.e. in a *moving* frame).
- Now, Planck’s quantum of action h or ħ (they differ only by a factor of 2π: h is the energy *per cycle*, and ħ the energy *per radian*) is the quantum of energy really. Indeed, if “energy is the currency of the Universe”, and it’s real and/or virtual photons who are exchanging it, then it’s good to know the currency unit is h, i.e. the energy that’s associated with *one cycle* of a photon.
- It’s not only time and space that are related, as evidenced by the fact that t − **x** itself is an invariant four-vector: E and p are related too, of course! They are related through the classical velocity of the particle that we’re looking at: E/p = *c*^{2}/v and, therefore, we can write: E·β = p·*c*, with β = v/*c*, i.e. the *relative* velocity of our particle, as measured as a *ratio* of the speed of light. Now, I should add that the t − **x** four-vector is invariant only if we measure time and space in equivalent units. Otherwise, we have to write *c*·t − **x**. If we do that, so our unit of distance becomes *c* meter, rather than one meter, or our unit of time becomes the time that is needed for light to travel one meter, then *c* = 1, and the E·β = p·*c* relation becomes E·β = p, which we also write as β = p/E: the ratio of the *momentum* and the *energy* of our particle is its (relative) velocity.
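The invariance of E·t − **p**∙**x** can be verified directly: boost both the energy-momentum pair and the time-position pair to another reference frame and recompute the product. A sketch (one spatial dimension, natural units c = 1; the mass, velocities and event coordinates are arbitrary):

```python
import math

m, v = 1.0, 0.6              # rest mass and particle velocity (arbitrary, natural units c = 1)
t, x = 2.0, 1.2              # an arbitrary event in spacetime

gamma = 1.0 / math.sqrt(1.0 - v * v)
E, p = gamma * m, gamma * m * v        # relativistic energy and momentum

u = 0.3                                # velocity of the boost to the new reference frame
g = 1.0 / math.sqrt(1.0 - u * u)

# Lorentz transformation of (E, p) and of (t, x) — both transform the same way:
E2, p2 = g * (E - u * p), g * (p - u * E)
t2, x2 = g * (t - u * x), g * (x - u * t)

phase = E * t - p * x
phase2 = E2 * t2 - p2 * x2
assert abs(phase - phase2) < 1e-12     # the phase E·t − p·x is frame-independent
```

As a bonus, the same numbers satisfy E² − p² = m², which is just the invariance of the energy-momentum four-vector itself.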

Combining all of the above, we may want to assume that we are measuring energy *and *momentum in terms of the Planck constant, i.e. the *‘natural’ *unit for both. In addition, we may also want to assume that we’re measuring time and distance in equivalent units. Then the equation for the phase of our wavefunctions reduces to:

θ = (ω·t − k ∙x) = E·t − p·x

Now, θ is the argument of a wavefunction, and we can always *re-scale *such argument by multiplying or dividing it by some *constant*. It’s just like writing the argument of a wavefunction as *v*·t–x or (*v*·t–x)/*v* = t –x/*v* with *v *the velocity of the waveform that we happen to be looking at. [In case you have trouble following this argument, please check the post I did for my kids on waves and wavefunctions.] Now, the energy conservation principle tells us the energy of a free particle won’t change. [Just to remind you, a ‘free particle’ means it is present in a ‘field-free’ space, so our particle is in a region of *uniform* potential.] You see what I am going to do now: we can, in this case, treat E as a constant, and divide E·t − p·x by E, so we get a re-scaled phase for our wavefunction, which I’ll write as:

φ = (E·t − p·x)/E = t − (p/E)·x = t − β·x

Now that’s the argument of a wavefunction with the argument expressed in distance units. Alternatively, we could also look at p as some constant, as there is no variation in potential energy that will cause a change in momentum, i.e. in *kinetic* energy. We’d then divide by p and we’d get (E·t − p·x)/p = (E/p)·t − x = t/β − x, which amounts to the same, as we can always re-scale by multiplying it with β, which would then yield the same t − β·x argument.

The point is, if we measure energy and momentum in terms of the Planck unit (I mean: in terms of the Planck constant, i.e. the *quantum of energy*), and if we measure time and distance in ‘natural’ units too, i.e. we take the speed of light to be unity, then our Platonic wavefunction becomes as simple as:

Φ(φ) = *a·e*^{−iφ }= *a·e*^{−i(t − β·x)}

This is a wonderful formula, but let me first answer your most likely question: why would we use a *relative* velocity? Well… Just think of it: when everything is said and done, the whole theory of relativity and, hence, the whole of physics, is based on *one fundamental and experimentally verified fact*: the speed of light is *absolute*. In whatever reference frame, we will *always* measure it as 299,792,458 m/s. That’s obvious, you’ll say, but it’s actually the weirdest thing ever if you start thinking about it, and it explains why those Lorentz transformations look so damn complicated. In any case, this *fact* legitimately establishes *c* as some kind of *absolute* measure against which all speeds can be measured. Therefore, it is only *natural* indeed to express a velocity as some number between 0 and 1. Now that amounts to expressing it as the β = v/*c* ratio.

Let’s now go back to that Φ(φ) = *a·e*^{−iφ }= *a·e*^{−i(t − β·x) }wavefunction. Its temporal frequency ω is equal to one, and its spatial frequency k is equal to β = v/*c**. *It couldn’t be simpler but, of course, we’ve got this remarkably simple result because we re-scaled the argument of our wavefunction using the *energy *and *momentum *itself as the* scale factor*. So, yes, we can re-write the wavefunction of our particle in a particular elegant and simple form using the only information that we have when looking at quantum-mechanical stuff: energy and momentum, because that’s what everything reduces to at that level.

Of course, the analysis above does *not *include uncertainty. Our information on the energy and the momentum of our particle will be incomplete: we’ll write E = E_{0 }± σ_{E}, and p = p_{0 }± σ_{p}. [I am a bit tired of using the Δ symbol, so I am using the σ symbol here, which denotes a *standard deviation* of some *density* function. It underlines the probabilistic, or statistical, nature of our approach.] But, including that, we’ve pretty much explained what quantum physics is about here.

You just need to get used to that complex exponential: *e*^{−iφ} = cos(−φ) + *i*·sin(−φ) = cos(φ) − *i*·sin(φ). Of course, it would have been nice if Nature would have given us a simple sine or cosine function. [Remember the sine and cosine function are actually the same, except for a phase difference of 90 degrees: sin(φ) = cos(π/2 − φ) = cos(φ − π/2). So we can always go from one to the other by shifting the origin of our axis.] But… Well… As we’ve shown so many times already, a real-valued wavefunction doesn’t explain the interference we observe, be it interference of electrons or whatever other particles or, for that matter, the interference of electromagnetic waves itself, which, as you know, we also need to look at as a stream of *photons*, i.e. light *quanta*, rather than as some kind of infinitely flexible *aether* that’s undulating, like water or air.

So… Well… Just accept that *e*^{−iφ} is a very simple periodic function, consisting of two sine waves rather than just one, as illustrated below.

And then you need to think of stuff like this (the animation is taken from Wikipedia), but then with a projection of the *sine *of those *phasors *too. It’s all great fun, so I’ll let you play with it now. 🙂

# Diffraction and the Uncertainty Principle (II)

In my previous post, I derived and explained the general formula for the pattern generated by a light beam going through a slit or a circular aperture: the diffraction pattern. For light going through an aperture, this generates the so-called Airy pattern. In practice, diffraction causes a blurring of the image, and may make it difficult to distinguish two separate points, as shown below (credit for the image must go to Wikipedia again, I am afraid).

What’s actually going on is that the lens acts as a slit or, if it’s circular (which is usually the case), as an aperture indeed: the wavefront of the transmitted light is taken to be spherical or plane when it exits the lens and interferes with itself, thereby creating the ring-shaped diffraction pattern that we explained in the previous post.

The spatial resolution is also known as the *angular *resolution, which is quite appropriate, because it refers to an angle indeed: we know the first minimum (i.e. the first black ring) occurs at an angle θ such that sinθ = λ/L, with λ the wavelength of the light and L the lens diameter. It’s good to remind ourselves of the geometry of the situation: below we picture the array of oscillators, and so we know that the first minimum occurs at an angle such that Δ = λ. The second, third, fourth etc minimum occurs at an angle θ such that Δ = 2λ, 3λ, 4λ, etc. However, these secondary minima do not play any role in determining the resolving power of a lens, or a telescope, or an electron microscope, etc, and so you can just forget about them for the time being.

For small angles (expressed in radians), we can use the so-called small-angle approximation and equate sinθ with θ: the error of this approximation is less than one percent for angles smaller than 0.244 radians (14°), so we have the amazingly simple result that the first minimum occurs at an angle θ such that:

θ = λ/L
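Here’s a small sketch checking both claims: the stated accuracy of the small-angle approximation, and the θ ≈ λ/L result itself (the 500 nm wavelength and 1 mm aperture are arbitrary illustrative values):

```python
import math

# The error of sin θ ≈ θ stays below one percent up to about 0.244 rad (14°):
theta = 0.244
assert (theta - math.sin(theta)) / math.sin(theta) < 0.01

# First minimum of the diffraction pattern: sin θ = λ/L, hence θ ≈ λ/L for small angles
lam, L = 500e-9, 1e-3            # 500 nm light, 1 mm aperture (illustrative)
theta_min = math.asin(lam / L)   # the exact angle of the first minimum
# The λ/L approximation is essentially exact at such tiny angles:
assert abs(theta_min - lam / L) / theta_min < 1e-6
```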

**Spatial resolution of a microscope: the Rayleigh criterion versus Dawes’ limit **

If we have two point sources right next to each other, they will create two Airy disks, as shown above, which may overlap. That may make it difficult to see them, in a telescope, a microscope, or whatever device. Hence, telescopes, microscopes (using light or electron beams or whatever) have a limited *resolving power*. How do we measure that?

The so-called *Rayleigh criterion* regards two point sources as just resolved when the principal diffraction maximum of one image coincides with the first minimum of the other, as shown below. If the distance is greater, the two points are (very) well resolved, and if it is smaller, they are regarded as not resolved. This angle is obviously related to the θ = λ/L angle but it’s not the same: in fact, it’s a slightly *wider* angle. The analysis involved in calculating the *angular resolution* (we use the same symbol θ for it) is quite complicated, so I’ll skip it and just give you the result:

θ = 1.22λ/L

Note that, in this equation, θ stands for the angular resolution, λ for the wavelength of the light being used, and L is the diameter of the (aperture of) the lens. In the first of the three images above, the two points are well separated and, hence, the angle between them is well above the angular resolution. In the second, the angle between them just meets the Rayleigh criterion, and in the third the angle between them is smaller than the angular resolution and, hence, the two points are not resolved.

Of course, the Rayleigh criterion is, to some extent, a matter of judgment. In fact, an English 19th century astronomer, named William Rutter Dawes, actually tested human observers on close binary stars of equal brightness, and found they could make out the two stars within an angle that was slightly narrower than the one given by the Rayleigh criterion. Hence, for an optical telescope, you’ll also find the simple θ = λ/L formula, so that’s the formula *without* the 1.22 factor (of course, λ here is, once again, the wavelength of the observed light or radiation, and L is the diameter of the telescope’s primary lens). This very simple formula allows us, for example, to calculate the diameter of the telescope lens we’d need to build to separate (see) objects in space with a resolution of, for example, 1 arcsec (i.e. 1/3600 of a degree or π/648,000 of a radian). Indeed, if we filter for yellow light only, which has a wavelength of 580 nm, we find L = 580×10^{−9} m/(π/648,000) ≈ 0.12 m, i.e. about 12 cm. [Just so you know: that’s about the size of the lens aperture of a good telescope (4 or 6 inches) for amateur astronomers–just in case you’d want one. :-)]
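The arithmetic is easy to reproduce (a sketch of the calculation above; it also shows what the 1.22 factor of the Rayleigh criterion would give instead):

```python
import math

lam = 580e-9                 # yellow light, in meter
theta = math.pi / 648000     # 1 arcsec in radians (1/3600 of a degree)

# Dawes' limit: θ = λ/L, so the required lens diameter is L = λ/θ
L = lam / theta
assert 0.119 < L < 0.120     # ≈ 0.12 m, i.e. about 12 cm indeed

# With the Rayleigh criterion (the 1.22 factor), we'd need a somewhat bigger lens:
L_rayleigh = 1.22 * lam / theta
assert 0.145 < L_rayleigh < 0.147
```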

This simplified formula is called ** Dawes’ limit**, and you’ll often see it used instead of Rayleigh’s criterion. However, the fact that it’s exactly the same formula as our formula for the first minimum of the Airy pattern should not confuse you: angular resolution is something different.

Now, after this introduction, let me get to the real topic of this post: Heisenberg’s Uncertainty Principle according to Heisenberg.

**Heisenberg’s Uncertainty Principle according to Heisenberg**

I don’t know about you but, as a kid, I didn’t know much about waves and fields and all that, and so I had difficulty understanding why the resolving power of a microscope or any other magnifying device depended on the frequency or wavelength. I now know my understanding was limited because I thought the concept of the amplitude of an electromagnetic wave had some spatial meaning, like the amplitude of a water or a sound wave. You know what I mean: this false idea that an electromagnetic wave is something that sort of wriggles through space, just like a water or sound wave wriggle through their medium (water and air respectively). Now I know better: the amplitude of an electromagnetic wave measures field strength and there’s no medium (no aether). So it’s not like a wave going around some object, or making some medium oscillate. I am not ashamed to acknowledge my stupidity at the time: I am just happy I finally got it, because it helps to *really *understand Heisenberg’s own illustration of his Uncertainty Principle, which I’ll present now.

Heisenberg imagined a gamma-ray microscope, as shown below (I copied this from the website of the American Institute of Physics). Gamma-ray microscopes don’t exist – they’re hard to produce: you need a nuclear reactor or so 🙂 – but, as Heisenberg saw the development of new microscopes using higher and higher energy beams (as opposed to the 1.5–3 eV light in the visible spectrum) so as to *increase* the angular resolution and, hence, be able to see smaller things, he imagined one could use, perhaps, gamma rays for imaging. Gamma rays are the hardest radiation, with frequencies of 10 *exahertz* and more (or >10^{19} Hz) and, hence, energies above 100 keV (i.e. 100,000 times more than photons in the visible light spectrum, and 1,000 times more than the electrons used in an average electron microscope). Gamma rays are not the result of some electron jumping from a higher to a lower energy level: they are emitted in decay processes of atomic nuclei (gamma decay). But I am digressing. Back to the main story line. So Heisenberg imagined we could ‘shine’ gamma rays on an electron and that we could then ‘see’ that electron in the microscope because some of the gamma photons would indeed end up in the microscope after their ‘collision’ with the electron, as shown below.

The experiment is described in many places elsewhere but I found these accounts often confusing, and so I present my own here. 🙂

What Heisenberg basically meant to show is that this set-up would allow us to gather *precise *information on the position of the electron–because we would know where it *was*–but that, as a result, we’d lose information in regard to its momentum. Why? To put it simply: because the electron recoils as a result of the interaction. The point, of course, is to calculate the *exact *relationship between the two (position and momentum). In other words: what we want to do is to state the Uncertainty Principle *quantitatively*, not qualitatively.

Now, the animation above uses the symbol L for the γ-ray wavelength λ, which is confusing because I used L for the diameter of the aperture in my explanation of diffraction above. The animation above also uses a different symbol for the angular resolution: A instead of θ. So let me borrow the diagram used in the Wikipedia article and rephrase the whole situation.

From the diagram above, it’s obvious that, to be scattered into the microscope, the γ-ray photon must be scattered into a cone with angle ε. That angle is obviously related to the angular resolution of the microscope, which is θ = ε/2 = λ/D, with D the diameter of the aperture (i.e. the primary lens). Now, the electron could actually be anywhere, and the scattering angle could be much larger than ε, but then the scattered photon won’t reach the light detector at the end of the microscope (that’s the flat top in the diagram above). Hence, relating D to the uncertainty in position (Δx) is not as obvious as most accounts of this thought experiment make it out to be, but the assumption is basically that this imaginary microscope ‘sees’ an area that is approximately as large as the lens, so we can equate the uncertainty in position with the lens diameter: Δx ≈ D. Combining this with the θ = ε/2 = λ/D relation, we can write:

Δx = 2λ/ε
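As a quick numerical sketch of this formula (the wavelength and cone angle below are purely illustrative assumptions, not values from the text):

```python
# Position uncertainty Δx = 2λ/ε from the microscope's resolving power.
# The wavelength and cone angle are illustrative assumptions only.
lam = 1e-12        # γ-ray wavelength λ, in meters (~1 pm)
eps = 0.2          # cone angle ε, in radians
delta_x = 2 * lam / eps
print(f"Δx ≈ {delta_x:.2e} m")   # → Δx ≈ 1.00e-11 m
```

So a harder (shorter-wavelength) γ-ray, or a wider lens, buys us a smaller Δx, which is exactly the trade-off the thought experiment exploits.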

Now, because of the recoil effect, the electron receives some momentum from the γ-ray photon. How much? Well… The situation is somewhat complicated (much more complicated than the Wikipedia article on this very same topic suggests), because the photon *keeps* some but also *gives* some of its original momentum. In fact, what’s happening really is Compton scattering: the electron *first* absorbs the photon, and *then* emits another with a *different* energy and, hence, also with a different frequency and wavelength. However, what we do *know* is that the photon’s original momentum was equal to E/c = p = h/λ. That’s just the Planck relation or, if you’d want to look at the photon as a particle, the de Broglie equation.
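To put a number on that momentum, here is a short sketch of p = h/λ, again with an assumed, illustrative wavelength:

```python
# Photon momentum p = E/c = h/λ (Planck/de Broglie relation).
h = 6.62607015e-34   # Planck constant, in J·s (exact since the 2019 SI revision)
lam = 1e-12          # illustrative γ-ray wavelength, in meters
p = h / lam
print(f"p = h/λ ≈ {p:.3e} kg·m/s")   # → p = h/λ ≈ 6.626e-22 kg·m/s
```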

Now, because we’re doing an analysis in one dimension only (x), we’ll only look at the momentum in that direction, i.e. p_{x}, and we’ll assume that all of the momentum of the photon *before* the interaction (or ‘collision’ if you want) was horizontal. Hence, we can write p_{x} = h/λ. *After* the collision, however, this momentum is spread over the electron and the scattered or emitted photon that’s going into the microscope. Let’s now imagine the two extremes:

- The scattered photon goes to the left edge of the lens. Hence, its *horizontal* momentum is negative (because it moves to the left), and the momentum p_{x} will be distributed over the electron and the photon such that p_{x} = p’_{x} – h(ε/2)/λ’. Why the ε/2 factor? Well… That’s just trigonometry: the horizontal momentum of the scattered photon is obviously only a tiny fraction of its original horizontal momentum, and that fraction is given by the angle ε/2.
- The scattered photon goes to the right edge of the lens. In that case, we write p_{x} = p”_{x} + h(ε/2)/λ”.

Now, the spread in the momentum of the *electron*, which we’ll simply write as Δp, is obviously equal to:

Δp = |p”_{x} – p’_{x}| = |(p_{x} – h(ε/2)/λ”) – (p_{x} + h(ε/2)/λ’)| = h(ε/2)/λ” + h(ε/2)/λ’
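A minimal numeric sketch of this spread, with illustrative values for λ’, λ” and ε (anticipating the simplification that follows by setting both scattered wavelengths equal):

```python
# Spread in the electron's momentum: Δp = h(ε/2)/λ'' + h(ε/2)/λ'.
# The wavelengths and cone angle are illustrative assumptions only.
h = 6.62607015e-34   # Planck constant, in J·s
lam1 = lam2 = 1e-12  # λ' and λ'', in meters (taken equal here)
eps = 0.2            # cone angle ε, in radians
delta_p = h * (eps / 2) / lam2 + h * (eps / 2) / lam1
print(f"Δp ≈ {delta_p:.2e} kg·m/s")   # → Δp ≈ 1.33e-22 kg·m/s
```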

That’s a nice formula, but what can we do with it? What we want is a relationship between Δx and Δp, i.e. the position and the momentum of the electron, and of the electron only. That involves another simplification, which is also dealt with very summarily – too summarily in my view – in most accounts of this experiment. So let me spell it out. The angle ε is obviously very small and, hence, we may equate λ’ and λ”. In addition, while these two wavelengths differ from the wavelength of the incoming photon, the scattered photon is, obviously, still a gamma ray and, therefore, we are probably not too far off when substituting λ for both λ’ and λ”, with λ the wavelength of the incoming γ-ray. Now, we can re-write Δx = 2λ/ε as 1/Δx = ε/(2λ). We then get:

Δp = h(ε/2)/λ” + h(ε/2)/λ’ ≈ 2·h(ε/2)/λ = hε/λ = 2h/Δx

Now that yields ΔpΔx = 2h, which is indeed an approximate expression of Heisenberg’s Uncertainty Principle (don’t worry about the factor 2: it just comes with all of the approximations we made).
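The nice thing about the result is that λ and ε drop out of the product. A small sketch (with arbitrary, illustrative values) to check that ΔpΔx = 2h regardless of the wavelength and the angle we pick:

```python
# ΔpΔx = (hε/λ)·(2λ/ε) = 2h: the product is independent of λ and ε.
h = 6.62607015e-34   # Planck constant, in J·s
for lam, eps in [(1e-12, 0.2), (5e-13, 0.05), (2e-12, 0.5)]:
    delta_x = 2 * lam / eps       # position uncertainty
    delta_p = h * eps / lam       # momentum spread
    print(f"ΔpΔx / h = {delta_p * delta_x / h:.1f}")   # → 2.0 in every case
```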

A final point, perhaps: this is obviously a thought experiment, not only because we don’t have gamma-ray microscopes (which doesn’t really matter, because we can effectively imagine constructing one), but also because the experiment involves one photon only. A real microscope would use a proper beam of light, but that would obviously complicate the analysis. In fact, it would defeat the purpose, because the whole point is to analyze one single interaction here.

**The interpretation**

Now, how should we interpret all of this? Is this Heisenberg’s ‘proof’ of his own Principle? Yes and no, I’d say: it’s part illustration, and part ‘proof’. The crucial assumptions here are:

- We can analyze γ-ray photons, or any photon for that matter, as particles having some momentum, and, when ‘colliding’, or interacting, with an electron, the photon will impart some of that momentum to the electron.
- Momentum is conserved and, hence, the total (linear) momentum before and after the collision, *considering both particles*, i.e. (1) the incoming photon and the electron *before* the interaction and (2) the emitted photon and the electron that’s getting the kick *after* the interaction, must be the same.
- For the γ-ray photon, we can relate (or associate, if you prefer that term) its wavelength λ with its momentum p through the Planck relation or, what amounts to the same for photons (because they have no mass), the de Broglie relation.

Now, these assumptions are then applied to an analysis of *what we know to be true from experiment*, and that’s the phenomenon of diffraction, part of which is the *observation *that the resolving power of a microscope is limited, and that its resolution is given by the θ = λ/D equation.

Bringing it all together then gives us a theory which is consistent with experiment and, hence, we assume the theory is true. Why? Well… I could start a long discourse here on the philosophy of science but, when everything is said and done, we should admit we don’t *have* any ‘better’ theory.

But, you’ll say: what’s a ‘better’ theory? Well… Again, the answer to that question is the subject-matter of philosophers. As for me, I’d just refer to what’s known as Occam’s razor: among competing hypotheses, we should select the one with the fewest assumptions. Hence, while more complicated solutions may ultimately prove correct, the fewer assumptions that are made, the better. Now, when I was a kid, I thought quantum mechanics was very complicated and, hence, describing it here as a ‘simple’ theory may sound strange. But that’s what it is in the end: there’s no better (read: simpler) way to describe, for example, why electrons interfere with each other, and with themselves, when we send them through one or two slits. That’s what all these ‘illustrations’ want to show in the end, even if you think there must be a simpler way to describe reality. As said, as a kid, I thought so too. 🙂