My book is moving forward. I just produced a very first promotional video. Have a look and let me know what you think of it ! 🙂
This is the paper I always wanted to write. It is there now, and I think it is good – and that’s an understatement. 🙂 It is probably best to download it as a PDF file from the viXra.org site because this was a rather fast ‘copy and paste’ job from the Word version of the paper, so there may be issues with boldface notation (vector notation), italics and, most importantly, with formulas – which I, sadly, have to ‘snip’ into this WordPress blog, as it does not have an easy copy function for mathematical formulas.
It’s great stuff. If you have been following my blog – and many of you have – you will want to digest this. 🙂
Abstract: This paper explores the implications of associating the components of the wavefunction with a physical dimension: force per unit mass – which is, of course, the dimension of acceleration (m/s²) and gravitational fields. The classical electromagnetic field equations for energy densities, the Poynting vector and spin angular momentum are then re-derived by substituting the electromagnetic N/C unit of field strength (force per unit charge) by the new N/kg = m/s² dimension.
The results are elegant and insightful. For example, the energy densities are proportional to the square of the absolute value of the wavefunction and, hence, to the probabilities, which establishes a physical normalization condition. Also, Schrödinger’s wave equation may then, effectively, be interpreted as a diffusion equation for energy, and the wavefunction itself can be interpreted as a propagating gravitational wave. Finally, as an added bonus, concepts such as the Compton scattering radius for a particle, spin angular momentum, and the boson-fermion dichotomy, can also be explained more intuitively.
While the approach offers a physical interpretation of the wavefunction, the author argues that the core of the Copenhagen interpretation revolves around the complementarity principle, which remains unchallenged because the interpretation of amplitude waves as traveling fields does not explain the particle nature of matter.
This is not another introduction to quantum mechanics. We assume the reader is already familiar with the key principles and, importantly, with the basic math. We offer an interpretation of wave mechanics. As such, we do not challenge the complementarity principle: the physical interpretation of the wavefunction that is offered here explains the wave nature of matter only. It explains diffraction and interference of amplitudes but it does not explain why a particle will hit the detector not as a wave but as a particle. Hence, the Copenhagen interpretation of the wavefunction remains relevant: we just push its boundaries.
The basic ideas in this paper stem from a simple observation: the quantum-mechanical wavefunction and electromagnetic waves are geometrically remarkably similar. The components of both waves are orthogonal to the direction of propagation and to each other. Only the relative phase differs: the electric and magnetic field vectors (E and B) have the same phase. In contrast, the phases of the real and imaginary part of the (elementary) wavefunction (ψ = a·e^(−i∙θ) = a∙cosθ − i∙a∙sinθ) differ by 90 degrees (π/2). Pursuing the analogy, we explore the following question: if the oscillating electric and magnetic field vectors of an electromagnetic wave carry the energy that one associates with the wave, can we analyze the real and imaginary part of the wavefunction in a similar way?
We show the answer is positive and remarkably straightforward. If the physical dimension of the electromagnetic field is expressed in newton per coulomb (force per unit charge), then the physical dimension of the components of the wavefunction may be associated with force per unit mass (newton per kg). Of course, force over some distance is energy. The question then becomes: what is the energy concept here? Kinetic? Potential? Both?
The similarity between the energy of a (one-dimensional) linear oscillator (E = m·a²·ω²/2) and Einstein’s relativistic energy equation E = m∙c² inspires us to interpret the energy as a two-dimensional oscillation of mass. To assist the reader, we construct a two-piston engine metaphor. We then adapt the formula for the electromagnetic energy density to calculate the energy densities for the wavefunction. The results are elegant and intuitive: the energy densities are proportional to the square of the absolute value of the wavefunction and, hence, to the probabilities. Schrödinger’s wave equation may then, effectively, be interpreted as a diffusion equation for energy itself.
As an added bonus, concepts such as the Compton scattering radius for a particle and spin angular momentum, as well as the boson-fermion dichotomy, can be explained in a fully intuitive way.
Of course, such an interpretation is also an interpretation of the wavefunction itself, and the immediate reaction of the reader is predictable: the electric and magnetic field vectors are, somehow, to be looked at as real vectors. In contrast, the real and imaginary components of the wavefunction are not. However, this objection needs to be phrased more carefully. First, it may be noted that, in a classical analysis, the magnetic field is a pseudovector itself. Second, a suitable choice of coordinates may make quantum-mechanical rotation matrices irrelevant.
Therefore, the author is of the opinion that this little paper may provide some fresh perspective on the question, thereby further exploring Einstein’s basic sentiment in regard to quantum mechanics, which may be summarized as follows: there must be some physical explanation for the calculated probabilities.
We will, therefore, start with Einstein’s relativistic energy equation (E = mc²) and wonder what it could possibly tell us.
The structural similarity between the relativistic energy formula, the formula for the total energy of an oscillator, and the kinetic energy of a moving body, is striking:
- E = mc²
- E = mω²/2
- E = mv²/2
In these formulas, ω, v and c all describe some velocity. Of course, there is the 1/2 factor in the E = mω²/2 formula, but that is exactly the point we are going to explore here: can we think of an oscillation in two dimensions, so it stores an amount of energy that is equal to E = 2·m·ω²/2 = m·ω²?
That is easy enough. Think, for example, of a V-2 engine with the pistons at a 90-degree angle, as illustrated below. The 90° angle makes it possible to perfectly balance the counterweight and the pistons, thereby ensuring smooth travel at all times. With permanently closed valves, the air inside the cylinder compresses and decompresses as the pistons move up and down and provides, therefore, a restoring force. As such, it will store potential energy, just like a spring, and the motion of the pistons will also reflect that of a mass on a spring. Hence, we can describe it by a sinusoidal function, with the zero point at the center of each cylinder. We can, therefore, think of the moving pistons as harmonic oscillators, just like mechanical springs.
Figure 1: Oscillations in two dimensions
If we assume there is no friction, we have a perpetuum mobile here. The compressed air and the rotating counterweight (which, combined with the crankshaft, acts as a flywheel) store the potential energy. The moving masses of the pistons store the kinetic energy of the system.
At this point, it is probably good to quickly review the relevant math. If the magnitude of the oscillation is equal to a, then the motion of the piston (or the mass on a spring) will be described by x = a·cos(ω·t + Δ). Needless to say, Δ is just a phase factor which defines our t = 0 point, and ω is the natural angular frequency of our oscillator. Because of the 90° angle between the two cylinders, Δ would be 0 for one oscillator, and –π/2 for the other. Hence, the motion of one piston is given by x = a·cos(ω·t), while the motion of the other is given by x = a·cos(ω·t–π/2) = a·sin(ω·t).
The kinetic and potential energy of one oscillator (think of one piston or one spring only) can then be calculated as:
- K.E. = T = m·v²/2 = (1/2)·m·ω²·a²·sin²(ω·t + Δ)
- P.E. = U = k·x²/2 = (1/2)·k·a²·cos²(ω·t + Δ)
The coefficient k in the potential energy formula characterizes the restoring force: F = −k·x. From the dynamics involved, it is obvious that k must be equal to m·ω². Hence, the total energy is equal to:
E = T + U = (1/2)·m·ω²·a²·[sin²(ω·t + Δ) + cos²(ω·t + Δ)] = m·a²·ω²/2
To facilitate the calculations, we will briefly assume k = m·ω² and a are equal to 1. The motion of our first oscillator is given by the cos(ω·t) = cosθ function (θ = ω·t), and its kinetic energy will be equal to sin²θ (dropping the constant factor 1/2). Hence, the (instantaneous) change in kinetic energy at any point in time will be equal to:
d(sin²θ)/dθ = 2∙sinθ∙d(sinθ)/dθ = 2∙sinθ∙cosθ
Let us look at the second oscillator now. Just think of the second piston going up and down in the V-2 engine. Its motion is given by the sinθ function, which is equal to cos(θ−π/2). Hence, its kinetic energy is equal to sin²(θ−π/2), and how it changes – as a function of θ – will be equal to:
2∙sin(θ−π/2)∙cos(θ−π/2) = −2∙cosθ∙sinθ = −2∙sinθ∙cosθ
We have our perpetuum mobile! While transferring kinetic energy from one piston to the other, the crankshaft will rotate with a constant angular velocity: linear motion becomes circular motion, and vice versa, and the total energy that is stored in the system is T + U = m·a²·ω².
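The energy bookkeeping for the two pistons can be checked numerically. The snippet below is only a minimal sketch of the metaphor, not part of the paper: the function names and the sample values for m, a and ω are my own assumptions. It models the two pistons as ideal harmonic oscillators that are 90 degrees out of phase and adds up their kinetic and potential energies.

```python
import math

def piston_energies(theta, m=1.0, a=1.0, omega=1.0):
    """Kinetic and potential energies of both pistons at phase theta = omega*t.

    Assumption: each piston is an ideal harmonic oscillator with k = m*omega**2.
    """
    k = m * omega**2
    # Piston 1 follows x = a*cos(theta); piston 2 follows x = a*sin(theta).
    x1, v1 = a * math.cos(theta), -a * omega * math.sin(theta)
    x2, v2 = a * math.sin(theta), a * omega * math.cos(theta)
    kinetic = (0.5 * m * v1**2, 0.5 * m * v2**2)
    potential = (0.5 * k * x1**2, 0.5 * k * x2**2)
    return kinetic, potential

def total_energy(theta, m=1.0, a=1.0, omega=1.0):
    """Sum of all kinetic and potential energy stored in the two-piston system."""
    kinetic, potential = piston_energies(theta, m, a, omega)
    return sum(kinetic) + sum(potential)

if __name__ == "__main__":
    for theta in (0.0, 0.3, 1.0, 2.5):
        (t1, t2), _ = piston_energies(theta)
        print(f"theta={theta:.1f}  T1={t1:.4f}  T2={t2:.4f}  total={total_energy(theta):.4f}")
```

The kinetic energy moves back and forth between the two pistons, while the total stays at m·a²·ω² for every value of θ.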
We have a great metaphor here. Somehow, in this beautiful interplay between linear and circular motion, energy is borrowed from one place and then returns to the other, cycle after cycle. We know the wavefunction consists of a sine and a cosine: the cosine is the real component, and the sine is the imaginary component. Could they be equally real? Could each represent half of the total energy of our particle? Should we think of the c in our E = mc² formula as an angular velocity?
These are sensible questions. Let us explore them.
The elementary wavefunction is written as:
ψ = a·e^(−i[E·t − p∙x]/ħ) = a·cos(p∙x/ħ − E∙t/ħ) + i·a·sin(p∙x/ħ − E∙t/ħ)
When considering a particle at rest (p = 0) this reduces to:
ψ = a·e^(−i∙E·t/ħ) = a·cos(−E∙t/ħ) + i·a·sin(−E∙t/ħ) = a·cos(E∙t/ħ) − i·a·sin(E∙t/ħ)
Let us remind ourselves of the geometry involved, which is illustrated below. Note that the argument of the wavefunction rotates clockwise with time, while the mathematical convention for measuring the phase angle (ϕ) is counter-clockwise.
Figure 2: Euler’s formula
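For readers who like to see numbers, here is a small sketch of the rest-frame wavefunction. The function name and the choice of units (ħ = 1 by default) are assumptions made for illustration only. It confirms that the real part is a·cos(E·t/ħ), that the imaginary part is −a·sin(E·t/ħ), and that the phase angle becomes negative as t increases, which is the clockwise rotation mentioned above.

```python
import cmath
import math

def psi_at_rest(t, energy, a=1.0, hbar=1.0):
    """Elementary wavefunction for p = 0: psi = a*exp(-i*E*t/hbar).

    Assumption: natural units (hbar = 1) unless another value is passed in.
    """
    return a * cmath.exp(-1j * energy * t / hbar)

if __name__ == "__main__":
    energy = 2.0
    for t in (0.0, 0.1, 0.2, 0.3):
        psi = psi_at_rest(t, energy)
        # cmath.phase returns the counter-clockwise angle, so it goes negative here.
        print(f"t={t:.1f}  Re={psi.real:+.4f}  Im={psi.imag:+.4f}  phase={cmath.phase(psi):+.4f}")
```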
If we assume the momentum p is all in the x-direction, then the p and x vectors will have the same direction, and the vector dot product p∙x/ħ reduces to the ordinary product of the magnitudes, p·x/ħ. Most illustrations – such as the one below – will either freeze x or, else, t. Alternatively, one can google web animations varying both. The point is: we also have a two-dimensional oscillation here. These two dimensions are perpendicular to the direction of propagation of the wavefunction. For example, if the wavefunction propagates in the x-direction, then the oscillations are along the y- and z-axis, which we may refer to as the real and imaginary axis. Note how the phase difference between the cosine and the sine – the real and imaginary part of our wavefunction – appears to give some spin to the whole. I will come back to this.
Figure 3: Geometric representation of the wavefunction
Hence, if we would say these oscillations carry half of the total energy of the particle, then we may refer to the real and imaginary energy of the particle respectively, and the interplay between the real and the imaginary part of the wavefunction may then describe how energy propagates through space over time.
Let us consider, once again, a particle at rest. Hence, p = 0 and the (elementary) wavefunction reduces to ψ = a·e^(−i∙E·t/ħ). Hence, the angular velocity of both oscillations, at some point x, is given by ω = −E/ħ. Now, the energy of our particle includes all of the energy – kinetic, potential and rest energy – and is, therefore, equal to E = mc².
Can we, somehow, relate this to the m·a²·ω² energy formula for our V-2 perpetuum mobile? Our wavefunction has an amplitude too. Now, if the oscillations of the real and imaginary wavefunction store the energy of our particle, then their amplitude will surely matter. In fact, the energy of an oscillation is, in general, proportional to the square of the amplitude: E ∝ a². We may, therefore, think that the a² factor in the E = m·a²·ω² energy will surely be relevant as well.
However, here is a complication: an actual particle is localized in space and can, therefore, not be represented by the elementary wavefunction. We must build a wave packet for that: a sum of wavefunctions, each with their own amplitude aᵢ, and their own ωᵢ = −Eᵢ/ħ. Each of these wavefunctions will contribute some energy to the total energy of the wave packet. To calculate the contribution of each wave to the total, both aᵢ as well as Eᵢ will matter.
What is Eᵢ? Eᵢ varies around some average E, which we can associate with some average mass m: m = E/c². The Uncertainty Principle kicks in here. The analysis becomes more complicated, but a formula such as the one below might make sense:
E = Σ mᵢ·aᵢ²·ωᵢ² = Σ (Eᵢ/c²)·aᵢ²·(Eᵢ²/ħ²)
We can re-write this as:
c²·ħ²·E = Σ aᵢ²·Eᵢ³
What is the meaning of this equation? We may look at it as some sort of physical normalization condition when building up the Fourier sum. Of course, we should relate this to the mathematical normalization condition for the wavefunction. Our intuition tells us that the probabilities must be related to the energy densities, but how exactly? We will come back to this question in a moment. Let us first think some more about the enigma: what is mass?
Before we do so, let us quickly calculate the value of c²ħ²: it is about 1×10⁻⁵¹ N²∙m⁴. Let us also do a dimensional analysis: the physical dimensions of the E = m·a²·ω² equation make sense if we express m in kg, a in m, and ω in rad/s. We then get: [E] = kg∙m²/s² = (N∙s²/m)∙m²/s² = N∙m = J. The dimension of the left- and right-hand side of the physical normalization condition is N³∙m⁵.
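The order of magnitude quoted for c²ħ² is easy to verify. The sketch below uses the exact SI values for c and h; the constant and function names are just labels I chose for this illustration.

```python
import math

C_LIGHT = 299_792_458.0        # speed of light in m/s (exact SI value)
H_PLANCK = 6.626_070_15e-34    # Planck constant in J*s (exact SI value)
HBAR = H_PLANCK / (2 * math.pi)

def c2_hbar2():
    """Return c^2 * hbar^2 in N^2*m^4 (equivalently J^2*m^2)."""
    return C_LIGHT**2 * HBAR**2

if __name__ == "__main__":
    print(f"c^2 * hbar^2 = {c2_hbar2():.4e} N^2*m^4")
```

The result is within a fraction of a percent of 1×10⁻⁵¹ N²∙m⁴.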
We came up, playfully, with a meaningful interpretation for energy: it is a two-dimensional oscillation of mass. But what is mass? A new aether theory is, of course, not an option, but then what is it that is oscillating? To understand the physics behind equations, it is always good to do an analysis of the physical dimensions in the equation. Let us start with Einstein’s energy equation once again. If we want to look at mass, we should re-write it as m = E/c²:
[m] = [E/c²] = J/(m/s)² = N·m∙s²/m² = N·s²/m = kg
This is not very helpful. It only reminds us of Newton’s definition of a mass: mass is that which gets accelerated by a force. At this point, we may want to think of the physical significance of the absolute nature of the speed of light. Einstein’s E = mc² equation implies that the ratio between the energy and the mass of any particle is always the same, so we can write, for example:
c² = E/m (the same ratio for an electron, a proton, or any other particle)
This reminds us of the ω² = C⁻¹/L or ω² = k/m of harmonic oscillators once again. The key difference is that the ω² = C⁻¹/L and ω² = k/m formulas introduce two or more degrees of freedom. In contrast, c² = E/m for any particle, always. However, that is exactly the point: we can modulate the resistance, inductance and capacitance of electric circuits, and the stiffness of springs and the masses we put on them, but we live in one physical space only: our spacetime. Hence, the speed of light c emerges here as the defining property of spacetime – the resonant frequency, so to speak. We have no further degrees of freedom here.
The Planck-Einstein relation (for photons) and the de Broglie equation (for matter-particles) have an interesting feature: both imply that the energy of the oscillation is proportional to the frequency, with Planck’s constant as the constant of proportionality. Now, for one-dimensional oscillations – think of a guitar string, for example – we know the energy will be proportional to the square of the frequency. It is a remarkable observation: the two-dimensional matter-wave, or the electromagnetic wave, gives us two waves for the price of one, so to speak, each carrying half of the total energy of the oscillation but, as a result, we get a proportionality between E and f instead of between E and f2.
However, such reflections do not answer the fundamental question we started out with: what is mass? At this point, it is hard to go beyond the circular definition that is implied by Einstein’s formula: energy is a two-dimensional oscillation of mass, and mass packs energy, and c emerges as the property of spacetime that defines how exactly.
When everything is said and done, this does not go beyond stating that mass is some scalar field. Now, a scalar field is, quite simply, some real number that we associate with a position in spacetime. The Higgs field is a scalar field but, of course, the theory behind it goes much beyond stating that we should think of mass as some scalar field. The fundamental question is: why and how does energy, or matter, condense into elementary particles? That is what the Higgs mechanism is about but, as this paper is exploratory only, we cannot even start explaining the basics of it.
What we can do, however, is look at the wave equation again (Schrödinger’s equation), as we can now analyze it as an energy diffusion equation.
The interpretation of Schrödinger’s equation as a diffusion equation is straightforward. Feynman (Lectures, III-16-1) briefly summarizes it as follows:
“We can think of Schrödinger’s equation as describing the diffusion of the probability amplitude from one point to the next. […] But the imaginary coefficient in front of the derivative makes the behavior completely different from the ordinary diffusion such as you would have for a gas spreading out along a thin tube. Ordinary diffusion gives rise to real exponential solutions, whereas the solutions of Schrödinger’s equation are complex waves.”
Let us review the basic math. For a particle moving in free space – with no external force fields acting on it – there is no potential (U = 0) and, therefore, the Uψ term disappears. Therefore, Schrödinger’s equation reduces to:
∂ψ(x, t)/∂t = i·(1/2)·(ħ/m_eff)·∇²ψ(x, t)
The ubiquitous diffusion equation in physics is:
∂φ(x, t)/∂t = D·∇²φ(x, t)
The structural similarity is obvious. The key difference between both equations is that the wave equation gives us two equations for the price of one. Indeed, because ψ is a complex-valued function, with a real and an imaginary part, we get the following equations:
- Re(∂ψ/∂t) = −(1/2)·(ħ/m_eff)·Im(∇²ψ)
- Im(∂ψ/∂t) = (1/2)·(ħ/m_eff)·Re(∇²ψ)
These equations make us think of the equations for an electromagnetic wave in free space (no stationary charges or currents):
- ∂B/∂t = –∇×E
- ∂E/∂t = c²∇×B
The above equations effectively describe a propagation mechanism in spacetime, as illustrated below.
Figure 4: Propagation mechanisms
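The split of Schrödinger’s equation into a real and an imaginary part can be checked with a simple plane wave. What follows is a rough numerical sketch under my own assumptions: one spatial dimension, a wave ψ = e^(i(k·x − ω·t)) with ω = D·k², where D stands for the (1/2)·(ħ/m_eff) factor, and central finite differences for the derivatives. The helper names and step sizes are illustrative choices.

```python
import cmath

def plane_wave(x, t, k=2.0, D=0.5):
    """psi = exp(i*(k*x - omega*t)) with omega = D*k**2, D playing the role of hbar/(2*m_eff)."""
    omega = D * k**2
    return cmath.exp(1j * (k * x - omega * t))

def time_derivative(x, t, step=1e-5, **wave):
    """Central-difference estimate of d(psi)/dt."""
    return (plane_wave(x, t + step, **wave) - plane_wave(x, t - step, **wave)) / (2 * step)

def laplacian(x, t, step=1e-4, **wave):
    """Central-difference estimate of the second derivative of psi with respect to x."""
    return (plane_wave(x + step, t, **wave)
            - 2 * plane_wave(x, t, **wave)
            + plane_wave(x - step, t, **wave)) / step**2

def residuals(x, t, k=2.0, D=0.5):
    """Both residuals should vanish: Re(dpsi/dt) + D*Im(lap) and Im(dpsi/dt) - D*Re(lap)."""
    dpsi_dt = time_derivative(x, t, k=k, D=D)
    lap = laplacian(x, t, k=k, D=D)
    return dpsi_dt.real + D * lap.imag, dpsi_dt.imag - D * lap.real

if __name__ == "__main__":
    print(residuals(x=0.3, t=0.7))
```

Both residuals come out close to zero (up to finite-difference error), so the time derivative of the real part is indeed driven by the Laplacian of the imaginary part, and vice versa.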
The Laplacian operator (∇²), when operating on a scalar quantity, gives us a flux density, i.e. something expressed per square meter (1/m²). In this case, it is operating on ψ(x, t), so what is the dimension of our wavefunction ψ(x, t)? To answer that question, we should analyze the diffusion constant in Schrödinger’s equation, i.e. the (1/2)·(ħ/m_eff) factor:
- As a mathematical constant of proportionality, it will quantify the relationship between both derivatives (i.e. the time derivative and the Laplacian);
- As a physical constant, it will ensure the physical dimensions on both sides of the equation are compatible.
Now, the ħ/m_eff factor is expressed in (N·m·s)/(N·s²/m) = m²/s. Hence, it does ensure the dimensions on both sides of the equation are, effectively, the same: ∂ψ/∂t is a time derivative and, therefore, its dimension is s⁻¹ while, as mentioned above, the dimension of ∇²ψ is m⁻². However, this does not solve our basic question: what is the dimension of the real and imaginary part of our wavefunction?
At this point, mainstream physicists will say: it does not have a physical dimension, and there is no geometric interpretation of Schrödinger’s equation. One may argue, effectively, that its argument, (p∙x – E∙t)/ħ, is just a number and, therefore, that the real and imaginary part of ψ is also just some number.
To this, we may object that ħ may be looked at as a mathematical scaling constant only. If we do that, then the argument of ψ will, effectively, be expressed in action units, i.e. in N·m·s. It then does make sense to also associate a physical dimension with the real and imaginary part of ψ. What could it be?
We may have a closer look at Maxwell’s equations for inspiration here. The electric field vector is expressed in newton (the unit of force) per unit of charge (coulomb). Now, there is something interesting here. The physical dimension of the magnetic field is N/C divided by m/s. We may write B as the following vector cross-product: B = (1/c)∙ex×E, with ex the unit vector pointing in the x-direction (i.e. the direction of propagation of the wave). Hence, we may associate the (1/c)∙ex× operator, which amounts to a rotation by 90 degrees, with the s/m dimension. Now, multiplication by i also amounts to a rotation by 90 degrees. Hence, we may boldly write: B = (1/c)∙ex×E = (1/c)∙i∙E. This allows us to also geometrically interpret Schrödinger’s equation in the way we interpreted it above (see Figure 3).
Still, we have not answered the question as to what the physical dimension of the real and imaginary part of our wavefunction should be. At this point, we may be inspired by the structural similarity between Newton’s and Coulomb’s force laws:
F = G·m₁·m₂/r² (Newton) and F = (1/4πε₀)·q₁·q₂/r² (Coulomb)
Hence, if the electric field vector E is expressed in force per unit charge (N/C), then we may want to think of associating the real part of our wavefunction with a force per unit mass (N/kg). We can, of course, do a substitution here, because the mass unit (1 kg) is equivalent to 1 N·s²/m. Hence, our N/kg dimension becomes:
N/kg = N/(N·s²/m) = m/s²
What is this: m/s²? Is that the dimension of the a·cosθ term in the a·e^(−iθ) = a·cosθ − i·a·sinθ wavefunction?
My answer is: why not? Think of it: m/s² is the physical dimension of acceleration: the increase or decrease in velocity (m/s) per second. It ensures the wavefunction for any particle – matter-particles or particles with zero rest mass (photons) – and the associated wave equation (which has to be the same for all, as the spacetime we live in is one) are mutually consistent.
In this regard, we should think of how we would model a gravitational wave. The physical dimension would surely be the same: force per mass unit. It all makes sense: wavefunctions may, perhaps, be interpreted as traveling distortions of spacetime, i.e. as tiny gravitational waves.
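The dimensional bookkeeping used above (ħ/m_eff in m²/s, and N/kg = m/s²) can be reproduced with a few lines of code. This is only a toy sketch: dimensions are stored as dictionaries of SI base-unit exponents, and the helper names are my own invention rather than an existing units library.

```python
def dim(**exponents):
    """Build a dimension as a dict of SI base-unit exponents, e.g. dim(m=1, s=-2)."""
    return {unit: power for unit, power in exponents.items() if power != 0}

def multiply(a, b):
    """Dimension of a product: exponents add."""
    return dim(**{u: a.get(u, 0) + b.get(u, 0) for u in set(a) | set(b)})

def divide(a, b):
    """Dimension of a quotient: exponents subtract."""
    return dim(**{u: a.get(u, 0) - b.get(u, 0) for u in set(a) | set(b)})

KILOGRAM = dim(kg=1)
METRE = dim(m=1)
SECOND = dim(s=1)
NEWTON = dim(kg=1, m=1, s=-2)                        # force = mass * acceleration
ACTION = multiply(multiply(NEWTON, METRE), SECOND)   # N*m*s, the unit of hbar

diffusion_constant = divide(ACTION, KILOGRAM)    # hbar / m_eff
field_per_unit_mass = divide(NEWTON, KILOGRAM)   # N / kg

if __name__ == "__main__":
    print("hbar/m_eff:", diffusion_constant)     # exponents of m and s only
    print("N/kg:      ", field_per_unit_mass)    # same dimension as acceleration
```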
Pursuing the geometric equivalence between the equations for an electromagnetic wave and Schrödinger’s equation, we can now, perhaps, see if there is an equivalent for the energy density. For an electromagnetic wave, we know that the energy density is given by the following formula:
u = (ε₀/2)·(E∙E + c²·B∙B)
E and B are the electric and magnetic field vector respectively. The Poynting vector will give us the directional energy flux, i.e. the energy flow per unit area per unit time. We write:
−∂u/∂t = ∇∙S
Needless to say, the ∇∙ operator is the divergence and, therefore, gives us the magnitude of a (vector) field’s source or sink at a given point. To be precise, the divergence gives us the volume density of the outward flux of a vector field from an infinitesimal volume around a given point. In this case, it gives us the volume density of the flux of S.
We can analyze the dimensions of the equation for the energy density as follows:
- E is measured in newton per coulomb, so [E∙E] = [E²] = N²/C².
- B is measured in (N/C)/(m/s), so we get [B∙B] = [B²] = (N²/C²)·(s²/m²). However, the dimension of our c² factor is (m²/s²) and so we’re also left with N²/C².
- The ε₀ is the electric constant, also known as the vacuum permittivity. As a physical constant, it should ensure the dimensions on both sides of the equation work out, and they do: [ε₀] = C²/(N·m²) and, therefore, if we multiply that with N²/C², we find that u is expressed in N/m² = J/m³.
Replacing the newton per coulomb unit (N/C) by the newton per kg unit (N/kg) in the formulas above should give us the equivalent of the energy density for the wavefunction. We just need to substitute an equivalent constant for ε₀. We may give it a try. If the energy densities can be calculated – which are also mass densities, obviously – then the probabilities should be proportional to them.
Let us first see what we get for a photon, assuming the electromagnetic wave represents its wavefunction. Substituting B for (1/c)∙i∙E or for −(1/c)∙i∙E gives us the following result:
u = (ε₀/2)·(E∙E + c²·B∙B) = (ε₀/2)·(E² + i²·E²) = (ε₀/2)·(E² − E²) = 0
Zero!? An unexpected result! Or not? We have no stationary charges and no currents: only an electromagnetic wave in free space. Hence, the local energy conservation principle needs to be respected at all points in space and in time. The geometry makes sense of the result: for an electromagnetic wave, the magnitudes of E and B reach their maximum, minimum and zero point simultaneously, as shown below. This is because their phase is the same.
Figure 5: Electromagnetic wave: E and B
Should we expect a similar result for the energy densities that we would associate with the real and imaginary part of the matter-wave? For the matter-wave, we have a phase difference between a·cosθ and a·sinθ, which gives a different picture of the propagation of the wave (see Figure 3). In fact, the geometry suggests some inherent spin, which is interesting. I will come back to this. Let us first guess those densities. Making abstraction of any scaling constants, we may write:
u = a²·cos²θ + a²·sin²θ = a²·(cos²θ + sin²θ) = a²
We get what we hoped to get: the absolute square of our amplitude is, effectively, an energy density!
|ψ|² = |a·e^(−i∙E·t/ħ)|² = a² = u
This is very deep. A photon has no rest mass, so it borrows and returns energy from empty space as it travels through it. In contrast, a matter-wave carries energy and, therefore, has some (rest) mass. It is therefore associated with an energy density, and this energy density gives us the probabilities. Of course, we need to fine-tune the analysis to account for the fact that we have a wave packet rather than a single wave, but that should be feasible.
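The identity |ψ|² = a² used above is easy to confirm numerically. The sketch below abstracts away all scaling constants, as the text does; the function names are illustrative assumptions, not notation from the paper.

```python
import cmath
import math

def density_from_components(a, theta):
    """Sum of the squares of the real and imaginary components of a*exp(-i*theta)."""
    real_part = a * math.cos(theta)
    imag_part = -a * math.sin(theta)
    return real_part**2 + imag_part**2

def abs_squared(a, theta):
    """|psi|^2 computed directly from the complex exponential."""
    return abs(a * cmath.exp(-1j * theta)) ** 2

if __name__ == "__main__":
    a = 1.5
    for theta in (0.0, 0.5, 2.0, 4.0):
        print(f"theta={theta:.1f}  components={density_from_components(a, theta):.6f}  "
              f"|psi|^2={abs_squared(a, theta):.6f}")
```

Both columns stay equal to a² whatever the phase θ, which is why the density does not oscillate even though the two components do.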
As mentioned, the phase difference between the real and imaginary part of our wavefunction (a cosine and a sine function) appears to give some spin to our particle. We do not have this particularity for a photon. Of course, photons are bosons, i.e. integer-spin particles (spin 1), while elementary matter-particles are fermions with spin-1/2. Hence, our geometric interpretation of the wavefunction suggests that, after all, there may be some more intuitive explanation of the fundamental dichotomy between bosons and fermions, which puzzled even Feynman:
“Why is it that particles with half-integral spin are Fermi particles, whereas particles with integral spin are Bose particles? We apologize for the fact that we cannot give you an elementary explanation. An explanation has been worked out by Pauli from complicated arguments of quantum field theory and relativity. He has shown that the two must necessarily go together, but we have not been able to find a way of reproducing his arguments on an elementary level. It appears to be one of the few places in physics where there is a rule which can be stated very simply, but for which no one has found a simple and easy explanation. The explanation is deep down in relativistic quantum mechanics. This probably means that we do not have a complete understanding of the fundamental principle involved.” (Feynman, Lectures, III-4-1)
The physical interpretation of the wavefunction, as presented here, may provide some better understanding of ‘the fundamental principle involved’: the physical dimension of the oscillation is just very different. That is all: it is force per unit charge for photons, and force per unit mass for matter-particles. We will examine the question of spin somewhat more carefully in section VII. Let us first examine the matter-wave some more.
The geometric representation of the matter-wave (see Figure 3) suggests a traveling wave and, yes, of course: the matter-wave effectively travels through space and time. But what is traveling, exactly? It is the pulse – or the signal – only: the phase velocity of the wave is just a mathematical concept and, even in our physical interpretation of the wavefunction, the same is true for the group velocity of our wave packet. The oscillation is two-dimensional, but perpendicular to the direction of travel of the wave. Hence, nothing actually moves with our particle.
Here, we should also reiterate that we did not answer the question as to what is oscillating up and down and/or sideways: we only associated a physical dimension with the components of the wavefunction – newton per kg (force per unit mass), to be precise. We were inspired to do so because of the physical dimension of the electric and magnetic field vectors (newton per coulomb, i.e. force per unit charge) we associate with electromagnetic waves which, for all practical purposes, we currently treat as the wavefunction for a photon. This made it possible to calculate the associated energy densities and a Poynting vector for energy dissipation. In addition, we showed that Schrödinger’s equation itself then becomes a diffusion equation for energy. However, let us now focus some more on the asymmetry which is introduced by the phase difference between the real and the imaginary part of the wavefunction. Look at the mathematical shape of the elementary wavefunction once again:
ψ = a·e^(−i[E·t − p∙x]/ħ) = a·cos(p∙x/ħ − E∙t/ħ) + i·a·sin(p∙x/ħ − E∙t/ħ)
The minus sign in the argument of our sine and cosine function defines the direction of travel: an F(x−v∙t) wavefunction will always describe some wave that is traveling in the positive x-direction (with v the wave velocity), while an F(x+v∙t) wavefunction will travel in the negative x-direction. For a geometric interpretation of the wavefunction in three dimensions, we need to agree on how to define i or, what amounts to the same, a convention on how to define clockwise and counterclockwise directions: if we look at a clock from the back, then its hands will be moving counterclockwise. So we need to establish the equivalent of the right-hand rule. However, let us not worry about that now. Let us focus on the interpretation. To ease the analysis, we’ll assume we’re looking at a particle at rest. Hence, p = 0, and the wavefunction reduces to:
ψ = a·e^(−i∙E₀·t/ħ) = a·cos(−E₀∙t/ħ) + i·a·sin(−E₀∙t/ħ) = a·cos(E₀∙t/ħ) − i·a·sin(E₀∙t/ħ)
E₀ is, of course, the rest energy of our particle and, now that we are here, we should probably wonder whose time t we are talking about: is it our time, or is it the proper time of our particle? Well… In this situation, we are both at rest so it does not matter: t is, effectively, the proper time so perhaps we should write it as t₀. You can see what we expect to see: E₀/ħ pops up as the natural frequency of our matter-particle: (E₀/ħ)∙t = ω∙t. Remembering the ω = 2π·f = 2π/T and T = 1/f formulas, we can associate a period and a frequency with this wave. Noting that ħ = h/2π, we find the following:
T = 2π·(ħ/E₀) = h/E₀ ⇔ f = E₀/h = m₀c²/h
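To get a feel for the numbers, we can plug in an electron. This is an illustration only: the electron is my choice of example and its rest mass is the CODATA value, neither of which appears in the derivation above, and the function names are arbitrary.

```python
H_PLANCK = 6.626_070_15e-34     # Planck constant in J*s (exact SI value)
C_LIGHT = 299_792_458.0         # speed of light in m/s (exact SI value)
M_ELECTRON = 9.109_383_7015e-31 # electron rest mass in kg (CODATA value, used as an example)

def natural_frequency(rest_mass):
    """f = m0*c^2/h, the frequency associated with the rest energy."""
    return rest_mass * C_LIGHT**2 / H_PLANCK

def natural_period(rest_mass):
    """T = h/E0 = 1/f, the corresponding period."""
    return 1.0 / natural_frequency(rest_mass)

if __name__ == "__main__":
    print(f"f = {natural_frequency(M_ELECTRON):.4e} Hz")
    print(f"T = {natural_period(M_ELECTRON):.4e} s")
```

For the electron, the frequency comes out at roughly 1.2×10²⁰ Hz, so the period is of the order of 10⁻²⁰ seconds.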
This is interesting, because we can look at the period as a natural unit of time for our particle. What about the wavelength? That is tricky because we need to distinguish between group and phase velocity here. The group velocity (vg) should be zero here, because we assume our particle does not move. In contrast, the phase velocity is given by vp = λ·f = (2π/k)·(ω/2π) = ω/k. In fact, we’ve got something funny here: the wavenumber k = p/ħ is zero, because we assume the particle is at rest, so p = 0. So we have a division by zero here, which is rather strange. What do we get assuming the particle is not at rest? We write:
vp = ω/k = (E/ħ)/(p/ħ) = E/p = E/(m·vg) = (m·c2)/(m·vg) = c2/vg
This is interesting: it establishes a reciprocal relation between the phase and the group velocity, with c as a simple scaling constant. Indeed, the graph below shows the shape of the function does not change with the value of c, and we may also re-write the relation above as:
vp/c = βp = c/vg = 1/βg = 1/(vg/c)
Figure 6: Reciprocal relation between phase and group velocity
We can also write the mentioned relationship as vp·vg = c2, which reminds us of the relationship between the electric and magnetic constant (1/ε0)·(1/μ0) = c2. This is interesting in light of the fact we can re-write this as (c·ε0)·(c·μ0) = 1, which shows electricity and magnetism are just two sides of the same coin, so to speak.
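The reciprocal relation between the two velocities can be tabulated with a few lines of Python. This is only a sketch: the function name and the sample values of βg are chosen here for illustration, and the zero-velocity case is rejected explicitly because it corresponds to the division by zero discussed above.

```python
C = 299_792_458.0  # speed of light in m/s


def phase_velocity(group_velocity):
    """Return vp = c²/vg; undefined for a particle at rest (vg = 0)."""
    if group_velocity == 0:
        raise ValueError("phase velocity is undefined for vg = 0")
    return C ** 2 / group_velocity


for beta_g in (0.1, 0.5, 0.9, 1.0):
    v_g = beta_g * C
    v_p = phase_velocity(v_g)
    beta_p = v_p / C
    # The product of the two relative velocities should always be 1.
    print(f"beta_g = {beta_g:.1f}  beta_p = {beta_p:.4f}  product = {beta_g * beta_p:.4f}")
```

The last row (βg = 1) is the photon case: both velocities coincide with c.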
Interesting, but how do we interpret the math? What about the implications of the zero value for the wavenumber k = p/ħ? We would probably like to think it implies the elementary wavefunction should always be associated with some momentum, because the concept of zero momentum clearly leads to weird math: something times zero cannot be equal to c²! Such an interpretation is also consistent with the Uncertainty Principle: if Δx·Δp ≥ ħ, then neither Δx nor Δp can be zero. In other words, the Uncertainty Principle tells us that the idea of a pointlike particle actually being at some specific point in time and in space does not make sense: it has to move. It tells us that our concepts of dimensionless points in time and space are mathematical notions only. Actual particles – including photons – are always a bit spread out, so to speak, and – importantly – they have to move.
For a photon, this is self-evident. It has no rest mass, no rest energy, and, therefore, it is going to move at the speed of light itself. We write: p = m·c = m·c2/c = E/c. Using the relationship above, we get:
vp = ω/k = (E/ħ)/(p/ħ) = E/p = c ⇒ vg = c2/vp = c2/c = c
This is good: we started out with some reflections on the matter-wave, but here we get an interpretation of the electromagnetic wave as a wavefunction for the photon. But let us get back to our matter-wave. In regard to our interpretation of a particle having to move, we should remind ourselves, once again, of the fact that an actual particle is always localized in space and that it can, therefore, not be represented by the elementary wavefunction ψ = a·e−i[E·t − p∙x]/ħ or, for a particle at rest, the ψ = a·e−i∙E·t/ħ function. We must build a wave packet for that: a sum of wavefunctions, each with their own amplitude ai, and their own ωi = −Ei/ħ. Indeed, in section II, we showed that each of these wavefunctions will contribute some energy to the total energy of the wave packet and that, to calculate the contribution of each wave to the total, both ai as well as Ei matter. This may or may not resolve the apparent paradox. Let us look at the group velocity.
To calculate a meaningful group velocity, we must assume the vg = ∂ωi/∂ki = ∂(Ei/ħ)/∂(pi/ħ) = ∂(Ei)/∂(pi) exists. So we must have some dispersion relation. How do we calculate it? We need to calculate ωi as a function of ki here, or Ei as a function of pi. How do we do that? Well… There are a few ways to go about it but one interesting way of doing it is to re-write Schrödinger’s equation as we did, i.e. by distinguishing the real and imaginary parts of the ∂ψ/∂t =i·[ħ/(2m)]·∇2ψ wave equation and, hence, re-write it as the following pair of two equations:
- Re(∂ψ/∂t) = −[ħ/(2meff)]·Im(∇2ψ) ⇔ ω·cos(kx − ωt) = k2·[ħ/(2meff)]·cos(kx − ωt)
- Im(∂ψ/∂t) = [ħ/(2meff)]·Re(∇2ψ) ⇔ ω·sin(kx − ωt) = k2·[ħ/(2meff)]·sin(kx − ωt)
Both equations imply the following dispersion relation:
ω = ħ·k2/(2meff)
Of course, we need to think about the subscripts now: we have ωi, ki, but… What about meff or, dropping the subscript, m? Do we write it as mi? If so, what is it? Well… It is the equivalent mass of Ei obviously, and so we get it from the mass-energy equivalence relation: mi = Ei/c2. It is a fine point, but one most people forget about: they usually just write m. However, if there is uncertainty in the energy, then Einstein’s mass-energy relation tells us we must have some uncertainty in the (equivalent) mass too. Here, I should refer back to Section II: Ei varies around some average energy E and, therefore, the Uncertainty Principle kicks in.
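To see what group velocity the ω = ħ·k²/(2m) dispersion relation implies, one can differentiate it numerically and compare the result with the analytic derivative ∂ω/∂k = ħ·k/m = p/m. The sketch below does that in Python. Using the electron mass for m and a wavenumber of 10¹⁰ rad/m are assumptions made for the example only, since the text leaves open which mass concept applies.

```python
HBAR = 1.054571800e-34   # reduced Planck constant in J·s
M_E = 9.10938356e-31     # electron mass in kg (assumed value for the example)


def omega(k, mass):
    """Dispersion relation ω = ħ·k²/(2m)."""
    return HBAR * k ** 2 / (2 * mass)


def group_velocity(k, mass, rel_step=1e-6):
    """Central-difference estimate of vg = ∂ω/∂k."""
    dk = k * rel_step
    return (omega(k + dk, mass) - omega(k - dk, mass)) / (2 * dk)


k = 1.0e10  # wavenumber in rad/m (hypothetical)
v_numeric = group_velocity(k, M_E)
v_analytic = HBAR * k / M_E  # ħ·k/m = p/m
print(f"numeric  vg = {v_numeric:.6e} m/s")
print(f"analytic vg = {v_analytic:.6e} m/s")
```

Because the dispersion relation is quadratic in k, the central difference reproduces the analytic result up to rounding.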
The elementary wavefunction vector – i.e. the vector sum of the real and imaginary component – rotates around the x-axis, which gives us the direction of propagation of the wave (see Figure 3). Its magnitude remains constant. In contrast, the magnitude of the electromagnetic vector – defined as the vector sum of the electric and magnetic field vectors – oscillates between zero and some maximum (see Figure 5).
We already mentioned that the rotation of the wavefunction vector appears to give some spin to the particle. Of course, a circularly polarized wave would also appear to have spin (think of the E and B vectors rotating around the direction of propagation – as opposed to oscillating up and down or sideways only). In fact, a circularly polarized light does carry angular momentum, as the equivalent mass of its energy may be thought of as rotating as well. But so here we are looking at a matter-wave.
The basic idea is the following: if we look at ψ = a·e^(−i∙E·t/ħ) as some real vector – as a two-dimensional oscillation of mass, to be precise – then we may associate its rotation around the direction of propagation with some torque. The illustration below reminds us of the math here.
Figure 7: Torque and angular momentum vectors
A torque on some mass about a fixed axis gives it angular momentum, which we can write as the vector cross-product L = r×p or, perhaps easier for our purposes here as the product of an angular velocity (ω) and rotational inertia (I), aka as the moment of inertia or the angular mass. We write:
L = I·ω
Note we can write L and ω in boldface here because they are (axial) vectors. If we consider their magnitudes only, we write L = I·ω (no boldface). We can now do some calculations. Let us start with the angular velocity. In our previous posts, we showed that the period of the matter-wave is equal to T = 2π·(ħ/E0). Hence, the angular velocity must be equal to:
ω = 2π/[2π·(ħ/E0)] = E0/ħ
We also know the distance r, so that is the magnitude of r in the L = r×p vector cross-product: it is just a, so that is the magnitude of ψ = a·e^(−i∙E·t/ħ). Now, the momentum (p) is the product of a linear velocity (v) – in this case, the tangential velocity – and some mass (m): p = m·v. If we switch to scalar instead of vector quantities, then the (tangential) velocity is given by v = r·ω. So now we only need to think about what we should use for m or, if we want to work with the angular velocity (ω), the angular mass (I). Here we need to make some assumption about the mass (or energy) distribution. Now, it may or may not make sense to assume the energy in the oscillation – and, therefore, the mass – is distributed uniformly. In that case, we may use the formula for the angular mass of a solid cylinder: I = m·r²/2. If we keep the analysis non-relativistic, then m = m0. Of course, the energy-mass equivalence tells us that m0 = E0/c². Hence, this is what we get:
L = I·ω = (m0·r2/2)·(E0/ħ) = (1/2)·a2·(E0/c2)·(E0/ħ) = a2·E02/(2·ħ·c2)
Does it make sense? Maybe. Maybe not. Let us do a dimensional analysis: that won’t check our logic, but it makes sure we made no mistakes when mapping mathematical and physical spaces. We have m2·J2 = m2·N2·m2 in the numerator and N·m·s·m2/s2 in the denominator. Hence, the dimensions work out: we get N·m·s as the dimension for L, which is, effectively, the physical dimension of angular momentum. It is also the action dimension, of course, and that cannot be a coincidence. Also note that the E = mc2 equation allows us to re-write it as:
L = a²·m0²·c²/(2·ħ)
Of course, in quantum mechanics, we associate spin with the magnetic moment of a charged particle, not with its mass as such. Is there a way to link the formula above to the one we have for the quantum-mechanical angular momentum, which is also measured in N·m·s units, and which can only take on one of two possible values: J = +ħ/2 and −ħ/2? It looks like a long shot, right? How do we go from (1/2)·a²·m0²·c²/ħ to ± (1/2)∙ħ? Let us do a numerical example. The energy of an electron is typically 0.511 MeV ≈ 8.1871×10⁻¹⁴ N∙m, and a… What value should we take for a?
We have an obvious trio of candidates here: the Bohr radius, the classical electron radius (aka the Thomson scattering length), and the Compton scattering radius.
Let us start with the Bohr radius, so that is about 0.529×10⁻¹⁰ m. We get L = a²·E0²/(2·ħ·c²) = 9.9×10⁻³¹ N∙m∙s. Now that is about 1.88×10⁴ times ħ/2. That is a huge factor. The Bohr radius cannot be right: we are not looking at an electron in an orbital here. To show it does not make sense, we may want to double-check the analysis by doing the calculation in another way. We said each oscillation will always pack 6.626070040(81)×10⁻³⁴ joule in energy. So our electron should pack about 1.24×10²⁰ oscillations. The angular momentum (L) we get when using the Bohr radius for a and the value of 6.626×10⁻³⁴ joule for E0 is equal to 6.49×10⁻⁷¹ N∙m∙s. So that is the angular momentum per oscillation. When we multiply this with the number of oscillations (1.24×10²⁰), we get about 8.01×10⁻⁵¹ N∙m∙s, so that is a totally different number.
The classical electron radius is about 2.818×10⁻¹⁵ m. We get an L that is equal to about 2.81×10⁻³⁹ N∙m∙s, so now it is a tiny fraction of ħ/2! Hence, this leads us nowhere. Let us go for our last chance to get a meaningful result! Let us use the Compton scattering length, so that is about 2.42631×10⁻¹² m.
This gives us an L of 2.08×10⁻³³ N∙m∙s, which is only 20 times ħ. This is not so bad, but is it good enough? Let us calculate it the other way around: what value should we take for a so as to ensure L = a²·E0²/(2·ħ·c²) = ħ/2? Let us write it out:
a²·E0²/(2·ħ·c²) = ħ/2 ⇔ a² = ħ²·c²/E0² ⇔ a = ħ·c/E0 = ħ/(m0·c)
In fact, this is the formula for the so-called reduced Compton wavelength. This is perfect. We found what we wanted to find. Substituting this value for a (you can calculate it: it is about 3.8616×10⁻¹³ m), we get what we should find:
L = a²·E0²/(2·ħ·c²) = (ħ²·c²/E0²)·E0²/(2·ħ·c²) = ħ/2
This is a rather spectacular result, and one that would – a priori – support the interpretation of the wavefunction that is being suggested in this paper.
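The numerical comparison of the candidate radii is easy to reproduce. The Python sketch below evaluates L = a²·E0²/(2·ħ·c²) for the Bohr radius, the classical electron radius, the Compton wavelength and the reduced Compton wavelength ħ·c/E0, and expresses each result as a multiple of ħ/2. The constant values are standard rounded figures and the function name is hypothetical; the formula itself is the one derived above under the solid-cylinder assumption.

```python
HBAR = 1.054571800e-34   # reduced Planck constant in J·s
C = 299_792_458.0        # speed of light in m/s
E0 = 8.1871e-14          # electron rest energy in J


def angular_momentum(a, rest_energy=E0):
    """L = a²·E0²/(2·ħ·c²) for an oscillation of radius a."""
    return a ** 2 * rest_energy ** 2 / (2 * HBAR * C ** 2)


radii = {
    "Bohr radius": 0.529177e-10,
    "classical electron radius": 2.818e-15,
    "Compton wavelength": 2.42631e-12,
    "reduced Compton wavelength": HBAR * C / E0,
}

for name, a in radii.items():
    L = angular_momentum(a)
    ratio = L / (HBAR / 2)
    print(f"{name:27s} a = {a:.4e} m   L = {L:.3e} N·m·s   L/(hbar/2) = {ratio:.3e}")
```

Only the last candidate gives a ratio of one, which is just another way of saying that a = ħ·c/E0 is the solution of L = ħ/2.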
Let us do some more thinking on the boson-fermion dichotomy. Again, we should remind ourselves that an actual particle is localized in space and that it can, therefore, not be represented by the elementary wavefunction ψ = a·e−i[E·t − p∙x]/ħ or, for a particle at rest, the ψ = a·e−i∙E·t/ħ function. We must build a wave packet for that: a sum of wavefunctions, each with their own amplitude ai, and their own ωi = −Ei/ħ. Each of these wavefunctions will contribute some energy to the total energy of the wave packet. Now, we can have another wild but logical theory about this.
Think of the apparent right-handedness of the elementary wavefunction: surely, Nature can’t be bothered about our convention of measuring phase angles clockwise or counterclockwise. Also, the angular momentum can be positive or negative: J = +ħ/2 or −ħ/2. Hence, we would probably like to think that an actual particle – think of an electron, or whatever other particle you’d think of – may consist of right-handed as well as left-handed elementary waves. To be precise, we may think they either consist of (elementary) right-handed waves or, else, of (elementary) left-handed waves. An elementary right-handed wave would be written as:
ψ(θi) = ai·(cosθi + i·sinθi)
In contrast, an elementary left-handed wave would be written as:
ψ(θi) = ai·(cosθi − i·sinθi)
How does that work out with the E0·t argument of our wavefunction? Position is position, and direction is direction, but time? Time has only one direction, but Nature surely does not care how we count time: counting like 1, 2, 3, etcetera or like −1, −2, −3, etcetera is just the same. If we count like 1, 2, 3, etcetera, then we write our wavefunction like:
ψ = a·cos(E0∙t/ħ) − i·a·sin(E0∙t/ħ)
If we count time like −1, −2, −3, etcetera then we write it as:
ψ = a·cos(−E0∙t/ħ) − i·a·sin(−E0∙t/ħ)= a·cos(E0∙t/ħ) + i·a·sin(E0∙t/ħ)
Hence, it is just like the left- or right-handed circular polarization of an electromagnetic wave: we can have both for the matter-wave too! This, then, should explain why we can have either positive or negative quantum-mechanical spin (+ħ/2 or −ħ/2). It is the usual thing: we have two mathematical possibilities here, and so we must have two physical situations that correspond to it.
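The equivalence between reversing the direction of time and flipping the sign of the imaginary part can be checked directly with complex numbers. In the Python sketch below, the amplitude a and the angular frequency ω = E0/ħ are both set to 1 for convenience, which is an assumption made for the example only.

```python
import cmath


def psi(t, a=1.0, omega=1.0):
    """Elementary rest-frame wavefunction a·exp(−i·ω·t), with ω = E0/ħ."""
    return a * cmath.exp(-1j * omega * t)


t = 0.3                  # arbitrary instant, in units where omega = 1
forward = psi(t)         # time counted as 1, 2, 3, ...
reversed_time = psi(-t)  # time counted as -1, -2, -3, ...

print("forward:      ", forward)
print("reversed time:", reversed_time)
# Reversing the sign of t is the same as taking the complex conjugate.
print("conjugates:", abs(reversed_time - forward.conjugate()) < 1e-15)
# The phase angle runs in opposite directions for the two waves.
print("phases:", cmath.phase(forward), cmath.phase(reversed_time))
```

The two waves have the same real part and opposite imaginary parts, so they rotate in opposite directions at the same rate.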
It is only natural. If we have left- and right-handed photons – or, generalizing, left- and right-handed bosons – then we should also have left- and right-handed fermions (electrons, protons, etcetera). Back to the dichotomy. The textbook analysis of the dichotomy between bosons and fermions may be epitomized by Richard Feynman’s Lecture on it (Feynman, III-4), which is confusing and – I would dare to say – even inconsistent: how are photons or electrons supposed to know that they need to interfere with a positive or a negative sign? They are not supposed to know anything: knowledge is part of our interpretation of whatever it is that is going on there.
Hence, it is probably best to keep it simple, and think of the dichotomy in terms of the different physical dimensions of the oscillation: newton per kg versus newton per coulomb. And then, of course, we should also note that matter-particles have a rest mass and, therefore, actually carry charge. Photons do not. But both are two-dimensional oscillations, and the point is: the so-called vacuum – and the rest mass of our particle (which is zero for the photon and non-zero for everything else) – give us the natural frequency for both oscillations, which is beautifully summed up in that remarkable equation for the group and phase velocity of the wavefunction, which applies to photons as well as matter-particles:
(vphase/c)·(vgroup/c) = 1 ⇔ vp·vg = c²
The final question then is: why are photons spin-zero particles? Well… We should first remind ourselves of the fact that they do have spin when circularly polarized. Here we may think of the rotation of the equivalent mass of their energy. However, if they are linearly polarized, then there is no spin. Even for circularly polarized waves, the spin angular momentum of photons is a weird concept. If photons have no (rest) mass, then they cannot carry any charge. They should, therefore, not have any magnetic moment. Indeed, what I wrote above shows an explanation of quantum-mechanical spin requires both mass as well as charge.
There are, of course, other ways to look at the matter – literally. For example, we can imagine two-dimensional oscillations as circular rather than linear oscillations. Think of a tiny ball, whose center of mass stays where it is, as depicted below. Any rotation – around any axis – will be some combination of a rotation around the two other axes. Hence, we may want to think of a two-dimensional oscillation as an oscillation of a polar and azimuthal angle.
Figure 8: Two-dimensional circular movement
The point of this paper is not to make any definite statements. That would be foolish. Its objective is just to challenge the simplistic mainstream viewpoint on the reality of the wavefunction. Stating that it is a mathematical construct only without physical significance amounts to saying it has no meaning at all. That is, clearly, a non-sustainable proposition.
The interpretation that is offered here looks at amplitude waves as traveling fields. Their physical dimension may be expressed in force per mass unit, as opposed to electromagnetic waves, whose amplitudes are expressed in force per (electric) charge unit. Also, the amplitudes of matter-waves incorporate a phase factor, but this may actually explain the rather enigmatic dichotomy between fermions and bosons and is, therefore, an added bonus.
The interpretation that is offered here has some advantages over other explanations, as it explains the how of diffraction and interference. However, while it offers a great explanation of the wave nature of matter, it does not explain its particle nature: while we think of the energy as being spread out, we will still observe electrons and photons as pointlike particles once they hit the detector. Why is it that a detector can sort of ‘hook’ the whole blob of energy, so to speak?
The interpretation of the wavefunction that is offered here does not explain this. Hence, the complementarity principle of the Copenhagen interpretation of the wavefunction surely remains relevant.
The 1/2 factor in Schrödinger’s equation is related to the concept of the effective mass (meff). It is easy to make the wrong calculations. For example, when playing with the famous de Broglie relations – aka the matter-wave equations – one may be tempted to derive the following energy concept:
- E = h·f and p = h/λ. Therefore, f = E/h and λ = h/p.
- v = f·λ = (E/h)∙(h/p) = E/p
- p = m·v. Therefore, E = v·p = m·v2
E = m·v²? This resembles the E = mc² equation and, therefore, one may be enthused by the discovery, especially because the m·v² also pops up when working with the Least Action Principle in classical mechanics, which states that the path that is followed by a particle will minimize the following integral:
S = ∫ (KE − PE)·dt (taken over the time interval from t1 to t2)
Now, we can choose any reference point for the potential energy but, to reflect the energy conservation law, we can select a reference point that ensures the sum of the kinetic and the potential energy is zero throughout the time interval. If the force field is uniform, then the integrand will, effectively, be equal to KE − PE = m·v².
However, that is classical mechanics and, therefore, not so relevant in the context of the de Broglie equations, and the apparent paradox should be solved by distinguishing between the group and the phase velocity of the matter wave.
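A small numerical check makes the point about the two velocities concrete. The Python sketch below computes f = E/h and λ = h/p from the de Broglie relations and confirms that their product equals E/p, which is the phase velocity discussed in the main text, not the velocity v that appears in p = m·v. The momentum value is hypothetical and was picked only to have something to plug in.

```python
H = 6.626070040e-34  # Planck constant in J·s


def de_broglie_frequency(energy):
    """f = E/h."""
    return energy / H


def de_broglie_wavelength(momentum):
    """λ = h/p."""
    return H / momentum


E = 8.1871e-14   # energy in J (close to the electron rest energy)
p = 1.0e-24      # momentum in N·s (hypothetical value)

f = de_broglie_frequency(E)
lam = de_broglie_wavelength(p)
v_phase = f * lam
print(f"f·λ = {v_phase:.6e} m/s")
print(f"E/p = {E / p:.6e} m/s")
```

For these inputs the product f·λ is far larger than c, which already signals that it cannot be the velocity of the particle itself.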
The effective mass – as used in Schrödinger’s equation – is a rather enigmatic concept. To make sure we are making the right analysis here, I should start by noting you will usually see Schrödinger’s equation written as:
i·ħ·∂ψ(x, t)/∂t = −(1/2)·(ħ²/meff)·∇²ψ(x, t) + U(x)·ψ(x, t)
This formulation includes a term with the potential energy (U). In free space (no potential), this term disappears, and the equation can be re-written as:
∂ψ(x, t)/∂t = i·(1/2)·(ħ/meff)·∇2ψ(x, t)
We just moved the i·ħ coefficient to the other side, noting that 1/i = –i. Now, in one-dimensional space, and assuming ψ is just the elementary wavefunction (so we substitute a·e−i∙[E·t − p∙x]/ħ for ψ), this implies the following:
−a·i·(E/ħ)·e−i∙[E·t − p∙x]/ħ = −i·(ħ/2meff)·a·(p2/ħ2)· e−i∙[E·t − p∙x]/ħ
⇔ E = p2/(2meff) ⇔ meff = m∙(v/c)2/2 = m∙β2/2
It is an ugly formula: it resembles the kinetic energy formula (K.E. = m∙v2/2) but it is, in fact, something completely different. The β2/2 factor ensures the effective mass is always a fraction of the mass itself. To get rid of the ugly 1/2 factor, we may re-define meff as two times the old meff (hence, meffNEW = 2∙meffOLD), as a result of which the formula will look somewhat better:
meff = m∙(v/c)2 = m∙β2
We know β varies between 0 and 1 and, therefore, meff will vary between 0 and m. Feynman drops the subscript, and just writes meff as m in his textbook (see Feynman, III-19). On the other hand, the electron mass as used is also the electron mass that is used to calculate the size of an atom (see Feynman, III-2-4). As such, the two mass concepts are, effectively, mutually compatible. It is confusing because the same mass is often defined as the mass of a stationary electron (see, for example, the article on it in the online Wikipedia encyclopedia).
In the context of the derivation of the electron orbitals, we do have the potential energy term – which is the equivalent of a source term in a diffusion equation – and that may explain why the above-mentioned meff = m∙(v/c)2 = m∙β2 formula does not apply.
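The meff = m∙β²/2 formula follows from E = p²/(2meff) once p = m·v and E = m·c² are substituted. The Python sketch below simply carries out that substitution numerically for a few values of β. It is an illustration of the algebra in this appendix under those two assumptions, and the function name is hypothetical.

```python
C = 299_792_458.0  # speed of light in m/s


def effective_mass(mass, velocity):
    """m_eff = p²/(2·E), with p = m·v and E = m·c²."""
    momentum = mass * velocity
    energy = mass * C ** 2
    return momentum ** 2 / (2 * energy)


m = 9.10938356e-31  # electron mass in kg (example value)
for beta in (0.01, 0.5, 0.99):
    ratio = effective_mass(m, beta * C) / m
    # The ratio should match β²/2 from the formula above.
    print(f"beta = {beta:.2f}  m_eff/m = {ratio:.6f}  beta²/2 = {beta ** 2 / 2:.6f}")
```

With the 1/2 factor included, the ratio never exceeds one half, which is why the re-defined meff = m∙β² varies between 0 and m.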
This paper discusses general principles in physics only. Hence, references can be limited to references to physics textbooks only. For ease of reading, any reference to additional material has been limited to a more popular undergrad textbook that can be consulted online: Feynman’s Lectures on Physics (http://www.feynmanlectures.caltech.edu). References are per volume, per chapter and per section. For example, Feynman III-19-3 refers to Volume III, Chapter 19, Section 3.
 Of course, an actual particle is localized in space and can, therefore, not be represented by the elementary wavefunction ψ = a·e^(−i∙θ) = a·e^(−i[E·t − p∙x]/ħ) = a·(cosθ – i·sinθ). We must build a wave packet for that: a sum of wavefunctions, each with its own amplitude ak and its own argument θk = (Ek∙t – pk∙x)/ħ. This is dealt with in this paper as part of the discussion on the mathematical and physical interpretation of the normalization condition.
 The N/kg dimension immediately, and naturally, reduces to the dimension of acceleration (m/s2), thereby facilitating a direct interpretation in terms of Newton’s force law.
 In physics, a two-spring metaphor is more common. Hence, the pistons in the author’s perpetuum mobile may be replaced by springs.
 The author re-derives the equation for the Compton scattering radius in section VII of the paper.
 The magnetic force can be analyzed as a relativistic effect (see Feynman II-13-6). The dichotomy between the electric force as a polar vector and the magnetic force as an axial vector disappears in the relativistic four-vector representation of electromagnetism.
 For example, when using Schrödinger’s equation in a central field (think of the electron around a proton), the use of polar coordinates is recommended, as it ensures the symmetry of the Hamiltonian under all rotations (see Feynman III-19-3)
 This sentiment is usually summed up in the apocryphal quote: “God does not play dice.” The actual quote comes out of one of Einstein’s private letters to Cornelius Lanczos, another scientist who had also emigrated to the US. The full quote is as follows: “You are the only person I know who has the same attitude towards physics as I have: belief in the comprehension of reality through something basically simple and unified… It seems hard to sneak a look at God’s cards. But that He plays dice and uses ‘telepathic’ methods… is something that I cannot believe for a single moment.” (Helen Dukas and Banesh Hoffman, Albert Einstein, the Human Side: New Glimpses from His Archives, 1979)
 Of course, both are different velocities: ω is an angular velocity, while v is a linear velocity: ω is measured in radians per second, while v is measured in meter per second. However, the definition of a radian implies radians are measured in distance units. Hence, the physical dimensions are, effectively, the same. As for the formula for the total energy of an oscillator, we should actually write: E = m·a2∙ω2/2. The additional factor (a) is the (maximum) amplitude of the oscillator.
 We also have a 1/2 factor in the E = mv2/2 formula. Two remarks may be made here. First, it may be noted this is a non-relativistic formula and, more importantly, incorporates kinetic energy only. Using the Lorentz factor (γ), we can write the relativistically correct formula for the kinetic energy as K.E. = E − E0 = mvc2 − m0c2 = m0γc2 − m0c2 = m0c2(γ − 1). As for the exclusion of the potential energy, we may note that we may choose our reference point for the potential energy such that the kinetic and potential energy mirror each other. The energy concept that then emerges is the one that is used in the context of the Principle of Least Action: it equals E = mv2. Appendix 1 provides some notes on that.
 Instead of two cylinders with pistons, one may also think of connecting two springs with a crankshaft.
 It is interesting to note that we may look at the energy in the rotating flywheel as potential energy because it is energy that is associated with motion, albeit circular motion. In physics, one may associate a rotating object with kinetic energy using the rotational equivalent of mass and linear velocity, i.e. rotational inertia (I) and angular velocity ω. The kinetic energy of a rotating object is then given by K.E. = (1/2)·I·ω2.
 Because of the sideways motion of the connecting rods, the sinusoidal function will describe the linear motion only approximately, but you can easily imagine the idealized limit situation.
 The ω² = 1/LC formula gives us the natural or resonant frequency for an electric circuit consisting of a resistor (R), an inductor (L), and a capacitor (C). Writing the formula as ω² = C⁻¹/L introduces the concept of elastance, which is the equivalent of the mechanical stiffness (k) of a spring.
 The resistance in an electric circuit introduces a damping factor. When analyzing a mechanical spring, one may also want to introduce a drag coefficient. Both are usually defined as a fraction of the inertia, which is the mass for a spring and the inductance for an electric circuit. Hence, we would write the resistance for a spring as γm and as R = γL respectively.
 Photons are emitted by atomic oscillators: atoms going from one state (energy level) to another. Feynman (Lectures, I-33-3) shows us how to calculate the Q of these atomic oscillators: it is of the order of 10⁸, which means the wave train will last about 10⁻⁸ seconds (to be precise, that is the time it takes for the radiation to die out by a factor 1/e). For example, for sodium light, the radiation will last about 3.2×10⁻⁸ seconds (this is the so-called decay time τ). Now, because the frequency of sodium light is some 500 THz (500×10¹² oscillations per second), this makes for some 16 million oscillations. There is an interesting paradox here: the speed of light tells us that such a wave train will have a length of about 9.6 m! How is that to be reconciled with the pointlike nature of a photon? The paradox can only be explained by relativistic length contraction: in an analysis like this, one needs to distinguish the reference frame of the photon – riding along the wave as it is being emitted, so to speak – and our stationary reference frame, which is that of the emitting atom.
 This is a general result and is reflected in the K.E. = T = (1/2)·m·ω2·a2·sin2(ω·t + Δ) and the P.E. = U = k·x2/2 = (1/2)· m·ω2·a2·cos2(ω·t + Δ) formulas for the linear oscillator.
 Feynman further formalizes this in his Lecture on Superconductivity (Feynman, III-21-2), in which he refers to Schrödinger’s equation as the “equation for continuity of probabilities”. The analysis is centered on the local conservation of energy, which confirms the interpretation of Schrödinger’s equation as an energy diffusion equation.
 The meff is the effective mass of the particle, which depends on the medium. For example, an electron traveling in a solid (a transistor, for example) will have a different effective mass than in an atom. In free space, we can drop the subscript and just write meff = m. Appendix 2 provides some additional notes on the concept. As for the equations, they are easily derived from noting that two complex numbers a + i∙b and c + i∙d are equal if, and only if, their real and imaginary parts are the same. Now, the ∂ψ/∂t = i∙(ħ/meff)∙∇2ψ equation amounts to writing something like this: a + i∙b = i∙(c + i∙d). Now, remembering that i2 = −1, you can easily figure out that i∙(c + i∙d) = i∙c + i2∙d = − d + i∙c.
 The dimension of B is usually written as N/(m∙A), using the SI unit for current, i.e. the ampere (A). However, 1 C = 1 A∙s and, hence, 1 N/(m∙A) = 1 (N/C)/(m/s).
 Of course, multiplication with i amounts to a counterclockwise rotation. Hence, multiplication by –i also amounts to a rotation by 90 degrees, but clockwise. Now, to uniquely identify the clockwise and counterclockwise directions, we need to establish the equivalent of the right-hand rule for a proper geometric interpretation of Schrödinger’s equation in three-dimensional space: if we look at a clock from the back, then its hand will be moving counterclockwise. When writing B = (1/c)∙i∙E, we assume we are looking in the negative x-direction. If we are looking in the positive x-direction, we should write: B = -(1/c)∙i∙E. Of course, Nature does not care about our conventions. Hence, both should give the same results in calculations. We will show in a moment they do.
 In fact, when multiplying C²/(N·m²) with N²/C², we get N/m², but we can multiply this with 1 = m/m to get the desired result. It is significant that an energy density (joule per unit volume) can also be measured in newton per square meter (force per unit area).
 The illustration shows a linearly polarized wave, but the obtained result is general.
 The sine and cosine are essentially the same functions, except for the difference in the phase: sinθ = cos(θ−π /2).
 I must thank a physics blogger for re-writing the 1/(ε0·μ0) = c2 equation like this. See: http://reciprocal.systems/phpBB3/viewtopic.php?t=236 (retrieved on 29 September 2017).
 A circularly polarized electromagnetic wave may be analyzed as consisting of two perpendicular electromagnetic plane waves of equal amplitude and 90° difference in phase.
 Of course, the reader will now wonder: what about neutrons? How to explain neutron spin? Neutrons are neutral. That is correct, but neutrons are not elementary: they consist of (charged) quarks. Hence, neutron spin can (or should) be explained by the spin of the underlying quarks.
 We detailed the mathematical framework and detailed calculations in the following online article: https://readingfeynman.org/2017/09/15/the-principle-of-least-action-re-visited.
Let’s play a bit with the stuff we found in our previous post. This is going to be unconventional, or experimental, if you want. The idea is to give you… Well… Some ideas. So you can play yourself. 🙂 Let’s go.
Let’s first look at Feynman’s (simplified) formula for the amplitude of a photon to go from point a to point b. If we identify point a by the position vector r1 and point b by the position vector r2, and using Dirac’s fancy bra-ket notation, then it’s written as:
〈r2|r1〉 = e^(i·p∙r12/ħ)/r12
So we have a vector dot product here: p∙r12 = |p|∙|r12|·cosα = p∙r12·cosα. The angle here (α) is the angle between the p and r12 vector. All good. Well… No. We’ve got a problem. When it comes to calculating probabilities, the α angle doesn’t matter: |e^(i·θ)/r|² = 1/r². Hence, for the probability, we get: P = |〈r2|r1〉|² = 1/r12². Always ! Now that’s strange. The θ = p∙r12/ħ argument gives us a different phase depending on the angle (α) between p and r12. But… Well… Think of it: cosα goes from 1 to 0 when α goes from 0 to ±90° and, of course, is negative when p and r12 have opposite directions but… Well… According to this formula, the probabilities do not depend on the direction of the momentum. That’s just weird, I think. Did Feynman, in his iconic Lectures, give us a meaningless formula?
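To make the point tangible, here is a small Python sketch that evaluates the amplitude e^(i·θ)/r12, with θ = p·r12·cosα/ħ, for a few angles α and prints both the phase and the probability. The momentum and distance values are hypothetical and were picked only so that the phases are easy to read; the function names are illustrative too.

```python
import cmath
import math

HBAR = 1.054571800e-34  # reduced Planck constant in J·s


def amplitude(p, r12, alpha):
    """Amplitude exp(i·θ)/r12 with θ = p·r12·cos(α)/ħ."""
    theta = p * r12 * math.cos(alpha) / HBAR
    return cmath.exp(1j * theta) / r12


def probability(p, r12, alpha):
    """Absolute square of the amplitude."""
    return abs(amplitude(p, r12, alpha)) ** 2


p = 1.0e-27    # momentum magnitude in N·s (hypothetical)
r12 = 1.0e-9   # distance in m (hypothetical)

for alpha_deg in (0, 45, 90, 180):
    alpha = math.radians(alpha_deg)
    amp = amplitude(p, r12, alpha)
    scaled = probability(p, r12, alpha) * r12 ** 2  # equals 1 if P = 1/r12²
    print(f"alpha = {alpha_deg:3d} deg  phase = {cmath.phase(amp):+.6f} rad  P·r12² = {scaled:.6f}")
```

The phase changes with α, but the scaled probability stays at one for every angle, which is exactly the oddity noted above.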
Maybe. We may also note this function looks like the elementary wavefunction for any particle, which we wrote as:
ψ(x, t) = a·e−i∙θ = a·e−i∙(E∙t − p∙x)/ħ= a·e−i∙(E∙t)/ħ·ei∙(p∙x)/ħ
The only difference is that the 〈r2|r1〉 sort of abstracts away from time, so… Well… Let’s get a feel for the quantities. Let’s think of a photon carrying some typical amount of energy. Hence, let’s talk visible light and, therefore, photons of a few eV only – say 5.625 eV = 5.625×1.6×10−19 J = 9×10−19 J. Hence, their momentum is equal to p = E/c = (9×10−19 N·m)/(3×105 m/s) = 3×10−24 N·s. That’s tiny but that’s only because newtons and seconds are enormous units at the (sub-)atomic scale. As for the distance, we may want to use the thickness of a playing card as a starter, as that’s what Young used when establishing the experimental fact of light interfering with itself. Now, playing cards in Young’s time were obviously rougher than those today, but let’s take the smaller distance: modern cards are as thin as 0.3 mm. Still, that distance is associated with a value of θ that is equal to 13.6 million. Hence, the density of our wavefunction is enormous at this scale, and it’s a bit of a miracle that Young could see any interference at all ! As shown in the table below, we only get meaningful values (remember: θ is a phase angle) when we go down to the nanometer scale (10−9 m) or, even better, the angstrom scale (10−10 m).
So… Well… Again: what can we do with Feynman’s formula? Perhaps he didn’t give us a propagator function but something that is more general (read: more meaningful) at our (limited) level of knowledge. As I’ve been reading Feynman for quite a while now – like three or four years 🙂 – I think… Well… Yes. That’s it. Feynman wants us to think about it. 🙂 Are you joking again, Mr. Feynman? 🙂 So let’s assume the reasonable thing: let’s assume it gives us the amplitude to go from point a to point b by the position vector along some path r. So, then, in line with what we wrote in our previous post, let’s say p·r (momentum over a distance) is the action (S) we’d associate with this particular path (r) and then see where we get. So let’s write the formula like this:
ψ = a·e^(i·θ) = (1/r)·e^(i·S/ħ) = e^(i·p∙r/ħ)/r
We’ll use an index to denote the various paths: r0 is the straight-line path and ri is any (other) path. Now, quantum mechanics tells us we should calculate this amplitude for every possible path. The illustration below shows the straight-line path and two nearby paths. So each of these paths is associated with some amount of action, which we measure in Planck units: θ = S/ħ.
The time interval is given by t = t0 = r0/c, for all paths. Why is the time interval the same for all paths? Because we think of a photon going from some specific point in space and in time to some other specific point in space and in time. Indeed, when everything is said and done, we do think of light as traveling from point a to point b at the speed of light (c). In fact, all of the weird stuff here is all about trying to explain how it does that. 🙂
Now, if we would think of the photon actually traveling along this or that path, then this implies its velocity along any of the nonlinear paths will be larger than c, which is OK. That’s just the weirdness of quantum mechanics, and you should actually not think of the photon actually traveling along one of these paths anyway although we’ll often put it that way. Think of something fuzzier, whatever that may be. 🙂
So the action is energy times time, or momentum times distance. Hence, the difference in action between two paths i and j is given by:
δS = p·rj − p·ri = p·(rj − ri) = p·Δr
I’ll explain the δS < 2πħ/3 thing in a moment. Let’s first pause and think about the uncertainty and how we’re modeling it. We can effectively think of the variation in S as some uncertainty in the action: δS = ΔS = p·Δr. However, if S is also equal to energy times time (S = E·t), and we insist t is the same for all paths, then we must have some uncertainty in the energy, right? Hence, we can write δS as ΔS = ΔE·t. But, of course, E = m·c² = p·c, so we will have an uncertainty in the momentum as well. Hence, the variation in S should be written as:
δS = ΔS = Δp·Δr
That’s just logical thinking: if we, somehow, entertain the idea of a photon going from some specific point in spacetime to some other specific point in spacetime along various paths, then the variation, or uncertainty, in the action will effectively combine some uncertainty in the momentum and the distance. We can calculate Δp as ΔE/c, so we get the following:
δS = ΔS = Δp·Δr = ΔE·Δr/c = ΔE·Δt with Δt = Δr/c
So we have the two expressions for the Uncertainty Principle here: ΔS = Δp·Δr = ΔE·Δt. Just be careful with the interpretation of Δt: it’s just the equivalent of Δr. We just express the uncertainty in distance in seconds using the (absolute) speed of light. We are not changing our spacetime interval: we’re still looking at a photon going from a to b in t seconds, exactly. Let’s now look at the δS < 2πħ/3 thing. If we’re adding two amplitudes (two arrows or vectors, so to speak) and we want the magnitude of the result to be larger than the magnitude of each of the two contributions, then the angle between them should be smaller than 120 degrees, so that’s 2π/3 rad. The illustration below shows how you can figure that out geometrically. Hence, if S0 is the action for r0, then S1 = S0 + ħ and S2 = S0 + 2·ħ are still good, but S3 = S0 + 3·ħ is not good. Why? Because the difference in the phase angles is Δθ = S1/ħ − S0/ħ = (S0 + ħ)/ħ − S0/ħ = 1 and Δθ = S2/ħ − S0/ħ = (S0 + 2·ħ)/ħ − S0/ħ = 2 respectively, so that’s 57.3° and 114.6° respectively and that’s, effectively, less than 120°. In contrast, for the next path, we find that Δθ = S3/ħ − S0/ħ = (S0 + 3·ħ)/ħ − S0/ħ = 3, so that’s 171.9°. So that amplitude gives us a negative contribution.
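The 120° rule is easy to check numerically. The sketch below adds two unit arrows whose phases differ by Δθ and looks at the magnitude of the sum, which equals 2·|cos(Δθ/2)|. The function name sum_magnitude is mine, just for illustration.

```python
import cmath
import math

def sum_magnitude(delta_theta):
    """Magnitude of the sum of two unit arrows whose phases differ by delta_theta (rad)."""
    return abs(1 + cmath.exp(1j * delta_theta))

# Phase differences of 1, 2 and 3 radians, as for S1, S2 and S3 above.
for n in (1, 2, 3):
    print(n, round(math.degrees(n), 1), round(sum_magnitude(n), 3))
```

The sum is longer than one unit arrow for Δθ = 1 and Δθ = 2, and shorter for Δθ = 3. At exactly Δθ = 2π/3 the magnitude equals 1.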
Let’s do some calculations using a spreadsheet. To simplify things, we will assume we measure everything (time, distance, force, mass, energy, action,…) in Planck units. Hence, we can simply write: Sn = S0 + n. Of course, n = 1, 2,… etcetera, right? Well… Maybe not. We are measuring action in units of ħ, but do we actually think action comes in units of ħ? I am not sure. It would make sense, intuitively, but… Well… There’s uncertainty on the energy (E) and the momentum (p) of our photon, right? And how accurately can we measure the distance? So there’s some randomness everywhere. 😦 So let’s leave that question open as for now.
We will also assume that the phase angle for S0 is equal to 0 (or some multiple of 2π, if you want). That’s just a matter of choosing the origin of time. This makes it really easy: ΔSn = Sn − S0 = n, and the associated phase angle θn = Δθn is the same. In short, the amplitude for each path reduces to ψn = e^(i·n)/r0. So we need to add these first and then calculate the magnitude, which we can then square to get a probability. Of course, there is also the issue of normalization (probabilities have to add up to one) but let’s tackle that later. For the calculations, we use Euler’s r·e^(i·θ) = r·(cosθ + i·sinθ) = r·cosθ + i·r·sinθ formula. Needless to say, |r·e^(i·θ)|² = |r|²·|e^(i·θ)|² = |r|²·(cos²θ + sin²θ) = r². Finally, when adding complex numbers, we add the real and imaginary parts respectively, and we’ll denote the ψ0 + ψ1 + ψ2 + … sum as Ψ.
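As a rough illustration of this bookkeeping, here is a small Python sketch that adds the ψn = e^(i·n)/r0 amplitudes one by one, keeping the real and imaginary parts separately, and records |Ψ|² after each step. It assumes Planck units and integer n; the function name and the default r0 = 1 are my own choices, and the probabilities are not normalized.

```python
import math

def cumulative_probabilities(n_paths, r0=1.0):
    """Add psi_n = exp(i*n)/r0 for n = 0..n_paths-1; return |Psi|^2 after each step."""
    re, im = 0.0, 0.0
    probs = []
    for n in range(n_paths):
        re += math.cos(n) / r0      # real part, from Euler's formula
        im += math.sin(n) / r0      # imaginary part
        probs.append(re ** 2 + im ** 2)
    return probs

probs = cumulative_probabilities(200)
print(probs[:5])
```

Because the phases increase by a fixed step, the sum is a geometric series: the unnormalized probability keeps oscillating within a fixed band and does not settle down.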
Now, we also need to see how our ΔS = Δp·Δr works out. We may want to assume that the uncertainty in p and in r will both be proportional to the overall uncertainty in the action. For example, we could try writing the following: ΔSn = Δpn·Δrn = n·Δp1·Δr1. It also makes sense that you may want Δpn and Δrn to be proportional to Δp1 and Δr1 respectively. Combining both, the assumption would be this:
Δpn = √n·Δp1 and Δrn = √n·Δr1
So now we just need to decide how we will distribute ΔS1 = ħ = 1 over Δp1 and Δr1 respectively. For example, if we’d assume Δp1 = 1, then Δr1 = ħ/Δp1 = 1/1 = 1. These are the calculations. I will let you analyze them. 🙂 Well… We get a weird result. It reminds me of Feynman’s explanation of the partial reflection of light, shown below, but… Well… That doesn’t make much sense, does it?
Hmm… Maybe it does. 🙂 Look at the graph more carefully. The peaks sort of oscillate out so… Well… That might make sense… 🙂
Does it? Are we doing something wrong here? These amplitudes should match the ones shown in those nice animations (like this one, for example, which is part of the Wikipedia article on Feynman’s path integral formulation of quantum mechanics). So what’s wrong, if anything? Well… Our paths differ by some fixed amount of action, which doesn’t quite reflect the geometric approach that’s used in those animations. The graph below shows how the distance r varies as a function of n.
If we’d use a model in which the distance would increase linearly or, preferably, exponentially, then we’d get the result we want to get, right?
Well… Maybe. Let’s try it. Hmm… We need to think about the geometry here. Look at the triangle below. If b is the straight-line path (r0), then ac could be one of the crooked paths (rn). To simplify, we’ll assume isosceles triangles, so a equals c and, hence, rn = 2·a = 2·c. We will also assume the successive paths are separated by the same vertical distance (h = h1) right in the middle, so hb = hn = n·h1. It is then easy to show the following: rn = 2·√[(r0/2)² + (n·h1)²]. This gives the following graph for r0 = 10 and h1 = 0.01.
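For the isosceles triangle described here, Pythagoras gives rn = 2·√[(r0/2)² + (n·h1)²]. The sketch below just tabulates that; the function name path_length is mine, and the default values r0 = 10 and h1 = 0.01 are my reading of the values used for the graph.

```python
import math

def path_length(n, r0=10.0, h1=0.01):
    """Length of the n-th crooked path: two equal legs over a base r0 with height n*h1."""
    return 2.0 * math.hypot(r0 / 2.0, n * h1)

# Extra distance compared to the straight-line path, for a few values of n.
for n in (0, 1, 10, 100):
    print(n, path_length(n), path_length(n) - path_length(0))
```

Note that the extra distance grows roughly as n² for small n·h1, and roughly linearly in n once n·h1 is large compared to r0.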
Is this the right step increase? Not sure. We can vary the values in our spreadsheet. Let’s first build it. The photon will have to travel faster in order to cover the extra distance in the same time, so its momentum will be higher. Let’s think about the velocity. Let’s start with the first path (n = 1). In order to cover the extra distance Δr1, the velocity c1 must be equal to (r0 + Δr1)/t = r0/t + Δr1/t = c + Δr1/t = c0 + Δr1/t. We can write c1 as c1 = c0 + Δc1, so Δc1 = Δr1/t. Now, the ratio of p1 and p0 will be equal to the ratio of c1 and c0 because p1/p0 = (m·c1)/(m·c0) = c1/c0. Hence, we have the following formula for p1:
p1 = p0·c1/c0 = p0·(c0 + Δc1)/c0 = p0·[1 + Δr1/(c0·t)] = p0·(1 + Δr1/r0)
For pn, the logic is the same, so we write:
pn = p0·cn/c0 = p0·(c0 + Δcn)/c0 = p0·[1 + Δrn/(c0·t)] = p0·(1 + Δrn/r0)
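To give an idea of how such a calculation can be set up outside a spreadsheet, here is a hedged Python sketch. It is not the spreadsheet itself: it assumes the isosceles-triangle path length rn = 2·√[(r0/2)² + (n·h1)²], the pn formula above, an action Sn = pn·rn, a phase measured relative to the straight-line path, and a 1/rn amplitude factor. The values of r0, h1 and p0 are placeholders I picked, and the function name path_sum is mine.

```python
import cmath
import math

HBAR = 1.0545718e-34  # reduced Planck constant, J·s

def path_sum(n_paths, r0, h1, p0):
    """Cumulative |Psi|^2 when adding paths n = 0..n_paths-1 (not normalized)."""
    total = 0j
    probs = []
    for n in range(n_paths):
        r_n = 2.0 * math.hypot(r0 / 2.0, n * h1)   # assumed isosceles path length
        p_n = p0 * (1.0 + (r_n - r0) / r0)         # p_n = p0·(1 + Δr_n/r0)
        theta = (p_n * r_n - p0 * r0) / HBAR       # phase relative to the straight path
        total += cmath.exp(1j * theta) / r_n       # amplitude e^(i·θ)/r_n
        probs.append(abs(total) ** 2)
    return probs

# Placeholder values: nanometer-scale path, h1/r0 = 1/10, momentum of a few-eV photon.
probs = path_sum(n_paths=200, r0=1e-9, h1=1e-10, p0=3e-27)
print(probs[0], probs[-1])
```

Varying the h1/r0 ratio and the number of paths is the quickest way to see how sensitive the result is to these choices.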
Let’s do the calculations, and let’s use meaningful values, so the nanometer scale and actual values for Planck’s constant and the photon momentum. The results are shown below.
Pretty interesting. In fact, this looks really good. The probability first swings around wildly, because of these zones of constructive and destructive interference, but then stabilizes. [Of course, I would need to normalize the probabilities, but you get the idea, right?] So… Well… I think we get a very meaningful result with this model. Sweet ! 🙂 I’m lovin’ it ! 🙂 And, here you go, this is (part of) the calculation table, so you can see what I am doing. 🙂
The graphs below look even better: I just changed the h1/r0 ratio from 1/100 to 1/10. The probability stabilizes almost immediately. 🙂 So… Well… It’s not as fancy as the referenced animation, but I think the educational value of this thing here is at least as good ! 🙂
🙂 This is good stuff… 🙂
Post scriptum (19 September 2017): There is an obvious inconsistency in the model above, and in the calculations. We assume there are paths r1, r2, r3, etcetera, and then we calculate the action for each of them, and the amplitude, and then we add the amplitude to the sum. But, surely, we should count these paths twice, in two-dimensional space, that is. Think of the graph: we have positive and negative interference zones that are sort of layered around the straight-line path, as shown below.
In three-dimensional space, these lines become surfaces. Hence, rather than adding one arrow for every δ having one contribution only, we may want to add… Well… In three-dimensional space, the formula for the surface around the straight-line path would probably look like π·hn·r1, right? Hmm… Interesting idea. I changed my spreadsheet to incorporate that idea, and I got the graph below. It’s a nonsensical result, because the probability does swing around, but it gradually spins out of control: it never stabilizes. That’s because we increase the weight of the paths that are further removed from the center. So… Well… We shouldn’t be doing that, I guess. 🙂 I’ll let you look for the right formula, OK? Let me know when you found it. 🙂
About three weeks ago, I brought my most substantial posts together in one document: it’s the Deep Blue page of this site. I also published it on Amazon/Kindle. It’s nice. It crowns many years of self-study, and many nights of short and bad sleep – as I was mulling over yet another paradox haunting me in my dreams. It’s been an extraordinary climb but, frankly, the view from the top is magnificent. 🙂
The offer is there: anyone who is willing to go through it and offer constructive and/or substantial comments will be included in the book’s acknowledgements section when I go for a second edition (which it needs, I think). First person to be acknowledged here is my wife though, Maria Elena Barron, as she has given me the spacetime and, more importantly, the freedom to take this bull by its horns.
Below I just copy the foreword, just to give you a taste of it. 🙂
Another introduction to quantum mechanics? Yep. I am not hoping to sell many copies, but I do hope my unusual background—I graduated as an economist, not as a physicist—will encourage you to take on the challenge and grind through this.
I’ve always wanted to thoroughly understand, rather than just vaguely know, those quintessential equations: the Lorentz transformations, the wavefunction and, above all, Schrödinger’s wave equation. In my bookcase, I’ve always had what is probably the most famous physics course in the history of physics: Richard Feynman’s Lectures on Physics, which have been used for decades, not only at Caltech but at many of the best universities in the world. Plus a few dozen other books. Popular books—which I now regret I ever read, because they were an utter waste of time: the language of physics is math and, hence, one should read physics in math—not in any other language.
But Feynman’s Lectures on Physics – three volumes of about fifty chapters each – are not easy to read. However, the experimental verification of the existence of the Higgs particle in CERN’s LHC accelerator a couple of years ago, and the award of the Nobel prize to the scientists who had predicted its existence (including Peter Higgs and François Englert), convinced me it was about time I take the bull by its horns. While I consider myself to be of average intelligence only, I do feel there’s value in the ideal of the ‘Renaissance man’ and, hence, I think stuff like this is something we all should try to understand, somehow. So I started to read, and I also started a blog (www.readingfeynman.org) to externalize my frustration as I tried to cope with the difficulties involved. The site attracted hundreds of visitors every week and, hence, it encouraged me to publish this booklet.
So what is it about? What makes it special? In essence, it is a common-sense introduction to the key concepts in quantum physics. However, while common-sense, it does not shy away from the math, which is complicated, but not impossible. So this little book is surely not a Guide to the Universe for Dummies. I do hope it will guide some Not-So-Dummies. It basically recycles what I consider to be my more interesting posts, but combines them in a comprehensive structure.
It is a bit of a philosophical analysis of quantum mechanics as well, as I will – hopefully – do a better job than others in distinguishing the mathematical concepts from what they are supposed to describe, i.e. physical reality.
Last but not least, it does offer some new didactic perspectives. For those who know the subject already, let me briefly point these out:
I. Few, if any, of the popular writers seem to have noted that the argument of the wavefunction (θ = E·t – p·x) – using natural units (hence, the numerical value of ħ and c is one), and for an object moving at constant velocity (hence, x = v·t) – can be written as the product of the proper time of the object and its rest mass:
θ = E·t − p·x = mv·t − mv·v·x = mv·(t − v·x)
⇔ θ = m0·(t − v·x)/√(1 – v²) = m0·t’
Hence, the argument of the wavefunction is just the proper time of the object with the rest mass acting as a scaling factor for the time: the internal clock of the object ticks much faster if it’s heavier. This symmetry between the argument of the wavefunction of the object as measured in its own (inertial) reference frame, and its argument as measured by us, in our own reference frame, is remarkable, and allows to understand the nature of the wavefunction in a more intuitive way.
While this approach reflects Feynman’s idea of the photon stopwatch, the presentation in this booklet generalizes the concept for all wavefunctions, first and foremost the wavefunction of the matter-particles that we’re used to (e.g. electrons).
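A quick numerical check of this identity, in natural units (c = ħ = 1), can be sketched as follows. The function name and the sample values (m0 = 2, v = 0.6, t = 5, x = 3) are mine and purely illustrative; E = mv and p = mv·v with mv = m0/√(1 – v²), and t’ is the Lorentz-transformed time.

```python
import math

def phase_and_proper_time(m0, v, t, x):
    """Return (theta, m0*t') in natural units, with theta = E*t - p*x."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)   # Lorentz factor
    energy = gamma * m0                    # E = m_v
    momentum = gamma * m0 * v              # p = m_v * v
    theta = energy * t - momentum * x
    t_prime = gamma * (t - v * x)          # time in the object's own frame
    return theta, m0 * t_prime

theta, scaled_time = phase_and_proper_time(m0=2.0, v=0.6, t=5.0, x=3.0)
print(theta, scaled_time)
```

The two numbers agree for any choice of t and x, not only for points on the x = v·t trajectory, because the identity is purely algebraic.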
II. Few, if any, have thought of looking at Schrödinger’s wave equation as an energy propagation mechanism. In fact, when helping my daughter out as she was trying to understand non-linear regression (logit and Poisson regressions), I suddenly realized we can analyze the wavefunction as a link function that connects two physical spaces: the physical space of our moving object, and a physical energy space.
Re-inserting Planck’s quantum of action in the argument of the wavefunction – so we write θ as θ = (E/ħ)·t – (p/ħ)·x = [E·t – p·x]/ħ – we may assign a physical dimension to it: when interpreting ħ as a scaling factor only (and, hence, when we only consider its numerical value, not its physical dimension), θ becomes a quantity expressed in newton·meter·second, i.e. the (physical) dimension of action. It is only natural, then, that we would associate the real and imaginary part of the wavefunction with some physical dimension too, and a dimensional analysis of Schrödinger’s equation tells us this dimension must be energy.
This perspective allows us to look at the wavefunction as an energy propagation mechanism, with the real and imaginary part of the probability amplitude interacting in very much the same way as the electric and magnetic field vectors E and B. This leads me to the next point, which I make rather emphatically in this booklet: the propagation mechanism for electromagnetic energy – as described by Maxwell’s equations – is mathematically equivalent to the propagation mechanism that’s implicit in the Schrödinger equation.
I am, therefore, able to present the Schrödinger equation in a much more coherent way, describing not only how this famous equation works for electrons, or matter-particles in general (i.e. fermions or spin-1/2 particles), which is probably the only use of the Schrödinger equation you are familiar with, but also how it works for bosons, including the photon, of course, but also the theoretical zero-spin boson!
In fact, I am personally rather proud of this. Not because I am doing something that hasn’t been done before (I am sure many have come to the same conclusions before me), but because one always has to trust one’s intuition. So let me say something about that third innovation: the photon wavefunction.
III. Let me tell you the little story behind my photon wavefunction. One of my acquaintances is a retired nuclear scientist. While he knew I was delving into it all, I knew he had little time to answer any of my queries. However, when I asked him about the wavefunction for photons, he bluntly told me photons didn’t have a wavefunction. I should just study Maxwell’s equations and that’s it: there’s no wavefunction for photons, just these traveling electric and magnetic field vectors. Look at Feynman’s Lectures, or any textbook, he said. None of them talk about photon wavefunctions. That’s true, but I knew he had to be wrong. I mulled over it for several months, and then just sat down and started to fiddle with Maxwell’s equations, assuming the oscillations of the E and B vector could be described by regular sinusoids. And – Lo and behold! – I derived a wavefunction for the photon. It’s fully equivalent to the classical description, but the new expression solves the Schrödinger equation, if we modify it in a rather logical way: we have to double the diffusion constant, which makes sense, because E and B give you two waves for the price of one!
In any case, I am getting ahead of myself here, and so I should wrap up this rather long introduction. Let me just say that, through my rather long journey in search of understanding – rather than knowledge alone – I have learned there are so many wrong answers out there: wrong answers that hamper rather than promote a better understanding. Moreover, I was most shocked to find out that such wrong answers are not the preserve of amateurs alone! This emboldened me to write what I write here, and to publish it. Quantum mechanics is a logical and coherent framework, and it is not all that difficult to understand. One just needs good pointers, and that’s what I want to provide here.
As of now, it focuses on the mechanics in particular, i.e. the concept of the wavefunction and wave equation (better known as Schrödinger’s equation). The other aspect of quantum mechanics – i.e. the idea of uncertainty as implied by the quantum idea – will receive more attention in a later version of this document. I should also say I will limit myself to quantum electrodynamics (QED) only, so I won’t discuss quarks (i.e. quantum chromodynamics, which is an entirely different realm), nor will I delve into any of the other more recent advances of physics.
In the end, you’ll still be left with lots of unanswered questions. However, that’s quite OK, as Richard Feynman himself was of the opinion that he himself did not understand the topic the way he would like to understand it. But then that’s exactly what draws all of us to quantum physics: a common search for a deep and full understanding of reality, rather than just some superficial description of it, i.e. knowledge alone.
So let’s get on with it. I am not saying this is going to be easy reading. In fact, I blogged about much easier stuff than this in my blog, treating only aspects of the whole theory. This is the whole thing, and it’s not easy to swallow. In fact, it may well be too big to swallow as a whole. But please do give it a try. I wanted this to be an intuitive but formally correct introduction to quantum math. However, when everything is said and done, you are the only one who can judge if I reached that goal.
Of course, I should not forget the acknowledgements but… Well… It was a rather lonely venture, so I am only going to acknowledge my wife here, Maria, who gave me all of the spacetime and all of the freedom I needed, as I would get up early, or work late after coming home from my regular job. I sacrificed weekends, which we could have spent together, and – when mulling over yet another paradox – the nights were often short and bad. Frankly, it’s been an extraordinary climb, but the view from the top is magnificent.
I just need to insert one caution: my site (www.readingfeynman.org) includes animations, which make it much easier to grasp some of the mathematical concepts that I will be explaining. Hence, I warmly recommend you also have a look at that site, and its Deep Blue page in particular – as that page has the same contents, more or less, but the animations make it a much easier read.
Have fun with it!
Jean Louis Van Belle, BA, MA, BPhil, Drs.
Post scriptum note added on 11 July 2016: This is one of the more speculative posts which led to my e-publication analyzing the wavefunction as an energy propagation. With the benefit of hindsight, I would recommend you immediately read the more recent exposé on the matter that is being presented here, which you can find by clicking on the provided link. In fact, I actually made some (small) mistakes when writing the post below.
In my previous post, I introduced the elementary wavefunction of a particle with zero rest mass in free space (i.e. the particle also has zero potential). I wrote that wavefunction as e^(i(kx − ωt)) = e^(i(x/2 − t/2)) = cos[(x−t)/2] + i∙sin[(x−t)/2], and we can represent that function as follows:
If the real and imaginary axis in the image above are the y- and z-axis respectively, then the x-axis here is time, so here we’d be looking at the shape of the wavefunction at some fixed point in space.
Now, we could also look at its shape at some fixed point in time, so the x-axis would then represent the spatial dimension. Better still, we could animate the illustration to incorporate both the temporal as well as the spatial dimension. The following animation does the trick quite well:
Please do note that space is one-dimensional here: the y- and z-axis represent the real and imaginary part of the wavefunction, not the y- or z-dimension in space.
You’ve seen this animation before, of course: I took it from Wikipedia, and it actually represents the electric field vector (E) for a circularly polarized electromagnetic wave. To get a complete picture of the electromagnetic wave, we should add the magnetic field vector (B), which is not shown here. We’ll come back to that later. Let’s first look at our zero-mass particle denuded of all properties, so that’s not an electromagnetic wave—read: a photon. No. We don’t want to talk charges here.
OK. So far so good. A zero-mass particle in free space. So we got that e^(i(x/2 − t/2)) = cos[(x−t)/2] + i∙sin[(x−t)/2] wavefunction. We got that function assuming the following:
- Time and distance are measured in equivalent units, so c = 1. Hence, the classical velocity (v) of our zero-mass particle is equal to 1, and we also find that the energy (E), mass (m) and momentum (p) of our particle are numerically the same. We wrote: E = m = p, using the p = m·v (for v = c) and the E = m∙c2 formulas.
- We also assumed that the quantum of energy (and, hence, the quantum of mass, and the quantum of momentum) was equal to ħ/2, rather than ħ. The de Broglie relations (k = p/ħ and ω = E/ħ) then gave us the rather particular argument of our wavefunction: kx − ωt = x/2 − t/2.
The latter hypothesis (E = m = p = ħ/2) is somewhat strange at first but, as I showed in that post of mine, it avoids an apparent contradiction: if we’d use ħ, then we would find two different values for the phase and group velocity of our wavefunction. To be precise, we’d find v for the group velocity, but v/2 for the phase velocity. Using ħ/2 solves that problem. In addition, using ħ/2 is consistent with the Uncertainty Principle, which tells us that Δx·Δp ≥ ħ/2 and ΔE·Δt ≥ ħ/2.
OK. Take a deep breath. Here I need to say something about dimensions. If we’re saying that we’re measuring time and distance in equivalent units – say, in meter, or in seconds – then we are not saying that they’re the same. The dimension of time and space is fundamentally different, as evidenced by the fact that, for example, time flows in one direction only, as opposed to x. To be precise, we assumed that x and t become countable variables themselves at some point in time. However, if we’re at t = 0, then we’d count time as t = 1, 2, etcetera only. In contrast, at the point x = 0, we can go to x = +1, +2, etcetera but we may also go to x = −1, −2, etc.
I have to stress this point, because what follows will require some mental flexibility. In fact, we often talk about natural units, such as Planck units, which we get from equating fundamental constants, such as c, or ħ, to 1, but then we often struggle to interpret those units, because we fail to grasp what it means to write c = 1, or ħ = 1. For example, writing c = 1 implies we can measure distance in seconds, or time in meter, but it does not imply that distance becomes time, or vice versa. We still need to keep track of whether or not we’re talking a second in time, or a second in space, i.e. c meter, or, conversely, whether we’re talking a meter in space, or a meter in time, i.e. 1/c seconds. We can make the distinction in various ways. For example, we could mention the dimension of each equation between brackets, so we’d write: t = 1×10⁻¹⁵ s [t] ≈ 299.8×10⁻⁹ m [t]. Alternatively, we could put a little subscript (like t, or d), so as to make sure it’s clear our meter is a ‘light-meter’, so we’d write: t = 1×10⁻¹⁵ s ≈ 299.8×10⁻⁹ mt. Likewise, we could add a little subscript when measuring distance in light-seconds, so we’d write x = 3×10⁸ m ≈ 1 sd, rather than x = 3×10⁸ m [x] ≈ 1 s [x].
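The conversions in this paragraph amount to multiplying or dividing by c. A minimal sketch, in which the function names are hypothetical labels for the ‘time-meter’ and ‘distance-second’ idea:

```python
C = 299_792_458.0  # speed of light in m/s (exact by definition of the meter)

def seconds_to_time_meters(t_seconds):
    """Express a time interval in 'time-meters': the distance light covers in that time."""
    return t_seconds * C

def meters_to_distance_seconds(x_meters):
    """Express a distance in 'distance-seconds': the time light needs to cover it."""
    return x_meters / C

print(seconds_to_time_meters(1e-15))      # a femtosecond, in meters of time
print(meters_to_distance_seconds(3e8))    # 3×10^8 m, in seconds of distance
```

The functions only convert the number. Whether that number measures time or distance is something you still have to keep track of yourself, which is the whole point of the subscripts.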
If you wish, we could refer to the ‘light-meter’ as a ‘time-meter’ (or a meter of time), and to the light-second as a ‘distance-second’ (or a second of distance). It doesn’t matter what you call it, or how you denote it. In fact, you will never hear of a meter of time, nor will you ever see those subscripts or brackets. But that’s because physicists always keep track of the dimensions of an equation, and so they know. They know, for example, that the dimension of energy combines the dimensions of both force as well as distance, so we write: [energy] = [force]·[distance]. Read: energy amounts to applying a force over a distance. Likewise, momentum amounts to applying some force over some time, so we write: [momentum] = [force]·[time]. Using the usual symbols for energy, momentum, force, distance and time respectively, we can write this as [E] = [F]·[x] and [p] = [F]·[t]. Using the units you know, i.e. joule, newton, meter and seconds, we can also write this as: 1 J = 1 N·m and 1…
Hey! Wait a minute! What’s that N·s unit for momentum? Momentum is mass times velocity, isn’t it? It is. But it amounts to the same. Remember that mass is a measure for the inertia of an object, and so mass is measured with reference to some force (F) and some acceleration (a): F = m·a ⇔ m = F/a. Hence, [m] = kg = [F/a] = N/(m/s²) = N·s²/m. [Note that the m in the brackets is the symbol for mass but the other m is a meter!] So the unit of momentum is (N·s²/m)·(m/s) = N·s = newton·second.
Now, the dimension of Planck’s constant is the dimension of action, which combines all dimensions: force, time and distance. We write: ħ ≈ 1.0545718×10⁻³⁴ N·m·s (newton·meter·second). That’s great, and I’ll show why in a moment. But, at this point, you should just note that when we write that E = m = p = ħ/2, we’re just saying they are numerically the same. The dimensions of E, m and p are not the same. So what we’re really saying is the following:
- The quantum of energy is ħ/2 newton·meter ≈ 0.527286×10⁻³⁴ N·m.
- The quantum of momentum is ħ/2 newton·second ≈ 0.527286×10⁻³⁴ N·s.
What’s the quantum of mass? That’s where the equivalent units come in. We wrote: 1 kg = 1 N·s²/m. So we could substitute the distance unit in this equation (m) by sd/c = sd/(3×10⁸). So we get: 1 kg = 3×10⁸ N·s²/sd. Can we scrap both ‘seconds’ and say that the quantum of mass (ħ/2) is equal to the quantum of momentum? Think about it.
The answer is… Yes and no – but much more no than yes! The two sides of the equation are only numerically equal, but we’re talking a different dimension here. If we’d write that 1 kg = 0.527286×10⁻³⁴ N·s²/sd = 0.527286×10⁻³⁴ N·s, you’d be equating two dimensions that are fundamentally different: space versus time. To reinforce the point, think of it the other way: think of substituting the second (s) for 3×10⁸ m. Again, you’d make a mistake. You’d have to write 0.527286×10⁻³⁴ N·(mt)²/m, and you should not assume that a time-meter is equal to a distance-meter. They’re equivalent units, and so you can use them to get some number right, but they’re not equal: what they measure is fundamentally different. A time-meter measures time, while a distance-meter measures distance. It’s as simple as that. So what is it then? Well… What we can do is remember Einstein’s energy-mass equivalence relation once more: E = m·c² (and m is the mass here). Just check the dimensions once more: [m]·[c²] = (N·s²/m)·(m²/s²) = N·m. So we should think of the quantum of mass as the quantum of energy, as energy and mass are equivalent, really.
Back to the wavefunction
The beauty of the construct of the wavefunction resides in several mathematical properties of this construct. The first is its argument:
θ = kx − ωt, with k = p/ħ and ω = E/ħ
Its dimension is the dimension of an angle: we express it in radians. What’s a radian? You might think that a radian is a distance unit because… Well… Look at how we measure an angle in radians below:
But you’re wrong. An angle’s measurement in radians is numerically equal to the length of the corresponding arc of the unit circle but… Well… Numerically only. 🙂 Just do a dimensional analysis of θ = kx − ωt = (p/ħ)·x − (E/ħ)·t. The dimension of p/ħ is (N·s)/(N·m·s) = 1/m = m⁻¹, so we get some quantity expressed per meter, which we then multiply by x, so we get a pure number. No dimension whatsoever! Likewise, the dimension of E/ħ is (N·m)/(N·m·s) = 1/s = s⁻¹, which we then multiply by t, so we get another pure number, which we then add to get our argument θ. Hence, Planck’s quantum of action (ħ) does two things for us:
- It expresses p and E in units of ħ.
- It sorts out the dimensions, ensuring our argument is a dimensionless number indeed.
In fact, I’d say the ħ in the (p/ħ)·x term in the argument is a different ħ than the ħ in the (E/ħ)·t term. Huh? What? Yes. Think of the distinction I made between s and sd, or between m and mt. Both were numerically the same: they captured a magnitude, but they measured different things. We’ve got the same thing here:
- The meter (m) in ħ ≈ 1.0545718×10−34 N·m·s in (p/ħ)·x is the dimension of x, and so it gets rid of the distance dimension. So the m in ħ ≈ 1.0545718×10−34 N·m·s goes, and what’s left measures p in terms of units equal to 1.0545718×10−34 N·s, so we get a pure number indeed.
- Likewise, the second (s) in ħ ≈ 1.0545718×10−34 N·m·s in (E/ħ)·t is the dimension of t, and so it gets rid of the time dimension. So the s in ħ ≈ 1.0545718×10−34 N·m·s goes, and what’s left measures E in terms of units equal to 1.0545718×10−34 N·m, so we get another pure number.
- Adding both gives us the argument θ: a pure number that measures some angle.
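If you want to see the bookkeeping at work, here is a minimal Python sketch of the dimensional analysis above. It is just an illustration: the tuple representation and the helper names (`mul`, `div`) are my own assumptions, not something from a library. Each dimension is a triple of exponents over newton, meter and second.

```python
# Dimensions as exponent triples over (newton, meter, second).
# The representation and helper names are illustrative assumptions.

def mul(a, b):
    """Multiply two quantities: exponents add."""
    return tuple(x + y for x, y in zip(a, b))

def div(a, b):
    """Divide two quantities: exponents subtract."""
    return tuple(x - y for x, y in zip(a, b))

HBAR = (1, 1, 1)      # N·m·s
MOMENTUM = (1, 0, 1)  # N·s
ENERGY = (1, 1, 0)    # N·m
LENGTH = (0, 1, 0)    # m
TIME = (0, 0, 1)      # s

k_dim = div(MOMENTUM, HBAR)         # (0, -1, 0): per meter
omega_dim = div(ENERGY, HBAR)       # (0, 0, -1): per second
kx_dim = mul(k_dim, LENGTH)         # (0, 0, 0): pure number
omega_t_dim = mul(omega_dim, TIME)  # (0, 0, 0): pure number

print(k_dim, omega_dim, kx_dim, omega_t_dim)
```

Both terms of the argument come out with all exponents equal to zero, which is the only reason we are allowed to subtract one from the other.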
That’s why you need to watch out when writing θ = (p/ħ)·x − (E/ħ)·t as θ = (p·x − E·t)/ħ or – in the case of our elementary wavefunction for the zero-mass particle – as θ = (x/2 − t/2) = (x − t)/2. You can do it – in fact, you should do so when trying to calculate something – but you need to be aware that you’re abstracting away from the dimensions. That’s quite OK, as you’re just calculating something, but don’t forget the physics behind it!
You’ll immediately ask: what are the physics behind here? Well… I don’t know. Perhaps nobody knows. As Feynman once famously said: “I think I can safely say that nobody understands quantum mechanics.” But then he never wrote that, and I am sure he didn’t really mean that. And then he said that back in 1964, which is 50 years ago now. 🙂 So let’s try to understand it at least. 🙂
Planck’s quantum of action – 1.0545718×10⁻³⁴ N·m·s – comes to us as a mysterious quantity. A quantity is more than a number. A number is something like π or e, for example. It might be a complex number, like e^(iθ), but that’s still a number. In contrast, a quantity has some dimension, or some combination of dimensions. A quantity may be a scalar quantity (like distance), or a vector quantity (like a field vector). In this particular case (Planck’s ħ or h), we’ve got a physical constant combining three dimensions: force, time and distance (or space, if you want). It’s a quantum, so it comes as a blob, or a lump, if you prefer that word. However, as I see it, we can sort of project it in space as well as in time. In fact, if this blob is going to move in spacetime, then it will move in space as well as in time: t will go from 0 to 1, and x goes from 0 to ± 1, depending on what direction we’re going. So when I write that E = p = ħ/2 – which, let me remind you, are two numerical equations, really – I sort of split Planck’s quantum over E = m and p respectively.
You’ll say: what kind of projection or split is that? When projecting some vector, we’ll usually have some sine and cosine, or a 1/√2 factor—or whatever, but not a clean 1/2 factor. Well… I have no answer to that, except that this split fits our mathematical construct. Or… Well… I should say: my mathematical construct. Because what I want to find is this clean Schrödinger equation:
∂ψ/∂t = i·(ħ/2m)·∇²ψ = i·∇²ψ for m = ħ/2
Now I can only get this equation if (1) E = m = p and (2) if m = ħ/2 (which amounts to writing that E = p = m = ħ/2). There’s also the Uncertainty Principle. If we are going to consider the quantum vacuum, i.e. if we’re going to look at space (or distance) and time as count variables, then Δx and Δt in the ΔxΔp = ΔEΔt = ħ/2 equations are ± 1 and, therefore, Δp and ΔE must be ± ħ/2. In any case, I am not going to try to justify my particular projection here. Let’s see what comes out of it.
The quantum vacuum
Schrödinger’s equation for my zero-mass particle (with energy E = m = p = ħ/2) amounts to writing:
- Re(∂ψ/∂t) = −Im(∇²ψ)
- Im(∂ψ/∂t) = Re(∇²ψ)
Now that reminds one of the propagation mechanism for the electromagnetic wave, which we wrote as ∂B/∂t = –∇×E and ∂E/∂t = ∇×B, also assuming we measure time and distance in equivalent units. However, we’ll come back to that later. Let’s first study the equation we have, i.e.
e^(i(kx − ωt)) = e^(i(ħ·x/2 − ħ·t/2)/ħ) = e^(i(x/2 − t/2)) = cos[(x−t)/2] + i∙sin[(x−t)/2]
Let’s think some more. What is that e^(i(x/2 − t/2)) function? It’s subject to conceiving time and distance as countable variables, right? I am tempted to say: as discrete variables, but I won’t go that far – not now – because the countability may be related to a particular interpretation of quantum physics. So I need to think about that. In any case… The point is that x can only take on values like 0, 1, 2, etcetera. And the same goes for t. To make things easy, we’ll not consider negative values for x right now (and, obviously, not for t either). But you can easily check it doesn’t make a difference: if you think of the propagation mechanism – which is what we’re trying to model here – then x is always positive, because we’re moving away from some source that caused the wave. In any case, we’ve got an infinite set of points like:
- e^(i(0/2 − 0/2)) = e^(i·0) = cos(0) + i∙sin(0)
- e^(i(1/2 − 0/2)) = e^(i/2) = cos(1/2) + i∙sin(1/2)
- e^(i(0/2 − 1/2)) = e^(−i/2) = cos(−1/2) + i∙sin(−1/2)
- e^(i(1/2 − 1/2)) = e^(i·0) = cos(0) + i∙sin(0)
In my previous post, I calculated the real and imaginary part of this wavefunction for x going from 0 to 14 (as mentioned, in steps of 1) and for t doing the same (also in steps of 1), and what we got looked pretty good:
I also said that, if you wonder what the quantum vacuum could possibly look like, you should probably think of these discrete spacetime points, and some complex-valued wave that travels as illustrated above. In case you wonder what’s being illustrated here: the right-hand graph is the cosine value for all possible x = 0, 1, 2,… and t = 0, 1, 2,… combinations, and the left-hand graph depicts the sine values, so that’s the imaginary part of our wavefunction. Taking the absolute square of the wavefunction gives 1 for all combinations. So it’s obvious we’d need to normalize and, more importantly, we’d have to localize the particle by adding several of these waves with the appropriate contributions. But so that’s not our worry right now. I want to check whether those discrete time and distance units actually make sense. What’s their size? Is it anything like the Planck length (for distance) and/or the Planck time?
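For those who like to play with this themselves, the following is a small Python sketch of that grid calculation. The function name `psi` and the dictionary layout are my own choices for the illustration; the only thing taken from the text is the function e^(i(x/2 − t/2)) evaluated for x and t running from 0 to 14 in steps of 1.

```python
import cmath

def psi(x, t):
    """Elementary wavefunction e^(i(x/2 - t/2)) at an integer spacetime point."""
    return cmath.exp(1j * (x - t) / 2)

# All (x, t) combinations for x, t = 0, 1, ..., 14.
grid = {(x, t): psi(x, t) for x in range(15) for t in range(15)}

# Largest deviation of |psi|^2 from 1 over the whole grid.
max_dev = max(abs(abs(value) ** 2 - 1.0) for value in grid.values())

print(grid[(1, 0)].real, grid[(1, 0)].imag)  # cos(1/2) and sin(1/2)
print(max_dev)
```

The deviation is zero up to rounding error, so the absolute square is 1 everywhere, which is why we would still need to normalize and to add several such waves to localize anything.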
Let’s see. What are the implications of our model? The question here is: if ħ/2 is the quantum of energy, and the quantum of momentum, what’s the quantum of force, and the quantum of time and/or distance?
Huh? Yep. We treated distance and time as countable variables above, but now we’d like to express the difference between x = 0 and x = 1 and between t = 0 and t = 1 in the units we know, this is in meter and in seconds. So how do we go about that? Do we have enough equations here? Not sure. Let’s see…
We obviously need to keep track of the various dimensions here, so let’s refer to that discrete distance and time unit as lP and tP respectively. The subscript (P) refers to Planck, and the l refers to a length, but we’re likely to find something else than Planck units. I just need placeholder symbols here. To be clear: lP and tP are expressed in meter and seconds respectively, just like the actual Planck length and time, which are equal to 1.6162×10⁻³⁵ m (more or less) and 5.391×10⁻⁴⁴ s (more or less) respectively. As I mentioned above, we get these Planck units by equating fundamental physical constants to 1. Just check it: (1.6162×10⁻³⁵ m)/(5.391×10⁻⁴⁴ s) = c ≈ 3×10⁸ m/s. So the following relation must be true: lP = c·tP, or lP/tP = c.
Now, as mentioned above, there must be some quantum of force as well, which we’ll write as FP, and which is – obviously – expressed in newton (N). So we have:
- E = ħ/2 ⇒ 0.527286×10−34 N·m = FP·lP N·m
- p = ħ/2 ⇒ 0.527286×10−34 N·s = FP·tP N·s
Let’s try to divide both formulas: E/p = (FP·lP N·m)/(FP·tP N·s) = lP/tP m/s = c m/s. That’s consistent with the E/p = c equation. Hmm… We found what we knew already. My model is not fully determined, it seems. 😦
What about the following simplistic approach? E is numerically equal to 0.527286×10−34, and its dimension is [E] = [F]·[x], so we write: E = 0.527286×10−34·[E] = 0.527286×10−34·[F]·[x]. Hence, [x] = [E]/[F] = (N·m)/N = m. That just confirms what we already know: the quantum of distance (i.e. our fundamental unit of distance) can be expressed in meter. But our model does not give that fundamental unit. It only gives us its dimension (meter), which is stuff we knew from the start. 😦
Let’s try something else. Let’s just accept the Planck length and time, so we write:
- lP = 1.6162×10−35 m
- tP = 5.391×10−44 s
Now, if the quantum of action is equal to ħ N·m·s = FP·lP·tP N·m·s = 1.0545718×10−34 N·m·s, and if the two definitions of lP and tP above hold, then 1.0545718×10−34 N·m·s = (FP N)×(1.6162×10−35 m)×(5.391×10−44 s) ≈ FP 8.713×10−79 N·m·s ⇔ FP ≈ 1.21×1044 N.
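Here is a quick Python check of that arithmetic. It is only a sketch: the constant names are mine, and the numerical values are simply the rounded ones used in the text.

```python
# Rounded values as used in the text.
HBAR = 1.0545718e-34  # N·m·s
L_P = 1.6162e-35      # m, Planck length
T_P = 5.391e-44       # s, Planck time

# hbar = F_P · l_P · t_P  =>  F_P = hbar / (l_P · t_P)
area_time = L_P * T_P
F_P = HBAR / area_time

print(f"l_P·t_P = {area_time:.4e} m·s")
print(f"F_P     = {F_P:.3e} N")
```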
Does that make sense? It does according to Wikipedia, but how do we relate this to our E = p = m = ħ/2 equations? Let’s try this:
- EP = (1.0545718×10−34 N·m·s)/(5.391×10−44 s) = 1.956×109 J. That corresponds to the regular Planck energy.
- pP = (1.0545718×10⁻³⁴ N·m·s)/(1.6162×10⁻³⁵ m) = 6.525 N·s. That corresponds to the regular Planck momentum.
Is EP = pP? Let’s substitute: 1.956×10⁹ N·m = 1.956×10⁹ N·(s/c) = (1.956×10⁹/2.998×10⁸) N·s = 6.525 N·s. So, yes, it comes out alright. In fact, I omitted the 1/2 factor in the calculations, but it doesn’t matter: it does come out alright. So I did not prove that the difference between my x = 0 and x = 1 points (or my t = 0 and t = 1 points) is equal to the Planck length (or the Planck time unit), but I did show my theory is, at the very least, compatible with those units. That’s more than enough for now. And I’ll surely come back to it in my next post. 🙂
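Powers of ten are easy to get wrong in a calculation like this, so it does not hurt to recompute the two quotients and their ratio. The Python sketch below does just that, using the rounded values from the text; the variable names are my own.

```python
# Rounded values as used in the text.
HBAR = 1.0545718e-34  # N·m·s
L_P = 1.6162e-35      # m, Planck length
T_P = 5.391e-44       # s, Planck time

E_P = HBAR / T_P      # energy scale, in N·m
p_P = HBAR / L_P      # momentum scale, in N·s
ratio = E_P / p_P     # should equal l_P / t_P, i.e. roughly c

print(f"E_P     = {E_P:.4e} N·m")
print(f"p_P     = {p_P:.4e} N·s")
print(f"E_P/p_P = {ratio:.4e} m/s")
```

The ratio is just lP/tP again, so this check cannot tell us anything new about the size of the units: it only confirms that the numbers are mutually consistent.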
Post Scriptum: One must solve the following equations to get the fundamental Planck units:
We have five fundamental equations for five fundamental quantities: tP, lP, FP, mP, and EP respectively, so that’s OK: it’s a fully determined system alright! But where do the expressions with G, kB (the Boltzmann constant) and ε0 come from? What does it mean to equate those constants to 1? Well… I need to think about that, and I’ll get back to you on it. 🙂
You know the two de Broglie relations, also known as matter-wave equations:
f = E/h and λ = h/p
You’ll find them in almost any popular account of quantum mechanics, and the writers of those popular books will tell you that f is the frequency of the ‘matter-wave’, and λ is its wavelength. In fact, to add some more weight to their narrative, they’ll usually write them in a somewhat more sophisticated form: they’ll write them using ω and k. The omega symbol (using a Greek letter always makes a big impression, doesn’t it?) denotes the angular frequency, while k is the so-called wavenumber. Now, k = 2π/λ and ω = 2π·f and, therefore, using the definition of the reduced Planck constant, i.e. ħ = h/2π, they’ll write the same relations as:
- λ = h/p = 2π/k ⇔ k = 2π·p/h
- f = E/h = (ω/2π)
⇒ k = p/ħ and ω = E/ħ
They’re the same thing: it’s just that working with angular frequencies and wavenumbers is more convenient, from a mathematical point of view that is: it’s why we prefer expressing angles in radians rather than in degrees (k is expressed in radians per meter, while ω is expressed in radians per second). In any case, the ‘matter wave’ – even Wikipedia uses that term now – is, of course, the amplitude, i.e. the wave-function ψ(x, t), which has a frequency and a wavelength, indeed. In fact, as I’ll show in a moment, it’s got two frequencies: one temporal, and one spatial. I am modest and, hence, I’ll admit it took me quite a while to fully distinguish the two frequencies, and so that’s why I always had trouble connecting these two ‘matter wave’ equations.
Indeed, if they represent the same thing, they must be related, right? But how exactly? It should be easy enough. The wavelength and the frequency must be related through the wave velocity, so we can write: f·λ = v, with v the velocity of the wave, which must be equal to the classical particle velocity, right? And then momentum and energy are also related. To be precise, we have the relativistic energy-momentum relationship: p·c = mv·v·c = mv·c2·v/c = E·v/c. So it’s just a matter of substitution. We should be able to go from one equation to the other, and vice versa. Right?
Well… No. It’s not that simple. We can start with either of the two equations but it doesn’t work. Try it. Whatever substitution you try, there’s no way you can derive one of the two equations above from the other. The fact that it’s impossible is evidenced by what we get when we’d multiply both equations. We get:
- f·λ = (E/h)·(h/p) = E/p
- v = f·λ ⇒ f·λ = v = E/p ⇔ E = v·p = v·(m·v)
⇒ E = m·v2
Huh? What kind of formula is that? E = m·v²? That’s a formula you’ve never ever seen, have you? It reminds you of the kinetic energy formula of course – K.E. = m·v²/2 – but… That factor 1/2 should not be there. Let’s think about it for a while. First note that this E = m·v² relation makes perfect sense if v = c. In that case, we get Einstein’s mass-energy equivalence (E = m·c²), but that’s beside the point here. The point is: if v = c, then our ‘particle’ is a photon, really, and then the E = h·f is referred to as the Planck-Einstein relation. The wave velocity is then equal to c and, therefore, f·λ = c, and so we can effectively substitute to find what we’re looking for:
E/p = (h·f)/(h/λ) = f·λ = c ⇒ E = p·c
So that’s fine: we just showed that the de Broglie relations are correct for photons. [You remember that E = p·c relation, no? If not, check out my post on it.] However, while that’s all nice, it is not what the de Broglie equations are about: we’re talking the matter-wave here, and so we want to do something more than just re-confirm that Planck-Einstein relation, which you can interpret as the limit of the de Broglie relations for v = c. In short, we’re doing something wrong here! Of course, we are. I’ll tell you what exactly in a moment: it’s got to do with the fact we’ve got two frequencies really.
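It may help to put numbers on the problem. The Python sketch below takes the two de Broglie relations at face value, with the relativistic energy E = mv·c² and momentum p = mv·v used above, and computes the product f·λ for a massive particle. The choice of an electron at half the speed of light is purely an assumption for the illustration, and the function name is mine.

```python
import math

H = 6.62607015e-34   # Planck constant, J·s
C = 2.99792458e8     # speed of light, m/s
M_E = 9.1093837e-31  # electron rest mass, kg (illustrative choice)

def de_broglie(m0, v):
    """Return (f, lambda) from f = E/h and lambda = h/p, with relativistic E and p."""
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    energy = gamma * m0 * C ** 2
    momentum = gamma * m0 * v
    return energy / H, H / momentum

v = 0.5 * C
f, lam = de_broglie(M_E, v)

print(f * lam / v)           # f·λ in units of the particle velocity
print(f * lam * v / C ** 2)  # f·λ in units of c²/v
```

So f·λ = E/p comes out as c²/v, which follows directly from the p·c = E·v/c relation, and which only coincides with the particle velocity v when v = c. That is one way of seeing that setting f·λ equal to the classical velocity is where the substitution goes wrong.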
Let’s first try something else. We’ve been using the relativistic E = mv·c2 equation above. Let’s try some other energy concept: let’s substitute the E in the f = E/h by the kinetic energy and then see where we get—if anywhere at all. So we’ll use the Ekinetic = m∙v2/2 equation. We can then use the definition of momentum (p = m∙v) to write E = p2/(2m), and then we can relate the frequency f to the wavelength λ using the v = λ∙f formula once again. That should work, no? Let’s do it. We write:
- E = p2/(2m)
- E = h∙f = h·v/λ
⇒ λ = h·v/E = h·v/(p2/(2m)) = h·v/[m2·v2/(2m)] = h/[m·v/2] = 2∙h/p
So we find λ = 2∙h/p. That is almost right, but not quite: that factor 2 should not be there. Well… Of course you’re smart enough to see it’s just that factor 1/2 popping up once more—but as a reciprocal, this time around. 🙂 So what’s going on? The honest answer is: you can try anything but it will never work, because the f = E/h and λ = h/p equations cannot be related—or at least not so easily. The substitutions above only work if we use that E = m·v2 energy concept which, you’ll agree, doesn’t make much sense—at first, at least. Again: what’s going on? Well… Same honest answer: the f = E/h and λ = h/p equations cannot be related—or at least not so easily—because the wave equation itself is not so easy.
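The factor 2 is easy to reproduce numerically. The following Python sketch walks through the same substitution (kinetic energy for E, then v = λ∙f) and compares the result with h/p. The mass and velocity are arbitrary illustrative values, and the function name is an assumption of mine.

```python
H = 6.62607015e-34  # Planck constant, J·s

def wavelength_from_kinetic(m, v):
    """Follow the substitution in the text: E = p²/(2m), f = E/h, λ = v/f."""
    p = m * v
    energy = p ** 2 / (2 * m)
    f = energy / H
    return v / f

m = 9.1093837e-31  # kg, illustrative (electron mass)
v = 1.0e6          # m/s, illustrative non-relativistic speed

lam = wavelength_from_kinetic(m, v)
ratio = lam / (H / (m * v))  # compare with the de Broglie wavelength h/p

print(ratio)
```

Whatever values you plug in for m and v, the ratio stays at 2: the discrepancy is structural, not numerical.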
Let’s review the basics once again.
The amplitude of a particle is represented by a wavefunction. If we have no information whatsoever on its position, then we usually write that wavefunction as the following complex-valued exponential:
ψ(x, t) = a·e^(−i·[(E/ħ)·t − (p/ħ)∙x]) = a·e^(−i·(ω·t − k∙x)) = a·e^(i(k∙x − ω·t)) = a·e^(iθ) = a·(cosθ + i·sinθ)
θ is the so-called phase of our wavefunction and, as you can see, it’s the argument of a wavefunction indeed, with temporal frequency ω and spatial frequency k (if we choose our x-axis so its direction is the same as the direction of k, then we can substitute the k and x vectors for the k and x scalars, so that’s what we’re doing here). Now, we know we shouldn’t worry too much about a, because that’s just some normalization constant (remember: all probabilities have to add up to one). However, let’s quickly develop some logic here. Taking the absolute square of this wavefunction gives us the probability of our particle being somewhere in space at some point in time. So we get the probability as a function of x and t. We write:
P(x, t) = |a·e^(−i·[(E/ħ)·t − (p/ħ)∙x])|² = a²
As all probabilities have to add up to one, we must assume we’re looking at some box in spacetime here. So, if the length of our box is Δx = x2 − x1, then (Δx)·a2 = (x2−x1)·a2 = 1 ⇔ Δx = 1/a2. [We obviously simplify the analysis by assuming a one-dimensional space only here, but the gist of the argument is essentially correct.] So, freezing time (i.e. equating t to some point t = t0), we get the following probability density function:
That’s simple enough. The point is: the two de Broglie equations f = E/h and λ = h/p give us the temporal and spatial frequencies in that ψ(x, t) = a·e−i·[(E/ħ)·t − (p/ħ)∙x] relation. As you can see, that’s an equation that implies a much more complicated relationship between E/ħ = ω and p/ħ = k. Or… Well… Much more complicated than what one would think of at first.
To appreciate what’s being represented here, it’s good to play a bit. We’ll continue with our simple exponential above, which also illustrates how we usually analyze those wavefunctions: we either assume we’re looking at the wavefunction in space at some fixed point in time (t = t0) or, else, at how the wavefunction changes in time at some fixed point in space (x = x0). Of course, we know that Einstein told us we shouldn’t do that: space and time are related and, hence, we should try to think of spacetime, i.e. some ‘kind of union’ of space and time, as Minkowski famously put it. However, when everything is said and done, mere mortals like us are not so good at that, and so we’re sort of condemned to try to imagine things using the classical cut-up of things. 🙂 So we’ll just use an online graphing tool to play with that a·e^(i(k∙x−ω·t)) = a·e^(iθ) = a·(cosθ + i·sinθ) formula.
Compare the following two graphs, for example. Just imagine we either look at how the wavefunction behaves in space, with the time fixed at some point t = t0, or, alternatively, that we look at how the wavefunction behaves in time at some point in space x = x0. As you can see, increasing k = p/ħ or increasing ω = E/ħ gives the wavefunction a higher ‘density’ in space or, alternatively, in time.
That makes sense, intuitively. In fact, when thinking about how the energy, or the momentum, affects the shape of the wavefunction, I am reminded of an airplane propeller: as it spins, faster and faster, it gives the propeller some ‘density’, in space as well as in time, as its blades cover more space in less time. It’s an interesting analogy: it helps—me, at least—to think through what that wavefunction might actually represent.
So as to stimulate your imagination even more, you should also think of representing the real and imaginary part of that ψ = a·e^(i(k∙x−ω·t)) = a·e^(iθ) = a·(cosθ + i·sinθ) formula in a different way. In the graphs above, we just showed the sine and cosine in the same plane but, as you know, the real and the imaginary axis are orthogonal, so Euler’s formula a·e^(iθ) = a·(cosθ + i·sinθ) = a·cosθ + i·a·sinθ = Re(ψ) + i·Im(ψ) may also be graphed as follows:
The illustration above should make you think of yet another illustration you’ve probably seen like a hundred times before: the electromagnetic wave, propagating through space as the magnetic and electric field induce each other, as illustrated below. However, there’s a big difference: Euler’s formula incorporates a phase shift—remember: sinθ = cos(θ − π/2)—and you don’t have that in the graph below. The difference is much more fundamental, however: it’s really hard to see how one could possibly relate the magnetic and electric field to the real and imaginary part of the wavefunction respectively. Having said that, the mathematical similarity makes one think!
Of course, you should remind yourself of what E and B stand for: they represent the strength of the electric (E) and magnetic (B) field at some point x at some time t. So you shouldn’t think of those wavefunctions above as occupying some three-dimensional space. They don’t. Likewise, our wavefunction ψ(x, t) does not occupy some physical space: it’s some complex number—an amplitude that’s associated with each and every point in spacetime. Nevertheless, as mentioned above, the visuals make one think and, as such, do help us as we try to understand all of this in a more intuitive way.
Let’s now look at that energy-momentum relationship once again, but using the wavefunction, rather than those two de Broglie relations.
Energy and momentum in the wavefunction
I am not going to talk about uncertainty here. You know that Spiel. If there’s uncertainty, it’s in the energy or the momentum, or in both. The uncertainty determines the size of that ‘box’ (in spacetime) in which we hope to find our particle, and it’s modeled by a splitting of the energy levels. We’ll say the energy of the particle may be E0, but it might also be some other value, which we’ll write as En = E0 ± n·ħ. The thing to note is that energy levels will always be separated by some integer multiple of ħ, so ħ is, effectively, the quantum of energy for all practical (and theoretical) purposes. We then super-impose the various wavefunctions to get a wave function that might, or might not, resemble something like this:
Who knows? 🙂 In any case, that’s not what I want to talk about here. Let’s repeat the basics once more: if we write our wavefunction a·e^(−i·[(E/ħ)·t − (p/ħ)∙x]) as a·e^(−i·[ω·t − k∙x]), we refer to ω = E/ħ as the temporal frequency, i.e. the frequency of our wavefunction in time (i.e. the frequency it has if we keep the position fixed), and to k = p/ħ as the spatial frequency, i.e. the frequency of our wavefunction in space (so now we stop the clock and just look at the wave in space). Now, let’s think about the energy concept first. The energy of a particle is generally thought of as consisting of three parts:
- The particle’s rest energy m0c2, which de Broglie referred to as internal energy (Eint): it includes the rest mass of the ‘internal pieces’, as Feynman puts it (now we call those ‘internal pieces’ quarks), as well as their binding energy (i.e. the quarks’ interaction energy);
- Any potential energy it may have because of some field (so de Broglie was not assuming the particle was traveling in free space), which we’ll denote by U, and note that the field can be anything—gravitational, electromagnetic: it’s whatever changes the energy because of the position of the particle;
- The particle’s kinetic energy, which we write in terms of its momentum p: m·v2/2 = m2·v2/(2m) = (m·v)2/(2m) = p2/(2m).
So we have one energy concept here (the rest energy) that does not depend on the particle’s position in spacetime, and two energy concepts that do depend on position (potential energy) and/or how that position changes because of its velocity and/or momentum (kinetic energy). The two last bits are related through the energy conservation principle. The total energy is E = mvc2, of course—with the little subscript (v) ensuring the mass incorporates the equivalent mass of the particle’s kinetic energy.
So what? Well… In my post on quantum tunneling, I drew attention to the fact that different potentials, so different potential energies (indeed, as our particle travels from one region to another, the field is likely to vary), have no impact on the temporal frequency. Let me re-visit the argument, because it’s an important one. Imagine two different regions in space that differ in potential, because the field has a larger or smaller magnitude there, or points in a different direction, or whatever: just different fields, which corresponds to different values for U1 and U2, i.e. the potential in region 1 versus region 2. Now, the different potential will change the momentum: the particle will accelerate or decelerate as it moves from one region to the other, so we also have a different p1 and p2. Having said that, the internal energy doesn’t change, so we can write the corresponding amplitudes, or wavefunctions, as:
- ψ₁(θ₁) = Ψ₁(x, t) = a·e^(−iθ₁) = a·e^(−i[(Eint + p₁²/(2m) + U₁)·t − p₁∙x]/ħ)
- ψ₂(θ₂) = Ψ₂(x, t) = a·e^(−iθ₂) = a·e^(−i[(Eint + p₂²/(2m) + U₂)·t − p₂∙x]/ħ)
Now how should we think about these two equations? We are definitely talking different wavefunctions. However, their temporal frequencies ω₁ = (Eint + p₁²/(2m) + U₁)/ħ and ω₂ = (Eint + p₂²/(2m) + U₂)/ħ must be the same. Why? Because of the energy conservation principle, or its equivalent in quantum mechanics, I should say: the temporal frequency f or ω, i.e. the time-rate of change of the phase of the wavefunction, does not change: all of the change in potential, and the corresponding change in kinetic energy, goes into changing the spatial frequency, i.e. the wave number k or the wavelength λ, as potential energy becomes kinetic or vice versa. The sum of the potential and kinetic energy doesn’t change, indeed. So the energy remains the same and, therefore, the temporal frequency does not change. In fact, we need this quantum-mechanical equivalent of the energy conservation principle to calculate how the momentum and, hence, the spatial frequency of our wavefunction, changes. We do so by boldly equating ω₁ and ω₂ (the common 1/ħ factor drops out), and so we write:
ω₁ = ω₂ ⇔ Eint + p₁²/(2m) + U₁ = Eint + p₂²/(2m) + U₂
⇔ p₁²/(2m) − p₂²/(2m) = U₂ − U₁ ⇔ p₂² = (2m)·[p₁²/(2m) − (U₂ − U₁)]
⇔ p₂ = (p₁² − 2m·ΔU)^(1/2)
We played with this in a previous post, assuming that p12 is larger than 2m·ΔU, so as to get a positive number on the right-hand side of the equation for p22, so then we can confidently take the positive square root of that (p12 – 2m·ΔU ) expression to calculate p2. For example, when the potential difference ΔU = U2 – U1 was negative, so ΔU < 0, then we’re safe and sure to get some real positive value for p2.
Having said that, we also contemplated the possibility that p22 = p12 – 2m·ΔU was negative, in which case p2 has to be some pure imaginary number, which we wrote as p2 = i·p’ (so p’ (read: p prime) is a real positive number here). We could work with that: it resulted in an exponentially decreasing factor e−p’·x/ħ that ended up ‘killing’ the wavefunction in space. However, its limited existence still allowed particles to ‘tunnel’ through potential energy barriers, thereby explaining the quantum-mechanical tunneling phenomenon.
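Both cases (a real p2 and a purely imaginary p2) can be handled with one formula if we let the square root return a complex number. The Python sketch below does that. It is only an illustration: I work in units where ħ = 1 and m = 1, the numerical inputs are made up, and the function name is my own.

```python
import cmath

HBAR = 1.0  # illustrative units with ħ = 1

def momentum_after_step(p1, m, delta_u):
    """p2 = (p1² − 2m·ΔU)^(1/2), allowing for a negative number under the root."""
    return cmath.sqrt(p1 ** 2 - 2 * m * delta_u)

# Case 1: potential drops (ΔU < 0), so p2 is real and larger than p1.
p2_real = momentum_after_step(1.0, 1.0, -0.5)

# Case 2: barrier higher than the kinetic energy, so p2 = i·p' is purely imaginary.
p2_imag = momentum_after_step(1.0, 1.0, 1.0)
p_prime = p2_imag.imag

# Spatial factor e^(i·p2·x/ħ) then becomes the damping factor e^(−p'·x/ħ).
x = 2.0
spatial_factor = cmath.exp(1j * p2_imag * x / HBAR)

print(p2_real, p2_imag)
print(abs(spatial_factor))
```

The temporal factor e^(−i·E·t/ħ) is left out on purpose: it is the same in both regions, which is exactly the point of the energy conservation argument.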
This is rather weird—at first, at least. Indeed, one would think that, because of the E/ħ = ω equation, any change in energy would lead to some change in ω. But no! The total energy doesn’t change, and the potential and kinetic energy are like communicating vessels: any change in potential energy is associated with a change in p, and vice versa. It’s a really funny thing. It helps to think it’s because the potential depends on position only, and so it should not have an impact on the temporal frequency of our wavefunction. Of course, it’s equally obvious that the story would change drastically if the potential would change with time, but… Well… We’re not looking at that right now. In short, we’re assuming energy is being conserved in our quantum-mechanical system too, and so that implies what’s described above: no change in ω, but we obviously do have changes in p whenever our particle goes from one region in space to another, and the potentials differ. So… Well… Just remember: the energy conservation principle implies that the temporal frequency of our wave function doesn’t change. Any change in potential, as our particle travels from one place to another, plays out through the momentum.
Now that we know that, let’s look at those de Broglie relations once again.
Re-visiting the de Broglie relations
As mentioned above, we usually think in one dimension only: we either freeze time or, else, we freeze space. If we do that, we can derive some funny new relationships. Let’s first simplify the analysis by re-writing the argument of the wavefunction as:
θ = E·t − p·x
Of course, you’ll say: the argument of the wavefunction is not equal to E·t − p·x: it’s (E/ħ)·t − (p/ħ)∙x. Moreover, θ should have a minus sign in front. Well… Yes, you’re right. We should put that 1/ħ factor in front, but we can change units, and so let’s just measure both E as well as p in units of ħ here. We can do that. No worries. And, yes, the minus sign should be there (Nature chose a clockwise direction for θ) but that doesn’t matter for the analysis hereunder.
The E·t − p·x expression reminds one of those invariant quantities in relativity theory. But let’s be precise here. We’re thinking about those so-called four-vectors here, which we wrote as pμ = (E, px, py, pz) = (E, p) and xμ = (t, x, y, z) = (t, x) respectively. [Well… OK… You’re right. We wrote those four-vectors as pμ = (E, px·c, py·c, pz·c) = (E, p·c) and xμ = (c·t, x, y, z) = (c·t, x). So what we write is true only if we measure time and distance in equivalent units so we have c = 1. So… Well… Let’s do that and move on.] In any case, the invariants we looked at back then were not E·t − p·x·c or c·t − x (the latter is a nonsensical expression anyway: you cannot subtract a vector from a scalar), but pμ² = pμpμ = E² − (p·c)² = E² − p²·c² = E² − (px² + py² + pz²)·c² and xμ² = xμxμ = (c·t)² − x² = c²·t² − (x² + y² + z²) respectively. [Remember pμpμ and xμxμ are four-vector dot products, so they have that + − − − signature, unlike the p² and x² or a·b dot products, which are just a simple sum of the squared components.] So… Well… To be fair, the mixed product pμxμ = E·t − p·x is a four-vector dot product too, and so it is invariant as well, but that fact alone doesn’t get us much further. Let’s try something else.
Let’s re-simplify by equating ħ as well as c to one again, so we write: ħ = c = 1. [You may wonder if it is possible to ‘normalize’ both physical constants simultaneously, but the answer is yes. The Planck unit system is an example.] Then our relativistic energy-momentum relationship can be re-written as E/p = 1/v. [If c were not one, we’d write: E·β = p·c, with β = v/c. So we get E/p = c/β. We referred to β as the relative velocity of our particle: it was the velocity, but measured as a ratio of the speed of light. So here it’s the same, except that we use the velocity symbol v now for that ratio.]
Now think of a particle moving in free space, i.e. without any fields acting on it, so we don’t have any potential changing the spatial frequency of the wavefunction of our particle, and let’s also assume we choose our x-axis such that it’s the direction of travel, so the position vector (x) can be replaced by a simple scalar (x). Finally, we will also choose the origin of our x-axis such that x = 0 when t = 0, so we write: x(t = 0) = 0. It’s obvious then that, if our particle is traveling in spacetime with some velocity v, then the ratio of its position x and the time t that it’s been traveling will always be equal to v = x/t. Hence, for that very special position in spacetime (t, x = v·t) – so we’re talking the actual position of the particle in spacetime here – we get: θ = E·t − p·x = E·t − p·v·t = E·t − m·v·v·t = (E − m∙v²)·t. So… Well… There we have the m∙v² factor.
The question is: what does it mean? How do we interpret this? I am not sure. When I first jotted this thing down, I thought of choosing a different reference potential: some negative value such that it ensures that the sum of kinetic, rest and potential energy is zero, so I could write E = 0 and then the wavefunction would reduce to ψ(t) = e^(−i·m∙v²·t). Feynman refers to that as ‘choosing the zero of our energy scale such that E = 0’, and you’ll find this in many other works too. However, it’s not that simple. Free space is free space: if there’s no change in potential from one region to another, then the concept of some reference point for the potential becomes meaningless. There is only rest energy and kinetic energy, then. The total energy reduces to E = m (because we chose our units such that c = 1 and, therefore, E = mc² = m·1² = m) and so our wavefunction reduces to:
ψ(t) = a·e^(−i·m·(1 − v²)·t)
We can’t reduce this any further. The mass is the mass: it’s a measure for inertia, as measured in our inertial frame of reference. And the velocity is the velocity, of course—also as measured in our frame of reference. We can re-write it, of course, by substituting t for t = x/v, so we get:
ψ(x) = a·e^(−i·m·(1/v − v)·x)
For both functions, we get constant probabilities, but a wavefunction that’s ‘denser’ for higher values of m. The (1 − v²) and (1/v − v) factors are different, however: these factors become smaller for higher v, so our wavefunction becomes less dense for higher v. In fact, for v = 1 (so for travel at the speed of light, i.e. for photons), we get that ψ(t) = ψ(x) = e⁰ = 1. [You should use the graphing tool once more, and you’ll see the imaginary part, i.e. the sine of the a·(cosθ + i·sinθ) expression, just vanishes, as sinθ = 0 for θ = 0.]
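If you don’t have the graphing tool at hand, the behaviour of the two factors can be tabulated with a few lines of Python. This is only a sketch in natural units (ħ = c = 1), and the function names are mine, not something from the original derivation.

```python
# Sketch in natural units (ħ = c = 1): v is the velocity as a fraction of c.
# The function names are illustrative only.

def temporal_factor(v: float) -> float:
    """Factor multiplying m·t in the exponent of ψ(t): (1 − v²)."""
    return 1.0 - v * v


def spatial_factor(v: float) -> float:
    """Factor multiplying m·x in the exponent of ψ(x): (1/v − v), for 0 < v <= 1."""
    return 1.0 / v - v


if __name__ == "__main__":
    # Both factors shrink as v grows and vanish at v = 1.
    for v in (0.1, 0.5, 0.9, 0.99, 1.0):
        print(f"v = {v:4.2f}   1 - v^2 = {temporal_factor(v):7.4f}   "
              f"1/v - v = {spatial_factor(v):7.4f}")
```

Both columns go to zero as v approaches 1, which is the ‘less dense’ wavefunction referred to above.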
The wavefunction and relativistic length contraction
Are exercises like this useful? As mentioned above, these constant probability wavefunctions are a bit nonsensical, so you may wonder why I wrote what I wrote. There may be no real conclusion, indeed: I was just fiddling around a bit, and playing with equations and functions. I feel stuff like this helps me to understand what that wavefunction actually is somewhat better. If anything, it does illustrate that idea of the ‘density’ of a wavefunction, in space or in time. What we’ve been doing by substituting x for x = v·t or t for t = x/v is showing how, when everything is said and done, the mass and the velocity of a particle are the actual variables determining that ‘density’ and, frankly, I really like that ‘airplane propeller’ idea as a pedagogic device. In fact, I feel it may be more than just a pedagogic device, and so I’ll surely re-visit it—once I’ve gone through the rest of Feynman’s Lectures, that is. 🙂
That brings me to what I added in the title of this post: relativistic length contraction. You’ll wonder why I am bringing that into a discussion like this. Well… Just play a bit with those (1 − v²) and (1/v − v) factors. As mentioned above, they decrease the density of the wavefunction. In other words, it’s like space is being ‘stretched out’. Also, it can’t be a coincidence we find the same (1 − v²) factor in the relativistic length contraction formula: L = L0·√(1 − v²), in which L0 is the so-called proper length (i.e. the length in the stationary frame of reference) and v is the (relative) velocity of the moving frame of reference. Of course, we also find it in the relativistic mass formula: m = mv = m0/√(1 − v²). In fact, things become much more obvious when substituting m for m0/√(1 − v²) in that ψ(t) = a·e^(−i·m·(1 − v²)·t) function. We get:
ψ(t) = a·e^(−i·m·(1 − v²)·t) = a·e^(−i·m0·√(1 − v²)·t)
Well… We’re surely getting somewhere here. What if we go back to our original ψ(x, t) = a·e^(−i·[(E/ħ)·t − (p/ħ)∙x]) function? Using natural units once again, that’s equivalent to:
ψ(x, t) = a·e^(−i·(m·t − p∙x)) = a·e^(−i·[(m0/√(1 − v²))·t − (m0·v/√(1 − v²))∙x])
= a·e^(−i·[m0/√(1 − v²)]·(t − v∙x))
Interesting! We’ve got a wavefunction that’s a function of x and t, but with the rest mass (or rest energy) and velocity as parameters! Now that really starts to make sense. Look at the (blue) graph for that 1/√(1 − v²) factor: it goes from one (1) to infinity (∞) as v goes from 0 to 1 (remember we ‘normalized’ v: it’s a ratio between 0 and 1 now). So that’s the factor that comes into play for t. For x, it’s the red graph, for the v/√(1 − v²) factor, which has the same shape but goes from zero (0) to infinity (∞) as v goes from 0 to 1.
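The factoring of the exponent into [m0/√(1 − v²)]·(t − v∙x) is easy to check numerically, and the same few lines also reproduce the two curves: 1/√(1 − v²) for t and v/√(1 − v²) for x. Again, this is just a sketch in natural units, with function names I made up for the occasion.

```python
import math

# Sketch in natural units (ħ = c = 1). Names are illustrative assumptions.


def lorentz_factor(v: float) -> float:
    """1/√(1 − v²), for 0 <= v < 1."""
    return 1.0 / math.sqrt(1.0 - v * v)


def phase_from_energy_momentum(m0: float, v: float, t: float, x: float) -> float:
    """θ = m·t − p·x with m = m0/√(1 − v²) and p = m·v."""
    m = m0 * lorentz_factor(v)
    p = m * v
    return m * t - p * x


def phase_factored(m0: float, v: float, t: float, x: float) -> float:
    """The same argument, written as [m0/√(1 − v²)]·(t − v·x)."""
    return m0 * lorentz_factor(v) * (t - v * x)


if __name__ == "__main__":
    for v in (0.0, 0.5, 0.9, 0.99):
        g = lorentz_factor(v)
        print(f"v = {v:4.2f}   factor for t = {g:8.4f}   factor for x = {v * g:8.4f}")
```

For a point on the particle’s own trajectory (x = v·t), both functions give m0·√(1 − v²)·t, which is the exponent of the ψ(t) function above.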
Now that makes sense: the ‘density’ of the wavefunction, in time and in space, increases as the velocity v increases. In space, that should correspond to the relativistic length contraction effect: it’s like space is contracting, as the velocity increases and, therefore, the length of the object we’re watching contracts too. For time, the reasoning is a bit more complicated: it’s our time that becomes more dense and, therefore, our clock that seems to tick faster.
I know I need to explore this further—if only so as to assure you I have not gone crazy. Unfortunately, I have no time to do that right now. Indeed, from time to time, I need to work on other stuff besides this physics ‘hobby’ of mine.
Post scriptum 1: As for the E = m·v² formula, I also have a funny feeling that it might be related to the fact that, in quantum mechanics, both the real and imaginary part of the oscillation actually matter. You’ll remember that we’d represent any oscillator in physics by a complex exponential, because it eased our calculations. So instead of writing A = A0·cos(ωt + Δ), we’d write: A = A0·e^(i(ωt + Δ)) = A0·cos(ωt + Δ) + i·A0·sin(ωt + Δ). When calculating the energy or intensity of a wave, however, we couldn’t just take the square of the complex amplitude of the wave – remembering that E ∼ A². No! We had to get back to the real part only, i.e. the cosine or the sine only. Now the mean (or average) value of the squared cosine function (or a squared sine function), over one or more cycles, is 1/2, so the mean of A² = A0²·cos²(ωt + Δ) is equal to A0²/2. I am not sure, and it’s probably a long shot, but one must be able to show that, if the imaginary part of the oscillation would actually matter – which is obviously the case for our matter-wave – then 1/2 + 1/2 is obviously equal to 1. I mean: try to think of an image with a mass attached to two springs, rather than one only. Does that make sense? 🙂 […] I know: I am just freewheeling here. 🙂
Post scriptum 2: The other thing that this E = m·v² equation makes me think of is – curiously enough – an eternally expanding spring. Indeed, the kinetic energy of a mass on a spring and the potential energy that’s stored in the spring always add up to some constant, and the average potential and kinetic energy are equal to each other. To be precise: 〈K.E.〉 + 〈P.E.〉 = (1/4)·k·A² + (1/4)·k·A² = k·A²/2. It means that, on average, the total energy of the system is twice the average kinetic energy (or potential energy). You’ll say: so what? Well… I don’t know. Can we think of a spring that expands eternally, with the mass on its end not gaining or losing any speed? In that case, v is constant, and the total energy of the system would, effectively, be equal to Etotal = 2·〈K.E.〉 = 2·(1/2)·m·v² = m·v².
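The averages quoted for the ordinary (non-expanding) spring can be checked by brute force. The sketch below samples one full cycle of x(t) = A·cos(ω·t) and averages the kinetic and potential energy; the spring constant, mass and amplitude passed in are arbitrary test values, not numbers from the text.

```python
import math

# Sketch: cycle averages for a mass on an ideal spring, x(t) = A·cos(ω·t).
# The parameter values used in the demo are arbitrary assumptions.


def cycle_averages(k: float, m: float, amplitude: float, steps: int = 10000):
    """Return (<K.E.>, <P.E.>) averaged over one full period."""
    omega = math.sqrt(k / m)
    period = 2.0 * math.pi / omega
    kinetic = 0.0
    potential = 0.0
    for n in range(steps):
        t = period * n / steps
        x = amplitude * math.cos(omega * t)
        v = -amplitude * omega * math.sin(omega * t)
        kinetic += 0.5 * m * v * v
        potential += 0.5 * k * x * x
    return kinetic / steps, potential / steps


if __name__ == "__main__":
    ke, pe = cycle_averages(k=2.0, m=0.5, amplitude=3.0)
    print(f"<K.E.> = {ke:.6f}, <P.E.> = {pe:.6f}, sum = {ke + pe:.6f}")
```

Both averages come out at k·A²/4, so the total is k·A²/2, i.e. twice the average kinetic energy.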
Post scriptum 3: That substitution I made above – substituting x for x = v·t – is kinda weird. Indeed, if that E = m∙v² equation makes any sense, then E − m∙v² = 0, of course, and, therefore, θ = E·t − p·x = E·t − p·v·t = E·t − m·v·v·t = (E − m∙v²)·t = 0·t = 0. So the argument of our wavefunction is 0 and, therefore, we get a·e⁰ = a for our wavefunction. It basically means our particle is where it is. 🙂
Post scriptum 4: This post scriptum – no. 4 – was added later – much later. On 29 February 2016, to be precise. The solution to the ‘riddle’ above is actually quite simple. We just need to make a distinction between the group and the phase velocity of our complex-valued wave. The solution came to me when I was writing a little piece on Schrödinger’s equation. I noticed that we do not find that weird E = m∙v² formula when substituting ψ for ψ = e^(i(kx − ωt)) in Schrödinger’s equation, i.e. in:
∂ψ/∂t = i·(ħ/2m)·∇²ψ
Let me quickly go over the logic. To keep things simple, we’ll just assume one-dimensional space, so ∇²ψ = ∂²ψ/∂x². The time derivative on the left-hand side is ∂ψ/∂t = −iω·e^(i(kx − ωt)). The second-order derivative on the right-hand side is ∂²ψ/∂x² = (ik)·(ik)·e^(i(kx − ωt)) = −k²·e^(i(kx − ωt)). The e^(i(kx − ωt)) factor on both sides cancels out and, hence, equating both sides gives us the following condition:
−iω = −(iħ/2m)·k² ⇔ ω = (ħ/2m)·k²
Substituting ω = E/ħ and k = p/ħ yields:
E/ħ = (ħ/2m)·p²/ħ² = m²·v²/(2m·ħ) = m·v²/(2ħ) ⇔ E = m·v²/2
In short: the E = m·v²/2 is the correct formula. It must be, because… Well… Because Schrödinger’s equation is a formula we surely shouldn’t doubt, right? So the only logical conclusion is that we must be doing something wrong when multiplying the two de Broglie equations. To be precise: our v = f·λ equation must be wrong. Why? Well… It’s just something one shouldn’t apply to our complex-valued wavefunction. The ‘correct’ velocity formula for the complex-valued wavefunction should have that 1/2 factor, so we’d write 2·f·λ = v to make things come out alright. But where would this formula come from? The period of cosθ + i·sinθ is the period of the sine and cosine function: cos(θ + 2π) + i·sin(θ + 2π) = cosθ + i·sinθ, so T = 2π and f = 1/T = 1/2π do not change.
But so that’s a mathematical point of view. From a physical point of view, it’s clear we got two oscillations for the price of one: one ‘real’ and one ‘imaginary’—but both are equally essential and, hence, equally ‘real’. So the answer must lie in the distinction between the group and the phase velocity when we’re combining waves. Indeed, the group velocity of a sum of waves is equal to vg = dω/dk. In this case, we have:
vg = d[E/ħ]/d[p/ħ] = dE/dp
We can now use the kinetic energy formula to write E as E = m·v²/2 = p·v/2. Now, v and p are related through m (p = m·v, so v = p/m). So we should write this as E = m·v²/2 = p²/(2m). Substituting this expression for E in the equation above then gives us the following:
vg = dω/dk = dE/dp = d[p²/(2m)]/dp = 2p/(2m) = p/m = v
However, for the phase velocity, we can just use the vp = ω/k formula, which gives us that 1/2 factor:
vp = ω/k = (E/ħ)/(p/ħ) = E/p = (m·v²/2)/(m·v) = v/2
Bingo! Riddle solved! 🙂 Isn’t it nice that our formula for the group velocity also applies to our complex-valued wavefunction? I think that’s amazing, really! But I’ll let you think about it. 🙂
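If you want to see the two velocities come out of the ω = (ħ/2m)·k² relation without doing the calculus, here is a small numerical sketch. It sets ħ = 1, and the values for m and k in the demo are arbitrary; the function names are mine.

```python
# Sketch: phase and group velocity for the dispersion relation ω = (ħ/2m)·k².
# ħ is set to 1 and the demo values for m and k are arbitrary assumptions.

HBAR = 1.0


def omega(k: float, m: float) -> float:
    """Temporal frequency from the free-particle Schrödinger condition."""
    return HBAR * k * k / (2.0 * m)


def phase_velocity(k: float, m: float) -> float:
    """vp = ω/k."""
    return omega(k, m) / k


def group_velocity(k: float, m: float, dk: float = 1e-6) -> float:
    """vg = dω/dk, estimated with a central finite difference."""
    return (omega(k + dk, m) - omega(k - dk, m)) / (2.0 * dk)


if __name__ == "__main__":
    m, k = 2.0, 3.0
    v = HBAR * k / m  # classical velocity v = p/m with p = ħ·k
    print(f"v = {v}, vg = {group_velocity(k, m):.6f}, vp = {phase_velocity(k, m):.6f}")
```

The group velocity matches v = p/m, while the phase velocity is half of that.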
Ammonia, i.e. NH3, is a colorless gas with a strong smell. It serves as a precursor in the production of fertilizer, but we also know it as a cleaning product, ammonium hydroxide, which is NH3 dissolved in water. It has a lot of other uses too. For example, its use in this post is to illustrate a two-state system. 🙂 We’ll apply everything we learned in our previous posts and, as I mentioned when finishing the last of those rather mathematical pieces, I think the example really feels like a reward after all of the tough work on all of those abstract concepts – like that Hamiltonian matrix indeed – so I hope you enjoy it. So… Here we go!
The geometry of the NH3 molecule can be described by thinking of it as a trigonal pyramid, with the nitrogen atom (N) at its apex, and the three hydrogen atoms (H) at the base, as illustrated below. [Feynman’s illustration is slightly misleading, though, because it may give the impression that the hydrogen atoms are bonded together somehow. That’s not the case: the hydrogen atoms share their electron with the nitrogen, thereby completing the outer shell of both atoms. This is referred to as a covalent bond. You may want to look it up, but it is of no particular relevance to what follows here.]
Here, we will only worry about the spin of the molecule about its axis of symmetry, as shown above, which is either in one direction or in the other, obviously. So we’ll discuss the molecule as a two-state system. So we don’t care about its translational (i.e. linear) momentum, its internal vibrations, or whatever else that might be going on. It is one of those situations illustrating that the spin vector, i.e. the vector representing angular momentum, is an axial vector: the first state, which is denoted by | 1 〉 is not the mirror image of state | 2 〉. In fact, there is a more sophisticated version of the illustration above, which usefully reminds us of the physics involved.
It should be noted, however, that we don’t need to specify what the energy barrier really consists of: moving the center of mass obviously requires some energy, but it is likely that a ‘flip’ also involves overcoming some electrostatic forces, as shown by the reversal of the electric dipole moment in the illustration above. In fact, the illustration may confuse you, because we’re usually thinking about some net electric charge that’s spinning, and so the angular momentum results in a magnetic dipole moment, that’s either ‘up’ or ‘down’, and it’s usually also denoted by the very same μ symbol that’s used below. As I explained in my post on angular momentum and the magnetic moment, it’s related to the angular momentum J through the so-called g-number. In the illustration above, however, the μ symbol is used to denote an electric dipole moment, so that’s different. Don’t rack your brain over it: just accept there’s an energy barrier, and it requires energy to get through it. Don’t worry about its details!
Indeed, in quantum mechanics, we abstract away from such nitty-gritty, and so we just say that we have base states | i 〉 here, with i equal to 1 or 2. One or the other. Now, in our post on quantum math, we introduced what Feynman only half-jokingly refers to as the Great Law of Quantum Physics: | = ∑ | i 〉〈 i | over all base states i. It basically means that we should always describe our initial and end states in terms of base states. Applying that principle to the state of our ammonia molecule, which we’ll denote by | ψ 〉, we can write:
| ψ 〉 = | 1 〉〈 1 | ψ 〉 + | 2 〉〈 2 | ψ 〉 = | 1 〉 C1 + | 2 〉 C2
You may – in fact, you should – mechanically apply that | = ∑ | i 〉〈 i | substitution to | ψ 〉 to get what you get here, but you should also think about what you’re writing. It’s not an easy thing to interpret, but it may help you to think of the similarity of the formula above with the description of a vector in terms of its base vectors, which we write as A = Ax·e1 + Ay·e2 + Az·e3. Just substitute the Ai coefficients for Ci and the ei base vectors for the | i 〉 base states, and you may understand this formula somewhat better. It also explains why the | ψ 〉 state is often referred to as the | ψ 〉 state vector: unlike our A = ∑ Ai·ei sum of base vectors, our | 1 〉 C1 + | 2 〉 C2 sum does not have any geometrical interpretation but… Well… Not all ‘vectors’ in math have a geometric interpretation, and so this is a case in point.
It may also help you to think of the time-dependency. Indeed, this formula makes a lot more sense when realizing that the state of our ammonia molecule, and those coefficients Ci, depend on time, so we write: ψ = ψ(t) and Ci = Ci(t). Hence, if we would know, for sure, that our molecule is always in state | 1 〉, then C1 = 1 and C2 = 0, and we’d write: | ψ 〉 = | 1 〉 = | 1 〉 1 + | 2 〉 0. [I am always tempted to insert a little dot (·), and change the order of the factors, so as to show we’re talking some kind of product indeed – so I am tempted to write | ψ 〉 = C1·| 1 〉 + C2·| 2 〉, but I note that’s not done conventionally, so I won’t do it either.]
Why this time dependency? It’s because we’ll allow for the possibility of the nitrogen to push its way through the pyramid – through the three hydrogens, really – and flip to the other side. It’s unlikely, because it requires a lot of energy to get half-way through (we’ve got what we referred to as an energy barrier here), but it may happen and, as we’ll see shortly, it results in us having to think of the ammonia molecule as having two separate energy levels, rather than just one. We’ll denote those energy levels as E0 ± A. However, I am getting ahead of myself here, so let me get back to the main story.
To fully understand the story, you should really read my previous post on the Hamiltonian, which explains how those Ci coefficients, as a function of time, can be determined. They’re determined by a set of differential equations (i.e. equations involving a function and the derivative of that function) which we wrote as:
iħ·dCi(t)/dt = ∑j Hij(t)·Cj(t)
If we have two base states only – which is the case here – then this set of equations is:
iħ·dC1/dt = H11·C1 + H12·C2
iħ·dC2/dt = H21·C1 + H22·C2
Two equations and two functions – C1 = C1(t) and C2 = C2(t) – so we should be able to solve this thing, right? Well… No. We don’t know those Hij coefficients. As I explained in my previous post, they also evolve in time, so we should write them as Hij(t) instead of Hij tout court, and so it messes the whole thing up. We have two equations and six functions really. There is no way we can solve this! So how do we get out of this mess?
Well… By trial and error, I guess. 🙂 Let us just assume the molecule would behave nicely – which we know it doesn’t, but so let’s push the ‘classical’ analysis as far as we can, so we might get some clues as to how to solve this problem. In fact, our analysis isn’t ‘classical’ at all, because we’re still talking amplitudes here! However, you’ll agree the ‘simple’ solution would be that our ammonia molecule doesn’t ‘tunnel’. It just stays in the same spin direction forever. Then H12 and H21 must be zero (think of the U12(t + Δt, t) and U21(t + Δt, t) functions) and H11 and H22 are equal to… Well… I’d love to say they’re equal to 1 but… Well… You should go through my previous posts: these Hamiltonian coefficients are related to probabilities but… Well… Same-same but different, as they say in Asia. 🙂 They’re amplitudes, which are things you use to calculate probabilities. But calculating probabilities involves normalization and other stuff, like allowing for interference of amplitudes, and so… Well… To make a long story short, if our ammonia molecule would stay in the same spin direction forever, then H11 and H22 are not one but some constant. In any case, the point is that they would not change in time (so H11(t) = H11 and H22(t) = H22), and, therefore, our two equations would reduce to:
iħ·dC1/dt = H11·C1
iħ·dC2/dt = H22·C2
So the coefficients are now proper coefficients, in the sense that they’ve got some definite value, and so we have two equations and two functions only now, and so we can solve this. Indeed, remembering all of the stuff we wrote on the magic of exponential functions (more in particular, remembering that d[e^x]/dx = e^x), we can understand the proposed solution:
C1(t) = a·e^(−(i/ħ)·H11·t) and C2(t) = a·e^(−(i/ħ)·H22·t)
As Feynman notes: “These are just the amplitudes for stationary states with the energies E1 = H11 and E2 = H22.” Now let’s think about that. Indeed, I find the term ‘stationary’ state quite confusing, as it’s ill-defined. In this context, it basically means that we have a wavefunction that is determined by (i) a definite (i.e. unambiguous, or precise) energy level and (ii) that there is no spatial variation. Let me refer you to my post on the basics of quantum math here. We often use a sort of ‘Platonic’ example of the wavefunction indeed:
a·e^(−i·θ) = a·e^(−i·(ω·t − k∙x)) = a·e^(−(i/ħ)·(E·t − p∙x))
So that’s a wavefunction assuming the particle we’re looking at has some well-defined energy E and some equally well-defined momentum p. Now, that’s kind of ‘Platonic’ indeed, because it’s more like an idea, rather than something real. Indeed, a wavefunction like that means that the particle is everywhere and nowhere, really, because its wavefunction is spread out all over space. Of course, we may think of the ‘space’ as some kind of confined space, like a box, and then we can think of this particle as being ‘somewhere’ in that box, and then we look at the temporal variation of this function only – which is what we’re doing now: we don’t consider the space variable x at all. So then the equation reduces to a·e^(−(i/ħ)·E·t), and so… Well… Yes. We do find that our Hamiltonian coefficient Hii is like the energy of the | i 〉 state of our NH3 molecule, so we write: H11 = E1, and H22 = E2, and the ‘wavefunctions’ of our C1 and C2 coefficients can be written as:
- C1 = a·e^(−(i/ħ)·H11·t) = a·e^(−(i/ħ)·E1·t), with H11 = E1, and
- C2 = a·e^(−(i/ħ)·H22·t) = a·e^(−(i/ħ)·E2·t), with H22 = E2.
But can we interpret C1 and C2 as proper amplitudes? They are just coefficients in these equations, aren’t they? Well… Yes and no. From what we wrote in previous posts, you should remember that these Ci coefficients are equal to 〈 i | ψ 〉, so they are the amplitude to find our ammonia molecule in one state or the other.
Back to Feynman now. He adds, logically but brilliantly:
“We note, however, that for the ammonia molecule the two states |1〉 and |2〉 have a definite symmetry. If nature is at all reasonable, the matrix elements H11 and H22 must be equal. We’ll call them both E0, because they correspond to the energy the states would have if H12 and H21 were zero.”
So our C1 and C2 amplitudes then reduce to:
- C1 = 〈 1 | ψ 〉 = a·e^(−(i/ħ)·E0·t)
- C2 = 〈 2 | ψ 〉 = a·e^(−(i/ħ)·E0·t)
We can now take the absolute square of both to find the probability for the molecule to be in state 1 or in state 2:
- |〈 1 | ψ 〉|² = |a·e^(−(i/ħ)·E0·t)|² = a²
- |〈 2 | ψ 〉|² = |a·e^(−(i/ħ)·E0·t)|² = a²
Now, the probabilities have to add up to 1, so a² + a² = 1 and, therefore, the probability to be in either state 1 or state 2 is 0.5, which is what we’d expect.
Note: At this point, it is probably good to get back to our | ψ 〉 = | 1 〉 C1 + | 2 〉 C2 equation, so as to try to understand what it really says. Substituting the a·e^(−(i/ħ)·E0·t) expression for C1 and C2 yields:
| ψ 〉 = | 1 〉 a·e^(−(i/ħ)·E0·t) + | 2 〉 a·e^(−(i/ħ)·E0·t) = [| 1 〉 + | 2 〉] a·e^(−(i/ħ)·E0·t)
Now, what is this saying, really? In our previous post, we explained this is an ‘open’ equation, so it actually doesn’t mean all that much: we need to ‘close’ or ‘complete’ it by adding a ‘bra’, i.e. a state like 〈 χ |, so we get a 〈 χ | ψ〉 type of amplitude that we can actually do something with. Now, in this case, our final 〈 χ | state is either 〈 1 | or 〈 2 |, so we write:
- 〈 1 | ψ 〉 = [〈 1 | 1 〉 + 〈 1 | 2 〉]·a·e^(−(i/ħ)·E0·t) = [1 + 0]·a·e^(−(i/ħ)·E0·t) = a·e^(−(i/ħ)·E0·t)
- 〈 2 | ψ 〉 = [〈 2 | 1 〉 + 〈 2 | 2 〉]·a·e^(−(i/ħ)·E0·t) = [0 + 1]·a·e^(−(i/ħ)·E0·t) = a·e^(−(i/ħ)·E0·t)
Note that I finally added the multiplication dot (·) because we’re talking proper amplitudes now and, therefore, we’ve got a proper product too: we multiply one complex number with another. We can now take the absolute square of both to find the probability for the molecule to be in state 1 or in state 2:
- |〈 1 | ψ 〉|² = |a·e^(−(i/ħ)·E0·t)|² = a²
- |〈 2 | ψ 〉|² = |a·e^(−(i/ħ)·E0·t)|² = a²
Unsurprisingly, we find the same thing: these probabilities have to add up to 1, so a² + a² = 1 and, therefore, the probability to be in state 1 or state 2 is 0.5. So the notation and the logic behind it make perfect sense. But let me get back to the lesson now.
The point is: the true meaning of a ‘stationary’ state here, is that we have non-fluctuating probabilities. So they are and remain equal to some constant, i.e. 1/2 in this case. This implies that the state of the molecule does not change: there is no way to go from state 1 to state 2 and vice versa. Indeed, if we know the molecule is in state 1, it will stay in that state. [Think about what normalization of probabilities means when we’re looking at one state only.]
You should note that these non-varying probabilities are related to the fact that the amplitudes have a non-varying magnitude. The phase of these amplitudes varies in time, of course, but their magnitude is and remains a, always. The amplitude is not being ‘enveloped’ by another curve, so to speak.
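To make that point about the non-varying magnitude concrete, here is a minimal sketch that evaluates a·e^(−(i/ħ)·E0·t) at a few times, with a = 1/√2 so that the two probabilities add up to one. The value of E0 and the choice ħ = 1 are arbitrary assumptions for the demo.

```python
import cmath
import math

# Sketch: a 'stationary' amplitude has a rotating phase but a fixed magnitude.
# ħ = 1 and the energy value used below are arbitrary assumptions.

HBAR = 1.0


def stationary_amplitude(a: float, energy: float, t: float) -> complex:
    """a·e^(−(i/ħ)·E·t)."""
    return a * cmath.exp(-1j * energy * t / HBAR)


if __name__ == "__main__":
    a = 1.0 / math.sqrt(2.0)  # so that a² + a² = 1
    for t in (0.0, 0.3, 1.0, 2.5):
        c = stationary_amplitude(a, 5.0, t)
        print(f"t = {t:3.1f}   phase = {cmath.phase(c):+.4f}   probability = {abs(c) ** 2:.4f}")
```

The phase column changes from line to line, while the probability column stays at 0.5.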
OK. That should be clear enough. Sorry I spent so much time on this, but this stuff on ‘stationary’ states comes back again and again and so I just wanted to clear that up as much as I can. Let’s get back to the story.
So we know that what we’re describing above is not what ammonia really does. As Feynman puts it: “The equations [i.e. the C1 and C2 equations above] don’t tell us what ammonia really does. It turns out that it is possible for the nitrogen to push its way through the three hydrogens and flip to the other side. It is quite difficult; to get half-way through requires a lot of energy. How can it get through if it hasn’t got enough energy? There is some amplitude that it will penetrate the energy barrier. It is possible in quantum mechanics to sneak quickly across a region which is illegal energetically. There is, therefore, some [small] amplitude that a molecule which starts in |1〉 will get to the state |2〉. The coefficients H12 and H21 are not really zero.”
He adds: “Again, by symmetry, they should both be the same—at least in magnitude. In fact, we already know that, in general, Hij must be equal to the complex conjugate of Hji.”
His next step, then, can be interpreted as either a stroke of genius or, else, as unexplained. 🙂 He invokes the symmetry of the situation to boldly state that H12 is some real negative number, which he denotes as −A, which – because it’s a real number (so the imaginary part is zero) – must be equal to its complex conjugate H21. So then Feynman does this fantastic jump in logic. First, he keeps using the E0 value for H11 and H22, motivating that as follows: “If nature is at all reasonable, the matrix elements H11 and H22 must be equal, and we’ll call them both E0, because they correspond to the energy the states would have if H12 and H21 were zero.” Second, he uses that minus A value for H12 and H21. In short, the two equations and six functions are now reduced to:
iħ·dC1/dt = E0·C1 − A·C2
iħ·dC2/dt = E0·C2 − A·C1
Solving these equations is rather boring. Feynman does it as follows: adding and subtracting the two equations gives one equation for C1 + C2 and one for C1 − C2, each of which has a simple exponential solution, so that:
C1(t) = (a/2)·e^(−(i/ħ)·(E0 − A)·t) + (b/2)·e^(−(i/ħ)·(E0 + A)·t)
C2(t) = (a/2)·e^(−(i/ħ)·(E0 − A)·t) − (b/2)·e^(−(i/ħ)·(E0 + A)·t)
Now, what do these equations actually mean? It depends on those a and b coefficients. Looking at the solutions, the most obvious question to ask is: what if a or b are zero? If b is zero, then the second term in both equations is zero, and so C1 and C2 are exactly the same: two amplitudes with the same temporal frequency ω = (E0 − A)/ħ. If a is zero, then C1 and C2 are the same too, but with opposite sign: two amplitudes with the same temporal frequency ω = (E0 + A)/ħ. Squaring them – in both cases (i.e. for a = 0 or b = 0) – yields, once again, an equal and constant probability for the spin of the ammonia molecule to be in the ‘up’ or ‘down’ state. To be precise, we can take the absolute square of both to find the probability for the molecule to be in state 1 or in state 2:
- For b = 0: |〈 1 | ψ 〉|² = |(a/2)·e^(−(i/ħ)·(E0 − A)·t)|² = a²/4 = |〈 2 | ψ 〉|²
- For a = 0: |〈 1 | ψ 〉|² = |(b/2)·e^(−(i/ħ)·(E0 + A)·t)|² = b²/4 = |〈 2 | ψ 〉|² (the minus sign in front of b/2 is squared away)
So we get two stationary states now. Why two instead of one? Well… You need to use your imagination a bit here. They actually reflect each other: they’re the same as the one stationary state we found when assuming our nitrogen atom could not ‘flip’ from one position to the other. It’s just that the introduction of that possibility now results in a sort of ‘doublet’ of energy levels. But so we shouldn’t waste our time on this, as we want to analyze the general case, for which the probabilities to be in state 1 or state 2 do vary in time. So that’s when a and b are non-zero.
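The difference between the a = 0 or b = 0 cases and the general case is easy to see numerically. The sketch below assumes the general solution has the form C1 = (a/2)·e^(−(i/ħ)·(E0 − A)·t) + (b/2)·e^(−(i/ħ)·(E0 + A)·t), with a minus sign in front of the second term for C2, which is what the two special cases above imply. The values of E0 and A, and ħ = 1, are arbitrary.

```python
import cmath
import math

# Sketch: general two-state amplitudes for the ammonia molecule.
# Assumed form of the solution (reconstructed from the a = 0 and b = 0 cases):
#   C1 = (a/2)·e^(−(i/ħ)(E0 − A)t) + (b/2)·e^(−(i/ħ)(E0 + A)t)
#   C2 = (a/2)·e^(−(i/ħ)(E0 − A)t) − (b/2)·e^(−(i/ħ)(E0 + A)t)
# ħ = 1; E0 and A in the demo are arbitrary values.

HBAR = 1.0


def amplitudes(a: float, b: float, e0: float, big_a: float, t: float):
    """Return (C1, C2) at time t for coefficients a and b."""
    low = cmath.exp(-1j * (e0 - big_a) * t / HBAR)
    high = cmath.exp(-1j * (e0 + big_a) * t / HBAR)
    return (a / 2) * low + (b / 2) * high, (a / 2) * low - (b / 2) * high


if __name__ == "__main__":
    for a, b in ((1.0, 0.0), (0.0, 1.0), (1.0, 1.0)):
        probs = [abs(amplitudes(a, b, 5.0, 1.0, t)[0]) ** 2 for t in (0.0, 0.5, 1.0)]
        print(f"a = {a}, b = {b}: |C1|^2 at t = 0, 0.5, 1 ->", [round(p, 4) for p in probs])
```

For a = 0 or b = 0 the probability does not change with t; for a = b = 1 it does.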
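To analyze it all, we may want to start with equating t to zero. We then get:
C1(0) = (a + b)/2 and C2(0) = (a − b)/2
If we know the molecule is in state 1 at t = 0, then C1(0) = 1 and C2(0) = 0. This leads us to conclude that a = b = 1, so our equations for C1(t) and C2(t) can now be written as:
C1(t) = (1/2)·e^(−(i/ħ)·(E0 − A)·t) + (1/2)·e^(−(i/ħ)·(E0 + A)·t)
C2(t) = (1/2)·e^(−(i/ħ)·(E0 − A)·t) − (1/2)·e^(−(i/ħ)·(E0 + A)·t)
Remembering our rules for adding and subtracting complex conjugates (e^(iθ) + e^(−iθ) = 2·cosθ and e^(iθ) − e^(−iθ) = 2i·sinθ), we can re-write this as:
C1(t) = e^(−(i/ħ)·E0·t)·cos(A·t/ħ)
C2(t) = i·e^(−(i/ħ)·E0·t)·sin(A·t/ħ)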
Now these amplitudes are much more interesting. Their temporal variation is defined by E0 but, on top of that, we have an envelope here: the cos(A·t/ħ) and sin(A·t/ħ) factor respectively. So their magnitude is no longer time-independent: both the phase as well as the amplitude now vary with time. What’s going on here becomes quite obvious when calculating and plotting the associated probabilities, which are
- |C1(t)|² = cos²(A·t/ħ), and
- |C2(t)|² = sin²(A·t/ħ)
respectively (note that the absolute square of i is equal to 1, not −1). The graph of these functions is depicted below.
As Feynman puts it: “The probability sloshes back and forth.” Indeed, the way to think about this is that, if our ammonia molecule is in state 1, then it will not stay in that state. In fact, one can be sure the nitrogen atom is going to flip at some point in time, with the probabilities being defined by that fluctuating probability density function above. Indeed, as time goes by, the probability to be in state 2 increases, until it will effectively be in state 2. And then the cycle reverses.
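You don’t have to take the closed-form solution on faith: the ‘sloshing’ also comes out of a direct numerical integration. The sketch below assumes the two coupled equations have the form iħ·dC1/dt = E0·C1 − A·C2 and iħ·dC2/dt = E0·C2 − A·C1, starts the molecule in state 1, and steps forward with a standard fourth-order Runge-Kutta scheme. ħ = 1, and E0 and A are arbitrary test values.

```python
import math

# Sketch: integrate the two-state equations numerically and compare with cos²/sin².
# Assumed equations:  iħ·dC1/dt = E0·C1 − A·C2,  iħ·dC2/dt = E0·C2 − A·C1.
# ħ = 1; the E0 and A values used in the demo are arbitrary assumptions.

HBAR = 1.0


def derivative(c1: complex, c2: complex, e0: float, big_a: float):
    """Return (dC1/dt, dC2/dt)."""
    return ((-1j / HBAR) * (e0 * c1 - big_a * c2),
            (-1j / HBAR) * (e0 * c2 - big_a * c1))


def probabilities(e0: float, big_a: float, t_end: float, steps: int = 2000):
    """Start in state 1 and return (|C1|², |C2|²) at t_end, using RK4."""
    c1, c2 = 1 + 0j, 0j
    dt = t_end / steps
    for _ in range(steps):
        k1 = derivative(c1, c2, e0, big_a)
        k2 = derivative(c1 + 0.5 * dt * k1[0], c2 + 0.5 * dt * k1[1], e0, big_a)
        k3 = derivative(c1 + 0.5 * dt * k2[0], c2 + 0.5 * dt * k2[1], e0, big_a)
        k4 = derivative(c1 + dt * k3[0], c2 + dt * k3[1], e0, big_a)
        c1 += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        c2 += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return abs(c1) ** 2, abs(c2) ** 2


if __name__ == "__main__":
    e0, big_a = 5.0, 1.0
    for t in (0.0, math.pi / 4, math.pi / 2, math.pi):
        p1, p2 = probabilities(e0, big_a, t) if t > 0 else (1.0, 0.0)
        print(f"t = {t:.4f}   P1 = {p1:.4f} (cos² = {math.cos(big_a * t) ** 2:.4f})   P2 = {p2:.4f}")
```

With these values the molecule is entirely in state 2 at t = π/2 and back in state 1 at t = π, and the result does not depend on E0.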
Our | ψ 〉 = | 1 〉 C1 + | 2 〉 C2 equation is a lot more interesting now, as we do have a proper mix of pure states now: we never really know in what state our molecule will be, as we have these ‘oscillating’ probabilities now, which we should interpret carefully.
The point to note is that the a = 0 and b = 0 solutions came with precise temporal frequencies: (E0 + A)/ħ and (E0 − A)/ħ respectively, which correspond to two separate energy levels: E0 + A and E0 − A respectively, with H12 = H21 = −A. So everything is related to everything once again: allowing the nitrogen atom to push its way through the three hydrogens, so as to flip to the other side, thereby breaking the energy barrier, is equivalent to associating two energy levels to the ammonia molecule as a whole, thereby introducing some uncertainty, or indefiniteness as to its energy, and that, in turn, gives us the amplitudes and probabilities that we’ve just calculated.
Note that the probabilities “sloshing back and forth”, or “dumping into each other” – as Feynman puts it – is the result of the varying magnitudes of our amplitudes, going up and down and, therefore, their absolute square varies too.
So… Well… That’s it as an introduction to a two-state system. There’s more to come. Ammonia is used in the ammonia maser. Now that is something that’s interesting to analyze—both from a classical as well as from a quantum-mechanical perspective. Feynman devotes a full chapter to it, so I’d say… Well… Have a look. 🙂
Post scriptum: I must assume this analysis of the NH3 molecule, with the nitrogen ‘flipping’ across the hydrogens, triggers a lot of questions, so let me try to answer some. Let me first insert the illustration once more, so you don’t have to scroll up:
The first thing that you should note is that the ‘flip’ involves a change in the center of mass position. So that requires energy, which is why we associate two different energy levels with the molecule: E0 + A and E0 − A. However, as mentioned above, we don’t care about the nitty-gritty here: the energy barrier is likely to combine a number of factors, including electrostatic forces, as evidenced by the flip in the electric dipole moment, which is what the μ symbol here represents! Just note that the two energy levels are separated by an amount that’s equal to 2·A, rather than A and that, once again, it becomes obvious now why Feynman would prefer the Hamiltonian to be called the ‘energy matrix’, as its coefficients do represent specific energy levels, or differences between them! Now, that assumption yielded the following wavefunctions for C1 = 〈 1 | ψ 〉 and C2 = 〈 2 | ψ 〉:
- C1 = 〈 1 | ψ 〉 = (1/2)·e^(−(i/ħ)·(E0 − A)·t) + (1/2)·e^(−(i/ħ)·(E0 + A)·t)
- C2 = 〈 2 | ψ 〉 = (1/2)·e^(−(i/ħ)·(E0 − A)·t) – (1/2)·e^(−(i/ħ)·(E0 + A)·t)
Both are composite waves. To be precise, they are the sum of two component waves with a temporal frequency equal to ω1 = (E0 − A)/ħ and ω2 = (E0 + A)/ħ respectively. [As for the minus sign in front of the second term in the wave equation for C2, −1 = e^(±iπ), so + (1/2)·e^(−(i/ħ)·(E0 + A)·t) and – (1/2)·e^(−(i/ħ)·(E0 + A)·t) are the same wavefunction: they only differ because their relative phase is shifted by ±π.]
Now, writing things this way, rather than in terms of probabilities, makes it clear that the two base states of the molecule themselves are associated with two different energy levels, so it is not like one state has more energy than the other. It’s just that the possibility of going from one state to the other requires an uncertainty about the energy, which is reflected by the energy doublet E0 ± A in the wavefunction of the base states. Now, if the wavefunction of the base states incorporates that energy doublet, then it is obvious that the state of the ammonia molecule, at any point in time, will also incorporate that energy doublet.
This triggers the following remark: what’s the uncertainty really? Is it an uncertainty in the energy, or is it an uncertainty in the wavefunction? I mean: we have a function relating the energy to a frequency. Introducing some uncertainty about the energy is mathematically equivalent to introducing uncertainty about the frequency. Think of it: two energy levels implies two frequencies, and vice versa. More in general, introducing n energy levels, or some continuous range of energy levels ΔE, amounts to saying that our wave function doesn’t have a specific frequency: it now has n frequencies, or a range of frequencies Δω = ΔE/ħ. Of course, the answer is: the uncertainty is in both, so it’s in the frequency and in the energy and both are related through the wavefunction. So… In a way, we’re chasing our own tail.
Having said that, the energy may be uncertain, but it is real. It’s there, as evidenced by the fact that the ammonia molecule behaves like an atomic oscillator: we can excite it in exactly the same way as we can excite an electron inside an atom, i.e. by shining light on it. The only difference is the photon energies: to cause a transition in an atom, we use photons in the optical or ultraviolet range, and they give us the same radiation back. To cause a transition in an ammonia molecule, we only need photons with energies in the microwave range. Here, I should quickly remind you of the frequencies and energies involved. Visible light is radiation in the 400–800 terahertz range and, using the E = h·f equation, we can calculate the associated photon energies as roughly 1.65 to 3.3 eV. Microwave radiation – as produced in your microwave oven – is typically in the range of 1 to 2.5 gigahertz, and the associated photon energy is 4 to 10 millionths of an eV. Having illustrated the difference in terms of the energies involved, I should add that masers and lasers are based on the same physical principle: LASER and MASER stand for Light/Microwave Amplification by Stimulated Emission of Radiation, respectively.
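If you want to check those photon energies yourself, the E = h·f arithmetic is a one-liner. The sketch below is just that: the function name `photon_energy_ev` is my own, and the value of Planck’s constant in eV·s is the standard CODATA figure.

```python
PLANCK_EV_S = 4.135667696e-15  # Planck's constant h in eV·s (CODATA 2018)


def photon_energy_ev(frequency_hz):
    """Photon energy E = h·f in electronvolts, for a frequency given in hertz."""
    return PLANCK_EV_S * frequency_hz


if __name__ == "__main__":
    # Frequencies taken from the text: visible light and microwave-oven range.
    for label, f in (("visible, low end", 400e12), ("visible, high end", 800e12),
                     ("microwave, low end", 1e9), ("microwave, high end", 2.5e9)):
        print(f"{label}: f = {f:.3e} Hz, E = {photon_energy_ev(f):.3e} eV")
```

The six orders of magnitude between the two ranges are what separates a laser transition from a maser transition.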
So… How shall I phrase this? There’s uncertainty, but the way we are modeling that uncertainty matters. So yes, the uncertainty in the frequency of our wavefunction and the uncertainty in the energy are mathematically equivalent, but the wavefunction has a meaning that goes much beyond that. [You may want to reflect on that yourself.]
Finally, another question you may have is why would Feynman take minus A (i.e. −A) for H12 and H21. Frankly, my first thought on this was that it should have something to do with the original equation for these Hamiltonian coefficients, which also has a minus sign: Uij(t + Δt, t) = δij + Kij(t)·Δt = δij − (i/ħ)·Hij(t)·Δt. For i ≠ j, this reduces to:
Uij(t + Δt, t) = + Kij(t)·Δt = − (i/ħ)·Hij(t)·Δt
However, the answer is: it really doesn’t matter. One could write: H12 and H21 = +A, and we’d find the same equations. We’d just switch the indices 1 and 2, and the coefficients a and b. But we get the same solutions. You can figure that out yourself. Have fun with it !
Oh ! And please do let me know if some of the stuff above triggers other questions. I am not sure if I’ll be able to answer them, but I’ll surely try, and good questions always help to ensure we sort of ‘get’ this stuff in a more intuitive way. Indeed, when everything is said and done, the goal of this blog is not simply to reproduce stuff, but to truly ‘get’ it, as well as we can. 🙂
Waves are peculiar: there is one single waveform, i.e. one motion only, but that motion can always be analyzed as the sum of the motions of all the different wave modes, combined with the appropriate amplitudes and phases. Saying the same thing using different words: we can always analyze the wave function as the sum of a (possibly infinite) number of components, i.e. a so-called Fourier series:
f(t) = a0 + Σ [an·cos(n·ωt) + bn·sin(n·ωt)], with the sum running over n = 1, 2, 3, …
The f(t) function can be any wave, but the simple examples in physics textbooks usually involve a string or, in two dimensions, some vibrating membrane, and I’ll stick to those examples too in this post. Feynman calls the Fourier components harmonic functions, or harmonics tout court, but the term ‘harmonic’ refers to so many different things in math that it may be better not to use it in this context. The component waves are sinusoidal functions, so sinusoidals might be a better term but it’s not in use, because a more general analysis will use complex exponentials, rather than sines and/or cosines. Complex exponentials (e.g. 10^(ix)) are periodic functions too, so they are totally unlike real exponential functions (e.g. 10^x). Hence, Feynman also uses the term ‘exponentials’. At some point, he also writes that the pattern of motion (of a mode) varies ‘exponentially’ but, of course, he’s thinking of complex exponentials, and, therefore, we should substitute ‘sinusoidally’ for ‘exponentially’ when talking about real-valued wave functions.
[…] I know. I am already getting into the weeds here. As I am a bit off-track anyway now, let me make another remark here. You may think that we have two types of sinusoidals, or two types of functions, in that Fourier decomposition: sines and cosines. You should not think of it that way: the sine and cosine function are essentially the same. I know your old math teacher in high school never told you that, but it’s true. They both come with the same circle (yes, I know that’s a ridiculous statement but I don’t know how to phrase it otherwise): the difference between a sine and a cosine is just a phase shift: cos(ωt) = sin(ωt + π/2) and, conversely, sin(ωt) = cos(ωt − π/2). If the starting phases of all of the component waves were the same, we’d have a Fourier decomposition involving cosines only, or sines only, whatever you prefer. Indeed, because they’re the same function except for that phase shift (π/2), we can always go from one to the other by shifting our origin of space (x) and/or time (t). However, we cannot assume that all of the component waves have the same starting phase and, therefore, we should write each component as cos(n·ωt + Φn), or a sine with a similar argument. Now, you’ll remember – because your math teacher in high school told you that at least 🙂 – that there’s a formula for the cosine (and sine) of the sum of two angles: we can write cos(n·ωt + Φn) as cos(n·ωt + Φn) = [cos(Φn)·cos(n·ωt) – sin(Φn)·sin(n·ωt)]. Writing an and bn for cos(Φn) and – sin(Φn) respectively gives us the an·cos(n·ωt) + bn·sin(n·ωt) expressions above. In addition, the component waves may not only differ in phase, but also in amplitude, and, hence, the an and bn coefficients do more than only capturing the phase differences. But let me get back on track. 🙂
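The bookkeeping between the ‘amplitude and phase’ form and the ‘cosine plus sine’ form is easy to get wrong by a sign, so here is a small sketch of it. The function names are hypothetical, and I allow for an amplitude A in front of the cosine, so that A·cos(θ + Φ) = a·cos(θ) + b·sin(θ) with a = A·cos(Φ) and b = −A·sin(Φ).

```python
import math


def to_cos_sin(amplitude, phase):
    """Convert A·cos(θ + Φ) into the pair (a, b) of a·cos(θ) + b·sin(θ)."""
    return amplitude * math.cos(phase), -amplitude * math.sin(phase)


def to_amplitude_phase(a, b):
    """Inverse conversion: recover (A, Φ) from the coefficients (a, b)."""
    return math.hypot(a, b), math.atan2(-b, a)


if __name__ == "__main__":
    a, b = to_cos_sin(2.0, 0.7)
    theta = 1.234  # any value of n·ωt will do
    print(2.0 * math.cos(theta + 0.7), a * math.cos(theta) + b * math.sin(theta))
    print(to_amplitude_phase(a, b))
```

So the two coefficients an and bn carry exactly the same information as one amplitude and one phase per component.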
Those sinusoidals have a weird existence: they are not there, physically—or so it seems. Indeed, there is one waveform only, i.e. one motion only—and, if it’s any real wave, it’s most likely to be non-sinusoidal. At the same time, I noted, in my previous post, that, if you pluck a string or play a chord on your guitar, some string you did not pluck may still pick up one or more of its harmonics (i.e. one or more of its overtones) and, hence, start to vibrate too! It’s the resonance phenomenon. If you have a grand piano, it’s even more obvious: if you’d press the C4 key on a piano, a small hammer will strike the C4 string and it will vibrate—but the C5 string (one octave higher) will also vibrate, although nothing touched it—except for the air transmitting the sound wave (including the harmonics causing the resonance) from the C4 string, of course! So the component waves are there and, at the same time, they’re not. Whatever they are, they are more than mathematical forms: the so-called superposition principle (on which the Fourier analysis is based) is grounded in reality: it’s because we can add forces. I know that sounds extremely obvious – or ridiculous, you might say 🙂 – but it is actually not so obvious. […] I am tempted to write something about conservative forces here but… Well… I need to move on.
Let me show that diagram of the first seven harmonics of an ideal string once again. All of them, and the higher ones too, would be in our wave function. Hence, assuming there’s no phase difference between the harmonics, we’d write:
f(t) = sin(ωt) + sin(2ωt) + sin(3ωt) + … + sin(nωt) + …
The frequencies of the various modes of our ideal string are all simple multiples of the fundamental frequency ω, as evidenced from the argument in our sine functions (ω, 2ω, 3ω, etcetera). Conversely, the respective wavelengths are λ, λ/2, λ/3, etcetera. [Remember: the speed of the wave is fixed, and frequency and wavelength are inversely proportional: c = λ·f = λ/T = λ·(ω/2π).] So, yes, these frequencies and wavelengths can all be related to each other in terms of equally simple harmonic ratios: 1:2, 2:3, 3:5, 4:5 etcetera. I explained in my previous posts why that does not imply that the musical notes themselves are related in such way: the musical scale is logarithmic. So I won’t repeat myself. All of the above is just an introduction to the more serious stuff, which I’ll talk about now.
Modes in two dimensions
An analysis of waves in two dimensions is often done assuming some drum membrane. The Great Teacher played drums, as you can see from his picture in his Lectures, and there are also videos of him performing on YouTube. So that’s why the drum is used in almost all textbooks now. 🙂
The illustration of one of the normal modes of a circular membrane comes from the Wikipedia article on modes. There are many other normal modes – some of them with a simpler shape, but some of them more complicated too – but this is a nice one as it also illustrates the concept of a nodal line, which is closely related to the concept of a mode. Huh? Yes. The modes of a one-dimensional string have nodes, i.e. points where the displacement is always zero. Indeed, as you can see from the illustration above (not below), the first overtone has one node, the second two, etcetera. So the equivalent of a node in two dimensions is a nodal line: for the mode shown below, we have one bisecting the disc and then another one: a circle about halfway between the edge and the center. The third nodal line is the edge itself, obviously. [The author of the Wikipedia article notes that the animation isn’t perfect, because the nodal line and the nodal circle halfway between the edge and the center both move a little bit. In any case, it’s pretty good, I think. I should also learn how to make animations like that. :-)]
What’s a mode?
How do we find these modes? And how are they defined really? To explain that, I have to briefly return to the one-dimensional example. The key to solving the problem (i.e. finding the modes, and defining their characteristics) is the following fact: when a wave reaches the clamped end of a string, it will be reflected with a change in sign, as illustrated below: we’ve got that F(x+ct) wave coming in, and then it goes back indeed, but with the sign reversed.
It’s a complicated illustration because it also shows some hypothetical wave coming from the other side, where there is no string to vibrate. That hypothetical wave is the same wave, but travelling in the other direction and with the sign reversed (–F). So what’s that all about? Well… I never gave any general solution for a waveform traveling up and down a string: I just said the waveform was traveling up and down the string (now that is obvious: just look at that diagram with the first seven harmonics once again, and think about how that oscillation goes up and down with time), but I did not really give any general solution for it (the sine and cosine functions are specific solutions). So what is the general solution?
Let’s first assume the string is not held anywhere, so that we have an infinite string along which waves can travel in either direction. In fact, the most general functional form to capture the fact that a waveform can travel in any direction is to write the displacement y as the sum of two functions: one wave traveling one way (which we’ll denote by F), and the other wave (which we’ll denote by G) traveling the other way. From the illustration above, it’s obvious that the F wave is traveling towards the negative x-direction and, hence, its argument will be x + ct. Conversely, the G wave travels in the positive x-direction, so its argument is x – ct. So we write:
y = F(x + ct) + G(x – ct)
[I’ve explained this thing about directions and why the argument in a wavefunction (x ± ct) is what it is before. You should look it up in case you don’t understand. As for the c in this equation, that’s the wave velocity once more, which is constant and which depends, as always, on the medium, so that’s the material and the diameter and the tension and whatever of the string.]
So… We know that the string is actually not infinite, but that it’s fixed to some ‘infinitely solid wall’ (as Feynman puts it). Hence, y is equal to zero there: y = 0. Now let’s choose the origin of our x-axis at the fixed end so as to simplify the analysis. Hence, where y is zero, x is also zero. Now, at x = 0, our general solution above for the infinite string becomes y = F(ct) + G(−ct) = 0, for all values of t. Of course, that means G(−ct) must be equal to –F(ct). Now, that equality is there for all values of t. So it’s there for all values of ct and −ct. In short, that equality is valid for whatever value of the argument of G and –F. As Feynman puts it: “G of anything must be –F of minus that same thing.” Now, the ‘anything’ in G is its argument: x – ct, so ‘minus that same thing’ is –(x – ct) = −x + ct. Therefore, our equation becomes:
y = F(x + ct) − F(−x + ct)
So that’s what’s depicted in the diagram above: the F(x + ct) wave ‘vanishes’ behind the wall as the − F(−x + ct) wave comes out of it. Conversely, the − F(−x + ct) is hypothetical indeed until it reaches the origin, after which it becomes the real wave. Their sum is only relevant near the origin x = 0, and on the positive side only (on the negative side of the x-axis, the F and G functions are both hypothetical). [I know, it’s not easy to follow, but textbooks are really short on this—which is why I am writing my blog: I want to help you ‘get’ it.]
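The nice thing about y = F(x + ct) − F(−x + ct) is that it works for any waveform F, periodic or not. The sketch below illustrates that with a made-up Gaussian bump for F and an assumed wave speed c = 1 (both are my choices for illustration only): the displacement at the wall stays zero at all times, and the bump comes back with its sign reversed.

```python
import math

C = 1.0  # assumed wave speed, in arbitrary units


def pulse(u):
    """Hypothetical waveform F: a Gaussian bump centred at u = 3."""
    return math.exp(-(u - 3.0) ** 2)


def displacement(x, t, shape=pulse, c=C):
    """y = F(x + ct) − F(−x + ct) for a string clamped at x = 0 (x >= 0 is the string)."""
    incoming = shape(x + c * t)    # travels towards the wall, i.e. towards negative x
    reflected = shape(-x + c * t)  # travels away from the wall, i.e. towards positive x
    return incoming - reflected


if __name__ == "__main__":
    for t in (0.0, 1.0, 3.0, 5.0):
        print(f"t = {t}: y(0) = {displacement(0.0, t):+.3f}, y(2) = {displacement(2.0, t):+.3f}")
```

At x = 2, the bump passes by with a positive sign around t = 1 (on its way to the wall) and with a negative sign around t = 5 (on its way back).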
Now, the results above are valid for any wave, periodic or not. Let’s now confine the analysis to periodic waves only. In fact, we’ll limit the analysis to sinusoidal wavefunctions only. So that should be easy. Yes. Too easy. I agree. 🙂
So let’s make things difficult again by introducing the complex exponential notation, so that’s Euler’s formula: e^(iθ) = cosθ + i·sinθ, with i the imaginary unit, and i·sinθ the imaginary component of our wave. So the only thing that is real, is cosθ.
What the heck? Just bear with me. It’s good to make the analysis somewhat more general, especially because we’ll be talking about the relevance of all of this to quantum physics, and in quantum physics the waves are complex-valued indeed! So let’s get on with it. To use Euler’s formula, we need to substitute x + ct for the phase of the wave, so that involves the angular frequency and the wavenumber. Let me just write it down:
F(x + ct) = e^(iω(t+x/c)) and F(−x + ct) = e^(iω(t−x/c))
Huh? Yeah. Sorry. I’ll resist the temptation to go off-track here, because I really shouldn’t be copying what I wrote in other posts. Most of what I write above is really those simple relations: c = λ·f = ω/k, with k, i.e. the wavenumber, being defined as k = 2π/λ. For details, go to one of my other posts indeed, in which I explain how that works in very much detail: just click on the link here, and scroll down to the section on the phase of a wave, in which I explain why the phase of a wave is equal to θ = ωt–kx = ω(t–x/c). And, yes, I know: the thing with the wave directions and the signs is quite tricky. Just remember: for a wave traveling in the positive x-direction, the signs in front of x and t are each other’s opposite but, if the wave’s traveling in the negative x-direction, they are the same. As mentioned, all the rest is usually a matter of shifting the phase, which amounts to shifting the origin of either the x- or the t-axis. I need to move on. Using the exponential notation for our sinusoidal wave, y = F(x + ct) − F(−x + ct) becomes:
y = e^(iω(t+x/c)) − e^(iω(t−x/c))
I can hear you sigh again: Now what’s that for? What can we do with this? Just continue to bear with me for a while longer. Let’s factor the e^(iωt) term out. [Why? Patience, please!] So we write:
y = e^(iωt)·[e^(iωx/c) − e^(−iωx/c)]
Now, you can just use Euler’s formula again to double-check that e^(iθ) − e^(−iθ) = 2i·sinθ. [To get that result, you should remember that cos(−θ) = cosθ, but sin(−θ) = −sin(θ).] So we get:
y = e^(iωt)·[e^(iωx/c) − e^(−iωx/c)] = 2i·e^(iωt)·sin(ωx/c)
Now, we’re only interested in the real component of this amplitude of course – but that’s only because we’re in the classical world here, not in the real world, which is quantum-mechanical and, hence, involves the imaginary stuff also 🙂 – so we should write this out using Euler’s formula again to convert the exponential to sinusoidals again. Hence, remembering that i² = −1, we get:
y = 2i·e^(iωt)·sin(ωx/c) = 2i·cos(ωt)·sin(ωx/c) – 2·sin(ωt)·sin(ωx/c)
OK. You need a break. So let me pause here for a while. What the hell are we doing? Is this legit? I mean… We’re talking about some real wave here, aren’t we? We are. So is this conversion from/to real amplitudes to/from complex amplitudes legit? It is. And, in this case (i.e. in classical physics), it’s true that we’re interested in the real component of y only. But then it’s nice the analysis is valid for complex amplitudes as well, because we’ll be talking complex amplitudes in quantum physics.
[…] OK. I acknowledge it all looks very tricky so let’s see what we’d get using our old-fashioned sine and/or cosine function. So let’s write F(x + ct) as cos(ωt+ωx/c) and F(−x + ct) as cos(ωt−ωx/c). So we write y = cos(ωt+ωx/c) − cos(ωt−ωx/c). Now work on this using the cos(α+β) = cosα·cosβ − sinα·sinβ formula and the cos(−α) = cosα and sin(−α) = −sinα identities. You (should) get: y = −2sin(ωt)·sin(ωx/c). So that’s the real component in our y function above indeed. So, yes, we do get the same results when doing this funny business using complex exponentials as we’d get when sticking to real stuff only! Fortunately! 🙂
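If you don’t trust the trigonometry, you can let the computer do the comparison. The sketch below uses arbitrary, assumed values for ω and c, and hypothetical function names: it evaluates the complex-exponential form of y, and compares its real part with the −2·sin(ωt)·sin(ωx/c) result obtained with real cosines only.

```python
import cmath
import math


def y_complex(x, t, omega=2.0, c=1.0):
    """Complex form: y = e^(iω(t+x/c)) − e^(iω(t−x/c))."""
    return cmath.exp(1j * omega * (t + x / c)) - cmath.exp(1j * omega * (t - x / c))


def y_real(x, t, omega=2.0, c=1.0):
    """Real form obtained from the cosines: y = −2·sin(ωt)·sin(ωx/c)."""
    return -2.0 * math.sin(omega * t) * math.sin(omega * x / c)


if __name__ == "__main__":
    for x, t in ((0.0, 0.5), (0.3, 0.5), (1.1, 2.0), (2.5, 3.7)):
        print(f"x = {x}, t = {t}: Re(y) = {y_complex(x, t).real:+.6f}, real form = {y_real(x, t):+.6f}")
```

The imaginary part of the complex form is 2·cos(ωt)·sin(ωx/c), i.e. the same standing wave a quarter of a period out of phase, which is what we simply discard in the classical analysis.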
[Why did I get off-track again? Well… It’s true these conversions from real to complex amplitudes should not be done carelessly. It is tricky and non-intuitive, to say the least. The weird thing about it is that, if we multiply two imaginary components, we get a real component, because i² is a real number: it’s −1! So it’s fascinating indeed: we add an imaginary component to our real-valued function, do all kinds of manipulations with it – including stuff that involves the use of i² = −1 – and, when done, we just take out the real component and it’s alright: we know that the result is OK because of the ‘magic’ of complex numbers! In any case, I need to move on so I can’t dwell on this. I also explained much of the ‘magic’ in other posts already, so I shouldn’t repeat myself. If you’re interested, click on this link, for instance.]
Let’s go back to our y = – 2sin(ωt)·sin(ωx/c) function. So that’s the oscillation. Just look at the equation and think about what it tells us. Suppose we fix x, so we’re looking at one point on the string only and only let t vary: then sin(ωx/c) is some constant and it’s our sin(ωt) factor that goes up and down. So our oscillation has frequency ω, at every point x, so that’s everywhere!
Of course, this result shouldn’t surprise us, should it? That’s what we put in when we wrote F as F(x + ct) = e^(iω(t+x/c)) or as cos(ωt+ωx/c), isn’t it? Well… Yes and no. Yes, because you’re right: we put in that angular frequency. But then, no, because we’re talking a composite wave here: a wave traveling up and down, with the components traveling in opposite directions. Indeed, we’ve also got that G(x) = −F(–x) function here. So, no, it’s not quite the same.
Let’s fix t now, and take a snapshot of the whole wave, so now we look at x as the variable and sin(ωt) is some constant. What we see is a sine wave, and 2·sin(ωt) is (up to the sign) its maximum amplitude at that moment. Again, you’ll say: of course! Well… Yes. The thing is: the point where the amplitude of our oscillation is equal to zero, is always the same, regardless of t. So we have fixed nodes indeed. Where are they? The nodes are, obviously, the points where sin(ωx/c) = 0, so that’s when ωx/c is equal to 0, obviously, or – more importantly – whenever ωx/c is equal to π, 2π, 3π, 4π, etcetera. More generally, we can say whenever ωx/c = n·π with n = 0, 1, 2,… etc. Now, that’s the same as writing x = n·π·c/ω = n·π/k = n·π·λ/2π = n·λ/2.
Now let’s remind ourselves of what λ really is: for the fundamental frequency it’s twice the length of the string, so λ = 2·L. For the next mode (i.e. the second harmonic), it’s the length itself: λ = L. For the third, it’s λ = (2/3)·L, etcetera. So, in general, it’s λ = (2/m)·L with m = 1, 2, etcetera. [We may or may not want to include a zero mode by allowing m to equal zero as well, so then there’s no oscillation and y = 0 everywhere. 🙂 But that’s a minor point.] In short, our grand result is:
x = n·λ/2 = n·(2/m)·L/2 = (n/m)·L
Of course, we have to exclude the x points lying outside of our string by imposing that n/m ≤ 1, i.e. the condition that n ≤ m. So for m = 1, n is 0 or 1, so the nodes are, effectively, both ends of the string. For m = 2, n can be 0, 1 and 2, so the nodes are the ends of the string and its midpoint L/2. And so on and so on.
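The x = (n/m)·L rule is simple enough to tabulate. Here is a small sketch, with a function name (`node_positions`) and a unit string length that are my own assumptions, which lists the nodes of the m-th mode as exact fractions of the string length.

```python
from fractions import Fraction


def node_positions(m, length=Fraction(1)):
    """Nodes of the m-th mode of a string of the given length: x = (n/m)·L for n = 0..m."""
    if m < 1:
        raise ValueError("mode number m must be 1 or higher")
    return [Fraction(n, m) * length for n in range(m + 1)]


if __name__ == "__main__":
    for m in range(1, 5):
        nodes = ", ".join(str(x) for x in node_positions(m))
        print(f"mode m = {m}: nodes at x/L = {nodes}")
```

Each mode m has m + 1 nodes if we count both ends, so m − 1 nodes in between, which is the count you read off from the diagram of the harmonics.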
I know that, by now, you’ve given up. So no one is reading anymore and so I am basically talking to myself now. What’s the point? Well… I wanted to get here in order to define the concept of a mode: a mode is a pattern of motion, which has the property that, at any point, the object moves perfectly sinusoidally, and that all points move at the same frequency (though some will move more than others). Modes also have nodes, i.e. points that don’t move at all, and above I showed how we can find the nodes of the modes of a one-dimensional string.
Also note how remarkable that result actually is: we didn’t specify anything about that string, so we don’t care about its material or diameter or tension or whatever. Still, we know its fundamental (or normal) modes, and we know their nodes: they’re a function of the length of the string, and the number of the mode only: x = (n/m)·L. While an oscillating string may seem to be the most simple thing on earth, it isn’t: think of all the forces between the molecules, for instance, as that string is vibrating. Still, we’ve got this remarkably simple formula. Don’t you find that amazing?
[…] OK… If you’re still reading, I know you want me to move on, so I’ll just do that.
Back to two dimensions
The modes are all that matters: when linear forces (i.e. linear systems) are involved, any motion can be analyzed as the sum of the motions of all the different modes, combined with appropriate amplitudes and phases. Let me reproduce the Fourier series once more (the more you see it, the better you’ll understand it – I should hope!): f(t) = a0 + Σ [an·cos(n·ωt) + bn·sin(n·ωt)]. Of course, we should generalize this to also include x as a variable which, again, is easier if we use complex exponentials instead of the sinusoidal components. The nice illustration on Fourier analysis from Wikipedia shows how it works – in essence, that is. The red function below consists of six of those modes.
OK. Enough of this. Let’s go to the two-dimensional case now. To simplify the analysis, Feynman invented a rectangular drum. A rectangular drum is probably more difficult to play, but it’s easier to analyze—as compared to a circular drum, that is! 🙂
In two dimensions, our sinusoidal one-dimensional e^(i(ωt−kx)) waveform becomes e^(i(ωt−kx·x−ky·y)). So we have a wavenumber for the x and y directions (kx and ky), and the sign in front is determined by the direction of the wave, so we need to check whether it moves in the positive or negative direction of the x- and y-axis respectively. Now, we can rewrite e^(i(ωt−kx·x−ky·y)) as e^(iωt)·e^(−i(kx·x+ky·y)), of course, which is what you see in the diagram above, except that the wave is moving in the negative y direction and, hence, we’ve got a + sign in front of our ky·y term. All the rest is rather well explained in Feynman, so I’ll refer you to the textbook here.
We basically need to ensure that we have a nodal line at x = 0 and at x = a, and then we do the same for y = 0 and y = b. Then we apply exactly the same logic as for the one-dimensional string: the wave needs to be coherently reflected. The analysis is somewhat more complicated because it involves some angle of incidence now, i.e. the θ in the diagram above, so that’s another page in Feynman’s textbook. And then we have the same gymnastics for finding wavelengths in terms of the dimensions a and b, as well as in terms of n and m, where n is the number of the mode involved when fixing the nodal lines at x = 0 and x = a, and m is the number of the mode involved when fixing the nodal lines at y = 0 and y = b. Sounds difficult? Well… Yes. But I won’t copy Feynman here. Just go and check for yourself.
The grand result is that we do get some formula for a wavelength λ of what satisfies the definition of a mode: a perfectly sinusoidal motion, that has all points on the drum move at the same frequency, though some move more than others. Also, as evidenced from my illustration for the circular disk: we’ve got nodal lines, and then I mean other nodal lines, different from the edges! I’ll just give you that formula here (again, for the detail, go and check Feynman yourself):
Feynman also works out an example for a = 2b. I’ll just copy the results hereunder, which is a formula for the (angular) frequencies ω, and a table of the mode shapes in a qualitative way (I’ll leave it to you to google animations that match the illustration).
Again, we should note the amazing simplicity of the result: we don’t care about the type of membrane or whatever other material the drum is made of. Its proportions are all that matter.
Finally, you should also note the last two columns in the table above: these serve to illustrate that, unlike our modes in the one-dimensional case, the natural frequencies here are not multiples of the fundamental frequency. As Feynman notes, we should not be led astray by the example of the one-dimensional ideal string. It’s again a departure from the Pythagorean idea that all in Nature respects harmonic ratios. It’s just not true. Let me quote Feynman, as I have no better summary: “The idea that the natural frequencies are harmonically related is not generally true. It is not true for a system with more than one dimension, nor is it true for one-dimensional systems which are more complicated than a string with uniform density and tension.”
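To make the point about non-harmonic frequencies concrete, here is a sketch that computes frequency ratios for a rectangular drum with a = 2b. Mind the assumption: I use the standard textbook result for a membrane with fixed edges, ω = π·c·√((n/a)² + (m/b)²), which follows from kx = n·π/a and ky = m·π/b; the function names and the list of modes are mine, so do check them against Feynman’s own formula and table.

```python
import math


def relative_frequency(n, m, a=2.0, b=1.0):
    """ω/(π·c) for mode (n, m) of an a-by-b membrane with fixed edges (assumed formula)."""
    return math.sqrt((n / a) ** 2 + (m / b) ** 2)


def frequency_ratios(modes, a=2.0, b=1.0):
    """Return (n, m, ω/ω0) for each mode, with ω0 the frequency of the (1, 1) mode."""
    fundamental = relative_frequency(1, 1, a, b)
    return [(n, m, relative_frequency(n, m, a, b) / fundamental) for n, m in modes]


if __name__ == "__main__":
    for n, m, ratio in frequency_ratios([(1, 1), (2, 1), (3, 1), (1, 2), (2, 2)]):
        print(f"mode (n = {n}, m = {m}): ω/ω0 = {ratio:.2f}")
```

Only the (2, 2) mode lands on a whole multiple of the fundamental here; the others do not, and the wave speed c (so the membrane material and tension) drops out of the ratios entirely.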
So… That says it all, I’d guess. Maybe I should just quote his example of a one-dimensional system that does not obey Pythagoras’ prescription: a hanging chain which, because of the weight of the chain, has higher tension at the top than at the bottom. If such a chain is set in oscillation, there are various modes and frequencies, but the frequencies will not be simply multiples of each other, nor of any other number. It is also interesting to note that the mode shapes will also not be sinusoidal. However, here we’re getting into non-linear dynamics, and so I’ll let you read about that elsewhere too: once again, Feynman’s analysis of non-linear systems is very accessible and an interesting read. Hence, I warmly recommend it.
Modes in three dimensions and in quantum mechanics
Well… Unlike what you might expect, I won’t bury you under formulas this time. Let me refer you, instead, to Wikipedia’s article on the so-called Leidenfrost effect. Just do it. Don’t bother too much about the text, scroll down a bit, and play the video that comes with it. I saw it, sort of by accident, and, at first, I thought it was something very high-tech. But no: it’s just a drop of water skittering around in a hot pan. It takes on all kinds of weird forms and oscillates in the weirdest of ways, but all is nothing but an excitation of the various normal modes of it, with various amplitudes and phases, of course, as a Fourier analysis of the phenomenon dictates.
There’s plenty of other stuff around to satisfy your curiosity, all quite understandable and fun—because you now understand the basics of it for the one- and two-dimensional case.
So… Well… I’ve kept this section extremely short, because now I want to say a few words about quantum-mechanical systems. Well… In fact, I’ll simply quote Feynman on it, because he writes about it in a style that’s unsurpassed. He also nicely sums up the previous conversation. Here we go:
The ideas discussed above are all aspects of what is probably the most general and wonderful principle of mathematical physics. If we have a linear system whose character is independent of the time, then the motion does not have to have any particular simplicity, and in fact may be exceedingly complex, but there are very special motions, usually a series of special motions, in which the whole pattern of motion varies exponentially with the time. For the vibrating systems that we are talking about now, the exponential is imaginary, and instead of saying “exponentially” we might prefer to say “sinusoidally” with time. However, one can be more general and say that the motions will vary exponentially with the time in very special modes, with very special shapes. The most general motion of the system can always be represented as a superposition of motions involving each of the different exponentials.
This is worth stating again for the case of sinusoidal motion: a linear system need not be moving in a purely sinusoidal motion, i.e., at a definite single frequency, but no matter how it does move, this motion can be represented as a superposition of pure sinusoidal motions. The frequency of each of these motions is a characteristic of the system, and the pattern or waveform of each motion is also a characteristic of the system. The general motion in any such system can be characterized by giving the strength and the phase of each of these modes, and adding them all together. Another way of saying this is that any linear vibrating system is equivalent to a set of independent harmonic oscillators, with the natural frequencies corresponding to the modes.
In quantum mechanics the vibrating object, or the thing that varies in space, is the amplitude of a probability function that gives the probability of finding an electron, or system of electrons, in a given configuration. This amplitude function can vary in space and time, and satisfies, in fact, a linear equation. But in quantum mechanics there is a transformation, in that what we call frequency of the probability amplitude is equal, in the classical idea, to energy. Therefore we can translate the principle stated above to this case by taking the word frequency and replacing it with energy. It becomes something like this: a quantum-mechanical system, for example an atom, need not have a definite energy, just as a simple mechanical system does not have to have a definite frequency; but no matter how the system behaves, its behavior can always be represented as a superposition of states of definite energy. The energy of each state is a characteristic of the atom, and so is the pattern of amplitude which determines the probability of finding particles in different places. The general motion can be described by giving the amplitude of each of these different energy states. This is the origin of energy levels in quantum mechanics. Since quantum mechanics is represented by waves, in the circumstance in which the electron does not have enough energy to ultimately escape from the proton, they are confined waves. Like the confined waves of a string, there are definite frequencies for the solution of the wave equation for quantum mechanics. The quantum-mechanical interpretation is that these are definite energies. Therefore a quantum-mechanical system, because it is represented by waves, can have definite states of fixed energy; examples are the energy levels of various atoms.
Isn’t that great? What a summary! It also shows a deeper understanding of classical physics makes it sooooo much better to read something about quantum mechanics. In any case, as for the examples, I should add – because that’s what you’ll often find when you google for quantum-mechanical modes – the vibrational modes of molecules. There’s tons of interesting analysis out there, and so I’ll let you now have fun with it yourself! 🙂
This is my third and final comment on Feynman’s popular little booklet: The Strange Theory of Light and Matter, also known as Feynman’s Lectures on Quantum Electrodynamics (QED).
The origin of this short lecture series is quite moving: the death of Alix G. Mautner, a good friend of Feynman’s. She was always curious about physics but her career was in English literature and so she did not manage the math. Hence, Feynman introduces this 1985 publication by writing: “Here are the lectures I really prepared for Alix, but unfortunately I can’t tell them to her directly, now.”
Alix Mautner died from a brain tumor, and it is her husband, Leonard Mautner, who sponsored the QED lecture series at UCLA, which Ralph Leighton transcribed and published as the booklet that we’re talking about here. Feynman himself died a few years later, at the relatively young age of 69. Tragic coincidence: he died of cancer too. Despite all this weirdness, Feynman’s QED never quite got the same iconic status as, let’s say, Stephen Hawking’s Brief History of Time. I wonder why, but the answer to that question is probably in the realm of chaos theory. 🙂 I actually just saw the movie on Stephen Hawking’s life (The Theory of Everything), and I noted another strange coincidence: Jane Wilde, Hawking’s first wife, also has a PhD in literature. It strikes me that, while the movie documents that Jane Wilde gave Hawking three children, after which he divorced her to marry his nurse, Elaine, the movie does not mention that he separated from Elaine too, and that he has some kind of ‘working relationship’ with Jane again.
Hmm… What to say? I should get back to quantum mechanics here or, to be precise, to quantum electrodynamics.
One reason why Feynman’s Strange Theory of Light and Matter did not sell like Hawking’s Brief History of Time might well be that, in some places, the text is not entirely accurate. Why? Who knows? It would make for an interesting PhD thesis in History of Science. Unfortunately, I have no time for such a PhD thesis. Hence, I must assume that Richard Feynman simply didn’t have much time or energy left to correct some of the writing of Ralph Leighton, who transcribed and edited these four short lectures a few years before Feynman’s death. Indeed, when everything is said and done, Ralph Leighton is not a physicist and, hence, I think he did compromise – just a little bit – on accuracy for the sake of readability. Ralph Leighton’s father, Robert Leighton, an eminent physicist who worked with Feynman, would probably have done a much better job.
I feel that one should not compromise on accuracy, even when trying to write something reader-friendly. That’s why I am writing this blog, and why I am writing three posts specifically on this little booklet. Indeed, while I’d warmly recommend that little book on QED as an excellent non-mathematical introduction to the weird world of quantum mechanics, I’d also say that, while Ralph Leighton’s story is great, it’s also, in some places, not entirely accurate indeed.
So… Well… I want to do better than Ralph Leighton here. Nothing more. Nothing less. 🙂 Let’s go for it.
I. Probability amplitudes: what are they?
The greatest achievement of that little QED publication is that it manages to avoid any reference to wave functions and other complicated mathematical constructs: all of the complexity of quantum mechanics is reduced to three basic events or actions and, hence, three basic amplitudes which are represented as ‘arrows’—literally.
Now… Well… You may or may not know that a (probability) amplitude is actually a complex number, but it’s not so easy to intuitively understand the concept of a complex number. In contrast, everyone easily ‘gets’ the concept of an ‘arrow’. Hence, from a pedagogical point of view, representing complex numbers by some ‘arrow’ is truly a stroke of genius.
Whatever we call it, a complex number or an ‘arrow’, a probability amplitude is something with (a) a magnitude and (b) a phase. As such, it resembles a vector, but it’s not quite the same, if only because we’ll impose some restrictions on the magnitude. But I shouldn’t get ahead of myself. Let’s start with the basics.
A magnitude is some real positive number, like a length, but you should not associate it with some spatial dimension in physical space: it’s just a number. As for the phase, we could associate that concept with some direction but, again, you should just think of it as a direction in a mathematical space, not in the real (physical) space.
Let me insert a parenthesis here. If I say the ‘real’ or ‘physical’ space, I mean the space in which the electrons and photons and all other real-life objects that we’re looking at exist and move. That’s a non-mathematical definition. In fact, in math, the real space is defined as a coordinate space, with sets of real numbers (vectors) as coordinates, so… Well… That’s a mathematical space only, not the ‘real’ (physical) space. So the real (vector) space is not real. 🙂 The mathematical real space may, or may not, accurately describe the real (physical) space. Indeed, you may have heard that physical space is curved because of the presence of massive objects, which means that the real coordinate space will actually not describe it very accurately. I know that’s a bit confusing but I hope you understand what I mean: if mathematicians talk about the real space, they do not mean the real space. They refer to a vector space, i.e. a mathematical construct. To avoid confusion, I’ll use the term ‘physical space’ rather than ‘real’ space in the future. So I’ll let the mathematicians get away with using the term ‘real space’ for something that isn’t real actually. 🙂
End of digression. Let’s discuss these two mathematical concepts – magnitude and phase – somewhat more in detail.
A. The magnitude
Let’s start with the magnitude or ‘length’ of our arrow. We know that we have to square these lengths to find some probability, i.e. some real number between 0 and 1. Hence, the length of our arrows cannot be larger than one. That’s the restriction I mentioned already, and this ‘normalization’ condition reinforces the point that these ‘arrows’ do not have any spatial dimension (not in any real space anyway): they represent a function. To be specific, they represent a wavefunction.
If we’d be talking complex numbers instead of ‘arrows’, we’d say the absolute value of the complex number cannot be larger than one. We’d also say that, to find the probability, we should take the absolute square of the complex number, so that’s the square of the magnitude or absolute value of the complex number indeed. We cannot just square the complex number: it has to be the square of the absolute value.
Why? Well… Just write it out. [You can skip this section if you’re not interested in complex numbers, but I would recommend you try to understand. It’s not that difficult. Indeed, if you’re reading this, you’re most likely to understand something of complex numbers and, hence, you should be able to work your way through it. Just remember that a complex number is like a two-dimensional number, which is why it’s sometimes written using bold-face (z), rather than regular font (z). However, I should immediately add this convention is usually not followed. I like the boldface though, and so I’ll try to use it in this post.] The square of a complex number z = a + bi is equal to z² = a² + 2abi – b², while the square of its absolute value (i.e. the absolute square) is |z|² = [√(a² + b²)]² = a² + b². So you can immediately see that the square and the absolute square of a complex number are two very different things indeed: it’s not only the 2abi term, but there’s also the minus sign in the first expression, because of the i² = –1 factor. In case of doubt, always remember that the square of a complex number may actually yield a negative number, as evidenced by the definition of the imaginary unit itself: i² = –1.
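If you’d rather let a computer write it out, here is a minimal sketch in Python, which has complex numbers built in. The values a = 3 and b = 4 are arbitrary example values of mine, nothing more.

```python
# Square versus absolute square of a complex number z = a + bi.
z = complex(3, 4)            # a = 3, b = 4 (arbitrary example values)

square = z * z               # a² – b² + 2abi: still a complex number
abs_square = abs(z) ** 2     # a² + b²: a real, non-negative number

print(square)                # (-7+24j)
print(abs_square)            # 25.0

# The absolute square is also z times its complex conjugate.
conjugate_product = (z * z.conjugate()).real

# And the square of the imaginary unit is indeed -1.
i_squared = 1j * 1j
```

The square has a negative real part here, while the absolute square is just the squared length of the ‘arrow’. Only the latter can serve as a probability.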
End of digression. Feynman and Leighton manage to avoid any reference to complex numbers in that short series of four lectures and, hence, all they need to do is explain how one squares a length. Kids learn how to do that when making a square out of rectangular paper: they’ll fold one corner of the paper until it meets the opposite edge, forming a triangle first. They’ll then cut or tear off the extra paper, and then unfold. Done. [I could note that the folding is a 90 degree rotation of the original length (or width, I should say) which, in mathematical terms, is equivalent to multiplying that length with the imaginary unit (i). But I am sure the kids involved would think I am crazy if I’d say this. 🙂] So let me get back to Feynman’s arrows.
B. The phase
Feynman and Leighton’s second pedagogical stroke of genius is the metaphor of the ‘stopwatch’ and the ‘stopwatch hand’ for the variable phase. Indeed, although I think it’s worth explaining why z = a + bi = r·cosφ + i·r·sinφ in the illustration below can be written as z = r·e^(iφ) = |z|·e^(iφ), understanding Euler’s representation of a complex number as a complex exponential requires swallowing a very substantial piece of math. If you’d want to do that, I’ll refer you to one of my posts on complex numbers.
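You don’t need to swallow all of the math to see that the two ways of writing the ‘arrow’ agree. The sketch below uses Python’s standard cmath module. The magnitude r = 2 and the phase φ = π/3 are example values I picked, not numbers from the booklet.

```python
import cmath
import math

r, phi = 2.0, math.pi / 3    # arbitrary magnitude and phase (example values)

# The 'arrow' written out in its real and imaginary components...
z_cartesian = complex(r * math.cos(phi), r * math.sin(phi))

# ...and the same 'arrow' written as a complex exponential (Euler).
z_euler = r * cmath.exp(1j * phi)

print(cmath.isclose(z_cartesian, z_euler))   # True

# Going back: recover the length and the stopwatch-hand angle.
magnitude, phase = cmath.polar(z_euler)
```

So the length of the arrow is the magnitude r, and the direction of the stopwatch hand is the phase φ.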
The metaphor of the stopwatch represents a periodic function. To be precise, it represents a sinusoid, i.e. a smooth repetitive oscillation. Now, the stopwatch hand represents the phase of that function, i.e. the φ angle in the illustration above. That angle is a function of time: the speed with which the stopwatch turns is related to some frequency, i.e. the number of oscillations per unit of time (i.e. per second).
You should now wonder: what frequency? What oscillations are we talking about here? Well… As we’re talking photons and electrons here, we should distinguish the two:
- For photons, the frequency is given by Planck’s energy-frequency relation, which relates the energy (E) of a photon (1.5 to 3.5 eV for visible light) to its frequency (ν). It’s a simple proportional relation, with Planck’s constant (h) as the proportionality constant: E = hν, or ν = E/h.
- For electrons, we have the de Broglie relation, which looks similar to the Planck relation (E = hf, or f = E/h) but, as you know, it’s something different. Indeed, these so-called matter waves are not so easy to interpret because there actually is no precise frequency f. In fact, the matter wave representing some particle in space will consist of a potentially infinite number of waves, all superimposed one over another, as illustrated below.
For the sake of accuracy, I should mention that the animation above has its limitations: the wavetrain is complex-valued and, hence, has a real as well as an imaginary part, so it’s something like the blob underneath. Two functions in one, so to speak: the imaginary part follows the real part with a phase difference of 90 degrees (or π/2 radians). Indeed, if the wavefunction is a regular complex exponential r·e^(iφ) = r·cos(φ) + i·r·sin(φ), then the imaginary part is r·sin(φ) = r·cos(φ – π/2), which proves the point: we have two functions in one here. 🙂 I am actually just repeating what I said before already: the probability amplitude, or the wavefunction, is a complex number. You’ll usually see it written as Ψ (psi) or Φ (phi). Here also, using boldface (Ψ or Φ instead of Ψ or Φ) would usefully remind the reader that we’re talking something ‘two-dimensional’ (in mathematical space, that is), but this convention is usually not followed.
In any case… Back to frequencies. The point to note is that, when it comes to analyzing electrons (or any other matter-particle), we’re dealing with a range of frequencies f really (or, what amounts to the same, a range of wavelengths λ) and, hence, we should write Δf = ΔE/h, which is just one of the many expressions of the Uncertainty Principle in quantum mechanics.
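To put some numbers on the Planck relation ν = E/h for photons, here is a small sketch. The constant values are the usual published ones, and the two energies are just the 1.5 and 3.5 eV bounds mentioned for visible light above. The function names are mine.

```python
H_EV = 4.135667696e-15    # Planck's constant in eV·s
C = 299_792_458.0         # speed of light in m/s

def photon_frequency(energy_ev):
    """Frequency (Hz) of a photon with the given energy (eV): nu = E/h."""
    return energy_ev / H_EV

def photon_wavelength(energy_ev):
    """Wavelength (m) of that photon: lambda = c/nu."""
    return C / photon_frequency(energy_ev)

for energy in (1.5, 3.5):
    nu = photon_frequency(energy)
    lam_nm = photon_wavelength(energy) * 1e9
    print(f"E = {energy} eV: nu = {nu:.3e} Hz, lambda = {lam_nm:.0f} nm")
```

That gives frequencies of a few hundred terahertz and wavelengths of roughly 830 nm and 350 nm respectively, i.e. just outside both ends of the visible spectrum.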
Now, that’s just one of the complications. Another difficulty is that matter-particles, such as electrons, have some rest mass, and so that enters the energy equation as well (literally). Last but not least, one should distinguish between the group velocity and the phase velocity of matter waves. As you can imagine, that makes for a very complicated relationship between ‘the’ wavelength and ‘the’ frequency. In fact, what I write above should make it abundantly clear that there’s no such thing as the wavelength, or the frequency: it’s a range really, related to the fundamental uncertainty in quantum physics. I’ll come back to that, and so you shouldn’t worry about it here. Just note that the stopwatch metaphor doesn’t work very well for an electron!
In his postmortem lectures for Alix Mautner, Feynman avoids all these complications. Frankly, I think that’s a missed opportunity because I do not think it’s all that incomprehensible. In fact, I write all that follows because I do want you to understand the basics of waves. It’s not difficult. High-school math is enough here. Let’s go for it.
One turn of the stopwatch corresponds to one cycle. One cycle covers 360 degrees or, to use a more natural unit, 2π radians, and a frequency of 1 Hz means one such cycle per second. [Why is radian a more natural unit? Because it measures an angle in terms of the distance unit itself, rather than in arbitrary 1/360 cuts of a full circle. Indeed, remember that the circumference of the unit circle is 2π.] So our frequency ν (expressed in cycles per second) corresponds to a so-called angular frequency ω = 2πν. From this formula, it should be obvious that ω is measured in radians per second.
We can also link this formula to the period of the oscillation, T, i.e. the duration of one cycle. T = 1/ν and, hence, ω = 2π/T. It’s all nicely illustrated below. [And, yes, it’s an animation from Wikipedia: nice and simple.]
The easy math above now allows us to formally write the phase of a wavefunction – let’s denote the wavefunction as φ (phi), and the phase as θ (theta) – as a function of time (t) using the angular frequency ω. So we can write: θ = ωt = 2π·ν·t. Now, the wave travels through space, and the two illustrations above (i.e. the one with the super-imposed waves, and the one with the complex wave train) would usually represent a wave shape at some fixed point in time. Hence, the horizontal axis is not t but x. Hence, we can and should write the phase not only as a function of time but also of space. So how do we do that? Well… If the hypothesis is that the wave travels through space at some fixed speed c, then its frequency ν will also determine its wavelength λ. It’s a simple relationship: c = λν (the number of oscillations per second times the length of one wavelength should give you the distance traveled per second, so that’s, effectively, the wave’s speed).
Now that we’ve expressed the frequency in radians per second, we can express the wavelength in radians per unit distance too. That’s what the wavenumber does: think of it as the spatial frequency of the wave. We denote the wavenumber by k, and write: k = 2π/λ. [Just do a numerical example when you have difficulty following. For example, if you’d assume the wavelength is 5 units distance (i.e. 5 meter) – that’s a typical VHF radio frequency: ν = (3×10⁸ m/s)/(5 m) = 0.6×10⁸ Hz = 60 MHz – then that would correspond to (2π radians)/(5 m) ≈ 1.2566 radians per meter. Of course, we can also express the wave number in oscillations per unit distance. In that case, we’d have to divide k by 2π, because one cycle corresponds to 2π radians. So we get the reciprocal of the wavelength: 1/λ. In our example, 1/λ is, of course, 1/5 = 0.2, so that’s a fifth of a full cycle per meter. You can also think of it as the number of waves (or wavelengths) per meter: if the wavelength is λ, then one can fit 1/λ waves in a meter.]
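The numerical example can be written out in a few lines of Python. It uses the same rounded speed of light (3×10⁸ m/s) and the same 5 meter wavelength as the example in the text. The variable names are mine.

```python
import math

C = 3e8                  # speed of light in m/s, rounded as in the example
wavelength = 5.0         # in meter

nu = C / wavelength                 # frequency in Hz (60 MHz)
omega = 2 * math.pi * nu            # angular frequency in radians per second
period = 1 / nu                     # duration of one cycle in seconds
k = 2 * math.pi / wavelength        # wavenumber in radians per meter
cycles_per_meter = 1 / wavelength   # k divided by 2π

print(f"nu = {nu / 1e6:.0f} MHz, k = {k:.4f} rad/m")

# The ratio omega/k gives back the speed of the wave.
speed = omega / k
```

The last line is the c = ω/k relationship, which links the frequency in time to the frequency in space.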
Now, from the ω = 2πν, c = λν and k = 2π/λ relations, it’s obvious that k = 2π/λ = 2π/(c/ν) = (2πν)/c = ω/c. To sum it all up, frequencies and wavelengths, in time and in space, are all related through the speed of propagation of the wave c. More specifically, they’re related as follows:
c = λν = ω/k
From that, it’s easy to see that k = ω/c, which we’ll use in a moment. Now, it’s obvious that the periodicity of the wave implies that we can find the same phase by going one oscillation (or a whole number of oscillations) back or forward in time, or in space. In fact, we can also find the same phase by letting both time and space vary. However, if we want to do that, we have to travel along with the wave: going forward in time by Δt must be matched by going forward in space by Δx = cΔt. Put differently, adding a little time changes the phase in exactly the same way as subtracting a little distance. In other words, as far as the phase is concerned, time and space sort of substitute for each other. Let me quote Feynman on this: “This is easily seen by considering the mathematical behavior of a(t−r/c). Evidently, if we add a little time Δt, we get the same value for a(t−r/c) as we would have if we had subtracted a little distance: Δr = −cΔt.” The variable a stands for the acceleration of an electric charge here, causing an electromagnetic wave, but the same logic is valid for the phase, with a minor twist though: we’re talking a nice periodic function here, and so we need to put the angular frequency in front. Hence, the rate of change of the phase with respect to time is measured by the angular frequency ω. In short, we write:
θ = ω(t–x/c) = ωt–kx
Hence, we can re-write the wavefunction, in terms of its phase, as follows:
φ(θ) = φ[θ(x, t)] = φ[ωt–kx]
Note that, if the wave would be traveling in the ‘other’ direction (i.e. in the negative x-direction), we’d write φ(θ) = φ[kx+ωt]. Time travels in one direction only, of course, but so one minus sign has to be there because of the logic involved in adding time and subtracting distance. You can work out an example (with a sine or cosine wave, for example) for yourself.
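Here is such an example, as a minimal sketch. The 60 MHz frequency and the point (x0, t0) are arbitrary values I chose. The function names are mine too.

```python
import math

C = 299_792_458.0                 # speed of light in m/s

def phase(x, t, omega):
    """Phase of a wave traveling in the positive x-direction: ωt – kx."""
    k = omega / C                 # wavenumber, from k = ω/c
    return omega * t - k * x

def wave(x, t, omega):
    """A simple cosine wave with that phase."""
    return math.cos(phase(x, t, omega))

omega = 2 * math.pi * 60e6        # a 60 MHz wave (example value)
x0, t0 = 1.0, 2e-9                # some point in space and time
dt = 1e-9                         # a little extra time

# Traveling along with the wave (forward in time, forward in space by c·dt)
# leaves the phase unchanged.
riding_along = phase(x0 + C * dt, t0 + dt, omega)

# Feynman's remark: adding a little time has the same effect
# as subtracting a little distance c·dt.
added_time = phase(x0, t0 + dt, omega)
subtracted_distance = phase(x0 - C * dt, t0, omega)
```

For a wave traveling in the negative x-direction, you’d replace the minus sign in the phase by a plus sign, and the ‘riding along’ then means going backward in space as time goes forward.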
So what, you’ll say? Well… Nothing. I just hope you agree that all of this isn’t rocket science: it’s just high-school math. But so it shows you what that stopwatch really is and, hence, I – but who am I? – would have put at least one or two footnotes on this in a text like Feynman’s QED.
Now, let me make a much longer and more serious digression:
Digression 1: on relativity and spacetime
As you can see from the argument (or phase) of that wave function φ(θ) = φ[θ(x, t)] = φ[ωt–kx] = φ[–k(x–ct)], any wave equation establishes a deep relation between the wave itself (i.e. the ‘thing’ we’re describing) and space and time. In fact, that’s what the whole wave equation is all about! So let me say a few things more about that.
Because you know a thing or two about physics, you may ask: when we’re talking time, whose time are we talking about? Indeed, if we’re talking photons going from A to B, these photons will be traveling at or near the speed of light and, hence, their clock, as seen from our (inertial) frame of reference, doesn’t move. Likewise, according to the photon, our clock seems to be standing still.
Let me put the issue to bed immediately: we’re looking at things from our point of view. Hence, we’re obviously using our clock, not theirs. Having said that, the analysis is actually fully consistent with relativity theory. Why? Well… What do you expect? If it wasn’t, the analysis would obviously not be valid. 🙂 To illustrate that it’s consistent with relativity theory, I can mention, for example, that the (probability) amplitude for a photon to travel from point A to B depends on the spacetime interval, which is invariant. Hence, A and B are four-dimensional points in spacetime, involving both spatial as well as time coordinates: A = (xA, yA, zA, tA) and B = (xB, yB, zB, tB). And so the ‘distance’ – as measured through the spacetime interval – is invariant.
Now, having said that, we should draw some attention to the intimate relationship between space and time which, let me remind you, results from the absoluteness of the speed of light. Indeed, one will always measure the speed of light c as being equal to 299,792,458 m/s, always and everywhere. It does not depend on your reference frame (inertial or moving). That’s why the constant c anchors all laws in physics, and why we can write what we write above, i.e. include both distance (x) as well as time (t) in the wave function φ = φ(x, t) = φ[ωt–kx] = φ[–k(x–ct)]. The k and ω are related through the ω/k = c relationship: the speed of light links the frequency in time (ν = ω/2π = 1/T) with the frequency in space (i.e. the wavenumber or spatial frequency k). There is only one degree of freedom here: the frequency – in space or in time, it doesn’t matter: k and ω are not independent. [As noted above, the relationship between the frequency in time and in space is not so obvious for electrons, or for matter waves in general: for those matter-waves, we need to distinguish group and phase velocity, and so we don’t have a unique frequency.]
Let me make another small digression within the digression here. Thinking about travel at the speed of light invariably leads to paradoxes. In previous posts, I explained the mechanism of light emission: a photon is emitted – one photon only – when an electron jumps back to its ground state after being excited. Hence, we may imagine a photon as a transient electromagnetic wave – something like what’s pictured below. Now, the decay time of this transient oscillation (τ) is measured in nanoseconds, i.e. billionths of a second (1 ns = 1×10⁻⁹ s): the decay time for sodium light, for example, is some 30 ns only.
However, because of the tremendous speed of light, that still makes for a wavetrain that’s like ten meter long, at least (30×10⁻⁹ s times 3×10⁸ m/s is nine meter, but you should note that the decay time measures the time for the oscillation to die out by a factor 1/e, so the oscillation itself lasts longer than that). Those nine or ten meters cover like 15 to 17 million oscillations (the wavelength of sodium light is about 600 nm and, hence, 10 meter fits almost 17 million oscillations indeed). Now, how can we reconcile the image of a photon as a ten-meter long wavetrain with the image of a photon as a point particle?
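The arithmetic is easy to check. The sketch below uses the same rounded numbers as the text: a 30 ns decay time, a speed of light of 3×10⁸ m/s and a wavelength of 600 nm.

```python
C = 3e8                    # speed of light in m/s (rounded)
decay_time = 30e-9         # decay time for sodium light, in seconds
wavelength = 600e-9        # wavelength of sodium light, in meter (rounded)

# Length of the wavetrain that is emitted during one decay time.
train_length = C * decay_time                  # about nine meter

# Number of oscillations that fit in nine and in ten meter.
cycles_in_9m = train_length / wavelength       # about 15 million
cycles_in_10m = 10.0 / wavelength              # almost 17 million

print(f"{train_length:.1f} m, {cycles_in_9m / 1e6:.1f} to "
      f"{cycles_in_10m / 1e6:.1f} million oscillations")
```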
The answer to that question is paradoxical: from our perspective, anything traveling at the speed of light – including this nine or ten meter ‘long’ photon – will have zero length because of the relativistic length contraction effect. Length contraction? Yes. I’ll let you look it up, because… Well… It’s not easy to grasp. Indeed, of the three measurable effects on objects moving at relativistic speeds – i.e. (1) an increase of the mass (the energy needed to further accelerate particles in particle accelerators increases dramatically at speeds nearer to c), (2) time dilation, i.e. a slowing down of the (internal) clock (because of their relativistic speeds when entering the Earth’s atmosphere, the measured half-life of muons is five times that when at rest), and (3) length contraction – length contraction is probably the most paradoxical of all.
Let me end this digression with yet another short note. I said that one will always measure the speed of light c as being equal to 299,792,458 m/s, always and everywhere and, hence, that it does not depend on your reference frame (inertial or moving). Well… That’s true and not true at the same time. I actually need to nuance that statement a bit in light of what follows: an individual photon does have an amplitude to travel faster or slower than c, and when discussing matter waves (such as the wavefunction that’s associated with an electron), we can have phase velocities that are faster than light! However, when calculating those amplitudes, c is a constant.
That doesn’t make sense, you’ll say. Well… What can I say? That’s how it is unfortunately. I need to move on and, hence, I’ll end this digression and get back to the main story line. Part I explained what probability amplitudes are—or at least tried to do so. Now it’s time for part II: the building blocks of all of quantum electrodynamics (QED).
II. The building blocks: P(A to B), E(A to B) and j
The three basic ‘events’ (and, hence, amplitudes) in QED are the following:
1. P(A to B)
P(A to B) is the (probability) amplitude for a photon to travel from point A to B. However, I should immediately note that A and B are points in spacetime. Therefore, we associate them not only with some specific (x, y, z) position in space, but also with some specific time t. Now, quantum-mechanical theory gives us an easy formula for P(A to B): it depends on the so-called (spacetime) interval between the two points A and B, i.e. I = Δr² – Δt² = (x2–x1)² + (y2–y1)² + (z2–z1)² – (t2–t1)². The point to note is that the spacetime interval takes both the distance in space as well as the ‘distance’ in time into account. As I mentioned already, this spacetime interval does not depend on our reference frame and, hence, it’s invariant (as long as we’re talking reference frames that move with constant speed relative to each other). Also note that we should measure time and distance in equivalent units when using that Δr² – Δt² formula for I. So we either measure distance in light-seconds or, else, we measure time in units that correspond to the time that’s needed for light to travel one meter. If no equivalent units are adopted, the formula is I = Δr² – c²·Δt².
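If you want to see that invariance at work, here is a minimal sketch. It uses equivalent units (c = 1) and the standard Lorentz transformation for a reference frame moving along the x-axis. The separations and the relative speed (0.6c) are example values of mine, and so are the function names.

```python
import math

def interval(dx, dy, dz, dt, c=1.0):
    """Spacetime interval I = Δr² – c²·Δt² between two events."""
    return dx**2 + dy**2 + dz**2 - (c * dt)**2

def boost_x(dx, dt, v, c=1.0):
    """Lorentz transformation of (Δx, Δt) to a frame moving at speed v along x."""
    gamma = 1.0 / math.sqrt(1.0 - (v / c)**2)
    dx_new = gamma * (dx - v * dt)
    dt_new = gamma * (dt - v * dx / c**2)
    return dx_new, dt_new

# Separation between A and B as we measure it (example values, c = 1).
dx, dy, dz, dt = 3.0, 1.0, 2.0, 4.0
interval_ours = interval(dx, dy, dz, dt)

# The same separation as measured in a frame moving at 0.6c along x.
dx_moving, dt_moving = boost_x(dx, dt, v=0.6)
interval_theirs = interval(dx_moving, dy, dz, dt_moving)

print(interval_ours, interval_theirs)
```

The individual Δx and Δt are different in the two frames, but the interval comes out the same (–2 in this example, up to rounding).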
Now, in quantum theory, anything is possible and, hence, not only do we allow for crooked paths, but we also allow for the difference in time to differ from the time you’d expect a photon to need to travel along some curve (whose length we’ll denote by l), i.e. l/c. Hence, our photon may actually travel slower or faster than the speed of light c! There is one lucky break, however, that makes all come out alright: it’s easy to show that the amplitudes associated with the odd paths and strange timings generally cancel each other out. [That’s what the QED booklet shows.] Hence, what remains are the paths that are equal or, importantly, very near to the so-called ‘light-like’ intervals in spacetime. The net result is that light – even one single photon – effectively uses a (very) small core of space as it travels, as evidenced by the fact that even one single photon interferes with itself when traveling through a slit or a small hole!
[If you now wonder what it means for a photon to interfere with itself, let me just give you the easy explanation: it may change its path. We assume it was traveling in a straight line – if only because it left the source at some point in time and then arrived at the slit obviously – but so it no longer travels in a straight line after going through the slit. So that’s what we mean here.]
2. E(A to B)
E(A to B) is the (probability) amplitude for an electron to travel from point A to B. The formula for E(A to B) is much more complicated, and it’s the one I want to discuss somewhat more in detail in this post. It depends on some complex number j (see the next remark) and some real number n.
3. j
Finally, an electron could emit or absorb a photon, and the amplitude associated with this event is denoted by j, for junction number. It’s the same number j as the one mentioned when discussing E(A to B) above.
Now, this junction number is often referred to as the coupling constant or the fine-structure constant. However, the truth is, as I pointed out in my previous post, that these numbers are related, but they are not quite the same: α is the square of j, so we have α = j². There is also one more, related, number: the gauge parameter, which is denoted by g (despite the g notation, it has nothing to do with gravitation). The value of g is the square root of 4πε0α, so g² = 4πε0α. I’ll come back to this. Let me first make an awfully long digression on the fine-structure constant. So long that it’s actually part of the ‘core’ of this post.
Digression 2: on the fine-structure constant, Planck units and the Bohr radius
The value for j is approximately –0.08542454.
How do we know that?
The easy answer to that question is: physicists measured it. In fact, they usually publish the measured value as the square of j, which is that fine-structure constant α. Its value is published (and updated) by the US National Institute of Standards and Technology. To be precise, the currently accepted value of α is 7.29735257×10⁻³. In case you doubt, just check that square root:
j = –0.08542454 ≈ –√0.00729735257 = –√α
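The check is a one-liner really. The sketch below takes the value of α quoted above as its only input.

```python
import math

ALPHA = 7.29735257e-3        # fine-structure constant, as quoted above

j = -math.sqrt(ALPHA)        # the junction number: j = –√α
alpha_again = j ** 2         # squaring j gives α back

print(f"j = {j:.8f}")
```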
As noted in Feynman’s (or Leighton’s) QED, older and/or more popular books will usually mention 1/α as the ‘magical’ number, so the ‘special’ number you may have seen is the inverse fine-structure constant, which is about 137, but not quite:
1/α = 137.035999074 ± 0.000000044
I am adding the standard uncertainty just to give you an idea of how precise these measurements are. 🙂 About 0.32 parts per billion (just divide the uncertainty by the 137.035999074 number). So that’s the number that excites popular writers, including Leighton. Indeed, as Leighton puts it:
“Where does this number come from? Nobody knows. It’s one of the greatest damn mysteries of physics: a magic number that comes to us with no understanding by man. You might say the “hand of God” wrote that number, and “we don’t know how He pushed his pencil.” We know what kind of a dance to do experimentally to measure this number very accurately, but we don’t know what kind of dance to do on the computer to make this number come out, without putting it in secretly!”
Is it Leighton, or did Feynman really say this? Not sure. While the fine-structure constant is a very special number, it’s not the only ‘special’ number. In fact, we derive it from other ‘magical’ numbers. To be specific, I’ll show you how we derive it from the fundamental properties – as measured, of course – of the electron. So, in fact, I should say that we do know how to make this number come out, which makes me doubt whether Feynman really said what Leighton said he said. 🙂
So we can derive α from some other numbers. That brings me to the more complicated answer to the question as to what the value of j really is: j‘s value is the electron charge expressed in Planck units, which I’ll denote by –eP:
j = –eP
[You may want to reflect on this, and quickly verify on the Web. The Planck unit of electric charge, expressed in coulomb, is about 1.87555×10⁻¹⁸ C. If you multiply that by j = –eP, so by –0.08542454, you get the right answer: the electron charge is about –0.160217×10⁻¹⁸ C.]
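You can also verify this without the Web. The sketch below assumes the usual definition of the Planck charge, i.e. the square root of 4πε0ħc, and uses published values for the constants. The variable names are mine.

```python
import math

EPS0 = 8.8541878128e-12      # electric constant ε0, in C²/(N·m²)
HBAR = 1.054571817e-34       # reduced Planck constant ħ, in J·s
C = 299_792_458.0            # speed of light, in m/s
E_CHARGE = 1.602176634e-19   # elementary charge e, in C

# Planck unit of charge (assumed definition: √(4πε0ħc)).
planck_charge = math.sqrt(4 * math.pi * EPS0 * HBAR * C)

# The electron charge expressed in Planck units, and its square.
e_planck = E_CHARGE / planck_charge
alpha = e_planck ** 2

print(f"Planck charge = {planck_charge:.5e} C")
print(f"eP = {e_planck:.8f}, eP squared = {alpha:.9f}")
```

So the (absolute value of the) electron charge in Planck units is about 0.0854, and its square is the fine-structure constant.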
Now that is strange.
Why? Well… For starters, when doing all those quantum-mechanical calculations, we like to think of j as a dimensionless number: a coupling constant. But so here we do have a dimension: electric charge.
Let’s look at the basics. If j is –√α, and it’s also equal to –eP, then the fine-structure constant must also be equal to the square of the electron charge eP, so we can write:
α = eP²
You’ll say: yes, so what? Well… I am pretty sure that, if you’ve ever seen a formula for α, it’s surely not this simple j = –eP or α = eP² formula. What you’ve seen, most likely, is one or more of the following expressions:
That’s a pretty impressive collection of physical constants, isn’t it? 🙂 They’re all different but, somehow, when we combine them in one or the other ratio, we get the very same number: α. We have no fewer than five different expressions here (each identity is a separate expression), and I could give you a few more! Now that is what I call strange. Truly strange. Incomprehensibly weird!
You’ll say… Well… Those constants must all be related… Of course! That’s exactly the point I am making here. They are, but look how different they are: me measures mass, re measures distance, e is a charge, and so these are all very different numbers with very different dimensions. Yet, somehow, they are all related through this α number. Frankly, I do not know of any other expression that better illustrates some kind of underlying unity in Nature than the one with those five identities above.
Let’s have a closer look at those constants. You know most of them already. The only constants you may not have seen before are μ0, RK and, perhaps, re as well as me . However, these can easily be defined as some easy function of the constants that you did see before, so let me quickly do that:
- The μ0 constant is the so-called magnetic constant. It’s similar to ε0 and it’s referred to as the magnetic permeability of the vacuum. So it’s just like the (electric) permittivity of the vacuum (i.e. the electric constant ε0) and the only reason why this blog hasn’t mentioned this constant before is that I haven’t really discussed magnetic fields so far. I only talked about the electric field vector. In any case, you know that the electric and magnetic force are part and parcel of the same phenomenon (i.e. the electromagnetic interaction between charged particles) and, hence, they are closely related. To be precise, μ0ε0 = 1/c² = c⁻². So that shows the first and second expression for α are, effectively, fully equivalent. [Just in case you’d doubt that μ0ε0 = 1/c², let me give you the values: μ0 = 4π·10⁻⁷ N/A², and ε0 = (1/(4π·c²))·10⁷ C²/(N·m²). Just plug them in, and you’ll see it’s bang on. Moreover, note that the ampere (A) unit is equal to the coulomb per second unit (C/s), so even the units come out alright. 🙂 Of course they do!]
- The ke constant is the Coulomb constant and, from its definition ke = 1/4πε0, it’s easy to see how those two expressions are, in turn, equivalent with the third expression for α.
- The RK constant is the so-called von Klitzing constant. Huh? Yes. I know. I am pretty sure you’ve never ever heard of that one before. Don’t worry about it. It’s, quite simply, equal to RK = h/e². Hence, substituting (and don’t forget that h = 2πħ) will demonstrate the equivalence of the fourth expression for α.
- Finally, the re factor is the classical electron radius, which is usually written as a function of me, i.e. the electron mass: re = e²/4πε0mec². Note that this implies that reme = e²/4πε0c². In words: the product of the electron mass and the electron radius is equal to some constant involving the electron charge (e), the electric constant (ε0), and c (the speed of light).
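If you’d rather let a computer do the plugging in, here is a minimal Python sketch of the equivalences in the list above. The numerical values are CODATA-style SI values that I am assuming here (they are not taken from this post), and the variable names and the exact form of each expression are just illustrative. Note that μ0, ke, RK and re are computed from their definitions, so this checks the algebra only, not five independent measurements.

```python
import math

# Assumed CODATA-style SI values (not from the post)
e = 1.602176634e-19        # elementary charge, C
h = 6.62607015e-34         # Planck constant, J*s
hbar = h / (2 * math.pi)   # reduced Planck constant, J*s
c = 299792458.0            # speed of light, m/s
eps0 = 8.8541878128e-12    # electric constant, C^2/(N*m^2)
m_e = 9.1093837015e-31     # electron mass, kg

# Constants defined in the list above
mu0 = 1 / (eps0 * c**2)                          # from mu0*eps0 = 1/c^2
k_e = 1 / (4 * math.pi * eps0)                   # Coulomb constant
R_K = h / e**2                                   # von Klitzing constant
r_e = e**2 / (4 * math.pi * eps0 * m_e * c**2)   # classical electron radius

# Five ways of writing alpha (assumed standard forms)
alphas = {
    "eps0": e**2 / (4 * math.pi * eps0 * hbar * c),
    "mu0": mu0 * e**2 * c / (4 * math.pi * hbar),
    "k_e": k_e * e**2 / (hbar * c),
    "R_K": mu0 * c / (2 * R_K),
    "r_e": r_e * m_e * c / hbar,
}

for name, value in alphas.items():
    print(f"{name:>4}: alpha = {value:.9f}   1/alpha = {1 / value:.4f}")
```

All five lines should print the same number, about 1/137.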
I am sure you’re under some kind of ‘formula shock’ now. But you should just take a deep breath and read on. The point to note is that all these very different things are all related through α.
So, again, what is that α really? Well… A strange number indeed. It’s dimensionless (so we don’t measure in kg, m/s, eV·s or whatever) and it pops up everywhere. [Of course, you’ll say: “What’s everywhere? This is the first time I‘ve heard of it!” :-)]
Well… Let me start by explaining the term itself. The fine structure in the name refers to the splitting of the spectral lines of atoms. That’s a very fine structure indeed. 🙂 We also have a so-called hyperfine structure. Both are illustrated below for the hydrogen atom. The numbers n, J, I, and F are quantum numbers used in the quantum-mechanical explanation of the emission spectrum, which is also depicted below, but note that the illustration gives you the so-called Balmer series only, i.e. the colors in the visible light spectrum (there are many more ‘colors’ in the high-energy ultraviolet and the low-energy infrared range).
To be precise: (1) n is the principal quantum number: here it takes the values 1 or 2, and we could say these are the principal shells; (2) the S, P, D,… orbitals (which are usually written in lower case: s, p, d, f, g, h and i) correspond to the (orbital) angular momentum quantum number l = 0, 1, 2,…, so we could say it’s the subshell; (3) the J values give the total angular momentum quantum number, which combines the orbital angular momentum l with the spin s (for a single electron, J = l ± 1/2); (4) the fourth quantum number is the spin angular momentum s. I’ve copied another diagram below so you see how it works, more or less, that is.
Now, our fine-structure constant is related to these quantum numbers. How exactly is a bit of a long story, and so I’ll just copy Wikipedia’s summary on this: “The gross structure of line spectra is the line spectra predicted by the quantum mechanics of non-relativistic electrons with no spin. For a hydrogenic atom, the gross structure energy levels only depend on the principal quantum number n. However, a more accurate model takes into account relativistic and spin effects, which break the degeneracy of the energy levels and split the spectral lines. The scale of the fine structure splitting relative to the gross structure energies is on the order of (Zα)², where Z is the atomic number and α is the fine-structure constant.” There you go. You’ll say: so what? Well… Nothing. If you aren’t amazed by that, you should stop reading this.
It is an ‘amazing’ number, indeed, and, hence, it does qualify for being “one of the greatest damn mysteries of physics”, as Feynman and/or Leighton put it. Having said that, I would not go as far as to write that it’s “a magic number that comes to us with no understanding by man.” In fact, I think Feynman/Leighton could have done a much better job when explaining what it’s all about. So, yes, I hope to do better than Leighton here and, as he’s still alive, I actually hope he reads this. 🙂
The point is: α is not the only weird number. What’s particular about it, as a physical constant, is that it’s dimensionless, because it relates a number of other physical constants in such a way that the units fall away. Having said that, the Planck or Boltzmann constant are at least as weird.
So… What is this all about? Well… You’ve probably heard about the so-called fine-tuning problem in physics and, if you’re like me, your first reaction will be to associate fine-tuning with fine-structure. However, the two terms have nothing in common, except for four letters. 🙂 OK. Well… I am exaggerating here. The two terms are actually related, to some extent at least, but let me explain how.
The term fine-tuning refers to the fact that all the parameters or constants in the so-called Standard Model of physics are, indeed, all related to each other in the way they are. We can’t sort of just turn the knob of one and change it, because everything falls apart then. So, in essence, the fine-tuning problem in physics is more like a philosophical question: why is the value of all these physical constants and parameters exactly what it is? So it’s like asking: could we change some of the ‘constants’ and still end up with the world we’re living in? Or, if it would be some different world, what would it look like? What if c was some other number? What if ke or ε0 was some other number? In short, and in light of those expressions for α, we may rephrase the question as: why is α what it is?
Of course, that’s a question one shouldn’t try to answer before answering some other, more fundamental, question: how many degrees of freedom are there really? Indeed, we just saw that ke and ε0 are intimately related through some equation, and other constants and parameters are related too. So the question is like: what are the ‘dependent’ and the ‘independent’ variables in this so-called Standard Model?
There is no easy answer to that question. In fact, one of the reasons why I find physics so fascinating is that one cannot easily answer such questions. There are the obvious relationships, of course. For example, the ke = 1/4πε0 relationship, and the context in which they are used (Coulomb’s Law) does, indeed, strongly suggest that both constants are actually part and parcel of the same thing. Identical, I’d say. Likewise, the μ0ε0 = 1/c2 relation also suggests there’s only one degree of freedom here, just like there’s only one degree of freedom in that ω/k = c relationship (if we set a value for ω, we have k, and vice versa). But… Well… I am not quite sure how to phrase this, but… What physical constants could be ‘variables’ indeed?
It’s pretty obvious that the various formulas for α cannot answer that question: you could stare at them for days and weeks and months and years really, but I’d suggest you use your time to read more of Feynman’s real Lectures instead. 🙂 One point that may help to come to terms with this question – to some extent, at least – is what I casually mentioned above already: the fine-structure constant is equal to the square of the electron charge expressed in Planck units: α = eP2.
Now, that’s very remarkable because Planck units are some kind of ‘natural units’ indeed (for the detail, see my previous post: among other things, it explains what these Planck units really are) and, therefore, it is quite tempting to think that we’ve actually got only one degree of freedom here: α itself. All the rest should follow from it.
It should… But… Does it?
The answer is: yes and no. To be frank, it’s more no than yes because, as I noted a couple of times already, the fine-structure constant relates a lot of stuff but it’s surely not the only significant number in the Universe. For starters, I said that our E(A to B) formula has two ‘variables’:
- We have that complex number j, which, as mentioned, is equal to the electron charge expressed in Planck units. [In case you wonder why –eP ≈ –0.08542455 is said to be an amplitude, i.e. a complex number or an ‘arrow’… Well… Complex numbers include the real numbers and, hence, –0.08542455 is both real and complex. When combining ‘arrows’ or, to be precise, when multiplying some complex number by –0.08542455, we will (a) shrink the original arrow to about 8.5% of its original length (8.542455% to be precise) and (b) rotate it by an angle of plus or minus 180 degrees. In other words, we’ll reverse its direction. Hence, using Euler’s notation for complex numbers, we can write: –1 = e^(iπ) = e^(–iπ) and, hence, –0.085 = 0.085·e^(iπ) = 0.085·e^(–iπ). So, in short, yes, j is a complex number, or an ‘arrow’, if you prefer that term.]
- We also have some real number n in the E(A to B) formula. So what’s the n? Well… Believe it or not, it’s the electron mass! Isn’t that amazing?
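To see the ‘shrink and turn’ effect of multiplying by j with your own eyes, here is a small Python sketch. The arrow I multiply (length 1, pointing at 30 degrees) is an arbitrary example of my own choosing, and the variable names are only illustrative.

```python
import cmath
import math

j = -0.08542455                             # value of -e_P as quoted above
arrow = cmath.rect(1.0, math.radians(30))   # hypothetical arrow: length 1, angle 30 degrees

product = j * arrow

# (a) the length shrinks to about 8.5% of the original
shrink = abs(product) / abs(arrow)

# (b) the direction turns by plus or minus 180 degrees
turn = math.degrees(cmath.phase(product) - cmath.phase(arrow))

# Euler's notation: -0.0854... = 0.0854... * e^(i*pi)
j_euler = abs(j) * cmath.exp(1j * math.pi)

print(f"shrink factor: {shrink:.8f}")
print(f"turn: {turn:.1f} degrees")
print(f"|j|*exp(i*pi) = {j_euler.real:.8f} (imaginary part ~ {j_euler.imag:.1e})")
```

The turn comes out as minus 180 degrees here simply because cmath.phase reports angles between –π and +π. Plus 180 degrees is the same reversal.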
You’ll say: “Well… Hmm… I suppose so.” But then you may – and actually should – also wonder: the electron mass? In what units? Planck units again? And are we talking relativistic mass (i.e. its total mass, including the equivalent mass of its kinetic energy) or its rest mass only? And we were talking α here, so can we relate it to α too, just like the electron charge?
These are all very good questions. Let’s start with the second one. We’re talking rather slow-moving electrons here, so the relativistic mass (m) and its rest mass (m0) are more or less the same. Indeed, the Lorentz factor γ in the m = γm0 equation is very close to 1 for electrons moving at their typical speed. So… Well… That question doesn’t matter very much. Really? Yes. OK. Because you’re doubting, I’ll quickly show it to you. What is their ‘typical’ speed?
We know we shouldn’t attach too much importance to the concept of an electron in orbit around some nucleus (we know it’s not like some planet orbiting around some star) and, hence, to the concept of speed or velocity (velocity is speed with direction) when discussing an electron in an atom. The concept of momentum (i.e. velocity combined with mass or energy) is much more relevant. There’s a very easy mathematical relationship that gives us some clue here: the Uncertainty Principle. In fact, we’ll use the Uncertainty Principle to relate the momentum of an electron (p) to the so-called Bohr radius r (think of it as the size of a hydrogen atom) as follows: p ≈ ħ/r. [I’ll come back on this in a moment, and show you why this makes sense.]
Now we also know its kinetic energy (K.E.) is mv²/2, which we can write as p²/2m. Substituting our p ≈ ħ/r conjecture, we get K.E. = mv²/2 = ħ²/2mr². This is equivalent to m²v² = ħ²/r² (just multiply both sides by 2m). From that, we get v = ħ/mr. Now, one of the many relations we can derive from the formulas for the fine-structure constant is re = α²r. [I haven’t shown you that yet, but I will shortly. It’s a really amazing expression. However, as for now, just accept it as a simple formula for interim use in this digression.] Hence, r = re/α². The re factor in this expression is the so-called classical electron radius. So we can now write v = ħα²/mre. Let’s now throw c in: v/c = α²ħ/mcre. However, from that fifth expression for α, we know that ħ/mcre = 1/α, so we get v/c = α²/α = α. We have another amazing result here: the v/c ratio for an electron (i.e. its speed expressed as a fraction of the speed of light) is equal to that fine-structure constant α. So that’s about 1/137, so that’s less than 1% of the speed of light. Now… I’ll leave it to you to calculate the Lorentz factor γ but… Well… It’s obvious that it will be very close to 1. 🙂 Hence, the electron’s speed – however we want to visualize that – doesn’t matter much indeed, so we should not worry about relativistic corrections in the formulas.
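If you don’t feel like calculating the Lorentz factor by hand, the following Python sketch does it for you. It also checks the v = ħ/mr relation directly with the Bohr radius. The numerical values are CODATA-style values that I am assuming here (they are not from this post), and the variable names are just illustrative.

```python
import math

# Assumed CODATA-style SI values (not from the post)
alpha = 7.2973525693e-3    # fine-structure constant
c = 299792458.0            # speed of light, m/s
hbar = 1.054571817e-34     # reduced Planck constant, J*s
m_e = 9.1093837015e-31     # electron mass, kg
r_bohr = 5.29177210903e-11 # Bohr radius, m

# v/c = alpha, as derived in the text
beta = alpha
v = beta * c
gamma = 1 / math.sqrt(1 - beta**2)

# Cross-check: v = hbar/(m*r) with r the Bohr radius
v_from_bohr = hbar / (m_e * r_bohr)

print(f"v     = {v:.4e} m/s ({100 * beta:.3f}% of c)")
print(f"gamma = {gamma:.8f}")
print(f"hbar/(m*r)/c = {v_from_bohr / c:.9f} vs alpha = {alpha:.9f}")
```

The Lorentz factor differs from 1 by less than three parts in a hundred thousand, so ignoring relativistic corrections is fine for this rough argument.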
Let’s now look at the question in regard to the Planck units. If you know nothing at all about them, I would advise you to read what I wrote about them in my previous post. Let me just note we get those Planck units by equating no fewer than five fundamental physical constants to 1, notably (1) the speed of light, (2) Planck’s (reduced) constant, (3) Boltzmann’s constant, (4) Coulomb’s constant and (5) Newton’s constant (i.e. the gravitational constant). Hence, we have a set of five equations here (c = ħ = kB = ke = G = 1), and so we can solve that to get the Planck units, i.e. the Planck length unit, the Planck time unit, the Planck mass unit, the Planck energy unit, the Planck charge unit and, finally (oft forgotten), the Planck temperature unit. Of course, you should note that all mass and energy units are directly related because of the mass-energy equivalence relation E = mc², which simplifies to E = m if c is equated to 1. [I could also say something about the relation between temperature and (kinetic) energy, but I won’t, as it would only further confuse you.]
Now, you may or may not remember that the Planck time and length units are unimaginably small, but that the Planck mass unit is actually quite sizable—at the atomic scale, that is. Indeed, the Planck mass is something huge, like the mass of an eyebrow hair, or a flea egg. Is that huge? Yes. Because if you’d want to pack it in a Planck-sized particle, it would make for a tiny black hole. 🙂 No kidding. That’s the physical significance of the Planck mass and the Planck length and, yes, it’s weird. 🙂
Let me give you some values. First, the Planck mass itself: it’s about 2.1765×10⁻⁸ kg. Again, if you think that’s tiny, think again. From the E = mc² equivalence relationship, we get that this is equivalent to 2 giga-joule, approximately. Just to give an idea, that’s like the monthly electricity consumption of an average American family. So that’s huge indeed! 🙂 [Many people think that nuclear energy involves the conversion of mass into energy, but the story is actually more complicated than that. In any case… I need to move on.]
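The Planck units mentioned above can be computed from the five constants in a few lines. Below is a sketch, assuming CODATA-style SI values for c, ħ, G, ke and kB (none of these numbers come from this post) and the usual textbook formulas for the Planck units. It also expresses the electron charge in Planck charge units, which should give the square root of α.

```python
import math

# Assumed CODATA-style SI values (not from the post)
c = 299792458.0          # speed of light, m/s
hbar = 1.054571817e-34   # reduced Planck constant, J*s
G = 6.67430e-11          # gravitational constant, m^3/(kg*s^2)
k_e = 8.9875517923e9     # Coulomb constant, N*m^2/C^2
k_B = 1.380649e-23       # Boltzmann constant, J/K
e = 1.602176634e-19      # elementary charge, C

# Planck units that follow from c = hbar = G = k_e = k_B = 1
l_P = math.sqrt(hbar * G / c**3)   # Planck length, m
t_P = math.sqrt(hbar * G / c**5)   # Planck time, s
m_P = math.sqrt(hbar * c / G)      # Planck mass, kg
q_P = math.sqrt(hbar * c / k_e)    # Planck charge, C
E_P = m_P * c**2                   # Planck energy, J
T_P = E_P / k_B                    # Planck temperature, K

e_P = e / q_P                      # electron charge in Planck units

print(f"l_P = {l_P:.4e} m,  t_P = {t_P:.4e} s,  l_P/t_P = {l_P / t_P:.6e} m/s")
print(f"m_P = {m_P:.4e} kg, E_P = {E_P:.3e} J, T_P = {T_P:.3e} K")
print(f"q_P = {q_P:.5e} C,  e_P = {e_P:.8f},  e_P^2 = {e_P**2:.9f}")
```

The Planck energy comes out at just under 2×10⁹ J, which is the 2 giga-joule mentioned above.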
Let me now give you the electron mass expressed in the Planck mass unit:
- Measured in our old-fashioned super-sized SI kilogram unit, the electron mass is me = 9.1×10⁻³¹ kg.
- The Planck mass is mP = 2.1765×10⁻⁸ kg.
- Hence, the electron mass expressed in Planck units is meP = me/mP = (9.1×10⁻³¹ kg)/(2.1765×10⁻⁸ kg) = 4.181×10⁻²³.
We can, once again, write that as some function of the fine-structure constant. More specifically, we can write:
meP = α/reP = α/(α²·rP) = 1/(α·rP)
So… Well… Yes: yet another amazing formula involving α.
In this formula, we have reP and rP, which are the (classical) electron radius and the Bohr radius expressed in Planck (length) units respectively. So you can see what’s going on here: we have all kinds of numbers here expressed in Planck units: a charge, a radius, a mass,… And we can relate all of them to the fine-structure constant.
Why? Who knows? I don’t. As Leighton puts it: that’s just the way “God pushed His pencil.” 🙂
Note that the beauty of natural units ensures that we get the same number for the (equivalent) energy of an electron. Indeed, from the E = mc² relation, we know the mass of an electron can also be written as 0.511 MeV/c². Hence, the equivalent energy is 0.511 MeV (so that’s, quite simply, the same number but without the 1/c² factor). Now, the Planck energy EP (in eV) is 1.22×10²⁸ eV, so we get EeP = Ee/EP = (0.511×10⁶ eV)/(1.22×10²⁸ eV) ≈ 4.19×10⁻²³. So, up to the rounding of the inputs, it’s the same as the electron mass expressed in Planck units. Isn’t that nice? 🙂
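Here is a quick Python sketch of the two ratios, with a few more digits than I used above. The values are CODATA-style numbers that I am assuming here (they are not from this post), and the variable names are just illustrative.

```python
# Assumed CODATA-style values (not from the post)
m_e = 9.1093837015e-31    # electron mass, kg
m_P = 2.176434e-8         # Planck mass, kg
E_e_eV = 0.51099895e6     # electron rest energy, eV
E_P_eV = 1.220890e28      # Planck energy, eV

m_ratio = m_e / m_P         # electron mass in Planck mass units
E_ratio = E_e_eV / E_P_eV   # electron energy in Planck energy units

print(f"m_e/m_P = {m_ratio:.4e}")
print(f"E_e/E_P = {E_ratio:.4e}")
print(f"relative difference = {abs(m_ratio - E_ratio) / m_ratio:.1e}")
```

With these inputs both ratios come out close to 4.185×10⁻²³. The 4.181×10⁻²³ above is what you get when you round the electron mass to 9.1×10⁻³¹ kg.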
Now, are all these numbers dimensionless, just like α? The answer to that question is complicated. Yes, and… Well… No:
- Yes. They’re dimensionless because they measure something in natural units, i.e. Planck units, and, hence, that’s some kind of relative measure indeed so… Well… Yes, dimensionless.
- No. They’re not dimensionless because they do measure something, like a charge, a length, or a mass, and when you chose some kind of relative measure, you still need to define some gauge, i.e. some kind of standard measure. So there’s some ‘dimension’ involved there.
So what’s the final answer? Well… The Planck units are not dimensionless. All we can say is that they are closely related, physically. I should also add that we’ll use the electron charge and mass (expressed in Planck units) in our amplitude calculations as a simple (dimensionless) number between zero and one. So the correct answer to the question as to whether these numbers have any dimension is: expressing some quantities in Planck units sort of normalizes them, so we can use them directly in dimensionless calculations, like when we multiply and add amplitudes.
Hmm… Well… I can imagine you’re not very happy with this answer but it’s the best I can do. Sorry. I’ll let you further ponder that question. I need to move on.
Note that that 4.181×10⁻²³ is still a very small number (22 zeroes after the decimal point!), even if it’s like 46 million times larger than the electron mass measured in our conventional SI unit (i.e. 9.1×10⁻³¹ kg). Does such a small number make any sense? The answer is: yes, it does. When we’ll finally start discussing that E(A to B) formula (I’ll give it to you in a moment), you’ll see that a very small number for n makes a lot of sense.
Before diving into it all, let’s first see if that formula for that alpha, that fine-structure constant, still makes sense with me expressed in Planck units. Just to make sure. 🙂 To do that, we need to use the fifth (last) expression for α, i.e. the one with re in it. Now, in my previous post, I also gave some formula for re: re = e²/4πε0mec², which we can re-write as reme = e²/4πε0c². If we substitute that expression for reme in the formula for α, we can calculate α from the electron charge, which indicates both the electron radius and its mass are not some random God-given variables, or “some magic number that comes to us with no understanding by man”, as Feynman – well… Leighton, I guess – puts it. No. They are magic numbers alright, one related to another through the equally ‘magic’ number α, but so I do feel we actually can create some understanding here.
At this point, I’ll digress once again, and insert some quick back-of-the-envelope argument from Feynman’s very serious Caltech Lectures on Physics, in which, as part of the introduction to quantum mechanics, he calculates the so-called Bohr radius from Planck’s constant h. Let me quickly explain: the Bohr radius is, roughly speaking, the size of the simplest atom, i.e. an atom with one electron (so that’s hydrogen really). So it’s not the classical electron radius re. However, both are also related to that ‘magical number’ α. To be precise, if we write the Bohr radius as r, then re = α²r ≈ 0.000053… times r, which we can re-write as:
α = √(re/r) = (re/r)^(1/2)
So that’s yet another amazing formula involving the fine-structure constant. In fact, it’s the formula I used as an ‘interim’ expression to calculate the relative speed of electrons. I just used it without any explanation there, but I am coming back to it here. Alpha again…
Just think about it for a while. In case you’d still doubt the magic of that number, let me write what we’ve discovered so far:
(1) α is the square of the electron charge expressed in Planck units: α = eP².
(2) α is the square root of the ratio of (a) the classical electron radius and (b) the Bohr radius: α = √(re/r). You’ll see this more often written as re = α²r. Also note that this is an equation that does not depend on the units, in contrast to equation 1 (above), and 4 and 5 (below), which require you to switch to Planck units. It’s the square root of a ratio and, hence, the units don’t matter. They fall away.
(3) α is the (relative) speed of an electron: α = v/c. [The relative speed is the speed as measured against the speed of light. Note that the ‘natural’ unit of speed in the Planck system of units is equal to c. Indeed, if you divide one Planck length by one Planck time unit, you get (1.616×10⁻³⁵ m)/(5.391×10⁻⁴⁴ s) ≈ 3×10⁸ m/s = c. However, this is another equation, just like (2), that does not depend on the units: we can express v and c in whatever unit we want, as long as we’re consistent and express both in the same units.]
(4) Finally – I’ll show you in a moment – α is also equal to the product of (a) the electron mass (which I’ll simply write as me here) and (b) the classical electron radius re (if both are expressed in Planck units): α = me·re. Now I think that’s, perhaps, the most amazing of all of the expressions for α. If you don’t think that’s amazing, I’d really suggest you stop trying to study physics. 🙂
Note that, from (2) and (4), we find that:
(5) The electron mass (in Planck units) is equal to me = α/re = α/(α²r) = 1/(αr). So that gives us an expression, using α once again, for the electron mass as a function of the Bohr radius r expressed in Planck units.
Finally, we can also substitute (1) in (5) to get:
(6) The electron mass (in Planck units) is equal to me = α/re = eP²/re. Using the Bohr radius, we get me = 1/(αr) = 1/(eP²·r).
So… As you can see, this fine-structure constant really links ALL of the fundamental properties of the electron: its charge, its radius, its distance to the nucleus (i.e. the Bohr radius), its velocity, its mass (and, hence, its energy),… In short,
IT IS ALL IN ALPHA!
Now that should answer the question in regard to the degrees of freedom we have here, doesn’t it? It looks like we’ve got only one degree of freedom here. Indeed, if we’ve got some value for α, then we have the electron charge, and from the electron charge, we can calculate the Bohr radius r (as I will show below), and if we have r, we have me and re. And then we can also calculate v, which gives us its momentum (mv) and its kinetic energy (mv²/2). In short,
ALPHA GIVES US EVERYTHING!
Isn’t that amazing? Hmm… You should reserve your judgment as for now, and carefully go over all of the formulas above and verify my statement. If you do that, you’ll probably struggle to find the Bohr radius from the charge (i.e. from α). So let me show you how you do that, because it will also show you why you should, indeed, reserve your judgment. In other words, I’ll show you why alpha does NOT give us everything! The argument below will, finally, prove some of the formulas that I didn’t prove above. Let’s go for it:
1. If we assume that (a) an electron takes some space – which I’ll denote by r 🙂 – and (b) that it has some momentum p because of its mass m and its velocity v, then the ΔxΔp = ħ relation (i.e. the Uncertainty Principle in its roughest form) suggests that the order of magnitude of r and p should be related in the very same way. Hence, let’s just boldly write r ≈ ħ/p and see what we can do with that. So we equate Δx with r and Δp with p. As Feynman notes, this is really more like a ‘dimensional analysis’ (he obviously means something very ‘rough’ with that) and so we don’t care about factors like 2 or 1/2. [Indeed, note that the more precise formulation of the Uncertainty Principle is σxσp ≥ ħ/2.] In fact, we didn’t even bother to define r very rigorously. We just don’t care about precise statements at this point. We’re only concerned about orders of magnitude. [If you’re appalled by the rather rude approach, I am sorry for that, but just try to go along with it.]
2. From our discussions on energy, we know that the kinetic energy is mv²/2, which we can write as p²/2m so we get rid of the velocity factor. [Why? Because we can’t really imagine what it is anyway. As I said a couple of times already, we shouldn’t think of electrons as planets orbiting around some star. That model doesn’t work.] So… What’s next? Well… Substituting our p ≈ ħ/r conjecture, we get K.E. = ħ²/2mr². So that’s a formula for the kinetic energy. Next is potential.
3. Unfortunately, the discussion on potential energy is a bit more complicated. You’ll probably remember that we had an easy and very comprehensible formula for the energy that’s needed (i.e. the work that needs to be done) to bring two charges together from a large distance (i.e. infinity). Indeed, we derived that formula directly from Coulomb’s Law (and Newton’s law of force) and it’s U = q₁q₂/4πε0r₁₂. [If you think I am going too fast, sorry, please check for yourself by reading my other posts.] Now, we’re actually talking about the size of an atom here, so one charge is the proton (+e) and the other is the electron (–e), so the potential energy is U = P.E. = –e²/4πε0r, with r the ‘distance’ between the proton and the electron, so that’s the Bohr radius we’re looking for!
[In case you’re struggling a bit with those minus signs when talking potential energy – I am not ashamed to admit I did! – let me quickly help you here. It has to do with our reference point: the reference point for measuring potential energy is at infinity, and it’s zero there (that’s just our convention). Now, to separate the proton and the electron, we’d have to do quite a lot of work. To use an analogy: imagine we’re somewhere deep down in a cave, and we have to climb back to the zero level. You’ll agree that’s likely to involve some sweat, don’t you? Hence, the potential energy associated with us being down in the cave is negative. Likewise, if we write the potential energy between the proton and the electron as U(r), and the potential energy at the reference point as U(∞) = 0, then the work to be done to separate the charges, i.e. the potential difference U(∞) – U(r), will be positive. So U(∞) – U(r) = 0 – U(r) > 0 and, hence, U(r) < 0. If you still don’t ‘get’ this, think of the electron being in some (potential) well, i.e. below the zero level, and so its potential energy is less than zero. Huh? Sorry. I have to move on. :-)]
4. We can now write the total energy (which I’ll denote by E, but don’t confuse it with the electric field vector!) as
E = K.E. + P.E. = ħ²/(2mr²) – e²/(4πε0r)
Now, the electron (whatever it is) is, obviously, in some kind of equilibrium state. Why is that obvious? Well… Otherwise our hydrogen atom wouldn’t or couldn’t exist. 🙂 Hence, it’s in some kind of energy ‘well’ indeed, at the bottom. Such equilibrium point ‘at the bottom’ is characterized by its derivative (in respect to whatever variable) being equal to zero. Now, the only ‘variable’ here is r (all the other symbols are physical constants), so we have to solve for dE/dr = 0. Writing it all out yields:
dE/dr = –ħ²/(mr³) + e²/(4πε0r²) = 0 ⇔ r = 4πε0ħ²/(me²)
You’ll say: so what? Well… We’ve got a nice formula for the Bohr radius here, and we got it in no time! 🙂 But the analysis was rough, so let’s check if it’s any good by putting the values in:
r = 4πε0ħ²/(me²)
= [(1/(9×10⁹) C²/(N·m²))·(1.055×10⁻³⁴ J·s)²]/[(9.1×10⁻³¹ kg)·(1.6×10⁻¹⁹ C)²]
= 53×10⁻¹² m = 53 pico-meter (pm)
So what? Well… Double-check it on the Internet: the Bohr radius is, effectively, about 53 trillionths of a meter indeed! So we’re right on the spot!
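You can also double-check the arithmetic in Python. The sketch below does the calculation twice: once with the rounded values used above, and once with CODATA-style values that I am assuming here (those are not from this post). The variable names are just illustrative.

```python
import math

# 1) Rough calculation with the rounded values used in the text
four_pi_eps0_rough = 1 / 9e9          # C^2/(N*m^2)
hbar_rough = 1.055e-34                # J*s
m_rough = 9.1e-31                     # kg
e_rough = 1.6e-19                     # C
r_rough = four_pi_eps0_rough * hbar_rough**2 / (m_rough * e_rough**2)

# 2) Same formula with assumed CODATA-style values
hbar = 1.054571817e-34                # J*s
m_e = 9.1093837015e-31                # kg
e = 1.602176634e-19                   # C
eps0 = 8.8541878128e-12               # C^2/(N*m^2)
r_bohr = 4 * math.pi * eps0 * hbar**2 / (m_e * e**2)

print(f"rough:   r = {r_rough * 1e12:.1f} pm")
print(f"precise: r = {r_bohr * 1e12:.2f} pm")
```

Both results are within a fraction of a picometer of each other, so the rounding in the calculation above does no harm.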
[In case you wonder about the units, note that mass is a measure of inertia: one kg is the mass of an object which, subject to a force of 1 newton, will accelerate at the rate of 1 m/s per second. Hence, we write F = m·a, which is equivalent to m = F/a. Hence, the kg, as a unit, is equivalent to 1 N/(m/s²) = 1 N·s²/m. If you make this substitution, we get r in the unit we want to see: [C²/(N·m²)]·(N²·m²·s²)/[(N·s²/m)·C²] = m.]
Moreover, if we take that value for r and put it in the (total) energy formula above, we’d find that the energy of the electron is –13.6 eV. [Don’t forget to convert from joule to electronvolt when doing the calculation!] Now you can check that on the Internet too: 13.6 eV is exactly the amount of energy that’s needed to ionize a hydrogen atom (i.e. the energy that’s needed to kick the electron out of that energy well)!
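The energy calculation is just as easy to check. The sketch below evaluates the total energy formula at r = 4πε0ħ²/me² and also looks at two nearby values of r, to confirm we are indeed at the bottom of the well. The constants are CODATA-style values I am assuming here (not from this post), and total_energy is simply a helper name I made up for this illustration.

```python
import math

# Assumed CODATA-style SI values (not from the post)
hbar = 1.054571817e-34    # J*s
m_e = 9.1093837015e-31    # kg
e = 1.602176634e-19       # C (also the joule-per-electronvolt factor)
eps0 = 8.8541878128e-12   # C^2/(N*m^2)

def total_energy(r):
    """E = K.E. + P.E. in joule, for a 'size' r in metre."""
    kinetic = hbar**2 / (2 * m_e * r**2)
    potential = -e**2 / (4 * math.pi * eps0 * r)
    return kinetic + potential

# Radius at which dE/dr = 0
r0 = 4 * math.pi * eps0 * hbar**2 / (m_e * e**2)

# Convert from joule to electronvolt by dividing by e
E0_eV = total_energy(r0) / e
E_nearby_eV = [total_energy(r0 * f) / e for f in (0.99, 1.01)]

print(f"E(r0)      = {E0_eV:.4f} eV")
print(f"E(0.99 r0) = {E_nearby_eV[0]:.4f} eV")
print(f"E(1.01 r0) = {E_nearby_eV[1]:.4f} eV")
```

Both nearby energies are slightly higher (less negative) than the value at r0, as they should be at a minimum.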
Wow! Isn’t it great that such simple calculations yield such great results? 🙂 [Of course, you’ll note that the omission of the 1/2 factor in the Uncertainty Principle was quite strategic. :-)] Using the r = 4πε0ħ²/me² formula for the Bohr radius, you can now easily check the re = α²r formula. You should find what we jotted down already: the classical electron radius is equal to re = e²/4πε0mec². To be precise, re = (53×10⁻⁶)·(53×10⁻¹² m) = 2.8×10⁻¹⁵ m. Now that’s again something you should check on the Internet. Guess what? […] It’s right on the spot again. 🙂
We can now also check that α = me·re formula: α = me·re = 4.181×10⁻²³ times… Hey! Wait! We have to express re in Planck units as well, of course! Now, (2.81794×10⁻¹⁵ m)/(1.616×10⁻³⁵ m) ≈ 1.7438×10²⁰. So now we get 4.181×10⁻²³ times 1.7438×10²⁰ = 7.29×10⁻³ = 0.00729 ≈ 1/137. Bingo! We got the magic number once again. 🙂
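Here is the same check in Python, together with the re = α²r relation. Again, this is only a sketch: the CODATA-style values are my own assumption (they are not from this post) and the variable names are illustrative.

```python
# Assumed CODATA-style values (not from the post)
alpha = 7.2973525693e-3      # fine-structure constant
l_P = 1.616255e-35           # Planck length, m
m_P = 2.176434e-8            # Planck mass, kg
m_e = 9.1093837015e-31       # electron mass, kg
r_e = 2.8179403262e-15       # classical electron radius, m
r_bohr = 5.29177210903e-11   # Bohr radius, m

# Electron mass and classical electron radius in Planck units
m_e_planck = m_e / m_P
r_e_planck = r_e / l_P
product = m_e_planck * r_e_planck

# Unit-independent check: r_e / r = alpha^2
radius_ratio = r_e / r_bohr

print(f"m_e (Planck units) = {m_e_planck:.4e}")
print(f"r_e (Planck units) = {r_e_planck:.4e}")
print(f"m_e * r_e = {product:.6e}   alpha   = {alpha:.6e}")
print(f"r_e / r   = {radius_ratio:.6e}   alpha^2 = {alpha**2:.6e}")
```

The first check needs Planck units. The second does not, because it only involves a ratio of two lengths.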
So… Well… Doesn’t that confirm we actually do have it all with α?
Well… Yes and no… First, you should note that I had to use ħ in that calculation of the Bohr radius. Moreover, the other physical constants (most notably c and the Coulomb constant) were actually there as well, ‘in the background’ so to speak, because one needs them to derive the formulas we used above. And then we have the equations themselves, of course, most notably that Uncertainty Principle… So… Well…
It’s not like God gave us one number only (α) and that all the rest flows out of it. We have a whole bunch of ‘fundamental’ relations and ‘fundamental’ constants here.
Having said that, that statement still does not diminish the magic of alpha.
Hmm… Now you’ll wonder: how many? How many constants do we need in all of physics?
Well… I’d say, you should not only ask about the constants: you should also ask about the equations: how many equations do we need in all of physics? [Just for the record, I had to smile when the Hawking of the movie says that he’s actually looking for one formula that sums up all of physics. Frankly, that’s a nonsensical statement. Hence, I think the real Hawking never said anything like that. Or, if he did, that it was one of those statements one needs to interpret very carefully.]
But let’s look at a few constants indeed. For example, if we have c, h and α, then we can calculate the electric charge e and, hence, the electric constant ε0 = e²/2αhc. From that, we get Coulomb’s constant ke, because ke is defined as 1/4πε0… But…
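As a quick numerical check of that ε0 = e²/2αhc relation, here is a Python sketch. Note the assumption: I take the electron charge e as given in SI units (coulomb), alongside c, h and α. All values are CODATA-style numbers that are not from this post, and the variable names are just illustrative.

```python
import math

# Assumed CODATA-style SI values (not from the post)
e = 1.602176634e-19       # elementary charge, C
h = 6.62607015e-34        # Planck constant, J*s
c = 299792458.0           # speed of light, m/s
alpha = 7.2973525693e-3   # fine-structure constant

eps0 = e**2 / (2 * alpha * h * c)   # electric constant
k_e = 1 / (4 * math.pi * eps0)      # Coulomb constant, by definition

print(f"eps0 = {eps0:.6e} C^2/(N*m^2)")
print(f"k_e  = {k_e:.6e} N*m^2/C^2")
```

This gives the familiar values of about 8.854×10⁻¹² for ε0 and about 9×10⁹ for ke.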
Hey! Wait a minute! How do we know that ke = 1/4πε0? Well… From experiment. But… Yes? That means 1/4π is some fundamental proportionality coefficient too, isn’t it?
Wow! You’re smart. That’s a good and valid remark. In fact, we use the so-called reduced Planck constant ħ in a number of calculations, and so that involves a 2π factor too (ħ = h/2π). Hence… Well… Yes, perhaps we should consider 2π as some fundamental constant too! And, then, well… Now that I think of it, there’s a few other mathematical constants out there, like Euler’s number e, for example, which we use in complex exponentials.
I am joking, right? I am not saying that 2π and Euler’s number are fundamental ‘physical’ constants, am I? [Note that it’s a bit of a nuisance we’re also using the e symbol for Euler’s number, but so we’re not talking the electron charge here: we’re talking that 2.71828…etc number that’s used in so-called ‘natural’ exponentials and logarithms.]
Well… Yes and no. They’re mathematical constants indeed, rather than physical, but… Well… I hope you get my point. What I want to show here, is that it’s quite hard to say what’s fundamental and what isn’t. We can actually pick and choose a bit among all those constants and all those equations. As one physicist puts it: it depends on how we slice it. The one thing we know for sure is that a great many things are related, in a physical way (α connects all of the fundamental properties of the electron, for example) and/or in a mathematical way (2π connects not only the circumference of the unit circle with the radius but quite a few other constants as well!), but… Well… What to say? It’s a tough discussion and I am not smart enough to give you an unambiguous answer. From what I gather on the Internet, when looking at the whole Standard Model (including the strong force, the weak force and the Higgs field), we’ve got a few dozen physical ‘fundamental’ constants, and then a few mathematical ones as well.
That’s a lot, you’ll say. Yes. At the same time, it’s not an awful lot. Whatever number it is, it does raise a very fundamental question: why are they what they are? That brings us back to that ‘fine-tuning’ problem. Now, I can’t make this post too long (it’s way too long already), so let me just conclude this discussion by copying Wikipedia on that question, because what it has on this topic is not so bad:
“Some physicists have explored the notion that if the physical constants had sufficiently different values, our Universe would be so radically different that intelligent life would probably not have emerged, and that our Universe therefore seems to be fine-tuned for intelligent life. The anthropic principle states a logical truism: the fact of our existence as intelligent beings who can measure physical constants requires those constants to be such that beings like us can exist.”
I like this. But the article then adds the following, which I do not like so much, because I think it’s a bit too ‘frivolous’:
“There are a variety of interpretations of the constants’ values, including that of a divine creator (the apparent fine-tuning is actual and intentional), or that ours is one universe of many in a multiverse (e.g. the many-worlds interpretation of quantum mechanics), or even that, if information is an innate property of the universe and logically inseparable from consciousness, a universe without the capacity for conscious beings cannot exist.”
Hmm… As said, I am quite happy with the logical truism: we are there because alpha (and a whole range of other stuff) is what it is, and we can measure alpha (and a whole range of other stuff) as what it is, because… Well… Because we’re here. Full stop. As for the ‘interpretations’, I’ll let you think about that for yourself. 🙂
I need to get back to the lesson. Indeed, this was just a ‘digression’. My post was about the three fundamental events or actions in quantum electrodynamics, and so I was talking about that E(A to B) formula. However, I had to do that digression on alpha to ensure you understand what I want to write about that. So let me now get back to it. End of digression. 🙂
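One quick numerical footnote to the digression, before we move on: the ε0 = e²/2αhc relation and the ke = 1/4πε0 definition are easy to check. The little sketch below is mine (it is not from Feynman’s book). It simply takes the SI values of e, h, c and α as given inputs (the variable names are just my own labels) and computes ε0 and ke from them.

```python
import math

# Input values in SI units (CODATA 2018); the names are illustrative labels only.
E_CHARGE = 1.602176634e-19    # elementary charge e, in coulomb
H_PLANCK = 6.62607015e-34     # Planck constant h, in joule-second
C_LIGHT = 299_792_458.0       # speed of light c, in m/s
ALPHA = 7.2973525693e-3       # fine-structure constant (dimensionless)

# Electric constant from the relation in the text: eps0 = e^2 / (2*alpha*h*c)
eps0 = E_CHARGE**2 / (2 * ALPHA * H_PLANCK * C_LIGHT)

# Coulomb's constant from its definition: k_e = 1 / (4*pi*eps0)
k_e = 1 / (4 * math.pi * eps0)

print(f"eps0 = {eps0:.6e} F/m")
print(f"k_e  = {k_e:.6e} N m^2/C^2")
```

Note that I treat e as an input here: the point is only that these constants hang together, not that this is how anyone measures them.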
The E(A to B) formula
Indeed, I must assume that, with all these digressions, you are truly despairing now. Don’t. We’re there! We’re finally ready for the E(A to B) formula! Let’s go for it.
We’ve now got those two numbers measuring the electron charge and the electron mass in Planck units respectively. They’re fundamental indeed and so let’s loosen up on notation and just write them as e and m respectively. Let me recap:
1. The value of e is approximately –0.08542455, and it corresponds to the so-called junction number j, which is the amplitude for an electron-photon coupling. When multiplying it with another amplitude (to find the amplitude for an event consisting of two sub-events, for example), it corresponds to a ‘shrink’ to less than one-tenth (something like 8.5% indeed, corresponding to the magnitude of e) and a ‘rotation’ (or a ‘turn’) over 180 degrees, as mentioned above.
Please note what’s going on here: we have a physical quantity, the electron charge (expressed in Planck units), and we use it in a quantum-mechanical calculation as a dimensionless (complex) number, i.e. as an amplitude. So… Well… That’s what physicists mean when they say that the charge of some particle (usually the electric charge but, in quantum chromodynamics, it will be the ‘color’ charge of a quark) is a ‘coupling constant’.
2. We also have m, the electron mass, and we’ll use it in the same way, i.e. as some dimensionless amplitude. As compared to j, it’s a very tiny number: approximately 4.181×10⁻²³. So if you look at it as an amplitude, indeed, then it corresponds to an enormous ‘shrink’ (but no turn) of the amplitude(s) that we’ll be combining it with.
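If the ‘shrink and turn’ language feels a bit abstract, here is a minimal sketch of what it means in terms of complex numbers. I am assuming nothing more than the two values quoted above, written as complex numbers with a zero imaginary part; the function name and the 30-degree test arrow are my own illustrative choices.

```python
import cmath
import math

J = complex(-0.08542455, 0.0)   # junction number j (electron charge in Planck units)
M = complex(4.181e-23, 0.0)     # electron mass in Planck units, used as an amplitude

def shrink_and_turn(factor):
    """Return (shrink, turn in degrees) caused by multiplying an arrow by `factor`."""
    return abs(factor), math.degrees(cmath.phase(factor))

j_shrink, j_turn = shrink_and_turn(J)
m_shrink, m_turn = shrink_and_turn(M)
print(f"j: shrink to {j_shrink:.4f} of the length, turn over {j_turn:.0f} degrees")
print(f"m: shrink to {m_shrink:.3e} of the length, turn over {m_turn:.0f} degrees")

# Apply j to an arbitrary arrow of length 1 pointing at 30 degrees (illustrative).
arrow = cmath.rect(1.0, math.radians(30.0))
coupled = J * arrow
print(f"length after one coupling: {abs(coupled):.4f}")
```

A negative real factor flips the arrow over 180 degrees, while a positive real factor only changes its length: that is all there is to the ‘turn’ versus ‘no turn’ distinction here.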
So… Well… How do we do it?
Well… At this point, Leighton goes a bit off-track. Just a little bit. 🙂 From what he writes, it’s obvious that he assumes the frequency (or, what amounts to the same, the de Broglie wavelength) of an electron is just like the frequency of a photon. Frankly, I just can’t imagine why and how Feynman let this happen. It’s wrong. Plain wrong. As I mentioned in my introduction already, an electron traveling through space is not like a photon traveling through space.
For starters, an electron is much slower (because it’s a matter-particle: hence, it’s got mass). Secondly, the de Broglie wavelength and/or frequency of an electron is not like that of a photon. For example, if we take an electron and a photon having the same energy, let’s say 1 eV (that corresponds to infrared light), then the de Broglie wavelength of the electron will be 1.23 nano-meter (i.e. 1.23 billionths of a meter). Now that’s about one thousand times smaller than the wavelength of our 1 eV photon, which is about 1240 nm. You’ll say: how is that possible? If they have the same energy, then the f = E/h and ν = E/h should give the same frequency and, hence, the same wavelength, no?
Well… No! Not at all! Because an electron, unlike the photon, has a rest mass indeed – measured as about 0.511 MeV/c², to be precise (note the rather particular MeV/c² unit: it’s from the E = mc² formula) – one should use a different energy value! Indeed, we should include the rest mass energy, which is 0.511 MeV. So, almost all of the energy here is rest mass energy! There’s also another complication. For the photon, there is an easy relationship between the wavelength and the frequency: it has no mass and, hence, all its energy is kinetic, or movement so to say, and so we can use that ν = E/h relationship to calculate its frequency ν: it’s equal to ν = E/h = (1 eV)/(4.13567×10⁻¹⁵ eV·s) ≈ 0.242×10¹⁵ Hz = 242 tera-hertz (1 THz = 10¹² oscillations per second). Now, knowing that light travels at the speed of light, we can check the result by calculating the wavelength using the λ = c/ν relation. Let’s do it: (2.998×10⁸ m/s)/(242×10¹² Hz) ≈ 1240 nm. So… Yes, done!
But so we’re talking photons here. For the electron, the story is much more complicated. That wavelength I mentioned was calculated using the other of the two de Broglie relations: λ = h/p. So that uses the momentum of the electron which, as you know, is the product of its mass (m) and its velocity (v): p = mv. You can amuse yourself and check if you find the same wavelength (1.23 nm): you should! From the other de Broglie relation, f = E/h, you can also calculate its frequency: for an electron moving at non-relativistic speeds, it’s about 0.123×10²¹ Hz, so that’s like 500,000 times the frequency of the photon we’re looking at! When multiplying the frequency and the wavelength, we should get its speed. However, that’s where we get in trouble. Here’s the problem with matter waves: they have a so-called group velocity and a so-called phase velocity. The idea is illustrated below: the green dot travels with the wave packet – and, hence, its velocity corresponds to the group velocity – while the red dot travels with the oscillation itself, and so that’s the phase velocity. [You should also remember, of course, that the matter wave is some complex-valued wavefunction, so we have both a real as well as an imaginary part oscillating and traveling through space.]
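For those who like to see the photon numbers come out of a few lines of code, here is a small sketch of the two relations I just used, ν = E/h and λ = c/ν. The function names are mine, and I use the Planck constant expressed in eV·s so the energy can go in as electronvolts.

```python
H_EV_S = 4.135667696e-15      # Planck constant h, in eV·s
C_LIGHT = 299_792_458.0       # speed of light c, in m/s

def photon_frequency_hz(energy_ev):
    """Frequency of a photon from nu = E/h."""
    return energy_ev / H_EV_S

def photon_wavelength_m(energy_ev):
    """Wavelength of a photon from lambda = c/nu."""
    return C_LIGHT / photon_frequency_hz(energy_ev)

nu = photon_frequency_hz(1.0)     # our 1 eV photon
lam = photon_wavelength_m(1.0)
print(f"frequency:  {nu / 1e12:.1f} THz")
print(f"wavelength: {lam * 1e9:.0f} nm")
```

Running it for 1 eV should reproduce the roughly 242 THz and 1240 nm quoted above.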
To be precise, the phase velocity will be superluminal. Indeed, using the usual relativistic formula, we can write that p = γm₀v and E = γm₀c², with v the (classical) velocity of the electron and c what it always is, i.e. the speed of light. Hence, λ = h/γm₀v and f = γm₀c²/h, and so λf = c²/v. Because v is (much) smaller than c, we get a superluminal velocity. However, that’s the phase velocity indeed, not the group velocity, which corresponds to v. OK… I need to end this digression.
So what? Well, to make a long story short, the ‘amplitude framework’ for electrons is different. Hence, the story that I’ll be telling here is different from what you’ll read in Feynman’s QED. I will use his drawings, though, and his concepts. Indeed, despite my misgivings above, the conceptual framework is sound, and so the corrections to be made are relatively minor.
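You can check the electron numbers of this digression yourself with the sketch below. It assumes that the 1 eV is the kinetic energy of the electron, adds the rest energy to get the total energy E, and then applies the relativistic relations above. The function name and the choice to work in eV are mine.

```python
import math

H_EV_S = 4.135667696e-15      # Planck constant h, in eV·s
C_LIGHT = 299_792_458.0       # speed of light c, in m/s
ME_C2_EV = 510_998.95         # electron rest energy m0*c^2, in eV

def electron_wave(kinetic_ev):
    """Return (wavelength in m, frequency in Hz, velocity in m/s) of an electron."""
    total_ev = kinetic_ev + ME_C2_EV                 # E = gamma*m0*c^2
    pc_ev = math.sqrt(total_ev**2 - ME_C2_EV**2)     # from E^2 = (pc)^2 + (m0*c^2)^2
    wavelength = H_EV_S * C_LIGHT / pc_ev            # lambda = h/p
    frequency = total_ev / H_EV_S                    # f = E/h
    velocity = C_LIGHT * pc_ev / total_ev            # v = p*c^2/E (the group velocity)
    return wavelength, frequency, velocity

lam, f, v = electron_wave(1.0)
phase_velocity = lam * f

print(f"wavelength:     {lam * 1e9:.3f} nm")
print(f"frequency:      {f:.3e} Hz")
print(f"velocity v:     {v / C_LIGHT:.5f} c")
print(f"phase velocity: {phase_velocity / C_LIGHT:.1f} c")
```

The product of the phase velocity and v comes out as c², which is just the λf = c²/v relation in another guise.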
So… We’re looking at E(A to B), i.e. the amplitude for an electron to go from point A to B in spacetime, and I said the conceptual framework is exactly the same as that for a photon. Hence, the electron can follow any path really. It may go in a straight line and travel at a speed that’s consistent with what we know of its momentum (p), but it may also follow other paths. So, just like the photon, we’ll have some so-called propagator function, which gives you amplitudes based on the distance in space as well as in the distance in ‘time’ between two points. Now, Ralph Leighton identifies that propagator function with the propagator function for the photon, i.e. P(A to B), but that’s wrong: it’s not the same.
The propagator function for an electron depends on its mass and its velocity, and/or on the combination of both (like its momentum p = mv and/or its kinetic energy: K.E. = mv²/2 = p²/2m). So we have a different propagator function here. However, I’ll use the same symbol for it: P(A to B).
So, the bottom line is that, because of the electron’s mass (which, remember, is a measure for inertia), momentum and/or kinetic energy (which, remember, are conserved in physics), the straight line is definitely the most likely path, but (big but!), just like the photon, the electron may follow some other path as well.
So how do we formalize that? Let’s first associate an amplitude P(A to B) with an electron traveling from point A to B in a straight line and in a time that’s consistent with its velocity. Now, as mentioned above, the P here stands for propagator function, not for photon, so we’re talking a different P(A to B) here than that P(A to B) function we used for the photon. Sorry for the confusion. 🙂 The left-hand diagram below then shows what we’re talking about: it’s the so-called ‘one-hop flight’, and so that’s what the P(A to B) amplitude is associated with.
Now, the electron can follow other paths. For photons, we said the amplitude depended on the spacetime interval I: when negative or positive (i.e. paths that are not associated with the photon traveling in a straight line and/or at the speed of light), the contribution of those paths to the final amplitudes (or ‘final arrow’, as it was called) was smaller.
For an electron, we have something similar, but it’s modeled differently. We say the electron could take a ‘two-hop flight’ (via point C or C’), or a ‘three-hop flight’ (via D and E) from point A to B. Now, it makes sense that these paths should be associated with amplitudes that are much smaller. Now that’s where that n-factor comes in. We just put some real number n in the formula for the amplitude for an electron to go from A to B via C, which we write as:
P(A to C)∗n²∗P(C to B)
Note what’s going on here. We multiply two amplitudes, P(A to C) and P(C to B), which is OK, because that’s what the rules of quantum mechanics tell us: if an ‘event’ consists of two sub-events, we need to multiply the amplitudes (not the probabilities) in order to get the amplitude that’s associated with both sub-events happening. However, we add an extra factor: n². Note that it must be some very small number because we have lots of alternative paths and, hence, they should not be very likely! So what’s the n? And why n² instead of just n?
Well… Frankly, I don’t know. Ralph Leighton boldly equates n to the mass of the electron. Now, because he obviously means the mass expressed in Planck units, that’s the same as saying n is the electron’s energy (again, expressed in Planck’s ‘natural’ units), so n should be that number m = meP = EeP = 4.181×10⁻²³. However, I couldn’t find any confirmation on the Internet, or elsewhere, of the suggested n = m identity, so I’ll assume n = m indeed, but… Well… Please check for yourself. It seems the answer is to be found in a mathematical theory that helps physicists to actually calculate j and n from experiment. It’s referred to as perturbation theory, and it’s the next thing on my study list. As for now, however, I can’t help you much. I can only note that the equation makes sense.
Of course, it does: inserting a tiny little number n, close to zero, ensures that those other amplitudes don’t contribute too much to the final ‘arrow’. And it also makes a lot of sense to associate it with the electron’s mass: if mass is a measure of inertia, then it should be some factor reducing the amplitude that’s associated with the electron following such crooked path. So let’s go along with it, and see what comes out of it.
A three-hop flight is even weirder and uses that n² factor two times:
P(A to E)∗n²∗P(E to D)∗n²∗P(D to B)
So we have an (n²)² = n⁴ factor here, which is good, because a three-hop flight should be much less likely than a two-hop flight. So what do we get? Well… (4.181×10⁻²³)⁴ ≈ 305×10⁻⁹². Pretty tiny, huh? 🙂 Of course, any point in space is a potential hop for the electron’s flight from point A to B and, hence, there’s a lot of paths and a lot of amplitudes (or ‘arrows’ if you want), which, again, is consistent with a very tiny value for n indeed.
So, to make a long story short, E(A to B) will be a giant sum (i.e. some kind of integral indeed) of a lot of different ways an electron can go from point A to B. It will be a series of terms P(A to B) + P(A to C)∗n²∗P(C to B) + P(A to E)∗n²∗P(E to D)∗n²∗P(D to B) + … for all possible intermediate points C, D, E, and so on.
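To get a feel for how the terms in that series stack up, here is a toy sketch. Please note the assumptions: I take n = m = 4.181×10⁻²³ as discussed above, and the propagator amplitudes are made-up complex numbers (I have no real propagator function to offer you), so only the structure of the sum and the size of the n factors mean anything here.

```python
N = 4.181e-23   # the n-factor, assuming n = m (electron mass in Planck units)

def hop_factor(intermediate_points):
    """Factor picked up by a path with the given number of intermediate stops."""
    return N ** (2 * intermediate_points)

# Hypothetical propagator amplitudes, chosen only to illustrate the bookkeeping.
P_AB = complex(0.6, 0.3)
P_AC, P_CB = complex(0.5, 0.1), complex(0.4, -0.2)
P_AE, P_ED, P_DB = complex(0.3, 0.2), complex(0.2, 0.1), complex(0.5, 0.0)

one_hop = P_AB                                        # straight from A to B
two_hop = P_AC * hop_factor(1) * P_CB                 # via C: one n^2 factor
three_hop = P_AE * N**2 * P_ED * N**2 * P_DB          # via E and D: n^2 twice

total = one_hop + two_hop + three_hop

print(f"n^2 = {hop_factor(1):.3e}")
print(f"n^4 = {hop_factor(2):.3e}")
print(f"|two-hop term|   = {abs(two_hop):.3e}")
print(f"|three-hop term| = {abs(three_hop):.3e}")
print(f"total            = {total}")
```

With only a handful of paths, the extra terms are far too small to move the total at all. It is the sheer number of possible intermediate points, not the size of any single term, that makes the full sum interesting.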
What about the j? The junction number or coupling constant. How does that show up in the E(A to B) formula? Well… Those alternative paths with hops here and there are actually the easiest bit of the whole calculation. Apart from taking some strange path, electrons can also emit and/or absorb photons during the trip. In fact, they’re doing that constantly. Indeed, the image of an electron ‘in orbit’ around the nucleus is that of an electron exchanging so-called ‘virtual’ photons constantly, as illustrated below. So our image of an electron absorbing and then emitting a photon (see the diagram on the right-hand side) is really like the tiny tip of a giant iceberg: most of what’s going on is underneath! So that’s where our junction number j comes in, i.e. the charge (e) of the electron.
So, when you hear that a coupling constant is actually equal to the charge, then this is what it means: you should just note it’s the charge expressed in Planck units. But it’s a deep connection, isn’t it? When everything is said and done, a charge is something physical, but so here, in these amplitude calculations, it just shows up as some dimensionless negative number, used in multiplications and additions of amplitudes. Isn’t that remarkable?
The situation becomes even more complicated when more than one electron is involved. For example, two electrons can go in a straight line from point 1 and 2 to point 3 and 4 respectively, but there’s two ways in which this can happen, and they might exchange photons along the way, as shown below. If there’s two alternative ways in which one event can happen, you know we have to add amplitudes, rather than multiply them. Hence, the formula for E(A to B) becomes even more complicated.
Moreover, a single electron may first emit and then absorb a photon itself, so there’s no need for other particles to be there to have lots of j factors in our calculation. In addition, that photon may briefly disintegrate into an electron and a positron, which then annihilate each other to again produce a photon: in case you wondered, that’s what those little loops in those diagrams depicting the exchange of virtual photons are supposed to represent. So, every single junction (i.e. every emission and/or absorption of a photon) involves a multiplication with that junction number j, so if there are two couplings involved, we have a j² factor, and so that’s 0.08542455² = α ≈ 0.0073. Four couplings implies a factor of 0.08542455⁴ ≈ 0.000053.
Just as an example, I copy two diagrams involving four, five or six couplings indeed. They all have some ‘incoming’ photon, because Feynman uses them to explain something else (the so-called magnetic moment of an electron), but it doesn’t matter: the same illustrations can serve multiple purposes.
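The arithmetic of those coupling factors is simple enough to put in a few lines. The sketch below only uses the value of j quoted above; the function name is my own.

```python
J = -0.08542455   # junction number j (electron charge in Planck units)

def coupling_factor(couplings):
    """Factor contributed by a diagram with the given number of couplings."""
    return J ** couplings

for k in (1, 2, 3, 4, 6):
    print(f"{k} coupling(s): factor {coupling_factor(k):+.3e}")

print(f"1/j^2 = {1 / coupling_factor(2):.3f}")
```

Two things are worth noticing: the factor for two couplings is the fine-structure constant (so its inverse is the famous 137.036), and an odd number of couplings gives a negative factor, i.e. an extra 180-degree turn.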
Now, it’s obvious that the contributions of the alternatives with many couplings add almost nothing to the final amplitude – just like the ‘many-hop’ flights add almost nothing – but… Well… As tiny as these contributions are, they are all there, and so they all have to be accounted for. So… Yes. You can easily appreciate how messy it all gets, especially in light of the fact that there are so many points that can serve as a ‘hop’ or a ‘coupling’ point!
So… Well… Nothing. That’s it! I am done! I realize this has been another long and difficult story, but I hope you appreciated it and that it shed some light on what’s really behind those simplified stories of what quantum mechanics is all about. It’s all weird and, admittedly, not so easy to understand, but I wouldn’t say an understanding is really beyond the reach of us, common mortals. 🙂
Post scriptum: When you’ve reached here, you may wonder: so where’s the final formula then for E(A to B)? Well… I have no easy formula for you. From what I wrote above, it should be obvious that we’re talking some really awful-looking integral and, because it’s so awful, I’ll let you find it yourself. 🙂
I should also note another reason why I am reluctant to identify n with m. The formulas in Feynman’s QED are definitely not the standard ones. The more standard formulations will use the gauge coupling parameter about which I talked already. I sort of discussed it, indirectly, in my first comments on Feynman’s QED, when I criticized some other part of the book, notably its explanation of the phenomenon of diffraction of light, which basically boiled down to: “When you try to squeeze light too much [by forcing it to go through a small hole], it refuses to cooperate and begins to spread out”, because “there are not enough arrows representing alternative paths.”
Now that raises a lot of questions, and very sensible ones, because that simplification is nonsensical. Not enough arrows? That statement doesn’t make sense. We can subdivide space in as many paths as we want, and probability amplitudes don’t take up any physical space. We can cut up space in smaller and smaller pieces (so we analyze more paths within the same space). The consequence – in terms of arrows – is that directions of our arrows won’t change but their length will be much and much smaller as we’re analyzing many more paths. That’s because of the normalization constraint. However, when adding them all up – a lot of very tiny ones, or a smaller bunch of bigger ones – we’ll still get the same ‘final’ arrow. That’s because the direction of those arrows depends on the length of the path, and the length of the path doesn’t change simply because we suddenly decide to use some other ‘gauge’.
Indeed, the real question is: what’s a ‘small’ hole? What’s ‘small’ and what’s ‘large’ in quantum electrodynamics? Now, I gave an intuitive answer to that question in that post of mine, but it’s much more accurate than Feynman’s, or Leighton’s. The answer to that question is: there’s some kind of natural ‘gauge’, and it’s related to the wavelength. So the wavelength of a photon, or an electron, in this case, comes with some kind of scale indeed. That’s why the fine-structure constant is often written in yet another form:
α = 2πrₑ/λₑ = rₑkₑ
λₑ and kₑ are the Compton wavelength and wavenumber of the electron (so kₑ is not the Coulomb constant here). The Compton wavelength is the de Broglie wavelength of the electron. [You’ll find that Wikipedia defines it as “the wavelength that’s equivalent to the wavelength of a photon whose energy is the same as the rest-mass energy of the electron”, but that’s a very confusing definition, I think.]
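That form of the fine-structure constant is easy to verify numerically. The sketch below assumes the standard (CODATA 2018) values for the classical electron radius and the Compton wavelength of the electron; the variable names are mine.

```python
import math

R_E = 2.8179403262e-15          # classical electron radius, in m
LAMBDA_E = 2.42631023867e-12    # Compton wavelength of the electron, in m

k_wavenumber = 2 * math.pi / LAMBDA_E    # Compton wavenumber (not Coulomb's constant!)
alpha_from_lengths = R_E * k_wavenumber  # alpha = 2*pi*r_e/lambda_e = r_e*k_e

print(f"alpha   = {alpha_from_lengths:.9f}")
print(f"1/alpha = {1 / alpha_from_lengths:.3f}")
```

So alpha can be read as a ratio of two lengths, which is exactly why it has no physical dimension.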
The point to note is that the spatial dimension in both the analysis of photons as well as of matter waves, especially in regard to studying diffraction and/or interference phenomena, is related to the frequencies, wavelengths and/or wavenumbers of the wavefunctions involved. There’s a certain ‘gauge’ involved indeed, i.e. some measure that is relative, like the gauge pressure illustrated below. So that’s where that gauge parameter g comes in. And the fact that it’s yet another number that’s closely related to that fine-structure constant is… Well… Again… That alpha number is a very magic number indeed… 🙂
Post scriptum (5 October 2015):
Much stuff in physics is quite ‘magical’, but it’s never ‘too magical’. I mean: there’s always an explanation. So there is a very logical explanation for the above-mentioned deep connection between the charge of an electron, its energy and/or mass, its various radii (or physical dimensions) and the coupling constant too. I wrote a piece about that, much later than when I wrote the piece above. I would recommend you read that piece too. It’s a piece in which I do take the magic out of ‘God’s number’. Understanding it involves a deep understanding of electromagnetism, however, and that requires some effort. It’s surely worth the effort, though.
If we limit our attention to the interaction between light and matter (i.e. the behavior of photons and electrons only – so we’re not talking quarks and gluons here), then the ‘crazy ideas’ of quantum mechanics can be summarized as follows:
- At the atomic or sub-atomic scale, we can no longer look at light as an electromagnetic wave. It consists of photons, and photons come in blobs. Hence, to some extent, photons are ‘particle-like’.
- At the atomic or sub-atomic scale, electrons don’t behave like particles. For example, if we send them through a slit that’s small enough, we’ll observe a diffraction pattern. Hence, to some extent, electrons are ‘wave-like’.
In short, photons aren’t waves, but they aren’t particles either. Likewise, electrons aren’t particles, but they aren’t waves either. They are neither. The weirdest thing of all, perhaps, is that, while light and matter are two very different things in our daily experience (light and matter are opposite concepts, I’d say, just like particles and waves are opposite concepts), they look pretty much the same in quantum physics: they are both represented by a wavefunction.
Let me immediately make a little note on terminology here. The term ‘wavefunction’ is a bit ambiguous, in my view, because it makes one think of a real wave, like a water wave, or an electromagnetic wave. Real waves are described by real-valued wave functions describing, for example, the motion of a ball on a spring, or the displacement of a gas (e.g. air) as a sound wave propagates through it, or – in the case of an electromagnetic wave – the strength of the electric and magnetic field.
You may have questions about the ‘reality’ of fields, but electromagnetic waves – i.e. the classical description of light – are quite ‘real’ too, even if:
- Light doesn’t travel in a medium (like water or air: there is no aether), and
- The magnitude of the electric and magnetic field (they are usually denoted by E and B) depends on your reference frame: if you calculate the fields using a moving coordinate system, you will get a different mixture of E and B. Therefore, E and B may not feel very ‘real’ when you look at them separately, but they are very real when we think of them as representing one physical phenomenon: the electromagnetic interaction between particles. So the E and B mix is, indeed, a dual representation of one reality. I won’t dwell on that, as I’ve done that in another post of mine.
How ‘real’ is the quantum-mechanical wavefunction?
The quantum-mechanical wavefunction is not like any of these real waves. In fact, I’d rather use the term ‘probability wave’ but, apparently, that’s used only by bloggers like me 🙂 and so it’s not very scientific. That’s for a good reason, because it’s not quite accurate either: the wavefunction in quantum mechanics represents probability amplitudes, not probabilities. So we should, perhaps, be consistent and term it a ‘probability amplitude wave’ – but then that’s too cumbersome obviously, so the term ‘probability wave’ may be confusing, but it’s not so bad, I think.
Amplitudes and probabilities are related as follows:
- Probabilities are real numbers between 0 and 1: they represent the probability of something happening, e.g. a photon moves from point A to B, or a photon is absorbed (and emitted) by an electron (i.e. a ‘junction’ or ‘coupling’, as you know).
- Amplitudes are complex numbers, or ‘arrows’ as Feynman calls them: they have a length (or magnitude) and a direction.
- We get the probabilities by taking the (absolute) square of the amplitudes.
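The three rules above fit in a few lines of code if we let complex numbers play the role of Feynman’s arrows. The sketch below is only an illustration: the lengths and angles of the arrows are arbitrary numbers I picked, and the function name is mine.

```python
import cmath

def probability(amplitude):
    """Probability = absolute square of the (complex) amplitude."""
    return abs(amplitude) ** 2

# Two sub-events in sequence: multiply the arrows (lengths multiply, angles add).
a_first = cmath.rect(0.8, 0.3)     # length 0.8, angle 0.3 rad
a_second = cmath.rect(0.5, 1.1)    # length 0.5, angle 1.1 rad
a_both = a_first * a_second

# Two alternative routes to the same event: add the arrows, then square.
route_1 = cmath.rect(0.5, 0.0)
route_2_in_phase = cmath.rect(0.5, 0.0)
route_2_opposite = cmath.rect(0.5, cmath.pi)

p_constructive = probability(route_1 + route_2_in_phase)
p_destructive = probability(route_1 + route_2_opposite)

print(f"sequence: length {abs(a_both):.2f}, angle {cmath.phase(a_both):.2f} rad")
print(f"alternatives in phase: P = {p_constructive:.2f}")
print(f"alternatives opposed:  P = {p_destructive:.2f}")
```

The last two lines are the whole story of interference in a nutshell: the same two arrows give a large probability or (practically) none at all, depending only on their relative direction.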
So photons aren’t waves, but they aren’t particles either. Likewise, electrons aren’t particles, but they aren’t waves either. They are neither. So what are they? We don’t have words to describe what they are. Some use the term ‘wavicle’ but that doesn’t answer the question, because who knows what a ‘wavicle’ is? So we don’t know what they are. But we do know how they behave. As Feynman puts it, when comparing the behavior of light and then of electrons in the double-slit experiment—struggling to find language to describe what’s going on: “There is one lucky break: electrons behave just like light.”
He says so because of that wave function: the mathematical formalism is the same, for photons and for electrons. Exactly the same? […] But that’s such a weird thing to say, isn’t it? We can’t help thinking of light as waves, and of electrons as particles. They can’t be the same. They’re different, aren’t they? They are.
Scales and senses
To some extent, the weirdness can be explained because the scale of our world is not atomic or sub-atomic. Therefore, we ‘see’ things differently. Let me say a few words about the instrument we use to look at the world: our eye.
Our eye is particular. The retina has two types of receptors: the so-called cones are used in bright light, and distinguish color, but when we are in a dark room, the so-called rods become sensitive, and it is believed that they actually can detect a single photon of light. However, neural filters only allow a signal to pass to the brain when at least five photons arrive within less than a tenth of a second. A tenth of a second is, roughly, the averaging time of our eye. So, as Feynman puts it: “If we were evolved a little further so we could see ten times more sensitively, we wouldn’t have this discussion—we would all have seen very dim light of one color as a series of intermittent little flashes of equal intensity.” In other words, the ‘particle-like’ character of light would have been obvious to us.
Let me make a few more remarks here, which you may or may not find useful. The sense of ‘color’ is not something ‘out there’: colors, like red or brown, are experiences in our eye and our brain. There are ‘pigments’ in the cones (cones are the receptors that work only if the intensity of the light is high enough) and these pigments absorb the light spectrum somewhat differently, as a result of which we ‘see’ color. Different animals see different things. For example, a bee can distinguish between white paper using zinc white versus lead white, because they reflect light differently in the ultraviolet spectrum, which the bee can see but we don’t. Bees can also tell the direction of the sun without seeing the sun itself, because they are sensitive to polarized light, and the scattered light of the sky (i.e. the blue sky as we see it) is polarized. The bee can also notice flicker up to 200 oscillations per second, while we see it only up to 20, because our averaging time is like a tenth of a second, which is short for us, but so the averaging time of the bee is much shorter. So we cannot see the quick leg movements and/or wing vibrations of bees, but the bee can!
Sometimes we can’t see any color. For example, we see the night sky in ‘black and white’ because the light intensity is very low, and so it’s our rods, not the cones, that process the signal, and so these rods can’t ‘see’ color. So those beautiful color pictures of nebulae are not artificial (although the pictures are often enhanced). It’s just that the camera that is used to take those pictures (film or, nowadays, digital) is much more sensitive than our eye.
Regardless, color is a quality which we add to our experience of the outside world ourselves. What’s out there are electromagnetic waves with this or that wavelength (or, what amounts to the same, this or that frequency). So when critics of the exact sciences say so much is lost when looking at (visible) light as an electromagnetic wave in the range of 430 to 790 terahertz, they’re wrong. Those critics will say that physics reduces reality. That is not the case.
What’s going on is that our senses process the signal that they are receiving, especially when it comes to vision. As Feynman puts it: “None of the other senses involves such a large amount of calculation, so to speak, before the signal gets into a nerve that one can make measurements on. The calculations for all the rest of the senses usually happen in the brain itself, where it is very difficult to get at specific places to make measurements, because there are so many interconnections. Here, with the visual sense, we have the light, three layers of cells making calculations, and the results of the calculations being transmitted through the optic nerve.”
Hence, things like color and all of the other sensations that we have are the object of study of other sciences, including biochemistry and neurobiology, or physiology. For all we know, what’s ‘out there’ is, effectively, just ‘boring’ stuff, like electromagnetic radiation, energy and ‘elementary particles’—whatever they are. No colors. Just frequencies. 🙂
Light versus matter
If we accept the crazy ideas of quantum mechanics, then the what and the how become one and the same. Hence we can say that photons and electrons are a wavefunction somewhere in space. Photons, of course, are always traveling, because they have energy but no rest mass. Hence, all their energy is in the movement: it’s kinetic, not potential. Electrons, on the other hand, usually stick around some nucleus. And, let’s not forget, they have an electric charge, so their energy is not only kinetic but also potential.
But, otherwise, it’s the same type of ‘thing’ in quantum mechanics: a wavefunction, like those below.
Why diagram A and B? It’s just to emphasize the difference between a real-valued wave function and those ‘probability waves’ we’re looking at here (diagram C to H). A and B represent a mass on a spring, oscillating at more or less the same frequency but a different amplitude. The amplitude here means the displacement of the mass. The function describing the displacement of a mass on a spring (so that’s diagram A and B) is an example of a real-valued wave function: it’s a simple sine or cosine function, as depicted below. [Note that a sine and a cosine are the same function really, except for a phase difference of 90°.]
Let’s now go back to our ‘probability waves’. Photons and electrons, light and matter… The same wavefunction? Really? How can the sunlight that warms us up in the morning and makes trees grow be the same as our body, or the tree? The light-matter duality that we experience must be rooted in very different realities, mustn’t it?
Well… Yes and no. If we’re looking at one photon or one electron only, it’s the same type of wavefunction indeed. The same type… OK, you’ll say. So they are the same family or genus perhaps, as they say in biology. Indeed, both of them are, obviously, being referred to as ‘elementary particles’ in the so-called Standard Model of physics. But so what makes an electron and a photon specific as a species? What are the differences?
There’re quite a few, obviously:
1. First, as mentioned above, a photon is a traveling wave function and, because it has no rest mass, it travels at the ultimate speed, i.e. the speed of light (c). An electron usually sticks around or, if it travels through a wire, it travels at very low speeds. Indeed, you may find it hard to believe, but the drift velocity of the free electrons in a standard copper wire is measured in cm per hour, so that’s very slow indeed—and while the electrons in an electron microscope beam may be accelerated up to 70% of the speed of light, and close to c in those huge accelerators, you’re not likely to find an electron microscope or accelerator in Nature. In fact, you may want to remember that a simple thing like electricity going through copper wires in our houses is a relatively modern invention. 🙂
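If the ‘cm per hour’ claim sounds unbelievable, it can be checked with the textbook relation v = I/(n·A·e). The sketch below is a rough back-of-the-envelope calculation, not a measurement: the current (1 A) and the wire cross-section (1 mm²) are values I assumed for illustration, while the free-electron density of copper is the usual textbook figure.

```python
# Drift velocity of conduction electrons: v = I / (n * A * e).
ELECTRON_DENSITY_COPPER = 8.5e28     # free electrons per m^3 (textbook value)
ELEMENTARY_CHARGE = 1.602176634e-19  # coulomb

current = 1.0           # ampere (assumed for illustration)
cross_section = 1.0e-6  # m^2, i.e. 1 mm^2 (assumed for illustration)

drift_velocity = current / (ELECTRON_DENSITY_COPPER * cross_section * ELEMENTARY_CHARGE)
drift_cm_per_hour = drift_velocity * 100.0 * 3600.0  # m/s -> cm/hour
print(f"{drift_cm_per_hour:.0f} cm per hour")
```

With these assumed numbers the result is a few tens of centimeters per hour, so the order of magnitude quoted above holds up.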
So, yes, those oscillating wave functions in those diagrams above are likely to represent some electron, rather than a photon. To be precise, the wave functions above are examples of standing (or stationary) waves, while a photon is a traveling wave: just extend that sine and cosine function in both directions if you’d want to visualize it or, even better, think of a sine and cosine function in an envelope traveling through space, such as the one depicted below.
Indeed, while the wave function of our photon is traveling through space, it is likely to be limited in space because, when everything is said and done, our photon is not everywhere: it must be somewhere.
At this point, it’s good to pause and think about what is traveling through space. It’s the oscillation. But what’s the oscillation? There is no medium here, and even if there would be some medium (like water or air or something like aether—which, let me remind you, isn’t there!), the medium itself would not be moving, or – I should be precise here – it would only move up and down as the wave propagates through space, as illustrated below. To be fully complete, I should add we also have longitudinal waves, like sound waves (pressure waves): in that case, the particles oscillate back and forth along the direction of wave propagation. But you get the point: the medium does not travel with the wave.
When talking electromagnetic waves, we have no medium. These E and B vectors oscillate, but it is very wrong to assume they use ‘some core of nearby space’, as Feynman puts it. They don’t. Those field vectors represent a condition at one specific point (admittedly, a point along the direction of travel) in space but, for all we know, an electromagnetic wave travels in a straight line and, hence, we can’t talk about its diameter or so.
Still, as mentioned above, we can imagine, more or less, what E and B stand for (we can use field lines to visualize them, for instance), even if we have to take into account their relativity (calculating their values from a moving reference frame results in different mixtures of E and B). But what are those amplitudes? How should we visualize them?
The honest answer is: we can’t. They are what they are: two mathematical quantities which, taken together, form a two-dimensional vector, which we square to find a value for a real-life probability, which is something that – unlike the amplitude concept – does make sense to us. Still, that representation of a photon above (i.e. the traveling envelope with a sine and cosine inside) may help us to ‘understand’ it somehow. Again, you absolutely have to get rid of the idea that these ‘oscillations’ would somehow occupy some physical space. They don’t. The wave itself has some definite length, for sure, but that’s a measurement in the direction of travel, which is often denoted as x when discussing uncertainty in its position, for example, as in the famous Uncertainty Principle (ΔxΔp ≥ ħ/2).
You’ll say: Oh! But then, at the very least, we can talk about the ‘length’ of a photon, can’t we? So then a photon is one-dimensional at least, not zero-dimensional! The answer is yes and no. I’ve talked about this before and so I’ll be short(er) on it now. A photon is emitted by an atom when an electron jumps from one energy level to another. It thereby emits a wave train that lasts about 10⁻⁸ seconds. That’s not very long but, taking into account the rather spectacular speed of light (3×10⁸ m/s), that still makes for a wave train with a length of not less than 3 meters. […] That’s quite a length, you’ll say. You’re right. But you forget that light travels at the speed of light and, hence, we will see this length as zero because of the relativistic length contraction effect. So… Well… Let me get back to the question: if photons and electrons are both represented by a wavefunction, what makes them different?
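The length of the wave train quoted above is simple arithmetic: speed times duration. A minimal sketch, using the rounded values from the text (the variable names are mine):

```python
SPEED_OF_LIGHT = 3.0e8  # m/s, rounded as in the text
EMISSION_TIME = 1.0e-8  # s, duration of the wave train as quoted in the text

# Length of the wave train in the frame of the emitting atom.
wave_train_length = SPEED_OF_LIGHT * EMISSION_TIME
print(f"{wave_train_length:.1f} m")
```

That gives the 3 meters mentioned above.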
2. A more fundamental difference between photons and electrons is how they interact with each other.
From what I’ve written above, you understand that probability amplitudes are complex numbers, or ‘arrows’, or ‘two-dimensional vectors’. [Note that all of these terms have precise mathematical definitions and so they’re actually not the same, but the difference is too subtle to matter here.] Now, there are two ways of combining amplitudes, which are referred to as ‘positive’ and ‘negative’ interference respectively. I should immediately note that there’s actually nothing ‘positive’ or ‘negative’ about the interaction: we’re just putting two arrows together, and there are two ways to do that. That’s all.
The diagrams below show you these two ways. You’ll say: there are four! However, remember that we square an arrow to get a probability. Hence, the direction of the final arrow doesn’t matter when we’re taking the square: we get the same probability. It’s the direction of the individual amplitudes that matters when combining them. So the square of A+B is the same as the square of –(A+B) = –A+(–B) = –A–B. Likewise, the square of A–B is the same as the square of –(A–B) = –A+B.
These are the only two logical possibilities for combining arrows. I’ve written ad nauseam about this elsewhere: see my post on amplitudes and statistics, and so I won’t go into too much detail here. Or, in case you’d want something less than a full mathematical treatment, I can refer you to my previous post also, where I talked about the ‘stopwatch’ and the ‘phase’: the convention for the stopwatch is to have its hand turn clockwise (obviously!) while, in quantum physics, the phase of a wave function will turn counterclockwise. But so that’s just convention and it doesn’t matter, because it’s the phase difference between two amplitudes that counts. To use plain language: it’s the difference in the angles of the arrows, and so that difference is just the same if we reverse the direction of both arrows (which is equivalent to putting a minus sign in front of the final arrow).
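The arrow arithmetic is easy to check with Python’s built-in complex numbers. This is only a sketch: the two arrows A and B below are arbitrary values I picked, and `probability` is a hypothetical helper name. It shows that flipping the final arrow, or turning both arrows by the same angle, leaves the probability unchanged, while adding versus subtracting does not.

```python
import cmath

def probability(amplitude):
    """Square of the length of the final arrow."""
    return abs(amplitude) ** 2

# Two arbitrary arrows, given as (length, angle in radians). Assumed values.
A = cmath.rect(0.6, 0.4)
B = cmath.rect(0.3, 1.9)

p_sum = probability(A + B)
p_sum_flipped = probability(-(A + B))   # same as -A - B
p_diff = probability(A - B)
p_diff_flipped = probability(-(A - B))  # same as -A + B

# Turning both arrows by the same angle keeps their phase difference intact.
turn = cmath.rect(1.0, 2.0)
p_turned = probability(turn * A + turn * B)

print(p_sum, p_sum_flipped, p_turned)  # three equal values
print(p_diff, p_diff_flipped)          # two equal values, different from p_sum
```

So only the relative angle between the arrows matters, which is why the clockwise or counterclockwise convention makes no difference.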
OK. Let me get back to the lesson. The point is: this logical or mathematical dichotomy distinguishes bosons (i.e. force-carrying ‘particles’, like photons, which carry the electromagnetic force) from fermions (i.e. ‘matter-particles’, such as electrons and quarks, which make up protons and neutrons). Indeed, the so-called ‘positive’ and ‘negative’ interference leads to two very different behaviors:
- The probability of getting a boson where there are already n present, is n+1 times stronger than it would be if there were none before.
- In contrast, the probability of getting two electrons into exactly the same state is zero.
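A toy sketch of the two rules, under simplifying assumptions of my own: two identical particles, a ‘direct’ and an ‘exchanged’ amplitude, added for bosons and subtracted for fermions. The names `combined_amplitude` and `boson_enhancement` are hypothetical, and the amplitudes are arbitrary numbers. It does not attempt the careful counting of final states that a full treatment needs.

```python
def combined_amplitude(direct, exchanged, kind):
    """Add the two arrows for bosons, subtract them for fermions."""
    if kind == "boson":
        return direct + exchanged
    if kind == "fermion":
        return direct - exchanged
    raise ValueError("kind must be 'boson' or 'fermion'")

# Arbitrary single-particle amplitudes (assumed values).
a = complex(0.3, 0.4)
b = complex(0.1, -0.2)

# When both particles end up in exactly the same state, the direct and the
# exchanged amplitude are one and the same product.
direct = a * b
exchanged = b * a

fermion_amplitude = combined_amplitude(direct, exchanged, "fermion")
boson_amplitude = combined_amplitude(direct, exchanged, "boson")

def boson_enhancement(n_present):
    """Factor by which n bosons already present raise the probability of one more."""
    return n_present + 1

print(abs(fermion_amplitude) ** 2)  # zero: the two arrows cancel
print(abs(boson_amplitude) / abs(direct))  # the arrow is twice as long
print(boson_enhancement(3))
```

The fermion arrow vanishes identically, which is the exclusion rule in miniature.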
The behavior of photons makes lasers possible: we can pile zillions of photons on top of each other, and then release all of them in one powerful burst. [The ‘flickering’ of a laser beam is due to the quick succession of such light bursts. If you want to know how it works in detail, check my post on lasers.]
The behavior of electrons is referred to as Pauli’s exclusion principle: it is only because real-life electrons can have one of two spin polarizations (i.e. two opposite directions of angular momentum, which are referred to as ‘up’ or ‘down’, but they might as well have been referred to as ‘left’ or ‘right’) that we find two electrons (instead of just one) in any atomic or molecular orbital.
So, yes, while both photons and electrons can be described by a similar-looking wave function, their behavior is fundamentally different indeed. How is that possible? Adding and subtracting ‘arrows’ is a very similar operation, isn’t it?
It is and it isn’t. From a mathematical point of view, I’d say: yes. From a physics point of view, it’s obviously not very ‘similar’, as it does lead to these two very different behaviors: the behavior of photons allows for laser shows, while the behavior of electrons explains (almost) all the peculiarities of the material world, including us walking into doors. 🙂 If you want to check it out for yourself, just check Feynman’s Lectures for more details on this or, else, re-read my posts on it indeed.
3. Of course, there are even more differences between photons and electrons than the two key differences I mentioned above. Indeed, I’ve simplified a lot when I wrote what I wrote above. The wavefunctions of electrons in orbit around a nucleus can take very weird shapes, as shown in the illustration below—and please do google a few others if you’re not convinced. As mentioned above, they’re so-called standing waves, because they occupy a well-defined position in space only, but standing waves can look very weird. In contrast, traveling plane waves, or envelope curves like the one above, are much simpler.
In short: yes, the mathematical representation of photons and electrons (i.e. the wavefunction) is very similar, but photons and electrons are very different animals indeed.
Potentiality and interconnectedness
I guess that, by now, you agree that quantum theory is weird but, as you know, quantum theory does explain all of the stuff that couldn’t be explained before: “It works like a charm”, as Feynman puts it. In fact, he’s often quoted as having said the following:
“It is often stated that of all the theories proposed in this century, the silliest is quantum theory. Some say the only thing that quantum theory has going for it, in fact, is that it is unquestionably correct.”
Silly? Crazy? Uncommon-sensy? Truth be told, you do get used to thinking in terms of amplitudes after a while. And, when you get used to them, those ‘complex’ numbers are no longer complicated. 🙂 Most importantly, when one thinks long and hard enough about it (as I am trying to do), it somehow all starts making sense.
For example, we’ve done away with dualism by adopting a unified mathematical framework, but the distinction between bosons and fermions still stands: an ‘elementary particle’ is either this or that. There are no ‘split personalities’ here. So the dualism just pops up at a different level of description, I’d say. In fact, I’d go one step further and say it pops up at a deeper level of understanding.
But what about the other assumptions in quantum mechanics? Some of them don’t make sense, do they? Well… I struggled for quite a while with the assumption that, in quantum mechanics, anything is possible really. For example, a photon (or an electron) can take any path in space, and it can travel at any speed (including speeds that are lower or higher than light). The probability may be extremely low, but it’s possible.
Now that is a very weird assumption. Why? Well… Think about it. If you enjoy watching soccer, you’ll agree that flying objects (I am talking about the soccer ball here) can have amazing trajectories. Spin, lift, drag, whatever—the result is a weird trajectory, like the one below:
But, frankly, a photon taking the ‘southern’ route in the illustration below? What are the ‘wheels and gears’ there? There’s nothing sensible about that route, is there?
In fact, there are at least three issues here:
- First, you should note that strange curved paths in the real world (such as the trajectories of billiard or soccer balls) are possible only because there’s friction involved—between the felt of the pool table cloth and the ball, or between the balls, or, in the case of soccer, between the ball and the air. There’s no friction in the vacuum. Hence, in empty space, all things should go in a straight line only.
- While it’s quite amazing what’s possible, in the real world that is, in terms of ‘weird trajectories’, even the weirdest trajectories of a billiard or soccer ball can be described by a ‘nice’ mathematical function. We obviously can’t say the same of that ‘southern route’ which a photon could follow, in theory that is. Indeed, you’ll agree the function describing that trajectory cannot be ‘nice’. So even if we’d allow all kinds of ‘weird’ trajectories, shouldn’t we limit ourselves to ‘nice’ trajectories only? I mean: it doesn’t make sense to allow the photons traveling from your computer screen to your retina to take some trajectory to the Sun and back, does it?
- Finally, and most fundamentally perhaps, even when we would assume that there’s some mechanism combining (a) internal ‘wheels and gears’ (such as spin or angular momentum) with (b) felt or air or whatever medium to push against, what would be the mechanism determining the choice of the photon in regard to these various paths? In Feynman’s words: How does the photon ‘make up its mind’?
Feynman answers these questions, fully or partially (I’ll let you judge), when discussing the double-slit experiment with photons:
“Saying that a photon goes this or that way is false. I still catch myself saying, “Well, it goes either this way or that way,” but when I say that, I have to keep in mind that I mean in the sense of adding amplitudes: the photon has an amplitude to go one way, and an amplitude to go the other way. If the amplitudes oppose each other, the light won’t get there—even though both holes are open.”
It’s probably worth recalling the results of that experiment here, if only to help you judge whether or not Feynman fully answers those questions above!
The set-up is shown below. We have a source S, two slits (A and B), and a detector D. The source sends photons out, one by one. In addition, we have two special detectors near the slits, which may or may not detect a photon, depending on whether or not they’re switched on as well as on their accuracy.
First, we close one of the slits, and we find that 1% of the photons goes through the other (so that’s one photon for every 100 photons that leave S). Now, we open both slits to study interference. You know the results already:
- If we switch the detectors off (so we have no way of knowing where the photon went), we get interference. The interference pattern depends on the distance between A and B and varies from 0% to 4%, as shown in diagram (a) below. That’s pretty standard. As you know, classical theory can explain that too, assuming light is an electromagnetic wave. But so we have blobs of energy – photons – traveling one by one. So it’s really that double-slit experiment with electrons, or whatever other microscopic particles (as you know, they’ve done these interference experiments with large molecules as well, and they get the same result!). We get the interference pattern by using those quantum-mechanical rules to calculate probabilities: we first add the amplitudes, and it’s only when we’re finished adding those amplitudes, that we square the resulting arrow to get the final probability.
- If we switch those special detectors on, and if they are 100% reliable (i.e. all photons going through are being detected), then our photon suddenly behaves like a particle, instead of as a wave: it will go through one of the slits only, i.e. either through A, or, alternatively, through B. So the two special detectors never go off together. Hence, as Feynman puts it: we shouldn’t think there is “sneaky way that the photon divides in two and then comes back together again.” It’s one or the other way, and there’s no interference: the detector at D goes off 2% of the time, which is the simple sum of the probabilities for A and B (i.e. 1% + 1%).
- When the special detectors near A and B are not 100% reliable (and, hence, do not detect all photons going through), we have three possible final conditions: (i) A and D go off, (ii) B and D go off, and (iii) D goes off alone (none of the special detectors went off). In that case, we have a final curve that’s a mixture, as shown in diagrams (c) and (d) below. We get it using the same quantum-mechanical rules: we add amplitudes first, and then we square to get the probabilities.
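The three cases above can be reproduced with a few lines of Python. This is a simplified sketch under my own assumptions: both slit detectors have the same efficiency, the two amplitudes have equal length (0.1, so each slit alone gives 1%), and the only thing that varies across the screen is the phase difference between the two arrows. The function name `detection_probability` is hypothetical.

```python
import cmath
import math

SINGLE_SLIT_PROBABILITY = 0.01                     # 1% through one open slit
ARROW_LENGTH = math.sqrt(SINGLE_SLIT_PROBABILITY)  # length of each arrow: 0.1

def detection_probability(phase_difference, detector_efficiency=0.0):
    """Chance that D clicks, for a given phase difference between the two paths.

    detector_efficiency is the fraction of photons caught by the slit detectors
    (0.0 = switched off, 1.0 = fully reliable). Assumed equal for A and B.
    """
    via_a = ARROW_LENGTH
    via_b = ARROW_LENGTH * cmath.exp(1j * phase_difference)
    # Path unknown: add the arrows first, then square.
    interfering = abs(via_a + via_b) ** 2
    # Path known: square first, then add the probabilities.
    path_known = abs(via_a) ** 2 + abs(via_b) ** 2
    return (1.0 - detector_efficiency) * interfering + detector_efficiency * path_known

print(detection_probability(0.0))           # detectors off, arrows aligned: about 4%
print(detection_probability(math.pi))       # detectors off, arrows opposed: about 0%
print(detection_probability(math.pi, 1.0))  # reliable detectors: about 2% everywhere
print(detection_probability(0.0, 0.5))      # unreliable detectors: a mixture
```

The last line is the ‘mixture’ curve: part of the time the arrows are added before squaring, part of the time they are not.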
Now, I think you’ll agree with me that Feynman doesn’t answer my (our) question in regard to the ‘weird paths’. In fact, all of the diagrams he uses assume straight or nearby paths. Let me re-insert two of those diagrams below, to show you what I mean.
So where are all the strange non-linear paths here? Let me, in order to make sure you get what I am saying here, insert that illustration with the three crazy routes once again. What we’ve got above (Figure 33 and 34) is not like that. Not at all: we’ve got only straight lines there! Why? The answer to that question is easy: the crazy paths don’t matter because their amplitudes cancel each other out, and so that allows Feynman to simplify the whole situation and show all the relevant paths as straight lines only.
Now, I struggled with that for quite a while. Not because I can’t see the math or the geometry involved. No. Feynman does a great job showing why those amplitudes cancel each other out indeed (if you want a summary, see my previous post once again). My ‘problem’ is something else. It’s hard to phrase it, but let me try: why would we even allow for the logical or mathematical possibility of ‘weird paths’ (and let me again insert that stupid diagram below) if our ‘set of rules’ ensures that the truly ‘weird’ paths (like that photon traveling from your computer screen to your eye doing a detour taking it to the Sun and back) cancel each other out anyway? Does that respect Occam’s Razor? Can’t we devise some theory including ‘sensible’ paths only?
Of course, I am just an autodidact with limited time, and I know hundreds (if not thousands) of the best scientists have thought long and hard about this question and, hence, I readily accept the answer is quite simply: no. There is no better theory. I accept that answer, ungrudgingly, not only because I think I am not so smart as those scientists but also because, as I pointed out above, one can’t explain any path that deviates from a straight line really, as there is no medium, so there are no ‘wheels and gears’. The only path that makes sense is the straight line, and that’s only because…
Well… Thinking about it… We think the straight path makes sense because we have no good theory for any of the other paths. Hmm… So, from a logical point of view, assuming that the straight line is the only reasonable path is actually pretty random too. When push comes to shove, we have no good theory for the straight line either!
You’ll say I’ve just gone crazy. […] Well… Perhaps you’re right. 🙂 But… Somehow, it starts to make sense to me. We allow for everything and then weed out the crazy paths using our interference theory, and so we do end up with what we’re ending up with: some kind of vague idea of “light not really traveling in a straight line but ‘smelling’ all of the neighboring paths around it and, hence, using a small core of nearby space”, as Feynman puts it.
Hmm… It brings me back to Richard Feynman’s introduction to his wonderful little book, in which he says we should just be happy to know how Nature works and not aspire to know why it works that way. In fact, he’s basically saying that, when it comes to quantum mechanics, the ‘how’ and the ‘why’ are one and the same, so asking ‘why’ doesn’t make sense, because we know ‘how’. He compares quantum theory with the system of calculation used by the Maya priests, which was based on a system of bars and dots, which helped them to do complex multiplications and divisions, for example. He writes the following about it: “The rules were tricky, but they were a much more efficient way of getting an answer to complicated questions (such as when Venus would rise again) than by counting beans.”
When I first read this, I thought the comparison was flawed: if a common Maya Indian did not want to use the ‘tricky’ rules of multiplication and what have you (or, more likely, if he didn’t understand them), he or she could still resort to counting beans. But how do we count beans in quantum mechanics? We have no ‘simpler’ rules than those weird rules about adding amplitudes and taking the (absolute) square of complex numbers so… Well… We actually are counting beans here then:
- We allow for any possibility—any path: straight, curved or crooked. Anything is possible.
- But all those possibilities are inter-connected. Also note that every path has a mirror image: for every route ‘south’, there is a similar route ‘north’, so to say, except for the straight line, which is a mirror image of itself.
- And then we have some clock ticking. Time goes by. It ensures that the paths that are too far removed from the straight line cancel each other. [Of course, you’ll ask: what is too far? But I answered that question – convincingly, I hope – in my previous post: it’s not about the ‘number of arrows’ (as suggested in the caption under that Figure 34 above), but about the frequency and, hence, the ‘wavelength’ of our photon.]
- And so… Finally, what’s left is a limited number of possibilities that interfere with each other, which results in what we ‘see’: light seems to use a small core of space indeed–a limited number of nearby paths.
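This ‘bean counting’ can be imitated numerically. The sketch below is a toy model with assumptions of my own: source at (−1, 0), detector at (1, 0), each path made of two straight segments through a midpoint (0, y), an arbitrary wavelength of 0.01 in the same units, and one arrow per path whose angle is set by the path length. The function names are hypothetical.

```python
import cmath
import math

WAVELENGTH = 0.01  # arbitrary units (assumed); smaller means faster-turning arrows

def path_length(y):
    """Length of the two-segment path from (-1, 0) to (1, 0) via (0, y)."""
    return 2.0 * math.hypot(1.0, y)

def summed_arrows(y_min, y_max, step=1e-4):
    """Add one arrow per path for midpoints between y_min and y_max."""
    count = round((y_max - y_min) / step)
    total = 0j
    for i in range(count):
        y = y_min + (i + 0.5) * step
        angle = 2.0 * math.pi * path_length(y) / WAVELENGTH
        total += cmath.exp(1j * angle) * step
    return total

near_straight = summed_arrows(-0.2, 0.2)  # paths close to the straight line
far_detour = summed_arrows(0.5, 1.0)      # a wider band of 'crazy' paths

print(abs(near_straight), abs(far_detour))
```

Even though the band of detours contains more paths, its arrows point in all directions and largely cancel, so the final arrow is dominated by the paths near the straight line. Shrinking `WAVELENGTH` makes that core narrower.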
You’ll say… Well… That still doesn’t ‘explain’ why the interference pattern disappears with those special detectors or – what amounts to the same – why the special detectors at the slits never click simultaneously.
You’re right. How do we make sense of that? I don’t know. You should try to imagine what happens for yourself. Everyone has his or her own way of ‘conceptualizing’ stuff, I’d say, and you may well be content and just accept all of the above without trying to ‘imagine’ what’s happening really when a ‘photon’ goes through one or both of those slits. In fact, that’s the most sensible thing to do. You should not try to imagine what happens and just follow the crazy calculus rules.
However, when I think about it, I do have some image in my head. The image is of one of those ‘touch-me-not’ weeds. I quickly googled one of these images, but I couldn’t quite find what I am looking for: it would be more like something that, when you touch it, curls up in a little ball. Any case… You know what I mean, I hope.
You’ll shake your head now and solemnly confirm that I’ve gone mad. Touch-me-not weeds? What’s that got to do with photons?
Well… It’s obvious you and I cannot really imagine what a photon looks like. But I think of it as a blob of energy indeed, which is inseparable, and which effectively occupies some space (in three dimensions that is). I also think that, whatever it is, it actually does travel through both slits, because, as it interferes with itself, the interference pattern does depend on the space between the two slits as well as the width of those slits. In short, the whole ‘geometry’ of the situation matters, and so the ‘interaction’ is some kind of ‘spatial’ thing. [Sorry for my awfully imprecise language here.]
Having said that, I think it’s being detected by one detector only because only one of them can sort of ‘hook’ it, somehow. Indeed, because it’s interconnected and inseparable, it’s the whole blob that gets hooked, not just one part of it. [You may or may not imagine that the detector that’s got the best hold of it gets it, but I think that’s pushing the description too much.] In any case, the point is that a photon is surely not like a lizard dropping its tail while trying to escape. Perhaps it’s some kind of unbreakable ‘string’ indeed – and sorry for summarizing string theory so unscientifically here – but then a string oscillating in dimensions we can’t imagine (or in some dimension we can’t observe, like the Kaluza-Klein theory suggests). It’s something, for sure, and something that stores energy in some kind of oscillation, I think.
What it is, exactly, we can’t imagine, and we’ll probably never find out—unless we accept that the how of quantum mechanics is not only the why, but also the what. 🙂
Does this make sense? Probably not but, if anything, I hope it fired your imagination at least. 🙂
My previous post was tough. Tough for you–if you’ve read it. But tough for me too. 🙂
The blackbody radiation problem is complicated but, when everything is said and done, what the analysis says is that the ‘equipartition theorem’ in the kinetic theory of gases (or the ‘theorem concerning the average energy of the center-of-mass motion’, as Feynman terms it) is not correct. That equipartition theorem basically states that, in thermal equilibrium, energy is shared equally among all of its various forms. For example, the average kinetic energy per degree of freedom in the translational motion of a molecule should equal that of its rotational motions. That equipartition theorem is also quite precise: it states that the mean energy, for each atom or molecule, for each degree of freedom, is kT/2. Hence, that’s the (average) energy the 19th century scientists also assigned to the atomic oscillators in a gas.
However, the discrepancy between the theoretical and empirical result of their work shows that adding atomic oscillators – as radiators and absorbers of light – to the system (a box of gas that’s being heated) is not just a matter of adding an additional ‘degree of freedom’ to the system. It can’t be analyzed in ‘classical’ terms: the actual spectrum of blackbody radiation shows that these atomic oscillators do not absorb, on average, an amount of energy equal to kT/2. Hence, they are not just another ‘independent direction of motion’.
So what are they then? Well… Who knows? I don’t. But, as I didn’t quite go through the full story in my previous post, the least I can do is to try to do that here. It should be worth the effort. In Feynman’s words: “This was the first quantum-mechanical formula ever known, or discussed, and it was the beautiful culmination of decades of puzzlement.” And then it does not involve complex numbers or wave functions, so that’s another reason why looking at the detail is kind of nice. 🙂
Discrete energy levels and the nature of h
To solve the blackbody radiation problem, Planck assumed that the permitted energy levels of the atomic harmonic oscillator were equally spaced, at ‘distances’ ħω₀ apart from each other. That’s what’s illustrated below.
Now, I don’t want to make too many digressions from the main story, but this Eₙ = nħω₀ formula obviously deserves some attention. First note it immediately shows why the dimension of ħ is expressed in joule-seconds (J·s), or electronvolt-seconds (eV·s): we’re multiplying it with a frequency indeed, so that’s something expressed per second (hence, its dimension is s⁻¹) in order to get a measure of energy: joules or, because of the atomic scale, electronvolts. [The eV is just a (much) smaller measure than the joule, but it amounts to the same: 1 eV ≈ 1.6×10⁻¹⁹ J.]
One thing to note is that the equal spacing consists of distances equal to ħω₀, not of ħ. Hence, while h, or ħ (ħ is the constant to be used when the frequency is expressed in radians per second, rather than oscillations per second, so ħ = h/2π) is now being referred to as the quantum of action (das elementare Wirkungsquantum in German), Planck referred to it as a Hilfsgrösse only (that’s why he chose the h as a symbol, it seems), so that’s an auxiliary constant only: the actual quantum of action is, of course, ΔE, i.e. the difference between the various energy levels, which is the product of ħ and ω₀ (or of h and ν₀ if we express frequency in oscillations per second, rather than in angular frequency). Hence, Planck (and later Einstein) did not assume that an atomic oscillator emits or absorbs packets of energy as tiny as ħ or h, but packets of energy as big as ħω₀ or, what amounts to the same (ħω = (h/2π)(2πν) = hν), hν₀. Just to give an example, the frequency of sodium light (ν) is 500×10¹² Hz, and so its energy is E = hν. That’s not a lot – about 2 eV only – but it still packs 500×10¹² ‘quanta of action’!
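The sodium-light example is quick to verify. A minimal sketch, using the defined SI values of h and the electron charge (the variable names are mine), which also checks that ħω and hν are the same thing:

```python
import math

PLANCK_H = 6.62607015e-34     # J·s (exact SI value)
JOULE_PER_EV = 1.602176634e-19  # 1 eV expressed in joules (exact SI value)

frequency = 500e12  # Hz, sodium light as quoted in the text

energy_joule = PLANCK_H * frequency
energy_ev = energy_joule / JOULE_PER_EV

# Same energy via the angular frequency: hbar * omega = (h / 2pi) * (2pi * nu).
hbar = PLANCK_H / (2.0 * math.pi)
omega = 2.0 * math.pi * frequency
energy_from_hbar = hbar * omega

print(f"E = {energy_ev:.2f} eV")
```

The result is just over 2 eV, in line with the ‘about 2 eV’ in the text.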
Another thing is that ω (or ν) is a continuous variable: hence, the assumption of equally spaced energy levels does not imply that energy itself is a discrete variable: light can have any frequency and, hence, we can also imagine photons with any energy level: the only thing we’re saying is that the energy of a photon of a specific color (i.e. a specific frequency ν) will be a multiple of hν.
The second key assumption of Planck as he worked towards a solution of the blackbody radiation problem was that the probability (P) of occupying a level of energy E is P(E) = αe^(−E/kT). OK… Why not? But what is this assumption really? You’ll think of some ‘bell curve’, of course. But… No. That wouldn’t make sense. Remember that the energy has to be positive. The general shape of this P(E) curve is shown below.
The highest probability density is near E = 0, and then it goes down as E gets larger, with kT determining the slope of the curve (just take the derivative). In short, this assumption basically states that higher energy levels are not so likely, and that very high energy levels are very unlikely. Indeed, this formula implies that the relative chance, i.e. the probability of being in state E₁ relative to the chance of being in state E₀, is P₁/P₀ = e^(−(E₁–E₀)/kT) = e^(−ΔE/kT). Now, P₁ is n₁/N and P₀ is n₀/N and, hence, we find that n₁ must be equal to n₀e^(−ΔE/kT). What this means is that the atomic oscillator is less likely to be in a higher energy state than in a lower one.
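To get a feel for that ratio, here is a small sketch. The function name `population_ratio` and the temperature of 300 K are my own choices for illustration:

```python
import math

BOLTZMANN_K = 1.380649e-23  # J/K (exact SI value)

def population_ratio(delta_e, temperature):
    """n1/n0 for two levels separated by delta_e (in joules) at a given temperature."""
    return math.exp(-delta_e / (BOLTZMANN_K * temperature))

temperature = 300.0              # kelvin (assumed, roughly room temperature)
kT = BOLTZMANN_K * temperature

ratio_one_kT = population_ratio(1.0 * kT, temperature)  # gap equal to kT
ratio_ten_kT = population_ratio(10.0 * kT, temperature) # gap ten times larger

print(ratio_one_kT, ratio_ten_kT)
```

A gap of kT already cuts the population to roughly a third, and a gap of 10 kT makes the upper level almost empty.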
That makes sense, doesn’t it? I mean… I don’t want to criticize those 19th century scientists but… What were they thinking? Did they really imagine that infinite energy levels were as likely as… Well… More down-to-earth energy levels? I mean… A mechanical spring will break when you overload it. Hence, I’d think it’s pretty obvious those atomic oscillators cannot be loaded with just about anything, can they? Garbage in, garbage out: of course, that theoretical spectrum of blackbody radiation didn’t make sense!
Let me copy Feynman now, as the rest of the story is pretty straightforward:
Now, we have a lot of oscillators here, and each is a vibrator of frequency ω₀. Some of these vibrators will be in the bottom quantum state, some will be in the next one, and so forth. What we would like to know is the average energy of all these oscillators. To find out, let us calculate the total energy of all the oscillators and divide by the number of oscillators. That will be the average energy per oscillator in thermal equilibrium, and will also be the energy that is in equilibrium with the blackbody radiation and that should go in the equation for the intensity of the radiation as a function of the frequency, instead of kT. [See my previous post: that equation is I(ω) = (ω²kT)/(π²c²).]
Thus we let N₀ be the number of oscillators that are in the ground state (the lowest energy state); N₁ the number of oscillators in the state E₁; N₂ the number that are in state E₂; and so on. According to the hypothesis (which we have not proved) that in quantum mechanics the law that replaced the probability e^(−P.E./kT) or e^(−K.E./kT) in classical mechanics is that the probability goes down as e^(−ΔE/kT), where ΔE is the excess energy, we shall assume that the number N₁ that are in the first state will be the number N₀ that are in the ground state, times e^(−ħω/kT). Similarly, N₂, the number of oscillators in the second state, is N₂ = N₀e^(−2ħω/kT). To simplify the algebra, let us call e^(−ħω/kT) = x. Then we simply have N₁ = N₀x, N₂ = N₀x², …, Nₙ = N₀xⁿ.
The total energy of all the oscillators must first be worked out. If an oscillator is in the ground state, there is no energy. If it is in the first state, the energy is ħω, and there are N₁ of them. So N₁ħω, or ħωN₀x is how much energy we get from those. Those that are in the second state have 2ħω, and there are N₂ of them, so N₂⋅2ħω = 2ħωN₀x² is how much energy we get, and so on. Then we add it all together to get E_tot = N₀ħω(0+x+2x²+3x³+…).
And now, how many oscillators are there? Of course, N₀ is the number that are in the ground state, N₁ in the first state, and so on, and we add them together: N_tot = N₀(1+x+x²+x³+…). Thus the average energy is ⟨E⟩ = E_tot/N_tot = ħω(0+x+2x²+3x³+…)/(1+x+x²+x³+…) = ħω/(e^(ħω/kT) − 1).
Feynman concludes as follows: “This, then, was the first quantum-mechanical formula ever known, or ever discussed, and it was the beautiful culmination of decades of puzzlement. Maxwell knew that there was something wrong, and the problem was, what was right? Here is the quantitative answer of what is right instead of kT. This expression should, of course, approach kT as ω → 0 or as T → ∞.”
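To see the limit Feynman mentions in numbers, here is a minimal sketch. It assumes the average energy he refers to is ⟨E⟩ = ħω/[e^(ħω/kT) – 1], which is the expression the post scriptum further down works towards. The function name and the choice of units (kT = 1) are mine, just for illustration.

```python
import math

def mean_energy(hbar_omega, kT):
    """Average oscillator energy, assuming <E> = ħω / (exp(ħω/kT) - 1)."""
    # math.expm1(y) computes exp(y) - 1 accurately, also for very small y.
    return hbar_omega / math.expm1(hbar_omega / kT)

kT = 1.0  # arbitrary units: energies are expressed as multiples of kT
for hbar_omega in (0.001, 0.1, 1.0, 5.0, 20.0):
    ratio = mean_energy(hbar_omega, kT) / kT
    print(f"ħω/kT = {hbar_omega:6.3f}  ->  <E>/kT = {ratio:.6f}")
```

For small ħω/kT (low frequency or high temperature) the ratio should come out very close to 1, i.e. the classical kT, while for large ħω/kT it collapses towards zero.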
It does, of course. And so Planck’s analysis does result in a theoretical I(ω) curve that matches the observed I(ω) curve as a function of both temperature (T) and frequency (ω). But what is it, then? What’s the equation describing the dotted curves? It is the classical I(ω) = ω²kT/(π²c²) formula with kT replaced by the average energy per oscillator, ħω/[e^(ħω/kT) – 1]:
I(ω) = ħω³/[π²c²(e^(ħω/kT) – 1)]
I’ll just quote Feynman once again to explain the shape of those dotted curves: “We see that for a large ω, even though we have ω3 in the numerator, there is an e raised to a tremendous power in the denominator, so the curve comes down again and does not “blow up”—we do not get ultraviolet light and x-rays where we do not expect them!”
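A quick numerical comparison makes the point. This is only a sketch: it assumes the dotted curves follow I(ω) = ħω³/[π²c²(e^(ħω/kT) – 1)], i.e. the classical I(ω) = ω²kT/(π²c²) with kT replaced by the average oscillator energy. The function names and the temperature of 5000 K are illustrative choices of mine.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J·s
K_B = 1.380649e-23      # Boltzmann constant, J/K
C = 2.99792458e8        # speed of light, m/s

def classical_intensity(omega, temperature):
    """Classical formula: I(ω) = ω² kT / (π² c²)."""
    return omega**2 * K_B * temperature / (math.pi**2 * C**2)

def planck_intensity(omega, temperature):
    """Assumed Planck form: I(ω) = ħω³ / (π² c² (exp(ħω/kT) - 1))."""
    x = HBAR * omega / (K_B * temperature)
    return HBAR * omega**3 / (math.pi**2 * C**2 * math.expm1(x))

T = 5000.0  # kelvin, illustrative
for omega in (1e12, 1e14, 1e15, 1e16):
    ratio = planck_intensity(omega, T) / classical_intensity(omega, T)
    print(f"ω = {omega:.0e} rad/s  ->  Planck/classical = {ratio:.3e}")
```

At low frequencies the two formulas agree, while at high frequencies the exponential in the denominator wins from ω³ and the Planck value drops far below the classical one.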
Is the analysis necessarily discrete?
One question I can’t answer, because I just am not strong enough in math, is the question of whether or not there would be any other way to derive the actual blackbody spectrum. I mean… This analysis obviously makes sense and, hence, provides a theory that’s consistent and in accordance with experiment. However, the question whether or not it would be possible to develop another theory, without having recourse to the assumption that energy levels in atomic oscillators are discrete and equally spaced, with the ‘distance’ between them equal to hν₀, is not easy to answer. I surely can’t, as I am just a novice, but I can imagine smarter people than me have thought about this question. The answer must be negative, because I don’t know of any other theory: quantum mechanics obviously prevailed. Still… I’d be interested to see the alternatives that must have been considered.
Post scriptum: The “playing with the sums” is a bit confusing. The key to the formula above is the substitution of (0+x+2x²+3x³+…)/(1+x+x²+x³+…) by 1/[(1/x)–1] = 1/[e^(ħω/kT)–1]. Now, the denominator 1+x+x²+x³+… is the Maclaurin series for 1/(1–x). So we have:
(0+x+2x²+3x³+…)/(1+x+x²+x³+…) = (0+x+2x²+3x³+…)(1–x)
= x+2x²+3x³+… –x²–2x³–3x⁴–… = x+x²+x³+x⁴+…
= –1+(1+x+x²+x³+…) = –1 + 1/(1–x) = [–(1–x)+1]/(1–x) = x/(1–x).
Note the tricky bit: if x = e^(−ħω/kT), then e^(ħω/kT) is x⁻¹ = 1/x, and so we have (1/x)–1 in the denominator of that (mean) energy formula, not 1/(x–1). Now 1/[(1/x)–1] = 1/[(1–x)/x] = x/(1–x), indeed, and so the formula comes out alright.
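If you don’t trust the algebra, you can let a computer do the sums. The sketch below truncates both series after a finite number of terms (the cut-off of 2000 terms is an arbitrary choice of mine, as is the function name) and compares the ratio with x/(1–x).

```python
def series_ratio(x, terms=2000):
    """(0 + x + 2x² + 3x³ + …) / (1 + x + x² + x³ + …), truncated after `terms` terms."""
    numerator = sum(n * x**n for n in range(terms))
    denominator = sum(x**n for n in range(terms))
    return numerator / denominator

for x in (0.1, 0.5, 0.9):
    print(f"x = {x}: series ratio = {series_ratio(x):.12f}, x/(1-x) = {x / (1 - x):.12f}")
```

Note that x = e^(−ħω/kT) always lies between 0 and 1, so both series converge and the truncation does no harm.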
In my previous post, I derived and explained the general formula for the pattern generated by a light beam going through a slit or a circular aperture: the diffraction pattern. For light going through an aperture, this generates the so-called Airy pattern. In practice, diffraction causes a blurring of the image, and may make it difficult to distinguish two separate points, as shown below (credit for the image must go to Wikipedia again, I am afraid).
What’s actually going on is that the lens acts as a slit or, if it’s circular (which is usually the case), as an aperture indeed: the wavefront of the transmitted light is taken to be spherical or plane when it exits the lens and interferes with itself, thereby creating the ring-shaped diffraction pattern that we explained in the previous post.
The spatial resolution is also known as the angular resolution, which is quite appropriate, because it refers to an angle indeed: we know the first minimum (i.e. the first black ring) occurs at an angle θ such that sinθ = λ/L, with λ the wavelength of the light and L the lens diameter. It’s good to remind ourselves of the geometry of the situation: below we picture the array of oscillators, and so we know that the first minimum occurs at an angle such that Δ = λ. The second, third, fourth etc minimum occurs at an angle θ such that Δ = 2λ, 3λ, 4λ, etc. However, these secondary minima do not play any role in determining the resolving power of a lens, or a telescope, or an electron microscope, etc, and so you can just forget about them for the time being.
For small angles (expressed in radians), we can use the so-called small-angle approximation and equate sinθ with θ: the error of this approximation is less than one percent for angles smaller than 0.244 radians (14°), so we have the amazingly simple result that the first minimum occurs at an angle θ such that:
θ = λ/L
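The one-percent claim for the small-angle approximation is easy to check. The sketch below measures the error of replacing sinθ by θ relative to sinθ itself; that definition of ‘error’ is my assumption, as is the function name.

```python
import math

def small_angle_error(theta):
    """Relative error made when sin(θ) is replaced by θ (θ in radians)."""
    return (theta - math.sin(theta)) / math.sin(theta)

for theta in (0.05, 0.1, 0.244, 0.25, 0.5):
    degrees = math.degrees(theta)
    print(f"θ = {theta:.3f} rad ({degrees:5.1f}°): error = {100 * small_angle_error(theta):.3f}%")
```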
Spatial resolution of a microscope: the Rayleigh criterion versus Dawes’ limit
If we have two point sources right next to each other, they will create two Airy disks, as shown above, which may overlap. That may make it difficult to see them, in a telescope, a microscope, or whatever device. Hence, telescopes, microscopes (using light or electron beams or whatever) have a limited resolving power. How do we measure that?
The so-called Rayleigh criterion regards two point sources as just resolved when the principal diffraction maximum of one image coincides with the first minimum of the other, as shown below. If the distance is greater, the two points are (very) well resolved, and if it is smaller, they are regarded as not resolved. This angle is obviously related to the θ = λ/L angle but it’s not the same: in fact, it’s a slightly wider angle. The analysis needed to calculate this angular resolution, for which we use the same symbol θ, is quite complicated and so I’ll skip that and just give you the result:
θ = 1.22λ/L
Note that, in this equation, θ stands for the angular resolution, λ for the wavelength of the light being used, and L is the diameter of the (aperture of the) lens. In the first of the three images above, the two points are well separated and, hence, the angle between them is well above the angular resolution. In the second, the angle between them just meets the Rayleigh criterion, and in the third the angle between them is smaller than the angular resolution and, hence, the two points are not resolved.
Of course, the Rayleigh criterion is, to some extent, a matter of judgment. In fact, an English 19th century astronomer, named William Rutter Dawes, actually tested human observers on close binary stars of equal brightness, and found they could make out the two stars within an angle that was slightly narrower than the one given by the Rayleigh criterion. Hence, for an optical telescope, you’ll also find the simple θ = λ/L formula, so that’s the formula without the 1.22 factor (of course, λ here is, once again, the wavelength of the observed light or radiation, and L is the diameter of the telescope’s primary lens). This very simple formula allows us, for example, to calculate the diameter of the telescope lens we’d need to build to separate (see) objects in space with a resolution of, for example, 1 arcsec (i.e. 1/3600 of a degree or π/648,000 of a radian). Indeed, if we filter for yellow light only, which has a wavelength of 580 nm, we find L = 580×10⁻⁹ m/(π/648,000) = 0.119633 m ≈ 12 cm. [Just so you know: that’s about the size of the lens aperture of a good telescope (4 or 6 inches) for amateur astronomers–just in case you’d want one. :-)]
This simplified formula is called Dawes’ limit, and you’ll often see it used instead of Rayleigh’s criterion. However, the fact that it’s exactly the same formula as our formula for the first minimum of the Airy pattern should not confuse you: angular resolution is something different.
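The lens-diameter calculation is a one-liner, so here is a small sketch that does it for both formulas. The function name and the `factor` parameter are my own: a factor of 1 gives the simple θ = λ/L formula, and 1.22 gives the Rayleigh criterion.

```python
import math

ARCSEC = math.pi / 648_000  # one arcsecond expressed in radians

def aperture_for_resolution(wavelength_m, theta_rad, factor=1.0):
    """Lens diameter L (in meters) from θ = factor·λ/L."""
    return factor * wavelength_m / theta_rad

yellow = 580e-9  # wavelength of yellow light in meters
simple = aperture_for_resolution(yellow, ARCSEC)
rayleigh = aperture_for_resolution(yellow, ARCSEC, factor=1.22)
print(f"θ = λ/L      -> L = {100 * simple:.1f} cm")
print(f"θ = 1.22λ/L  -> L = {100 * rayleigh:.1f} cm")
```

So the Rayleigh criterion asks for a lens that is about one fifth larger for the same 1 arcsec resolution.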
Now, after this introduction, let me get to the real topic of this post: Heisenberg’s Uncertainty Principle according to Heisenberg.
Heisenberg’s Uncertainty Principle according to Heisenberg
I don’t know about you but, as a kid, I didn’t know much about waves and fields and all that, and so I had difficulty understanding why the resolving power of a microscope or any other magnifying device depended on the frequency or wavelength. I now know my understanding was limited because I thought the concept of the amplitude of an electromagnetic wave had some spatial meaning, like the amplitude of a water or a sound wave. You know what I mean: this false idea that an electromagnetic wave is something that sort of wriggles through space, just like a water or sound wave wriggles through its medium (water and air respectively). Now I know better: the amplitude of an electromagnetic wave measures field strength and there’s no medium (no aether). So it’s not like a wave going around some object, or making some medium oscillate. I am not ashamed to acknowledge my stupidity at the time: I am just happy I finally got it, because it helps to really understand Heisenberg’s own illustration of his Uncertainty Principle, which I’ll present now.
Heisenberg imagined a gamma-ray microscope, as shown below (I copied this from the website of the American Institute of Physics). Gamma-ray microscopes don’t exist – they’re hard to produce: you need a nuclear reactor or so 🙂 – but, as Heisenberg saw the development of new microscopes using higher and higher energy beams (as opposed to the 1.5-3 eV light in the visible spectrum) so as to increase the angular resolution and, hence, be able to see smaller things, he imagined one could use, perhaps, gamma-rays for imaging. Gamma rays are the hardest radiation, with frequencies of 10 exahertz and more (or >10¹⁹ Hz) and, hence, energies above 100 keV (i.e. roughly 100,000 times more than photons in the visible light spectrum, and 1000 times more than the electrons used in an average electron microscope). Gamma rays are not the result of some electron jumping from a higher to a lower energy level: they are emitted in decay processes of atomic nuclei (gamma decay). But I am digressing. Back to the main story line. So Heisenberg imagined we could ‘shine’ gamma rays on an electron and that we could then ‘see’ that electron in the microscope because some of the gamma photons would indeed end up in the microscope after their ‘collision’ with the electron, as shown below.
The experiment is described in many places elsewhere but I found these accounts often confusing, and so I present my own here. 🙂
What Heisenberg basically meant to show is that this set-up would allow us to gather precise information on the position of the electron–because we would know where it was–but that, as a result, we’d lose information in regard to its momentum. Why? To put it simply: because the electron recoils as a result of the interaction. The point, of course, is to calculate the exact relationship between the two (position and momentum). In other words: what we want to do is to state the Uncertainty Principle quantitatively, not qualitatively.
Now, the animation above uses the symbol L for the γ-ray wavelength λ, which is confusing because I used L for the diameter of the aperture in my explanation of diffraction above. The animation above also uses a different symbol for the angular resolution: A instead of θ. So let me borrow the diagram used in the Wikipedia article and rephrase the whole situation.
From the diagram above, it’s obvious that, to be scattered into the microscope, the γ-ray photon must be scattered into a cone with angle ε. That angle is obviously related to the angular resolution of the microscope, which is θ = ε/2 = λ/D, with D the diameter of the aperture (i.e. the primary lens). Now, the electron could actually be anywhere, and the scattering angle could be much larger than ε, and, hence, relating D to the uncertainty in position (Δx) is not as obvious as most accounts of this thought experiment make it out to be. The thing is: if the scattering angle is larger than ε, the photon won’t reach the light detector at the end of the microscope (so that’s the flat top in the diagram above). So that’s why we can equate D with Δx: the position is known to within ± D/2, so the total spread is Δx = D. To put it differently: the assumption here is basically that this imaginary microscope ‘sees’ an area that is approximately as large as the lens. Using the small-angle approximation (so we write sin(ε/2) ≈ ε/2), we can rewrite ε/2 = λ/D as:
Δx = 2λ/ε
Now, because of the recoil effect, the electron receives some momentum from the γ-ray photon. How much? Well… The situation is somewhat complicated (much more complicated than the Wikipedia article on this very same topic suggests), because the photon keeps some of its original momentum but also gives some away. In fact, what’s happening really is Compton scattering: the electron first absorbs the photon, and then emits another with a different energy and, hence, also with different frequency and wavelength. However, what we do know is that the photon’s original momentum was equal to p = E/c = h/λ. That’s just the Planck relation or, if you’d want to look at the photon as a particle, the de Broglie equation.
Now, because we’re doing an analysis in one dimension only (x), we’re going to look at the momentum in this direction only, i.e. px, and we’ll assume that all of the momentum of the photon before the interaction (or ‘collision’ if you want) was horizontal. Hence, we can write px = h/λ. After the collision, however, this momentum is spread over the electron and the scattered or emitted photon that’s going into the microscope. Let’s now imagine the two extremes:
- The scattered photon goes to the left edge of the lens. Hence, its horizontal momentum is negative (because it moves to the left) and the momentum px will be distributed over the electron and the photon such that px = p’x – h(ε/2)/λ’. Why the ε/2 factor? Well… That’s just trigonometry: the horizontal momentum of the scattered photon is only a small fraction of its total momentum h/λ’, and that fraction is sin(ε/2) ≈ ε/2.
- The scattered photon goes to the right edge of the lens. In that case, we write px = p”x + h(ε/2)/λ”.
Now, the spread in the momentum of the electron, which we’ll simply write as Δp, is obviously equal to:
Δp = p’x – p”x = [px + h(ε/2)/λ’] – [px – h(ε/2)/λ”] = h(ε/2)/λ’ + h(ε/2)/λ”
That’s a nice formula, but what can we do with it? What we want is a relationship between Δx and Δp, i.e. the position and the momentum of the electron, and of the electron only. That involves another simplification, which is also dealt with very summarily – too summarily in my view – in most accounts of this experiment. So let me spell it out. The angle ε is obviously very small and, hence, we may equate λ’ and λ”. In addition, while these two wavelengths differ from the wavelength of the incoming photon, the scattered photon is, obviously, still a gamma ray and, therefore, we are probably not too far off when replacing both λ’ and λ” by λ, i.e. the wavelength of the incoming γ-ray. Now, we can re-write Δx = 2λ/ε as 1/Δx = ε/(2λ). We then get:
Δp = p’x – p”x = hε/(2λ’) + hε/(2λ”) = 2hε/(2λ) = 2h/Δx
Now that yields ΔpΔx = 2h, which is an approximate expression of Heisenberg’s Uncertainty Principle indeed (don’t worry about the factor 2, as that’s something that comes with all of the approximations).
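To convince yourself that the product does not depend on the wavelength or on the aperture angle, you can plug in some numbers. The sketch below simply uses Δx = 2λ/ε and, with λ’ = λ” = λ, Δp = 2·h(ε/2)/λ. The function names and the sample values for λ and ε are mine and purely illustrative.

```python
H = 6.62607015e-34  # Planck constant, J·s

def position_spread(wavelength, epsilon):
    """Δx = 2λ/ε (small-angle approximation)."""
    return 2 * wavelength / epsilon

def momentum_spread(wavelength, epsilon):
    """Δp = h(ε/2)/λ + h(ε/2)/λ, taking λ' = λ'' = λ."""
    return 2 * H * (epsilon / 2) / wavelength

# (wavelength in meters, cone angle ε in radians): illustrative values only
for wavelength, epsilon in ((1e-12, 0.1), (1e-12, 0.01), (5e-13, 0.1)):
    dx = position_spread(wavelength, epsilon)
    dp = momentum_spread(wavelength, epsilon)
    print(f"λ = {wavelength:.1e} m, ε = {epsilon}: Δx = {dx:.2e} m, "
          f"Δp = {dp:.2e} kg·m/s, ΔxΔp/h = {dx * dp / H:.3f}")
```

A narrower cone (smaller ε) reduces the spread in momentum but increases the spread in position by exactly the same factor, which is the whole point of the thought experiment.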
A final moot point perhaps: it is obviously a thought experiment. Not only because we don’t have gamma-ray microscopes (that’s not relevant because we can effectively imagine constructing one) but because the experiment involves only one photon. A real microscope would organize a proper beam, but that would obviously complicate the analysis. In fact, it would defeat the purpose, because the whole point is to analyze one single interaction here.
Now how should we interpret all of this? Is this Heisenberg’s ‘proof’ of his own Principle? Yes and no, I’d say. It’s part illustration, and part ‘proof’, I would say. The crucial assumptions here are:
- We can analyze γ-ray photons, or any photon for that matter, as particles having some momentum, and when ‘colliding’, or interacting, with an electron, the photon will impart some momentum to that electron.
- Momentum is being conserved and, hence, the total (linear) momentum before and after the collision, considering both particles–i.e. (1) the incoming ray and the electron before the interaction and (2) the emitted photon and the electron that’s getting the kick after the interaction–must be the same.
- For the γ-ray photon, we can relate (or associate, if you prefer that term) its wavelength λ with its momentum p through the Planck relation or, what amounts to the same for photons (because they have no mass), the de Broglie relation.
Now, these assumptions are then applied to an analysis of what we know to be true from experiment, and that’s the phenomenon of diffraction, part of which is the observation that the resolving power of a microscope is limited, and that its resolution is given by the θ = λ/D equation.
Bringing it all together then gives us a theory which is consistent with experiment and, hence, we then assume the theory is true. Why? Well… I could start a long discourse here on the philosophy of science but, when everything is said and done, we should admit we don’t have any ‘better’ theory.
But, you’ll say: what’s a ‘better’ theory? Well… Again, the answer to that question is the subject-matter of philosophers. As for me, I’d just refer to what’s known as Occam’s razor: among competing hypotheses, we should select the one with the fewest assumptions. Hence, while more complicated solutions may ultimately prove correct, the fewer assumptions that are made, the better. Now, when I was a kid, I thought quantum mechanics was very complicated and, hence, describing it here as a ‘simple’ theory sounds strange. But that’s what it is in the end: there’s no better (read: simpler) way to describe, for example, why electrons interfere with each other, and with themselves, when sending them through one or two slits, and so that’s what all these ‘illustrations’ want to show in the end, even if you think there must be a simpler way to describe reality. As said, as a kid, I thought so too. 🙂
What is it that we are trying to understand? As a kid, when I first heard about atoms consisting of a nucleus with electrons orbiting around it, I had this vision of worlds inside worlds, like a set of babushka dolls, one inside the other. Now I know that this model – which is nothing but the 1911 Rutherford model basically – is plain wrong, even if it continues to be used in the logo of the International Atomic Energy Agency, or the US Atomic Energy Commission.
Electrons are not planet-like things orbiting around some center. If one wants to understand something about the reality of electrons, one needs to familiarize oneself with complex-valued wave functions whose value is a weird quantity referred to as a probability amplitude and, contrary to what you may think (unless you read my blog, or if you just happen to know a thing or two about quantum mechanics), the relation between that amplitude and the concept of probability tout court is not very straightforward.
Familiarizing oneself with the math involved in quantum mechanics is not an easy task, as evidenced by all those convoluted posts I’ve been writing. In fact, I’ve been struggling with these things for almost a year now and I’ve started to realize that Roger Penrose’s Road to Reality (or should I say Feynman’s Lectures?) may lead nowhere – in terms of that rather spiritual journey of trying to understand what it’s all about. If anything, they made me realize that the worlds inside worlds are not the same. They are different – very different.
When everything is said and done, I think that’s what’s nagging us as common mortals. What we are all looking for is some kind of ‘Easy Principle’ that explains All and Everything, and we just can’t find it. The point is: scale matters. At the macro-scale, we usually analyze things using some kind of ‘billiard-ball model’. At a smaller scale, let’s say the so-called wave zone, our ‘law’ of radiation holds, and we can analyze things in terms of electromagnetic or gravitational fields. But then, when we further reduce scale, by another order of magnitude really – when trying to get very close to the source of radiation, or if we try to analyze what is oscillating really – we get in deep trouble: our easy laws no longer hold, and the equally easy math – easy is relative of course 🙂 – we use to analyze fields or interference phenomena, becomes totally useless.
Religiously inclined people would say that God does not want us to understand all or, taking a somewhat less selfish picture of God, they would say that Reality (with a capital R to underline its transcendental aspects) just can’t be understood. Indeed, it is rather surprising – in my humble view at least – that things do seem to get more difficult as we drill down: in physics, it’s not the bigger things – like understanding thermonuclear fusion in the Sun, for example – but the smallest things which are difficult to understand. Of course, that’s partly because physics leaves some of the bigger things which are actually very difficult to understand – like how a living cell works, for example, or how our eye or our brain works – to other sciences to study (biology and biochemistry for cells, or for vision or brain functionality). In that respect, physics may actually be described as the science of the smallest things. The surprising thing, then, is that the smallest things are not necessarily the simplest things – on the contrary.
Still, that being said, I can’t help feeling some sympathy for the simpler souls who think that, if God exists, he seems to throw up barriers as mankind tries to advance its knowledge. Isn’t it strange, indeed, that the math describing the ‘reality’ of electrons and photons (i.e. quantum mechanics and quantum electrodynamics), as complicated as it is, becomes even more complicated – and, important to note, also much less accurate – when it’s used to try to describe the behavior of quarks and gluons? Additional ‘variables’ are needed (physicists call these ‘variables’ quantum numbers; however, when everything is said and done, that’s what quantum numbers actually are: variables in a theory), and the agreement between experimental results and predictions in QCD is not as obvious as it is in QED.
Frankly, I don’t know much about quantum chromodynamics – nothing at all to be honest – but when I read statements such as “analytic or perturbative solutions in low-energy QCD are hard or impossible due to the highly nonlinear nature of the strong force” (I just took this one line from the Wikipedia article on QCD), I instinctively feel that QCD is, in fact, a different world as well – and then I mean different from QED, in which analytic or perturbative solutions are the norm. Hence, I already know that, once I’ll have mastered Feynman’s Volume III, it won’t help me all that much to get to the next level of understanding: understanding quantum chromodynamics will be yet another long grind. In short, understanding quantum mechanics is only a first step.
Of course, that should not surprise us, because we’re talking very different orders of magnitude here: femtometers (10⁻¹⁵ m), in the case of electrons, as opposed to attometers (10⁻¹⁸ m) or even zeptometers (10⁻²¹ m) when we’re talking quarks. Hence, if past experience (I mean the evolution of scientific thought) is any guidance, we actually should expect an entirely different world. Babushka thinking is not the way forward.
What’s babushka thinking? You know what babushkas are, don’t you? These dolls inside dolls. [The term ‘babushka’ is actually Russian for an old woman or grandmother, which is what these dolls usually depict.] Babushka thinking is the fallacy of thinking that worlds inside worlds are the same. It’s what I did as a kid. It’s what many of us still do. It’s thinking that, when everything is said and done, it’s just a matter of not being able to ‘see’ small things and that, if we’d have the appropriate equipment, we actually would find the same doll within the larger doll – the same but smaller – and then again the same doll within that smaller doll. In Asia, they have this funny expression: “Same-same but different.” Well… That’s what babushka thinking is all about: thinking that you can apply the same concepts, tools and techniques to what is, in fact, an entirely different ballgame.
Let me illustrate it. We discussed interference. We could assume that the laws of interference, as described by superimposing various waves, always hold, at every scale, and that it’s just the crudeness of our detection apparatus that prevents us from seeing what’s going on. Take two light sources, for example, and let’s say they are a billion wavelengths apart – so that’s anything between 400 and 700 meters for visible light (because the wavelength of visible light is 400 to 700 billionths of a meter). So then we won’t see any interference indeed, because we can’t register it. In fact, none of the standard equipment can. The interference term oscillates wildly up and down, from positive to negative and back again, if we move the detector just a tiny bit left or right – not more than the thickness of a hair (i.e. 0.07 mm or so). Hence, the range of angles θ (remember that angle θ was the key variable when calculating solutions for the resultant wave in previous posts) that are being covered by our eye – or by any standard sensor really – is so wide that the positive and negative interference averages out: all that we ‘see’ is the sum of the intensities of the two lights. The positive and negative contributions of the interference term cancel each other out. However, we are still essentially correct assuming there actually is interference: we just cannot see it – but it’s there.
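Here is a minimal sketch of that averaging. It assumes the usual two-source intensity formula, I = A₁² + A₂² + 2A₁A₂cos(φ), and it models the finite size of the detector very crudely, as a phase difference φ that sweeps uniformly through a large number of full cycles across the sensor. The function names, the amplitudes and the number of cycles are all illustrative choices of mine.

```python
import math

def intensity(a1, a2, phase):
    """Two-source intensity: a1² + a2² plus the interference term 2·a1·a2·cos(φ)."""
    return a1**2 + a2**2 + 2 * a1 * a2 * math.cos(phase)

def averaged_intensity(a1, a2, cycles=1000, samples=100_000):
    """Average intensity over a detector across which φ sweeps `cycles` full cycles."""
    total_phase = 2 * math.pi * cycles
    total = sum(intensity(a1, a2, total_phase * (k + 0.5) / samples)
                for k in range(samples))
    return total / samples

a1, a2 = 1.0, 1.0
print("at a maximum (φ = 0):", intensity(a1, a2, 0.0))
print("at a minimum (φ = π):", intensity(a1, a2, math.pi))
print("detector average    :", averaged_intensity(a1, a2))
print("sum of intensities  :", a1**2 + a2**2)
```

Locally the intensity swings between zero and four times the intensity of one source, but the average over the detector is just the sum of the two intensities.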
Reinforcing the point, I should also note that, apart from this issue of ‘distance scale’, there is also the scale of time. Our eye has a tenth-of-a-second averaging time. That’s a huge amount of time when talking fundamental physics: remember that an atomic oscillator – despite its incredibly high Q – emits radiation for like 10⁻⁸ seconds only, so that’s one hundred-millionth of a second. Then another atom takes over, and another – and so that’s why we get unpolarized light: it’s all the same frequencies (because the electron oscillators radiate at their resonant frequencies), but there is no fixed phase difference between all of these pulses: the interference between all of these pulses should result in ‘beats’ – as they interfere positively or negatively – but it all cancels out for us, because it’s too fast.
Indeed, while the ‘sensors’ in the retina of the human eye (there are actually four kinds of cells there, but the principal ones are referred to as ‘rod’ and ‘cone’ cells respectively) are, apparently, sensitive enough to register individual photons, the “tenth-of-a-second averaging” time means that the cells – which are interconnected and ‘pre-process’ light really – will just amalgamate all those individual pulses into one signal of a certain color (frequency) and a certain intensity (energy). As one scientist puts it: “The neural filters only allow a signal to pass to the brain when at least about five to nine photons arrive within less than 100 ms.” Hence, that signal will not keep track of the spacing between those photons.
In short, information gets lost. But that, in itself, does not invalidate babushka thinking. Let me visualize it by a not-very-mathematically-rigorous illustration. Suppose that we have some very regular wave train coming in, like the one below: one wave train consisting of three ‘groups’ separated by ‘nodes’.
All will depend on the period of the wave as compared to that one-tenth-of-a-second averaging time. In fact, we have two ‘periods’: the period of the carrier wave (the actual oscillation) and the periodicity of the group – which is related to the concept of group velocity – with which I’ll associate a ‘group wavelength’ and a ‘group period’. [In case you haven’t heard of these terms before, don’t worry: I haven’t either. :-)] Now, if one tenth of a second covers like two or all three of the groups between the nodes (so that means that one tenth of a second is a multiple of the group period Tg), then even the envelope of the wave does not matter much in terms of ‘signal’: our brain will just get one pulse that averages it all out. We will see none of the detail of this wave train. Our eye will just get light in (remember that the intensity of the light is the square of the amplitude, so the negative amplitudes make contributions too) but we cannot distinguish any particular pulse: it’s just one signal. This is the most common situation when we are talking about electromagnetic radiation: many photons arrive but our eye just sends one signal to the brain: “Hey Boss! Light of color X and intensity Y coming from direction Z.”
In fact, it’s quite remarkable that our eye can distinguish colors in light of the fact that the wavelengths of various colors (violet, blue, green, yellow, orange and red) differ by 30 to 40 billionths of a meter only! Better still: if the signal lasts long enough, we can distinguish shades whose wavelengths differ by 10 or 15 nm only, so that’s a difference of 1% or 2% only. In case you wonder how it works: Feynman devotes not less than two chapters in his Lectures to the physiology of the eye: not something you’ll find in other physics handbooks! There are apparently three pigments in the cells in our eyes, each sensitive to color in a different way and it is “the spectral absorption in those three pigments that produces the color sense.” So it’s a bit like the RGB system in a television – but then more complicated, of course!
But let’s go back to our wave there and analyze the second possibility. If a tenth of a second covers less than that ‘group wavelength’, then it’s different: we will actually see the individual groups as two or three separate pulses. Hence, in that case, our eye – or whatever detector (another detector will just have another averaging time) – will average over a group, but not over the whole wave train. [Just in case you wonder how we humans compare with other living beings: from what I wrote above, it’s obvious we can see ‘flicker’ only if the oscillation is in the range of 10 or 20 Hz. The eye of a bee is made to see the vibrations of feet and wings of other bees and, hence, its averaging time is much shorter, like a hundredth of a second and, hence, it can see flicker up to 200 oscillations per second! In addition, the eye of a bee is sensitive over a much wider range of ‘color’ – it sees UV light down to a wavelength of 300 nm (whereas we don’t see light with a wavelength below 400 nm) – and, to top it all off, it has got a special sensitivity for polarized light, so light that gets reflected or diffracted looks different to the bee.]
Let’s go to the third and final case. If a tenth of a second would cover less than the period of the so-called carrier wave, i.e. the actual oscillation, then we will be able to distinguish the individual peaks and troughs of the carrier wave!
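The first two cases can be mimicked with a toy signal. The sketch below is not a model of the eye: it just takes an amplitude-modulated wave, cos(2π·f_group·t)·cos(2π·f_carrier·t), squares it to get an intensity, and averages that over windows of different lengths. The frequencies (1 Hz and 50 Hz), the window lengths and the function names are all made up for the illustration.

```python
import math

F_GROUP = 1.0     # envelope ('group') frequency in Hz, illustrative value
F_CARRIER = 50.0  # carrier frequency in Hz, illustrative value

def signal(t):
    """Amplitude-modulated wave: a slow envelope multiplying a fast carrier."""
    return math.cos(2 * math.pi * F_GROUP * t) * math.cos(2 * math.pi * F_CARRIER * t)

def averaged_intensity(t_start, window, samples=20_000):
    """Mean of signal² over [t_start, t_start + window], using midpoint sampling."""
    dt = window / samples
    return sum(signal(t_start + (k + 0.5) * dt) ** 2 for k in range(samples)) / samples

# Case 1: the window spans several groups, so the start time hardly matters.
print("long window, start 0.00 s:", averaged_intensity(0.00, 2.0))
print("long window, start 0.13 s:", averaged_intensity(0.13, 2.0))

# Case 2: the window is shorter than a group but longer than a carrier period,
# so the average follows the envelope: bright at a peak, dark at a node.
print("short window at a peak   :", averaged_intensity(-0.01, 0.02))
print("short window at a node   :", averaged_intensity(0.24, 0.02))
```

With the long window, every start time gives practically the same number: the detail of the groups is lost. With the short window, the averages at a peak and at a node of the envelope are very different, so the groups show up as separate pulses.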
Of course, this discussion is not limited to our eye as a sensor: any instrument will be able to measure individual phenomena only within a certain range, with an upper and a lower range, i.e. the ‘biggest’ thing it can see, and the ‘smallest’. So that explains the so-called resolution of an optical or an electron microscope: whatever the instrument, it cannot really ‘see’ stuff that’s smaller than the wavelength of the ‘light’ (real light or – in the case of an electron microscope – electron beams) it uses to ‘illuminate’ the object it is looking at. [The actual formula for the resolution of a microscope is obviously a bit more complicated, but this statement does reflect the gist of it.]
However, all that I am writing above suggests that we can think of what’s going on here as ‘waves within waves’, with the wave between nodes not being any different – in substance that is – from the wave as a whole: we’ve got something that’s oscillating, and within each individual oscillation, we find another oscillation. From a math point of view, babushka thinking is thinking we can analyze the world using Fourier’s machinery to decompose some function (see my posts on Fourier analysis). Indeed, in the example above, we have a modulated carrier wave (it is an example of amplitude modulation – the old-fashioned way of transmitting radio signals), and we see a wave within a wave and, hence, just like the Rutherford model of an atom, you may think there will always be ‘a wave within a wave’.
In this regard, you may think of fractals too: fractals are repeating or self-similar patterns that are always there, at every scale. However, the point to note is that fractals do not represent an accurate picture of how reality is actually structured: worlds within worlds are not the same.
Reality is no onion
Reality is not some kind of onion, from which you peel off a layer and then you find some other layer, similar to the first: “same-same but different”, as they’d say in Asia. The Coast of Britain is, in fact, finite, and the grain of sand you’ll pick up at one of its beaches will not look like the coastline when you put it under a microscope. In case you don’t believe me: I’ve inserted a real-life photo below. The magnification factor is a rather modest 300 times. Isn’t this amazing? [The credit for this nice picture goes to a certain Dr. Gary Greenberg. Please do google his stuff. It’s really nice.]
In short, fractals are wonderful mathematical structures but – in reality – there are limits to how small things get: we cannot carve a babushka doll out of the cellulose and lignin molecules that make up most of what we call wood. Likewise, the atoms that make up the D-glucose chains in the cellulose will never resemble the D-glucose chains. Hence, the babushka doll, the D-glucose chains that make up wood, and the atoms that make up the molecules within those macro-molecules are three different worlds. They’re not like layers of the same onion. Scale matters. The worlds inside worlds are different, and fundamentally so: not “same-same but different” but just plain different. Electrons are no longer point-like negative charges when we look at them at close range.
In fact, that’s the whole point: we can’t look at them at close range because we can’t ‘locate’ them. They aren’t particles. They are these strange ‘wavicles’ which we described, physically and mathematically, with a complex wave function relating their position (or their momentum) with some probability amplitude, and we also need to remember these funny rules for adding these amplitudes, depending on whether or not the ‘wavicle’ obeys Fermi or Bose statistics.
Weird, but – come to think of it – not more weird, in terms of mathematical description, than these electromagnetic waves. Indeed, when jotting down all these equations and developing all those mathematical arguments, one often tends to forget that we are not talking about some physical wave here. The field vector E (or B) is a mathematical construct: it tells us what force a charge will feel when we put it here or there. It’s not like a water or sound wave that makes some medium (water or air) actually move. The field is an influence that travels through empty space. But how can something actually travel through empty space? When it’s truly empty, you can’t travel through it, can you?
Oh – you’ll say – but we’ve got these photons, don’t we? Waves are not actually waves: they come in little packets of energy – photons. Yes. You’re right. But, as mentioned above, these photons aren’t little bullets – or particles if you want. They’re as weird as the wave and, in any case, even a billiard ball view of the world is not very satisfying: what happens exactly when two billiard balls collide in a so-called elastic collision? What are the springs on the surface of those balls – in light of the quick reaction, they must be more like little explosive charges that detonate on impact, mustn’t they? – that make the two balls recoil from each other?
So any mathematical description of reality becomes ‘weird’ when you keep asking questions, like that little child I was – and I still am, in a way, I guess. Otherwise I would not be reading physics at the age of 45, would I? 🙂
Let me wrap up here. All of what I’ve been blogging about over the past few months concerns the classical world of physics. It consists of waves and fields on the one hand, and solid particles on the other – electrons and nucleons. But we know it’s not like that when we have more sensitive apparatuses, like the apparatus used in that 2012 double-slit electron interference experiment at the University of Nebraska–Lincoln, that I described at length in one of my earlier posts. That apparatus allowed control of two slits – both not more than 62 nanometer wide (so that’s the difference between the wavelength of dark-blue and light-blue light!), and the monitoring of single-electron detection events. Back in 1963, Feynman already knew what this experiment would yield as a result. He was sure about it, even if he thought such an instrument could never be built. [To be fully correct, he did have some vague idea about a new science, which we now call ‘nanotechnology’, but what we can do today surpasses, most probably, all his expectations at the time. Too bad he died too young to see his dreams come true.]
The point to note is that this apparatus does not show us another layer of the same onion: it shows an entirely different world. While it’s part of reality, it’s not ‘our’ reality, nor is it the ‘reality’ of what’s being described by classical electromagnetic field theory. It’s different – and fundamentally so, as evidenced by those weird mathematical concepts one needs to introduce to sort of start to ‘understand’ it.
So… What do I want to say here? Nothing much. I just had to remind myself where I am right now. I myself often still fall prey to babushka thinking. We shouldn’t. We should wonder about the wood these dolls are made of. In physics, the wood seems to be math. The models I’ve presented in this blog are weird: what are those fields? And just how do they exert a force on some charge? What’s the mechanics behind it? To these questions, classical physics does not really have an answer.
But, of course, quantum mechanics does not have a very satisfactory answer either: what does it mean when we say that the wave function collapses? Out of all of the possibilities in that wonderful indeterminate world ‘inside’ the quantum-mechanical universe, one was ‘chosen’ as something that actually happened: a photon imparts momentum to an electron, for example. We can describe it, mathematically, but – somehow – we still don’t really understand what’s going on.
So what’s going on? We open a doll, and we do not find another doll that is smaller but similar. No. What we find is a completely different toy. However – Surprise ! Surprise ! – it’s something that can be ‘opened’ as well, to reveal even weirder stuff, for which we need even weirder ‘tools’ to somehow understand how it works (like lattice QCD, if you’d want an example: just google it if you want to get an inkling of what that’s about). Where is this going to end? Did it end with the ‘discovery’ of the Higgs particle? I don’t think so.
However, with the ‘discovery’ (or, to be generous, let’s call it an experimental confirmation) of the Higgs particle, we may have hit a wall in terms of verifying our theories. At the center of a set of babushka dolls, you’ll usually have a little baby: a solid little thing that is not like the babushkas surrounding it: it’s young, male and solid, as opposed to the babushkas. Well… It seems that, in physics, we’ve got several of these little babies inside: electrons, photons, quarks, gluons, Higgs particles, etcetera. And we don’t know what’s ‘inside’ of them. Just that they’re different. Not “same-same but different”. No. Fundamentally different. So we’ve got a lot of ‘babies’ inside of reality, very different from the ‘layers’ around them, which make up ‘our’ reality. Hence, ‘Reality’ is not a fractal structure. What is it? Well… I’ve started to think we’ll never know. For all of the math and wonderful intellectualism involved, do we really get closer to an ‘understanding’ of what it’s all about?
I am not sure. The more I ‘understand’, the less I ‘know’ it seems. But then that’s probably why many physicists still nurture an acute sense of mystery, and why I am determined to keep reading. 🙂
Post scriptum: On the issue of the ‘mechanistic universe’ and the (related) issue of determinability and indeterminability, that’s not what I wanted to write about above, because I consider that solved. This post is meant to convey some wonder – on the different models of understanding that we need to apply to different scales. It’s got little to do with determinability or not. I think that issue got solved a long time ago, and I’ll let Feynman summarize that discussion:
“The indeterminacy of quantum mechanics has given rise to all kinds of nonsense and questions on the meaning of freedom of will, and of the idea that the world is uncertain. […] Classical physics is also indeterminate. It is true, classically, that if we knew the position and the velocity of every particle in the world, or in a box of gas, we could predict exactly what would happen. And therefore the classical world is deterministic. Suppose, however, we have a finite accuracy and do not know exactly where just one atom is, say to one part in a billion. Then as it goes along it hits another atom, and because we did not know the position better than one part in a billion, we find an even larger error in the position after the collision. And that is amplified, of course, in the next collision, so that if we start with only a tiny error it rapidly magnifies to a very great uncertainty. […] Speaking more precisely, given an arbitrary accuracy, no matter how precise, one can find a time long enough that we cannot make predictions valid for that long a time. That length of time is not very large. It is not that the time is millions of years if the accuracy is one part in a billion. The time goes only logarithmically with the error. In only a very, very tiny time – less than the time it took to state the accuracy – we lose all our information. It is therefore not fair to say that from the apparent freedom and indeterminacy of the human mind, we should have realized that classical ‘deterministic’ physics could not ever hope to understand, and to welcome quantum mechanics as a release from a completely ‘mechanistic’ universe. For already in classical mechanics, there was indeterminability from a practical point of view.” (Feynman, Lectures, 1963, p. 38-10)
That really says it all, I think. I’ll just continue to keep my head down – i.e. stay away from philosophy for now – and try to find a way to open the toy inside the toy. 🙂
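Feynman’s remark that the time ‘goes only logarithmically with the error’ is easy to illustrate with a toy model in Python. The model is my own assumption, not Feynman’s calculation: every collision simply multiplies the error by a fixed factor (2 here), and we count the collisions until the error is of order one.

```python
import math

def collisions_until_lost(initial_error, growth_per_collision=2.0):
    """Count collisions until a relative error has grown to order 1.

    Toy assumption: each collision multiplies the error by a fixed factor.
    """
    error, n = initial_error, 0
    while error < 1.0:
        error *= growth_per_collision
        n += 1
    return n

for eps in (1e-3, 1e-6, 1e-9, 1e-12):
    n = collisions_until_lost(eps)
    estimate = math.log(1 / eps) / math.log(2.0)
    print(f"initial error {eps:.0e}: {n} collisions (log estimate {estimate:.1f})")
```

In this toy model, making the initial accuracy a thousand times better buys only ten more collisions or so, which is the point of the quote.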
The title above refers to a previous post: An Easy Piece: Introducing the wave function.
Indeed, I may have been sloppy here and there – I hope not – and so that’s why it’s probably good to clarify that the wave function (usually represented as Ψ – the psi function) and the wave equation (Schrödinger’s equation, for example – but there are other types of wave equations as well) are two related but different concepts: wave equations are differential equations, and wave functions are their solutions.
Indeed, from a mathematical point of view, a differential equation (such as a wave equation) relates a function (such as a wave function) with its derivatives, and its solution is that function or – more generally – the set (or family) of functions that satisfies this equation.
The function can be real-valued or complex-valued, and it can be a function involving only one variable (such as y = y(x), for example) or more (such as u = u(x, t) for example). In the first case, it’s a so-called ordinary differential equation. In the second case, the equation is referred to as a partial differential equation, even if there’s nothing ‘partial’ about it: it’s as ‘complete’ as an ordinary differential equation (the name just refers to the presence of partial derivatives in the equation). Hence, in an ordinary differential equation, we will have terms involving dy/dx and/or d²y/dx², i.e. the first and second derivative of y respectively (and/or higher-order derivatives, depending on the order of the differential equation), while in partial differential equations, we will see terms involving ∂u/∂t and/or ∂²u/∂x² (and/or higher-order partial derivatives), with ∂ replacing d as a symbol for the derivative.
The independent variables could also be complex-valued but, in physics, they will usually be real variables (or scalars as real numbers are also being referred to – as opposed to vectors, which are nothing but two-, three- or more-dimensional numbers really). In physics, the independent variables will usually be x – or let’s use r = (x, y, z) for a change, i.e. the three-dimensional space vector – and the time variable t. An example is that wave function which we introduced in our ‘easy piece’.
Ψ(r, t) = A·e^(i(p·r – Et)/ħ)
[If you read the Easy Piece, then you might object that this is not quite what I wrote there, and you are right: I wrote Ψ(r, t) = A·e^(i((p/ħ)·r – ωt)). However, here I am just introducing the other de Broglie relation (i.e. the one relating energy and frequency): E = hf = ħω and, hence, ω = E/ħ. Just re-arrange a bit and you’ll see it’s the same.]
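If you don’t feel like re-arranging, a quick numerical check in Python (with NumPy) does the job. The values for p, r, E and t below are hypothetical and serve as an illustration only; the point is just that the two ways of writing the phase give the same complex number.

```python
import numpy as np

hbar = 1.054571817e-34                       # reduced Planck constant (J·s)
A = 1.0                                      # amplitude (arbitrary here)

# Hypothetical sample values, chosen only for illustration.
p = np.array([1.0e-24, 2.0e-24, -0.5e-24])   # momentum vector (kg·m/s)
r = np.array([1.0e-10, -2.0e-10, 3.0e-10])   # position vector (m)
E, t = 3.0e-19, 1.0e-16                      # energy (J) and time (s)

# Form used in this post: phase = (p·r - E·t)/ħ
psi_energy = A * np.exp(1j * (np.dot(p, r) - E * t) / hbar)

# Form used in the Easy Piece: phase = k·r - ω·t, with k = p/ħ and ω = E/ħ
k, omega = p / hbar, E / hbar
psi_frequency = A * np.exp(1j * (np.dot(k, r) - omega * t))

print(np.isclose(psi_energy, psi_frequency))   # both forms agree
print(abs(psi_energy))                         # the modulus is just A
```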
From a physics point of view, a differential equation represents a system subject to constraints, such as the energy conservation law (the sum of the potential and kinetic energy remains constant), and Newton’s law of course: F = d(mv)/dt. A differential equation will usually also be given with one or more initial conditions, such as the value of the function at point t = 0, i.e. the initial value of the function. To use Wikipedia’s definition: “Differential equations arise whenever a relation involving some continuously varying quantities (modeled by functions) and their rates of change in space and/or time (expressed as derivatives) is known or postulated.”
That sounds a bit more complicated, perhaps, but it means the same: once you have a good mathematical model of a physical problem, you will often end up with a differential equation representing the system you’re looking at, and then you can do all kinds of things, such as analyzing whether or not the actual system is in an equilibrium and, if not, whether it will tend to equilibrium or, if not, what the equilibrium conditions would be. But here I’ll refer to my previous posts on the topic of differential equations, because I don’t want to get into these details – as I don’t need them here.
The one thing I do need to introduce is an operator referred to as the gradient (it’s also known as the del operator, but I don’t like that word because it does not convey what it is). The gradient – denoted by ∇ – is a shorthand for the partial derivatives of our function u or Ψ with respect to space, so we write:
∇ = (∂/∂x, ∂/∂y, ∂/∂z)
You should note that, in physics, we apply the gradient only to the spatial variables, not to time. For the derivative with respect to time, we just write ∂u/∂t or ∂Ψ/∂t.
Of course, an operator means nothing until you apply it to a (real- or complex-valued) function, such as our u(x, t) or our Ψ(r, t):
∇u = ∂u/∂x and ∇Ψ = (∂Ψ/∂x, ∂Ψ/∂y, ∂Ψ/∂z)
As you can see, the gradient operator returns a vector with three components if we apply it to a real- or complex-valued function of r, and so we can do all kinds of funny things with it, combining it with the scalar or vector product, or with both. Here I need to remind you that, in a vector space, we can multiply vectors using either (i) the scalar product, aka the dot product (because of the dot in its notation: a•b) or (ii) the vector product, aka the cross product (yes, because of the cross in its notation: a×b).
So we can define a whole range of new operators using the gradient and these two products, such as the divergence and the curl of a vector field. For example, if E is the electric field vector (I am using an italic bold-type E so you should not confuse E with the energy E, which is a scalar quantity), then div E = ∇•E, and curl E =∇×E. Taking the divergence of a vector will yield some number (so that’s a scalar), while taking the curl will yield another vector.
I am mentioning these operators because you will often see them. A famous example is the set of equations known as Maxwell’s equations, which integrate all of the laws of electromagnetism and from which we can derive the electromagnetic wave equation:
(1) ∇•E = ρ/ε₀ (Gauss’ law)
(2) ∇×E = –∂B/∂t (Faraday’s law)
(3) ∇•B = 0
(4) c²∇×B = j/ε₀ + ∂E/∂t
I should not explain these but let me just remind you of the essentials:
- The first equation (Gauss’ law) can be derived from the equations for Coulomb’s law and the forces acting upon a charge q in an electromagnetic field: F = q(E + v×B) – with B the magnetic field vector (F is also referred to as the Lorentz force: it’s the combined force on a charged particle caused by the electric and magnetic fields); v the velocity of the (moving) charge; ρ the charge density (so charge is thought of as being distributed in space, rather than being packed into points, and that’s OK because our scale is not the quantum-mechanical one here); and, finally, ε₀ the electric constant (some 8.854×10⁻¹² farads per meter).
- The second equation (Faraday’s law) gives the electric field associated with a changing magnetic field.
- The third equation basically states that there is no such thing as a magnetic charge: there are only electric charges.
- Finally, in the last equation, we have a vector j representing the current density: indeed, remember that magnetism only appears when (electric) charges are moving, i.e. when there’s an electric current. As for the equation itself, well… That’s a more complicated story so I will leave that for the post scriptum.
We can do many more things: we can also take the curl of the gradient of some scalar, or the divergence of the curl of some vector (both have the interesting property that they are zero), and there are many more possible combinations – some of them useful, others not so useful. However, this is not the place to introduce differential calculus of vector fields (because that’s what it is).
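To make the divergence and the curl a bit more tangible, here is a rough numerical sketch in Python, using NumPy’s finite-difference `np.gradient` on a small grid. The helper names (`grad`, `div`, `curl`) and the sample fields are my own choices for illustration, not part of any standard notation. It also checks the two ‘always zero’ combinations mentioned above.

```python
import numpy as np

# A small 3-D grid; indexing="ij" makes axis 0 = x, axis 1 = y, axis 2 = z.
axis = np.linspace(-1.0, 1.0, 21)
h = axis[1] - axis[0]
x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")

def grad(f):
    """Gradient of a scalar field: (∂f/∂x, ∂f/∂y, ∂f/∂z)."""
    return np.gradient(f, h)

def div(F):
    """Divergence ∇•F of a vector field F = (Fx, Fy, Fz): a scalar field."""
    return sum(np.gradient(F[i], h, axis=i) for i in range(3))

def curl(F):
    """Curl ∇×F of a vector field: another vector field."""
    d = lambda comp, ax: np.gradient(F[comp], h, axis=ax)
    return [d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1)]

radial = [x, y, z]              # a field pointing away from the origin
rotation = [-y, x, 0 * z]       # a field circling around the z-axis
print(div(radial).mean())       # 3: the radial field 'spreads out' everywhere
print(curl(rotation)[2].mean()) # 2: the rotating field 'curls' around z

f = x**2 * y + np.sin(z)                   # some scalar field
F = [y * z**2, np.sin(x) * z, x * y]       # some vector field
print(max(np.abs(c).max() for c in curl(grad(f))))   # curl of a gradient: ~0
print(np.abs(div(curl(F))).max())                    # divergence of a curl: ~0
```

The last two numbers are zero up to rounding because the finite differences along different axes commute, just like the partial derivatives they approximate.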
The only other thing I need to mention here is what happens when we apply this gradient operator twice. Then we have a new operator ∇•∇ = ∇² which is referred to as the Laplacian. In fact, when we say ‘apply ∇ twice’, we are actually doing a dot product. Indeed, ∇ returns a vector, and so we are going to multiply this vector once again with a vector using the dot product rule: a•b = ∑aᵢbᵢ (so we multiply the individual vector components and then add them). In the case of our functions u and Ψ, we get:
∇•(∇u) = ∇•∇u = (∇•∇)u = ∇²u = ∂²u/∂x²
∇•(∇Ψ) = ∇²Ψ = ∂²Ψ/∂x² + ∂²Ψ/∂y² + ∂²Ψ/∂z²
Now, you may wonder what it means to take the derivative (or partial derivative) of a complex-valued function (which is what we are doing in the case of Ψ) but don’t worry about that: a complex-valued function of one or more real variables, such as our Ψ(x, t), can be decomposed as Ψ(x, t) = Ψ_Re(x, t) + i·Ψ_Im(x, t), with Ψ_Re and Ψ_Im two real-valued functions representing the real and imaginary part of Ψ(x, t) respectively. In addition, the rules for differentiating complex-valued functions are, to a large extent, the same as for real-valued functions. For example, if z is a complex number, then d(e^z)/dz = e^z and, hence, using this and other very straightforward rules, we can indeed find the partial derivatives of a function such as Ψ(r, t) = A·e^(i(p·r – Et)/ħ) with respect to all the (real-valued) variables in the argument.
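A small sketch in Python (NumPy) may take away any remaining worry. It assumes a one-dimensional plane wave Ψ(x) = e^(ikx) with a wave number k = 3 that I picked arbitrarily, and it uses finite differences, so the agreement is approximate only. It shows that differentiating Ψ directly is the same as differentiating its real and imaginary parts separately, and that the (one-dimensional) Laplacian just gives –k²Ψ.

```python
import numpy as np

k = 3.0                                   # assumed wave number
x = np.linspace(0.0, 2 * np.pi, 20001)
h = x[1] - x[0]
psi = np.exp(1j * k * x)                  # Ψ(x) = cos(kx) + i·sin(kx)

# Differentiate the complex-valued function directly ...
dpsi = np.gradient(psi, h)
# ... or differentiate its real and imaginary parts separately.
dpsi_parts = np.gradient(psi.real, h) + 1j * np.gradient(psi.imag, h)

# Second derivative, i.e. the Laplacian in one dimension.
d2psi = np.gradient(dpsi, h)

inner = slice(2, -2)                      # skip the less accurate end points
print(np.allclose(dpsi, dpsi_parts))                               # same thing
print(np.allclose(dpsi[inner], 1j * k * psi[inner], atol=1e-5))    # dΨ/dx = ikΨ
print(np.allclose(d2psi[inner], -k**2 * psi[inner], atol=1e-4))    # d²Ψ/dx² = -k²Ψ
```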
The electromagnetic wave equation
OK. That’s enough math now. We are ready now to look at – and to understand – a real wave equation – I mean one that actually represents something in physics. Let’s take Maxwell’s equations as a start. To make it easy – and also to ensure that you have easy access to the full derivation – we’ll take the so-called Heaviside form of these equations:
∇•E = 0
∇×E = –∂B/∂t
∇•B = 0
∇×B = μ₀ε₀·∂E/∂t
This Heaviside form assumes a charge-free vacuum space, so there are no external forces acting upon our electromagnetic wave. There are also no other complications such as electric currents. Also, the c² (i.e. the square of the speed of light) is written here as c² = 1/με, with μ and ε the so-called permeability (μ) and permittivity (ε) respectively (c₀, μ₀ and ε₀ are the values in a vacuum space: indeed, light travels slower elsewhere (e.g. in glass) – if at all).
Now, these four equations can be replaced by just two, and it’s these two equations that are referred to as the electromagnetic wave equation(s):
∂²E/∂t² = c²∇²E
∂²B/∂t² = c²∇²B
The derivation is not difficult. In fact, it’s much easier than the derivation for the Schrödinger equation which I will present in a moment. But, even if it is very short, I will just refer to Wikipedia in case you would be interested in the details (see the article on the electromagnetic wave equation). The point here is just to illustrate what is being done with these wave equations and why – not so much how. Indeed, you may wonder what we have gained with this ‘reduction’.
The answer to this very legitimate question is easy: the two equations above are second-order partial differential equations which are relatively easy to solve. In other words, we can find a general solution, i.e. a set or family of functions that satisfy the equation and, hence, can represent the wave itself. Why a set of functions? If it’s a specific wave, then there should only be one wave function, right? Right. But to narrow our general solution down to a specific solution, we will need extra information, which is referred to as initial conditions, boundary conditions or, in general, constraints. [And if these constraints are not sufficiently specific, then we may still end up with a whole bunch of possibilities, even if they narrowed down the choice.]
Let’s give an example by re-writing the above wave equation and using our function u(r, t) or, to simplify the analysis, u(x, t) – so we’re looking at a plane wave traveling in one dimension only:
∂²u/∂t² = c²·∂²u/∂x²
There are many functional forms for u that satisfy this equation. One of them is the following:
u(x, t) = A·e^(i(kx – ωt)) + B·e^(–i(kx + ωt))
This resembles the one I introduced when presenting the de Broglie equations, except that – this time around – we are talking about a real electromagnetic wave, not some probability amplitude. Another difference is that we allow a composite wave with two components: one traveling in the positive x-direction, and one traveling in the negative x-direction. Now, if you read the post in which I introduced the de Broglie wave, you will remember that these A·e^(i(kx – ωt)) or B·e^(–i(kx + ωt)) waves give strange probabilities. However, we are not looking at some probability amplitude here: it’s not a de Broglie wave but a real wave (we use complex number notation only because it’s convenient but, in practice, we’re only considering the real part), and so this functional form is quite OK.
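Plugging such a two-component wave into the one-dimensional wave equation, ∂²u/∂t² = c²·∂²u/∂x², can also be done numerically. The sketch below (Python with NumPy) uses made-up values for c, k, A and B, and a hypothetical helper `wave_residual` that approximates both second derivatives by finite differences. The equation is only satisfied if frequency and wave number are tied together by ω = c·k.

```python
import numpy as np

def wave_residual(omega, c=2.0, k=3.0, A=1.0, B=0.4, h=1e-4):
    """Largest mismatch between ∂²u/∂t² and c²·∂²u/∂x² on a few sample points.

    All default values are assumptions chosen for illustration.
    """
    def u(x, t):
        # One component moving in the +x direction, one in the -x direction.
        return (A * np.exp(1j * (k * x - omega * t))
                + B * np.exp(-1j * (k * x + omega * t)))

    x, t = np.linspace(-1.0, 1.0, 11), 0.3
    # Central finite differences for the two second derivatives.
    u_tt = (u(x, t + h) - 2 * u(x, t) + u(x, t - h)) / h**2
    u_xx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / h**2
    return np.max(np.abs(u_tt - c**2 * u_xx))

print(wave_residual(omega=2.0 * 3.0))    # ω = c·k: residual is practically zero
print(wave_residual(omega=12.0))         # any other ω: clearly not a solution
```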
That being said, the following functional form, representing a wave packet (aka a wave train) is also a solution (or a set of solutions better):
u(x, t) = (1/√(2π))·∫ A(k)·e^(i(kx – ω(k)t)) dk (with the integral running over all k, from –∞ to +∞)
Huh? Well… Yes. If you really can’t follow here, I can only refer you to my post on Fourier analysis and Fourier transforms: I cannot reproduce that one here because that would make this post totally unreadable. We have a wave packet here, and so that’s the sum of an infinite number of component waves that interfere constructively in the region of the envelope (so that’s the location of the packet) and destructively outside. The integral is just the continuum limit of a summation of n such waves. So this integral will yield a function u with x and t as independent variables… If we know A(k) that is. Now that’s the beauty of these Fourier integrals (because that’s what this integral is).
Indeed, in my post on Fourier transforms I also explained how these amplitudes A(k) in the equation above can be expressed as a function of u(x, t) through the inverse Fourier transform. In fact, I actually presented the Fourier transform pair Ψ(x) and Φ(p) in that post, but the logic is the same – except that we’re inserting the time variable t once again (but with its value fixed at t = 0):
A(k) = (1/√(2π))·∫ u(x, 0)·e^(–ikx) dx (with the integral running over all x, from –∞ to +∞)
OK, you’ll say, but where is all of this going? Be patient. We’re almost done. Let’s now introduce a specific initial condition. Let’s assume that we have the following functional form for u at time t = 0:
u(x, 0) = e^(–x² + ik₀x)
You’ll wonder where this comes from. Well… I don’t know. It’s just an example from Wikipedia. It’s random but it fits the bill: it’s a localized wave (so that’s a wave packet) because of the very particular form of the exponent (θ = –x² + ik₀x). The point to note is that we can calculate A(k) when inserting this initial condition in the equation above, and then – finally, you’ll say – we also get a specific solution for our u(x, t) function by inserting the value for A(k) in our general solution. In short, we get:
u(x, t) = e^(–(x – ct)² + ik₀(x – ct))
As mentioned above, we are actually only interested in the real part of this equation (so that’s the e with the exponent factor (note there is no i in it, so it’s just some real number) multiplied with the cosine term).
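The whole procedure (initial condition, then A(k), then the specific solution) can be retraced numerically. What follows is a sketch only, in Python with NumPy, with assumed values c = 1 and k₀ = 5, the standard 1/√(2π) convention for the Fourier pair, the non-dispersive relation ω = c·k, and both integrals replaced by plain sums over a finite grid.

```python
import numpy as np

c, k0 = 1.0, 5.0                              # assumed wave speed and carrier wave number

x = np.linspace(-10.0, 10.0, 801)             # positions
k = np.linspace(k0 - 10.0, k0 + 10.0, 401)    # wave numbers around k0
dx, dk = x[1] - x[0], k[1] - k[0]

u0 = np.exp(-x**2 + 1j * k0 * x)              # initial condition u(x, 0)

# A(k) = (1/√2π) ∫ u(x, 0) e^(-ikx) dx, approximated by a sum over the grid.
A = (np.exp(-1j * np.outer(k, x)) @ u0) * dx / np.sqrt(2 * np.pi)

def u(t):
    """u(x, t) = (1/√2π) ∫ A(k) e^(i(kx - ωt)) dk with ω = c·k, as a sum."""
    phase = np.exp(1j * (np.outer(x, k) - c * k * t))
    return (phase @ A) * dk / np.sqrt(2 * np.pi)

t = 3.0
closed_form = np.exp(-(x - c * t)**2 + 1j * k0 * (x - c * t))
print(np.allclose(u(t), closed_form, atol=1e-6))   # the packet just moved by c·t

# The real part: a real exponential envelope times a cosine.
real_part = np.exp(-(x - c * t)**2) * np.cos(k0 * (x - c * t))
print(np.allclose(u(t).real, real_part, atol=1e-6))
```

For this Gaussian packet the amplitudes come out as A(k) = e^(–(k – k₀)²/4)/√2, i.e. another Gaussian, centered on k₀.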
However, the example above shows how easy it is to extend the analysis to a complex-valued wave function, i.e. a wave function describing a probability amplitude. We will actually do that now for Schrödinger’s equation. [Note that the example comes from Wikipedia’s article on wave packets, and so there is a nice animation which shows how this wave packet (be it the real or imaginary part of it) travels through space. Do watch it!]
Let me just write it down:
i(∂u/∂t) = (–1/2)∇²u
That’s it. This is the Schrödinger equation – in a somewhat simplified form but it’s OK.
[…] You’ll find that equation above either very simple or, else, very difficult depending on whether or not you understood most or nothing at all of what I wrote above it. If you understood something, then it should be fairly simple, because it hardly differs from the other wave equation.
Indeed, we have that imaginary unit (i) in front of the left term, but then you should not panic over that: when everything is said and done, we are working here with the derivative (or partial derivative) of a complex-valued function, and so it should not surprise us that we have an i here and there. It’s nothing special. In fact, we had them in the equation above too, but they just weren’t explicit. The second difference with the electromagnetic wave equation is that we have a first-order derivative of time only (in the electromagnetic wave equation we had ∂²u/∂t², so that’s a second-order derivative). Finally, we have a –1/2 factor in front of the right-hand term, instead of c². OK, so what? It’s a different thing – but that should not surprise us: when everything is said and done, it is a different wave equation because it describes something else (not an electromagnetic wave but a quantum-mechanical system).
To understand why it’s different, I’d need to give you the equivalent of Maxwell’s set of equations for quantum mechanics, and then show how this wave equation is derived from them. I could do that. The derivation is somewhat lengthier than for our electromagnetic wave equation but not all that much. The problem is that it involves some new concepts which we haven’t introduced as yet – mainly some new operators. But then we have introduced a lot of new operators already (such as the gradient and the curl and the divergence) so you might be ready for this. Well… Maybe. The treatment is a bit lengthy, and so I’d rather do it in a separate post. Why? […] OK. Let me say a few things about it then. Here we go:
- These new operators involve matrix algebra. Fine, you’ll say. Let’s get on with it. Well… It’s matrix algebra with matrices with complex elements, so if we write an n×m matrix A as A = (a_ij), then the elements a_ij (i = 1, 2,… n and j = 1, 2,… m) will be complex numbers.
- That allows us to define Hermitian matrices: a Hermitian matrix is a square matrix A which is the same as the complex conjugate of its transpose.
- We can use such matrices as operators indeed: transformations acting on a column vector X to produce another column vector AX.
- Now, you’ll remember – from your course on matrix algebra with real (as opposed to complex) matrices, I hope – that we have this very particular matrix equation AX = λX which has non-trivial solutions (i.e. solutions X ≠ 0) if and only if the determinant of A-λI is equal to zero. This condition (det(A-λI) = 0) is referred to as the characteristic equation.
- This characteristic equation is a polynomial of degree n in λ and its roots are called eigenvalues or characteristic values of the matrix A. The non-trivial solutions X ≠ 0 corresponding to each eigenvalue are called eigenvectors or characteristic vectors.
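The list above can be tried out on a small example in Python. The 2×2 matrix below is an arbitrary Hermitian matrix I made up for the purpose; `np.linalg.eigh` is NumPy’s eigenvalue routine for Hermitian (or real symmetric) matrices.

```python
import numpy as np

# An arbitrary 2×2 example: A equals the complex conjugate of its transpose.
A = np.array([[2.0, 1 - 1j],
              [1 + 1j, 3.0]])
print(np.allclose(A, A.conj().T))          # True: A is Hermitian

# eigh returns the eigenvalues in ascending order and the eigenvectors
# as the columns of X.
eigenvalues, X = np.linalg.eigh(A)
print(eigenvalues)                         # real numbers, even though A is complex

for lam, vec in zip(eigenvalues, X.T):
    # Each pair satisfies the matrix equation A X = λ X ...
    assert np.allclose(A @ vec, lam * vec)
    # ... and λ is a root of the characteristic equation det(A - λI) = 0.
    assert abs(np.linalg.det(A - lam * np.eye(2))) < 1e-12
```

Here the characteristic equation is λ² – 5λ + 4 = 0, so the eigenvalues are 1 and 4. That they are real is no accident: it is a general property of Hermitian matrices, and the reason why they are used to represent quantities one can actually measure.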
Now – just in case you’re still with me – it’s quite simple: in quantum mechanics, we have the so-called Hamiltonian operator. The Hamiltonian in classical mechanics represents the total energy of the system: H = T + V (total energy H = kinetic energy T + potential energy V). Here we have got something similar but different. 🙂 The Hamiltonian operator is written as H-hat, i.e. an H with an accent circonflexe (as they say in French). Now, we need to let this Hamiltonian operator act on the wave function Ψ and if the result is proportional to the same wave function Ψ, then Ψ is a so-called stationary state, and the proportionality constant will be equal to the energy E of the state Ψ. These stationary states correspond to standing waves, or ‘orbitals’, such as in atomic orbitals or molecular orbitals. So we have:
ĤΨ = EΨ
I am sure you are no longer there but, in fact, that’s it. We’re done with the derivation. The equation above is the so-called time-independent Schrödinger equation. It’s called like that not because the wave function is time-independent (it isn’t: a stationary state still carries a time-dependent phase factor), but because the Hamiltonian operator is time-independent: that obviously makes sense because stationary states are associated with specific energy levels indeed. However, if we do allow the energy level to vary in time (which we should do – if only because of the uncertainty principle: there is no such thing as a fixed energy level in quantum mechanics), then we cannot use some constant for E, but we need a so-called energy operator. Fortunately, this energy operator has a remarkably simple functional form:
Ê = iħ·∂/∂t
Now if we plug that in the equation above, we get our time-dependent Schrödinger equation:
ĤΨ = iħ·∂Ψ/∂t
OK. You probably did not understand one iota of this but, even then, you will object that this does not resemble the equation I wrote at the very beginning: i(∂u/∂t) = (–1/2)∇²u.
You’re right, but we only need one more step for that. If we leave out potential energy (so we assume a particle moving in free space), then the Hamiltonian can be written as:
Ĥ = –(ħ²/2m)∇²
You’ll ask me how this is done but I will be short on that: the relationship between energy and momentum is being used here (and so that’s where the 2m factor in the denominator comes from). However, I won’t say more about it because this post would become way too lengthy if I would include each and every derivation and, remember, I just want to get to the result because the derivations here are not the point: I want you to understand the functional form of the wave equation only. So, using the above identity and, OK, let’s be somewhat more complete and include potential energy once again, we can write the time-dependent wave equation as:
iħ·∂Ψ/∂t = –(ħ²/2m)∇²Ψ + VΨ
Now, how is the equation above related to i(∂u/∂t) = (–1/2)∇²u? It’s a very simplified version of it: potential energy is, once again, assumed to be not relevant (so we’re talking a free particle again, with no external forces acting on it) but the real simplification is that we give m and ħ the value 1, so m = ħ = 1. Why?
Well… My initial idea was to do something similar as I did above and, hence, actually use a specific example with an actual functional form, just like we did for that real-valued u(x, t) function. However, when I look at how long this post has become already, I realize I should not do that. In fact, I would just copy an example from somewhere else – probably Wikipedia once again, if only because their examples are usually nicely illustrated with graphs (and often animated graphs). So let me just refer you here to the other example given in the Wikipedia article on wave packets: that example uses that simplified i(∂u/∂t) = (–1/2)∇²u equation indeed. It actually uses the same initial condition:
u(x, 0) = e^(–x² + ik₀x)
However, because the wave equation is different, the wave packet behaves differently. It’s a so-called dispersive wave packet: it delocalizes. Its width increases over time and so, after a while, it just vanishes because it diffuses all over space. So there’s a solution to the wave equation, given this initial condition, but it’s just not stable – as a description of some particle that is (from a mathematical point of view – or even a physical point of view – there is no issue).
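If you want to see this spreading in numbers rather than in an animation, here is a minimal Python sketch. It assumes a plain Gaussian exp(−x²) as the initial condition (an assumption made for this illustration, so it may differ from the Wikipedia example by a phase factor), uses the closed-form solution of i(∂u/∂t) = (-1/2)∇²u in one dimension for that case, checks by finite differences that it does satisfy the equation, and measures how the width of |u|² grows. The function names, step sizes and grid are hypothetical choices.

```python
import cmath

def packet(x, t):
    """Solution of i u_t = -(1/2) u_xx with u(x, 0) = exp(-x^2) (assumed initial condition)."""
    s = 1 + 2j * t
    return cmath.exp(-x * x / s) / cmath.sqrt(s)

def residual(x, t, h=1e-3):
    """Finite-difference estimate of i u_t + (1/2) u_xx, which should be close to zero."""
    u_t = (packet(x, t + h) - packet(x, t - h)) / (2 * h)
    u_xx = (packet(x + h, t) - 2 * packet(x, t) + packet(x - h, t)) / (h * h)
    return 1j * u_t + 0.5 * u_xx

def width(t, half_range=15.0, n=3001):
    """Standard deviation of |u|^2 along x at time t, from a simple Riemann sum."""
    dx = 2 * half_range / (n - 1)
    xs = [-half_range + i * dx for i in range(n)]
    weights = [abs(packet(x, t)) ** 2 for x in xs]
    total = sum(weights)
    mean = sum(x * w for x, w in zip(xs, weights)) / total
    var = sum((x - mean) ** 2 * w for x, w in zip(xs, weights)) / total
    return var ** 0.5

if __name__ == "__main__":
    print("residual at (x=0.3, t=0.5):", abs(residual(0.3, 0.5)))
    for t in (0.0, 1.0, 2.0, 4.0):
        print(f"t = {t}: width = {width(t):.3f}")
```

For this particular initial condition the width should follow √(1 + 4t²)/2, so the packet keeps getting wider: that is the delocalization described above.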
In any case, this probably all sounds like Chinese – or Greek if you understand Chinese :-). I actually haven’t worked with these Hermitian operators yet, and so it’s pretty shaky territory for me myself. However, I felt like I had picked up enough math and physics on this long and winding Road to Reality (I don’t think I am even halfway) to give it a try. I hope I succeeded in passing the message, which I’ll summarize as follows:
- Schrödinger’s equation is just like any other differential equation used in physics, in the sense that it represents a system subject to constraints, such as the relationship between energy and momentum.
- It will have many general solutions. In other words, the wave function – which describes a probability amplitude as a function in space and time – will have many general solutions, and a specific solution will depend on the initial conditions.
- The solution(s) can represent stationary states, but not necessarily so: a wave (or a wave packet) can be non-dispersive or dispersive. However, when we plug the wave function into the wave equation, it will satisfy that equation.
That’s neither spectacular nor difficult, is it? But, perhaps, it helps you to ‘understand’ wave equations, including the Schrödinger equation. But what is understanding? Dirac once famously said: “I consider that I understand an equation when I can predict the properties of its solutions, without actually solving it.”
Hmm… I am not quite there yet, but I am sure some more practice with it will help. 🙂
Post scriptum: On Maxwell’s equations
First, we should say something more about these two other operators which I introduced above: the divergence and the curl. First on the divergence.
The divergence of a field vector E (or B) at some point r represents the so-called flux of E, i.e. the ‘flow’ of E per unit volume. So flux and divergence both deal with the ‘flow’ of electric field lines away from (positive) charges. [The ‘away from’ is from positive charges indeed – as per the convention: Maxwell himself used the term ‘convergence’ to describe flow towards negative charges, but so his ‘convention’ did not survive. Too bad, because I think convergence would be much easier to remember.]
So if we write that ∇•E = ρ/ε0, then it means that we have some constant flux of E because of some (fixed) distribution of charges.
Now, we already mentioned that equation (2) in Maxwell’s set meant that there is no such thing as a ‘magnetic’ charge: indeed, ∇•B = 0 means there is no magnetic flux. But, of course, magnetic fields do exist, don’t they? They do. A current in a wire, for example, i.e. a bunch of steadily moving electric charges, will induce a magnetic field according to Ampère’s law, which is part of equation (4) in Maxwell’s set: c²∇×B = j/ε0, with j representing the current density and ε0 the electric constant.
Now, at this point, we have this curl: ∇×B. Just like divergence (or convergence as Maxwell called it – but then with the sign reversed), curl also means something in physics: it’s the amount of ‘rotation’, or ‘circulation’ as Feynman calls it, around some loop.
So, to summarize the above, we have (1) flux (divergence) and (2) circulation (curl) and, of course, the two must be related. And, while we do not have any magnetic charges and, hence, no flux for B, the current in that wire will cause some circulation of B, and so we do have a magnetic field. However, that magnetic field will be static, i.e. it will not change. Hence, the time derivative ∂B/∂t will be zero and, hence, from equation (2) we get that ∇×E = 0, so our electric field will be static too. The time derivative ∂E/∂t which appears in equation (4) also disappears and we just have c²∇×B = j/ε0. This situation – of a constant electric and magnetic field – is described as electrostatics and magnetostatics respectively. It implies a neat separation of the four equations, and it makes magnetism and electricity appear as distinct phenomena. Indeed, as long as charges and currents are static, we have:
[I] Electrostatics: (1) ∇•E = ρ/ε0 and (2) ∇×E = 0
[II] Magnetostatics: (3) c²∇×B = j/ε0 and (4) ∇•B = 0
The first two equations describe a vector field with zero curl and a given divergence (i.e. the electric field) while the third and fourth equations describe a seemingly separate vector field with a given curl but zero divergence. Now, I am not writing this post scriptum to reproduce Feynman’s Lectures on Electromagnetism, and so I won’t say much more about this. I just want to note two points:
1. The first point to note is that factor c² in the c²∇×B = j/ε0 equation. That’s something which you don’t have in the ∇•E = ρ/ε0 equation. Of course, you’ll say: So what? Well… It’s weird. And if you bring it to the other side of the equation, it becomes clear that you need an awful lot of current for a tiny little bit of magnetic circulation (because you’re dividing by c², so that’s a factor 9 with 16 zeroes after it (9×10¹⁶): an awfully big number in other words). Truth be said, it reveals something very deep. Hmm? Take a wild guess. […] Relativity perhaps? Well… Yes!
It’s obvious that we buried v somewhere in this equation, the velocity of the moving charges. But then it’s part of j of course: the rate at which charge flows through a unit area per second. But – Hey! – velocity as compared to what? What’s the frame of reference? The frame of reference is us obviously or – somewhat less subjective – the stationary charges determining the electric field according to equation (1) in the set above: ∇•E = ρ/ε0. But so here we can ask the same question: stationary in what reference frame? As compared to the moving charges? Hmm… But so how does it work with relativity? I won’t copy Feynman’s 13th Lecture here, but so, in that lecture, he analyzes what happens to the electric and magnetic force when we look at the scene from another coordinate system – let’s say one that moves parallel to the wire at the same speed as the moving electrons, so – because of our new reference frame – the ‘moving electrons’ now appear to have no speed at all but, of course, our stationary charges will now seem to move.
What Feynman finds – and his calculations are very easy and straightforward – is that, while we will obviously insert different input values into Maxwell’s set of equations and, hence, get different values for the E and B fields, the actual physical effect – i.e. the final Lorentz force on a (charged) particle – will be the same. To be very specific, in a coordinate system at rest with respect to the wire (so we see charges move in the wire), we find a ‘magnetic’ force indeed, but in a coordinate system moving at the same speed as those charges, we will find an ‘electric’ force only. And from yet another reference frame, we will find a mixture of E and B fields. However, the physical result is the same: there is only one combined force in the end – the Lorentz force F = q(E + v×B) – and it’s always the same, regardless of the reference frame (inertial or moving at whatever speed – relativistic (i.e. close to c) or not).
In other words, Maxwell’s description of electromagnetism is invariant or, to say exactly the same in yet other words, electricity and magnetism taken together are consistent with relativity: they are part of one physical phenomenon: the electromagnetic interaction between (charged) particles. So electric and magnetic fields appear in different ‘mixtures’ if we change our frame of reference, and so that’s why magnetism is often described as a ‘relativistic’ effect – although that’s not very accurate. However, it does explain that c² factor in the equation for the curl of B. [How exactly? Well… If you’re smart enough to ask that kind of question, you will be smart enough to find the derivation on the Web. :-)]
Note: Don’t think we’re talking astronomical speeds here when comparing the two reference frames. It would also work for astronomical speeds but, in this case, we are talking the speed of the electrons moving through a wire. Now, the so-called drift velocity of electrons – which is the one we have to use here – in a copper wire of radius 1 mm carrying a steady current of 3 Amps is only about 1 m per hour! So the relativistic effect is tiny – but still measurable !
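For a rough check of that drift velocity, here is a small Python sketch using the textbook relation v = I/(n·e·A). The free-electron density of copper (about 8.5×10²⁸ per m³, i.e. one conduction electron per atom) is an assumption that is not in the text above, and the function name is made up for the illustration.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulomb (exact SI value)
N_COPPER = 8.5e28                    # free electrons per m^3 (assumed: one per copper atom)

def drift_velocity(current_a, radius_m, n=N_COPPER):
    """Drift velocity v = I / (n * e * A) for a round wire, in m/s."""
    area = math.pi * radius_m ** 2
    return current_a / (n * ELEMENTARY_CHARGE * area)

if __name__ == "__main__":
    v = drift_velocity(3.0, 1e-3)  # 3 A through a wire of radius 1 mm
    print(f"{v:.2e} m/s, i.e. {v * 3600:.2f} m per hour")
```

With these assumed inputs the result is a fraction of a meter per hour, so the same order of magnitude as the figure quoted above: the electrons really do crawl.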
2. The second thing I want to note is that Maxwell’s set of equations with non-zero time derivatives for E and B clearly shows that it’s changing electric and magnetic fields that sort of create each other, and it’s this that’s behind electromagnetic waves moving through space without losing energy. They just travel on and on. The math behind this is beautiful (and the animations in the related Wikipedia articles are equally beautiful – and probably easier to understand than the equations), but that’s stuff for another post. As the electric field changes, it induces a magnetic field, which then induces a new electric field, etc., allowing the wave to propagate itself through space. I should also note here that the energy is in the field and so, when electromagnetic waves, such as light, or radio waves, travel through space, they carry their energy with them.
Let me be fully complete here, and note that there’s energy in electrostatic fields as well, and the formula for it is remarkably beautiful. The total (electrostatic) energy U in an electrostatic field generated by charges located within some finite distance is equal to:
This equation introduces the electrostatic potential. This is a scalar field Φ from which we can derive the electric field vector just by applying the gradient operator. In fact, all curl-free fields (such as the electric field in this case) can be written as the gradient of some scalar field. That’s a universal truth. See how beautiful math is? 🙂
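To make the link between the potential and the field a bit more tangible, here is a minimal Python sketch for a single point charge at the origin, in units where q/(4πε0) = 1 (a choice made for this illustration). It builds E as minus the gradient of Φ = 1/r by finite differences, compares it with the familiar Coulomb field r/|r|³, and checks that the curl of E is numerically zero. All function names and step sizes are hypothetical.

```python
def phi(x, y, z):
    """Potential of a unit point charge at the origin, with q/(4*pi*eps0) = 1."""
    return 1.0 / (x * x + y * y + z * z) ** 0.5

def partial(f, p, axis, h=1e-5):
    """Central finite difference of f along one axis at point p."""
    hi, lo = list(p), list(p)
    hi[axis] += h
    lo[axis] -= h
    return (f(*hi) - f(*lo)) / (2 * h)

def E(x, y, z):
    """Electric field as minus the (numerical) gradient of phi."""
    p = (x, y, z)
    return tuple(-partial(phi, p, axis) for axis in range(3))

def curl(F, p, h=1e-4):
    """Numerical curl of a vector field F at point p."""
    def d(i, j):  # derivative of component i along axis j
        return partial(lambda *q: F(*q)[i], p, j, h)
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))

if __name__ == "__main__":
    p = (0.7, -0.4, 1.1)
    r3 = sum(c * c for c in p) ** 1.5
    print("E from gradient:", E(*p))
    print("Coulomb field  :", tuple(c / r3 for c in p))
    print("curl of E      :", curl(E, p))
```

The curl comes out as round-off noise only, which is what the electrostatic equation ∇×E = 0 says.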
Or the end of theoretical physics?
In my previous post, I mentioned the Goliath of science and engineering: the Large Hadron Collider (LHC), built by the European Organization for Nuclear Research (CERN) under the Franco-Swiss border near Geneva. I actually started uploading some pictures, but then I realized I should write a separate post about it. So here we go.
The first image (see below) shows the LHC tunnel, while the other shows (a part of) one of the two large general-purpose particle detectors that are part of this Large Hadron Collider. A detector is the thing that’s used to look at those collisions. This is actually the smaller of the two general-purpose detectors: it’s the so-called CMS detector (the other one is the ATLAS detector), and it’s ‘only’ 21.6 meters long and 15 meters in diameter – and it weighs about 12,500 tons. But so it did detect a Higgs particle – just like the ATLAS detector. [That’s actually not 100% sure but it was sure enough for the Nobel Prize committee – so I guess that should be good enough for us common mortals :-)]
The picture above shows one of these collisions in the CMS detector. It’s not the one with the trace of the Higgs particle though. In fact, I have not found any image that actually shows the Higgs particle: the closest thing to such image are some impressionistic images on the ATLAS site. See http://atlas.ch/news/2013/higgs-into-fermions.html
In case you wonder what’s being scattered here… Well… All kinds of things – but so the original collision is usually between protons (so these are hydrogen ions: H+ nuclei), although the LHC can produce other nucleon beams as well (collectively referred to as hadrons). These protons have energy levels of 4 TeV (tera-electronVolt: 1 TeV = 1000 GeV = 1 trillion eV = 1×10¹² eV).
Now, let’s think about scale once again. Remember (from that same previous post) that we calculated a wavelength of 0.33 nanometer (1 nm = 1×10⁻⁹ m, so that’s a billionth of a meter) for an electron. Well, this LHC is actually exploring the sub-femtometer (fm) frontier. One femtometer (fm) is 1×10⁻¹⁵ m so that’s another million times smaller. Yes: so we are talking a millionth of a billionth of a meter. The size of a proton is an estimated 1.7 femtometer indeed and, as you surely know, a proton is a point-like thing occupying a very tiny space, so it’s not like an electron ‘cloud’ swirling around: it’s much smaller. In fact, quarks – three of them make up a proton (or a neutron) – are usually thought of as being just a little bit less than half that size – so that’s about 0.7 fm.
It may also help you to use the value I mentioned for high-energy electrons when I was discussing the LEP (the Large Electron-Positron Collider, which preceded the LHC) – so that was 104.5 GeV – and calculate the associated de Broglie wavelength using E = hf and λ = v/f. The velocity v is close to c and, hence, if we plug everything in, we get a value close to 1.2×10⁻¹⁷ m: 1 GeV corresponds to about 1.2×10⁻¹⁵ m, so GeV energy levels take us to the femtometer scale indeed, and 104.5 GeV takes us to about a hundredth of a femtometer. [If you don’t want to calculate anything, then just note we’re going from eV to giga-eV energy levels here, and so our wavelength decreases accordingly: one billion times smaller. Also remember (from the previous posts) that we calculated a wavelength of 0.33×10⁻⁹ m and an associated energy level of 70 eV for a slow-moving electron – i.e. one going at 2200 km per second ‘only’, i.e. less than 1% of the speed of light.] Also note that, at these energy levels, it doesn’t matter whether or not we include the rest mass of the electron: 0.511 MeV is nothing as compared to the GeV realm. In short, we are talking very very tiny stuff here.
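If you would rather let a computer do the plugging in, here is a small Python sketch of both cases: the slow electron (λ = h/mv) and the high-energy one (λ = h/p, with p taken from the relativistic energy-momentum relation, which reduces to λ ≈ hc/E when the rest mass is negligible). The constants are the standard SI/CODATA values; the function names are invented for the illustration. As a rule of thumb, 1 GeV corresponds to roughly 1.24 fm.

```python
H = 6.62607015e-34            # Planck constant, J*s
C = 299792458.0               # speed of light, m/s
EV = 1.602176634e-19          # joule per electronvolt
M_ELECTRON = 9.1093837015e-31 # electron rest mass, kg

def wavelength_slow(speed_m_s, mass_kg=M_ELECTRON):
    """Non-relativistic de Broglie wavelength h / (m v), in meters."""
    return H / (mass_kg * speed_m_s)

def wavelength_from_energy(total_energy_ev, mass_kg=M_ELECTRON):
    """de Broglie wavelength h / p, with p*c = sqrt(E^2 - (m c^2)^2), in meters."""
    energy_j = total_energy_ev * EV
    rest_j = mass_kg * C ** 2
    pc = (energy_j ** 2 - rest_j ** 2) ** 0.5
    return H * C / pc

if __name__ == "__main__":
    print("electron at 2200 km/s:", wavelength_slow(2.2e6), "m")
    print("electron at 1 GeV    :", wavelength_from_energy(1e9), "m")
    print("electron at 104.5 GeV:", wavelength_from_energy(104.5e9), "m")
```

Setting the rest mass to zero in the second function changes the high-energy results only far beyond the digits that matter here, which is the point made above about 0.511 MeV being nothing in the GeV realm.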
But so that’s the LEP scale. I wrote that the LHC is probing things at the sub-femtometer scale. So how much sub-something is that? Well… Quite a lot: the LHC is looking at stuff at a scale that’s more than a thousand times smaller. Indeed, if collision experiments in the giga-electronvolt (GeV) energy range correspond to probing stuff at the femtometer scale, then tera-electronvolt (TeV) energy levels correspond to probing stuff that’s, once again, another thousand times smaller, so we’re looking at distances of less than a thousandth of a millionth of a billionth of a meter. Now, you can try to ‘imagine’ that, but you can’t really.
So what do we actually ‘see’ then? Well… Nothing much one could say: all we can ‘see’ are traces of point-like ‘things’ being scattered, which then disintegrate or just vanish from the scene – as shown in the image above. In fact, as mentioned above, we do not even have such clear-cut ‘trace’ of a Higgs particle: we’ve got a ‘kinda signal’ only. So that’s it? Yes. But then these images are beautiful, aren’t they? If only to remind ourselves that particle physics is about more than just a bunch of formulas. It’s about… Well… The essence of reality: its intrinsic nature so to say. So… Well…
Let me be skeptical. So we know all of that now, don’t we? The so-called Standard Model has been confirmed by experiment. We now know how Nature works, don’t we? We observe light (or, to be precise, radiation: most notably that cosmic background radiation that reaches us from everywhere) that originated nearly 14 billion years ago (to be precise: 380,000 years after the Big Bang – but what’s 380,000 years on this scale?) and so we can ‘see’ things that are 14 billion light-years away. In fact, things that were 14 billion light-years away: indeed, because of the expansion of the universe, they are further away now and so that’s why the so-called observable universe is actually larger. So we can ‘see’ everything we need to ‘see’ at the cosmic distance scale and now we can also ‘see’ all of the particles that make up matter, i.e. quarks and electrons mainly (we also have some other so-called leptons, like neutrinos and muons), and also all of the particles that make up anti-matter of course (i.e. antiquarks, positrons etcetera). As importantly – or even more – we can also ‘see’ all of the ‘particles’ carrying the forces governing the interactions between the ‘matter particles’ – which are collectively referred to as fermions, as opposed to the ‘force carrying’ particles, which are collectively referred to as bosons (see my previous post on Bose and Fermi). Let me quickly list them – just to make sure we’re on the same page:
- Photons for the electromagnetic force.
- Gluons for the so-called strong force, which explains why positively charged protons ‘stick’ together in nuclei – in spite of their electric charge, which should push them away from each other. [You might think it’s the neutrons that ‘glue’ them together but so, no, it’s the gluons.]
- W+, W–, and Z bosons for the so-called ‘weak’ interactions (aka Fermi’s interaction), which explain how one type of quark can change into another, thereby explaining phenomena such as beta decay. [For example, carbon-14 will – through beta decay – spontaneously decay into nitrogen-14. Indeed, carbon-12 is the stable isotope, while carbon-14 has a half-life of 5,730 ± 40 years ‘only’ 🙂 and, hence, measuring how much carbon-14 is left in some organic substance allows us to date it (that’s what (radio)carbon-dating is about). As for the name, a beta particle can refer to an electron or a positron, so we can have β– decay (e.g. the above-mentioned carbon-14 decay) as well as β+ decay (e.g. magnesium-23 into sodium-23). There’s also alpha and gamma decay but that involves different things. In any case… Let me end this digression within the digression.]
- Finally, the existence of the Higgs particle – and, hence, of the associated Higgs field – has been predicted since 1964 already, but so it was only experimentally confirmed (i.e. we saw it, in the LHC) last year, so Peter Higgs – and a few others of course – got their well-deserved Nobel prize only 50 years later. The Higgs field gives fermions, and also the W+, W–, and Z bosons, mass (but not photons and gluons, and so that’s why the weak force has such short range – as compared to the electromagnetic and strong forces).
So there we are. We know it all. Sort of. Of course, there are many questions left – so it is said. For example, the Higgs particle does actually not explain the gravitational force, so it’s not the (theoretical) graviton, and so we do not have a quantum field theory for the gravitational force. [Just Google it and you’ll see why: there are theoretical as well as practical (experimental) reasons for that.] Secondly, while we do have a quantum field theory for all of the forces (or ‘interactions’ as physicists prefer to call them), there are a lot of constants in them (many more than just that Planck constant I introduced in my posts!) that seem to be ‘unrelated and arbitrary.’ I am obviously just quoting Wikipedia here – but it’s true.
Just look at it: three ‘generations’ of matter with various strange properties, four force fields (and some ‘gauge theory’ to provide some uniformity), bosons that have mass (the W+, W–, and Z bosons, and then the Higgs particle itself) but then photons and gluons don’t… It just doesn’t look good, and then Feynman himself wrote, just a few years before his death (QED, 1985, p. 128), that the math behind calculating some of these constants (the coupling constant j for instance, or the rest mass n of an electron), which he actually invented (it makes use of a mathematical approximation method called perturbation theory) and for which he got a Nobel Prize, is a “dippy process” and that “having to resort to such hocus-pocus has prevented us from proving that the theory of quantum electrodynamics is mathematically self-consistent”. He adds: “It’s surprising that the theory still hasn’t been proved self-consistent one way or the other by now; I suspect that renormalization [“the shell game that we play to find n and j” as he calls it] is not mathematically legitimate.” And so he writes this about quantum electrodynamics, not about “the rest of physics” (and so that’s quantum chromodynamics (QCD) – the theory of the strong interactions – and quantum flavordynamics (QFD) – the theory of weak interactions) which, he adds, “has not been checked anywhere near as well as electrodynamics.”
Wow! That’s a pretty damning statement, isn’t it? In short, all of the celebrations around the experimental confirmation of the Higgs particle cannot hide the fact that it all looks a bit messy. There are other questions as well – most of which I don’t understand so I won’t mention them. To make a long story short, physicists and mathematicians alike seem to think there must be some ‘more fundamental’ theory behind. But – Hey! – you can’t have it all, can you? And, of course, all these theoretical physicists and mathematicians out there do need to justify their academic budget, don’t they? And so all that talk about a Grand Unification Theory (GUT) is probably just what it is: talk. Isn’t it? Maybe.
The key question is probably easy to formulate: what’s beyond this scale of a thousandth of a proton diameter (0.001×10⁻¹⁵ m) – a thousandth of a millionth of a billionth of a meter that is. Well… Let’s first note that this so-called ‘beyond’ is a ‘universe’ which mankind (or let’s just say ‘we’) will never see. Never ever. Why? Because there is no way to go substantially beyond the 4 TeV energy levels that were reached last year – at great cost – in the world’s largest particle collider (the LHC). Indeed, the LHC is widely regarded not only as “the most complex and ambitious scientific project ever accomplished by humanity” (I am quoting a CERN scientist here) but – with a cost of more than 7.5 billion Euro – also as one of the most expensive ones. Indeed, taking into account inflation and all that, it was like the Manhattan project indeed (although scientists loathe that comparison). So we should not have any illusions: there will be no new super-duper LHC any time soon, and surely not during our lifetime: the current LHC is the super-duper thing!
Indeed, when I write ‘substantially‘ above, I really mean substantially. Just to put things in perspective: the LHC is currently being upgraded to produce 7 TeV beams (it was shut down for this upgrade, and it should come back on stream in 2015). That sounds like an awful lot (from 4 to 7 is +75%), and it is: it amounts to packing the kinetic energy of seven flying mosquitos (instead of four previously :-)) into each and every particle that makes up the beam. But that’s not substantial, in the sense that it is very much below the so-called GUT energy scale, which is the energy level above which, it is believed (by all those GUT theorists at least), the electromagnetic force, the weak force and the strong force will all be part and parcel of one and the same unified force. Don’t ask me why (I’ll know when I finished reading Penrose, I hope) but that’s what it is (if I should believe what I am reading currently that is). In any case, the thing to remember is that the GUT energy levels are in the 10¹⁶ GeV range, so that’s – sorry for all these numbers – ten trillion TeV. That amounts to pumping more than a million Joule (about 1.6×10⁶ J) in each of those tiny point-like particles that make up our beam. So… No. Don’t even try to dream about it. It won’t happen. That’s science fiction – with the emphasis on fiction. [Also don’t dream about ten trillion flying mosquitos packed into one proton-sized super-mosquito either. :-)]
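All these conversions boil down to multiplying by the charge of the electron: 1 eV = 1.602176634×10⁻¹⁹ J. A minimal Python sketch, with a made-up function name, for those who want to redo the numbers for the beam energies and the GUT scale:

```python
EV_IN_JOULE = 1.602176634e-19  # one electronvolt in joule (exact SI value)

PREFIX = {"eV": 1.0, "MeV": 1e6, "GeV": 1e9, "TeV": 1e12, "PeV": 1e15}

def to_joule(value, unit="eV"):
    """Convert an energy given in eV, MeV, GeV, TeV or PeV to joule."""
    return value * PREFIX[unit] * EV_IN_JOULE

if __name__ == "__main__":
    for value, unit in ((4, "TeV"), (7, "TeV"), (1e16, "GeV")):
        print(f"{value:g} {unit} = {to_joule(value, unit):.3g} J")
    print("GUT scale in TeV:", 1e16 * PREFIX["GeV"] / PREFIX["TeV"])
```

The per-particle energies at the LHC are in the micro-Joule range, which is where the mosquito comparison comes from; the GUT scale is a macroscopic amount of energy in one single particle.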
Well… I don’t know. Physicists refer to the zone beyond the above-mentioned scale (so things smaller than 0.001×10⁻¹⁵ m) as the Great Desert. That’s a very appropriate name I think – for more than one reason. And so it’s this ‘desert’ that Roger Penrose is actually trying to explore in his ‘Road to Reality’. As for me, well… I must admit I have great trouble following Penrose on this road. I’ve actually started to doubt that Penrose’s Road leads to Reality. Maybe it takes us away from it. Huh? Well… I mean… Perhaps the road just stops at that 0.001×10⁻¹⁵ m frontier?
In fact, that’s a view which one of the early physicists specialized in high-energy physics, Raoul Gatto, referred to as the zeroth scenario. I am actually not quoting Gatto here, but another theoretical physicist: Gerard ‘t Hooft, another Nobel prize winner (you may know him better because he’s a rather fervent Mars One supporter, but so here I am referring to his popular 1996 book In Search of the Ultimate Building Blocks). In any case, Gatto, and most other physicists, including ‘t Hooft (despite the fact ‘t Hooft got his Nobel prize for his contribution to gauge theory – which, together with Feynman’s application of perturbation theory to QED, is actually the backbone of the Standard Model) firmly reject this zeroth scenario. ‘t Hooft himself thinks superstring theory (i.e. supersymmetric string theory – which has now been folded into M-theory or – back to the original term – just string theory – the terminology is quite confusing) holds the key to exploring this desert.
But who knows? In fact, we can’t – because of the above-mentioned practical problem of experimental confirmation. So I am likely to stay on this side of the frontier for quite a while – if only because there’s still so much to see here and, of course, also because I am just at the beginning of this road. 🙂 And then I also realize I’ll need to understand gauge theory and all that to continue on this road – which is likely to take me another six months or so (if not more) and then, only then, I might try to look at those little strings, even if we’ll never see them because… Well… Their theoretical diameter is the so-called Planck length. So what? Well… That’s equal to 1.6×10⁻³⁵ m. So what? Well… Nothing. It’s just that 1.6×10⁻³⁵ m is 1/10 000 000 000 000 000 of that sub-femtometer scale. I don’t even want to write this in trillionths of trillionths of trillionths etcetera because I feel that’s just not making any sense. And perhaps it doesn’t. One thing is for sure: that ‘desert’ that GUT theorists want us to cross is not just ‘Great’: it’s ENORMOUS!
Richard Feynman – another Nobel Prize scientist whom I obviously respect a lot – surely thought trying to cross a desert like that amounts to certain death. Indeed, he’s supposed to have said the following about string theorists, about a year or two before he died (way too young): “I don’t like that they’re not calculating anything. I don’t like that they don’t check their ideas. I don’t like that for anything that disagrees with an experiment, they cook up an explanation–a fix-up to say, “Well, it might be true.” For example, the theory requires ten dimensions. Well, maybe there’s a way of wrapping up six of the dimensions. Yes, that’s all possible mathematically, but why not seven? When they write their equation, the equation should decide how many of these things get wrapped up, not the desire to agree with experiment. In other words, there’s no reason whatsoever in superstring theory that it isn’t eight out of the ten dimensions that get wrapped up and that the result is only two dimensions, which would be completely in disagreement with experience. So the fact that it might disagree with experience is very tenuous, it doesn’t produce anything; it has to be excused most of the time. It doesn’t look right.”
Hmm… Feynman and ‘t Hooft… Two giants in science. Two Nobel Prize winners – and for stuff that truly revolutionized physics. The amazing thing is that those two giants – who are clearly at loggerheads on this one – actually worked closely together on a number of other topics – most notably on the so-called Feynman-‘t Hooft gauge, which – as far as I understand – is the one that is most widely used in quantum field calculations. But I’ll leave it at that here – and I’ll just make a mental note of the terminology here. The Great Desert… Probably an appropriate term. ‘t Hooft says that most physicists think that desert is full of tiny flowers. I am not so sure – but then I am not half as smart as ‘t Hooft. Much less actually. So I’ll just see where the road I am currently following leads me. With Feynman’s warning in mind, I should probably expect the road condition to deteriorate quickly.
Post scriptum: You will not be surprised to hear that there’s a word for 1×10⁻¹⁸ m: it’s called an attometer (with two t’s, and abbreviated as am). And beyond that we have zeptometer (1 zm = 1×10⁻²¹ m) and yoctometer (1 ym = 1×10⁻²⁴ m). In fact, these measures actually represent something: 20 yoctometer is the estimated radius of a 1 MeV neutrino – or, to be precise, it’s the radius of the cross section, which is “the effective area that governs the probability of some scattering or absorption event.” But so then there are no words anymore. The next measure is the Planck length: 1.62×10⁻³⁵ m – but so that’s some sixty billion (6×10¹⁰) times smaller than a yoctometer. Unimaginable, isn’t it? Literally.
Note: A 1 MeV neutrino? Well… Yes. The estimated rest mass of an (electron) neutrino is tiny: at least 50,000 times smaller than the mass of the electron and, therefore, neutrinos are often assumed to be massless, for all practical purposes that is. However, just like the massless photon, they can carry high energy. High-energy gamma ray photons, for example, are also associated with MeV energy levels. Neutrinos are one of the many particles produced in high-energy particle collisions in particle accelerators, but they are present everywhere: they’re produced by stars (which, as you know, are nuclear fusion reactors). In fact, most neutrinos passing through Earth are produced by our Sun. The largest neutrino detector on Earth is called IceCube. It sits on the South Pole – or under it, as it’s suspended under the Antarctic ice, and it regularly captures high-energy neutrinos in the range of 1 to 10 TeV. Last year (in November 2013), it captured two with energy levels around 1000 TeV – so that’s the peta-electronvolt level (1 PeV = 1×10¹⁵ eV). If you think that’s amazing, it is. But also remember that 1 eV is 1.6×10⁻¹⁹ Joule, so 1 PeV is ‘only’ about a ten-thousandth of a Joule. In other words, you would need at least ten thousand of them to briefly light up an LED. The PeV pair was dubbed Bert and Ernie and the illustration below (from IceCube’s website) conveys how the detectors sort of lit up when they passed. It was obviously a pretty clear ‘signal’ – but so the illustration also makes it clear that we don’t really ‘see’ at such small scale: we just know ‘something’ happened.
After all those boring pieces on math, it is about time I got back to physics. Indeed, what’s all that stuff on differential equations and complex numbers good for? This blog was supposed to be a journey into physics, wasn’t it? Yes. But wave functions – functions describing physical waves (in classical mechanics) or probability amplitudes (in quantum mechanics) – are the solution to some differential equation, and they will usually involve complex-number notation. However, I agree we have had enough of that now. Let’s see how it works. By the way, the title of this post – An Easy Piece – is an obvious reference to (some of) Feynman’s 1965 Lectures on Physics, some of which were re-packaged in 1994 (six years after his death that is) in ‘Six Easy Pieces’ indeed – but, IMHO, it makes more sense to read all of them as part of the whole series.
Let’s first look at one of the most used mathematical shapes: the sinusoidal wave. The illustration below shows the basic concepts: we have a wave here – some kind of cyclic thing – with a wavelength λ, an amplitude (or height) of (maximum) A0, and a so-called phase shift equal to φ. The Wikipedia definition of a wave is the following: “a wave is a disturbance or oscillation that travels through space and matter, accompanied by a transfer of energy.” Indeed, a wave transports energy as it travels (oh – I forgot to mention the speed or velocity of a wave (v) as an important characteristic of a wave), and the energy it carries is directly proportional to the square of the amplitude of the wave: E ∝ A² (this is true not only for waves like water waves, but also for electromagnetic waves, like light).
Let’s now look at how these variables get into the argument – literally: into the argument of the wave function. Let’s start with that phase shift. The phase shift is usually defined referring to some other wave or reference point (in this case the origin of the x and y axis). Indeed, the amplitude – or ‘height’ if you want (think of a water wave, or the strength of the electric field) – of the wave above depends on (1) the time t (not shown above) and (2) the location (x), but so we will need to have this phase shift φ in the argument of the wave function because at x = 0 we do not have a zero height for the wave. So, as we can see, we can shift the x-axis left or right with this φ. OK. That’s simple enough. Let’s look at the other independent variables now: time and position.
The height (or amplitude) of the wave will obviously vary both in time as well as in space. On this graph, we fixed time (t = 0) – and so it does not appear as a variable on the graph – and show how the amplitude y = A varies in space (i.e. along the x-axis). We could also have looked at one location only (x = 0 or x1 or whatever other location) and shown how the amplitude varies over time at that location only. The graph would be very similar, except that we would have a ‘time distance’ between two crests (or between two troughs or between any other two points separated by a full cycle of the wave) instead of the wavelength λ (i.e. a distance in space). This ‘time distance’ is the time needed to complete one cycle and is referred to as the period of the wave (usually denoted by the symbol T or T0 – in line with the notation for the maximum amplitude A0). In other words, we will also see time (t) as well as location (x) in the argument of this cosine or sine wave function. By the way, it is worth noting that it does not matter if we use a sine or cosine function because we can go from one to the other using the basic trigonometric identities cos θ = sin(π/2 – θ) and sin θ = cos(π/2 – θ). So all waves of the shape above are referred to as sinusoidal waves even if, in most cases, the convention is to actually use the cosine function to represent them.
So we will have x, t and φ in the argument of the wave function. Hence, we can write A = A(x, t, φ) = cos(x + t + φ) and there we are, right? Well… No. We’re adding very different units here: time is measured in seconds, distance in meter, and the phase shift is measured in radians (i.e. the unit of choice for angles). So we can’t just add them up. The argument of a trigonometric function (like this cosine function) is an angle and, hence, we need to get everything in radians – because that’s the unit we use to measure angles. So how do we do that? Let’s do it step by step.
First, it is worth noting that waves are usually caused by something. For example, electromagnetic waves are caused by an oscillating point charge somewhere, and radiate out from there. Physical waves – like water waves, or an oscillating string – usually also have some origin. In fact, we can look at a wave as a way of transmitting energy originating elsewhere. In the case at hand here – i.e. the nice regular sinusoidal wave illustrated above – it is obvious that the amplitude at some time t = t₁ at some point x = x₁ will be the same as the amplitude of that wave at point x = 0 some time ago. How much time ago? Well… The time that was needed for that wave to travel from point x = 0 to point x = x₁ is easy to calculate: indeed, if the wave originated at t = 0 and x = 0, then x₁ (i.e. the distance traveled by the wave) will be equal to its velocity (v) multiplied by t₁, so we have x₁ = v.t₁ (note that we assume the wave velocity is constant – which is a very reasonable assumption). In other words, inserting x₁ and t₁ in the argument of our cosine function should yield the same value as inserting zero for x and t. Distance and time can be substituted so to say, and that’s why we will have something like x – vt or vt – x in the argument of that cosine function: we measure both time and distance in units of distance so to say. [Note that x – vt and –(x – vt) = vt – x are equivalent because cos θ = cos(–θ).]
Does this sound fishy? It shouldn’t. Think about it. In the (electric) field equation for electromagnetic radiation (that’s one of the examples of a wave which I mentioned above), you’ll find the so-called retarded acceleration a(t – x/c) in the argument: that’s the acceleration (a)of the charge causing the electric field at point x to change not at time t but at time t – x/c. So that’s the retarded acceleration indeed: x/c is the time it took for the wave to travel from its origin (the oscillating point charge) to x and so we subtract that from t. [When talking electromagnetic radiation (e.g. light), the wave velocity v is obviously equal to c, i.e. the speed of light, or of electromagnetic radiation in general.] Of course, you will now object that t – x/c is not the same as vt – x, and you are right: we need time units in the argument of that acceleration function, not distance. We can get to distance units if we would multiply the time with the wave velocity v but that’s complicated business because the velocity of that moving point charge is not a constant.
[…] I am not sure if I made myself clear here. If not, so be it. The thing to remember is that we need an input expressed in radians for our cosine function, not time, nor distance. Indeed, the argument in a sine or cosine function is an angle, not some distance. We will call that angle the phase of the wave, and it is usually denoted by the symbol θ – which we also used above. But so far we have been talking about amplitude as a function of distance, and we expressed time in distance units too – by multiplying it with v. How can we go from some distance to some angle? It is simple: we’ll multiply x – vt with 2π/λ.
Huh? Yes. Think about it. The wavelength will be expressed in units of distance – typically the meter in the International System of Units (SI) but it could also be angstrom (10⁻¹⁰ m = 0.1 nm) or nanometer (10⁻⁹ m = 10 Å). A wavelength of two meters (2 m) means that the wave only completes half a cycle per meter of travel. So we need to translate that into radians, which – once again – is the measure used to… well… measure angles, or the phase of the wave as we call it here. So what’s the ‘unit’ here? Well… Remember that we can add or subtract 2π (and any multiple of 2π, i.e. ± 2nπ with n = ±1, ±2, ±3,…) to the argument of all trigonometric functions and we’ll get the same value as for the original argument. In other words, a cycle characterized by a wavelength λ corresponds to the angle θ going around the origin and describing one full circle, i.e. 2π radians. Hence, it is easy: we can go from distance to radians by multiplying our ‘distance argument’ x – vt with 2π/λ. If you’re not convinced, just work it out for the example I gave: if the wavelength is 2 m, then 2π/λ equals 2π/2 = π. So traveling 6 meters along the wave – i.e. we’re letting x go from 0 to 6 m while fixing our time variable – corresponds to our phase θ going from 0 to 6π: both the ‘distance argument’ as well as the change in phase cover three cycles (three times two meters for the distance, and three times 2π for the change in phase) and so we’re fine. [Another way to think about it is to remember that the circumference of the unit circle is also equal to 2π (2π·r = 2π·1 in this case), so the ratio of 2π to λ measures how many times the circumference contains the wavelength.]
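The 2 m example can be worked out in a few lines. This is just a sketch: the helper name phase and the value used for the velocity are my own choices (the velocity does not matter here because we fix t = 0).

```python
import math


def phase(x, t, wavelength, velocity):
    """Phase in radians: theta = (2*pi/lambda) * (x - v*t)."""
    return (2 * math.pi / wavelength) * (x - velocity * t)


wavelength = 2.0  # meters, as in the example above
# Travel 6 m along the wave while keeping time fixed at t = 0.
theta = phase(x=6.0, t=0.0, wavelength=wavelength, velocity=1.0)
cycles = theta / (2 * math.pi)

print(f"phase = {theta / math.pi:.1f} * pi radians, i.e. {cycles:.1f} cycles")
```

The phase comes out as 6π, which is three full cycles: exactly the three wavelengths of 2 m that fit into 6 m.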
In short, if we put time and distance in the (2π/λ)(x-vt) formula, we’ll get everything in radians and that’s what we need for the argument for our cosine function. So our sinusoidal wave above can be represented by the following cosine function:
A = A(x, t) = A₀cos[(2π/λ)(x – vt)]
We could also write A = A₀cosθ with θ = (2π/λ)(x – vt). […] Both representations look rather ugly, don’t they? They do. And it’s not only ugly: it’s not the standard representation of a sinusoidal wave either. In order to make it look ‘nice’, we have to introduce some more concepts here, notably the angular frequency and the wave number. So let’s do that.
The angular frequency is just like the… well… the frequency you’re used to, i.e. the ‘non-angular’ frequency f, as measured in cycles per second (i.e. in Hertz). However, instead of measuring change in cycles per second, the angular frequency (usually denoted by the symbol ω) will measure the rate of change of the phase with time, so we can write or define ω as ω = ∂θ/∂t. In this case, we can easily see that ω = –2πv/λ. [Note that we’ll take the absolute value of that derivative because we want to work with positive numbers for such properties of functions.] Does that look complicated? In doubt, just remember that ω is measured in radians per second and then you can probably better imagine what it is really. Another way to understand ω somewhat better is to remember that the product of ω and the period T is equal to 2π, so that’s a full cycle. Indeed, the time needed to complete one cycle multiplied with the phase change per second (i.e. per unit time) is equivalent to going round the full circle: 2π = ω.T. Because f = 1/T, we can also relate ω to f and write ω = 2π.f = 2π/T.
Likewise, we can measure the rate of change of the phase with distance, and that gives us the wave number k = ∂θ/∂x, which is like the spatial frequency of the wave. So it is just like the wavelength but then measured in radians per unit distance. From the function above, it is easy to see that k = 2π/λ. The interpretation of this equality is similar to the ω.T = 2π equality. Indeed, we have a similar equation for k: 2π = k.λ, so the wavelength (λ) is for k what the period (T) is for ω. If you’re still uncomfortable with it, just play a bit with some numerical examples and you’ll be fine.
To make a long story short, this, then, allows us to re-write the sinusoidal wave equation above in its final form (and let me include the phase shift φ again in order to be as complete as possible at this stage):
A(x, t) = A₀cos(kx – ωt + φ)
You will agree that this looks much ‘nicer’ – and also more in line with what you’ll find in textbooks or on Wikipedia. 🙂 I should note, however, that we’re not adding any new parameters here. The wave number k and the angular frequency ω are not independent: this is still the same wave (A = A0cos[(2π/λ)(x-vt)]), and so we are not introducing anything more than the frequency and – equally important – the speed with which the wave travels, which is usually referred to as the phase velocity. In fact, it is quite obvious from the ω.T = 2π and the k = 2π/λ identities that kλ = ω.T and, hence, taking into account that λ is obviously equal to λ = v.T (the wavelength is – by definition – the distance traveled by the wave in one period), we find that the phase (or wave) velocity v is equal to the ratio of ω and k, so we have that v = ω/k. So x, t, ω and k could be re-scaled or so but their ratio cannot change: the velocity of the wave is what it is. In short, I am introducing two new concepts and symbols (ω and k) but there are no new degrees of freedom in the system so to speak.
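If you want to convince yourself that the two ways of writing the wave are really the same thing, here is a small numerical check. The values for the wavelength and the velocity are arbitrary assumptions of mine, and so are the function names.

```python
import math

wavelength = 2.0                 # meters (assumed value)
velocity = 3.0                   # meters per second (assumed value)
period = wavelength / velocity   # because lambda = v.T

k = 2 * math.pi / wavelength     # wave number, in radians per meter
omega = 2 * math.pi / period     # angular frequency, in radians per second


def wave_lambda_form(x, t, a0=1.0):
    """A0*cos[(2*pi/lambda)*(x - v*t)]"""
    return a0 * math.cos((2 * math.pi / wavelength) * (x - velocity * t))


def wave_k_omega_form(x, t, a0=1.0, phi=0.0):
    """A0*cos(k*x - omega*t + phi)"""
    return a0 * math.cos(k * x - omega * t + phi)


phase_velocity = omega / k       # should give back the velocity we started from

for x, t in [(0.0, 0.0), (0.5, 0.1), (3.2, 1.7)]:
    print(x, t, wave_lambda_form(x, t), wave_k_omega_form(x, t))
print("omega/k =", phase_velocity)
```

Both functions return the same amplitude at every (x, t), and ω/k gives us the 3 m/s back: k and ω carry no information beyond λ and v.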
[At this point, I should probably say something about the difference between the phase velocity and the so-called group velocity of a wave. Let me do that in as brief a way as I can manage. Most real-life waves travel as a wave packet, aka a wave train. So that’s like a burst, or an “envelope” (I am shamelessly quoting Wikipedia here…), of “localized wave action that travels as a unit.” Such a wave packet has no single wave number or wavelength: it actually consists of a (large) set of waves with phases and amplitudes such that they interfere constructively only over a small region of space, and destructively elsewhere. The famous Fourier analysis (or infamous if you have problems understanding what it is really) decomposes this wave train into simpler pieces. While these ‘simpler’ pieces – which, together, add up to form the wave train – are all ‘nice’ sinusoidal waves (that’s why I call them ‘simple’), the wave packet as such is not. In any case (I can’t be too long on this), the speed with which this wave train itself is traveling through space is referred to as the group velocity. The phase velocity and the group velocity can be very different: for example, a wave packet may be traveling forward (i.e. its group velocity is positive) while the phase velocity is negative, i.e. the individual crests travel backward. However, I will stop here and refer to the Wikipedia article on group and phase velocity: it has wonderful illustrations which are much and much better than anything I could write here. Just one last point that I’ll use later: regardless of the shape of the wave (sinusoidal, sawtooth or whatever), we have a very obvious relationship relating wavelength and frequency to the (phase) velocity: v = λ.f, or f = v/λ. For example, a wave traveling at 3 meters per second with a wavelength of 1 meter will obviously have a frequency of three cycles per second (i.e. 3 Hz). Let’s go back to the main story line now.]
With the rather lengthy ‘introduction’ to waves above, we are now ready for the thing I really wanted to present here. I will go much faster now that we have covered the basics. Let’s go.
From my previous posts on complex numbers (or from what you know on complex numbers already), you will understand that working with cosine functions is much easier when writing them as the real part of a complex number A₀e^(iθ) = A₀e^(i(kx – ωt + φ)). Indeed, A₀e^(iθ) = A₀(cosθ + i·sinθ) and so the cosine function above is nothing else but the real part of the complex number A₀e^(iθ). Working with complex numbers makes adding waves and calculating interference effects and whatever we want to do with these wave functions much easier: we just replace the cosine functions by complex numbers in all of the formulae, solve them (algebra with complex numbers is very straightforward), and then we look at the real part of the solution to see what is happening really. We don’t care about the imaginary part, because that has no relationship to the actual physical quantities – for physical and electromagnetic waves that is, or for any other problem in classical wave mechanics. Done. So, in classical mechanics, the use of complex numbers is just a mathematical tool.
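To see the ‘mathematical tool’ at work, the sketch below adds two waves with the same k and ω but different amplitudes and phase shifts, once with complex exponentials and once with plain cosines. All numerical values (and the function name complex_wave) are assumptions I picked for illustration only.

```python
import cmath
import math


def complex_wave(a0, k, omega, phi, x, t):
    """Complex representation A0*e^(i(kx - omega*t + phi)) of a cosine wave."""
    return a0 * cmath.exp(1j * (k * x - omega * t + phi))


x, t = 0.7, 0.2        # some arbitrary point in space and time
k, omega = 3.0, 5.0    # assumed wave number and angular frequency

z1 = complex_wave(1.0, k, omega, 0.0, x, t)
z2 = complex_wave(0.5, k, omega, math.pi / 3, x, t)

# Add the complex numbers first, then take the real part of the result...
total_via_complex = (z1 + z2).real
# ...and compare with adding the two cosine functions directly.
total_direct = (1.0 * math.cos(k * x - omega * t)
                + 0.5 * math.cos(k * x - omega * t + math.pi / 3))

print(total_via_complex, total_direct)
```

The two numbers are the same, which is all there is to it in classical wave mechanics: we calculate with the complex numbers and only look at the real part at the very end.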
Now, that is not the case for the wave functions in quantum mechanics: the imaginary part of a wave function – yes, let me write one down here – such as Ψ = Ψ(x, t) = (1/x)e^(i(kx – ωt)) is very much part and parcel of the so-called probability amplitude that describes the state of the system here. In fact, this Ψ function is an example taken from one of Feynman’s first Lectures on Quantum Mechanics (i.e. Volume III of his Lectures) and, in this case, Ψ(x, t) = (1/x)e^(i(kx – ωt)) represents the probability amplitude of a tiny particle (e.g. an electron) moving freely through space – i.e. without any external forces acting upon it – to go from 0 to x and actually be at point x at time t. [Note how it varies inversely with the distance because of the 1/x factor, so that makes sense.] In fact, when I started writing this post, my objective was to present this example – because it illustrates the concept of the wave function in quantum mechanics in a fairly easy and relatively understandable way. So let’s have a go at it.
First, it is necessary to understand the difference between probabilities and probability amplitudes. We all know what a probability is: it is a real number between 0 and 1 expressing the chance of something happening. It is usually denoted by the symbol P. An example is the probability that monochromatic light (i.e. one or more photons with the same frequency) is reflected from a sheet of glass. [To be precise, this probability is anything between 0 and 16% (i.e. P = 0 to 0.16). In fact, this example comes from another fine publication by Richard Feynman – QED (1985) – in which he explains how we can calculate the exact probability, which depends on the thickness of the sheet.]
A probability amplitude is something different. A probability amplitude is a complex number (3 + 2i, or 2.6e^(i1.34), for example) and – unlike its equivalent in classical mechanics – both the real and imaginary part matter. That being said, probabilities and probability amplitudes are obviously related: to be precise, one calculates the probability of an event actually happening by taking the square of the modulus (or the absolute value) of the probability amplitude associated with that event. Huh? Yes. Just let it sink in. So, if we denote the probability amplitude by Φ, then we have the following relationship:
P = |Φ|²
with P the probability and Φ the probability amplitude.
In addition, where we would add and multiply probabilities in the classical world (for example, to calculate the probability of an event which can happen in two different ways – alternative 1 and alternative 2 let’s say – we would just add the individual probabilities to arrive at the probability of the event happening in one or the other way, so P = P₁ + P₂), in the quantum-mechanical world we should add and multiply probability amplitudes, and then take the square of the modulus of that combined amplitude to calculate the combined probability. So, formally, the probability of a particle to reach a given state by two possible routes (route 1 or route 2 let’s say) is to be calculated as follows:
Φ = Φ₁ + Φ₂
and P = |Φ|² = |Φ₁ + Φ₂|²
Also, when we have only one route, but that one route consists of two successive stages (for example: to go from A to C, the particle would first have to go from A to B, and then from B to C, with different probabilities of stage AB and stage BC actually happening), we will not multiply the probabilities (as we would do in the classical world) but the probability amplitudes. So we have:
Φ = Φ_AB·Φ_BC
and P = |Φ|² = |Φ_AB·Φ_BC|²
In short, it’s the probability amplitudes (and, as mentioned, these are complex numbers, not real numbers) that are to be added and multiplied etcetera and, hence, the probability amplitudes act as the equivalent, so to say, in quantum mechanics, of the conventional probabilities in classical mechanics. The difference is not subtle. Not at all. I won’t dwell too much on this. Just re-read any account of the double-slit experiment with electrons which you may have read and you’ll remember how fundamental this is. [By the way, I was surprised to learn that the double-slit experiment with electrons has apparently only been done in 2012 in exactly the way as Feynman described it. So when Feynman described it in his 1965 Lectures, it was still very much a ‘thought experiment’ only – even if a 1961 experiment (not mentioned by Feynman) had already clearly established the reality of electron interference.]
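To get a feel for how much difference it makes whether we add probabilities or amplitudes, here is a minimal sketch with made-up numbers. The amplitudes below are hypothetical: they do not describe any real experiment, they just show the rules at work.

```python
import cmath
import math


def probability(amplitude):
    """Probability = squared modulus of the probability amplitude."""
    return abs(amplitude) ** 2


def two_route_probability(delta):
    """Two routes with amplitudes of equal modulus and a phase difference delta."""
    phi_1 = 0.5
    phi_2 = 0.5 * cmath.exp(1j * delta)
    return probability(phi_1 + phi_2)   # add the amplitudes, then square


p_in_phase = two_route_probability(0.0)          # constructive interference
p_out_of_phase = two_route_probability(math.pi)  # destructive interference
p_classical = probability(0.5) + probability(0.5)  # what adding probabilities would give

# One route in two successive stages: multiply the amplitudes.
phi_ab = 0.6 * cmath.exp(1j * 0.3)
phi_bc = 0.5 * cmath.exp(1j * 1.1)
p_ac = probability(phi_ab * phi_bc)

print(p_in_phase, p_out_of_phase, p_classical, p_ac)
```

With the two routes in phase, the probability is 1; with a phase difference of π, it is (practically) zero; adding the probabilities the classical way would always give 0.5. For the two successive stages, the phases drop out and we just get 0.36 × 0.25 = 0.09.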
OK. Let’s move on. So we have this complex wave function in quantum mechanics and, as Feynman writes, “It is not like a real wave in space; one cannot picture any kind of reality to this wave as one does for a sound wave.” That being said, one can, however, get pretty close to ‘imagining’ what it actually is IMHO. Let’s go by the example which Feynman gives himself – on the very same page where he writes the above actually. The amplitude for a free particle (i.e. with no forces acting on it) with momentum 𝐩 = m𝐯 to go from location 𝐫₁ to location 𝐫₂ is equal to
Φ₁₂ = (1/r₁₂)e^(i𝐩·𝐫₁₂/ħ) with 𝐫₁₂ = 𝐫₂ – 𝐫₁
I agree this looks somewhat ugly again, but so what does it say? First, be aware of the difference between bold and normal type: I am writing 𝐩 and 𝐯 in bold type above because they are vectors: they have a magnitude (which I will denote by p and v respectively) as well as a direction in space. Likewise, 𝐫₁₂ is a vector going from 𝐫₁ to 𝐫₂ (and 𝐫₁ and 𝐫₂ are space vectors themselves obviously) and so r₁₂ (non-bold) is the magnitude of that vector. Keeping that in mind, we know that the dot product 𝐩·𝐫₁₂ is equal to the product of the magnitudes of those vectors multiplied by cosα, with α the angle between those two vectors. Hence, 𝐩·𝐫₁₂ = p.r₁₂.cosα. Now, if 𝐩 and 𝐫₁₂ have the same direction, the angle α will be zero and so cosα will be equal to one and so we just have 𝐩·𝐫₁₂ = p.r₁₂ or, if we’re considering a particle going from 0 to some position x, 𝐩·𝐫₁₂ = p.r₁₂ = px.
Now we also have Planck’s constant there, in its reduced form ħ = h/2π. As you can imagine, this 2π has something to do with the fact that we need radians in the argument. It’s the same as what we did with x in the argument of that cosine function above: if we have to express stuff in radians, then we have to absorb a factor of 2π in that constant. However, here I need to make an additional digression. Planck’s constant is obviously not just any constant: it is the so-called quantum of action. Indeed, it appears in what may well be the most fundamental relations in physics.
The first of these fundamental relations is the so-called Planck relation: E = hf. The Planck relation expresses the wave-particle duality of light (or electromagnetic waves in general): light comes in discrete quanta of energy (photons), and the energy of these ‘wave particles’ is directly proportional to the frequency of the wave, and the factor of proportionality is Planck’s constant.
The second fundamental relation, or relations – in plural – I should say, are the de Broglie relations. Indeed, Louis-Victor-Pierre-Raymond, 7th duc de Broglie, turned the above on its head: if the fundamental nature of light is (also) particle-like, then the fundamental nature of particles must (also) be wave-like. So he boldly associated a frequency f and a wavelength λ with all particles, such as electrons for example – but larger-scale objects, such as billiard balls, or planets, also have a de Broglie wavelength and frequency! The de Broglie relation determining the de Broglie frequency is – quite simply – the re-arranged Planck relation: f = E/h. So this relation relates the de Broglie frequency with energy. However, in the above wave function, we’ve got momentum, not energy. Well… Energy and momentum are obviously related, and so we have a second de Broglie relation relating momentum with wavelength: λ = h/p.
We’re almost there: just hang in there. 🙂 When we presented the sinusoidal wave equation, we introduced the angular frequency (ω) and the wave number (k), instead of working with f and λ. That’s because we want an argument expressed in radians. Here it’s the same. The two de Broglie equations have an equivalent using angular frequency and wave number: ω = E/ħ and k = p/ħ. So we’ll just use the second one (i.e. the relation with the momentum in it) to associate a wave number with the particle (k = p/ħ).
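Just to put some numbers on this, the sketch below calculates the de Broglie wavelength and wave number for an electron. The speed of 2×10⁶ m/s is an assumption of mine (it is slow enough to use p = mv), and the function name is mine too.

```python
import math

H = 6.62607015e-34          # Planck's constant in J.s (exact SI value)
HBAR = H / (2 * math.pi)    # reduced Planck constant
M_ELECTRON = 9.1093837e-31  # electron mass in kg (rounded)


def de_broglie(mass, speed):
    """Return (wavelength, wave number) using lambda = h/p and k = p/hbar."""
    p = mass * speed        # non-relativistic momentum p = m.v
    wavelength = H / p
    k = p / HBAR
    return wavelength, k


# Assumed speed: 2e6 m/s, i.e. less than 1% of the speed of light.
wavelength, k = de_broglie(M_ELECTRON, 2.0e6)

print(f"lambda = {wavelength:.3e} m, k = {k:.3e} rad/m, k*lambda/(2*pi) = {k * wavelength / (2 * math.pi):.3f}")
```

The wavelength comes out at a few angstrom, and the product k·λ is 2π, as it should be: it is the same k = 2π/λ relation as for the sinusoidal wave above.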
Phew! So, finally, we get that formula which we introduced a while ago already: Ψ(x) = (1/x)e^(ikx), or, including time as a variable as well (we made abstraction of time so far):
Ψ(x, t) = (1/x)e^(i(kx – ωt))
The formula above obviously makes sense. For example, the 1/x factor makes the probability amplitude decrease as we get farther away from where the particle started: in fact, this 1/x or 1/r variation is what we see with electromagnetic waves as well: the amplitude of the electric field vector E varies as 1/r and, because we’re talking some real wave here and, hence, its energy is proportional to the square of the field, the energy that the source can deliver varies inversely as the square of the distance. [Another way of saying the same is that the energy we can take out of a wave within a given conical angle is the same, no matter how far away we are: the energy flux is never lost – it just spreads over a greater and greater effective area. But let’s go back to the main story.]
We’ve got the math – I hope. But what does this equation mean really? What’s that de Broglie wavelength or frequency in reality? What wave are we talking about? Well… What’s reality? As mentioned above, the famous de Broglie relations associate a wavelength λ and a frequency f to a particle with momentum p and energy E, but it’s important to mention that the associated de Broglie wave function yields probability amplitudes. So it is, indeed, not a ‘real wave in space’ as Feynman would put it. It is a quantum-mechanical wave equation.
Huh? […] It’s obviously about time I add some illustrations here, and so that’s what I’ll do. Look at the two cases below. The case on top is pretty close to the situation I described above: it’s a de Broglie wave – so that’s a complex wave – traveling through space (in one dimension only here). The real part of the complex amplitude is in blue, and the green is the imaginary part. So the probability of finding that particle at some position x is the modulus squared of this complex amplitude. Now, this particular wave function ignores the 1/x variation and, hence, the squared modulus of Ae^(i(kx – ωt)) is equal to a constant. To be precise, it’s equal to A² (check it: the squared modulus of a complex number z equals the product of z and its complex conjugate, and so we get A² as a result indeed). So what does this mean? It means that the probability of finding that particle (an electron, for example) is the same at all points! In other words, we don’t know where it is! In the illustration below (top part), that’s shown as the (yellow) color opacity: the probability is spread out, just like the wave itself, so there is no definite position of the particle indeed.
[Note that the formula in the illustration above (which I took from Wikipedia once again) uses p instead of k as the factor in front of x. While it does not make a big difference from a mathematical point of view (ħ is just a factor of proportionality: k = p/ħ), it does make a big difference from a conceptual point of view and, hence, I am puzzled as to why the author of this article did this. Also, there is some variation in the opacity of the yellow (i.e. the color of our tennis (or ping pong) ball representing our ‘wavicle’) which shouldn’t be there because the probability associated with this particular wave function is a constant indeed: so there is no variation in the probability (when squaring the absolute value of a complex number, the phase factor does not come into play). Also note that, because all probabilities have to add up to 100% (or to 1), a wave function like this is quite problematic. However, don’t worry about it just now: just try to go with the flow.]
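If you do not want to take my word for it, the check is easy to do numerically. In the sketch below, the values for A, k and ω are arbitrary assumptions, and so are the positions at which I evaluate the function.

```python
import cmath


def plane_wave(x, t, amplitude=2.0, k=1.5, omega=0.8):
    """A*e^(i(kx - omega*t)) with assumed values for A, k and omega."""
    return amplitude * cmath.exp(1j * (k * x - omega * t))


positions = (-10.0, 0.0, 3.3, 42.0)
# Squared modulus of the amplitude at a few arbitrary positions (at t = 0).
probabilities = [abs(plane_wave(x, 0.0)) ** 2 for x in positions]

print(probabilities)
```

Whatever position we pick, we get A² = 4 (up to rounding): the phase factor just does not come into play when squaring the absolute value.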
By now, I must assume you shook your head in disbelief a couple of times already. Surely, this particle (let’s stick to the example of an electron) must be somewhere, yes? Of course.
The problem is that we gave an exact value to its momentum and its energy and, as a result, through the de Broglie relations, we also associated an exact frequency and wavelength to the de Broglie wave associated with this electron. Hence, Heisenberg’s Uncertainty Principle comes into play: if we have exact knowledge on momentum, then we cannot know anything about its location, and so that’s why we get this wave function covering the whole space, instead of just some region only. Sort of. Here we are, of course, talking about that deep mystery about which I cannot say much – if only because so many eminent physicists have already exhausted the topic. I’ll just state Feynman once more: “Things on a very small scale behave like nothing that you have any direct experience with. […] It is very difficult to get used to, and it appears peculiar and mysterious to everyone – both to the novice and to the experienced scientist. Even the experts do not understand it the way they would like to, and it is perfectly reasonable that they should not because all of direct, human experience and of human intuition applies to large objects. We know how large objects will act, but things on a small scale just do not act that way. So we have to learn about them in a sort of abstract or imaginative fashion and not by connection with our direct experience.” And, after describing the double-slit experiment, he highlights the key conclusion: “In quantum mechanics, it is impossible to predict exactly what will happen. We can only predict the odds [i.e. probabilities]. Physics has given up on the problem of trying to predict exactly what will happen. Yes! Physics has given up. We do not know how to predict what will happen in a given circumstance. It is impossible: the only thing that can be predicted is the probability of different events. It must be recognized that this is a retrenchment in our ideal of understanding nature. It may be a backward step, but no one has seen a way to avoid it.”
[…] That’s enough on this I guess, but let me – as a way to conclude this little digression – just quickly state the Uncertainty Principle in a more or less accurate version here, rather than all of the ‘descriptions’ which you may have seen of it: the Uncertainty Principle refers to any of a variety of mathematical inequalities asserting a fundamental limit (fundamental means it’s got nothing to do with observer or measurement effects, or with the limitations of our experimental technologies) to the precision with which certain pairs of physical properties of a particle (these pairs are known as complementary variables) such as, for example, position (x) and momentum (p), can be known simultaneously. More in particular, for position and momentum, we have that σₓσₚ ≥ ħ/2 (and, in this formulation, σ is, obviously, the standard symbol for the standard deviation of our point estimate for x and p respectively).
OK. Back to the illustration above. A particle that is to be found in some specific region – rather than just ‘somewhere’ in space – will have a probability amplitude resembling the wave equation in the bottom half: it’s a wave train, or a wave packet, and we can decompose it, using the Fourier analysis, in a number of sinusoidal waves, but so we do not have a unique wavelength for the wave train as a whole, and that means – as per the de Broglie equations – that there’s some uncertainty about its momentum (or its energy).
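Here is a rough numerical illustration of such a wave train. It is a sketch under my own assumptions: I add up a couple of hundred waves e^(ikx) whose wave numbers are spread around some central value k₀ with Gaussian weights (the function name, k₀ and the spread are hypothetical choices), and then look at the squared modulus of the sum near and far from x = 0.

```python
import cmath
import math


def wave_packet(x, k0=10.0, sigma_k=1.0, n_waves=201):
    """Sum of waves e^(ikx) with Gaussian weights centered on k0 (at t = 0)."""
    k_min = k0 - 5 * sigma_k
    k_max = k0 + 5 * sigma_k
    dk = (k_max - k_min) / (n_waves - 1)
    total = 0j
    for i in range(n_waves):
        k = k_min + i * dk
        weight = math.exp(-((k - k0) ** 2) / (2 * sigma_k ** 2))
        total += weight * cmath.exp(1j * k * x) * dk
    return total


# Wide spread in wave numbers: the packet is sharply localized around x = 0.
density_center = abs(wave_packet(0.0)) ** 2
density_far = abs(wave_packet(6.0)) ** 2

# Narrow spread in wave numbers: the packet is much more spread out in space.
narrow_center = abs(wave_packet(0.0, sigma_k=0.25)) ** 2
narrow_far = abs(wave_packet(6.0, sigma_k=0.25)) ** 2

print(density_far / density_center, narrow_far / narrow_center)
```

With a wide range of wave numbers, the squared modulus at x = 6 is negligible compared to its value at x = 0; with a narrow range, a good part of it is still there. So the better we pin down the wave number (and, hence, the momentum), the less we know about where the thing is.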
I will let this sink in for now. In my next post, I will write some more about these wave equations. They are usually a solution to some differential equation – and that’s where my next post will connect with my previous ones (on differential equations). Just to say goodbye – as for now that is – I will just copy another beautiful illustration from Wikipedia. See below: it represents the (likely) space in which a single electron on the 5d atomic orbital of a hydrogen atom would be found. The solid body shows the places where the electron’s probability density (so that’s the squared modulus of the probability amplitude) is above a certain value – so it’s basically the area where the likelihood of finding the electron is higher than elsewhere. The hue on the colored surface shows the complex phase of the wave function.
It is a wonderful image, isn’t it? At the very least, it increased my understanding of the mystery surrounding quantum mechanics somewhat. I hope it helps you too. 🙂
Post scriptum 1: On the need to normalize a wave function
In this post, I wrote something about the need for probabilities to add up to 1. In mathematical terms, this condition will resemble something like
∫ |ψ₀|² dV = a² (with the integral taken over all of space Rⁿ)
In this integral, we’ve got – once again – the squared modulus of the wave function, and so that’s the probability of finding the particle somewhere. The integral just states that all of the probabilities added all over space (Rⁿ) should add up to some finite number (a²). Hey! But that’s not equal to 1 you’ll say. Well… That’s a minor problem only: we can create a normalized wave function ψ out of ψ₀ by simply dividing ψ₀ by a so we have ψ = ψ₀/a, and then all is ‘normal’ indeed. 🙂
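A one-dimensional example shows how this works in practice. The sketch below is based on assumptions of mine: the unnormalized function psi0 is a hypothetical Gaussian, and the integral is approximated with a simple midpoint rule over an interval that is wide enough for this particular function.

```python
import math


def psi0(x):
    """Hypothetical unnormalized wave function (a Gaussian times 3)."""
    return 3.0 * math.exp(-x * x / 2)


def integrate_squared_modulus(f, lo=-10.0, hi=10.0, n=20000):
    """Midpoint-rule approximation of the integral of |f(x)|^2 over [lo, hi]."""
    dx = (hi - lo) / n
    return sum(abs(f(lo + (i + 0.5) * dx)) ** 2 for i in range(n)) * dx


a_squared = integrate_squared_modulus(psi0)   # the finite number a^2
a = math.sqrt(a_squared)


def psi(x):
    """Normalized wave function: psi = psi0 / a."""
    return psi0(x) / a


total_probability = integrate_squared_modulus(psi)

print(a_squared, total_probability)
```

For this particular ψ₀, a² is 9√π (about 15.95), and after dividing by a all of the probabilities add up to 1.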
Post scriptum 2: On using colors to represent complex numbers
When inserting that beautiful 3D graph of that 5d atomic orbital (again acknowledging its source: Wikipedia), I wrote that “the hue on the colored surface shows the complex phase of the wave function.” Because this kind of visual representation of complex numbers will pop up in other posts as well (and you’ve surely encountered it a couple of times already), it’s probably useful to be explicit on what it represents exactly. Well… I’ll just copy the Wikipedia explanation, which is clear enough: “Given a complex number z = reiθ, the phase (also known as argument) θ can be represented by a hue, and the modulus r =|z| is represented by either intensity or variations in intensity. The arrangement of hues is arbitrary, but often it follows the color wheel. Sometimes the phase is represented by a specific gradient rather than hue.” So here you go…
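For those who want to play with this, here is a minimal sketch of such a color mapping. It is only one of many possible conventions: the hue follows the phase around the color wheel, and the way I squash the modulus into a brightness between 0 and 1 is my own arbitrary choice, not necessarily what was used for the Wikipedia image.

```python
import cmath
import colorsys
import math


def complex_to_rgb(z):
    """Map a complex number to an (r, g, b) triple: phase -> hue, modulus -> brightness."""
    theta = cmath.phase(z)                        # phase in (-pi, pi]
    hue = (theta % (2 * math.pi)) / (2 * math.pi)  # position on the color wheel, 0 to 1
    modulus = abs(z)
    value = modulus / (1.0 + modulus)             # assumption: squash [0, inf) into [0, 1)
    return colorsys.hsv_to_rgb(hue, 1.0, value)


rgb_plus_one = complex_to_rgb(1 + 0j)    # phase 0: red
rgb_minus_one = complex_to_rgb(-1 + 0j)  # phase pi: cyan, opposite side of the wheel

print(rgb_plus_one, rgb_minus_one)
```

So +1 and –1 have the same brightness (same modulus) but opposite hues (their phases differ by π).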
Post scriptum 3: On the de Broglie relations
The de Broglie relations are a wonderful pair. They’re obviously equivalent: energy and momentum are related, and wavelength and frequency are obviously related too through the general formula relating frequency, wavelength and wave velocity: fλ = v (the product of the frequency and the wavelength must yield the wave velocity indeed). However, when it comes to the relation between energy and momentum, there is a little catch. What kind of energy are we talking about? We were describing a free particle (e.g. an electron) traveling through space, but with no (other) charges acting on it (in other words: no potential acting upon it), and so we might be tempted to conclude that we’re talking about the kinetic energy (K.E.) here. So, at relatively low speeds (v), we could be tempted to use the equations p = mv and K.E. = p²/2m = mv²/2 (the one electron in a hydrogen atom travels at less than 1% of the speed of light, and so that’s a non-relativistic speed indeed) and try to go from one equation to the other with these simple formulas. Well… Let’s try it.
f = E/h according to de Broglie and, hence, substituting E with p²/2m and f with v/λ, we get v/λ = m²v²/2mh. Some simplification and re-arrangement should then yield the second de Broglie relation: λ = 2h/mv = 2h/p. So there we are. Well… No. The second de Broglie relation is just λ = h/p: there is no factor 2 in it. So what’s wrong? The problem is the energy equation: de Broglie does not use the K.E. formula. [By the way, you should note that the K.E. = mv²/2 equation is only an approximation for low speeds – low compared to c that is.] He takes Einstein’s famous E = mc² equation (which I am tempted to explain now but I won’t) and just substitutes c, the speed of light, with v, the velocity of the slow-moving particle. This is a very fine but also very deep point which, frankly, I do not yet fully understand. Indeed, Einstein’s E = mc² is obviously something much ‘deeper’ than the formula for kinetic energy. The latter has to do with forces acting on masses and, hence, obeys Newton’s laws – so it’s rather familiar stuff. As for Einstein’s formula, well… That’s a result from relativity theory and, as such, something that is much more difficult to explain. While the difference between the two energy formulas is just a factor of 1/2 (which is usually not a big problem when you’re just fiddling with formulas like this), it makes a big conceptual difference.
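To make the factor 2 explicit, the two substitutions can be written side by side. This is only a restatement of the algebra in this post scriptum, using fλ = v exactly as I did above; nothing new is assumed.

```latex
\begin{align*}
\text{with } E = \frac{p^2}{2m}: &\quad
  \frac{v}{\lambda} = f = \frac{E}{h} = \frac{m^2 v^2}{2mh} = \frac{m v^2}{2h}
  \;\Longrightarrow\; \lambda = \frac{2h}{mv} = \frac{2h}{p} \\[4pt]
\text{with } E = m v^2: &\quad
  \frac{v}{\lambda} = f = \frac{E}{h} = \frac{m v^2}{h}
  \;\Longrightarrow\; \lambda = \frac{h}{mv} = \frac{h}{p}
\end{align*}
```

So the whole discrepancy sits in the choice of the energy formula: the 1/2 in the kinetic energy becomes the 2 in the wavelength.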
Hmm… Perhaps we should do some examples. So these de Broglie equations associate a wave with frequency f and wavelength λ with particles with energy E, momentum p and mass m traveling through space with velocity v: E = hf and p = h/λ. [And, if we would want to use some sine or cosine function as an example of such wave function – which is likely – then we need an argument expressed in radians rather than in units of time or distance. In other words, we will need to convert frequency and wavelength to angular frequency and wave number respectively by using the 2π = ωT = ω/f and 2π = kλ relations, with the wavelength (λ), the period (T) and the velocity (v) of the wave being related through the simple equations f = 1/T and λ = vT. So then we can write the de Broglie relations as: E = ħω and p = ħk, with ħ = h/2π.]
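The conversion to angular frequency and wave number is easy to check with a few lines of Python. This is a minimal sketch: the sample frequency and wavelength are hypothetical values picked for illustration, and the function names are mine.

```python
import math

h = 6.626e-34                 # Planck constant in J·s (rounded)
hbar = h / (2 * math.pi)      # reduced Planck constant, ħ = h/2π

def angular_frequency(f):
    """ω = 2πf, from 2π = ωT = ω/f."""
    return 2 * math.pi * f

def wave_number(wavelength):
    """k = 2π/λ, from 2π = kλ."""
    return 2 * math.pi / wavelength

# Hypothetical sample values, for illustration only.
f = 5.0e14              # frequency in Hz
wavelength = 600e-9     # wavelength in m

omega = angular_frequency(f)
k = wave_number(wavelength)

# Both ways of writing each de Broglie relation give the same number.
E_from_f, E_from_omega = h * f, hbar * omega
p_from_lambda, p_from_k = h / wavelength, hbar * k

print(E_from_f, E_from_omega)
print(p_from_lambda, p_from_k)
```

The factors of 2π cancel, which is all that the switch from h to ħ amounts to here.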
In these equations, the Planck constant (be it h or ħ) appears as a simple factor of proportionality (we will worry about what h actually is in physics in later posts) – but a very tiny one: approximately 6.626×10⁻³⁴ J·s (Joule is the standard SI unit to measure energy, or work: 1 J = 1 kg·m²/s²), or 4.136×10⁻¹⁵ eV·s when using a more appropriate (i.e. smaller) unit of energy for atomic physics: still, 10⁻¹⁵ is only 0.000 000 000 000 001. So how does it work? First note, once again, that we are supposed to use the equivalent for slow-moving particles of Einstein’s famous E = mc² equation as a measure of the energy of a particle: E = mv². We know velocity adds mass to a particle – with mass being a measure for inertia. In fact, the mass of so-called massless particles, like photons, is nothing but their energy (divided by c²). In other words, they do not have a rest mass, but they do have a relativistic mass m = E/c², with E = hf (and with f the frequency of the light wave here). Particles, such as electrons, or protons, do have a rest mass, but then they don’t travel at the speed of light. So how does that work out in that E = mv² formula which – let me emphasize this point once again – is not the standard formula (for kinetic energy) that we’re used to (i.e. E = mv²/2)? Let’s do the exercise.
For photons, we can re-write E = hf as E = hc/λ. The numerator hc in this expression is 4.136×10⁻¹⁵ eV·s (i.e. the value of the Planck constant h expressed in eV·s) multiplied with 2.998×10⁸ m/s (i.e. the speed of light c) so that’s (more or less) hc ≈ 1.24×10⁻⁶ eV·m. For visible light, the denominator will range from 0.38 to 0.75 micrometer (1 μm = 10⁻⁶ m), i.e. 380 to 750 nanometer (1 nm = 10⁻⁹ m), and, hence, the energy of the photon will be in the range of 3.263 eV to 1.653 eV. So that’s only a few electronvolt (an electronvolt (eV) is, by definition, the amount of energy gained (or lost) by a single electron as it moves across an electric potential difference of one volt). So that’s 2.6 to 5.2×10⁻¹⁹ Joule (1 eV = 1.6×10⁻¹⁹ Joule) and, hence, the equivalent relativistic mass of these photons is E/c² or 2.9 to 5.8×10⁻³⁶ kg. That’s tiny – but not insignificant. Indeed, let’s look at an electron now.
The rest mass of an electron is about 9.1×10⁻³¹ kg (so that’s a scale factor of a few hundred thousand as compared to the values we found for the relativistic mass of photons). Also, in a hydrogen atom, it is expected to speed around the nucleus with a velocity of about 2.2×10⁶ m/s. That’s less than 1% of the speed of light but still quite fast obviously: at this speed (2,200 km per second), it could travel around the earth in less than 20 seconds (a photon does better: it travels about 7.5 times around the earth in one second). In any case, the electron’s energy – according to the formula to be used as input for calculating the de Broglie frequency – is 9.1×10⁻³¹ kg multiplied with the square of 2.2×10⁶ m/s, and so that’s about 44×10⁻¹⁹ Joule or, dividing by 1.6×10⁻¹⁹ Joule per eV, about 27.5 eV. So that’s – roughly – ten times more than the energy associated with a photon of visible light.
The wavelength we should associate with these 27.5 eV can be calculated from E = hv/λ (we should, once again, use v instead of c), but we can also simplify and calculate directly from the mass: λ = hv/E = hv/mv² = h/mv (however, make sure you express h in J·s in this case): we get a value for λ equal to 0.33 nanometer, so that’s more than one thousand times shorter than the above-mentioned wavelengths for visible light. So we have a scale factor of about a thousand here. That’s reasonable, no? [There is a similar scale factor when moving to the next level: the mass of protons and neutrons is about 2000 times the mass of an electron.] Indeed, note that we would get a value of 0.510 MeV if we would apply the E = mc² equation to the above-mentioned (rest) mass of the electron (in kg): MeV stands for mega-electronvolt, so 0.510 MeV is 510,000 eV. So that’s a few hundred thousand times the energy of a photon and, hence, it is obvious that we are not using the energy equivalent of an electron’s rest mass when using de Broglie’s equations. No. It’s just that simple but rather mysterious E = mv² formula. So it’s not mc² nor mv²/2 (kinetic energy). Food for thought, isn’t it? Let’s look at the formulas once again.
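Because it is easy to slip on the exponents and on the Joule-to-eV conversion in an exercise like this, here is a minimal Python sketch that redoes the arithmetic with the rounded constants quoted in this post. The variable names are mine, and the E = mv² formula is simply the one used in this post scriptum, not an endorsement of it.

```python
# Rounded constants as quoted in the text.
h_eVs = 4.136e-15    # Planck constant in eV·s
h_Js = 6.626e-34     # Planck constant in J·s
c = 2.998e8          # speed of light in m/s
J_per_eV = 1.6e-19   # 1 eV expressed in joule

def photon_energy_eV(wavelength_m):
    """E = hc/λ for a photon, in eV."""
    return h_eVs * c / wavelength_m

# Visible light: 380 nm (violet) to 750 nm (red).
E_violet_eV = photon_energy_eV(380e-9)
E_red_eV = photon_energy_eV(750e-9)

# Equivalent relativistic mass m = E/c^2, in kg (convert eV to joule first).
m_violet = E_violet_eV * J_per_eV / c**2
m_red = E_red_eV * J_per_eV / c**2

# Electron in a hydrogen atom.
m_e = 9.1e-31        # rest mass in kg
v_e = 2.2e6          # speed in m/s
E_mv2_J = m_e * v_e**2
E_mv2_eV = E_mv2_J / J_per_eV           # joule -> eV means dividing
wavelength_e = h_Js / (m_e * v_e)       # λ = h/mv, in m
E_rest_MeV = m_e * c**2 / J_per_eV / 1e6

print(f"photon energies: {E_red_eV:.3f} to {E_violet_eV:.3f} eV")
print(f"photon mass equivalents: {m_red:.2e} to {m_violet:.2e} kg")
print(f"electron m*v^2: {E_mv2_J:.2e} J = {E_mv2_eV:.1f} eV")
print(f"electron de Broglie wavelength: {wavelength_e * 1e9:.2f} nm")
print(f"electron rest energy: {E_rest_MeV:.3f} MeV")
```

Note that going from Joule to eV means dividing by 1.6×10⁻¹⁹, so a tiny number of Joule becomes a manageable number of eV.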
They can easily be linked: we can re-write the frequency formula as λ = hv/E = hv/mv² = h/mv and then, using the general definition of momentum (p = mv), we get the second de Broglie equation: p = h/λ. In fact, de Broglie’s rather particular definition of the energy of a particle (E = mv²) makes v a simple factor of proportionality between the energy and the momentum of a particle: v = E/p or E = pv. [We can also get this result in another way: we have h = E/f = pλ and, hence, E/p = fλ = v.]
Again, this is serious food for thought: I have not seen any ‘easy’ explanation of this relation so far. To appreciate its peculiarity, just compare it to the usual relations relating energy and momentum: E = p²/2m or, in its relativistic form, p²c² = E² – m₀²c⁴. So neither of these two equations is to be used when going from one de Broglie relation to the other. [Of course, it works for massless photons: using the relativistic form, we get p²c² = E² – 0 or E = pc, and the de Broglie relation becomes the Planck relation: E = hf (with f the frequency of the photon, i.e. the light beam it is part of). We also have p = h/λ = hf/c, and, hence, the E/p = c comes naturally. But that’s not the case for (slower-moving) particles with some rest mass: why should we use mv² as an energy measure for them, rather than the kinetic energy formula?]
But let’s just accept this weirdness and move on. After all, perhaps there is some mistake here and so, perhaps, we should just accept that factor 2 and replace λ = h/p by λ = 2h/p. Why not? 🙂 In any case, both the λ = h/mv and λ = 2h/p = 2h/mv expressions give the impression that the mass of a particle and its velocity are on a par, so to say, when it comes to determining the numerical value of the de Broglie wavelength: if we double the speed, or the mass, the wavelength gets shortened by half. So, one would think that larger masses can only be associated with extremely short de Broglie wavelengths if they move at a fairly considerable speed. But that’s where the extremely small value of h changes the arithmetic we would expect to see. Indeed, things work differently at the quantum scale, and it’s the tiny value of h that is at the core of this. Indeed, it’s often referred to as the ‘smallest constant’ in physics, and so here’s the place where we should probably say a bit more about what h really stands for.
Planck’s constant h describes the tiny discrete packets in which Nature packs energy: one cannot find any smaller ‘boxes’. As such, it’s referred to as the ‘quantum of action’. But, surely, you’ll immediately say that its cousin, ħ = h/2π, is actually smaller. Well… Yes. You’re actually right: ħ = h/2π is actually smaller. It’s the so-called quantum of angular momentum, also (and probably better) known as spin. Angular momentum is a measure of… Well… Let’s call it the ‘amount of rotation’ an object has, taking into account its mass, shape and speed. Just like p, it’s a vector. To be precise, it’s the product of a body’s so-called rotational inertia (so that’s similar to the mass m in p = mv) and its rotational velocity (so that’s like v, but it’s ‘angular’ velocity), so we can write L = Iω but we’ll not go in any more detail here. The point to note is that angular momentum, or spin as it’s known in quantum mechanics, also comes in discrete packets, and these packets are multiples of ħ. [OK. I am simplifying here but the idea or principle that I am explaining here is entirely correct.]
But let’s get back to the de Broglie wavelength now. As mentioned above, one would think that larger masses can only be associated with extremely short de Broglie wavelengths if they move at a fairly considerable speed. Well… It turns out that the extremely small value of h upsets our everyday arithmetic. Indeed, because of the extremely small value of h as compared to the objects we are used to (in one grain of salt alone, we will find about 1.2×10¹⁸ atoms – just write a 1 with 18 zeroes behind and you’ll appreciate these immense numbers somewhat more), it turns out that speed does not matter all that much – at least not in the range we are used to. For example, the de Broglie wavelength associated with a baseball weighing 145 grams and traveling at 90 mph (i.e. approximately 40 m/s) would be 1.1×10⁻³⁴ m. That’s immeasurably small indeed – literally immeasurably small: not only technically but also theoretically because, at this scale (i.e. the so-called Planck scale), the concepts of size and distance break down as a result of the Uncertainty Principle. But, surely, you’ll think we can improve on this if we’d just be looking at a baseball traveling much slower. Well… It does not get much better for a baseball traveling at a snail’s pace – let’s say 1 cm per hour, i.e. about 2.8×10⁻⁶ m/s. Indeed, we get a wavelength of about 1.6×10⁻²⁷ m, which is still nowhere near the nanometer range we found for electrons. Just to give an idea: the resolving power of the best electron microscope is about 50 picometer (1 pm = 10⁻¹² m) and so that’s the size of a small atom (the size of an atom ranges between 30 and 300 pm). In short, for all practical purposes, the de Broglie wavelength of the objects we are used to does not matter – and then I mean it does not matter at all. And so that’s why quantum-mechanical phenomena are only relevant at the atomic scale.
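The baseball numbers are easy to reproduce. The sketch below is just λ = h/mv in Python, with a rounded value for h; the function name `de_broglie_wavelength` is mine, and the snail’s-pace speed is computed as exactly 0.01 m per 3600 s rather than rounded first.

```python
h = 6.626e-34    # Planck constant in J·s (rounded)

def de_broglie_wavelength(mass_kg, speed_m_per_s):
    """λ = h/p = h/(m·v), in meters (non-relativistic momentum)."""
    return h / (mass_kg * speed_m_per_s)

baseball_mass = 0.145                 # 145 grams, in kg
fast_speed = 40.0                     # roughly 90 mph, in m/s
snail_speed = 0.01 / 3600.0           # 1 cm per hour, in m/s

fast_baseball = de_broglie_wavelength(baseball_mass, fast_speed)
slow_baseball = de_broglie_wavelength(baseball_mass, snail_speed)
electron = de_broglie_wavelength(9.1e-31, 2.2e6)   # electron in hydrogen

print(f"baseball at 40 m/s:   {fast_baseball:.2e} m")
print(f"baseball at 1 cm/h:   {slow_baseball:.2e} m")
print(f"electron in hydrogen: {electron:.2e} m")
print(f"electron / slow baseball: {electron / slow_baseball:.1e}")
```

Slowing the baseball down by seven orders of magnitude lengthens its wavelength by the same factor, and it still ends up some seventeen orders of magnitude short of the electron’s.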