[Preliminary note (added on 13 June 2019): When re-reading what I wrote below, I realize I would fundamentally re-write certain sections. I think I have found a comprehensive realist interpretation of quantum mechanics and, hence, I’d recommend you check my recent papers. The writings below are probably just good to illustrate how I got there. Lettura felice!]
In the movie about Stephen Hawking’s life, The Theory of Everything, there is talk about a single unifying equation that would explain everything in the universe. I must assume the real Stephen Hawking is familiar with Feynman’s unworldliness equation: U = 0, which – as Feynman convincingly demonstrates – effectively integrates all known equations in physics. It’s one of Feynman’s many jokes, of course, but an exceptionally clever one, as the argument shows there’s no such thing as one equation that explains all. Well… What Feynman proves is that one can actually ‘hide’ all the equations one wants in a single equation, but it’s just a trick. As Feynman puts it: “When you unwrap the whole thing, you get back where you were before.” Having said that, some equations in physics are obviously more fundamental than others. Besides Maxwell’s equations – to which I shall come back in a moment – these are the equations one should probably think of as being more fundamental than others:
- Einstein’s mass-energy equivalence: m = E/c²
- The wavefunction: ψ = ψ(x, t) = e^(−i(ωt − kx)) = e^(i(kx − ωt))
- The two de Broglie relations that determine the spatial and temporal frequency of the wavefunction:
ω = E/ħ and k = p/ħ
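And, of course, Schrödinger’s equation, written here for a free particle (no potential term), which is the form the rest of this post works with:
iħ·∂ψ/∂t = −(ħ²/2m)·∇²ψ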
The wavefunction, and Schrödinger’s equation, look the most formidable, of course. However, the mathematical symbols describe rather simple things. First look at the wavefunction: ψ = e^(i(kx − ωt)). That’s Euler’s formula, with θ = kx − ωt:
e^(iθ) = cosθ + i·sinθ = cos(kx − ωt) + i·sin(kx − ωt)
So it’s just a two-dimensional equivalent of writing y = cos(kx – ωt) or, more generally, y = F(x−vt).
Huh? Yes. The y = F(x−vt) function describes any wave that’s traveling in the positive x-direction. In case you forgot, let me remind you of how we’d find the same value for F by simultaneously (a) adding the distance Δx = vΔt to x and (b) adding the time Δt to t, which subtracts vΔt from the argument. We wrote:
F[x+Δx−v(t+Δt)] = F(x+vΔt−vt−vΔt) = F(x−vt)
What we do here is convert something that’s expressed in time units into something that’s expressed in distance units, and vice versa, as both dimensions are related through the wave velocity: v = x/t = Δx/Δt. You’ll say: how do you get from F(x−vt) to cos(kx − ωt)? That’s simple: the argument of a sinusoidal function is expressed in radians, rather than in meters or in seconds. So, using the v = λf formula (the velocity of a regular wave is its frequency times its wavelength), we’d write y = cos(2π(x−vt)/λ). We divide x−vt by the wavelength λ here, so the (x−vt)/λ factor is something that’s measured not in meters but in wavelengths, and then we multiply the thing by 2π because we know that one cycle (i.e. one wavelength) corresponds to 2π radians, and we need an argument expressed in radians in our sinusoidal function, because it’s an angle! It has to be. Now, k = 2π/λ and ω = 2πf, and so we substitute and we get what we wanted to get: cos(2π(x−vt)/λ) = cos[(2π/λ)x − (2πv/λ)t] = cos[kx − (2πf)t] = cos(kx − ωt). I managed to explain this to my 16-year-old son, Vincent, so you should have no difficulty in understanding this.
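If you’d rather check this than take my word for it, here is a minimal Python sketch. The wavelength, velocity and sample point are arbitrary values I picked for illustration, and the function names are mine. It checks that the two ways of writing the wave agree, and that shifting x by vΔt while shifting t by Δt leaves the value unchanged.

```python
import math

def wave(x, t, wavelength=2.0, v=3.0):
    """y = cos(2π(x − vt)/λ): a wave travelling in the positive x-direction."""
    return math.cos(2 * math.pi * (x - v * t) / wavelength)

def wave_k_omega(x, t, wavelength=2.0, v=3.0):
    """The same wave written as cos(kx − ωt), with k = 2π/λ and ω = 2πf = 2πv/λ."""
    k = 2 * math.pi / wavelength
    omega = 2 * math.pi * v / wavelength
    return math.cos(k * x - omega * t)

# Arbitrary sample point and time step (illustrative values only).
x, t, dt, v = 0.7, 0.4, 0.25, 3.0

# Both notations describe the same function.
same_form = math.isclose(wave(x, t), wave_k_omega(x, t), abs_tol=1e-12)

# Moving along with the wave (Δx = vΔt) leaves the value of F unchanged.
shifted = math.isclose(wave(x + v * dt, t + dt), wave(x, t), abs_tol=1e-12)

print(same_form, shifted)
```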
So ψ = e^(i(kx − ωt)) is the same old wavefunction, really. The only thing is that this e^(i(kx − ωt)) function gives you two functions for the price of one. 🙂 To be precise, a sine and a cosine function are one and the same function, really, but with a phase difference of 90 degrees, so that’s π/2 radians. That’s illustrated below: cosθ = sin(θ+π/2).
So all you need to understand is this:
e^(iθ) = cosθ + i·sinθ = sin(θ+π/2) + i·sinθ
What does this say, really? This formula tells us that our ψ(x, t) = e^(−i(ωt − kx)) wavefunction is two-dimensional indeed. Now, the term ‘dimension’ is one of those ubiquitous terms in physics and in mathematics that means different things in different contexts, so I should be more precise here but… Well… I can’t. The wavefunction consists of two parts. One part is referred to as ‘real’ (I am talking about the cosine here) while the other (the sine) is referred to as ‘imaginary’. However, both are equally real, of course, because… Well… Both are equally essential: the one bit comes with the other, and vice versa.
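Euler’s formula, and the 90-degree phase shift between the cosine and the sine, can be checked with Python’s standard cmath module. This is just a sketch: the angle θ = 0.9 is an arbitrary value of mine.

```python
import cmath
import math

theta = 0.9  # arbitrary angle in radians (illustrative value)

# Left-hand side of Euler's formula: e^(iθ).
lhs = cmath.exp(1j * theta)

# Right-hand side: cosθ + i·sinθ.
rhs = complex(math.cos(theta), math.sin(theta))

# Same thing with the cosine written as a shifted sine: sin(θ + π/2) + i·sinθ.
rhs_shifted = complex(math.sin(theta + math.pi / 2), math.sin(theta))

print(abs(lhs - rhs) < 1e-12, abs(lhs - rhs_shifted) < 1e-12)
```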
Now, we could keep track of the real and imaginary part of our wavefunction in a different way. Instead of writing cosθ + isinθ, we could write our wavefunction as a vector function:
ψ(x, t) = [cos(kx−ωt), sin(kx−ωt)] = [sin(kx−ωt+π/2), sin(kx−ωt)]
Why don’t we do that? We could, but it turns out the imaginary unit i is something more than just a delimiter: it’s not just some symbol separating values in some sequence. We use it in calculations. For example, we can move it to the other side in Schrödinger’s equation by multiplying both sides of the equation with its multiplicative inverse 1/i = i⁻¹ = −i. In fact, if we also move ħ, then we get an expression which looks somewhat simpler (written for a free particle, i.e. without a potential term):
∂ψ/∂t = i·(ħ/2m)·∇²ψ
So why is it that physicists never write it that way? I am not sure. It’s probably because of that expression using the Hamiltonian operator (H), which is used in the equation that’s engraved on Schrödinger’s bust, and on his grave:
The simpler expression, with i and ħ on the right-hand side of the equation, is interesting though, because the expression makes us think of the geometry involved. Huh? Yes. You should think of two things here:
- Multiplying some vector quantity with the imaginary unit (i) amounts to a (counter-clockwise) rotation by 90 degrees.
- When you see a Laplacian in physics – as we do in Schrödinger’s equation (∇² = ∇·∇ = ∂²/∂x² + ∂²/∂y² + ∂²/∂z²) – it will usually represent the flux density of the (gradient) flow of a function.
Huh? What? Well.. Yes. Let me first explain the rotation, as that’s the easiest thing to explain. The following illustration should do the trick:
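The rotation can also be checked numerically. The snippet below is only a sketch: the sample number z = 3 + i is an arbitrary choice of mine. Multiplying by i turns z counter-clockwise by 90 degrees without changing its length, and multiplying by −i (which is 1/i) turns it clockwise.

```python
import cmath
import math

z = complex(3, 1)  # arbitrary sample point in the complex plane

# Multiplying by i: counter-clockwise rotation by 90 degrees.
rotated = 1j * z

# Multiplying by −i = 1/i: clockwise rotation by 90 degrees.
rotated_back = -1j * z

# The angle (phase) changes by +π/2, the length stays the same.
angle_change = cmath.phase(rotated) - cmath.phase(z)

print(rotated, rotated_back, math.isclose(angle_change, math.pi / 2))
```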
[By the way, I did some posts on complex numbers myself, but here’s a link to an author who does a really good job at explaining them in a very intuitive way. Have fun with it!] As for the geometrical interpretation of the Laplacian in Schrödinger’s equation, I should refer you to my posts on vector analysis, but let’s review the basics here, using the example of heat diffusion in some material body, which was our first example of a vector equation:
h = –κ∇T
We encountered the same equation elsewhere. For example, if the electric potential is Φ, then we can immediately calculate the electric field using the following formula:
E = –∇Φ
It’s the same thing—mathematically speaking, that is. Now we need a little bit of math to understand these equations in the way we want to understand them, and that’s in some kind of ‘physical‘ way (as opposed to just the math side of it). So… Well… Let me spell it out:
- The heat flow is represented by the h vector, and its magnitude is the thermal energy that passes, per unit time and per unit area, through an infinitesimally small isothermal surface, so we write: h = |h| = ΔJ/ΔA.
- The direction of the heat flow is opposite to the direction of the gradient vector ∇T, so we’ve got a minus sign in our equation. That’s simple enough to understand: heat flows from higher to lower temperature, i.e. ‘downhill’, so to speak. So that’s the minus sign. Note that the gradient operator (∇) acts on a scalar-valued function (T), aka a scalar field, but yields a vector: ∇T = (∂T/∂x, ∂T/∂y, ∂T/∂z). [Always remember: boldface symbols represent vectors, while symbols in regular font (e.g. T) are scalars or scalar functions: T = T(x, t) = T(x, y, z, t).]
- The magnitude of h is proportional to the magnitude of the gradient ∇T, with the constant of proportionality equal to κ (kappa), which is called the thermal conductivity: |h| = h = κ|∇T|.
- The gradient of T, i.e. ∇T = (∂T/∂x, ∂T/∂y, ∂T/∂z), measures the rate of change of T, but in a particular direction. In fact, the rate of change of T in any direction will be the component of the ∇T vector in that direction. Now, the magnitude of a vector component is always smaller than the magnitude of the vector itself, except if it’s the component in the same direction as the vector, in which case the component is the vector. [If you have difficulty understanding this, read what I write once again, but very slowly and attentively.] Therefore, the direction of ∇T is the direction in which the (instantaneous) rate of change of T is largest. In Feynman’s words: “The gradient of T has the direction of the steepest uphill slope in T.”
But so we were talking about the Laplacian, which is equal to ∇²T = ∇·∇T = ∂²T/∂x² + ∂²T/∂y² + ∂²T/∂z². The little dot in the middle here matters, and very much so! We’re multiplying the vector differential operator ∇ (i.e. the gradient) with itself here, so we write:
∇² = ∇·∇ = (∂/∂x, ∂/∂y, ∂/∂z)·(∂/∂x, ∂/∂y, ∂/∂z) = ∂²/∂x² + ∂²/∂y² + ∂²/∂z²
Note that the Laplacian operator (∇²) would, once again, act on a scalar-valued function here (T), aka a scalar field, but that it yields another scalar-valued function, unlike the gradient operator ∇ = (∂/∂x, ∂/∂y, ∂/∂z). Now, the ∇· operator is referred to as the divergence. We wrote:
∇·h = div h = the divergence of h
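To make the ‘divergence of the gradient’ idea concrete, here is a small numerical sketch using central finite differences. Everything in it is an assumption of mine for illustration: the temperature field T = x² + 2y² + 3z², the step size, the sample point and the helper names. For this field the gradient is (2x, 4y, 6z) and the Laplacian is 2 + 4 + 6 = 12 everywhere.

```python
H = 1e-3  # finite-difference step (illustrative choice)

def T(x, y, z):
    """A made-up scalar 'temperature' field."""
    return x**2 + 2 * y**2 + 3 * z**2

def gradient(f, p, h=H):
    """∇f at point p: a scalar field in, a vector out."""
    x, y, z = p
    return (
        (f(x + h, y, z) - f(x - h, y, z)) / (2 * h),
        (f(x, y + h, z) - f(x, y - h, z)) / (2 * h),
        (f(x, y, z + h) - f(x, y, z - h)) / (2 * h),
    )

def divergence(field, p, h=H):
    """∇·field at point p: a vector field in, a scalar out."""
    x, y, z = p
    return (
        (field((x + h, y, z))[0] - field((x - h, y, z))[0]) / (2 * h)
        + (field((x, y + h, z))[1] - field((x, y - h, z))[1]) / (2 * h)
        + (field((x, y, z + h))[2] - field((x, y, z - h))[2]) / (2 * h)
    )

def laplacian(f, p, h=H):
    """∇²f = ∇·(∇f): the divergence of the gradient."""
    return divergence(lambda q: gradient(f, q, h), p, h)

point = (0.5, -1.0, 2.0)
grad_T = gradient(T, point)
lap_T = laplacian(T, point)
print(grad_T, lap_T)
```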
Now we can really start talking about what I wrote above: if you see a Laplacian in physics – as we do in Schrödinger’s equation − it will usually represent the flux density of the (gradient) flow of a function. That’s because the physical significance of the divergence is related to the so-called flux of a vector field: it measures the magnitude of a field’s source or sink at a given point. Continuing our example with temperature, consider, for example, air as it is heated or cooled. The relevant vector field will now be the velocity of the moving air at a point. If air is heated in a particular region, it will expand in all directions such that the velocity field points outward from that region. Therefore the divergence of the velocity field in that region would have a positive value, as the region is a source. If the air cools and contracts, the divergence has a negative value, as the region is a sink. Of course, in a block of metal, there’s nothing that moves, except for the heat flow itself, as illustrated below.
The less intuitive but more accurate definition of the divergence is the following: the divergence represents the volume density of the outward flux of a vector field from an infinitesimal volume around a given point. Now, heat is conserved, and this conservation law is expressed as follows:
dq/dt = −∇·h
I am tempted to show you how we get this result, but I don’t want to make this post too long, so I’ll just refer you to Feynman on this. Just note that q is the heat per unit volume (as opposed to Q, which is the total heat). The point is that we can now combine the equation above with our h = –κ∇T expression so as to get the following:
dq/dt = κ·∇·∇T = κ·∇²T
The last step in this rather long development is the following: it is rather logical to assume that the rate of change of the heat per unit volume (i.e. q) will be proportional to the rate of change of the temperature (T), so we can introduce another proportionality constant c_v, which, in physics, is referred to as the specific heat (per unit volume) of the material (aka the (volume) heat capacity of the substance), so we write: dq/dt = c_v·(dT/dt). Combining all gives us the diffusion equation we wanted to find:
dT/dt = (κ/c_v)·∇²T = D·∇²T, with D = κ/c_v
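To see the diffusion equation dT/dt = D·∇²T at work, with D = κ/c_v, here is a rough one-dimensional sketch. The grid, the time step, the value D = 1 and the insulated ends are all assumptions of mine, not something taken from Feynman. Each step first computes the flow between neighbouring cells (Fourier’s law, h = −κ∇T, here expressed in temperature units) and then applies the conservation law (what flows out of a cell is lost by that cell).

```python
def diffuse(temperatures, D, dx, dt, steps):
    """Explicit finite-difference sketch of dT/dt = D·d²T/dx² on a rod with insulated ends."""
    T = list(temperatures)
    for _ in range(steps):
        # Flow between neighbouring cells: proportional to minus the gradient.
        flux = [-D * (T[i + 1] - T[i]) / dx for i in range(len(T) - 1)]
        # Insulated ends: nothing flows through the outer walls.
        flux = [0.0] + flux + [0.0]
        # Conservation: the rate of change is minus the divergence of the flow.
        T = [T[i] - dt * (flux[i + 1] - flux[i]) / dx for i in range(len(T))]
    return T

# A hot spot in the middle of a cold rod (illustrative numbers).
initial = [0.0] * 10 + [100.0] + [0.0] * 10
final = diffuse(initial, D=1.0, dx=1.0, dt=0.2, steps=200)

print(round(sum(initial), 6), round(sum(final), 6), round(max(final), 3))
```

The hot spot spreads out and flattens, while the total amount of heat stays the same. Note that this explicit scheme is only stable when D·dt/dx² is at most 1/2, which is why the time step is kept small.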
This equation, which – as we demonstrated – models the flow of something (it modeled the flow of heat, or thermal energy, in our example) is very similar to Schrödinger’s equation (for a free particle):
∂ψ/∂t = i·(ħ/2m)·∇²ψ
Both equations model a flow, and involve a flux density of something. Let’s do a dimensional analysis here. On the left-hand side, we have a quantity (∂ψ/∂t) that’s expressed per second, while on the right-hand side, we have ∇²ψ = ∇·∇ψ, which is expressed per square meter, so it is a flux density alright! Now, in our heat diffusion equation, the dimensions came out alright because of the dimension of the diffusion constant, i.e. D = κ/c_v. The specific heat (c_v) is expressed in J/(m³·K), while the thermal conductivity, κ, is expressed in W/(m·K) = J/(m·s·K). Dividing the latter by the former gives us a diffusion constant with dimension m²/s, so it works out: we get something expressed per second on both sides of the equation.
The diffusion constant in Schrödinger’s equation is iħ/2m, whose dimension is [J·s]/[J/(m²/s²)] = m²/s, so that works out too! We get something expressed per second on both sides of our equation… Except… Well… We conveniently forgot about i! Is that part of the ‘dimension’ of the right-hand side?
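The bookkeeping of the dimensional analysis can be done mechanically. In the sketch below, a unit is written as a map from SI base units to exponents. That representation and the helper names are my own choice for illustration. Both the heat diffusion constant κ/c_v and the ħ/m factor in Schrödinger’s equation should reduce to m²/s.

```python
def combine(a, b, sign):
    """Multiply (sign=+1) or divide (sign=-1) two units given as exponent maps."""
    result = dict(a)
    for base, exponent in b.items():
        result[base] = result.get(base, 0) + sign * exponent
    # Drop base units whose exponent cancelled out.
    return {base: e for base, e in result.items() if e != 0}

def times(a, b):
    return combine(a, b, +1)

def over(a, b):
    return combine(a, b, -1)

KILOGRAM = {"kg": 1}
METER = {"m": 1}
SECOND = {"s": 1}
KELVIN = {"K": 1}
JOULE = {"kg": 1, "m": 2, "s": -2}
WATT = over(JOULE, SECOND)

kappa = over(WATT, times(METER, KELVIN))      # thermal conductivity: W/(m·K)
c_v = over(JOULE, times({"m": 3}, KELVIN))    # specific heat per unit volume: J/(m³·K)
heat_diffusion_constant = over(kappa, c_v)

hbar = times(JOULE, SECOND)                   # J·s
schrodinger_constant = over(hbar, KILOGRAM)   # ħ/m (the factors i and 1/2 carry no unit)

print(heat_diffusion_constant, schrodinger_constant)
```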
I’d say: it is and it isn’t. Remember that the wavefunction (and, hence, Schrödinger’s equation) gives us two functions for the price of one: we’ve got a real part, and we’ve got an imaginary part. Now, multiplying both by i amounts to a rotation by 90 degrees, so what’s happening here is that the real part of ∇²ψ becomes the imaginary part of dψ/dt, and the imaginary part of ∇²ψ becomes (with a minus sign) the real part of dψ/dt. Well… And there’s also this scaling factor ħ/2m, of course.
Huh? Yes. Think of it. If we write dψ/dt as dψ/dt = a + ib, and ∇²ψ as ∇²ψ = c + id, then Schrödinger’s equation tells us that a + ib = (iħ/2m)·(c + id) = (ħ/2m)·(–d + i·c). Hence, a = –(ħ/2m)·d and b = (ħ/2m)·c, or:
- Re(dψ/dt) = −(ħ/2m)·Im(∇²ψ)
- Im(dψ/dt) = (ħ/2m)·Re(∇²ψ)
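These two relations can be checked numerically for the elementary wavefunction ψ = e^(i(kx − ωt)) in one dimension. The sketch below assumes units in which ħ = m = 1, an arbitrary wave number k = 2, and the free-particle relation ω = ħk²/2m. The derivatives are approximated with finite differences, so the check only holds up to a small tolerance.

```python
import cmath

HBAR = 1.0   # assumption: natural units
MASS = 1.0   # assumption: unit mass
K = 2.0      # arbitrary wave number
OMEGA = HBAR * K**2 / (2 * MASS)  # free-particle relation between ω and k

def psi(x, t):
    """Elementary wavefunction e^(i(kx − ωt))."""
    return cmath.exp(1j * (K * x - OMEGA * t))

def d_dt(x, t, h=1e-5):
    """Central-difference approximation of the time derivative."""
    return (psi(x, t + h) - psi(x, t - h)) / (2 * h)

def d2_dx2(x, t, h=1e-4):
    """Central-difference approximation of the one-dimensional Laplacian."""
    return (psi(x + h, t) - 2 * psi(x, t) + psi(x - h, t)) / h**2

x, t = 0.3, 0.7  # arbitrary sample point
dpsi_dt = d_dt(x, t)
lap = d2_dx2(x, t)
scale = HBAR / (2 * MASS)

# Re(dψ/dt) = −(ħ/2m)·Im(∇²ψ) and Im(dψ/dt) = (ħ/2m)·Re(∇²ψ)
re_ok = abs(dpsi_dt.real - (-scale * lap.imag)) < 1e-6
im_ok = abs(dpsi_dt.imag - scale * lap.real) < 1e-6
print(re_ok, im_ok)
```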
This is very deep. First the minus sign. Does it reflect the clockwise or counter-clockwise direction of rotation of the θ argument in the wavefunction, and does that direction matter? No. Time goes in one direction only and, in quantum math, that’s reflected in the argument going clockwise with time. If we’d replace i by j = −i, we’d have a complex-number system that works exactly the same (j² = (−i)² = i² = −1). We would just write dψ/dt as dψ/dt = a + jb, and ∇²ψ as ∇²ψ = c + jd, but so we’d still find that a + jb = (jħ/2m)·(c + jd) = (ħ/2m)·(–d + j·c), so we’d write the same: Re(dψ/dt) = −(ħ/2m)·Im(∇²ψ) and Im(dψ/dt) = (ħ/2m)·Re(∇²ψ).
What these bizarre equations above actually mean is not immediately clear. We had an equation that established a relation between (1) a quantity changing in time (i.e. a quantity expressed per second) and (2) its flux density in space (i.e. a quantity expressed per square meter). To be precise, on the left-hand side of Schrödinger’s equation we had the time rate of change of an amplitude, while on the right-hand side we have its flux density. […] Well… We should be precise here: that Laplacian gives us the flux density of the gradient flow of ψ. But… Well… What?
I think of two things right now. First, the famous words that Hermann Minkowski used as introduction to his famous talk at the 80th Assembly of German Natural Scientists and Physicians on September 21, 1908:
“The views of space and time which I wish to lay before you have sprung from the soil of experimental physics, and therein lies their strength. They are radical. Henceforth, space by itself, and time by itself, are doomed to fade away into mere shadows, and only a kind of union of the two will preserve an independent reality.”
As you know, Minkowski used these words as an introduction to his theory on four-dimensional space-time, which was nothing but a mathematical formalization and generalization of Einstein’s special theory of relativity. So these words have nothing to do with quantum physics, although Planck had already introduced the fundamental quantum-mechanical postulate (quantization of energy) nearly a decade before. Nevertheless, the Re(dψ/dt) = −(ħ/2m)·Im(∇²ψ) and Im(dψ/dt) = (ħ/2m)·Re(∇²ψ) equations do lead us to thinking that the time rate of change of the wavefunction and the flux density of its gradient flow are both the manifestation of a deeper, underlying, reality. In the case of the heat diffusion equation, that underlying reality is the flow of (heat) energy. Now, as our wavefunction describes some particle moving through spacetime, and remembering Einstein’s mass-energy equivalence relation, it’s tempting to think we should be able to find a similar interpretation for Schrödinger’s equation.
That brings me to the second point, which I mentioned in other posts too: there is that striking mathematical similarity between (a) the real and imaginary part of the traveling amplitude, and (b) the E and B vectors that characterize the traveling electromagnetic field. Look at the two illustrations below.
- The first one visualizes the real and imaginary part of the ψ(x) = e^(ikx) function (so that’s the ψ(x, t) = e^(−i(ωt − kx)) function at t = 0, or at some other fixed point in time). So we’re missing the time dimension here.
- The second illustration shows how the electric field vector E varies in space as well as in time, but it does not show the magnetic field vector B.
It’s easy to imagine what an animated version of the ψ(x, t) = e^(−i(ωt − kx)) wavefunction would look like: exactly the same as that animation which shows E! Indeed, note that – how convenient! – E actually rotates clockwise, just like the argument (θ) of our wavefunction! Hence, it is tempting to think we could de-construct E, i.e. we could think of E as consisting of a ‘real’ and ‘imaginary’ part as well! Now that is an interesting idea, isn’t it? 🙂
But so we don’t have B. What would B look like? The animation above shows the electric field vector (E) for a so-called circularly polarized electromagnetic wave. You know the magnetic field vector B is always orthogonal to E, as shown below, with a magnitude that’s equal to |B| = B = |E|/c = E/c. The animation below shows how a linearly polarized electromagnetic wave (i.e. a plane wave, really) is supposed to travel through space: the red oscillation is the electric field vector, and the blue oscillation is the magnetic field vector (never mind the 1/c factor here).
The mathematical description of how an electromagnetic wave travels through spacetime is given by Maxwell’s Laws:
I highlighted the linkages between E and B, and I talked about how that works in detail before, so I won’t repeat myself here too much. The point is: Maxwell’s Laws are also structured around time derivatives (∂E/∂t and ∂B/∂t) and flow densities, even if we’re talking circulation densities here, because the left-hand side of the equations with the time derivatives involves the curl operator (∇×), which is a vector operator that describes the (infinitesimal) rotation of the E and B vector fields respectively. In fact, making abstraction of the electric current (j) and charge densities (i.e. the electric field caused by static charges, which is captured by the first equation), as well as choosing our time and distance units such that c = 1, we get:
- ∂B/∂t = −∇×E
- ∂E/∂t = ∇×B
That’s structurally very similar to what we wrote above:
- Re(dψ/dt) = −(ħ/2m)·Im(∇²ψ)
- Im(dψ/dt) = (ħ/2m)·Re(∇²ψ)
The only difference between the two mathematical descriptions is that phase shift: Euler’s formula incorporates a phase shift—remember: sinθ = cos(θ − π/2)—and so you don’t have that with the E and B vectors. But then that’s why bosons and fermions are different, of course! This brings me to the final thought I want to make here. You’ll remember we found an equation that was very similar to the dq/dt = −∇·h equation when we were talking about the energy of fields and the Poynting vector:
Indeed, the similarity triggers the following reflection: if there’s a ‘Poynting vector’ for heat flow (h), and for the energy of fields (S), then there must be some kind of ‘Poynting vector’ for amplitudes too! I don’t know which one, but it must exist! It’s going to be some complex vector, no doubt! But it should be out there. 🙂
Post scriptum 1: In my post on gauges, in which I introduced much more advanced stuff than what you need here (like the concept of the vector potential, usually denoted by A), we derived two equations to describe the traveling electromagnetic field. They were the following:
To see these equations are really structurally similar to Schrödinger’s equation, it suffices to just write out the second one:
It’s again the same thing: a time derivative on one side, and a Laplacian on the other. [Just equate ρ with zero, because we’re interested in the propagation mechanism of the wave only.] As mentioned, the two equations are so similar that we might want to think of how to ‘de-construct’ E in a ‘real’ and an ‘imaginary’ part. But so we’ve got two equations above – one involving the electric potential (ϕ) and one involving the vector potential A – and only one wave equation (i.e. Schrödinger’s equation), so how does that work? Well… As I showed above, Schrödinger’s equation is actually two equations as well: one for the real and one for the imaginary part of the wavefunction. So, yes, “same-same but different”, as they say in Asia. 🙂
Not convinced? Well… There was a quote of Feynman which I never quite understood—till now, that is: “The vector potential (together with the scalar potential that goes with it) appears to give the most direct description of the physics. This becomes more and more apparent the more deeply we go into the quantum theory. In the general theory of quantum electrodynamics, one takes the vector and scalar potentials as the fundamental quantities in a set of equations that replace the Maxwell equations.”
I think I finally start to understand what he’s saying here, although I still have a lot to learn. For starters, the E and B vectors have a physical significance that is totally lacking – at our (limited) level of understanding of quantum physics here – for those real and imaginary parts of the wavefunction. Indeed, we’d associate forces with E and B. More specifically, the force on a moving charge would be equal to:
F = qE + qv×B
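As a quick worked example of this force law, here is a small sketch in plain Python. The charge, field and velocity values are made-up numbers (no particular units intended), and the helper names are mine.

```python
def cross(a, b):
    """Cross product a×b of two 3-vectors given as tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def lorentz_force(q, E, v, B):
    """F = qE + qv×B for a charge q moving with velocity v."""
    v_cross_B = cross(v, B)
    return tuple(q * (E[i] + v_cross_B[i]) for i in range(3))

# Made-up values: E along x, v along y, B along z.
F = lorentz_force(q=2.0, E=(1.0, 0.0, 0.0), v=(0.0, 3.0, 0.0), B=(0.0, 0.0, 0.5))
print(F)
```

With v along y and B along z, the magnetic part v×B points along x, so here it simply adds to the electric part.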
I guess we’ll learn all about it once we start studying quantum electrodynamics, or quantum field theory. In the meanwhile, it’s nice we can already see what it’s going to look like. As Feynman puts it: “The same equations have the same solutions.” And he’s explicit about what he means with that: “The equations for many different physical situations have exactly the same appearance. Of course, the symbols may be different—one letter is substituted for another—but the mathematical form of the equations is the same. This means that having studied one subject, we immediately have a great deal of direct and precise knowledge about the solutions of the equations of another.” Well… That says it all, doesn’t it? 🙂
Post scriptum 2: In my post on the de Broglie relations, I mentioned this funny E = m·v² formula that comes out of them. Just multiply them:
- f·λ = (E/h)·(h/p) = E/p
- v = f·λ ⇒ f·λ = v = E/p ⇔ E = v·p = v·(m·v)
⇒ E = m·v²
That’s weird, isn’t it? E = m·v²? It reminds one of the kinetic energy formula of course (K.E. = m·v²/2) but… Well… That factor 1/2 should not be missing. Now I won’t repeat all of my reflections on that here (just check that post of mine if you’d like to freewheel a bit). I just want to show you that we do not find that weird formula when substituting ψ = e^(i(kx − ωt)) for ψ in Schrödinger’s equation, i.e. in:
∂ψ/∂t = i·(ħ/2m)·∇²ψ
To keep things simple, we’ll just assume one-dimensional space, so ∇²ψ = ∂²ψ/∂x². The time derivative on the left-hand side is ∂ψ/∂t = −iω·e^(i(kx − ωt)). The second-order derivative on the right-hand side is ∂²ψ/∂x² = (ik)·(ik)·e^(i(kx − ωt)) = −k²·e^(i(kx − ωt)). The e^(i(kx − ωt)) factor on both sides cancels out and, hence, equating both sides gives us the following condition:
−iω = −(iħ/2m)·k² ⇔ ω = (ħ/2m)·k²
Substituting ω = E/ħ and k = p/ħ yields:
E/ħ = (ħ/2m)·p²/ħ² = m²·v²/(2m·ħ) = m·v²/(2ħ) ⇔ E = m·v²/2
So now it’s OK!? The only logical conclusion is that we must be doing something wrong when multiplying the two de Broglie equations. To be precise: our v = f·λ equation is probably wrong. Why? Not sure. It’s just something one shouldn’t apply to our complex-valued wavefunction. The ‘correct’ velocity formula for the complex-valued wavefunction should have that 1/2 factor, so we’d write 2·f·λ = v to make things come out alright. But where would this formula come from? The period of cosθ + isinθ is the period of the sine and cosine function: cos(θ+2π) + isin(θ+2π) = cosθ + isinθ, so T = 2π and f = 1/T = 1/2π do not change.
But so that’s a mathematical point of view. From a physical point of view, it’s clear we got two oscillations for the price of one: one ‘real’ and one ‘imaginary’, but both are equally essential and, hence, equally ‘real’. You may think the answer must lie in the distinction between the group and the phase velocity when we’re combining waves. Indeed, the group velocity of a sum of waves is equal to v_g = dω/dk. In this case, we have:
v_g = d[E/ħ]/d[p/ħ] = dE/dp
We can now use the kinetic energy formula to write E as E = m·v²/2 = p·v/2. Now, v and p are related through m (p = m·v, so v = p/m). So we should write this as E = m·v²/2 = p²/(2m). Substituting E and p = m·v in the equation above then gives us the following:
v_g = dω/dk = d[p²/(2m)]/dp = 2p/(2m) = p/m = v
However, for the phase velocity, we can just use the v_p = ω/k formula, which gives us that 1/2 factor:
v_p = ω/k = (E/ħ)/(p/ħ) = E/p = (m·v²/2)/(m·v) = v/2
Riddle solved! 🙂
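For those who like to see numbers: the sketch below takes the relation ω = (ħ/2m)·k² we found above and compares the group velocity dω/dk (approximated with a finite difference) and the phase velocity ω/k with the classical velocity v = p/m = ħk/m. The values ħ = 1, m = 2 and k = 3 are arbitrary choices of mine.

```python
HBAR = 1.0  # assumption: natural units
MASS = 2.0  # arbitrary mass
K = 3.0     # arbitrary wave number

def omega(k):
    """ω = (ħ/2m)·k², as obtained from Schrödinger's equation for a free particle."""
    return HBAR * k**2 / (2 * MASS)

def group_velocity(k, h=1e-6):
    """v_g = dω/dk, approximated with a central difference."""
    return (omega(k + h) - omega(k - h)) / (2 * h)

def phase_velocity(k):
    """v_p = ω/k."""
    return omega(k) / k

v = HBAR * K / MASS  # classical velocity p/m, with p = ħk
v_g = group_velocity(K)
v_p = phase_velocity(K)

print(v, round(v_g, 6), v_p)
```

The group velocity comes out equal to v, while the phase velocity is v/2.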