# The ammonia maser: transitions in a time-dependent field

Feynman’s analysis of a maser – microwave amplification by stimulated emission of radiation – combines an awful lot of stuff. Resonance, electromagnetic field theory, and quantum mechanics: it’s all there! It’s complicated and, hence, very tempting to just skip when going through his third volume of Lectures. But let’s not do that. What I want to do in this post is not repeat his analysis, but reflect on it and, perhaps, offer some guidance as to how to interpret some of the math.

#### The model: a two-state system

The model is a two-state system, which Feynman illustrates as follows:

Don’t shy away now. It’s not so difficult. Try to understand. The nitrogen atom (N) in the ammonia molecule (NH3) can tunnel through the plane of the three hydrogen (H) atoms, so it can be ‘up’ or ‘down’. This ‘up’ or ‘down’ state has nothing to do with the classical or quantum-mechanical notion of spin, which is related to the magnetic moment. Nothing, i.e. nada, niente, rien, nichts! Indeed, it’s much simpler than that. 🙂 The nitrogen atom could be either beneath or, else, above the plane of the hydrogens, as shown above, with ‘beneath’ and ‘above’ being defined in regard to the molecule’s direction of rotation around its axis of symmetry. That’s all. That’s why we prefer simple numbers to denote those two states, instead of the confusing ‘up’ or ‘down’, or ‘↑’ or ‘↓’ symbols. We’ll just call the two states state ‘1’ and state ‘2’ respectively.

Having said that (i.e. having said that you shouldn’t think of spin, which is related to the angular momentum of some (net) electric charge), the NH3 molecule does have some electric dipole moment, which is denoted by μ in the illustration and which, depending on the state of the molecule (i.e. the nitrogen atom being above or beneath the plane of the hydrogens), changes the total energy of the molecule by an amount that is equal to +με or −με, with ε some external electric field, as illustrated by the ε arrow on the left-hand side of the diagram. [You may think of that arrow as an electric field vector.] This electric field may vary in time and/or in space, but we’ll not worry about that now. In fact, we should first analyze what happens in the absence of an external field, which is what we’ll do now.

The NH3 molecule will spontaneously transition from an ‘up’ to a ‘down’ state, or from ‘1’ to ‘2’—and vice versa, of course! This spontaneous transition is also modeled as an uncertainty in its energy. Indeed, we say that, even in the absence of an external electric field, there will be two energy levels, rather than one only: E0 + A and E0 − A.

We wrote the amplitude to find the molecule in either one of these two states as:

• C1(t) = 〈 1 | ψ 〉 = (1/2)·e^−(i/ħ)·(E0−A)·t + (1/2)·e^−(i/ħ)·(E0+A)·t = e^−(i/ħ)·E0·t·cos[(A/ħ)·t]
• C2(t) = 〈 2 | ψ 〉 = (1/2)·e^−(i/ħ)·(E0−A)·t − (1/2)·e^−(i/ħ)·(E0+A)·t = i·e^−(i/ħ)·E0·t·sin[(A/ħ)·t]

[Remember: the sum of a complex exponential and its conjugate, i.e. e^iθ + e^−iθ, reduces to 2·cosθ, while the difference e^iθ − e^−iθ reduces to 2·i·sinθ.]

That gave us the following probabilities:

• P1 = |C1|² = cos²[(A/ħ)·t]
• P2 = |C2|² = sin²[(A/ħ)·t]

[Remember: the absolute square of i is |i|² = 1, so the i in the C2(t) formula disappears.]

The graph below shows how these probabilities evolve over time. Note that, because of the square, the period of cos²[(A/ħ)·t] and sin²[(A/ħ)·t] is equal to π, instead of the usual 2π.

The interpretation of this is easy enough: if our molecule can be in two states only, and it starts off in one, then the probability that it will remain in that state will gradually decline, while the probability that it flips into the other state will gradually increase. As Feynman puts it: the first state ‘dumps’ probability into the second state as time goes by, and vice versa, so the probability sloshes back and forth between the two states.
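By the way, it’s easy to play with those probability formulas yourself. Here’s a minimal numerical sketch of that ‘sloshing’ (the value of A/ħ is arbitrary, as it only sets the time scale):

```python
import numpy as np

# Arbitrary flip-flop rate A/ħ: it only sets the time scale of the sloshing
A_over_hbar = 1.0

# One full probability cycle: the period of cos² and sin² is π·ħ/A
t = np.linspace(0, np.pi / A_over_hbar, 201)

P1 = np.cos(A_over_hbar * t) ** 2  # probability of still being in state 1
P2 = np.sin(A_over_hbar * t) ** 2  # probability of having flipped to state 2

# The probabilities always add up to one: what state 1 'dumps', state 2 gains
assert np.allclose(P1 + P2, 1.0)
```

At t = π·ħ/2A, i.e. halfway through the cycle, all of P1 has been ‘dumped’ into P2, and then it sloshes back.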

The graph above measures time in units of ħ/A but, frankly, the ‘natural’ unit of time would usually be the period, which you can easily calculate as (A/ħ)·T = π ⇔ T = π·ħ/A. In any case, you can go from one unit to the other by dividing or multiplying by π. Of course, the period is the reciprocal of the frequency, and so we can calculate the molecular transition frequency f0 as f0 = A/(π·ħ) = 2A/h. [Remember: h = 2π·ħ, so A/(π·ħ) = 2A/h.]

Of course, by now we’re used to using angular frequencies, and so we’d rather write: ω0 = 2π·f0 = 2π·A/(π·ħ) = 2A/ħ. And because it’s always good to have some idea of the actual numbers – as we’re supposed to model something real, after all – I’ll give them to you straight away. The separation between the two energy levels E0 + A and E0 − A has been measured as being equal to 2A = h·f0 ≈ 10^−4 eV, more or less. 🙂 That’s tiny. To avoid having to convert this to joule, i.e. the SI unit for energy, we can calculate the corresponding frequency using h expressed in eV·s, rather than in J·s. We get: f0 = 2A/h = (1×10^−4 eV)/(4×10^−15 eV·s) = 25 GHz. Now, we’ve rounded the numbers here: the exact frequency is 23.79 GHz, which corresponds to microwave radiation with a wavelength of λ = c/f0 = 1.26 cm.
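You can redo that arithmetic in a couple of lines (the sketch below uses the more precise value of h in eV·s; the 23.79 GHz figure is the measured one):

```python
# Planck's constant in eV·s, so we can use the ~1e-4 eV energy separation directly
h_eVs = 4.135667696e-15   # eV·s
c = 299_792_458           # speed of light, m/s

two_A = 1e-4                  # eV: the (rounded) separation 2A between the levels
f0_rounded = two_A / h_eVs    # ≈ 24 GHz with the precise h (25 GHz with h ≈ 4e-15)
f0 = 23.79e9                  # Hz: the measured ammonia transition frequency
wavelength = c / f0           # ≈ 1.26 cm, i.e. microwave radiation
```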

How does one measure that? It’s simple: ammonia absorbs light of this frequency. The frequency is also referred to as a resonance frequency, as light of this frequency, i.e. microwave radiation, will also induce transitions from one state to another. In fact, that’s what the stimulated emission of radiation principle is all about. But we’re getting ahead of ourselves here. It’s time to look at what happens if we do apply some external electric field, which is what we’ll do now.

#### Polarization and induced transitions

As mentioned above, an electric field will change the total energy of the molecule by an amount that is equal to +με or −με. Of course, the plus or the minus in front of με depends both on the direction of the electric field ε, as well as on the direction of μ. However, it’s not like our molecule might be in four possible states. No. We assume the direction of the field is given, and then we have two states only, with the following energy levels:

Don’t rack your brain over how you get that square root thing. You get it when applying the general solution of a pair of Hamiltonian equations to this particular case. For full details on how to get this general solution, I’ll refer you to Feynman. Of course, we’re talking base states here, which do not always have a physical meaning. However, in this case, they do: a jet of ammonia gas will split in an inhomogeneous electric field, and it will split according to these two states, just like a beam of particles with different spin in a Stern-Gerlach apparatus. A Stern-Gerlach apparatus splits particle beams because of an inhomogeneous magnetic field, however, while here we are talking about an electric field.

It’s important to note that the field should not be homogeneous, for the very same reason as to why the magnetic field in the Stern-Gerlach apparatus should not be homogeneous: the force on the molecules is proportional to the derivative of the energy, so if the energy doesn’t vary in space—i.e. if there is no field gradient—then there will be no force. [If you want more detail, check the section on the Stern-Gerlach apparatus in my post on spin and angular momentum.] To be precise, if με is much smaller than A, then one can use the following approximation for the square root in the expressions above:

The energy expressions then reduce to:

And then we can calculate the force on the molecules as:

The bottom line is that our ammonia jet will split into two separate beams: all molecules in state I will be deflected toward the region of lower ε², and all molecules in state II will be deflected toward the region of larger ε². [We talk about ε² rather than ε because of the ∇ε² gradient in that force formula. However, you could, of course, simplify and write ∇ε² as 2ε·∇ε.] So, to make a long story short, we should now understand the left-hand side of the schematic maser diagram below. It’s easy to understand that the ammonia molecules that go into the maser cavity are polarized.
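By the way, that square-root business is easy to check numerically. The energies are EI,II = E0 ± √(A² + μ²ε0²), and the approximation replaces √(A² + μ²ε0²) by A + μ²ε0²/2A when με0 is small compared to A. A quick sketch (arbitrary units; the variable names are mine):

```python
import math

A = 1.0  # the energy A, in arbitrary units
for ratio in (0.1, 0.25, 0.5):  # με0/A well below, and at, the y ≤ x/2 limit
    mu_eps0 = ratio * A
    exact = math.sqrt(A**2 + mu_eps0**2)   # √(A² + μ²ε0²)
    approx = A + mu_eps0**2 / (2 * A)      # A + μ²ε0²/2A
    rel_error = abs(exact - approx) / exact
    print(ratio, rel_error)
```

Even at με0 = A/2, the relative error is below one percent, so the approximation is quite good.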

To understand the maser, we need to understand how the maser cavity works. It’s a so-called resonant cavity, and we’ve got an electric field in it as well. The field direction happens to be south as we’re looking at it right now, but in an actual maser we’ll have an electric field that varies sinusoidally. Hence, while the direction of the field is always perpendicular to the direction of motion of our ammonia molecules, it switches from south to north and vice versa all of the time. We write ε as:

ε = 2ε0·cos(ω·t) = ε0·(e^i·ω·t + e^−i·ω·t)

Now, you’ve guessed it, of course. If we ensure that ω = ω0 = 2A/ħ, then we’ve got a maser. In fact, the result is a similar graph:

Let’s first explain this graph. We’ve got two probabilities here:

• PI = cos²[(με0/ħ)·t]
• PII = sin²[(με0/ħ)·t]

So that’s just like the P1 = cos²[(A/ħ)·t] and P2 = sin²[(A/ħ)·t] probabilities we found for spontaneous transitions. In fact, the formulas for the related amplitudes are also similar to those for C1(t) and C2(t):

• CI(t) = 〈 I | ψ 〉 = e^−(i/ħ)·EI·t·cos[(με0/ħ)·t], which is equal to:

CI(t) = e^−(i/ħ)·(E0+A)·t·cos[(με0/ħ)·t] = e^−(i/ħ)·(E0+A)·t·(1/2)·[e^i·(με0/ħ)·t + e^−i·(με0/ħ)·t] = (1/2)·e^−(i/ħ)·(E0+A−με0)·t + (1/2)·e^−(i/ħ)·(E0+A+με0)·t

• CII(t) = 〈 II | ψ 〉 = i·e^−(i/ħ)·EII·t·sin[(με0/ħ)·t], which is equal to:

CII(t) = e^−(i/ħ)·(E0−A)·t·i·sin[(με0/ħ)·t] = e^−(i/ħ)·(E0−A)·t·(1/2)·[e^i·(με0/ħ)·t − e^−i·(με0/ħ)·t] = (1/2)·e^−(i/ħ)·(E0−A−με0)·t − (1/2)·e^−(i/ħ)·(E0−A+με0)·t
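If you don’t trust the algebra, here’s a quick numerical check that the cosine/sine forms and the sums of exponentials are the same functions (ħ is set to 1, and the values for E0, A and με0 are arbitrary):

```python
import numpy as np

hbar, E0, A, mu_eps0 = 1.0, 5.0, 1.0, 0.3  # arbitrary illustrative values
t = np.linspace(0, 10, 500)

# The cosine/sine forms of the amplitudes...
C_I = np.exp(-1j * (E0 + A) * t / hbar) * np.cos(mu_eps0 * t / hbar)
C_II = 1j * np.exp(-1j * (E0 - A) * t / hbar) * np.sin(mu_eps0 * t / hbar)

# ...and the equivalent sums of two 'pure' exponentials
C_I_sum = 0.5 * (np.exp(-1j * (E0 + A - mu_eps0) * t / hbar)
                 + np.exp(-1j * (E0 + A + mu_eps0) * t / hbar))
C_II_sum = 0.5 * (np.exp(-1j * (E0 - A - mu_eps0) * t / hbar)
                  - np.exp(-1j * (E0 - A + mu_eps0) * t / hbar))

assert np.allclose(C_I, C_I_sum) and np.allclose(C_II, C_II_sum)
# The associated probabilities add up to one, as they should
assert np.allclose(np.abs(C_I) ** 2 + np.abs(C_II) ** 2, 1.0)
```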

But so here we are talking induced transitions. As you can see, the frequency and, hence, the period, depend on the strength, or magnitude, of the electric field, i.e. the ε0 constant in the ε = 2ε0·cos(ω·t) expression. The natural unit for measuring time would be the period once again, which we can easily calculate as (με0/ħ)·T = π ⇔ T = π·ħ/με0. However, Feynman adds a 1/2 factor so as to make it correspond to the time a molecule needs to go through the cavity. Well… That’s what he says, at least. I’ll show he’s actually wrong, but the idea is OK.

First have a look at the diagram of our maser once again. You can see that all molecules come in in state I, but are supposed to leave in state II. Now, Feynman says that’s because the cavity is just long enough so as to more or less ensure that all ammonia molecules switch from state I to state II. Hmm… Let’s have a close look at that. What the functions and the graph are telling us is that, at the point t = 1 (with t being measured in those π·ħ/(2με0) units), the probability of being in state I has all been ‘dumped’ into the probability of being in state II!

So… Well… Our molecules had better be in that state then! 🙂 Of course, the idea is that, as they transition from state I to state II, they lose energy. To be precise, according to our expressions for EI and EII above, the difference between the energy levels that are associated with these two states is equal to 2A + μ²ε0²/A.

Now, a resonant cavity is a cavity designed to keep electromagnetic waves like the oscillating field that we’re talking about here going with minimal energy loss. Indeed, a microwave cavity – which is what we have here – is similar to a resonant circuit, except that it’s much better than any equivalent electric circuit you’d try to build using inductors and capacitors. ‘Much better’ means it hardly needs energy to keep it going. We express that using the so-called Q-factor (believe it or not: the ‘Q’ stands for quality). The Q-factor of a resonant cavity is of the order of 10^6, as compared to 10^2 for electric circuits that are designed for the same frequencies. But let’s not get into the technicalities here. Let me quote Feynman as he summarizes the operation of the maser:

“The molecule enters the cavity, [and then] the cavity field—oscillating at exactly the right frequency—induces transitions from the upper to the lower state, and the energy released is fed into the oscillating field. In an operating maser the molecules deliver enough energy to maintain the cavity oscillations—not only providing enough power to make up for the cavity losses but even providing small amounts of excess power that can be drawn from the cavity. Thus, the molecular energy is converted into the energy of an external electromagnetic field.”

As Feynman notes, it is not so simple to explain how exactly the energy of the molecules is being fed into the oscillations of the cavity: it would require us to also deal with the quantum mechanics of the field in the cavity, in addition to the quantum mechanics of our molecule. So we won’t get into that nitty-gritty—not here at least. So… Well… That’s it, really.

Of course, you’ll wonder about the orders of magnitude, or minitude, involved. And… Well… That’s where this analysis is somewhat tricky. Let me first say something more about those resonant cavities because, while that’s quite straightforward, you may wonder if they could actually build something like that in the 1950s. 🙂 The condition is that the cavity length must be an integer multiple of the half-wavelength at resonance. We’ve talked about this before. [See, for example, my post on wave modes.] More formally, the condition for resonance in a resonator is that the round-trip distance, 2·d, is equal to an integer number of wavelengths λ, so we write: 2·d = N·λ, with N = 1, 2, 3, etc. Then, if the velocity of our wave is equal to c, the resonant frequencies will be equal to f = (N·c)/(2·d).

Does that make sense? Of course. We’re talking the speed of light, but we’re also talking microwaves. To be specific, we’re talking a frequency of 23.79 GHz and, more importantly, a wavelength that’s equal to λ = c/f0 = 1.26 cm, so for the first normal mode (N = 1), we get 2·d = λ ⇔ d = λ/2 = 6.3 mm. In short, we’re surely not talking nanotechnology here! In other words, the technological difficulties involved in building the apparatus were not insurmountable. 🙂
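The arithmetic, just to check (2·d = N·λ and f = N·c/(2·d), with c the speed of light):

```python
c = 299_792_458   # speed of light, m/s
f0 = 23.79e9      # Hz: the ammonia transition frequency

wavelength = c / f0   # ≈ 1.26 cm
d = wavelength / 2    # cavity length for the first mode (N = 1): ≈ 6.3 mm

# The resonant frequencies of a cavity of that length, for the first few modes
resonances = [N * c / (2 * d) for N in (1, 2, 3)]  # f0, 2·f0, 3·f0
```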

But what about the time that’s needed to travel through it? What about that length? That depends on the με0 quantity, if we are to believe Feynman here. Now, we actually don’t need to know the actual values for μ or ε0: we said that the value of the με0 product is (much) smaller than the value of A. Indeed, the fields that are used in those masers aren’t all that strong, and the electric dipole moment μ is pretty tiny. So let’s say με0 = A/2, which is the upper limit for our approximation of that square root above, so 2με0 = A = 0.5×10^−4 eV. [The approximation for that square root expression is only used when y ≤ x/2.]

Let’s now think about the time. It was measured in units equal to T = π·ħ/(2με0). So our T here is not the T we defined above, which was the period. Here it’s the period divided by two. First the dimensions: ħ is expressed in eV·s, and με0 is an energy, so we can express it in eV too: 1 eV ≈ 1.6×10^−19 J, i.e. 160 zeptojoules. 🙂 π is just a real number, so our T = π·ħ/(2με0) gives us seconds alright. So we get:

T ≈ (3.14×6.6×10^−16 eV·s)/(0.5×10^−4 eV) ≈ 40×10^−12 seconds

[…] Hmm… That doesn’t look good. Even when traveling at the speed of light – which our ammonia molecule surely doesn’t do! – it would only travel over a distance equal to (3×10^8 m/s)·(40×10^−12 s) = 120×10^−4 m = 1.2 cm. The speed of our ammonia molecule is likely to be only a fraction of the speed of light, so we’d have an extremely short cavity then. The time mentioned is also not in line with what Feynman mentions about the ammonia molecule being in the cavity for a ‘reasonable length of time, say for one millisecond.’ One millisecond is also more in line with the actual dimensions of the cavity which, as you can see from the historical illustration below, is quite long indeed.

So what’s going on here? Feynman’s statement that T is “the time that it takes the molecule to go through the cavity” cannot be right. Let’s do some good thinking here. For example, let’s calculate the time that’s needed for a spontaneous state transition and compare it with the time we calculated above. From the graph and the formulas above, we know we can calculate that from the (A/ħ)·T = π/2 equation. [Note the added 1/2 factor, because we’re not going through a full probability cycle: we’re going through a half-cycle only.] So that’s equivalent to T = (π·ħ)/(2A). We get:

T ≈ (3.14×6.6×10^−16 eV·s)/(1×10^−4 eV) ≈ 20×10^−12 seconds

The T = π·ħ/(2με0) and T = (π·ħ)/(2A) expressions make it obvious that the expected, average, or mean time for an induced versus a spontaneous transition depends on με0 and A respectively. Let’s be systematic now, so we’ll distinguish Tinduced = (π·ħ)/(2με0) from Tspontaneous = (π·ħ)/(2A). Taking the ratio, we find:

Tinduced/Tspontaneous = [(π·ħ)/(2με0)]/[(π·ħ)/(2A)] = A/με0
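Plugging in the numbers we used above (με0 = A/2, with 2A = 1×10^−4 eV) makes the comparison concrete:

```python
import math

hbar_eVs = 6.582e-16   # reduced Planck constant, in eV·s
A = 0.5e-4             # eV, so 2A = 1e-4 eV
mu_eps0 = A / 2        # με0 = A/2, the upper limit for the square-root approximation

T_spontaneous = math.pi * hbar_eVs / (2 * A)    # ≈ 20 picoseconds
T_induced = math.pi * hbar_eVs / (2 * mu_eps0)  # ≈ 40 picoseconds
ratio = T_induced / T_spontaneous               # = A/με0 = 2 in this case
```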

However, we know the A/με0 ratio is greater than one, so Tinduced/Tspontaneous is greater than one, which, in turn, means that the presence of our electric field – which, let me remind you, dances to the beat of the resonant frequency – causes a slower transition than we would have had if the oscillating electric field were not present. We may write the equation above as:

Tinduced = [A/με0]·Tspontaneous = [A/με0]·(π·ħ)/(2A) = h/(4με0)

However, that doesn’t tell us anything new. It just says that the transition period (T) is inversely proportional to the strength of the field (as measured by ε0). So a weak field will make for a longer transition period (T), with T → ∞ as ε0 → 0. So it all makes sense, but what do we do with this?

The Tinduced/Tspontaneous = (με0/A)^−1 form is the most telling. It says that Tinduced/Tspontaneous is inversely proportional to the με0/A ratio. For example, if the energy με0 is only one fifth of the energy A, then the time for the induced transition will be five times that of a spontaneous transition. To get something like a millisecond, however, we’d need the με0/A ratio to go down to like a billionth or something, which doesn’t make sense.

So what’s the explanation? Is Feynman hiding something from us? He’s obviously aware of these periods because, when discussing the so-called three-state maser, he notes that “The | I 〉 state has a long lifetime, so its population can be increased.” But… Well… That’s just not relevant here. He just made a mistake: the length of the maser has nothing to do with it. The thing is: once the molecule transitions from state I to state II, that’s basically the end of the story as far as the maser operation is concerned. By transitioning, it dumps that energy 2A + μ²ε0²/A into the electric field, and that’s it. That’s energy that came from outside, because the ammonia molecules were selected so as to ensure they were in state I. So all the transitions afterwards don’t really matter: the ammonia molecules involved will absorb energy as they transition, and then give it back as they transition again, and so on and so on. But that’s no extra energy, i.e. no new or outside energy: it’s just energy going back and forth from the field to the molecules and vice versa.

So, in a way, those PI and PII curves become irrelevant. Think of it: the energy that’s related to A and με0 is defined with respect to a certain orientation of the molecule as well as with respect to the direction of the electric field before it enters the apparatus, and the induced transition is to happen when the electric field inside of the cavity points south, as shown in the diagram. But then the transition happens, and that’s the end of the story, really. Our molecule is then in state II, and will oscillate between state II and I, and back again, and so on and so on, but it doesn’t mean anything anymore, as these flip-flops do not add any net energy to the system as a whole.

So that’s the crux of the matter, really. Mind you: the energy coming out of the first masers was of the order of one microwatt, i.e. 10^−6 joule per second. Not a lot, but it’s something, and so you need to explain it from an ‘energy conservation’ perspective: it’s energy that came in with the molecules as they entered the cavity. So… Well… That’s it.

The obvious question, of course, is: why do we actually need the oscillating field in the cavity? If all molecules come in in the ‘upper’ state, they’ll all dump their energy anyway. Why do we need the field? Well… First, you should note that the whole idea is that our maser keeps going because it uses the energy that the molecules are dumping into its field. The more important thing, however, is that we actually do need the field to induce the transition. That’s obvious from the math. Look at the probability functions once again:

• PI = cos²[(με0/ħ)·t]
• PII = sin²[(με0/ħ)·t]

If there were no electric field, i.e. if ε0 = 0, then PI = 1 and PII = 0. So our ammonia molecules enter in state I and, more importantly, stay in state I forever, so there’s no chance whatsoever of transitioning to state II. Also note what I wrote above: Tinduced = h/(4με0) and, therefore, we find that T → ∞ as ε0 → 0.

So… Well… That’s it. I know this is not the ‘standard textbook’ explanation of the maser—it surely isn’t Feynman’s! But… Well… Please do let me know what you think about it. What I write above indicates that the analysis is much more complicated than standard textbooks would want it to be.

There’s one more point related to masers that I need to elaborate on, and that’s its use as an ‘atomic’ clock. So let me quickly do that now.

#### The use of a maser as an ‘atomic’ clock

In light of the amazing numbers involved – we talked GHz frequencies, and cycles expressed in picoseconds – we may wonder how it’s possible to ‘tune’ the frequency of the field to the ‘natural’ molecular transition frequency. It will be no surprise to hear that it’s actually not straightforward. It’s got to be right: if the frequency of the field, which we’ll denote by ω, is somewhat ‘off’ – significantly different from the molecular transition frequency ω0 – then the chance of transitioning from state I to state II shrinks significantly, and actually becomes zero for all practical purposes. That basically means that, if the frequency isn’t right, then the presence of the oscillating field doesn’t matter. In fact, the fact that the frequency has got to be right – with tolerances that, as we will see in a moment, are expressed in billionths – is why a maser can be used as an atomic clock.

The graph below illustrates the principle. If ω = ω0, then the probability that a transition from state I to II will happen is one, so PI→II(ω)/PI→II(ω0) = 1. If it’s slightly off, though, then the ratio decreases quickly, which means that the PI→II probability goes rapidly down to zero. [There are secondary and tertiary ‘bumps’ because of interference of amplitudes, but they’re insignificant.] As evidenced from the graph, the cut-off point is ω − ω0 = 2π/T, which we can re-write as 2π·f − 2π·f0 = 2π/T, which is equivalent to writing: (f − f0)/f0 = 1/(f0·T). Now, we know that f0 = 23.79 GHz, but what’s T in this expression? Well… This time around it actually is the time that our ammonia molecules spend in the resonant cavity, from going in to going out, which Feynman says is of the order of a millisecond—so that’s much more reasonable than those 40 picoseconds we calculated. So 1/(f0·T) = 1/(23.79×10^9 × 1×10^−3) ≈ 0.042×10^−6 = 42×10^−9, i.e. 42 billionths indeed, which Feynman rounds to “five parts in 10^8”, i.e. five parts in a hundred million.
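That tolerance calculation is a one-liner, really:

```python
f0 = 23.79e9   # Hz: the molecular transition frequency
T = 1e-3       # s: time a molecule spends in the cavity (Feynman's 'one millisecond')

relative_tolerance = 1 / (f0 * T)  # (f − f0)/f0 at the cut-off: ≈ 42 parts per billion
```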

In short, the frequency must be ‘just right’, so as to get a significant transition probability and, therefore, get some net energy out of our maser, which, of course, will come out of our cavity as microwave radiation of the same frequency. Now, that’s how one of the first ‘atomic’ clocks was built: the maser was the equivalent of a resonant circuit, and one could keep it going with little energy, because it’s so good as a resonant circuit. However, in order to get some net energy out of the system, in the form of microwave radiation of, yes, the ammonia frequency, the applied frequency had to be exactly right. To be precise, the applied frequency ω has to match the ω0 frequency, i.e. the molecular resonance frequency, with a precision expressed in billionths. As mentioned above, the power output is very limited, but it’s real: it comes out through the ‘output waveguide’ in the illustration above or, as the Encyclopædia Britannica puts it: “Output is obtained by allowing some radiation to escape through a small hole in the resonator.” 🙂

In any case, a maser is not built to produce huge amounts of power. On the contrary, the state selector obviously consumes more power than comes out of the cavity, so it’s not some generator. Its main use nowadays is as a clock indeed, and it’s that simple really: if there’s no output, then the ‘clock’ doesn’t work.

It’s an interesting topic, but you can read more about it yourself. I’ll just mention that, while the ammonia maser was effectively used as a timekeeping device, the next generation of atomic clocks was based on the hydrogen maser, which was introduced in 1960. The principle is the same. Let me quote the Encyclopædia Britannica on it: “Its output is a radio wave, whose frequency of 1,420,405,751.786 hertz (cycles per second) is reproducible with an accuracy of one part in 30×10^12. A clock controlled by such a maser would not get out of step more than one second in 100,000 years.”

So… Well… Not bad. 🙂 Of course, one needs another clock to check if one’s clock is still accurate, and so that’s what’s done internationally: national standards agencies in various countries maintain a network of atomic clocks which are intercompared and kept synchronized. So these clocks define a continuous and stable time scale, collectively, which is referred to as the International Atomic Time (TAI, from the French Temps Atomique International).

Well… That’s it for today. I hope you enjoyed it.

Post scriptum:

When I say the ammonia molecule just dumps that energy 2A + μ²ε0²/A into the electric field, and that’s “the end of the story”, then I am simplifying, of course. The ammonia molecule still has two energy levels, separated by an energy difference of 2A and, obviously, it keeps its electric dipole moment, so that continues to play a role as long as we’ve got an electric field in the cavity. In fact, the ammonia molecule has a high polarizability coefficient, which means it’s highly sensitive to the electric field inside of the cavity. So, yes, the molecules will continue ‘dancing’ to the beat of the field indeed, absorbing and releasing energy, in accordance with those 2A and με0 values, and so the probability curves do remain relevant—of course! However, we talked net energy going into the field, and so that’s where the ‘end of story’ story comes in. I hope I managed to make that clear.

In fact, there are lots of other complications as well, and Feynman mentions them briefly in his account of things. But let’s keep things simple here. 🙂 Also, if you’d want to know how we get that PI→II(ω)/PI→II(ω0) ratio, check it out in Feynman. However, I have to warn you: the math involved is not easy. Not at all, really. The set of differential equations that’s involved is complicated, and it takes a while to understand why Feynman uses the trial functions he uses. So the solution that comes out, i.e. those simple PI = cos²[(με0/ħ)·t] and PII = sin²[(με0/ħ)·t] functions, makes sense—but, if you check it out, you’ll see the whole mathematical argument is rather complicated. That’s just how it is, I am afraid. 🙂

# Quantum math revisited

It’s probably good to review the concepts we’ve learned so far. Let’s start with the foundation of all of our math, i.e. the concept of the state, or the state vector. [The difference between the two concepts is subtle but real. I’ll come back to it.]

#### State vectors and base states

We used Dirac’s bra-ket notation to denote a state vector, in general, as | ψ 〉. The obvious question is: what is this thing? We called it a vector because we use it like a vector: we multiply it with some number, and then add it to some other vector. So that’s just what you did in high school, when you learned about real vector spaces. In this regard, it is good to remind you of the definition of a vector space. To put it simply, it is a collection of objects called vectors, which may be added together, and multiplied by numbers. So we have two things here: the ‘objects’, and the ‘numbers’. That’s why we’d say that we have some vector space over a field of numbers. [The term ‘field’ just refers to an algebraic structure, so we can add and multiply and what have you.] Of course, what it means to ‘add’ two ‘objects’, and what it means to ‘multiply’ an object with a number, depends on the type of objects and, unsurprisingly, the type of numbers.

Huh? The type of number?! A number is a number, no?

No, hombre, no! We’ve got natural numbers, rational numbers, real numbers, complex numbers—and you’ve probably heard of quaternions too – and, hence, ‘multiplying’ a ‘number’ with ‘something else’ can mean very different things. At the same time, the general idea is the general idea, so that’s the same, indeed. 🙂 When using real numbers and the kind of vectors you are used to (i.e. Euclidean vectors), then the multiplication amounts to a re-scaling of the vector, and so that’s why a real number is often referred to as a scalar. At the same time, anything that can be used to multiply a vector is often referred to as a scalar in math so… Well… Terminology is often quite confusing. In fact, I’ll give you some more examples of confusing terminology in a moment. But let’s first look at our ‘objects’ here, i.e. our ‘vectors’.

I did a post on Euclidean and non-Euclidean vector spaces two years ago, when I started this blog, but state vectors are obviously very different ‘objects’. They don’t resemble the vectors we’re used to. We’re used to so-called polar vectors, aka real vectors, like the position vector (x or r), or the momentum vector (p = m·v), or the electric field vector (E). We are also familiar with the so-called pseudo-vectors, aka axial vectors, like angular momentum (L = r×p), or the magnetic dipole moment. [Unlike what you might think, not all vector cross products yield a pseudo-vector. For example, the cross-product of a polar and an axial vector yields a polar vector.] But here we are talking some very different ‘object’. In math, we say that state vectors are elements in a Hilbert space. So a Hilbert space is a vector space but… Well… With special vectors. 🙂

The key to understanding why we’d refer to states as state vectors is the fact that, just like Euclidean vectors, we can uniquely specify any element in a Hilbert space with respect to a set of base states. So it’s really like using Cartesian coordinates in a two- or three-dimensional Euclidean space. The analogy is complete because, even in the absence of a geometrical interpretation, we’ll require those base states to be orthonormal. Let me be explicit on that by reminding you of your high-school classes on vector analysis: you’d choose a set of orthonormal base vectors e1, e2, and e3, and you’d write any vector A as:

A = (Ax, Ay, Az) = Ax·e1 + Ay·e2 + Az·e3 with ei·ej = 1 if i = j, and ei·ej = 0 if i ≠ j

The ei·ej = 1 if i = j and ei·ej = 0 if i ≠ j condition expresses the orthonormality condition: the base vectors need to be orthogonal unit vectors. We wrote it as ei·ej = δij using the Kronecker delta (δij = 1 if i = j, and 0 if i ≠ j). Now, base states in quantum mechanics do not necessarily have a geometrical interpretation. Indeed, although one often can actually associate them with some position or direction in space, the condition of orthonormality applies in the mathematical sense of the word only. Denoting the base states by i = 1, 2,… – or by Roman numerals, like I and II – so as to distinguish them from the Greek ψ or φ symbols we use to denote state vectors in general, we write the orthonormality condition as follows:

〈 i | j 〉 = δij, with δij = δji equal to 1 if i = j, and zero if i ≠ j

Now, you may grumble and say: that 〈 i | j 〉 bra-ket does not resemble the ei·ej product. Well… It does and it doesn’t. I’ll show why in a moment. First note how we uniquely specify state vectors in general in terms of a set of base states. For example, if we have two possible base states only, we’ll write:

| φ 〉 = | 1 〉 C1 + | 2 〉 C2

Or, if we chose some other set of base states | I 〉 and | II 〉, we’ll write:

| φ 〉 = | I 〉 CI + | II 〉 CII

You should note that the | 1 〉 C1 term in the | φ 〉 = | 1 〉 C1 + | 2 〉 C2 sum is really like the Ax·e1 product in the A = Ax·e1 + Ay·e2 + Az·e3 expression. In fact, you may actually reverse the order and write it as C1·| 1 〉, or just C1| 1 〉 without the dot. However, that’s not common practice and so I won’t do that, except occasionally. So you should look at | 1 〉 C1 as a product indeed: it’s the product of a base state and a complex number, so it’s really like m·v, or whatever other product of some scalar and some vector, except that we’ve got a complex scalar here. […] Yes, I know the term ‘complex scalar’ doesn’t make sense, but I hope you know what I mean. 🙂
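To make the analogy tangible, here is a minimal pure-Python sketch (the values of C1 and C2 are made up, chosen only so that the state is normalized) that builds | φ 〉 = | 1 〉 C1 + | 2 〉 C2 as a linear combination of two base ‘column vectors’ with complex ‘scalars’:

```python
import math

# Base states |1> and |2> as orthonormal 'column vectors'
basis_1 = [1 + 0j, 0 + 0j]
basis_2 = [0 + 0j, 1 + 0j]

# Two made-up complex coefficients, normalized so |C1|^2 + |C2|^2 = 1
C1 = (1 + 1j) / 2
C2 = (1 - 1j) / 2

# |phi> = |1> C1 + |2> C2: a linear combination with complex 'scalars'
phi = [b1 * C1 + b2 * C2 for b1, b2 in zip(basis_1, basis_2)]
print(phi)   # [(0.5+0.5j), (0.5-0.5j)]

norm = sum(abs(c) ** 2 for c in phi)
print(math.isclose(norm, 1.0))   # True
```

The coefficients are just the ‘coordinates’ of the state with respect to the chosen base, exactly like Ax, Ay and Az are the coordinates of A.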

More generally, we write:

| ψ 〉 = ∑ | i 〉 Ci, with Ci = 〈 i | ψ 〉

Writing our state vector | ψ 〉, | φ 〉 or | χ 〉 like this also defines these coefficients or coordinates Ci. Unlike our state vectors, or our base states, Ci is an actual number. It has to be, of course: it’s the complex number that makes sense of the whole expression. To be precise, Ci is an amplitude, or a wavefunction, i.e. a function depending on both space and time. In our previous posts, we limited the analysis to amplitudes varying in time only, and we’ll continue to do so for a while. However, at some point, you’ll get the full picture.

Now, what about the supposed similarity between the 〈 i | j〉 bra-ket and the ei·ej product? Let me invoke what Feynman, tongue-in-cheek as usual, refers to as the Great Law of Quantum Mechanics:

| = ∑ | i 〉〈 i |, over all base states i

You get this by taking | ψ 〉 out of the | ψ 〉 = ∑| i 〉〈 i | ψ 〉 expression. And, no, don’t say: what nonsense! Because… Well… Dirac’s notation really is that simple and powerful! You just have to read it from right to left. There’s an order to the symbols, unlike what you’re used to in math, because you’re used to operations that are commutative. But I need to move on. The upshot is that we can specify our base states in terms of the base states too. For example, if we have only two base states, let’s say I and II, then we can write:

| I 〉 = ∑| i 〉〈 i | I 〉 = 1·| I 〉 + 0·| II 〉 and | II 〉 = ∑| i 〉〈 i | II 〉 = 0·| I 〉 + 1·| II 〉

We can write this using a matrix notation:

Now that is silly, you’ll say. What’s the use of this? It doesn’t tell us anything new, and it also does not show us why we should think of the 〈 i | j 〉 bra-ket and the ei·ej product as being similar! Well… Yes and no. Let me show you something else. Let’s assume we’ve got two states, χ and φ, which we specify in terms of our chosen set of base states as | χ 〉 = ∑ | i 〉 Di and | φ 〉 = ∑ | i 〉 Ci respectively. Now, from our post on quantum math, you’ll remember that 〈 χ | i 〉 and 〈 i | χ 〉 are each other’s complex conjugates, so we know that 〈 χ | i 〉 = 〈 i | χ 〉* = Di*. So if we have all Ci = 〈 i | φ 〉 and all Di = 〈 i | χ 〉, i.e. the ‘components’ of both states in terms of our base states, then we can calculate 〈 χ | φ 〉 – i.e. the amplitude to go from state φ to state χ – as:

〈 χ | φ 〉 = ∑〈 χ | i 〉〈 i | φ 〉 = ∑ Di*·Ci = ∑ Di*·〈 i | φ 〉

We can now scrap | φ 〉 in this expression – yes, it’s the power of Dirac’s notation once more! – so we get:

Now, we can re-write this using a matrix notation:

[I assumed that we have three base states now, so as to make the example somewhat less obvious. Please note we can never leave one of the base states out when specifying a state vector, so it’s not like the previous example was not complete. I’ll switch from two-state to three-state systems and back again all the time, so as to show the analysis is pretty general. To visualize things, think of the ammonia molecule as an example of a two-state system, and of the spin states of a spin-one particle as an example of a three-state system. OK. Let’s get back to the lesson.]
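The 〈 χ | φ 〉 = ∑ Di*·Ci recipe is easy to try out numerically. Here’s a minimal pure-Python sketch (the coefficient values are made up, chosen only so that both states are normalized):

```python
C = [0.6 + 0j, 0.8j]                             # Ci = <i|phi>: |0.6|^2 + |0.8|^2 = 1
D = [(1 + 0j) / 2 ** 0.5, (1 + 0j) / 2 ** 0.5]   # Di = <i|chi>, also normalized

# <chi|phi> = sum over i of <chi|i><i|phi> = sum of Di* times Ci
amplitude = sum(d.conjugate() * c for d, c in zip(D, C))
print(amplitude)

# <phi|chi> is the complex conjugate of <chi|phi>
reverse = sum(c.conjugate() * d for c, d in zip(C, D))
print(abs(reverse - amplitude.conjugate()) < 1e-12)  # True
```

Note how the conjugation sits on the D coefficients only: that’s what distinguishes the bra-ket from an ordinary dot product.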

You’ll say: so what? Well… Look at this:

I just combined the notations for 〈 I | and | III 〉. Can you now see the similarity between the 〈 i | j 〉 bra-ket and the ei·ej product? It really is the same: you just need to respect the subtleties in regard to writing the 〈 i | and | j 〉 vectors, or the ei and ej vectors, as a row vector or a column vector respectively.
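In code, the row-vector-times-column-vector reading is literal: the bra is the conjugate transpose of the ket, and the bra-ket of two base states reproduces the Kronecker delta, just like ei·ej does. A pure-Python sketch for three base states:

```python
# Base states |I>, |II>, |III> as kets (column vectors)
kets = {
    "I":   [1 + 0j, 0 + 0j, 0 + 0j],
    "II":  [0 + 0j, 1 + 0j, 0 + 0j],
    "III": [0 + 0j, 0 + 0j, 1 + 0j],
}

def bra(ket):
    """The bra <x| is the conjugate transpose of the ket |x>: a row vector."""
    return [c.conjugate() for c in ket]

def braket(x, y):
    """<x|y> as a row-vector-times-column-vector product."""
    return sum(b * k for b, k in zip(bra(kets[x]), kets[y]))

print(braket("I", "III"))   # 0j      -- orthogonal, just like e1.e3 = 0
print(braket("II", "II"))   # (1+0j)  -- a unit 'length', just like e2.e2 = 1
```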

It doesn’t stop here, of course. When learning about vectors in high school, we also learned that we could go from one set of base vectors to another by a transformation, such as, for example, a rotation, or a translation. We showed how a rotation worked in one of our posts on two-state systems, where we wrote:

So we’ve got that transformation matrix, which, of course, isn’t random. To be precise, we got the matrix equation above (note that we’re back to two states only, so as to simplify) because we defined the CI and CII coefficients in the | φ 〉 = | I 〉 CI + | II 〉 CII = | 1 〉 C1 + | 2 〉 C2 expression as follows:

• CI = 〈 I | φ 〉 = (1/√2)·(C1 − C2)
• CII = 〈 II | φ 〉 = (1/√2)·(C1 + C2)

The (1/√2) factor is there because of the normalization condition, and the two-by-two matrix equals the transformation matrix for a rotation of a state filtering apparatus about the y-axis, over an angle equal to (minus) 90 degrees, which we wrote as:
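Those two defining relations amount to a change of base that preserves the total probability, and that is easy to check numerically: whatever (normalized) C1 and C2 we start from, the transformed coefficients carry the same total probability. A minimal sketch, with made-up coefficients:

```python
import math

# A made-up state in the |1>, |2> base, normalized: |C1|^2 + |C2|^2 = 1
C1, C2 = 0.8 + 0j, 0.6j

# Change of base to |I> and |II>, using the two defining relations above
CI  = (C1 - C2) / math.sqrt(2)
CII = (C1 + C2) / math.sqrt(2)

# Going to another set of base states must preserve the total probability
before = abs(C1) ** 2 + abs(C2) ** 2
after  = abs(CI) ** 2 + abs(CII) ** 2
print(math.isclose(before, after))  # True
```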

I promised I’d say something more about confusing terminology so let me do that here. We call a set of base states a ‘representation‘, and writing a state vector in terms of a set of base states is often referred to as a ‘projection‘ of that state into the base set. Again, we can see it’s sort of a mathematical projection, rather than a geometrical one. But it makes sense. In any case, that’s enough on state vectors and base states.

Let me wrap it up by inserting one more matrix equation, which you should be able to reconstruct yourself:

The only thing we’re doing here is to substitute 〈 χ | and | φ 〉 for ∑ Dj*〈 j | and ∑ | i 〉 Ci respectively. All the rest follows. Finally, I promised I’d tell you the difference between a state and a state vector. It’s subtle and, in practice, the two concepts refer to the same thing. However, we write a state as a state, like ψ or, if it’s a base state, like I, or ‘up’, or whatever. When we say a state vector, then we think of a set of numbers. It may be a row vector, like the 〈 χ | row vector with the Di* coefficients, or a column vector, like the | φ 〉 column vector with the Ci coefficients. So if we say vector, then we think of a one-dimensional array of numbers, while the state itself is… Well… The state: some reality in physics. So you might define the state vector as the set of numbers that describes the state. While the difference is subtle, it’s important. It’s also important to note that the 〈 χ | and | χ 〉 state vectors are different too. The former appears as the final state in an amplitude, while the latter describes the starting condition. The former is referred to as a bra in the 〈 χ | φ 〉 bra-ket, while the latter is a ket in the 〈 φ | χ 〉 = 〈 χ | φ 〉* amplitude. 〈 χ | is a row vector equal to ∑ Di*〈 i |, while | χ 〉 = ∑ | i 〉 Di. So it’s quite different. More in general, we’d define bras and kets as row and column vectors respectively, so we write:

That makes it clear that a bra next to a ket is to be understood as a matrix multiplication. From what I wrote, it is also obvious that the conjugate transpose (which is also known as the Hermitian conjugate) of a bra is the corresponding ket and vice versa, so we write:

Let me formally define the conjugate or Hermitian transpose here: the conjugate transpose of an m-by-n matrix A with complex elements is the n-by-m matrix A† obtained from A by taking the transpose (so we write the rows as columns and vice versa) and then taking the complex conjugate of each element (i.e. we switch the sign of the imaginary part of the complex number). A† is read as ‘A dagger’, but mathematicians will usually denote it by A*. In fact, there are a lot of equivalent notations, as we can write:
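That recipe – write the rows as columns, then conjugate each element – is a one-liner in code. A small sketch with a made-up 2-by-3 complex matrix:

```python
def dagger(A):
    """Conjugate (Hermitian) transpose: write rows as columns, then conjugate."""
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]

A = [[1 + 2j, 3 - 1j, 0 + 0j],
     [0 + 1j, 2 + 0j, 5 + 5j]]   # a made-up 2-by-3 complex matrix

print(dagger(A))                 # the 3-by-2 conjugate transpose
print(dagger(dagger(A)) == A)    # True: applying the dagger twice gives A back
```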

OK. That’s it on this.

One more thing, perhaps. We’ll often have states, or base states, that make sense, in a physical sense, that is. But it’s not always the case: we’ll sometimes use base states that may not represent some situation we’re likely to encounter, but that make sense mathematically. We gave the example of the ‘mathematical’ | I 〉 and | II 〉 base states, versus the ‘physical’ | 1 〉 and | 2 〉 base states, in our post on the ammonia molecule, so I won’t say more about this here. Do keep it in mind though. Sometimes it may feel like nothing makes sense, physically, but it usually does mathematically and, therefore, all usually comes out alright in the end. 🙂 To be precise, what we did there was to choose base states with an unambiguous, i.e. a definite, energy level. That made our calculations much easier, and the end result was the same, indeed!

So… Well… I’ll let this sink in, and move on to the next topic.

#### The Hamiltonian operator

In my post on the Hamiltonian, I explained that those Ci and Di coefficients are usually a function of time, and how they can be determined. To be precise, they’re determined by a set of differential equations (i.e. equations involving a function and the derivative of that function) which we wrote as:

If we have two base states only, then this set of equations can be written as:

Two equations and two functions – C1 = C1(t) and C2 = C2(t) – so we should be able to solve this thing, right? Well… No. We don’t know those Hij coefficients. As I explained in that post, they may also evolve in time, so we should write them as Hij(t) instead of Hij tout court, and so that messes the whole thing up. We have two equations and six functions really. Of course, there’s always a way out, but I won’t dwell on that here – not now at least. What I want to do here is look at the Hamiltonian as an operator.
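The set of equations iħ·dCi/dt = ∑ Hij·Cj can be integrated numerically once we assume the Hij are constant. The sketch below (pure Python, with ħ = 1, diagonal elements set to zero and off-diagonal elements −A, as in the ammonia model) steps the two amplitudes forward with a Runge-Kutta rule and recovers the familiar cos² probability of the two-state system:

```python
import math

# Sketch: integrate i*hbar*dCi/dt = sum_j Hij*Cj for a two-state system,
# ASSUMING constant coefficients (hbar = 1, H11 = H22 = 0, H12 = H21 = -A).
# With these values, the analytic solution gives P1(t) = cos^2(A*t/hbar).
A = 1.0
H = [[0.0, -A],
     [-A, 0.0]]

def derivative(C):
    # dCi/dt = -i * sum_j Hij * Cj   (hbar = 1)
    return [-1j * sum(H[i][j] * C[j] for j in range(2)) for i in range(2)]

C = [1 + 0j, 0 + 0j]   # start in state |1>
t, dt = 0.0, 1e-3
while t < 1.0:
    # one classical fourth-order Runge-Kutta step
    k1 = derivative(C)
    k2 = derivative([c + dt / 2 * k for c, k in zip(C, k1)])
    k3 = derivative([c + dt / 2 * k for c, k in zip(C, k2)])
    k4 = derivative([c + dt * k for c, k in zip(C, k3)])
    C = [c + dt / 6 * (a + 2 * b + 2 * e + f)
         for c, a, b, e, f in zip(C, k1, k2, k3, k4)]
    t += dt

print(abs(C[0]) ** 2)   # probability to still be in state |1>
print(math.isclose(abs(C[0]) ** 2, math.cos(A * t) ** 2, rel_tol=1e-6))  # True
```

So even without an analytic solution, the differential equations are perfectly usable; the real difficulty is knowing the Hij in the first place.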

We introduced operators – but not very rigorously – when explaining the Hamiltonian. We did so by ‘expanding’ our 〈 χ | φ 〉 amplitude as follows. We’d say the amplitude to find a ‘thing’ – like a particle, for example, or some system of particles or other things – in some state χ at the time t = t2, when it was in some state φ at the time t = t1, was equal to:

Now, a formula like this only makes sense because we’re ‘abstracting away’ from the base states, which we need to describe any state. Hence, to actually describe what’s going on, we have to choose some representation and expand this expression as follows:

That looks pretty monstrous, so we should write it all out. Using the matrix notation I introduced above, we can do that – let’s take a practical example with three base states once again – as follows:

Now, this still looks pretty monstrous, but just think of it. We’re just applying that ‘Great Law of Quantum Physics’ here, i.e. | = ∑ | i 〉〈 i | over all base states i. To be precise, we apply it to a 〈 χ | A | φ 〉 expression, and we do so twice, so we get:

Nothing more, nothing less. 🙂 Now, the idea of an operator is the result of being creative: we just drop the 〈 χ | state from the expression above to write:

Yes. I know. That’s a lot to swallow, but you’ll see it makes sense because of the Great Law of Quantum Mechanics:

Just think about it and continue reading when you’re ready. 🙂 The upshot is: we now think of the particle entering some ‘apparatus’ A in the state φ and coming out of A in some state χ. Or, looking at A as an operator, we can generalize this. As Feynman puts it:

“The symbol A is neither an amplitude, nor a vector; it is a new kind of thing called an operator. It is something which “operates on” a state to produce a new state.”

Back to our Hamiltonian. Let’s go through the same process of ‘abstraction’. Let’s first re-write that ‘Hamiltonian equation’ as follows:

The Hij(t) are amplitudes indeed, and we can represent them in a matrix, with Hij(t) = 〈 i | H(t) | j 〉! Now let’s take the first step in our ‘abstraction process’: let’s scrap the 〈 i | bit. We get:

We can, of course, also abstract away from the | j 〉 bit, so we get:

Look at this! The right-hand side of this expression is exactly the same as that A | χ 〉 format we presented when introducing the concept of an operator. [In fact, when I say you should ‘abstract away’ from the | j 〉 bit, then you should think of the ‘Great Law’ and that matrix notation above.] So H is an operator and, therefore, it’s something which operates on a state to produce a new state.

OK. Clear enough. But what’s that ‘state’ on the left-hand side? I’ll just paraphrase Feynman here, who says we should think of it as follows: “The time derivative of the state vector |ψ〉 times iħ is equal to what you get by (1) operating with the Hamiltonian operator H on each base state, (2) multiplying by the amplitude that ψ is in the state j (i.e. 〈j|ψ〉), and (3) summing over all j.” Alternatively, you can also say: “The time derivative, times iħ, of a state |ψ〉 is equal to what you get if you operate on it with the Hamiltonian.” Of course, that’s true for any state, so we can ‘abstract away’ the |ψ〉 bit too and, putting a little hat (^) over the operator to remind ourselves that it’s an operator (rather than just any matrix), we get the Hamiltonian operator equation:

Now, that’s all nice and great, but the key question, of course, is: what can you do with this? Well… It turns out this Hamiltonian operator is useful to calculate lots of stuff. In the first place, of course, it’s a useful operator in the context of those differential equations describing the dynamics of a quantum-mechanical system. When everything is said and done, those equations are the equivalent, in quantum physics, of the law of motion in classical physics. [And I am not joking here.]

In addition, the Hamiltonian operator also has other uses. The one I should really mention here is that you can calculate the average or expected value (EV[E]) of the energy of a state ψ (i.e. any state, really) by first operating on | ψ 〉 with the Hamiltonian, and then multiplying 〈 ψ | with the result. That sounds a bit complicated, but you’ll understand it when seeing the mathematical expression, which we can write as:

The formula is pretty straightforward. [If you don’t think so, then just write it all out using the matrix notation.] But you may wonder how it works exactly… Well… Sorry. I don’t want to copy all of Feynman here, so I’ll refer you to him on this. In fact, the proof of this formula is actually very straightforward, and so you should be able to get through it with the math you got here. You may even understand Feynman’s illustration of it for the ‘special case’ when base states are, indeed, those mathematically convenient base states with definite energy levels.
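In lieu of copying the proof, here’s a quick numerical sketch of the recipe itself – operate on | ψ 〉 with the Hamiltonian, then multiply 〈 ψ | with the result – using a made-up Hermitian 2-by-2 matrix, so not any particular physical Hamiltonian:

```python
import math

# E_av = <psi|H|psi>: operate on |psi> with H, then multiply <psi| with the result
H = [[1.0, 0.5],
     [0.5, 1.0]]   # a made-up Hermitian 2-by-2 Hamiltonian matrix (in some energy unit)
psi = [(1 + 0j) / math.sqrt(2), (1 + 0j) / math.sqrt(2)]   # a normalized state

# Step 1: H|psi> -- a matrix times a column vector
H_psi = [sum(H[i][j] * psi[j] for j in range(2)) for i in range(2)]
# Step 2: <psi| times the result -- conjugate the coefficients of |psi>
E_av = sum(p.conjugate() * hp for p, hp in zip(psi, H_psi))

print(E_av.real)   # close to 1.5: this psi happens to be an energy eigenstate here
```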

Have fun with it! 🙂

Post scriptum on Hilbert spaces:

As mentioned above, our state vectors are actually functions. To be specific, they are wavefunctions, i.e. periodic functions, evolving in space and time, so we usually write them as ψ = ψ(x, t). Our ‘Hilbert space’, i.e. our collection of state vectors, is, therefore, often referred to as a function space. So it’s a set of functions. At the same time, it is a vector space too, because we have those addition and multiplication operations, so our function space has the algebraic structure of a vector space. As you can imagine, there are some mathematical conditions for a space or a set of objects to ‘qualify’ as a Hilbert space, and the epithet itself comes with a lot of interesting properties. One of them is completeness, which is a property that allows us to jot down those differential equations that describe the dynamics of a quantum-mechanical system. However, as you can find whatever you’d need or want to know about those mathematical properties on the Web, I won’t get into it. The important thing here is to understand the concept of a Hilbert space intuitively. I hope this post has helped you in that regard, at least. 🙂

# Re-visiting uncertainty…

I re-visited the Uncertainty Principle a couple of times already, but here I really want to get to the bottom of it. What’s uncertain? The energy? The time? The wavefunction itself? These questions are not easily answered, and I need to warn you: you won’t get much wiser when you’re finished reading this. I just felt like freewheeling a bit. [Note that the first part of this post repeats what you’ll find on the Occam page, or my post on Occam’s Razor. But those posts do not analyze uncertainty, which is what I will be trying to do here.]

Let’s first think about the wavefunction itself. It’s tempting to think it actually is the particle, somehow. But it isn’t. So what is it then? Well… Nobody knows. In my previous post, I said I like to think it travels with the particle, but that doesn’t make much sense either. It’s like a fundamental property of the particle. Like the color of an apple. But where is that color? In the apple, in the light it reflects, in the retina of our eye, or is it in our brain? If you know a thing or two about how perception actually works, you’ll tend to agree the quality of color is not in the apple. When everything is said and done, the wavefunction is a mental construct: when learning physics, we start to think of a particle as a wavefunction, but they are two separate things: the particle is reality, the wavefunction is imaginary.

But that’s not what I want to talk about here. It’s about that uncertainty. Where is the uncertainty? You’ll say: you just said it was in our brain. No. I didn’t say that. It’s not that simple. Let’s look at the basic assumptions of quantum physics:

1. Quantum physics assumes there’s always some randomness in Nature and, hence, we can measure probabilities only. We’ve got randomness in classical mechanics too, but this is different. This is an assumption about how Nature works: we don’t really know what’s happening. We don’t know the internal wheels and gears, so to speak, or the ‘hidden variables’, as one interpretation of quantum mechanics would say. In fact, the most commonly accepted interpretation of quantum mechanics says there are no ‘hidden variables’.
2. However, as Shakespeare has one of his characters say: there is a method in the madness, and the pioneers – I mean Werner Heisenberg, Louis de Broglie, Niels Bohr, Paul Dirac, etcetera – discovered that method: all probabilities can be found by taking the square of the absolute value of a complex-valued wavefunction (often denoted by Ψ), whose argument, or phase (θ), is given by the de Broglie relations ω = E/ħ and k = p/ħ. The generic functional form of that wavefunction is:

Ψ = Ψ(x, t) = a·e−i·θ = a·e−i·(ω·t − k∙x) = a·e−i·[(E/ħ)·t − (p/ħ)∙x]

That should be obvious by now, as I’ve written more than a dozen posts on this. 🙂 I still have trouble interpreting this, however – and I am not ashamed, because the Great Ones I just mentioned had trouble with that too. It’s not that complex exponential. That e−i·φ is a very simple periodic function, consisting of two sine waves rather than just one, as illustrated below. [It’s a sine and a cosine, but they’re the same function: there’s just a phase difference of 90 degrees.]

No. To understand the wavefunction, we need to understand those de Broglie relations, ω = E/ħ and k = p/ħ, and then, as mentioned, we need to understand the Uncertainty Principle. We need to understand where it comes from. Let’s try to go as far as we can by making a few remarks:

• Adding or subtracting two terms in math, as in (E/ħ)·t − (p/ħ)∙x, implies the two terms should have the same dimension: we can only add apples to apples, and oranges to oranges. We shouldn’t mix them. Now, the (E/ħ)·t and (p/ħ)·x terms are actually dimensionless: they are pure numbers. So that’s even better. Just check it: energy is expressed in newton·meter (energy, or work, is force times distance, remember?) or electronvolts (1 eV = 1.6×10−19 J = 1.6×10−19 N·m); Planck’s constant, as the quantum of action, is expressed in J·s or eV·s; and the unit of (linear) momentum is 1 N·s = 1 kg·m/s. E/ħ gives a number expressed per second, and p/ħ a number expressed per meter. Therefore, multiplying E/ħ and p/ħ by t and x respectively gives us a dimensionless number indeed.
• It’s also an invariant number, which means we’ll always get the same value for it, regardless of our frame of reference. As mentioned above, that’s because the four-vector product pμxμ = E·t − px is invariant: it doesn’t change when analyzing a phenomenon in one reference frame (e.g. our inertial reference frame) or another (i.e. in a moving frame).
• Now, Planck’s quantum of action, h, or ħ – h and ħ only differ by the 2π factor: h goes with the frequency f, measured in cycles per second, while ħ goes with the angular frequency ω, measured in radians per second; both assume we can at least measure one cycle – is the quantum of energy really. Indeed, if “energy is the currency of the Universe”, and it’s real and/or virtual photons that are exchanging it, then it’s good to know the currency unit is h, i.e. the energy that’s associated with one cycle of a photon. [In case you want to see the logic of this, see my post on the physical constants c, h and α.]
• It’s not only time and space that are related: E and p are related too, of course! They are related through the classical velocity of the particle that we’re looking at: E/p = c2/v and, therefore, we can write: E·β = p·c, with β = v/c, i.e. the relative velocity of our particle, as measured as a ratio of the speed of light. Now, I should add that the E·t − p·x product is invariant only if we measure time and space in equivalent units. Otherwise, we have to write c·t rather than t. If we do that – so our unit of distance becomes the distance traveled by light in one second, or our unit of time becomes the time that is needed for light to travel one meter – then c = 1, and the E·β = p·c relation becomes E·β = p, which we also write as β = p/E: the ratio of the energy and the momentum of our particle is its (relative) velocity.
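Those two relations, E/p = c²/v and E·β = p·c, are easy to verify numerically from the relativistic expressions for energy and momentum. A quick check (the particle mass is illustrative; I used the electron):

```python
import math

c = 299792458.0        # speed of light, m/s
m = 9.10938e-31        # electron rest mass, kg (illustrative particle)
beta = 0.6             # relative velocity v/c
gamma = 1 / math.sqrt(1 - beta ** 2)

E = gamma * m * c ** 2        # total energy
p = gamma * m * (beta * c)    # momentum

print(math.isclose(E / p, c ** 2 / (beta * c)))  # True: E/p = c^2/v
print(math.isclose(E * beta, p * c))             # True: E*beta = p*c
```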

Combining all of the above, we may want to assume that we are measuring energy and momentum in terms of the Planck constant, i.e. the ‘natural’ unit for both. In addition, we may also want to assume that we’re measuring time and distance in equivalent units. Then the equation for the phase of our wavefunctions reduces to:

θ = (ω·t − k ∙x) = E·t − p·x

Now, θ is the argument of a wavefunction, and we can always re-scale such an argument by multiplying or dividing it by some constant. It’s just like writing the argument of a wavefunction as v·t − x or (v·t − x)/v = t − x/v, with v the velocity of the waveform that we happen to be looking at. [In case you have trouble following this argument, please check the post I did for my kids on waves and wavefunctions.] Now, the energy conservation principle tells us the energy of a free particle won’t change. [Just to remind you, a ‘free particle’ means it’s in a ‘field-free’ space, so our particle is in a region of uniform potential.] So we can, in this case, treat E as a constant, and divide E·t − p·x by E, so we get a re-scaled phase for our wavefunction, which I’ll write as:

φ = (E·t − p·x)/E = t − (p/E)·x = t − β·x

Alternatively, we could also look at p as some constant, as there is no variation in potential energy that will cause a change in momentum, and the related kinetic energy. We’d then divide by p and we’d get (E·t − p·x)/p = (E/p)·t − x = t/β − x, which amounts to the same, as we can always re-scale by multiplying it with β, which would again yield the same t − β·x argument.

The point is, if we measure energy and momentum in terms of the Planck unit (I mean: in terms of the Planck constant, i.e. the quantum of energy), and if we measure time and distance in ‘natural’ units too, i.e. we take the speed of light to be unity, then our Platonic wavefunction becomes as simple as:

Φ(φ) = a·e−i·φ = a·e−i·(t − β·x)

This is a wonderful formula, but let me first answer your most likely question: why would we use a relative velocity? Well… Just think of it: when everything is said and done, the whole theory of relativity and, hence, the whole of physics, is based on one fundamental and experimentally verified fact: the speed of light is absolute. In whatever reference frame, we will always measure it as 299,792,458 m/s. That’s obvious, you’ll say, but it’s actually the weirdest thing ever if you start thinking about it, and it explains why those Lorentz transformations look so damn complicated. In any case, this fact legitimately establishes c as some kind of absolute measure against which all speeds can be measured. Therefore, it is only natural indeed to express a velocity as some number between 0 and 1. Now that amounts to expressing it as the β = v/c ratio.

Let’s now go back to that Φ(φ) = a·e−i·φ = a·e−i·(t − β·x) wavefunction. Its temporal frequency ω is equal to one, and its spatial frequency k is equal to β = v/c. It couldn’t be simpler but, of course, we’ve got this remarkably simple result because we re-scaled the argument of our wavefunction using the energy and momentum itself as the scale factor. So, yes, we can re-write the wavefunction of our particle in a particularly elegant and simple form using the only information that we have when looking at quantum-mechanical stuff: energy and momentum, because that’s what everything reduces to at that level.

So… Well… We’ve pretty much explained what quantum physics is all about here. You just need to get used to that complex exponential: e−i·φ = cos(−φ) + i·sin(−φ) = cos(φ) − i·sin(φ). It would have been nice if Nature would have given us a simple sine or cosine function. [Remember the sine and cosine function are actually the same, except for a phase difference of 90 degrees: sin(φ) = cos(π/2 − φ) = cos(φ − π/2). So we can always go from one to the other by shifting the origin of our axis.] But… Well… As we’ve shown so many times already, a real-valued wavefunction doesn’t explain the interference we observe, be it interference of electrons or whatever other particles or, for that matter, the interference of electromagnetic waves itself, which, as you know, we also need to look at as a stream of photons, i.e. light quanta, rather than as some kind of infinitely flexible aether that’s undulating, like water or air.

However, the analysis above does not include uncertainty. That’s as fundamental to quantum physics as de Broglie‘s equations, so let’s think about that now.

#### Introducing uncertainty

Our information on the energy and the momentum of our particle will be incomplete: we’ll write E = E0 ± σE, and p = p0 ± σp. Huh? No ΔE or Δp? Well… It’s the same, really, but I am a bit tired of using the Δ symbol, so I am using the σ symbol here, which denotes a standard deviation of some density function. It underlines the probabilistic, or statistical, nature of our approach.

The simplest model is that of a two-state system, because it involves two energy levels only: E = E0 ± A, with A some constant. Large or small, it doesn’t matter. All is relative anyway. 🙂 We explained the basics of the two-state system using the example of an ammonia molecule, i.e. an NH3 molecule, so it consists of one nitrogen and three hydrogen atoms. We had two base states in this system: ‘up’ or ‘down’, which we denoted as base state | 1 〉 and base state | 2 〉 respectively. This ‘up’ and ‘down’ had nothing to do with the classical or quantum-mechanical notion of spin, which is related to the magnetic moment. No. It’s much simpler than that: the nitrogen atom could be either beneath or, else, above the plane of the hydrogens, with ‘beneath’ and ‘above’ being defined in regard to the molecule’s direction of rotation around its axis of symmetry.

In any case, for the details, I’ll refer you to the post(s) on it. Here I just want to mention the result. We wrote the amplitude to find the molecule in either one of these two states as:

• C1 = 〈 1 | ψ 〉 = (1/2)·e−(i/ħ)·(E0 − A)·t + (1/2)·e−(i/ħ)·(E0 + A)·t
• C2 = 〈 2 | ψ 〉 = (1/2)·e−(i/ħ)·(E0 − A)·t − (1/2)·e−(i/ħ)·(E0 + A)·t

That gave us the following probabilities:

If our molecule can be in two states only, and it starts off in one, then the probability that it will remain in that state will gradually decline, while the probability that it flips into the other state will gradually increase.
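Those two amplitudes are easy to check numerically: |C1|² works out to cos²(A·t/ħ), |C2|² to sin²(A·t/ħ), and the two probabilities add up to one at all times. A pure-Python sketch, with illustrative values for E0 and A:

```python
import cmath
import math

hbar, E0, A = 1.0, 10.0, 1.0   # illustrative values only

def amplitudes(t):
    low  = cmath.exp(-1j * (E0 - A) * t / hbar)   # e^-(i/hbar)*(E0 - A)*t
    high = cmath.exp(-1j * (E0 + A) * t / hbar)   # e^-(i/hbar)*(E0 + A)*t
    C1 = 0.5 * low + 0.5 * high
    C2 = 0.5 * low - 0.5 * high
    return C1, C2

for t in (0.0, 0.5, 1.0, 2.0):
    C1, C2 = amplitudes(t)
    P1, P2 = abs(C1) ** 2, abs(C2) ** 2
    assert math.isclose(P1, math.cos(A * t / hbar) ** 2, abs_tol=1e-12)
    assert math.isclose(P2, math.sin(A * t / hbar) ** 2, abs_tol=1e-12)
    assert math.isclose(P1 + P2, 1.0)   # total probability is conserved
print("P1 = cos^2(A*t/hbar) and P2 = sin^2(A*t/hbar) at all times")
```

Note that E0 drops out of the probabilities entirely: only the difference 2·A between the two levels drives the flipping.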

Now, the point you should note is that we get these time-dependent probabilities only because we’re introducing two different energy levels: E0 + A and E0 − A. [Note they are separated by an amount equal to 2·A, as I’ll use that information later.] If we’d have one energy level only – which amounts to saying that we know it, and that it’s something definite – then we’d just have one wavefunction, which we’d write as:

a·e−i·θ = a·e−(i/ħ)·(E0·t − p·x) = a·e−(i/ħ)·(E0·t)·e(i/ħ)·(p·x)

Note that we can always split our wavefunction in a ‘time’ and a ‘space’ part, which is quite convenient. In fact, because our ammonia molecule stays where it is, it has no momentum: p = 0. Therefore, its wavefunction reduces to:

a·e−i·θ = a·e−(i/ħ)·(E0·t)

As simple as it can be. 🙂 The point is that a wavefunction like this, i.e. a wavefunction that’s defined by a definite energy, will always yield a constant and equal probability, both in time as well as in space. That’s just the math of it: |a·e−i·θ|2 = a2. Always! If you want to know why, you should think of Euler’s formula and Pythagoras’ Theorem: cos2θ + sin2θ = 1. Always! 🙂

That constant probability is annoying, because our nitrogen atom never ‘flips’, and we know it actually does, thereby overcoming an energy barrier: it’s a phenomenon that’s referred to as ‘tunneling’, and it’s real! The probabilities in that graph above are real! Also, if our wavefunction would represent some moving particle, it would imply that the probability to find it somewhere in space is the same all over space, which implies our particle is everywhere and nowhere at the same time, really.

So, in quantum physics, this problem is solved by introducing uncertainty. Introducing some uncertainty about the energy, or about the momentum, is mathematically equivalent to saying that we’re actually looking at a composite wave, i.e. the sum of a finite or potentially infinite set of component waves. So we have the same ω = E/ħ and k = p/ħ relations, but we apply them to energy levels, or to some continuous range of energy levels ΔE. It amounts to saying that our wave function doesn’t have a specific frequency: it now has n frequencies, or a range of frequencies Δω = ΔE/ħ. In our two-state system, n = 2, obviously! So we’ve two energy levels only and so our composite wave consists of two component waves only.
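A quick numerical sketch of that idea: adding two component waves with angular frequencies ω0 ± Δω yields a probability density that is no longer constant but modulated as 4·a²·cos²(Δω·t), i.e. a beat. (The values below are illustrative.)

```python
import cmath
import math

# Two component waves with angular frequencies w0 - dw and w0 + dw
w0, dw, a = 10.0, 0.5, 1.0   # illustrative values

def composite(t):
    return a * cmath.exp(-1j * (w0 - dw) * t) + a * cmath.exp(-1j * (w0 + dw) * t)

# The squared magnitude is no longer constant: |psi|^2 = 4*a^2*cos^2(dw*t)
for t in (0.0, 1.0, 2.0, 3.0):
    print(round(abs(composite(t)) ** 2, 3))
```

A single component wave would print the same number at every t; the sum doesn’t, and that’s precisely the ‘envelope’ the next paragraph talks about.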

We know what that does: it ensures our wavefunction is being ‘contained’ in some ‘envelope’. It becomes a wavetrain, or a kind of beat note, as illustrated below:

[The animation comes from Wikipedia, and shows the difference between the group and phase velocity: the green dot shows the group velocity, while the red dot travels at the phase velocity.]

So… OK. That should be clear enough. Let’s now apply these thoughts to our ‘reduced’ wavefunction

Φ(φ) = a·eiφ = a·ei(t − β·x)

Frankly, I tried to fool you above. If the functional form of the wavefunction is a·e−(i/ħ)·(E·t − p·x), then we can measure E and p in whatever unit we want, including h or ħ, but we cannot re-scale the argument of the function, i.e. the phase θ, without changing the functional form itself. I explained that in that post for my kids on wavefunctions, in which I explained we may represent the same electromagnetic wave by two different functional forms:

F(ct−x) = G(t−x/c)

So F and G represent the same wave, but they are different wavefunctions. In this regard, you should note that the argument of F is expressed in distance units, as we multiply t with the speed of light (so the argument of F is a distance), while the argument of G is expressed in time units (as we divide x by c, i.e. by the distance traveled in one second). But F and G are different functional forms. Just do an example and take a simple sine function: you’ll agree that sin(θ) ≠ sin(θ/c) for all values of θ, except 0. Re-scaling changes the frequency, or the wavelength, and it does so quite drastically in this case. 🙂 Likewise, you can see that e−i·(φ/E) = [e−i·φ]1/E, so that’s a very different function. In short, we were a bit too adventurous above. Now, while we can drop the 1/ħ in the a·e−(i/ħ)·(E·t − p·x) function when measuring energy and momentum in units that are numerically equal to ħ, we’ll just revert to our original wavefunction for the time being, which equals

Ψ(θ) = a·e^(iθ) = a·e^(i·[(E/ħ)·t − (p/ħ)·x])

Let’s now introduce uncertainty once again. The simplest situation is that we have two closely spaced energy levels. In theory, the difference between the two can be as small as ħ, so we’d write: E = E0 ± ħ/2. [Remember what I said about the ± A: it means the difference is 2A.] However, we can generalize this and write: E = E0 ± n·ħ/2, with n = 1, 2, 3,… This does not imply any greater uncertainty – we still have two states only – but just a larger difference between the two energy levels.

Let’s also simplify by looking at the ‘time part’ of our equation only, i.e. a·e^(i·(E/ħ)·t). It doesn’t mean we don’t care about the ‘space part’: it just means that we’re only looking at how our function varies in time, so we just ‘fix’ or ‘freeze’ x. Now, the uncertainty is in the energy but, from a mathematical point of view, it shows up as an uncertainty in the argument of our wavefunction. That argument is, obviously, equal to:

(E/ħ)·t = [(E0 ± n·ħ/2)/ħ]·t = (E0/ħ ± n/2)·t = (E0/ħ)·t ± (n/2)·t

So we can write:

a·e^(i·(E/ħ)·t) = a·e^(i·[(E0/ħ)·t ± (n/2)·t]) = a·e^(i·(E0/ħ)·t)·e^(±i·(n/2)·t)

This is valid for any value of t. What the expression says is that, from a mathematical point of view, introducing uncertainty about the energy is equivalent to introducing uncertainty about the wavefunction itself. It may be equal to a·e^(i·(E0/ħ)·t)·e^(i·(n/2)·t), but it may also be equal to a·e^(i·(E0/ħ)·t)·e^(−i·(n/2)·t). The phases of the e^(i·(n/2)·t) and e^(−i·(n/2)·t) factors are separated by a distance equal to n·t.
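Here’s a small numerical illustration of the two-branch sum (Python, with ħ = 1 and made-up values for E0 and n):

```python
import numpy as np

# Work in units where hbar = 1. Two energy levels E0 +/- n/2 give two
# component waves; their (equal-weight) sum factors into a carrier at
# frequency E0 and a slow envelope at frequency n/2.
E0, n = 20.0, 1.0   # placeholder values
t = np.linspace(0, 30, 5000)

psi = 0.5 * np.exp(1j * (E0 + n / 2) * t) + 0.5 * np.exp(1j * (E0 - n / 2) * t)
factored = np.exp(1j * E0 * t) * np.cos(n / 2 * t)

assert np.allclose(psi, factored)
# The probability density oscillates with the envelope only:
assert np.allclose(np.abs(psi) ** 2, np.cos(n / 2 * t) ** 2)
```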

So… Well…

[…]

Hmm… I am stuck. How is this going to lead me to the ΔE·Δt = ħ/2 principle? To anyone out there: can you help? 🙂

[…]

The thing is: you won’t get the Uncertainty Principle by staring at that formula above. It’s a bit more complicated. The idea is that we have some distribution of the observables, like energy and momentum, and that implies some distribution of the associated frequencies, i.e. ω for E, and k for p. The Wikipedia article on the Uncertainty Principle gives you a formal derivation of the Uncertainty Principle, using the so-called Kennard formulation of it. You can have a look, but it involves a lot of formalism—which is what I wanted to avoid here!

I hope you get the idea though. It’s like statistics. First, we assume we know the population, and then we describe that population using all kinds of summary statistics. But then we reverse the situation: we don’t know the population but we do have sample information, which we also describe using all kinds of summary statistics. Then, based on what we find for the sample, we calculate the estimated statistics for the population itself, like the mean value and the standard deviation, to name the most important ones. So it’s a bit the same here, except that, in quantum mechanics, there may not be any real value underneath: the mean and the standard deviation represent something fuzzy, rather than something precise.

Hmm… I’ll leave you with these thoughts. We’ll develop them further as we dig into all of this much more deeply over the coming weeks. 🙂

Post scriptum: I know you expect something more from me, so… Well… Think about the following. If we have some uncertainty about the energy E, we’ll have some uncertainty about the momentum p according to that β = p/E relation. [By the way, please think about this relationship: it says, all other things being equal (such as the inertia, i.e. the mass, of our particle), that more energy goes along with more momentum. More specifically, note that ∂p/∂E = β according to this equation. In fact, if we include the mass of our particle, i.e. its inertia, as potential energy, then we might say that (1−β)·E is the potential energy of our particle, as opposed to its kinetic energy.] So let’s try to think about that.

Let’s denote the uncertainty about the energy as ΔE. As should be obvious from the discussion above, it can be anything: it can mean two separate energy levels E = E0 ± A, or a potentially infinite set of values. However, even if the set is infinite, we know the various energy levels need to be separated by ħ, at least. So if the set is infinite, it’s going to be a countably infinite set, like the set of natural numbers, or the set of integers. But let’s stick to our example of two values E = E0 ± A only, with A = ħ, so E = E0 + ΔE = E0 ± ħ and, therefore, ΔE = ± ħ. That implies Δp = Δ(β·E) = β·ΔE = ± β·ħ.

Hmm… This is a bit fishy, isn’t it? We said we’d measure the momentum in units of ħ, but here we say the uncertainty in the momentum can actually be a fraction of ħ. […] Well… Yes. Now, the momentum is the product of the mass, as measured by the inertia of our particle to accelerations or decelerations, and its velocity. If we assume the inertia of our particle, or its mass, to be constant – so we say it’s a property of the object that is not subject to uncertainty, which, I admit, is a rather dicey assumption (if all other measurable properties of the particle are subject to uncertainty, then why not its mass?) – then we can also write: Δp = Δ(m·v) = Δ(m·β) = m·Δβ. [Note that we’re not only assuming that the mass is not subject to uncertainty, but also that the velocity is non-relativistic. If not, we couldn’t treat the particle’s mass as a constant.] But let’s be specific here: what we’re saying is that, if ΔE = ± ħ, then Δv = Δβ will be equal to Δβ = Δp/m = ± (β/m)·ħ. The point to note is that we’re no longer sure about the velocity of our particle. Its (relative) velocity is now:

β ± Δβ = β ± (β/m)·ħ

But, because velocity is the ratio of distance over time, this introduces an uncertainty about time and distance. Indeed, if its velocity is β ± (β/m)·ħ, then, over some time T, it will travel some distance X = [β ± (β/m)·ħ]·T. Likewise, if we have some distance X, then our particle will need a time equal to T = X/[β ± (β/m)·ħ].

You’ll wonder what I am trying to say because… Well… If we’d just measure X and T precisely, then all the uncertainty is gone and we know if the energy is E0 + ħ or E0 − ħ. Well… Yes and no. The uncertainty is fundamental – at least that’s what quantum physicists believe – so our uncertainty about the time and the distance we’re measuring is equally fundamental: we can have either of the two values X = [β ± (β/m)·ħ]·T or T = X/[β ± (β/m)·ħ], whenever or wherever we measure. So we have a ΔX and a ΔT that are equal to ± [(β/m)·ħ]·T and X/[± (β/m)·ħ] respectively. We can relate this to ΔE and Δp:

• ΔX = (1/m)·T·Δp
• ΔT = X/[(β/m)·ΔE]

You’ll grumble: this still doesn’t give us the Uncertainty Principle in its canonical form. Not at all, really. I know… I need to do some more thinking here. But I feel I am getting somewhere. 🙂 Let me know if you see where, and if you think you can get any further. 🙂

The thing is: you’ll have to read a bit more about Fourier transforms and why and how variables like time and energy, or position and momentum, are so-called conjugate variables. As you can see, energy and time, and position and momentum, are obviously linked through the E·t and p·x products in the E·t − p·x sum. That says a lot, and it helps us to understand, in a more intuitive way, why the ΔE·Δt and Δp·Δx products should obey the relation they are obeying, i.e. the Uncertainty Principle, which we write as ΔE·Δt ≥ ħ/2 and Δp·Δx ≥ ħ/2. But proving it involves more than just staring at that Ψ(θ) = a·e^(iθ) = a·e^(i·[(E/ħ)·t − (p/ħ)·x]) relation.

Having said that, it helps to think about how that E·t − p·x sum works. For example, think about two particles, a and b, with different velocities and masses, but with the same momentum, so pa = pb ⇔ ma·va = mb·vb ⇔ ma/mb = vb/va. The spatial frequency of the wavefunction would be the same for both, but the temporal frequency would be different, because their energy incorporates the rest mass and, hence, because ma ≠ mb, we also know that Ea ≠ Eb. So… It all works out but, yes, I admit it’s all very strange, and it takes a long time and a lot of reflection to advance our understanding.

# Working with base states and Hamiltonians

I wrote a pretty abstract post on working with amplitudes, followed by more of the same, and then illustrated how it worked with a practical example (the ammonia molecule as a two-state system). Now it’s time for even more advanced stuff. Here we’ll show how to switch to another set of base states, and what it implies in terms of the Hamiltonian matrix and all of those equations, like those differential equations and – of course – the wavefunctions (or amplitudes) themselves. In short, don’t try to read this if you haven’t done your homework. 🙂

Let me continue the practical example, i.e. the example of the NH3 molecule, as shown below. We abstracted away from all of its motion, except for its angular momentum – or its spin, you might want to say, but that’s rather confusing, because we shouldn’t be using that term for the classical situation we’re presenting here – around its axis of symmetry. That angular momentum doesn’t change from state | 1 〉 to state | 2 〉. What’s happening here is that we allow the nitrogen atom to flip to the other side, so it tunnels through the plane of the hydrogen atoms, thereby going through an energy barrier.

It’s important to note that we do not specify what that energy barrier consists of. In fact, the illustration above may be misleading, because it presents all sorts of things we don’t need right now, like the electric dipole moment, or the center of mass of the molecule, which actually doesn’t change, unlike what’s suggested above. We just put them there to remind you that (a) quantum physics is based on physics – so there’s lots of stuff involved – and (b) we’ll need that electric dipole moment later. But, as we’re introducing it, note that we’re using the μ symbol for it, which is usually reserved for the magnetic dipole moment, which is what you’d usually associate with the angular momentum or the spin, in classical as well as in quantum mechanics. So the direction of rotation of our molecule, as indicated by the arrow around the axis at the bottom, and the μ in the illustration itself, have nothing to do with each other. So now you know. Also, as we’re talking symbols, you should note the use of ε to represent an electric field. We’d usually write the electric dipole moment and the electric field vector as p and E respectively, but those symbols are now being used for linear momentum and energy, so we borrowed μ and ε from our study of magnets. 🙂

The point to note is that, when we’re talking about the ‘up’ or ‘down’ state of our ammonia molecule, you shouldn’t think of it as ‘spin up’ or ‘spin down’. It’s not like that: it’s just the nitrogen atom being beneath or above the plane of the hydrogen atoms, and we define beneath or above assuming the direction of spin actually stays the same!

OK. That should be clear enough. In quantum mechanics, the situation is analyzed by associating two energy levels with the ammonia molecule, E0 + A and E0 − A, so they are separated by an amount equal to 2A. This pair of energy levels has been confirmed experimentally: they are separated by an energy amount equal to 1×10−4 eV, so that’s less than a ten-thousandth of the energy of a photon in the visible-light spectrum. Therefore, a molecule that makes the transition will emit a photon in the microwave range. The principle of a maser is based on exciting the NH3 molecules, and then inducing transitions. One can do that by applying an external electric field. The mechanism works pretty much like what we described when discussing the tunneling phenomenon: an external force field will change the energy factor in the wavefunction, by adding potential energy (let’s say an amount equal to U) to the total energy, which usually consists of the internal (Eint) and kinetic (p²/(2m) = m·v²/2) energy only. So now we write a·e^(−i·[(Eint + m·v²/2 + U)·t − p∙x]/ħ) instead of a·e^(−i·[(Eint + m·v²/2)·t − p∙x]/ħ).
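As a quick back-of-the-envelope check (Python, standard physical constants only), that 1×10−4 eV separation does indeed correspond to a photon in the microwave range:

```python
# Convert the ammonia level splitting to a photon frequency and wavelength.
h = 6.62607015e-34      # Planck constant, J*s
eV = 1.602176634e-19    # electron volt, J
c = 299792458.0         # speed of light, m/s

E = 1e-4 * eV           # the 2A level splitting
f = E / h               # frequency of the emitted photon
wavelength = c / f

print(f"f = {f:.3e} Hz")                      # on the order of 2.4e10 Hz, i.e. ~24 GHz
print(f"lambda = {wavelength * 100:.2f} cm")  # on the order of a centimeter: microwaves
```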

Of course, a·e^(−i·(E·t − p∙x)/ħ) is an idealized wavefunction only, or a Platonic wavefunction – as I jokingly referred to it in my previous post. A real wavefunction has to deal with these uncertainties: we don’t know E and p. At best, we have a discrete set of possible values, like E0 + A and E0 − A in this case. But it might as well be some range, which we denote as ΔE and Δp, and then we need to make some assumption in regard to the probability density function that we’re going to associate with it. But I am getting ahead of myself here. Back to NH3, i.e. our simple two-state system. Let’s first do some mathematical gymnastics.

#### Choosing another representation

We have two base states in this system: ‘up’ or ‘down’, which we denoted as base state | 1 〉 and base state | 2 〉 respectively. You’ll also remember we wrote the amplitude to find the molecule in either one of these two states as:

• C1 = 〈 1 | ψ 〉 = (1/2)·e^(−(i/ħ)·(E0 − A)·t) + (1/2)·e^(−(i/ħ)·(E0 + A)·t)
• C2 = 〈 2 | ψ 〉 = (1/2)·e^(−(i/ħ)·(E0 − A)·t) − (1/2)·e^(−(i/ħ)·(E0 + A)·t)

That gave us the following probabilities:

If our molecule can be in two states only, and it starts off in one, then the probability that it will remain in that state will gradually decline, while the probability that it flips into the other state will gradually increase. So that’s what’s shown above, and it makes perfect sense.
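You can verify those probability curves numerically (a Python sketch, with ħ = 1 and placeholder values for E0 and A):

```python
import numpy as np

# Units: hbar = 1. E0 and A are placeholder values for illustration.
E0, A = 10.0, 1.0
t = np.linspace(0, 2 * np.pi, 1000)

# The two amplitudes for the ammonia molecule:
C1 = 0.5 * np.exp(-1j * (E0 - A) * t) + 0.5 * np.exp(-1j * (E0 + A) * t)
C2 = 0.5 * np.exp(-1j * (E0 - A) * t) - 0.5 * np.exp(-1j * (E0 + A) * t)

P1, P2 = np.abs(C1) ** 2, np.abs(C2) ** 2

assert np.allclose(P1, np.cos(A * t) ** 2)   # starts at 1 and gradually declines
assert np.allclose(P2, np.sin(A * t) ** 2)   # starts at 0 and gradually increases
assert np.allclose(P1 + P2, 1.0)             # probabilities add up to one at all times
```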

Now, you may think there is only one possible set of base states here, as it’s not like measuring spin along this or that direction. These two base states are much simpler: it’s a matter of the nitrogen being beneath or above the plane of the hydrogens, and we’re only interested in the angular momentum of the molecule around its axis of symmetry to help us define what’s ‘up’ and what’s ‘down’. That’s all. However, from a quantum math point of view, we can actually choose some other ‘representation’. Now, these base state vectors | i 〉 are a bit tough to understand, so let’s, in our first go at it, use those coefficients Ci, which are ‘proper’ amplitudes. We’ll define two new coefficients, CI and CII, which – you’ve guessed it – we’ll associate with an alternative set of base states | I 〉 and | II 〉. We’ll define them as follows:

• CI = 〈 I | ψ 〉 = (1/√2)·(C1 − C2)
• CII = 〈 II | ψ 〉 = (1/√2)·(C1 + C2)

[The (1/√2) factor is there because of the normalization condition, obviously. We could take it out and then do the whole analysis to plug it in later, as Feynman does, but I prefer to do it this way, as it reminds us that our wavefunctions are to be related to probabilities at some point in time. :-)]

Now, you can easily check that, when substituting our C1 and C2 wavefunctions above into these definitions, we get:

• CI = 〈 I | ψ 〉 = (1/√2)·e^(−(i/ħ)·(E0 + A)·t)
• CII = 〈 II | ψ 〉 = (1/√2)·e^(−(i/ħ)·(E0 − A)·t)
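As a quick numerical check of those combinations (Python, ħ = 1, placeholder values for E0 and A):

```python
import numpy as np

# hbar = 1; E0 and A are placeholder values. The combinations
# C_I = (C1 - C2)/sqrt(2) and C_II = (C1 + C2)/sqrt(2) come out as pure
# single-frequency waves: stationary states.
E0, A = 10.0, 1.0
t = np.linspace(0, 2 * np.pi, 1000)

C1 = 0.5 * np.exp(-1j * (E0 - A) * t) + 0.5 * np.exp(-1j * (E0 + A) * t)
C2 = 0.5 * np.exp(-1j * (E0 - A) * t) - 0.5 * np.exp(-1j * (E0 + A) * t)

CI = (C1 - C2) / np.sqrt(2)
CII = (C1 + C2) / np.sqrt(2)

assert np.allclose(CI, np.exp(-1j * (E0 + A) * t) / np.sqrt(2))
assert np.allclose(CII, np.exp(-1j * (E0 - A) * t) / np.sqrt(2))
# Stationary indeed: the associated probabilities do not vary in time.
assert np.allclose(np.abs(CI) ** 2, 0.5)
assert np.allclose(np.abs(CII) ** 2, 0.5)
```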

Note that the way plus and minus signs switch here makes things not so easy to remember, but that’s how it is. 🙂 So we’ve got our stationary state solutions here, that are associated with probabilities that do not vary in time. [In case you wonder: that’s the definition of a ‘stationary state’: we’ve got something with a definite energy and, therefore, the probability that’s associated with it is some constant.] Of course, now you’ll cry wolf and say: these wavefunctions don’t actually mean anything, do they? They don’t describe how ammonia actually behaves, do they? Well… Yes and no. The base states I and II actually do allow us to describe whatever we need to describe. To be precise, describing the state φ in terms of the base states | 1 〉 and | 2 〉, i.e. writing | φ 〉 as:

| φ 〉 = | 1 〉 C1 + | 2 〉 C2,

is mathematically equivalent to writing:

| φ 〉 = | I 〉 CI + | II 〉 CII.

We can easily show that, even if it requires some gymnastics indeed—but then you should look at it as just another exercise in quantum math and so, yes, please do go through the logic. First note that the CI = 〈 I | ψ 〉 = (1/√2)·(C1 − C2) and CII = 〈 II | ψ 〉 = (1/√2)·(C1 + C2) expressions are equivalent to:

〈 I | ψ 〉 = (1/√2)·[〈 1 | ψ 〉 − 〈 2 | ψ 〉] and 〈 II | ψ 〉 = (1/√2)·[〈 1 | ψ 〉 + 〈 2 | ψ 〉]

Now, using our quantum math rules, we can abstract the | ψ 〉 away, and so we get:

〈 I | = (1/√2)·[〈 1 | − 〈 2 |] and 〈 II | = (1/√2)·[〈 1 | + 〈 2 |]

We could also have applied the complex conjugate rule to the expression for 〈 I | ψ 〉 above (the complex conjugate of a sum (or a product) is the sum (or the product) of the complex conjugates), and then abstract 〈 ψ | away, so as to write:

| I 〉 = (1/√2)·[| 1 〉 − | 2 〉] and | II 〉 = (1/√2)·[| 1 〉 + | 2 〉]

OK. So what? We’ve only shown our new base states can be written as similar combinations as those CI and CII coefficients. What proves they are base states? Well… The first rule of quantum math actually defines them as states respecting the following condition:

〈 i | j 〉 = 〈 j | i 〉 = δij, with δij = δji equal to 1 if i = j, and zero if i ≠ j

We can prove that as follows. First, use the | I 〉 = (1/√2)·[| 1 〉 − | 2 〉] and | II 〉 = (1/√2)·[| 1 〉 + | 2 〉] result above to check the following:

• 〈 I | I 〉 = (1/√2)·[〈 I | 1 〉 − 〈 I | 2 〉]
• 〈 II | II 〉 = (1/√2)·[〈 II | 1 〉 + 〈 II | 2 〉]
• 〈 II | I 〉 = (1/√2)·[〈 II | 1 〉 − 〈 II | 2 〉]
• 〈 I | II 〉 = (1/√2)·[〈 I | 1 〉 + 〈 I | 2 〉]

Now we need to find those 〈 I | i 〉 and 〈 II | i 〉 amplitudes. To do that, we can use that 〈 I | ψ 〉 = (1/√2)·[〈 1 | ψ 〉 − 〈 2 | ψ 〉] and 〈 II | ψ 〉 = (1/√2)·[〈 1 | ψ 〉 + 〈 2 | ψ 〉] equation and substitute:

• 〈 I | 1 〉 = (1/√2)·[〈 1 | 1 〉 − 〈 2 | 1 〉] = (1/√2)
• 〈 I | 2 〉 = (1/√2)·[〈 1 | 2 〉 − 〈 2 | 2 〉] = −(1/√2)
• 〈 II | 1 〉 = (1/√2)·[〈 1 | 1 〉 + 〈 2 | 1 〉] =  (1/√2)
• 〈 II | 2 〉 = (1/√2)·[〈 1 | 2 〉 + 〈 2 | 2 〉] =  (1/√2)

So we get:

• 〈 I | I 〉 = (1/√2)·[〈 I | 1 〉 − 〈 I | 2 〉] = (1/√2)·[(1/√2) + (1/√2)] = 2/(√2·√2) = 1
• 〈 II | II 〉 = (1/√2)·[〈 II | 1 〉 + 〈 II | 2 〉] = (1/√2)·[(1/√2) + (1/√2)] = 1
• 〈 II | I 〉 = (1/√2)·[〈 II | 1 〉 − 〈 II | 2 〉] = (1/√2)·[(1/√2) − (1/√2)] = 0
• 〈 I | II 〉 = (1/√2)·[〈 I | 1 〉 + 〈 I | 2 〉] = (1/√2)·[(1/√2) − (1/√2)] = 0

So… Well.. Yes. That’s equivalent to:

〈 I | I 〉 = 〈 II | II 〉 = 1 and 〈 I | II 〉 = 〈 II | I 〉 = 0

Therefore, we can confidently say that our | I 〉 = (1/√2)·[| 1 〉 − | 2 〉] and | II 〉 = (1/√2)·[| 1 〉 + | 2 〉] state vectors are, effectively, base vectors in their own right. Now, we’re going to have to grow very fond of matrices, so let me write our ‘definition’ of the new base vectors as a matrix formula:
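In code, the transformation and the orthonormality check look as follows (a sketch: the 2×2 matrix below is simply the one implied by our definitions of CI and CII):

```python
import numpy as np

# The change of base states as a matrix: the rows give <I| and <II|
# in terms of <1| and <2|.
U = np.array([[1, -1],
              [1,  1]]) / np.sqrt(2)

# U is unitary, so the new states are orthonormal base states:
# <I|I> = <II|II> = 1 and <I|II> = <II|I> = 0, read off the product below.
assert np.allclose(U @ U.conj().T, np.eye(2))
```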

You’ve seen this before. The two-by-two matrix is the transformation matrix for a rotation of a state filtering apparatus about the y-axis, over an angle equal to (minus) 90 degrees, when only two states are involved:

You’ll wonder why we should go through all that trouble. Part of it, of course, is to just learn these tricks. The other reason, however, is that it does simplify calculations. Here I need to remind you of the Hamiltonian matrix and the set of differential equations that comes with it. For a system with two base states, we’d have the following set of equations:

Now, adding and subtracting those two equations, and then differentiating the expressions you get (with respect to t), should give you the following two equations:

So what about it? Well… If we transform to the new set of base states, and use the CI and CII coefficients instead of those C1 and C2 coefficients, then it turns out that our set of differential equations simplifies, because – as you can see – two out of the four Hamiltonian coefficients are zero, so we can write:

Now you might think that’s not worth the trouble but, of course, now you know how it goes, and so next time it will be easier. 🙂

On a more serious note, I hope you can appreciate the fact that, with more states than just two, it will become important to diagonalize the Hamiltonian matrix so as to simplify the problem of solving the related set of differential equations. Once we’ve got the solutions, we can always go back to calculate the wavefunctions we want, i.e. the C1 and C2 functions that we happen to like more in this particular case. Just to remind you of how this works, remember that we can describe any state φ both in terms of the base states | 1 〉 and | 2 〉 as well as in terms of the base states | I 〉 and | II 〉, so we can either write:

| φ 〉 = | 1 〉 C1 + | 2 〉 C2 or, alternatively, | φ 〉 = | I 〉 CI + | II 〉 CII.

Now, if we choose, or define, CI and CII the way we do – so that’s as CI = (1/√2)·(C1 − C2) and CII = (1/√2)·(C1 + C2) respectively – then the Hamiltonian matrices that come with them are the following ones:

To understand those matrices, let me remind you here of that equation for the Hamiltonian coefficients in those matrices:

Uij(t + Δt, t) = δij + Kij(t)·Δt = δij − (i/ħ)·Hij(t)·Δt

In my humble opinion, this makes the difference clear. The | I 〉 and | II 〉 base states are clearly separated, mathematically, as much as the | 1 〉 and | 2 〉 base states were separated conceptually. There is no amplitude to go from state I to state II, but then both states are a mix of state 1 and 2, so the physical reality they’re describing is exactly the same: we’re just pushing the temporal variation of the probabilities involved from the coefficients we’re using in our differential equations to the base states we use to define those coefficients – or vice versa.
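To make the point concrete, here’s a small sketch (Python, with placeholder values for E0 and A): transforming the Hamiltonian we guessed for the ammonia molecule to the I/II base does indeed produce a diagonal matrix, with the two energy levels on the diagonal:

```python
import numpy as np

# The ammonia Hamiltonian in the 1/2 base (E0 and A are placeholder values)...
E0, A = 10.0, 1.0
H = np.array([[E0, -A],
              [-A, E0]])

# ...transformed to the I/II base with U = (1/sqrt(2)) * [[1, -1], [1, 1]]:
U = np.array([[1, -1],
              [1,  1]]) / np.sqrt(2)
H_new = U @ H @ U.conj().T

# The transformed matrix is diagonal: no off-diagonal term couples I and II,
# and the two energy levels E0 + A and E0 - A sit on the diagonal.
assert np.allclose(H_new, np.diag([E0 + A, E0 - A]))
```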

Huh? Yes… I know it’s all quite deep, and I haven’t quite come to terms with it myself, so that’s why I’ll let you think about it. 🙂 To help you think this through, think about this: the C1 and C2 wavefunctions made sense but, at the same time, they were not very ‘physical’ (read: classical), because they incorporated uncertainty—as they mix two different energy levels. However, the associated base states – which I’ll call ‘up’ and ‘down’ here – made perfect sense, in a classical ‘physical’ sense, that is. Indeed, in classical physics, the nitrogen atom is either here or there, right? Not somewhere in-between. 🙂 Now, the CI and CII wavefunctions make sense in the classical sense because they are stationary and, hence, they’re associated with a very definite energy level. In fact, as definite, or as classical, as when we say: the nitrogen atom is either here or there. Not somewhere in-between. But they don’t make sense in some other way: we know that the nitrogen atom will, sooner or later, effectively tunnel through. So they do not describe anything real. So how do we capture reality now? Our CI and CII wavefunctions don’t do that explicitly, but implicitly, as the base states now incorporate all of the uncertainty. Indeed, the CI and CII wavefunctions are described in terms of the base states I and II, which themselves are a mixture of our ‘classical’ up or down states. So, yes, we are kicking the ball around here, from a math point of view. Does that make sense? If not, sorry. I can’t do much more. You’ll just have to think through this yourself. 🙂

Let me just add one little note, totally unrelated to what I just wrote, to conclude this little excursion. I must assume that, in regard to diagonalization, you’ve heard about eigenvalues and eigenvectors. In fact, I must assume you heard about this when you learned about matrices in high school. So… Well… In case you wonder, that’s where we need this stuff. 🙂

OK. On to the next !

#### The general solution for a two-state system

Now, you’ll wonder why, after all of the talk about the need to simplify the Hamiltonian, I will now present a general solution for any two-state system, i.e. any pair of Hamiltonian equations for two-state systems. However, you’ll soon appreciate why, and you’ll also connect the dots with what I wrote above.

Let me first give you the general solution. In fact, I’ll copy it from Feynman (just click on it to enlarge it, or read it in Feynman’s Lecture on it yourself):

The problem is, of course, how do we interpret that solution? Let me make it big:

This says that the general solution to any two-state system amounts to calculating two separate energy levels using the Hamiltonian coefficients as they are being used in those equations above. So there is an ‘upper’ energy level, which is denoted as EI, and a ‘lower’ energy level, which is denoted as EII.

What? So it doesn’t say anything about the Hamiltonian coefficients themselves? No. It doesn’t. What did you expect? Those coefficients define the system as such. So the solution is as general as the ‘two-state system’ we wanted to solve: conceptually, it’s characterized by two different energy levels, but that’s about all we can say about it.

[…] Well… No. The solutions above are specific functional forms and, to find them, we had to make certain assumptions and impose certain conditions so as to ensure there’s any non-zero solution at all! In fact, that’s all the fine print above, so I won’t dwell on that—and you had better stop complaining! 🙂 Having said that, the solutions above are very general indeed, and so now it’s up to us to look at specific two-state systems, like our ammonia molecule, and make educated guesses so as to come up with plausible values or functional forms for those Hamiltonian coefficients. That’s what we did when we equated H11 and H22 with some average energy E0, and H12 and H21 with some energy A. [Minus A, in fact—but we might have chosen some positive value +A. Same solution. In fact, I wonder why Feynman didn’t go for the +A value. It doesn’t matter, really, because we’re talking energy differences, but… Well… In any case… That’s how it is. I guess he just wanted to avoid having to switch the indices 1 and 2, and the coefficients a and b and what have you. But it’s the same. Honestly. :-)]
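Here’s that general solution as a small function (Python; a sketch only, checked against our ammonia guess):

```python
import numpy as np

def energy_levels(H11, H12, H21, H22):
    """Upper (E_I) and lower (E_II) energy levels of a generic two-state
    system with constant Hamiltonian coefficients."""
    avg = (H11 + H22) / 2
    root = np.sqrt(((H11 - H22) / 2) ** 2 + H12 * H21)
    return avg + root, avg - root

# Sanity check with the ammonia guess H11 = H22 = E0, H12 = H21 = -A
# (E0 and A are placeholder values):
E0, A = 10.0, 1.0
EI, EII = energy_levels(E0, -A, -A, E0)
assert np.isclose(EI, E0 + A) and np.isclose(EII, E0 - A)
```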

So… Well… We could do the same here and analyze the solutions we’ve found in our previous posts but… Well… I don’t think that’s very interesting. In addition, I’ll make some references to that in my next post anyway, where we’re going to be analyzing the ammonia molecule in terms of its I and II states, so as to prepare a full-blown analysis of how a maser works.

Just to whet your appetite, let me tell you that the mysterious I and II states do have a wonderfully practical physical interpretation as well. Just scroll all the way back up, and look at the opposite electric dipole moments that are associated with states 1 and 2. Now, the two pictures have the angular momentum in the same direction, but we might expect that, when looking at a beam of random NH3 molecules – think of gas being let out of a little jet 🙂 – the angular momentum will be distributed randomly. So… Well… The thing is: the molecules in state I, or in state II, will all have their electric dipole moment lined up in the very same physical direction. So, in that sense, they’re really ‘up’ or ‘down’, and we’ll be able to separate them in an inhomogeneous electric field, just like we were able to separate ‘up’ or ‘down’ electrons, protons or whatever spin-1/2 particles in an inhomogeneous magnetic field.

But so that’s for the next post. I just wanted to tell you that our | I 〉 and | II 〉 base states do make sense. They’re more than just ‘mathematical’ states. They make sense as soon as we’re moving away from an analysis in terms of one NH3 molecule only because… Well… Are you surprised, really? You shouldn’t be. 🙂 Let’s go for it straight away.

#### The ammonia molecule in an electric field

Our educated guess of the Hamiltonian matrix for the ammonia molecule was the following:

This guess was ‘educated’ because we knew what we wanted to get out of it, and that’s those time-dependent probabilities to be in state 1 or state 2:

Now, we also know that state 1 and 2 are associated with opposite electric dipole moments, as illustrated below.

Hence, it’s only natural, when applying an external electric field ε to a whole bunch of ammonia molecules – think of some beam – that our ‘educated’ guess would change to:

Why the minus sign for με in the H22 term? You can answer that question yourself: the associated energy is the dot product μ·ε = μ·ε·cosθ, and θ is ±π here, as the dipole moment and the field point in opposite directions. So… There we are. 🙂 The consequences show when using those values in the general solution for our system of differential equations. Indeed, the

equations become:

The graph of this looks as follows:

The upshot is: we can separate the NH3 molecules in an inhomogeneous electric field based on their state – and by that I mean state I or II, not state 1 or 2. How? Let me copy Feynman on that: it’s like a Stern-Gerlach apparatus, really. 🙂
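A quick numerical sketch (Python, with placeholder values for E0, A and μ) confirms how the two levels spread apart as the field strength ε grows, which is what makes the separation possible:

```python
import numpy as np

# With the field on, H11 = E0 + mu*eps and H22 = E0 - mu*eps, so the two
# energy levels become E0 +/- sqrt(A**2 + (mu*eps)**2). All values below
# are placeholders in natural units.
E0, A, mu = 10.0, 1.0, 0.5
eps = np.linspace(0, 10, 100)   # field strengths

H = lambda e: np.array([[E0 + mu * e, -A], [-A, E0 - mu * e]])
levels = np.array([np.linalg.eigvalsh(H(e)) for e in eps])  # ascending order

# Upper level rises and lower level drops with the field:
assert np.allclose(levels[:, 1], E0 + np.sqrt(A**2 + (mu * eps)**2))
assert np.allclose(levels[:, 0], E0 - np.sqrt(A**2 + (mu * eps)**2))
```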

So that’s it. We get the following:

That will feed into the maser, which looks as follows:

But… Well… Analyzing how a maser works involves another realm of physics: cavities and resonances. I don’t want to get into that here. I only wanted to show you why and how different representations of the same thing are useful, and how it translates into a different Hamiltonian matrix. I think I’ve done that, and so let’s call it a night. 🙂 I hope you enjoyed this one. If not… Well… I did. 🙂

# Occam’s Razor

The analysis of a two-state system (i.e. the rather famous example of an ammonia molecule ‘flipping’ its spin direction from ‘up’ to ‘down’, or vice versa) in my previous post is a good opportunity to think about Occam’s Razor once more. What are we doing? What does the math tell us?

In the example we chose, we didn’t need to worry about space. It was all about time: an evolving state over time. We also knew the answers we wanted to get: if there is some probability for the system to ‘flip’ from one state to another, we know it will, at some point in time. We also want probabilities to add up to one, so we knew the graph below had to be the result we would find: if our molecule can be in two states only, and it starts off in one, then the probability that it will remain in that state will gradually decline, while the probability that it flips into the other state will gradually increase, which is what is depicted below.

However, the graph above is only a Platonic idea: we don’t bother to actually verify what state the molecule is in. If we did, we’d have to ‘re-set’ our t = 0 point, and start all over again. The wavefunction would collapse, as they say, because we’ve made a measurement. However, having said that, yes, in the physicist’s Platonic world of ideas, the probability functions above make perfect sense. They are beautiful. You should note, for example, that P1 (i.e. the probability to be in state 1) and P2 (i.e. the probability to be in state 2) add up to 1 all of the time, so we don’t need to integrate over a cycle or anything: it’s all perfect!

These probability functions are based on ideas that are even more Platonic: interfering amplitudes. Let me explain.

Quantum physics is based on the idea that these probabilities are determined by some wavefunction, a complex-valued amplitude that varies in time and space. It’s a two-dimensional thing, and then it’s not. It’s two-dimensional because it combines a sine and cosine, i.e. a real and an imaginary part, but the argument of the sine and the cosine is the same, and the sine and cosine are the same function, except for a phase shift equal to π/2. We write:

a·e^(−iθ) = a·cos(−θ) + i·a·sin(−θ) = a·cosθ − i·a·sinθ

The minus sign is there because it turns out that Nature measures angles, i.e. our phase, clockwise, rather than counterclockwise, so that’s not as per our mathematical convention. But that’s a minor detail, really. [It should give you some food for thought, though.] For the rest, the related graph is as simple as the formula:

Now, the phase of this wavefunction is written as θ = ω·t − k∙x. Hence, ω determines how this wavefunction varies in time, and the wavevector k tells us how this wave varies in space. The young Frenchman Comte Louis de Broglie noted the mathematical similarity between the ω·t − k∙x expression and Einstein’s four-vector product pμxμ = E·t − p∙x, which remains invariant under a Lorentz transformation. He also understood that the Planck-Einstein relation E = ħ·ω actually defines the energy unit and, therefore, that any frequency, any oscillation really, in space or in time, is to be expressed in terms of ħ.

[To be precise, the fundamental quantum of energy is h = ħ·2π, because that’s the energy of one cycle. To illustrate the point, think of the Planck-Einstein relation. It gives us the energy of a photon with frequency f: Eγ = h·f. If we re-write this equation as Eγ/f = h, and we do a dimensional analysis, we get: h = Eγ/f ⇔ 6.626×10−34 joule·second = [x joule]/[x cycles per second] ⇔ h = 6.626×10−34 joule per cycle. It’s only because we are expressing ω and k as angular frequencies (i.e. in radians per second or per meter, rather than in cycles per second or per meter) that we have to think of ħ = h/2π rather than h.]

Louis de Broglie connected the dots between some other equations too. He was fully familiar with the equations determining the phase and group velocity of composite waves, or a wavetrain that actually might represent a wavicle traveling through spacetime. In short, he boldly equated ω with ω = E/ħ and k with k = p/ħ, and all came out alright. It made perfect sense!

I’ve written enough about this. What I want to write about here is how this applies to the situation at hand: a simple two-state system that depends on time only. So its phase is θ = ω·t = (E0/ħ)·t. What’s E0? It is the total energy of the system, including the equivalent energy of the particle’s rest mass and any potential energy that may be there because of the presence of one or the other force field. What about kinetic energy? Well… We said it: in this case, there is no translational or linear momentum, so p = 0. So our Platonic wavefunction reduces to:

a·e−i·θ = a·e−(i/ħ)·(E0·t)

Great! […] But… Well… No! The problem with this wavefunction is that it yields a constant probability. To be precise, when we take the absolute square of this wavefunction – which is what we do when calculating a probability from a wavefunction − we get P = a², always. The ‘normalization’ condition (i.e. the condition that probabilities have to add up to one) implies that P1 = P2 = a² = 1/2. Makes sense, you’ll say, but the problem is that this doesn’t reflect reality: these probabilities do not evolve over time and, hence, our ammonia molecule never ‘flips’ from state 1 to state 2, or vice versa. In short, our wavefunction does not explain reality.
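To see what ‘constant probability’ means concretely, here’s a minimal Python sketch. The energy value is hypothetical, and I use natural units with ħ = 1: the absolute square of this single-frequency wavefunction comes out the same at every t.

```python
import numpy as np

# A single-frequency 'Platonic' wavefunction a·e^(−(i/ħ)·E0·t) has a
# constant absolute square. Values are hypothetical, natural units (ħ = 1).
hbar = 1.0
E0 = 2.0
a = 1 / np.sqrt(2)          # normalization: P1 = P2 = a² = 1/2

t = np.linspace(0, 10, 500)
psi = a * np.exp(-1j * E0 * t / hbar)

P = np.abs(psi)**2
assert np.allclose(P, 0.5)  # the probability never changes: no 'flipping'
```

So the phase turns and turns, but the magnitude – and, hence, the probability – just sits there.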

The problem is not unlike the problem we had with a similar function relating the momentum and the position of a particle. You’ll remember it: we wrote it as a·ei·θ = a·e(i/ħ)·(p·x). [Note that we can write a·e−i·θ = a·e−(i/ħ)·(E0·t − p·x) = a·e−(i/ħ)·(E0·t)·e(i/ħ)·(p·x), so we can always split our wavefunction in a ‘time’ and a ‘space’ part.] But then we found that this wavefunction also yielded a constant and equal probability all over space, which implies our particle is everywhere (and, therefore, nowhere, really).

In quantum physics, this problem is solved by introducing uncertainty. Introducing some uncertainty about the energy, or about the momentum, is mathematically equivalent to saying that we’re actually looking at a composite wave, i.e. the sum of a finite or infinite set of component waves. So we have the same ω = E/ħ and k = p/ħ relations, but we apply them to n energy levels, or to some continuous range of energy levels ΔE. It amounts to saying that our wave function doesn’t have a specific frequency: it now has n frequencies, or a range of frequencies Δω = ΔE/ħ.
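This is easy to check numerically. The Python sketch below – with a hypothetical spread ΔE around some E0, purely for illustration – superposes n component waves with slightly different frequencies: unlike the single-frequency case, the magnitude of the sum now varies in time.

```python
import numpy as np

hbar = 1.0
t = np.linspace(0, 200, 2000)

# n component waves with energies spread over a range ΔE around E0
# (all values hypothetical, in natural units)
E0, dE, n = 1.0, 0.05, 25
energies = np.linspace(E0 - dE, E0 + dE, n)

# equal-weight superposition of the component waves
psi = sum(np.exp(-1j * E * t / hbar) for E in energies) / n

# a single-frequency wave has constant magnitude; this sum does not:
mag = np.abs(psi)
assert mag.max() > 0.99 and mag.min() < 0.3
```

At t = 0 all components are in phase (magnitude 1); as they dephase, the magnitude collapses – that’s the ‘envelope’.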

We know what that does: it ensures our wavefunction is being ‘contained’ in some ‘envelope’. It becomes a wavetrain, or a kind of beat note, as illustrated below:

[The animation also shows the difference between the group and phase velocity: the green dot shows the group velocity, while the red dot travels at the phase velocity.]

This begs the following question: what’s the uncertainty really? Is it an uncertainty in the energy, or is it an uncertainty in the wavefunction? I mean: we have a function relating the energy to a frequency. Introducing some uncertainty about the energy is mathematically equivalent to introducing uncertainty about the frequency. Of course, the answer is: the uncertainty is in both, so it’s in the frequency and in the energy and both are related through the wavefunction. So… Well… Yes. In some way, we’re chasing our own tail. 🙂

However, the trick does the job, and perfectly so. Let me summarize what we did in the previous post: we had the ammonia molecule, i.e. an NH3 molecule, with the nitrogen ‘flipping’ across the hydrogens from time to time, as illustrated below:

This ‘flip’ requires energy, which is why we associate two energy levels with the molecule, rather than just one. We wrote these two energy levels as E0 + A and E0 − A. That assumption solved all of our problems. [Note that we don’t specify what the energy barrier really consists of: moving the center of mass obviously requires some energy, but it is likely that a ‘flip’ also involves overcoming some electrostatic forces, as shown by the reversal of the electric dipole moment in the illustration above.] To be specific, it gave us the following wavefunctions for the amplitude to be in the ‘up’ or ‘1’ state versus the ‘down’ or ‘2’ state respectively:

• C1 = (1/2)·e−(i/ħ)·(E0 − A)·t + (1/2)·e−(i/ħ)·(E0 + A)·t
• C2 = (1/2)·e−(i/ħ)·(E0 − A)·t − (1/2)·e−(i/ħ)·(E0 + A)·t

Both are composite waves. To be precise, they are the sum of two component waves with a temporal frequency equal to ω1 = (E0 − A)/ħ and ω2 = (E0 + A)/ħ respectively. [As for the minus sign in front of the second term in the wave equation for C2, −1 = e±iπ, so + (1/2)·e−(i/ħ)·(E0 + A)·t and − (1/2)·e−(i/ħ)·(E0 + A)·t are the same wavefunction: they only differ because their relative phase is shifted by ±π.] So the so-called base states of the molecule themselves are associated with two different energy levels: it’s not like one state has more energy than the other.

You’ll say: so what?

Well… Nothing. That’s it really. That’s all I wanted to say here. The absolute square of those two wavefunctions gives us those time-dependent probabilities above, i.e. the graph we started this post with. So… Well… Done!

You’ll say: where’s the ‘envelope’? Oh! Yes! Let me tell you. The C1(t) and C2(t) equations can be re-written as:

Now, remembering our rules for adding and subtracting complex conjugates (eiθ + e−iθ = 2cosθ and eiθ − e−iθ = 2i·sinθ), we can re-write this as:

So there we are! We’ve got wave equations whose temporal variation is basically defined by E0 but, on top of that, we have an envelope here: the cos(A·t/ħ) and sin(A·t/ħ) factor respectively. So their magnitude is no longer time-independent: both the phase and the magnitude now vary with time. The associated probabilities are the ones we plotted:

• |C1(t)|² = cos²[(A/ħ)·t], and
• |C2(t)|² = sin²[(A/ħ)·t].
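These two formulas are easy to verify numerically. The Python sketch below – with hypothetical values for E0 and A, in natural units with ħ = 1 – builds C1 and C2 as the sum and difference of the two component waves and checks the cos² and sin² probabilities, as well as their sum:

```python
import numpy as np

hbar, E0, A = 1.0, 10.0, 1.0   # hypothetical values, natural units
t = np.linspace(0, 10, 500)

# the two component waves, with energies E0 − A and E0 + A
low  = 0.5 * np.exp(-1j * (E0 - A) * t / hbar)
high = 0.5 * np.exp(-1j * (E0 + A) * t / hbar)

C1 = low + high
C2 = low - high

P1, P2 = np.abs(C1)**2, np.abs(C2)**2
assert np.allclose(P1, np.cos(A * t / hbar)**2)
assert np.allclose(P2, np.sin(A * t / hbar)**2)
assert np.allclose(P1 + P2, 1.0)   # probabilities always add up to 1
```

Note that E0 drops out of the probabilities entirely: only the energy difference 2A shows up, through the cos² and sin² envelope.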

So, to summarize it all once more, allowing the nitrogen atom to push its way through the three hydrogens, so as to flip to the other side, thereby breaking through the energy barrier, is equivalent to associating two energy levels with the ammonia molecule as a whole, thereby introducing some uncertainty, or indefiniteness, as to its energy, and that, in turn, gives us the amplitudes and probabilities that we’ve just calculated. [And you may want to note here that the probabilities “sloshing back and forth”, or “dumping into each other” – as Feynman puts it – is the result of the varying magnitudes of our amplitudes, so that’s the ‘envelope’ effect. It’s only because the magnitudes vary in time that their absolute square, i.e. the associated probability, varies too.]
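For what it’s worth, the same probabilities drop out of a small numerical experiment: write the Hamiltonian as a 2×2 matrix with equal diagonal energies E0 and an off-diagonal element −A (the values below are hypothetical, with ħ = 1), diagonalize it, and let the molecule evolve starting from state | 1 〉:

```python
import numpy as np

hbar, E0, A = 1.0, 10.0, 1.0   # hypothetical values, natural units

# two-state Hamiltonian with equal diagonal energies E0 and
# off-diagonal elements −A (the 'tunneling' amplitude)
H = np.array([[E0, -A],
              [-A, E0]], dtype=complex)

def evolve(C0, t):
    """Solve iħ·dC/dt = H·C by diagonalizing H."""
    w, V = np.linalg.eigh(H)               # eigenvalues E0 − A and E0 + A
    phases = np.exp(-1j * w * t / hbar)
    return V @ (phases * (V.conj().T @ C0))

C0 = np.array([1, 0], dtype=complex)       # start in state |1⟩
for t in np.linspace(0, 5, 50):
    C = evolve(C0, t)
    # probabilities slosh back and forth: P1 = cos², P2 = sin²
    assert np.isclose(abs(C[0])**2, np.cos(A * t / hbar)**2)
    assert np.isclose(abs(C[1])**2, np.sin(A * t / hbar)**2)
```

The eigenvalues of that matrix are exactly the two energy levels E0 − A and E0 + A, so the matrix view and the two-component-wave view are one and the same thing.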

So… Well… That’s it. I think this and all of the previous posts served as a nice introduction to quantum physics. More in particular, I hope this post made you appreciate that the mathematical framework is not as horrendous as it often seems to be.

When thinking about it, it’s actually all quite straightforward, and it surely respects Occam’s principle of parsimony in philosophical and scientific thought, also known as Occam’s Razor: “When trying to explain something, it is vain to do with more what can be done with less.” So the math we need is the math we need, really: nothing more, nothing less. As I’ve said a couple of times already, Occam would have loved the math behind QM: the physics calls for the math, and the math becomes the physics.

That’s what makes it beautiful. 🙂

Post scriptum:

One might think that the addition of a term in the argument would in itself lead to a beat note and, hence, a varying probability but, no! We may look at e−(i/ħ)·(E0 + A)·t as a product of two amplitudes:

e−(i/ħ)·(E0 + A)·t = e−(i/ħ)·E0·t·e−(i/ħ)·A·t

But, when writing this all out, one just gets a cos(α·t+β·t) − i·sin(α·t+β·t), whose absolute square |cos(α·t+β·t) − i·sin(α·t+β·t)|² = 1. However, writing e−(i/ħ)·(E0 + A)·t as a product of two amplitudes is interesting in itself. We multiply amplitudes when an event consists of two sub-events. For example, the amplitude for some particle to go from s to x via some point a is written as:

〈 x | s 〉via a = 〈 x | a 〉〈 a | s 〉

Having said that, the graph of the product is uninteresting: the real and imaginary part of the wavefunction are a simple sine and cosine function, and their absolute square is constant, as shown below.

Adding two waves with very different frequencies – A is a fraction of E0 – gives a much more interesting pattern, like the one below, which shows an e−i·α·t + e−i·β·t = cos(αt) − i·sin(αt) + cos(βt) − i·sin(βt) = cos(αt) + cos(βt) − i·[sin(αt) + sin(βt)] pattern for α = 1 and β = 0.1.

That doesn’t look like a beat note, does it? The graphs below, which use 0.5 and 0.01 for β respectively, are not typical beat notes either.
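Still, it’s worth noting that the magnitude of such a two-term sum does vary periodically, even if the real and imaginary parts don’t look like textbook beat notes: factoring out the common phase e−i·(α+β)·t/2 leaves |e−i·α·t + e−i·β·t| = 2·|cos((α − β)·t/2)|, which is exactly the kind of ‘envelope’ that made the C1 and C2 probabilities vary. A quick Python check:

```python
import numpy as np

# the same α = 1 and β = 0.1 as in the graph above
alpha, beta = 1.0, 0.1
t = np.linspace(0, 150, 3000)

s = np.exp(-1j * alpha * t) + np.exp(-1j * beta * t)

# factoring out the common phase e^(−i(α+β)t/2) leaves a pure envelope:
assert np.allclose(np.abs(s), 2 * np.abs(np.cos((alpha - beta) * t / 2)))
```

So the probability associated with the sum does slosh, at the difference frequency α − β, even when the real part of the graph looks messy.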

We get our typical ‘beat note’ only when we’re looking at a wave traveling in space, so then we involve the space variable again, and the relations that come with it, i.e. a phase velocity vp = ω/k = (E/ħ)/(p/ħ) = E/p = c²/v (read: all component waves travel at the same speed), and a group velocity vg = dω/dk = v (read: the composite wave or wavetrain travels at the classical speed of our particle, so it travels with the particle, so to speak). That’s what I’ve shown numerous times already, but I’ll insert one more animation here, just to make sure you see what we’re talking about. [Credit for the animation goes to another site, one on acoustics, actually!]

So what’s left? Nothing much. The only thing you may want to do is to continue thinking about that wavefunction. It’s tempting to think it actually is the particle, somehow. But it isn’t. So what is it then? Well… Nobody knows, really, but I like to think it does travel with the particle. So it’s like a fundamental property of the particle. We need it every time when we try to measure something: its position, its momentum, its spin (i.e. angular momentum) or, in the example of our ammonia molecule, its orientation in space. So the funny thing is that, in quantum mechanics,

1. We can measure probabilities only, so there’s always some randomness. That’s how Nature works: we don’t really know what’s happening. We don’t know the internal wheels and gears, so to speak, or the ‘hidden variables’, as one interpretation of quantum mechanics would say. In fact, the most commonly accepted interpretation of quantum mechanics says there are no ‘hidden variables’.
2. But then, as Polonius famously put it, there is method in this madness, and the pioneers – I mean Werner Heisenberg, Louis de Broglie, Niels Bohr, Paul Dirac, etcetera – discovered it. All probabilities can be found by taking the square of the absolute value of a complex-valued wavefunction (often denoted by Ψ), whose argument, or phase (θ), is given by the de Broglie relations ω = E/ħ and k = p/ħ:

θ = (ω·t − k ∙x) = (E/ħ)·t − (p/ħ)·x

That should be obvious by now, as I’ve written dozens of posts on this by now. 🙂 I still have trouble interpreting this, however—and I am not ashamed, because the Great Ones I just mentioned have trouble with that too. But let’s try to go as far as we can by making a few remarks:

• Adding two terms in math implies the two terms should have the same dimension: we can only add apples to apples, and oranges to oranges. We shouldn’t mix them. Now, the (E/ħ)·t and (p/ħ)·x terms are actually dimensionless: they are pure numbers. So that’s even better. Just check it: energy is expressed in newton·meter (force over distance, remember?) or electronvolts (1 eV = 1.6×10−19 J = 1.6×10−19 N·m); Planck’s constant, as the quantum of action, is expressed in J·s or eV·s; and the unit of (linear) momentum is 1 N·s = 1 kg·m/s. E/ħ gives a number expressed per second, and p/ħ a number expressed per meter. Therefore, multiplying them by t and x respectively gives us a dimensionless number indeed.
• It’s also an invariant number, which means we’ll always get the same value for it. As mentioned above, that’s because the four-vector product pμxμ = E·t − px is invariant: it doesn’t change when analyzing a phenomenon in one reference frame (e.g. our inertial reference frame) or another (i.e. in a moving frame).
• Now, Planck’s quantum of action h or ħ (they only differ in how we count the oscillation: h is expressed per cycle and ħ per radian) is the quantum of energy really. Indeed, if “energy is the currency of the Universe”, and it’s real and/or virtual photons who are exchanging it, then it’s good to know the currency unit is h, i.e. the energy that’s associated with one cycle of a photon.
• It’s not only time and space that are related, as evidenced by the fact that t − x itself is an invariant four-vector: E and p are related too, of course! They are related through the classical velocity of the particle that we’re looking at: E/p = c²/v and, therefore, we can write: E·β = p·c, with β = v/c, i.e. the relative velocity of our particle, as measured as a ratio of the speed of light. Now, I should add that the t − x four-vector is invariant only if we measure time and space in equivalent units. Otherwise, we have to write c·t − x. If we do that – so our unit of distance becomes the distance that light travels in one second, rather than one meter, or our unit of time becomes the time that is needed for light to travel one meter – then c = 1, and the E·β = p·c relation becomes E·β = p, which we also write as β = p/E: the ratio of the energy and the momentum of our particle is its (relative) velocity.
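The E·β = p·c relation is easy to verify with the relativistic formulas for energy and momentum. A Python sketch, using the electron’s rest mass and a hypothetical velocity of 0.6c:

```python
import math

c = 299792458.0          # speed of light, m/s
m = 9.10938e-31          # electron rest mass (kg), just as an example
v = 0.6 * c              # hypothetical velocity
beta = v / c

gamma = 1 / math.sqrt(1 - beta**2)
E = gamma * m * c**2     # total relativistic energy
p = gamma * m * v        # relativistic momentum

# E/p = c²/v, i.e. E·β = p·c
assert math.isclose(E * beta, p * c, rel_tol=1e-12)
assert math.isclose(E / p, c**2 / v, rel_tol=1e-12)
```

In natural units (c = 1), the same two lines reduce to β = p/E, as stated above.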

Combining all of the above, we may want to assume that we are measuring energy and momentum in terms of the Planck constant, i.e. the ‘natural’ unit for both. In addition, we may also want to assume that we’re measuring time and distance in equivalent units. Then the equation for the phase of our wavefunctions reduces to:

θ = (ω·t − k ∙x) = E·t − p·x

Now, θ is the argument of a wavefunction, and we can always re-scale such argument by multiplying or dividing it by some constant. It’s just like writing the argument of a wavefunction as v·t − x or (v·t − x)/v = t − x/v, with v the velocity of the waveform that we happen to be looking at. [In case you have trouble following this argument, please check the post I did for my kids on waves and wavefunctions.] Now, the energy conservation principle tells us the energy of a free particle won’t change. [Just to remind you, a ‘free particle’ means it is present in a ‘field-free’ space, so our particle is in a region of uniform potential.] You see what I am going to do now: we can, in this case, treat E as a constant, and divide E·t − p·x by E, so we get a re-scaled phase for our wavefunction, which I’ll write as:

φ = (E·t − p·x)/E = t − (p/E)·x = t − β·x

Now that’s the argument of a wavefunction with the argument expressed in distance units. Alternatively, we could also look at p as some constant, as there is no variation in potential energy that will cause a change in momentum, i.e. in kinetic energy. We’d then divide by p and we’d get (E·t − p·x)/p = (E/p)·t − x = t/β − x, which amounts to the same, as we can always re-scale by multiplying it with β, which would then yield the same t − β·x argument.

The point is, if we measure energy and momentum in terms of the Planck unit (I mean: in terms of the Planck constant, i.e. the quantum of energy), and if we measure time and distance in ‘natural’ units too, i.e. we take the speed of light to be unity, then our Platonic wavefunction becomes as simple as:

Φ(φ) = a·e−i·φ = a·e−i·(t − β·x)

This is a wonderful formula, but let me first answer your most likely question: why would we use a relative velocity? Well… Just think of it: when everything is said and done, the whole theory of relativity and, hence, the whole of physics, is based on one fundamental and experimentally verified fact: the speed of light is absolute. In whatever reference frame, we will always measure it as 299,792,458 m/s. That’s obvious, you’ll say, but it’s actually the weirdest thing ever if you start thinking about it, and it explains why those Lorentz transformations look so damn complicated. In any case, this fact legitimately establishes c as some kind of absolute measure against which all speeds can be measured. Therefore, it is only natural indeed to express a velocity as some number between 0 and 1. Now that amounts to expressing it as the β = v/c ratio.

Let’s now go back to that Φ(φ) = a·e−i·φ = a·e−i·(t − β·x) wavefunction. Its temporal frequency ω is equal to one, and its spatial frequency k is equal to β = v/c. It couldn’t be simpler but, of course, we’ve got this remarkably simple result because we re-scaled the argument of our wavefunction using the energy and momentum itself as the scale factor. So, yes, we can re-write the wavefunction of our particle in a particularly elegant and simple form using the only information that we have when looking at quantum-mechanical stuff: energy and momentum, because that’s what everything reduces to at that level.

Of course, the analysis above does not include uncertainty. Our information on the energy and the momentum of our particle will be incomplete: we’ll write E = E0 ± σE, and p = p0 ± σp. [I am a bit tired of using the Δ symbol, so I am using the σ symbol here, which denotes a standard deviation of some density function. It underlines the probabilistic, or statistical, nature of our approach.] But, including that, we’ve pretty much explained what quantum physics is about here.

You just need to get used to that complex exponential: e−i·φ = cos(−φ) + i·sin(−φ) = cos(φ) − i·sin(φ). Of course, it would have been nice if Nature would have given us a simple sine or cosine function. [Remember the sine and cosine function are actually the same, except for a phase difference of 90 degrees: sin(φ) = cos(π/2 − φ) = cos(φ − π/2). So we can always go from one to the other by shifting the origin of our axis.] But… Well… As we’ve shown so many times already, a real-valued wavefunction doesn’t explain the interference we observe, be it interference of electrons or whatever other particles or, for that matter, the interference of electromagnetic waves itself, which, as you know, we also need to look at as a stream of photons, i.e. light quanta, rather than as some kind of infinitely flexible aether that’s undulating, like water or air.

So… Well… Just accept that eiφ is a very simple periodic function, consisting of two sine waves rather than just one, as illustrated below.

And then you need to think of stuff like this (the animation is taken from Wikipedia), but then with a projection of the sine of those phasors too. It’s all great fun, so I’ll let you play with it now. 🙂

# The Hamiltonian for a two-state system: the ammonia example

Ammonia, i.e. NH3, is a colorless gas with a strong smell. It serves as a precursor in the production of fertilizer, but we also know it as a cleaning product, ammonium hydroxide, which is NH3 dissolved in water. It has a lot of other uses too. For example, its use in this post is to illustrate a two-state system. 🙂 We’ll apply everything we learned in our previous posts and, as I mentioned when finishing the last of those rather mathematical pieces, I think the example really feels like a reward after all of the tough work on all of those abstract concepts – like that Hamiltonian matrix indeed – so I hope you enjoy it. So… Here we go!

The geometry of the NH3 molecule can be described by thinking of it as a trigonal pyramid, with the nitrogen atom (N) at its apex, and the three hydrogen atoms (H) at the base, as illustrated below. [Feynman’s illustration is slightly misleading, though, because it may give the impression that the hydrogen atoms are bonded together somehow. That’s not the case: the hydrogen atoms share their electron with the nitrogen, thereby completing the outer shell of both atoms. This is referred to as a covalent bond. You may want to look it up, but it is of no particular relevance to what follows here.]

Here, we will only worry about the spin of the molecule about its axis of symmetry, as shown above, which is either in one direction or in the other, obviously. So we’ll discuss the molecule as a two-state system. So we don’t care about its translational (i.e. linear) momentum, its internal vibrations, or whatever else might be going on. It is one of those situations illustrating that the spin vector, i.e. the vector representing angular momentum, is an axial vector: the first state, which is denoted by | 1 〉, is not the mirror image of state | 2 〉. In fact, there is a more sophisticated version of the illustration above, which usefully reminds us of the physics involved.

It should be noted, however, that we don’t need to specify what the energy barrier really consists of: moving the center of mass obviously requires some energy, but it is likely that a ‘flip’ also involves overcoming some electrostatic forces, as shown by the reversal of the electric dipole moment in the illustration above. In fact, the illustration may confuse you, because we’re usually thinking about some net electric charge that’s spinning, and so the angular momentum results in a magnetic dipole moment, that’s either ‘up’ or ‘down’, and it’s usually also denoted by the very same μ symbol that’s used below. As I explained in my post on angular momentum and the magnetic moment, it’s related to the angular momentum J through the so-called g-number. In the illustration above, however, the μ symbol is used to denote an electric dipole moment, so that’s different. Don’t rack your brain over it: just accept there’s an energy barrier, and it requires energy to get through it. Don’t worry about its details!

Indeed, in quantum mechanics, we abstract away from such nitty-gritty, and so we just say that we have base states | i 〉 here, with i equal to 1 or 2. One or the other. Now, in our post on quantum math, we introduced what Feynman only half-jokingly refers to as the Great Law of Quantum Physics: | = ∑ | i 〉〈 i | over all base states i. It basically means that we should always describe our initial and end states in terms of base states. Applying that principle to the state of our ammonia molecule, which we’ll denote by | ψ 〉, we can write:

You may – in fact, you should – mechanically apply that | = ∑ | i 〉〈 i | substitution to | ψ 〉 to get what you get here, but you should also think about what you’re writing. It’s not an easy thing to interpret, but it may help you to think of the similarity of the formula above with the description of a vector in terms of its base vectors, which we write as A = Ax·e1 + Ay·e2 + Az·e3. Just substitute the Ai coefficients for Ci and the ei base vectors for the | i 〉 base states, and you may understand this formula somewhat better. It also explains why the | ψ 〉 state is often referred to as the | ψ 〉 state vector: unlike our A = ∑ Ai·ei sum of base vectors, our | 1 〉C1 + | 2 〉C2 sum does not have any geometrical interpretation but… Well… Not all ‘vectors’ in math have a geometric interpretation, and so this is a case in point.
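The analogy can be made concrete with a small NumPy sketch. The coefficients below are hypothetical, chosen only so that they normalize to 1: the base states become orthonormal basis vectors, and the bra-ket 〈 i | ψ 〉 simply picks out the coefficient Ci.

```python
import numpy as np

# base states |1⟩ and |2⟩ as orthonormal basis vectors
one = np.array([1, 0], dtype=complex)
two = np.array([0, 1], dtype=complex)

# |ψ⟩ = |1⟩C1 + |2⟩C2, with some hypothetical coefficients
C1, C2 = 0.6 + 0.0j, 0.8j
psi = C1 * one + C2 * two

# ⟨i|ψ⟩ recovers the coefficient Ci (the bra is the conjugate transpose,
# which is what np.vdot does to its first argument)
assert np.isclose(np.vdot(one, psi), C1)
assert np.isclose(np.vdot(two, psi), C2)

# normalization: |C1|² + |C2|² = 1
assert np.isclose(abs(C1)**2 + abs(C2)**2, 1.0)
```

So the ‘state vector’ language is quite literal here: in a two-state system, | ψ 〉 really is just the pair (C1, C2).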

It may also help you to think of the time-dependency. Indeed, this formula makes a lot more sense when realizing that the state of our ammonia molecule, and those coefficients Ci, depend on time, so we write: ψ = ψ(t) and Ci = Ci(t). Hence, if we would know, for sure, that our molecule is always in state | 1 〉, then C1 = 1 and C2 = 0, and we’d write: | ψ 〉 = | 1 〉 = | 1 〉·1 + | 2 〉·0. [I am always tempted to insert a little dot (·), and change the order of the factors, so as to show we’re talking some kind of product indeed – so I am tempted to write | ψ 〉 = C1·| 1 〉 + C2·| 2 〉 – but I note that’s not done conventionally, so I won’t do it either.]

Why this time dependency? It’s because we’ll allow for the possibility of the nitrogen to push its way through the pyramid – through the three hydrogens, really – and flip to the other side. It’s unlikely, because it requires a lot of energy to get half-way through (we’ve got what we referred to as an energy barrier here), but it may happen and, as we’ll see shortly, it results in us having to think of the ammonia molecule as having two separate energy levels, rather than just one. We’ll denote those energy levels as E0 ± A. However, I am getting ahead of myself here, so let me get back to the main story.

To fully understand the story, you should really read my previous post on the Hamiltonian, which explains how those Ci coefficients, as a function of time, can be determined. They’re determined by a set of differential equations (i.e. equations involving a function and the derivative of that function) which we wrote as:

If we have two base states only – which is the case here – then this set of equations is:

Two equations and two functions – C1 = C1(t) and C2 = C2(t) – so we should be able to solve this thing, right? Well… No. We don’t know those Hij coefficients. As I explained in my previous post, they also evolve in time, so we should write them as Hij(t) instead of Hij tout court, and so it messes the whole thing up. We have two equations and six functions really. There is no way we can solve this! So how do we get out of this mess?

Well… By trial and error, I guess. 🙂 Let us just assume the molecule would behave nicely—which we know it doesn’t, but so let’s push the ‘classical’ analysis as far as we can, so we might get some clues as to how to solve this problem. In fact, our analysis isn’t ‘classical’ at all, because we’re still talking amplitudes here! However, you’ll agree the ‘simple’ solution would be that our ammonia molecule doesn’t ‘tunnel’. It just stays in the same spin direction forever. Then H12 and H21 must be zero (think of the U12(t + Δt, t) and U21(t + Δt, t) functions) and H11 and H22 are equal to… Well… I’d love to say they’re equal to 1 but… Well… You should go through my previous posts: these Hamiltonian coefficients are related to probabilities but… Well… Same-same but different, as they say in Asia. 🙂 They’re amplitudes, which are things you use to calculate probabilities. But calculating probabilities involves normalization and other stuff, like allowing for interference of amplitudes, and so… Well… To make a long story short, if our ammonia molecule would stay in the same spin direction forever, then H11 and H22 are not one but some constant. In any case, the point is that they would not change in time (so H11(t) = H11 and H22(t) = H22), and, therefore, our two equations would reduce to:

So the coefficients are now proper coefficients, in the sense that they’ve got some definite value, and so we have two equations and two functions only now, and so we can solve this. Indeed, remembering all of the stuff we wrote on the magic of exponential functions (more in particular, remembering that d[e(a·t)]/dt = a·e(a·t)), we can understand the proposed solution:

As Feynman notes: “These are just the amplitudes for stationary states with the energies E1 = H11 and E2 = H22.” Now let’s think about that. Indeed, I find the term ‘stationary’ state quite confusing, as it’s ill-defined. In this context, it basically means that we have a wavefunction that is determined by (i) a definite (i.e. unambiguous, or precise) energy level and (ii) the fact that there is no spatial variation. Let me refer you to my post on the basics of quantum math here. We often use a sort of ‘Platonic’ example of the wavefunction indeed:

a·e−i·θ = a·e−i·(ω·t − k∙x) = a·e−(i/ħ)·(E·t − p∙x)

So that’s a wavefunction assuming the particle we’re looking at has some well-defined energy E and some equally well-defined momentum p. Now, that’s kind of ‘Platonic’ indeed, because it’s more like an idea, rather than something real. Indeed, a wavefunction like that means that the particle is everywhere and nowhere, really—because its wavefunction is spread out all over space. Of course, we may think of the ‘space’ as some kind of confined space, like a box, and then we can think of this particle as being ‘somewhere’ in that box, and then we look at the temporal variation of this function only – which is what we’re doing now: we don’t consider the space variable x at all. So then the equation reduces to a·e–(i/ħ)·(E·t), and so… Well… Yes. We do find that our Hamiltonian coefficient Hii is like the energy of the | i 〉 state of our NH3 molecule, so we write: H11 = E1, and H22 = E2, and the ‘wavefunctions’ of our C1 and C2 coefficients can be written as:

• C1 = a·e−(i/ħ)·(H11·t) = a·e−(i/ħ)·(E1·t), with H11 = E1, and
• C2 = a·e−(i/ħ)·(H22·t) = a·e−(i/ħ)·(E2·t), with H22 = E2.
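You can check that these are indeed solutions of those two (decoupled) differential equations by differentiating numerically. A Python sketch, with a hypothetical value for E1 = H11 and ħ = 1:

```python
import numpy as np

hbar, E1 = 1.0, 3.0      # E1 = H11, a hypothetical value (natural units)
a = 1 / np.sqrt(2)

t = np.linspace(0, 10, 100001)
C1 = a * np.exp(-1j * E1 * t / hbar)

# check the equation iħ·dC1/dt = H11·C1 with a numerical derivative
dC1 = np.gradient(C1, t)
assert np.allclose(1j * hbar * dC1, E1 * C1, atol=1e-3)
```

The same check works verbatim for C2 with E2 = H22: the two equations are decoupled, so each exponential solves its own equation on its own.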

But can we interpret C1 and C2 as proper amplitudes? They are just coefficients in these equations, aren’t they? Well… Yes and no. From what we wrote in previous posts, you should remember that these Ci coefficients are equal to 〈 i | ψ 〉, so they are the amplitude to find our ammonia molecule in one state or the other.

Back to Feynman now. He adds, logically but brilliantly:

“We note, however, that for the ammonia molecule the two states |1〉 and |2〉 have a definite symmetry. If nature is at all reasonable, the matrix elements H11 and H22 must be equal. We’ll call them both E0, because they correspond to the energy the states would have if H12 and H21 were zero.”

So our C1 and C2 amplitudes then reduce to:

• C1 = 〈 1 | ψ 〉 = a·e−(i/ħ)·(E0·t)
• C2 = 〈 2 | ψ 〉 = a·e−(i/ħ)·(E0·t)

We can now take the absolute square of both to find the probability for the molecule to be in state 1 or in state 2:

• |〈 1 | ψ 〉|² = |a·e−(i/ħ)·(E0·t)|² = a²
• |〈 2 | ψ 〉|² = |a·e−(i/ħ)·(E0·t)|² = a²

Now, the probabilities have to add up to 1, so a² + a² = 1 and, therefore, the probability to be in either state 1 or state 2 is 0.5, which is what we’d expect.

Note: At this point, it is probably good to get back to our | ψ 〉 = | 1 〉C1 + | 2 〉C2 equation, so as to try to understand what it really says. Substituting the a·e−(i/ħ)·(E0·t) expression for C1 and C2 yields:

| ψ 〉 = | 1 〉 a·e−(i/ħ)·(E0·t) + | 2 〉 a·e−(i/ħ)·(E0·t) = [| 1 〉 + | 2 〉]·a·e−(i/ħ)·(E0·t)

Now, what is this saying, really? In our previous post, we explained this is an ‘open’ equation, so it actually doesn’t mean all that much: we need to ‘close’ or ‘complete’ it by adding a ‘bra’, i.e. a state like 〈 χ |, so we get a 〈 χ | ψ〉 type of amplitude that we can actually do something with. Now, in this case, our final 〈 χ | state is either 〈 1 | or 〈 2 |, so we write:

• 〈 1 | ψ 〉 = [〈 1 | 1 〉 + 〈 1 | 2 〉]·a·e−(i/ħ)·(E0·t) = [1 + 0]·a·e−(i/ħ)·(E0·t) = a·e−(i/ħ)·(E0·t)
• 〈 2 | ψ 〉 = [〈 2 | 1 〉 + 〈 2 | 2 〉]·a·e−(i/ħ)·(E0·t) = [0 + 1]·a·e−(i/ħ)·(E0·t) = a·e−(i/ħ)·(E0·t)

Note that I finally added the multiplication dot (·) because we’re talking proper amplitudes now and, therefore, we’ve got a proper product too: we multiply one complex number with another. We can now take the absolute square of both to find the probability for the molecule to be in state 1 or in state 2:

• |〈 1 | ψ 〉|² = |a·e−(i/ħ)·(E0·t)|² = a²
• |〈 2 | ψ 〉|² = |a·e−(i/ħ)·(E0·t)|² = a²

Unsurprisingly, we find the same thing: these probabilities have to add up to 1, so a² + a² = 1 and, therefore, the probability to be in state 1 or state 2 is 0.5. So the notation and the logic behind it make perfect sense. But let me get back to the lesson now.

The point is: the true meaning of a ‘stationary’ state here, is that we have non-fluctuating probabilities. So they are and remain equal to some constant, i.e. 1/2 in this case. This implies that the state of the molecule does not change: there is no way to go from state 1 to state 2 and vice versa. Indeed, if we know the molecule is in state 1, it will stay in that state. [Think about what normalization of probabilities means when we’re looking at one state only.]

You should note that these non-varying probabilities are related to the fact that the amplitudes have a non-varying magnitude. The phase of these amplitudes varies in time, of course, but their magnitude is and remains a, always. The amplitude is not being ‘enveloped’ by another curve, so to speak.
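In code, the point looks like this. A minimal sketch, assuming ħ = 1 and an arbitrary, purely illustrative value for E0: the phase of the stationary-state amplitude turns, but its magnitude – and hence the probability – never moves.

```python
import cmath

hbar = 1.0          # natural units (illustrative)
E0 = 2.5            # arbitrary energy value (illustrative)
a = 1 / 2**0.5      # so that a² + a² = 1

def C(t):
    """Stationary-state amplitude a·e^(−(i/ħ)·E0·t)."""
    return a * cmath.exp(-1j * E0 * t / hbar)

# The phase rotates, but the magnitude — and hence the probability — is constant:
probs = [abs(C(t))**2 for t in (0.0, 0.37, 1.0, 5.0)]
print(probs)  # all ≈ 0.5
```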

OK. That should be clear enough. Sorry I spent so much time on this, but this stuff on ‘stationary’ states comes back again and again and so I just wanted to clear that up as much as I can. Let’s get back to the story.

So we know that what we’re describing above is not what ammonia really does. As Feynman puts it: “The equations [i.e. the C1 and C2 equations above] don’t tell us what ammonia really does. It turns out that it is possible for the nitrogen to push its way through the three hydrogens and flip to the other side. It is quite difficult; to get half-way through requires a lot of energy. How can it get through if it hasn’t got enough energy? There is some amplitude that it will penetrate the energy barrier. It is possible in quantum mechanics to sneak quickly across a region which is illegal energetically. There is, therefore, some [small] amplitude that a molecule which starts in | 1 〉 will get to the state | 2 〉. The coefficients H12 and H21 are not really zero.”

He adds: “Again, by symmetry, they should both be the same—at least in magnitude. In fact, we already know that, in general, Hij must be equal to the complex conjugate of Hji.”

His next step, then, is to be interpreted as either a stroke of genius or, else, as unexplained. 🙂 He invokes the symmetry of the situation to boldly state that H12 is some real negative number, which he denotes as −A, and which – because it’s a real number (so the imaginary part is zero) – must be equal to its complex conjugate H21. So then Feynman makes this fantastic jump in logic. First, he keeps using the E0 value for H11 and H22, motivating that as follows: “If nature is at all reasonable, the matrix elements H11 and H22 must be equal, and we’ll call them both E0, because they correspond to the energy the states would have if H12 and H21 were zero.” Second, he uses that −A value for H12 and H21. In short, the two equations and six functions are now reduced to:

Solving these equations is rather boring. Feynman does it as follows:

Now, what do these equations actually mean? It depends on those a and b coefficients. Looking at the solutions, the most obvious question to ask is: what if a or b is zero? If b is zero, then the second terms in both equations are zero, and so C1 and C2 are exactly the same: two amplitudes with the same temporal frequency ω = (E0 − A)/ħ. If a is zero, then C1 and C2 are the same too, but with opposite sign: two amplitudes with the same temporal frequency ω = (E0 + A)/ħ. Squaring them – in both cases (i.e. for a = 0 or b = 0) – yields, once again, an equal and constant probability for the molecule to be in state 1 or state 2. To be precise, we can now take the absolute square of both to find the probability for the molecule to be in state 1 or in state 2:

• For b = 0: |〈 1 | ψ 〉|² = |(a/2)·e−(i/ħ)·(E0 − A)·t|² = a²/4 = |〈 2 | ψ 〉|²
• For a = 0: |〈 1 | ψ 〉|² = |(b/2)·e−(i/ħ)·(E0 + A)·t|² = b²/4 = |〈 2 | ψ 〉|² (the minus sign in front of b/2 is squared away)

So we get two stationary states now. Why two instead of one? Well… You need to use your imagination a bit here. They actually reflect each other: they’re the same as the one stationary state we found when assuming our nitrogen atom could not ‘flip’ from one position to the other. It’s just that the introduction of that possibility now results in a sort of ‘doublet’ of energy levels. But so we shouldn’t waste our time on this, as we want to analyze the general case, for which the probabilities to be in state 1 or state 2 do vary in time. So that’s when a and b are non-zero.
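Before moving on, the general solution is easy to cross-check numerically. A sketch, assuming ħ = 1 and illustrative values for E0 and A (the Runge–Kutta stepper is mine, not Feynman’s): integrating iħ·dC1/dt = E0·C1 − A·C2 and iħ·dC2/dt = −A·C1 + E0·C2, starting from state 1, should reproduce the analytic probabilities.

```python
import numpy as np

hbar, E0, A = 1.0, 1.0, 0.5        # illustrative values (natural units)
H = np.array([[E0, -A], [-A, E0]], dtype=complex)  # the reduced Hamiltonian

def rhs(C):
    # iħ·dC/dt = H·C  ⇒  dC/dt = −(i/ħ)·H·C
    return -1j / hbar * H @ C

C = np.array([1.0, 0.0], dtype=complex)   # start in state 1
t, dt = 0.0, 0.001
while t < 2.0:
    # classic fourth-order Runge–Kutta step
    k1 = rhs(C)
    k2 = rhs(C + 0.5 * dt * k1)
    k3 = rhs(C + 0.5 * dt * k2)
    k4 = rhs(C + dt * k3)
    C += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    t += dt

# Compare with the analytic solution C1 = e^(−iE0t/ħ)·cos(A·t/ħ)
P1_numeric = abs(C[0])**2
P1_analytic = np.cos(A * t / hbar)**2
print(P1_numeric, P1_analytic)
```

The total probability |C1|² + |C2|² stays equal to one throughout the integration, which is the normalization condition we keep coming back to.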

To analyze it all, we may want to start with equating t to zero. We then get:

This leads us to conclude that a = b = 1, so our equations for C1(t) and C2(t) can now be written as:

Remembering our rules for adding and subtracting complex conjugates (eiθ + e−iθ = 2·cosθ and eiθ − e−iθ = 2i·sinθ), we can re-write this as:

Now these amplitudes are much more interesting. Their temporal variation is defined by E0 but, on top of that, we have an envelope here: the cos(A·t/ħ) and sin(A·t/ħ) factor respectively. So their magnitude is no longer time-independent: both the phase as well as the magnitude now vary with time. What’s going on here becomes quite obvious when calculating and plotting the associated probabilities, which are

• |C1(t)|² = cos²(A·t/ħ), and
• |C2(t)|² = sin²(A·t/ħ)

respectively (note that the absolute square of i is equal to 1, not −1). The graph of these functions is depicted below.

As Feynman puts it: “The probability sloshes back and forth.” Indeed, the way to think about this is that, if our ammonia molecule is in state 1, then it will not stay in that state: one can be sure the nitrogen atom is going to flip at some point in time, with the odds given by those fluctuating probability functions above. As time goes by, the probability to be in state 2 increases, until the molecule effectively is in state 2. And then the cycle reverses.
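The sloshing is easy to tabulate. A small sketch, with ħ and A set to illustrative values: the two probabilities trade places periodically, and the molecule has fully flipped after a quarter of the cos²/sin² period.

```python
import math

hbar, A = 1.0, 0.5                            # illustrative values (natural units)

def P1(t): return math.cos(A * t / hbar)**2   # probability to be in state 1
def P2(t): return math.sin(A * t / hbar)**2   # probability to be in state 2

# The probabilities slosh back and forth, but always sum to one:
for t in (0.0, 1.0, 2.0, 3.0):
    print(round(P1(t), 3), round(P2(t), 3), round(P1(t) + P2(t), 3))

# The molecule has fully flipped to state 2 at t = π·ħ/(2A):
t_flip = math.pi * hbar / (2 * A)
```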

Our | ψ 〉 = | 1 〉 C1 + | 2 〉 C2 equation is a lot more interesting now, as we have a proper mix of pure states: we never really know in what state our molecule will be, as we have these ‘oscillating’ probabilities now, which we should interpret carefully.

The point to note is that the b = 0 and a = 0 solutions came with precise temporal frequencies: (E0 − A)/ħ and (E0 + A)/ħ respectively, which correspond to two separate energy levels: E0 − A and E0 + A respectively, with |A| = |H12| = |H21|. So everything is related to everything once again: allowing the nitrogen atom to push its way through the three hydrogens, so as to flip to the other side, thereby breaking through the energy barrier, is equivalent to associating two energy levels with the ammonia molecule as a whole, thereby introducing some uncertainty, or indefiniteness, as to its energy, and that, in turn, gives us the amplitudes and probabilities that we’ve just calculated.

Note that the probabilities “sloshing back and forth”, or “dumping into each other” – as Feynman puts it – are the result of the varying magnitudes of our amplitudes: these go up and down and, therefore, their absolute square varies too.

So… Well… That’s it as an introduction to a two-state system. There’s more to come. Ammonia is used in the ammonia maser. Now that is something that’s interesting to analyze—both from a classical as well as from a quantum-mechanical perspective. Feynman devotes a full chapter to it, so I’d say… Well… Have a look. 🙂

Post scriptum: I must assume this analysis of the NH3 molecule, with the nitrogen ‘flipping’ across the hydrogens, triggers a lot of questions, so let me try to answer some. Let me first insert the illustration once more, so you don’t have to scroll up:

The first thing that you should note is that the ‘flip’ involves a change in the center of mass position. So that requires energy, which is why we associate two different energy levels with the molecule: E0 + A and E0 − A. However, as mentioned above, we don’t care about the nitty-gritty here: the energy barrier is likely to combine a number of factors, including electrostatic forces, as evidenced by the flip in the electric dipole moment, which is what the μ symbol here represents! Just note that the two energy levels are separated by an amount that’s equal to 2·A, rather than A, and that, once again, it becomes obvious now why Feynman would prefer the Hamiltonian to be called the ‘energy matrix’, as its coefficients do represent specific energy levels, or differences between them! Now, that assumption yielded the following wavefunctions for C1 = 〈 1 | ψ 〉 and C2 = 〈 2 | ψ 〉:

• C1 = 〈 1 | ψ 〉 = (1/2)·e−(i/ħ)·(E0 − A)·t + (1/2)·e−(i/ħ)·(E0 + A)·t
• C2 = 〈 2 | ψ 〉 = (1/2)·e−(i/ħ)·(E0 − A)·t − (1/2)·e−(i/ħ)·(E0 + A)·t

Both are composite waves. To be precise, they are the sum of two component waves with a temporal frequency equal to ω1 = (E0 − A)/ħ and ω2 = (E0 + A)/ħ respectively. [As for the minus sign in front of the second term in the wave equation for C2, −1 = e±iπ, so + (1/2)·e−(i/ħ)·(E0 + A)·t and − (1/2)·e−(i/ħ)·(E0 + A)·t are the same wavefunction: they only differ because their relative phase is shifted by ±π.]
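That decomposition is easy to verify numerically. A sketch with illustrative values for ħ, E0, A and t: summing and subtracting the two component waves gives back the cos(A·t/ħ) and i·sin(A·t/ħ) envelopes multiplying the common e−(i/ħ)·E0·t factor.

```python
import cmath, math

hbar, E0, A, t = 1.0, 1.0, 0.5, 0.73   # illustrative values

w1 = (E0 - A) / hbar    # lower-frequency component
w2 = (E0 + A) / hbar    # higher-frequency component

C1 = 0.5 * cmath.exp(-1j * w1 * t) + 0.5 * cmath.exp(-1j * w2 * t)
C2 = 0.5 * cmath.exp(-1j * w1 * t) - 0.5 * cmath.exp(-1j * w2 * t)

# Factoring out e^(−iE0t/ħ) leaves the cos and i·sin envelopes:
env1 = cmath.exp(-1j * E0 * t / hbar) * math.cos(A * t / hbar)
env2 = 1j * cmath.exp(-1j * E0 * t / hbar) * math.sin(A * t / hbar)
print(abs(C1 - env1), abs(C2 - env2))  # both ≈ 0
```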

Now, writing things this way, rather than in terms of probabilities, makes it clear that the two base states of the molecule themselves are associated with two different energy levels, so it is not like one state has more energy than the other. It’s just that the possibility of going from one state to the other requires an uncertainty about the energy, which is reflected by the energy doublet E0 ± A in the wavefunction of the base states. Now, if the wavefunction of the base states incorporates that energy doublet, then it is obvious that the state of the ammonia molecule, at any point in time, will also incorporate that energy doublet.

This triggers the following remark: what’s the uncertainty really? Is it an uncertainty in the energy, or is it an uncertainty in the wavefunction? I mean: we have a function relating the energy to a frequency. Introducing some uncertainty about the energy is mathematically equivalent to introducing uncertainty about the frequency. Think of it: two energy levels imply two frequencies, and vice versa. More generally, introducing n energy levels, or some continuous range of energy levels ΔE, amounts to saying that our wavefunction doesn’t have a specific frequency: it now has n frequencies, or a range of frequencies Δω = ΔE/ħ. Of course, the answer is: the uncertainty is in both, so it’s in the frequency and in the energy, and both are related through the wavefunction. So… In a way, we’re chasing our own tail.

Having said that, the energy may be uncertain, but it is real. It’s there, as evidenced by the fact that the ammonia molecule behaves like an atomic oscillator: we can excite it in exactly the same way as we can excite an electron inside an atom, i.e. by shining light on it. The only difference is the photon energies: to cause a transition in an atom, we use photons in the optical or ultraviolet range, and they give us the same radiation back. To cause a transition in an ammonia molecule, we only need photons with energies in the microwave range. Here, I should quickly remind you of the frequencies and energies involved. Visible light is radiation in the 400–800 terahertz range and, using the E = h·f equation, we can calculate the associated photon energies as 1.6 to 3.2 eV. Microwave radiation – as produced in your microwave oven – is typically in the range of 1 to 2.5 gigahertz, and the associated photon energy is 4 to 10 millionths of an eV. Having illustrated the difference in terms of the energies involved, I should add that masers and lasers are based on the same physical principle: LASER and MASER stand for Light/Microwave Amplification by Stimulated Emission of Radiation, respectively.
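The arithmetic behind those numbers is just E = h·f, with h expressed in eV·s. A quick sketch:

```python
h = 4.135667696e-15        # Planck constant in eV·s

def photon_energy_eV(f_hz):
    """E = h·f, with f in hertz and E in electronvolt."""
    return h * f_hz

# Visible light, 400–800 THz: photon energies of a couple of eV
print(photon_energy_eV(400e12), photon_energy_eV(800e12))

# Microwaves, 1–2.5 GHz: photon energies of a few millionths of an eV
print(photon_energy_eV(1e9), photon_energy_eV(2.5e9))
```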

So… How shall I phrase this? There’s uncertainty, but the way we are modeling that uncertainty matters. So yes, the uncertainty in the frequency of our wavefunction and the uncertainty in the energy are mathematically equivalent, but the wavefunction has a meaning that goes much beyond that. [You may want to reflect on that yourself.]

Finally, another question you may have is why would Feynman take minus A (i.e. −A) for H12 and H21. Frankly, my first thought on this was that it should have something to do with the original equation for these Hamiltonian coefficients, which also has a minus sign: Uij(t + Δt, t) = δij + Kij(t)·Δt = δij − (i/ħ)·Hij(t)·Δt. For i ≠ j, this reduces to:

Uij(t + Δt, t) = Kij(t)·Δt = −(i/ħ)·Hij(t)·Δt

However, the answer is: it really doesn’t matter. One could set H12 = H21 = +A, and we’d find the same equations. We’d just switch the indices 1 and 2, and the coefficients a and b, but we’d get the same solutions. You can figure that out yourself. Have fun with it!

Oh! And please do let me know if some of the stuff above triggers other questions. I am not sure if I’ll be able to answer them, but I’ll surely try, and good questions always help to ensure we sort of ‘get’ this stuff in a more intuitive way. Indeed, when everything is said and done, the goal of this blog is not to simply reproduce stuff, but to truly ‘get’ it, as well as we can. 🙂

# Quantum math: the Hamiltonian

After all of the ‘rules’ and ‘laws’ we’ve introduced in our previous post, you might think we’re done but, of course, we aren’t. Things change. As Feynman puts it: “One convenient, delightful ‘apparatus’ to consider is merely a wait of a few minutes; during the delay, various things could be going on—external forces applied or other shenanigans—so that something is happening. At the end of the delay, the amplitude to find the thing in some state χ is no longer exactly the same as it would have been without the delay.”

In short, the picture we presented in the previous posts was a static one. Time was frozen. In reality, time passes, and so we now need to look at how amplitudes change over time. That’s where the Hamiltonian kicks in. So let’s have a look at that now.

[If you happen to understand the Hamiltonian already, you may want to have a look at how we apply it to a real situation: we’ll explain the basics involving state transitions of the ammonia molecule, which are a prerequisite to understanding how a maser works, which is not unlike a laser. But that’s for later. First we need to get the basics.]

Using Dirac’s bra-ket notation, which we introduced in the previous posts, we can write the amplitude to find a ‘thing’ – i.e. a particle, for example, or some system of particles or other things – in some state χ at the time t = t2, when it was in some state φ at the time t = t1, as follows:

Don’t be scared of this thing. If you’re unfamiliar with the notation, just check out my previous posts: we’re just replacing A by U, and the only thing that we’ve modified is that the amplitudes to go from φ to χ now depend on t1 and t2. Of course, we’ll describe all states in terms of base states, so we have to choose some representation and expand this expression, so we write:

I’ve explained the point a couple of times already, but let me note it once more: in quantum physics, we always measure some (vector) quantity – like angular momentum, or spin – in some direction, let’s say the z-direction, or the x-direction, or whatever direction really. Now we can do that in classical mechanics too, of course, and then we find the component of that vector quantity (vector quantities are defined by their magnitude and, importantly, their direction). However, in classical mechanics, we know the components in the x-, y- and z-direction unambiguously determine that vector quantity. In quantum physics, it doesn’t work that way. The magnitude is never all in one direction only, so we can always find some of it in some other direction (see my post on transformations, or on quantum math in general). So there is an ambiguity in quantum physics that has no parallel in classical mechanics, and the concept of a component of a vector needs to be carefully interpreted. There’s nothing definite there, like in classical mechanics: all we have is amplitudes, and all we can do is calculate probabilities, i.e. expected values based on those amplitudes.

In any case, I can’t keep repeating this, so let me move on. In regard to that 〈 χ | U | φ 〉 expression, I should, perhaps, add a few remarks. First, why U instead of A? The answer: no special reason, but it’s true that the use of U reminds us of energy, like potential energy, for example. We might as well have used W. The point is: energy and momentum do appear in the argument of our wavefunctions, and so we might as well remind ourselves of that by choosing symbols like W or U here. Second, we may, of course, want to choose our time scale such that t1 = 0. However, it’s fine to develop the more general case. Third, it’s probably good to remind ourselves we can think of matrices to model it all. More in particular, if we have three base states, say ‘plus’, ‘zero’, or ‘minus’, and denoting 〈 i | φ 〉 and 〈 i | χ 〉 as Ci and Di respectively (so 〈 χ | i 〉 = 〈 i | χ 〉* = Di*), then we can re-write the expanded expression above as:

Fourth, you may have heard of the S-matrix, which is also known as the scattering matrix—which explains the S in front—but it’s actually a more general thing. Feynman defines the S-matrix as the U(t2, t1) matrix for t1 → −∞ and t2 → +∞, so as some kind of limiting case of U. That’s true in the sense that the S-matrix is used to relate initial and final states, indeed. However, the relation between the S-matrix and the so-called evolution operators U is slightly more complex than he wants us to believe. I can’t say too much about this now, so I’ll just refer you to the Wikipedia article on that, as I have to move on.

The key to the analysis is to break things up once more. More in particular, one should appreciate that we could look at three successive points in time, t1, t2, t3, and write U(t3, t1) as:

U(t3, t1) = U(t3, t2)·U(t2, t1)

It’s just like adding another apparatus in series, so it’s just like what we did in our previous post, when we wrote:

So we just put a | bar between B and A and wrote it all out. That | bar is really like a factor 1 in multiplication but – let me caution you – you really need to watch the order of the various factors in your product, and read symbols in the right order, which is often from right to left, like in Hebrew or Arabic, rather than from left to right. In that regard, you should note that we wrote U(t3, t1) rather than U(t1, t3): you need to keep your wits about you here! So as to make sure we can all appreciate that point, let me show you what that U(t3, t1) = U(t3, t2)·U(t2, t1) actually says by spelling it out for the case of two base states only (like ‘up’ or ‘down’, which I’ll denote as ‘+’ and ‘−’ again):
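For a two-state basis, the U(t3, t1) = U(t3, t2)·U(t2, t1) rule is just matrix multiplication, with the sum running over the intermediate base states. A small numpy sketch (the random unitaries are, of course, just stand-ins for actual evolution matrices):

```python
import numpy as np

rng = np.random.default_rng(42)

def random_U():
    """A random 2×2 unitary matrix, standing in for some U(t_b, t_a)."""
    M = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
    Q, _ = np.linalg.qr(M)
    return Q

U21 = random_U()    # U(t2, t1): takes the state from t1 to t2
U32 = random_U()    # U(t3, t2): takes the state from t2 to t3
U31 = U32 @ U21     # U(t3, t1): note the order — the rightmost factor acts first

# Spelled out for one matrix element, the product is a sum over the
# intermediate base states k, i.e. U31[i][j] = Σ_k U32[i][k]·U21[k][j]:
element = sum(U32[0, k] * U21[k, 1] for k in range(2))
print(abs(element - U31[0, 1]))  # ≈ 0
```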

So now you appreciate why we try to simplify our notation as much as we can! But let me get back to the lesson. To explain the Hamiltonian, which we need to describe how states change over time, Feynman embarks on a rather spectacular differential analysis. Now, we’ve done such exercises before, so don’t be too afraid. He substitutes t1 for t tout court, and t2 for t + Δt, with Δt the infinitesimal you know from Δy = (dy/dx)·Δx, with the derivative dy/dx being defined as the Δy/Δx ratio for Δx → 0. So we write U(t2, t1) = U(t + Δt, t). Now, we also explained the idea of an operator in our previous post. It came up when we were being creative, and so we dropped the 〈 χ | state from the 〈 χ | A | φ 〉 expression and just wrote:

If you ‘get’ that, you’ll also understand what I am writing now:

This is quite abstract, however. It is an ‘open’ equation, really: one needs to ‘complete’ it with a ‘bra’, i.e. a state like 〈 χ |, so as to give a 〈 χ | ψ〉 = 〈 χ | A | φ〉 type of amplitude that actually means something. What we’re saying is that our operator (or our ‘apparatus’ if it helps you to think that way) does not mean all that much as long as we don’t measure what comes out, so we have to choose some set of base states, i.e. a representation, which allows us to describe the final state, which we write as 〈 χ |. In fact, what we’re interested in is the following amplitudes:

So now we’re in business, really. 🙂 If we can find those amplitudes, for each of our base states i, we know what’s going on. Of course, we’ll want to express our ψ(t) state in terms of our base states too, so the expression we should be thinking of is:

Phew! That looks rather unwieldy, doesn’t it? You’re right. It does. So let’s simplify. We can do the following substitutions:

• 〈 i | ψ(t + Δt)〉 = Ci(t + Δt) or, more generally, 〈 j | ψ(t)〉 = Cj(t)
• 〈 i | U(t2, t1) | j〉 = Uij(t2, t1) or, more specifically, 〈 i | U(t + Δt, t) | j〉 = Uij(t + Δt, t)

As Feynman notes, that’s what the dynamics of quantum mechanics really look like. But, of course, we do need something in terms of derivatives rather than in terms of differentials. That’s where the Δy = (dy/dx)·Δx equation comes in. The analysis looks kinda dicey because it’s like doing some kind of first-order linear approximation of things – rather than an exact kinda thing – but that’s how it is. Let me remind you of the following formula: if we write our function y as y = f(x), and we’re evaluating the function near some point a, then our Δy = (dy/dx)·Δx equation can be used to write:

y = f(x) ≈ f(a) + f'(a)·(x − a) = f(a) + (dy/dx)·Δx

To remind yourself of how this works, you can complete the drawing below with the actual y = f(x) as opposed to the f(a) + Δy approximation, remembering that the (dy/dx) derivative gives you the slope of the tangent to the curve, but it’s all kids’ stuff really and so we shouldn’t waste too much spacetime on this. 🙂

The point is: our Uij(t + Δt, t) is a function too, not only of time, but also of i and j. It’s just a rather special function, because we know that, for Δt → 0, Uij will be equal to 1 if i = j (in plain language: if Δt goes to zero, nothing happens and we just stay in state i), and equal to 0 if i ≠ j. That’s just as per the definition of our base states. Indeed, remember the first ‘rule’ of quantum math:

〈 i | j 〉 = 〈 j | i 〉 = δij, with δij = δji equal to 1 if i = j, and zero if i ≠ j

So we can write our f(x) ≈ f(a) + (dy/dx)·Δx expression for Uij as:

So Kij is also some kind of derivative, and the Kronecker delta, i.e. δij, serves as the reference point around which we’re evaluating Uij. However, that’s about as far as the comparison goes. We need to remind ourselves that we’re talking complex-valued amplitudes here. In that regard, it’s probably also good to remind ourselves once more that we need to watch the order of stuff: Uij = 〈 i | U | j 〉, so that’s the amplitude to go from base state j to base state i, rather than the other way around. Of course, we have the 〈 χ | φ 〉 = 〈 φ | χ 〉* rule, but we still need to see how that plays out with an expression like 〈 i | U(t + Δt, t) | j 〉. So, in short, we should be careful here!

Having said that, we can actually play a bit with that expression, and so that’s what we’re going to do now. The first thing we’ll do is to write Kij as a function of time indeed:

Kij = Kij(t)

So we don’t have that Δt in the argument. It’s just like dy/dx = f'(x): a derivative is a derivative—a function which we derive from some other function. However, we’ll do something weird now: just like any function, we can multiply or divide it by some constant, so we can write something like G(x) = c·F(x), which is equivalent to saying that F(x) = G(x)/c. I know that sounds silly but it is how it is, and we can also do it with complex-valued functions: we can define some other function by multiplying or dividing by some complex-valued constant, like a + b·i, or ξ, or whatever other constant. Just note that the i here is no longer a base state but the imaginary unit. So it’s all done so as to confuse you even more. 🙂

So let’s take −i/ħ as our constant and re-write our Kij(t) function as −i/ħ times some other function, which we’ll denote by Hij(t), so Kij(t) = −(i/ħ)·Hij(t). You guessed it, of course: Hij(t) is the infamous Hamiltonian, and it’s written the way it’s written both for historical as well as for practical reasons, which you’ll soon discover. Of course, we’re talking one coefficient only: we’ll have nine if we have three base states i and j, or four if we have only two. So we’ve got an n-by-n matrix once more. As for its name… Well… As Feynman notes: “How Hamilton, who worked in the 1830s, got his name on a quantum mechanical matrix is a tale of history. It would be much better called the energy matrix, for reasons that will become apparent as we work with it.”

OK. So we’ll just have to acknowledge that and move on. Our Uij(t + Δt, t) = δij + Kij(t)·Δt expression becomes:

Uij(t + Δt, t) = δij − (i/ħ)·Hij(t)·Δt

[Isn’t it great you actually start to understand those Chinese-looking formulas? :-)] We’re not there yet, however. In fact, we’ve still got quite a bit of ground to cover. We now need to take that other monster:

So let’s substitute now, so we get:

We can get this in the form we want – i.e. the form you’ll find in textbooks 🙂 – by noting that the ∑δij·Cj(t) sum, taken over all j, is, quite simply, equal to Ci(t). [Think about the indexes here: we’re looking at some specific i, and so it’s only the j that takes on whatever value it can possibly have.] So we can move that term to the other side, which gives us Ci(t + Δt) − Ci(t). We can then divide both sides of our expression by Δt, which gives us an expression like [f(x + Δx) − f(x)]/Δx = Δy/Δx, which is actually the definition of the derivative for Δx going to zero. That allows us to re-write the whole thing in terms of a proper derivative, rather than having to work with this rather unwieldy differential stuff. So, if we substitute d[Ci(t)]/dt for [Ci(t + Δt) − Ci(t)]/Δt, and then also move −(i/ħ) to the left-hand side, remembering that 1/i = −i (and, hence, [−(i/ħ)]−1 = i·ħ), we get the formula in the shape we wanted it in:

Done! Of course, this is a set of differential equations and… Well… Yes. Yet another set of differential equations. 🙂 It seems like we can’t solve anything in physics without involving differential equations, can we? But… Well… I guess that’s the way it is. So, before we turn to some example, let’s note a few things.
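To see that the differential form and the Uij(t + Δt, t) = δij − (i/ħ)·Hij(t)·Δt form say the same thing, here is a minimal numerical sketch (illustrative H and state, ħ = 1): one infinitesimal U-step implies exactly the difference quotient that the derivative formula asserts.

```python
import numpy as np

hbar, dt = 1.0, 1e-4
H = np.array([[1.0, -0.5], [-0.5, 1.0]], dtype=complex)   # illustrative energy matrix
C = np.array([0.6, 0.8], dtype=complex)                   # some normalized state

# One infinitesimal step: C_i(t + Δt) = Σ_j U_ij·C_j(t), with U_ij = δ_ij − (i/ħ)·H_ij·Δt
U = np.eye(2) - 1j / hbar * H * dt
C_step = U @ C

# The implied difference quotient (C(t+Δt) − C(t))/Δt matches −(i/ħ)·H·C,
# i.e. the differential equation iħ·dC_i/dt = Σ_j H_ij·C_j
lhs = (C_step - C) / dt
rhs = -1j / hbar * H @ C
print(np.abs(lhs - rhs).max())  # ≈ 0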

First, we know that a particle, or a system, must be in some state at any point in time. That’s equivalent to stating that the sum of the probabilities |Ci(t)|² = |〈 i | ψ(t)〉|² is some constant. In fact, we’d like to say it’s equal to one, but then we haven’t normalized anything here. You can fiddle with the formulas but it’s probably easier to just acknowledge that, if we’d measure anything – think of the angular momentum along the z-direction, or some other direction, if you’d want an example – then we’ll find it’s either ‘up’ or ‘down’ for a spin-1/2 particle, or ‘plus’, ‘zero’, or ‘minus’ for a spin-1 particle.

Now, we know that the complex conjugate of a sum is equal to the sum of the complex conjugates: [∑ zi]* = ∑ zi*, and that the complex conjugate of a product is the product of the complex conjugates, so we have [zi·zj]* = zi*·zj*. Now, some fiddling with the formulas above should allow you to prove that Hij = Hji*, i.e. the Hamiltonian matrix is equal to its own conjugate transpose, which is also referred to as the Hermitian transpose. If the original Hamiltonian matrix is denoted as H, then its conjugate transpose will be denoted by H*, H† or even HH (so the H in the superscript stands for Hermitian, instead of Hamiltonian). So… Yes. There are competing notations around. 🙂
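That Hermitian condition is exactly what keeps the total probability constant. A sketch with an illustrative Hermitian H:

```python
import numpy as np

hbar = 1.0
# An illustrative Hermitian Hamiltonian: H[i][j] equals the complex conjugate of H[j][i]
H = np.array([[1.0, -0.5 + 0.2j],
              [-0.5 - 0.2j, 2.0]])
C = np.array([0.6, 0.8j])          # some normalized state: |0.6|² + |0.8i|² = 1

# The equation of motion gives dC/dt = −(i/ħ)·H·C ...
dCdt = -1j / hbar * H @ C

# ... and d/dt Σ|C_i|² = 2·Re(C†·dC/dt), which vanishes precisely because
# C†·H·C is real for a Hermitian H
d_norm = 2 * np.real(np.vdot(C, dCdt))
print(d_norm)  # ≈ 0
```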

The simplest situation, of course, is when the Hamiltonian coefficients do not depend on time. In that case, we’re back in the static case, and all Hij coefficients are just constants. For a system with two base states, we’d have the following set of equations:

This set of two equations can be easily solved by remembering the solution for one equation only. Indeed, if we assume there’s only one base state – which is like saying: the particle is at rest somewhere (yes: it’s that stupid!) – our set of equations reduces to only one:

This is a differential equation which is easily solved to give:

[As for being ‘easily solved’, just remember the exponential function is its own derivative and, therefore, d[a·e−(i/ħ)·H11·t]/dt = a·d[e−(i/ħ)·H11·t]/dt = −a·(i/ħ)·H11·e−(i/ħ)·H11·t, which gives you the differential equation back, so… Well… That’s the solution.]

This should, of course, remind you of the equation that inspired Louis de Broglie to write down his now famous matter-wave equation (see my post on the basics of quantum math):

a·e−i·θ = a·e−i·(ω·t − k∙x) = a·e−(i/ħ)·(E·t − p∙x)

Indeed, if we look at the temporal variation of this function only – so we don’t consider the space variable x – then this equation reduces to a·e−(i/ħ)·E·t, and so we find that our Hamiltonian coefficient H11 is equal to the energy of our particle, so we write: H11 = E, which, of course, explains why Feynman thinks the Hamiltonian matrix should be referred to as the energy matrix. As he puts it: “The Hamiltonian is the generalization of the energy for more complex situations.”

Now, I’ll conclude this post by giving you the answer to Feynman’s remark on why the Irish 19th century mathematician William Rowan Hamilton should be associated with the Hamiltonian. The truth is: the term ‘Hamiltonian matrix’ may also refer to a more general notion. Let me copy Wikipedia here: “In mathematics, a Hamiltonian matrix is a 2n-by-2n matrix A such that JA is symmetric, where J is the skew-symmetric matrix

$J= \begin{bmatrix} 0 & I_n \\ -I_n & 0 \\ \end{bmatrix}$

and In is the n-by-n identity matrix. In other words, A is Hamiltonian if and only if (JA)T = JA, where ()T denotes the transpose.” So… That’s the answer. 🙂 And there’s another reason too: Hamilton invented the quaternions and… Well… I’ll leave it to you to check out what these have got to do with quantum physics. 🙂

[…] Oh! And what about the maser example? Well… I am a bit tired now, so I’ll just refer you to Feynman’s exposé on it. It’s not that difficult if you understood all of the above. In fact, it’s actually quite straightforward, and so I really recommend you work your way through the example, as it will give you a much better ‘feel’ for the quantum-mechanical framework we’ve developed so far. In fact, walking through the whole thing is like a kind of ‘reward’ for having worked so hard on the more abstract stuff in this and my previous posts. So… Yes. Just go for it! 🙂 [And, just in case you don’t want to go for it, I did write a little introduction to it in the following post. :-)]

# Quantum math: states as vectors, and apparatuses as operators

I actually wanted to write about the Hamiltonian matrix. However, I realize that, before I can serve the plat de résistance, we need to review or introduce some more concepts and ideas. It all revolves around the same theme: working with states is like working with vectors, but so you need to know how exactly. Let’s go for it. 🙂

In my previous posts, I repeatedly said that a set of base states is like a coordinate system. A coordinate system allows us to describe (i.e. uniquely identify) vectors in an n-dimensional space: we associate a vector with a set of real numbers, like x, y and z, for example. Likewise, we can describe any state in terms of a set of complex numbers – amplitudes, really – once we’ve chosen a set of base states. We referred to this set of base states as a ‘representation’. For example, if our set of base states is +S, 0S and −S, then any state φ can be defined by the amplitudes C+ = 〈 +S | φ 〉, C0 = 〈 0S | φ 〉, and C− = 〈 −S | φ 〉.

We have to choose some representation (but we are free to choose which one) because, as I demonstrated when doing a practical example (see my description of muon decay in my post on how to work with amplitudes), we’ll usually want to calculate something like the amplitude to go from one state to another – which we denoted as 〈 χ | φ 〉 – and we’ll do that by breaking it up. To be precise, we’ll write that amplitude 〈 χ | φ 〉 – i.e. the amplitude to go from state φ to state χ (you have to read this thing from right to left, like Hebrew or Arabic) – as the following sum:

So that’s a sum over a complete set of base states (that’s why I write all i under the summation symbol ∑). We discussed this rule in our presentation of the ‘Laws’ of quantum math.

Now we can play with this. As χ can be defined in terms of the chosen set of base states too, it’s handy to know that 〈 χ | i 〉 and 〈 i | χ 〉 are each other’s complex conjugates – we write this as: 〈 χ | i 〉 = 〈 i | χ 〉* – so if we have one, we have the other (we can also write: 〈 i | χ 〉* = 〈 χ | i 〉). In other words, if we have all Ci = 〈 i | φ 〉 and all Di = 〈 i | χ 〉, i.e. the ‘components’ of both states in terms of our base states, then we can calculate 〈 χ | φ 〉 as:

〈 χ | φ 〉 = ∑ Di*Ci = ∑〈 χ | i 〉〈 i | φ 〉,

provided we make sure we do the summation over a complete set of base states. For example, if we’re looking at the angular momentum of a spin-1/2 particle, like an electron or a proton, then we’ll have two base states, +ħ/2 and −ħ/2, so then we’ll have only two terms in our sum, but the spin number (j) of a cobalt nucleus is 7/2, so if we’d be looking at the angular momentum of a cobalt nucleus, we’ll have eight (2·j + 1) base states and, hence, eight terms when doing the sum. So it’s very much like working with vectors, indeed, and that’s why states are often referred to as state vectors. So now you know that term too. 🙂
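To make this concrete, here is a minimal numerical sketch – the amplitudes below are made-up numbers, chosen only so that the probabilities add up to one – of the 〈 χ | φ 〉 = ∑ Di*Ci sum, and of the fact that reversing the order gives the complex conjugate:

```python
import math

# Hypothetical amplitudes ('coordinates') of two states φ and χ in the
# +S, 0S, −S representation: C_i = ⟨ i | φ ⟩ and D_i = ⟨ i | χ ⟩.
C = [0.5 + 0.5j, 0.5j, 0.5]                   # ⟨ i | φ ⟩, |C_i|² sums to 1
D = [1 / math.sqrt(2), 0, 1j / math.sqrt(2)]  # ⟨ i | χ ⟩, |D_i|² sums to 1

def braket(D, C):
    # ⟨ χ | φ ⟩ = Σ_i ⟨ χ | i ⟩⟨ i | φ ⟩ = Σ_i D_i* · C_i
    return sum(d.conjugate() * c for d, c in zip(D, C))

chi_phi = braket(D, C)   # ⟨ χ | φ ⟩
phi_chi = braket(C, D)   # ⟨ φ | χ ⟩: same numbers, opposite order

# The two amplitudes are each other's complex conjugates, not equal:
assert abs(chi_phi - phi_chi.conjugate()) < 1e-12
```

Note how the order of the arguments matters: that is the non-commutativity we come back to below.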

However, the similarities run even deeper, and we’ll explore all of them in this post. You may or may not remember that your math teacher actually also defined ordinary vectors in three-dimensional space in terms of base vectors ei, defined as: e1 = [1, 0, 0], e2 = [0, 1, 0] and e3 = [0, 0, 1]. You may also remember that the units along the x-, y- and z-axis didn’t have to be the same – we could, for example, measure in cm along the x-axis, but in inches along the z-axis, even if that’s not very convenient to calculate stuff – but that it was very important to ensure that the base vectors were a set of orthogonal vectors. In any case, we’d choose our set of orthogonal base vectors and write all of our vectors as:

A = Ax·e1 + Ay·e2 + Az·e3

That’s simple enough. In fact, one might say that the equation above actually defines coordinates. However, there’s another way of defining them. We can write Ax, Ay, and Az as vector dot products, aka scalar vector products (as opposed to cross products, or vector products tout court). Check it:

Ax = A·e1, Ay = A·e2, and Az = A·e3.

This actually allows us to re-write the vector dot product A·B in a way you probably haven’t seen before. Indeed, you’d usually calculate A·B as |A|∙|B|·cosθ = A∙B·cosθ (A and B are the magnitudes of the vectors A and B respectively) or, quite simply, as AxBx + AyBy + AzBz. However, using the dot products above, we can now also write it as:

B·A = ∑ (B·ei)(ei·A)

We deliberately wrote B·A instead of A·B because, while the mathematical similarity with the

〈 χ | φ 〉 = ∑〈 χ | i 〉〈 i | φ 〉

equation is obvious, B·A = A·B but 〈 χ | φ 〉 ≠ 〈 φ | χ 〉. Indeed, 〈 χ | φ 〉 and 〈 φ | χ 〉 are complex conjugates – so 〈 χ | φ 〉 = 〈 φ | χ 〉* – but they’re not equal. So we’ll have to watch the order when working with those amplitudes. That’s because we’re working with complex numbers instead of real numbers. Indeed, it’s only because the A·B dot product involves real numbers, whose complex conjugate is the same, that we have that commutativity in the real vector space. Apart from that – so apart from having to carefully check the order of our products – the correspondence is complete.

Let me mention another similarity here. As mentioned above, our base vectors ei had to be orthogonal. We can write this condition as:

ei·ej = δij, with δij = 0 if i ≠ j, and 1 if i = j.

Now, our first quantum-mechanical rule says the same:

〈 i | j 〉 = δij, with δij = 0 if i ≠ j, and 1 if i = j.

So our set of base states also has to be ‘orthogonal’, which is the term you’ll find in physics textbooks, although – as evidenced from our discussion on the base states for measuring angular momentum – one should not try to give any geometrical interpretation here: +ħ/2 and −ħ/2 (so that’s spin ‘up’ and ‘down’ respectively) are not ‘orthogonal’ in any geometric sense, indeed. It’s just that pure states, i.e. base states, are separate, which we write as: 〈 ‘up’ | ‘down’ 〉 = 〈 ‘down’ | ‘up’ 〉 = 0 and 〈 ‘up’ | ‘up’ 〉 = 〈 ‘down’ | ‘down’ 〉 = 1. It just means they are different base states, so it’s one or the other. For our +S, 0S and −S example, we’d have nine such amplitudes, and we can organize them in a little matrix – it’s just the identity matrix:

〈 +S | +S 〉 = 1, 〈 +S | 0S 〉 = 0, 〈 +S | −S 〉 = 0
〈 0S | +S 〉 = 0, 〈 0S | 0S 〉 = 1, 〈 0S | −S 〉 = 0
〈 −S | +S 〉 = 0, 〈 −S | 0S 〉 = 0, 〈 −S | −S 〉 = 1
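A quick sketch in code: if we represent the three base states as unit vectors, the nine 〈 i | j 〉 amplitudes come out as the Kronecker delta, i.e. the identity matrix:

```python
# The three base states +S, 0S, −S as unit vectors: the nine ⟨ i | j ⟩
# amplitudes then form the Kronecker delta, i.e. the identity matrix.
base = {'+S': [1, 0, 0], '0S': [0, 1, 0], '-S': [0, 0, 1]}

def braket(bra, ket):
    # ⟨ bra | ket ⟩ for real-valued components: a plain dot product
    return sum(b * k for b, k in zip(bra, ket))

table = [[braket(base[i], base[j]) for j in base] for i in base]
assert table == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # ⟨ i | j ⟩ = δ_ij
```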

In fact, just like we defined the base vectors ei as e1 = [1, 0, 0], e2 = [0, 1, 0] and e3 = [0, 0, 1] respectively, we may say that the matrix above, which states exactly the same as the 〈 i | j 〉 = δij rule, can serve as a definition of what base states actually are. [Having said that, it’s obvious we like to believe that base states are more than just mathematical constructs: we’re talking reality here. The angular momentum as measured in the x-, y- or z-direction, or in whatever direction, is more than just a number.]

OK. You get this. In fact, you’re probably getting impatient because this is too simple for you. So let’s take another step. We showed that the 〈 χ | φ 〉 = ∑〈 χ | i 〉〈 i | φ 〉 and B·A = ∑(B·ei)(ei·A) expressions are structurally equivalent – from a mathematical point of view, that is – but B and A are separate vectors, while 〈 χ | φ 〉 is just a complex number. Right?

Well… No. We can actually analyze the bra and the ket in the 〈 χ | φ 〉 bra-ket as separate pieces too. Moreover, we’ll show they are actually state vectors too, even if the bra, i.e. 〈 χ |, and the ket, i.e. | φ 〉, are ‘unfinished pieces’, so to speak. Let’s be bold. Let’s just cut the 〈 χ | φ 〉 = ∑〈 χ | i 〉〈 i | φ 〉 equation in two by writing:

| φ 〉 = ∑ | i 〉〈 i | φ 〉

Huh?

Yes. That’s the power of Dirac’s bra-ket notation: we can just drop symbols left or right. It’s quite incredible. But, of course, the question is: so what does this actually mean? Well… Don’t rack your brain. I’ll tell you. We define | φ 〉 as a state vector because we define | i 〉 as a (base) state vector. Look at it this way: we wrote the 〈 +S | φ 〉, 〈 0S | φ 〉 and 〈 −S | φ 〉 amplitudes as C+, C0, C−, respectively, so we can write the equation above as:

| φ 〉 = ∑ | i 〉 Ci

So we’ve got a sum of products here, and it’s just like A = Ax·e1 + Ay·e2 + Az·e3. Just substitute the Ai coefficients for Ci and the ei base vectors for the | i 〉 base states. We get:

| φ 〉 = |+S〉 C+ + |0S〉 C0 + |−S〉 C−

Of course, you’ll wonder what those terms mean: what does it mean to ‘multiply’ C+ (remember: C+  is some complex number) by |+S〉? Be patient. Just wait. You’ll understand when we do some examples, so when you start working with this stuff. You’ll see it all makes sense—later. 🙂

Of course, we’ll have a similar equation for | χ 〉, and so if we write 〈 i | χ 〉 as Di, then we can write | χ 〉 = ∑ | i 〉〈 i | χ 〉 as | χ 〉 = ∑ | i 〉 Di.

So what? Again: be patient. We know that 〈 χ | i 〉 = 〈 i | χ 〉*, so our second equation above becomes:

〈 χ | = ∑ Dj*〈 j | = D+*〈 +S | + D0*〈 0S | + D−*〈 −S |

You’ll have two questions now. The first is the same as the one above: what does it mean to ‘multiply’, let’s say, D0* (i.e. the complex conjugate of D0, so if D0 = a + ib, then D0* = a − ib) with 〈 0S |? The answer is the same: be patient. 🙂 Your second question is: why do I use another symbol for the index here? Why j instead of i? Well… We’ll have to re-combine stuff, so it’s better to keep things separate by using another symbol for the same index. 🙂

In fact, let’s re-combine stuff right now, in exactly the same way as we took it apart: we just write the two things right next to each other. We get the following:

〈 χ | φ 〉 = ∑j ∑i Dj*〈 j | i 〉Ci = ∑i Di*Ci

What? Is that it? So we went through all of this hocus-pocus just to find the same equation as we started out with?

Yes. I had to take you through this so you get used to juggling all those symbols, because that’s what we’ll do in the next post. Just think about it and give yourself some time. I know you’ve probably never ever handled such an exercise in symbols before – I haven’t, for sure! – but it all makes sense: we cut and paste. It’s all great! 🙂 [Oh… In case you wonder about the transition from the sum involving i and j to the sum involving i only, think about the Kronecker expression: 〈 j | i 〉 = δij, with δij = 0 if i ≠ j, and 1 if i = j, so most of the terms are zero.]

To summarize the whole discussion, note that the expression above is completely analogous with the B·A = BxAx + ByAy + BzAz formula. The only difference is that we’re talking complex numbers here, so we need to watch out. We have to watch the order of stuff, and we can’t use the Di numbers themselves: we have to use their complex conjugates Di*. But, for the rest, we’re all set! 🙂 If we’ve got a set of base states, then we can define any state in terms of a set of ‘coordinates’ or ‘coefficients’ – i.e. the Ci or Di numbers for the φ or χ example above – and we can then calculate the amplitude to go from one state to another as:

〈 χ | φ 〉 = ∑ Di*Ci

In case you’d get confused, just take the original equation:

〈 χ | φ 〉 = ∑ 〈 χ | i 〉〈 i | φ 〉

The two equations are fully equivalent.

[…]

So we just went through all of the shit above so as to show that structural similarity with vector spaces?

Yes. It’s important. You just need to remember that we may have two, three, four, five,… or even an infinite number of base states depending on the situation we’re looking at, and what we’re trying to measure. I am sorry I had to take you through all of this. However, there’s more to come, and so you need this baggage. We’ll take the next step now, and that is to introduce the concept of an operator.

Look at the middle term in that expression above—let me copy it:

〈 χ | φ 〉 = ∑j ∑i Dj*〈 j | i 〉Ci

We’ve got three factors in each term of that double sum (a double sum is a sum involving two indices, which is what we have here: i and j). When we have two indices like that, one thinks of matrices. That’s easy to do here, because we represented that 〈 i | j 〉 = δij equation as a matrix too! To be precise, we presented it as the identity matrix, and a simple substitution allows us to re-write our equation above as a row matrix times the identity matrix times a column matrix:

〈 χ | φ 〉 = [D+* D0* D−*] · I · [C+ C0 C−]ᵀ, with I the 3×3 identity matrix

I must assume you’re shaking your head in disbelief now: we’ve expanded a simple amplitude into a product of three matrices now. Couldn’t we just stick to that sum, i.e. that vector dot product ∑ Di*Ci? What’s next? Well… I am afraid there’s a lot more to come. For starters, we’ll take that idea of ‘putting something in the middle’ to the next level by going back to our Stern-Gerlach filters and whatever other apparatus we can think of. Let’s assume that, instead of some filter S or T, we’ve got something more complex now, which we’ll denote by A. [Don’t confuse it with our vectors: we’re talking an apparatus now, so you should imagine some beam of particles, polarized or not, entering it, going through, and coming out.]
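Here is the same expansion in code, with made-up complex ‘coordinates’: the row-times-identity-times-column product and the plain ∑ Di*Ci sum give the same complex number, as they must:

```python
# Made-up 'coordinates' of φ and χ in some three-state representation:
C = [0.6, 0.8j, 0.0]   # column: C_i = ⟨ i | φ ⟩
D = [0.0, 1j, 0.0]     # D_i = ⟨ i | χ ⟩
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # the ⟨ j | i ⟩ = δ_ji matrix

# row · identity · column, written out as an explicit double sum:
triple = sum(D[j].conjugate() * I[j][i] * C[i]
             for j in range(3) for i in range(3))

# the plain single sum Σ_i D_i*·C_i:
simple = sum(d.conjugate() * c for d, c in zip(D, C))

assert abs(triple - simple) < 1e-12   # identical: the δ_ji collapses j to i
```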

We’ll stick to the symbols we used already, and so we’ll just assume a particle enters into the apparatus in some state φ, and that it comes out in some state χ. Continuing the example of spin-one particles, and assuming our beam has not been filtered – so, using lingo, we’d say it’s unpolarized – we’d say there’s a probability of 1/3 for being either in the ‘plus’, ‘zero’, or ‘minus’ state with respect to whatever representation we’d happen to be working with, and the related amplitudes would be 1/√3. In other words, we’d say that φ is defined by C+ = 〈 +S | φ 〉, C0 = 〈 0S | φ 〉, and C− = 〈 −S | φ 〉, with C+ = C0 = C− = 1/√3. In fact, using that | φ 〉 = |+S〉 C+ + |0S〉 C0 + |−S〉 C− expression we invented above, we’d write: | φ 〉 = (1/√3)|+S〉 + (1/√3)|0S〉 + (1/√3)|−S〉 or, using ‘matrices’—just a row and a column, really:

However, you don’t need to worry about that now. The new big thing is the following expression:

〈 χ | A | φ〉

It looks simple enough: φ to A to χ. Right? Well… Yes and no. The question is: what do you do with this? How would we take its complex conjugate, for example? And if we know how to do that, would it be equal to 〈 φ | A | χ〉?

You guessed it: we’ll have to take it apart, but how? We’ll do this using another fantastic abstraction. Remember how we took Dirac’s 〈 χ | φ 〉 bra-ket apart by writing | φ 〉 = ∑ | i 〉〈 i | φ 〉? We just dropped the 〈 χ left and right in our 〈 χ | φ 〉 = ∑〈 χ | i 〉〈 i | φ 〉 expression. We can go one step further now, and drop the φ 〉 left and right in our | φ 〉 = ∑ | i 〉〈 i | φ 〉 expression. We get the following wonderful thing:

| = ∑ | i 〉〈 i | over all base states i

With characteristic humor, Feynman calls this ‘The Great Law of Quantum Mechanics’ and, frankly, there’s actually more than one grain of truth in this. 🙂
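A small sketch of that ‘Great Law’ in matrix language: summing the outer products | i 〉〈 i | over the three base states gives the identity matrix, which is why we can insert the ∑ | i 〉〈 i | bar anywhere without changing anything:

```python
# Summing the outer products | i ⟩⟨ i | over all three base states gives
# the identity matrix: inserting it in a bra-ket changes nothing.
base = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # |+S⟩, |0S⟩, |−S⟩

def outer(ket, bra):
    # | ket ⟩⟨ bra | as a 3×3 matrix
    return [[k * b for b in bra] for k in ket]

total = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
for i in base:
    prod = outer(i, i)
    total = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(total, prod)]

assert total == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # Σ | i ⟩⟨ i | = identity
```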

Now, if we apply this ‘Great Law’ to our 〈 χ | A | φ 〉 expression – we should apply it twice, actually – we get:

〈 χ | A | φ 〉 = ∑ 〈 χ | i 〉〈 i | A | j 〉〈 j | φ 〉 (summing over all i and j)

As Feynman points out, it’s easy to add another apparatus in series. We just write:

〈 χ | B·A | φ 〉 = ∑ 〈 χ | i 〉〈 i | B | j 〉〈 j | A | k 〉〈 k | φ 〉 (summing over all i, j and k)

Just put a | bar between B and A and apply the same trick. The | bar is really like a factor 1 in multiplication. However, that’s all great fun but it doesn’t solve our problem. Our ‘Great Law’ allows us to sort of ‘resolve’ our apparatus A in terms of base states, as we now have 〈 i | A | j 〉 in the middle, rather than 〈 χ | A | φ〉 but, again, how do we work with that?

Well… The answer will surprise you. Rather than trying to break this thing up, we’ll say that the apparatus A is actually being described, or defined, by the nine 〈 i | A | j 〉 amplitudes. [There are nine for this example, but four only for the example involving spin-1/2 particles, of course.] We’ll call those amplitudes, quite simply, the matrix of amplitudes, and we’ll often denote it by Aij.
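To see how such a matrix of amplitudes works in practice, here’s a sketch with a made-up matrix A (it does not describe any real apparatus): the amplitude 〈 χ | A | φ 〉 is just the double sum ∑ Di*·Aij·Cj, and two apparatuses in series combine as a matrix product:

```python
# Hypothetical matrix of amplitudes A_ij = ⟨ i | A | j ⟩ for some
# imagined apparatus A, in the +S, 0S, −S representation.
A = [[1, 0, 0],
     [0, 0, 1j],
     [0, -1j, 0]]
C = [0, 1, 0]   # ⟨ j | φ ⟩: φ is the pure 0S state
D = [0, 0, 1]   # ⟨ i | χ ⟩: χ is the pure −S state

# ⟨ χ | A | φ ⟩ = Σ_ij ⟨ χ | i ⟩⟨ i | A | j ⟩⟨ j | φ ⟩ = Σ_ij D_i*·A_ij·C_j
amp = sum(D[i].conjugate() * A[i][j] * C[j]
          for i in range(3) for j in range(3))
assert amp == -1j   # just the matrix element ⟨ −S | A | 0S ⟩

def matmul(B, A):
    # two apparatuses in series: ⟨ i | B·A | j ⟩ = Σ_k B_ik·A_kj
    return [[sum(B[i][k] * A[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

B = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]  # another made-up apparatus
BA = matmul(B, A)                      # one matrix for the pair in series
```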

Now, I wanted to talk about operators here. The idea of an operator comes up when we’re creative again, and when we drop the 〈 χ | state from the 〈 χ | A | φ 〉 expression. We write:

| ψ 〉 = A | φ 〉

So now we think of the particle entering the ‘apparatus’ A in the state φ and coming out of A in some state ψ (‘psi’). We can generalize this and think of it as an ‘operator’, which Feynman intuitively defines as follows:

“The symbol A is neither an amplitude, nor a vector; it is a new kind of thing called an operator. It is something which “operates on” a state to produce a new state.”

But… Wait a minute! | ψ 〉 is not the same as 〈 χ |. Why can we do that substitution? We can only do it because the states ψ and χ are related through that other ‘Law’ of quantum math:

〈 χ | ψ 〉 = ∑ 〈 χ | i 〉〈 i | ψ 〉

Combining the two shows our ‘definition’ of an operator is OK. We should just note that it’s an ‘open’ equation until it is completed with a ‘bra’, i.e. a state like 〈 χ |, so as to give the 〈 χ | ψ〉 = 〈 χ | A | φ〉 type of amplitude that actually means something. In practical terms, that means our operator or our apparatus doesn’t mean much as long as we don’t measure what comes out, so then we choose some set of base states, i.e. a representation, which allows us to describe the final state, i.e. 〈 χ |.

[…]

Well… Folks, that’s it. I know this was mighty abstract, but the next posts should bring things back to earth again. I realize it’s only by working examples and doing exercises that one can get some kind of ‘feel’ for this kind of stuff, so that’s what we’ll have to go through now. 🙂

# Quantum math: transformations

We’ve come a very long way. Now we’re ready for the Big Stuff. We’ll look at the rules for transforming amplitudes from one ‘base’ to ‘another’. [In quantum mechanics, however, we’ll talk about a ‘representation’, rather than a ‘base’, as we’ll reserve the latter term for a ‘base’ state.] In addition, we’ll look at how physicists model how amplitudes evolve over time using the so-called Hamiltonian matrix. So let’s go for it.

#### Transformations: how should we think about them?

In my previous post, I presented the following hypothetical set-up: we have an S-filter and a T-filter in series, but with the T-filter tilted at an angle α with respect to the first. In case you forgot: these ‘filters’ are modified Stern-Gerlach apparatuses, designed to split a particle beam according to the angular momentum in the direction of the gradient of the magnetic field, in which we may place masks to filter out one or more states.

The idea is illustrated in the hypothetical example below. The unpolarized beam goes through S, but we have masks blocking all particles with zero or negative spin in the z-direction, i.e. with respect to S. Hence, all particles entering the T-filter are in the +S state. Now, we assume the set-up of the T-filter is such that it filters out all particles with positive or negative spin. Hence, only particles with zero spin go through. So we’ve got something like this:

However, we need to be careful about what we are saying here. The T-apparatus is tilted, so the gradient of the magnetic field is different. To be precise, it’s got the same tilt as the T-filter itself (α). Hence, it will be filtering out all particles with positive or negative spin with respect to T. So, unlike what you might think at first, some fraction of the particles in the +S state will get through the T-filter, and come out in the 0T state. In fact, we know how many, because we have formulas for situations like this. To be precise, in this case, we should apply the following formula:

〈 0T | +S 〉 =  −(1/√2)·sinα

This is a real-valued amplitude. As usual, we get the probability by taking the absolute square, so P = |−(1/√2)·sinα|² = (1/2)·sin²α, which gives us the following graph of P:

The probability varies between 0  (for α = 0 or π) and 1/2 = 0.5 (for α = π/2 or 3π/2). Now, this graph may or may not make sense to you, so you should think about it. You’ll admit it makes sense to find P = 0 for α = 0, but what about the non-zero values?
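Those numbers are easy to check. A short sketch of the probability function P(α) = (1/2)·sin²α at the angles just mentioned:

```python
import math

def P(alpha):
    # P = |⟨ 0T | +S ⟩|² = |−(1/√2)·sin α|² = (1/2)·sin²α
    amplitude = -(1 / math.sqrt(2)) * math.sin(alpha)
    return abs(amplitude) ** 2

assert abs(P(0)) < 1e-12                   # α = 0 → P = 0
assert abs(P(math.pi)) < 1e-12             # α = π → P = 0
assert abs(P(math.pi / 2) - 0.5) < 1e-12   # α = π/2 → P = 1/2, the maximum
```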

Think about what this would mean in classical terms: we’ve got a beam of particles whose angular momentum is ‘up’ in the z-direction. To be precise, this means that Jz = +ħ. [Angular momentum and the quantum of action have the same dimension: the joule·second.] So that’s the maximum value out of the three permitted values, which are +ħ, 0 and –ħ. Note that the particles here must be bosons. So you may think we’re talking photons in practice, but… Well… No. As I’ll explain in a later post, the photon is a spin-one particle but it’s quite particular, because it has no ‘zero spin’-state. Don’t worry about it here – but it’s really quite remarkable. So, instead of thinking of a photon, you should think of some composite matter particle obeying Bose-Einstein statistics. These are not so rare as you may think: all composite matter-particles that contain an even number of fermions – i.e. an even number of elementary spin-1/2 constituents – have integer spin – but… Well… Their spin number is usually zero – not one. So… Well… Feynman’s particle here is somewhat theoretical – but it doesn’t matter. Let’s move on. 🙂

Let’s look at another transformation formula. More in particular, let’s look at the formula we (should) get for 〈 0T | −S 〉 as a function of α. So we change the set-up of the S-filter to ensure all particles entering T have negative spin. The formula is:

〈 0T | −S 〉 =  +(1/√2)·sinα

That gives the same probabilities: |+(1/√2)·sinα|² = (1/2)·sin²α. Adding |〈 0T | +S 〉|² and |〈 0T | −S 〉|² gives us a total probability equal to sin²α, which is equal to 1 if α = π/2 or 3π/2. We may be tempted to interpret this as follows: if a particle is in the +S or −S state before entering the T-apparatus, and the T-apparatus is tilted at an angle α = π/2 or 3π/2 with respect to the S-apparatus, then this particle will come out of the T-apparatus in the 0T-state. No ambiguity here: P = 1.

Is this strange? Well… Let’s think about what it means to tilt the T-apparatus. You’ll have to admit that, if the apparatus is tilted at the angle π/2 or 3π/2, it’s going to measure the angular momentum in the x-direction. [The y-direction is the common axis of both apparatuses here.] So… Well… It’s pretty plausible, isn’t it? If all of the angular momentum is in the positive or negative z-direction, then it’s not going to have any angular momentum in the x-direction, right? And not having any angular momentum in the x-direction effectively corresponds to being in the 0T-state, right?

Oh ! Is it that easy?

Well… No! Not at all! The reasoning above shows how easy it is to be led astray. We forgot to normalize. Remember, if we integrate the probability density function over its domain, i.e. α ∈ [0, 2π], then we have to get one, as all probabilities have to add up to one. The definite integral of (1/2)·sin²α over [0, 2π] is equal to π/2 (the definite integral of the sine or cosine squared over a full cycle is equal to π), so we need to multiply this function by 2/π to get the actual probability density function, i.e. (1/π)·sin²α. It’s got the same shape, obviously, but it gives us maximum probabilities equal to 1/π ≈ 0.32 for α = π/2 or 3π/2, instead of 1/2 = 0.5.
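The integral is easy to verify numerically. The sketch below, a simple midpoint Riemann sum, confirms that (1/2)·sin²α integrates to π/2 over [0, 2π], so dividing by π/2 (i.e. multiplying by 2/π) gives the density (1/π)·sin²α:

```python
import math

def riemann(f, a, b, n=100_000):
    # midpoint Riemann sum: a crude but sufficient numerical integral
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

integral = riemann(lambda a: 0.5 * math.sin(a) ** 2, 0, 2 * math.pi)
assert abs(integral - math.pi / 2) < 1e-6   # ∫ (1/2)·sin²α dα = π/2

# the normalized density (1/π)·sin²α peaks at 1/π ≈ 0.32, at α = π/2
peak = (1 / math.pi) * math.sin(math.pi / 2) ** 2
assert abs(peak - 1 / math.pi) < 1e-12
```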

Likewise, the sin²α function we got when adding |〈 0T | +S 〉|² and |〈 0T | −S 〉|² should also be normalized. One really needs to keep one’s wits about oneself here. What we’re saying here is that we have a particle that is either in the +S or the −S state, so let’s say that the chance is 50/50 to be in either of the two states. We then have these probabilities |〈 0T | +S 〉|² and |〈 0T | −S 〉|², which we calculated as (1/π)·sin²α. So the total combined probability is equal to 0.5·(1/π)·sin²α + 0.5·(1/π)·sin²α = (1/π)·sin²α. So we’re now weighting the two (1/π)·sin²α functions – and it doesn’t matter if the weights are 50/50 or 75/25 or whatever, as long as the two weights add up to one. The bottom line is: we get the same (1/π)·sin²α function for P, and the same maximum probability 1/π ≈ 0.32 for α = π/2 or 3π/2.

So we don’t get unity: P ≠ 1 for α = π/2 or 3π/2. Why not? Think about it. The classical analysis made sense, didn’t it? If the angular momentum is all in the z-direction (or in one of the two z-directions, I should say), then we cannot have any of it in the x-direction, can we? Well… The surprising answer is: yes, we can. The remarkable thing is that, in quantum physics, we actually never have all of the angular momentum in one direction. As I explained in my post on spin and angular momentum, the classical concepts of angular momentum, and the related magnetic moment, have their limits in quantum mechanics. In quantum physics, we find that the magnitude of a vector quantity, like angular momentum, or the related magnetic moment, is generally not equal to the maximum value of the component of that quantity in any direction. The general rule is that the maximum value of any component of J in whatever direction – i.e. +ħ in the example we’re discussing here – is smaller than the magnitude of J – which I calculated in the mentioned post as |J| = J = +√2·ħ ≈ 1.414·ħ. So the maximum component is quite a bit smaller than the magnitude! The upshot is that we cannot associate any precise and unambiguous direction with quantities like the angular momentum J or the magnetic moment μ. So the answer is: the angular momentum can never be all in the z-direction, so we can always have some of it in the x-direction, and so that explains the amplitudes and probabilities we’re having here.

Huh?

Yep. I know. We never seem to get out of this ‘weirdness’, but then that’s what quantum physics is like. Feynman warned us upfront:

“Because atomic behavior is so unlike ordinary experience, it is very difficult to get used to, and it appears peculiar and mysterious to everyone—both to the novice and to the experienced physicist. Even the experts do not understand it the way they would like to, and it is perfectly reasonable that they should not, because all of direct, human experience and of human intuition applies to large objects. We know how large objects will act, but things on a small scale just do not act that way. So we have to learn about them in a sort of abstract or imaginative fashion and not by connection with our direct experience.”

As I see it, quantum physics is about explaining all sorts of weird stuff, like electron interference and tunneling and what have you, so it shouldn’t surprise us that the framework is as weird as the stuff it’s trying to explain. 🙂 So… Well… All we can do is to try to go along with it, isn’t it? And so that’s what we’ll do here. 🙂

#### Transformations: the formulas

We need to distinguish various cases here. The first case is the case explained above: the T-apparatus shares the same y-axis – along which the particles move – but it’s tilted. To be precise, we should say that it’s rotated about the common y-axis by the angle α. That implies we can relate the x′, y′, z′ coordinate system of T to the x, y, z coordinate system of S through the following equations: z′ = z·cosα + x·sinα, x′ = x·cosα − z·sinα, and y′ = y. Then the transformation amplitudes are:

〈 +T | +S 〉 = (1/2)·(1 + cosα), 〈 +T | 0S 〉 = +(1/√2)·sinα, 〈 +T | −S 〉 = (1/2)·(1 − cosα)
〈 0T | +S 〉 = −(1/√2)·sinα, 〈 0T | 0S 〉 = cosα, 〈 0T | −S 〉 = +(1/√2)·sinα
〈 −T | +S 〉 = (1/2)·(1 − cosα), 〈 −T | 0S 〉 = −(1/√2)·sinα, 〈 −T | −S 〉 = (1/2)·(1 + cosα)

We used the formula for 〈 0T | +S 〉 and 〈 0T | −S 〉 above, and you can play with the formulas above by imagining the related set-up of the S and T filters, such as the one below:

If you do your homework (just check what formula and what set-up this corresponds to), you should find the following graph for the amplitude and the probability as a function of α: the graph is zero for α = π, but is non-zero everywhere else. As with the other example, you should think about this. It makes sense—sort of, that is. 🙂
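As a sanity check on those transformation amplitudes – the matrix below uses the sign conventions consistent with the 〈 0T | +S 〉 and 〈 0T | −S 〉 formulas quoted above – each column of squared magnitudes must add up to one: whatever S-state enters, the particle comes out of T in some T-state:

```python
import math

def T_from_S(alpha):
    # ⟨ jT | iS ⟩ for a rotation about the common y-axis by the angle α;
    # rows: +T, 0T, −T – columns: +S, 0S, −S
    c, s = math.cos(alpha), math.sin(alpha)
    r2 = 1 / math.sqrt(2)
    return [[(1 + c) / 2, r2 * s, (1 - c) / 2],
            [-r2 * s, c, r2 * s],
            [(1 - c) / 2, -r2 * s, (1 + c) / 2]]

M = T_from_S(0.7)   # an arbitrary tilt angle
for col in range(3):
    total = sum(abs(M[row][col]) ** 2 for row in range(3))
    assert abs(total - 1) < 1e-12   # probabilities add up to one
```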

OK. Next case. Now we’re going to rotate the T-apparatus around the z-axis by some angle β. To illustrate what we’re doing here, we need to take a ‘top view’ of our apparatus, as shown below, which shows a rotation over 90°. More in general, for any angle β, the coordinate transformation is given by z′ = z, x′ = x·cosβ + y·sinβ, and y′ = y·cosβ − x·sinβ. [So it’s quite similar to case 1: we’re only rotating the thing in a different plane.]

The transformation amplitudes are now given by:

〈 +T | +S 〉 = e+iβ, 〈 0T | 0S 〉 = 1, 〈 −T | −S 〉 = e−iβ, and all other amplitudes are zero.

As you can see, we get complex-valued transformation amplitudes, unlike our first case, which yielded real-valued transformation amplitudes. That’s just the way it is. Nobody says transformation amplitudes have to be real-valued. On the contrary, one would expect them to be complex numbers. 🙂 Having said that, the combined set of transformation formulas is, obviously, rather remarkable. The amplitude to go from the +S state to, say, the 0T state is zero. Also, when our particle has zero spin when coming out of S, it will always have zero spin when and if it goes through T. In fact, the absolute value of those e±iβ functions is also equal to one, so they are also associated with probabilities that are equal to one: |e±iβ|² = 1² = 1. So… Well… Those formulas are simple and weird at the same time, aren’t they? They sure give us plenty of stuff to think about, I’d say.
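A tiny sketch of that z-rotation (assuming the e±iβ phases sit on the + and − diagonal entries): the matrix is diagonal, and the diagonal entries are pure phases, so no probability leaks between base states:

```python
import cmath

def T_from_S_zrot(beta):
    # ⟨ jT | iS ⟩ for a rotation about the z-axis by the angle β;
    # rows: +T, 0T, −T – columns: +S, 0S, −S
    return [[cmath.exp(1j * beta), 0, 0],
            [0, 1, 0],
            [0, 0, cmath.exp(-1j * beta)]]

M = T_from_S_zrot(0.9)
assert abs(abs(M[0][0]) - 1) < 1e-12   # |e^(+iβ)|² = 1: a pure phase
assert abs(abs(M[2][2]) - 1) < 1e-12   # |e^(−iβ)|² = 1
assert M[1][0] == 0                    # ⟨ 0T | +S ⟩ = 0: no mixing
```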

So what’s next? Well… Not all that much. We’re sort of done, really. Indeed, it’s just a property of space that we can get any rotation of T by combining the two rotations above. As I only want to introduce the basic concepts here, I’ll refer you to Feynman for the details of how exactly that’s being done. [He illustrates it for spin-1/2 particles in particular.] I’ll just wrap up here by generalizing our results from base states to any state.

#### Transformations: generalization

We mentioned a couple of times already that the base states are like a particular coordinate system: we will usually describe a state in terms of base states indeed. More in particular, choosing S as our representation, we’ll say:

The state φ is defined by the three numbers:

C+ = 〈 +S | φ 〉,

C0 = 〈 0S | φ 〉,

C− = 〈 −S | φ 〉.

Now, the very same state can, of course, also be described in the ‘T system’, so then our numbers – i.e. the ‘components’ of φ – would be equal to:

C′+ = 〈 +T | φ 〉, C′0 = 〈 0T | φ 〉, and C′− = 〈 −T | φ 〉.

So how can we go from the unprimed ‘coordinates’ to the primed ones? The trick is to use the second of the three quantum math ‘Laws’ which I introduced in my previous post:

[II] 〈 χ | φ 〉 = ∑ 〈 χ | i 〉〈 i | φ 〉

Just replace χ in [II] by +T, 0T and/or –T. More in general, if we denote +T, 0T or –T by jT, we can re-write this ‘Law’ as:

C′j = 〈 jT | φ 〉 = ∑ 〈 jT | iS 〉〈 iS | φ 〉 = ∑ 〈 jT | iS 〉 Ci

So the 〈 jT | iS 〉 amplitudes are those nine transformation amplitudes. Now, we can represent those nine amplitudes in a nice three-by-three matrix and, yes, we’ll call that matrix the transformation matrix. So now you know what that is.
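Putting the transformation matrix to work, here’s a sketch that transforms the ‘coordinates’ of a state from the S- to the T-representation, reusing the y-rotation amplitudes of the first case (sign conventions matching the 〈 0T | +S 〉 and 〈 0T | −S 〉 formulas above); the total probability is preserved, as it should be for any rotation:

```python
import math

def transform(alpha, C):
    # C′_j = ⟨ jT | φ ⟩ = Σ_i ⟨ jT | iS ⟩·C_i, with the ⟨ jT | iS ⟩
    # amplitudes of the y-axis rotation (rows: +T, 0T, −T)
    c, s = math.cos(alpha), math.sin(alpha)
    r2 = 1 / math.sqrt(2)
    M = [[(1 + c) / 2, r2 * s, (1 - c) / 2],
         [-r2 * s, c, r2 * s],
         [(1 - c) / 2, -r2 * s, (1 + c) / 2]]
    return [sum(M[j][i] * C[i] for i in range(3)) for j in range(3)]

C = [1 / math.sqrt(3)] * 3     # the unpolarized-beam example: all 1/√3
C_prime = transform(0.5, C)    # the same state, in the T-representation
assert abs(sum(abs(x) ** 2 for x in C_prime) - 1) < 1e-12
```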

To conclude, I should note that it’s only because we’re talking spin-one particles here that we have three base states and, hence, three ‘components’, which we denoted by C+, C0 and C−. These components transform the way they do when going from one representation to another, and that is very much like what vectors do when we move to a different coordinate system, which is why spin-one particles are often referred to as ‘vector particles’. [I am just mentioning this in case you’d come across the term and wonder why they’re being called that way. Now you know.] In fact, if we have three base states, in respect to whatever representation, and we define some state φ in terms of them, then we can always re-define that state in terms of the following ‘special’ set of components:

The set is ‘special’ because one can show (you can do that yourself by using those transformation laws) that these components transform exactly the same way as x, y, z transform to x′, y′, z′. But I’ll leave it at this.

[…]

Oh… What about the Hamiltonian? Well… I’ll save that for my next posts, as my posts have become longer and longer, and so it’s probably a good idea to separate them out. 🙂

#### Post scriptum: transformations for spin-1/2 particles

You should actually really check out that chapter of Feynman. The transformation matrices for spin-1/2 particles look different because… Well… Because there are only two base states for spin-1/2 particles. It’s a pretty technical chapter, but then spin-1/2 particles are the ones that make up the world. 🙂

# Quantum math: the rules – all of them! :-)

In my previous post, I made no compromise, and used all of the rules one needs to calculate quantum-mechanical stuff:

[I] 〈 i | j 〉 = δij

[II] 〈 χ | φ 〉 = ∑ 〈 χ | i 〉〈 i | φ 〉

[III] 〈 φ | χ 〉 = 〈 χ | φ 〉*

However, I didn’t explain them. These rules look simple enough, but let’s analyze them now. They’re simple and not so simple at the same time, indeed.

[I] The first equation uses the Kronecker delta, which sounds fancy but it’s just a simple shorthand: δij = δji is equal to 1 if i = j, and zero if i ≠ j, with i and j representing base states. Equation (I) basically says that base states are all different. For example, the angular momentum in the x-direction of a spin-1/2 particle – think of an electron or a proton – is either +ħ/2 or −ħ/2, not something in-between, or some mixture. So 〈 +x | +x 〉 = 〈 −x | −x 〉 = 1 and 〈 +x | −x 〉 = 〈 −x | +x 〉 = 0.

We’re talking base states here, of course. Base states are like a coordinate system: we settle on an x-, y- and z-axis, and a unit, and any point is defined in terms of an x-, y– and z-number. It’s the same here, except we’re talking ‘points’ in four-dimensional spacetime. To be precise, we’re talking constructs evolving in spacetime. To be even more precise, we’re talking amplitudes with a temporal as well as a spatial frequency, which we’ll often represent as:

a·e^(−i·θ) = a·e^(−i·(ω·t − k∙x)) = a·e^(−(i/ħ)·(E·t − p∙x))

The coefficient in front (a) is just a normalization constant, ensuring all probabilities add up to one. It may not be a constant, actually: perhaps it just ensures our amplitude stays within some kind of envelope, as illustrated below.

As for the ω = E/ħ and k = p/ħ identities, these are the de Broglie equations for a matter-wave, which the young Comte jotted down as part of his 1924 PhD thesis. He was inspired by the fact that the E·t − p∙x factor is an invariant four-vector product (E·t − p∙x = pμxμ) in relativity theory, and noted the striking similarity with the argument of any wave function in space and time (ω·t − k∙x) and, hence, couldn’t resist equating both. Louis de Broglie was inspired, of course, by the solution to the blackbody radiation problem, which Max Planck and Einstein had convincingly solved by accepting that the ω = E/ħ equation holds for photons. As he wrote it:

“When I conceived the first basic ideas of wave mechanics in 1923–24, I was guided by the aim to perform a real physical synthesis, valid for all particles, of the coexistence of the wave and of the corpuscular aspects that Einstein had introduced for photons in his theory of light quanta in 1905.” (Louis de Broglie, quoted in Wikipedia)

Looking back, you’d of course want the phase of a wavefunction to be some invariant quantity, and the examples we gave in our previous post illustrate how one would expect energy and momentum to impact its temporal and spatial frequency. But I am digressing. Let’s look at the second equation. However, before we move on, note that minus sign in the exponent of our wavefunction: a·e^(−i·θ). The phase turns clockwise. That’s just the way it is. I’ll come back to this.
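The de Broglie relations are easy to put numbers on. Here is a sketch for a (non-relativistic) electron moving at 1% of the speed of light, using CODATA values for the constants:

```python
import math

hbar = 1.054571817e-34   # reduced Planck constant (J·s)
h = 6.62607015e-34       # Planck constant (J·s)
m_e = 9.1093837015e-31   # electron mass (kg)
v = 0.01 * 299792458     # 1% of the speed of light (m/s)

p = m_e * v              # momentum (non-relativistic: p = m·v)
k = p / hbar             # spatial frequency k = p/ħ (rad/m)
wavelength = 2 * math.pi / k   # λ = 2π/k = h/p, the de Broglie wavelength

# of the order of 10⁻¹⁰ m, i.e. the atomic scale
assert 1e-10 < wavelength < 1e-9
```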

[II] The φ and χ symbols do not necessarily represent base states. In fact, Feynman illustrates this law using a variety of examples including both polarized as well as unpolarized beams, or ‘filtered’ as well as ‘unfiltered’ states, as he calls it in the context of the Stern-Gerlach apparatuses he uses to explain what’s going on. Let me summarize his argument here.

I discussed the Stern-Gerlach experiment in my post on spin and angular momentum, but the Wikipedia article on it is very good too. The principle is illustrated below: an inhomogeneous magnetic field – note the direction of the gradient ∇B = (∂B/∂x, ∂B/∂y, ∂B/∂z) – will split a beam of spin-one particles into three beams. [Matter-particles with spin one are rather rare (Lithium-6 is an example), but three states (rather than two only, as we’d have when analyzing spin-1/2 particles, such as electrons or protons) allow for more play in the analysis. 🙂 In any case, the analysis is easily generalized.]

The splitting of the beam is based, of course, on the quantized angular momentum in the z-direction (i.e. the direction of the gradient): its value is either ħ, 0, or −ħ. We’ll denote these base states as +, 0 or −, and we should note they are defined in regard to an apparatus with a specific orientation. If we call this apparatus S, then we can denote these base states as +S, 0S and −S respectively.

The interesting thing in Feynman’s analysis is the imagined modified Stern-Gerlach apparatus, which – I am using Feynman‘s words here 🙂 –  “puts Humpty Dumpty back together.” It looks a bit monstrous, but it’s easy enough to understand. Quoting Feynman once more: “It consists of a sequence of three high-gradient magnets. The first one (on the left) is just the usual Stern-Gerlach magnet and splits the incoming beam of spin-one particles into three separate beams. The second magnet has the same cross section as the first, but is twice as long and the polarity of its magnetic field is opposite the field in magnet 1. The second magnet pushes in the opposite direction on the atomic magnets and bends their paths back toward the axis, as shown in the trajectories drawn in the lower part of the figure. The third magnet is just like the first, and brings the three beams back together again, so that leaves the exit hole along the axis.”

Now, we can use this apparatus as a filter by inserting blocking masks, as illustrated below.

But let’s get back to the lesson. What about the second ‘Law’ of quantum math? Well… You need to be able to imagine all kinds of situations now. The rather simple set-up below is one of them: we’ve got two of these apparatuses in series now, S and T, with T tilted at the angle α with respect to the first.

I know: you’re getting impatient. What about it? Well… We’re finally ready now. Let’s suppose we’ve got three apparatuses in series, with the first and the last one having the very same orientation, and the one in the middle being tilted. We’ll denote them by S, T and S’ respectively. We’ll also use masks: we’ll block the 0 and − state in the S-filter, like in that illustration above. In addition, we’ll block the + and − state in the T apparatus and, finally, the 0 and − state in the S’ apparatus. Now try to imagine what happens: how many particles will get through?

[…]

Just try to think about it. Make some drawing or something. Please!

[…]

OK… The answer is shown below. Despite the filtering in S, the +S particles that come out do have an amplitude to go through the 0T-filter, and so the number of atoms that come out will be some fraction (α) of the number of atoms (N) that came out of the +S-filter. Likewise, some other fraction (β) will make it through the +S’-filter, so we end up with βαN particles.

Now, I am sure that, if you’d tried to guess the answer yourself, you’d have said zero rather than βαN but, thinking about it, it makes sense: it’s not because we’ve got some angular momentum in one direction that we have none in the other. When everything is said and done, we’re talking components of the total angular momentum here, aren’t we? Well… Yes and no. Let’s remove the masks from T. What do we get?

[…]

Come on: what’s your guess? N?

[…] You’re right. It’s N. Perfect. It’s what’s shown below.

Now, that should boost your confidence. Let’s try the next scenario. We block the 0 and − state in the S-filter once again, and the + and − state in the T apparatus, so the first two apparatuses are the same as in our first example. But let’s change the S’ apparatus: let’s close the + and − state there now. Now try to imagine what happens: how many particles will get through?

[…]

Come on! You think it’s a trap, don’t you? It’s not. It’s perfectly similar: we’ve got some other fraction here, which we’ll write as γαN, as shown below.

Next scenario: S has the 0 and − gate closed once more, and T is fully open, so it has no masks. But, this time, we set S’ so it selects the 0-state (with respect to its own orientation, that is). What do we get? Come on! Think! Please!

[…]

The answer is zero, as shown below.

Does that make sense to you? Yes? Great! Because many think it’s weird: they think the T apparatus must ‘re-orient’ the angular momentum of the particles. It doesn’t: if the filter is wide open, then “no information is lost”, as Feynman puts it. Still… Have a look at it. It looks like we’re opening ‘more channels’ in the last example: the S and S’ filter are the same, indeed, and T is fully open, while it selected for 0-state particles before. But no particles come through now, while with the 0-channel, we had γαN.
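This zero-versus-γαN contrast can be checked numerically. The sketch below uses the standard rotation amplitudes for a spin-one system (the Wigner d-matrix for j = 1) as the 〈 jT | iS 〉 amplitudes of a T apparatus tilted by some arbitrary angle; everything else (the angle value, the variable names) is just an illustrative choice, not anything from Feynman's text:

```python
import numpy as np

def d1(alpha):
    """Spin-1 rotation amplitudes <jT|iS> for a T apparatus tilted by alpha:
    the standard Wigner d-matrix for j = 1 (rows/columns ordered +, 0, -)."""
    c, s = np.cos(alpha), np.sin(alpha)
    return np.array([
        [(1 + c) / 2, -s / np.sqrt(2), (1 - c) / 2],
        [ s / np.sqrt(2),           c, -s / np.sqrt(2)],
        [(1 - c) / 2,  s / np.sqrt(2), (1 + c) / 2],
    ])

alpha = 0.4       # tilt angle of T relative to S (arbitrary choice)
U = d1(alpha)     # U[i, j] plays the role of <iT|jS>

# Wide-open T filter: amplitude <0S'|+S> = sum_i <0S'|iT><iT|+S>.
# S' has the same orientation as S, so <0S'|iT> = <iT|0S>* = U[i, 1]*.
amp_open = sum(U[i, 1].conjugate() * U[i, 0] for i in range(3))

# T filtering the 0-state only: amplitude <0S'|0T><0T|+S>.
amp_masked = U[1, 1].conjugate() * U[1, 0]

print(abs(amp_open) ** 2)    # ~0: base states are orthogonal, <0S|+S> = 0
print(abs(amp_masked) ** 2)  # non-zero: inserting the mask changes the outcome
```

The wide-open amplitude vanishes by unitarity (the columns of the amplitude matrix are orthonormal), which is exactly the “no information is lost” statement: the open filter really does put Humpty Dumpty back together.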

Hmm… It actually is kinda weird, wouldn’t you agree? Sorry I had to talk about this, but it will make you appreciate that second ‘Law’ now: we can always insert a ‘wide-open’ filter and, hence, split the beams into a complete set of base states − with respect to the filter, that is − and bring them back together provided our filter does not produce any unequal disturbances on the three beams. In short, the passage through the wide-open filter should not result in a change of the amplitudes. Again, as Feynman puts it: the wide-open filter should really put Humpty Dumpty back together again. If it does, we can effectively apply our ‘Law’:

For an example, I’ll refer you to my previous post. This brings me to the third and final ‘Law’.

[III] The amplitude to go from state φ to state χ is the complex conjugate of the amplitude to go from state χ to state φ:

〈 χ | φ 〉 = 〈 φ | χ 〉*

This is probably the weirdest ‘Law’ of all, even if I should say, straight from the start, we can actually derive it from the second ‘Law’, and the fact that all probabilities have to add up to one. Indeed, a probability is the absolute square of an amplitude and, as we know, the absolute square of a complex number is also equal to the product of itself and its complex conjugate:

|z|2 = |z|·|z| = z·z*

[You should go through the trouble of reviewing the difference between the square and the absolute square of a complex number. Just write z as a + ib and calculate z2 = (a + ib)2 = a2 − b2 + 2ab·i, as opposed to |z|2 = a2 + b2. Also check what it means when writing z as r·eiθ = r·(cosθ + i·sinθ).]
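If you want to check the difference numerically, Python’s built-in complex numbers make it a one-minute exercise (the value of z is, of course, an arbitrary choice):

```python
import cmath

z = 3 + 4j
print(z * z)                     # z² = a² − b² + 2ab·i = (-7+24j): a complex number
print(abs(z) ** 2)               # |z|² = a² + b² = 25.0: a real number
print((z * z.conjugate()).real)  # z·z* = 25.0: the same real number

# In polar form z = r·e^{iθ}: z² = r²·e^{i2θ}, while |z|² is just r².
r, theta = abs(z), cmath.phase(z)
print(abs(r * cmath.exp(1j * theta) - z) < 1e-12)  # True: same number, polar form
```

Note that z² and |z|² only coincide when z happens to be real: the square keeps the phase information (doubled), while the absolute square throws it away.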

Let’s apply the probability rule to a two-filter set-up, i.e. the situation with the S and the tilted T filter which we described above, and let’s assume we’ve got a pure beam of +S particles entering the wide-open T filter, so our particles can come out in either of the three base states with respect to T. We can then write:

|〈 +T | +S 〉|2 + |〈 0T | +S 〉|2 + |〈 −T | +S 〉|2 = 1

⇔ 〈 +T | +S 〉〈 +T | +S 〉* + 〈 0T | +S 〉〈 0T | +S 〉* + 〈 −T | +S 〉〈 −T | +S 〉* = 1

Of course, we’ve got two other such equations if we start with a 0S or a −S state. Now, we take the 〈 χ | φ 〉 = ∑ 〈 χ | i 〉〈 i | φ 〉 ‘Law’, and substitute χ and φ for +S, and all states for the base states with regard to T. We get:

〈 +S | +S 〉 = 1 = 〈 +S | +T 〉〈 +T | +S 〉 + 〈 +S | 0T 〉〈 0T | +S 〉 + 〈 +S | –T 〉〈 −T | +S 〉

These equations are consistent only if:

〈 +S | +T 〉 = 〈 +T | +S 〉*,

〈 +S | 0T 〉 = 〈 0T | +S 〉*,

〈 +S | −T 〉 = 〈 −T | +S 〉*,

which is what we wanted to prove. One can then generalize to any states φ and χ. However, proving the result is one thing. Understanding it is something else. One can write down a number of strange consequences, which all point to Feynman‘s rather enigmatic comment on this ‘Law’: “If this Law were not true, probability would not be ‘conserved’, and particles would get ‘lost’.” So what does that mean? Well… You may want to think about the following, perhaps. It’s obvious that we can write:

|〈 φ | χ 〉|2 = 〈 φ | χ 〉〈 φ | χ 〉* = 〈 χ | φ 〉*〈 χ | φ 〉 = |〈 χ | φ 〉|2

This says that the probability to go from the φ-state to the χ-state  is the same as the probability to go from the χ-state to the φ-state.
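A quick numerical illustration: if we collect the amplitudes between two complete sets of base states in a matrix, the third ‘Law’ says the ‘reverse’ amplitudes form its conjugate transpose, and the probabilities then come out the same both ways. The sketch below simply builds a random unitary matrix to play the role of that amplitude matrix (the seed and the dimension are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
# A random unitary matrix: its entries play the role of the amplitudes
# <iT|jS> between two complete sets of three base states each.
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
U, _ = np.linalg.qr(A)   # QR factorization of a full-rank matrix yields a unitary U

# Law III: the amplitude to go the other way is the complex conjugate,
# so the 'reverse' amplitude matrix is the conjugate transpose U†.
U_reverse = U.conj().T

# Probabilities are equal both ways: |<chi|phi>|² = |<phi|chi>|².
print(np.allclose(np.abs(U) ** 2, np.abs(U_reverse.T) ** 2))  # True

# And the probabilities out of any given state add up to one
# (sum down each column of |U|²): nothing gets 'lost'.
print(np.allclose((np.abs(U) ** 2).sum(axis=0), 1.0))         # True
```

The second check is Feynman’s point about conservation: unitarity of the amplitude matrix is exactly what keeps the total probability equal to one.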

Now, when we’re talking base states, that’s rather obvious, because the probabilities involved are either 0 or 1. However, if we substitute +S and −T, or some more complicated states, then it’s a different thing. My gut instinct tells me this third ‘Law’ – which, as mentioned, can be derived from the other ‘Laws’ – reflects the principle of reversibility in spacetime, which you may also interpret as a causality principle, in the sense that, in theory at least (i.e. not thinking about entropy and/or statistical mechanics), we can reverse what’s happening: we can go back in spacetime.

In this regard, we should also remember that the complex conjugate of a complex number in polar form, i.e. a complex number written as r·eiθ, is equal to r·e−iθ, so the argument in the exponent gets a minus sign. Think about what this means for our a·e−i·θ = a·e−i·(ω·t − k∙x) = a·e−(i/ħ)·(E·t − p∙x) function. Taking the complex conjugate of this function amounts to reversing the direction of t and x which, once again, evokes that idea of going back in spacetime.

I feel there’s some more fundamental principle here at work, on which I’ll try to reflect a bit more. Perhaps we can also do something with that relationship between the multiplicative inverse of a complex number and its complex conjugate, i.e. z−1 = z*/|z|2. I’ll check it out. As for now, however, I’ll leave you to do that, and please let me know if you’ve got any inspirational ideas on this. 🙂

So… Well… Goodbye as for now. I’ll probably talk about the Hamiltonian in my next post. I think we really did a good job in laying the groundwork for the really hardcore stuff, so let’s go for that now. 🙂

Post Scriptum: On the Uncertainty Principle and other rules

After writing all of the above, I realized I should add some remarks to make this post somewhat more readable. First thing: not all of the rules are there—obviously! Most notably, I didn’t say anything about the rules for adding or multiplying amplitudes, but that’s because I wrote extensively about that already, and so I assume you’re familiar with that. [If not, see my page on the essentials.]

Second, I didn’t talk about the Uncertainty Principle. That’s because I didn’t have to. In fact, we don’t need it here. In general, all popular accounts of quantum mechanics have an excessive focus on the position and momentum of a particle, while the approach in this and my previous post is quite different. Of course, it’s Feynman’s approach to QM really. Not ‘mine’. 🙂 All of the examples and all of the theory he presents in his introductory chapters in the Third Volume of Lectures, i.e. the volume on QM, are related to things like:

• What is the amplitude for a particle to go from spin state +S to spin state −T?
• What is the amplitude for a particle to be scattered, by a crystal, or from some collision with another particle, in the θ direction?
• What is the amplitude for two identical particles to be scattered in the same direction?
• What is the amplitude for an atom to absorb or emit a photon? [See, for example, Feynman’s approach to the blackbody radiation problem.]
• What is the amplitude to go from one place to another?

In short, you read Feynman, and it’s only at the very end of his exposé, that he starts talking about the things popular books start with, such as the amplitude of a particle to be at point (x, t) in spacetime, or the Schrödinger equation, which describes the orbital of an electron in an atom. That’s where the Uncertainty Principle comes in and, hence, one can really avoid it for quite a while. In fact, one should avoid it for quite a while, because it’s now become clear to me that simply presenting the Uncertainty Principle doesn’t help all that much to truly understand quantum mechanics.

Truly understanding quantum mechanics involves understanding all of these weird rules above. To some extent, that involves dissociating the idea of the wavefunction with our conventional ideas of time and position. From the questions above, it should be obvious that ‘the’ wavefunction does actually not exist: we’ve got a wavefunction for anything we can and possibly want to measure. That brings us to the question of the base states: what are they?

Feynman addresses this question in a rather verbose section of his Lectures titled: What are the base states of the world? I won’t copy it here, but I strongly recommend you have a look at it. 🙂

I’ll end here with a final equation that we’ll need frequently: the amplitude for a particle to go from one place (r1) to another (r2). It’s referred to as a propagator function, for obvious reasons—one of them being that physicists like fancy terminology!—and it looks like this:

The shape of the e(i/ħ)·(p∙r12)/r12 function is now familiar to you. Note the r12 in the argument, i.e. the vector pointing from r1 to r2. The p∙r12 dot product equals |p|∙|r12|·cosθ = p∙r12·cosθ, with θ the angle between p and r12. If p and r12 point in the same direction, then cosθ is equal to 1. If the angle is π/2, then cosθ is 0, and the function reduces to 1/r12. So the angle θ, through the cosθ factor, sort of scales the spatial frequency. Let me try to give you some idea of what this looks like by assuming p and r12 point in the same direction, so we’re looking at the space in the direction of the momentum only and |p|∙|r12|·cosθ = p∙r12. Now, we can look at the p/ħ factor as a scaling factor, and measure the distance x in units defined by that scale, so we write: x = p∙r12/ħ. The function then reduces to (p/ħ)·eix/x = (p/ħ)·cos(x)/x + i·(p/ħ)·sin(x)/x, and we just need to take the absolute square of this to get the probability. All of the graphs are drawn hereunder: I’ll let you analyze them. [Note that the graphs do not include the p/ħ factor, which you may look at as yet another scaling factor.] You’ll see – I hope! – that it all makes perfect sense: the probability quickly drops off with distance, both in the positive as well as in the negative x-direction, while it goes to infinity when very near. [Note that the absolute square, using cos(x)/x and sin(x)/x, yields the same graph as squaring 1/x—obviously!]
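That 1/x2 falloff of the probability can be verified in a few lines; the sample distances below are an arbitrary choice, and the code works in the scaled units x = p∙r12/ħ, dropping the overall scale factor just like the graphs do:

```python
import numpy as np

x = np.linspace(0.5, 10.0, 5)   # scaled distances along the momentum direction

# Propagator, up to the overall scale factor: e^{ix}/x = cos(x)/x + i·sin(x)/x
amp = np.exp(1j * x) / x

# The probability is the absolute square and, since |e^{ix}| = 1,
# it is simply 1/x²: it falls off with the square of the distance.
prob = np.abs(amp) ** 2
print(np.allclose(prob, 1.0 / x ** 2))  # True
```

So the oscillating cos(x)/x and sin(x)/x parts conspire, when squared and added, to give exactly the smooth 1/x2 curve.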

# Working with amplitudes

Don’t worry: I am not going to introduce the Hamiltonian matrix—not yet, that is. But this post is going a step further than my previous ones, in the sense that it will be more abstract. At the same time, I do want to stick to real physical examples so as to illustrate what we’re doing when working with those amplitudes. The example that I am going to use involves spin. So let’s talk about that first.

#### Spin, angular momentum and the magnetic moment

You know spin: it allows experienced pool players to do the most amazing tricks with billiard balls, making a joke of what a so-called elastic collision is actually supposed to look like. So it should not come as a surprise that spin complicates the analysis in quantum mechanics too. We dedicated several posts to that (see, for example, my post on spin and angular momentum in quantum physics) and I won’t repeat these here. Let me just repeat the basics:

1. Classical and quantum-mechanical spin do share similarities: the basic idea driving the quantum-mechanical spin model is that of an electric charge – positive or negative – spinning about its own axis (this is often referred to as intrinsic spin) as well as having some orbital motion (presumably around some other charge, like an electron orbiting a nucleus). This intrinsic spin, and the orbital motion, give our charge some angular momentum (J) and, because it’s an electric charge in motion, there is a magnetic moment (μ). To put things simply: the classical and quantum-mechanical view of things converge in their analysis of atoms or elementary particles as tiny little magnets. Hence, when placed in an external magnetic field, there is some interaction – a force – and their potential and/or kinetic energy changes. The whole system, in fact, acquires extra energy when placed in an external magnetic field.

Note: The formula for that magnetic energy is quite straightforward, both in classical as well as in quantum physics, so I’ll quickly jot it down: U = −μ∙B = −|μ|·|B|·cosθ = −μ·B·cosθ. So it’s just the scalar product of the magnetic moment and the magnetic field vector, with a minus sign in front so as to get the direction right. [θ is the angle between the μ and B vectors and determines whether U as a whole is positive or negative.]
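A minimal numerical sketch of that U = −μ∙B formula, with made-up vectors for the moment and the field (the numbers mean nothing in particular):

```python
import numpy as np

mu = np.array([0.0, 1.0, 1.0])   # hypothetical magnetic moment vector
B = np.array([0.0, 0.0, 2.0])    # hypothetical magnetic field, along z

# U = −μ·B = −|μ||B|cosθ: the energy is lowest when μ is aligned with B
U = -np.dot(mu, B)
print(U)                 # -2.0: partially aligned, so the energy goes down

# Flip the moment: the energy changes sign, as cosθ flips sign
print(-np.dot(-mu, B))   # +2.0: anti-aligned costs energy
```

The sign convention is the whole point: aligned moments lower the energy, anti-aligned moments raise it, which is what makes the tiny magnets want to line up with the field.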

2. The classical and quantum-mechanical view also diverge, however. They diverge, first, because of the quantum nature of spin in quantum mechanics. Indeed, while the angular momentum can take on any value in classical mechanics, that’s not the case in quantum mechanics: in whatever direction we measure, we get a discrete set of values only. For example, the angular momentum of a proton or an electron is either −ħ/2 or +ħ/2, in whatever direction we measure it. Therefore, they are referred to as spin-1/2 particles. All elementary fermions, i.e. the particles that constitute matter (as opposed to force-carrying particles, like photons), have spin 1/2.

Note: Spin-1/2 particles include, remarkably enough, the neutron, which has the same kind of magnetic moment that a rotating negative charge would have. The neutron, in other words, is not exactly ‘neutral’ in the magnetic sense. One can explain this by noting that a neutron is not really ‘elementary’: it consists of three quarks, just like a proton, and, therefore, it may help you to imagine that the electric charges inside are, somehow, distributed unevenly—although physicists hate such simplifications. I am noting this because the famous Stern-Gerlach experiment, which established the quantum nature of particle spin, used silver atoms, rather than protons or electrons. More generally, we’ll tend to forget about the electric charge of the particles we’re describing, assuming, most of the time, or tacitly, that they’re neutral—which helps us to sort of forget about classical theory when doing quantum-mechanical calculations!

3. The quantum nature of spin is related to another crucial difference between the classical and quantum-mechanical view of the angular momentum and the magnetic moment of a particle. Classically, the angular momentum and the magnetic moment can have any direction.

Note: I should probably briefly remind you that J is a so-called axial vector, i.e. a vector product (as opposed to a scalar product) of the radius vector r and the (linear) momentum vector p = m·v, with v the velocity vector, which points in the direction of motion. So we write: J = r×p = r×m·v = |r|·|p|·sinθ·n. The n vector is the unit vector perpendicular to the plane containing r and p (and, hence, v, of course), given by the right-hand rule. I am saying this to remind you that the direction of the magnetic moment and the direction of motion are not the same: the simple illustration below may help to see what I am talking about.
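The right-hand-rule behaviour of J = r×p is easy to check numerically; the mass, position and velocity values below are arbitrary:

```python
import numpy as np

m = 2.0                         # mass (arbitrary units)
r = np.array([1.0, 0.0, 0.0])   # radius vector: along x
v = np.array([0.0, 3.0, 0.0])   # velocity: motion in the y-direction

J = np.cross(r, m * v)          # J = r × p: an axial vector
print(J)                        # [0. 0. 6.]: perpendicular to both r and v

# Reversing the sense of rotation flips J, as the right-hand rule demands:
print(np.cross(r, -m * v))      # [0. 0. -6.]
```

This is exactly why J (and μ with it) flips sign in a mirror, which is the key to the parity discussion of the muon further down.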

Back to quantum mechanics: the image above doesn’t work in quantum mechanics. We do not have an unambiguous direction of the angular momentum and, hence, of the magnetic moment. That’s where all of the weirdness of the quantum-mechanical concept of spin comes out, really. I’ll talk about that when discussing Feynman’s ‘filters’ – which I’ll do in a moment – but here I just want to remind you of the mathematical argument that I presented in the above-mentioned post. Just like in classical mechanics, we’ll have a maximum (and, hence, also a minimum) value for J, like +ħ, 0 and −ħ for a Lithium-6 nucleus. [I am just giving this rather special example of a spin-1 particle so you’re reminded we can have particles with an integer spin number too!] So, when we measure its angular momentum in any direction really, it will take on one of these three values: +ħ, 0 or −ħ. So it’s either/or—nothing in-between. Now that leads to a funny mathematical situation: one would usually equate the maximum value of a quantity like this to the magnitude of the vector, which is equal to the (positive) square root of J2 = J∙J = Jx2 + Jy2 + Jz2, with Jx, Jy and Jz the components of J in the x-, y- and z-direction respectively. But we don’t have continuity in quantum mechanics, and so the concept of a component of a vector needs to be carefully interpreted. There’s nothing definite there, like in classical mechanics: all we have is amplitudes, and all we can do is calculate probabilities, or expected values based on those amplitudes.

Huh? Yes. In fact, the concept of the magnitude of a vector itself becomes rather fuzzy: all we can do really is calculate its expected value. Think of it: in the classical world, we have a J2 = J∙J product that’s independent of the direction of J. For example, if J is all in the x-direction, then Jy and Jz will be zero, and J2 = Jx2. If it’s all in the y-direction, then Jx and Jz will be zero and all of the magnitude of J will be in the y-direction only, so we write: J2 = Jy2. Likewise, if J does not have any z-component, then our J∙J product will only include the x- and y-components: J∙J = Jx2 + Jy2. You get the idea: the J2 = J∙J product is independent of the direction of J exactly because, in classical mechanics, J actually has a precise and unambiguous magnitude and direction and, therefore, actually has a precise and unambiguous component in each direction. So we’d measure Jx, Jy, and Jz and, regardless of the actual direction of J, we’d find its magnitude |J| = J = +√J2 = +(Jx2 + Jy2 + Jz2)1/2.

In quantum mechanics, we just don’t have quantities like that. We say that Jx, Jy and Jz have an amplitude to take on a value that’s equal to +ħ, 0 or −ħ (or whatever other value is allowed by the spin number of the system). Now that we’re talking spin numbers, please note that this characteristic number is usually denoted by j, which is a bit confusing, but so be it. So j can be 0, 1/2, 1, 3/2, etcetera, and the number of ‘permitted’ values is 2j + 1, with each value being separated by an amount equal to ħ. So we have 1, 2, 3, 4, 5, etcetera possible values for Jx, Jy and Jz for j = 0, 1/2, 1, 3/2, 2, etcetera, respectively. But let me get back to the lesson. We just can’t do the same thing in quantum mechanics. For starters, we can’t measure Jx, Jy, and Jz simultaneously: our Stern-Gerlach apparatus has a certain orientation and, hence, measures one component of J only. So what can we do?

Frankly, we can only do some math here. The wave-mechanical approach does allow us to think of the expected value of J2 = J∙J = Jx2 + Jy2 + Jz2, so we write:

E[J2] = E[J∙J] = E[Jx2 + Jy2 + Jz2] = ?

[Feynman’s use of the 〈 and 〉 brackets to denote an expected value is hugely confusing, because these brackets are also used to denote an amplitude. So I’d rather use the more commonly used E[X] notation.] Now, it is a rather remarkable property, but the expected value of the sum of two or more random variables is equal to the sum of the expected values of the variables, even if those variables may not be independent. So we can confidently use the linearity property of the expected value operator and write:

E[Jx2 + Jy2 + Jz2] = E[Jx2] + E[Jy2] + E[Jz2]

Now we need something else. It’s also just part of the quantum-mechanical approach to things and so you’ll just have to accept it. It sounds rather obvious but it’s actually quite deep: if we measure the x-, y- or z-component of the angular momentum of a random particle, then each of the possible values is equally likely to occur. So that means, in our case, that the +ħ, 0 and −ħ values are equally likely, so their likelihood is one in three, i.e. 1/3. Again, that sounds obvious but it’s not. Indeed, please note, once again, that we can’t measure Jx, Jy, and Jz simultaneously, so the ‘or’ in x-, y- or z-component is an exclusive ‘or’. Of course, I must add this equipartition of likelihoods is valid only because we do not have a preferred direction for J: the particles in our beam have random ‘orientations’. Let me give you the lingo for this: we’re looking at an unpolarized beam. You’ll say: so what? Well… Again, think about what we’re doing here: we may or may not assume that the Jx, Jy, and Jz variables are related. In fact, in classical mechanics, they surely are: they’re determined by the magnitude and direction of J. Hence, they are not random at all! But let me continue, so you see what comes out.

Because the +ħ, 0 and −ħ values are equally likely, we can write: E[Jx2] = ħ2/3 + 0/3 + (−ħ)2/3 = [ħ2 + 0 + (−ħ)2]/3 = 2ħ2/3. In case you wonder, that’s just the definition of the expected value operator: E[X] = p1x1 + p2x2 + … = ∑pixi, with pi the likelihood of the possible value xi. So we take a weighted average with the respective probabilities as the weights. However, in this case, with an unpolarized beam, the weighted average becomes a simple average.

Now, E[Jy2] and E[Jz2] are – rather unsurprisingly – also equal to 2ħ2/3, so we find that E[J2] = E[Jx2] + E[Jy2] + E[Jz2] = 3·(2ħ2/3) = 2ħ2 and, therefore, we’d say that the magnitude of the angular momentum is equal to |J| = J = +√2·ħ ≈ 1.414·ħ. Now that value is not equal to the maximum value of our x-, y-, z-component of J, or the component of J in whatever direction we’d want to measure it. That maximum value is ħ, without the √2 factor, so the magnitude we’ve just calculated is some 40% larger than that maximum value!
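The little calculation above can be replayed in a few lines of code, working in units of ħ (i.e. setting ħ = 1), for our spin-1 example:

```python
hbar = 1.0  # work in units of ħ

# Spin-1: measuring any one component yields +ħ, 0 or −ħ, each with probability 1/3
values = [hbar, 0.0, -hbar]
E_Jx2 = sum(v ** 2 for v in values) / len(values)   # E[Jx²] = 2ħ²/3

# By symmetry E[Jy²] = E[Jz²] = E[Jx²], and expectation is linear:
E_J2 = 3 * E_Jx2
print(E_J2)             # 2.0, i.e. E[J²] = 2ħ²
print(E_J2 ** 0.5)      # ≈ 1.414: √2·ħ, larger than the maximum component ħ
```

So the ‘magnitude’ √2·ħ exceeds anything we could ever measure along a single direction, which is the whole point of the argument.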

Now, you’ve probably fallen asleep by now but what this actually says is that the angular momentum, in quantum mechanics, is never completely in any one direction. We can state this in another way: it implies that, in quantum mechanics, there’s no such thing really as a ‘definite’ direction of the angular momentum.

[…]

OK. Enough on this. Let’s move on to a more ‘real’ example. Before I continue though, let me generalize the results above:

[I] A particle, or a system, will have a characteristic spin number: j. That number is always an integer or a half-integer, and it determines a discrete set of possible values for the component of the angular momentum J in any direction.

[II] The number of values is equal to 2j + 1, and these values are separated by ħ, which is why they are usually measured in units of ħ, i.e. Planck’s reduced constant: ħ ≈ 1.055×10−34 J·s, so that’s tiny but real. 🙂 [It’s always good to remind oneself that we’re actually trying to describe reality.] For example, the permitted values for a spin-3/2 particle are +3ħ/2, +ħ/2, −ħ/2 and −3ħ/2 or, measured in units of ħ, +3/2, +1/2, −1/2 and −3/2. When discussing spin-1/2 particles, we’ll often refer to the two possible states as the ‘up’ and the ‘down’ state respectively. For example, we may write the amplitude for an electron or a proton to have an angular momentum in the x-direction equal to +ħ/2 or −ħ/2 as 〈+x〉 and 〈−x〉 respectively. [Don’t worry too much about it right now: you’ll get used to the notation quickly.]

[III] The classical concepts of angular momentum, and the related magnetic moment, have their limits in quantum mechanics. The magnitude of a vector quantity like angular momentum is generally not equal to the maximum value of the component of that quantity in any direction. The general rule is:

J2 = j·(j+1)·ħ2 > j2·ħ2

So the maximum value of any component of J in whatever direction (i.e. j·ħ) is smaller than the magnitude of J (i.e. √[ j·(j+1)]·ħ). This implies we cannot associate any precise and unambiguous direction with quantities like the angular momentum J or the magnetic moment μ. As Feynman puts it:
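A quick numerical check of that rule for a few values of j (again working in units of ħ):

```python
import math

hbar = 1.0
for j in (0.5, 1.0, 1.5, 2.0):
    magnitude = math.sqrt(j * (j + 1)) * hbar   # |J| = √[j·(j+1)]·ħ
    max_component = j * hbar                    # largest measurable component, j·ħ
    print(j, round(magnitude, 3), max_component)
    assert magnitude > max_component            # the vector is never fully in one direction
```

Note that the gap narrows (relatively) as j grows: √[j·(j+1)] approaches j for large j, which is how the classical picture of a definite direction re-emerges for macroscopic angular momenta.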

“That the energy of an atom [or a particle] in a magnetic field can have only certain discrete energies is really not more surprising than the fact that atoms in general have only certain discrete energy levels—something we mentioned often in Volume I. Why should the same thing not hold for atoms in a magnetic field? It does. But it is the attempt to correlate this with the idea of an oriented magnetic moment that brings out some of the strange implications of quantum mechanics.”

#### A real example: the disintegration of a muon in a magnetic field

I talked about muon disintegration before, when writing a much more philosophical piece on symmetries in Nature and time reversal in particular. I used the illustration below. We’ve got an incoming muon that’s being brought to rest in a block of material, and then, as muons do, it disintegrates, emitting an electron and two neutrinos. As you can see, the decay direction is (mostly) in the direction of the axial vector that’s associated with the spin direction, i.e. the direction of the grey dashed line. However, there’s some angular distribution of the decay direction, as illustrated by the blue arrows, that are supposed to visualize the decay products, i.e. the electron and the neutrinos.

This disintegration process is very interesting from a more philosophical point of view. The axial vector isn’t ‘real’: it’s a mathematical concept—a pseudovector. A pseudo- or axial vector is the product of two so-called true vectors, a.k.a. polar vectors. Just look back at what I wrote about the angular momentum: the J in the J = r×p = r×m·v formula is such a vector, and its direction depends on the spin direction, which is clockwise or counter-clockwise, depending on what side you’re looking at it from. Having said that, who’s to judge if the product of two ‘true’ vectors is any less ‘true’ than the vectors themselves? 🙂

The point is: the disintegration process does not respect what is referred to as P-symmetry. That’s because our mathematical conventions (like all of these right-hand rules that we’ve introduced) are unambiguous, and they tell us that the pseudovector in the mirror image of what’s going on has the opposite direction. It has to, as per our definition of a vector product. Hence, our fictitious muon in the mirror should send its decay products in the opposite direction too! So… Well… The mirror image of our muon decay process is actually something that’s not going to happen: it’s physically impossible. So we’ve got a process in Nature here that doesn’t respect ‘mirror’ symmetry. Physicists prefer to call it ‘P-symmetry’, for parity symmetry, because it involves a flip of sign of all space coordinates, so there’s a parity inversion indeed. So there are processes in Nature that don’t respect it but, while that’s all very interesting, it’s not what I want to write about. [Just check that post of mine if you’d want to read more.] Let me, therefore, use another illustration—one that’s more to the point in terms of what we do want to talk about here:

So we’ve got the same muon here – well… A different one, of course! 🙂 – entering that block (A) and coming to a grinding halt somewhere in the middle, and then it disintegrates in a few micro-seconds, which is an eternity at the atomic or sub-atomic scale. It disintegrates into an electron and two neutrinos, as mentioned above, with some spread in the decay direction. [In case you wonder where we can find muons… Well… I’ll let you look it up yourself.] So we have:

Now it turns out that the presence of a magnetic field (represented by the B arrows in the illustration above) can drastically change the angular distribution of decay directions. That shouldn’t surprise us, of course, but how does it work, exactly? Well… To simplify the analysis, we’ve got a polarized beam here: the spin direction of all muons before they enter the block and/or the magnetic field, i.e. at time t = 0, is in the +x-direction. So we filtered them just before they entered the block. [I will come back to this ‘filtering’ process.] Now, if the muon’s spin would stay that way, then the decay products – and the electron in particular – would just go straight, because all of the angular momentum is in that direction. However, we’re in the quantum-mechanical world here, and so things don’t stay the same. In fact, as we explained, there’s no such thing as a definite angular momentum: there’s just an amplitude to be in the +x state, and that amplitude changes in time and in space.

How exactly? Well… We don’t know, but we can apply some clever tricks here. The first thing to note is that our magnetic field will add to the energy of our muon. So, as I explained in my previous post, the magnetic field adds to the E in the exponent of our complex-valued wavefunction a·e−(i/ħ)(E·t − p·x). In our example, we’ve got a magnetic field in the z-direction only, so that U = −μB reduces to U = −μz·B, and we can re-write our wavefunction as:

a·e−(i/ħ)[(E + U)·t − p·x] = a·e−(i/ħ)(E·t − p·x)·e(i/ħ)(μz·B·t)

Of course, the magnetic field only acts from t = 0 to when the muon disintegrates, at the point t = τ. So what we get is that the probability amplitude of a particle that’s been in a uniform magnetic field changes by a factor e(i/ħ)(μz·B·τ). Note that it’s a factor indeed: we use it to multiply. You should also note that this is a complex exponential, so it’s a periodic function, with its real and imaginary part oscillating between −1 and +1. Finally, we know that μz can take on only certain values: for a spin-1/2 particle, they are plus or minus some number, which we’ll simply denote as μ, so that’s without the subscript, so our factor becomes:

e(i/ħ)(±μ·B·t)

[The plus or minus sign needs to be explained here, so let’s do that quickly: we have two possible states for a spin-1/2 particle, one ‘up’, and the other ‘down’. But then we also know that the phase of our complex-valued wavefunction turns clockwise, which is why we have a minus sign in the exponent of our a·e−iθ expression. In short, for the ‘up’ state, we should take the positive value, i.e. +μ, but the minus sign in the exponent makes it negative again, so our factor is e−(i/ħ)(μ·B·t) for the ‘up’ state, and e+(i/ħ)(μ·B·t) for the ‘down’ state.]

OK. We get that, but that doesn’t get us anywhere—yet. We need another trick first. One of the most fundamental rules in quantum mechanics is that we can always calculate the amplitude to go from one state, say φ (read: ‘phi’), to another, say χ (read: ‘chi’), if we have a complete set of so-called base states, which we’ll denote by the index i or j (which you shouldn’t confuse with the imaginary unit, of course), using the following formula:

〈 χ | φ 〉 = ∑ 〈 χ | i 〉〈 i | φ 〉

I know this is a lot to swallow, so let me start with the notation. You should read 〈 χ | φ 〉 from right to left: it’s the amplitude to go from state φ to state χ. This notation is referred to as the bra-ket notation, or the Dirac notation. [Dirac notation sounds more scientific, doesn’t it?] The left part, i.e. 〈 χ |, is the bra, and the right part, i.e. | φ 〉, is the ket. In our example, we wonder what the amplitude is for our muon staying in the +x state. Because that amplitude is time-dependent, we can write it as A+(τ) = 〈 +x at time t = τ | +x at time t = 0 〉 = 〈 +x at t = τ | +x at t = 0 〉 or, using a very misleading shorthand, 〈 +x | +x 〉. [The shorthand is misleading because the +x in the bra obviously means something else than the +x in the ket.]

But let’s apply the rule. We’ve got two states with respect to each coordinate axis only here. For example, with respect to the z-axis, the spin values are +z and −z respectively. [As mentioned above, we actually mean that the angular momentum in this direction is either +ħ/2 or −ħ/2, aka ‘up’ or ‘down’ respectively, but then quantum theorists seem to like all kinds of symbols better, so we’ll use the +z and −z notations for these two base states here.] So now we can use our rule and write:

A+(t) = 〈 +x | +x 〉 = 〈 +x | +z 〉〈 +z | +x 〉 + 〈 +x | −z 〉〈 −z | +x 〉

You’ll say this doesn’t help us any further, but it does, because there is another set of rules, which are referred to as transformation rules, which gives us those 〈 +z | +x 〉 and 〈 −z | +x 〉 amplitudes. They’re real numbers, and it’s the same number for both amplitudes.

〈 +z | +x 〉 = 〈 −z | +x 〉 = 1/√2

This shouldn’t surprise you too much: the square root disappears when squaring, so we get two equal probabilities – 1/2, to be precise – that add up to one which – you guessed it – they have to add up to because of the normalization rule: the sum of all probabilities has to add up to one, always. [I can feel your impatience, but just hang in here for a while, as I guide you through what is likely to be your very first quantum-mechanical calculation.] Now, the 〈 +z | +x 〉 = 〈 −z | +x 〉 = 1/√2 amplitudes are the amplitudes at time t = 0, so let’s be somewhat less sloppy with our notation and write 〈 +z | +x 〉 as C+(0) and 〈 −z | +x 〉 as C−(0), so we write:

〈 +z | +x 〉 = C+(0) = 1/√2

〈 −z | +x 〉 = C−(0) = 1/√2

Now we know what happens with those amplitudes over time: that e(i/ħ)(±μ·B·t) factor kicks in, and so we have:

C+(t) = C+(0)·e−(i/ħ)(μ·B·t) = e−(i/ħ)(μ·B·t)/√2

C−(t) = C−(0)·e+(i/ħ)(μ·B·t) = e+(i/ħ)(μ·B·t)/√2

As for the plus and minus signs, see my remark on the tricky ± business in regard to μ. To make a long story somewhat shorter :-), our expression for A+(t) = 〈 +x at t | +x 〉 now becomes:

A+(t) = 〈 +x | +z 〉·C+(t) + 〈 +x | −z 〉·C−(t)

Now, you wouldn’t be too surprised if I’d just tell you that the 〈 +x | +z 〉 and 〈 +x | −z 〉 amplitudes are also real-valued and equal to 1/√2, but you can actually use yet another rule we’ll generalize shortly: the amplitude to go from state φ to state χ is the complex conjugate of the amplitude to go from state χ to state φ, so we write 〈 χ | φ 〉 = 〈 φ | χ 〉*, and therefore:

〈 +x | +z 〉 = 〈 +z | +x 〉* = (1/√2)* = (1/√2)

〈 +x | −z 〉 = 〈 −z | +x 〉* = (1/√2)* = (1/√2)

So our expression for A+(t) = 〈 +x at t | +x 〉 now becomes:

A+(t) = e−(i/ħ)(μ·B·t)/2 + e(i/ħ)(μ·B·t)/2

That’s the sum of a complex-valued function and its complex conjugate, and we’ve shown more than once (see my page on the essentials, for example) that such a sum reduces to the sum of the real parts of the complex exponentials. [You should not expect any explanation of Euler’s eiθ = cosθ + i·sinθ rule at this level of understanding.] In short, we get the following grand result:

A+(t) = e−(i/ħ)(μ·B·t)/2 + e+(i/ħ)(μ·B·t)/2 = cos(μ·B·t/ħ)

The big question, of course: what does this actually mean? 🙂 Well… Just square this thing and you get the probabilities shown below. [Note that the period of a squared cosine function is π, instead of 2π, which you can easily verify using an online graphing tool.]
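Before moving on, it may help to see the whole calculation done numerically. The short Python sketch below uses natural units μ = B = ħ = 1, which is my own choice, not something in the original argument:

```python
import cmath
import math

# Natural units mu = B = hbar = 1: an illustrative assumption of mine.
mu, B, hbar = 1.0, 1.0, 1.0

def A_plus(t):
    """Amplitude to still be in the +x state at time t, summed over the two base states."""
    C_plus = cmath.exp(-1j * mu * B * t / hbar) / math.sqrt(2)    # C+(t)
    C_minus = cmath.exp(+1j * mu * B * t / hbar) / math.sqrt(2)   # C-(t)
    # <+x|+z>*C+(t) + <+x|-z>*C-(t), with both transformation amplitudes 1/sqrt(2):
    return C_plus / math.sqrt(2) + C_minus / math.sqrt(2)

for t in (0.0, 0.3, 1.0, 2.5):
    amp = A_plus(t)
    assert abs(amp - math.cos(mu * B * t / hbar)) < 1e-12   # A+(t) reduces to cos(mu*B*t/hbar)
    assert abs(abs(amp) ** 2 - math.cos(t) ** 2) < 1e-12    # probability = cos^2, period pi
```

The two paths through the +z and −z base states interfere, and their sum is the cosine whose square gives the oscillating probability.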

Because you’re tired of this business, you probably don’t realize what we’ve just done. It’s spectacular and mundane at the same time. Let me quote Feynman to summarize the results:

“We find that the chance of catching the decay electron in the electron counter varies periodically with the length of time the muon has been sitting in the magnetic field. The frequency depends on the magnetic moment μ. The magnetic moment of the muon has, in fact, been measured in just this way.”

As far as I am concerned, the key result is that we’ve learned how to work with those mysterious amplitudes, and the wavefunction, in a practical way, thereby using all of the theoretical rules of the quantum-mechanical approach to real-life physical situations. I think that’s a great leap forward, and we’ll re-visit those rules in a more theoretical and philosophical démarche in the next post. As for the example itself, Feynman takes it much further, but I’ll just copy the Grand Master here:

Huh? Well… I am afraid I have to leave it at this, as I discussed the precession of ‘atomic’ magnets elsewhere (see my post on precession and diamagnetism), which gives you the same formula: ωp = μ·B/J (just substitute ±ħ/2 for J). However, the derivation above approaches it from an entirely different angle, which is interesting. Of course, all fits. 🙂 However, I’ll let you do your own homework now. I hope to see you tomorrow for the mentioned theoretical discussion. Have a nice evening, or weekend – or whatever ! 🙂

# Quantum mechanics and forces: the classical limit

This post continues along the same line as the two previous ones: the objective is to get a better ‘feel’ for those weird amplitudes by showing how they are being affected by a potential − such as an electric potential, or a gravitational field, or a magnetic field − and how they relate to a classical analysis of a situation. It’s what Feynman tries to do too. The best example of such ‘familiarization exercises’, by far, is the comparison between a classical analysis of a force field, and the quantum-mechanical one, so let’s quickly go through that.

The classical situation is depicted below. We’ve got a particle moving in the x-direction and entering a region where the potential − again, think of an electrostatic field, for example, as it probably inspires the V (for voltage, I assume) – varies in the y-direction. As you know, the potential energy is measured by the work done against the force, so V = −∫F•ds along the path. Conversely, one can differentiate both sides of that equation to get the force, so we can write: F = −dV/ds. That explains the F = −∂V/∂y expression in the illustration below. Again, think of a positively charged particle that’s being attracted from a high or positive voltage (high V) to a lower or negative voltage.

As a result of the force, the particle is being accelerated transversely, i.e. in the y-direction, and so it adds some momentum in the y-direction (py) to the initial momentum p, which is all in the x-direction (p = px). Of course, the force only acts as the particle is passing through the area, so that’s during a time interval that’s equal to w/v, with w being the ‘width’ of the area. Now, the vertical momentum builds from 0 to py, and we can calculate py using Newton’s Law: F = m·a = m·(dv/dt) = d(m·v)/dt = dp/dt. It goes as follows:

We can then use a simple trigonometric formula to find the angle of deflection δθ:

Done! Now we need to show we get the same result with our quantum-mechanical approach or, at the very least, show how our quantum-mechanical approach actually works.

Of course, the assumption is that everything is on a very large scale as compared with the wavelength of our probability amplitudes. We then know that the presence of a potential V will just add to the energy E that we are to use in our wavefunction. Hence, in any small region, the amplitude will vary as:

a·e−(i/ħ)[(Eint + p²/(2m) + V)·t − p·x]

Now, in the previous post, we explained that the energy conservation principle implies that the temporal variation of the wavefunction does not change: the total energy Eint + p²/(2m) + V effectively remains constant. Hence, the effect of a changing V is on p, and works out through the −p·x term in the exponent of our complex-valued wavefunction. To be specific, in a region where V is larger, p will be smaller and, therefore, we’ll have a larger wavelength λ = h/p. [Sorry if you can’t follow here: please do check out what I wrote in my two previous posts. It’s not that difficult.]

The illustration below shows how it works. The wavelength is the distance between successive wave nodes, which you should think of as surfaces where the phase of the amplitude is zero, or as wavefronts. So we’ve got a change in the angle of the wave nodes as well. How does that come about? Look at the two paths a and b: there’s a difference in potential between them, which we denote by ΔV, and which can be approximated as ΔV = (∂V/∂y)·D, with D the separation between the two paths.

Now, if Eint + p²/(2m) + V is a constant, then ΔV must be equal to −Δ[p²/(2m)]. Now what can we do with that? We’re talking differentials really, so we should apply the Δf(x) = [df(x)/dx]·Δx formula, which gives us:

Δ[p²/(2m)] = (p/m)·Δp = −ΔV ⇔ Δp = −(m/p)·ΔV

[Note that the d[f(x)] = [df(x)/dx]·dx formula is an interesting one: you’ll surely have come across something like Δ[1/λ], and wondered what you could do with that. Now you know: Δ[1/λ] = −Δλ/λ².]
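If you want to convince yourself of that Δp = −(m/p)·ΔV approximation, here is a minimal numerical check, assuming m = 1 and a total energy E of my own choosing:

```python
import math

# Illustrative numbers (my own): m = 1, total energy E = p^2/(2m) + V held constant.
m, E = 1.0, 10.0

def p_of_V(V):
    """Momentum from energy conservation: p^2/(2m) + V = E."""
    return math.sqrt(2 * m * (E - V))

V, dV = 2.0, 1e-6
p = p_of_V(V)
dp_exact = p_of_V(V + dV) - p      # exact change in p for a small change in V
dp_linear = -(m / p) * dV          # the Delta p = -(m/p)*Delta V approximation
assert abs(dp_exact - dp_linear) < 1e-9
```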

Now, we’re interested in Δx, so how can we get some equation for that? Here we need to use the wavenumber k, and the derivation is not so easy—unfortunately! The (p/m)·Δp = −ΔV equation tells us that the wave number will be different along the two paths, and we can write the difference as Δk = Δ(p/ħ) = Δp/ħ. Now, we also know that the wavenumber is expressed in radians per unit distance. In addition, we also know that the wavenumber is the derivative of the phase with respect to distance, so we can approximately calculate the amount by which the phase on path b is ‘ahead’ of the phase on path a as the wave leaves the area, as:

Δ(phase) = Δk·w

Now, Δk = Δp/ħ, so Δ(phase) = Δp·w/ħ = −[m/(p·ħ)]·ΔV·w. Now, along path b, we will have some wavenumber k which, as per the definition of k, is equal to the derivative of the phase with respect to distance, so the ratio of the change in phase along path b and the incremental distance Δx should equal the wave number k along path b, so we can write:

Δ(phase)/Δx = k ⇔ Δx = Δ(phase)/k = (λ/2π)·Δ(phase) = (ħ/p)·Δ(phase)

Combining that with the Δ(phase) = −[m/(p·ħ)]·ΔV·w identity, we get:

Δx = −(ħ/p)·[m/(p·ħ)]·ΔV·w = −[m/p²]·ΔV·w

Phew! Just hang in there. We’re not done yet. I need to refer to the illustration now to show that Δx can also be approximated by D·δθ, so we write: Δx = D·δθ. [Again: it’s just plain triangle geometry, and the small angle approximation.] Combining both formulas for Δx yields:

δθ = −[m/p²]·ΔV·w/D

Now we can replace p/m by v and use our ΔV = (∂V/∂y)·D approximation to replace ΔV/D by (∂V/∂y), so we get:

δθ = −[1/(p·v)]·(∂V/∂y)·w = −[w/(p·v)]·(∂V/∂y)
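As a sanity check, the sketch below computes the deflection both ways: once with Newton’s law, and once with the formula we just derived. All the numbers (m, v, w and ∂V/∂y) are illustrative assumptions of mine:

```python
# Compare the classical deflection with the quantum-mechanical formula.
# All numbers are illustrative assumptions (units with m = 1, small gradient).
m, v, w = 1.0, 5.0, 0.1
dVdy = 2.0                        # transverse potential gradient dV/dy

p = m * v                         # initial momentum, all in the x-direction
F = -dVdy                         # F = -dV/dy
t_cross = w / v                   # time spent in the field region
p_y = F * t_cross                 # Newton: p_y builds up from 0 to F*w/v
delta_theta_classical = p_y / p   # small-angle approximation

delta_theta_quantum = -(w / (p * v)) * dVdy   # the formula derived above

assert abs(delta_theta_classical - delta_theta_quantum) < 1e-12
```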

So we get the very same equation for δθ as the one we got when applying classical mechanics. That’s a great result—although I have to admit that the differential analysis here was very painful and, hence, difficult to reproduce. In any case, what you should be interested in, is the result: as Feynman summarizes it,

“We get the same particle motions we get from Newton’s F = m·a law, provided we assume that a potential contributes a phase to the probability amplitude equal to −V·t/ħ. Hence, in the classical limit, quantum mechanics agrees with Newtonian mechanics.”

However, he also cautions that “the result we have just got is correct only if the potential variations are slow and smooth”, which actually defines what is referred to as the “classical limit”.

Let me, to conclude this post, present another example out of Feynman’s ‘illustrations’ of what a field or a potential does with amplitudes. Trust me: we’ll need it later. As Feynman is really succinct here, I will just copy it. The quote describes what happens in a magnetic field, as opposed to the electrostatic field which I mentioned above. Just read it: it’s easy. [If it’s too small, just click on it, and the text size will be OK.] However, while it’s easy to understand, the consequences, on which I’ll write a separate post, are quite deep.

So… Well… That’s really it for today. 🙂

# Potential energy and amplitudes: energy conservation and tunneling effects

This post is intended to help you think about, and work with, those mysterious amplitudes. More in particular, I’ll explore how potential differences change amplitudes. But let’s first recapitulate the basics.

In my previous post, I explained why the young French Comte Louis de Broglie, when writing his PhD thesis back in 1924, i.e. before Schrödinger, Born, Heisenberg and others had published their work, boldly proposed to equate the ω·t − k·x argument in the wavefunction of a particle with the relativistic invariant product of the momentum and position four-vectors pμ = (E, p) = (E, px, py, pz) and xμ = (t, x) = (t, x, y, z), provided the energy and momentum are re-scaled in terms of ħ. Hence, he wrote:

θ = ω·t − k·x = (pμxμ)/ħ = (E∙t − p∙x)/ħ = (E/ħ)∙t − (p/ħ)∙x

As it’s usually instructive to do a quick dimensional analysis, let’s do one here too. Energy is expressed in joule, and dividing it by the quantum of action, which is expressed in joule·seconds (J·s), gives us the dimension of an (angular) frequency indeed, which, in turn, yields a pure number when multiplied with the time t. Likewise, linear momentum can be expressed in newton·seconds which, when divided by joule·seconds (J·s), yields a quantity expressed per meter. Hence, the dimension of p/ħ is m–1, which again yields a pure number when multiplied with the dimension of the coordinates x, y or z.

In the mentioned post, I also gave an unambiguous answer to the question as to what energy concept should be used in the equation: it is the total energy of the particle we are trying to describe, so that includes its kinetic energy, its rest mass energy and, finally, its potential energy in whatever force field it may find itself, such as a gravitational and/or electromagnetic force field. Now, while we know that, when talking potential energy, we have some liberty in choosing the zero point of our energy scale, this issue is easily overcome by noting that we are always talking about the amplitude to go from one state to another, or to go from one point in spacetime to another. Hence, what matters is the potential difference, really.

Feynman, in his description of the conservation of energy in a quantum-mechanical context, distinguishes:

1. The rest energy m0·c², which he describes as the rest energy ‘of the parts of the particle’. [One should remember he wrote this before the existence of quarks and the associated theory of matter was confirmed.]
2. The energy ‘over and above’ the rest energy, which includes both the kinetic energy, i.e. m∙v²/2 = p²/(2m), as well as the ‘binding and/or excitation energy’, which he refers to as ‘internal energy’.
3. Finally, there is the potential energy, which we’ll denote by U.

In my previous post, I also gave you the relativistically correct formula for the energy of a particle with momentum p:

Ep = √(m0²∙c⁴ + p²∙c²)

However, we will follow Feynman in his description, who uses the non-relativistic formula E = Eint + p²/(2m) + U. This is quite OK if we assume that the classical velocity of our particle does not approach the speed of light, so that covers a rather large number of real-life situations. Also, to make the example more real, we will assume the potential energy is electrostatic, and given by the formula U = q·Φ, with Φ the electrostatic potential (so just think of a number expressed in volt). Of course, q·Φ will be negative if the signs of q (i.e. the electric charge of our particle) and Φ are opposite, and positive if both have the same sign, as opposites attract and likes repel when it comes to electric charge.

The illustration below visualizes the situation for Φ2 < Φ1. For example, we may assume Φ1 is zero, that Φ2 is negative, and that our particle is positively charged, so U2 = q·Φ2 < 0. So it’s all rather simple really: we have two areas with a potential equal to U1 = q·Φ1 and U2 = q·Φ2 < 0 respectively. Hence, we need to use E1 = Eint + p1²/(2m) + U1 to substitute ω1 for E1/ħ in the first area, and then E2 = Eint + p2²/(2m) + U2 to substitute ω2 for E2/ħ in the second area, in which U2 − U1 < 0.

The corresponding amplitudes, or wavefunctions, are:

1. Ψ1(θ1) = Ψ1(x, t) = a·e−iθ1 = a·e−i[(Eint + p1²/(2m) + U1)·t − p1∙x]/ħ
2. Ψ2(θ2) = Ψ2(x, t) = a·e−iθ2 = a·e−i[(Eint + p2²/(2m) + U2)·t − p2∙x]/ħ

Now how should we think about these two equations? We are definitely talking different wavefunctions. However, having said that, there is no reason to assume the different potentials would have an impact on the temporal frequency. Therefore, we can boldly equate ω1 and ω2 and, therefore, write that:

Eint + p1²/(2m) + U1 = Eint + p2²/(2m) + U2 ⇔ p1²/(2m) − p2²/(2m) = U2 − U1 < 0

⇒ p1² − p2² < 0 ⇔ p2 > p1

What this says is that the kinetic energy, and/or the momentum, of our particle is greater in the second area, which is what we would classically expect, as a positively charged particle will pick up speed – and, therefore, momentum and kinetic energy – as it moves from an area with zero potential to an area with negative potential. However, the λ = h/p relation then implies that λ2 = h/p2 is smaller than λ1 = h/p1, which is what is illustrated by the dashed lines in the illustration above – which represent surfaces of equal phase, or wavefronts – and also by the second diagram in the illustration, which shows the real part of the complex-valued amplitude and compares the wavelengths λ1 and λ2. [As you know, the imaginary part is just like the real part but with a phase shift equal to π/2. Ideally, we should show both, but you get the idea.]

To sum it all up, the classical energy conservation principle is equivalent to the quantum-mechanical statement that the temporal frequency f or ω, i.e. the time-rate of change of the phase of the wavefunction, does not change – as long as the conditions do not change with time, of course – but that the spatial frequency, i.e. the wavenumber k or the wavelength λ, changes as the potential energy and/or kinetic energy change.
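To make this tangible, here is a small numerical sketch with an electron-like particle and a potential drop of my own choosing, confirming that the momentum goes up and the wavelength goes down in the low-potential area:

```python
import math

h = 6.626e-34          # Planck's constant (J·s)
m = 9.109e-31          # electron mass (kg) -- an illustrative choice of particle
p1 = 1.0e-24           # initial momentum (kg·m/s), my own number
dU = -1.0e-19          # U2 - U1 < 0: the particle drops to a lower potential

# Energy conservation: p1^2/(2m) + U1 = p2^2/(2m) + U2
p2 = math.sqrt(p1**2 - 2 * m * dU)
assert p2 > p1                     # more momentum in the low-potential region

lam1, lam2 = h / p1, h / p2        # de Broglie wavelengths
assert lam2 < lam1                 # shorter wavelength where p is larger
```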

#### Tunneling

The p1²/(2m) − p2²/(2m) = U2 − U1 equation may be re-written to illustrate the quantum-mechanical effect of tunneling, i.e. the penetration of a potential barrier. Indeed, we can re-write it as

p2² = 2m·[p1²/(2m) − (U2 − U1)]

and, importantly, try to analyze what happens if U2 − U1 is larger than p1²/(2m), so we get a negative value for p2². Just imagine that Φ1 is zero again, and that our particle is positively charged, but that Φ2 is also positive (instead of negative, as in the example above), so our particle is being repelled. In practical terms, it means that our particle just doesn’t have enough energy to “climb the potential hill”. Quantum-mechanically, however, the amplitude is still given by that equation above, and we have a purely imaginary number for p2, as the square root of a negative number is a purely imaginary number, just like √−4 = 2i. So let’s denote p2 as i·p’ and let’s analyze what happens by breaking our a·e−iθ2 function up in two separate parts by writing: a·e−iθ2 = a·e−i[(E2/ħ)∙t − (i·p’/ħ)·x] = a·e−i(E2/ħ)∙t·e(i·i)·(p’/ħ)·x = a·e−i(E2/ħ)∙t·e−p’·x/ħ, using i·i = −1.

Now, the e−p’·x/ħ factor in our formula for a·e−iθ2 is a real-valued exponential function, and it’s a decreasing function, with the same shape as the general e−x function, which I depict below.

This e−p’·x/ħ basically ‘kills’ our wavefunction as we move in the positive x-direction, past the potential barrier, which is what is illustrated below.

However, the story doesn’t finish here. We may imagine that the region with the prohibitive potential is rather small—like a few wavelengths only—and that, past that region, we’ve got another region where p22 = 2m·[p12/(2m) − (U– U1)] is not negative. That’s the situation that’s depicted below, which also shows what might happen: the amplitude decays exponentially, but does not reach zero and, hence, there is a possibility that a particle might make it through the barrier, and that it will be found on the other side, with a real-valued and positive momentum and, hence, with a regular wavefunction.
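Here is a rough numerical sketch of that exponential ‘killing’ of the amplitude, with electron-like barrier parameters that I just made up for illustration:

```python
import math

hbar = 1.0547e-34       # reduced Planck constant (J·s)
m = 9.109e-31           # electron mass (kg), an illustrative particle
E_kin = 1.0e-19         # kinetic energy p1^2/(2m), in joule (my number)
dU = 2.0e-19            # barrier height U2 - U1 > E_kin, so p2^2 < 0
width = 5.0e-10         # barrier width: a few atomic diameters (assumption)

p1_sq = 2 * m * E_kin
p2_sq = p1_sq - 2 * m * dU        # negative inside the barrier
assert p2_sq < 0
p_prime = math.sqrt(-p2_sq)       # p2 = i*p', with p' real

attenuation = math.exp(-p_prime * width / hbar)   # amplitude factor e^(-p'·x/hbar)
assert 0 < attenuation < 1        # damped, but not zero: the particle can leak through
```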

Feynman gives a very interesting example of this: alpha-decay. Alpha decay is a type of radioactive decay in which an atomic nucleus emits an α-particle (so that’s a helium nucleus, really), thereby transforming or ‘decaying’ into an atom with a reduced mass number and atomic number. The Wikipedia article on it is not bad, but Feynman’s explanation is more to the point, especially when you’ve understood all of the above. The graph below illustrates the basic idea as it shows the potential energy U of an α-particle as a function of the distance from the center. As Feynman puts it: “If one tried to shoot an α-particle with the energy E into the nucleus, it would feel an electrostatic repulsion from the nucleus and would, classically, get no closer than the distance r1, where its total energy is equal to U. Closer in, however, the potential energy is much lower because of the strong attraction of the short-range nuclear forces. How is it then that in radioactive decay we find α-particles which started out inside the nucleus coming out with the energy E? Because they start out with the energy E inside the nucleus and “leak” through the potential barrier.”

As for the numbers involved, the mean life of an α-particle in the uranium nucleus is as long as 4.5 billion years, according to Feynman, whereas the oscillations inside the nucleus are in the range of 10²² cycles per second! So how can one get a number like 10⁹ years from 10⁻²² seconds? The answer, as Feynman notes, is that that exponential gives a factor of about e−45. So that gives the very small but definite probability of leakage. Once the α-particle is in the nucleus, there is almost no amplitude at all for finding it outside. However, if you take many nuclei and wait long enough, you’ll find one. 🙂
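We can check Feynman’s orders of magnitude in a few lines. One assumption here is mine: I treat the e−45 factor as an amplitude, so the escape probability per ‘attempt’ is its square, e−90:

```python
import math

attempts_per_second = 1e22        # oscillation frequency inside the nucleus
amplitude_factor = math.exp(-45)  # the e^-45 leakage factor quoted above

# Assumption (mine): e^-45 is an amplitude, so the escape probability
# per attempt is its square, e^-90.
escape_prob = amplitude_factor ** 2
decay_rate = attempts_per_second * escape_prob         # escapes per second
mean_life_years = 1 / decay_rate / (365.25 * 24 * 3600)

assert 1e9 < mean_life_years < 1e10   # billions of years, as Feynman says
```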

Now, that should be it for today, but let me end this post with something I should have told you a while ago, but then I didn’t, because I thought it would distract you from the essentials. If you’ve read my previous post carefully, you’ll note that I wrote the wavefunction as Ψ(θ) = a·e−iθ, rather than as a·e+iθ, so with the minus sign in front of the complex exponent. So why is that?

There is a long and a short answer to that. I’ll give the short answer. You’ll remember that the phase of our wavefunction is like the hand of a stopwatch. Now, we could imagine a stopwatch going counter-clockwise, and we could actually make one. Now, there is no arbitrariness here: it’s one way or the other, depending on our other conventions, and the phase of our complex-valued wavefunction does actually turn clockwise if we write things the way we’re writing them, rather than anti-clockwise. That’s a direction that’s actually not as per the usual mathematical convention: an angle in the unit circle is usually measured counter-clockwise. If you’d want it that way, we can easily fix that by reversing the signs inside of the bracket, so we could write θ = k·x − ω·t, which is actually what you’ll often see. But there’s only one way to get it right: there’s a direction to it, and if we use θ = ω·t − k·x, then we need the minus sign in the Ψ(θ) = a·e−iθ equation.

It’s just one of those things that is easy to state, but actually gives us a lot of food for thought. Hence, I’ll probably come back to this one day. As for now, however, I think you’ve had enough. Or I’ve had enough, at least. 🙂 I hope this was not too difficult, and that you enjoyed it.

# Amplitudes, wavefunctions and relativity – or the de Broglie equation re-visited

My previous posts were rather technical and, hence, I thought I’d re-visit a topic on which I’ve written before – but represent it from another angle: the de Broglie equation. You know it by heart: it associates a wavelength (λ) with the momentum (p) of a particle: λ = h/p. It’s a simple relationship: the wavelength and the momentum are inversely proportional, and the constant of proportionality is Planck’s constant. It’s an equation you’ll find in all of the popular accounts of quantum mechanics. However, I am of the opinion that the equation may actually not help novices to understand what quantum mechanics is all about—at least not in an initial approach.

One barrier to a proper understanding is that the de Broglie relation is always being presented as the twin of the Planck-Einstein relation for photons, which relates the energy (E) of a photon to its frequency (ν): E = h∙ν = ħ∙ω [i]. It’s only natural, then, to try to relate the two equations, as momentum and energy are obviously related to one another. But how exactly? What energy concept should we use? Potential energy? Kinetic energy? Should we include the equivalent energy of the rest mass?

One quickly gets into trouble here. For example, one can try the kinetic energy, K.E. = m∙v²/2, and use the definition of momentum (p = m∙v) to write E = p²/(2m), and then relate the frequency ν to the wavelength λ using the general rule that the traveling speed of a wave is equal to the product of its wavelength and its frequency (v = λ∙ν). But if E = p²/(2m) and ν = v/λ, we get:

p²/(2m) = h∙v/λ ⇔ λ = 2∙h/p

So that is almost right, but not quite: that factor 2 should not be there. In fact, it’s easy to see that we’d get de Broglie’s equation if we’d use E = m∙v² rather than E = m∙v²/2. But E = m∙v²? How could we possibly justify the use of that formula? There’s something weird here—something deep, but I will probably die before I figure out exactly what. 🙂
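The factor-2 mismatch is easy to reproduce numerically. The electron-like numbers below are my own illustrative choice:

```python
import math

h = 6.626e-34                # Planck's constant (J·s)
m, v = 9.109e-31, 1.0e6      # electron at a non-relativistic speed (my numbers)

p = m * v
E_kin = p**2 / (2 * m)       # kinetic energy
nu = E_kin / h               # Planck-Einstein: E = h*nu
lam = v / nu                 # wave relation: v = lambda * nu

# The 'kinetic energy' route gives lambda = 2h/p, off by that factor 2:
assert abs(lam / (2 * h / p) - 1) < 1e-9
assert lam > 1.5 * (h / p)   # so it is NOT the de Broglie wavelength h/p
```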

Note: I should make a reference to the argument of the wavefunction here: E·t − p·x. [The argument of the wavefunction has a 1/ħ factor in front, but we assume we measure both E as well as p in units of ħ here.] So that’s an invariant quantity, i.e. it doesn’t change under a relativistic transformation of the reference frame. Now, if we measure time and distance in equivalent units, so c = 1, then we can show that E/p = 1/v. [Remember: if c would not be one, we’d write: E·β = p·c, with β = v/c, i.e. the relative velocity of our particle, as measured as a ratio of the speed of light.] As E·t − p·x is an invariant quantity, it’s some constant that’s characteristic of the particle. But x and t change as the clock ticks. Well… Yes. But if we believe the particle is somehow real, and its velocity is v, then the ratio of the real position x and the time t should be equal to v = x/t. Hence, for these very special positions x, i.e. the real position of the particle, we can equate E·t − p·x to E·t − p·v·t = E·t − m·v·v·t = (E − m∙v²)·t. So there we have the m∙v² factor. There must be something very deep about it but, as mentioned above, I will probably die before I figure out exactly what. 🙂

The second problem is the interpretation of λ. Of course, λ is just a length in space, which we can relate to the spatial frequency or wavenumber k = 2π/λ [ii]. And, of course, the frequency and the wavelength are, once again, related through the traveling speed of the wave: v = λ∙ν. But then, when you think about it, it’s actually not that simple: the wavefunction of a particle is much more complicated. For starters, you should think of it as a wave packet, or wavetrain, i.e. a composite wave: a sum of a potentially infinite number of elementary waves. So we do not have a simple periodic phenomenon here: we need to distinguish the so-called group velocity from the phase velocity, and we’re also talking a complex-valued wavefunction, so it’s all quite different from what we’re used to.

But back to the energy concept. We have Einstein’s E = m∙c² = γ∙m0∙c² equation, of course, from which the relativistically correct momentum-energy relationship can be derived [iii]:

Ep = √(m0²∙c⁴ + p²∙c²)

Ep is the energy of a particle with momentum p, and the relationship establishes a one-to-one relationship between the energy and the momentum of a particle [iv], with the rest mass (or rest energy) E0 = m0∙c² appearing as a constant (the rest mass does not depend on the reference frame, per definition). However, you can try this formula too, but it will not give you the de Broglie relation. In short, it doesn’t help us in terms of understanding what the de Broglie relation is all about.

So how did this young nobleman, back in 1924, as he was writing his PhD thesis, get this λ = h/p or – using the wavenumber – the k = p/ħ equation?

Well… The relativity theory had been around for quite a while and, amongst other things (including the momentum-energy relationship above), it had also established the invariance of the four-vector product pμxμ = E∙t − p∙x = pμ′xμ′ = E′∙t′ − p′∙x′.

Now, any regular sinusoidal wave is associated with a phase θ = ωt – kx, and then quantum theory had associated a complex-valued wavefunction Ψ(θ) = Ψ(ωt – kx) with a particle, and so the young count, Louis de Broglie, saw the mathematical similarity between the E∙t – px and ωt – kx expressions, and then just took the bold step of substituting E/ħ and p/ħ for ω and k respectively in the Ψ(ωt – kx) function. That’s it really: as the laws of physics should look the same, regardless of our frame of reference, he realized the argument of the wavefunction needed to be some invariant quantity, and so that’s what he gets through this substitution.

Of course, the substitution makes sense: it has to. To show why, let’s consider the limiting situation of a particle with zero momentum – so it’s at rest, really – but assuming that the probability of finding it at some point in space, at some point of time is equally distributed over space and time. So we have the rather nonsensical but oft-used wavefunction:

Ψ(θ) = Ψ(x, t) = a·e^(iθ) = a·e^(i(ωt − k∙x))

Taking the absolute square of this yields a constant probability equal to a², indeed. The equation is non-sensical or – to put it more politely – a limiting case only, because it assumes perfect knowledge about the particle’s momentum p (we said it was zero, exactly) and, hence, about its energy. So we don’t need to worry about any Δ here: Δp = ΔE = 0 and Ep = E0. As mentioned, we know that doesn’t make much sense, and that a particle in free space will actually be represented by a wave train with a group velocity v (i.e. the classical speed of the particle) and a phase velocity, and so that’s where the modeling of uncertainty comes in, but let’s just go along with the example now. [Note that, while p = 0, that does not imply that E0 is equal to zero. Indeed, there’s energy in the rest mass and possibly potential energy too!]

Now, if we substitute ω and k for E/ħ and p/ħ respectively – note that we did away with the bold-face k and x, so we’ve reduced the analysis to one dimension (x) only – we get:

Ψ(θ) = Ψ(x, t) = a·e^(iθ) = a·e^(i(E0t − p∙x)/ħ) = a·e^(i(E0/ħ)t)

What this means is that the phase θ = (E0/ħ)·t does not depend on x: it only varies in time. Hence, the diagram below – don’t look at the x’ and t’ right now: that’s another reference frame that we’ll introduce in a moment – shows equal-phase lines parallel to the x-axis and, because of the θ = (E0/ħ)·t equation, they’re equally spaced in time, i.e. in the t-coordinate.

Of course, a particle at rest in one reference frame – let’s say S – will appear to be moving in another – which we’ll denote as S’, so we have the primed coordinates x’ and t’, which are related to the x and t by the Lorentz transformation rules. It’s easy to see that the points of equal phase have a different spacing along the t’-axis, so the frequency in time must be different. Indeed, we’ll write that frequency as ω’ = Ep‘/ħ in the S’ reference frame.

Likewise, we see that the phase now does vary in space, so the probability amplitude does vary in space now, as the particle’s momentum in the primed reference frame is no longer zero: if we write it as p’, then we can write that θ = (E0/ħ)·t = (Ep’/ħ)·t’ − (p’/ħ)∙x’. Therefore, our wavefunction becomes:

Ψ(θ) = Ψ(x, t) = a·e^(iθ) = a·e^(i(E0/ħ)t) = Ψ(x’, t’) = a·e^(i(Ep’·t’ − p’∙x’)/ħ)
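We can also check this invariance numerically. The sketch below (my own illustration, with assumed numbers and natural units c = 1) boosts both the event coordinates (t, x) and the energy-momentum (E, p) of a particle at rest, and verifies that E∙t − p∙x comes out the same in both frames:

```python
import math

# Particle at rest in frame S (natural units: c = 1; all numbers assumed)
m0 = 2.0                 # rest mass, so E = E0 = m0·c² and p = 0 in S
E, p = m0, 0.0
t, x = 3.0, 1.5          # an arbitrary spacetime event in S

beta = 0.6               # speed of frame S' relative to S, as a fraction of c
gamma = 1.0 / math.sqrt(1.0 - beta**2)

# Lorentz transformation of the event and of the energy-momentum
t_prime = gamma * (t - beta * x)
E_prime = gamma * (E - beta * p)
x_prime = gamma * (x - beta * t)
p_prime = gamma * (p - beta * E)

phase_S = E * t - p * x                                # E·t − p·x in S
phase_S_prime = E_prime * t_prime - p_prime * x_prime  # E'·t' − p'·x' in S'
print(phase_S, phase_S_prime)                          # identical values
```

The moving observer sees a nonzero p′ and, hence, a phase that varies in space, but the value of the phase at any given event is frame-independent.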

We could introduce yet another reference frame, and we’d get similar results. The point is: the k in our Ψ(θ) = Ψ(x, t) = a·e^(iθ) = a·e^(i(ωt − k∙x)) equation is, effectively, equal to k = p/ħ, and that identity holds in any reference frame.

Now, none of what I wrote above actually proves the de Broglie relation: it merely explains it. When everything is said and done, the de Broglie relation is a hypothesis, but it is an important one – and it does fit into the overall quantum-mechanical or wave-mechanical approach that physicists take for granted nowadays.

So that’s it, really. I have nothing more to write but I should, perhaps, just remind you about what I said about a ‘particle wave’: we should look at it as some composite wave. Indeed, there will be uncertainty, and the uncertainty in E implies a frequency range Δω = Δ(E/ħ) = ΔE/ħ. Likewise, the momentum will be unknown, and so we’ll have a spread in the wavenumber k as well. We write: Δk = Δ(p/ħ) = Δp/ħ. We can try to reduce this uncertainty, but the Uncertainty Principle gives us the limits: the Δp·Δx and/or the ΔE·Δt products cannot be smaller than ħ/2.
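The ħ/2 limit is reached by a Gaussian wave packet. As a sketch (my own numerical check, not part of the original argument), we can build a Gaussian amplitude on a grid, get its spread in k from a Fourier transform, and verify that Δx·Δk ≈ 1/2:

```python
import numpy as np

# Gaussian wave packet: |psi|² has standard deviation sigma_x (assumed value)
sigma_x = 1.0
x = np.linspace(-40.0, 40.0, 2**14)
dx = x[1] - x[0]

psi = np.exp(-x**2 / (4.0 * sigma_x**2))       # amplitude, so |psi|² ~ exp(−x²/2σ²)
psi /= np.sqrt(np.sum(np.abs(psi)**2) * dx)    # normalize the total probability to 1

# Spread in x, from the probability density |psi|²
prob_x = np.abs(psi)**2
delta_x = np.sqrt(np.sum(x**2 * prob_x) * dx)  # mean is 0 by symmetry

# Spread in k, from the (discrete) Fourier transform of psi
k = 2.0 * np.pi * np.fft.fftfreq(x.size, d=dx)
prob_k = np.abs(np.fft.fft(psi))**2
prob_k /= np.sum(prob_k)
delta_k = np.sqrt(np.sum(k**2 * prob_k))       # mean is 0 by symmetry

print(delta_x * delta_k)   # ≈ 0.5: Δx·Δk = 1/2, i.e. Δx·Δp = ħ/2
```

Any non-Gaussian envelope, or any extra phase structure, only pushes the product above 1/2: the Gaussian is the minimum-uncertainty packet.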

So we’ll have a potentially infinite number of waves with slightly different values for ω and k, whose sum may be visualized as a complex-valued traveling wavetrain, like the lump below.

Each of the component waves travels at its own phase velocity (vp), i.e. the ratio of its ω and k, which we can calculate as vp = ω/k = (E/ħ)/(p/ħ) = E/p = (m·c²)/(m·v) = c²/v. This speed is superluminal (c²/v = c/β, with β = v/c < 1), but that is not in contradiction with special relativity, because the phase velocity carries no ‘signal’ or ‘information’. As for the group velocity (vg), we can effectively see that this is equal to the classical velocity of our ‘particle’ by noting that:

vg = dω/dk = d(E/ħ)/d(p/ħ) = dE/dp = d[p²/(2m)]/dp = p/m = v.

You may think there’s some cheating here, as we equate E with the kinetic energy only. You’re right: the total energy should also include potential energy and rest energy, so E is a sum. But then rest energy and potential energy are treated as constants and, hence, it’s only the kinetic energy that matters when taking the derivative with respect to p. That’s why we get the result we get, which makes perfect sense.
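Both velocity formulas can also be checked with the full relativistic E(p) = √(m0²c⁴ + p²c²). The sketch below (my own, with assumed values and natural units c = 1) computes dE/dp by a finite difference and confirms that vg equals the particle velocity v while vp·v = c²:

```python
import math

# Relativistic dispersion E(p) = sqrt(m0² + p²) in natural units (c = 1).
# m0 and p are assumed, illustrative values.
m0 = 1.0
p = 0.75

E = math.sqrt(m0**2 + p**2)
v = p / E                      # classical (relativistic) velocity: p·c²/E

# Group velocity: numerical derivative dE/dp ≈ dω/dk
h = 1e-6
v_group = (math.sqrt(m0**2 + (p + h)**2) - math.sqrt(m0**2 + (p - h)**2)) / (2 * h)

# Phase velocity: ω/k = E/p
v_phase = E / p

print(v, v_group)      # equal: the group velocity is the particle velocity
print(v_phase * v)     # equals 1 (i.e. c²): the phase velocity is c²/v
```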

I wanted this to be a very short post, and so I will effectively end it here. I hope you enjoyed it. It actually sets the stage for a more interesting discussion, and that’s a discussion on how a change in potential energy effectively changes the phase and, hence, the amplitude. But that’s for next time. I also need to devote a separate post to a discussion of the wave-mechanical framework in general, with a particular focus on the math behind it. So… Well… Yes, the next posts are likely to be somewhat more technical again.

[i] I should make a note on notations here, and also insert some definitions. I will denote a frequency by ν (nu), rather than by f, so as to not cause confusion with any function f. A frequency is expressed in cycles per second, while the angular frequency ω is expressed in radians per second. One cycle covers 2π radians and, therefore, we can write: ν = ω/2π. Hence, h∙ν = h∙ω/2π = ħ∙ω. Both ν and ω measure the time-rate of change of the phase, as opposed to k, i.e. the spatial frequency of the wave, which depends on the speed of the wave. I will also use the symbol v for the speed of a wave, although that is hugely confusing, because I will also use it to denote the classical velocity of the particle. However, I find the conventional use of the symbol c even more confusing, because this symbol is also used for the speed of light, and the speed of a wave is not necessarily equal to the speed of light. In fact, both the group and the phase velocity of a particle wave are very different from the speed of light. The speed of a wave and the speed of light only coincide for electromagnetic waves and, even then, it should be noted that photons also have amplitudes to travel faster or slower than the speed of light.

[ii] Note that the de Broglie relation can be re-written as k = p/ħ, with ħ = h/2π. Indeed, λ = 2π/k and, hence, we get: λ = h/p ⇔ 2π/k = h/p ⇔ p = ħ∙k ⇔ k = p/ħ.

[iii] One gets the equation by substituting m = γ∙m0 (γ is the Lorentz factor) in the E² = (m∙c²)² equation, and re-arranging.

[iv] Note that this energy formula does not include any potential energy: it is (equivalent) rest mass plus kinetic energy only.