# Amplitudes and statistics

When re-reading Feynman’s ‘explanation’ of Bose-Einstein versus Fermi-Dirac statistics (Lectures, Vol. III, Chapter 4), and my own March 2014 post summarizing his argument, I suddenly felt his approach raises as many questions as it answers. So I thought it would be good to re-visit it, which is what I’ll do here. Before you continue reading, however, I should warn you: I am not sure I’ll manage to do a better job now, as compared to a few months ago. But let me give it a try.

Setting up the experiment

The (thought) experiment is simple enough: what’s being analyzed is the (theoretical) behavior of two particles, referred to as particle a and particle b respectively, that are being scattered into two detectors, referred to as 1 and 2. That can happen in two ways, as depicted below: situation (a) and situation (b). [And, yes, it’s a bit confusing to use the same letters a and b here, but just note the brackets and you’ll be fine.] It’s an elastic scattering and it’s seen in the center-of-mass reference frame in order to ensure we can analyze it using just one variable, θ, for the angle of incidence. So there is no interaction between those two particles in a quantum-mechanical sense: there is no exchange of spin (spin flipping), nor is there any exchange of energy of the kind we see in Compton scattering, in which a photon gives some of its energy to an electron, resulting in a Compton shift (i.e. the wavelength of the scattered photon is different from that of the incoming photon). No, it’s just what it is: two particles deflecting each other. […] Well… Maybe. Let’s fully develop the argument to see what’s going on.

First, the analysis is done for two non-identical particles, say an alpha particle (i.e. a helium nucleus) and then some other nucleus (e.g. oxygen, carbon, beryllium,…). Because of the elasticity of the ‘collision’, the possible outcomes of the experiment are binary: if particle a gets into detector 1, it means particle b will be picked up by detector 2, and vice versa. The first situation (particle a gets into detector 1 and particle b goes into detector 2) is depicted in (a), i.e. the illustration on the left above, while the opposite situation, exchanging the role of the particles, is depicted in (b), i.e. the illustration on the right-hand side. So these two ‘ways’ are two different possibilities which are distinguishable not only in principle but also in practice, for non-identical particles that is (just imagine a detector which can distinguish helium from oxygen, or whatever other substance the other particle is). Therefore, strictly following the rules of quantum mechanics, we should add the probabilities of both events to arrive at the total probability of some particle (and with ‘some’, I mean particle a or particle b) ending up in some detector (again, with ‘some’ detector, I mean detector 1 or detector 2).

Now, this is where Feynman’s explanation becomes somewhat tricky. The whole event (i.e. some particle ending up in some detector) is being reduced to two mutually exclusive possibilities that are both being described by the same (complex-valued) wave function f, which has that angle of incidence as its argument. To be precise: the angle of incidence is θ for the first possibility and it’s π–θ for the second possibility. That being said, it is obvious, even if Feynman doesn’t mention it, that both possibilities actually represent a combination of two separate things themselves:

1. For situation (a), we have particle a going to detector 1 and particle b going to detector 2. Using Dirac’s so-called bra-ket notation, we should write 〈1|a〉〈2|b〉 = f(θ), with f(θ) a probability amplitude, which should yield a probability when taking its absolute square: P(θ) = |f(θ)|².
2. For situation (b), we have particle b going to detector 1 and particle a going to 2, so we have 〈1|b〉〈2|a〉, which Feynman equates with f(π–θ), so we write 〈1|b〉〈2|a〉 = 〈2|a〉〈1|b〉 = f(π–θ).

Now, Feynman doesn’t dwell on this–not at all, really–but this casual assumption–i.e. the assumption that situation (b) can be represented by using the same wave function f–merits some more reflection. As said, Feynman is very brief on it: he just says situation (b) is the same situation as (a), but then detector 1 and detector 2 being switched (so we exchange the role of the detectors, I’d say). Hence, the relevant angle is π–θ and, of course, it’s a center-of-mass view again so if a goes to 2, then b has to go to 1. There’s no Third Way here. In short, a priori it would seem to be very obvious indeed to associate only one wave function (i.e. that (complex-valued) f(θ) function) with the two possibilities: that wave function f yields a probability amplitude for θ and, hence, it should also yield some (other) probability amplitude for π–θ, i.e. for the ‘other’ angle. So we have two probability amplitudes but one wave function only.

You’ll say: Of course! What’s the problem? Why are you being fussy? Well… I think these assumptions about f(θ) and f(π–θ) representing the underlying probability amplitudes are all nice and fine (and, yes, they are very reasonable indeed), but I also think we should take them for what they are at this moment: assumptions.

Huh? Yes. At this point, I would like to draw your attention to the fact that the only things we can measure are real-valued probabilities. Indeed, when we do this experiment like a zillion times, it will give us some real number P for the probability that a goes to 1 and b goes to 2 (let me denote this number as P(θ) = P(a→1 and b→2)), and then, when we change the angle of incidence by switching detector 1 and 2, it will also give us some (other) real number for the probability that a goes to 2 and b goes to 1 (i.e. a number which we can denote as P(π–θ) = P(a→2 and b→1)). Now, while it would seem to be very reasonable that the underlying probability amplitudes are given by the same function, we should be honest with ourselves and admit that the probability amplitudes are something we cannot directly measure.

At this point, let me quickly say something about Dirac’s bra-ket notation, just in case you haven’t heard about it yet. As Feynman notes, we have to get away from thinking too much in terms of wave functions traveling through space because, in quantum mechanics, all sorts of stuff can happen (e.g. spin flipping) and not all of it can be analyzed in terms of interfering probability amplitudes. Hence, it’s often more useful to think in terms of a system being in some state and then transitioning to some other state, and that’s why that bra-ket notation is so helpful. We have to read these bra-kets from right to left: the part on the right, e.g. |a〉, is the ket and, in this case, that ket just says that we’re looking at some particle referred to as particle a, while the part on the left, i.e. 〈1|, is the bra, i.e. a shorthand for particle a having arrived at detector 1. If we wanted to be complete, we should write:

〈1|a〉 = 〈particle a arrives at detector 1|particle a leaves its source〉

Note that 〈1|a〉 is some complex-valued number (i.e. a probability amplitude) and so we multiply it here with some other complex number, 〈2|b〉, because it’s two things happening together. As said, don’t worry too much about it. Strictly speaking, we don’t need wave functions and/or probability amplitudes to analyze this situation because there is no interaction in the quantum-mechanical sense: we’ve got a scattering process indeed (implying some randomness in where those particles end up, as opposed to what we’d have in a classical analysis of two billiard balls colliding), but we do not have any interference between wave functions (probability amplitudes) here. We’re just introducing the wave function f because we want to illustrate the difference between this situation (i.e. the scattering of non-identical particles) and what we’d have if we’d be looking at identical particles being scattered.

At this point, I should also note that this bra-ket notation is more in line with Feynman’s own so-called path integral formulation of quantum mechanics, which is actually implicit in his line of argument: rather than thinking about the wave function as representing the (complex) amplitude of some particle to be at point x in space at point t in time, we think about the amplitude as something that’s associated with a path, i.e. one of the possible itineraries from the source (its origin) to the detector (its destination). That explains why this f(θ) function doesn’t mention the position (x) and time (t) variables. What x and t variables would we use anyway? Well… I don’t know. It’s true the position of the detectors is fully determined by θ, so we don’t need to associate any x or t with them. Hence, if we’d be thinking about the space-time variables, then we should be talking about the position in space and time of both particle a and particle b. Indeed, it’s easy to see that only a slight change in the horizontal (x) or vertical (y) position of either particle would ensure that both particles do not end up in the detectors. However, as mentioned above, Feynman doesn’t even mention this. Hence, we must assume that any randomness in any x or t variable is captured by that wave function f, which explains why this is actually not a classical analysis: so, in short, we do not have two billiard balls colliding here.

Hmm… You’ll say I am a nitpicker. You’ll say that, of course, any uncertainty is indeed being incorporated in the fact that we represent what’s going on by a wave function f which we cannot observe directly but whose absolute square represents a probability (or, to use precise statistical terminology, a probability density), which we can measure: P = |f(θ)|² = f(θ)·f*(θ), with f* the complex conjugate of the complex number f. So… […] What? Well… Nothing. You’re right. This thought experiment describes a classical situation (like two billiard balls colliding) and then it doesn’t, because we cannot predict the outcome (i.e. we can’t say where the two billiard balls are going to end up): we can only describe the likely outcome in terms of probabilities P(a→1 and b→2) = |f(θ)|² and P(a→2 and b→1) = |f(π–θ)|². Of course, needless to say, the normalization condition should apply: if we add all probabilities over all angles, then we should get 1, so we can write: ∫|f(θ)|²dθ = ∫f(θ)·f*(θ)dθ = 1. So that’s it, then?

No. Let this sink in for a while. I’ll come back to it. Let me first make a bit of a detour to illustrate what this thought experiment is supposed to yield, and that’s a more intuitive explanation of Bose-Einstein statistics and Fermi-Dirac statistics, which we’ll get out of the experiment above if we repeat it using identical particles. So we’ll introduce the terms Bose-Einstein statistics and Fermi-Dirac statistics. Hence, there should also be some term for the reference situation described above, i.e. a situation in which non-identical particles are ‘interacting’, so to say, but then with no interference between their wave functions. So, when everything is said and done, it’s a term we should associate with classical mechanics. It’s called Maxwell-Boltzmann statistics.

Huh? Why would we need ‘statistics’ here? Well… We can imagine many particles engaging like this, just colliding elastically and, thereby, interacting in a classical sense, even if we don’t know where exactly they’re going to end up, because of uncertainties in initial positions and what have you. In fact, you already know what this is about: it’s the behavior of particles as described by the kinetic theory of gases (often referred to as statistical mechanics) which, among other things, yields a very elegant function for the distribution of the velocities of gas molecules, as shown below for various gases (helium, neon, argon and xenon) at one specific temperature (25 °C), i.e. the graph on the left-hand side, or for the same gas (oxygen) at different temperatures (–100 °C, 20 °C and 600 °C), i.e. the graph on the right-hand side.

Now, all these density functions and what have you are, indeed, referred to as Maxwell-Boltzmann statistics, by physicists and mathematicians that is (you know they always need some special term in order to make sure other people (i.e. people like you and me, I guess) have trouble understanding them).

In fact, we get the same density function for other properties of the molecules, such as their momentum and their total energy. It’s worth elaborating on this, I think, because I’ll later compare with Bose-Einstein and Fermi-Dirac statistics.

Maxwell-Boltzmann statistics

Kinetic gas theory yields a very simple and beautiful theorem. It’s the following: in a gas that’s in thermal equilibrium (or just in equilibrium, if you want), the probability (P) of finding a molecule with energy E is proportional to e^{–E/kT}, so we have:

P ∝ e^{–E/kT}

Now that’s a simple function, you may think. If we treat E as just a continuous variable, and T as some constant indeed – hence, if we just treat (the probability) P as a function of (the energy) E – then we get a function like the one below (with the blue, red and green using three different values for T).

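So how do we relate that to the nice bell-shaped curves above? The very simple graphs above seem to indicate the probability is greatest for E = 0, and then just goes down, instead of going up initially to reach some maximum around some average value and then drop down again. Well… The catch here, of course, is that e^{–E/kT} only gives the relative probability of one particular state. To get a density over speed (or energy), we also have to count how many states there are at that speed (or energy), and that number grows with the speed (or energy). On top of that, the constant of proportionality is itself dependent on the temperature. To be precise, the probability density function for velocities is given by: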
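For reference, the standard textbook form of the Maxwell-Boltzmann speed distribution is written out below. The symbols m (the mass of one molecule) and v (its speed) are notation I am assuming here; k is the Boltzmann constant and T the absolute temperature, as above.

```latex
f(v) = 4\pi \left( \frac{m}{2\pi kT} \right)^{3/2} v^{2} \, e^{-\frac{m v^{2}}{2kT}}
```

The exponential is just the e^{–E/kT} factor with E = mv²/2, while the v² factor in front forces the density to zero at v = 0, which is what produces the rise towards a maximum before the exponential takes over.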

The function for energy is similar. To be precise, we have the following function:
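The standard textbook forms are reproduced below, as a reference rather than as a derivation. The first is the density for the total kinetic energy E of a molecule; the second is the density for the energy ϵ in a single degree of freedom. The subscripts on f are labels I am adding to tell the two apart.

```latex
f_E(E) = 2 \sqrt{\frac{E}{\pi}} \left( \frac{1}{kT} \right)^{3/2} e^{-E/kT}
\qquad
f_\epsilon(\epsilon) = \sqrt{\frac{1}{\pi \epsilon \, kT}} \; e^{-\epsilon/kT}
```

Both integrate to 1 over the range from 0 to infinity. Only the first one has the rise-then-fall shape, because of its √E factor.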

This is a so-called chi-squared distribution (the velocity function is the closely related chi distribution), and ϵ is the energy per degree of freedom in the system. Now the functions for the speed and for the total energy will give you such nice bell-shaped curves, and so all is alright. In any case, don’t worry too much about it. I have to get back to that story of the two particles and the two detectors.
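If you want to see the bell shape come out of the numbers, here is a minimal sketch in Python. It assumes the standard Maxwell-Boltzmann speed density and uses helium at 25 °C as an example; the function names and the crude trapezoidal integrator are mine, not something from Feynman or from the graphs above.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def maxwell_speed_pdf(v, mass, temperature):
    """Standard Maxwell-Boltzmann probability density for speed v (in m/s)."""
    a = mass / (2.0 * K_B * temperature)
    return 4.0 * math.pi * (a / math.pi) ** 1.5 * v * v * math.exp(-a * v * v)


def integrate(func, lo, hi, steps=20000):
    """Plain trapezoidal rule; good enough for a smooth, fast-decaying density."""
    h = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, steps):
        total += func(lo + i * h)
    return total * h


# Example values (an assumption for illustration): helium-4 at 25 degrees C.
HELIUM_MASS = 6.6464731e-27  # kg
T_ROOM = 298.15              # K

# Most probable speed: the maximum of the density sits at sqrt(2kT/m).
v_peak = math.sqrt(2.0 * K_B * T_ROOM / HELIUM_MASS)

# Total probability; the tail beyond ten times the peak speed is negligible.
area = integrate(lambda v: maxwell_speed_pdf(v, HELIUM_MASS, T_ROOM),
                 0.0, 10.0 * v_peak)

print(f"most probable speed: {v_peak:.0f} m/s")
print(f"total probability:   {area:.6f}")
```

The density is zero at v = 0, climbs to a maximum at v_peak and then falls off, so the plain exponential e^{–E/kT} and the bell-shaped curves are perfectly compatible.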

However, before I do so, let me jot down two (or three) more formulas. The first one is the formula for the expected number 〈N_i〉 of particles occupying energy level ε_i (and the brackets here, 〈N_i〉, have nothing to do with the bra-ket notation mentioned above: it’s just a general notation for some expected value):
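In its standard textbook form, the Maxwell-Boltzmann occupation number reads as follows. The symbol μ (the chemical potential) is standard notation that I am assuming here, and g_i is the usual degeneracy factor, i.e. the number of states sharing the energy ε_i.

```latex
\langle N_i \rangle = \frac{g_i}{e^{(\epsilon_i - \mu)/kT}}
```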

This formula has the same shape as the ones above but we brought the exponential function down, into the denominator, so the minus sign disappears. And then we also simplified it by introducing that g_i factor, which I won’t explain here, because the only reason why I wanted to jot this down is to allow you to compare this formula with the equivalent formula when (a) Fermi-Dirac and (b) Bose-Einstein statistics apply:
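Again in their standard textbook form, and with the same assumed notation (μ for the chemical potential, g_i for the degeneracy), the two quantum-mechanical occupation numbers are:

```latex
\text{(a) Fermi-Dirac:} \quad \langle N_i \rangle = \frac{g_i}{e^{(\epsilon_i - \mu)/kT} + 1}
\qquad
\text{(b) Bose-Einstein:} \quad \langle N_i \rangle = \frac{g_i}{e^{(\epsilon_i - \mu)/kT} - 1}
```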

Do you see the difference? The only change in the formula is the ±1 term in the denominator: we have a plus one (+1) for Fermi-Dirac statistics and a minus one (–1) for Bose-Einstein statistics indeed. That’s all. That’s the difference with Maxwell-Boltzmann statistics.

Huh? Yes. Think about it, but don’t worry too much. Just make a mental note of it, as it will come in handy when you explore related articles. [And, of course, please don’t think I am trivializing the difference between Maxwell-Boltzmann, Bose-Einstein and Fermi-Dirac statistics here: that ±1 term in the denominator is, obviously, a very important difference, as evidenced by the consequences of formulas like the one above: just think about the crowding-in effect in lasers as opposed to the Pauli exclusion principle, for example. :-)]
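To get a feel for what that ±1 does, here is a small sketch in Python that evaluates the three standard occupation formulas side by side. I am assuming one state per level (g_i = 1) and I lump the exponent into a single number x = (ε_i – μ)/kT; the function name and that shorthand are mine.

```python
import math


def mean_occupation(x, statistics):
    """Expected number of particles in one state, with x = (energy - mu) / kT.

    Assumes a degeneracy factor g_i = 1, purely for illustration.
    """
    if statistics == "maxwell-boltzmann":
        return 1.0 / math.exp(x)
    if statistics == "fermi-dirac":
        return 1.0 / (math.exp(x) + 1.0)
    if statistics == "bose-einstein":
        if x <= 0:
            raise ValueError("Bose-Einstein needs the energy above mu (x > 0)")
        return 1.0 / (math.exp(x) - 1.0)
    raise ValueError(f"unknown statistics: {statistics}")


NAMES = ("maxwell-boltzmann", "fermi-dirac", "bose-einstein")

for x in (0.1, 1.0, 5.0):
    values = ", ".join(f"{name}: {mean_occupation(x, name):.4f}" for name in NAMES)
    print(f"x = {x}: {values}")
```

For large x the three numbers converge, which is the classical limit. For small x the Bose-Einstein number blows up (crowding in) while the Fermi-Dirac number never exceeds one particle per state (exclusion).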

Setting up the experiment (continued)

Let’s get back to our experiment. As mentioned above, we don’t really need probability amplitudes in the classical world: ordinary probabilities, taking into account uncertainties about initial conditions only, will do. Indeed, there’s a limit to the precision with which we can measure the position in space and time of any particle in the classical world as well and, hence, we’d expect some randomness (as captured in the scattering phenomenon) but, as mentioned above, ordinary probabilities would do to capture that. Nevertheless, we did associate probability amplitudes with the events described above in order to illustrate the difference with the quantum-mechanical world. More specifically, we distinguished:

1. Situation (a): particle a goes to detector 1 and b goes to 2, versus
2. Situation (b): particle a goes to 2 and b goes to 1.

In our bra-ket notation:

1. 〈1|a〉〈2|b〉 = f(θ), and
2. 〈1|b〉〈2|a〉 = f(π–θ).

The f(θ) function is a quantum-mechanical wave function. As mentioned above, while we’d expect to see some space (x) and time (t) variables in it, these are, apparently, already captured by the θ variable. What about f(π–θ)? Well… As mentioned above also, that’s just the same function as f(θ) but using the angle π–θ as the argument. So, the following remark is probably too trivial to note but let me do it anyway (to make sure you understand what we’re modeling here really): while it’s the same function f, the values f(θ) and f(π–θ) are, of course, not necessarily equal and, hence, the corresponding probabilities are also not necessarily the same. Indeed, some angles of scattering may be more likely than others. However, note that we assume that the function f itself is exactly the same for the two situations (a) and (b), as evidenced by that normalization condition we assume to be respected: if we add all probabilities over all angles, then we should get 1, so ∫|f(θ)|²dθ = ∫f(θ)·f*(θ)dθ = 1.

So far so good, you’ll say. However, let me ask the same critical question once again: why would we use the same wave function f for the second situation?

Huh? You’ll say: why wouldn’t we? Well… Think about it. Again, how do we find that f(θ) function? The assumption here is that we just do the experiment a zillion times while varying the angle θ and, hence, that we’ll find some average corresponding to P(θ), i.e. the probability. Now, the next step then is to equate that average value to |f(θ)|², obviously, because we have this quantum-mechanical theory saying probabilities are the absolute square of probability amplitudes. And, so… Well… Yes. We then just take the square root of the P function to find the f(θ) function, don’t we?

Well… No. That’s where Feynman is not very accurate when it comes to spelling out all of the assumptions underpinning this thought experiment. We should obviously watch out here, as there are all kinds of complications when you do something like that. To a large extent (perhaps all of it), the complications are mathematical only.

First, note that any positive real number (and |f(θ)|² is a real number) has two distinct real square roots, a positive and a negative one: x = ±√(x²). Secondly, we should also note that, if f(θ) is a regular complex-valued wave function of x and t and θ (and with ‘regular’, we mean, of course, that it’s some solution to a Schrödinger (or Schrödinger-like) equation), then we can multiply it with some arbitrary factor shifting its phase Θ (usually written as Θ = kx–ωt+α) and the square of its absolute value (i.e. its squared norm) will still yield the same value. In mathematical terms, such a factor is just a complex number with a modulus (or length or norm, whatever terminology you prefer) equal to one, which we can write as a complex exponential: e^{iα}, for example. So we should note that, from a mathematical point of view, any function e^{iα}f(θ) will yield the same probabilities as f(θ). Indeed,

|f(θ)|² = |e^{iα}f(θ)|² = (|e^{iα}||f(θ)|)² = |e^{iα}|²|f(θ)|² = 1²·|f(θ)|²

Likewise, while we assume that this function f(π–θ) is the same function f as that f(θ) function, from a mathematical point of view, the function e^{iβ}f(π–θ) would do just as well, because its absolute square yields the very same (real) probability |f(π–θ)|². So the question as to what wave function we should take for the probability amplitude is not as easy to answer as you may think. Huh? So what function should we take then? Well… We don’t know. Fortunately, it doesn’t matter, for non-identical particles that is. Indeed, when analyzing the scattering of non-identical particles, we’re interested in the probabilities only and we can calculate the total probability of particle a ending up in detector 1 or 2 (and, hence, particle b ending up in detector 2 or 1) as the following sum:

|e^{iα}f(θ)|² + |e^{iβ}f(π–θ)|² = |f(θ)|² + |f(π–θ)|².

In other words, for non-identical particles, these phase factors (e^{iα} or e^{iβ}) don’t matter and we can just forget about them.
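You can check this with a few lines of Python. The sketch below takes two made-up complex numbers as stand-ins for f(θ) and f(π–θ), multiplies them with arbitrary phase factors, and compares the sum of the probabilities with the absolute square of the sum of the amplitudes. All values and function names are hypothetical: they only serve to illustrate the arithmetic.

```python
import cmath


def sum_of_probabilities(amp_direct, amp_exchanged):
    """Rule for distinguishable alternatives: add the probabilities."""
    return abs(amp_direct) ** 2 + abs(amp_exchanged) ** 2


def probability_of_sum(amp_direct, amp_exchanged):
    """Rule for indistinguishable alternatives: add the amplitudes first."""
    return abs(amp_direct + amp_exchanged) ** 2


# Made-up amplitude values standing in for f(theta) and f(pi - theta).
f_theta = 0.6 + 0.3j
f_pi_minus_theta = 0.2 - 0.5j

# Arbitrary phase shifts alpha and beta (in radians).
alpha, beta = 0.7, 2.1
phased_direct = cmath.exp(1j * alpha) * f_theta
phased_exchanged = cmath.exp(1j * beta) * f_pi_minus_theta

print(sum_of_probabilities(f_theta, f_pi_minus_theta))
print(sum_of_probabilities(phased_direct, phased_exchanged))   # same number
print(probability_of_sum(f_theta, f_pi_minus_theta))
print(probability_of_sum(phased_direct, phased_exchanged))     # different number
```

The first two numbers coincide, whatever α and β we pick. The last two only coincide when α = β, which is why the phase difference becomes the whole story as soon as amplitudes have to be added.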

However, and that’s the crux of the matter really, we should mention them, of course, in case we’d have to add the probability amplitudes, which is exactly what we’ll have to do when we’re looking at identical particles. In fact, in that case (i.e. when these phase factors e^{iα} and e^{iβ} will actually matter), you should note that what matters really is the phase difference, so we could replace α and β with some δ (which is what we’ll do below).

However, let’s not put the cart before the horse and conclude our analysis of what’s going on when we’re considering non-identical particles: in that case, this phase difference doesn’t matter. And the remark about the positive and negative square root doesn’t matter either. In fact, if you want, you can subsume it under the phase difference story by choosing α = 0 or α = π, so that e^{iα} = ±1. To be more explicit: we could say that –f(θ) is the probability amplitude, as |–f(θ)|² is also equal to that very same real number |f(θ)|². OK. Done.

Bose-Einstein and Fermi-Dirac statistics

As I mentioned above, the story becomes an entirely different one when we’re doing the same experiment with identical particles. At this point, Feynman’s argument becomes rather fuzzy and, in my humble opinion, that’s because he refused to be very explicit about all of those implicit assumptions I mentioned above. What I can make of it, is the following:

1. We know that we’ll have to add probability amplitudes, instead of probabilities, because we’re talking about one event that can happen in two indistinguishable ways. Indeed, for non-identical particles, we can, in principle (and in practice), distinguish situation (a) and (b), and that’s why we only have to add some real-valued numbers representing probabilities. We cannot do that for identical particles.

2. Situation (a) is still being described by some probability amplitude f(θ). We don’t know what function exactly, but we assume there is some unique wave function f(θ) out there that accurately describes the probability amplitude of particle a going to 1 (and, hence, particle b going to 2), even if we can’t tell which is a and which is b. What about the phase factor? Well… We just assume we’ve chosen our t such that α = 0. In short, the assumption is that situation (a) is represented by some probability amplitude (or wave function, if you prefer that term) f(θ).

3. However, a (or some) particle (i.e. particle a or particle b) ending up in a (some) detector (i.e. detector 1 or detector 2) may come about in two ways that cannot be distinguished one from the other. One is the way described above, by that wave function f(θ). The other way is by exchanging the role of the two particles. Now, it would seem logical to associate the amplitude f(π–θ) with the second way. But we’re in the quantum-mechanical world now. There’s uncertainty, in position, in momentum, in energy, in time, whatever. So we can’t be sure about the phase. That being said, the wave function will still have the same functional form, we must assume, as it should yield the same probability when squaring. To account for that, we will allow for a phase factor, and we know it will be important when adding the amplitudes. So, while the probability for the second way (i.e. the square of its absolute value) should be the same, its probability amplitude does not necessarily have to be the same: we have to allow for positive and negative roots or, more generally, a possible phase shift. Hence, we’ll write the probability amplitude as e^{iδ}f(π–θ) for the second way. [Why do I use δ instead of β? Well… Again: note that it’s the phase difference that matters. From a mathematical point of view, it’s the same as inserting an e^{iβ} factor: δ can take on any value.]

4. Now it’s time for the Big Trick. Nature doesn’t care about our labeling of particles. If we have to multiply the wave function (i.e. f(π–θ), or f(θ), it’s the same: we’re talking about a complex-valued function of some variable (i.e. the angle θ) here) with a phase factor e^{iδ} when exchanging the roles of the particles (or, what amounts to the same, exchanging the role of the detectors), we should get back to our point of departure (i.e. no exchange of particles, or detectors) when doing that two times in a row, shouldn’t we? So if we exchange the role of particle a and b in this analysis (or the role of the detectors), and then exchange their roles once again, then there’s no exchange of roles really and we’re back at the original situation. So we must have e^{iδ}e^{iδ}f(θ) = f(θ) (and e^{iδ}e^{iδ}f(π–θ) = f(π–θ) of course, which is exactly the same statement from a mathematical point of view).

5. However, that means (e^{iδ})² = +1, which, in turn, implies that e^{iδ} is plus or minus one: e^{iδ} = ±1. So that means the phase difference δ must be equal to 0 or π (or –π, which is the same as +π).
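The last two steps can be spelled out in one line. The integer n below is a symbol I am introducing for the bookkeeping; everything else is as in the list above.

```latex
\left( e^{i\delta} \right)^{2} = e^{2i\delta} = 1
\;\Longrightarrow\; 2\delta = 2\pi n \quad (n \in \mathbb{Z})
\;\Longrightarrow\; \delta = n\pi
\;\Longrightarrow\; e^{i\delta} = (-1)^{n} = \pm 1
```

Even values of n give the plus sign (δ = 0, up to a full turn) and odd values give the minus sign (δ = π), so those are the only two options the double-exchange argument leaves open.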

In practical terms, that means we have two ways of combining probability amplitudes for identical particles: we either add them or, else, we subtract them. Both cases exist in reality, and lead to the dichotomy between Bose and Fermi particles:

1. For Bose particles, we find the total probability amplitude for this scattering event by adding the two individual amplitudes: f(θ) + f(π–θ).
2. For Fermi particles, we find the total probability amplitude for this scattering event by subtracting the two individual amplitudes: f(θ) – f(π–θ).

As compared to the probability for non-identical particles which, you’ll remember, was equal to |f(θ)|² + |f(π–θ)|², we have the following Bose-Einstein and Fermi-Dirac statistics:

1. For Bose particles: the combined probability is equal to |f(θ) + f(π–θ)|². For example, if θ is 90°, then we have a scattering probability that is exactly twice the probability for non-identical particles. Indeed, if θ is 90°, then f(θ) = f(π–θ), and then we have |f(π/2) + f(π/2)|² = |2f(π/2)|² = 4|f(π/2)|². Now, that’s two times |f(π/2)|² + |f(π/2)|² = 2|f(π/2)|² indeed.
2. For Fermi particles (e.g. electrons), we have a combined probability equal to |f(θ) – f(π–θ)|². Again, if θ is 90°, f(θ) = f(π–θ), and so it would mean that we have a combined probability which is equal to zero! Now, that’s a strange result, isn’t it? It is. Fortunately, the strange result has to be modified because electrons will also have spin and, hence, in half of the cases, the two electrons will actually not be identical but have opposite spin. That changes the analysis substantially (see Feynman’s Lectures, III-3-12). To be precise, if we take the spin factor into account, we’ll find a total probability (for θ = 90°) equal to |f(π/2)|², so that’s half of the probability for non-identical particles.
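The θ = 90° arithmetic in the two items above can be checked with a short Python sketch. The value I plug in for f(π/2) is made up, and the function name is mine. The spin case simply follows the description above: half of the time the electrons have the same spin (so we subtract amplitudes), half of the time they have opposite spin (so we add probabilities).

```python
def scattering_probabilities(f_direct, f_exchanged):
    """Combine the two amplitudes under the rules discussed in the text."""
    classical = abs(f_direct) ** 2 + abs(f_exchanged) ** 2
    bosons = abs(f_direct + f_exchanged) ** 2
    fermions = abs(f_direct - f_exchanged) ** 2
    # Unpolarized electrons: same spin half of the time (fermion rule),
    # opposite spin the other half (distinguishable, so the classical rule).
    unpolarized_electrons = 0.5 * fermions + 0.5 * classical
    return {
        "classical": classical,
        "bosons": bosons,
        "fermions": fermions,
        "unpolarized electrons": unpolarized_electrons,
    }


# Hypothetical amplitude at 90 degrees, where f(theta) equals f(pi - theta).
f_half_pi = 0.3 + 0.4j   # so |f(pi/2)|^2 is 0.25

at_90_degrees = scattering_probabilities(f_half_pi, f_half_pi)
for case, probability in at_90_degrees.items():
    print(f"{case}: {probability:.3f}")
```

With |f(π/2)|² = 0.25, the classical sum is 0.5, the boson result is twice that, the same-spin fermion result vanishes, and the unpolarized electron result is half the classical one, in line with the statements above.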

Hmm… You’ll say: Now that was a complicated story! I fully agree. Frankly, I must admit I feel like I still don’t quite ‘get’ the story with that phase shift e^{iδ}, in an intuitive way that is (and so that’s the reason for going through the trouble of writing out this post). While I think it makes somewhat more sense now (I mean, more than when I wrote a post on this in March), I still feel I’ve only brought some of the implicit assumptions to the fore. In essence, what we’ve got here is a mathematical dichotomy (or a mathematical possibility if you want) corresponding to what turns out to be an actual dichotomy in Nature: in quantum-mechanics, particles are either bosons or fermions. There is no Third Way, in quantum-mechanics that is (there is a Third Way in reality, of course: that’s the classical world!).

I guess it will become more obvious as I get somewhat more acquainted with the real arithmetic involved in quantum-mechanical calculations over the coming weeks. In short, I’ve analyzed this thing over and over again, but it’s still not quite clear to me. I guess I should just move on and accept that:

1. This explanation ‘explains’ the experimental evidence, and that’s different probabilities for identical particles as compared to non-identical particles.
2. This explanation ‘complements’ analyses such as that 1916 analysis of blackbody radiation by Einstein (see my post on that), which approaches interference from an angle that’s somewhat more intuitive.

A numerical example

I’ve learned that, when some theoretical piece feels hard to read, an old-fashioned numerical example often helps. So let’s try one here. We can experiment with many functional forms but let’s keep things simple. From the illustration (which I copy below for your convenience), that angle θ can take any value between −π and +π, so you shouldn’t think detector 1 can only be ‘north’ of the collision spot: it can be anywhere.

Now, it may or may not make sense (and please work out other examples than this one here), but let’s assume particle a and b are more likely to go in a line that’s more or less straight. In other words, the assumption is that both particles deflect each other only slightly, or even not at all. After all, we’re talking ‘point-like’ particles here and so, even when we try hard, it’s hard to make them collide really.

That would amount to a typical bell-shaped curve for that probability density curve P(θ): one like the blue curve below. That one shows that the probability of particle a and b just bouncing back (i.e. θ ≈ ±π) is (close to) zero, while it’s highest for θ ≈ 0, and some intermediate value for any angle in-between. The red curve shows P(π–θ), which can be found by mirroring the P(θ) around the vertical axis, which yields the same function because the function is symmetrical: P(θ) = P(–θ), and then shifting it horizontally over a distance π. It should: it’s the second possibility, remember? Particle a ending up in detector 2. But detector 2 is positioned at the angle π–θ and, hence, if π–θ is close to ±π (so if θ ≈ 0), that means particle a is basically bouncing back also, which we said is unlikely. On the other hand, if detector 2 is positioned at an angle π–θ ≈ 0, then we have the highest probability of particle a going right to it. In short, the red curve makes sense too, I would think. [But do think about it yourself: you’re the ultimate judge!]

The harder question, of course, concerns the choice of some wave function f(θ) to match those P curves above. Remember that these probability densities P are real numbers and any positive real number is the absolute square (aka the squared norm) of an infinite number of complex numbers! So we’ve got l’embarras du choix, as they say in French. So… What to do? Well… Let’s keep things simple and stupid and choose a real-valued wave function f(θ), such as the blue function below. Huh? You’ll wonder if that’s legitimate. Frankly, I am not 100% sure, but why not? The blue f(θ) function will give you the blue P(θ) above, so why not go along with it? It’s based on a cosine function but it’s only half of a full cycle. Why? Not sure. I am just trying to match some sinusoidal function with the probability density function here, so… Well… Let’s take the next step.

The red graph above is the associated f(π–θ) function. Could we choose another one? No. There’s no freedom of choice here, I am afraid: if we choose a functional form for f(θ), then our f(π–θ) function is fixed too. So it is what it is: negative between –π and 0, and positive between 0 and +π. Now that is definitely not good, because f(π–θ) for θ = –π is not equal to f(π–θ) for θ = +π: they’re opposite values. That’s nonsensical, isn’t it? Both the f(θ) and the f(π–θ) should be something cyclical… But, again, let’s go along with it for now: note that the green horizontal line is the sum of the squared (absolute) values of f(θ) and f(π–θ), and note that it’s some constant.

Now, that’s a funny result, because I assumed both particles were more likely to go in some straight line, rather than recoil with some sharp angle θ. It again indicates I must be doing something wrong here. However, the important thing for me here is to compare with the Bose-Einstein and Fermi-Dirac statistics. What’s the total probability there if we take that blue f(θ) function? Well… That’s what’s shown below. The horizontal blue line is the same as the green line in the graph above: a constant probability for some particle (a or b) ending up in some detector (1 or 2). Note that the surface, when added, of the two rectangles above the x-axis (i.e. the θ-axis) should add up to 1. The red graph gives the probability when the experiment is carried out for (identical) bosons (or Bose particles as I like to call them). It’s weird: it makes sense from a mathematical point of view (the surface under the curve adds up to the same surface under the blue line, so it adds up to 1) but, from a physics point of view, what does this mean? A maximum at θ = π/2 and a minimum at θ = –π/2? Likewise, how to interpret the result for fermions?

Is this OK? Well… To some extent, I guess. It surely matches the theoretical results I mentioned above: we have twice the probability for bosons for θ = 90° (red curve), and a probability equal to zero for the same angle when we’re talking fermions (green curve). Still, this numerical example triggers more questions than it answers. Indeed, my starting hypothesis was very symmetrical: both particle a and b are likely to go in a straight line, rather than being deflected in some sharp(er) angle. Now, while that hypothesis gave a somewhat unusual but still understandable probability density function in the classical world (for non-identical particles, we got a constant for P(θ) + P(π–θ)), we get this weird asymmetry in the quantum-mechanical world: we’re much more likely to catch a boson in a detector ‘north’ of the line of firing than ‘south’ of it, and vice versa for fermions.
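
The half-cycle case is easy to check numerically. The sketch below assumes the blue curve is f(θ) = cos(θ/2): the text only says it is half of a cosine cycle, so that exact form (and the function names) are my assumption, and normalization is ignored.

```python
import math

def f(theta):
    # Assumed form of the blue curve: half a cosine cycle on [-pi, pi].
    return math.cos(theta / 2)

def classical(theta):
    # Non-identical particles: add the probabilities of the two ways.
    return f(theta) ** 2 + f(math.pi - theta) ** 2

def bose(theta):
    # Identical bosons: add the amplitudes, then square.
    return (f(theta) + f(math.pi - theta)) ** 2

def fermi(theta):
    # Identical fermions: subtract the amplitudes, then square.
    return (f(theta) - f(math.pi - theta)) ** 2

for deg in (-90, 0, 90):
    t = math.radians(deg)
    print(deg, round(classical(t), 6), round(bose(t), 6), round(fermi(t), 6))
```

With this choice f(π–θ) = sin(θ/2), so the classical sum is the constant 1, while the boson and fermion curves work out to 1 + sin(θ) and 1 – sin(θ): that is exactly the north-south asymmetry described above.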

That’s weird, to say the least. So let’s go back to the drawing board and take another function for f(θ) and, hence, for f(π–θ). This time, the two graphs below assume that (i) f(θ) and f(π–θ) have a real as well as an imaginary part and (ii) that they go through a full cycle, instead of a half-cycle only. This is done by equating the real part of the two functions with cos(θ) and cos(π–θ) respectively, and their imaginary part with sin(θ) and sin(π–θ) respectively. [Note that we conveniently forget about the normalization condition here.]

What do we see? Well… The imaginary part of f(θ) and f(π–θ) is the same, because sin(π–θ) = sin(θ). We also see that the real parts of f(θ) and f(π–θ) are the same except for a phase difference equal to π: cos(π–θ) = cos[–(θ–π)] = cos(θ–π). More importantly, we see that the absolute square of both f(θ) and f(π–θ) yields the same constant, and so their sum P = |f(θ)|² + |f(π–θ)|² = 2|f(θ)|² = 2|f(π–θ)|² = 2P(θ) = 2P(π–θ). So that’s another constant. That’s actually OK because, this time, I did not favor one angle over the other (so I did not assume both particles were more likely to go in some straight line rather than recoil).

Now, how does this compare to Bose-Einstein and Fermi-Dirac statistics? That’s shown below. For Bose-Einstein (left-hand side), the sum of the real parts of f(θ) and f(π–θ) yields zero (blue line), while the sum of their imaginary parts (i.e. the red graph) yields a sine-like function but it has double the amplitude of sin(θ). That’s logical: sin(θ) + sin(π–θ) = 2sin(θ). The green curve is the more interesting one, because that’s the total probability we’re looking for. It has two maxima now, at +π/2 and at –π/2. That’s good, as it does away with that ‘weird asymmetry’ we got when we used a ‘half-cycle’ f(θ) function.

Likewise, the Fermi-Dirac probability density function looks good as well (right-hand side). We have the imaginary parts of f(θ) and f(π–θ) that ‘add’ to zero: sin(θ) – sin(π–θ) = 0 (I put ‘add’ in quotes because, with Fermi-Dirac, we’re subtracting of course), while the real parts ‘add’ up to a double cosine function: cos(θ) – cos(π–θ) = cos(θ) – [–cos(θ)] = 2cos(θ). We now get a minimum at +π/2 and at –π/2, which is also in line with the general result we’d expect. The (final) graph below summarizes our findings. It gives the three ‘types’ of probabilities, i.e. the probability of finding some particle in some detector as a function of the angle –π < θ < +π using:

1. Maxwell-Boltzmann statistics: that’s the green constant (non-identical particles, and probability does not vary with the angle θ).
2. Bose-Einstein: that’s the blue graph below. It has two maxima, at +π/2 and at –π/2, and two minima, at 0 and at ±π (+π and –π are the same angle obviously), with the maxima equal to twice the value we get under Maxwell-Boltzmann statistics.
3. Finally, the red graph gives the Fermi-Dirac probabilities. Also two maxima and minima, but at different places: the maxima are at θ = 0 and θ = ±π, while the minima are at +π/2 and at –π/2.
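
The three curves in this list can be reproduced in a few lines. This is only a sketch of the calculation described above, using the unnormalized f(θ) = cos(θ) + i·sin(θ); the function names and the midpoint-rule `area` helper are mine.

```python
import cmath
import math

def f(theta):
    # Unnormalized full-cycle amplitude: cos(theta) + i*sin(theta).
    return cmath.exp(1j * theta)

def maxwell_boltzmann(theta):
    # Non-identical particles: sum of the two probabilities.
    return abs(f(theta)) ** 2 + abs(f(math.pi - theta)) ** 2

def bose_einstein(theta):
    # Identical bosons: amplitudes add before squaring.
    return abs(f(theta) + f(math.pi - theta)) ** 2

def fermi_dirac(theta):
    # Identical fermions: amplitudes subtract before squaring.
    return abs(f(theta) - f(math.pi - theta)) ** 2

def area(p, steps=3600):
    # Midpoint-rule integral of p over [-pi, pi].
    h = 2 * math.pi / steps
    return sum(p(-math.pi + (k + 0.5) * h) for k in range(steps)) * h

for name, p in (("MB", maxwell_boltzmann), ("BE", bose_einstein), ("FD", fermi_dirac)):
    print(name, round(p(0.0), 6), round(p(math.pi / 2), 6), round(area(p), 6))
```

The printed areas should come out equal for the three curves, which is the ‘same total’ property, and the Bose-Einstein value at π/2 should be twice the Maxwell-Boltzmann constant.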

Funny, isn’t it? These probability density functions are all well-behaved, in the sense that they add up to the same total (which should be 1 when applying the normalization condition). Indeed, the surfaces under the green, blue and red lines are obviously the same. But so we get these weird fluctuations for Bose-Einstein and Fermi-Dirac statistics, favoring two specific angles over all others, while there’s no such favoritism when the experiment involves non-identical particles. This, of course, just follows from our assumption concerning f(θ). What if we double the frequency of f(θ), i.e. from one cycle to two cycles between –π and +π? Well… Just try it: take f(θ) = cos(2·θ) + i·sin(2·θ) and do the calculations. You should get the following probability graphs: we have the same green line for non-identical particles, but interference with four maxima (and four minima) for the Bose-Einstein and Fermi-Dirac probabilities.
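
For a whole number n of cycles the calculation can also be done by hand. A short derivation, assuming the unnormalized form f(θ) = cos(n·θ) + i·sin(n·θ) = e^(inθ), with the upper sign for bosons and the lower sign for fermions (the symbol n is my notation):

```latex
\begin{aligned}
f(\pi-\theta) &= e^{in(\pi-\theta)} = (-1)^n\, e^{-in\theta},\\
\left|f(\theta) \pm f(\pi-\theta)\right|^2 &= 2 \pm 2\,(-1)^n \cos(2n\theta),\\
\left|f(\theta)\right|^2 + \left|f(\pi-\theta)\right|^2 &= 2 .
\end{aligned}
```

The interference term cos(2nθ) goes through 2n cycles between –π and +π, which gives two maxima for n = 1 and four for n = 2, while the non-identical-particle result stays at the constant 2.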

Again… Funny, isn’t it? So… What to make of this? Frankly, I don’t know. But one last graph makes for an interesting observation: if the angular frequency of f(θ) takes on larger and larger values, the Bose-Einstein and Fermi-Dirac probability density functions also start oscillating wildly. For example, the graphs below are based on an f(θ) function equal to f(θ) = cos(25·θ) + i·sin(25·θ). The explosion of color hurts the eye, doesn’t it? 🙂 But, apart from that, do you now see why physicists say that, at high frequencies, the interference pattern gets smeared out? Indeed, if we move the detector just a little bit (i.e. we change the angle θ just a little bit) in the example below, we hit a maximum instead of a minimum, and vice versa. In short, the granularity may be such that we can only measure that green line, in which case we’d think we’re dealing with Maxwell-Boltzmann statistics, while the underlying reality may be different.
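
The ‘smearing out’ can be made concrete by averaging the Bose-Einstein curve over a detector of finite angular width. The sketch below assumes f(θ) = cos(25·θ) + i·sin(25·θ) as in the text; the 0.4 rad aperture and the helper `detector_reading` are hypothetical choices of mine, purely for illustration.

```python
import cmath
import math

N_CYCLES = 25  # angular frequency used in the text

def f(theta):
    return cmath.exp(1j * N_CYCLES * theta)

def bose_einstein(theta):
    return abs(f(theta) + f(math.pi - theta)) ** 2

def detector_reading(p, center, width, samples=2001):
    # Average of p over the angular window [center - width/2, center + width/2].
    step = width / (samples - 1)
    start = center - width / 2
    return sum(p(start + k * step) for k in range(samples)) / samples

# An ideal, infinitely narrow detector placed at theta = 0
point_like = bose_einstein(0.0)
# A hypothetical detector with a 0.4 rad aperture centred on the same angle
wide = detector_reading(bose_einstein, 0.0, 0.4)

print(point_like, wide)
```

The point-like detector sits on a minimum, while the wide one straddles several wiggles and reads a value close to the Maxwell-Boltzmann constant of 2.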

That explains another quote in Feynman’s famous introduction to quantum mechanics (Lectures, Vol. III, Chapter 1): “If the motion of all matter—as well as electrons—must be described in terms of waves, what about the bullets in our first experiment? Why didn’t we see an interference pattern there? It turns out that for the bullets the wavelengths were so tiny that the interference patterns became very fine. So fine, in fact, that with any detector of finite size one could not distinguish the separate maxima and minima. What we saw was only a kind of average, which is the classical curve. In the Figure below, we have tried to indicate schematically what happens with large-scale objects. Part (a) of the figure shows the probability distribution one might predict for bullets, using quantum mechanics. The rapid wiggles are supposed to represent the interference pattern one gets for waves of very short wavelength. Any physical detector, however, straddles several wiggles of the probability curve, so that the measurements show the smooth curve drawn in part (b) of the figure.”

But that should really conclude this post. It has become way too long already. One final remark, though: the ‘smearing out’ effect also explains why those three equations for ⟨Nᵢ⟩ sometimes do amount to more or less the same thing: the Bose-Einstein and Fermi-Dirac formulas may approximate the Maxwell-Boltzmann equation. In that case, the ±1 term in the denominator does not make much of a difference. As we said a couple of times already, it all depends on scale. 🙂
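
The three equations are not repeated in this part of the post, so the sketch below assumes their standard textbook forms: an average occupation of 1/(eˣ – 1) for Bose-Einstein, 1/(eˣ + 1) for Fermi-Dirac and e⁻ˣ for Maxwell-Boltzmann, with x = (E – μ)/kT. The variable x and the function names are my shorthand.

```python
import math

def bose_einstein(x):
    # Average occupation with the -1 term in the denominator.
    return 1.0 / (math.exp(x) - 1.0)

def fermi_dirac(x):
    # Average occupation with the +1 term in the denominator.
    return 1.0 / (math.exp(x) + 1.0)

def maxwell_boltzmann(x):
    # Classical limit: no +/-1 term at all.
    return math.exp(-x)

for x in (0.5, 2.0, 10.0):
    print(x, bose_einstein(x), maxwell_boltzmann(x), fermi_dirac(x))
```

For large x the exponential dwarfs the ±1 term and the three numbers become practically indistinguishable; for small x they differ substantially.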

Concluding remarks

1. The best I can do in terms of interpreting the above, is to tell myself that we cannot fully ‘fix’ the functional form of the wave function for the second or ‘other’ way the event can happen if we’re ‘fixing’ the functional form for the first of the two possibilities. We have to allow for a phase shift e^(iδ) indeed, which incorporates all kinds of considerations of uncertainty in regard to both time and position and, hence, in regard to energy and momentum also (using both the ΔEΔt = ħ/2 and ΔxΔp = ħ/2 expressions)–I assume (but that’s just a gut instinct). And the symmetry of the situation then implies e^(iδ) can only take on one of two possible values: –1 or +1 which, in turn, implies that δ is equal to 0 or π.

2. For those who’d think I am basically doing nothing but re-writing a chapter out of Feynman’s Lectures, I’d refute that. One point to note is that Feynman doesn’t seem to accept that we should introduce a phase factor in the analysis for non-identical particles as well. To be specific: just switching the detectors (instead of the particles) also implies that one should allow for the mathematical possibility of the phase of that f function being shifted by some random factor δ. The only difference with the quantum-mechanical analysis (i.e. the analysis for identical particles) is that the phase factor doesn’t make a difference as to the final result, because we’re not adding amplitudes but their absolute squares and, hence, a phase shift doesn’t matter.

3. I think all of the reasoning above makes not only for a very fine but also a very beautiful theoretical argument, even if I feel like I don’t fully ‘understand’ it, in an intuitive way that is. I hope this post has made you think. Isn’t it wonderful to see that the theoretical or mathematical possibilities of the model actually correspond to realities, both in the classical as well as in the quantum-mechanical world? In fact, I can imagine that most physicists and mathematicians would shrug this whole reflection off like… Well… Like: “Of course! It’s obvious, isn’t it?” I don’t think it’s obvious. I think it’s deep. I would even qualify it as mysterious, and surely as beautiful. 🙂

My son, who’s fifteen, said he liked my post on lasers. That’s good, because I effectively wrote it thinking of him as part of the audience. He also said it stimulated him to consider taking on studies in engineering later. That’s great. I hope he does, so he doesn’t have to go through what I am going through right now. Indeed, when everything is said and done, you do want your kids to take on as much math and science as they can handle when they’re young because, afterwards, it’s tough to catch up.

Now, I struggled quite a bit with bringing relativity into the picture while pondering the ‘essence’ of a photon in my previous post. Hence, I thought it would be good to return to the topic of (special) relativity and write another post to (1) refresh my knowledge on the topic and (2) try to stimulate him even more. Indeed, regardless of whether one does or doesn’t understand any of what I write below here, relativity theory sounds fascinating, doesn’t it? 🙂 So, this post intends to present, in a nutshell, what (special) relativity theory is all about.

What relativity does

The thing that’s best known about Einstein’s (special) theory of relativity is the following: the mass of an object, as measured by the (inertial) observer, increases with its speed. The formula for this is m = γm₀, and the γ factor here is the so-called Lorentz factor: γ = (1 – u²/c²)^(–1/2). Let me give you that diagram of the Lorentz factor once again, which shows that very considerable speeds are required before relativity effects kick in. However, when they do, they kick in with a vengeance, it seems, which makes c the limit !

Now, you may or may not be familiar with two other things that come out of relativity theory as well:

1. The first is length contraction: objects are measured to be shortened in the direction of motion with respect to the (inertial) observer. The formula to be used incorporates the reciprocal of the Lorentz factor: L = (1/γ)L₀. For example, a stick of one meter in a space ship moving at a velocity v = 0.6c will appear to be only 80 cm to the external/inertial observer seeing it whizz past… That is if he can see anything at all of course: he’d have to take like a photo-finish picture as it zooms past ! 🙂
2. The second is time dilation, which is also rather well known – just like the mass increase effect – because of the so-called twin paradox: time will appear to be slower in that space ship and, hence, if you send one of two twins away on a space journey, traveling at relativistic speeds (i.e. a velocity sufficiently close to c to make the relativistic effect significant), he will come back younger than his brother. The formula here is equally simple: t = γt₀. Hence, at v = 0.6c, one second in the space ship will be measured as 1.25 seconds by the external observer. Hence, the moving clock will appear to run slower – again: to the external (inertial) observer that is.
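
The numbers in these two examples follow directly from the Lorentz factor. A minimal sketch, with `beta` standing for v/c (the variable and function names are mine):

```python
import math

def lorentz_factor(beta):
    # beta = v/c, which must satisfy |beta| < 1.
    return 1.0 / math.sqrt(1.0 - beta ** 2)

beta = 0.6
gamma = lorentz_factor(beta)

contracted_length = 1.0 / gamma   # metres, for a stick of rest length 1 m
dilated_time = gamma * 1.0        # seconds, for 1 s of ship time

print(gamma, contracted_length, dilated_time)
```

At v = 0.6c the factor is 1.25, which gives the 80 cm stick and the 1.25 seconds quoted in the list.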

These simple rules, which come out of Einstein’s special relativity theory, give rise to all kinds of paradoxes. You know what a paradox is: a paradox (in physics) is something that, at first sight, does not make sense but that, when the issue is examined more in detail, does get resolved and actually helps us to better understand what’s going on.

You know the twin paradox already: only one of the two twins can be the younger (or the older) when they meet again. However, because one can also say it’s the guy staying on Earth that’s moving (and, hence, is ‘traveling’ at relativistic speed) – so then the reference frame of the guy in the spaceship is the so-called inertial frame – one can say the guy who stayed behind (on Earth) should be the younger one when they meet after the journey. I am not ashamed to say that this actually is a paradox that is difficult to understand. So let me first start with another.

While the twin paradox examines the time dilation effect, the ladder paradox examines the length contraction effect. The situation is similar to the one for the twin paradox. However, because we don’t have accelerating and decelerating rockets and all that (cf. the twin paradox), I find this paradox not only more straightforward but also more amusing. Look at the left-hand side first. We have a garage which has both a front and back door. A ladder passes through it, and it seems to fit in the garage as you can see. Now, that may or may not be because of the length contraction effect, of course. Whatever. In any case, it seems we can (very) quickly close both doors of the garage to prove that it fits. Now look at the right-hand side. Here we are moving the garage over the ladder (I know, not very convenient, but just go along with the story). So now the ladder frame is the inertial reference frame and the garage is the moving frame. So, according to that length contraction ‘law’, it’s the garage that gets shorter and it turns out the ladder doesn’t fit any more. Hence, the paradox: does the ladder fit or not? The answer must be unambiguous, no? Yes or no. So what is it?

The paradox pushes us to consider all kinds of important questions which are usually just glossed over. How do we decide if the ladder fits? Well… By closing both the front and back door of course, you’ll say. But then you mean closing them simultaneously, and absolute simultaneity does not exist: two events that appear to happen at the same time in one reference frame may not happen at the same time in another. Only the space-time interval between two events is absolute, in the sense that it’s the same in whatever reference frame we’re measuring it, not the individual space and individual time intervals. Hence, if you’re in the garage shutting those doors at the same time, then that’s your time, but if I am moving with the ladder, I will not see those two doors shutting as something that’s simultaneous. More formally, and using the definition of space-time intervals (and assuming only one space dimension x), we have:

c²Δt² – Δx² = c²Δt’² – Δx’².

In this equation, we’ll take the x and t coordinates to be those of the inertial frame (so that’s the garage on the left-hand side), while the primed coordinates (x’ and t’) are the coordinates as measured in the other reference frame, i.e. the reference frame that moves from the perspective of the inertial frame. Indeed, note that we cannot say that one reference frame moves while the other stands still as we’re talking relative speeds here: one reference frame moves in respect to the other, and vice versa. In any case, the equation with the space-time intervals above implies that:

c²(Δt² – Δt’²) – (Δx² – Δx’²) = 0

However, that does not imply that the two terms on the left-hand side of the above equation are zero individually. In fact, they aren’t. Hence, while it must be true that c²(Δt² – Δt’²) = Δx² – Δx’², we have:

Δt² – Δt’² ≠ 0 and Δx² – Δx’² ≠ 0, or Δt² ≠ Δt’² and Δx² ≠ Δx’²

To put it simply, if you’re in the garage, and I am moving with the ladder (we’re talking the left-hand side situation) now, you’ll claim that you were able to shut both doors momentarily, so that Δt = 0. I’ll say: bollocks! Which is rude. I should say: my Δt’ is not equal to zero. Hence, from my point of view, I always saw one of the two doors open and, hence, I don’t think the ladder fits. Hence, what I am seeing, effectively, is the situation on the right-hand side: your garage looks too short for my ladder.
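
A small numerical check may help here. The post does not spell out how the primed coordinates are obtained, so the sketch below assumes the standard Lorentz transformation, works in units where c = 1, and uses hypothetical numbers (doors 4 units apart, relative speed 0.6c).

```python
import math

C = 1.0  # units where c = 1

def lorentz_transform(dt, dx, beta):
    # Standard Lorentz transformation of the separation between two events.
    gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
    dt_prime = gamma * (dt - beta * dx / C)
    dx_prime = gamma * (dx - beta * C * dt)
    return dt_prime, dx_prime

def interval(dt, dx):
    # Space-time interval c^2*dt^2 - dx^2, with one space dimension.
    return (C * dt) ** 2 - dx ** 2

# Both doors shut at the same time in the garage frame (hypothetical numbers)
dt, dx = 0.0, 4.0
dt_p, dx_p = lorentz_transform(dt, dx, beta=0.6)

print(dt_p, dx_p, interval(dt, dx), interval(dt_p, dx_p))
```

The two intervals agree, but Δt’ is not zero: in the ladder frame the doors do not shut simultaneously, even though Δt = 0 in the garage frame.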

You’ll say: what is this? The ladder fits or it doesn’t, does it? The answer is: no. It is ambiguous. It does depend on your reference frame. It fits in your reference frame but it does not fit in mine. In order to get a non-ambiguous answer you have to stop moving, or I have to stop moving– whatever: the point is that we need to merge our reference frames.

Hence, paradox solved. In fact, now that I think of it, it’s kinda funny that we don’t have such paradoxes for the relativistic mass formula. No one seems to wonder about the apparent contradiction that, if you’re moving away from me, you look heavier than me but that, vice versa, I also look heavier to you. So we both look heavier as seen from our own respective reference frames. So who’s heavier then? Perhaps no one developed a paradox because it is kinda impolite to compare personal weights? 🙂

Of course, I am joking, but think of it: it has to do with our preconceived notions of time and space. Things like inertia (mass is a measure for inertia) don’t grab our attention as much. In any case, now it’s time to discuss time dilation.

Oh ! And do think about that photo-finish picture ! It’s related to the problem of defining what constitutes a length really. 🙂

I find the twin paradox much more difficult to analyze, and I guess many people do because it’s the one that usually receives all of the attention. [Frankly, I hadn’t heard of this ladder paradox before I started studying physics.] Feynman hardly takes the time to look at it. He basically notes that the situation is not unlike an unstable particle traveling at relativistic speeds: when it does, it lasts (much) longer than its lifetime (measured in the inertial reference frame) suggests. Let me actually just quote Feynman’s account of it:

“Peter and Paul are supposed to be twins, born at the same time. When they are old enough to drive a space ship, Paul flies away at very high speed. Because Peter, who is left on the ground, sees Paul going so fast, all of Paul’s clocks appear to go slower, his heart beats go slower, his thoughts go slower, everything goes slower, from Peter’s point of view. Of course, Paul notices nothing unusual, but if he travels around and about for a while and then comes back, he will be younger than Peter, the man on the ground! That is actually right; it is one of the consequences of the theory of relativity which has been clearly demonstrated. Just as the mu-mesons last longer when they are moving, so also will Paul last longer when he is moving. This is called a “paradox” only by the people who believe that the principle of relativity means that all motion is relative; they say, “Heh, heh, heh, from the point of view of Paul, can’t we say that Peter was moving and should therefore appear to age more slowly? By symmetry, the only possible result is that both should be the same age when they meet.” But in order for them to come back together and make the comparison, Paul must either stop at the end of the trip and make a comparison of clocks or, more simply, he has to come back, and the one who comes back must be the man who was moving, and he knows this, because he had to turn around. When he turned around, all kinds of unusual things happened in his space ship—the rockets went off, things jammed up against one wall, and so on—while Peter felt nothing.

So the way to state the rule is to say that the man who has felt the accelerations, who has seen things fall against the walls, and so on, is the one who would be the younger; that is the difference between them in an “absolute” sense, and it is certainly correct. When we discussed the fact that moving mu-mesons live longer, we used as an example their straight-line motion in the atmosphere. But we can also make mu-mesons in a laboratory and cause them to go in a curve with a magnet, and even under this accelerated motion, they last exactly as much longer as they do when they are moving in a straight line. Although no one has arranged an experiment explicitly so that we can get rid of the paradox, one could compare a mu-meson which is left standing with one that had gone around a complete circle, and it would surely be found that the one that went around the circle lasted longer. Although we have not actually carried out an experiment using a complete circle, it is really not necessary, of course, because everything fits together all right. This may not satisfy those who insist that every single fact be demonstrated directly, but we confidently predict the result of the experiment in which Paul goes in a complete circle.”

[…] Well… I am not sure I am “among those who insist that every single fact be demonstrated directly”, but you’ll admit that Feynman is quite terse here (or more terse than usual, I should say). That being said, I understand why: the calculations involved in demonstrating that the paradox is what it is, i.e. an apparent contradiction only, are not straightforward. I’ve googled a bit but it’s all quite confusing. Good explanations usually involve the so-called Minkowski diagram, also known as the spacetime diagram. You’ve surely seen it before–when the light cone was being discussed and what it implies for the concepts of past, present and future. It’s a way to represent those spacetime intervals. The Minkowski diagram–from the perspective of the twin brother on Earth (hence, we only have unprimed coordinates x and (c)t)– is shown below. Don’t worry about those simultaneity planes as for now. Just try to understand the diagram. The twin brother that stays just moves along the vertical axis: x = 0. His space-traveling brother travels out to some point and then turns back, so he first travels northeast on this diagram and then takes a turn northwest, to meet up again with his brother on Earth.

The point to note is that the twin brother is not traveling along one straight line, but along two. Hence, the argument that we can just as well say his frame of reference is inertial and that of his brother is the moving one is not correct. As Wikipedia notes (from which I got this diagram): “The trajectory of the ship is equally divided between two different inertial frames, while the Earth-based twin stays in the same inertial frame.”

Still, the situation is essentially symmetric and so we could draw a similar-looking spacetime diagram for the primed coordinates, i.e. x’ and ct’, and wonder what’s the difference. That’s where these planes of simultaneity come in. Look at the wonderful animation below: A, B, C are simultaneous events when I am standing still (v = 0). However, when I move at considerable speed (v = 0.3c), that’s no longer the case: it takes more time for news to reach me from ‘point’ A and, hence, assuming news travels at the speed of light, event A appears to happen later. Conversely, event C (in spacetime) appears to have happened before event B. Now that explains these blue so-called simultaneity planes on the diagram above: they’re the white lines traveling from the past to the future on the animation below, but for the trip out only (v > 0). For the trip back, we have the red lines, which correspond to the v = –0.5c situation below. So that’s the return trip (v < 0).

What you see is that, “during the U-turn, the plane of simultaneity jumps from blue to red and very quickly sweeps over a large segment of the world line of the Earth-based twin.” Hence, “when one transfers from the outgoing frame to the incoming frame there is a jump discontinuity in the age of the Earth-based twin.” [I took the quotes from Wikipedia here, where you can find the original references.] Now, you will say, that is also symmetric if we switch the reference frames. Yes… Except for the sign. So, yes, it is the traveling brother who effectively skips some time. Paradox solved.
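
The bookkeeping behind that ‘jump’ can be sketched with a few numbers. The example below is my own illustration, not taken from the sources quoted above: it assumes an instantaneous U-turn, units in which c = 1 (light-years and years), and hypothetical values for the speed and the turnaround distance.

```python
import math

beta = 0.6        # hypothetical cruise speed, v/c
distance = 3.0    # hypothetical turnaround distance in light-years (Earth frame)

gamma = 1.0 / math.sqrt(1.0 - beta ** 2)

# Years elapsed for the twin who stays on Earth
earth_time = 2 * distance / beta
# Years elapsed for the traveller (proper time along the two straight legs)
traveller_time = earth_time / gamma

# Earth ageing the traveller assigns to each leg, using his own simultaneity lines
earth_ageing_per_leg = (distance / beta) * (1.0 - beta ** 2)
# Shift of the simultaneity line along Earth's world line at the U-turn
simultaneity_jump = 2 * beta * distance

total = 2 * earth_ageing_per_leg + simultaneity_jump
print(earth_time, traveller_time, earth_ageing_per_leg, simultaneity_jump, total)
```

With these numbers the traveller ages 8 years against 10 on Earth, and the Earth ageing he attributes to the two legs plus the jump at the U-turn adds up to the same 10 years, so both accounts agree at the reunion.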

Now… For some real fun…

Now, for some real fun, I’d like to ask you what the world would look like when you were traveling through it riding a photon. So… Think about it. Think hard. I didn’t google at first and I must admit the question really started racking my brain. There are so many effects to take into account. One basic property, of course, must be that time stands still around you. You see the world as it was when you reached v = c. Well… Yes and no. The fact of the matter is that, because of all the relativistic effects (e.g. aberration, Doppler shift, intensity shifts,…), you actually don’t see a whole lot. One visualization of it (visual effects of relativistic speeds) seems to indicate that (most) science fiction movies actually present the correct picture (if the animation shows the correct visualization, that is): we’re staring into one bright flash of light ahead of us as we’re getting close to v = c. Interesting…

Finally, you should also try to find out what actually happens to the clocks during the deceleration and acceleration as the space ship of that twin brother turns. You’re going to find it fascinating. At the same time, the math behind it is, quite simply, daunting and, hence, I won’t even try to go into the math of this thing. 🙂

Conclusion

So… Well… That’s it really. I now realize why I never quite got this as a kid. These paradoxes do require some deep thinking and imagination and, most of all, some tools that one just couldn’t find as easily as today.

The Web definitely does make it easier to study without the guidance of professors and the material environment of a university, although I don’t think it can be a substitute for discipline. When everything is said and done, it’s still hard work. Very hard work. But I hope you get there, Vincent ! 🙂 And please do look at that YouTube video by clicking the link above. 🙂

Post scriptum: Because the resolution of the video above is quite low, I looked for others, for example one that describes the journey from the Sun to the Earth, which–as expected–takes about 8 minutes. While it has higher resolution, it is far less informative. I’ll let you google some more. Please tell me if you found something nice. 🙂

# The Complementarity Principle

Unlike what you might think when seeing the title of this post, it is not my intention to enter into philosophical discussions here: many authors have been writing about this ‘principle’, most of which–according to eminent physicists–don’t know what they are talking about. So I have no intention to make a fool of myself here too. However, what I do want to do here is explore, in an intuitive way, how the classical and quantum-mechanical explanations of the phenomenon of the diffraction of light are different from each other–and fundamentally so–while, necessarily, having to yield the same predictions. It is in that sense that the two explanations should be ‘complementary’.

The classical explanation

I’ve done a fairly complete analysis of the classical explanation in my posts on Diffraction and the Uncertainty Principle (20 and 21 September), so I won’t dwell on that here. Let me just repeat the basics. The model is based on the so-called Huygens-Fresnel Principle, according to which each point in the slit becomes a source of a secondary spherical wave. These waves then interfere, constructively or destructively, and, hence, by adding them, we get the form of the wave at each point of time and at each point in space behind the slit. The animation below illustrates the idea. However, note that the mathematical analysis does not assume that the point sources are neatly separated from each other: instead of only six point sources, we have an infinite number of them and, hence, adding up the waves amounts to solving some integral (which, as you know, is an infinite sum).

We know what we are supposed to get: a diffraction pattern. The intensity of the light on the screen at the other side depends on (1) the slit width (d), (2) the wavelength of the light (λ), and (3) the angle of incidence (θ), as shown below.
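
The dependence on d, λ and θ can be checked by literally adding up the secondary waves. The sketch below assumes the far-field (Fraunhofer) limit, in which the point sources only differ by a phase, and compares the discrete sum with the closed-form single-slit result (sin β/β)² with β = π·d·sin(θ)/λ. The function names and the choice of 2000 point sources are mine.

```python
import cmath
import math

def huygens_intensity(theta, slit_width, wavelength, sources=2000):
    # Sum unit phasors from equally spaced point sources across the slit.
    k = 2 * math.pi / wavelength
    total = 0j
    for j in range(sources):
        x = -slit_width / 2 + (j + 0.5) * slit_width / sources
        total += cmath.exp(1j * k * x * math.sin(theta))
    # Normalized so that the intensity in the forward direction is 1.
    return abs(total / sources) ** 2

def fraunhofer_intensity(theta, slit_width, wavelength):
    # Closed-form far-field result: (sin(b)/b)^2 with b = pi*d*sin(theta)/lambda.
    b = math.pi * slit_width * math.sin(theta) / wavelength
    return 1.0 if b == 0 else (math.sin(b) / b) ** 2

# Hypothetical slit that is five wavelengths wide
for theta in (0.0, 0.1, 0.3, 0.7):
    print(theta, huygens_intensity(theta, 5.0, 1.0), fraunhofer_intensity(theta, 5.0, 1.0))
```

The two columns should agree closely, and the first minimum appears where sin(θ) = λ/d; the smaller bumps left and right are the side lobes of the same function.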

One point to note is that we have smaller bumps left and right. We don’t get that if we’d treat the slit as a single point source only, like Feynman does when he discusses the double-slit experiment for (physical) waves. Indeed, look at the image below: each of the slits acts as one point source only and, hence, the intensity curves I1 and I2 do not show a diffraction pattern. They are just nice Gaussian “bell” curves, albeit somewhat adjusted because of the angle of incidence (we have two slits above and below the center, instead of just one on the normal itself). So we have an interference pattern on the screen and, now that we’re here, let me be clear on terminology: I am going along with the widespread definition of diffraction being a pattern created by one slit, and the definition of interference as a pattern created by two or more slits. I am noting this just to make sure there’s no confusion.

That should be clear enough. Let’s move on to the quantum-mechanical explanation.

The quantum-mechanical explanation

There are several formulations of quantum mechanics: you’ve heard about matrix mechanics and wave mechanics. Roughly speaking, in matrix mechanics “we interpret the physical properties of particles as matrices that evolve in time”, while the wave mechanics approach is primarily based on these complex-valued wave functions–one for each physical property (e.g. position, momentum, energy). Both approaches are mathematically equivalent.

There is also a third approach, which is referred to as the path integral formulation, which “replaces the classical notion of a single, unique trajectory for a system with a sum, or functional integral, over an infinity of possible trajectories to compute an amplitude” (all definitions here were taken from Wikipedia). This approach is associated with Richard Feynman but can also be traced back to Paul Dirac, like most of the math involved in quantum mechanics, it seems. It’s this approach which I’ll try to explain–again, in an intuitive way only–in order to show the two explanations should effectively lead to the same predictions.

The key to understanding the path integral formulation is the assumption that a particle–and a ‘particle’ may be either a boson (e.g. a photon) or a fermion (e.g. an electron)–can follow any path from point A to B, as illustrated below. Each of these paths is associated with a (complex-valued) probability amplitude, and we have to add all these probability amplitudes to arrive at the probability amplitude for the particle to move from A to B.

You can find great animations illustrating what it’s all about in the relevant Wikipedia article but, because I can’t upload video here, I’ll just insert two illustrations from Feynman’s 1985 QED, in which he does what I try to do, and that is to approach the topic intuitively, i.e. without too much mathematical formalism. So probability amplitudes are just ‘arrows’ (with a length and a direction, just like a complex number or a vector), and finding the resultant or final arrow is a matter of just adding all the little arrows to arrive at one big arrow, which is the probability amplitude, which he denotes as P(A, B), as shown below.

This intuitive approach is great and actually goes a very long way in explaining complicated phenomena, such as iridescence for example (the wonderful patterns of color on an oil film!), or the partial reflection of light by glass (anything between 0 and 16%!). All his tricks make sense. For example, different frequencies are interpreted as slower or faster ‘stopwatches’ and, as such, they determine the final direction of the arrows which, in turn, explains why blue and red light are reflected differently. And so on and so on. It all works. […] Up to a point.

Indeed, Feynman does get in trouble when trying to explain diffraction. I’ve reproduced his explanation below. The key to the argument is the following:

1. If we have a slit that’s very wide, there are a lot of possible paths for the photon to take. However, most of these paths cancel each other out, and so that’s why the photon is likely to travel in a straight line. Let me quote Feynman: “When the gap between the blocks is wide enough to allow many neighboring paths to P and Q, the arrows for the paths to P add up (because all the paths to P take nearly the same time), while the paths to Q cancel out (because those paths have a sizable difference in time). So the photomultiplier at Q doesn’t click.” (QED, p.54)
2. However, “when the gap is nearly closed and there are only a few neighboring paths, the arrows to Q also add up, because there is hardly any difference in time between them, either (see Fig. 34). Of course, both final arrows are small, so there’s not much light either way through such a small hole, but the detector at Q clicks almost as much as the one at P! So when you try to squeeze light too much to make sure it’s going only in a straight line, it refuses to cooperate and begins to spread out.” (QED, p. 55)
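
Feynman's two statements can be checked with a few lines of code. What follows is only a rough sketch under my own assumptions: a source at (–1, 0), a gap centered on the origin, a detector P straight ahead at (1, 0) and a detector Q off to the side at (1, 0.4), one unit-length arrow per path, and a made-up wavelength of 0.001 (all in the same arbitrary units). The names `final_arrow` and `q_over_p` are mine, not Feynman's.

```python
import cmath
import math

WAVELENGTH = 0.001  # assumed, in the same arbitrary length units as the geometry
STEP = 1e-5         # assumed spacing between neighbouring paths across the gap

def final_arrow(gap_width, detector_height):
    """Add one unit-length arrow per path: source -> point in the gap -> detector."""
    total = 0j
    n_paths = int(round(gap_width / STEP)) + 1
    for j in range(n_paths):
        y = -gap_width / 2 + j * STEP  # where this path crosses the gap
        length = math.hypot(1.0, y) + math.hypot(1.0, detector_height - y)
        # The 'stopwatch hand' makes one full turn per wavelength travelled.
        total += cmath.exp(2j * math.pi * length / WAVELENGTH)
    return total

def q_over_p(gap_width, q_height=0.4):
    """Length of the final arrow at Q relative to the final arrow at P."""
    return abs(final_arrow(gap_width, q_height)) / abs(final_arrow(gap_width, 0.0))

wide_ratio = q_over_p(0.1)       # gap of 100 wavelengths
narrow_ratio = q_over_p(0.0005)  # gap of half a wavelength
print(f"wide gap:   |Q|/|P| = {wide_ratio:.3f}")
print(f"narrow gap: |Q|/|P| = {narrow_ratio:.3f}")
```

With the wide gap, the arrows to Q largely cancel while those to P line up; with the narrow gap, the two final arrows have nearly the same length. Note that the sketch, just like the argument it illustrates, says nothing about the secondary bumps.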

This explanation is as simple and intuitive as Feynman’s ‘explanation’ of diffraction using the Uncertainty Principle in his introductory chapter on quantum mechanics (Lectures, I-38-2), which is illustrated below. I won’t go into the detail (I’ve done that before) but you should note that, just like the explanation above, such explanations do not explain the secondary, tertiary etc bumps in the diffraction pattern.

So what’s wrong with these explanations? Nothing much. They’re simple and intuitive, but essentially incomplete, because they do not incorporate all of the math involved in interference. Incorporating the math means doing these integrals for

1. Electromagnetic waves in classical mechanics: here we are talking ‘wave functions’ with some real-valued amplitude representing the strength of the electric and magnetic field; and
2. Probability waves: these are complex-valued functions, with the complex-valued amplitude representing probability amplitudes.

The two should, obviously, yield the same result, but a detailed comparison between the approaches is quite complicated, it seems. Now, I’ve googled a lot of stuff, and I duly note that diffraction of electromagnetic waves (i.e. light) is conveniently analyzed by summing up complex-valued waves too, and, moreover, they’re of the same familiar type: ψ = Ae^{i(kx–ωt)}. However, these analyses also duly note that it’s only the real part of the wave that has an actual physical interpretation, and that it’s only because working with natural exponentials (addition, multiplication, integration, differentiation, etc.) is much easier than working with sine and cosine waves that such complex-valued wave functions are used (also) in classical mechanics. In fact, note the fine print in Feynman’s illustration of interference of physical waves (Fig. 37-2): he calculates the intensities I1 and I2 by taking the square of the absolute value of the amplitudes ĥ1 and ĥ2, and the hat indicates that we’re also talking some complex-valued wave function here.
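
To see why the complex notation is only a convenience in the classical case, here is a small sketch (my own illustration, not taken from Feynman) that computes the intensity of two superposed waves in two ways: once with complex amplitudes, and once by averaging the square of the real cosine waves over one period. The factor 2 in the second function is a normalization I chose so that both conventions agree, since the average of cos² over a period is ½.

```python
import cmath
import math

def intensity_via_phasors(a1, a2, delta):
    """|h1 + h2|^2 for complex amplitudes with phase difference delta."""
    h1 = a1
    h2 = a2 * cmath.exp(1j * delta)
    return abs(h1 + h2) ** 2

def intensity_via_cosines(a1, a2, delta, samples=10000):
    """Twice the time-average of the squared real wave over one period."""
    total = 0.0
    for j in range(samples):
        wt = 2 * math.pi * j / samples
        total += (a1 * math.cos(wt) + a2 * math.cos(wt + delta)) ** 2
    return 2 * total / samples

a1, a2, delta = 1.0, 0.5, 1.0
complex_result = intensity_via_phasors(a1, a2, delta)
real_result = intensity_via_cosines(a1, a2, delta)
formula = a1**2 + a2**2 + 2 * a1 * a2 * math.cos(delta)  # I1 + I2 + 2*sqrt(I1*I2)*cos(delta)
print(complex_result, real_result, formula)
```

All three numbers coincide, which is the sense in which only the real part ‘matters’ for a classical wave: the exponentials just make the bookkeeping easier.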

Hence, we must be talking about the same mathematical waves in both explanations, mustn’t we? In other words, we should get the same psi functions ψ = Ae^{i(kx–ωt)} in both explanations, shouldn’t we? Well… Maybe. But… Probably not. As far as I know–but I may be wrong–we cannot just re-normalize the E and B vectors in these electromagnetic waves in order to establish an equivalence with probability waves. I haven’t seen that being done (but I readily admit I still have a lot of reading to do) and so I must assume it’s not very clear-cut at all.

So what? Well… I don’t know. So far, I did not find a ‘nice’ or ‘intuitive’ explanation of a quantum-mechanical approach to the phenomenon of diffraction yielding the same grand diffraction equation, referred to as the Fresnel-Kirchhoff diffraction formula (see below), or one of its more comprehensible (because simplified) representations, such as the Fraunhofer diffraction formula, or the even easier formula which I used in my own post (you can google them: they’re somewhat less monstrous and–importantly–they work with real numbers only, which makes them easier to understand).

[…] That looks pretty daunting, doesn’t it? You may start to understand it a bit better by noting that (n, r) and (n, s) are angles, so that’s OK in a cosine function. The other variables also have fairly standard interpretations, as shown below, but… Admit it: ‘easy’ is something else, isn’t it?

So… Where are we here? Well… As said, I trust that both explanations are mathematically equivalent – just like matrix and wave mechanics 🙂 –and, hence, that a quantum-mechanical analysis will indeed yield the same formula. However, I think I’ll only understand physics truly if I’ve gone through all of the motions here.

Well then… I guess that should be some kind of personal benchmark that should guide me on this journey, shouldn’t it? 🙂 I’ll keep you posted.

Post scriptum: To be fair to Feynman, and demonstrating his talent as a teacher once again, he actually acknowledges that the double-slit thought experiment uses simplified assumptions that do not include diffraction effects when the electrons go through the slit(s). He does so, however, only in one of the first chapters of Vol. III of the Lectures, where he comes back to the experiment to further discuss the first principles of quantum mechanics. I’ll just quote him: “Incidentally, we are going to suppose that the holes 1 and 2 are small enough that when we say an electron goes through the hole, we don’t have to discuss which part of the hole. We could, of course, split each hole into pieces with a certain amplitude that the electron goes to the top of the hole and the bottom of the hole and so on. We will suppose that the hole is small enough so that we don’t have to worry about this detail. That is part of the roughness involved; the matter can be made more precise, but we don’t want to do so at this stage.” So here he acknowledges that he omitted the intricacies of diffraction. I noted this only later. Sorry.

# A Royal Road to quantum physics?

It is said that, when Ptolemy asked Euclid to quickly explain geometry to him, Euclid told the King that there was no ‘Royal Road’ to it, by which he meant it’s just difficult and takes a lot of time to understand.

Physicists will tell you the same about quantum physics. So, I know that, at this point, I should just study Feynman’s third Lectures Volume and shut up for a while. However, before I get lost while playing with state vectors, S-matrices, eigenfunctions, eigenvalues and what have you, I’ll try that Royal Road anyway, building on my previous digression on Hamiltonian mechanics.

So… What was that about? Well… If you understood anything from my previous post, it should be that both the Lagrangian and Hamiltonian function use the equations for kinetic and potential energy to derive the equations of motion for a system. The key difference between the Lagrangian and Hamiltonian approach was that the Lagrangian approach yields one differential equation–which had to be solved to yield a functional form for x as a function of time, while the Hamiltonian approach yielded two differential equations–which had to be solved to yield a functional form for both position (x) and momentum (p). In other words, Lagrangian mechanics is a model that focuses on the position variable(s) only, while, in Hamiltonian mechanics, we also keep track of the momentum variable(s). Let me briefly explain the procedure again, so we’re clear on it:

1. We write down a function referred to as the Lagrangian function. The function is L = T – V with T and V the kinetic and potential energy respectively. T has to be expressed as a function of velocity (v) and V has to be expressed as a function of position (x). You’ll say: of course! However, it is an important point to note, otherwise the following step doesn’t make sense. So we take the equations for kinetic and potential energy and combine them to form a function L = L(x, v).

2. We then calculate the so-called Lagrangian equation, in which we use that function L. To be precise: what we have to do is calculate its partial derivatives and insert these in the following equation:

d/dt(∂L/∂v) = ∂L/∂x

It should be obvious now why I stressed we should write L as a function of velocity and position, i.e. as L = L(x, v). Otherwise those partial derivatives don’t make sense. As to where this equation comes from, don’t worry about it: I am not explaining why this works, neither here nor in my previous post. What we’re doing here is just explaining how it goes, not why.

3. If we’ve done everything right, we should get a second-order differential equation which, as mentioned above, we should then solve for x(t). That’s what ‘solving’ a differential equation is about: find a functional form that satisfies the equation.
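
The three steps can be sketched numerically for the simplest example I can think of: a mass on a spring, for which T = mv²/2 and V = kx²/2. That example, the values of `M` and `K`, the finite-difference derivatives and the integration scheme (velocity Verlet) are all my own choices for illustration; a textbook would do steps 2 and 3 on paper.

```python
import math

M, K = 1.0, 4.0  # assumed mass and spring constant, so omega = sqrt(K/M) = 2

def lagrangian(x, v):
    """Step 1: L(x, v) = T(v) - V(x)."""
    return 0.5 * M * v**2 - 0.5 * K * x**2

def acceleration(x, v, h=1e-4):
    """Step 2: d/dt(dL/dv) = dL/dx, with the partial derivatives taken numerically.

    For L = T(v) - V(x), dL/dv depends on v only, so d/dt(dL/dv) = (d2L/dv2) * a.
    """
    dL_dx = (lagrangian(x + h, v) - lagrangian(x - h, v)) / (2 * h)
    d2L_dv2 = (lagrangian(x, v + h) - 2 * lagrangian(x, v) + lagrangian(x, v - h)) / h**2
    return dL_dx / d2L_dv2

def solve(x0, v0, t_end, dt=1e-3):
    """Step 3: integrate the second-order equation for x(t) (velocity Verlet)."""
    x, v = x0, v0
    a = acceleration(x, v)
    for _ in range(int(round(t_end / dt))):
        x += v * dt + 0.5 * a * dt**2
        a_new = acceleration(x, v)
        v += 0.5 * (a + a_new) * dt
        a = a_new
    return x

x_numeric = solve(1.0, 0.0, 1.0)
x_exact = math.cos(math.sqrt(K / M) * 1.0)  # known solution x(t) = cos(omega*t)
print(x_numeric, x_exact)
```

The numerical x(t) agrees with the familiar cosine solution of the harmonic oscillator, so the single second-order equation does indeed give us the position as a function of time.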

Let’s now look at the Hamiltonian approach.

1. We write down a function referred to as the Hamiltonian function. It looks similar to the Lagrangian, except that we sum kinetic and potential energy, and that T has to be expressed as a function of the momentum p. So we have a function H = T + V = H(x, p).

2. We then calculate the so-called Hamiltonian equations, which is a set of two equations, rather than just one equation. [We have two for the one-dimensional situation that we are modeling here: it’s a different story (i.e. we will have more equations) if we’d have more degrees of freedom of course.] It’s the same as in the Lagrangian approach: it’s just a matter of calculating partial derivatives, and inserting them in the equations below. Again, note that I am not explaining why this Hamiltonian hocus-pocus actually works. I am just saying how it works.

dx/dt = ∂H/∂p and dp/dt = –∂H/∂x

3. If we’ve done everything right, we should get two first-order differential equations which we should then solve for x(t) and p(t). Now, solving a set of equations may or may not be easy, depending on your point of view. If you wonder how it’s done, there’s excellent stuff on the Web that will show you how (such as, for instance, Paul’s Online Math Notes).
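
Here is the same kind of sketch for the Hamiltonian procedure, again for a mass on a spring (my own choice of example, with made-up values for `M` and `K`). The two first-order equations dx/dt = ∂H/∂p and dp/dt = –∂H/∂x are stepped forward with a simple leapfrog scheme, which is just one of many ways to do it.

```python
import math

M, K = 1.0, 4.0  # assumed mass and spring constant

def hamiltonian(x, p):
    """Step 1: H(x, p) = T(p) + V(x)."""
    return p**2 / (2 * M) + 0.5 * K * x**2

def dH_dx(x, p, h=1e-6):
    return (hamiltonian(x + h, p) - hamiltonian(x - h, p)) / (2 * h)

def dH_dp(x, p, h=1e-6):
    return (hamiltonian(x, p + h) - hamiltonian(x, p - h)) / (2 * h)

def solve_hamilton(x0, p0, t_end, dt=1e-3):
    """Steps 2 and 3: integrate dx/dt = dH/dp and dp/dt = -dH/dx (leapfrog)."""
    x, p = x0, p0
    for _ in range(int(round(t_end / dt))):
        p -= 0.5 * dt * dH_dx(x, p)  # half step for the momentum
        x += dt * dH_dp(x, p)        # full step for the position
        p -= 0.5 * dt * dH_dx(x, p)  # second half step for the momentum
    return x, p

x1, p1 = solve_hamilton(1.0, 0.0, 1.0)
omega = math.sqrt(K / M)
print(x1, math.cos(omega))               # position versus the exact cos(omega*t)
print(p1, -M * omega * math.sin(omega))  # momentum versus the exact -m*omega*sin(omega*t)
```

The difference with the Lagrangian sketch is that we now get both x(t) and p(t) out of the calculation, which is the whole point of keeping track of the momentum variable.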

Now, I mentioned in my previous post that the Hamiltonian approach to modeling mechanics is very similar to the approach that’s used in quantum mechanics and that it’s therefore the preferred approach in physics. I also mentioned that, in classical physics, position and momentum are also conjugate variables, and I also showed how we can calculate the momentum as a conjugate variable from the Lagrangian: p = ∂L/∂v. However, I did not dwell on what conjugate variables actually are in classical mechanics. I won’t do that here either. Just accept that conjugate variables, in classical mechanics, are also defined as pairs of variables. They’re not related through some uncertainty relation, like in quantum physics, but they’re related because they can both be obtained as the derivatives of a function which I haven’t introduced as yet. That function is referred to as the action, but… Well… Let’s resist the temptation to digress any further here. If you really want to know what action is–in physics, that is… 🙂 Well… Google it, I’d say. What you should take home from this digression is that position and momentum are also conjugate variables in classical mechanics.

Let’s now move on to quantum mechanics. You’ll see that the ‘similarity’ in approach is… Well… Quite relative, I’d say. 🙂

Position and momentum in quantum mechanics

As you know by now (I wrote at least a dozen posts on this), the concept of position and momentum in quantum mechanics is very different from that in classical physics: we do not have x(t) and p(t) functions which give a unique, precise and unambiguous value for x and p when we assign a value to the time variable and plug it in. No. What we have in quantum physics is some weird wave function, denoted by the Greek letters φ (phi) or ψ (psi) or, using Greek capitals, Φ and Ψ. To be more specific, the psi usually denotes the wave function in the so-called position space (so we write ψ = ψ(x)), and the phi will usually denote the wave function in the so-called momentum space (so we write φ = φ(p)). That sounds more complicated than it is, obviously, but I just wanted to respect terminology here. Finally, note that the ψ(x) and φ(p) wave functions are related through the Uncertainty Principle: x and p are conjugate variables, and we have this ΔxΔp ≥ ħ/2 relation, in which each Δ is a standard deviation around some mean value. I should not go into more detail here: you know that by now, don’t you?

While the argument of these functions is some real number, the wave functions themselves are complex-valued, so they have a real and an imaginary part. I’ve also illustrated that a couple of times already but, just to make sure, take a look at the animation below, so you know what we are sort of talking about:

1. The A and B situations represent a classical oscillator: we know exactly where the red ball is at any point in time.
2. The C to H situations give us a complex-valued amplitude, with the blue oscillation as the real part, and the pink oscillation as the imaginary part.

So we have such a wave function for both x and p. Note that the animation above suggests we’re only looking at the wave function for x but–trust me–we have a similar one for p, and they’re related indeed. [To see how exactly, I’d advise you to go through the proof of the so-called Kennard inequality.] So… What do we do with that?
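
For those who’d rather see numbers than a proof, here is a hedged sketch of how the two wave functions are related. It assumes units in which ħ = 1, takes a Gaussian ψ(x) (my choice, because it is the one case in which the Kennard bound is reached exactly), calculates φ(p) as its Fourier transform by brute-force summation, and then multiplies the two standard deviations. The grid sizes and the helper names `phi` and `spread` are mine.

```python
import cmath
import math

SIGMA = 1.0  # assumed width parameter of the Gaussian; hbar = 1 throughout
DX, DP = 0.05, 0.05
xs = [-8.0 + j * DX for j in range(321)]  # position grid
ps = [-4.0 + j * DP for j in range(161)]  # momentum grid

def psi(x):
    """Unnormalized Gaussian wave function in position space."""
    return math.exp(-x**2 / (4 * SIGMA**2))

def phi(p):
    """Momentum-space wave function: Fourier transform of psi, summed numerically."""
    total = sum(psi(x) * cmath.exp(-1j * p * x) for x in xs)
    return total * DX / math.sqrt(2 * math.pi)

def spread(values, weights):
    """Standard deviation of `values` under (unnormalized) probability weights."""
    norm = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / norm
    variance = sum((v - mean) ** 2 * w for v, w in zip(values, weights)) / norm
    return math.sqrt(variance)

delta_x = spread(xs, [abs(psi(x)) ** 2 for x in xs])
delta_p = spread(ps, [abs(phi(p)) ** 2 for p in ps])
print(delta_x, delta_p, delta_x * delta_p)
```

The product comes out at ħ/2 (i.e. 0.5 in these units). Any other shape of ψ gives a larger product, which is what the inequality says.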

The position and momentum operators

When we want to know where a particle actually is, or what its momentum is, we need to do something with this wave function ψ or φ. Let’s focus on the position variable first. While the wave function itself is said to have ‘no physical interpretation’ (frankly, I don’t know what that means: I’d think everything has some kind of interpretation (and what’s physical and non-physical?), but let’s not get lost in philosophy here), we know that the square of the absolute value of the probability amplitude yields a probability density. So |ψ(x)|² gives us a probability density function or, to put it simply, the probability to find our ‘particle’ (or ‘wavicle’ if you want) at (or, strictly speaking, near) point x. Let’s now do something more sophisticated and write down the expected value of x, which is usually denoted by 〈x〉 (although that invites confusion with Dirac’s bra-ket notation, but don’t worry about it):

〈x〉 = ∫ ψ*(x)·x·ψ(x) dx (with the integral taken over all x, i.e. from –∞ to +∞)

Don’t panic. It’s just an integral. Look at it. ψ* is just the complex conjugate (i.e. a – ib if ψ = a + ib) and you will (or should) remember that the product of a complex number with its (complex) conjugate gives us the square of its absolute value: ψ*ψ = |ψ(x)|². What about that x? Can we just insert that there, in-between ψ* and ψ ? Good question. The answer is: yes, of course! That x is just some real number and we can put it anywhere. However, it’s still a good question because, while multiplication of complex numbers is commutative (hence, z1z2 = z2z1), the order of our operators – which we will introduce soon – can often not be changed without consequences, so it is something to note.

For the rest, that integral above is quite obvious and it should really not puzzle you: we just multiply a value with its probability of occurring and integrate over the whole domain to get an expected value 〈x〉. Nothing wrong here. Note that we get some real number. [You’ll say: of course! However, I always find it useful to check that when looking at those things mixing complex-valued functions with real-valued variables or arguments. A quick check on the dimensions of what we’re dealing with helps greatly in understanding what we’re doing.]
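
If you want to do that check with actual numbers, here is a minimal sketch. The wave function is one I made up for the occasion: a Gaussian bump centered on x = 1.5, multiplied by a complex phase factor e^{iKx} so that ψ is genuinely complex-valued. The integral is approximated by a plain sum over a grid, and I divide by the norm because my ψ is not normalized.

```python
import cmath
import math

X0, SIGMA, K = 1.5, 1.0, 3.0  # assumed center, width and wave number
DX = 0.01
xs = [-10.0 + j * DX for j in range(2001)]  # grid from -10 to +10

def psi(x):
    """Complex-valued wave packet: Gaussian envelope times a phase factor."""
    return math.exp(-(x - X0) ** 2 / (4 * SIGMA**2)) * cmath.exp(1j * K * x)

# Normalization: the integral of psi* psi over the whole domain.
norm = sum((psi(x).conjugate() * psi(x)).real for x in xs) * DX

# Expected value: the integral of psi* x psi, divided by the norm.
expected_x = sum(psi(x).conjugate() * x * psi(x) for x in xs) * DX / norm

print(expected_x.real)  # the center of the bump
print(expected_x.imag)  # zero, up to rounding errors
```

So we get a real number indeed: the imaginary part vanishes, and the real part is just the center of the bump, as one would expect.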

So… You’ve surely heard about the position and momentum operators already. Is that, then, what it is? Doing some integral on some function to get an expected value? Well… No. But there’s a relation. However, let me first make a remark on notation, because that can be quite confusing. The position operator is usually written with a hat on top of the variable – like ẑ – but I can’t find a hat for every letter in the editor tool for this blog and, hence, I’ll use the bold letters **x** and **p** to denote the operators. Don’t confuse it with me using a bold letter for vectors though! Now, back to the story.

Let’s first give an example of an operator you’re already familiar with in order to understand what an operator actually is. To put it simply: an operator is an instruction to do something with a function. For example: ∂/∂t is an instruction to differentiate some function with regard to the variable t (which usually stands for time). The ∂/∂t operator is obviously referred to as a differentiation operator. When we put a function behind, e.g. f(x, t), we get ∂f(x, t)/∂t, which is just another function in x and t.

So we have the same here: **x** in itself is just an instruction: you need to put a function behind it in order to get some result. So you’ll see it as **x**ψ. In fact, it would be useful to use brackets probably, like **x**[ψ], especially because I can’t put those hats on the letters here, but I’ll stick to the usual notation, which does not use brackets.

Likewise, we have a momentum operator: **p** = –iħ∂/∂x. […] Let it sink in. […]

What’s this? Don’t worry about it. I know: that looks like a very different animal than that **x** operator. I’ll explain later. Just note, for the moment, that the momentum operator (also) involves a (partial) derivative and, hence, we refer to it as a differential operator (as opposed to differentiation operator). The instruction **p** = –iħ∂/∂x basically means: differentiate the function with regard to x and multiply with –iħ (i.e. minus the product of the reduced Planck constant ħ and the imaginary unit i). Nothing wrong with that. Just calculate a derivative and multiply with a tiny imaginary number.

Now, back to the position operator **x**. As you can see, that’s a very simple operator–much simpler than the momentum operator in any case. The position operator applied to ψ yields, quite simply, the xψ(x) factor in the integrand above. So we just get a new function xψ(x) when we apply **x** to ψ, of which the values are simply the product of x and ψ(x). Hence, we write **x**ψ = xψ.

Really? Is it that simple? Yes. For now at least. 🙂

Back to the momentum operator. Where does that come from? That story is not so simple. [Of course not. It can’t be. Just look at it.] Because we have to avoid talking about eigenvalues and all that, my approach to the explanation will be quite intuitive. [As for ‘my’ approach, let me note that it’s basically the approach as used in the Wikipedia article on it. :-)] Just stay with me for a while here.

Let’s assume ψ is given by ψ = e^{i(kx–ωt)}. So that’s a nice periodic function, albeit complex-valued. Now, we know that functional form doesn’t make all that much sense because it corresponds to the particle being everywhere, because the square of its absolute value is some constant. In fact, we know it doesn’t even respect the normalization condition: all probabilities have to add up to 1. However, that being said, we also know that we can superimpose an infinite number of such waves (all with different k and ω) to get a more localized wave train, and then re-normalize the result to make sure the normalization condition is met. Hence, let’s just go along with this idealized example and see where it leads.

We know the wave number k (i.e. its ‘frequency in space’, as it’s often described) is related to the momentum p through the de Broglie relation: p = ħk. [Again, you should think about a whole bunch of these waves and, hence, some spread in k corresponding to some spread in p, but just go along with the story for now and don’t try to make it even more complicated.] Now, if we differentiate with regard to x, and then substitute k = p/ħ, we get ∂ψ/∂x = ∂e^{i(kx–ωt)}/∂x = ike^{i(kx–ωt)} = ikψ, or

∂ψ/∂x = (ip/ħ)ψ

So what is this? Well… On the left-hand side, we have the (partial) derivative of a complex-valued function (ψ) with regard to x. Now, that derivative is, more likely than not, also some complex-valued function. And if you don’t believe me, just look at the right-hand side of the equation, where we have that i and ψ. In fact, the equation just shows that, when we take that derivative, we get our original function ψ but multiplied by ip/ħ. Hey! We’ve got a differential equation here, don’t we? Yes. And the solution for it is… Well… The natural exponential. Of course! That should be no surprise because we started out with a natural exponential as functional form! So that’s not the point. What is the point, then? Well… If we bring that i/ħ factor to the other side, we get:

–iħ(∂ψ/∂x) = pψ

[If you’re confused about the –i, remember that i⁻¹ = 1/i = –i, so dividing by i/ħ amounts to multiplying by –iħ.] So… We’ve got pψ on the right-hand side now. So… Well… That’s like xψ, isn’t it? Yes. 🙂 If we define the momentum operator as **p** = –iħ(∂/∂x), then we get **p**ψ = pψ. So that’s the same thing as for the position operator. It’s just that **p** is… Well… A more complex operator, as it has that –iħ factor in it. And, yes, of course it also involves an instruction to differentiate, which also sets it apart from the position operator, which is just an instruction to multiply the function with its argument.
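
Because an operator is just an instruction to do something with a function, it is easy to mimic one in code. The sketch below is my own illustration: it uses made-up units in which ħ = 0.5 (so that the factor cannot be overlooked), the plane wave ψ = e^{ikx} at some fixed time, and a numerical derivative in place of the real thing. The function names are hypothetical.

```python
import cmath

HBAR = 0.5  # assumed value in made-up units, chosen so that hbar != 1
K = 3.0     # assumed wave number, so the momentum should be p = hbar*k = 1.5

def psi(x):
    """Plane wave at a fixed instant (the time factor is left out)."""
    return cmath.exp(1j * K * x)

def position_operator(f, x):
    """Instruction: multiply the function with its argument."""
    return x * f(x)

def momentum_operator(f, x, h=1e-5):
    """Instruction: differentiate with regard to x, then multiply with -i*hbar."""
    derivative = (f(x + h) - f(x - h)) / (2 * h)
    return -1j * HBAR * derivative

x = 0.7
momentum = momentum_operator(psi, x) / psi(x)  # should equal p = hbar*k
position = position_operator(psi, x) / psi(x)  # should equal x itself
print(momentum, position)
```

Dividing the result by ψ(x) gives back the number p = ħk for the momentum operator, and the number x for the position operator, which is all that the equations in the text are saying.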

I am sure you’ll find this funny–perhaps even fishy–business. And, yes, I have the same questions: what does it all mean? I can’t answer that here. As for now, just accept that these position and momentum operators are what they are, and that I can’t do anything about that. But… I hear you sputter: what about their interpretation? Well… Sorry… I could say that the operators **x** and **p** are so-called linear maps but that is not likely to help you much in understanding what these operators really do. You – and I for sure 🙂 – will indeed have to go through that story of eigenvalues to get a somewhat deeper understanding of what these operators actually are. That’s just how it is. As for now, I just have to move on. Sorry for letting you down here. 🙂

Energy operators

Now that we sort of ‘understand’ those position and momentum operators (or their mathematical form at least), it’s time to introduce the energy operators. Indeed, in quantum mechanics, we’ve also got an operator for (a) kinetic energy, and for (b) potential energy. These operators are also denoted with a hat above the T and V symbol. All quantum-mechanical operators are like that, it seems. However, because of the limitations of the editor tool here, I’ll also use a bold **T** and **V** respectively. Now, I am sure you’ve had enough of these operators, so let me just jot them down:

1. **V** = V, so that’s just an instruction to multiply a function with V = V(x, t). That’s easy enough because that’s just like the position operator.
2. As for T, that’s more complicated. It involves that momentum operator p, which was also more complicated, remember? Let me just give you the formula:

**T** = **p**·**p**/2m = **p**²/2m.

So we multiply the operator **p** with itself here. What does that mean? Well… Because the operator involves a derivative, it means we have to take the derivative twice and… No! Well… Let me correct myself: yes and no. 🙂 That **p**·**p** product is, strictly speaking, a dot product between two vectors, and so it’s not just a matter of differentiating twice. Now that we are here, we may just as well extend the analysis a bit and assume that we also have a y and z coordinate, so we’ll have a position vector r = (x, y, z). [Note that r is a vector here, not an operator. !?! Oh… Well…] Extending the analysis to three (or more) dimensions means that we should replace the differentiation operator by the so-called gradient or del operator: ∇ = (∂/∂x, ∂/∂y, ∂/∂z). And now that dot product **p**·**p** will, among other things, yield another operator which you’re surely familiar with: the Laplacian. Let me remind you of it:

∇² = ∇·∇ = ∂²/∂x² + ∂²/∂y² + ∂²/∂z²

Hence, we can write the kinetic energy operator T as:

T̂ = p̂·p̂/2m = –(ħ²/2m)∇²

I quickly copied this formula from Wikipedia, which doesn’t have the limitation of the WordPress editor tool, and so you see it now the way you should see it, i.e. with the hat notation. 🙂

[…]

In case you’re despairing, hang on! We’re almost there. 🙂 We can, indeed, now define the Hamiltonian operator that’s used in quantum mechanics. While the Hamiltonian function was the sum of the potential and kinetic energy functions in classical physics, in quantum mechanics we add the two energy operators. You’ll grumble and say: that’s not the same as adding energies. And you’re right: adding operators is not the same as adding energy functions. Of course it isn’t. 🙂 But just stick to the story, please, and stop criticizing. [Oh – just in case you wonder where that minus sign comes from: i² = –1, of course.]

Adding the two operators together yields the following:

Ĥ = T̂ + V̂ = –(ħ²/2m)∇² + V(r, t)

So. Yes. That’s the famous Hamiltonian operator.

OK. So what?

Yes… Hmm… What do we do with that operator? Well… We apply it to the function and so we write **H**ψ = … Hmm…

Well… What?

Well… I am not writing this post just to give some definitions of the type of operators that are used in quantum mechanics and then just do obvious stuff by writing it all out. No. I am writing this post to illustrate how things work.

OK. So how does it work then?

Well… It turns out that, in quantum mechanics, we have equations similar to those in classical mechanics. Remember that I just wrote down the set of (two) differential equations when discussing Hamiltonian mechanics? Here I’ll do the same. The Hamiltonian operator appears in an equation which you’ve surely heard of and which, just like me, you’d love to understand–and then I mean: understand it fully, completely, and intuitively. […] Yes. It’s the Schrödinger equation:

iħ·∂Ψ(r, t)/∂t = ĤΨ(r, t)

Note, once again, I am not saying anything about where this equation comes from. It’s like jotting down that Lagrange equation, or the set of Hamiltonian equations: I am not saying anything about the why of all this hocus pocus. I am just saying how it goes. So we’ve got another differential equation here, and we have to solve it. If we write it all out using the above definition of the Hamiltonian operator, we get:

iħ·∂Ψ(r, t)/∂t = [–(ħ²/2μ)∇² + V(r, t)]Ψ(r, t)

If you’re still with me, you’ll immediately wonder about that μ. Well… Don’t. It’s the mass really, but the so-called reduced mass. Don’t worry about it. Just google it if you want to know more about this concept of a ‘reduced’ mass: it’s a fine point which doesn’t matter here really. The point is the grand result.

But… So… What is the grand result? What are we looking at here? Well… Just as I said above: that Schrödinger equation is a differential equation, just like those equations we got when applying the Lagrangian and Hamiltonian approach to modeling a dynamic system in classical mechanics, and, hence, just like what we (were supposed to) do there, we have to solve it. 🙂 Of course, it looks much more daunting than our Lagrangian or Hamiltonian differential equations, because we’ve got complex-valued functions here, and you’re probably scared of that iħ factor too. But you shouldn’t be. When everything is said and done, we’ve got a differential equation here that we need to solve for ψ. In other words, we need to find functional forms for ψ that satisfy the above equation. That’s it. Period.
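
To make ‘finding a functional form that satisfies the equation’ a bit more tangible, here is a small check, under assumptions that are entirely mine: one dimension only, no potential (V = 0), units in which ħ = m = 1, and the plane wave e^{i(kx–ωt)} as the candidate solution. The function `residual` (a hypothetical name) measures how badly the candidate fails the equation at one point, using numerical derivatives.

```python
import cmath

HBAR, MASS, K = 1.0, 1.0, 2.0  # assumed units and wave number

def plane_wave(x, t, omega):
    return cmath.exp(1j * (K * x - omega * t))

def residual(omega, x=0.3, t=0.2, hx=1e-4, ht=1e-6):
    """|i*hbar*dpsi/dt - (-(hbar^2/2m) * d2psi/dx2)| at one point, for V = 0."""
    def psi(xx, tt):
        return plane_wave(xx, tt, omega)
    dpsi_dt = (psi(x, t + ht) - psi(x, t - ht)) / (2 * ht)
    d2psi_dx2 = (psi(x + hx, t) - 2 * psi(x, t) + psi(x - hx, t)) / hx**2
    left_side = 1j * HBAR * dpsi_dt
    right_side = -(HBAR**2 / (2 * MASS)) * d2psi_dx2
    return abs(left_side - right_side)

omega_free = HBAR * K**2 / (2 * MASS)  # the frequency for which E = p^2/2m
good = residual(omega_free)
bad = residual(0.5 * omega_free)
print(good, bad)
```

The plane wave only satisfies the equation if ω and k are related in the right way, i.e. if ħω = (ħk)²/2m, which is just the kinetic energy formula E = p²/2m in disguise. With any other ω, the residual is not zero.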

So what do these solutions look like? Well, they look like those complex-valued oscillating things in the very first animation above. Let me copy them again:

So… That’s it then? Yes. I won’t say anything more about it here, because (1) this post has become way too long already, and so I won’t dwell on the solutions of that Schrödinger equation, and because (2) I do feel it’s about time I really start doing what it takes, and that’s to work on all of the math that’s necessary to actually do all that hocus-pocus. 🙂

Post scriptum: As for understanding the Schrödinger equation “fully, completely, and intuitively”, I am not sure that’s actually possible. But I am trying hard and so let’s see. 🙂 I’ll tell you after I mastered the math. But something inside of me tells me there’s indeed no Royal Road to it. 🙂

Post scriptum 2 (dated 16 November 2015): I’ve added this post scriptum, more than a year after writing all of the above, because I now realize how immature it actually is. If you really want to know more about quantum math, then you should read my more recent posts, like the one on the Hamiltonian matrix. It’s not that anything that I write above is wrong: it isn’t. But… Well… It’s just that I feel that I’ve jumped the gun. […] But then that’s probably not a bad thing. 🙂

# Newtonian, Lagrangian and Hamiltonian mechanics

Post scriptum (dated 16 November 2015): You’ll smile because… Yes, I am starting this post with a post scriptum, indeed. 🙂 I’ve added it, a year later or so, because, before you continue to read, you should note I am not going to explain the Hamiltonian matrix here, as it’s used in quantum physics. That’s the topic of another post, which involves far more advanced mathematical concepts. If you’re here for that, don’t read this post. Just go to my post on the matrix indeed. 🙂 But so here’s my original post. I wrote it to tie up some loose ends. 🙂

As an economist, I thought I knew a thing or two about optimization. Indeed, when everything is said and done, optimization is supposed to be an economist’s forte, isn’t it? 🙂 Hence, I thought I sort of understood what a Lagrangian would represent in physics, and I also thought I sort of intuitively understood why and how it could be used to model the behavior of a dynamic system. In short, I thought that Lagrangian mechanics would be all about optimizing something subject to some constraints. Just like in economics, right?

[…] Well… When checking it out, I found that the answer is: yes, and no. And, frankly, the honest answer is more no than yes. 🙂 Economists (like me), and all social scientists (I’d think), learn only about one particular type of Lagrangian equations: the so-called Lagrange equations of the first kind. This approach models constraints as equations that are to be incorporated in an objective function (which is also referred to as a Lagrangian–and that’s where the confusion starts because it’s different from the Lagrangian that’s used in physics, which I’ll introduce below) using so-called Lagrange multipliers. If you’re an economist, you’ll surely remember it: it’s a problem written as “maximize f(x, y) subject to g(x, y) = c”, and we solve it by finding the so-called stationary points (i.e. the points for which the derivative is zero) of the (Lagrangian) objective function f(x, y) + λ[g(x, y) – c].
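
For the non-economists, here is what that looks like for a toy problem of my own making: maximize f(x, y) = xy subject to x + y = 10. Setting the three partial derivatives of f(x, y) + λ[g(x, y) – c] to zero gives y + λ = 0, x + λ = 0 and x + y = 10, so x = y = 5 and λ = –5. The sketch below merely verifies that result numerically; the function names are hypothetical.

```python
C = 10.0  # the constant in the constraint x + y = C (assumed example)

def objective(x, y, lam):
    """The economist's Lagrangian: f(x, y) + lambda * (g(x, y) - c)."""
    return x * y + lam * ((x + y) - C)

def gradient(x, y, lam, h=1e-6):
    """Numerical partial derivatives with regard to x, y and lambda."""
    d_x = (objective(x + h, y, lam) - objective(x - h, y, lam)) / (2 * h)
    d_y = (objective(x, y + h, lam) - objective(x, y - h, lam)) / (2 * h)
    d_lam = (objective(x, y, lam + h) - objective(x, y, lam - h)) / (2 * h)
    return d_x, d_y, d_lam

# The stationary point found by hand: all three derivatives should vanish there.
grad_at_solution = gradient(5.0, 5.0, -5.0)

# Cross-check by brute force: walk along the constraint y = C - x and keep the best x.
candidates = [j * 0.01 for j in range(1001)]
best_x = max(candidates, key=lambda x: x * (C - x))
print(grad_at_solution, best_x)
```

The gradient is zero at (5, 5, –5), and the brute-force walk along the constraint picks the same x = 5, so the stationary point of the objective function is indeed the constrained maximum here.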

Now, it turns out that, in physics, they use so-called Lagrange equations of the second kind, which incorporate the constraints directly by what Wikipedia refers to as a “judicious choice of generalized coordinates.”

Generalized coordinates? Don’t worry about it: while generalized coordinates are defined formally as “parameters that describe the configuration of the system relative to some reference configuration”, they are, in practice, those coordinates that make the problem easy to solve. For example, for a particle (or point) that moves on a circle, we’d not use the Cartesian coordinates x and y but just the angle that locates the particle (or point). That simplifies matters because then we only need to find one variable. In practice, the number of parameters (i.e. the number of generalized coordinates) will be defined by the number of degrees of freedom of the system, and we know what that means: it’s the number of independent directions in which the particle (or point) can move. Now, those independent directions may or may not include the x, y and z directions (they may actually exclude one of those), and they also may or may not include rotational and/or vibratory movements. We went over that when discussing kinetic gas theory, so I won’t say more about that here.

So… OK… That was my first surprise: the physicist’s Lagrangian is different from the social scientist’s Lagrangian.

The second surprise was that all physics textbooks seem to dislike the Lagrangian approach. Indeed, they opt for a related but different function when developing a model of a dynamic system: it’s a function referred to as the Hamiltonian. The modeling approach which uses the Hamiltonian instead of the Lagrangian is, of course, referred to as Hamiltonian mechanics. We may think the preference for the Hamiltonian approach has to do with William Rowan Hamilton being Anglo-Irish, while Joseph-Louis Lagrange (born as Giuseppe Lodovico Lagrangia) was Italian-French but… No. 🙂

And then we have good old Newtonian mechanics as well, obviously. In case you wonder what that is: it’s the modeling approach that we’ve been using all along. 🙂 But I’ll remind you of what it is in a moment: it amounts to making sense of some situation by using Newton’s laws of motion only, rather than a more sophisticated mathematical argument using more abstract concepts, such as energy, or action.

Introducing Lagrangian and Hamiltonian mechanics is quite confusing because the functions that are involved (i.e. the so-called Lagrangian and Hamiltonian functions) look very similar: we write the Lagrangian as the difference between the kinetic and potential energy of a system (L = T – V), while the Hamiltonian is the sum of both (H = T + V). Now, I could make this post very simple and just ask you to note that both approaches are basically ‘equivalent’ (in the sense that they lead to the same solutions, i.e. the same equations of motion expressed as a function of time) and that a choice between them is just a matter of preference–like choosing between an English versus a continental breakfast. 🙂 Of course, an English breakfast has usually some extra bacon, or a sausage, so you get more but… Well… Not necessarily something better. 🙂 So that would be the end of this digression then, and I should be done. However, I must assume you’re a curious person, just like me, and, hence, you’ll say that, while being ‘equivalent’, they’re obviously not the same. So how do the two approaches differ exactly?

Let’s try to get a somewhat intuitive understanding of it all by taking, once again, the example of a simple harmonic oscillator, as depicted below. It could be a mass on a spring. In fact, our example will be exactly that: an oscillating mass on a spring. Let’s also assume there’s no damping, because that makes the analysis soooooooo much easier.

Of course, we already know all of the relevant equations for this system just from applying Newton’s laws (so that’s Newtonian mechanics). We did that in a previous post. [I can’t remember which one, but I am sure I’ve done this already.] Hence, we don’t really need the Lagrangian or Hamiltonian. But, of course, that’s the point of this post: I want to illustrate how these other approaches to modeling a dynamic system actually work, and so it’s good we have the correct answer already so we can make sure we’re not going off track here. So… Let’s go… 🙂

I. Newtonian mechanics

Let me recapitulate the basics of a mass on a spring which, in jargon, is called a harmonic oscillator. Hooke’s law is there: the force on the mass is proportional to its distance from the zero point (i.e. the displacement), and the direction of the force is towards the zero point–not away from it, and so we have a minus sign. In short, we can write:

F = –kx (i.e. Hooke’s law)

Now, Newton‘s Law (Newton’s second law to be precise) says that F is equal to the mass times the acceleration: F = ma. So we write:

F = ma = m(d²x/dt²) = –kx

So that’s just Newton’s law combined with Hooke’s law. We know this is a differential equation for which there’s a general solution with the following form:

x(t) = A·cos(ωt + α)

If you wonder why… Well… I can’t digress on that here again: just note, from that differential equation, that we apparently need a function x(t) that yields itself when differentiated twice, apart from a negative constant factor (–k/m). So that must be some sinusoidal function, like sine or cosine, because these do that. […] OK… Sorry, but I must move on.

As for the new ‘variables’ (A, ω and α), A depends on the initial condition and is the (maximum) amplitude of the motion. We also already know from previous posts (or, more likely, because you already know a lot about physics) that A is related to the energy of the system. To be precise: the energy of the system is proportional to the square of the amplitude: E ∝ A². As for ω, the angular frequency, that’s determined by the spring itself and the oscillating mass on it: ω = √(k/m) = 2π/T = 2πf (with T the period, and f the frequency expressed in oscillations per second, as opposed to the angular frequency, which is the frequency expressed in radians per second). Finally, I should note that α is just a phase shift which depends on how we define our t = 0 point: if x(t) is zero at t = 0, then that cosine function should be zero and then α will be equal to ±π/2.
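If you don’t want to take the general solution on faith, a quick numerical check helps. The sketch below plugs x(t) = A·cos(ωt + α) into m(d²x/dt²) = –kx, using a finite-difference estimate of the second derivative. The numerical values of m, k, A and α, and the helper names, are assumptions chosen purely for illustration.

```python
import math

# Illustrative values only (assumptions, not taken from the post).
m, k = 2.0, 8.0            # mass and spring constant
A, alpha = 0.5, 0.3        # amplitude and phase shift
omega = math.sqrt(k / m)   # angular frequency: omega = sqrt(k/m)


def x(t):
    """Proposed general solution x(t) = A cos(omega t + alpha)."""
    return A * math.cos(omega * t + alpha)


def second_derivative(f, t, h=1e-4):
    """Central finite-difference estimate of d2f/dt2 at time t."""
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / (h * h)


def residual(t):
    """m x''(t) + k x(t): this vanishes if x(t) solves Newton + Hooke."""
    return m * second_derivative(x, t) + k * x(t)


# Largest violation of the equation of motion over a range of times.
max_residual = max(abs(residual(0.1 * i)) for i in range(100))
period = 2.0 * math.pi / omega

print("largest residual:", max_residual)
print("period T = 2*pi/omega:", period)
```

The residual is not exactly zero because the second derivative is only approximated, but it should be tiny compared with the forces involved (which are of order kA).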

OK. That’s clear enough. What about the ‘operational currency of the universe’, i.e. the energy of the oscillator? Well… I told you already. We don’t need the energy concept here to find the equation of motion. In fact, that’s what distinguishes this ‘Newtonian’ approach from the Lagrangian and Hamiltonian approach. But… Now that we’re at it, and we have to move to a discussion of these two animals (I mean the Lagrangian and Hamiltonian), let’s go for it.

We have kinetic versus potential energy. Kinetic energy (T) is what it always is. It depends on the velocity and the mass: K.E. = T = mv²/2 = m(dx/dt)²/2 = p²/2m. Huh? What’s this expression with p in it? […] It’s momentum: p = mv. Just check it: it’s an alternative formula for T really. Nothing more, nothing less. I am just noting it here because it will pop up again in our discussion of the Hamiltonian modeling approach. But that’s for later. Onwards!

What about potential energy (V)? We know that’s equal to V = kx²/2. And because energy is conserved, potential energy (V) and kinetic energy (T) should add up to some constant. Let’s check it: dx/dt = d[Acos(ωt + α)]/dt = –Aωsin(ωt + α). [Please do the derivation: don’t accept things at face value. :-)] Hence, T = mA²ω²sin²(ωt + α)/2 = mA²(k/m)sin²(ωt + α)/2 = kA²sin²(ωt + α)/2. Now, V is equal to V = kx²/2 = k[Acos(ωt + α)]²/2 = kA²cos²(ωt + α)/2. Adding both yields:

T + V = kA²sin²(ωt + α)/2 + kA²cos²(ωt + α)/2

= (1/2)kA²[sin²(ωt + α) + cos²(ωt + α)] = kA²/2.

Ouff! Glad that worked out: the total energy is, indeed, proportional to the square of the amplitude and the constant of proportionality is equal to k/2. [You should now wonder why we do not have m in this formula but, if you’d think about it, you can answer your own question: the mass is hidden in the spring constant, because ω² = k/m implies k = mω². So we can also write the total energy as mω²A²/2, and the mass is actually in the formula already.]
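Here is the same energy bookkeeping done numerically, as a sanity check on the algebra. It evaluates T and V along the motion and compares their sum with kA²/2. Again, the parameter values and function names are assumptions picked for illustration.

```python
import math

# Illustrative values only (assumptions, not taken from the post).
m, k = 2.0, 8.0
A, alpha = 0.5, 0.3
omega = math.sqrt(k / m)


def energies(t):
    """Return (T, V) at time t for x(t) = A cos(omega t + alpha)."""
    x = A * math.cos(omega * t + alpha)
    v = -A * omega * math.sin(omega * t + alpha)   # v = dx/dt
    kinetic = 0.5 * m * v * v                      # T = m v^2 / 2
    potential = 0.5 * k * x * x                    # V = k x^2 / 2
    return kinetic, potential


# T + V sampled at many instants: it should always equal k A^2 / 2.
totals = [sum(energies(0.05 * i)) for i in range(200)]
expected = 0.5 * k * A * A

print("min and max of T + V:", min(totals), max(totals))
print("k*A^2/2:", expected)
```

T and V each swing between zero and kA²/2, but their sum does not budge (up to floating-point rounding).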

The point to note is that this Hamiltonian function H = T + V is just a constant, not only for this particular case (an oscillation without damping), but in all cases where H represents the total energy of a (closed) system.

OK. That’s clear enough. What does our Lagrangian look like? That’s not a constant obviously. Just so you can visualize things, I’ve drawn the graph below:

1. The red curve represents kinetic energy (T) as a function of the displacement x: T is zero at the turning points, and reaches a maximum at the x = 0 point.
2. The blue curve is potential energy (V): unlike T, V reaches a maximum at the turning points, and is zero at the x = 0 point. In short, it’s the mirror image of the red curve.
3. The Lagrangian is the green graph: L = T – V. Hence, L reaches a minimum at the turning points, and a maximum at the x = 0 point.

While that green function would make an economist think of some Lagrangian optimization problem, it’s worth noting we’re not doing any such thing here: we’re not interested in stationary points. We just want the equation(s) of motion. [I just thought that would be worth stating, in light of my own background and confusion in regard to it all. :-)]

OK. Now that we have an idea of what the Lagrangian and Hamiltonian functions are (it’s probably worth noting also that we do not have a ‘Newtonian function’ of some sort), let us now show how these ‘functions’ are used to solve the problem. What problem? Well… We need to find some equation for the motion, remember? [I find that, in physics, I often have to remind myself of what the problem actually is. Do you feel the same? 🙂 ] So let’s go for it.

II. Lagrangian mechanics

As this post should not turn into a chapter of some math book, I’ll just describe the how, i.e. I’ll just list the steps one should take to model and then solve the problem, and illustrate how it goes for the oscillator above. Hence, I will not try to explain why this approach gives the correct answer (i.e. the equation(s) of motion). So if you want to know why rather than how, then just check it out on the Web: there’s plenty of nice stuff on math out there.

The steps that are involved in the Lagrangian approach are the following:

1. Compute (i.e. write down) the Lagrangian function L = T – V. Hmm? How do we do that? There’s more than one way to express T and V, isn’t it? Right you are! So let me clarify: in the Lagrangian approach, we should express T as a function of velocity (v) and V as a function of position (x), so your Lagrangian should be L = L(x, v). Indeed, if you don’t pick the right variables, you’ll get nowhere. So, in our example, we have L = mv²/2 – kx²/2.
2. Compute the partial derivatives ∂L/∂x and ∂L/∂v. So… Well… OK. Got it. Now that we’ve written L using the right variables, that’s a piece of cake. In our example, we have: ∂L/∂x = – kx and ∂L/∂v = mv. Please note how we treat x and v as independent variables here. It’s obvious from the use of the symbol for partial derivatives: ∂. So we’re not taking any total differential here or so. [This is an important point, so I’d rather mention it.]
3. Write down (‘compute’ sounds awkward, doesn’t it?) Lagrange’s equation: d(∂L/∂v)/dt = ∂L/∂x. […] Yep. That’s it. Why? Well… I told you I wouldn’t tell you why. I am just showing the how here. This is Lagrange’s equation and so you should take it for granted and get on with it. 🙂 In our example: d(∂L/∂v)/dt = d(mv)/dt = m(dv/dt) and ∂L/∂x = –kx, so Lagrange’s equation reads m(dv/dt) = m(d²x/dt²) = –kx.
4. Finally, solve the resulting differential equation. […] ?! Well… Yes. […] Of course, we’ve done that already. It’s the same differential equation as the one we found in our ‘Newtonian approach’, i.e. the equation we found by combining Hooke’s and Newton’s laws. So the general solution is x(t) = Acos(ωt + α), as we already noted above.
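The four steps above can be sketched in a few lines of code. This is a rough numerical illustration, not a general-purpose solver: it builds L(x, v), takes the two partial derivatives by finite differences (treating x and v as independent, as stressed in step 2), and then checks that Lagrange’s equation holds along the known solution x(t) = Acos(ωt + α). All parameter values and function names are assumptions made for the example.

```python
import math

# Illustrative values only (assumptions, not taken from the post).
m, k = 2.0, 8.0
A, alpha = 0.5, 0.3
omega = math.sqrt(k / m)


def lagrangian(x, v):
    """Step 1: L(x, v) = T - V = m v^2/2 - k x^2/2."""
    return 0.5 * m * v * v - 0.5 * k * x * x


def dL_dx(x, v, h=1e-4):
    """Step 2: partial derivative of L with respect to x, holding v fixed."""
    return (lagrangian(x + h, v) - lagrangian(x - h, v)) / (2.0 * h)


def dL_dv(x, v, h=1e-4):
    """Step 2: partial derivative of L with respect to v, holding x fixed."""
    return (lagrangian(x, v + h) - lagrangian(x, v - h)) / (2.0 * h)


def x_of_t(t):
    return A * math.cos(omega * t + alpha)


def v_of_t(t):
    return -A * omega * math.sin(omega * t + alpha)


def lagrange_residual(t, dt=1e-4):
    """Step 3: d(dL/dv)/dt - dL/dx along the trajectory (should be ~0)."""
    later = dL_dv(x_of_t(t + dt), v_of_t(t + dt))
    earlier = dL_dv(x_of_t(t - dt), v_of_t(t - dt))
    time_derivative = (later - earlier) / (2.0 * dt)
    return time_derivative - dL_dx(x_of_t(t), v_of_t(t))


# Step 4 is already done analytically; here we only verify the solution.
worst = max(abs(lagrange_residual(0.1 * i)) for i in range(100))
print("dL/dx at (0.3, 1.0):", dL_dx(0.3, 1.0), "expected", -k * 0.3)
print("dL/dv at (0.3, 1.0):", dL_dv(0.3, 1.0), "expected", m * 1.0)
print("largest residual of Lagrange's equation:", worst)
```

The point of the exercise: nowhere in this code do we write F = ma. The equation of motion comes out of L alone.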

So you’ll wonder: what’s the difference then between Newtonian and Lagrangian mechanics? Yes, you’re right: we’re solving the same second-order differential equation here. Fortunately, I’d say, because we don’t want any other equation(s) of motion: we’re talking about the same system. The point is: we got that differential equation using an entirely different procedure, which I actually didn’t explain at all: I just said to compute this and then that and… – Surprise, surprise! – we got the same differential equation in the end. 🙂 So, yes, the Newtonian and Lagrangian approaches to modeling a dynamic system yield the same equations, but the Lagrangian method is much more (very much more, I should say) convenient when we’re dealing with lots of moving bits and when there are more directions (i.e. degrees of freedom) in which they can move.

In short, Lagrange could solve a problem more rapidly than Newton with his modeling approach and so that’s why his approach won out. 🙂 In fact, you’ll usually see the spatial variables noted as $q_j$. In this notation, j = 1, 2,… n, and n is the number of degrees of freedom, i.e. the directions in which the various particles can move. And then, of course, you’ll usually see a second subscript i = 1, 2,… m to keep track of every $q_j$ for each and every particle in the system, so we’ll have n×m $q_{ij}$’s in our model and so, yes, good to stick to Lagrange in that case.

OK. You get that, I assume. Let’s move on to Hamiltonian mechanics now.

III. Hamiltonian mechanics

The steps here are the following. [Again, I am just explaining the how, not the why. You can find mathematical proofs of why this works in handbooks or, better still, on the Web.]

1. The first step is very similar to the one above. In fact, it’s exactly the same: write T and V as a function of velocity (v) and position (x) respectively and construct the Lagrangian. So, once again, we have L = L(x, v). In our example: L(x, v) = mv²/2 – kx²/2.
2. The second step, however, is different. Here, the theory becomes more abstract, as the Hamiltonian approach does not only keep track of the position but also of the momentum of the particles in a system. Position (x) and momentum (p) are so-called canonical variables in Hamiltonian mechanics, and the relation with Lagrangian mechanics is the following: p = ∂L/∂v. Huh? Yeah. Again, don’t worry about the why. Just check it for our example: ∂(mv²/2 – kx²/2)/∂v = 2mv/2 = mv. So, yes, it seems to work. Please note, once again, how we treat x and v as independent variables here, as is evident from the use of the symbol for partial derivatives. Let me get back to the lesson, however. The second step is: calculate the conjugate variables. In more familiar wording: compute the momenta.
3. The third step is: write down (or ‘build’ as you’ll see it, but I find that wording strange too) the Hamiltonian function H = T + V. We’ve got the same problem here as the one I mentioned with the Lagrangian: there’s more than one way to express T and V. Hence, we need some more guidance. Right you are! When writing your Hamiltonian, you need to make sure you express the kinetic energy as a function of the conjugate variable, i.e. as a function of momentum, rather than velocity. So we have H = H(x, p), not H = H(x, v)! In our example, we have H = T + V = p²/2m + kx²/2.
4. Finally, write and solve the following set of equations: (I) ∂H/∂p = dx/dt and (II) –∂H/∂x = dp/dt. [Note the minus sign in the second equation.] In our example: (I) p/m = dx/dt and (II) –kx = dp/dt. The first equation is actually nothing but the definition of p: p = mv, and the second equation is just Hooke’s law: F = –kx. However, from a formal-mathematical point of view, we have two first-order differential equations here (as opposed to one second-order equation when using the Lagrangian approach), which should be solved simultaneously in order to find position and momentum as a function of time, i.e. x(t) and p(t). The end result should be the same: x(t) = Acos(ωt + α) and p(t) = … Well… I’ll let you solve this: time to brush up your knowledge about differential equations. 🙂
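If you’d rather let a computer do step 4, the sketch below integrates the two first-order equations numerically and compares the outcome with the analytic answer. Spoiler alert: since p = m(dx/dt), the momentum is p(t) = –mAω·sin(ωt + α). The integration scheme used here (a ‘leapfrog’ or kick-drift-kick scheme) is my own choice for the illustration, as are the parameter values and names; any reasonable ODE solver would do.

```python
import math

# Illustrative values only (assumptions, not taken from the post).
m, k = 2.0, 8.0
A, alpha = 0.5, 0.3
omega = math.sqrt(k / m)


def dH_dp(x, p):
    """Equation (I): dx/dt = dH/dp = p/m for H = p^2/2m + k x^2/2."""
    return p / m


def dH_dx(x, p):
    """Used in equation (II): dp/dt = -dH/dx = -k x."""
    return k * x


def integrate(x, p, dt, steps):
    """Leapfrog integration of the pair of first-order equations."""
    for _ in range(steps):
        p -= 0.5 * dt * dH_dx(x, p)   # half step for the momentum
        x += dt * dH_dp(x, p)         # full step for the position
        p -= 0.5 * dt * dH_dx(x, p)   # second half step for the momentum
    return x, p


# Initial conditions taken from the analytic solution at t = 0.
x0 = A * math.cos(alpha)
p0 = -m * A * omega * math.sin(alpha)

t_end, steps = 5.0, 50000
x_num, p_num = integrate(x0, p0, t_end / steps, steps)

x_exact = A * math.cos(omega * t_end + alpha)
p_exact = -m * A * omega * math.sin(omega * t_end + alpha)

print("x(t_end): numerical", x_num, "analytic", x_exact)
print("p(t_end): numerical", p_num, "analytic", p_exact)
```

Note how the code tracks x and p side by side at every step: that is exactly the ‘two first-order equations, solved simultaneously’ structure of the Hamiltonian approach.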

You’ll say: what the heck? Why are you making things so complicated? Indeed, what am I doing here? Am I making things needlessly complicated?

The answer is the usual one: yes, and no. Yes. If we’d want to do stuff in the classical world only, the answer seems to be: yes! In that case, the Lagrangian approach will do and may actually seem much easier, because we don’t have a set of equations to solve. And why would we need to keep track of p(t)? We’re only interested in the equation(s) of motion, aren’t we? Well… That’s why the answer to your question is also: no! In classical mechanics, we’re usually only interested in position, but in quantum mechanics that concept of conjugate variables (like x and p indeed) becomes much more important, and we will want to find the equations for both. So… Yes. That means a set of differential equations (one for each variable (x and p) in the example above) rather than just one. In short, the real answer to your question in regard to the complexity of the Hamiltonian modeling approach is the following: because the more abstract Hamiltonian approach to mechanics is very similar to the mathematics used in quantum mechanics, we will want to study it, because a good understanding of Hamiltonian mechanics will help us to understand the math involved in quantum mechanics. And so that’s the reason why physicists prefer it to the Lagrangian approach.

[…] Really? […] Well… At least that’s what I know about it from googling stuff here and there. Of course, another reason for physicists to prefer the Hamiltonian approach may well be that they think social science (like economics) isn’t real science. Hence, we – social scientists – would surely expect them to develop approaches that are much more intricate and abstract than the ones that are being used by us, wouldn’t we?

[…] And then I am sure some of it is also related to the Anglo-French thing. 🙂

Post scriptum 1 (dated 21 March 2016): I hate to write about stuff and just explain the how—rather than the why. However, in this case, the why is really rather complicated. The math behind is referred to as calculus of variations – which is a rather complicated branch of mathematics – but the physical principle behind is the Principle of Least Action. Just click the link, and you’ll see how the Master used to explain stuff like this. It’s an easy and difficult piece at the same time. Near the end, however, it becomes pretty complicated, as he applies the theory to quantum mechanics, indeed. In any case, I’ll let you judge for yourself. 🙂

Post scriptum 2 (dated 13 September 2017): I started a blog on the Exercises on Feynman’s Lectures, and the posts on the exercises on Chapter 4 have a lot more detail, and basically give you all the math you’ll ever want on this. Just click the link. However, let me warn you: the math is not easy. Not at all, really.

# Complex Fourier analysis: an introduction

One of the most confusing sentences you’ll read in an introduction to quantum mechanics – not only in those simple (math-free) popular books but also in Feynman’s Lecture introducing the topic – is that we cannot define a unique wavelength for a short wave train. In Feynman’s words: “Such a wave train does not have a definite wavelength; there is an indefiniteness in the wave number that is related to the finite length of the train, and thus there is an indefiniteness in the momentum.” (Feynman’s Lectures, Vol. I, Ch. 38, section 1).

That is not only confusing but, in some way, actually wrong. In fact, this is an oft-occurring statement which has effectively hampered my own understanding of quantum mechanics for a long time, and it was only when I had a closer look at what a Fourier analysis really is that I understood what Feynman, and others, wanted to say. In short, it’s a classic example of where a ‘simple’ account of things can lead you astray.

Indeed, we can all imagine a short wave train with a very definite frequency. Just take any sinusoidal function and multiply it with a so-called envelope function in order to shape it into a short pulse. Transients have that shape, and I gave an example in previous posts. Another example is given below. I copied it from the Wikipedia article on Fourier analysis: f(t) is a product of two factors:

1. The first factor in the product is a cosine function: cos[2π(3t)] to be precise.
2. The second factor is an exponential function: exp(–πt²).

The frequency of this ‘product function’ is quite precise: cos[2π(3t)] = cos[6πt] = cos[6π(t + 1/3)] for all values t, and so its period is equal to 1/3. [If f(x) is a function with period P, then f(ax+b), where a is a positive constant, is periodic with period P/a.] The only thing that the second factor, i.e. exp(–πt²), does is to shape this cosine function into a nice wave train, as it quickly tends to zero on both sides of the t = 0 point. So that second function is a nice simple bell curve (just plot the graph with a graph plotter) and it doesn’t change the period (or frequency) of the product. In short, the oscillation below–which we should imagine as the representation of ‘something’ traveling through space–has a very definite frequency. So what’s Feynman saying above? There’s no Δf or Δλ here, is there?

The point to note is that these Δ concepts – Δf, Δλ, and so on – actually have very precise mathematical definitions, as one would expect in physics: they usually refer to the standard deviation of the distribution of a variable around the mean.

[…] OK, you’ll say. So what?

Well… That f(t) function above can – and, more importantly, should – be written as the sum of a potentially infinite number of waves in order to make sense of the Δf and Δλ factors in those uncertainty relations. Each of these component waves has a very specific frequency indeed, and each one of them makes its own contribution to the resultant wave. Hence, there is a distribution function for these frequencies, and so that is what Δf refers to. In other words, unlike what you’d think when taking a quick look at that graph above, Δf is not zero. So what is it then?

Well… It’s tempting to get lost in the math of it all now but I don’t want this blog to be technical. The basic ideas, however, are the following. We have a real-valued function here, f(t), which is defined from –∞ to +∞, i.e. over its so-called time domain. Hence, t ranges from –∞ to +∞ (the definition of the zero point is a matter of convention only, and we can easily change the origin by adding or subtracting some constant). [Of course, we could – and, in fact, we should – also define it over a spatial domain, but we’ll keep the analysis simple by leaving out the spatial variable (x).]

Now, the so-called Fourier transform of this function will map it to its so-called frequency domain. The animation below (for which the credit must, once again, go to Wikipedia, from which I borrow most of the material here) clearly illustrates the idea. I’ll just copy the description from the same article: “In the first frames of the animation, a function f is resolved into Fourier series: a linear combination of sines and cosines (in blue). The component frequencies of these sines and cosines spread across the frequency spectrum, are represented as peaks in the frequency domain (as shown in the last frames of the animation). The frequency domain representation of the function, $\hat{f}$, is the collection of these peaks at the frequencies that appear in this resolution of the function.”

[…] OK. You sort of get this (I hope). Now we should go a couple of steps further. In quantum mechanics, we’re talking not real-valued waves but complex-valued waves adding up to give us the resultant wave. Also, unlike what’s shown above, we’ll have a continuous distribution of frequencies. Hence, we’ll not have just six discrete values for the frequencies (and, hence, just six component waves), but an infinite number of them. So how does that work? Well… To do the Fourier analysis, we need to calculate the value of the following integral for each possible frequency, which I’ll denote with the Greek letter nu (ν), as we’ve used the f symbol already–not for the frequency but to denote the function itself! Let me just jot down that integral:

$$\hat{f}(\nu) = \int_{-\infty}^{+\infty} f(t)\,e^{-2\pi i \nu t}\,dt$$

Huh? Don’t be scared now. Just try to understand what it actually represents. So just relax and take a long hard look at it. Note, first, that the integrand (i.e. the function that is to be integrated, between the integral sign and the dt, so that’s $f(t)e^{-2\pi i t\nu}$) is a complex-valued function (that should be very obvious from the i in the exponent of e). Secondly, note that we need to do such integral for each value of ν. So, for each possible value of ν, we have t ranging from –∞ to +∞ in that integral. Hmm… OK. So… How does that work? Well… The illustration below shows the real and imaginary part respectively of the integrand for ν = 3. [Just in case you still don’t get it: we fix ν here (ν = 3), and calculate the value of the real and imaginary part of the integrand for each possible value of t, so t ranges from –∞ to +∞ indeed.]

So what do we see here? The first thing you should note is that the value of both the real and imaginary part of the integrand quickly tends to zero on both sides of the t = 0 point. That’s because of the shape of f(t), which does exactly the same. However, in-between those ‘zero or close-to-zero values’, the integrand does take on very specific non-zero values. As for the real part of the integrand, which is denoted by Re[$e^{-2\pi i(3t)}f(t)$], we see that’s never negative, with a peak value equal to one at t = 0. Indeed, the real part of the integrand is never negative because f(t) and the real part of $e^{-2\pi i(3t)}$ oscillate at the same rate. Hence, when f(t) is positive, so is the real part of $e^{-2\pi i(3t)}$, and when f(t) is negative, so is the real part of $e^{-2\pi i(3t)}$. However, the story is obviously different for the imaginary part of the integrand, denoted by Im[$e^{-2\pi i(3t)}f(t)$]. That’s because, in general, $e^{i\theta}$ = cosθ + isinθ and the sine and cosine function are essentially the same functions except for a phase difference of π/2 (remember: sin(θ+π/2) = cosθ). So the imaginary part is out of step with f(t), and their product swings between positive and negative values.

Capito? No? Hmm… Well… Try to read what I am writing above once again. Else, just give up. 🙂

I know this is getting complicated but let me try to summarize what’s going on here. The bottom line is that the integral above will yield a positive real number, 0.5 to be precise (as noted in the margin of the illustration), for the real part of the integrand, but it will give you a zero value for its imaginary part (also as noted in the margin of the illustration). [As for the math involved in calculating an integral of a complex-valued function (with a real-valued argument), just note that we should indeed just separate the real and imaginary parts and integrate separately. However, I don’t want you to get lost in the math so don’t worry about it too much. Just try to stick to the main story line here.]

In short, what we have here is a very significant contribution (the associated density is 0.5) of the frequency ν = 3.

Indeed, let’s compare it to the contribution of the wave with frequency ν = 5. For ν = 5, we get, once again, a value of zero when integrating the imaginary part of the integral above, because the positive and negative values cancel out. As for the real part, we’d think they would do the same if we look at the graph below, but they don’t: the integral does yield, in fact, a very tiny positive value: 1.7×10⁻⁶ (so we’re talking 1.7 millionths here). That means that the contribution of the component wave with frequency ν = 5 is close to nil but… Well… It’s not nil: we have some contribution here (i.e. some density in other words).
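You can reproduce those two numbers yourself with a crude numerical integration. The sketch below approximates the integral of f(t)·exp(–2πiνt) with a simple trapezoid rule over a finite window. The window size, the number of sample points and the function names are assumptions of mine; they are good enough here because f(t) is vanishingly small outside the window.

```python
import cmath
import math


def f(t):
    """The wave train from the text: cos(2*pi*3t) * exp(-pi t^2)."""
    return math.cos(2.0 * math.pi * 3.0 * t) * math.exp(-math.pi * t * t)


def fourier_integral(nu, t_max=6.0, n=12000):
    """Trapezoid-rule estimate of the integral of f(t) exp(-2 pi i nu t) dt.

    The infinite range is truncated to [-t_max, t_max].
    """
    dt = 2.0 * t_max / n
    total = 0j
    for i in range(n + 1):
        t = -t_max + i * dt
        weight = 0.5 if i in (0, n) else 1.0   # end points count half
        total += weight * f(t) * cmath.exp(-2j * math.pi * nu * t)
    return total * dt


at_3 = fourier_integral(3.0)
at_5 = fourier_integral(5.0)

print("nu = 3: real part", at_3.real, "imaginary part", at_3.imag)
print("nu = 5: real part", at_5.real, "imaginary part", at_5.imag)
```

The real parts should come out close to 0.5 and 1.7×10⁻⁶ respectively, with imaginary parts that are zero up to rounding noise.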

You get the idea (I hope). We can, and actually should, calculate the value of that integral for each possible value of ν. In other words, we should calculate the integral over the entire frequency domain, so that’s for ν ranging from –∞ to +∞. However, I won’t do that. 🙂 What I will do is just show you the grand general result (below), with the particular results (i.e. the values of 0.5 and 1.7×10⁻⁶ for ν = 3 and ν = 5) as a green and red dot respectively. [Note that the graph below uses the ξ symbol instead of ν: I used ν because that’s a more familiar symbol, but so it doesn’t change the analysis.]

Now, if you’re still with me – probably not 🙂 – you’ll immediately wonder why there are two big bumps instead of just one, i.e. two peaks in the density function instead of just one. [You’re used to these Gauss curves, aren’t you?] And you’ll also wonder what negative frequencies actually are: the first bump is a density function for negative frequencies indeed, and… Well… Now that you think of it: why the hell would we do such integral for negative values of ν? I won’t say too much about that: it’s a particularity which results from the fact that $e^{2\pi i\theta}$ and $e^{-2\pi i\theta}$ both complete a cycle per second (if θ is measured in seconds, that is) so… Well… Hmm… […] Yes. The fact of the matter is that we do have a mathematical equivalent of the bump for positive frequencies on the negative side of the frequency domain, so… Well… […] Don’t worry about it, I’d say. As mentioned above, we shouldn’t get lost in the math here. For our purpose here, which is just to illustrate what a complex Fourier transform actually is (rather than present all of the mathematical intricacies of it), we should just focus on the second bump of that density function, i.e. the density function for positive frequencies only. 🙂

So what? You’re probably tired by now, and wondering what I want to get at. Well… Nothing much. I’ve done what I wanted to do. I started with a real-valued wave train (think of a transient electric field working its way through space, for example), and I then showed how such wave train can (and should) be analyzed as consisting of an infinite number of complex-valued component waves, which each make their own contribution to the combined wave (which consists of the sum of all component waves) and, hence, can be represented by a graph like the one above, i.e. a real-valued density function around some mean, usually denoted by μ, and with some standard deviation, usually denoted by σ. So now I hope that, when you think of Δf or Δλ in the context of a so-called ‘probability wave’ (i.e. a de Broglie wave), then you’ll think of all this machinery behind.
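For this particular f(t), the whole density function can actually be written down in closed form, which makes the mean and standard deviation explicit. The derivation below relies on two standard facts (the bell curve exp(–πt²) is its own Fourier transform, and multiplying by a complex exponential shifts the transform); reading the positive-frequency bump itself as a Gaussian density is my own simplification for the sake of illustration:

$$\cos(6\pi t) = \tfrac{1}{2}\left(e^{6\pi i t} + e^{-6\pi i t}\right) \;\Rightarrow\; \hat{f}(\nu) = \int_{-\infty}^{+\infty} \cos(6\pi t)\,e^{-\pi t^2}\,e^{-2\pi i \nu t}\,dt = \tfrac{1}{2}\left[e^{-\pi(\nu-3)^2} + e^{-\pi(\nu+3)^2}\right]$$

$$\hat{f}(3) \approx \tfrac{1}{2} = 0.5, \qquad \hat{f}(5) \approx \tfrac{1}{2}e^{-4\pi} \approx 1.7\times10^{-6}$$

$$e^{-\pi(\nu-3)^2} = e^{-(\nu-\mu)^2/2\sigma^2} \quad\text{with}\quad \mu = 3, \quad \sigma = \frac{1}{\sqrt{2\pi}} \approx 0.4$$

So the two bumps sit at ν = +3 and ν = –3, and each has a spread of about 0.4 around its center: that spread is what a Δf of this wave train would refer to under this reading.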

In other words, it is not just a matter of drawing a simple figure like the one below and saying: “You see: those oscillations represent three photons being emitted one after the other by an atomic oscillator. You can see that’s quite obvious, can’t you?”

No. It is not obvious. Why not? Because anyone that’s somewhat critical will immediately say: “But how does it work really? Those wave trains seem to have a pretty definite frequency (or wavelength), even if their amplitude dies out, and, hence, the Δf factor (or Δλ factor) in that uncertainty relation must be close to or, more probably, must be equal to zero. So that means we cannot say these particles are actually somewhere, because Δx must be close or equal to infinity.”

Now you know that’s a very valid remark. Because now you understand that one actually has to go through the tedious exercise of doing that Fourier transform, and so now you understand what those Δ symbols actually represent. I hope you do because of this post, and despite the fact my approach has been very superficial and intuitive. In other words, I didn’t say what physicists would probably say, and that is: “Take a good math course before you study physics!” 🙂

# The Uncertainty Principle for energy and time

In all of my posts on the Uncertainty Principle, I left a few points open or rather vague, and that was usually because I didn’t have a clear understanding of them. As I’ve read some more in the meanwhile, I think I sort of ‘get’ these points somewhat better now. Let me share them with you in this and my next posts. This post will focus on the Uncertainty Principle for time and energy.

Indeed, most (if not all) experiments illustrating the Uncertainty Principle (such as the double-slit experiment with electrons for example) focus on the position (x) and momentum (p) variables: Δx·Δp = h. But there is also a similar relationship between time and energy:

ΔE·Δt = h

These pairs of variables (position and momentum, and energy and time) are so-called conjugate variables. I think I said enough about the Δx·Δp = h equation, but what about the ΔE·Δt = h equation? Indeed, we can sort of imagine what ΔE stands for, but what about Δt? It must also be some uncertainty: about time obviously–but what time are we talking about?

I found one particularly appealing explanation in a small booklet that I bought, a long time ago, in Berlin: the dtv-Atlas zur Atomphysik. First, note that the uncertainty about the position (Δx) of our ‘wavicle’ (let’s say an electron) is to be related to the length of the (complex-valued) wave-train that represents the ‘particle’ (or ‘wavicle’ if you prefer that term) in space (and in time). In turn, the length of that wave-train is determined by the spread in the frequencies of the component waves that make up that wave-train, as illustrated below. [However, note that the illustration assumes the amplitudes are real-valued only, so there’s no imaginary part. I’ll come back to this point in my next post.]

Now, we can use the de Broglie relation (λ = h/p) to relate the uncertainty about the position to the spread in the wavelengths (and, hence, the frequencies) of the component waves:

p = h/λ and, hence, Δp = Δ(h/λ) = hΔ(1/λ)

In case you wonder why I can simply take h out of the brackets, i.e. why I can write Δ(h/λ) = hΔ(1/λ), just remember that the delta symbol here (Δ) refers to a measure like the standard deviation of a variable, so Δx represents σx. Now, one can prove the following:

1. The standard deviation of some constant function is 0: Δ(k) = 0
2. The standard deviation is invariant under changes of location: Δ(x + k) = Δx
3. The standard deviation scales with the scale of the variable: Δ(kx) = |k|Δ(x)

It’s obviously the last rule that we’re using here.
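If you want to convince yourself of these three rules, here is a minimal numerical sketch. It assumes that Δ stands for the population standard deviation, and the sample values, the constant `k` and the helper name `spread` are all made up for the illustration.

```python
import statistics

# Hypothetical sample; any non-constant set of numbers would do.
x = [1.0, 2.0, 4.0, 7.0]
k = -3.0  # hypothetical constant (negative, to show why |k| is needed)


def spread(values):
    """Population standard deviation, standing in for the Δ symbol."""
    return statistics.pstdev(values)


rule_constant = spread([k] * len(x))       # rule 1: Δ(k)
rule_shift = spread([v + k for v in x])    # rule 2: Δ(x + k)
rule_scale = spread([k * v for v in x])    # rule 3: Δ(kx)

print("Δ(k)     =", rule_constant)
print("Δ(x + k) =", rule_shift, " versus Δx     =", spread(x))
print("Δ(kx)    =", rule_scale, " versus |k|·Δx =", abs(k) * spread(x))
```

The negative value for `k` is deliberate: it shows that the third rule needs the absolute value, because a standard deviation can never be negative.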

Now, Δx equals h/Δp according to the Uncertainty Principle—if we take it as an equality, rather than as an inequality, that is. Therefore, Δx must equal:

Δx = h/Δp = h/[Δ(h/λ)] = h/[hΔ(1/λ)] = 1/Δ(1/λ)

That’s obvious, but so what? We cannot write Δx = Δλ, because there’s no rule that says that Δ(1/λ) = 1/Δλ and, therefore, h/Δp ≠ Δλ. Indeed, suppose we define Δλ as an interval or a length defined by the difference between its upper bound and its lower bound. Then we can write Δλ as Δλ = λ₂ – λ₁ and, hence, we can then write Δp as Δp = Δ(h/λ) = h/λ₁ – h/λ₂ = h(1/λ₁ – 1/λ₂) = h[λ₂ – λ₁]/(λ₁λ₂). Now, that’s obviously something very different from h/Δλ = h/(λ₂ – λ₁). So we should surely not write that Δp = h/Δλ. Never ever. Having said that, the Δx = 1/Δ(1/λ) = λ₁λ₂/(λ₂ – λ₁) relationship that emerges here is quite interesting. I encourage you to explore it yourself, as I need to move on here.
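To see how different the two expressions are, you can put in some numbers. The sketch below assumes a hypothetical wavelength interval running from 4×10⁻¹⁰ m to 5×10⁻¹⁰ m, and a rounded value for h. The variable names are mine.

```python
h = 6.626e-34  # Planck constant in J·s (rounded)

# Hypothetical lower and upper bound of the wavelength interval, in metres.
lam1 = 4.0e-10
lam2 = 5.0e-10

delta_lambda = lam2 - lam1
delta_p = h / lam1 - h / lam2       # Δp = h(1/λ1 - 1/λ2)
naive_delta_p = h / delta_lambda    # the tempting (but wrong) h/Δλ

delta_x = h / delta_p               # Δx = h/Δp, taken as an equality
closed_form = lam1 * lam2 / (lam2 - lam1)

print("Δp    =", delta_p, "kg·m/s")
print("h/Δλ  =", naive_delta_p, "kg·m/s")
print("Δx    =", delta_x, "m")
print("λ1λ2/(λ2 - λ1) =", closed_form, "m")
```

With these made-up numbers, h/Δλ overstates Δp by a factor of λ₁λ₂/(Δλ)² = 20, while the last two lines agree with each other, as they should.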

So… We’re kinda stuck. What to do? How do we get that energy-time relationship? The de Broglie relation tells us that E = hν, so we can write that ΔE = Δ(hν) = hΔν. But we need to get ΔE = Δ(hν) = hΔν = h/Δt. How do we get Δν = 1/Δt, which is – obviously – the relationship that we need to get ΔE = h/Δt?

To get the answer to that question, we need to ask ourselves another one: what’s Δt here? What are we talking about?

The answer is remarkably mundane: Δt is the measurement time. What measurement time? Relax. You’ll understand in a moment. Let’s go through it.

We know there’s a universal relationship between the propagation speed of a wave (which I’ll denote by c for the time being, but don’t confuse this variable with the speed of light: it can be any speed) and the wavelength and frequency. More specifically, c = λν, and hence, 1/λ = ν/c. So we can now write Δ(1/λ) as Δ(ν/c) = Δ(ν)/c. We also know that the frequency of the wave is the reciprocal of the so-called period of the wave, i.e. the time that’s needed to go through one oscillation: τ = 1/ν and, hence, ν = 1/τ. Hence, we can write Δ(ν) = Δ(1/τ).

OK. That’s stating the obvious. So what? Where do we go from here?

First, note that, for a wavetrain, there’s no precise frequency or period, nor is there any precise number of oscillations. That’s the essence of the Uncertainty Principle in its most ubiquitous form (Δx = h/Δp). But so we can try to measure. Now, to measure something, we need some time. More in particular, to measure the frequency of a wave, we’ll need to look at that wave and register (i.e. measure) at least a few oscillations, as shown below.

I took the image from the above-mentioned German booklet and, hence, the illustration incorporates some German. However, that should not deter you from following the remarkably simple argument, which is the following:

1. The error in our measurement of the frequency (i.e. the Meßfehler, denoted by Δν) is related to the measurement time (i.e. the Meßzeit, denoted by Δt in the diagram above). Indeed, if τ represents the actual period of the oscillation, which is the reciprocal of the frequency (τ = 1/ν), then we can write Δt as some multiple of τ. (Both τ and ν are obviously unknown to us: otherwise we wouldn’t be trying to measure the frequency.) More specifically, in the example above we assume that Δt ≈ 4τ = 4/ν. [Note that we use an almost equal to sign (≈) rather than an equality sign (=) because we don’t know τ (or ν). That’s the whole point about it, indeed.]
2. During that time, we measure four oscillations in our example and, hence, we are tempted to write that ν = 4/Δt. However, because of the measurement error, we should interpret the value for our measurement not as 4 exactly but as 4 plus or minus one: 4 ± 1. Indeed, it’s like measuring the length of something: if our yardstick has millimeter marks, then we’ll measure someone’s length as some number plus or minus 1 mm. Here we are counting the number of oscillations. Hence, the result of our measurement should be written as ν ± Δν = (4 ± 1)/Δt = 4/Δt ± 1/Δt. If you have trouble following the argument, just put in some numbers in order to gain a better understanding. For example, imagine an oscillation of 100 Hz (i.e. 100 oscillations per second), and a measurement time of four hundredths of a second (i.e. Δt = 4×10⁻² s). Suppose, then, we do indeed measure 4 ± 1 oscillations during that time. Then the frequency of this wave must be equal to ν ± Δν = (4 ± 1)/Δt = 4/(4×10⁻² s) ± 1/(4×10⁻² s) = 100 ± 25 Hz. In other words, we here accept that we have a measurement error of Δν/ν = 25/100 = 25%. That’s a relatively large error because the measurement time was relatively short. [Note that ‘relatively short’ means ‘short as compared to the actual period of the oscillation’. Indeed, 4×10⁻² s is obviously not short in any absolute sense: in fact, it is like an eternity when we’re talking light waves, which have frequencies measured in terahertz.]
3. The example makes it clear that Δν, i.e. the error in our measurement of the frequency, is related to the measurement time as follows: Δν = 1/Δt. Hence, if we double the measurement time, we halve the error in the measurement of the frequency. The relationship is quite straightforward indeed: let’s take the example of that 100 Hz wave once again and assume that our measurement time Δt is equal to Δt = 10τ = 10×10⁻² s = 10⁻¹ s. In that case, we get Δν = 1/(10⁻¹ s) = 10 Hz. Hence, the measurement error is now Δν/ν = 10/100 = 10%.
4. How long should the measurement time be in order to get a 1% error only? Let’s write the error as a percentage first: Δν/ν = x % = x/100. But Δν = 1/Δt. Hence, we have Δν/ν = (1/Δt)/ν = 1/(Δt·ν) = x/100 or Δt = 100/(x·ν). So, for x = 1 (i.e. an error of 1%), we get Δt = 100/(1·100) = 1 second; for x = 5 (i.e. an error of 5%), we get Δt = 100/(5·100) = 0.2 seconds. Finally, for x = 25 (i.e. an error of 25%), we get Δt = 100/(25·100) = 0.04 seconds, or 4×10⁻² s, which is what this example started out with.
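The arithmetic in the four steps above is easy to reproduce. The sketch below just wraps the two formulas (Δν = 1/Δt and Δt = 100/(x·ν)) in two small functions. The function names are mine, and the 100 Hz wave is the one from the example.

```python
def frequency_error(measurement_time):
    """Δν = 1/Δt: miscounting one oscillation over the measurement time."""
    return 1.0 / measurement_time


def required_time(frequency, error_percent):
    """Δt = 100/(x·ν): measurement time needed for a relative error of x %."""
    return 100.0 / (error_percent * frequency)


nu = 100.0  # the 100 Hz oscillation of the example

# Steps 2 and 3: a measurement time of 4 periods, then of 10 periods.
for dt in (4e-2, 1e-1):
    dnu = frequency_error(dt)
    print(f"Δt = {dt} s: Δν = {dnu:.1f} Hz, Δν/ν = {100 * dnu / nu:.0f} %")

# Step 4: how long do we need to measure for a given relative error?
for x in (1, 5, 25):
    print(f"error of {x} %: Δt = {required_time(nu, x)} s")
```

You can change `nu` to see what ‘relatively short’ means for other waves: the required measurement time always scales with the period 1/ν.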

You’ll say: so what? We’re still nowhere… Well… No. We’ve got a formula with the frequency variable here, so we can now derive the Uncertainty Principle for time and energy from the other de Broglie relation (E = hν), which relates the energy of a ‘wavicle’ to the de Broglie frequency. Hence, the uncertainty about the energy must be related to the measurement time as follows:

E = hν ⇒ ΔE = Δ(hν) = hΔν = h(1/Δt) = h/Δt ⇔ ΔE·Δt = h

So, what this expression of the Uncertainty Principle says is the following: if we increase the measurement time, we’ll reduce the uncertainty in our knowledge of the energy of our ‘wavicle’. Conversely, if we only have a very short measurement time, we’ll not be able to say much about its energy.
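To get a feel for the orders of magnitude, here is a small sketch that evaluates ΔE = h/Δt for a few measurement times. It assumes the relation is taken as an equality, as above, and it uses the rounded value h ≈ 4.1×10⁻¹⁵ eV·s. The measurement times are hypothetical values picked for illustration only.

```python
H_EV_S = 4.1e-15  # Planck constant in eV·s (rounded)


def energy_uncertainty_ev(measurement_time_s):
    """ΔE = h/Δt, with the uncertainty relation taken as an equality."""
    return H_EV_S / measurement_time_s


# Hypothetical measurement times: a femtosecond, a picosecond, a nanosecond.
for dt in (1e-15, 1e-12, 1e-9):
    print(f"Δt = {dt:.0e} s  gives  ΔE = {energy_uncertainty_ev(dt):.1e} eV")
```

Each time the measurement time goes up by a factor of a thousand, the uncertainty in the energy goes down by the same factor.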

A final note needs to be made on the value of h: it’s very tiny. Indeed, a value of (about) 6.6×10⁻³⁴ J·s or, using the smaller eV unit for energy, some 4.1×10⁻¹⁵ eV·s is unimaginably small, especially because we need to take into account that the energy concept as used in the de Broglie equation includes the rest mass of a particle. Now, anything that has any rest mass has enormous energy according to Einstein’s mass-energy equivalence relationship: E = mc². Let’s consider, for example, a hydrogen atom. Its atomic mass can be expressed in eV/c², using the same E = mc² but written as m = E/c², although you will usually find it expressed in so-called unified atomic mass units (u). The mass of our hydrogen atom is approximately 1 u ≈ 931.5×10⁶ eV/c². That means its energy is about 931.5×10⁶ eV. In plain language, that’s 931.5 million eV. Hence, if we’d be happy with an uncertainty of plus or minus one million eV, then it’s obvious that even very small values for Δt (i.e. very short measurements) will give us what we want. However, it is likely that we’ll want to reduce the measurement error to much less than plus or minus one million eV, so that means that our measurement time Δt will have to go up. Having said that, the point is still quite clear: we don’t need much time to measure the mass (or the energy) of this hydrogen atom very accurately.

The corollary of this is that the de Broglie frequency f = E/h of such a particle is very high. To be precise, the frequency will be in the order of (931.5×10⁶ eV)/(4.1×10⁻¹⁵ eV·s) ≈ 0.2×10²⁴ Hz. In practice, this means that the wavelength is so tiny that there’s no detector which will actually measure the ‘oscillation’: any physical detector will straddle most – in fact, I should say: all – of the wiggles of the probability curve. All these facts basically state the same: a hydrogen atom occupies a very precisely determined position in time and space. Hence, we will see it as a ‘hard’ particle, not as a ‘wavicle’.
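The numbers for the hydrogen atom can be checked in a few lines. The sketch below uses the same rounded values as the text (h ≈ 4.1×10⁻¹⁵ eV·s and a rest energy of 931.5×10⁶ eV). The two target uncertainties, 1 MeV and 1 eV, are just illustrative choices, and the variable names are mine.

```python
H_EV_S = 4.1e-15          # Planck constant in eV·s (rounded)
REST_ENERGY_EV = 931.5e6  # energy equivalent of 1 u, in eV

# De Broglie frequency f = E/h.
frequency_hz = REST_ENERGY_EV / H_EV_S

# Measurement time Δt = h/ΔE for two illustrative target uncertainties.
time_for_1_mev = H_EV_S / 1.0e6  # ΔE = one million eV
time_for_1_ev = H_EV_S / 1.0     # ΔE = one eV

print(f"f = {frequency_hz:.2e} Hz")
print(f"Δt for ΔE = 1 MeV: {time_for_1_mev:.1e} s")
print(f"Δt for ΔE = 1 eV:  {time_for_1_ev:.1e} s")
```

Even for an uncertainty of one eV only (about one part in a billion of the rest energy), the measurement time that comes out is only a few femtoseconds, which is the point made above.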

That’s why the interference experiment mentions electrons, rather than hydrogen atoms or other ‘big stuff’, even if I should immediately add that interference patterns have been observed using much larger particles as well. However, I wrote about that before, so I won’t repeat myself here. The point was to make that energy-time relationship somewhat more explicit, and I hope I’ve been successful at that at least. You can play with some more numbers yourself now. 🙂

Post scriptum: The Breit-Wigner distribution

The Uncertainty Principle applied to time and energy has an interesting application: it’s used to assign a lifetime to very short-lived particles. In essence, the ‘spread’ around their mean energy (ΔE) is used to calculate their lifetime through the ΔEΔt = ħ/2 equation. I won’t say much about this, because Georgia State University’s HyperPhysics website gives an excellent quick explanation of this, and so I just copied that below.