[Note (added on 23 February 2020): When re-reading what I wrote below, I realize I should fundamentally rewrite certain sections. In particular, I now think of the dichotomy between fermions and bosons (adding versus subtracting amplitudes), as presented here, as plain nonsense: it is just part of the rubbish that keeps the so-called ‘Mystery of Quantum Physics’ alive (see my paper on the electron model). The distinction between fermions and bosons is an unnecessary generalization. Worse, I feel that thinking of particles as having exactly spin one or spin one-half is actually incorrect: the Planck-Einstein relation is correct but, because a charge is not exactly pointlike (it always has some dimension and some mass), the angular momentum of matter-particles will always be slightly off. In case this language puts you off, think of it like this: an electron is neither a perfect disk nor a perfect ring. It’s close, but not quite. As for the definition of matter-particles: matter-particles are particles with charge. [Neutrons are matter-particles too: the positive and negative charges inside just cancel out.] Photons carry no charge: they are an electromagnetic oscillation traveling in space. In any case, I’ll let you read.]
This is a longer piece. I’ll call it Quantum Mechanics for Not-So-Dummies. Let’s analyze the two elements that make up the term: quantum and mechanics.
1. Quantum, like in Quantum Theory 🙂
The quantum-theoretical aspect of quantum mechanics appeals to us intuitively, for the same reason the Greek philosophers embraced the atomic theory: reality is granular. At some point, continuum theory breaks down. It’s the one thing you surely know about quantum theory: energy comes in discrete packets. In the case of light, Einstein referred to these packets as light quanta; they are now known as photons.
You probably also know the Planck-Einstein relationship: E = hν. While this formula looks as simple as 15 = 3×5, it’s not like that at all: understanding it requires a lot of study. The complexity behind it comes to the surface when we state what it stands for: the energy of a photon (E) is proportional to the frequency of the light (ν), and the proportionality constant is h, a quantity known as Planck’s constant. This statement assumes you already understand the concepts of energy, a photon, frequency, etcetera.
Note the difference between the Greek letter nu (ν), which denotes a frequency, as opposed to the Latin letter v, for velocity. And now that we’re talking quantities and symbols, let’s do a quick dimensional analysis of this equation:
- On the left-hand side of this equation, we have energy. Energy is a quantity expressed in a unit that’s derived from the so-called SI base units. To be precise: one joule (J) is the energy transferred (or the work done) when applying a force of one newton (N) through a distance of one meter (m). Hence, 1 J = 1 N·m. The newton is a derived unit too: it’s the force needed to accelerate 1 kg at the rate of 1 m/s per second, so 1 N = 1 kg·m/s². Hence, 1 J = 1 kg·m²/s².
- On the right-hand side, we have frequency. A frequency is associated with a waveform. A waveform is a function of time and space. A wave has an amplitude (think of the height of a water wave), and its argument is referred to as the phase. Think of a simple sine function, for example: Φ = sinθ, where Φ (the Greek letter phi) is the value of the function. The phase of this function is its argument θ (theta), which varies in time and in space. In fact, if c is the speed of propagation of the wave, we can write θ as θ = x − ct. I’ve shown that in another post. But back to frequency: frequencies are expressed in hertz. That’s the number of cycles per second, so the unit is s⁻¹. Hence, to make the units come out right, we must express h in J·s. I’ll come back to this.
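The unit bookkeeping can be made explicit with a minimal Python sketch. This is my own illustration, not a units library. Each unit is stored as a dictionary of exponents of the base units kg, m and s, and multiplying units simply adds exponents. The helper names `mul` and `inv` are hypothetical.

```python
from collections import Counter

def mul(*units):
    """Multiply units by adding their base-unit exponents (zero exponents dropped)."""
    total = Counter()
    for u in units:
        total.update(u)
    return {base: exp for base, exp in total.items() if exp != 0}

def inv(unit):
    """Invert a unit by negating its exponents."""
    return {base: -exp for base, exp in unit.items()}

KG, M, S = {"kg": 1}, {"m": 1}, {"s": 1}
NEWTON = mul(KG, M, inv(S), inv(S))   # kg·m/s²
JOULE = mul(NEWTON, M)                # N·m = kg·m²/s²
HERTZ = inv(S)                        # cycles per second: s⁻¹
JOULE_SECOND = mul(JOULE, S)          # the unit we must give to h

print(JOULE)                                    # {'kg': 1, 'm': 2, 's': -2}
print(mul(JOULE_SECOND, HERTZ) == JOULE)        # True: (J·s)·(s⁻¹) = J
print(mul(M, KG, M, inv(S)) == JOULE_SECOND)    # True: m·(kg·m/s) = J·s
```

The last check shows that position times momentum has the same dimension as h, i.e. that of action.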
So, on the left-hand side, we have energy, which is a directionless quantity, while the right-hand side refers to a waveform, which relates space, time and some amplitude. Of course, a wave definitely has some direction, which brings me to the following point.
We have an alternative expression for the Planck-Einstein relationship: p = h/λ. In this expression, we have the magnitude of the photon’s momentum (p) on the left-hand side while, on the right-hand side, we have the wavelength of the light. [Momentum is a vector, which is usually typeset in boldface, while the p in this formula denotes its magnitude.] Energy and momentum are related. The concept of momentum combines mass and velocity (p = m·v) and, hence, we may think of it as energy with some direction. However, because photons have no (rest) mass, explaining what the momentum of a photon is takes some more time. I’ve done that elsewhere, so I won’t dwell on the relation between energy and momentum here. The point to note is that the alternative expression also relates a directionless quantity on the left-hand side (the magnitude of p) to a right-hand side that, again, incorporates the idea of a wave.
The point is: relations like this (E = hν or p = h/λ) are very deep, and it takes time and effort to acquire an equally deep understanding. Let me get back to the lesson.
Of course, you know there is a similar relationship for matter-waves, the de Broglie relationship, which we write as E = hf. The energy of an electron (E) is proportional to the de Broglie frequency (f), and the proportionality constant is the same: h. I should immediately add, however, that the de Broglie frequency is not so easy to interpret. Because of the Uncertainty Principle, we’re actually talking about a frequency range (which we’ll denote as Δf), rather than one precise value. However, that knowledge shouldn’t distract you here. Let’s look at that proportionality constant h.
First, note that you’ll often see it as ħ (h-bar) because, in physics, we often express frequencies as angular frequencies. The angular frequency measures frequency not in cycles (or oscillations) per second (hertz) but as the rate of change of the phase of the waveform. The rate of change of the phase (dθ/dt) is measured in radians per second, so it’s an angular speed. Because one cycle corresponds to 2π rad, the difference between h and ħ is just the 2π factor: ħ = h/2π. The point is: h is a fundamental constant, but it’s not a fundamental unit. Hence, if we express our frequency variable in a different unit (radians per second rather than cycles per second), the numerical value of the constant needs to change too, and it does so by that factor 2π. That’s just math: the physics behind the equations doesn’t change. 🙂
Planck’s constant is unimaginably small. Hence, the energy of a photon is tiny as well. Indeed, visible light has frequencies ranging from about 400 to 790 terahertz, so that’s 400 to 790 trillion cycles per second. However, the energy of a visible-light photon is only a few electronvolts (roughly 1.7 to 3.3 eV). An electronvolt is an unimaginably small unit: about 1.6×10⁻¹⁹ joules. So that’s a number with eighteen zeroes after the decimal point before the first significant digit. Because these energy packets are so tiny, continuum theory (which analyzes light as a wave, for example) works rather well. But now we know it’s not correct: tout est quantique (everything is quantum).
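These photon energies are easy to check numerically. The Python sketch below assumes the exact SI values h = 6.62607015×10⁻³⁴ J·s and 1 eV = 1.602176634×10⁻¹⁹ J. It also shows that E = hν and E = ħω give the same energy once ν is converted into an angular frequency ω = 2πν. The function name `photon_energy_ev` is just an illustrative choice.

```python
import math

H_EV_S = 6.62607015e-34 / 1.602176634e-19   # Planck's constant h in eV·s (about 4.136e-15)
HBAR_EV_S = H_EV_S / (2 * math.pi)          # reduced Planck constant ħ = h/2π

def photon_energy_ev(freq_hz):
    """Planck-Einstein relation E = hν, with ν in cycles per second (Hz)."""
    return H_EV_S * freq_hz

for thz in (400, 790):                   # edges of the visible range quoted above
    nu = thz * 1e12                      # terahertz -> hertz
    omega = 2 * math.pi * nu             # the same frequency in radians per second
    print(f"{thz} THz: hν = {photon_energy_ev(nu):.2f} eV, ħω = {HBAR_EV_S * omega:.2f} eV")
# 400 THz: hν = 1.65 eV, ħω = 1.65 eV
# 790 THz: hν = 3.27 eV, ħω = 3.27 eV
```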
Today, all of the statements above are self-evident but, a mere hundred-odd years ago, scientists were not so sure. Indeed, Max Planck presented his solution to the black body radiation problem that was bothering 19th-century physicists in a lecture to the German Physical Society on 14 December 1900. In that lecture, the German physicist used the letter h for his constant because he considered it to be a Hilfsvariable: a non-essential auxiliary variable, for use in a mathematical argument only. Later, he wrote that, at the time, he was convinced that the idea of a quantization of energy was “a purely formal assumption” only.
It’s rather strange that Planck did not want to consider quantization of energy as a fundamental characteristic of Nature itself, because the new theory had been in the works for quite a while already. Most notably, Ludwig Boltzmann, the ‘Father of Statistical Mechanics’, had already advanced it in 1877, and he knew how revolutionary it was. In fact, Boltzmann had been pushing for the idea to be accepted all his life, but no one liked it. Surely not Planck: he would later write that his acceptance of Boltzmann’s suggestion for a solution to the black body radiation problem was ‘an act of desperation’. Even in his 1906 Lectures on The Theory of Thermal Radiation, i.e. a full year after Einstein had enthusiastically embraced the new quantum theory in the first of his four Annus Mirabilis papers, Planck still favored ‘continuum theory’. By that time, Einstein had already been recognized as a genius, so it’s not that this new theory lacked credibility. Max Planck was not only conservative, but also stubborn.
Having said that, I should add that, while Einstein’s paper is usually referred to as the paper “on the photoelectric effect”, its actual title was On a Heuristic Viewpoint Concerning the Production and Transformation of Light. That reference to a ‘heuristic viewpoint’ indicates the same caution: a heuristic method is a problem-solving approach that is not based on theory but on practice, or experience. In other words, something that works but is not necessarily theoretically sound. Hence, even Einstein was very careful and did not present his theory on light quanta as the New Great Truth.
In any case, a few years later, the battle over the quantization of light and energy was over, and it was clear who had won. With all of the great scientists of that era endorsing quantum theory, Planck’s constant was no longer a Hilfsvariable: it was now generally accepted as one of the most fundamental physical constants in Nature. It earned Planck a Nobel Prize in Physics as the ‘Reluctant Father of Quantum Physics.’ Unfortunately, Boltzmann had hanged himself in a bout of depression, and so he couldn’t savor the victory.
Looking back, with the enormous benefit of hindsight, it is a bit awkward that most of the 19th century physicists truly believed that there was no limit as to how small things could be, but so that’s the subject of the History of Science, and that’s not what I want to get into here.
I mentioned that h is an unimaginably small number. You should take that literally: our mind can’t imagine such small numbers. We can imagine the SI unit in which h is measured: the joule-second (J·s). That is energy multiplied by time, which is the dimension of action (or Wirkung, as the Germans call it). What we cannot imagine is the number in front. One joule of energy is roughly what is required to lift a hundred grams one meter straight up. 100 g is the weight (or mass, I should say) of a really small tomato: you’ll agree that doesn’t require much energy but, as mentioned, we can imagine it. More formally, one joule is the work done by a force of one newton over a distance of one meter (1 newton-meter, or N·m), and one joule-second is one joule of energy multiplied by one second: action is the product of work (energy) and time. So, that we can imagine, sort of. However, the number in front of Planck’s constant (h = 6.626×10⁻³⁴ J·s) has nothing to do with the scale that we, human beings, are used to. Let me write it out. The minus 34 exponent says that we’ll get the first significant digit only after 33 zeroes after the decimal point, so h is:
h = 0.000 000 000 000 000 000 000 000 000 000 000 662 607 015 J·s
That’s unimaginably small indeed. Even when we switch from joules to electronvolts (eV), we still get a ridiculously small number: approximately 4×10⁻¹⁵ eV·s. The electronvolt is a much smaller measure of energy that is more appropriate at the subatomic scale: the energy of photons in the visible light spectrum is typically a few eV only. So, while the conversion (1 eV ≈ 1.6×10⁻¹⁹ J) took away nineteen orders of magnitude, we’re still left with fifteen! [Just for the record, I should note that the electronvolt is not an SI unit. It is the energy gained by an electron moving through a potential difference of one volt, and its value in SI units used to be determined experimentally. Since the 2019 redefinition of the SI base units (m, kg, s, K, etcetera), the elementary charge has an exact value, so that 1 eV = 1.602176634×10⁻¹⁹ J exactly.]
Reflecting on the smallness of Planck’s constant, and looking at the unit in which we express it (J·s or eV·s), you may think that Planck’s constant is so small because our unit of time (the second) is huge. I wouldn’t agree with that. We can, perhaps, imagine a millisecond. For example, 3 milliseconds correspond to the flap of a housefly’s wing and, during that time, light travels almost 1000 km, so that’s roughly the air travel distance between Delhi and Mumbai. We also have millisecond stopwatches, although they usually display only centiseconds (i.e. hundredths of a second). That is good enough, because our reaction time is measured not in milli- or centiseconds but in tenths of a second. So, yes, we can imagine milliseconds. However, to get rid of the zeroes behind the decimal point here, we’d need to use the femtosecond (10⁻¹⁵ s) as our unit of time and the electronvolt as our unit of energy: h ≈ 4.1 eV·fs. Now, we cannot imagine a femtosecond. That is simply beyond us. Hence, the conclusion stands: Planck’s constant is unimaginably small.
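The zero-counting and the conversion to eV·fs can be checked with Python’s `decimal` module, which keeps decimal digits exact. This is a sketch that assumes the exact SI values of h and the electronvolt. `leading_zeros` is a hypothetical helper that counts the zeros between the decimal point and the first significant digit of a positive number below 1.

```python
from decimal import Decimal

H_J_S = Decimal("6.62607015e-34")     # h in J·s (exact since the 2019 SI redefinition)
EV_IN_J = Decimal("1.602176634e-19")  # 1 eV in J (exact since the 2019 SI redefinition)

def leading_zeros(x):
    """Count zeros between the decimal point and the first significant digit (0 < x < 1)."""
    fraction = format(x, "f").split(".")[1]
    return len(fraction) - len(fraction.lstrip("0"))

h_ev_s = H_J_S / EV_IN_J               # about 4.1357e-15 eV·s
h_ev_fs = h_ev_s * Decimal("1e15")     # 1 fs = 1e-15 s, so multiply by 1e15

print(leading_zeros(H_J_S))     # 33
print(leading_zeros(EV_IN_J))   # 18
print(f"{h_ev_fs:.3f}")         # 4.136 (eV·fs)
```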
As we’re talking units, I should make a small digression here. You’ve probably heard of so-called natural units, or Planck units, and you may think they could help us get a better understanding. To some extent, they do. But to some extent only. Let’s have a look at them. We get them by equating the most fundamental constants in Nature with 1. What are the ‘most fundamental constants in Nature’? To calculate Planck units, we use five:
- c: the speed of light (299,792,458 m/s);
- G: the universal gravitational constant (6.67384×10⁻¹¹ N·(m/kg)²);
- ħ: the reduced Planck constant, which we use when we switch from hertz (the number of cycles per second) as a measure of frequency (like in E = hν) to so-called angular frequencies (like in E = ħω), which are much more convenient to work with from a math point of view: ħ = h/2π and, hence, ħ ≈ 6.6×10⁻¹⁶ eV·s;
- ke: Coulomb’s constant (ke = 1/(4πε₀)); and, finally,
- kB: the Boltzmann constant. [The selection of Boltzmann’s constant in this Club of Five is another reason why his suicide is so regrettable. Boltzmann was at least as clever as Einstein, so why isn’t he as famous?]
When we equate these five constants with 1, we’re re-scaling both unimaginably large numbers (like the speed of light) and incredibly small numbers (like h, G and kB), and we get so-called ‘natural units’: the Planck length, the Planck time, the Planck energy (and mass), the Planck charge, and the Planck unit of temperature.
Now, it’s true that some of these ‘natural units’ have easy-to-understand physical meanings. For example, the Planck time and the Planck length are related to each other: one Planck time unit (tP) is the time that is required for light to travel a distance of one Planck length (lP) and, vice versa, one Planck length is the distance light travels in one Planck time unit. However, I should note that we have the same relationship between a light-second (a distance unit equal to 299,792.458 km) and a second, so that’s not very revealing. Let me quickly jot down the values of these so-called natural units (expressed in the ‘old’ units, of course):
- 1 Planck time unit (tP) ≈ 5.4×10⁻⁴⁴ s
- 1 Planck length unit (lP) ≈ 1.6×10⁻³⁵ m
- 1 Planck energy unit (EP) ≈ 1.22×10²⁸ eV = 1.22×10¹⁹ GeV (giga-electronvolt) ≈ 2×10⁹ J
- 1 Planck unit of electric charge (qP) ≈ 1.87555×10⁻¹⁸ C (coulomb)
- 1 Planck unit of temperature (TP) ≈ 1.416834×10³² K (kelvin)
So what? These units don’t have the same status as h: they are just units, not fundamental physical constants. As for their values (expressed in the old units), these are calculated using equations for which you have no use here. Let me just say there are a number of fundamental equations out there. One example is the Uncertainty Principle, which in a rough, order-of-magnitude form reads σxσp ≈ ħ and σEσt ≈ ħ. Another is the equation that solves the black body radiation problem. These units come out of such equations if we equate ħ, c, G, ke and kB with one, i.e. if we set c = ħ = kB = ke = G = 1. If you really want to know the detail, I’ll just refer you to the Wikipedia article on it. In the meanwhile, you can verify the accuracy of what I am writing above with some quick checks. For example, I mentioned that the speed of light (c) in the old units (i.e. meter per second) equals one Planck length unit divided by one Planck time unit, so that’s (1.6×10⁻³⁵ m)/(5.4×10⁻⁴⁴ s) ≈ 3×10⁸ m/s = c. It works. Also try the Uncertainty Principle: (1 EP unit)·(1 tP unit) = (1.22×10²⁸ eV)·(5.4×10⁻⁴⁴ s) ≈ 6.6×10⁻¹⁶ eV·s = ħ. It works again. What about the one with position (x) and momentum (p = m·v = mass times velocity)? That works too, but there’s the added complication that m·kg·m/s = kg·m²/s needs to be converted to eV·s. That’s a good exercise that will help you understand what dimensions are all about, so I’ll leave that one for you. 🙂
Have a look at the values again. They don’t help us as we try to imagine what goes on at the so-called Planck scale. That shouldn’t surprise us, because re-scaling things doesn’t change physical realities. Indeed, the Planck time and length units are still unimaginably small. For example, the wavelength of visible light ranges from 380 to 750 nanometers: a nanometer is a billionth of a meter, so that’s 10⁻⁹ m. Also, hard gamma rays have wavelengths measured in picometers, so that’s 10⁻¹² m. Again, don’t even pretend you can imagine how small 10⁻³⁵ m is, because you can’t: 10⁻¹² and 10⁻³⁵ differ by a factor of 10²³. Again, that’s something we cannot imagine. We just can’t. The same reasoning is valid for the Planck time unit (5.4×10⁻⁴⁴ s), whose (negative) exponent is even larger in magnitude.
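The checks in this paragraph can also be scripted. This Python sketch assumes the CODATA 2018 values of ħ and G and the exact value of c. It derives the Planck length, time, mass and energy from ħ = c = G = 1 and then verifies the relations numerically, including the position-momentum one in SI units.

```python
import math

C = 299_792_458.0         # speed of light in m/s (exact)
HBAR = 1.054571817e-34    # reduced Planck constant in J·s (CODATA 2018)
G = 6.67430e-11           # gravitational constant in m³/(kg·s²) (CODATA 2018)
EV = 1.602176634e-19      # joules per electronvolt (exact)

# Setting ħ = c = G = 1 fixes the Planck length, time, mass and energy;
# ke and kB are only needed for the charge and temperature units.
l_p = math.sqrt(HBAR * G / C**3)   # Planck length, m
t_p = math.sqrt(HBAR * G / C**5)   # Planck time, s
m_p = math.sqrt(HBAR * C / G)      # Planck mass, kg
e_p = m_p * C**2                   # Planck energy, J

print(f"lP = {l_p:.3e} m, tP = {t_p:.3e} s, EP = {e_p / EV:.3e} eV")
print(f"lP/tP   = {l_p / t_p:.6e} m/s")        # the speed of light c
print(f"EP·tP   = {e_p * t_p / EV:.4e} eV·s")  # ħ in eV·s
print(f"lP·mP·c = {l_p * m_p * C:.4e} J·s")    # position × momentum: ħ in J·s
```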
In contrast, we’ve got Planck energy and temperature units that are enormous, especially the temperature unit! In fact, it’s rather interesting to note that, while the energy unit is huge, we can actually relate it to our daily life by doing yet another conversion. The Planck energy, about 1.96×10⁹ J (i.e. roughly 2 gigajoules), corresponds to 0.5433 MWh (megawatt-hours), i.e. about 543 kilowatt-hours! I could give you a lot of examples of how much energy that is. One illustration I particularly like is that 0.5 MWh is roughly the electricity consumption of a typical American home over the better part of a month. So, yes, that’s huge… 🙂
What about the Planck unit for electric charge? Well… The charge of an electron expressed in coulombs is about −1.6×10⁻¹⁹ C, so that’s pretty close to 1.87555×10⁻¹⁸ C, isn’t it? To be precise, the Planck charge is approximately 11.7 times the electron charge. So… Well… The Planck charge seems to be something we can imagine at least.
What about the Planck mass? Well… Energy and mass are related through the mass-energy equivalence relationship (E = mc²). When you take care of the units, you should find that 2 gigajoules (i.e. the Planck energy unit) corresponds to a Planck mass unit (mP) of 2.1765×10⁻⁸ kg. Again, that’s huge (at the atomic scale, at least): it’s like the mass of an eyebrow hair, or a flea egg. But it’s, once again, something we can imagine at least. Let’s quickly do the calculations for the energy and mass of an electron, just to see what we get:
- Measured in our old-fashioned super-sized SI kilogram unit, the electron mass is me = 9.109×10⁻³¹ kg.
- The Planck mass is mP = 2.1765×10⁻⁸ kg.
- Hence, the electron mass expressed in Planck units is meP = me/mP = (9.109×10⁻³¹ kg)/(2.1765×10⁻⁸ kg) ≈ 4.185×10⁻²³.
Now, when we calculate the (equivalent) energy of an electron, we get the same number. Indeed, from the E = mc² relation, we know the mass of an electron can also be written as 0.511 MeV/c². Hence, the equivalent energy is 0.511 MeV (in case you wonder, that’s just the same number but without the 1/c² factor). Now, the Planck energy EP (in eV) is about 1.221×10²⁸ eV, so we get EeP = Ee/EP = (0.511×10⁶ eV)/(1.221×10²⁸ eV) ≈ 4.185×10⁻²³. So it’s the same number as the electron mass expressed in Planck units (up to rounding). That’s nice, but not all that spectacular either. When we equate c with 1, E = mc² simplifies to E = m, so we don’t need Planck units for that equality.
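A similar Python sketch covers the conversions in the last few paragraphs: the Planck energy in eV and kWh, the Planck mass, and the electron mass expressed in Planck units via both mass and energy. It assumes the CODATA 2018 values for ħ, G and the electron mass; the variable names are mine.

```python
import math

C = 299_792_458.0        # m/s (exact)
HBAR = 1.054571817e-34   # J·s (CODATA 2018)
G = 6.67430e-11          # m³/(kg·s²) (CODATA 2018)
EV = 1.602176634e-19     # J per eV (exact)
M_E = 9.1093837015e-31   # electron mass in kg (CODATA 2018)

m_p = math.sqrt(HBAR * C / G)    # Planck mass, kg
e_p_j = m_p * C**2               # Planck energy, J (E = mc²)
e_p_ev = e_p_j / EV              # Planck energy, eV
e_p_kwh = e_p_j / 3.6e6          # 1 kWh = 3.6e6 J

mass_ratio = M_E / m_p                       # electron mass in Planck units
energy_ratio = (M_E * C**2 / EV) / e_p_ev    # electron energy in Planck units

print(f"mP = {m_p:.4e} kg, EP = {e_p_j:.4e} J = {e_p_ev:.4e} eV = {e_p_kwh:.0f} kWh")
print(f"electron: {mass_ratio:.4e} (mass), {energy_ratio:.4e} (energy)")
```

The electron energy and the Planck energy both carry the same factor c². That is why the two ratios agree to floating-point precision.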
So what’s the point here? Well… We see the electron mass (or its energy), and its charge, emerge as some kind of gauge here, i.e. some kind of relative measure. That, in itself, is not really revolutionary. Indeed, if we’d use the electronvolt consistently as a measure for both mass and energy, we’d be doing the same. Having said that – and here’s the real advantage of these natural units – when we use them, we see some relations that we could not see so easily before. To be precise, we’ve got this ‘magical’ alpha number, the so-called fine-structure constant (α), about which I wrote in another post, so I’ll just present the basics here. The fine-structure constant relates all of the fundamental properties of the electron, thereby revealing a unity that, frankly, we struggle to understand. In that post of mine, I prove the following relationships:
(1) α is the square of the electron charge expressed in Planck units: α = eP². [I mentioned that the Planck charge is approximately 11.7 times the electron charge. To be precise, from this equation, it’s easy to see that the factor is 1/√α ≈ 11.706. You can also quickly check the relationship: qP = e/√α.]
(2) α is the square root of the ratio of (a) the classical electron radius and (b) the Bohr radius: α = √(re/r). [Note that this equation does not depend on the units, in contrast to equation 1 (above), and 4 and 5 (below), which require you to switch to Planck units. It’s the square root of a ratio and, hence, the units don’t matter: they fall away.]
(3) α is the (relative) speed of the electron in the ground state of the Bohr model of the hydrogen atom: α = v/c. [The relative speed is the speed as measured against the speed of light. Note that this equation does not depend on the units either, just like (2): we can express v and c in whatever unit we want, as long as we’re consistent and express both in the same units.]
(4) α is also equal to the product of (a) the electron mass and (b) the classical electron radius re (if both are expressed in Planck units, that is): α = me·re. And, of course, because the electron mass and energy are the same when measured in Planck units, we can also write: α = EeP·re.
From (2) and (4), we also get:
(5) The electron mass (in Planck units) is equal to me = α/re = α/(α²r) = 1/(αr). [So that gives us an expression, using α once again, for the electron mass as a function of the Bohr radius r (expressed in Planck units).]
Finally, we can substitute (1) in (5) to get:
(6) The electron mass (in Planck units) is equal to me = α/re = eP²/re. Using the Bohr radius, we get me = 1/(αr) = 1/(eP²r).
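Relations (1) to (5) can be verified numerically; relation (6) then follows by substitution. The Python sketch below makes three assumptions. It uses the CODATA 2018 values for ħ, G, ε₀ and the electron mass. It defines the Planck charge as qP = √(4πε₀ħc), which is what setting ke = 1 amounts to. And it takes v to be the electron’s speed in the ground state of the Bohr model. All five checks should print the same number as α.

```python
import math

C = 299_792_458.0           # speed of light, m/s (exact)
HBAR = 1.054571817e-34      # reduced Planck constant, J·s (CODATA 2018)
G = 6.67430e-11             # gravitational constant, m³/(kg·s²) (CODATA 2018)
E = 1.602176634e-19         # elementary charge, C (exact)
EPS0 = 8.8541878128e-12     # vacuum permittivity, F/m (CODATA 2018)
M_E = 9.1093837015e-31      # electron mass, kg (CODATA 2018)
K_E = 1 / (4 * math.pi * EPS0)   # Coulomb's constant ke

alpha = K_E * E**2 / (HBAR * C)          # fine-structure constant
r_e = K_E * E**2 / (M_E * C**2)          # classical electron radius, m
r_bohr = HBAR**2 / (K_E * M_E * E**2)    # Bohr radius, m
v = HBAR / (M_E * r_bohr)                # electron speed in the Bohr ground state, m/s

# Planck units: length, mass and charge (the charge unit assumes ke = 1)
l_p = math.sqrt(HBAR * G / C**3)
m_p = math.sqrt(HBAR * C / G)
q_p = math.sqrt(HBAR * C / K_E)          # = √(4πε₀ħc)

e_P = E / q_p            # electron charge in Planck units
m_eP = M_E / m_p         # electron mass in Planck units
r_eP = r_e / l_p         # classical electron radius in Planck units
r_P = r_bohr / l_p       # Bohr radius in Planck units

checks = {
    "(1) eP²": e_P**2,
    "(2) √(re/r)": math.sqrt(r_e / r_bohr),
    "(3) v/c": v / C,
    "(4) me·re": m_eP * r_eP,
    "(5) 1/(me·r)": 1 / (m_eP * r_P),
}
print(f"alpha        = {alpha:.10f}  (1/alpha = {1 / alpha:.3f})")
for name, value in checks.items():
    print(f"{name:12s} = {value:.10f}")
print(f"qP/e = {q_p / E:.3f}")           # 1/√α, the factor of about 11.7
```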
These relationships are truly amazing and, as mentioned, reveal an underlying unity at the atomic/sub-atomic level that we’re struggling to understand. However, at this point, I need to get back to the lesson. Indeed, I just wanted to jot down some ‘essentials’ here. Sorry I got distracted. 🙂
Popular and not-so-popular scientists like to talk about the Planck scale. They’ll say things like: “At this scale, the concepts of size and distance break down!” What do they mean by that? It means that, at this scale, the laws of physics yield nonsensical results. For example, a photon with a wavelength (λ) equal to the Planck length would have a frequency (ν) equal to ν = c/λ = (3×10⁸ m/s)/(1.6×10⁻³⁵ m) = 1.875×10⁴³ Hz.
Of course, you’ll wonder why we’d need a photon with a wavelength equal to the Planck length. Well… We need it to ‘see’ things. Indeed, the size of the things we can ‘see’ is determined by the wavelength of the ‘light’ that we use. The relationship is very ‘physical’ and, hence, very straightforward. The angular resolution of any image-forming device is of the order of θ ≈ λ/D, with λ the wavelength of the ‘light’ and D the diameter of the aperture or primary lens (i.e. the surface through which the ‘light’ comes in). [For a circular aperture, the Rayleigh criterion gives θ ≈ 1.22λ/D.] Now, I put the term ‘light’ in quotation marks because, when analyzing atomic or sub-atomic phenomena, we are likely to use an electron microscope. In that case, the ‘light’ is actually a particle beam itself, and λ is the de Broglie wavelength, i.e. the wavelength of the ‘matter wave’. In case you’re interested in the details behind that θ ≈ λ/D relation, see my posts on diffraction. [I must warn you though: while the relation is ‘pretty straight’, understanding the topic itself is not so easy.]
Back to that frequency: 10⁴³ Hz. What about it? Well… It’s simple: we just can’t imagine electromagnetic radiation with that frequency. The radiation with the highest energy (and, hence, the highest frequencies) in Nature is gamma (γ) radiation. Photons in gamma rays have typical energies of a few hundred keV and, occasionally, a few MeV. Using the Planck-Einstein relation (E = hν), that corresponds to frequencies in the range of 10¹⁹ Hz (i.e. 10 exahertz) to 10²¹ Hz (zettahertz). That’s incredibly high. Of course, Nature shows us even more remarkable stuff. Cosmic rays coming from extremely energetic explosions in distant galaxies (hypernovas, as they’re called) may contain bursts of photons with energies of up to 10³ TeV (tera-electronvolt). In fact, the HAWC gamma-ray observatory does not exclude that energy levels in the range of 10⁶ TeV or higher may exist.
Still, 10⁶ TeV (which is 10¹⁸ eV) corresponds to a frequency in the range of 10³² Hz ‘only’. You’ll say: so what? 10³² is pretty close to 10⁴³, isn’t it? No. It is not. Going from 10³² to 10⁴³ requires us to add another dozen zeroes or so. There’s a Great Desert in-between. [The ‘Great Desert’ is, indeed, the term that physicists use for energy scales between the TeV scale and the so-called GUT scale. GUT stands for Grand Unified Theory, as you know. The GUT scale is the energy level where the three non-gravitational forces (the electromagnetic force, the strong nuclear force, and the weak nuclear force) are supposed to become equal in strength and unify into one force only (at least that’s what GUTs claim).]
It’s really all too easy to be fooled by those astronomically large exponents. Just for the record, consider CERN’s Large Hadron Collider, which confirmed that the Higgs particle is a pretty plausible hypothesis and, hence, confirmed that the so-called Standard Model of physics is pretty fine. It gives particles TeV energy levels (like 7 TeV), but that’s still more than a hundred thousand times below the above-mentioned 10⁶ TeV level. As I wrote elsewhere, our mind is used to adding and subtracting things and, to some extent, can also deal with multiplication and division, but it’s hard to intuitively ‘get’ exponentiation. (I should add that logarithms, the inverse of exponentials, are equally difficult, if not more so.) Of course, we can easily visualize what squaring means (everyone can make a square out of a length), or what cubing a number means (everyone can visualize making a cube out of a length). But, frankly, if you think that you have any idea of what 10³² Hz actually means, you’re fooling yourself.
Back to the lesson. We don’t have a prefix to describe frequencies on the other side of the Great Desert. Just for fun, though, we can, of course, calculate the energy of the ‘photons’ of those theoretical ‘waves’ in the 10⁴³ Hz range: E = hν ≈ (4×10⁻¹⁵ eV·s)·(2×10⁴³ Hz) ≈ 8×10²⁸ eV, so it’s within an order of magnitude of the Planck energy unit. Again: so what? Well… Photons are supposed to be point-like. How can we pack so much energy into a dimensionless point? Even if we tried to fit it into one Planck length, its equivalent mass or energy would make that tiny little space an equally tiny black hole. So we get nonsensical stuff, about which I’ll let more popular writers babble. [To be precise, what counts, of course, is the energy that we try to pack into a space that’s equivalent to one wavelength, but let me refer you to another post of mine for the details.]
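The frequency and energy conversions around the Great Desert are easy to reproduce with ν = E/h. This Python sketch uses the exact SI value of h in eV·s and the CODATA 2018 values for the Planck length and energy. The example energies (300 keV, 5 MeV and 10¹⁸ eV) are simply picked from the ranges mentioned above.

```python
H_EV_S = 6.62607015e-34 / 1.602176634e-19   # h in eV·s (exact SI values)
C = 299_792_458.0                           # speed of light, m/s (exact)
PLANCK_LENGTH = 1.616255e-35                # m (CODATA 2018)
PLANCK_ENERGY_EV = 1.22089e28               # eV (CODATA 2018)

def freq_from_energy(e_ev):
    """Photon frequency ν = E/h, for a photon energy E given in eV."""
    return e_ev / H_EV_S

examples = [("300 keV gamma ray", 3e5), ("5 MeV gamma ray", 5e6),
            ("10^6 TeV photon", 1e18)]
for label, e_ev in examples:
    print(f"{label:18s} -> {freq_from_energy(e_ev):.1e} Hz")

nu = C / PLANCK_LENGTH          # a photon whose wavelength is one Planck length
energy = H_EV_S * nu            # E = hν in eV
print(f"lambda = lP: nu = {nu:.2e} Hz, E = {energy:.2e} eV, E/EP = {energy / PLANCK_ENERGY_EV:.2f}")
```

The last ratio comes out at about 2π ≈ 6.28. A photon with wavelength lP has energy hc/lP = 2π·ħc/lP, and ħc/lP is exactly the Planck energy.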
In fact, the ‘nonsensical stuff’ or, let me be polite, the contradictions in our model of the world, as modeled in physics, are not limited to the other side of the Great Desert. We get similar contradictions in the classical theory of electromagnetism, in which we assume that charged particles (protons and electrons) are pointlike. [If you want to know more, see the Note on point charges in one of my posts on the classical electromagnetic field theory.] Those contradictions are solved by accepting that protons and electrons are not really pointlike but that the approximation is good enough to do what we need to do. So it’s all a matter of scale indeed and, who knows, perhaps one day we’ll be able to solve the contradictions involved by assuming hidden or rolled-up dimensions. The problem this time, however, is that it’s not easy to see how we’ll ever be able to prove such theories. Indeed, quantum mechanics, viewed as a theory that ‘fixed’ the theoretical problems of classical physics, was backed up by experimental evidence. We won’t have such evidence for any of those GUTs.
That’s why my own interest in trying to understand what’s beyond the Great Desert has diminished substantially: just like Feynman, I’ve started to wonder whether it’s any use to think about things we will never be able to verify by experiment. As I noted in one of my posts (Loose Ends), the Standard Model looks complicated and, hence, it does not give great aesthetic pleasure – in my humble opinion, that is – but it is what it is, and with the confirmation of the Higgs hypothesis in 2012, there’s maybe not all that much left to discuss for physicists.
So… Back to quantum mechanics.
2. Mechanics, like in Quantum Mechanics 🙂
The second word in quantum mechanics is mechanics, commonly defined as “the branch of applied mathematics dealing with motion and forces producing motion.” Now, IMHO, that aspect of quantum mechanics is something that does not get enough attention in popular accounts of what quantum physics is supposed to be about.
One reason for this deficit of attention may be that it involves statistics indeed. Now, if you’re a statistician, you’ll find statistics the most fascinating branch of mathematics. If you’re not a statistician, you’re likely to find it the most boring one. 🙂 Whatever one’s appreciation of the topic, I feel I only started understanding something of quantum mechanics as I gradually understood we’re talking a different type of statistics in quantum mechanics indeed. To put it simply: statistics is about probabilities, and quantum mechanics involves different rules for calculating those probabilities.
Now, I am sure you’ve heard that before. In fact, if you’re reading this blog at all, you are very likely to already know these ‘different rules’ for calculating probabilities. But to what extent do you truly understand them? Let’s first have a look again at how classical mechanics deals with uncertainty, randomness and probabilities. In other words, let’s have a look at ‘classical’ statistics. Sorry for that, but I am just taking you through my own journey here. 🙂
You know the random walk. We toss a coin repeatedly: if it’s heads, we take a step forward, but if it’s tails, we take a step back. On the one hand, we expect to make zero progress. On the other, we also expect that, as we keep tossing that coin, we would, somehow, stray from the starting point, and stray farther and farther as we keep tossing that coin and go back and forth accordingly. These two intuitions contradict each other. So what would be the actual outcome?
Of course, because you’re reading this (and, hence, you’re probably smarter than the average person), I must assume you already know that the second intuition is true: we will stray away from the starting point. However, let’s do it the right way and calculate it.
We need to introduce some elementary statistical terms here. Let’s denote the number of times we’re tossing (and, hence, the number of steps we’re taking) by the variable N. So N is some positive integer: 1, 2, 3 etcetera. Now, progress would be measured by the so-called root-mean-square distance, which we’ll denote by Drms.
Don’t switch off now. Just bear with me for a while. In general, the root-mean-square (aka the quadratic mean) of a set of values is the (square) root of the mean of the squares of the values. So it’s the following formula:
$$x_{\text{rms}} = \sqrt{\frac{x_1^2 + x_2^2 + \dots + x_n^2}{n}}$$
I should immediately note that we’re only interested in the positive root (+√) here. So, yes, we conveniently forget about the negative root (−√). Why? Because we’re talking magnitudes here, such as a distance or a length, for example. Magnitudes are positive numbers, always, and, hence, it’s only the positive root that counts indeed.
Of course, when you see that formula, the most obvious question is: why square those numbers if it’s only to take the square root of it all again? Well… To measure the magnitude (or length, if you prefer that term) of a varying quantity, that’s what we need to do: the various $x_i$ values can be positive or negative, and we want the negative values to count too when calculating the (average) magnitude. So we cannot just take the arithmetic mean, i.e. $(x_1 + x_2 + \dots + x_n)/n$: the arithmetic mean lets positive and negative values cancel out. For example, the arithmetic mean of –3, –1, 1 and 3 is zero and, if you think this example is too obvious, calculate the arithmetic mean of –7, –2, 4 and 5. And? What do you get? 🙂 In short, the arithmetic mean does not measure some average magnitude. Hence, we first have to square the $x_i$ values, then take the average (by summing those squared values and then dividing the result by n), and then we take the (square) root again.
You should work out a few examples to convince yourself of the logic. For example, take –2, 4, –5 and 7: the average of the squared values is $[(-2)^2 + 4^2 + (-5)^2 + 7^2]/4 = [4 + 16 + 25 + 49]/4 = 94/4 = 23.5$. Hence, $x_{\text{rms}}$ is √23.5 ≈ 4.84768. 4.84768? Yes. If you look at the values, you may think it should be 4.5, but… No. 4.5 is the absolute mean of this short series, so that’s the average of the absolute values: [|–2| + |4| + |–5| + |7|]/4 = [2 + 4 + 5 + 7]/4 = 18/4 = 4.5.
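If you want to check these numbers yourself, a few lines of Python will do. This is just a sketch: the helper names (arithmetic_mean, mean_absolute, rms) are my own, not any library’s.

```python
import math


def arithmetic_mean(xs):
    """Plain average: positive and negative values can cancel out."""
    return sum(xs) / len(xs)


def mean_absolute(xs):
    """Average of the absolute values (the 'absolute mean')."""
    return sum(abs(x) for x in xs) / len(xs)


def rms(xs):
    """Root-mean-square: square, take the mean, then take the positive root."""
    return math.sqrt(sum(x * x for x in xs) / len(xs))


values = [-2, 4, -5, 7]
print(arithmetic_mean(values))          # 1.0
print(mean_absolute(values))            # 4.5
print(round(rms(values), 5))            # 4.84768
print(arithmetic_mean([-7, -2, 4, 5]))  # 0.0
```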
Why do we have different ways of calculating the mean (arithmetic mean, quadratic mean, absolute mean…)? Because each serves a different purpose. Indeed, if you’re a critical reader (which you are, without any doubt), you’ll say: I see why the arithmetic mean is no good, but why not take the absolute mean? While the difference between 4.5 and 4.84768 is not enormous, the squaring gives more weight to the larger values and, hence, it does produce a different result.
The answer to this question is twofold. First, the quadratic mean is easier to work with from a math point of view: if we have to take the mean of some function (e.g. a sinusoidal function, which we encounter a lot in physics and engineering), then we’ll have to do some integral, and integrating an absolute value function is usually a lot more work than integrating the square of the function we’re looking at. For example, calculating the rms value of $f(\theta) = A\sin\theta$ over one cycle (which amounts to integrating $A^2\sin^2\theta$) is pretty easy to do (if you want to know how exactly, let me just give you a link here) and yields a very simple result: $A/\sqrt{2} \approx 0.707 \cdot A$. Doing the same for the absolute value function requires us to determine where the quantity inside the absolute value bars is negative and where it is positive, and then we need to break up the integral accordingly.
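A quick numerical check of that $A/\sqrt{2}$ result, as a sketch: it uses plain midpoint-rule integration and an arbitrary amplitude A = 3 (my choice). It also shows that the mean absolute value of a sine wave works out to $2A/\pi \approx 0.637 \cdot A$, which is not the same as the rms value.

```python
import math


def rms_over_cycle(f, n=100_000):
    """RMS of f over one cycle [0, 2π), using the midpoint rule."""
    h = 2 * math.pi / n
    mean_square = sum(f((k + 0.5) * h) ** 2 for k in range(n)) * h / (2 * math.pi)
    return math.sqrt(mean_square)


def mean_abs_over_cycle(f, n=100_000):
    """Mean of |f| over one cycle [0, 2π), using the midpoint rule."""
    h = 2 * math.pi / n
    return sum(abs(f((k + 0.5) * h)) for k in range(n)) * h / (2 * math.pi)


A = 3.0  # arbitrary amplitude


def wave(theta):
    return A * math.sin(theta)


print(rms_over_cycle(wave), A / math.sqrt(2))      # both about 2.12132
print(mean_abs_over_cycle(wave), 2 * A / math.pi)  # both about 1.90986
```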
A more important reason for preferring the rms value, however, is that, in physics and engineering, we’ll often be interested in the energy or power (i.e. energy per second) being delivered by a wave, and energy (E) happens to be proportional to the square of the amplitude (A) of the wave: $E \propto A^2$. In that case, squaring the rms value (i.e. just using the mean of the squared values) immediately gives us what we want, while squaring the absolute mean doesn’t. It’s really not the place to go into any more detail here, so I won’t.
An important point to note (because we’ll use it later) is that the rms formula above is closely related to the Pythagorean theorem in two-, three- or, in general, n-dimensional space. The theorem is valid in any space, as long as it’s Euclidean, not curved (in case you want to read a bit about curved space, see my post on it). As I mentioned already, distance is an example of a magnitude and, in three-dimensional space, the distance between two points $(x_1, y_1, z_1)$ and $(x_2, y_2, z_2)$ will be measured as:
$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}$$
But so there is no 1/n factor in this formula? No. I said the formulas were related, but they are not the same. I should get back to the original problem here, and that’s our random walk. So we’re not trying to find the distance between two points but the distance from the origin after we’ve taken N steps. That’s something different.
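To see how the two formulas relate, here’s a small sketch (the point coordinates are arbitrary). The distance is the square root of the sum of the squared coordinate differences, while the rms divides that sum by n before taking the root, so the distance equals √n times the rms of the coordinate differences.

```python
import math


def rms(xs):
    """Root-mean-square of a list of numbers."""
    return math.sqrt(sum(x * x for x in xs) / len(xs))


def distance(p, q):
    """Pythagorean (Euclidean) distance between two points in n dimensions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


p, q = (1, 2, 3), (4, 6, 15)
diffs = [b - a for a, b in zip(p, q)]  # [3, 4, 12]

print(distance(p, q))                      # 13.0
print(math.dist(p, q))                     # 13.0 (standard library, Python 3.8+)
print(math.sqrt(len(diffs)) * rms(diffs))  # about 13.0: the 1/n factor undone
```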
First, we should note that we’re not dealing with actual values here: we’re trying to find expected values and probabilities. That’s what statistics is all about. An expected value can be defined, rather intuitively, as a long-run average value, so it involves the calculation of some kind of limit: if we were to do the experiment a zillion times, we would probably get a pretty precise value for our expected value.
However, in real life, we can’t do the experiment a zillion times: that’s too expensive, or we don’t have the time, or (most probably) it’s simply impossible. Hence, a good course in statistics will teach you how to estimate the mean of a value within some interval, usually referred to as the confidence interval, based on a limited number of observations and/or historical data. However, that’s tough stuff with which I won’t bother you here. 🙂 In fact, in this particular case, we’re lucky: we can use a very easy formula for our random walk, which I’ll just jot down here (so without showing you how we get this):
$$\langle D^2 \rangle = N, \quad \text{so} \quad D_{\text{rms}} = \sqrt{\langle D^2 \rangle} = \sqrt{N}$$
That formula gives us the square root graph below. It shows that progress becomes exceedingly slow as N increases. Initially, we do make some progress: the root-mean-square distance is √1 = 1 after one step, √2 ≈ 1.414 after two steps, √3 ≈ 1.732 after three, and so on. However, after ten steps, we should expect to be only √10 ≈ 3.16 units of distance away from our origin and, after a hundred steps, it’s only 10 units. Finally, in order for our random walk to take us an expected 100 units away from our origin, we’d have to take no less than $100^2 = 10{,}000$ steps. As for the formula above, please note the use of the special brackets, 〈 and 〉, to denote the expected value of $D^2$, as opposed to the actual value (i.e. the value we get each time we do such a walk).
As mentioned, I just gave you the formula. I conveniently skipped over some of the intricacies involved when dealing with expected values (if you’re interested in the detail, you’ll find it here). The gist of the matter is that we started off with an entirely random process (taking a step forward or backward after tossing a coin) but that, while the process is entirely random, we have a very precise (i.e. definite, unambiguous) mathematical formula for the expected value of the distance covered. More in particular, we found that $D_{\text{rms}} = \sqrt{N}$.
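If you’d rather see that formula at work than take my word for it, a Monte Carlo simulation is a simple way to do so: play many coin-toss walks and average $D^2$ over them. This is a sketch; the function name, the number of walks and the seed are arbitrary choices of mine. Because the walks are random, the estimates will scatter a bit around N rather than hit it exactly, except for N = 1, where $D^2$ is always 1.

```python
import math
import random


def mean_squared_distance(n_steps, n_walks, seed=42):
    """Estimate <D²> for a coin-toss random walk of n_steps steps.

    Each walk starts at 0 and moves +1 (heads) or -1 (tails) per toss.
    """
    rng = random.Random(seed)  # seeded, so the run is reproducible
    total = 0
    for _ in range(n_walks):
        d = sum(rng.choice((-1, 1)) for _ in range(n_steps))
        total += d * d
    return total / n_walks


for n in (1, 10, 100):
    msd = mean_squared_distance(n, n_walks=5_000)
    print(n, msd, math.sqrt(msd))  # <D²> close to N, D_rms close to √N
```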
Now, that’s exactly what mechanics is about. Classical or quantum mechanics, it doesn’t matter: based on both experiment as well as theoretical arguments, we get probability density functions, which tell us what we should expect to happen, or where we should expect some particle to be in space and in time. These probability density functions often have a shape which you’ll surely recognize: the bell-shaped curve, i.e. the so-called normal or Gaussian distribution, whose general shape is shown below for various values of its mean or average (denoted by μ) and various values for the so-called standard deviation (denoted by σ).
The standard deviation is a measure for the dispersion from the average: we can find about 68.3% of all values for x within one standard deviation from the average, i.e. in the interval [μ − σ, μ + σ], and about 95.4% of all values in the interval μ ± 2σ, which is why that interval is often used as a rough 95% interval (strictly speaking, the interval that contains exactly 95% of the values is μ ± 1.96σ). I’ll also give you the exact formula for the normal distribution. Not because I want to show off but because it’s a distribution which we stumble upon everywhere (and surely in physics), and we’ll use it in this post too. For example, each velocity component of the particles of a gas in a bottle follows a normal distribution. In fact, much of what is generally referred to as statistical mechanics relies on it, so let me just jot it down here:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
It’s nothing to be afraid of. Just look at it for a while. Indeed, understanding is all about slow reading, so please read very slowly here. 🙂 I’ve explained the μ and σ variables above already (and make sure you understand their impact on the shape of the bell curve by looking at those graphs). You obviously also know that e is just some real number, referred to as Euler’s number. It’s a bit special but, when everything is said and done, e is just some number: it’s approximately 2.71828… So the function above is just an exponential function with e as its base. It’s referred to as a natural exponential because of the ‘natural’ constant e. Indeed, you may think that Euler’s number doesn’t look very natural but, just like h, e pops up in many equations in physics. It’s a pretty fundamental mathematical constant, just like π, and, hence, you’d better get acquainted with it if you want to move forward.
Let’s get back to the formula for the bell curve. First, note the ‘trick’ with the minus sign and the square in the exponent. Indeed, if you rewrite $-(x-\mu)^2/(2\sigma^2)$ as $-\tfrac{1}{2}\left[(x-\mu)/\sigma\right]^2$, you can see what’s going on here: the exponent is a quadratic function itself, with a minus sign in front (and a factor 1/2 = 0.5). It has to be: anything else would not give us a normal distribution, as illustrated below: it’s only the blue graph that makes sense as a functional form for (normally distributed) probabilities. So… Yes. We really need the minus sign and the square in the exponent.
But why such a complicated exponent? Well… That has to do with normalization. Indeed, you may or may not remember, from whatever course you ever had in (or related to) statistics, that measurements are usually standardized by transforming them into so-called z values (or z-scores): $z = (x - \mu)/\sigma$. So that’s what we’ve got inside the brackets of that $-\tfrac{1}{2}\left[(x-\mu)/\sigma\right]^2$ exponent. But what about the 1/2 factor in the exponent, and what about the $1/(\sigma\sqrt{2\pi})$ factor in front of the whole thing?
Well… That’s also linked to the so-called normalization condition: all probabilities should add up to 1. Indeed, when everything is said and done, something has to happen, or our particle has to be somewhere. That’s what it amounts to. For example, we may be dealing with probabilities expressed as a function of some angle θ. Let’s make the example even more specific and consider an experiment in which two particles (a and b) are made to collide with each other and, as a result, go off in two opposite directions (1 and 2), characterized by an angle θ and π–θ respectively, as shown below. [The choice of this example is, of course, not random: we’ll look at it later again. Hence, you should not skip this section.] So, in this case, we will have a probability density function P(θ) with θ (i.e. the argument of P) ranging from −π to +π (or from 0 to 2π, if you want to measure angles differently).
Now, let’s assume the likelihood of the particles being scattered in one or the other direction is constant. This assumption is actually not very realistic but I’ll just use it here to demonstrate what normalization is about. So we have P(θ) = k, with k some (positive) real number. What number exactly? Well… We know that the integral (remember: an integral is an infinite sum) of our probability density function (that’s what P(θ) is: a probability density function) over its domain will have to be equal to 1. Hence, ∫P(θ)dθ = ∫k dθ taken over the [–π, +π] interval must be 1. Now, the indefinite integral (aka the antiderivative or primitive function) is kθ, so the definite integral is $k\theta\,\big|_{-\pi}^{+\pi} = k\pi - k(-\pi) = 2k\pi$. The normalization condition implies that this integral should equal 1, so 2kπ must equal 1. Therefore, the constant k must be equal to 1/(2π). [Imagine a rectangle with a base equal to 2π (i.e. the length of the interval [–π, +π]) and height 1/(2π): its surface area is 2π × 1/(2π) = 1.]
So that’s what normalization implies. It ensures all probabilities add up to one or, what amounts to the same, that the surface area under the curve is equal to one. In case you’re still doubting that the $(x-\mu)/\sigma$ and the 1/2 factor in the exponent, and the $1/(\sigma\sqrt{2\pi})$ factor in front of the exponential, serve that purpose, you can check this yourself by integrating the curve over the whole real domain, so that’s from x = –∞ to x = +∞.
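You can also do that check numerically. The sketch below (plain Python; the helper names and the values of μ and σ are my own arbitrary choices) integrates the normal density with the midpoint rule, shows that dropping the 1/2 in the exponent breaks the normalization, confirms the constant density k = 1/(2π) on [–π, +π], and uses the error function to recover the one- and two-σ percentages of the bell curve.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density: exp(-(1/2)·z²) / (σ√(2π)), with z = (x − μ)/σ."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def integrate(f, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h


mu, sigma = 1.5, 0.7  # arbitrary example values
lo, hi = mu - 12 * sigma, mu + 12 * sigma  # tails beyond ±12σ are negligible

area = integrate(lambda x: normal_pdf(x, mu, sigma), lo, hi)


def broken(x):
    """Same prefactor, but without the 1/2 in the exponent."""
    z = (x - mu) / sigma
    return math.exp(-z * z) / (sigma * math.sqrt(2 * math.pi))


broken_area = integrate(broken, lo, hi)

# Constant density k = 1/(2π) on [−π, +π], as in the scattering example.
uniform_area = integrate(lambda t: 1 / (2 * math.pi), -math.pi, math.pi)

# Share of values within one and two standard deviations of the mean.
within_1_sigma = math.erf(1 / math.sqrt(2))
within_2_sigma = math.erf(2 / math.sqrt(2))

print(area, broken_area, uniform_area)  # about 1.0, 0.7071 (= 1/√2), 1.0
print(within_1_sigma, within_2_sigma)   # about 0.6827 and 0.9545
```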
Now, you may think that this should surely be sufficient as an introduction to statistics. You’re right. It is. The point to note is that uncertainty can be modeled: somehow, uncertainty obeys rules and can be captured in unambiguous mathematical formulas, such as that formula for the normal distribution above. As Feynman notes (just like many physicists before him did): that makes the debate on the so-called indeterminacy of quantum mechanics a purely ‘philosophical’ one, in a pejorative sense. I haven’t come across a better summary of the issue at hand here than Feynman’s summary of the ‘philosophical implications of quantum mechanics’, and so I’ll just give the floor to The Master here:
OK. Let’s go back to the more mundane business of modeling uncertainty.
Uncertainty in classical mechanics
As Feynman pointed out above, those who claim that there’s no uncertainty in classical mechanics are wrong. The standard example which is used to describe the difference between quantum mechanics and classical mechanics illustrates uncertainty (or randomness) for both. You know that experiment: it’s the ubiquitous double-slit experiment, and the illustration below shows it for ‘bullets’, i.e. classical particles. [Frankly, I never quite understood why American scientists like the example of a gun spraying bullets to illustrate classical mechanics. Why don’t they use a baseball pitching machine? In fact, I’ll use a pitching machine later on, but one that fires billiard balls instead of baseballs (I need elastic collisions, so I can’t use baseballs). While billiard balls can do a lot of damage too, they’re less lethal than bullets, aren’t they? 🙂]
Section (a) of the diagram shows the set-up, while sections (b) and (c) give the probabilities. More specifically, $P_1$ gives the probability of a bullet going through slit 1 and then hitting the backstop at point x, while $P_2$ gives the probability of a bullet going through slit 2 and then also hitting the backstop, at the same point x.
Now, the $P_{12}$ graph in section (c) gives the probability of a bullet going through either slit as a function of x. The point to note is that there is considerable uncertainty here: we don’t know where a specific bullet is going to end up; we only have two probability density functions around some mean value μ and with some standard deviation σ.
Frankly, analyzing the mechanics behind the trajectory of those bullets hitting the edges of the slit cannot be easy: ricochet effects are hard to predict. Hence, I won’t try it. 🙂 Just note that the $P_1$ and $P_2$ distributions make sense: the straightest trajectory is associated with the highest probability density and so we can effectively associate these curves with a baseball pitching machine. 🙂
Jokes aside, the point here is not the exact shape of P1 and P2: the point is that the combined P12 function shows no interference effect, as opposed to the P12 graph in the double-slit experiment with electrons, shown below.
The P12 graph above does show interference, as evidenced by the re-occurring maxima and minima. So that’s an interference pattern, just like the pattern generated when sending light (electromagnetic waves) through two slits.
Let me make another little digression here, so you have some idea about the physics involved. In one of my many posts on diffraction and interference, I explained how the experiment crucially depends on the width of the slits. Actually doing the experiment with electrons requires much finer slits: the two slits which were used in a 2012 experiment were each 62 nanometers wide (that’s $62 \times 10^{-9}$ m!), and the distance between them was 272 nanometers, or 0.272 micrometer. In contrast, the original interference experiment with light, as carried out by Thomas Young more than 200 years earlier, involved a note card (which Young used to split ordinary sunlight entering his room through a pinhole in a window shutter) with a thickness of 1/30 of an inch, or approximately 0.85 mm. Hence, in essence, that was a double-slit experiment with the two slits being separated by a distance of almost 1 millimeter!
Back to the lesson. The small print in the illustration above shows how the interference effect is being explained in quantum mechanics. We associate the electrons with wave functions, or probability amplitudes, denoted by $\Phi_1$ and $\Phi_2$ respectively, and we know we should not add the probability functions $P_1$ and $P_2$ to get $P_{12}$, as we did for the classical case, but the wave functions $\Phi_1$ and $\Phi_2$; then we take the square of the absolute value of the resulting sum and we’re done. So we write: $P_{12} = |\Phi_1 + \Phi_2|^2$. [Note that the ‘square of the absolute value’ is a cumbersome term and, hence, is usually just referred to as the absolute square, and I’ll also use that shorthand.]
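To see what adding amplitudes instead of probabilities does, here’s a toy sketch, not a model of the actual slit geometry: I simply assume two amplitudes of equal magnitude with a phase difference δ between them (in the real experiment, δ would depend on the position x on the backstop). The classical sum stays at 2 whatever δ is, while the absolute square of the summed amplitudes swings between 4 and 0. The difference between the two is the cross term $2\,\mathrm{Re}(\Phi_1^*\Phi_2)$.

```python
import cmath
import math


def p12_classical(phi1, phi2):
    """Add probabilities: P1 + P2, with P = |Φ|²."""
    return abs(phi1) ** 2 + abs(phi2) ** 2


def p12_quantum(phi1, phi2):
    """Add amplitudes first, then take the absolute square."""
    return abs(phi1 + phi2) ** 2


def interference_term(phi1, phi2):
    """Cross term 2·Re(Φ1*·Φ2): what the quantum rule adds to P1 + P2."""
    return 2 * (phi1.conjugate() * phi2).real


# Hypothetical amplitudes: equal magnitude, phase difference delta.
for delta in (0.0, math.pi / 2, math.pi):
    phi1 = cmath.exp(1j * 0.0)
    phi2 = cmath.exp(1j * delta)
    print(round(delta, 3),
          round(p12_classical(phi1, phi2), 6),
          round(p12_quantum(phi1, phi2), 6),
          round(interference_term(phi1, phi2), 6))
```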
If you’re reading this, you’ve probably read a dozen easy or not-so-easy books on quantum mechanics already, and so you’re now just solemnly nodding your head: “Yes. These are the strange rules of quantum mechanics. If we can distinguish one or another alternative, in practice or in principle, then we have to add probabilities. If we can’t (in practice or, more importantly, in principle), then we have to add amplitudes and then take the absolute square of the sum to find the probability.”
Let me shake you up. Electrons are fermions and, hence, one should not simply add the two wave functions, but subtract them: $P_{12} = |\Phi_1 - \Phi_2|^2$. That $P_{12} = |\Phi_1 + \Phi_2|^2$ formula in that ubiquitous diagram is wrong. Plain wrong.
Huh? But… That can’t be true. […] Euh… […] Well… […] In any case, if you subtract these functions, then they disappear, don’t they?
No, they don’t. Just like our normally distributed probability density functions, we are talking wave functions with a very specific functional form: they must be solutions for the wave equation (i.e. the Schrödinger equation) and, hence, they must be complex exponentials. So it’s not like subtracting real numbers that are almost the same–which is what P1 and P2 are: real numbers that are almost the same and so, yes, if we’d subtract one from another, we’d end up with almost nothing, as shown below (the green curve is the difference between the blue and red probability curve).
Huh? Are you awake now? Do I have your attention? Just relax. I’ll explain complex numbers and complex-valued functions and why we need to subtract these wave functions (instead of adding them), for electrons at least, in a minute. Before I proceed, however, I should note that these complex-valued wave functions are otherwise (i.e. apart from the fact they’re complex-valued instead of real-valued, which is a pretty crucial difference, of course) quite similar to real-valued probability density functions.
Indeed, these Φ (phi) or Ψ (psi) functions are also functions of the position x, or of the angle θ (note, indeed, that the position x in the diagrams above can also be identified with some angle θ from the normal to the backstop), or of whatever variable we choose – just like these probability functions. Hence, for each value of x (or of θ), these functions will yield one, and only one, ‘value’ f(x), or f(θ) if we’d work with angles. The only difference – but it’s a crucial difference – is that this value will be some complex number, instead of some real number. Hence, the key difference between probability amplitudes and probabilities is that amplitudes are complex numbers, while probabilities are… Well… Ordinary numbers (i.e. the ones you learned about in school). [And then the other thing, of course, is that we calculate those probabilities from the amplitudes by taking the absolute square.]
That’s why one cannot understand quantum mechanics if one isn’t familiar with complex numbers and complex-valued functions. And when I say ‘familiar’, I mean it: you need to be able to play with them – at least a little bit. 🙂
Frankly, I think that’s the point where most of these popular books on quantum mechanics fail: they compromise on this point. They should not. Why? Because understanding complex numbers (and then I mean, truly understanding them) is not so difficult, and it’s a shame that our educational systems expose us to them only at a very late stage, if at all. So, here we go. [If you think that all of the above was optional reading, the section below surely is not.]
Complex numbers and complex exponentials
A complex number is a vector, or an ‘arrow’, as Feynman puts it in his more popular Lectures on QED. It has an x and a y coordinate, which we refer to as the real and imaginary part respectively. Hence, the x-axis becomes the real axis and the y-axis becomes the imaginary axis (as shown below) and, instead of writing (a, b), we write a + bi (or a + ib). So the plus sign (+) here is not used to add two real numbers a and b, but combines them into one two-dimensional number, so to say, marking a as the real part and b as the imaginary part.
You can also think of a + ib as the addition of two perpendicular vectors: one of length a along the real axis and one of length b along the imaginary axis. In addition, it’s also good to look at ib as an actual product of two factors: (i) b, so that’s a real number, and (ii) i, so that’s the imaginary unit. Now, multiplying ‘something’ by i amounts to rotating it by 90 degrees (counterclockwise). With ‘something’, I mean some other complex number, or some real number, like b. Multiplying a real number twice by i amounts to rotating it by 180 degrees, which makes it negative: $i \cdot i \cdot b = i^2 b = -b$. So that’s a bit of an explanation of that rather weird ‘definition’ of the imaginary unit: $i^2 = -1$.
We can think of vectors as numbers having not only some magnitude but also some direction, which is exactly how we should think of complex numbers too: they are numbers that do not only have a magnitude, given by their absolute value (or norm) r (which we calculate using the Pythagorean theorem), but also a direction, given by some angle φ, as shown in the diagram below.
The diagram above not only introduces the so-called polar notation for complex numbers ($z = re^{i\varphi}$) but also another concept: the complex conjugate which, if $z = x + iy$, is equal to $z^* = x - iy = re^{-i\varphi}$. [I can’t use the overline with this editor, so I’ll write the complex conjugate of z as z*.]
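If you have Python at hand, its built-in complex type and the standard cmath module let you play with these ideas directly. A minimal sketch (3 + 4i is just an arbitrary example). Note that Python writes the imaginary unit as j, the engineers’ convention, rather than i.

```python
import cmath
import math

z = 3 + 4j                       # real part 3, imaginary part 4

r, phi = cmath.polar(z)          # norm r and angle φ (in radians)
print(r, math.degrees(phi))      # 5.0 and about 53.13 degrees
print(r * cmath.exp(1j * phi))   # back to (approximately) 3 + 4i

print(z.conjugate())             # (3-4j): same norm, angle −φ
print(r * cmath.exp(-1j * phi))  # the same conjugate, in polar form

print(1j * z)                    # (-4+3j): z rotated by 90° counterclockwise
print(1j * 1j)                   # (-1+0j): i² = −1
```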
Now, complex numbers have a lot of other interesting properties. However, it takes a fair amount of time and space to explain these. As I’ve used that time and space in my other posts, I won’t repeat myself here. However, I do think you should read my post on Euler’s formula, because that’s a formula which we’ll use repeatedly:
$$e^{i\theta} = \cos\theta + i\sin\theta \quad\text{and, hence,}\quad re^{i\theta} = r\cos\theta + i\,r\sin\theta = r(\cos\theta + i\sin\theta)$$
The mentioned post takes the magic somewhat out of this stunning formula by showing how we can construct this function algebraically (we’re looking at $e^{i\theta}$ here as a function with θ as its argument, rather than just some (complex) number). With an algebraic construction, I mean we can get this function just by adding and multiplying numbers. In the process, that post also explains the basics of complex-valued exponential functions, which are very different from real exponentials. Indeed, unlike real exponential functions, complex exponentials are periodic functions with two components: a cosine (cosθ) and a sine (sinθ). So that little i in the exponent truly does make a difference. An enormous difference: you should think of $e^{\theta}$ and $e^{i\theta}$ as two entirely different beasts.
Just to emphasize the difference, I give you the graph of both below. Look at these graphs: $e^{\theta}$ is the real-valued exponential function, while $e^{i\theta}$ is the complex-valued exponential function. The complex-valued function has two components, so to say: a real part and an imaginary part. Both depend on the value of the argument θ and, because they’re a cosine and sine respectively, the imaginary part follows the real part with a phase delay of π/2 (i.e. 90°). Indeed, sinθ = cos(π/2 – θ) = cos(θ – π/2). I shouldn’t say this (because it’s an extreme simplification and mathematicians would be disgusted by my lack of accuracy here) but, if you want, you could think of $e^{i\theta}$ as some kind of two-dimensional sinusoidal function.
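A numerical spot check of these claims, as a sketch (the sample angles are arbitrary, and checking a few points illustrates the formulas rather than proves them): Euler’s formula, the 2π periodicity of $e^{i\theta}$, the π/2 lag of the sine behind the cosine, and the contrast between $e^{\theta}$, which keeps growing, and $e^{i\theta}$, whose norm stays 1.

```python
import cmath
import math

thetas = [k * 0.25 for k in range(-20, 21)]  # sample angles from -5 to +5 radians

# Euler's formula: e^(iθ) = cos θ + i·sin θ
euler_error = max(abs(cmath.exp(1j * t) - complex(math.cos(t), math.sin(t)))
                  for t in thetas)

# Periodicity: e^(i(θ + 2π)) = e^(iθ)
period_error = max(abs(cmath.exp(1j * (t + 2 * math.pi)) - cmath.exp(1j * t))
                   for t in thetas)

# The imaginary part lags the real part by π/2: sin θ = cos(θ − π/2)
lag_error = max(abs(math.sin(t) - math.cos(t - math.pi / 2)) for t in thetas)

print(euler_error, period_error, lag_error)  # all tiny (rounding errors only)
print(math.exp(5.0), abs(cmath.exp(5j)))     # e^5 is about 148.4, |e^(5i)| about 1
```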
Now, as we’re talking sine and cosine functions, we should also refresh our knowledge of trigonometric identities. Indeed, adding or subtracting complex exponentials amounts to adding (or subtracting) the sines and cosines involved and, hence, when opposite angles (θ and π − θ) are involved (I told you that example above was not innocent, and that I would come back to it), or when complex conjugates ($e^{i\theta}$ and $e^{-i\theta}$) are involved, we can simplify operations substantially:
1. When opposite angles are involved (θ and π–θ), we should note the following identity:
$$e^{i\theta} + e^{i(\pi-\theta)} = e^{i\theta} + e^{i\pi}e^{-i\theta} = e^{i\theta} - e^{-i\theta}$$
So that sum is reduced to the difference of two complex conjugates, which we’ll reduce even further fifteen seconds from now. Likewise, we have:
$$e^{i\theta} - e^{i(\pi-\theta)} = e^{i\theta} - e^{i\pi}e^{-i\theta} = e^{i\theta} + e^{-i\theta}$$
So that’s the sum of two complex conjugates, which we’ll also reduce even further, ten seconds from now.
[If you’re perplexed by these formulas, I can’t help it. You’ll just have to go online and look things up. Just study complex numbers. It takes a couple of days only (hours sounds too optimistic, I am afraid), and it’s a great investment. The $e^{i(\pi-\theta)} = e^{i\pi - i\theta} = e^{i\pi}e^{-i\theta}$ identity is the general $e^{a+b} = e^a e^b$ rule that we also have for real exponentials. Indeed, you should remember this at least from your school days: $x^{a+b} = x^a x^b$. As for the $e^{i\pi} = -1$ identity which I am using, that follows straight from Euler’s formula, but you can also verify it by looking at that diagram introducing the polar notation for complex numbers.]
2. For complex conjugates, we have:
$$e^{i\theta} + e^{-i\theta} = \cos\theta + i\sin\theta + \cos(-\theta) + i\sin(-\theta) = \cos\theta + i\sin\theta + \cos\theta - i\sin\theta = 2\cos\theta$$
Here, I use the trigonometric identities cos(–θ) = cosθ and sin(–θ) = –sinθ. Likewise (using the same identities), $e^{i\theta} - e^{-i\theta}$ reduces to:
$$e^{i\theta} - e^{-i\theta} = \cos\theta + i\sin\theta - \cos(-\theta) - i\sin(-\theta) = \cos\theta + i\sin\theta - \cos\theta + i\sin\theta = 2i\sin\theta$$
These simplifications will come in very handy later, so please do take note of them.
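A numerical spot check of these four results, as a sketch (the sample angles are arbitrary; this checks the identities at a few points, it does not prove them, and expi is just a local shorthand of mine for $e^{i\theta}$).

```python
import cmath
import math


def expi(theta):
    """Shorthand for the complex exponential e^(iθ)."""
    return cmath.exp(1j * theta)


for theta in (0.3, 1.0, 2.5, -0.7):
    # Supplementary angles θ and π − θ
    s1 = expi(theta) + expi(math.pi - theta)  # should equal e^(iθ) − e^(−iθ)
    s2 = expi(theta) - expi(math.pi - theta)  # should equal e^(iθ) + e^(−iθ)
    # Complex conjugates e^(iθ) and e^(−iθ)
    plus = expi(theta) + expi(-theta)         # should equal 2·cos θ
    minus = expi(theta) - expi(-theta)        # should equal 2i·sin θ
    assert abs(s1 - minus) < 1e-12
    assert abs(s2 - plus) < 1e-12
    assert abs(plus - 2 * math.cos(theta)) < 1e-12
    assert abs(minus - 2j * math.sin(theta)) < 1e-12

print("all four identities hold (up to rounding)")
```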
Finally, I need to say something about taking the so-called absolute square of a complex number. For real numbers, the square and absolute square are the same: $a^2 = (-a)^2 = |a|^2 = |-a|^2$. For complex numbers, that’s not the case. The square and the absolute square of a complex number are two very different things:
- $(a + ib)^2 = a^2 + (ib)^2 + 2iab = a^2 + i^2b^2 + 2iab = a^2 - b^2 + 2iab$
- $|a + ib|^2 = \left[\sqrt{a^2 + b^2}\right]^2 = a^2 + b^2$
It’s only when b = 0 (i.e. when the complex number becomes a real number) that the two become identical again: $(a + i\cdot 0)^2 = a^2 - 0^2 + 2ia\cdot 0 = a^2$ and $|a + i\cdot 0|^2 = a^2 + 0^2 = a^2$. We’ll use the absolute square a lot in what follows and, hence, it’s important to note that the vertical bars matter a lot when working with non-real complex numbers! [Non-real complex numbers? Now that sounds good, doesn’t it? Just remember they are as real, from a math point of view, as any other complex number. :-)]
Also note that the square of a (non-real) complex number yields a complex number (indeed, note that $a^2 - b^2 + 2iab$ is just another complex number), while the absolute square of a complex number will always yield a real number: $a^2 + b^2$ is the square of the norm (or magnitude) of the vector, so, yes, it must be a real number too.
The difference is equally clear when writing these complex numbers in polar form:
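- $(re^{i\theta})^2 = r^2(\cos\theta + i\sin\theta)^2 = r^2(\cos^2\theta - \sin^2\theta + 2i\cos\theta\sin\theta) = r^2(1 - 2\sin^2\theta + 2i\cos\theta\sin\theta)$
- $|re^{i\theta}|^2 = |r|^2|e^{i\theta}|^2 = r^2\left[\sqrt{\cos^2\theta + \sin^2\theta}\right]^2 = r^2(\sqrt{1})^2 = r^2$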
Squaring the complex number yields a complicated function of the phase θ. Remember: that phase is a function of time and space in the wavefunction, and so it’s that complicated function with the sine and cosine which explains why we get interference patterns. In contrast, we again get some real number for the absolute square—a constant: the square of the norm.
This difference explains why real-valued wavefunctions don’t work when trying to explain interference: they just don’t produce interference. I’ll show that at the end of this post. So it’s really those terms with the imaginary unit (including $i^2 = -1$), i.e. the terms that ensure the square and the absolute square of a complex number are not the same, that produce the interference effects we want to explain. [Note that the square and absolute square become the same when θ is equal to 0 or ±π, i.e. when the complex number is just a real number, with no imaginary part. Indeed, the sine factor is zero in that case, and only $r^2$ times 1 remains. So, in whatever notation, we arrive at the same result, of course!]
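Here’s a short sketch in Python (the numbers are arbitrary examples) that shows the difference between the square and the absolute square, in both the a + ib and the polar notation.

```python
import cmath
import math

a, b = 3.0, 4.0
z = complex(a, b)

print(z ** 2, complex(a * a - b * b, 2 * a * b))  # (-7+24j) twice: the square is complex
print(abs(z) ** 2, a * a + b * b)                 # 25.0 twice: the absolute square is real

r, theta = 2.0, 0.7                               # arbitrary polar example
w = r * cmath.exp(1j * theta)
square = w ** 2
expanded = r ** 2 * complex(1 - 2 * math.sin(theta) ** 2,
                            2 * math.cos(theta) * math.sin(theta))
print(abs(square - expanded))  # about 0: matches r²(1 − 2sin²θ + 2i·cosθ·sinθ)
print(abs(w) ** 2)             # about 4.0 = r², whatever θ is
```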
Phew! OK. We’re done. What I wrote above must surely rank among the shortest crash courses on complex numbers ever. But so it’s done. Glad I got through this. As said, if you feel you don’t ‘get’ the stuff above, go back to it and try to understand. Otherwise you won’t get what I am writing below.
We know the outcome: we have to get that interference pattern, and we have to use the rules, i.e. we need to add (or subtract) wave functions (i.e. probability amplitudes), not probabilities. And then we need to take the absolute square of the result to get the probabilities, of course. For ease of reference, I’ll reproduce that ubiquitous diagram once again. Look at the formulas underneath: (i) $P_1 = |\Phi_1|^2$, (ii) $P_2 = |\Phi_2|^2$, and (iii) $P_{12} = |\Phi_1 + \Phi_2|^2$.
You’ve seen this diagram at least a dozen times. But so there’s this complication now, which you may not have heard of before. We have this dichotomy in Nature. Photons are so-called bosons, and electrons are so-called fermions, and they behave differently: for bosons we can indeed just add amplitudes, $P_{12} = |\Phi_1 + \Phi_2|^2$ but, for fermions, we have to take the difference: $P_{12} = |\Phi_1 - \Phi_2|^2$. The diagram below illustrates this fundamental dichotomy: all particles (including composite particles, e.g. hadrons, which are made of a quark and an antiquark (mesons) or of three quarks (baryons) glued together) are either bosons or fermions. No split personalities here.
This dichotomy basically explains the world. The subtraction rule for fermions leads to the so-called Pauli exclusion principle: two identical Fermi particles, e.g. two electrons, cannot be at the same location or have the same state of motion. The only thing that softens this harsh rule a bit is the definition of identical: two electrons with opposite spin directions (don’t worry about the definition of spin right now) are not identical and, hence, can (and will) actually be together. In fact, the electrons will arrange themselves in atomic orbitals, and each of the orbitals will be occupied by two electrons with opposite spin (of course, if the number of electrons is not even, we may have a loner on the outer shell). This explains all chemical properties or, as Feynman puts it, “almost all the peculiarities of the material world: the variety that is represented in the periodic table is basically a consequence of this rule.”
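To make the link with the exclusion principle concrete, here’s a toy sketch that just applies the two rules as stated above to two alternatives with identical amplitudes. The amplitude value is arbitrary, and this is only arithmetic on the rules, not a model of an atom.

```python
def p_bosons(amp1, amp2):
    """Rule as stated above for bosons: add the amplitudes, then take the absolute square."""
    return abs(amp1 + amp2) ** 2


def p_fermions(amp1, amp2):
    """Rule as stated above for fermions: subtract the amplitudes, then take the absolute square."""
    return abs(amp1 - amp2) ** 2


f = 0.3 + 0.4j  # hypothetical amplitude, |f|² = 0.25

# Two alternatives with the very same amplitude (i.e. the 'same state'):
print(p_bosons(f, f))    # about 1.0, i.e. 4·|f|²
print(p_fermions(f, f))  # 0.0: the alternative is ruled out
```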
Conversely, we know that, because photons are bosons, the addition rule not only allows for a potentially infinite number of them all piling up one on top of the other, but also actually increases the probability of getting one more boson in when other bosons are already there, which explains how lasers work (see my post on that). To be precise, the probability of getting a boson, where there are already n, is (n+1) times higher than it would be if there were none before. This phenomenon does not only explain how lasers work, but it also solves all of the mysteries that actually led to the invention of quantum theory, most notably the so-called black body radiation problem (and, yes, I did a post on that too).
So let’s dive into it all and explain these ‘rules’ the way they should be explained. To do that, we won’t analyze the double-slit experiment but the other experiment I introduced above: the scattering of two particles as a result of them being fired on or into each other. The diagram below shows the set-up once again.
So situation (a) must be described by some wavefunction (or probability amplitude). That wavefunction is, of course, a function of time (and space), but it is also likely to be a function of that angle θ, so we’ll denote it by f(θ). If particle a ends up in detector 1, positioned at angle θ in this rather theoretical experiment, then b must end up in detector 2, which is positioned at angle π − θ. I referred to π − θ as the opposite angle, but that’s actually not correct: strictly speaking, π − θ is the supplementary angle of θ. Whatever.
In case you wonder why b must go to 2 if a goes to 1, the analysis is a fairly simple one. We’re talking about an elastic collision here, like between two billiard balls. If their masses and their speeds are the same, their momenta will have the same magnitude but opposite directions right before the collision, as shown by the dark blue velocity vectors below.
To analyze what happens, we must identify (i) the contact point, (ii) the plane of collision (represented by the tangent line in the four illustrations above), and (iii) the normal line, which connects the centers of both balls. We then decompose each velocity vector into its components along the normal and the tangent line respectively. The red and yellowish vectors represent the normal components, while the green vectors represent the tangential components. Using both the conservation law for momentum and the conservation law for kinetic energy, it’s fairly easy to show (but I won’t do it here: just click on this link if you want details) that the normal components are exchanged as a result of the collision, while the tangential components just stay the same. We can then find the velocity vectors after the collision by adding the exchanged normal vectors and the (unchanged) tangential vectors, which gives us the purple velocity vectors. As you can see, the purple vectors are just like the blue vectors: they have the same magnitude but opposite direction. Hence, if particle a ends up going in an angle θ, then b must end up in an angle π − θ, not some other random direction. Note that I’ve drawn four different situations above to show that any angle θ and π − θ is actually possible, except the angle θ = 0 (which implies that π − θ would be π), because the particles aren’t supposed to go through each other. [I’ll have to nuance that statement later: photons actually do go through each other.]
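The exchange rule is easy to check numerically. Below is a minimal Python sketch, assuming two balls of equal mass, velocities given as (vx, vy) tuples, and a normal line n picked by hand (the helper elastic_equal_mass is a hypothetical name of mine, not a library function). It swaps the normal components, keeps the tangential ones, and lets us verify that momentum and kinetic energy are conserved and that the outgoing velocities have the same magnitude but opposite directions.

```python
import math


def elastic_equal_mass(v1, v2, n):
    """Velocities after an elastic collision of two equal-mass balls.

    v1, v2: (vx, vy) velocities before the collision.
    n: any vector along the normal line (center to center); normalized here.
    The normal components are exchanged; the tangential components are kept.
    """
    length = math.hypot(n[0], n[1])
    nx, ny = n[0] / length, n[1] / length
    a1 = v1[0] * nx + v1[1] * ny  # normal component of v1
    a2 = v2[0] * nx + v2[1] * ny  # normal component of v2
    w1 = (v1[0] + (a2 - a1) * nx, v1[1] + (a2 - a1) * ny)
    w2 = (v2[0] + (a1 - a2) * nx, v2[1] + (a1 - a2) * ny)
    return w1, w2


# Same mass and speed, opposite directions (the dark blue vectors).
v1, v2 = (1.0, 0.0), (-1.0, 0.0)
n = (math.cos(0.6), math.sin(0.6))  # an arbitrary normal line
w1, w2 = elastic_equal_mass(v1, v2, n)
print("after:", w1, w2)
print("total momentum:", (w1[0] + w2[0], w1[1] + w2[1]))
```

Try other normal lines: the total momentum stays zero and w2 is always –w1, which is the purple-vector picture in numbers.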
OK. That’s clear enough. Again, back to quantum mechanics. Let’s associate a specific functional form with f(θ). Let’s try the most obvious one: we equate f(θ) with e^(iθ). Can we do that? Sure: e^(iθ) is a regular wave function, except that it’s not normalized. Indeed, if we take its absolute square to get the probabilities, we get |e^(iθ)|² = cos²θ + sin²θ = 1. Hence, integrating |e^(iθ)|² (so we have a constant probability density function here) from θ = 0 to θ = 2π (or from θ = –π to θ = +π) yields a value of 2π, which is not 1. So we should put a factor 1/√(2π) in front to normalize our function. OK… So we’ll write:
f(θ) = (1/√(2π))·e^(iθ)
[Sorry if you can’t follow the math here. I can’t do much: you’ll just need to do your homework and google it. For the absolute square, we’re basically applying the Pythagorean theorem, which we introduced already (so we’re actually calculating the norm of a vector here, or the distance between two points if you want). As for the cos²α + sin²α = 1 identity, that’s the Pythagorean theorem as well.]
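For readers who prefer numbers to integrals, here’s a minimal Python sketch, assuming the toy amplitude f(θ) = (1/√(2π))·e^(iθ) from the text and a simple midpoint-rule integral (the helper names are mine). Without the factor 1/√(2π), the total probability over [–π, +π] comes out as 2π; with it, we get 1.

```python
import cmath
import math


def f(theta):
    """Toy amplitude f(θ) = (1/√(2π))·e^(iθ)."""
    return cmath.exp(1j * theta) / math.sqrt(2 * math.pi)


def integrate(g, a, b, n=10_000):
    """Midpoint-rule integral of a real function g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


# Without the factor: |e^(iθ)|² = cos²θ + sin²θ = 1 everywhere, so the total is 2π.
raw_total = integrate(lambda t: abs(cmath.exp(1j * t)) ** 2, -math.pi, math.pi)
# With the factor 1/√(2π), the total probability is 1.
normalized_total = integrate(lambda t: abs(f(t)) ** 2, -math.pi, math.pi)
print(raw_total, 2 * math.pi, normalized_total)
```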
We have two possibilities here:
(a) Particle a goes into detector 1 (and, because the collision is elastic, and because the two particles are identical, particle b will then go into detector 2)
(b) Particle a goes into detector 2 (and, hence, particle b goes into detector 1).
We chose a wave function for the first possibility. Now we need one for the second possibility too, and that one should be simple. In fact, it should be f(π–θ), because situation (b) is just situation (a) with the roles of the two detectors exchanged. So we surely must have the same wave function: we only need to plug in a different value for its argument, π–θ instead of θ.
Well… Yes and no. The rules of quantum mechanics are mysterious and, hence, we need to be careful. So let me note that, from a mathematical point of view, the function f(π–θ+δ) = e^(i(π–θ+δ)) will yield exactly the same probability as f(π–θ) = e^(i(π–θ)). [Note that, to simplify the analysis, I sometimes just drop that 1/√(2π) normalization factor. You can always plug it back in if you want.]
But so what’s that delta? Well… We can look at δ as some kind of arbitrary phase shift which we introduce because… Well… Because we’re switching those detectors: detector 1 becomes detector 2, and 2 becomes 1. Or, what amounts to the same, we’re exchanging the roles of the particles a and b: a becomes b and b becomes a. You’ll say: so what? Well… As said, quantum mechanics is weird and, from a mathematical point of view, who says we should use exactly the same wave function? That other wave function yields the same probabilities, so we should consider it. Let’s quickly check that, with the 1/√(2π) normalization factor included, so you do not suspect me of foul play. 🙂 So we have f(π–θ+δ) = (1/√(2π))·e^(i(π–θ+δ)). Let’s take the absolute square of that to get the probabilities:
|f(π–θ+δ)|² = |(1/√(2π))·e^(i(π–θ+δ))|² = |(1/√(2π))·e^(i(π–θ))·e^(iδ)|²
= |(1/√(2π))·e^(i(π–θ))|²·|e^(iδ)|² = |(1/√(2π))·e^(i(π–θ))|² = |f(π–θ)|²
[Again, sorry for not helping too much with the math but the ‘rules’ I use here are: (i) |z₁z₂|² = |z₁|²|z₂|² and (ii) |e^(iα)| = 1.]
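A minimal numerical check of rules (i) and (ii), assuming the same toy f(θ) (the angle θ = 0.7 and the list of phase shifts are arbitrary choices of mine): the amplitudes f(π–θ+δ) and f(π–θ) differ, but their absolute squares don’t.

```python
import cmath
import math


def f(theta):
    """Toy amplitude f(θ) = (1/√(2π))·e^(iθ)."""
    return cmath.exp(1j * theta) / math.sqrt(2 * math.pi)


theta = 0.7
deltas = [0.0, 0.3, 1.0, math.pi, -2.5]
# Largest difference between |f(π–θ+δ)|² and |f(π–θ)|² over the sampled δ values.
max_diff = max(abs(abs(f(math.pi - theta + d)) ** 2 - abs(f(math.pi - theta)) ** 2) for d in deltas)
# The amplitudes themselves do differ, e.g. for δ = 1.
amp_gap = abs(f(math.pi - theta + 1.0) - f(math.pi - theta))
print(max_diff, amp_gap)
```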
That’s weird, you’ll say: I don’t like that δ. You’re right. I don’t like it either, because it obviously makes our analysis not so simple. But, then, in physics, such personal opinions do not matter much, do they? 🙂 However, there’s a logical way out. Switching the detectors again should give us f(θ) again. Huh? Why? Well… Nature may play tricks on us when exchanging the roles of particles or detectors. However, when we switch the set-up two times, then we’re modeling the same situation and, hence, then we should really stick to our choice of f(θ). Let’s go through the logic here:
- When switching the role of the detectors, our wave function changes from f(θ) to f(π−θ+δ). So the variable θ becomes π–θ and we add a phase shift δ to the argument. Hence, the argument of the ‘other’ function becomes π−θ+δ.
- Now, our wave function should change once again if we switch the role of the detectors (or, what amounts to saying the same, the role of the particles) once more.
- However, because of the symmetry of the situation, we might just as well have started out with an analysis of situation (b), to then look at (a) and, hence, the ‘rules’ for the exchange of roles should be the same. So if we’d start out with situation (b), and then look at (a), we’d be talking the same phase shift: so it’s the same unknown δ. [In case you’re confused now… Well… It is confusing indeed. While the logic is simple, it is, at the same time, quite abstruse too. Just think about it a couple of times.]
- So our wave function changes from f(π−θ+δ) to f(π−(π−θ)+δ+δ) = f(θ+2δ). To be clear, what we’re doing here is swapping the variable θ for π–θ and adding the phase shift δ to the argument once more.
- We’re now ready for the Great Trick: we should now be back at where we were and, therefore, f(θ+2δ) must equal f(θ). For f(θ) = e^(iθ), that means e^(2iδ) = 1, so 2δ is either 0 or, else, ±2π (adding or subtracting 2π to/from an angle doesn’t change the angle). In other words, δ is either 0 or, else, ±π.
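The double-swap argument can be mimicked in a few lines. A minimal sketch, assuming we model one swap of the detectors exactly as in the list above (substitute π–θ for θ in the argument, then add δ); exchange and twice_is_identity are hypothetical helper names, and the candidate values of δ are arbitrary test values. Only δ = 0 and δ = ±π bring us back to f(θ), and the corresponding phase factors e^(iδ) are +1 and –1.

```python
import cmath
import math


def f(theta):
    """Toy amplitude f(θ) = (1/√(2π))·e^(iθ)."""
    return cmath.exp(1j * theta) / math.sqrt(2 * math.pi)


def exchange(arg, delta):
    """One swap of the detectors: θ → π–θ inside the argument, plus a phase shift δ."""
    return lambda theta: arg(math.pi - theta) + delta


def twice_is_identity(delta, samples=64):
    """True if swapping twice (argument θ + 2δ) gives back f(θ) at all sampled θ."""
    arg2 = exchange(exchange(lambda theta: theta, delta), delta)
    thetas = [-math.pi + 2 * math.pi * k / samples for k in range(samples)]
    return all(abs(f(arg2(t)) - f(t)) < 1e-12 for t in thetas)


candidates = [0.0, math.pi / 2, math.pi, -math.pi, 1.0, 2 * math.pi / 3]
allowed = [d for d in candidates if twice_is_identity(d)]
signs = [cmath.exp(1j * d) for d in allowed]
print(allowed)
print(signs)
```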
Now, that explains the two different ways of ‘adding’ wave functions. Let us apply those quantum-mechanical rules that you know so well:
1. Because we’re talking identical particles here and, hence, we cannot distinguish whether particle a went into detector 1 or 2 (and, hence, whether particle b went into detector 2 or 1), we should not add the probabilities but the amplitudes.
2. However, we noted the wave function for the ‘other’ possibility can have one of two related but different forms:
(i) If δ = 0, then f(π−θ+0) = f(π−θ) = +(1/√(2π))·e^(i(π–θ))
(ii) If δ = ±π (note that the two values are the same angle), then f(π−θ±π) = (1/√(2π))·e^(i(π–θ±π)) = (1/√(2π))·e^(i(π–θ))·e^(±iπ) = –(1/√(2π))·e^(i(π–θ)). [Note that e^(±iπ) = –1.]
Therefore, ‘adding’ the two amplitudes means either adding or subtracting indeed, and that corresponds to the above-mentioned dichotomy between bosons and fermions, or between Bose-Einstein statistics and Fermi-Dirac statistics:
1. For bosons (e.g. photons), we just need to add the two functions. So the wavefunction of bosons will have the plus sign. Let me write it all out:
Bose-Einstein statistics: f(θ) + f(π−θ+δ) = f(θ) + f(π−θ+0) = f(θ) + f(π−θ)
2. The wavefunction of (identical) fermions (e.g. electrons with parallel spin) will have the minus sign, so we need to subtract:
Fermi-Dirac statistics: f(θ) + f(π−θ+δ) = f(θ) + f(π−θ±π) = f(θ) – f(π−θ)
You must be very tired of all of this by now, but just hang in there. We’re almost done. How does it work out in an example?
The rather remarkable fact is that the interference patterns for bosons and fermions are actually very similar. I call that fact ‘rather remarkable’ because of the very fundamental difference in their behavior, as described above: if electrons behaved like bosons, every atom would be “a little round ball with all the electrons sitting near the nucleus” (Feynman, Lectures, III-4-13), which, as mentioned above, would result in… Well… Not the world that we know, for sure. 🙂 But so we do have bosons and fermions, and electrons are fermions and, hence, obey Fermi-Dirac statistics, and so that’s why we have elastic collisions between billiard balls, for example. 🙂
OK. Back to the example. If particles a and b were bosons (let’s say photons), then we’d have to add f(θ) and f(π−θ). Note that e^(i(π–θ)) = e^(iπ)·e^(–iθ) = –e^(–iθ), because e^(iπ) = –1. So we get:
f(θ) + f(π−θ) = (1/√(2π))·e^(iθ) + (1/√(2π))·e^(i(π–θ)) = (1/√(2π))·(e^(iθ) – e^(–iθ)) = (1/√(2π))·2i·sinθ = (2/√(2π))·i·sinθ
The associated probability function is:
P(θ) = |f(θ) + f(π−θ)|² = |(2/√(2π))·i·sinθ|² = (2/π)·sin²θ
Note that I use the identity e^(iα) – e^(–iα) = 2i·sinα, and then the rules for taking the absolute square. [What about the i² = –1 identity? It’s an absolute square, remember? Hence, |i|² = +1.] That should be alright, although I should note that we’d need to re-normalize again. Indeed, integrating sin²θ over [–π, +π] yields π and, therefore, integrating P(θ) over the same interval yields (2/π)·π = 2, instead of 1, and so we’d have to insert a factor 1/2 to make it come out alright. [You may wonder if this is kosher but, yes, it is. It’s just another complication that most popular writers skip over, rather conveniently.]
For fermions (let’s say electrons), we have to subtract f(π−θ) from f(θ), and so we get:
f(θ) – f(π−θ) = (1/√(2π))·e^(iθ) – (1/√(2π))·e^(i(π–θ)) = (1/√(2π))·(e^(iθ) + e^(–iθ)) = (1/√(2π))·2cosθ = (2/√(2π))·cosθ
The associated probability function is:
P(θ) = |f(θ) – f(π−θ)|² = |(2/√(2π))·cosθ|² = (2/π)·cos²θ
[Here I use the identity e^(iα) + e^(–iα) = 2cosα. The same re-normalization applies: integrating cos²θ over [–π, +π] also yields π.]
Now, isn’t that remarkable indeed? The Bose-Einstein probability density function is the same as the Fermi-Dirac probability density function, but with a phase shift of π/2: (2/π)·sin²θ = (2/π)·cos²(θ – π/2), which follows from the trigonometric identities sinα = cos(π/2–α) and cosα = cos(–α). That’s just like the phase shift between an ordinary sine and a cosine function. […]
Also note that adding or subtracting the wave functions here amounts to keeping either their imaginary parts or their real parts. Huh? Yes. Look at it: the wave function for the ‘other’ possibility is minus the complex conjugate of the wave function for the ‘first’ possibility: f(π−θ) = –(1/√(2π))·e^(–iθ). Hence, adding the two amplitudes (bosons) amounts to subtracting complex conjugates, so the real parts vanish and the imaginary part is doubled, while subtracting the two amplitudes (fermions) amounts to adding complex conjugates, so now it’s the imaginary parts that vanish and the real part is doubled!
I know. You’re tired. You’ll probably only appreciate this point when I show you the graphs. We have three graphs below:
- The blue line shows the classical situation, also referred to as Maxwell-Boltzmann statistics (see my post comparing classical and quantum-mechanical ‘rules’ for statistics). No interference, as you can see. In fact, all angles are equally likely, which, as mentioned above, may or may not be realistic. The graphs that are of interest to us are the red and green graphs.
- The red graph has two maxima (at θ = ±π/2) and two minima (at θ = 0 and θ = ±π): that’s the sin²θ interference curve for bosons (photons). [Can photons go through each other? Yes. Without any doubt. In fact, they usually do. Check this interesting article on photon-photon collisions.]
- Finally, the green graph has two maxima (at θ = 0 and θ = ±π) and two minima (at θ = ±π/2): that’s the cos²θ interference curve for fermions (electrons). As expected from the functional form, its maxima and minima are π/2 units away from those of the Bose-Einstein graph.
Note that I used non-normalized wavefunctions for the graphs above, but it doesn’t matter: you can see that the areas under the blue, green and red graphs are all the same, and so it’s just a matter of multiplying all values with a normalization factor indeed.
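A quick numerical check of the two combinations, assuming the toy amplitude f(θ) = (1/√(2π))·e^(iθ) from the text. The sketch computes f(π−θ) directly rather than simplifying by hand, compares the boson (add) and fermion (subtract) probabilities with (2/π)·sin²θ and (2/π)·cos²θ on a grid of angles, and integrates each over [–π, +π]. The function names and the grid are my own choices.

```python
import cmath
import math


def f(theta):
    """Toy amplitude f(θ) = (1/√(2π))·e^(iθ)."""
    return cmath.exp(1j * theta) / math.sqrt(2 * math.pi)


def p_bose(theta):
    """Bosons: add the amplitudes, then take the absolute square."""
    return abs(f(theta) + f(math.pi - theta)) ** 2


def p_fermi(theta):
    """Fermions: subtract the amplitudes, then take the absolute square."""
    return abs(f(theta) - f(math.pi - theta)) ** 2


def integrate(g, a, b, n=20_000):
    """Midpoint-rule integral of a real function g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


grid = [k * math.pi / 8 for k in range(-8, 9)]
err_bose = max(abs(p_bose(t) - (2 / math.pi) * math.sin(t) ** 2) for t in grid)
err_fermi = max(abs(p_fermi(t) - (2 / math.pi) * math.cos(t) ** 2) for t in grid)
area_bose = integrate(p_bose, -math.pi, math.pi)
area_fermi = integrate(p_fermi, -math.pi, math.pi)
print(err_bose, err_fermi)    # both at rounding-error level
print(area_bose, area_fermi)  # both close to 2, hence the extra factor 1/2
```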
Of course, the interference pattern above is quite simple: two minima and two maxima only. That’s because we assumed a rather simplistic functional form for f(θ). Indeed, f(θ) = e^(iθ) depends on the angle θ only. In reality, we are talking photons and/or electrons, so some phase needs to enter the equation as well (θ is not the phase). We’ll denote that phase by φ. The phase is a function of position and time, which we can write as follows (the last step assumes ω/k = c, as for a photon):
φ(x, t) = φ[ωt–kx] = φ[–k(x–ct)]
For more details, I’ll refer to one of my posts on that. Now, it would be nice to develop an example with a specific functional form incorporating both θ (the scattering angle) and the phase (φ). I should actually work on that. However, I trust you’ll believe me that such an example would yield what it’s supposed to yield, and that’s the much finer interference pattern below, about which Feynman writes the following: “If the motion of all matter—as well as electrons—must be described in terms of waves, what about the bullets in our first experiment? Why didn’t we see an interference pattern there? It turns out that for the bullets the wavelengths were so tiny that the interference patterns became very fine. So fine, in fact, that with any detector of finite size one could not distinguish the separate maxima and minima. What we saw was only a kind of average, which is the classical curve. The illustration below indicates schematically what happens with large-scale objects. Part (a) of the figure shows the probability distribution one might predict for bullets, using quantum mechanics. The rapid wiggles are supposed to represent the interference pattern one gets for waves of very short wavelength. Any physical detector, however, straddles several wiggles of the probability curve, so that the measurements show the smooth curve drawn in part (b) of the figure.”
So… That says it all – for the moment at least.
Introducing (classical) uncertainty again
If you’re a critical reader (and I must assume you are), you’ll note an inconsistency. The classical |f(θ)|² and |f(π–θ)|² probability density functions in the example above yield a constant value: |f(θ)|² = |e^(iθ)|² = |f(π–θ)|² = |e^(i(π–θ))|² = 1 and, hence, the classical probability density function P(θ) = |f(θ)|² + |f(π–θ)|² is a constant too (namely 1 + 1 = 2). [Again, if you worry about normalization, just put the 1/√(2π) factor back in.]
Now, that doesn’t make all that much sense. Indeed, with real-life particles (I am not talking photons now), the most likely thing to happen is that the particles just bounce back, so the θ = ±π angle should be associated with much higher probabilities than any other angle. In short, we’d expect the probability distributions |f(θ)|2 and |f(π–θ)|2 to resemble something like the graphs below:
What can I say? Nothing much. You’re right. While adding these two probability functions may still yield some constant (and, therefore, the function P(θ) = |f(θ)|² + |f(π–θ)|² = some constant (1/π, if we include the normalization factors) may still make sense), the |f(θ)|² and |f(π–θ)|² functions themselves should obviously not be constant. So, yes, you’re right. We assumed all angles are equally likely, and that’s obviously not something we’d see in a real experiment (as opposed to a thought experiment).
I should also make another remark here. Elementary particles are supposed to be point-like. So, yes, they are like billiard balls bouncing back when they hit each other, but then they are also not like billiard balls, because they don’t have any size and, hence, there’s no way to determine what the tangent line (or the plane of collision, as I termed it above) would be. In other words, we need to analyze how reducing the size of our particles would impact the results above. In fact, when I say ‘reducing the size’, I mean we should actually try to go to the zero-size limit.
That’s not as easy as it sounds. Let’s first analyze the ‘billiard ball physics’ once again. From the analysis above, it’s obvious that the velocity vectors after the collision have equal magnitude but opposite direction, but what’s the relationship between the angle of the tangent line (α) and the angle of the velocity vectors after the collision (θ)?
Let’s look at it. Note that the red and black billiard balls cannot assume just any position, as illustrated below. Indeed, we need to remember that the red ball is being fired from the left and, hence, will always hit the black ball from the left, as shown in the first example of a collision below. We cannot have the red ball hitting the black ball from the right, as shown in the second example.
Now, note the limit situations shown in the third and fourth examples: the two balls touch but don’t really collide: as a result, they just pass each other. If the particles are point-like, we can actually say they go in a straight line and, hence, that they sort of go through each other. 🙂
The way we measure our angle α does distinguish between the third and fourth possibility, as shown below: α goes from 0 (i.e. the first limit situation) to π/2 (i.e. the straight bounce-back), and then to π (i.e. the second limit situation).
Now, I don’t know about your intuition, but my intuition tells me that the relationship between α and θ should be very straightforward. I should have looked it up, but I didn’t. I used a heuristic method. 🙂 I basically measured the α and θ angles roughly in those drawings of billiard ball collisions, and I got the following table:
Now, the table suggests that θ is, quite simply, twice the angle α of the tangent line: θ = 2α. What if α becomes larger than π? As mentioned above, that’s logically impossible: α ranging from 0 to π already covers all possible situations. 🙂
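The θ = 2α rule can also be derived from the exchange rule for the normal components rather than measured. A minimal sketch, assuming the center-of-mass frame with equal masses, the red ball moving along +x at unit speed and the black ball along –x, and α measured as the angle of the tangent line, counterclockwise from the x-axis (that orientation is my assumption and may be mirrored relative to the drawings); outgoing_angle is a hypothetical helper name.

```python
import math


def outgoing_angle(alpha):
    """Direction in [0, 2π) of the red ball's velocity after the collision.

    Assumptions: center-of-mass frame, equal masses, red ball moving along +x
    and black ball along -x at unit speed; alpha is the angle of the tangent
    line, measured counterclockwise from the x-axis, so the normal is (sin α, -cos α).
    """
    v1, v2 = (1.0, 0.0), (-1.0, 0.0)
    nx, ny = math.sin(alpha), -math.cos(alpha)
    a1 = v1[0] * nx + v1[1] * ny  # normal component of the red ball's velocity
    a2 = v2[0] * nx + v2[1] * ny  # normal component of the black ball's velocity
    # exchange the normal components, keep the tangential ones
    w1 = (v1[0] + (a2 - a1) * nx, v1[1] + (a2 - a1) * ny)
    return math.atan2(w1[1], w1[0]) % (2 * math.pi)


for alpha in (0.2, math.pi / 4, math.pi / 2, 2.0, 3.0):
    print(f"alpha = {alpha:.3f}  theta = {outgoing_angle(alpha):.3f}  2*alpha = {2 * alpha:.3f}")
```

Working it out by hand gives the same thing: the red ball leaves with velocity (cos 2α, sin 2α), so its direction is exactly 2α.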
Now, it’s easy to see that α depends on the vertical distance between the centers of the two balls, which I denoted by Δy in my rather primitive drawing below.
So α depends on Δy, but how exactly? My drawing below shows how. If d is the separation between the two centers, then sin(π/2–α) = cosα = Δy/d. Note that d is always the same when the collision happens, and d is twice the radius of our billiard balls. I’ll denote the radius of our billiard balls with r and, hence, d = 2r. That gives us the first of two equations:
cosα = Δy/(2r) [i.e. equation 1]
Also note that, while α is always positive (it varies between 0 and π), Δy can become negative: that happens when the center of the black ball is ‘higher’ than the center of the red ball and it’s entirely consistent with the conventions and definitions we use: when the angle α goes past π/2, π/2–α becomes negative and, hence, sin(π/2–α) = cosα also becomes negative. [I am just noting this because I had to look at this a couple of times to make sure I got it right.]
Now, the Δy distance obviously depends on the firing angle of our pitching machine, which I denoted by ε below. To be precise, Δy/2 = R·sinε, with R the distance between our pitching machine and the collision point. Strictly speaking, it should be the distance between the pitching machine and the center of the ball but, because of the scales involved (the radius of the balls is very small as compared to the distances involved), we can equate both. [Note again that Δy can be positive or negative and, hence, ε (and sinε) can also be positive or negative.] In any case, we get the second of two equations:
Δy = 2R·sinε [i.e. equation 2]
Now, this story is becoming way too long, but just hang in there: we’re almost done. Combining the cosα = Δy/(2r) and Δy = 2R·sinε formulas, we get cosα = (R/r)·sinε. And, as we know that α = θ/2, we get the following relationship between θ and ε:
cos(θ/2) = (R/r)sinε [i.e. the final equation]
Before we continue, we should note the constraint on ε: the pitching machine should be precise enough to ensure collision. Hence, there’s a maximum and minimum value for ε. I’ll leave it to you to work this out and just give the solution: the maximum and minimum values for ε are r/R and –r/R. In fact, now that I am here, I should note this calculation involves a so-called small-angle approximation: if ε is small, we can write sinε ≈ ε. So we can simplify the formula above even more and just write:
cos(θ/2) = (R/r)ε [i.e. the same equation, but simplified]
In case you’d have doubts, this expression makes sense once again: we know ε ranges from –r/R to +r/R and, hence, the value of the expression on the right-hand side varies from –1 to +1, just like a cosine. We can also use the inverse cosine function (i.e. the arccos(x) function) and write:
θ = 2·arccos[(R/r)ε]
So that’s the final equation once more, but in a different form. 🙂 In case you forgot about the arcsin and arccos functions, think about the etymology of the arc– prefix: the arccos(x) function gives us the arc (i.e. the angle expressed in radians) whose cosine is x. For clarity, I’ve copied the graphs below.
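Chaining equation 2, equation 1 and θ = 2α in code makes the limits easy to check. A minimal sketch, assuming hypothetical values R = 10 and r = 0.03 (any unit, as long as it’s the same for both); theta_from_epsilon is my own helper name. At ε = 0 we get θ = π (the straight bounce-back), and at ε = ±r/R we get θ = 0 and θ = 2π (the grazing limits).

```python
import math


def theta_from_epsilon(eps, R, r, small_angle=False):
    """Scattering angle θ in [0, 2π] for firing angle ε, following the text:
    Δy = 2R·sin ε (equation 2), cos α = Δy/(2r) (equation 1), θ = 2α.
    With small_angle=True, sin ε is replaced by ε.
    """
    s = eps if small_angle else math.sin(eps)
    dy = 2 * R * s
    c = dy / (2 * r)  # = (R/r)·sin ε, or (R/r)·ε
    if abs(c) > 1 + 1e-9:
        raise ValueError("no collision: |(R/r)·sin ε| exceeds 1")
    c = max(-1.0, min(1.0, c))  # absorb floating-point rounding at the edges
    return 2 * math.acos(c)


R, r = 10.0, 0.03  # hypothetical distances (same unit), so r/R = 0.003
eps_max = r / R
for eps in (-eps_max, -eps_max / 2, 0.0, eps_max / 2, eps_max):
    print(f"{eps:+.4f} -> theta = {theta_from_epsilon(eps, R, r, small_angle=True):.4f}")
```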
Now we need to relate the probabilities for ε to the probabilities for θ. Huh? Yes. Believe it or not, that’s the objective of this digression: there’s (classical) uncertainty in the firing angle ε and we want to know how that translates into (classical) uncertainty about θ.
Hmm… How do we do that, going from a probability distribution in one variable to a probability distribution in another? To keep things simple, we’ll first assume that all angles ε are equally likely. We then have that θ = 2·arccos[(R/r)ε] relation above, with (R/r)ε ranging from –1 to +1, as shown below. Now, we do have a one-to-one relation between θ and ε but, because that relation is not linear, that does not mean all angles θ will also be equally likely: equal intervals of ε do not map onto equal intervals of θ. [Note that, because of our definitions above, the angle θ ranges from 0 to 2π, rather than from −π to π, as in the earlier presentations of what’s going on. It amounts to the same, but not having to deal with two different directions for θ simplifies the math.]
The rule for changing variables takes care of that: if u = (R/r)ε = cos(θ/2) has probability density p(u), then the probability density for θ is p(u)·|du/dθ| = p(u)·(1/2)·sin(θ/2). So if u is uniformly distributed between –1 and +1 (i.e. p(u) = 1/2), we get a density of (1/4)·sin(θ/2) for θ between 0 and 2π: it is zero at θ = 0 and θ = 2π, and highest at θ = π. Likewise, if ε follows a bell-shaped distribution centered on zero (between –r/R and +r/R), then θ will not be exactly normally distributed, but its distribution will also peak at π, much like in the example below, which shows θ ranging from 0 to 2π with the highest probabilities being associated with values of θ near π. So, yes, that means that our billiard balls are most likely to just bounce back (provided our pitching machine is somewhat accurate, of course!).
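A quick Monte Carlo check of how a uniform firing angle translates into a distribution for θ, assuming (R/r)·ε is uniformly distributed on [–1, +1]; the seed, sample size and bin width are arbitrary choices of mine. The change-of-variables rule p(θ) = p(u)·|du/dθ|, with u = cos(θ/2), predicts a density of (1/4)·sin(θ/2). A uniform θ would put about 1/(2π) ≈ 0.16 of the samples in any bin of width 1; the sin(θ/2) density predicts sin(0.25) ≈ 0.25 near θ = π and only about 0.06 near θ = 0.

```python
import math
import random

random.seed(1)
N = 200_000
# u = (R/r)·ε uniform on [-1, 1] (all firing angles equally likely), θ = 2·arccos(u)
samples = [2 * math.acos(random.uniform(-1.0, 1.0)) for _ in range(N)]


def predicted(a, b):
    """∫ (1/4)·sin(θ/2) dθ from a to b, i.e. (cos(a/2) - cos(b/2)) / 2."""
    return (math.cos(a / 2) - math.cos(b / 2)) / 2


near_pi = sum(1 for t in samples if abs(t - math.pi) < 0.5) / N
near_zero = sum(1 for t in samples if t < 1.0) / N
print("near π:", near_pi, "predicted:", predicted(math.pi - 0.5, math.pi + 0.5))
print("near 0:", near_zero, "predicted:", predicted(0.0, 1.0))
print("uniform θ would give:", 1.0 / (2 * math.pi))
```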
OK. Done! Finally! [I can hear your sigh of relief indeed. :-)] Yes. Done. But what about that remark about reducing the size of our billiard balls, so they become point-like? Well… If we impose the condition that the uncertainty in the firing angle of our pitching machine should be small enough to ensure those billiard balls (i.e. our particles) still hit each other, then that uncertainty should also decrease as we decrease the size of our billiard balls. Taking it to the limit, we should conclude that, even when our billiard balls reach zero size and, hence, when we cannot determine what the tangent line is, the probability function for θ should still reflect the probability function for ε. In short, the point about particles having to be point-like is not very relevant, and it surely does not change the identified inconsistency in the thought experiment we presented in the previous section: even with point-like particles, we should not get a constant for |f(θ)|² and |f(π–θ)|².
So… The conclusion has to be that my choice of f(θ) was wrong, indeed. It’s actually quite obvious: one should not expect to get a probability density function that favors one angle over another if the underlying wave function doesn’t favor one angle over another either. So we should correct this example. However, I’ll leave that as yet another exercise for you. 🙂 When everything is said and done, I just wanted to show that the math and the physics of quantum mechanics make sense:
- The example shows that we do need complex-valued wave functions (rather than some real-valued function) to explain the interference patterns that we observe in experiments, and
- It also shows that the theory respects Occam’s Razor: there’s parsimony in the explanation indeed. Every variable and assumption makes sense, and none of them is superfluous.
Post scriptum: Finding a function f(θ) that yields a non-constant |f(θ)|² and |f(π–θ)|² may seem to be quite straightforward. In fact, we can easily associate a real-valued f(θ) function with any normal distribution. Look at the formula I used in the graph above: it’s the formula for the normal distribution, with σ = 1 and μ = π:
P(θ) = (1/√(2π))·exp(–(θ–π)²/2)
Mind you: this is a probability density function, so it’s a P(θ) function, not an f(θ) wave function. But we could take the square root of this function, and then we have an f(θ) function that, when squared, will indeed give us back that normally distributed P(θ) function. Let me show how. Note that the square in the exponent itself (–(x–μ)²/2) does not fall away when we take the square root of the whole function. Indeed, you should note that [exp(x²)]^(1/2) = exp(x²/2) ≠ exp(x). What we get is something like what’s shown below. The blue graph is the probability density function. The red graph is the square root of that function.
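A minimal sketch of that square-root construction in Python, assuming the normal density with μ = π and σ = 1 from the graph (the names are mine). The square root is again a bell curve, exp(–(θ–π)²/4)/(2π)^(1/4): the square in the exponent survives, and the curve is wider, like a normal curve with σ = √2.

```python
import math

MU, SIGMA = math.pi, 1.0


def p(theta):
    """Normal density with μ = π and σ = 1: the P(θ) in the text."""
    return math.exp(-(theta - MU) ** 2 / (2 * SIGMA ** 2)) / (SIGMA * math.sqrt(2 * math.pi))


def f(theta):
    """Real-valued 'wave function' defined as the square root of P(θ)."""
    return math.sqrt(p(theta))


def f_closed(theta):
    """Same function in closed form: the exponent keeps its square, halved."""
    return math.exp(-(theta - MU) ** 2 / 4) / (2 * math.pi) ** 0.25


for t in (MU - 2, MU - 1, MU, MU + 1, MU + 2):
    print(f"{t:.3f}  P = {p(t):.4f}  f = {f(t):.4f}  f^2 = {f(t) ** 2:.4f}")
```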
What about the f(π–θ) function? Well… Same graphs but shifted by π, as shown below:
Note we should add these two probability functions. To do that, we should ‘move’ the left half of this curve (i.e. the graph between –π and 0) to the [π, 2π] interval to ensure compatibility in the way we define the angle θ for both graphs: we cannot have θ going from −π to +π for one graph, and from 0 to 2π for the other. Alternatively, we can move the ‘right half’ of the other curve (i.e. the graph between π and 2π) so both graphs respect the other convention, i.e. −π < θ < +π. Then we’ll have something that resembles what we started this section with, but with a much smaller standard deviation (see below).
Now, I know what you’ll say now: these two probability functions do not add up to a constant. No. They don’t. Why should they? We’ve allowed for uncertainty and, hence, some angles are likely to be associated with higher probabilities than others. To be precise: we have a combined probability function favoring (i) 0 (i.e. the red part of the graph) and (ii) π, or –π, which is the same angle (i.e. the blue part of the graph). Now, that’s exactly what we’d expect: if our pitching machine is somewhat accurate, the billiard balls are most likely to just bounce back.
Note that we don’t have any interference effect here, and that’s not because our wave functions are real-valued but because they hardly overlap: one is centered around zero, and the other is centered around ±π.
I’ll make a last note here. What’s interference from a mathematical point of view? Let’s not assume supplementary or opposite angles, or whatever other symmetric set-up. Let’s just assume we have two identical wavefunctions interacting with each other but with a different phase, which I’ll just write as α and β. So it’s the same function (e.g. Φ) but we add/subtract Φ(α) and Φ(β). What we’re doing is adding/subtracting sine and cosine functions. Let’s, once again, take an extremely simplified functional form for Φ: Φ(θ) = e^(iθ). [Again, we don’t care about normalization here.] More specifically, when adding, we get:
Φ(α) + Φ(β) = cosα + isinα + cosβ + isinβ = (cosα + cosβ) + i(sinα + sinβ)
When subtracting, we get:
Φ(α) – Φ(β) = cosα + isinα – cosβ – isinβ = (cosα – cosβ) + i(sinα – sinβ)
There are trigonometric formulas for adding/subtracting sines and cosines (just look them up), which allow us to write these sums as products:
- sinα + sinβ = 2sin[(α+β)/2]cos[(α–β)/2]
- sinα – sinβ = 2sin[(α–β)/2]cos[(α+β)/2]
- cosα + cosβ = 2cos[(α+β)/2]cos[(α–β)/2]
- cosα – cosβ = –2sin[(α+β)/2]sin[(α–β)/2]
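The four sum-to-product identities can be spot-checked numerically. A minimal sketch, assuming random angles in [–10, 10] (the seed, sample size and tolerance are arbitrary choices of mine; sum_to_product_holds is a hypothetical helper name):

```python
import math
import random


def sum_to_product_holds(alpha, beta, tol=1e-12):
    """Check the four sum-to-product identities for one pair of angles."""
    s, d = (alpha + beta) / 2, (alpha - beta) / 2
    checks = [
        (math.sin(alpha) + math.sin(beta), 2 * math.sin(s) * math.cos(d)),
        (math.sin(alpha) - math.sin(beta), 2 * math.sin(d) * math.cos(s)),
        (math.cos(alpha) + math.cos(beta), 2 * math.cos(s) * math.cos(d)),
        (math.cos(alpha) - math.cos(beta), -2 * math.sin(s) * math.sin(d)),
    ]
    return all(abs(lhs - rhs) < tol for lhs, rhs in checks)


random.seed(0)
pairs = [(random.uniform(-10, 10), random.uniform(-10, 10)) for _ in range(1000)]
print(all(sum_to_product_holds(a, b) for a, b in pairs))
```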
However, that’s just another way of presenting the same thing. The point to note is that both the real and the imaginary parts of the wavefunction interfere. Hence, real-valued wavefunctions can interfere too. The example above was just a special case: we had no interference there because the functions hardly overlapped: wherever the value of one was significant, the value of the other was practically zero, and vice versa. That’s all. But, in general, real-valued wavefunctions will interfere with each other, just like real waves do. 🙂
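A minimal sketch of how the phase difference shows up in the probabilities, assuming the non-normalized toy amplitude Φ(θ) = e^(iθ) and no re-normalization (the value α = 0.4 and the gaps are arbitrary choices of mine). In general, |Φ(α) + Φ(β)|² = 4cos²((α–β)/2) and |Φ(α) – Φ(β)|² = 4sin²((α–β)/2), so, as β approaches α, the sum tends to 4 while the difference tends to 0.

```python
import cmath
import math


def phi(theta):
    """Non-normalized toy amplitude Φ(θ) = e^(iθ)."""
    return cmath.exp(1j * theta)


def p_add(alpha, beta):
    """Absolute square of the sum of the two amplitudes."""
    return abs(phi(alpha) + phi(beta)) ** 2


def p_sub(alpha, beta):
    """Absolute square of the difference of the two amplitudes."""
    return abs(phi(alpha) - phi(beta)) ** 2


alpha = 0.4
for gap in (1.0, 0.1, 0.01, 0.0):
    beta = alpha + gap
    print(f"gap {gap:<5} add {p_add(alpha, beta):.6f}  subtract {p_sub(alpha, beta):.6f}")
```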
What if we’d equate α and β? When adding the two probability amplitudes, we get Φ(α) + Φ(α) = 2Φ(α). Taking the absolute square of that gives us 2²|Φ(α)|² = 4|Φ(α)|². However, we have to re-normalize here and, hence, divide by 2, so we get 2|Φ(α)|². So we get a probability density that’s twice the probability density associated with the one particle on its own. This is related to what I wrote on bosons above: the addition rule increases the probability of getting one more boson in when other bosons are already there, which explains how lasers work.

Conversely, when subtracting the two probability amplitudes, we obviously get Φ(α) – Φ(α) = 0, and the absolute square of zero is zero. That’s why two identical fermions cannot be together: the associated probability is zero. But, as mentioned, I wrote about this before and, hence, I should really conclude this rather long post. I hope you enjoyed it, and that you got the key ideas. 🙂