In my previous post, I derived and explained the general formula for the pattern generated by a light beam going through a slit or a circular aperture: the diffraction pattern. For light going through an aperture, this generates the so-called Airy pattern. In practice, diffraction causes a blurring of the image, and may make it difficult to distinguish two separate points, as shown below (credit for the image must go to Wikipedia again, I am afraid).
What’s actually going on is that the lens acts as a slit or, if it’s circular (which is usually the case), as an aperture indeed: the wavefront of the transmitted light is taken to be spherical or plane when it exits the lens and interferes with itself, thereby creating the ring-shaped diffraction pattern that we explained in the previous post.
The spatial resolution is also known as the angular resolution, which is quite appropriate, because it refers to an angle indeed: we know the first minimum (i.e. the first black ring) occurs at an angle θ such that sinθ = λ/L, with λ the wavelength of the light and L the lens diameter. It’s good to remind ourselves of the geometry of the situation: below we picture the array of oscillators, and so we know that the first minimum occurs at an angle such that Δ = λ. The second, third, fourth, etc. minima occur at angles θ such that Δ = 2λ, 3λ, 4λ, etc. However, these secondary minima do not play any role in determining the resolving power of a lens, or a telescope, or an electron microscope, etc., and so you can just forget about them for the time being.
For small angles (expressed in radians), we can use the so-called small-angle approximation and equate sinθ with θ: the error of this approximation is less than one percent for angles smaller than 0.244 radians (14°), so we have the amazingly simple result that the first minimum occurs at an angle θ such that:
θ = λ/L
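If you want to check the one-percent claim for the small-angle approximation yourself, a few lines of Python will do. This is just a quick sketch: the function name is my own, and I measure the error relative to sinθ (measuring it relative to θ gives nearly the same numbers).

```python
import math

def small_angle_relative_error(theta: float) -> float:
    """Relative error made when sin(theta) is replaced by theta (radians, theta > 0)."""
    return (theta - math.sin(theta)) / math.sin(theta)

# Print the error for a few angles, including the 0.244 rad quoted in the text.
for theta in (0.05, 0.1, 0.244, 0.5):
    err = small_angle_relative_error(theta)
    print(f"theta = {theta:5.3f} rad ({math.degrees(theta):5.1f} deg): error = {err:.3%}")
```

The error grows roughly as θ²/6, which is why it stays negligible for the tiny angles we deal with in telescopes and microscopes.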
Spatial resolution of a microscope: the Rayleigh criterion versus Dawes’ limit
If we have two point sources right next to each other, they will create two Airy disks, as shown above, which may overlap. That may make it difficult to see them, in a telescope, a microscope, or whatever device. Hence, telescopes, microscopes (using light or electron beams or whatever) have a limited resolving power. How do we measure that?
The so-called Rayleigh criterion regards two point sources as just resolved when the principal diffraction maximum of one image coincides with the first minimum of the other, as shown below. If the distance is greater, the two points are (very) well resolved, and if it is smaller, they are regarded as not resolved. This angle is obviously related to the θ = λ/L angle but it’s not the same: in fact, it’s a slightly wider angle. Deriving the angular resolution, for which we use the same symbol θ, is quite complicated, so I’ll skip that and just give you the result:
θ = 1.22λ/L
Note that, in this equation, θ stands for the angular resolution, λ for the wavelength of the light being used, and L is the diameter of the lens (or, to be precise, of its aperture). In the first of the three images above, the two points are well separated and, hence, the angle between them is well above the angular resolution. In the second, the angle between them just meets the Rayleigh criterion, and in the third the angle between them is smaller than the angular resolution and, hence, the two points are not resolved.
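For those who wonder where the 1.22 factor comes from: for a circular aperture, the Airy pattern is described by the Bessel function J1, and its first dark ring sits at the first positive zero of J1, divided by π. The sketch below checks that numerically with the standard library only. The function names, the bracketing interval [3, 4.5] and the number of series terms are my own choices for illustration, not part of any derivation in this post.

```python
import math

def bessel_j1(x: float, terms: int = 40) -> float:
    """Bessel function J1(x) summed from its power series (adequate for small x)."""
    total = 0.0
    for m in range(terms):
        coeff = (-1) ** m / (math.factorial(m) * math.factorial(m + 1))
        total += coeff * (x / 2) ** (2 * m + 1)
    return total

def first_zero_of_j1(lo: float = 3.0, hi: float = 4.5, tol: float = 1e-12) -> float:
    """Bisection search for the first positive zero of J1, assumed to lie in [lo, hi]."""
    f_lo = bessel_j1(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = bessel_j1(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid  # zero is in the upper half
        else:
            hi = mid               # zero is in the lower half
    return 0.5 * (lo + hi)

# The Rayleigh factor is the first zero of J1 expressed in units of pi.
rayleigh_factor = first_zero_of_j1() / math.pi
print(f"first zero of J1 = {first_zero_of_j1():.4f}, factor = {rayleigh_factor:.4f}")
```

So the 1.22 is not an arbitrary fudge factor: it is what you get when you replace the slit by a circular opening.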
Of course, the Rayleigh criterion is, to some extent, a matter of judgment. In fact, an English 19th century astronomer, named William Rutter Dawes, actually tested human observers on close binary stars of equal brightness, and found they could make out the two stars within an angle that was slightly narrower than the one given by the Rayleigh criterion. Hence, for an optical telescope, you’ll also find the simple θ = λ/L formula, so that’s the formula without the 1.22 factor (of course, λ here is, once again, the wavelength of the observed light or radiation, and L is the diameter of the telescope’s primary lens). This very simple formula allows us, for example, to calculate the diameter of the telescope lens we’d need to build to separate (see) objects in space with a resolution of, for example, 1 arcsec (i.e. 1/3600 of a degree or π/648,000 of a radian). Indeed, if we filter for yellow light only, which has a wavelength of 580 nm, we find L = 580×10⁻⁹ m/(π/648,000) ≈ 0.1196 m ≈ 12 cm. [Just so you know: that’s about the size of the lens aperture of a good telescope (4 or 6 inches) for amateur astronomers–just in case you’d want one. :-)]
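The telescope calculation is easy to reproduce. Here is a minimal sketch, with a helper function and parameter names that I made up for the occasion; the optional factor lets you compare the simple θ = λ/L formula with the Rayleigh version.

```python
import math

# One arcsecond is 1/3600 of a degree, i.e. pi/648,000 of a radian.
ARCSEC_IN_RAD = math.pi / 648_000

def aperture_diameter(wavelength_m: float, resolution_rad: float, factor: float = 1.0) -> float:
    """Lens diameter L (in metres) from theta = factor * lambda / L."""
    return factor * wavelength_m / resolution_rad

yellow = 580e-9  # wavelength of yellow light in metres

d_simple = aperture_diameter(yellow, ARCSEC_IN_RAD)                 # theta = lambda / L
d_rayleigh = aperture_diameter(yellow, ARCSEC_IN_RAD, factor=1.22)  # theta = 1.22 * lambda / L

print(f"simple formula:     L = {100 * d_simple:.1f} cm")
print(f"Rayleigh criterion: L = {100 * d_rayleigh:.1f} cm")
```

With the 1.22 factor, the required lens gets somewhat larger, but it remains in the range of a decent amateur telescope.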
This simplified formula is called Dawes’ limit, and you’ll often see it used instead of Rayleigh’s criterion. However, the fact that it’s exactly the same formula as our formula for the first minimum of the Airy pattern should not confuse you: angular resolution is something different.
Now, after this introduction, let me get to the real topic of this post: Heisenberg’s Uncertainty Principle according to Heisenberg.
Heisenberg’s Uncertainty Principle according to Heisenberg
I don’t know about you but, as a kid, I didn’t know much about waves and fields and all that, and so I had difficulty understanding why the resolving power of a microscope or any other magnifying device depended on the frequency or wavelength. I now know my understanding was limited because I thought the concept of the amplitude of an electromagnetic wave had some spatial meaning, like the amplitude of a water or a sound wave. You know what I mean: this false idea that an electromagnetic wave is something that sort of wriggles through space, just like a water or sound wave wriggles through its medium (water and air respectively). Now I know better: the amplitude of an electromagnetic wave measures field strength and there’s no medium (no aether). So it’s not like a wave going around some object, or making some medium oscillate. I am not ashamed to acknowledge my stupidity at the time: I am just happy I finally got it, because it helps to really understand Heisenberg’s own illustration of his Uncertainty Principle, which I’ll present now.
Heisenberg imagined a gamma-ray microscope, as shown below (I copied this from the website of the American Institute of Physics). Gamma-ray microscopes don’t exist – they’re hard to produce: you need a nuclear reactor or so 🙂 – but, as Heisenberg saw the development of new microscopes using higher and higher energy beams (as opposed to the 1.5-3 eV light in the visible spectrum) so as to increase the angular resolution and, hence, be able to see smaller things, he imagined one could use, perhaps, gamma-rays for imaging. Gamma rays are the hardest radiation, with frequencies of 10 exahertz and more (or >10¹⁹ Hz) and, hence, energies above 100 keV (i.e. 100,000 times more than photons in the visible light spectrum, and 1000 times more than the electrons used in an average electron microscope). Gamma rays are not the result of some electron jumping from a higher to a lower energy level: they are emitted in decay processes of atomic nuclei (gamma decay). But I am digressing. Back to the main story line. So Heisenberg imagined we could ‘shine’ gamma rays on an electron and that we could then ‘see’ that electron in the microscope because some of the gamma photons would indeed end up in the microscope after their ‘collision’ with the electron, as shown below.
The experiment is described in many places elsewhere but I found these accounts often confusing, and so I present my own here. 🙂
What Heisenberg basically meant to show is that this set-up would allow us to gather precise information on the position of the electron–because we would know where it was–but that, as a result, we’d lose information in regard to its momentum. Why? To put it simply: because the electron recoils as a result of the interaction. The point, of course, is to calculate the exact relationship between the two (position and momentum). In other words: what we want to do is to state the Uncertainty Principle quantitatively, not qualitatively.
Now, the animation above uses the symbol L for the γ-ray wavelength λ, which is confusing because I used L for the diameter of the aperture in my explanation of diffraction above. The animation above also uses a different symbol for the angular resolution: A instead of θ. So let me borrow the diagram used in the Wikipedia article and rephrase the whole situation.
From the diagram above, it’s obvious that, to be scattered into the microscope, the γ-ray photon must be scattered into a cone with angle ε. That angle is obviously related to the angular resolution of the microscope, which is θ = ε/2 = λ/D, with D the diameter of the aperture (i.e. the primary lens). Now, the electron could actually be anywhere, and the scattering angle could be much larger than ε, and, hence, relating D to the uncertainty in position (Δx) is not as obvious as most accounts of this thought experiment make it out to be. The thing is: if the scattering angle is larger than ε, the photon won’t reach the light detector at the end of the microscope (so that’s the flat top in the diagram above). So that’s why we can equate D with Δx: the position is uncertain by ± D/2, so the total spread is Δx = D. To put it differently: the assumption here is basically that this imaginary microscope ‘sees’ an area that is approximately as large as the lens. Using the small-angle approximation (so we write sin(ε/2) ≈ ε/2), we can solve ε/2 = λ/D for D and write:
Δx = 2λ/ε
Now, because of the recoil effect, the electron receives some momentum from the γ-ray photon. How much? Well… The situation is somewhat complicated (much more complicated than the Wikipedia article on this very same topic suggests), because the photon keeps some but also gives some of its original momentum. In fact, what’s happening really is Compton scattering: the electron first absorbs the photon, and then emits another with a different energy and, hence, also with different frequency and wavelength. However, what we do know is that the photon’s original momentum was equal to p = E/c = h/λ. That’s just the Planck relation or, if you’d want to look at the photon as a particle, the de Broglie equation.
Now, because we’re doing an analysis in one dimension only (x), we’re going to look at the momentum in this direction only, i.e. px, and we’ll assume that all of the momentum of the photon before the interaction (or ‘collision’ if you want) was horizontal. Hence, we can write px = h/λ. After the collision, however, this momentum is spread over the electron and the scattered or emitted photon that’s going into the microscope. Let’s now imagine the two extremes:
- The scattered photon goes to the left edge of the lens. Hence, its horizontal momentum is negative (because it moves to the left) and the momentum px will be distributed over the electron and the photon such that px = p’x – h(ε/2)/λ’. Why the ε/2 factor? Well… That’s just trigonometry: the scattered photon has a total momentum of h/λ’, and only a small fraction of that is horizontal. That fraction is sin(ε/2), which is approximately ε/2 for small angles.
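To get a feel for the numbers in the p = E/c = h/λ relation, here is a small sketch that computes the energy and momentum of a photon from its wavelength. The constants are the exact SI values for h, c and the electronvolt; the function names and the choice of 580 nm as the example wavelength are mine.

```python
PLANCK_H = 6.62607015e-34       # Planck constant in J*s (exact SI value)
LIGHT_SPEED = 299_792_458.0     # speed of light in m/s (exact SI value)
ELECTRONVOLT = 1.602176634e-19  # one electronvolt in joules (exact SI value)

def photon_momentum(wavelength_m: float) -> float:
    """Photon momentum p = h / lambda, in kg*m/s."""
    return PLANCK_H / wavelength_m

def photon_energy_ev(wavelength_m: float) -> float:
    """Photon energy E = p * c = h * c / lambda, converted to electronvolts."""
    return photon_momentum(wavelength_m) * LIGHT_SPEED / ELECTRONVOLT

yellow = 580e-9  # yellow light, wavelength in metres
print(f"p = {photon_momentum(yellow):.4e} kg*m/s")
print(f"E = {photon_energy_ev(yellow):.2f} eV")
```

The energy comes out at a bit more than 2 eV, which is right in the 1.5-3 eV range of visible light. Shrink the wavelength and both energy and momentum go up in proportion, which is exactly why a γ-ray photon gives the electron such a kick.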
- The scattered photon goes to the right edge of the lens. In that case, we write px = p”x + h(ε/2)/λ”.
Now, the spread in the momentum of the electron, which we’ll simply write as Δp, is obviously equal to:
Δp = |p”x – p’x| = |(px – h(ε/2)/λ”) – (px + h(ε/2)/λ’)| = h(ε/2)/λ” + h(ε/2)/λ’
That’s a nice formula, but what can we do with it? What we want is a relationship between Δx and Δp, i.e. the position and the momentum of the electron, and of the electron only. That involves another simplification, which is also dealt with very summarily – too summarily in my view – in most accounts of this experiment. So let me spell it out. The angle ε is obviously very small and, hence, we may equate λ’ and λ”. In addition, while these two wavelengths differ from the wavelength of the incoming photon, the scattered photon is, obviously, still a gamma ray and, therefore, we are probably not too far off when replacing both λ’ and λ” with λ, i.e. the wavelength of the incoming γ-ray. Now, we can re-write Δx = 2λ/ε as 1/Δx = ε/(2λ). We then get:
Δp = |p”x – p’x| = hε/(2λ”) + hε/(2λ’) ≈ 2hε/(2λ) = 2h/Δx
Now that yields ΔpΔx = 2h, which is an approximate expression of Heisenberg’s Uncertainty Principle indeed (don’t worry about the factor 2, as that’s something that comes with all of the approximations).
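The nice thing about this result is that neither the wavelength nor the angle survives in the product. A quick numerical sketch makes that visible: the function and the sample values for λ and ε below are purely illustrative choices of mine, and the formulas are the approximate ones from this post (Δx = 2λ/ε and Δp = hε/λ).

```python
PLANCK_H = 6.62607015e-34  # Planck constant in J*s

def microscope_uncertainties(wavelength_m: float, cone_angle_rad: float):
    """Return (delta_x, delta_p) for the idealized gamma-ray microscope."""
    delta_x = 2 * wavelength_m / cone_angle_rad          # position spread, 2*lambda/epsilon
    delta_p = PLANCK_H * cone_angle_rad / wavelength_m   # momentum spread, h*epsilon/lambda
    return delta_x, delta_p

# Try a few (wavelength, cone angle) pairs: the product stays at 2h every time.
for wavelength, epsilon in [(1e-12, 0.1), (1e-12, 0.01), (5e-12, 0.1)]:
    dx, dp = microscope_uncertainties(wavelength, epsilon)
    print(f"lambda = {wavelength:.0e} m, eps = {epsilon}: "
          f"dx = {dx:.2e} m, dp = {dp:.2e} kg*m/s, dx*dp/h = {dx * dp / PLANCK_H:.2f}")
```

A shorter wavelength or a wider cone sharpens the position, but the momentum spread grows by exactly the same factor.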
A final moot point perhaps: it is obviously a thought experiment. Not only because we don’t have gamma-ray microscopes (that’s not relevant because we can effectively imagine constructing one) but because the experiment involves only one photon. A real microscope would organize a proper beam, but that would obviously complicate the analysis. In fact, it would defeat the purpose, because the whole point is to analyze one single interaction here.
Now how should we interpret all of this? Is this Heisenberg’s ‘proof’ of his own Principle? Yes and no, I’d say. It’s part illustration, and part ‘proof’, I would say. The crucial assumptions here are:
- We can analyze γ-ray photons, or any photon for that matter, as particles having some momentum, and when ‘colliding’, or interacting, with an electron, the photon will impart some momentum to that electron.
- Momentum is being conserved and, hence, the total (linear) momentum before and after the collision, considering both particles–i.e. (1) the incoming ray and the electron before the interaction and (2) the emitted photon and the electron that’s getting the kick after the interaction–must be the same.
- For the γ-ray photon, we can relate (or associate, if you prefer that term) its wavelength λ with its momentum p through the Planck relation or, what amounts to the same for photons (because they have no mass), the de Broglie relation.
Now, these assumptions are then applied to an analysis of what we know to be true from experiment, and that’s the phenomenon of diffraction, part of which is the observation that the resolving power of a microscope is limited, and that its resolution is given by the θ = λ/D equation.
Bringing it all together then gives us a theory which is consistent with experiment and, hence, we then assume the theory is true. Why? Well… I could start a long discourse here on the philosophy of science but, when everything is said and done, we should admit we don’t have any ‘better’ theory.
But, you’ll say: what’s a ‘better’ theory? Well… Again, the answer to that question is the subject-matter of philosophers. As for me, I’d just refer to what’s known as Occam’s razor: among competing hypotheses, we should select the one with the fewest assumptions. Hence, while more complicated solutions may ultimately prove correct, the fewer assumptions that are made, the better. Now, when I was a kid, I thought quantum mechanics was very complicated and, hence, describing it here as a ‘simple’ theory sounds strange. But that’s what it is in the end: there’s no better (read: simpler) way to describe, for example, why electrons interfere with each other, and with themselves, when sending them through one or two slits, and so that’s what all these ‘illustrations’ want to show in the end, even if you think there must be a simpler way to describe reality. As said, as a kid, I thought so too. 🙂