Planck’s constant (II)

My previous post was tough. Tough for you–if you’ve read it. But tough for me too. 🙂

The blackbody radiation problem is complicated but, when everything is said and done, what the analysis says is that the ‘equipartition theorem’ in the kinetic theory of gases (or the ‘theorem concerning the average energy of the center-of-mass motion’, as Feynman terms it) is not correct. That equipartition theorem basically states that, in thermal equilibrium, energy is shared equally among all of its various forms. For example, the average kinetic energy per degree of freedom in the translational motion of a molecule should equal that of its rotational motions. The theorem is also quite precise: it states that the mean energy, for each atom or molecule, for each degree of freedom, is kT/2. Hence, that’s the (average) energy the 19th century scientists also assigned to each degree of freedom of the atomic oscillators in a gas: kT/2 of kinetic and kT/2 of potential energy, so kT per oscillator.

However, the discrepancy between the theoretical and empirical results of their work shows that adding atomic oscillators–as radiators and absorbers of light–to the system (a box of gas that’s being heated) is not just a matter of adding additional ‘degrees of freedom’ to the system. It can’t be analyzed in ‘classical’ terms: the actual spectrum of blackbody radiation shows that these atomic oscillators do not absorb, on average, an amount of energy equal to kT/2 per degree of freedom. Hence, they are not just another ‘independent direction of motion’.

So what are they then? Well… Who knows? I don’t. But, as I didn’t quite go through the full story in my previous post, the least I can do is to try to do that here. It should be worth the effort. In Feynman’s words: “This was the first quantum-mechanical formula ever known, or discussed, and it was the beautiful culmination of decades of puzzlement.” And then it does not involve complex numbers or wave functions, so that’s another reason why looking at the detail is kind of nice. 🙂

Discrete energy levels and the nature of h

To solve the blackbody radiation problem, Planck assumed that the permitted energy levels of the atomic harmonic oscillator were equally spaced, at ‘distances’ $\hbar\omega_0$ apart from each other. That’s what’s illustrated below.

[Figure: equally spaced energy levels]

Now, I don’t want to make too many digressions from the main story, but this $E_n = n\hbar\omega_0$ formula obviously deserves some attention. First note it immediately shows why the dimension of ħ is expressed in joule-seconds (J·s), or electronvolt-seconds (eV·s): we’re multiplying it with a frequency indeed, so that’s something expressed per second (hence, its dimension is $\mathrm{s}^{-1}$), in order to get a measure of energy: joules or, because of the atomic scale, electronvolts. [The eV is just a (much) smaller measure than the joule, but it amounts to the same: 1 eV ≈ $1.6 \times 10^{-19}$ J.]

One thing to note is that the equal spacing consists of distances equal to $\hbar\omega_0$, not of ħ. Hence, while h, or ħ (ħ is the constant to be used when the frequency is expressed in radians per second, rather than oscillations per second, so ħ = h/2π), is now being referred to as the quantum of action (das elementare Wirkungsquantum in German), Planck referred to it as a Hilfsgrösse only (that’s why he chose the h as a symbol, it seems), so that’s an auxiliary constant only: the actual quantum of energy is, of course, ΔE, i.e. the difference between the various energy levels, which is the product of ħ and $\omega_0$ (or of h and $\nu_0$ if we express frequency in oscillations per second, rather than in angular frequency). Hence, Planck (and later Einstein) did not assume that an atomic oscillator emits or absorbs packets of energy as tiny as ħ or h, but packets of energy as big as $\hbar\omega_0$ or, what amounts to the same ($\hbar\omega = (h/2\pi)(2\pi\nu) = h\nu$), $h\nu_0$. Just to give an example, the frequency of sodium light (ν) is about $500 \times 10^{12}$ Hz, and so its energy is E = hν. That’s not a lot–about 2 eV only–but, numerically, it is still $500 \times 10^{12}$ times the value of h!
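A quick numerical check of that sodium example can be sketched as follows. The function and variable names are my own (hypothetical, they are not from any library), and the constants are the exact SI values of h and of the electronvolt.

```python
# Energy of a single photon, E = h·ν, expressed in electronvolts.
# All names are illustrative assumptions; the constants are exact SI values.

PLANCK_H = 6.62607015e-34        # Planck's constant h, in J·s
JOULE_PER_EV = 1.602176634e-19   # one electronvolt, in joules


def photon_energy_ev(frequency_hz: float) -> float:
    """Return E = h·ν in eV for a frequency in Hz (oscillations per second)."""
    return PLANCK_H * frequency_hz / JOULE_PER_EV


sodium_frequency_hz = 500e12  # the rounded value used in the text
sodium_energy_ev = photon_energy_ev(sodium_frequency_hz)
print(f"E = {sodium_energy_ev:.2f} eV")
```

The result should come out at roughly 2 eV, in line with the rounded figure above.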

Another thing is that ω (or ν) is a continuous variable: hence, the assumption of equally spaced energy levels does not imply that energy itself is a discrete variable: light can have any frequency and, hence, we can also imagine photons with any energy level: the only thing we’re saying is that the energy of light of a specific color (i.e. a specific frequency ν) will be a multiple of hν.

Probability assumptions

The second key assumption of Planck as he worked towards a solution of the blackbody radiation problem was that the probability (P) of occupying a level of energy E is $P(E) \propto e^{-E/kT}$. OK… Why not? But what is this assumption really? You’ll think of some ‘bell curve’, of course. But… No. That wouldn’t make sense. Remember that the energy has to be positive. The general shape of this P(E) curve is shown below.

[Figure: graph of P(E) as a function of E]

The highest probability density is near E = 0, and then it goes down as E gets larger, with kT determining the slope of the curve (just take the derivative). In short, this assumption basically states that higher energy levels are not so likely, and that very high energy levels are very unlikely. Indeed, this formula implies that the relative chance, i.e. the probability of being in state $E_1$ relative to the chance of being in state $E_0$, is $P_1/P_0 = e^{-(E_1 - E_0)/kT} = e^{-\Delta E/kT}$. Now, $P_1$ is $n_1/N$ and $P_0$ is $n_0/N$ and, hence, we find that $n_1$ must be equal to $n_0 e^{-\Delta E/kT}$. What this means is that the atomic oscillator is less likely to be in a higher energy state than in a lower one.
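To get a feel for how steeply that ratio drops, here is a minimal sketch. The function name and the numbers (a 2 eV gap, two temperatures) are my own illustrative assumptions; only the exp(−ΔE/kT) law comes from the text.

```python
import math

BOLTZMANN_K_EV = 8.617333262e-5  # Boltzmann's constant k, in eV/K


def relative_population(delta_e_ev: float, temperature_k: float) -> float:
    """Return n1/n0 = exp(-ΔE/kT) for an energy gap ΔE (eV) at temperature T (K)."""
    return math.exp(-delta_e_ev / (BOLTZMANN_K_EV * temperature_k))


# Illustrative (assumed) numbers: a 2 eV gap at room temperature and at 6000 K.
for temperature_k in (300.0, 6000.0):
    ratio = relative_population(2.0, temperature_k)
    print(f"T = {temperature_k:6.0f} K  ->  n1/n0 = {ratio:.3e}")
```

At room temperature the higher state is, for all practical purposes, empty. At a few thousand kelvin it starts to get populated, but it still remains much less likely than the lower state.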

That makes sense, doesn’t it? I mean… I don’t want to criticize those 19th century scientists but… What were they thinking? Did they really imagine that infinite energy levels were as likely as… Well… More down-to-earth energy levels? I mean… A mechanical spring will break when you overload it. Hence, I’d think it’s pretty obvious those atomic oscillators cannot be loaded with just about anything, can they? Garbage in, garbage out:  of course, that theoretical spectrum of blackbody radiation didn’t make sense!

Let me copy Feynman now, as the rest of the story is pretty straightforward:

Now, we have a lot of oscillators here, and each is a vibrator of frequency $\omega_0$. Some of these vibrators will be in the bottom quantum state, some will be in the next one, and so forth. What we would like to know is the average energy of all these oscillators. To find out, let us calculate the total energy of all the oscillators and divide by the number of oscillators. That will be the average energy per oscillator in thermal equilibrium, and will also be the energy that is in equilibrium with the blackbody radiation and that should go in the equation for the intensity of the radiation as a function of the frequency, instead of kT. [See my previous post: that equation is $I(\omega) = \omega^2 kT/(\pi^2 c^2)$.]

Thus we let $N_0$ be the number of oscillators that are in the ground state (the lowest energy state); $N_1$ the number of oscillators in the state $E_1$; $N_2$ the number that are in state $E_2$; and so on. According to the hypothesis (which we have not proved) that in quantum mechanics the law that replaced the probability $e^{-\mathrm{P.E.}/kT}$ or $e^{-\mathrm{K.E.}/kT}$ in classical mechanics is that the probability goes down as $e^{-\Delta E/kT}$, where ΔE is the excess energy, we shall assume that the number $N_1$ that are in the first state will be the number $N_0$ that are in the ground state, times $e^{-\hbar\omega/kT}$. Similarly, $N_2$, the number of oscillators in the second state, is $N_2 = N_0 e^{-2\hbar\omega/kT}$. To simplify the algebra, let us call $e^{-\hbar\omega/kT} = x$. Then we simply have $N_1 = N_0 x$, $N_2 = N_0 x^2$, …, $N_n = N_0 x^n$.

The total energy of all the oscillators must first be worked out. If an oscillator is in the ground state, there is no energy. If it is in the first state, the energy is ħω, and there are $N_1$ of them. So $N_1\hbar\omega$, or $\hbar\omega N_0 x$ is how much energy we get from those. Those that are in the second state have 2ħω, and there are $N_2$ of them, so $N_2 \cdot 2\hbar\omega = 2\hbar\omega N_0 x^2$ is how much energy we get, and so on. Then we add it all together to get $E_\text{tot} = N_0\hbar\omega(0 + x + 2x^2 + 3x^3 + \dots)$.

And now, how many oscillators are there? Of course, $N_0$ is the number that are in the ground state, $N_1$ in the first state, and so on, and we add them together: $N_\text{tot} = N_0(1 + x + x^2 + x^3 + \dots)$. Thus the average energy is

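$$\langle E \rangle = \frac{E_\text{tot}}{N_\text{tot}} = \frac{N_0\hbar\omega(0 + x + 2x^2 + 3x^3 + \dots)}{N_0(1 + x + x^2 + x^3 + \dots)}$$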

Now the two sums which appear here we shall leave for the reader to play with and have some fun with. When we are all finished summing and substituting for x in the sum, we should get—if we make no mistakes in the sum—
$$\langle E \rangle = \frac{\hbar\omega}{e^{\hbar\omega/kT} - 1}$$

Feynman concludes as follows: “This, then, was the first quantum-mechanical formula ever known, or ever discussed, and it was the beautiful culmination of decades of puzzlement. Maxwell knew that there was something wrong, and the problem was, what was right? Here is the quantitative answer of what is right instead of kT. This expression should, of course, approach kT as ω → 0 or as T → ∞.”
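If you’d rather let a computer ‘play with the sums’, the following sketch compares the ratio of the two (truncated) sums with the closed-form expression ħω/(e^(ħω/kT) − 1), and also looks at the low-frequency limit. The function names, the truncation at 200 terms and the choice of units (kT = 1) are my own assumptions for the illustration.

```python
import math


def mean_energy_by_summation(hbar_omega: float, kT: float, n_terms: int = 200) -> float:
    """Average energy from the truncated sums E_tot/N_tot (the factor N0 cancels)."""
    x = math.exp(-hbar_omega / kT)
    weighted_sum = sum(n * x**n for n in range(n_terms))  # 0 + x + 2x^2 + 3x^3 + ...
    plain_sum = sum(x**n for n in range(n_terms))         # 1 + x + x^2 + x^3 + ...
    return hbar_omega * weighted_sum / plain_sum


def mean_energy_closed_form(hbar_omega: float, kT: float) -> float:
    """Closed-form mean energy: ħω / (exp(ħω/kT) - 1)."""
    return hbar_omega / math.expm1(hbar_omega / kT)


kT = 1.0  # work in units where kT = 1 (an arbitrary choice)

# The truncated sums and the closed form should agree when ħω is comparable to kT.
for hbar_omega in (1.0, 0.5):
    summed = mean_energy_by_summation(hbar_omega, kT)
    closed = mean_energy_closed_form(hbar_omega, kT)
    print(f"ħω/kT = {hbar_omega}: sums give {summed:.6f}, closed form gives {closed:.6f}")

# For ħω much smaller than kT, the closed form should approach the classical value kT.
low_frequency_mean = mean_energy_closed_form(1e-6, kT)
print(f"ħω/kT = 1e-6: <E>/kT = {low_frequency_mean:.6f}")
```

The truncation only works well when x is not too close to 1, which is why the low-frequency limit is checked with the closed form rather than with the sums.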

It does, of course. And so Planck’s analysis does result in a theoretical I(ω) curve that matches the observed I(ω) curve as a function of both temperature (T) and frequency (ω). But so what is it, then? What’s the equation describing the dotted curves? It’s given below:

$$I(\omega)\,d\omega = \frac{\hbar\omega^3\,d\omega}{\pi^2 c^2\left(e^{\hbar\omega/kT} - 1\right)}$$

I’ll just quote Feynman once again to explain the shape of those dotted curves: “We see that for a large ω, even though we have $\omega^3$ in the numerator, there is an e raised to a tremendous power in the denominator, so the curve comes down again and does not “blow up”–we do not get ultraviolet light and x-rays where we do not expect them!”
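To see the ‘coming down’ in numbers, one can compare the classical intensity ω²kT/(π²c²) with the quantum-mechanical one, ħω³/[π²c²(e^(ħω/kT) − 1)]. The sketch below does that; the function names, the temperature of 5000 K and the three sample frequencies are assumptions of mine, chosen for illustration only.

```python
import math

HBAR = 1.054571817e-34       # reduced Planck constant ħ, in J·s
BOLTZMANN_K = 1.380649e-23   # Boltzmann's constant k, in J/K
LIGHT_SPEED = 299_792_458.0  # speed of light c, in m/s


def classical_intensity(omega: float, temperature_k: float) -> float:
    """Classical result: I(ω) = ω²kT / (π²c²)."""
    return omega**2 * BOLTZMANN_K * temperature_k / (math.pi**2 * LIGHT_SPEED**2)


def planck_intensity(omega: float, temperature_k: float) -> float:
    """Planck's result: I(ω) = ħω³ / [π²c²(exp(ħω/kT) - 1)]."""
    exponent = HBAR * omega / (BOLTZMANN_K * temperature_k)
    return HBAR * omega**3 / (math.pi**2 * LIGHT_SPEED**2 * math.expm1(exponent))


temperature_k = 5000.0  # assumed temperature
for omega in (1e12, 1e15, 1e16):  # angular frequencies in rad/s
    ratio = planck_intensity(omega, temperature_k) / classical_intensity(omega, temperature_k)
    print(f"ω = {omega:.0e} rad/s  ->  Planck/classical = {ratio:.3e}")
```

At low frequencies the ratio should be close to 1, while at high frequencies the exponential in the denominator makes it collapse towards zero.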

Is the analysis necessarily discrete?

One question I can’t answer, because I just am not strong enough in math, is the question of whether or not there would be any other way to derive the actual blackbody spectrum. I mean… This analysis obviously makes sense and, hence, provides a theory that’s consistent and in accordance with experiment. However, the question whether or not it would be possible to develop another theory, without having recourse to the assumption that energy levels in atomic oscillators are discrete and equally spaced with the ‘distance’ between them equal to $h\nu_0$, is not easy to answer. I surely can’t, as I am just a novice, but I can imagine smarter people than me have thought about this question. The answer must be negative, because I don’t know of any other theory: quantum mechanics obviously prevailed. Still… I’d be interested to see the alternatives that must have been considered.

Post scriptum: The “playing with the sums” is a bit confusing. The key to the formula above is the substitution of $(0 + x + 2x^2 + 3x^3 + \dots)/(1 + x + x^2 + x^3 + \dots)$ by $1/[(1/x) - 1] = 1/[e^{\hbar\omega/kT} - 1]$. Now, the denominator $1 + x + x^2 + x^3 + \dots$ is the Maclaurin series for 1/(1–x). So we have:

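$$\begin{aligned}
\frac{0 + x + 2x^2 + 3x^3 + \dots}{1 + x + x^2 + x^3 + \dots} &= (0 + x + 2x^2 + 3x^3 + \dots)(1 - x) \\
&= x + 2x^2 + 3x^3 + \dots - x^2 - 2x^3 - 3x^4 - \dots \\
&= x + x^2 + x^3 + x^4 + \dots \\
&= -1 + (1 + x + x^2 + x^3 + \dots) = -1 + \frac{1}{1 - x} = \frac{-(1 - x) + 1}{1 - x} = \frac{x}{1 - x}
\end{aligned}$$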

Note the tricky bit: if $x = e^{-\hbar\omega/kT}$, then $e^{\hbar\omega/kT}$ is $x^{-1} = 1/x$, and so we have (1/x)–1 in the denominator of that (mean) energy formula, not 1/(x–1). Now 1/[(1/x)–1] = 1/[(1–x)/x] = x/(1–x), indeed, and so the formula comes out alright.

Diffraction and the Uncertainty Principle (II)

In my previous post, I derived and explained the general formula for the pattern generated by a light beam going through a slit or a circular aperture: the diffraction pattern. For light going through an aperture, this generates the so-called Airy pattern. In practice, diffraction causes a blurring of the image, and may make it difficult to distinguish two separate points, as shown below (credit for the image must go to Wikipedia again, I am afraid).

[Figure: Airy disk spacing near the Rayleigh criterion]

What’s actually going on is that the lens acts as a slit or, if it’s circular (which is usually the case), as an aperture indeed: the wavefront of the transmitted light is taken to be spherical or plane when it exits the lens and interferes with itself, thereby creating the ring-shaped diffraction pattern that we explained in the previous post.

The spatial resolution is also known as the angular resolution, which is quite appropriate, because it refers to an angle indeed: we know the first minimum (i.e. the first black ring) occurs at an angle θ such that sinθ = λ/L, with λ the wavelength of the light and L the lens diameter. It’s good to remind ourselves of the geometry of the situation: below we picture the array of oscillators, and so we know that the first minimum occurs at an angle such that Δ = λ. The second, third, fourth, etc. minima occur at angles θ such that Δ = 2λ, 3λ, 4λ, etc. However, these secondary minima do not play any role in determining the resolving power of a lens, or a telescope, or an electron microscope, etc., and so you can just forget about them for the time being.

[Figure: geometry of the array of oscillators]

For small angles (expressed in radians), we can use the so-called small-angle approximation and equate sinθ with θ: the error of this approximation is less than one percent for angles smaller than 0.244 radians (14°), so we have the amazingly simple result that the first minimum occurs at an angle θ such that:

θ = λ/L
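The ‘less than one percent’ claim for the small-angle approximation is easy to check. The sketch below is only an illustration: the function name and the sample angles are my own choices.

```python
import math


def small_angle_relative_error(theta_rad: float) -> float:
    """Relative error made when replacing sin(θ) by θ (θ in radians)."""
    return (theta_rad - math.sin(theta_rad)) / math.sin(theta_rad)


for theta_rad in (0.05, 0.1, 0.244, 0.5):
    error_percent = 100.0 * small_angle_relative_error(theta_rad)
    print(f"θ = {theta_rad:5.3f} rad ({math.degrees(theta_rad):5.1f}°): error = {error_percent:.3f} %")
```

The error grows roughly as θ²/6, so it stays just under one percent at 0.244 radians and climbs quickly beyond that.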

Spatial resolution of a microscope: the Rayleigh criterion versus Dawes’ limit 

If we have two point sources right next to each other, they will create two Airy disks, as shown above, which may overlap. That may make it difficult to see them, in a telescope, a microscope, or whatever device. Hence, telescopes, microscopes (using light or electron beams or whatever) have a limited resolving power. How do we measure that?

The so-called Rayleigh criterion regards two point sources as just resolved when the principal diffraction maximum of one image coincides with the first minimum of the other, as shown below. If the angular separation is greater, the two points are (very) well resolved, and if it is smaller, they are regarded as not resolved. This angle is obviously related to the θ = λ/L angle but it’s not the same: in fact, it’s a slightly wider angle. The analysis involved in calculating this angular resolution (we use the same symbol θ for it) is quite complicated and so I’ll skip that and just give you the result:

θ = 1.22λ/L

[Figures: two point sources; the Rayleigh criterion]

Note that, in this equation, θ stands for the angular resolution, λ for the wavelength of the light being used, and L is the diameter of (the aperture of) the lens. In the first of the three images above, the two points are well separated and, hence, the angle between them is well above the angular resolution. In the second, the angle between them just meets the Rayleigh criterion, and in the third the angle between them is smaller than the angular resolution and, hence, the two points are not resolved.

Of course, the Rayleigh criterion is, to some extent, a matter of judgment. In fact, an English 19th century astronomer, named William Rutter Dawes, actually tested human observers on close binary stars of equal brightness, and found they could make out the two stars within an angle that was slightly narrower than the one given by the Rayleigh criterion. Hence, for an optical telescope, you’ll also find the simple θ = λ/L formula, so that’s the formula without the 1.22 factor (of course, λ here is, once again, the wavelength of the observed light or radiation, and L is the diameter of the telescope’s primary lens). This very simple formula allows us, for example, to calculate the diameter of the telescope lens we’d need to build to separate (see) objects in space with a resolution of, for example, 1 arcsec (i.e. 1/3600 of a degree or π/648,000 of a radian). Indeed, if we filter for yellow light only, which has a wavelength of 580 nm, we find L = 580×10⁻⁹ m/(π/648,000) = 0.119633 m ≈ 12 cm. [Just so you know: that’s about the size of the lens aperture of a good telescope (4 or 6 inches) for amateur astronomers–just in case you’d want one. :-)]
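The arithmetic of that telescope example can be redone in a few lines, and the same function gives the slightly larger lens one would need if the Rayleigh criterion (with its 1.22 factor) were applied instead. The function and variable names are, once again, hypothetical names of my own.

```python
import math

ARCSEC_IN_RAD = math.pi / 648_000  # one arcsecond, expressed in radians


def aperture_diameter_m(wavelength_m: float, resolution_rad: float, factor: float = 1.0) -> float:
    """Lens diameter L that solves θ = factor·λ/L.

    factor = 1.0 gives the simple θ = λ/L formula, factor = 1.22 the Rayleigh criterion.
    """
    return factor * wavelength_m / resolution_rad


yellow_light_m = 580e-9  # wavelength of yellow light, as used in the text

simple_diameter_m = aperture_diameter_m(yellow_light_m, ARCSEC_IN_RAD)
rayleigh_diameter_m = aperture_diameter_m(yellow_light_m, ARCSEC_IN_RAD, factor=1.22)

print(f"θ = λ/L      ->  L = {100 * simple_diameter_m:.1f} cm")
print(f"θ = 1.22λ/L  ->  L = {100 * rayleigh_diameter_m:.1f} cm")
```

So the simple formula gives a lens of about 12 cm, and the Rayleigh criterion asks for one that is 22% wider for the same 1 arcsec resolution.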

This simplified formula is called Dawes’ limit, and you’ll often see it used instead of Rayleigh’s criterion. However, the fact that it’s exactly the same formula as our formula for the first minimum of the Airy pattern should not confuse you: angular resolution is something different.

Now, after this introduction, let me get to the real topic of this post: Heisenberg’s Uncertainty Principle according to Heisenberg.

Heisenberg’s Uncertainty Principle according to Heisenberg

I don’t know about you but, as a kid, I didn’t know much about waves and fields and all that, and so I had difficulty understanding why the resolving power of a microscope or any other magnifying device depended on the frequency or wavelength. I now know my understanding was limited because I thought the concept of the amplitude of an electromagnetic wave had some spatial meaning, like the amplitude of a water or a sound wave. You know what I mean: this false idea that an electromagnetic wave is something that sort of wriggles through space, just like a water or sound wave wriggles through its medium (water and air respectively). Now I know better: the amplitude of an electromagnetic wave measures field strength and there’s no medium (no aether). So it’s not like a wave going around some object, or making some medium oscillate. I am not ashamed to acknowledge my stupidity at the time: I am just happy I finally got it, because it helps to really understand Heisenberg’s own illustration of his Uncertainty Principle, which I’ll present now.

Heisenberg imagined a gamma-ray microscope, as shown below (I copied this from the website of the American Institute of Physics). Gamma-ray microscopes don’t exist – they’re hard to produce: you need a nuclear reactor or so 🙂 – but, as Heisenberg saw the development of new microscopes using higher and higher energy beams (as opposed to the 1.5-3 eV light in the visible spectrum) so as to increase the angular resolution and, hence, be able to see smaller things, he imagined one could use, perhaps, gamma rays for imaging. Gamma rays are the hardest radiation, with frequencies of 10 exahertz and more (or >10¹⁹ Hz) and, hence, energies above 100 keV (i.e. 100,000 times more than photons in the visible light spectrum, and 1000 times more than the electrons used in an average electron microscope). Gamma rays are not the result of some electron jumping from a higher to a lower energy level: they are emitted in decay processes of atomic nuclei (gamma decay). But I am digressing. Back to the main story line. So Heisenberg imagined we could ‘shine’ gamma rays on an electron and that we could then ‘see’ that electron in the microscope because some of the gamma photons would indeed end up in the microscope after their ‘collision’ with the electron, as shown below.

[Animation: Heisenberg’s gamma-ray microscope]

The experiment is described in many places elsewhere but I found these accounts often confusing, and so I present my own here. 🙂

What Heisenberg basically meant to show is that this set-up would allow us to gather precise information on the position of the electron–because we would know where it was–but that, as a result, we’d lose information in regard to its momentum. Why? To put it simply: because the electron recoils as a result of the interaction. The point, of course, is to calculate the exact relationship between the two (position and momentum). In other words: what we want to do is to state the Uncertainty Principle quantitatively, not qualitatively.

Now, the animation above uses the symbol L for the γ-ray wavelength λ, which is confusing because I used L for the diameter of the aperture in my explanation of diffraction above. The animation above also uses a different symbol for the angular resolution: A instead of θ. So let me borrow the diagram used in the Wikipedia article and rephrase the whole situation.

[Figure: Heisenberg’s microscope]

From the diagram above, it’s obvious that, to be scattered into the microscope, the γ-ray photon must be scattered into a cone with angle ε. That angle is obviously related to the angular resolution of the microscope, which is θ = ε/2 = λ/D, with D the diameter of the aperture (i.e. the primary lens). Now, the electron could actually be anywhere, and the scattering angle could be much larger than ε, and, hence, relating D to the uncertainty in position (Δx) is not as obvious as most accounts of this thought experiment make it out to be. The thing is: if the scattering angle is larger than ε, the photon won’t reach the light detector at the end of the microscope (so that’s the flat top in the diagram above). So that’s why we can equate D with Δx: the position is known to within ±D/2, so we write Δx = D. To put it differently: the assumption here is basically that this imaginary microscope ‘sees’ an area that is approximately as large as the lens. Using the small-angle approximation (so we write sin(ε/2) ≈ ε/2) and D = 2λ/ε, we can write:

Δx = 2λ/ε

Now, because of the recoil effect, the electron receives some momentum from the γ-ray photon. How much? Well… The situation is somewhat complicated (much more complicated than the Wikipedia article on this very same topic suggests), because the photon keeps some but also gives some of its original momentum. In fact, what’s happening really is Compton scattering: the electron first absorbs the photon, and then emits another with a different energy and, hence, also with different frequency and wavelength. However, what we do know is that the photon’s original momentum was equal to E/c = p = h/λ. That’s just the Planck relation or, if you’d want to look at the photon as a particle, the de Broglie equation.

Now, because we’re doing an analysis in one dimension only (x), we’re going to look at the momentum in this direction only, i.e. px, and we’ll assume that all of the momentum of the photon before the interaction (or ‘collision’ if you want) was horizontal. Hence, we can write px = h/λ. After the collision, however, this momentum is spread over the electron and the scattered or emitted photon that’s going into the microscope. Let’s now imagine the two extremes:

  1. The scattered photon goes to the left edge of the lens. Hence, its horizontal momentum is negative (because it moves to the left) and the momentum px will be distributed over the electron and the photon such that px = p’ – h(ε/2)/λ’, with p’ the momentum of the electron after the collision. Why the ε/2 factor? Well… That’s just trigonometry: the horizontal momentum of the scattered photon is obviously only a tiny fraction of its total momentum h/λ’, and that fraction is given by the angle ε/2 (strictly speaking by sin(ε/2), which is approximately ε/2 for small angles).
  2. The scattered photon goes to the right edge of the lens. In that case, we write px = p” + h(ε/2)/λ”.

Now, the spread in the momentum of the electron, which we’ll simply write as Δp, is obviously equal to:

Δp = |p” – p’| = |[px – h(ε/2)/λ”] – [px + h(ε/2)/λ’]| = h(ε/2)/λ” + h(ε/2)/λ’

That’s a nice formula, but what can we do with it? What we want is a relationship between Δx and Δp, i.e. the position and the momentum of the electron, and of the electron only. That involves another simplification, which is also dealt with very summarily – too summarily in my view – in most accounts of this experiment. So let me spell it out. The angle ε is obviously very small and, hence, we may equate λ’ and λ”. In addition, while these two wavelengths differ from the wavelength of the incoming photon, the scattered photon is, obviously, still a gamma ray and, therefore, we are probably not too far off when replacing both λ’ and λ” by λ, i.e. the wavelength of the incoming γ-ray. Now, we can re-write Δx = 2λ/ε as 1/Δx = ε/(2λ). We then get:

Δp = |p” – p’| = hε/(2λ”) + hε/(2λ’) ≈ 2hε/(2λ) = 2h/Δx

Now that yields ΔpΔx = 2h, which is an approximate expression of Heisenberg’s Uncertainty Principle indeed (don’t worry about the factor 2, as that’s something that comes with all of the approximations).
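A short numerical sketch makes the point that the product does not depend on the wavelength or on the angle we choose: the better the resolution, the bigger the kick. The function names and the sample values (a 1 picometer γ-ray, three cone angles) are assumptions of mine, for illustration only.

```python
PLANCK_H = 6.62607015e-34  # Planck's constant h, in J·s


def position_spread_m(wavelength_m: float, cone_angle_rad: float) -> float:
    """Δx = 2λ/ε, the position uncertainty taken from the resolving power."""
    return 2.0 * wavelength_m / cone_angle_rad


def momentum_spread(wavelength_m: float, cone_angle_rad: float) -> float:
    """Δp = hε/λ (in kg·m/s), obtained after setting λ' = λ'' = λ."""
    return PLANCK_H * cone_angle_rad / wavelength_m


gamma_wavelength_m = 1e-12  # assumed γ-ray wavelength (1 pm)
for cone_angle_rad in (0.01, 0.05, 0.1):
    delta_x = position_spread_m(gamma_wavelength_m, cone_angle_rad)
    delta_p = momentum_spread(gamma_wavelength_m, cone_angle_rad)
    product_in_units_of_h = delta_x * delta_p / PLANCK_H
    print(f"ε = {cone_angle_rad:4.2f} rad: Δx = {delta_x:.2e} m, "
          f"Δp = {delta_p:.2e} kg·m/s, ΔxΔp = {product_in_units_of_h:.2f} h")
```

A wider cone (a bigger lens) narrows Δx but widens Δp by exactly the same factor, so the product stays at 2h within the approximations made above.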

A final moot point perhaps: it is obviously a thought experiment. Not only because we don’t have gamma-ray microscopes (that’s not relevant because we can effectively imagine constructing one) but because the experiment involves only one photon. A real microscope would organize a proper beam, but that would obviously complicate the analysis. In fact, it would defeat the purpose, because the whole point is to analyze one single interaction here.

The interpretation

Now how should we interpret all of this? Is this Heisenberg’s ‘proof’ of his own Principle? Yes and no, I’d say. It’s part illustration, and part ‘proof’, I would say. The crucial assumptions here are:

  1. We can analyze γ-ray photons, or any photon for that matter, as particles having some momentum, and when ‘colliding’, or interacting, with an electron, the photon will impart some momentum to that electron.
  2. Momentum is being conserved and, hence, the total (linear) momentum before and after the collision, considering both particles–i.e. (1) the incoming ray and the electron before the interaction and (2) the emitted photon and the electron that’s getting the kick after the interaction–must be the same.
  3. For the γ-ray photon, we can relate (or associate, if you prefer that term) its wavelength λ with its momentum p through the Planck relation or, what amounts to the same for photons (because they have no mass), the de Broglie relation.

Now, these assumptions are then applied to an analysis of what we know to be true from experiment, and that’s the phenomenon of diffraction, part of which is the observation that the resolving power of a microscope is limited, and that its resolution is given by the θ = λ/D equation.

Bringing it all together then gives us a theory which is consistent with experiment and, hence, we then assume the theory is true. Why? Well… I could start a long discourse here on the philosophy of science but, when everything is said and done, we should admit we don’t have any ‘better’ theory.

But, you’ll say: what’s a ‘better’ theory? Well… Again, the answer to that question is the subject-matter of philosophers. As for me, I’d just refer to what’s known as Occam’s razor: among competing hypotheses, we should select the one with the fewest assumptions. Hence, while more complicated solutions may ultimately prove correct, the fewer assumptions that are made, the better. Now, when I was a kid, I thought quantum mechanics was very complicated and, hence, describing it here as a ‘simple’ theory sounds strange. But that’s what it is in the end: there’s no better (read: simpler) way to describe, for example, why electrons interfere with each other, and with themselves, when sending them through one or two slits, and so that’s what all these ‘illustrations’ want to show in the end, even if you think there must be a simpler way to describe reality. As said, as a kid, I thought so too. 🙂