Planck’s constant (I)

Pre-script (dated 26 June 2020): This post did not suffer too much from the attack on this blog by the dark force. It remains relevant. If anything, the removal of material will help you to think things through for yourself. 🙂

Original post:

If you made it here, it means you’re totally fed up with all of the easy stories on quantum mechanics: diffraction, double-slit experiments, imaginary gamma-ray microscopes,… You’ve had it! You now know what quantum mechanics is all about, and you’ve realized all these thought experiments never answer the tough question: where did Planck find that constant (h) which pops up everywhere? And how did he find that Planck relation which seems to underpin all and everything in quantum mechanics?

If you don’t know, that’s because you’ve skipped the blackbody radiation story. So let me give it to you here. What’s blackbody radiation?

Thermal equilibrium of radiation

That’s what the blackbody radiation problem is about: thermal equilibrium of radiation.

Huh? 

Yes. Imagine a box with gas inside. You’ll often see it’s described as a furnace, because we heat the box. Hence, the box, and everything inside, acquires a certain temperature, which we then assume to be constant. The gas inside will absorb energy and start emitting radiation, because the gas atoms or molecules are atomic oscillators. Hence, we have electrons getting excited and then jumping up and down from higher to lower energy levels, and then again and again and again, thereby emitting photons with a certain energy and, hence, light of a certain frequency. To put it simply: we’ll find light with various frequencies in the box and, in thermal equilibrium, we should have some distribution of the intensity of the light according to the frequency: what kind of radiation do we find in the furnace? Well… Let’s find out.

The assumption is that the box walls send light back, or that the box has mirror walls. So we assume that all the radiation keeps running around in the box. Now that implies that the atomic oscillators not only radiate energy, but also receive energy, because they’re constantly being illuminated by radiation that comes straight back at them. If the temperature of the box is kept constant, we arrive at a situation which is referred to as thermal equilibrium. In Feynman’s words: “After a while there is a great deal of light rushing around in the box, and although the oscillator is radiating some, the light comes back and returns some of the energy that was radiated.”

OK. That’s easy enough to understand. However, the actual analysis of this equilibrium situation is what gave rise to the ‘problem’ of blackbody radiation in the 19th century which, as you know, led Planck and Einstein to develop a quantum-mechanical view of things. It turned out that the classical analysis predicted a distribution of the intensity of light that didn’t make sense, and no matter how you looked at it, it just didn’t come out right. Theory and experiment did not agree. Now, that is something very serious in science, as you know, because it means your theory isn’t right. In this case, it was disastrous, because it meant the whole of classical theory wasn’t right.

To be frank, the analysis is not all that easy. It involves all that I’ve learned so far: the math behind oscillators and interference, statistics, the so-called kinetic theory of gases and what have you. I’ll try to summarize the story but you’ll see it requires quite an introduction.

Kinetic energy and temperature

The kinetic theory of gases is part of what’s referred to as statistical mechanics: we look at a gas as a large number of inter-colliding atoms and we describe what happens in terms of the collisions between them. As Feynman puts it: “Fundamentally, we assert that the gross properties of matter should be explainable in terms of the motion of its parts.” Now, we can do a lot of intellectual gymnastics, analyzing one gas in one box, two gases in one box, two gases in one box with a piston between them, two gases in two boxes with a hole in the wall between them, and so on and so on, but that would only distract us here. The rather remarkable conclusion of such exercises, which you’ll surely remember from your high school days, is that:

  1. Equal volumes of different gases, at the same pressure and temperature, will have the same number of molecules.
  2. In such view of things, temperature is actually nothing but the mean kinetic energy of those molecules (or atoms if it’s a monatomic gas).

So we can actually measure temperature in terms of the kinetic energy of the molecules of the gas, which, as you know, equals mv²/2, with m the mass and v the velocity of the gas molecules. Hence, we’re tempted to define some absolute measure of temperature T and simply write:

T = 〈mv²/2〉

The 〈 and 〉 brackets denote the mean here. To be precise, we’re averaging the square of the velocity, so it’s the root mean square, aka the quadratic mean, of the velocity that matters, because we want to average the magnitude of a varying quantity. Of course, the mass of different gases will be different – and so we have 〈m1v1²/2〉 for gas 1 and 〈m2v2²/2〉 for gas 2 – but that doesn’t matter: we can, actually, imagine measuring temperature in joule, the unit of energy, including kinetic energy. Indeed, the units come out alright: 1 joule = 1 kg·(m²/s²). For historical reasons, however, T is measured in different units: degrees Kelvin, centigrade (i.e. degrees Celsius) or, in the US, Fahrenheit. Now, we can easily go from one measure to the other, as you know, and, hence, here I should probably just jot down the so-called ideal gas law–because we need that law for the subsequent analysis of blackbody radiation–and get on with it:

PV = NkT

However, now that we’re here, let me give you an inkling of how we derive that law. A classical (Newtonian) analysis of the collisions (you can find the detail in Feynman’s Lectures, I-39-2) will yield the following equation: P = (2/3)n〈mv²/2〉, with n the number of atoms or molecules per unit volume. So the pressure of a gas (which, as you know, is the force of the gas on a piston, for example, per unit area: P = F/A) is also equal to the mean kinetic energy of the gas molecules multiplied by (2/3)n. If we multiply that equation by V, we get PV = N(2/3)〈mv²/2〉. However, we know that equal volumes of different gases, at the same pressure and temperature, will have the same number of molecules, so we have PV = N(2/3)〈m1v1²/2〉 = N(2/3)〈m2v2²/2〉, which we write as PV = NkT with kT = (2/3)〈m1v1²/2〉 = (2/3)〈m2v2²/2〉.

In other words, that factor of proportionality k is the one we have to use to convert the temperature as measured by 〈mv²/2〉 (i.e. the mean kinetic energy expressed in joules) to T (i.e. the temperature expressed in the measure we’re used to, and that’s degrees Kelvin–or Celsius or Fahrenheit, but let’s stick to Kelvin, because that’s what’s used in physics). Vice versa, we have 〈mv²/2〉 = (3/2)kT. Now, that constant of proportionality k is equal to k = 1.38×10⁻²³ joule per Kelvin (J/K). So if T is the (absolute) temperature, expressed in Kelvin (K), our definition says that the mean molecular kinetic energy is (3/2)kT.
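Just to make these numbers tangible, here’s a quick Python sketch (not part of the original argument; the room temperature and the nitrogen mass are arbitrary choices, just to fix ideas):

    # Mean kinetic energy (3/2)kT at room temperature, and the corresponding
    # rms speed of a nitrogen molecule (mass taken as 28 atomic mass units).
    k = 1.380649e-23       # the proportionality constant k from above, in J/K
    T = 300.0              # room temperature, in K
    mean_KE = 1.5 * k * T
    print(mean_KE)         # ~6.2e-21 joule

    m = 28 * 1.66054e-27   # mass of an N2 molecule, in kg
    v_rms = (3 * k * T / m) ** 0.5
    print(v_rms)           # ~517 m/s: that's the molecular speed hiding in 'temperature'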

That k factor is a physical constant referred to as the Boltzmann constant. If it’s one of these constants, you may wonder why we don’t just incorporate that 3/2 factor into it? Well… That’s just how it is, I guess. In any case, it’s rather convenient, because we’ll have 2/3 factors in other equations and so these will cancel out with that 3/2 term. However, I am digressing way too much here. I should get back to the main story line. However, before I do that, I need to expand on one more thing, and that’s a small lecture on what things look like when we also allow for internal motion, i.e. the rotational and vibratory motions of the atoms within the gas molecule. Let me first re-write that PV = NkT equation as

PV = NkT = N(2/3)〈m1v1²/2〉 = (2/3)U = 2U/3

For a monatomic gas, that U would only be the kinetic energy of the atoms, and so we can write it as U = (3/2)NkT. Hence, we have the grand result that the kinetic energy, for each atom, is equal to (3/2)kT, on average that is.

What about non-monatomic gas? Well… For complex molecules, we’d also have energy going into the rotational and vibratory motion of the atoms within the molecule, separate from what is usually referred to as the center-of-mass (CM) motion of the molecules themselves. Now, I’ll again refer you to Feynman for the detail of the analysis, but it turns out that, if we’d have, for example, a diatomic molecule, consisting of an A and B atom, the internal rotational and vibratory motion would, indeed, also absorb energy, and we’d have a total energy equal to (3/2)kT + (3/2)kT = 2×(3/2)kT = 3kT. Now, that amount (3kT) can be split over (i) the energy related to the CM motion, which must still be equal to (3/2)kT, and (ii) the average kinetic energy of the internal motions of the diatomic molecule excluding the bodily motion of the CM. Hence, the latter part must be equal to 3kT – (3/2)kT = (3/2)kT. So, for the diatomic molecule, the total energy happens to consist of two equal parts.

Now, there is a more general theorem here, for which I have to introduce the notion of the degrees of freedom of a system. Each atom can move (or rotate or vibrate or oscillate or whatever) in three independent directions–namely the three spatial coordinates x, y and z. These spatial dimensions are referred to as the degrees of freedom of the atom (in the kinetic theory of gases, that is), and if we have two atoms, we have 2×3 = 6 degrees of freedom. More in general, the number of degrees of freedom of a molecule composed of r atoms is equal to 3r. Now, it can be shown that the total energy of an r-atom molecule, including all internal energy as well as the CM motion, will be 3r×kT/2 = 3rkT/2 joules. Hence, for every independent direction of motion that there is, the average kinetic energy for that direction will be kT/2. [Note that ‘independent direction of motion’ is used, somewhat confusingly, as a synonym for degree of freedom, so we don’t have three but six ‘independent directions of motion’ for the diatomic molecule. I just wanted to note that because I do think it causes confusion when reading a textbook like Feynman’s.] Now, that total amount of energy, i.e. 3r(kT/2), will be split as follows according to the “theorem concerning the average energy of the CM motion”, as Feynman terms it:

  1. The kinetic energy for the CM motion of each molecule is, and will always be, (3/2)kT.
  2. The remainder, i.e. r(3/2)kT – (3/2)kT = (3/2)(r–1)kT, is internal vibrational and rotational kinetic energy, i.e. the sum of all vibratory and rotational kinetic energy but excluding the energy of the CM motion of the molecule. The little numerical sketch below spells this out.
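Here is that sketch (my own, with arbitrary values, just to see the split at work):

    # Energy split for an r-atom molecule at temperature T (values in joule):
    # total = 3r(kT/2), CM motion = (3/2)kT, internal = (3/2)(r-1)kT.
    k, T = 1.380649e-23, 300.0
    for r in (1, 2, 3):
        total = 3 * r * k * T / 2
        cm = 1.5 * k * T
        internal = total - cm
        print(r, total, cm, internal)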

Phew! That’s quite something. And we’re not quite there yet.

The analysis for photon gas

Photon gas? What’s that? Well… Imagine our box is the gas in a very hot star, hotter than the sun. As Feynman writes it: “The sun is not hot enough; there are still too many atoms, but at still higher temperatures in certain very hot stars, we may neglect the atoms and suppose that the only objects that we have in the box are photons.” Well… Let’s just go along with it. We know that photons have no mass but they do have some very tiny momentum, which we related to the magnetic field vector, as opposed to the electric field. It’s tiny indeed. Most of the energy of light goes into the electric field. However, we noted that we can write p as p = E/c, with c the speed of light (3×10⁸ m/s). Now, we had that P = (2/3)n〈mv²/2〉 formula for gas, and we know that the momentum p is defined as p = mv. So we can substitute mv² by (mv)v = pv. So we get P = (2/3)n〈pv/2〉 = (1/3)n〈pv〉.

Now, the energy of photons is not quite the same as the kinetic energy of an atom or a molecule, i.e. mv²/2. In fact, we know that, for photons, the speed v is equal to c, and pc = E. Hence, 〈pv〉 = 〈pc〉 = 〈E〉 and, therefore, P = (1/3)n〈E〉. Multiplying by the volume V, and noting that nV〈E〉 is the total energy U of the radiation, we get

PV = U/3

So that’s a formula that’s very similar to the one we had for gas, for which we wrote: PV = NkT = 2U/3. The only thing is that we don’t have a factor 2 in the equation, but that’s because of the different energy concepts involved. Indeed, the concept of the energy of a photon (E = pc) is different from the concept of kinetic energy. But the result is very nice: we have a similar formula for the compressibility of gas and radiation. In fact, both PV = 2U/3 and PV = U/3 will usually be written, more generally, as:

PV = (γ – 1)U 

Hence, this γ would be γ = 5/3 ≈ 1.667 for gas and 4/3 ≈ 1.333 for photon gas. Now, I’ll skip the detail (it involves a differential analysis) but it can be shown that this general formula, PV = (γ – 1)U, implies that PV^γ (i.e. the pressure times the volume raised to the power γ) must equal some constant, so we write:

PV^γ = C
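To get a feel for what that means, here is a small Python sketch (the starting values are arbitrary, just for illustration): we halve the volume adiabatically and compare how much the pressure rises for an ordinary monatomic gas (γ = 5/3) versus a photon gas (γ = 4/3).

    # PV^gamma = C: halve the volume and see what happens to the pressure.
    P1, V1 = 1.0, 1.0           # arbitrary initial pressure and volume
    for gamma in (5/3, 4/3):
        C = P1 * V1 ** gamma
        V2 = V1 / 2
        P2 = C / V2 ** gamma
        print(gamma, P2)        # ~3.17 for the gas, ~2.52 for the photon gas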

So far so good. Back to our problem: blackbody radiation. What you should take away from this introduction is the following:

  1. Temperature is a measure of the average kinetic energy of the atoms or molecules in a gas. More specifically, it’s related to the mean kinetic energy of the CM motion of the atoms or molecules, which is equal to (3/2)kT, with k the Boltzmann constant and T the temperature expressed in Kelvin (i.e. the absolute temperature).
  2. If gas atoms or molecules have additional ‘degrees of freedom’, aka ‘independent directions of motion’, then each of these will absorb additional energy, namely kT/2.

Energy and radiation

The atoms in the box are atomic oscillators, and we’ve analyzed them before. What the analysis above added is that the average kinetic energy of the atoms going around is (3/2)kT and that, if we’re talking molecules consisting of r atoms, we have a formula for their internal kinetic energy as well. However, as oscillators, they also have energy separate from that kinetic energy we’ve been talking about already. How much? That’s a tricky analysis. Let me first remind you of the following:

  1. Oscillators have a natural frequency, usually denoted by the (angular) frequency ω0.
  2. The sum of the potential and kinetic energy stored in an oscillator is a constant, unless there’s some damping, in which case the oscillation dies out. Here, you’ll remember the concept of the Q of an oscillator: the relevant formula is 1/Q = (dW/dt)/(ω0W) = γ/ω0, with γ the damping constant (not to be confused with the γ we used in that PV^γ = C formula).

Now, for gases, we said that, for every independent direction of motion there is, the average kinetic energy for that direction will be kT/2. I admit it’s a bit of a stretch of the imagination, but that’s really how the blackbody radiation analysis starts: our atomic oscillators will have an average kinetic energy equal to kT/2 and, hence, their total energy (kinetic and potential) should be twice that amount, according to the second remark I made above. So that’s kT. We’ll denote the total energy as W below, so we can write:

W = kT

Just to make sure we know what we’re talking about (one would forget, wouldn’t one?), kT is the product of the Boltzmann constant (1.38×10⁻²³ J/K) and the temperature of the gas (so note that the product is expressed in joule indeed). Hence, that product is the average energy of our atomic oscillators in the gas in our furnace.

Now, I am not going to repeat all of the detail we presented on atomic oscillators (I’ll refer you, once again, to Feynman) but you may or may not remember that atomic oscillators do have a Q indeed and, hence, some damping constant γ. So we can use and re-write that formula above as

dW/dt = (1/Q)(ω0W) = (ω0W)(γ/ω0) = γW, which implies γ = (dW/dt)/W

What’s γ? Well, we’ve calculated the Q of an atomic oscillator already: Q = 3λ/4πr0, with r0 the classical electron radius. Now, λ = 2πc/ω0 (we just convert the wavelength into an (angular) frequency using λν = c) and γ = ω0/Q, so we get γ = 4πr0ω0/[3(2πc/ω0)] = (2/3)r0ω0²/c. Now, plugging that result back into the equation above, we get

dW/dt = γW = (2/3)(r0ω0²kT)/c
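To get an idea of the orders of magnitude involved, here is a quick Python sketch (not part of the original argument; the optical wavelength and furnace temperature are arbitrary choices), using the classical electron radius for r0:

    from math import pi

    # The classical radiative damping rate gamma = (2/3)*r0*w0^2/c,
    # and the corresponding rate of energy loss dW/dt = gamma*kT (with W = kT).
    c = 2.998e8              # speed of light, m/s
    r0 = 2.818e-15           # classical electron radius, m
    lam = 500e-9             # an arbitrary optical wavelength, m
    w0 = 2 * pi * c / lam    # the corresponding angular frequency
    gamma = (2/3) * r0 * w0**2 / c
    k, T = 1.380649e-23, 1000.0   # an arbitrary furnace temperature
    print(gamma)             # ~9e7 per second
    print(gamma * k * T)     # dW/dt, in watt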

Just in case you’d have difficulty following – I admit I did 🙂 – dW/dt is the average rate of radiation of light of (or near) frequency ω0. I’ll let Feynman take over here:

Next we ask how much light must be shining on the oscillator. It must be enough that the energy absorbed from the light (and thereupon scattered) is just exactly this much. In other words, the emitted light is accounted for as scattered light from the light that is shining on the oscillator in the cavity. So we must now calculate how much light is scattered from the oscillator if there is a certain amount—unknown—of radiation incident on it. Let I(ω)dω be the amount of light energy there is at the frequency ω, within a certain range dω (because there is no light at exactly a certain frequency; it is spread all over the spectrum). So I(ω) is a certain spectral distribution which we are now going to find—it is the color of a furnace at temperature T that we see when we open the door and look in the hole. Now how much light is absorbed? We worked out the amount of radiation absorbed from a given incident light beam, and we calculated it in terms of a cross section. It is just as though we said that all of the light that falls on a certain cross section is absorbed. So the total amount that is re-radiated (scattered) is the incident intensity I(ω)dω multiplied by the cross section σ.

OK. That makes sense. I’ll not copy the rest of his story though, because this is a post in a blog, not a textbook. What we need to find is that I(ω). So I’ll refer you to Feynman for the details (these ‘details’ involve fairly complicated calculations, which are less important than the basic assumptions behind the model, which I presented above) and just write down the result:

[Formula: Rayleigh’s law for the blackbody radiation intensity, with I(ω) going as ω²kT.]

This formula is Rayleigh’s law. [And, yes, it’s the same Rayleigh – Lord Rayleigh, I should say respectfully – as the one who invented that criterion I introduced in my previous post, but this law and that criterion have nothing to do with each other.] This ‘law’ gives the intensity, or the distribution, of light in a furnace. Feynman says it’s referred to as blackbody radiation because “the hole in the furnace that we look at is black when the temperature is zero.” […] OK. Whatever. What we call it doesn’t matter. The point is that this function tells us that the intensity goes as the square of the frequency, which means that if we have a box at any temperature at all, and if we look at it, the X- and gamma rays will be burning our eyes out! The graph below shows both the theoretical curve for two temperatures (T0 and 2T0), as derived above (see the solid lines), and then the actual curves for those two temperatures (see the dotted lines).

[Graph: the theoretical (Rayleigh) curves and the actual blackbody radiation curves for temperatures T0 and 2T0.]

This is the so-called UV catastrophe: according to classical physics, an ideal black body at thermal equilibrium should emit radiation with infinite power. In reality, of course, it doesn’t: Rayleigh’s law is false. Utterly false. And so that’s where Planck came to the rescue, and he did so by assuming radiation is being emitted and/or absorbed in finite quanta: multiples of ħω, in fact.

Indeed, Planck studied the actual curve and fitted it with another function. That function assumed the average energy of a harmonic oscillator was not just proportional to the temperature (T), but that it was also a function of the (natural) frequency of the oscillators. By fiddling around, he found a simple derivation for it which involved a very peculiar assumption. That assumption was that the harmonic oscillator can take up energy only in steps of ħω at a time, as shown below.

[Illustration: equally spaced energy levels.]

Hence, the assumption is that the harmonic oscillators cannot take on just any (continuous) energy level. No. The allowable energy levels of the harmonic oscillators are equally spaced: En = nħω. Now, the actual derivation is at least as complex as the derivation of Rayleigh’s law, so I won’t do it here. Let me just give you the key assumptions:

  1. The gas consists of a large number of atomic oscillators, each with their own natural frequency ω0.
  2. The permitted energy levels of these harmonic oscillators are equally spaced, ħω apart.
  3. The probability of occupying a level of energy E is P(E) ∝ e^(–E/kT).

All the rest is tedious calculation, including the calculation of the parameters of the model, which include ħ (and, hence, h, because h = 2πħ) and are found by matching the theoretical curves to the actual curves as measured in experiments. I’ll just mention one result, and that’s the average energy of these oscillators:

[Formula: Planck’s expression for the average oscillator energy, 〈E〉 = ħω/(e^(ħω/kT) – 1).]

As you can see, the average energy depends not only on the temperature T, but also on the oscillators’ (natural) frequency. So… Now you know where h comes from. As I relied so heavily on Feynman’s presentation here, I’ll include the link. As Feynman puts it: “This, then, was the first quantum-mechanical formula ever known, or ever discussed, and it was the beautiful culmination of decades of puzzlement. Maxwell knew that there was something wrong, and the problem was, what was right? Here is the quantitative answer of what is right instead of kT.”
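If you want to see what that result does, here is a minimal Python sketch (my own, with arbitrary numbers): it computes the average oscillator energy in two ways, from the Boltzmann-weighted sum over the discrete levels En = nħω and from Planck’s closed-form expression, and compares both with the classical value kT.

    from math import exp

    hbar = 1.0546e-34    # reduced Planck constant, J*s
    k = 1.380649e-23     # Boltzmann constant, J/K
    T = 6000.0           # an arbitrary temperature, K

    for w in (1e13, 1e15):       # a low and a high angular frequency
        x = hbar * w / (k * T)
        weights = [exp(-n * x) for n in range(3000)]          # P(E_n) ~ e^(-E_n/kT)
        avg_sum = sum(n * hbar * w * p for n, p in enumerate(weights)) / sum(weights)
        avg_planck = hbar * w / (exp(x) - 1)                  # Planck's formula
        print(avg_sum, avg_planck, k * T)
    # At low frequency the average is close to kT (the classical result);
    # at high frequency it is much smaller: that's what kills the UV catastrophe.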

So there you go. Now you know. 🙂 Oh… And in case you’d wonder: why the h? Well… Not sure. It’s said the h stands for Hilfsgrösse, so that’s some constant which was just supposed to help him out with the calculation. At that time, Planck did not suspect it would turn out to be one of the most fundamental physical constants. 🙂

Post scriptum: I went quite far in my presentation of the basics of the kinetic theory of gases. You may wonder now: I didn’t use that theoretical PV^γ = C relation, did I? And why all the fuss about photon gas? Well… That was just to introduce that PV^γ = C relation, so I could note, here, in this post scriptum, that it has a similar problem. The γ exponent is referred to as the specific heat ratio of a gas, and it can be calculated theoretically as well, as we did–well… Sort of, because we skipped the actual derivation. However, the theoretical values also differ substantially from the actually measured values, and the problem is the same: one should not assume a continuous value for 〈E〉. Agreement between theory and experiment can only be reached when the same assumptions as those of Planck are used: discrete energy levels, multiples of ħω: En = nħω. Also, the specific functional form which Planck used to resolve the blackbody radiation problem is to be used here as well. For more details, I’ll refer to Feynman too. I can’t say this is easy to digest, but then who said it would be easy? 🙂

The point to note is that the blackbody radiation problem wasn’t the only problem in the 19th century. As Feynman puts it: “One often hears it said that physicists at the latter part of the nineteenth century thought they knew all the significant physical laws and that all they had to do was to calculate more decimal places. Someone may have said that once, and others copied it. But a thorough reading of the literature of the time shows they were all worrying about something.” They were, and so Planck came up with something new. And then Einstein took it to the next level and then… Well… The rest is history. 🙂

Some content on this page was disabled on June 17, 2020 as a result of a DMCA takedown notice from Michael A. Gottlieb, Rudolf Pfeiffer, and The California Institute of Technology. You can learn more about the DMCA here:

https://wordpress.com/support/copyright-and-the-dmca/
Some content on this page was disabled on June 20, 2020 as a result of a DMCA takedown notice from Michael A. Gottlieb, Rudolf Pfeiffer, and The California Institute of Technology. You can learn more about the DMCA here:

https://wordpress.com/support/copyright-and-the-dmca/

Diffraction and the Uncertainty Principle (II)

Pre-script (dated 26 June 2020): This post did not suffer too much from the attack on this blog by the dark force. It remains relevant. 🙂

Original post:

In my previous post, I derived and explained the general formula for the pattern generated by a light beam going through a slit or a circular aperture: the diffraction pattern. For light going through an aperture, this generates the so-called Airy pattern. In practice, diffraction causes a blurring of the image, and may make it difficult to distinguish two separate points, as shown below (credit for the image must go to Wikipedia again, I am afraid).

[Image: Airy disks of two point sources at a spacing near the Rayleigh criterion (source: Wikipedia).]

What’s actually going on is that the lens acts as a slit or, if it’s circular (which is usually the case), as an aperture indeed: the wavefront of the transmitted light is taken to be spherical or plane when it exits the lens and interferes with itself, thereby creating the ring-shaped diffraction pattern that we explained in the previous post.

The spatial resolution is also known as the angular resolution, which is quite appropriate, because it refers to an angle indeed: we know the first minimum (i.e. the first black ring) occurs at an angle θ such that sinθ = λ/L, with λ the wavelength of the light and L the lens diameter. It’s good to remind ourselves of the geometry of the situation: below we picture the array of oscillators, and so we know that the first minimum occurs at an angle such that Δ = λ. The second, third, fourth etc minimum occurs at an angle θ such that Δ = 2λ, 3λ, 4λ, etc. However, these secondary minima do not play any role in determining the resolving power of a lens, or a telescope, or an electron microscope, etc, and so you can just forget about them for the time being.

[Illustration: the geometry of the array of oscillators, with the path difference Δ.]

For small angles (expressed in radians), we can use the so-called small-angle approximation and equate sinθ with θ: the error of this approximation is less than one percent for angles smaller than 0.244 radians (14°), so we have the amazingly simple result that the first minimum occurs at an angle θ such that:

θ = λ/L
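As a quick aside, here is a two-line Python check (my own) of that claim that the error of the approximation stays below one percent up to 0.244 radians:

    from math import sin
    for theta in (0.1, 0.244, 0.5):                       # angles in radians
        print(theta, (theta - sin(theta)) / sin(theta))   # relative error: ~0.2%, ~1%, ~4%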

Spatial resolution of a microscope: the Rayleigh criterion versus Dawes’ limit 

If we have two point sources right next to each other, they will create two Airy disks, as shown above, which may overlap. That may make it difficult to see them, in a telescope, a microscope, or whatever device. Hence, telescopes, microscopes (using light or electron beams or whatever) have a limited resolving power. How do we measure that?

The so-called Rayleigh criterion regards two point sources as just resolved when the principal diffraction maximum of one image coincides with the first minimum of the other, as shown below. If the distance is greater, the two points are (very) well resolved, and if it is smaller, they are regarded as not resolved. This angle is obviously related to the θ = λ/L angle but it’s not the same: in fact, it’s a slightly wider angle. The analysis involved in calculating this angular resolution (we use the same symbol θ for it) is quite complicated, so I’ll skip it and just give you the result:

θ = 1.22λ/L

[Images: two point sources at three different separations, illustrating the Rayleigh criterion.]

Note that, in this equation, θ stands for the angular resolution, λ for the wavelength of the light being used, and L is the diameter of (the aperture of) the lens. In the first of the three images above, the two points are well separated and, hence, the angle between them is well above the angular resolution. In the second, the angle between them just meets the Rayleigh criterion, and in the third the angle between them is smaller than the angular resolution and, hence, the two points are not resolved.

Of course, the Rayleigh criterion is, to some extent, a matter of judgment. In fact, an English 19th century astronomer, named William Rutter Dawes, actually tested human observers on close binary stars of equal brightness, and found they could make out the two stars within an angle that was slightly narrower than the one given by the Rayleigh criterion. Hence, for an optical telescope, you’ll also find the simple θ = λ/L formula, so that’s the formula without the 1.22 factor (of course, λ here is, once again, the wavelength of the observed light or radiation, and L is the diameter of the telescope’s primary lens). This very simple formula allows us, for example, to calculate the diameter of the telescope lens we’d need to build to separate (see) objects in space with a resolution of, for example, 1 arcsec (i.e. 1/3600 of a degree or π/648,000 of a radian). Indeed, if we filter for yellow light only, which has a wavelength of 580 nm, we find L = 580×10⁻⁹ m/(π/648,000) ≈ 0.12 m ≈ 12 cm. [Just so you know: that’s about the size of the lens aperture of a good telescope (4 or 6 inches) for amateur astronomers–just in case you’d want one. :-)]
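Here is that little calculation spelled out as a Python one-liner (my own sketch, same numbers as above):

    from math import pi
    lam = 580e-9              # yellow light, in m
    theta = pi / 648000       # 1 arcsec, in radians
    print(lam / theta)        # ~0.12 m, i.e. about 12 cm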

This simplified formula is called Dawes’ limit, and you’ll often see it used instead of Rayleigh’s criterion. However, the fact that it’s exactly the same formula as our formula for the first minimum of the Airy pattern should not confuse you: angular resolution is something different.

Now, after this introduction, let me get to the real topic of this post: Heisenberg’s Uncertainty Principle according to Heisenberg.

Heisenberg’s Uncertainty Principle according to Heisenberg

I don’t know about you but, as a kid, I didn’t know much about waves and fields and all that, and so I had difficulty understanding why the resolving power of a microscope or any other magnifying device depended on the frequency or wavelength. I now know my understanding was limited because I thought the concept of the amplitude of an electromagnetic wave had some spatial meaning, like the amplitude of a water or a sound wave. You know what I mean: this false idea that an electromagnetic wave is something that sort of wriggles through space, just like a water or sound wave wriggle through their medium (water and air respectively). Now I know better: the amplitude of an electromagnetic wave measures field strength and there’s no medium (no aether). So it’s not like a wave going around some object, or making some medium oscillate. I am not ashamed to acknowledge my stupidity at the time: I am just happy I finally got it, because it helps to really understand Heisenberg’s own illustration of his Uncertainty Principle, which I’ll present now.

Heisenberg imagined a gamma-ray microscope, as shown below (I copied this from the website of the American Institute of Physics). Gamma-ray microscopes don’t exist – they’re hard to produce: you need a nuclear reactor or so 🙂 – but, as Heisenberg saw the development of new microscopes using higher and higher energy beams (as opposed to the 1.5–3 eV light in the visible spectrum) so as to increase the angular resolution and, hence, be able to see smaller things, he imagined one could use, perhaps, gamma rays for imaging. Gamma rays are the hardest radiation, with frequencies of 10 exahertz and more (i.e. >10¹⁹ Hz) and, hence, energies above 100 keV (i.e. 100,000 times more than photons in the visible light spectrum, and 1,000 times more than the electrons used in an average electron microscope). Gamma rays are not the result of some electron jumping from a higher to a lower energy level: they are emitted in decay processes of atomic nuclei (gamma decay). But I am digressing. Back to the main story line. So Heisenberg imagined we could ‘shine’ gamma rays on an electron and that we could then ‘see’ that electron in the microscope because some of the gamma photons would indeed end up in the microscope after their ‘collision’ with the electron, as shown below.

[Animation: the gamma-ray microscope (from the website of the American Institute of Physics).]

The experiment is described in many places elsewhere but I found these accounts often confusing, and so I present my own here. 🙂

What Heisenberg basically meant to show is that this set-up would allow us to gather precise information on the position of the electron–because we would know where it was–but that, as a result, we’d lose information in regard to its momentum. Why? To put it simply: because the electron recoils as a result of the interaction. The point, of course, is to calculate the exact relationship between the two (position and momentum). In other words: what we want to do is to state the Uncertainty Principle quantitatively, not qualitatively.

Now, the animation above uses the symbol L for the γ-ray wavelength λ, which is confusing because I used L for the diameter of the aperture in my explanation of diffraction above. The animation above also uses a different symbol for the angular resolution: A instead of θ. So let me borrow the diagram used in the Wikipedia article and rephrase the whole situation.

[Diagram: Heisenberg’s microscope (from the Wikipedia article).]

From the diagram above, it’s obvious that, to be scattered into the microscope, the γ-ray photon must be scattered into a cone with angle ε. That angle is obviously related to the angular resolution of the microscope, which is θ = ε/2 = λ/D, with D the diameter of the aperture (i.e. the primary lens). Now, the electron could actually be anywhere, and the scattering angle could be much larger than ε, but then the photon won’t reach the light detector at the end of the microscope (that’s the flat top in the diagram above). Hence, relating D to the uncertainty in position (Δx) is not as obvious as most accounts of this thought experiment make it out to be, but the idea is that this imaginary microscope ‘sees’ an area that is approximately as large as the lens: the position is known to within ±D/2, so the spread Δx is of the order of D. Using the small-angle approximation (so we write sin(ε/2) ≈ ε/2), the θ = ε/2 = λ/D relation then gives D = 2λ/ε, and so we can write:

Δx = 2λ/ε

Now, because of the recoil effect, the electron receives some momentum from the γ-ray photon. How much? Well… The situation is somewhat complicated (much more complicated than the Wikipedia article on this very same topic suggests), because the photon keeps some but also gives some of its original momentum. In fact, what’s happening really is Compton scattering: the electron first absorbs the photon, and then emits another with a different energy and, hence, also with a different frequency and wavelength. However, what we do know is that the photon’s original momentum was equal to p = E/c = h/λ. That’s just the Planck relation or, if you’d want to look at the photon as a particle, the de Broglie equation.

Now, because we’re doing an analysis in one dimension only (x), we’re only going to look at the momentum in that direction, i.e. px, and we’ll assume that all of the momentum of the photon before the interaction (or ‘collision’ if you want) was horizontal. Hence, we can write px = h/λ. After the collision, however, this momentum is spread over the electron and the scattered or emitted photon that’s going into the microscope. Let’s now imagine the two extremes:

  1. The scattered photon goes to the left edge of the lens. Hence, its horizontal momentum is negative (because it moves to the left), and momentum conservation in the x-direction tells us that px = p’ – h(ε/2)/λ’, with p’ the horizontal momentum of the electron after the collision and λ’ the wavelength of the scattered photon. Why the ε/2 factor? Well… That’s just trigonometry: the horizontal momentum of the scattered photon is obviously only a tiny fraction of its total momentum h/λ’, and that fraction is given by the angle ε/2.
  2. The scattered photon goes to the right edge of the lens. In that case, we write px = p’’ + h(ε/2)/λ’’.

Now, the spread in the momentum of the electron, which we’ll simply write as Δp, is obviously equal to:

Δp = p’ – p’’ = [px + h(ε/2)/λ’] – [px – h(ε/2)/λ’’] = h(ε/2)/λ’ + h(ε/2)/λ’’

That’s a nice formula, but what can we do with it? What we want is a relationship between Δx and Δp, i.e. the position and the momentum of the electron, and of the electron only. That involves another simplification, which is also dealt with very summarily – too summarily in my view – in most accounts of this experiment. So let me spell it out. The angle ε is obviously very small and, hence, we may equate λ’ and λ’’. In addition, while these two wavelengths differ from the wavelength of the incoming photon, the scattered photon is, obviously, still a gamma ray and, therefore, we are probably not too far off when replacing both λ’ and λ’’ by λ, i.e. the wavelength of the incoming γ-ray. Now, we can re-write Δx = 2λ/ε as 1/Δx = ε/(2λ). We then get:

Δp = p’ – p’’ ≈ h(ε/2)/λ + h(ε/2)/λ = hε/λ = 2h·[ε/(2λ)] = 2h/Δx

Now that yields ΔpΔx = 2h, which is an approximate expression of Heisenberg’s Uncertainty Principle indeed (don’t worry about the factor 2, as that’s something that comes with all of the approximations).
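To get a feel for the magnitudes, here is a small Python sketch (my own, with an arbitrary γ-ray wavelength and an arbitrary aperture angle):

    h = 6.626e-34       # Planck's constant, J*s
    lam = 1e-12         # an assumed gamma-ray wavelength (1 pm), m
    eps = 0.2           # an assumed aperture angle, rad
    dx = 2 * lam / eps            # uncertainty in position: ~1e-11 m
    dp = h * eps / lam            # spread in momentum: ~1.3e-22 kg*m/s
    print(dx, dp, dx * dp, 2 * h) # the product comes out as 2h, as above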

A final moot point perhaps: it is obviously a thought experiment. Not only because we don’t have gamma-ray microscopes (that’s not relevant because we can effectively imagine constructing one) but because the experiment involves only one photon. A real microscope would organize a proper beam, but that would obviously complicate the analysis. In fact, it would defeat the purpose, because the whole point is to analyze one single interaction here.

The interpretation

Now how should we interpret all of this? Is this Heisenberg’s ‘proof’ of his own Principle? Yes and no, I’d say. It’s part illustration, and part ‘proof’, I would say. The crucial assumptions here are:

  1. We can analyze γ-ray photons, or any photon for that matter, as particles having some momentum, and when ‘colliding’, or interacting, with an electron, the photon will impart some momentum to that electron.
  2. Momentum is being conserved and, hence, the total (linear) momentum before and after the collision, considering both particles–i.e. (1) the incoming ray and the electron before the interaction and (2) the emitted photon and the electron that’s getting the kick after the interaction–must be the same.
  3. For the γ-ray photon, we can relate (or associate, if you prefer that term) its wavelength λ with its momentum p through the Planck relation or, what amounts to the same for photons (because they have no mass), the de Broglie relation.

Now, these assumptions are then applied to an analysis of what we know to be true from experiment, and that’s the phenomenon of diffraction, part of which is the observation that the resolving power of a microscope is limited, and that its resolution is given by the θ = λ/D equation.

Bringing it all together, then, gives us a theory which is consistent with experiment and, hence, we then assume the theory is true. Why? Well… I could start a long discourse here on the philosophy of science but, when everything is said and done, we should admit we don’t have any ‘better’ theory.

But, you’ll say: what’s a ‘better’ theory? Well… Again, the answer to that question is the subject-matter of philosophers. As for me, I’d just refer to what’s known as Occam’s razor: among competing hypotheses, we should select the one with the fewest assumptions. Hence, while more complicated solutions may ultimately prove correct, the fewer assumptions that are made, the better. Now, when I was a kid, I thought quantum mechanics was very complicated and, hence, describing it here as a ‘simple’ theory sounds strange. But that’s what it is in the end: there’s no better (read: simpler) way to describe, for example, why electrons interfere with each other, and with themselves, when sending them through one or two slits, and so that’s what all these ‘illustrations’ want to show in the end, even if you think there must be a simpler way to describe reality. As said, as a kid, I thought so too. 🙂

Some content on this page was disabled on June 17, 2020 as a result of a DMCA takedown notice from Michael A. Gottlieb, Rudolf Pfeiffer, and The California Institute of Technology. You can learn more about the DMCA here:

https://wordpress.com/support/copyright-and-the-dmca/

Diffraction and the Uncertainty Principle (I)

Pre-script (dated 26 June 2020): This post got mutilated by the removal of material by the dark force. It should be possible, however, to follow the main story line. If anything, the lack of illustrations will help you think things through for yourself. 🙂

Original post:

In his Lectures, Feynman advances the double-slit experiment with electrons as the textbook example explaining the “mystery” of quantum mechanics. It shows interference–a property of waves–of ‘particles’, electrons: they no longer behave as particles in this experiment. While it obviously illustrates “the basic peculiarities of quantum mechanics” very well, I think the dual behavior of light – as a wave and as a stream of photons – is at least as good an illustration. And he could also have elaborated on the phenomenon of electron diffraction.

Indeed, the phenomenon of diffraction–light, or an electron beam, interfering with itself as it goes through one slit only–is equally fascinating. Frankly, I think it does not get enough attention in textbooks, including Feynman’s, so that’s why I am devoting a rather long post to it here.

To be fair, Feynman does use the phenomenon of diffraction to illustrate the Uncertainty Principle, both in his Lectures as well as in that little marvel, QED: The Strange Theory of Light and Matter–a must-read for anyone who wants to understand the (probability) wave function concept without any reference to complex numbers or what have you. Let’s have a look at it: light going through a slit or circular aperture, illustrated in the left-hand image below, creates a diffraction pattern, which resembles the interference pattern created by an array of oscillators, as shown in the right-hand image.

[Illustrations: diffraction of a particle wave at a slit (left) and a line of oscillators (right), from Feynman’s Lectures.]

Let’s start with the right-hand illustration, which illustrates interference, not diffraction. We have eight point sources of electromagnetic radiation here (e.g. radio waves, but it can also be higher-energy light) in an array of length L. λ is the wavelength of the radiation that is being emitted, and α is the so-called intrinsic relative phase–or, to put it simply, the phase difference. We assume α is zero here, so the array produces a maximum in the direction θout = 0, i.e. perpendicular to the array. There are also weaker side lobes. That’s because the distance between the array and the point where we are measuring the intensity of the emitted radiation does result in a phase difference, even if the oscillators themselves have no intrinsic phase difference.

Interference patterns can be complicated. In the set-up below, for example, we have an array of oscillators producing not just one but many maxima. In fact, the array consists of just two sources of radiation, separated by 10 wavelengths.

[Illustration: the interference pattern of two dipole radiators separated by 10 wavelengths, from Feynman’s Lectures.]

The explanation is fairly simple. Once again, the waves emitted by the two point sources will be in phase in the east-west (E-W) direction, and so we get a strong intensity there: four times more, in fact, than what we would get if we’d just have one point source. Indeed, the waves are perfectly in sync and, hence, add up, and the factor four is explained by the fact that the intensity, or the energy of the wave, is proportional to the square of the amplitude: 2² = 4. We get the first minimum at a small angle away (the angle from the normal is denoted by ϕ in the illustration), where the phases differ by 180°, and so there is destructive interference and the intensity is zero. To be precise, if we draw a line from each oscillator to a distant point and the difference Δ in the two distances is λ/2, half an oscillation, then they will be out of phase. So this first null occurs when that happens. If we move a bit further, to the point where the difference Δ is equal to λ, then the two waves will be a whole cycle out of phase, i.e. 360°, which is the same as being exactly in phase again! And so we get many maxima (and minima) indeed.

But this post should not turn into a lesson on how to construct a radio broadcasting array. The point to note is that diffraction is usually explained using this rather simple theory on the interference of waves, assuming that the slit itself is an array of point sources, as illustrated below (while the illustrations above were copied from Feynman’s Lectures, the ones below were taken from the Wikipedia article on diffraction). This is referred to as the Huygens-Fresnel Principle, and the math behind it is summarized in Kirchhoff’s diffraction formula.

[Illustrations: a wave passing through an aperture, explained with the Huygens-Fresnel principle (source: Wikipedia).]

Now, that all looks easy enough, but the illustration above triggers an obvious question: what about the spacing between those imaginary point sources? Why do we have six in the illustration above? The relation between the length of the array and the wavelength is obviously important: we get the interference pattern that we get with those two point sources above because the distance between them is 10λ. If that distance would be different, we would get a different interference pattern. But so how does it work exactly? If we’d keep the length of the array the same (L = 10λ) but we would add more point sources, would we get the same pattern?

The easy answer is yes, and Kirchhoff’s formula actually assumes we have an infinite number of point sources between those two slits: every point becomes the source of a spherical wave, and the sum of these secondary waves then yields the interference pattern. The animation below shows the diffraction pattern from a slit with a width equal to five times the wavelength of the incident wave. The diffraction pattern is the same as above: one strong central beam with weaker lobes on the sides.

[Animation: the diffraction pattern from a slit with a width equal to five wavelengths (source: Wikipedia).]

However, the truth is somewhat more complicated. The illustration below shows the interference pattern for an array of length L = 10λ–so that’s like the situation with two point sources above–but with four additional point sources to the two we had already. The intensity in the E–W direction is much higher, as we would expect. Adding six waves in phase yields a field strength that is six times as great and, hence, the intensity (which is proportional to the square of the field) is thirty-six times as great as compared to the intensity of one individual oscillator. Also, when we look at neighboring points, we find a minimum and then some more ‘bumps’, as before, but then, at an angle of 30°, we get a second beam with the same intensity as the central beam. Now, that’s something we do not see in the diffraction patterns above. So what’s going on here?

[Illustration: the interference pattern of a six-dipole array of length L = 10λ, from Feynman’s Lectures.]

Before I answer that question, I’d like to compare with the quantum-mechanical explanation. It turns out that this question in regard to the relevance of the number of point sources also pops up in Feynman’s quantum-mechanical explanation of diffraction.

The quantum-mechanical explanation of diffraction

The illustration below (taken from Feynman’s QED, p. 55-56) presents the quantum-mechanical point of view. It is assumed that light consists of photons, and these photons can follow any path. Each of these paths is associated with what Feynman simply refers to as an arrow, but it’s a vector with a magnitude and a direction: in other words, it’s a complex number representing a probability amplitude.

[Illustrations from Feynman’s QED: ‘many arrows’ (a wide gap, left) versus ‘few arrows’ (a narrow gap, right).]

In order to get the probability for a photon to travel from the source (S) to a point (P or Q), we have to add up all the ‘arrows’ to arrive at a final ‘arrow’, and then we take its (absolute) square to get the probability. The text under each of the two illustrations above speaks for itself: when we have ‘enough’ arrows (i.e. when we allow for many neighboring paths, as in the illustration on the left), then the arrows for all of the paths from S to P will add up to one big arrow, because there is hardly any difference in time between them, while the arrows associated with the paths to Q will cancel out, because the difference in time between them is fairly large. Hence, the light will not go to Q but travel to P, i.e. in a straight line.

However, when the gap is nearly closed (so we have a slit or a small aperture), then we have only a few neighboring paths, and then the arrows to Q also add up, because there is hardly any difference in time between them too. As I am quoting from Feynman’s QED here, let me quote all of the relevant paragraph: “Of course, both final arrows are small, so there’s not much light either way through such a small hole, but the detector at Q will click almost as much as the one at P ! So when you try to squeeze light too much to make sure it’s going only in a straight line, it refuses to cooperate and begins to spread out. This is an example of the Uncertainty Principle: there is a kind of complementarity between knowledge of where the light goes between the blocks and where it goes afterwards. Precise knowledge of both is impossible.” (Feynman, QED, p. 55-56).

Feynman’s quantum-mechanical explanation is obviously more ‘true’ than the classical explanation, in the sense that it corresponds to what we know is true from all of the 20th century experiments confirming the quantum-mechanical view of reality: photons are weird ‘wavicles’ and, hence, we should indeed analyze diffraction in terms of probability amplitudes, rather than in terms of interference between waves. That being said, Feynman’s presentation is obviously somewhat more difficult to understand and, hence, the classical explanation remains appealing. In addition, Feynman’s explanation triggers a similar question as the one I had on the number of point sources. Not enough arrows!? What do we mean by that? Why can’t we have more of them? What determines their number?

Let’s first look at their direction. Where does that come from? Feynman is a wonderful teacher here. He uses an imaginary stopwatch to determine their direction: the stopwatch starts timing at the source and stops at the destination. But all depends on the speed of the stopwatch hand of course. So how fast does it turn? Feynman is a bit vague about that but notes that “the stopwatch hand turns around faster when it times a blue photon than when it does when timing a red photon.” In other words, the speed of the stopwatch hand depends on the frequency of the light: blue light has a higher frequency (645 THz) and, hence, a shorter wavelength (465 nm) than red light, for which f = 455 THz and λ = 660 nm. Feynman uses this to explain the typical patterns of red, blue, and violet (separated by borders of black) when one shines red and blue light on a film of oil or, more generally, the phenomenon of iridescence, as shown below.

[Image: iridescence.]

As for the size of the arrows, their length is obviously subject to a normalization condition, because all probabilities have to add up to 1. But what about their number? We didn’t answer that question–yet.

The answer, of course, is that the number of arrows and their size are obviously related: we associate a probability amplitude with every way an event can happen, and the (absolute) squares of the final arrows, over all possible outcomes, have to add up to 1. Therefore, if we would identify more paths, we would have more arrows, but they would have to be smaller. The end result would be the same though: when the slit is ‘small enough’, the arrows representing the paths to Q would not cancel each other out and, hence, we’d have diffraction.
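If you want to play with this, here is a rough Python sketch of the ‘arrows’ (my own illustration, with completely arbitrary geometry and units): every path from the source through a point in the gap to a detector contributes a little arrow exp(i·2π·path length/λ), and we compare the final arrow for a detector straight ahead (P) with one off to the side (Q), for a wide gap and for a gap smaller than a wavelength.

    from cmath import exp, pi

    lam = 1.0                        # we measure all distances in wavelengths
    S, P, Q = (-50.0, 0.0), (50.0, 0.0), (50.0, 20.0)   # source and two detectors

    def dist(a, b):
        return ((a[0] - b[0])**2 + (a[1] - b[1])**2) ** 0.5

    def final_arrow(half_width, detector, n_paths=2001):
        # add the arrows for n_paths paths through the gap (the gap sits at x = 0)
        total = 0
        for i in range(n_paths):
            y = -half_width + 2 * half_width * i / (n_paths - 1)
            path = dist(S, (0.0, y)) + dist((0.0, y), detector)
            total += exp(2j * pi * path / lam)
        return abs(total) / n_paths   # normalized per path, just to compare shapes

    for half_width in (10.0, 0.25):   # a wide gap versus a narrow one
        print(half_width, final_arrow(half_width, P), final_arrow(half_width, Q))
    # Wide gap: the arrow to P is much bigger than the arrow to Q (light goes straight).
    # Narrow gap: both arrows are about equally big, so Q 'clicks almost as much as P'.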

You’ll say: Hmm… OK. I sort of see the idea, but how do you explain that pattern–the central beam and the smaller side lobes, and perhaps a second beam as well? Well… You’re right to be skeptical. In order to explain the exact pattern, we need to analyze the wave functions, and that requires a mathematical approach rather than the type of intuitive approach which Feynman uses in his little QED booklet. Before we get started on that, however, let me give another example of such intuitive approach.

Diffraction and the Uncertainty Principle

Let’s look at that very first illustration again, which I’ve copied, for your convenience, again below. Feynman uses it (III-2-2) to (a) explain the diffraction pattern which we observe when we send electrons through a slit and (b) to illustrate the Uncertainty Principle. What’s the story?

Well… Before the electrons hit the wall or enter the slit, we have more or less complete information about their momentum, but nothing on their position: we don’t know where they are exactly, and we also don’t know if they are going to hit the wall or go through the slit. So they can be anywhere. However, we do know their energy and momentum. That momentum is horizontal only, as the electron beam is normal to the wall and the slit. Hence, their vertical momentum is zero–before they hit the wall or enter the slit that is. We’ll denote their (horizontal) momentum, i.e. the momentum before they enter the slit, as p0.

[Illustration: diffraction of a particle wave at a slit of width B, from Feynman’s Lectures.]

Now, if an electron happens to go through the slit, and we know it did because we detected it on the other side, then we know its vertical position (y) at the slit itself with considerable accuracy: that position will be the center of the slit ±B/2. So the uncertainty in position (Δy) is of the order B, so we can write: Δy = B. However, according to the Uncertainty Principle, we cannot have precise knowledge about both its position and its momentum. Indeed, from the diffraction pattern itself, we know that the electron acquires some vertical momentum: some electrons just go straight, but most stray a bit away from the normal. From the interference pattern, we know that the vast majority stays within an angle Δθ, as shown in the plot. Hence, plain trigonometry allows us to write the spread in the vertical momentum py as p0Δθ, with p0 the horizontal momentum. So we have Δpy = p0Δθ.

Now, what is Δθ? Well… Feynman refers to the classical analysis of the phenomenon of diffraction (which I’ll reproduce in the next section) and notes, correctly, that the first minimum occurs at an angle such that the waves from one edge of the slit have to travel one wavelength farther than the waves from the other side. The geometric analysis (which, as noted, I’ll reproduce in the next section) shows that that angle is equal to the wavelength divided by the width of the slit, so we have Δθ = λ/B. So now we can write:

Δpy = p0Δθ = p0λ/B

That shows that the uncertainty in regard to the vertical momentum is, indeed, inversely proportional to the uncertainty in regard to its position (Δy), which is the slit width B. But we can go one step further. The de Broglie relation relates wavelength to momentum: λ = h/p. What momentum? Well… Feynman is a bit vague on that: he equates it with the electron’s horizontal momentum, so he writes λ = h/p0. Is this correct? Well… Yes and no. The de Broglie relation associates a wavelength with the total momentum, but then it’s obvious that most of the momentum is still horizontal, so let’s go along with this. What about the wavelength? What wavelength are we talking about here? It’s obviously the wavelength of the complex-valued wave function–the ‘probability wave’ so to say.

OK. So, what’s next? Well… Now we can write that Δpy = p0Δθ = p0λ/B = p0(h/p0)/B. Of course, the p0 factor vanishes and, hence, bringing B to the other side and substituting Δy = B yields the following grand result:

Δy·Δpy = h
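Here is a small Python sketch (my own, with arbitrary numbers: a 100 eV electron and a 10 nm slit) that puts some flesh on that formula:

    # A 100 eV electron going through a 10 nm slit.
    h = 6.626e-34                 # Planck's constant, J*s
    m_e = 9.109e-31               # electron mass, kg
    E = 100 * 1.602e-19           # kinetic energy, J
    p0 = (2 * m_e * E) ** 0.5     # horizontal momentum, kg*m/s
    lam = h / p0                  # de Broglie wavelength: ~0.12 nm
    B = 10e-9                     # slit width, m
    delta_theta = lam / B         # angular spread of the central beam: ~0.012 rad
    delta_py = p0 * delta_theta   # spread in vertical momentum
    print(lam, delta_theta, B * delta_py, h)   # the last two are equal, as promised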

Wow ! Did Feynman ‘prove’ Heisenberg’s Uncertainty Principle here?

Well… No. Not really. First, the ‘proof’ above actually assumes there’s fundamental uncertainty as to the position and momentum of a particle (so it actually assumes some uncertainty principle from the start), and then it derives it from another fundamental assumption, i.e. the de Broglie relation, which is obviously related to the Uncertainty Principle. Hence, all of the above is only an illustration of the Uncertainty Principle. It’s no proof. As far as I know, one can’t really ‘prove’ the Uncertainty Principle: it’s a fundamental assumption which, if accepted, makes our observations consistent with the theory that is based on it, i.e. quantum or wave mechanics.

Finally, note that everything that I wrote above also takes the diffraction pattern as a given and, hence, while all of the above indeed illustrates the Uncertainty Principle, it’s not an explanation of the phenomenon of diffraction as such. For such explanation, we need a rigorous mathematical analysis, and that’s a classical analysis. Let’s go for it!

Going from six to n oscillators

The mathematics involved in analyzing diffraction and/or interference are actually quite tricky. If you’re alert, then you should have noticed that I used two illustrations that both have six oscillators but that the interference patterns don’t match. I’ve reproduced them below. The illustration on the right-hand side has six oscillators and shows a second beam besides the central one–and, of course, there’s such a beam also 30° higher, so we have (at least) three beams with the same intensity here–while the animation on the left-hand side shows only one central beam. So what’s going on here?

Six-dipole antenna Huygens_Fresnel_Principle

The answer is that, in the particular example on the left-hand side, the successive dipole radiators (i.e. the point sources) are separated by a distance of two wavelengths (2λ). In that case, it is actually possible to find an angle where the distance δ between successive dipoles is exactly one wavelength (note the little δ in the illustration, as measured from the second point source), so that the effects from all of them are in phase again. Each one is then delayed relative to the next one by 360 degrees, so they all come back in phase, and then we have another strong beam in that direction! In this case, the other strong beam makes an angle of 30 degrees with the E-W line. If we were to put in some more oscillators, so as to ensure that they are all less than one wavelength apart, then this cannot happen. And so it’s not happening with light. 🙂 But now that we’re here, I’ll just quickly note that it’s an interesting and useful phenomenon used in diffraction gratings, but I’ll refer you to the literature on that, as I shouldn’t be bothering you with all these digressions. So let’s get back to it.

In fact, let me skip the nitty-gritty of the detailed analysis (I’ll refer you to Feynman’s Lectures for that) and just present the grand result for n oscillators, as depicted below:

n oscillators

This, indeed, shows the diffraction pattern we are familiar with: one strong maximum separated from successive, much smaller ones (note that the dotted curve magnifies the actual curve by a factor of 10). The vertical axis shows the intensity, expressed as a fraction of the maximum intensity, which is n²I0 (I0 being the intensity we would observe if there were only one oscillator). As for the horizontal axis, the variable there is really ϕ, although we re-scale it so as to get 1, 2, 3, etcetera for the first, second, third, etcetera minimum. This ϕ is the phase difference. It consists of two parts:

  1. The intrinsic relative phase α, i.e. the difference in phase between one oscillator and the next: this is assumed to be zero in all of the examples of diffraction patterns above, but the mathematical analysis here is somewhat more general.
  2. The phase difference which results from the fact that we are observing the array in a given direction θ from the normal. Now that‘s the interesting bit, and it’s not so difficult to show that this additional phase is equal to 2πdsinθ/λ, with d the distance between two oscillators, λ the wavelength of the radiation, and θ the angle from the normal.

In short, we write:

ϕ = α + 2πd·sinθ/λ
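For the record, the ‘grand result’ behind the curve above is the standard n-oscillator intensity formula, I(ϕ) = I0·[sin(nϕ/2)/sin(ϕ/2)]², which is what Feynman derives in the Lectures. A few lines of code reproduce the curve (n = 6 and the sampling below are just choices for illustration):

```python
# Sketch of the n-oscillator intensity, I(ϕ)/(n²·I0) = [sin(nϕ/2)/(n·sin(ϕ/2))]².
# Minima sit at ϕ = 2πm/n; the horizontal axis is re-scaled so they land on 1, 2, 3, ...
import numpy as np
import matplotlib.pyplot as plt

n = 6
phi = np.linspace(1e-6, 2 * np.pi - 1e-6, 2000)   # avoid the 0/0 at ϕ = 0
I_rel = (np.sin(n * phi / 2) / (n * np.sin(phi / 2)))**2

plt.plot(n * phi / (2 * np.pi), I_rel)
plt.xlabel("n·ϕ/2π"); plt.ylabel("I / (n²·I0)")
plt.show()
```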

Now, because I’ll have to use the variables below in the analysis that follows, I’ll quickly also reproduce the geometry of the set-up (all illustrations here taken from Feynman’s Lectures): 

geometry

Before I proceed, please note that we assume that d is less than λ, so we only have one great maximum, and that’s the so-called zero-order beam centered at θ = 0. In order to get subsidiary great maxima (referred to as first-order, second-order, etcetera beams in the context of diffraction gratings), we would need the spacing d of the array to be greater than one wavelength, but that’s not relevant for what we’re doing here, which is to move from a discrete analysis to a continuous one.

Before we do that, let’s look at that curve again and analyze where the first minimum occurs. If we assume that α = 0 (no intrinsic relative phase), then the first minimum occurs when ϕ = 2π/n. Using the ϕ = α + 2πd·sinθ/λ formula, we get 2πd·sinθ/λ = 2π/n, or nd·sinθ = λ. What does that mean? Well, nd is the total length L of the array, so we have nd·sinθ = L·sinθ = Δ = λ. What that means is that we get the first minimum when Δ is equal to one wavelength.

Now why do we get a minimum when Δ = λ? Because the contributions of the various oscillators are then uniformly distributed in phase from 0° to 360°. What we’re doing, once again, is adding arrows in order to get a resultant arrow AR, as shown below for n = 6. At the first minimum, the arrows are going around a whole circle: we are adding equal vectors in all directions, and such a sum is zero. So when we have an angle θ such that Δ = λ, we get the first minimum. [Note that simple trigonometry rules imply that θ must be equal to λ/L, a fact which we used in that quantum-mechanical analysis of electron diffraction above.]

Adding waves
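That ‘adding arrows’ picture is easy to check numerically: at the first minimum, the n phasors are spread uniformly over a full circle, and their sum vanishes. A minimal sketch, with n = 6 as in the figure above:

```python
# Adding n equal 'arrows' (phasors) whose phases are spread uniformly over 0°–360°:
# at the first minimum the resultant is zero (up to rounding error).
import numpy as np

n = 6
phases = 2 * np.pi * np.arange(n) / n     # 0°, 60°, 120°, ... for n = 6
resultant = np.sum(np.exp(1j * phases))   # sum of unit phasors
print(abs(resultant))                     # ~1e-16, i.e. zero
```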

What about the second minimum? Well… That occurs when ϕ = 4π/n. Using the ϕ = 2πd·sinθ/λ formula again (with α = 0), we get 2πd·sinθ/λ = 4π/n, or nd·sinθ = 2λ. So we get nd·sinθ = L·sinθ = Δ = 2λ. So we get the second minimum at an angle θ such that Δ = 2λ. For the third minimum, we have ϕ = 6π/n. So we have 2πd·sinθ/λ = 6π/n, or nd·sinθ = 3λ. So we get the third minimum at an angle θ such that Δ = 3λ. And so on and so on.

The point to note is that the diffraction pattern depends only on the wavelength λ and the total length L of the array, which is the width of the slit of course. Hence, we can actually extend the analysis for n going from some fixed value to infinity, and we’ll find that we will only have one great maximum with a lot of side lobes that are much, much smaller, with the minima occurring at angles such that Δ = λ, 2λ, 3λ, etcetera.

OK. What’s next? Well… Nothing. That’s it. I wanted to do a post on diffraction, and so that’s what I did. However, to wrap it all up, I’ll just include two more images from Wikipedia. The one on the left shows the diffraction pattern of a red laser beam, projected on a plate after passing through a small circular hole in another plate. The pattern is quite clear. On the right-hand side, we have the diffraction pattern generated by ordinary white light going through a hole. In fact, it’s a computer-generated image, and the gray-scale intensities have been adjusted to enhance the brightness of the outer rings, because we would not be able to see them otherwise.

283px-Airy-pattern 600px-Laser_Interference

But… Didn’t I say I would write about diffraction and the Uncertainty Principle? Yes. And I admit I did not write all that much about the Uncertainty Principle above. But so I’ll do that in my next post, in which I intend to look at Heisenberg’s own illustration of the Uncertainty Principle. That example involves a good understanding of the resolving power of a lens or a microscope, and such understanding also involves some good mathematical analysis. However, as this post has become way too long already, I’ll leave that to the next post indeed. I’ll use the left-hand image above for that, so have a good look at it. In fact, let me quickly quote Wikipedia as an introduction to my next post:

The diffraction pattern resulting from a uniformly-illuminated circular aperture has a bright region in the center, known as the Airy disk which together with the series of concentric bright rings around is called the Airy pattern.

We’ll need it in order to define the resolving power of a microscope, which is essential to understanding Heisenberg’s illustration of the Principle he advanced himself. But let me stop here, as it’s the topic of my next write-up indeed. This post has become way too long already. 🙂

Some content on this page was disabled on June 17, 2020 as a result of a DMCA takedown notice from Michael A. Gottlieb, Rudolf Pfeiffer, and The California Institute of Technology. You can learn more about the DMCA here:

https://wordpress.com/support/copyright-and-the-dmca/

Photons as strings

Pre-script written much later: In the meanwhile, we figured it all out. We found the common-sense interpretation of quantum physics. No ambiguity. No hocus-pocus. I keep posts like the one below online only to, one day, go back to where I went wrong. 🙂

Jean Louis Van Belle, 20 May 2020

In my previous post, I explored, somewhat jokingly, the grey area between classical physics and quantum mechanics: light as a wave versus light as a particle. I did so by trying to picture a photon as an electromagnetic transient traveling through space, as illustrated below. While actual physicists would probably deride my attempt to think of a photon as an electromagnetic transient traveling through space, the idea illustrates the wave-particle duality quite well, I feel.

Photon wave

Understanding light is the key to understanding physics. Light is a wave, as Thomas Young proved to the Royal Society of London in 1803, thereby demolishing Newton’s corpuscular theory. But its constituents, photons, behave like particles. According to modern-day physics, both were right. Just to put things in perspective, the thickness of the note card which Young used to split the light – ordinary sunlight entering his room through a pinhole in a window shutter – was 1/30 of an inch, or approximately 0.85 mm. Hence, in essence, this is a double-slit experiment with the two slits being separated by a distance of almost 1 millimeter. That’s enormous as compared to modern-day engineering tolerance standards: what was thin then, is obviously not considered to be thin now. Scale matters. I’ll come back to this.

Young’s experiment (from www.physicsclassroom.com)

Young experiment

The table below shows that the ‘particle character’ of electromagnetic radiation becomes apparent when its frequency is a few hundred terahertz, like the sodium light example I used in my previous post: sodium light, as emitted by sodium lamps, has a frequency of 500×10^12 oscillations per second and, therefore (the relation between frequency and wavelength is very straightforward: their product is the velocity of the wave, so for light we have the simple λf = c equation), a wavelength of 600 nanometer (600×10^−9 meter).

Electromagnetic spectrum
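A quick check of that frequency-wavelength arithmetic, together with the corresponding photon energy (which we’ll come back to below):

```python
# Frequency ↔ wavelength for sodium light (λ·f = c), plus the photon energy E = h·f.
c  = 3.0e8        # speed of light (m/s)
h  = 6.626e-34    # Planck's constant (J·s)
eV = 1.602e-19    # joule per electronvolt

f = 500e12                       # 500 THz
lam = c / f                      # wavelength (m)
E = h * f / eV                   # photon energy (eV)
print(f"λ = {lam * 1e9:.0f} nm, E = {E:.2f} eV")   # ~600 nm, ~2.1 eV
```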

However, whether something behaves like a particle or a wave also depends on our measurement scale: 0.85 mm was thin in Young’s time, and so it was a delicate experiment then but now, it’s a standard classroom experiment indeed. The theory of light as a wave would hold until more delicate equipment refuted it. Such equipment came with another sense of scale. It’s good to remind oneself that Einstein’s “discovery of the law of the photoelectric effect”, which explained the photoelectric effect as the result of light energy being carried in discrete quantized packets of energy, now referred to as photons, goes back to 1905 only, and that the experimental apparatus which could measure it was not much older. So waves behave like particles if we look at them close enough. Conversely, particles behave like waves if we look at them close enough. So there is this zone where they are neither, the zone for which we invoke the mathematical formalism of quantum mechanics or, to put it more precisely, the formalism of quantum electrodynamics: that “strange theory of light and Matter”, as Feynman calls it.

Let’s have a look at how particles became waves. It should not surprise us that the experimental apparatuses needed to confirm that electrons–or matter in general–can actually behave like a wave are more recent than the 19th-century apparatuses which led Einstein to develop his ‘corpuscular’ theory of light (i.e. the theory of light as photons). The engineering tolerances involved are daunting. Let me be precise here. To be sure, the phenomenon of electron diffraction (i.e. electrons going through one slit and producing a diffraction pattern on the other side) had been confirmed experimentally by 1927, in the famous Davisson-Germer experiment. I am saying this because it’s rather famous indeed. First, because electron diffraction was a weird thing to contemplate at the time. Second, because it confirmed the de Broglie hypothesis only a few years after Louis de Broglie had advanced it. And, third, because Davisson and Germer had never intended to set it up to detect diffraction: it was pure coincidence. In fact, the observed diffraction pattern was the result of a laboratory accident, and Davisson and Germer weren’t aware of other, conscious, attempts to prove the de Broglie hypothesis. 🙂 […] OK. I am digressing. Sorry. Back to the lesson.

The nanotechnology that was needed to confirm Feynman’s 1965 thought experiment on electron interference – electrons going through two slits and interfering with each other (rather than producing some diffraction pattern as they go through one slit only) and, equally significant as an experimental result, with themselves as they go through the slit(s) one by one! – was only developed over the past decades. In fact, it was only in 2008 (and again in 2012) that the experiment was carried out exactly the way Feynman describes it in his Lectures.

It is useful to think of what such experiments entail from a technical point of view. Have a look at the illustration below, which shows the set-up. The insert in the upper-left corner shows the two slits which were used in the 2012 experiment: they are each 62 nanometer wide – that’s 62×10^−9 m! – and the distance between them is 272 nanometer, or 0.272 micrometer. [Just to be complete: they are 4 micrometer tall (4×10^−6 m), and the thing in the middle of the slits is just a little support (150 nm) to make sure the slit width doesn’t vary.]

The second inset (in the upper-right corner) shows the mask that can be moved to close one or both slits partially or completely. The mask is 4.5 µm wide × 20 µm tall. Please do take a few seconds to contemplate the technology behind this feat: a nanometer is a millionth of a millimeter, so that’s a billionth of a meter, and a micrometer is a millionth of a meter. To imagine how small a nanometer is, you should imagine dividing one millimeter by ten, and then one of these tenths by ten again, and so on, six times in all. In fact, you actually cannot imagine that, because we live in the world we live in and, hence, our mind is used only to addition (and subtraction) when it comes to comparing sizes and – to a much more limited extent – to multiplication (and division): our brain is, quite simply, not wired to deal with exponentials and, hence, it can’t really ‘imagine’ these incredible (negative) powers. So don’t think you can imagine it really, because one can’t: in our mind, these scales exist only as mathematical constructs. They don’t correspond to anything we can actually make a mental picture of.

Electron double-slit set-up

The electron beam consisted of electrons with an (average) energy of 600 eV. That’s not an awful lot: about 8.5 times the energy of an electron in orbit in an atom, whose energy would be some 70 eV, so the acceleration before they went through the slits was relatively modest. I’ve calculated the corresponding de Broglie wavelength of these electrons in another post (Re-Visiting the Matter-Wave, April 2014), using the de Broglie equations: f = E/h or λ = h/p. And, of course, you could just google the article on the experiment and read about it, but it’s a good exercise, and actually quite simple: just note that you’ll need to express the energy in joule (not in eV) to get it right. Also note that you need to include the rest mass of the electron in the energy. I’ll let you try it (or else just go to that post of mine). You should find a de Broglie wavelength of 50 picometer for these electrons, so that’s 50×10^−12 m. While that wavelength is less than a thousandth of the slit width (62 nm), and about 5,500 times smaller than the space between the two slits (272 nm), the interference effect was unambiguous in the experiment. I advise you to google the results yourself (or read that April 2014 post of mine if you want a summary): the experiment was done at the University of Nebraska-Lincoln in 2012.
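For what it’s worth, here’s a minimal sketch of that calculation, following the relativistic route suggested above (i.e. including the electron’s rest energy and taking the momentum from the energy-momentum relation):

```python
# De Broglie wavelength of a 600 eV electron: λ = h/p, with p obtained from the
# relativistic energy-momentum relation E² = (pc)² + (m0·c²)².
import math

h  = 6.626e-34    # Planck's constant (J·s)
c  = 2.998e8      # speed of light (m/s)
m0 = 9.109e-31    # electron rest mass (kg)
eV = 1.602e-19    # joule per electronvolt

K = 600 * eV                                # kinetic energy (J)
E = K + m0 * c**2                           # total energy, rest energy included (J)
p = math.sqrt(E**2 - (m0 * c**2)**2) / c    # momentum (kg·m/s)
print(h / p * 1e12, "pm")                   # ≈ 50 pm
```

At 600 eV the relativistic correction is tiny, so the non-relativistic shortcut λ = h/√(2m0K) gives virtually the same 50 pm.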

Electrons and X-rays

To put everything in perspective: 50 picometer is like the wavelength of X-rays, and you can google similar double-slit experiments for X-rays: they also lose their ‘particle behavior’ when we look at them at this tiny scale. In short, scale matters, and the boundary between ‘classical physics’ (electromagnetics) and quantum physics (wave mechanics) is not clear-cut. If anything, it depends on our perspective, i.e. what we can measure, and we seem to be shifting that boundary constantly. In what direction?

Downwards obviously: we’re devising instruments that measure stuff at smaller and smaller scales, and what’s happening is that we can ‘see’ typical ‘particles’, including hard radiation such as gamma rays, as local wave trains. Indeed, the next step is clear-cut evidence for interference between gamma rays.

Energy levels of photons

We would not associate low-frequency electromagnetic waves, such as radio or radar waves, with photons. But light in the visible spectrum, yes. Obviously. […]

Isn’t that an odd dichotomy? If we see that, on a smaller scale, particles start to look like waves, why would the reverse not be true? Why wouldn’t we analyze radio or radar waves, on a much larger scale, as a stream of very (I must say extremely) low-energy photons? I know the idea sounds ridiculous, because the energies involved would be ridiculously low indeed. Think about it. The energy of a photon is given by the Planck relation: E = h·f = hc/λ. For visible light, with wavelengths ranging from 800 nm (red) to 400 nm (violet or indigo), the photon energies range between 1.5 and 3 eV. Now, the shortest wavelengths for radar waves are in the so-called millimeter band, i.e. they range from 1 mm to 1 cm. A wavelength of 1 mm corresponds to a photon energy of 0.00124 eV. That’s close to nothing, of course, and surely not the kind of energy levels that we can currently detect.
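A quick sketch of those photon-energy numbers, using E = hc/λ for the wavelengths mentioned above:

```python
# Photon energy E = h·c/λ for a few wavelengths, expressed in eV.
h, c, eV = 6.626e-34, 3.0e8, 1.602e-19

for name, lam in [("red (800 nm)", 800e-9),
                  ("violet (400 nm)", 400e-9),
                  ("radar, mm band (1 mm)", 1e-3)]:
    print(f"{name}: {h * c / lam / eV:.5f} eV")
# ≈ 1.55 eV, 3.10 eV and 0.00124 eV respectively
```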

But you get the idea: there is a grey area between classical physics and quantum mechanics, and it’s our equipment–notably the scale of our measurements–that determines where that grey area begins and where it ends, and that grey area seems to become larger and larger as the sensitivity of our equipment improves.

What do I want to get at? Nothing much. Just some awareness of scale, as an introduction to the actual topic of this post, and that’s some thoughts on a rather primitive string theory of photons. What !? 

Yes. Purely speculative, of course. 🙂

Photons as strings

I think my calculations in the previous post, as primitive as they were, actually provide quite some food for thought. If we’d treat a photon in the sodium light band (i.e. the light emitted by sodium, from a sodium lamp for instance) just like any other electromagnetic pulse, we would find it’s a pulse some 10 meters long. We also made sense of this incredibly long distance by noting that, if we’d look at it as a particle (which is what we do when analyzing it as a photon), it should have zero size, because it moves at the speed of light and, hence, the relativistic length contraction effect ensures we (or any observer in whatever reference frame really, because light always moves at the speed of light, regardless of the reference frame) will see it as a zero-size particle.

Having said that, and knowing damn well that we have to treat the photon as an elementary particle, I would think it’s very tempting to think of it as a vibrating string.

Huh?

Yes. Let me copy that graph again. The assumption I started with is a standard one in physics, and not something that you’d want to argue with: photons are emitted when an electron jumps from a higher to a lower energy level and, for all practical purposes, this emission can be analyzed as the emission of an electromagnetic pulse by an atomic oscillator. I’ll refer you to my previous post – as silly as it is – for details on these basics: the atomic oscillator has a Q, and so there’s damping involved and, hence, the assumption that the electromagnetic pulse resembles a transient should not sound ridiculous. And because the electric field as a function in space is the ‘reversed’ image of the oscillation in time, there is nothing blasphemous about the suggested shape.

Photon wave

Just go along with it for a while. First, we need to remind ourselves that what’s vibrating here is nothing physical: it’s an oscillating electromagnetic field. That being said, in my previous post, I toyed with the idea that the oscillation could actually also represent the photon’s wave function, provided we use a unit for the electric field that ensures that the area under the squared curve adds up to one, so as to normalize the probability amplitudes. Hence, I suggested that the field strength over the length of this string could actually represent the probability amplitudes, provided we choose an appropriate unit to measure the electric field.

But then I was joking, right? Well… No. Why not consider it? An electromagnetic oscillation packs energy, and the energy is proportional to the square of the amplitude of the oscillation. Now, the probability of detecting a particle is related to its energy, and such probability is calculated from taking the (absolute) square of probability amplitudes. Hence, mathematically, this makes perfect sense.
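Just to make that normalization idea concrete, here’s a toy sketch: take a decaying wave train E(x) and rescale it so that the area under its squared magnitude equals one, as one would do for a probability amplitude. The pulse shape and decay length below are arbitrary choices, not physical values:

```python
# Toy normalization: rescale a transient E(x) so that ∫|E(x)|² dx = 1.
import numpy as np

x = np.linspace(0, 10, 10_000)                 # 'length' of the pulse (arbitrary units)
E = np.exp(-x / 3.0) * np.cos(2 * np.pi * x)   # decaying oscillation (toy transient)

norm = np.sqrt(np.trapz(E**2, x))              # square root of the area under E²
psi = E / norm                                 # rescaled 'amplitude'

print(np.trapz(psi**2, x))                     # ≈ 1.0
```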

It’s quite interesting to think through the consequences, and I hope I will (a) understand enough of physics and (b) find enough time for this—one day! One interesting thing is that the field strength (i.e. the magnitude of the electric field vector) is a real number. Hence, if we equate these magnitudes with probability amplitudes, we’d have real probability amplitudes, instead of complex-valued ones. That’s not a very fundamental issue. It probably indicates we should also take into account the fact that the E vector also oscillates in the other direction that’s normal to the direction of propagation, i.e. the y-coordinate (assuming that the z-axis is the direction of propagation). To put it differently, we should take the polarization of the light into account. The figure below–which I took from Wikipedia again (by far the most convenient place to shop for images and animations: what would I do without it?)–shows how the electric field vector moves in the xy-plane indeed, as the wave travels along the z-axis. So… Well… I still have to figure it all out, but the idea surely makes sense.

Circular.Polarization.Circularly.Polarized.Light_Right.Handed.Animation.305x190.255Colors

Another interesting thing to think about is how the collapse of the wave function would come about. If we think of a photon as a string, it must have some ‘hooks’ which could cause it to ‘stick’ or ‘collapse’ into a ‘lump’ as it hits a detector. What kind of hook? What force would come into play?

Well… The interaction between the photon and the photodetector is electromagnetic, but we’re looking for some other kind of ‘hook’ here. What could it be? I have no idea. Having said that, we know that the weakest of all fundamental forces—gravity—becomes much stronger—very much stronger—as the distance becomes smaller and smaller. In fact, it is said that, if we go to the Planck scale, the strength of the force of gravity becomes quite comparable with the other forces. So… Perhaps it’s, quite simply, the equivalent mass of the energy involved that gets ‘hooked’, somehow, as it starts interacting with the photon detector. Hence, when thinking about a photon as an oscillating string of energy, we should also think of that string as having some inseparable (equivalent) mass that, once it’s ‘hooked’, has no other option than to ‘collapse into itself’. [You may note there’s no quantum theory for gravity as yet. I am not sure how, but I’ve got a gut instinct that tells me that may help to explain why a photon consists of one single ‘unbreakable’ lump, although I need to elaborate this argument obviously.]

You must be laughing aloud now. A new string theory–really?

I know… I know… I haven’t reached sophomore level and I am already wildly speculating… Well… Yes. What I am talking about here has probably nothing to do with current string theories, although my proposed string would also replace the point-like photon by a one-dimensional ‘string’. However, ‘my’ string is, quite simply, an electromagnetic pulse (a transient actually, for reasons I explained in my previous post). Naive? Perhaps. However, I note that the earliest version of string theory is referred to as bosonic string theory, because it only incorporated bosons, which is what photons are.

So what? Well… Nothing… I am sure others have thought of this too, and I’ll look into it. It’s surely an idea which I’ll keep in the back of my head as I continue to explore physics. The idea is just too simple and beautiful to disregard, even if I am sure it must be pretty naive indeed. Photons as ten-meter long strings? Let’s just forget about it. 🙂 Onwards !!! 🙂

Post Scriptum: The key to ‘closing’ this discussion is, obviously, to be found in a full-blown analysis of the relativity of fields. So, yes, I have not done all of the required ‘homework’ on this and the previous post. I apologize for that. If anything, I hope it helped you to also try to think somewhat beyond the obvious. I realize I wasted a lot of time trying to understand the pre-cooked ready-made stuff that’s ‘on the market’, so to say. I still am, actually. Perhaps I should first thoroughly digest Feynman’s Lectures. In fact, I think that’s what I’ll try to do in the next year or so. Sorry for any inconvenience caused. 🙂

Some content on this page was disabled on June 17, 2020 as a result of a DMCA takedown notice from Michael A. Gottlieb, Rudolf Pfeiffer, and The California Institute of Technology. You can learn more about the DMCA here:

https://wordpress.com/support/copyright-and-the-dmca/

The shape and size of a photon

Important post script (PS) – dated 22 December 2018: Dear readers of this post, this is one of the more popular posts of my blog but − in the meanwhile − I did move on, and quite a bit, actually! The analysis below is not entirely consistent: I got many questions on it, and I have been thinking differently as a result. The Q&A below sums up everything: I do think of the photon as a pointlike particle now, and Chapter VIII of my book sums up the photon model. At the same time, if you are really interested in this question – how should one think of a photon? – then it’s probably good you also read the original post. If anything, it shows you how easy it is to get confused.

Hi Brian – see section III of this paper: http://vixra.org/pdf/1812.0273v2.pdf

Feynman’s classical idea of an atomic oscillator is fine in the context of the blackbody radiation problem, but his description of the photon as a long wavetrain does not make any sense. A photon has to pack two things: (1) the energy difference between the Bohr orbitals and (2) Planck’s constant h, which is the (physical) action associated with one cycle of an oscillation (so it’s a force over a distance (the loop or the radius – depending on the force you’re looking at) over a cycle time). See section V of the paper for how the fine-structure constant pops up here – it’s, as usual, a sort of scaling constant, but this time it scales a force. In any case, the idea is that we should think of a photon as one cycle – rather than a long wavetrain. The one cycle makes sense: when you calculate field strength and force you get quite moderate values (not the kind of black-hole energy concentrations some people suggest). It also makes sense from a logical point of view: the wavelength is something real, and so we should think of the photon amplitude (the electric field strength) as being real as well – especially when you think of how that photon is going to interact or be absorbed into another atom.

Sorry for my late reply. It’s been a while since I checked the comments. Please let me know if this makes sense. I’ll have a look at your blog in the coming days. I am working on a new paper on the anomalous magnetic moment – which is not anomalous at all if you start to think about how things might be working in reality. After many years of study, I’ve come to the conclusion that quantum mechanics is a nice way of describing things, but it doesn’t help us in terms of understanding anything. When we want to understand something, we need to push the classical framework a lot further than we currently do. In any case, that’s another discussion. :-/

JL

 

OK. Now you can move on to the post itself. 🙂 Sorry if this is confusing the reader, but it is necessary to warn him. I think of this post now as still being here to document the history of my search for a ‘basic version of truth’, as someone called it. [For an even more recent update, see Chapter 8 of my book, A Realist Interpretation of Quantum Mechanics.]

Original post:

Photons are weird. All elementary particles are weird. As Feynman puts it, in the very first paragraph of his Lectures on Quantum Mechanics: “Historically, the electron, for example, was thought to behave like a particle, and then it was found that in many respects it behaved like a wave. So it really behaves like neither. Now we have given up. We say: ‘It is like neither.’ There is one lucky break, however—electrons behave just like light. The quantum behavior of atomic objects (electrons, protons, neutrons, photons, and so on) is the same for all, they are all “particle waves,” or whatever you want to call them. So what we learn about the properties of electrons will apply also to all “particles,” including photons of light.” (Feynman’s Lectures, Vol. III, Chapter 1, Section 1)

I wouldn’t dare to argue with Feynman, of course, but… What? Well… Photons are like electrons, and then they are not. Obviously not, I’d say. For starters, photons do not have mass or charge, and they are also bosons, i.e. ‘force-carriers’ (as opposed to matter-particles), and so they obey very different quantum-mechanical rules, which are referred to as Bose-Einstein statistics. I’ve written about that in other posts (see, for example, my post on Bose-Einstein and Fermi-Dirac statistics), so I won’t do that again here. It’s probably sufficient to remind the reader that these rules imply that the so-called Pauli exclusion principle does not apply to them: bosons like to crowd together, thereby occupying the same quantum state—unlike their counterparts, the so-called fermions or matter-particles: quarks (which make up protons and neutrons) and leptons (including electrons and neutrinos), which can’t do that. Two electrons, for example, can only sit on top of each other (or be very near to each other, I should say) if their spins are opposite (so that makes their quantum state different), and there’s no place whatsoever to add a third one because there are only two possible ‘directions’ for the spin: up or down.

From all that I’ve been writing so far, I am sure you have some kind of picture of matter-particles now, and notably of the electron: it’s not really point-like, because it has a so-called scattering cross-section (I’ll say more about this later), and we can find it somewhere taking into account the Uncertainty Principle, with the probability of finding it at point x at time t given by the absolute square of a so-called ‘wave function’ Ψ(x, t).

But what about the photon? Unlike quarks or electrons, they are really point-like, aren’t they? And can we associate them with a psi function too? I mean, they have a frequency, obviously, which is given by the Planck-Einstein energy-frequency relation: E = hν, with h the Planck constant and ν the frequency of the associated ‘light’ (and, hence, a wavelength λ = c/ν). But an electromagnetic wave is not like a ‘probability wave’. So… Do they have a de Broglie wavelength as well?

Before answering that question, let me present that ‘picture’ of the electron once again.

The wave function for electrons

The electron ‘picture’ can be represented in a number of ways but one of the more scientifically correct ones – whatever that means – is that of a spatially confined wave function representing a complex quantity referred to as the probability amplitude. The animation below (which I took from Wikipedia) visualizes such wave functions. As mentioned above, the wave function is usually represented by the Greek letter psi (Ψ), and it is often referred to as a ‘probability wave’ – by bloggers like me, that is 🙂 – but that term is quite misleading. Why? You surely know that by now: the wave function represents a probability amplitude, not a probability. [So, to be correct, we should say a ‘probability amplitude wave’, or an ‘amplitude wave’, but so these terms are obviously too long and so they’ve been dropped and everybody talks about ‘the’ wave function now, although that’s confusing too, because an electromagnetic wave is a ‘wave function’ too, but describing ‘real’ amplitudes, not some weird complex numbers referred to as ‘probability amplitudes’.]

StationaryStatesAnimation

Having said what I’ve said above, probability amplitude and probability are obviously related: if we take the (absolute) square of the psi function – i.e. if we take the (absolute) square of all these amplitudes Ψ(x, t) – then we get the actual probability of finding that electron at point x at time t. So then we get the so-called probability density functions, which are shown on the right-hand side of the illustration above. [As for the term ‘absolute’ square, the absolute square is the squared norm of the associated ‘vector’. Indeed, you should note that the square of a complex number can be negative as evidenced, for example, by the definition of i: i² = –1. In fact, if there’s only an imaginary part, then its square is always negative. Probabilities are real numbers between 0 and 1, and so they can’t be negative, and so that’s why we always talk about the absolute square, rather than the square as such.]
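A two-line illustration of that point (the numbers are arbitrary):

```python
# Square vs. absolute square of a complex number: only the latter is a
# non-negative real number, suitable as a probability (density).
z = 0.3 + 0.4j
print(z**2)        # ≈ -0.07+0.24j: the plain square is a complex number
print(abs(z)**2)   # ≈ 0.25: the absolute square |z|² = z·z* is real and non-negative
```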

Below, I’ve inserted another image, which gives a static picture (i.e. one that is not varying in time) of the wave function of a real-life electron. To be precise: it’s the wave function for an electron on the 5d orbital of a hydrogen atom. You can see it’s much more complicated than those easy things above. However, the idea behind it is the same. We have a complex-valued function varying in space and in time. I took it from Wikipedia and so I’ll just copy the explanation here: “The solid body shows the places where the electron’s probability density is above a certain value (0.02), as calculated from the probability amplitude.” What about these colors? Well… The image uses the so-called HSL color system to represent complex numbers: each complex number is represented by a unique color, with a different hue (H), saturation (S) and lightness (L). Just google if you want to know how that works exactly.

Hydrogen_eigenstate_n5_l2_m1

OK. That should be clear enough. I wanted to talk about photons here. So let’s go for it. Well… Hmm… I realize I need to talk about some more ‘basics’ first. Sorry for that.

The Uncertainty Principle revisited (1)

The wave function is usually given as a function in space and time: Ψ = Ψ(x, t). However, I should also remind you that we have a similar function in the ‘momentum space’: if ψ is a psi function, then the function in the momentum space is a phi function, and we’ll write it as Φ = Φ(p, t). [As for the notation, x and p are written with capital letters and, hence, represent (three-dimensional) vectors. Likewise, we use a capital letter for psi and phi so we don’t confuse it with, for example, the lower-case φ (phi) representing the phase of a wave function.]

The position-space and momentum-space wave functions Ψ and Φ are related through the Uncertainty Principle. To be precise: they are Fourier transforms of each other. Huh? Don’t be put off by that statement. In fact, I shouldn’t have mentioned it, but then it’s how one can actually prove or derive the Uncertainty Principle from… Well… From ‘first principles’, let’s say, instead of just jotting it down as some God-given rule. Indeed, as Feynman puts: “The Uncertainty Principle should be seen in its historical context. If you get rid of all of the old-fashioned ideas and instead use the ideas that I’m explaining in these lectures—adding arrows for all the ways an event can happen—there is no need for an uncertainty principle!” However, I must assume you’re, just like me, not quite used to the new ideas as yet, and so let me just jot down the Uncertainty Principle once again, as some God-given rule indeed :-):

σx·σp ≥ ħ/2

This is the so-called Kennard formulation of the Principle: it measures the uncertainty about the exact position (x) as well as the momentum (p), in terms of the standard deviation (so that’s the σ (sigma) symbol) around the mean. To be precise, the assumption is that we cannot know the real x and p: we can only find some probability distribution for x and p, which is usually some nice “bell curve” in the textbooks. While the Kennard formulation is the most precise (and exact) formulation of the Uncertainty Principle (or uncertainty relation, I should say), you’ll often find ‘other’ formulations. These ‘other’ formulations usually write Δx and Δp instead of σx and σp, with the Δ symbol indicating some ‘spread’ or a similar concept—surely do not think of Δ as a differential or so! [Sorry for assuming you don’t know this (I know you do!) but I just want to make sure here!] Also, these ‘other’ formulations will usually (a) not mention the 1/2 factor, (b) substitute h for ħ (ħ = h/2π, as you know, so ħ is preferred when we’re talking things like angular frequency or other stuff involving the unit circle), or (c) put an equality (=) sign in, instead of an inequality sign (≥). Niels Bohr’s early formulation of the Uncertainty Principle actually does all of that:

Δx·Δp = h

So… Well… That’s a bit sloppy, isn’t it? Maybe. In Feynman’s Lectures, you’ll find an oft-quoted ‘application’ of the Uncertainty Principle leading to a pretty accurate calculation of the typical size of an atom (the so-called Bohr radius), which Feynman starts with an equally sloppy statement of the Uncertainty Principle, so he notes: “We needn’t trust our answer to within factors like 2, π etcetera.” Frankly, I used to think that’s ugly and, hence, doubt the ‘seriousness’ of such kind of calculations. Now I know it doesn’t really matter indeed, as the essence of the relationship is clearly not a 2, π or 2π factor. The essence is the uncertainty itself: it’s very tiny (and multiplying it with 2, π or 2π doesn’t make it much bigger) but so it’s there.

In this regard, I need to remind you of how tiny that physical constant ħ actually is: about 6.58×10^−16 eV·s. So that’s a zero followed by a decimal point and fifteen zeroes: only then do we get the first significant digits (658…). And if 10^−16 doesn’t look tiny enough for you, then just think about how tiny the electronvolt unit is: it’s the amount of (potential) energy gained (or lost) by an electron as it moves across a potential difference of one volt (which, believe me, is nothing much really): if we’d express ħ in joule, then we’d have to add nineteen more zeroes, because 1 eV = 1.6×10^−19 J. As for such phenomenally small numbers, I’ll just repeat what I’ve said many times before: we just cannot imagine such a small number. Indeed, our mind can sort of intuitively deal with addition (and, hence, subtraction), and with multiplication and division (but to some extent only), but our mind is not made to understand non-linear stuff, such as exponentials indeed. If you don’t believe me, think of the Richter scale: can you explain the difference between a 4.0 and a 5.0 earthquake? […] If the answer to that question took you more than a second… Well… I am right. 🙂 [The Richter scale is based on the base-10 exponential function: a 5.0 earthquake has a shaking amplitude that is 10 times that of an earthquake that registered 4.0 and, because the energy release scales roughly as the 3/2 power of the shaking amplitude, that corresponds to an energy release that is about 31.6 times that of the lesser earthquake.]

A digression on units

Having said what I said above, I am well aware of the fact that saying that we cannot imagine this or that is what most people say. I am also aware of the fact that they usually say that to avoid having to explain something. So let me try to do something more worthwhile here.

1. First, I should note that ħ is so small because the second, as a unit of time, is so incredibly large. All is relative, of course. 🙂 For sure, we should express time in a more natural unit at the atomic or sub-atomic scale, like the time that’s needed for light to travel one meter. Let’s do it. Let’s express time in a unit that I shall call a ‘meter‘. Of course, it’s not an actual meter (because it doesn’t measure any distance), but so I don’t want to invent a new word and surely not any new symbol here. Hence, I’ll just put apostrophes before and after: so I’ll write ‘meter’ or ‘m’. When adopting the ‘meter’ as a unit of time, one second equals 3×10^8 ‘meter’ and, hence, we get a value for ‘ħ‘ that is equal to (6.6×10^−16 eV·s)·(3×10^8 ‘meter’/second) ≈ 2×10^−7 eV·’m’. Now, 2×10^−7 is a number that is still too tiny to imagine. But then our ‘meter’ is still a rather huge unit at the atomic scale: we should take the ‘millimicron’, aka the ‘nanometer’ (1 nm = 1×10^−9 m), or – even better because more appropriate – the ‘angstrom‘: 1 Å = 0.1 nm = 1×10^−10 m. Indeed, the smallest atom (hydrogen) has a radius of 0.25 Å, while larger atoms will have a radius of about 1 Å or more. Now that should work, shouldn’t it? You’re right, we get a value for ‘ħ‘ equal to (6.6×10^−16 eV·s)·(3×10^8 ‘m’/s)·(1×10^10 ‘Å’/’m’) ≈ 2,000 eV·’Å’, or some 200 eV·’nm’. So… What? Well… If anything, it shows ħ is not small at the atomic or sub-atomic level! Hence, we actually can start imagining how things work at the atomic level when using more adequate units.

[Now, just to test your knowledge, let me ask you: what’s the wavelength of visible light in angstrom? […] Well? […] Let me tell you: 400 to 700 nm is 4000 to 7000 Å. In other words, the wavelength of visible light is quite sizable as compared to the size of atoms or electron orbits!]
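A quick way to double-check the numbers in point 1 is to note that expressing ħ with time measured in light-travel ‘meters’ is really just computing ħ·c, a well-known combination (ħc ≈ 197 eV·nm):

```python
# ħ expressed with time measured in 'meters' of light-travel time, i.e. ħ·c.
hbar_eVs = 6.582e-16      # ħ in eV·s
c = 2.998e8               # speed of light (m/s): 'meters' of time per second

hbar_eVm = hbar_eVs * c                 # eV·'m'
print(hbar_eVm)                         # ≈ 2.0e-7 eV·'m'
print(hbar_eVm * 1e10, "eV·'Å'")        # ≈ 1,973 eV·'Å'
print(hbar_eVm * 1e9, "eV·'nm'")        # ≈ 197 eV·'nm'
```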

2. Secondly, let’s do a quick dimension analysis of that Δx·Δp = h relation and/or its more accurate expression σx·σp ≥ ħ/2.

A position (and its uncertainty or standard deviation) is expressed in distance units, while momentum… Euh… Well… What? […] Momentum is mass times velocity, so it’s kg·m/s. Hence, the dimension of the product on the left-hand side of the inequality is m·kg·m/s = kg·m²/s. So what about this eV·s dimension on the right-hand side? Well… The electronvolt is a unit of energy, and so we can convert it to joules. Now, a joule is a newton-meter (N·m), which is the unit for both energy and work: it’s the work done when applying a force of one newton over a distance of one meter. So we now have N·m·s for ħ, which is nice, because Planck’s constant (h or ħ—whatever: the choice for one of the two depends on the variables we’re looking at) is the quantum for action indeed. It’s a Wirkung as they say in German, so its dimension combines both energy as well as time.

To put it simply, it’s a bit like power, which is what we men are interested in when looking at a car or motorbike engine. 🙂 Power is the energy spent or delivered per second, so its dimension is J/s, not J·s. However, your mind can see the similarity in thinking here. Energy is a nice concept, be it potential (think of a water bucket above your head) or kinetic (think of a punch in a bar fight), but it makes more  sense to us when adding the dimension of time (emptying a bucket of water over your head is different than walking in the rain, and the impact of a punch depends on the power with which it is being delivered). In fact, the best way to understand the dimension of Planck’s constant is probably to also write the joule in ‘base units’. Again, one joule is the amount of energy we need to move an object over a distance of one meter against a force of one newton. So one J·s is one N·m·s is (1) a force of one newton acting over a distance of (2) one meter over a time period equal to (3) one second.

I hope that gives you a better idea of what ‘action’ really is in physics. […] In any case, we haven’t answered the question. How do we relate the two sides? Simple: a newton is an oft-used SI unit, but it’s not an SI base unit, and so we should deconstruct it even more (i.e. write it in SI base units). If we do that, we get 1 N = 1 kg·m/s²: one newton is the force needed to give a mass of 1 kg an acceleration of 1 m/s per second. So just substitute and you’ll see the dimension on the right-hand side is kg·(m/s²)·m·s = kg·m²/s, so it comes out alright.
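If you like to automate that kind of bookkeeping, here’s a tiny, dependency-free sketch that tracks SI base-unit exponents and confirms that N·m·s reduces to kg·m²/s:

```python
# Track SI base-unit exponents as dicts: a newton is kg·m/s², i.e. {kg: 1, m: 1, s: -2}.
from collections import Counter

def multiply(*units):
    total = Counter()
    for u in units:
        total.update(u)
    return {k: v for k, v in total.items() if v != 0}

newton = {"kg": 1, "m": 1, "s": -2}
meter  = {"m": 1}
second = {"s": 1}

print(multiply(newton, meter, second))   # {'kg': 1, 'm': 2, 's': -1}, i.e. kg·m²/s
```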

Why this digression on units? Not sure. Perhaps just to remind you also that the Uncertainty Principle can also be expressed in terms of energy and time:

ΔE·Δt = h

Here there’s no confusion  in regard to the units on both sides: we don’t need to convert to SI base units to see that they’re the same: [ΔE][Δt] = J·s.

The Uncertainty Principle revisited (2)

The ΔE·Δt = h expression is not so often used as an expression of the Uncertainty Principle. I am not sure why, and I don’t think it’s a good thing. Energy and time are also complementary variables in quantum mechanics, so it’s just like position and momentum indeed. In fact, I like the energy-time expression somewhat more than the position-momentum expression because it does not create any confusion in regard to the units on both sides: it’s just joules (or electronvolts) and seconds on both sides of the equation. So what?

Frankly, I don’t want to digress too much here (this post is going to become awfully long) but, personally, I found it hard, for quite a while, to relate the two expressions of the very same uncertainty ‘principle’ and, hence, let me show you how the two express the same thing really, especially because you may or may not know that there are even more pairs of complementary variables in quantum mechanics. So, I don’t know if the following will help you a lot, but it helped me to note that:

  1. The energy and momentum of a particle are intimately related through the (relativistic) energy-momentum relationship. Now, that formula, E² = p²c² + m0²c⁴, which links energy, momentum and intrinsic mass (aka rest mass), looks quite monstrous at first. Hence, you may prefer a simpler form: pc = Ev/c. It’s the same really as both are based on the relativistic mass-energy equivalence: E = mc² or, the way I prefer to write it: m = E/c². [Both expressions are the same, obviously, but we can ‘read’ them differently: m = E/c² expresses the idea that energy has an equivalent mass, defined as inertia, and so it makes energy the primordial concept, rather than mass.] Of course, you should note that m is the total mass of the object here, including both (a) its rest mass as well as (b) the equivalent mass it gets from moving at the speed v. So m, not m0, is the concept of mass used to define p, and note how easy it is to demonstrate the equivalence of both formulas: pc = Ev/c ⇔ mvc = Ev/c ⇔ E = mc². In any case, the bottom line is: don’t think of the energy and momentum of a particle as two separate things; they are two aspects of the same ‘reality’, involving mass (a measure of inertia, as you know) and velocity (as measured in a particular (so-called inertial) reference frame). [A quick numerical check of these relations follows right after this list.]
  2. Time and space are intimately related through the universal constant c, i.e. the speed of light, as evidenced by the fact that we will often want to express distance not in meter but in light-seconds (i.e. the distance that light travels (in a vacuum) in one second) or, vice versa, express time in meter (i.e. the time that light needs to travel a distance of one meter).
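As promised in point 1, here’s a minimal numerical check of the pc = Ev/c relation and of the energy-momentum formula (the electron speed of 0.6c below is an arbitrary choice):

```python
# Check pc = E·v/c and E² = p²c² + m0²c⁴ for a relativistic electron (v = 0.6c, arbitrary).
import math

c  = 2.998e8      # speed of light (m/s)
m0 = 9.109e-31    # electron rest mass (kg)
v  = 0.6 * c
gamma = 1 / math.sqrt(1 - (v / c)**2)

E = gamma * m0 * c**2     # total energy (J)
p = gamma * m0 * v        # relativistic momentum (kg·m/s)

print(p * c, E * v / c)                              # the two numbers coincide
print(math.sqrt((p * c)**2 + (m0 * c**2)**2), E)     # and so do these two
```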

These relationships are interconnected, and the following diagram shows how.

Uncertainty relations

The easiest way to remember it all is to apply the Uncertainty Principle, in both its ΔE·Δt = h as well as its Δp·Δx = h expressions, to a photon. A photon has no rest mass and its velocity v is, obviously, c. So the energy-momentum relationship is a very simple one: p = E/c. We then get both expressions of the Uncertainty Principle by simply substituting E for p, or vice versa, and remember that time and position (or distance) are related in exactly the same way: the constant of proportionality is the very same. It’s c. So we can write: Δx = Δt·c and Δt = Δx/c. If you’re confused, think about it in very practical terms: because the speed of light is what it is, an uncertainty of a second in time amounts, roughly, to an uncertainty in position of some 300,000 km (c = 3×10^8 m/s). Conversely, an uncertainty of some 300,000 km in the position amounts to an uncertainty in time of one second. That’s what the 1-2-3 in the diagram above is all about: please check if you ‘get’ it, because that’s ‘essential’ indeed.
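A minimal sketch of that equivalence for a photon (the one-second uncertainty is just the example used above):

```python
# For a photon: p = E/c and Δx = c·Δt, so Δp·Δx = (ΔE/c)·(c·Δt) = ΔE·Δt.
h = 6.626e-34    # Planck's constant (J·s)
c = 3.0e8        # speed of light (m/s)

dt = 1.0                   # uncertainty in time: 1 second
dE = h / dt                # from ΔE·Δt = h
dx = c * dt                # corresponding uncertainty in position
dp = dE / c                # corresponding uncertainty in momentum

print(dx / 1e3, "km")      # 300,000 km
print(dp * dx, h)          # the product Δp·Δx equals h again
```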

Back to ‘probability waves’

Matter-particles are not the same, but we do have the same relations, including that ‘energy-momentum duality’. The formulas are just somewhat more complicated because they involve mass and velocity (i.e. a velocity less than that of light). For matter-particles, we can see that energy-momentum duality not only in the relationships expressed above (notably the relativistic energy-momentum relation), but also in the (in)famous de Broglie relation, which associates some ‘frequency’ (f) to the energy (E) of a particle or, what amounts to the same, some ‘wavelength’ (λ) to its momentum (p):

λ = h/p and f = E/h

These two complementary equations give a ‘wavelength’ (λ) and/or a ‘frequency’ (f) of a de Broglie wave, or a ‘matter wave’ as it’s sometimes referred to. I am using, once again, apostrophes because the de Broglie wavelength and frequency are a different concept—different than the wavelength or frequency of light, or of any other ‘real’ wave (like water or sound waves, for example). To illustrate the differences, let’s start with a very simple question: what’s the velocity of a de Broglie wave? Well… […] So? You thought you knew, didn’t you?

Let me answer the question:

  1. The mathematically (and physically) correct answer involves distinguishing the group and phase velocity of a wave.
  2. The ‘easy’ answer is: the de Broglie wave of a particle moves with the particle and, hence, its velocity is, obviously, the speed of the particle which, for electrons, is usually non-relativistic (i.e. rather slow as compared to the speed of light).

To be clear on this, the velocity of a de Broglie wave is not the speed of light. So a de Broglie wave is not like an electromagnetic wave at all. They have nothing in common really, except for the fact that we refer to both of them as ‘waves’. 🙂

The second thing to note is that, when we’re talking about the ‘frequency’ or ‘wavelength’ of ‘matter waves’ (i.e. de Broglie waves), we’re talking the frequency and wavelength of a wave with two components: it’s a complex-valued wave function, indeed, and so we get a real and imaginary part when we’re ‘feeding’ the function with some values for x and t.

Thirdly and, perhaps, most importantly, we should always remember the Uncertainty Principle when looking at the de Broglie relation. The Uncertainty Principle implies that we can actually not assign any precise wavelength (or, what amounts to the same, a precise frequency) to a de Broglie wave: if there is a spread in p (and, hence, in E), then there will be a spread in λ (and in f). In fact, I tend to think that it would be better to write the de Broglie relation as an ‘uncertainty relation’ in its own right:

Δλ = h/Δp and Δf = ΔE/h

Besides underscoring the fact that we have other ‘pairs’ of complementary variables, this ‘version’ of the de Broglie equation would also remind us continually of the fact that a ‘regular’ wave with an exact wavelength (so a Δλ equal to zero) would not give us any information about the momentum: indeed, as Δλ goes to zero (Δλ → 0), Δp must go to infinity (Δp → ∞). That’s just the math involved in such expressions. 🙂

Jokes aside, I’ll admit I used to have a lot of trouble understanding this, so I’ll just quote the expert teacher (Feynman) on this to make sure you don’t get me wrong here:

“The amplitude to find a particle at a place can, in some circumstances, vary in space and time, let us say in one dimension, in this manner: Ψ = A·e^(i(ωt−kx)), where ω is the frequency, which is related to the classical idea of the energy through E = ħω, and k is the wave number, which is related to the momentum through p = ħk. [These are equivalent formulations of the de Broglie relations using the angular frequency and the wave number instead of wavelength and frequency.] We would say the particle had a definite momentum p if the wave number were exactly k, that is, a perfect wave which goes on with the same amplitude everywhere. The Ψ = A·e^(i(ωt−kx)) equation [then] gives the [complex-valued probability] amplitude, and if we take the absolute square, we get the relative probability for finding the particle as a function of position and time. This is a constant, which means that the probability to find a [this] particle is the same anywhere.” (Feynman’s Lectures, I-48-5)

You may say or think: What’s the problem here really? Well… If the probability to find a particle is the same anywhere, then the particle can be anywhere and, for all practical purposes, that amounts to saying it’s nowhere really. Hence, that wave function doesn’t serve the purpose. In short, that nice Ψ = A·e^(i(ωt−kx)) function is completely useless in terms of representing an electron, or any other actual particle moving through space. So what to do?

The Wikipedia article on the Uncertainty Principle has this wonderful animation that shows how we can superimpose several waves, one on top of each other, to form a wave packet. Let me copy it below:

Sequential_superposition_of_plane_waves

So that’s the wave we want indeed: a wave packet that travels through space but which is, at the same time, limited in space. Of course, you should note, once again, that it shows only one part of the complex-valued probability amplitude: just visualize the other part (imaginary if the wave above would happen to represent the real part, and vice versa if the wave would happen to represent the imaginary part of the probability amplitude). The animation basically illustrates a mathematical operation. To be precise, it involves a Fourier analysis or decomposition: it separates a wave packet into a finite or (potentially) infinite number of component waves. Indeed, note how, in the illustration above, the frequency of the component waves gradually increases (or, what amounts to the same, how the wavelength gets smaller and smaller) and how, with every wave we ‘add’ to the packet, it becomes increasingly localized. Now, you can easily see that the ‘uncertainty’ or ‘spread’ in the wavelength here (which we’ll denote by Δλ) is, quite simply, the difference between the wavelength of the ‘one-cycle wave’, which is equal to the space the whole wave packet occupies (which we’ll denote by Δx), and the wavelength of the ‘highest-frequency wave’. For all practical purposes, they are about the same, so we can write: Δx ≈ Δλ. Using Bohr’s formulation of the Uncertainty Principle, we can see the expression I used above (Δλ = h/Δp) makes sense: Δx = Δλ = h/Δp, so Δλ·Δp = h.

[Just to be 100% clear on terminology: a Fourier decomposition is not the same as that Fourier transform I mentioned when talking about the relation between position and momentum in the Kennard formulation of the Uncertainty Principle, although these two mathematical concepts obviously have a few things in common.]
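To make that superposition idea concrete, here’s a minimal sketch: adding plane waves with an increasing spread of wave numbers produces a packet whose spatial extent shrinks as the spread grows (the numbers below are arbitrary):

```python
# Superposition of plane waves: a larger spread in wave number Δk gives a more
# localized packet in x (and vice versa). All numbers are arbitrary choices.
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-50, 50, 5000)
k0, n_waves = 2.0, 200                     # central wave number, number of components

for dk in (0.1, 0.5, 1.0):                 # increasing spread in k
    ks = np.linspace(k0 - dk, k0 + dk, n_waves)
    packet = sum(np.exp(1j * k * x) for k in ks) / n_waves
    plt.plot(x, np.abs(packet), label=f"Δk = {dk}")

plt.xlabel("x"); plt.ylabel("|Ψ| (envelope)"); plt.legend()
plt.show()
```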

The wave train revisited

All that I’ve said above is the ‘correct’ interpretation of the Uncertainty Principle and the de Broglie equation. To be frank, it took me quite a while to ‘get’ that—and, as you can see, it also took me quite a while to get ‘here’, of course. 🙂

In fact, I was confused, for quite a few years actually, because I never quite understood why there had to be a spread in the wavelength of a wave train. Indeed, we can all easily imagine a localized wave train with a fixed frequency and a fixed wavelength, like the one below, which I’ll re-use later. I’ve made this wave train myself: it’s a standard sine and cosine function multiplied by an ‘envelope’ function that generates the envelope. As you can see, it’s a complex-valued thing indeed: the blue curve is the real part, and the imaginary part is the red curve.

Photon wave
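
For what it’s worth, you don’t even need an online graph tool: a few lines of Python will do. This is only a sketch of how such a wave train can be generated – I am assuming a Gaussian envelope here, which is my own choice, not necessarily the one used for the graph above.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-10, 10, 2000)
k = 5.0                                  # fixed wave number (arbitrary units)
envelope = np.exp(-x**2 / 8.0)           # assumed Gaussian envelope (my choice)
psi = envelope * np.exp(1j * k * x)      # complex-valued wave train: envelope times e^(ikx)

plt.plot(x, psi.real, 'b', label='real part')
plt.plot(x, psi.imag, 'r', label='imaginary part')
plt.plot(x, envelope, 'k--', label='envelope')
plt.legend()
plt.show()
```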

You can easily make a graph like this yourself. [Just use one of those online graph tools.] This thing is localized in space and, as mentioned above, it has a fixed frequency and wavelength. So all those enigmatic statements you’ll find in serious or less serious books (i.e. textbooks or popular accounts) on quantum mechanics saying that “we cannot define a unique wavelength for a short wave train” and/or saying that “there is an indefiniteness in the wave number that is related to the finite length of the train, and thus there is an indefiniteness in the momentum” (I am quoting Feynman here, so not one of the lesser gods) are – with all due respect for these authors, especially Feynman – just wrong. I’ve made another ‘short wave train’ below, but this time it depicts the real part of a (possible) wave function only.

graph (1)

Hmm… Now that one has a weird shape, you’ll say. It doesn’t look like a ‘matter wave’! Well… You’re right. Perhaps. [I’ll challenge you in a moment.] The shape of the function above is consistent, though, with the view of a photon as a transient electromagnetic oscillation. Let me come straight to the point by stating the basics: the view of a photon in physics is that photons are emitted by atomic oscillators. As an electron jumps from one energy level to the other, it seems to oscillate back and forth until it’s in equilibrium again, thereby emitting an electromagnetic wave train that looks like a transient.

Huh? What’s a transient? It’s an oscillation like the one above: its amplitude and, hence, its energy get smaller and smaller as time goes by. To be precise, its energy has the same shape as the envelope curve below: E = E0·e^(–t/τ). In this expression, we have τ as the so-called decay time, and one can show it’s the inverse of the so-called decay rate: τ = 1/γ, with γE = –dE/dt. In case you wonder, check it out on Wikipedia: it’s one of the many applications of the natural exponential function. We’re talking a so-called exponential decay here indeed: a quantity (in this case, the amplitude and/or the energy) decreases at a rate that is proportional to its current value, with the coefficient of proportionality being γ. That’s what we write as γE = –dE/dt in mathematical notation. 🙂

decay time
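
If you want to check that the E = E0·e^(–t/τ) curve is indeed the thing that satisfies γE = –dE/dt with γ = 1/τ, a two-minute numerical check will do. The values for E0 and τ below are made up, just to have some numbers to work with.

```python
import numpy as np

E0, tau = 1.0, 3.2e-8            # made-up initial value and decay time (seconds)
gamma = 1.0 / tau                # decay rate

t = np.linspace(0, 5 * tau, 100_001)
E = E0 * np.exp(-t / tau)        # exponential decay

dEdt = np.gradient(E, t)         # numerical derivative
print(np.allclose(gamma * E, -dEdt, rtol=1e-3))   # True: gamma*E = -dE/dt indeed
print(E[np.searchsorted(t, tau)] / E0)            # ≈ 1/e ≈ 0.37 after one decay time
```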

I need to move on. All of what I wrote above was ‘plain physics’, but what I really want to explore in this post is a crazy hypothesis. Could these wave trains above – I mean the wave trains with the fixed frequency and wavelength – possibly represent a de Broglie wave for a photon?

You’ll say: of course not! But, let’s be honest, you’d have some trouble explaining why. The best answer you could probably come up with is: because no physics textbook says something like that. You’re right. It’s a crazy hypothesis because, when you ask a physicist (believe it or not, but I actually went through the trouble of asking two nuclear scientists), they’ll tell you that photons are not to be associated with de Broglie waves. [You’ll say: why didn’t you try looking for an answer on the Internet? I actually did but – unlike what I am used to – I got very confusing answers on this one, so I gave up trying to find some definite answer on this question on the Internet.]

However, these negative answers don’t discourage me from trying to do some more freewheeling. Before discussing whether or not the idea of a de Broglie wave for a photon makes sense, let’s think about mathematical constraints. I googled a bit but I only see one actually: the amplitudes of a de Broglie wave are subject to a normalization condition. Indeed, when everything is said and done, all probabilities must take a value between 0 and 1, and they must also all add up to exactly 1. So that’s a so-called normalization condition that obviously imposes some constraints on the (complex-valued) probability amplitudes of our wave function.

But let’s get back to the photon. Let me remind you of what happens when a photon is being emitted by inserting the two diagrams below, which gives the energy levels of the atomic orbitals of electrons.

Energy Level Diagrams

So an electron absorbs or emits a photon when it goes from one energy level to another: it absorbs or emits radiation. And, of course, you will also remember that the frequency of the absorbed or emitted light is related to those energy levels. More specifically, the frequency of the light emitted in a transition from, let’s say, energy level E3 to E1 will be written as ν31 = (E3 – E1)/h. This frequency will be one of the so-called characteristic frequencies of the atom and will define a specific so-called spectral emission line.
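
Just to put a number on that, here’s a quick calculation for hydrogen, using the textbook Bohr energy levels En = –13.6 eV/n². This little example is mine, not something taken from the diagrams above.

```python
h = 6.626e-34                     # Planck's constant (J·s)
eV = 1.602e-19                    # joule per electronvolt
c = 3.0e8                         # speed of light (m/s)

def E(n):
    """Hydrogen energy levels, textbook Bohr formula (in joule)."""
    return -13.6 * eV / n**2

nu_31 = (E(3) - E(1)) / h         # frequency of the 3 -> 1 transition
print(f"nu_31 ≈ {nu_31:.3e} Hz")                  # ≈ 2.9e15 Hz
print(f"wavelength ≈ {c / nu_31 * 1e9:.0f} nm")   # ≈ 103 nm: an ultraviolet line (Lyman series)
```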

Now, from a mathematical point of view, there’s no difference between that ν31 = (E3 – E1)/h equation and the de Broglie equation, f = E/h, which assigns a de Broglie wave to a particle. But, of course, from all that I wrote above, it’s obvious that, while these two formulas are the same from a math point of view, they represent very different things. Again, let me repeat what I said above: a de Broglie wave is a matter-wave and, as such, it has nothing to do with an electromagnetic wave.

Let me be even more explicit. A de Broglie wave is not a ‘real’ wave, in a sense (but, of course, that’s a very unscientific statement to make); it’s a psi function, so it represents these weird mathematical quantities–complex probability amplitudes–which allow us to calculate the probability of finding the particle at position x or, if it’s a wave function for the momentum-space, to find a value p for its momentum. In contrast, a photon that’s emitted or absorbed represents a ‘real’ disturbance of the electromagnetic field propagating through space. Hence, that frequency ν is something very different than f, which is why we use another symbol for it (ν is the Greek letter nu, not to be confused with the v symbol we use for velocity). [Of course, you may wonder how ‘real’ or ‘unreal’ an electromagnetic field is but, in the context of this discussion, let me assure you we should look at it as something that’s very real.]

That being said, we also know light is emitted in discrete energy packets: in fact, that’s how photons were defined originally, first by Planck and then by Einstein. Now, when an electron falls from one energy level in an atom to another (lower) energy level, it emits one – and only one – photon with that particular wavelength and energy. The question then is: how should we picture that photon? Does it also have some more or less defined position in space, and some momentum? The answer is definitely yes, on both accounts:

  1. Subject to the constraints of the Uncertainty Principle, we know, more or less indeed, when a photon leaves a source and when it hits some detector. [And, yes, due to the ‘Uncertainty Principle’ or, as Feynman puts it, the rules for adding arrows, it may not travel in a straight line and/or at the speed of light—but that’s a discussion that, believe it or not, is not directly relevant here. If you want to know more about it, check one or more of my posts on it.]
  2. We also know light has a very definite momentum, which I’ve calculated elsewhere and so I’ll just note the result: p = E/c. It’s a ‘pushing momentum’ referred to as radiation pressure, and it’s in the direction of travel indeed.

In short, it does make sense, in my humble opinion that is, to associate some wave function with the photon, and then I mean a de Broglie wave. Just think about it yourself. You’re right to say that a de Broglie wave is a ‘matter wave’, and photons aren’t matter but, having said that, photons do behave like electrons, don’t they? There’s diffraction (when you send a photon through one slit) and interference (when photons go through two slits, altogether or – amazingly – one by one), so it’s the same weirdness as electrons indeed, and so why wouldn’t we associate some kind of wave function with them?

You can react in one of three ways here. The first reaction is: “Well… I don’t know. You tell me.” Well… That’s what I am trying to do here. 🙂

The second reaction may be somewhat more to the point. For example, those who’ve read Feynman’s Strange Theory of Light and Matter, could say: “Of course, why not? That’s what we do when we associate a photon going from point A to B with an amplitude P(A to B), isn’t it?”

Well… No. I am talking about something else here. Not some amplitude associated with a path in spacetime, but a wave function giving an approximate position of the photon.

The third reaction may be the same as the reaction of those two nuclear scientists I asked: “No. It doesn’t make sense. We do not associate photons with a de Broglie wave.” But so they didn’t tell me why because… Well… They didn’t have the time to entertain a guy like me and so I didn’t dare to push the question and continued to explore it more in detail myself.

So I’ve done that, and I thought of one reason why the question, perhaps, may not make all that much sense: a photon travels at the speed of light; therefore, it has no length. Hence, doing what I am doing below – associating the electromagnetic transient with a de Broglie wave – might not make sense.

Maybe. I’ll let you judge. Before developing the point, I’ll raise two objections to the ‘objection’ raised above (i.e. the statement that a photon has no length). First, if we’re looking at the photon as some particle, it will obviously have no length. However, an electromagnetic transient is just what it is: an electromagnetic transient. I’ve seen nothing that makes me think its length should be zero. In fact, if that would be the case, the concept of an electromagnetic wave itself would not make sense, as its ‘length’ would always be zero. Second, even if – somehow – the length of the electromagnetic transient would be reduced to zero because of its speed, we can still imagine that we’re looking at the emission of an electromagnetic pulse (i.e. a photon) using the reference frame of the photon, so that we’re traveling at speed c, ‘riding’ with the photon, so to say, as it’s being emitted. Then we would ‘see’ the electromagnetic transient as it’s being radiated into space, wouldn’t we?

Perhaps. I actually don’t know. That’s why I wrote this post and hope someone will react to it. I really don’t know, so I thought it would be nice to just freewheel a bit on this question. So be warned: nothing of what I write below has been researched really, so critical comments and corrections from actual specialists are more than welcome.

The shape of a photon wave

As mentioned above, the answer in regard to the definition of a photon’s position and momentum is, obviously, unambiguous. Perhaps we have to stretch whatever we understand of Einstein’s (special) relativity theory, but we should be able to draw some conclusions, I feel.

Let me say one thing more about the momentum here. As said, I’ll refer you to one of my posts for the detail, but all you should know here is that the momentum of light is related to the magnetic field vector, which we usually never mention when discussing light because it’s so tiny as compared to the electric field vector in our inertial frame of reference. Indeed, the magnitude of the magnetic field vector is equal to the magnitude of the electric field vector divided by c = 3×10^8 m/s, so we write B = E/c. Now, the E here stands for the electric field, so let me use W to refer to the energy instead of E. Using the B = E/c equation and a fairly straightforward calculation of the work that can be done by the associated force on a charge that’s being put into this field, we get that famous equation which we mentioned above already: the momentum of a photon is its total energy divided by c, so we write p = W/c. You’ll say: so what? Well… Nothing. I just wanted to note we get the same p = W/c equation indeed, but from a very different angle of analysis here. We didn’t use the energy-momentum relation here at all! In any case, the point to note is that the momentum of a photon is only a tiny fraction of its energy (p = W/c), and that the associated magnetic field vector is also just a tiny fraction of the electric field vector (B = E/c).
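
Just to illustrate how ‘tiny’ that is with actual numbers – these are my own back-of-the-envelope figures for a visible-light photon, nothing more:

```python
h = 6.626e-34       # Planck's constant (J·s)
c = 3.0e8           # speed of light (m/s)

f = 500e12          # frequency of a visible-light photon (Hz)
W = h * f           # photon energy, Planck-Einstein relation
p = W / c           # photon momentum, p = W/c

print(f"energy   W ≈ {W:.2e} J")        # ≈ 3.3e-19 J
print(f"momentum p ≈ {p:.2e} kg·m/s")   # ≈ 1.1e-27 kg·m/s
print(f"h/lambda ≈ {h / (c / f):.2e} kg·m/s")   # same number via p = h/λ = ħk
```

Numerically (in SI units), the momentum figure is a factor c smaller than the energy figure, which is all that p = W/c says, really – and the B = E/c factor is the same story: tiny in our frame, but not zero.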

But so it’s there and, in fact, when adopting a moving reference frame, the mix of E and B (i.e. the electric and magnetic field) becomes an entirely different one. One of the ‘gems’ in Feynman’s Lectures is the exposé on the relativity of electric and magnetic fields indeed, in which he analyzes the electric and magnetic field caused by a current, and in which he shows that, if we switch our inertial reference frame for that of the moving electrons in the wire, the ‘magnetic’ field disappears, and the whole electromagnetic effect becomes ‘electric’ indeed.

I am just noting this because I know I should do a similar analysis for the E and B ‘mixture’ involved in the electromagnetic transient that’s being emitted by our atomic oscillator. However, I’ll admit I am not quite comfortable enough with the physics or the math involved to do that, so… Well… Please do bear this in mind as I will be jotting down some quite speculative thoughts in what follows.

So… A photon is, in essence, an electromagnetic disturbance and so, when trying to picture a photon, we can think of some oscillating electric field vector traveling through–and also limited in–space. [Note that I am leaving the magnetic field vector out of the analysis from the start, which is not ‘nice’ but, in light of that B = E/c relationship, I’ll assume it’s acceptable.] In short, in the classical world – and in the classical world only of course – a photon must be some electromagnetic wave train, like the one below–perhaps.

Photon - E

But why would it have that shape? I only suggested it because it has the same shape as Feynman’s representation of a particle (see below) as a ‘probability wave’ traveling through–and limited in–space.

Wave train

So, what about it? Let me first remind you once again (I just can’t stress this point enough it seems) that Feynman’s representation – and most are based on his, it seems – is misleading because it suggests that ψ(x) is some real number. It’s not. In the image above, the vertical axis should not represent some real number (and it surely should not represent a probability, i.e. some real positive number between 0 and 1) but a probability amplitude, i.e. a complex number in which both the real and imaginary part are important. Just to be fully complete (in case you forgot), such a complex-valued wave function ψ(x) will give you all the probabilities you need when you take its (absolute) square, but so… Well… We’re really talking a different animal here, and the image above gives you only one part of the complex-valued wave function (either the real or the imaginary part), while it should give you both. That’s why I find my graph below much better. 🙂 It’s the same really, but so it shows both the real and the imaginary part of a wave function.

Photon wave

But let me go back to the first illustration: the vertical axis of the first illustration is not ψ but E – the electric field vector. So there’s no imaginary part here: just a real number, representing the strength–or magnitude I should say– of the electric field E as a function of the space coordinate x. [Can magnitudes be negative? The honest answer is: no, they can’t. But just think of it as representing the field vector pointing in the other way .]

Regardless of the shortcomings of this graph, including the fact we only have some real-valued oscillation here, would it work as a ‘suggestion’ of what a real-life photon could look like?

Of course, you could try to not answer that question by mumbling something like: “Well… It surely doesn’t represent anything coming near to a photon in quantum mechanics.” But… Well… That’s not my question here: I am asking you to be creative and ‘think outside of the box’, so to say. 🙂

So you should say ‘No!’ because of some other reason. What reason? Well… If a photon is an electromagnetic transient – in other words, if we adopt a purely classical point of view – it’s going to be a transient wave indeed, and so then it should walk, talk and even look like a transient. 🙂 Let me quickly jot down the formula for the (vertical) component of E as a function of the acceleration of some charge q:

E = –q·a(t – r/c)/(4πε₀c²r)

The charge q (i.e. the source of the radiation) is, of course, our electron that’s emitting the photon as it jumps from a higher to a lower energy level (or, vice versa, absorbing it). This formula basically states that the magnitude of the electric field (E) is proportional to the acceleration (a) of the charge (with t–r/c the retarded argument). Hence, the suggested shape of E as a function of x as shown above would imply that the acceleration of the electron is (a) initially quite small, (b) then becomes larger and larger to reach some maximum, and then (c) becomes smaller and smaller again to then die down completely. In short, it does match the definition of a transient wave sensu stricto (Wikipedia defines a transient as “a short-lived burst of energy in a system caused by a sudden change of state”) but it’s not likely to represent any real transient. So, we can’t exclude it, but a real transient is much more likely to look like something what’s depicted below: no gradual increase in amplitude but big swings initially which then dampen to zero. In other words, if our photon is a transient electromagnetic disturbance caused by a ‘sudden burst of energy’ (which is what that electron jump is, I would think), then its representation will, much more likely, resemble a damped wave, like the one below, rather than Feynman’s picture of a moving matter-particle.

graph (1)

In fact, we’d have to flip the image, both vertically and horizontally, because the acceleration of the source and the field are related as shown below. The vertical flip is because of the minus sign in the formula for E(t). The horizontal flip is because of the minus sign in the (t – r/c) term, the retarded argument: if we add a little time (Δt), we get the same value for a(t – r/c) as we would have if we had subtracted a little distance: Δr = cΔt. So that’s why E as a function of r (or of x), i.e. as a function in space, is a ‘reversed’ plot of the acceleration as a function of time.

wave in space

So we’d have something like below.

Photon wave
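
To see where that ‘flipped’ shape comes from, here’s a little sketch along the lines of the argument above: take a damped oscillation for the acceleration a(t) of the source, and plot –a(t – x/c) as a function of x at some fixed time. The numbers (the frequency, the decay time, even the speed of propagation) are deliberately non-physical toy values – just so the reversal is actually visible on a plot.

```python
import numpy as np
import matplotlib.pyplot as plt

c = 1.0              # speed of propagation (toy units)
omega = 2 * np.pi    # angular frequency of the oscillator (toy units)
tau = 2.0            # decay time of the damped oscillation (toy units)

def a(t):
    """Damped oscillation of the source charge; zero before the emission starts at t = 0."""
    return np.where(t > 0, np.exp(-t / tau) * np.cos(omega * t), 0.0)

t_now = 8.0                             # look at the field at this instant
x = np.linspace(0, c * t_now, 2000)     # positions between the source and the wave front
E = -a(t_now - x / c)                   # field shape in space: minus the retarded acceleration

plt.plot(x, E)
plt.xlabel('distance from the source x')
plt.ylabel('E (arbitrary units)')
plt.title('Damped transient, flipped in space by the retarded argument')
plt.show()
```

The big initial swings of the source end up at the front of the wave train, far from the source, and the minus sign turns the whole thing upside down – which is the double flip described above.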

What does this resemble? It’s not a vibrating string (although I do start to understand the attractiveness of string theory now: vibrating strings are great as energy storage systems, so the idea of a photon being some kind of vibrating string sounds great, doesn’t it?). It’s not resembling a bullwhip effect either, because the oscillation of a whip is confined by a different envelope (see below). And, no, it’s also definitely not a trumpet. 🙂

[Illustration: the bullwhip effect]

It’s just what it is: an electromagnetic transient traveling through space. Would this be realistic as a ‘picture’ of a photon? Frankly, I don’t know. I’ve looked at a lot of stuff but didn’t find anything on this really. The easy answer, of course, is quite straightforward: we’re not interested in the shape of a photon because we know it is not an electromagnetic wave. It’s a ‘wavicle’, just like an electron.

[…] Sure. I know that too. Feynman told me. 🙂 But then why wouldn’t we associate some wave function with it? Please tell me, because I really can’t find much of an answer to that question in the literature, and so that’s why I am freewheeling here. So just go along with me for a while, and come up with another suggestion. As I said above, your bet is as good as mine. All that I know is that there’s one thing we need to explain when considering the various possibilities: a photon has a very well-defined frequency (which defines its color in the visible light spectrum) and so our wave train should – in my humble opinion – also have that frequency. At least for ‘quite a while’—and then I mean ‘most of the time’, or ‘on average’ at least. Otherwise the concept of a frequency – or a wavelength – wouldn’t make much sense. Indeed, if the photon has no defined wavelength or frequency, then we could not perceive it as some color (as you may or may not know, the sense of ‘color’ is produced by our eye and brain, but so it’s definitely associated with the frequency of the light). A photon should have a color (in physics, that means a frequency) because, when everything is said and done, that’s what the Planck relation is all about.

What would be your alternative? I mean… Doesn’t it make sense to think that, when jumping from one energy level to the other, the electron would initially sort of overshoot its new equilibrium position, to then overshoot it again on the other side, and so on and so on, but with an amplitude that becomes smaller and smaller as the oscillation dies out? In short, if we look at radiation as being caused by atomic oscillators, why would we not go all the way and think of them as oscillators subject to some damping force? Just think about it. 🙂

The size of a photon wave

Let’s forget about the shape for a while and think about size. We’ve got an electromagnetic wave train here. So how long would it be? Well… Feynman calculated the Q of these atomic oscillators: it’s of the order of 10^8 (see his Lectures, I-33-3: it’s a wonderfully simple exercise, and one that really shows his greatness as a physics teacher) and, hence, this wave train will last about 10^–8 seconds (that’s the time it takes for the radiation to die out by a factor 1/e). To give a somewhat more precise example, for sodium light, which has a frequency of 500 THz (500×10^12 oscillations per second) and a wavelength of 600 nm (600×10^–9 meter), the radiation will last about 3.2×10^–8 seconds. [In fact, that’s the time it takes for the radiation’s energy to die out by a factor 1/e (i.e. the so-called decay time τ), so the wave train will actually last longer, but the amplitude becomes quite small after that time.]

So that’s a very short time, but still, taking into account the rather spectacular frequency (500 THz) of sodium light, that still makes for some 16 million oscillations and, taking into account the rather spectacular speed of light (3×10^8 m/s), that makes for a wave train with a length of, roughly, 9.6 meter. Huh? 9.6 meter!?

You’re right. That’s an incredible distance: it’s like infinity on an atomic scale!
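
In case you want to redo that arithmetic, here it is in a couple of lines, using Q ≈ 10^8 (the order of magnitude mentioned above) and the usual relation Q = ω·τ between the Q and the (energy) decay time:

```python
import math

Q = 1e8            # order of magnitude of the Q of an atomic oscillator (as quoted above)
f = 500e12         # frequency of sodium light (Hz)
c = 3e8            # speed of light (m/s)

omega = 2 * math.pi * f
tau = Q / omega                                   # decay time: Q is (roughly) omega times tau
print(f"decay time      ≈ {tau:.1e} s")           # ≈ 3.2e-8 s
print(f"oscillations    ≈ {f * tau:.1e}")         # ≈ 1.6e7, i.e. some 16 million
print(f"length of train ≈ {c * tau:.1f} m")       # ≈ 9.5 m, i.e. the 'roughly 9.6 meter' above
```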

So… Well… What to say? Such length surely cannot match the picture of a photon as a fundamental particle which cannot be broken up, can it? So it surely cannot be right because, if this would be the case, then there surely must be some way to break this thing up and, hence, it cannot be ‘elementary’, can it?

Well… Maybe. But think it through. First note that we will not see the photon as a 10-meter long string because it travels at the speed of light indeed and so the length contraction effect ensures that its length, as measured in our reference frame (and from whatever ‘real-life’ reference frame actually, because the speed of light will always be c, regardless of the speeds we mortals could ever reach, including speeds close to c), is zero.

So, yes, I surely must be joking here but, as far as jokes go, I can’t help thinking this one is fairly robust from a scientific point of view. Again, please do double-check and correct me, but all what I’ve written so far is not all that speculative. It corresponds to all what I’ve read about it: only one photon is produced per electron in any de-excitation, and its energy is determined by the number of energy levels it drops, as illustrated (for a simple hydrogen atom) below. For those who continue to be skeptical about my sanity here, I’ll quote Feynman once again:

“What happens in a light source is that first one atom radiates, then another atom radiates, and so forth, and we have just seen that atoms radiate a train of waves only for about 10^–8 sec; after 10^–8 sec, some atom has probably taken over, then another atom takes over, and so on. So the phases can really only stay the same for about 10^–8 sec. Therefore, if we average for very much more than 10^–8 sec, we do not see an interference from two different sources, because they cannot hold their phases steady for longer than 10^–8 sec. With photocells, very high-speed detection is possible, and one can show that there is an interference which varies with time, up and down, in about 10^–8 sec.” (Feynman’s Lectures, I-34-4)

[Illustration: hydrogen energy-level transitions]

So… Well… Now it’s up to you. I am going along here with the assumption that a photon in the visible light spectrum, from a classical world perspective, should indeed be something that’s several meters long and packs a few million oscillations. So, while we usually measure stuff in seconds, or hours, or years, and, hence, while we would think that 10^–8 seconds is short, a photon would actually be a very stretched-out transient that occupies quite a lot of space. I should also add that, in light of that number of ten meters, the damping seems to happen rather slowly!

[…]

I can see you shaking your head now, for various reasons.

First because this type of analysis is not appropriate. […] You think so? Well… I don’t know. Perhaps you’re right. Perhaps we shouldn’t try to think of a photon as being something different from a discrete packet of energy. But then we also know it is an electromagnetic wave. So why wouldn’t we go all the way?

Second, I guess you may find the math involved in this post not to your liking, even if it’s quite simple and I am not doing anything spectacular here. […] Well… Frankly, I don’t care. Let me bulldozer on. 🙂

What about the ‘vertical’ dimension, the y and the z coordinates in space? We’ve got this long snaky  thing: how thick-bodied is it?

Here, we need to watch our language. While it’s fairly obvious to associate a wave with a cross-section that’s normal to its direction of propagation, it is not obvious to associate a photon with the same thing. Not at all actually: as that electric field vector E oscillates up and down (or goes round and round, as shown in the illustration below, which is an image of a circularly polarized wave), it does not actually take any space. Indeed, the electric and magnetic field vectors E and B have a direction and a magnitude in space but they’re not representing something that is actually taking up some small or larger core in space.

[Animation: right-handed circularly polarized light]

Hence, the vertical axis of that graph showing the wave train does not indicate some spatial position: it’s not a y-coordinate but the magnitude of an electric field vector. [Just to underline the fact that the magnitude E has nothing to do with spatial coordinates: note that its value depends on the unit we use to measure field strength (so that’s newton/coulomb, if you want to know), so it’s really got nothing to do with an actual position in space-time.]

So, what can we say about it? Nothing much, perhaps. But let me try.

Cross-sections in nuclear physics

In nuclear physics, the term ‘cross-section’ would usually refer to the so-called Thomson scattering cross-section of an electron (or any charged particle really), which can be defined rather loosely as the target area for the incident wave (i.e. the photons): it is, in fact, a surface which can be calculated from what is referred to as the classical electron radius, which is about 2.82×10^–15 m. Just to compare: you may or may not remember the so-called Bohr radius of an atom, which is about 5.29×10^–11 m, so that’s a length that’s about 20,000 times longer. To be fully complete, let me give you the exact value for the Thomson scattering cross-section of an electron: 6.65×10^–29 m² (note that this is a surface indeed, so we have meters squared as a unit, not meters).
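
Here’s a quick check of those numbers, assuming the standard formula σ = (8π/3)·rₑ² for the Thomson cross-section – which is, as far as I know, where that value comes from:

```python
import math

r_e = 2.818e-15                      # classical electron radius (m)
a_0 = 5.29e-11                       # Bohr radius (m)

sigma = (8 * math.pi / 3) * r_e**2   # Thomson scattering cross-section
print(f"Thomson cross-section ≈ {sigma:.2e} m^2")          # ≈ 6.65e-29 m^2
print(f"Bohr radius / electron radius ≈ {a_0 / r_e:.0f}")   # ≈ 19,000: 'about 20,000 times longer' indeed
```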

Now, let me remind you – once again – that we should not associate the oscillation of the electric field vector with something actually happening in space: an electromagnetic field does not move in a medium and, hence, it’s not like a water or sound wave, which makes molecules go up and down as it propagates through its medium. To put it simply: there’s nothing that’s wriggling in space as that photon is flashing through space. However, when it does hit an electron, that electron will effectively ‘move’ (or vibrate or wriggle or whatever you can imagine) as a result of the incident electromagnetic field.

That’s what’s depicted and labeled below: there is a so-called ‘radial component’ of the electric field, and I would say: that’s our photon! [What else would it be?] The illustration below shows that this ‘radial’ component is just E for the incident beam and that, for the scattered beam, it is, in fact, determined by the electron motion caused by the incident beam through that relation described above, in which a is the normal component (i.e. normal to the direction of propagation of the outgoing beam) of the electron’s acceleration.

[Illustration: Thomson scattering geometry]

Now, before I proceed, let me remind you once again that the above illustration is, once again, one of those illustrations that only wants to convey an idea, and so we should not attach too much importance to it: the world at the smallest scale is best not represented by a billiard ball model. In addition, I should also note that the illustration above was taken from the Wikipedia article on elastic scattering (i.e. Thomson scattering), which is only a special case of the more general Compton scattering that actually takes place. It is, in fact, the low-energy limit. Photons with higher energy will usually be absorbed, and then there will be a re-emission, but, in the process, there will be a loss of energy in this ‘collision’ and, hence, the scattered light will have lower energy (and, hence, lower frequency and longer wavelength). But – Hey! – now that I think of it: that’s quite compatible with my idea of damping, isn’t it? 🙂 [If you think I’ve gone crazy, I am really joking here: when it’s Compton scattering, there’s no ‘lost’ energy: the electron will recoil and, hence, its momentum will increase. That’s what’s shown below (credit goes to the HyperPhysics site).]

[Illustration: Compton scattering]

So… Well… Perhaps we should just assume that a photon is a long wave train indeed (as mentioned above, ten meters is very long indeed: not an atomic scale at all!) but that its effective ‘radius’ should be of the same order as the classical electron radius. So what’s that order? If it’s more or less the same radius, then it would be in the order of femtometers (1 fm = 1 fermi = 1×10^–15 m). That’s good because that’s a typical length-scale in nuclear physics. For example, it would be comparable with the radius of a proton. So we look at a photon here as something very different – because it’s so incredibly long (at least as measured from its own reference frame) – but as something which does have some kind of ‘radius’ that is normal to its direction of propagation and equal to or smaller than the classical electron radius. [Now that I think of it, we should probably think of it as being substantially smaller. Why? Well… An electron is obviously fairly massive as compared to a photon (if only because an electron has a rest mass and a photon hasn’t) and so… Well… When everything is said and done, it’s the electron that absorbs a photon–not the other way around!]

Now, that radius determines the area in which it may produce some effect, like hitting an electron, for example, or like being detected in a photon detector, which is just what this so-called radius of an atom or an electron is all about: the area which is susceptible to being hit by some particle (including a photon), or which is likely to emit some particle (including a photon). What it is exactly, we don’t know: it’s still as spooky as an electron and, therefore, it also does not make all that much sense to talk about its exact position in space. However, if we’d talk about its position, then we should obviously also invoke the Uncertainty Principle, which will give us some upper and lower bounds for its actual position, just like it does for any other particle: the uncertainty about its position will be related to the uncertainty about its momentum, and more knowledge about the former will imply less knowledge about the latter, and vice versa. Therefore, we can also associate some complex wave function with this photon which is – for all practical purposes – a de Broglie wave. Now how should we visualize that wave?

Well… I don’t know. I am actually not going to offer anything specific here. First, it’s all speculation. Second, I think I’ve written too much rubbish already. However, if you’re still reading, and you like this kind of unorthodox application of electromagnetics, then the following remarks may stimulate your imagination.

The first thing to note is that we should not end up with a wave function that, when squared, gives us a constant probability for each and every point in space. No. The wave function needs to be confined in space and, hence, we’re also talking a wave train here, and a very short one in this case. So… Well… What about linking its amplitude to the amplitude of the field for the photon. In other words, the probability amplitude could, perhaps, be proportional to the amplitude of E, with the proportionality factor being determined by (a) the unit in which we measure E (i.e. newton/coulomb) and (b) the normalization condition.
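
Just to show that the normalization itself is not the obstacle, here’s a toy sketch: take some sampled transient for E(x) – the shape below is made up – and rescale it so the squared magnitudes integrate to one. The rescaling is the whole point; everything else is arbitrary.

```python
import numpy as np

x = np.linspace(0, 50, 5000)
dx = x[1] - x[0]

# a made-up transient: damped oscillation, in 'field strength' units
E = np.exp(-x / 10.0) * np.cos(2 * np.pi * x)

# rescale so that the squared magnitudes integrate to exactly 1 over the sampled range
psi = E / np.sqrt(np.sum(np.abs(E)**2) * dx)

print(np.sum(np.abs(psi)**2) * dx)   # 1.0 (up to floating-point error)
```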

OK. I hear you say it now: “Ha-ha! Got you! Now you’re really talking nonsense! How can a complex number (the probability amplitude) be proportional to some real number (the field strength)?”

Well… Be creative. It’s not that difficult to imagine some linkages. First, the electric field vector has both a magnitude and a direction. Hence, there’s more to E than just its magnitude. Second, you should note that the real and imaginary parts of a complex-valued wave function are a simple sine and cosine function, and so these two functions are the same really, except for a phase difference of π/2. In other words, if we have a formula for the real part of a wave function, we have a formula for its imaginary part as well. So… Your remark is to the point and then it isn’t.

OK, you’ll say, but then so how exactly would you link the E vector with the ψ(x, t) function for a photon? Well… Frankly, I am a bit exhausted now and so I’ll leave any further speculation to you. The whole idea of a de Broglie wave of a photon, with the (complex-valued) amplitude having some kind of ‘proportional’ relationship to (the magnitude of) the electric field vector makes sense to me, although we’d have to be innovative about what that ‘proportionality’ exactly is.

Let me conclude this speculative business by noting a few more things about our ‘transient’ electromagnetic wave:

1. First, it’s obvious that the usual relations between (a) energy (W), (b) frequency (f) and (c) amplitude (A) hold. If we increase the frequency of a wave, we’ll have a proportional increase in energy (twice the frequency is twice the energy), with the factor of proportionality being given by the Planck-Einstein relation: W = hf. But if we’re talking amplitudes (for which we do not have a formula, which is why we’re engaging in those assumptions on the shape of the transient wave), we should not forget that the energy of a wave is proportional to the square of its amplitude: W ∼ A². Hence, a linear increase of the amplitude results in a quadratic increase in energy (e.g. if you double all amplitudes, you’ll pack four times more energy in that wave).

2. Both factors come into play when an electron emits a photon. Indeed, if the difference between the two energy levels is larger, then the photon will not only have a higher frequency (i.e. we’re talking light (or electromagnetic radiation) in the upper ranges of the spectrum then) but one should also expect that the initial overshooting – and, hence, the initial oscillation – will also be larger. In short, we’ll have larger amplitudes. Hence, higher-energy photons will pack even more energy upfront. They will also have higher frequency, because of the Planck relation. So, yes, both factors would come into play.

What about the length of these wave trains? Would it make them shorter? Yes. I’ll refer you to Feynman’s Lectures to verify that the wavelength appears in the numerator of the formula for Q. Hence, higher frequency means shorter wavelength and, hence, lower Q. Now, I am not quite sure (I am not sure about anything I am writing here it seems) but this may or may not be the reason for yet another statement I never quite understood: photons with higher and higher energy are said to become smaller and smaller, and when they reach the Planck scale, they are said to become black holes.

Hmm… I should check on that. 🙂

Conclusion

So what’s the conclusion? Well… I’ll leave it to you to think about this. As said, I am a bit tired now and so I’ll just wrap this up, as this post has become way too long anyway. Let me, before parting, offer the following bold suggestion in terms of finding a de Broglie wave for our photon: perhaps that transient above actually is the wave function.

You’ll say: What !? What about normalization? All probabilities have to add up to one and, surely, those magnitudes of the electric field vector wouldn’t add up to one, would they?

My answer to that is simple: that’s just a question of units, i.e. of normalization indeed. So just measure the field strength in some other unit and it will come all right.

[…] But… Yes? What? Well… Those magnitudes are real numbers, not complex numbers.

I am not sure how to answer that one but there’s two things I could say:

  1. Real numbers are complex numbers too: it’s just that their imaginary part is zero.
  2. When working with waves, and especially with transients, we’ve always represented them using the complex exponential function. For example, we would write a wave function whose amplitude varies sinusoidally in space and time as A·e^(i(ωt – kr)), with ω the (angular) frequency and k the wave number (i.e. 2π/λ: the number of radians per unit distance).

So, frankly, think about it: where is the photon? It’s that ten-meter long transient, isn’t it? And the probability to find it somewhere is the (absolute) square of some complex number, right? And then we have a wave function already, representing an electromagnetic wave, for which we know that the energy it packs is proportional to the square of its amplitude, as well as to its frequency. We also know we’re more likely to detect something with high energy than something with low energy, don’t we? So… Tell me why the transient itself would not make for a good psi function?

But then what about these probability amplitudes being a function of the y and z coordinates?

Well… Frankly, I’ve started to wonder if a photon actually has a radius. If it doesn’t have a mass, it’s probably the only real point-like particle (i.e. a particle not occupying any space) – as opposed to all other matter-particles, which do have mass.

Why?

I don’t know. Your guess is as good as mine. Maybe our concepts of amplitude and frequency of a photon are not very relevant. Perhaps it’s only energy that counts. We know that a photon has a more or less well-defined energy level (within the limits of the Uncertainty Principle) and, hence, our ideas about how that energy actually gets distributed over the frequency, the amplitude and the length of that ‘transient’ have no relation with reality. Perhaps we like to think of a photon as a transient electromagnetic wave, because we’re used to thinking in terms of waves and fields, but perhaps a photon is just a point-like thing indeed, with a wave function that’s got the same shape as that transient. 🙂

Post scriptum: Perhaps I should apologize to you, my dear reader. It’s obvious that, in quantum mechanics, we don’t think of a photon as having some frequency and some wavelength and some dimension in space: it’s just an elementary particle with energy interacting with other elementary particles with energy, and we use these coupling constants and what have you to work with them. So we don’t usually think of photons as ten-meter long transients moving through space. So, when I write that “our concepts of amplitude and frequency of a photon are maybe not very relevant” when trying to picture a photon, and that “perhaps, it’s only energy that counts”, I actually don’t mean “maybe” or “perhaps“. I mean: Of course! […] In the quantum-mechanical world view, that is.

So I apologize for, perhaps, posting what may or may not amount to plain nonsense. However, as all of this nonsense helps me to make sense of these things myself, I’ll just continue. 🙂 I seem to move very slowly on this Road to Reality, but the good thing about moving slowly, is that it will − hopefully − give me the kind of ‘deeper’ understanding I want, i.e. an understanding beyond the formulas and mathematical and physical models. In the end, that’s all that I am striving for when pursuing this ‘hobby’ of mine. Nothing more, nothing less. 🙂 Onwards!

Some content on this page was disabled on June 17, 2020 as a result of a DMCA takedown notice from Michael A. Gottlieb, Rudolf Pfeiffer, and The California Institute of Technology. You can learn more about the DMCA here:

https://wordpress.com/support/copyright-and-the-dmca/
Some content on this page was disabled on June 20, 2020 as a result of a DMCA takedown notice from Michael A. Gottlieb, Rudolf Pfeiffer, and The California Institute of Technology. You can learn more about the DMCA here:

https://wordpress.com/support/copyright-and-the-dmca/

Babushka thinking

Pre-scriptum (dated 26 June 2020): This is an interesting post. I think my thoughts on the relevance of scale – especially the role of the fine-structure constant in this regard – have evolved considerably, so you should probably read my papers instead of these old blog posts.

Original post:

What is that we are trying to understand? As a kid, when I first heard about atoms consisting of a nucleus with electrons orbiting around it, I had this vision of worlds inside worlds, like a set of babushka dolls, one inside the other. Now I know that this model – which is nothing but the 1911 Rutherford model basically – is plain wrong, even if it continues to be used in the logo of the International Atomic Energy Agency, or the US Atomic Energy Commission. 

[Logos of the International Atomic Energy Agency and the US Atomic Energy Commission]

Electrons are not planet-like things orbiting around some center. If one wants to understand something about the reality of electrons, one needs to familiarize oneself with complex-valued wave functions whose values represent a weird quantity referred to as a probability amplitude and, contrary to what you may think (unless you read my blog, or if you just happen to know a thing or two about quantum mechanics), the relation between that amplitude and the concept of probability tout court is not very straightforward.

Familiarizing oneself with the math involved in quantum mechanics is not an easy task, as evidenced by all those convoluted posts I’ve been writing. In fact, I’ve been struggling with these things for almost a year now and I’ve started to realize that Roger Penrose’s Road to Reality (or should I say Feynman’s Lectures?) may lead nowhere – in terms of that rather spiritual journey of trying to understand what it’s all about. If anything, they made me realize that the worlds inside worlds are not the same. They are different – very different.

When everything is said and done, I think that’s what’s nagging us as common mortals. What we are all looking for is some kind of ‘Easy Principle’ that explains All and Everything, and we just can’t find it. The point is: scale matters. At the macro-scale, we usually analyze things using some kind of ‘billiard-ball model’. At a smaller scale, let’s say the so-called wave zone, our ‘law’ of radiation holds, and we can analyze things in terms of electromagnetic or gravitational fields. But then, when we further reduce scale, by another order of magnitude really – when trying to get very close to the source of radiation, or if we try to analyze what is oscillating really – we get in deep trouble: our easy laws no longer hold, and the equally easy math – easy is relative of course 🙂 – we use to analyze fields or interference phenomena becomes totally useless.

Religiously inclined people would say that God does not want us to understand all or, taking a somewhat less selfish picture of God, they would say that Reality (with a capital R to underline its transcendental aspects) just can’t be understood. Indeed, it is rather surprising – in my humble view at least – that things do seem to get more difficult as we drill down: in physics, it’s not the bigger things – like understanding thermonuclear fusion in the Sun, for example – but the smallest things which are difficult to understand. Of course, that’s partly because physics leaves some of the bigger things which are actually very difficult to understand – like how a living cell works, for example, or how our eye or our brain works – to other sciences to study (biology and biochemistry for cells, or for vision or brain functionality). In that respect, physics may actually be described as the science of the smallest things. The surprising thing, then, is that the smallest things are not necessarily the simplest things – on the contrary.

Still, that being said, I can’t help feeling some sympathy for the simpler souls who think that, if God exists, he seems to throw up barriers as mankind tries to advance its knowledge. Isn’t it strange, indeed, that the math describing the ‘reality’ of electrons and photons (i.e. quantum mechanics and quantum electrodynamics), as complicated as it is, becomes even more complicated – and, important to note, also much less accurate – when it’s used to try to describe the behavior of  quarks and gluons? Additional ‘variables’ are needed (physicists call these ‘variables’ quantum numbers; however, when everything is said and done, that’s what quantum numbers actually are: variables in a theory), and the agreement between experimental results and predictions in QCD is not as obvious as it is in QED.

Frankly, I don’t know much about quantum chromodynamics – nothing at all to be honest – but when I read statements such as “analytic or perturbative solutions in low-energy QCD are hard or impossible due to the highly nonlinear nature of the strong force” (I just took this one line from the Wikipedia article on QCD), I instinctively feel that QCD is, in fact, a different world as well – and then I mean different from QED, in which analytic or perturbative solutions are the norm. Hence, I already know that, once I’ll have mastered Feynman’s Volume III, it won’t help me all that much to get to the next level of understanding: understanding quantum chromodynamics will be yet another long grind. In short, understanding quantum mechanics is only a first step.

Of course, that should not surprise us, because we’re talking very different orders of magnitude here: femtometers (10^–15 m), in the case of electrons, as opposed to attometers (10^–18 m) or even zeptometers (10^–21 m) when we’re talking quarks. Hence, if past experience (I mean the evolution of scientific thought) is any guidance, we actually should expect an entirely different world. Babushka thinking is not the way forward.

Babushka thinking

What’s babushka thinking? You know what babushkas are, don’t you? These dolls inside dolls. [The term ‘babushka’ is actually Russian for an old woman or grandmother, which is what these dolls usually depict.] Babushka thinking is the fallacy of thinking that worlds inside worlds are the same. It’s what I did as a kid. It’s what many of us still do. It’s thinking that, when everything is said and done, it’s just a matter of not being able to ‘see’ small things and that, if we’d have the appropriate equipment, we actually would find the same doll within the larger doll – the same but smaller – and then again the same doll within that smaller doll. In Asia, they have this funny expression: “Same-same but different.” Well… That’s what babushka thinking is all about: thinking that you can apply the same concepts, tools and techniques to what is, in fact, an entirely different ballgame.

[Photo: a matryoshka doll, opened]

Let me illustrate it. We discussed interference. We could assume that the laws of interference, as described by superimposing various waves, always hold, at every scale, and that it’s just  the crudeness of our detection apparatus that prevents us from seeing what’s going on. Take two light sources, for example, and let’s say they are a billion wavelengths apart – so that’s anything between 400 to 700 meters for visible light (because the wavelength of visible light is 400 to 700 billionths of a meter). So then we won’t see any interference indeed, because we can’t register it. In fact, none of the standard equipment can. The interference term oscillates wildly up and down, from positive to negative and back again, if we move the detector just a tiny bit left or right – not more than the thickness of a hair (i.e. 0.07 mm or so). Hence, the range of angles θ (remember that angle θ was the key variable when calculating solutions for the resultant wave in previous posts) that are being covered by our eye – or by any standard sensor really – is so wide that the positive and negative interference averages out: all that we ‘see’ is the sum of the intensities of the two lights. The terms in the interference term cancel each other out. However, we are still essentially correct assuming there actually is interference: we just cannot see it – but it’s there.
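
To make that ‘averaging out’ a bit more concrete, here’s a small sketch: I simply let the phase difference between the two waves sweep through a hundred or so full cycles as the detector position varies over the thickness of a hair, and average the resulting intensity. The exact geometry doesn’t matter for the point being made; the numbers are just in the right ballpark.

```python
import numpy as np

lam = 600e-9                      # wavelength of the light (m)
k = 2 * np.pi / lam

# detector positions spread over the thickness of a hair (0.07 mm or so)
y = np.linspace(0, 0.07e-3, 100_001)

# assume the path-length difference between the two sources changes (roughly) like the
# detector position itself: the only thing that matters is that it sweeps through
# a hundred or so wavelengths over that 0.07 mm
delta = k * y                     # phase difference between the two waves at the detector

I1 = I2 = 1.0                     # intensities of the two sources (arbitrary units)
I = I1 + I2 + 2 * np.sqrt(I1 * I2) * np.cos(delta)

print(f"interference term swings between {I.min():.2f} and {I.max():.2f}")
print(f"average over the detector area: {I.mean():.3f}  (≈ I1 + I2 = 2)")
```

The intensity swings between fully constructive and fully destructive, but the average over the detector area is just the sum of the two intensities – which is exactly the ‘we cannot see it, but it’s there’ point.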

Reinforcing the point, I should also note that, apart from this issue of ‘distance scale’, there is also the scale of time. Our eye has a tenth-of-a-second averaging time. That’s a huge amount of time when talking fundamental physics: remember that an atomic oscillator – despite its incredibly high Q – emits radiation for about 10^–8 seconds only, so that’s a hundred-millionth of a second. Then another atom takes over, and another – and so that’s why we get unpolarized light: it’s all the same frequencies (because the electron oscillators radiate at their resonant frequencies), but so there is no fixed phase difference between all of these pulses: the interference between all of these pulses should result in ‘beats’ – as they interfere positively or negatively – but it all cancels out for us, because it’s too fast.

Indeed, while the ‘sensors’ in the retina of the human eye (there are actually four kinds of cells there, but so the principal ones are referred to as ‘rod’ and ‘cone’ cells respectively) are, apparently, sensitive enough to register individual photons, the “tenth-of-a-second averaging” time means that the cells – which are interconnected and ‘pre-process’ light really – will just amalgamate all those individual pulses into one signal of a certain color (frequency) and a certain intensity (energy). As one scientist puts it: “The neural filters only allow a signal to pass to the brain when at least about five to nine photons arrive within less than 100 ms.” Hence, that signal will not keep track of the spacing between those photons.

In short, information gets lost. But so that, in itself, does not invalidate babushka thinking. Let me visualize it with a not-very-mathematically-rigorous illustration. Suppose that we have some very regular wave train coming in, like the one below: one wave train consisting of three ‘groups’ separated by ‘nodes’.

Graph

All will depend on the period of the wave as compared to that one-tenth-of-a-second averaging time. In fact, we have two ‘periods’: the periodicity of the group – which is related to the concept of group velocity – and, hence, I’ll associate a ‘group wavelength’ and a ‘group period’ with that. [In case you haven’t heard of these terms before, don’t worry: I haven’t either. :-)] Now, if one tenth of a second covers like two or all three of the groups between the nodes (so that means that one tenth of a second is a multiple of the group period Tg), then even the envelope of the wave does not matter much in terms of ‘signal’: our brain will just get one pulse that averages it all out. We will see none of the detail of this wave train. Our eye will just get light in (remember that the intensity of the light is the square of the amplitude, so the negative amplitudes make contributions too) but we cannot distinguish any particular pulse: it’s just one signal. This is the most common situation when we are talking about electromagnetic radiation: many photons arrive but our eye just sends one signal to the brain: “Hey Boss! Light of color X and intensity Y coming from direction Z.”

In fact, it’s quite remarkable that our eye can distinguish colors in light of the fact that the wavelengths of various colors (violet, blue, green, yellow, orange and red) differ by only 30 to 40 billionths of a meter! Better still: if the signal lasts long enough, we can distinguish shades whose wavelengths differ by 10 or 15 nm only, so that’s a difference of 1% or 2% only. In case you wonder how it works: Feynman devotes no less than two chapters in his Lectures to the physiology of the eye: not something you’ll find in other physics handbooks! There are apparently three pigments in the cells in our eyes, each sensitive to color in a different way and it is “the spectral absorption in those three pigments that produces the color sense.” So it’s a bit like the RGB system in a television – but then more complicated, of course!

But let’s go back to our wave there and analyze the second possibility. If a tenth of a second covers less than that ‘group wavelength’, then it’s different: we will actually see the individual groups as two or three separate pulses. Hence, in that case, our eye – or whatever detector (another detector will just have another averaging time) – will average over a group, but not over the whole wave train. [Just in case you wonder how we humans compare with other living beings: from what I wrote above, it’s obvious we can see ‘flicker’ only if the oscillation is in the range of 10 or 20 Hz. The eye of a bee is made to see the vibrations of feet and wings of other bees and, hence, its averaging time is much shorter, like a hundredth of a second and, hence, it can see flicker up to 200 oscillations per second! In addition, the eye of a bee is sensitive over a much wider range of ‘color’ – it sees UV light down to a wavelength of 300 nm (whereas we don’t see light with a wavelength below 400 nm) – and, to top it all off, it has got a special sensitivity for polarized light, so light that gets reflected or diffracted looks different to the bee.]

Let’s go to the third and final case. If a tenth of a second covered less than the wavelength of the so-called carrier wave, i.e. the actual oscillation, then we would be able to distinguish the individual peaks and troughs of the carrier wave!
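
Here’s a toy version of those three cases: an amplitude-modulated carrier, ‘detected’ with three different averaging times. The frequencies are slowed-down, made-up numbers, of course – only the ratios matter.

```python
import numpy as np

t = np.linspace(0, 1.0, 20_000)
dt = t[1] - t[0]

f_carrier, f_group = 200.0, 10.0                 # toy carrier and 'group' frequencies (Hz)
signal = (1 + np.cos(2 * np.pi * f_group * t)) * np.cos(2 * np.pi * f_carrier * t)
intensity = signal ** 2                          # a detector responds to intensity, not amplitude

def averaged(x, window_seconds):
    """Average over a sliding window of the given duration (a crude model of the eye)."""
    n = max(1, int(window_seconds / dt))
    return np.convolve(x, np.ones(n) / n, mode='valid')

# long window: one flat signal; medium window: you see the groups; short window: the oscillations
for window in (0.5, 0.05, 0.001):
    out = averaged(intensity, window)
    print(f"averaging time {window:g} s -> output varies between {out.min():.2f} and {out.max():.2f}")
```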

Of course, this discussion is not limited to our eye as a sensor: any instrument will be able to measure individual phenomena only within a certain range, with an upper and a lower range, i.e. the ‘biggest’ thing it can see, and the ‘smallest’. So that explains the so-called resolution of an optical or an electron microscope: whatever the instrument, it cannot really ‘see’ stuff that’s smaller than the wavelength of the ‘light’ (real light or – in the case of an electron microscope – electron beams) it uses to ‘illuminate’ the object it is looking at. [The actual formula for the resolution of a microscope is obviously a bit more complicated, but this statement does reflect the gist of it.]

However, all that I am writing above suggests that we can think of what’s going on here as ‘waves within waves’, with the wave between nodes not being any different – in substance that is – from the wave as a whole: we’ve got something that’s oscillating, and within each individual oscillation, we find another oscillation. From a math point of view, babushka thinking is thinking we can analyze the world using Fourier’s machinery to decompose some function (see my posts on Fourier analysis). Indeed, in the example above, we have a modulated carrier wave (it is an example of amplitude modulation – the old-fashioned way of transmitting radio signals), and we see a wave within a wave and, hence, just like the Rutherford model of an atom, you may think there will always be ‘a wave within a wave’.

In this regard, you may think of fractals too: fractals are repeating or self-similar patterns that are always there, at every scale. However, the point to note is that fractals do not represent an accurate picture of how reality is actually structured: worlds within worlds are not the same.

Reality is no onion

Reality is not some kind of onion, from which you peel off a layer and then you find some other layer, similar to the first: “same-same but different”, as they’d say in Asia. The Coast of Britain is, in fact, finite, and the grain of sand you’ll pick up at one of its beaches will not look like the coastline when you put it under a microscope. In case you don’t believe me: I’ve inserted a real-life photo below. The magnification factor is a rather modest 300 times. Isn’t this amazing? [The credit for this nice picture goes to a certain Dr. Gary Greenberg. Please do google his stuff. It’s really nice.]

[Photograph: grains of sand under the microscope, magnified about 300 times – by Dr. Gary Greenberg]

In short, fractals are wonderful mathematical structures but – in reality – there are limits to how small things get: we cannot carve a babushka doll out of the cellulose and lignin molecules that make up most of what we call wood. Likewise, the atoms that make up the D-glucose chains in the cellulose will never resemble the D-glucose chains themselves. Hence, the babushka doll, the D-glucose chains that make up wood, and the atoms that make up the molecules within those macro-molecules are three different worlds. They’re not like layers of the same onion. Scale matters. The worlds inside worlds are different, and fundamentally so: not “same-same but different” but just plain different. Electrons are no longer point-like negative charges when we look at them at close range.

In fact, that’s the whole point: we can’t look at them at close range because we can’t ‘locate’ them. They aren’t particles. They are these strange ‘wavicles’ which we described, physically and mathematically, with a complex wave function relating their position (or their momentum) with some probability amplitude, and we also need to remember these funny rules for adding these amplitudes, depending on whether or not the ‘wavicle’ obeys Fermi or Bose statistics.
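Just to make those ‘funny rules’ a little more concrete, here’s a toy calculation – a minimal sketch with completely made-up amplitude values: for two indistinguishable alternatives, Bose particles require adding the amplitudes before taking the absolute square, Fermi particles require subtracting them, and distinguishable ‘classical’ objects would just add the probabilities.

```python
import cmath

# Two made-up probability amplitudes for two indistinguishable alternatives
a = 0.6 * cmath.exp(1j * 0.3)
b = 0.5 * cmath.exp(1j * 1.1)

p_classical = abs(a)**2 + abs(b)**2   # distinguishable particles: add the probabilities
p_bose      = abs(a + b)**2           # identical bosons: add the amplitudes, then square
p_fermi     = abs(a - b)**2           # identical fermions: subtract the amplitudes, then square

print(p_classical, p_bose, p_fermi)
```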

Weird, but – come to think of it – not more weird, in terms of mathematical description, than these electromagnetic waves. Indeed, when jotting down all these equations and developing all those mathematical arguments, one often tends to forget that we are not talking about some physical wave here. The field vector E (or B) is a mathematical construct: it tells us what force a charge will feel when we put it here or there. It’s not like a water or sound wave that makes some medium (water or air) actually move. The field is an influence that travels through empty space. But how can something actually travel through empty space? When it’s truly empty, you can’t travel through it, can you?

Oh – you’ll say – but we’ve got these photons, don’t we? Waves are not actually waves: they come in little packets of energy – photons. Yes. You’re right. But, as mentioned above, these photons aren’t little bullets – or particles if you want. They’re as weird as the wave and, in any case, even a billiard-ball view of the world is not very satisfying: what happens exactly when two billiard balls collide in a so-called elastic collision? What are the springs on the surface of those balls – in light of the quick reaction, they must be more like little explosive charges that detonate on impact, mustn’t they? – that make the two balls recoil from each other?

So any mathematical description of reality becomes ‘weird’ when you keep asking questions, like that little child I was – and I still am, in a way, I guess. Otherwise I would not be reading physics at the age of 45, would I? 🙂

Conclusion

Let me wrap up here. All of what I’ve been blogging about over the past few months concerns the classical world of physics. It consists of waves and fields on the one hand, and solid particles on the other – electrons and nucleons. But we know it’s not like that when we have more sensitive apparatuses, like the apparatus used in that 2012 double-slit electron interference experiment at the University of Nebraska–Lincoln, which I described at length in one of my earlier posts. That apparatus allowed control of two slits – each no more than 62 nanometers wide (so that’s the difference between the wavelength of dark-blue and light-blue light!) – and the monitoring of single-electron detection events. Back in 1963, Feynman already knew what this experiment would yield as a result. He was sure about it, even if he thought such an instrument could never be built. [To be fully correct, he did have some vague idea about a new science – what we now call nanotechnology – but what we can do today surpasses, most probably, all his expectations at the time. Too bad he died too young to see his dreams come true.]

The point to note is that this apparatus does not show us another layer of the same onion: it shows an entirely different world. While it’s part of reality, it’s not ‘our’ reality, nor is it the ‘reality’ of what’s being described by classical electromagnetic field theory. It’s different – and fundamentally so, as evidenced by those weird mathematical concepts one needs to introduce to sort of start to ‘understand’ it.

So… What do I want to say here? Nothing much. I just had to remind myself where I am right now. I myself often still fall prey to babushka thinking. We shouldn’t. We should wonder about the wood these dolls are made of. In physics, the wood seems to be math. The models I’ve presented in this blog are weird: what are those fields? And just how do they exert a force on some charge? What’s the mechanics behind it? To these questions, classical physics does not really have an answer.

But, of course, quantum mechanics does not have a very satisfactory answer either: what does it mean when we say that the wave function collapses? Out of all of the possibilities in that wonderful indeterminate world ‘inside’ the quantum-mechanical universe, one was ‘chosen’ as something that actually happened: a photon imparts momentum to an electron, for example. We can describe it, mathematically, but – somehow – we still don’t really understand what’s going on.

So what’s going on? We open a doll, and we do not find another doll that is smaller but similar. No. What we find is a completely different toy. However – surprise, surprise! – it’s something that can be ‘opened’ as well, to reveal even weirder stuff, for which we need even weirder ‘tools’ to somehow understand how it works (like lattice QCD, if you’d want an example: just google it if you want to get an inkling of what that’s about). Where is this going to end? Did it end with the ‘discovery’ of the Higgs particle? I don’t think so.

However, with the ‘discovery’ (or, to be generous, let’s call it an experimental confirmation) of the Higgs particle, we may have hit a wall in terms of verifying our theories. At the center of a set of babushka dolls, you’ll usually find a little baby: a solid little thing that is not like the babushkas surrounding it – it’s young and male, as opposed to the hollow dolls around it. Well… It seems that, in physics, we’ve got several of these little babies inside: electrons, photons, quarks, gluons, Higgs particles, etcetera. And we don’t know what’s ‘inside’ of them. Just that they’re different. Not “same-same but different”. No. Fundamentally different. So we’ve got a lot of ‘babies’ inside of reality, very different from the ‘layers’ around them, which make up ‘our’ reality. Hence, ‘Reality’ is not a fractal structure. What is it? Well… I’ve started to think we’ll never know. For all of the math and wonderful intellectualism involved, do we really get closer to an ‘understanding’ of what it’s all about?

I am not sure. The more I ‘understand’, the less I ‘know’ it seems. But then that’s probably why many physicists still nurture an acute sense of mystery, and why I am determined to keep reading. 🙂

Post scriptum: On the issue of the ‘mechanistic universe’ and the (related) issue of determinability and indeterminability: that’s not what I wanted to write about above, because I consider that solved. This post is meant to convey some wonder – about the different models of understanding that we need to apply to different scales. It’s got little to do with determinability or not. I think that issue got solved a long time ago, and I’ll let Feynman summarize that discussion:

“The indeterminacy of quantum mechanics has given rise to all kinds of nonsense and questions on the meaning of freedom of will, and of the idea that the world is uncertain. […] Classical physics is also indeterminate. It is true, classically, that if we knew the position and the velocity of every particle in the world, or in a box of gas, we could predict exactly what would happen. And therefore the classical world is deterministic. Suppose, however, we have a finite accuracy and do not know exactly where just one atom is, say to one part in a billion. Then as it goes along it hits another atom, and because we did not know the position better than one part in a billion, we find an even larger error in the position after the collision. And that is amplified, of course, in the next collision, so that if we start with only a tiny error it rapidly magnifies to a very great uncertainty. […] Speaking more precisely, given an arbitrary accuracy, no matter how precise, one can find a time long enough that we cannot make predictions valid for that long a time. That length of time is not very large. It is not that the time is millions of years if the accuracy is one part in a billion. The time goes only logarithmically with the error. In only a very, very tiny time – less than the time it took to state the accuracy – we lose all our information. It is therefore not fair to say that from the apparent freedom and indeterminacy of the human mind, we should have realized that classical ‘deterministic’ physics could not ever hope to understand, and to welcome quantum mechanics as a release from a completely ‘mechanistic’ universe. For already in classical mechanics, there was indeterminability from a practical point of view.” (Feynman, Lectures, 1963, p. 38-10)

That really says it all, I think. I’ll just continue to keep my head down – i.e. stay away from philosophy as for now – and try to find a way to open the toy inside the toy. 🙂

Light: relating waves to photons

Pre-scriptum (dated 26 June 2020): Some of the relevant illustrations in this post were removed as a result of an attack by the dark force. In any case, my ideas on the nature of light and photons have evolved considerably, so you should probably read my papers instead of these old blog posts.

Original post:

This is a concluding note on my ‘series’ on light. The ‘series’ gave you an overview of the ‘classical’ theory: light as an electromagnetic wave. It was very complete, including relativistic effects (see my previous post). I could have added more – there’s an equivalent for four-vectors, for example, when we’re dealing with frequencies and wave numbers: quantities that transform like space and time under the Lorentz transformations – but you got the essence.

One point we never touched upon, though, was that magnetic field vector. It is there. It is tiny because of that 1/c factor, but it’s there. We wrote it as

B = –er′ × E/c

All symbols in bold are vectors, of course. The force is another vector cross-product: F = qv×B, and you need to apply the usual right-hand screw rule to find the direction of the force. As it turns out, that force – as tiny as it is – is actually oriented in the direction of propagation, and it is what is responsible for the so-called radiation pressure.

So, yes, there is a ‘pushing momentum’. How strong is it? What power can it deliver? Can it indeed make space ships sail? Well… The magnitude of the unit vector er’ is obviously one, so it’s the values of the other vectors that we need to consider. If we substitute and average F, the thing we need to find is:

〈F〉 = q〈vE〉/c

But the charge q times the field is the electric force, and the force on the charge times the velocity is the work dW/dt being done on the charge. So that should equal the energy that is being absorbed from the light per second. Now, I didn’t look at that much. It’s actually one of the very few things I left out – but I’ll refer you to Feynman’s Lectures if you want to find out more: there’s a fine section on light scattering, introducing the notion of the Thomson scattering cross section, but – as said – I think you’ve had enough for now. Just note that 〈F〉 = [dW/dt]/c and, hence, that the momentum that light delivers is equal to the energy that is absorbed (dW/dt) divided by c.

So the momentum carried is 1/c times the energy. Now, you may remember that Planck solved the ‘problem’ of blackbody radiation – an anomaly that physicists couldn’t explain at the end of the 19th century – by assuming that light is emitted and absorbed in discrete packets of energy, which we now call photons. Of course, photons are not quite the kind of ‘particles’ that the Greek and medieval corpuscular theories of light envisaged: they have a particle-like character – just as much as they have a wave-like character. They are actually neither, and they are physically and mathematically described by these wave functions – which, in turn, are functions describing probability amplitudes. But I won’t entertain you with that here, because I’ve written about that in other posts. Let’s just go along with the ‘corpuscular’ theory of photons for a while.

Photons also have energy (which we’ll write as W instead of E, just to be consistent with the symbols above) and momentum (p), and Planck’s Law says how much:

W = hf and p = W/c

So that’s good: we find the same multiplier 1/c here for the momentum of a photon. In fact, this is more than just a coincidence of course: the “wave theory” of light and Planck’s “corpuscular theory” must of course link up, because they are both supposed to help us understand real-life phenomena.
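If you want to put some actual numbers on all of this, here’s a quick back-of-the-envelope sketch (the solar intensity of about 1361 W/m² is the usual figure for sunlight above the atmosphere – an assumed reference value, not something from the argument above):

```python
h = 6.626e-34   # Planck's constant (J·s)
c = 2.998e8     # speed of light (m/s)

# Energy and momentum of a single photon of yellow light (~600 nm)
wavelength = 600e-9
f = c / wavelength    # frequency
W = h * f             # W = hf
p = W / c             # p = W/c
print(W, p)           # roughly 3.3e-19 J and 1.1e-27 kg·m/s

# Radiation pressure of sunlight on a perfectly absorbing surface:
# momentum delivered per second per m² = (energy absorbed per second per m²) / c
S = 1361.0            # solar constant (W/m²), assumed value
print(S / c)          # about 4.5e-6 N/m² – tiny, but enough for solar sails
```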

There are even more nice surprises. We spoke about polarized light, and we showed how the end of the electric field vector describes a circular or elliptical motion as the wave travels through space. It turns out that we can actually relate that to some kind of angular momentum of the wave (I won’t go into the details though – because I think the previous posts have really been too heavy on equations and complicated mathematical arguments) and that we could also relate it to a model of photons carrying angular momentum, “like spinning rifle bullets” – as Feynman puts it.

However, he also adds: “But this ‘bullet’ picture is as incomplete as the ‘wave’ picture.” And so that’s true and that should be it. And it will be it. I will really end this ‘series’ now. It was quite a journey for me, as I am making my way through all of these complicated models and explanations of how things are supposed to work. But a fascinating one. And it sure gives me a much better feel for the ‘concepts’ that are hastily explained in all of these ‘popular’ books dealing with science and physics, hopefully preparing me better for what I should be doing, and that’s to read Penrose’s advanced mathematical theories.

Some content on this page was disabled on June 20, 2020 as a result of a DMCA takedown notice from Michael A. Gottlieb, Rudolf Pfeiffer, and The California Institute of Technology. You can learn more about the DMCA here:

https://wordpress.com/support/copyright-and-the-dmca/

Radiation and relativity

Pre-scriptum (dated 26 June 2020): Some of the relevant illustrations in this post were removed as a result of an attack by the dark force. Too bad, because I liked this post. In any case, despite the removal of the illustrations, you should be able to reconstruct the main story line.

Original post:

Now we are really going to do some very serious analysis: relativistic effects in radiation. Fasten your seat belts please.

The Doppler effect for physical waves traveling through a medium

In one of my posts – I don’t remember which one – I wrote about the Doppler effect for sound waves, or any physical wave traveling through a medium. I also said it had nothing to do with relativity. What happens really is that the observer sort of catches up with the wave or – the other way around – falls back, and, because the velocity of a wave in a medium is always the same (that also has nothing to do with relativity: it’s just a general principle that’s true – always), the frequency of the physical wave will appear to be different. [So that’s why the siren of an ambulance sounds different as it moves past you.] Wikipedia has an excellent article on that – so I’ll refer you to that – but that article does not say anything – or not much – about the Doppler effect when electromagnetic radiation is involved. So that’s what we’ll talk about here. Before we do that, though, let me quickly jot down the formula for the Doppler effect for a physical wave traveling through a medium, so we are clear about the differences between the two ‘Doppler effects’:

f = f0·(vp + vr)/(vp + vs)

In this formula, vp is the propagation speed of the wave in the medium which – as mentioned above – depends on the medium. Now, the source and the receiver will have a velocity with respect to the medium as well, and so that’s vr (r for receiver) and vs (s for source) respectively – with the usual sign convention that vr is counted positive when the receiver moves towards the source, and vs positive when the source moves away from the receiver. So we’re adding speeds here to calculate some relative speed and then we take the ratio. Some people think that’s what relativity theory is all about. It’s not. Everything I’ve written so far is just Galilean relativity – stuff that has been known for centuries.
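Just to see the formula at work, here’s a minimal sketch with made-up numbers: a 700 Hz siren, sound propagating at 340 m/s, and the source approaching (or receding from) a stationary listener at 30 m/s.

```python
def doppler_in_medium(f0, v_p, v_r, v_s):
    # f = f0·(v_p + v_r)/(v_p + v_s)
    # v_r > 0: receiver moving towards the source
    # v_s > 0: source moving away from the receiver
    return f0 * (v_p + v_r) / (v_p + v_s)

print(doppler_in_medium(700.0, 340.0, 0.0, -30.0))   # ~768 Hz while approaching
print(doppler_in_medium(700.0, 340.0, 0.0, +30.0))   # ~643 Hz while receding
```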

Relativity is weird: one aspect of it is length contraction – not often discussed – but the other one is better known: there is no such thing as absolute time, so when we talk about velocities, we need to specify according to whose time we are measuring.

The Doppler effect for electromagnetic radiation

One thing that’s not relative – and which makes things look somewhat similar to what I wrote above – is that the speed of light is always equal to c. That was the startling fact that came out of Maxwell’s equations (startling because it is not consistent with Galilean relativity: it says that we cannot ‘catch up’ with a light wave!), around which Einstein built all of the “new physics”, and so it’s something we use rather matter-of-factly in all that we’ll write below.

In all of the preceding posts about light, I wrote – more than once actually – that the movement of the oscillating charge (i.e. the source of the radiation) along the line of sight did not matter: the only thing that mattered was its acceleration in the xy-plane, which is perpendicular to our line of sight, which we’ll call the z-axis. Indeed, let me remind you of the two equations defining electromagnetic radiation (see my post on light and radiation about a week ago):

E = –q·ax′(t – r/c)/(4πε0c²r)

B = –er′ × E/c

The first formula gives the electromagnetic effect that dominates in the so-called wave zone, i.e. a few wavelengths away from the source – because the Coulomb force varies as the inverse of the square of the distance r, unlike this ‘radiation’ effect, which varies inversely as the distance only (E ∝ 1/r), so it falls off much less rapidly.

Now, that general observation still holds when we’re considering an oscillating charge that is moving towards or away from us with some relativistic speed (i.e. a speed getting close enough to c to produce relativistic length contraction and time dilation effects) but, because of the fact that we need to consider local times, our formula for the retarded time is no longer correct.

Huh? Yes. The matter is quite complicated, and Feynman starts with jotting down the derivatives for the displacement in the x- and y-directions, but I think I’ll skip that. I’ll go for the geometrical picture straight away, which is given below. As said, it’s going to be difficult, but try to hang in here, because it’s necessary to understand the Doppler effect in a correct way (I myself have been fooled by quite a few nonsensical explanations of it in popular books) and, as a bonus, you also get to understand synchrotron radiation and other exciting stuff.

[Illustration: the apparent (retarded) position x′(t) of a charge moving in a circle at 0.94c, plotted against ct]

So what’s going on here? Well… Don’t look at the illustration above just yet. We’ll come back to it. Let’s first build up the logic. We’ve got a charge moving vertically – from our point of view, that is (the observer) – but also moving in and out, i.e. towards and away from us, in the z-direction. Indeed, note the arrow pointing to the observer: that’s us! So it could indeed be us looking at electrons going round and round and round – at phenomenal speeds – in a synchrotron indeed – but then one that’s been turned 90 degrees (probably easier for us to just lie down on the ground and look sideways). In any case, I hope you can imagine the situation. [If not, try again.] Now, if that charge were not moving at relativistic speeds (e.g. 0.94c, which is actually the number that Feynman uses for the graph above), then we would not have to worry about ‘our’ time t and the ‘local’ time τ. Huh? Local time? Yes.

We denote the time as measured in the reference frame of the moving charge as τ. Hence, as we are counting t = 1, 2, 3 etcetera, the electron is counting τ = 1, 2, 3 etcetera as it goes round and round and round. If the charge would not be moving at relativistic speeds, we’d do the standard thing and that’s to calculate the retarded acceleration of the charge a'(t) = a(t − r/c). [Remember that we used a prime to mark functions for which we should use a retarded argument and, yes, I know that the term ‘retarded’ sounds a bit funny, but that’s how it is. In any case, we’d have a'(t) = a(t − r/c) – so the prime vanishes as we put in the retarded argument.] Indeed, from the ‘law’ of radiation, we know that the field now and here is given by the acceleration of the charge at the retarded time, i.e. t – r/c. To sum it all up, we would, quite simply, relate t and τ as follows:

τ = t – r/c or, what amounts to the same, t = τ + r/c

So the effect that we see now, at time t, was produced at a distance r at time τ = t − r/c. That should make sense. [Again, if it doesn’t: read again. I can’t explain it in any other way.]

The crucial link in this rather complicated chain of reasoning comes now. You’ll remember that one of the assumptions we used to derive our ‘law’ of radiation was the assumption that “r is practically constant.” That no longer holds. Indeed, that electron moving around in the synchrotron comes in and out at us at crazy speeds (0.94c is a bit more than 280,000 km per second), so r goes up and down too, and the relevant distance is not r but r + z(τ). This means the retardation effect is actually a bit larger: it’s not r/c but [r + z(τ)]/c = r/c + z(τ)/c. So we write:

τ = t – r/c – z(τ)/c or, what amounts to the same, t = τ + r/c + z(τ)/c

Hmm… You’ll say this is rather fishy business. Why would we use the actual distance and not the distance a while ago? Well… I’ll let you mull over that. We’ve got two points in space-time here and so they are separated both by distance as well as time. It makes sense to use the actual distance to calculate the actual separation in time, I’d say. If you’re not convinced, I can only refer you to those complicated derivations that Feynman briefly does before introducing this ‘easier’ geometric explanation. This brings us to the next point. We can measure time in seconds, but also in equivalent distance, i.e. light-seconds: the distance that light (remember: always at absolute speed c) travels in one second, i.e. approximately 299,792,458 meter. It’s just another ‘time unit’ to get used to.

Now, that’s what’s being done above. To be sure, we get rid of the constant r/c: that amounts to a shift of the origin by some constant (so we start counting earlier or later). In short, we really have a new variable ‘t’ that’s equal to t = τ + z(τ)/c. But we’re going to measure time in meters – i.e. in the distance light travels in that time – so we just multiply both sides by c and get:

ct = cτ + z(τ)

Why the different unit? Well… We’re talking relativistic speeds here, don’t we? And so the second is just an inappropriate unit. When we’re finished with this example, we’ll give you an even simpler example: a source just moving in on us, with an equally astronomical speed, so the functional shape of z(τ) will be some fraction of c (let’s say kc) times τ, so kcτ. So, to simplify things, just think of it as re-scaling the time axis in units that makes sense as compared to the speeds we are talking about.

Now we can finally analyze that graph on the right-hand side. If we kept r fixed – so if we didn’t care about the charge moving in and out – the plot of x′(t) – i.e. the retarded position indeed – against ct would yield the sinusoidal graph plotted by the red numbers 1 to 13 here. In fact, instead of a sinusoidal graph, it resembles a normal distribution, but that’s just because we’re looking at one revolution only. In any case, the point to note is that – when everything is said and done – we need to calculate the retarded acceleration, so that’s the second derivative of x′(t). The animated illustration shows how that works: the second derivative (not the first) turns from positive to negative – and vice versa – at inflection points, when the curve goes from convex to concave, and vice versa. So, on the segment of that sinusoidal function marked by the red numbers, it’s positive at first (the slope of the tangent becomes steeper and steeper), then negative (cf. that tangent line turning from blue to green in the illustration below), and then it becomes positive again, as the negative slope becomes less negative at the second inflection point.

[Animated illustration: the tangent line at an inflection point]

That should be straightforward. However, the actual x′(t) curve is the black curve with the cusp. A curve like that is called a hypocycloid. Let me reproduce it once again for ease of reference.

[Illustration: the same x′(t) curve, reproduced for ease of reference]

We relate x′(t) to ct (this is nothing but our new unit for t) by noting that ct = cτ + z(τ). Got it? It’s not easy to grasp: the instinct is to just equate t and τ and write x′(t) = x′(τ), but that would not be correct. No. We must measure in ‘our’ time and get a functional form for x′ as a function of t, not of τ. In fact, x′(t) − i.e. the retarded vertical position at time t – is not equal to x′(τ) but to x(τ), i.e. that’s the actual (instead of retarded) position at (local) time τ, and so that’s what the black graph above shows.

I admit it’s not easy. I myself am not getting it in an ‘intuitive’ way. But the logic is solid and leads us where it leads us. Perhaps it helps to think in terms of the curvature of this graph. In fact, we have to think in terms of the curvature of this graph in order to understand what’s happening in terms of radiation. When the charge is moving away from us, i.e. during the first three ‘seconds’ (so that’s the 1-2-3 count), we see that the curvature is less than what it would be – and also doesn’t change very much – if the displacement were given by the sinusoidal function, which means there’s very little radiation (because there’s little acceleration – negative or positive). However, as the charge moves towards us, we get that sharp cusp and, hence, sharp curvature, which results in a sharp pulse of the electric field, rather than the regular sinusoidal amplitude we’d have if that electron were not moving in and out at us at relativistic speeds. In fact, that’s what synchrotron radiation is: we get these sharp pulses indeed. Feynman shows how they are measured – in very much detail – using a diffraction grating, but that would just be another diversion and so I’ll spare you that.
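If you’d like to ‘see’ that compression effect without the illustration, here’s a small numerical sketch of my own (so not Feynman’s construction): we let a charge go around a circle at 0.94c, take its actual positions x(τ) and z(τ) in its own (local) time, and plot x against ct = cτ + z(τ). The sharp, cusp-like feature shows up exactly where the charge comes straight at us.

```python
import numpy as np
import matplotlib.pyplot as plt

c = 1.0             # work in units where c = 1, so distances are in 'light-seconds'
v = 0.94 * c        # speed of the charge along its circular path
omega = 1.0         # angular frequency of the circular motion (arbitrary)
a = v / omega       # radius of the circle that goes with that speed

tau = np.linspace(0, 4 * np.pi, 2000)   # the charge's local time
x = a * np.sin(omega * tau)             # vertical position (the component we 'see')
z = a * np.cos(omega * tau)             # position along the line of sight

ct = c * tau + z                        # ct = c·tau + z(tau): 'our' time, in light-seconds

plt.plot(ct, x)
plt.xlabel("ct ('our' time, in light-seconds)")
plt.ylabel("apparent vertical position x'(t)")
plt.title("Retarded position of a charge circling at 0.94c")
plt.show()
```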

Hmm… This has nothing to do with the Doppler effect, you’ll say. Well… Yes and no. The discussion above basically set the stage for that discussion. So let’s turn to that now. However, before I do that, I want to insert another graph for an oscillating charge moving in and out at us in some irregular way – rather than the nice circular route described above.

[Illustration: x′(t) for an oscillating charge moving in and out towards us in some irregular way]

The Doppler effect

The illustration below is a similar diagram as the ones above – but it looks much simpler. It shows what happens when an oscillating charge (which we assume to oscillate at its natural or resonant frequency ω0) moves towards us at some relativistic speed v (whatever speed – fairly close to c, so the ratio v/c is substantial). Note that the movement is from A to B – and that the observer (we!) is, once again, at the left – and, hence, the distance traveled is AB = vτ. So what’s the impact on the frequency? That’s shown on the x′(t) graph on the right: the curvature of the sinusoidal motion is much sharper, which means that its angular frequency as we see or measure it (which we’ll denote by ω1) will be higher: if it’s a larger object emitting ‘white’ light (i.e. a mix of everything), then the light will no longer be ‘white’ but will have shifted towards the violet end of the spectrum. If it moves away from us, it will appear ‘more red’.

[Illustration: an oscillating charge moving towards the observer from A to B, and the compressed x′(t) curve that results]

What’s the frequency change? Well… The z(τ) function is rather simple here: z(τ) = vτ. Let’s use f and f0 for a moment, instead of the angular frequencies ω and ω0, as we know they only differ by a factor 2π (ω = 2π/T = 2π·f, with f = 1/T, i.e. the reciprocal of the period). In a given (local) time interval Δτ, the source emits f0Δτ oscillations. During that interval, the source itself moves a distance vΔτ towards us, so the wave train carrying those oscillations gets squeezed into a length (c − v)Δτ instead of cΔτ. The time that train needs to pass the observer is therefore Δt = (c − v)Δτ/c = (1 − v/c)Δτ. Hence, the frequency f will be equal to f0Δτ (the number of oscillations) divided by Δt = (1 − v/c)Δτ, and we get this relatively simple equation:

f = f0/(1 − v/c) and ω = ω0/(1 − v/c)

 Is that it? It’s not quite the same as the formula we had for the Doppler effect of physical waves traveling through a medium, but it’s simple enough indeed. And it also seems to use relative speed. Where’s the Lorentz factor? Why did we need all that complicated machinery?

You are a smart ass! You’re right. In fact, this is exactly the same formula: if we set the speed of propagation equal to c, set the velocity of the receiver to zero, and substitute v (with a minus sign obviously) for the speed of the source, then we get exactly what we found above.

The thing we need to add is that the natural frequency of a moving atomic oscillator is not the same as the one measured when it is standing still: the time dilation effect kicks in. If ω0 is the ‘true’ natural frequency (i.e. the one measured locally, so to say), then the natural frequency as we see it – corrected for the time dilation effect – will be ω1 = ω0·√(1 – v²/c²). Therefore, the grand final relativistic formula for the Doppler effect for electromagnetic radiation is:

ω = ω0·√(1 – v²/c²)/(1 – v/c)
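Just to see how much the relativistic correction matters, here’s a small numerical check (the 0.5c approach speed and the 500 THz source are made-up values for illustration):

```python
import math

def doppler_naive(f0, beta):
    # f = f0/(1 - v/c) for a source approaching head-on, with beta = v/c
    return f0 / (1.0 - beta)

def doppler_relativistic(f0, beta):
    # same, but with the time-dilation factor sqrt(1 - beta^2) on the source frequency
    return f0 * math.sqrt(1.0 - beta**2) / (1.0 - beta)

f0 = 500e12     # a 500 THz ('orange') source, just as an example
beta = 0.5      # approaching us at half the speed of light
print(doppler_naive(f0, beta) / 1e12)          # 1000 THz
print(doppler_relativistic(f0, beta) / 1e12)   # ~866 THz: the shift is real, but smaller
```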

You may feel cheated now: did you really have to suffer going through that story on the synchrotron radiation to get the formula above? I’d say: yes and no. No, because you could be happy with the Doppler formula alone. But, yes, because you don’t get the story about those sharp pulses just from the relativistic Doppler formula alone. So the final answer is: yes. I hope you felt it was worth the suffering 🙂

Some content on this page was disabled on June 17, 2020 as a result of a DMCA takedown notice from Michael A. Gottlieb, Rudolf Pfeiffer, and The California Institute of Technology. You can learn more about the DMCA here:

https://wordpress.com/support/copyright-and-the-dmca/

Loose ends: on energy of radiation and polarized light

Pre-scriptum (dated 26 June 2020): Some of the relevant illustrations in this post were removed as a result of an attack by the dark force. Too bad, because I liked this post. In any case, despite the removal of the illustrations, you should be able to reconstruct the main story line.

Original post:

I said I would move on to another topic, but let me wrap up some loose ends in this post. It will say a few things about the energy of a field; then it will analyze these electron oscillators in some more detail; and, finally, I’ll say a few words about polarized light.

The energy of a field

You may or may not remember, from our discussions on oscillators and energy, that the total energy in a linear oscillator is a constant sum of two variables: the kinetic energy mv²/2 and the potential energy (i.e. the energy stored in the spring as it expands and contracts) kx²/2 (remember that the force is –kx). So the kinetic energy is proportional to the square of the velocity, and the potential energy to the square of the displacement. Now, from the general solution that we had obtained for a linear oscillator – damped or not – we know that the displacement x, its velocity dx/dt, and even its acceleration are all proportional to the magnitude of the field – with different factors of proportionality of course. Indeed, we have x = qeE0e^(iωt)/m(ω0² – ω²), and so every time we take a derivative, we’ll bring an iω factor down (and so we’ll have another factor of proportionality), but the E0 factor is still the same, and a factor of proportionality multiplied with some constant is still a factor of proportionality. Hence, the energy should be proportional to the square of the field amplitude E0. What more can we say about it?

The first thing to note is that, for a field emanating from a point source, the magnitude of the field vector E will vary inversely with r. That’s clear from our formula for radiation:

E = –q·ax′(t – r/c)/(4πε0c²r)

Hence, the energy that the source can deliver will vary inversely as the square of the distance. That implies that the energy we can take out of a wave, within a given conical angle, will always be the same, no matter how far away we are. What we have is an energy flux spreading over a greater and greater effective area. That’s what’s illustrated below: the energy flowing within the cone OABCD is independent of the distance r at which it is measured.

[Illustration: the energy flowing within the cone OABCD is the same at any distance r]

However, these considerations do not answer the question: what is that factor of proportionality? What’s its value? What does it depend on?

We know that our formula for radiation is an approximate formula, but it’s accurate for what is called the “wave zone”, i.e. for all of space as soon as we are more than a few wavelengths away from the source. Likewise, Feynman derives an approximate formula only for the energy carried by a wave using the same framework that was used to derive the dispersion relation. It’s a bit boring – and you may just want to go to the final result – but, well… It’s kind of illustrative of how physics analyzes physical situations and derives approximate formulas to explain them.

Let’s look at that framework again: we had a wave coming in, and then a wave being transmitted. In-between, the plate absorbed some of the energy, i.e. there was some damping. The situation is shown below, and the exact formulas were derived in the previous post.

[Illustration: an incoming wave hitting a thin transparent plate, with the transmitted wave behind it]

Now, we can write the following energy equation for a unit area:

Energy in per second = energy out per second + work done per second

That’s simple, you’ll say. Yes, but let’s see where we get with this. For the energy that’s going in (per second), we can write that as α〈Es²〉, so that’s the averaged square of the amplitude of the electric field emanating from the source, multiplied by a factor α. What factor α? Well… That’s exactly what we’re trying to find out: be patient.

For the energy that’s going out per second, we have α〈(Es + Ea)²〉. Why the same α? Well… The transmitted wave is traveling through the same medium as the incoming wave (air, most likely), so it should be the same factor of proportionality. Now, α〈(Es + Ea)²〉 = α[〈Es²〉 + 2〈EsEa〉 + 〈Ea²〉]. However, we know that we’re looking at a very thin plate here only, and so the amplitude Ea must be small as compared to Es. So we can leave its averaged square 〈Ea²〉 out. Indeed, as mentioned above, we’re looking at an approximation here: any term that’s proportional to NΔz, we’ll leave in (and so we’ll leave 〈EsEa〉 in), but terms that are proportional to (NΔz)² or a higher power can be left out. [That’s, in fact, also the reason why we don’t bother to analyze the reflected wave.]

So we now have the last term: the work done per second in the plate. Work done is force times distance, and so the work done per second (i.e. the power being delivered) is the force times the velocity. [In fact, we should take a dot product, but the force and the velocity are along the same direction – except for a possible minus sign – and so that’s alright.] So, for each electron oscillator, the work done per second will be 〈qeEsv〉 and, hence, for a unit area, we’ll have NΔzqe〈Esv〉. So our energy equation becomes:

α〈Es²〉 = α〈Es²〉 + 2α〈EsEa〉 + NΔzqe〈Esv〉

⇔ –2α〈EsEa〉 = NΔzqe〈Esv〉

Now, we had a formula for Ea (we didn’t do the derivation of this one though: just accept it):

Ea = –(NΔz·qe/2ε0c)·v(at t – z/c)

We can substitute this in the energy equation, noting that the constant factor in Ea does not depend on time and can therefore be taken out of the average. So the left-hand side of our energy equation becomes:

 

–2α〈EsEa〉 = 2α·(NΔz·qe/2ε0c)·〈Es(at z)·v(at t – z/c)〉

However, Es(at z) is just Es(at the atoms) retarded by z/c, so we can give both factors the same retarded argument. And now that Es and v carry the same argument, the average does not depend on where we start counting time and, hence, it will be equal to the 〈Esv〉 factor on the right-hand side of our energy equation, which means this factor can be scrapped. The NΔzqe (and that 2 in the numerator and denominator) can be scrapped as well, of course. We then get the remarkably simple result that

α = ε0c

Hence, the energy carried in an electric wave per unit area and per unit time, which is also referred to as the intensity of the wave, equals:

〈S〉 = ε0c〈E²〉
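To get a feel for the orders of magnitude, here’s a two-line sketch; the 1000 W/m² figure for bright sunlight at ground level is an assumed round number, not something derived above.

```python
import math

eps0 = 8.854e-12   # electric constant (F/m)
c = 2.998e8        # speed of light (m/s)

S = 1000.0                          # intensity of bright sunlight at ground level (W/m²), assumed
E_rms = math.sqrt(S / (eps0 * c))   # from <S> = eps0·c·<E²>
E_peak = math.sqrt(2) * E_rms       # peak amplitude for a sinusoidal wave
print(E_rms, E_peak)                # roughly 610 and 870 volts per meter
```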

The rate of radiation of energy

Plugging our formula for radiation above into this formula, we get an expression for the power per square meter radiated in the direction θ:

S = q²a′²sin²θ/(16π²ε0r²c³)

In this formula, a′ is, of course, the retarded acceleration, i.e. the value of a at time t – r/c. The formula makes it clear that the power varies inversely as the square of the distance, as it should, from what we wrote above. I’ll spare you the derivation (you’ve had enough of these derivations, I am sure), but we can use this formula to calculate the total energy radiated in all directions, by integrating the formula over all directions. We get the following general formula:

P = q²a′²/(6πε0c³)

This formula is no longer dependent on the distance r – which is also in line with what we said above: in a given cone, the energy flux is the same. In this case, the ‘cone’ is actually a sphere around the oscillating charge, as illustrated below.

[Illustration: the total power radiated through a sphere around the oscillating charge]

Now, we usually assume we have a nice sinusoidal function for the displacement of the charge and, hence, for the acceleration, so we’ll often assume that the acceleration equals a = –ω²x0e^(iωt). In that case, we can average over a cycle (note that the average of the square of a cosine is one-half) and we get:

〈P〉 = q²ω⁴x0²/(12πε0c³)

Now, historically, physicists used a value written as e², not to be confused with the square of the transcendental number e, equal to e² = qe²/4πε0, which – when inserted above – yields the older form of the formula above:

P = 2e²a²/3c³

In fact, we actually worked with that e² factor already, when we were talking about potential energy and calculated the potential energy between a proton and an electron at distance r: that potential energy was equal to e²/r but that was a while ago indeed – and so you’ll probably not remember.
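To put a number on all this, here’s a sketch computing the average power radiated by a single electron oscillating at a visible-light frequency; the amplitude of about one ångström is an assumed, atomic-size guess, not a value from the text.

```python
import math

qe = 1.602e-19     # electron charge (C)
eps0 = 8.854e-12   # electric constant (F/m)
c = 2.998e8        # speed of light (m/s)

lam = 600e-9                     # wavelength of the emitted light (m)
omega = 2 * math.pi * c / lam    # corresponding angular frequency (rad/s)
x0 = 1e-10                       # assumed oscillation amplitude, about one ångström (m)

# <P> = q²ω⁴x0²/(12πε0c³): the cycle-averaged radiated power
P_avg = qe**2 * omega**4 * x0**2 / (12 * math.pi * eps0 * c**3)
print(P_avg)   # a few picowatts
```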

Atomic oscillators

Now, I can imagine you’ve had enough of all these formulas. So let me conclude by giving some actual numbers and values for things. Let’s look at these atomic oscillators and put some values in indeed. Let’s start with calculating the Q of an atomic oscillator.

You’ll remember what the Q of an oscillator is: it is a measure of the ‘quality’ (that’s what the Q stands for really) of a particular oscillator. A high Q implies that, if we ‘hit’ the oscillator, it will ‘ring’ for many cycles, so its decay time will be quite long. It also means that the peak of its ‘frequency response’ will be quite tall and narrow. Huh? The illustrations below will refresh your memory.

The first one (below) gives a very general form for a typical resonance: we have a fixed frequency f0 (which defines the period T, and vice versa), and so this oscillator ‘rings’ indeed, and slowly dies out. An associated concept is the decay time (τ) of an oscillation: that’s the time it takes for the amplitude of the oscillation to fall to 1/e = 1/2.7182… ≈ 36.8% of its original value.

[Illustration: a damped oscillation ‘ringing’ at frequency f0 and slowly dying out, with decay time τ]

The second illustration (below) gives the frequency response curve. That assumes there is a continuous driving force, and we know that the oscillator will react to that driving force by oscillating – after an initial transient – at the same frequency as the driving force, but its amplitude will be determined by (i) the difference between the frequency of the driving force and the oscillator’s natural frequency (f0) as well as (ii) the damping factor. We will not prove it here, but the ‘peak height’ is equal to the low-frequency response (C) multiplied by the Q of the system, and the peak width is f0 divided by Q.

[Illustration: the frequency response curve, with peak height C·Q and peak width f0/Q]
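Here’s a minimal numerical sketch of such a response curve; I’m writing the damping coefficient as γ = ω0/Q, which is the standard parametrization of a damped oscillator, not something derived in this post.

```python
import numpy as np
import matplotlib.pyplot as plt

omega0 = 1.0         # natural (angular) frequency
Q = 50.0             # quality factor
gamma = omega0 / Q   # damping coefficient, assuming gamma = omega0/Q

omega = np.linspace(0.8, 1.2, 2000)
# relative amplitude of the response (the low-frequency response is normalized to 1 here)
amplitude = omega0**2 / np.sqrt((omega0**2 - omega**2)**2 + (gamma * omega)**2)

print(amplitude.max())   # peak height: close to Q times the low-frequency response
plt.plot(omega, amplitude)
plt.xlabel("driving frequency ω")
plt.ylabel("relative amplitude")
plt.show()
```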

But what is the Q for an atomic oscillator? Well… The Q of any system is the total energy content of the oscillator divided by the work done (or the energy lost) per radian. [If we define it per cycle, then we need to throw an additional 2π factor in – that’s just how the Q has been defined!] So we write:

Q = W/(dW/dΦ)

Now, dW/dΦ = (dW/dt)/(dΦ/dt) = (dW/dt)/ω, so Q = ωW/(dW/dt), which can be re-written as the first-order differential equation dW/dt = -(ω/Q)W. Now, that equation has the general solution

W = W0e^(–ωt/Q), with W0 the initial energy.

Using our energy equation – and assuming that our atomic oscillators are radiating at some natural (angular) frequency ω0, which we’ll relate to the wavelength λ = 2πc/ω0 – we can calculate the Q. But what do we use for W? Well… The kinetic energy of the oscillator is mv²/2. Assuming the displacement x has that nice sinusoidal shape, we get mω²x0²/4 for the mean kinetic energy, which we have to double to get the total energy (remember that, on average, the total energy of an oscillator is half kinetic, and half potential), so then we get W = mω²x0²/2. Using me (the electron mass) for m, we can then plug it all in, divide and cancel what we need to divide and cancel, and we get the grand result:

Q = ωW/(dW/dt) = 3λmec²/4πe² or 1/Q = 4πe²/3λmec²

The second form is preferred because it allows substituting e²/mec² for yet another ‘historical’ constant, referred to as the classical electron radius r0 = e²/mec² ≈ 2.82×10⁻¹⁵ m. However, that’s yet another diversion, and I’ll try to spare you here. Indeed, we’re almost done so let’s sprint to the finish.

So all we need now is a value for λ. Well… Let’s just take one: a sodium atom emits light with a wavelength of approximately 600 nanometers. Yes, that’s the yellow-orange light emitted by the low-pressure sodium-vapor lamps used for street lighting. So that’s a typical wavelength, and we get a Q equal to

Q = 3λ/4πr0 ≈ 5×10⁷.

So what? Well… This is great! We can finally calculate things like the decay time now – for our atomic oscillators! Now, there is a formula for the decay time: τ = 2Q/ω. This is a formula we can also write in terms of the wavelength λ because ω and λ are related through the speed of light: ω = 2πf = 2πc/λ. So we can write τ = Qλ/πc. In this case, we get τ ≈ 3.2×10⁻⁸ seconds (but please do check my calculation). It seems that corresponds to experimental fact: light, as emitted by all these atomic oscillators, basically consists of very sharp pulses: one atom emits a pulse, and then another one takes over, etcetera. That’s why light is usually unpolarized – I’ll talk about that in a minute.

In addition, we can calculate the peak width Δf = f0/Q. In fact, we’ll not use frequency but wavelength: Δλ = λ/Q ≈ 1.2×10⁻¹⁴ m. This also seems to correspond with the width of the so-called spectral lines of light-emitting sodium atoms.
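Here’s the same arithmetic in a few lines of Python, so you can check the numbers yourself:

```python
import math

c = 2.998e8      # speed of light (m/s)
r0 = 2.82e-15    # classical electron radius (m)
lam = 600e-9     # sodium-like wavelength (m)

Q = 3 * lam / (4 * math.pi * r0)   # Q = 3λ/(4π·r0)
tau = Q * lam / (math.pi * c)      # decay time τ = Qλ/(πc)
dlam = lam / Q                     # line width Δλ = λ/Q

print(Q)      # about 5×10⁷
print(tau)    # about 3.2×10⁻⁸ seconds
print(dlam)   # about 1.2×10⁻¹⁴ m
```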

Isn’t this great? With a few simple formulas, we’ve illustrated the strange world of atomic oscillators and electromagnetic radiation. I’ve covered an awful lot of ground here, I feel.

There is one more “loose end” which I’ll quickly throw in here. It’s the topic of polarization – as promised – and then we’re done really. I promise. 🙂

Polarization

One of the properties of the ‘law’ of radiation as derived by Feynman is that the direction of the electric field is perpendicular to the line of sight. That’s – quite simply – because it’s only the component ax perpendicular to the line of sight that’s important. So if we have a source – i.e. an accelerating electric charge – moving in and out straight at us, we will not get a signal.

That being said, while the field is perpendicular to the line of sight – which we identify with the z-axis – the field still can have two components and, in fact, it is likely to have two components: an x- and a y-component. We show a beam with such x- and y-component below (so that beam ‘vibrates’ not only up and down but also sideways), and we assume it hits an atom – i.e. an electron oscillator – which, in turn, emits another beam. As you can see from the illustration, the light scattered at right angles to the incident beam will only ‘vibrate’ up and down: not sideways. We call such light ‘polarized’. The physical explanation is quite obvious from the illustration below: the motion of the electron oscillator is perpendicular to the z-direction only and, therefore, any radiation measured from a direction that’s perpendicular to that z-axis must be ‘plane polarized’ indeed.

Light can be polarized in various ways. In fact, if we have a ‘regular’ wave, it will always be polarized. With ‘regular’, we mean that both the vibration in the x- and y-direction will be sinusoidal: the phase may or may not be the same, that doesn’t matter. But both vibrations need to be sinusoidal. In that case, there are two broad possibilities: either the oscillations are ‘in phase’, or they are not. When the x- and y-vibrations are in phase, then the superposition of their amplitudes will look like the examples below. You should imagine here that you are looking at the end of the electric field vector, and so the electric field oscillates on a straight line.

[Illustration: linearly polarized light – the x- and y-vibrations in phase, so the tip of the field vector oscillates along a straight line]

When they are in phase (or 180° out of phase), the tip of the field vector oscillates along a straight line, as shown above: that’s linear polarization. Now, the phase difference may take other values, as shown in the examples below. In that case, the x- and y-vibrations – still at the same frequency – trace out a nice ellipse and, hence, such beams are referred to as being ‘elliptically polarized’.

[Illustration: elliptically polarized light – the x- and y-vibrations out of phase, so the tip of the field vector traces an ellipse]

So what’s unpolarized light then? Well… That’s light that’s – quite simply – not polarized. So it’s irregular. Most light is unpolarized because it was emitted by electron oscillators. From what I explained above, you now know that such electron oscillators emit light during a fraction of a second only – the window is of the order of 10⁻⁸ seconds only actually – so that’s very short indeed (a hundred-millionth of a second!). It’s a sharp little pulse basically, quickly followed by another pulse as another atom takes over, and then another and so on. So the light that’s being emitted cannot have a steady phase for more than 10⁻⁸ seconds. In that sense, such light will be ‘out of phase’.

In fact, that’s why two light sources don’t interfere. Indeed, we’ve been talking about interference effects all of the time but you may have noticed 🙂 that – in daily life – the combined intensity of light from two sources is just the sum of the intensities of the two lights: we don’t see interference. So there you are. [Now you will, of course, wonder why physics studies phenomena we don’t observe in daily life – but that’s an entirely different matter, and you would actually not be reading this post if you thought that.]

Now, with polarization, we can explain a number of things that we couldn’t explain before. One of them is birefringence: a material may have a different index of refraction depending on whether the light is linearly polarized in one direction rather than another, which explains the amusing property of Iceland spar, a crystal that doubles the image of anything seen through it. But we won’t play with that here. You can look that up yourself.

Some content on this page was disabled on June 17, 2020 as a result of a DMCA takedown notice from Michael A. Gottlieb, Rudolf Pfeiffer, and The California Institute of Technology. You can learn more about the DMCA here:

https://wordpress.com/support/copyright-and-the-dmca/

Refraction and Dispersion of Light

Pre-scriptum (dated 26 June 2020): Some of the relevant illustrations in this post were removed as a result of an attack by the dark force. Too bad, because I liked this post. In any case, despite the removal of the illustrations, I think you will still be able to reconstruct the main story line.

Original post:

In this post, we go right at the heart of classical physics. It’s going to be a very long post – and a very difficult one – but it will really give you a good ‘feel’ of what classical physics is all about. To understand classical physics – in order to compare it, later, with quantum mechanics – it’s essential, indeed, to try to follow the math in order to get a good feel for what ‘fields’ and ‘charges’ and ‘atomic oscillators’ actually represent.

As for the topic of this post itself, we’re going to look at refraction again: light gets dispersed as it travels from one medium to another, as illustrated below.

[Illustration: white light being dispersed into its colors by a prism]

Dispersion literally means “distribution over a wide area”, and so that’s what happens as the light travels through the prism: the various frequencies (i.e. the various colors that make up natural ‘white’ light) are being separated out over slightly different angles. In physics jargon, we say that the index of refraction depends on the frequency of the wave – but so we could also say that the breaking angle depends on the color. But that sounds less scientific, of course. In any case, it’s good to get the terminology right. Generally speaking, the term refraction (as opposed to dispersion) is used to refer to the bending (or ‘breaking’) of light of a specific frequency only, i.e. monochromatic light, as shown in the photograph below. […] OK. We’re all set now.

[Photograph: a monochromatic beam being refracted as it passes through a block of transparent material]

It is interesting to note that the photograph above shows how the monochromatic light is actually being obtained: if you look carefully, you’ll see two secondary beams on the left-hand side (with an intensity that is much less than the central beam – barely visible in fact). That suggests that the original light source was sent through a diffraction grating designed to filter only one frequency out of the original light beam. That beam is then sent through a block of transparent material (plastic in this case) and comes out again, but displaced parallel to itself. So the block of plastic ‘offsets’ the beam. So how do we explain that in classical physics?

The index of refraction and the dispersion equation

As I mentioned in my previous post, the Greeks had already found out, experimentally, what the index of refraction was. To be more precise, they had measured the angles θ1 and θ2 – depicted below – for light going from air to water. For example, if the angle in air (θ1) is 20°, then the angle in the water (θ2) will be 15°. If the angle in air is 70°, then the angle in the water will be 45°.

[Illustration: refraction at the interface between two media, with the angles θ1 and θ2]

Of course, it should be noted that a lot of the light will also be reflected from the water surface (yes, imagine the romance of the image of the moon reflected on the surface of a glacial lake while you’re feeling damn cold) – but that’s a phenomenon which is better explained by introducing probability amplitudes, and looking at light as a bundle of photons, which we will not do here. I did that in previous posts, and so here we will just acknowledge that there is a reflected beam but not say anything about it.

In any case, we should go step by step, and I am not doing that right now. Let’s first define the index of refraction. It is a number n which relates the angles above through the following relationship, which is referred to as Snell’s Law:

sinθ1 = n sinθ2

Using the numbers given above, we get: sin(20°) = n·sin(15°), and sin(70°) = n·sin(45°), so n must be equal to n = sin(20°)/sin(15°) = sin(70°)/sin(45°) ≈ 1.33. Just for the record, Willebrord Snell was a seventeenth-century Dutch astronomer but, according to Wikipedia, some smart Persian, Ibn Sahl, had already jotted this down in a treatise – “On Burning Mirrors and Lenses” – while he was serving the Abbasid court of Baghdad, back in 984, i.e. more than a thousand years ago! What to say? It was obviously a time when the Sunni-Shia divide did not matter, and Arabs and ‘Persians’ were leading civilization. I guess I should just salute the Islamic Golden Age here, regret the time lost during Europe’s Dark Ages and, most importantly, regret where Baghdad is right now! And, as for the ‘burning’ adjective, it just refers to the fact that large convex lenses can concentrate the sun’s rays to a very small area indeed, thereby causing ignition. [It seems that the story about Archimedes burning Roman ships with a ‘death ray’ using mirrors – in all likelihood, something that did not happen – fascinated them as well.]
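You can verify those numbers in a couple of lines:

```python
import math

def refraction_index(theta1_deg, theta2_deg):
    # Snell's law: sin(theta1) = n * sin(theta2)
    return math.sin(math.radians(theta1_deg)) / math.sin(math.radians(theta2_deg))

print(refraction_index(20, 15))   # ≈ 1.32
print(refraction_index(70, 45))   # ≈ 1.33
```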

But let’s get back at it. Where were we? Oh – yes – the refraction index. It’s (usually) a positive number written as n = 1 + some other number which may be positive or negative, and which depends on the properties of the material. To be more specific, it depends on the resonant frequencies of the atoms (or, to be precise, I should say: the resonant frequencies of the electrons bound by the atom, because it’s the charges that generate the radiation). Plus a whole bunch of natural constants that we have encountered already, most of which are related to electrons. Let me jot down the formula – and please don’t be scared away now (you can stop a bit later, but not now 🙂 please):

n = 1 + N·qe²/[2ε0m(ω0² – ω²)]

N is just the number of charges (electrons) per unit volume of the material (e.g. the water, or that block of plastic), and qe and m are just the charge and mass of the electron. And then you have that electric constant once again, ε0, and… Well, that’s it ! That’s not too terrible, is it? So the only variables on the right-hand side are ω0 and ω, so that’s (i) the resonant frequency of the material (or the atoms – well, the electrons bound to the nucleus, to be precise, but then you know what I mean and so I hope you’ll allow me to use somewhat less precise language from time to time) and (ii) the frequency of the incoming light.

The equation above is referred to as the dispersion relation. It’s easy to see why: it relates the frequency of the incoming light to the index of refraction which, in turn, determines that angle θ. So the formula does indeed determine how light gets dispersed, as a function of the frequencies in it, by some medium (glass, air, water,…).

So the objective of this post is to show how we can derive that dispersion relation using classical physics only. As usual, I’ll follow Feynman – arguably the best physics teacher ever. 🙂 Let me warn you though: it is not a simple thing to do. However, as mentioned above, it goes to the heart of the “classical world view” in physics and so I do think it’s worth the trouble. Before we get going, however, let’s look at the properties of that formula above, and relate it to some experimental facts, in order to make sure we more or less understand what it is that we are trying to understand. 🙂

First, we should note that the index of refraction has nothing to do with transparency. In fact, throughout this post, we’ll assume that we’re looking at very transparent materials only, i.e. materials that do not absorb the electromagnetic radiation that tries to go through them, or only absorb it a tiny little bit. In reality, we will, of course, have some – or, in the case of opaque (i.e. non-transparent) materials, a lot – of absorption going on, but we will deal with that later. So, let me repeat: the index of refraction has nothing to do with transparency. A material can have a (very) high index of refraction but be fully transparent. In fact, diamond is a case in point: it has one of the highest indices of refraction (2.42) of any naturally available material, but it’s – obviously – perfectly transparent. [In case you’re interested in jewellery, the refraction index of its most popular substitute, cubic zirconia, comes very close (2.15-2.18) and, moreover, zirconia actually works better as a prism, so it disperses light better than diamond, which is why it reflects more colors. Hence, real diamond actually sparkles less than zirconia! So don’t be fooled! :-)]

Second, it’s obvious that the index of refraction depends on two variables indeed: the natural, or resonant, frequency ω0, and the frequency ω, which is the frequency of the incoming light. For most of the ordinary gases, including those that make up air (i.e. nitrogen (78%) and oxygen (21%), plus some water vapor (averaging 1%) and the so-called noble gas argon (0.93%) – noble because, just like helium and neon, it’s colorless, odorless and doesn’t react easily), the natural frequencies of the electron oscillators are close to the frequency of ultraviolet light. [The greenhouse gases are a different story – which is why we’re in trouble on this planet. Anyway…] So that’s why air absorbs most of the UV, especially the cancer-causing ultraviolet-C light (UVC), which is formally classified as a carcinogen by the World Health Organization. The wavelength of UVC light is 100 to 300 nanometer – as opposed to visible light, which has a wavelength ranging from 400 to 700 nm – and, hence, the frequency of UV light is in the 1000 to 3000 terahertz range (1 THz = 10¹² oscillations per second) – as opposed to visible light, which has a frequency in the range of 400 to 800 THz. So, because we’re squaring those frequencies in the formula, ω² can then be disregarded in comparison with ω0²: for example, 1500² = 2,250,000 and that’s not very different from 1500² – 500² = 2,000,000. Hence, whether or not we leave the ω² out, we are dividing by a very large number. That’s why n is very close to one for visible light entering the atmosphere from space (i.e. from the vacuum). Its value is, in fact, around 1.000292 for incoming light with a wavelength of 589.3 nm (that odd value is the mean wavelength of so-called sodium D light, a pretty common yellow-orange light (street lights!), which is why it’s used as a reference value – but don’t worry about it).
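To make that hand-waving a bit more tangible, here’s a quick check of the claim that ω² hardly matters next to ω0². The 1500 and 500 values are just the representative (THz) orders of magnitude used above, not precise resonant frequencies:

```python
# Compare omega_0^2 with omega_0^2 - omega^2 for representative frequencies,
# expressed in THz for convenience: a UV resonance near 1500 THz vs visible light near 500 THz.
omega_0, omega = 1500.0, 500.0
exact  = omega_0**2 - omega**2   # 2,000,000
approx = omega_0**2              # 2,250,000
print(exact, approx, approx / exact)  # the ratio is only about 1.125
```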

That being said, while the n of air is close to one for all visible light, the index is still slightly higher for blue light than for red light, and that’s why the sky is blue, except in the morning and evening, when it’s reddish. Indeed, the illustration below is a bit silly, but it gives you the idea. [I took it from http://mathdept.ucr.edu/ so I’ll refer you to that site for the full narrative. :-)]

blue_sky

Where are we in this story? Oh… Yes. Two frequencies. So we should also note that – because we have two frequency variables – it also makes sense to talk about, for instance, the index of refraction of graphite (i.e. carbon in its most common natural occurrence, like in coal) for x-rays. Indeed, coal is definitely not transparent to visible light (that has to do with the absorption phenomenon, which we’ll discuss later) but it is very ‘transparent’ to x-rays. Hence, we can talk about how graphite bends x-rays, for example. In fact, the frequency of x-rays is much higher than the natural frequency of the carbon atoms and, hence, in this case we can neglect the ω0² factor, so we get a denominator that is negative (because only the −ω² remains relevant), and, therefore, a refraction index that is (a bit) smaller than 1. [Of course, our body is transparent to x-rays too – to a large extent – but in different degrees, and that’s why we can take x-ray photographs of, for example, a broken rib or leg.]

OK. […] So that’s just to note that we can have a refraction index that is smaller than one and that’s not ‘anomalous’ – even if that’s a historical term that has survived.

Finally, last but not least as they say, you may have heard that scientists and engineers have managed to construct so-called negative index metamaterials. That matter is (much) more complicated than you might think, however, and so I’ll refer you to the Web if you want to find out more about that.

Light going through a glass plate: the classical idea

OK. We’re now ready to crack the nut. We’ll closely follow my ‘Great Teacher’ Feynman (Lectures, Vol. I-31) as he derives that formula above. Let me warn you again: the narrative below is quite complicated, but really worth the trouble – I think. The key to it all is the illustration below. The idea is that we have some electromagnetic radiation emanating from a far-away source hitting a glass plate – or whatever other transparent material. [Of course, nothing is to scale here: it’s just to make sure you get the theoretical set-up.]

radiation and transparent sheet

So, as I explained in my previous post, the source creates an oscillating electromagnetic field which will shake the electrons up and down in the glass plate, and then these shaking electrons will generate their own waves. So we look at the glass as an assembly of little “optical-frequency radio stations”, all driven with a given phase. This creates two new waves: one reflecting back, and one modifying the original field.

Let’s be more precise. What do we have here? First, we have the field that’s generated by the source, which is denoted by Es above. Then we have the “reflected” wave (or field – not much difference in practice), so that’s Eb. As mentioned above, this is the classical theory, not the quantum-electrodynamical one, so we won’t say anything about this reflection really: just note that the classical theory acknowledges that some of the light is effectively being reflected.

OK. Now we go to the other side of the glass. What do we expect to see there? If we did not have the glass plate in between, we’d obviously have the same Es field, but we do have the plate. 🙂 Hence, the “transmitted” wave, or the field that’s arriving at point P, let’s say, will be different from Es. Feynman writes it as Es + Ea.

Hmm… OK. So what can we say about that? Not easy…

The index of refraction and the apparent speed of light in a medium

Snell’s Law – or Ibn Sahl’s Law – was re-formulated, by a 17th-century French lawyer with an interest in math and physics, Pierre de Fermat, as the Principle of Least Time. It is a way of looking at things really – but a very confusing one, actually. Fermat assumed that light traveling through a medium (water or glass, for instance) travels slower, by a certain factor n, which – indeed – turns out to be the index of refraction. But let’s not run before we can walk. The Principle is illustrated below. If light has to travel from point S (the source) to point D (the detector), then the fastest way is not the straight line from S to D, but the broken S-L-D line. Now, I won’t go into the geometry of this but, with a bit of trial and error, you can verify for yourself that the factor n will indeed be the same factor n as the one which was ‘discovered’ by Ibn Sahl: sinθ1 = n·sinθ2.

Least time principle
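If you don’t feel like doing that trial and error by hand, here’s a small numerical sketch that does it for you: it minimizes the travel time over all possible crossing points L and then checks that the resulting angles satisfy sinθ1 = n·sinθ2. The geometry (the heights and the horizontal separation) is entirely made up, and so is the value of n:

```python
import math

n = 1.33                    # assumed index of the second medium (water, say)
h1, h2, D = 1.0, 1.0, 2.0   # source height, detector depth, horizontal separation (arbitrary units)

def travel_time(x):
    """Time from S to D if the ray crosses the surface at horizontal position x (units where c = 1)."""
    return math.hypot(h1, x) + n * math.hypot(h2, D - x)

# crude brute-force minimization over the crossing point
x_best = min((x * 1e-4 for x in range(0, 20001)), key=travel_time)

theta1 = math.atan2(x_best, h1)       # angle of incidence, measured from the normal
theta2 = math.atan2(D - x_best, h2)   # angle of refraction
print(math.sin(theta1) / math.sin(theta2))  # comes out very close to n = 1.33
```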

What we have then, is that the apparent speed of the wave in the glass plate that we’re considering here will be equal to v = c/n. The apparent speed? So does that mean it is not the real speed? Hmm… That’s actually the crux of the matter. The answer is: yes and no. What? An ambiguous answer in physics? Yes. It’s ambiguous indeed. What’s the speed of a wave? We mentioned above that n could be smaller than one. Hence, in that case, we’d have a wave traveling faster than the speed of light. How can we make sense of that?

We can make sense of that by noting that the wave crests or nodes may be traveling faster than c, but that the wave itself – as a signal – cannot travel faster than light. It’s related to what we said about the difference between the group and phase velocity of a wave. The phase velocity – i.e. the nodes, which are mathematical points only – can travel faster than light, but the signal as such, i.e. the wave envelope in the illustration below, cannot.

Wave_group (1)

What is happening really is the following. A wave will hit one of these electron oscillators and start a so-called transient, i.e. a temporary response preceding the ‘steady state’ solution (which is not steady but dynamic – confusing language once again – so sorry!). So the transient settles down after a while and then we have an equilibrium (or steady state) oscillation which is likely to be out of phase with the driving field. That’s because there is damping: the electron oscillators resist before they go along with the driving force (and they continue to put up resistance, so the oscillation will die out when the driving force stops!). The illustration below shows how it works for the various cases:

delay and advance of phase

In case (b), the phase of the transmitted wave will appear to be delayed, which results in the wave appearing to travel slower, because the distance between the wave crests, i.e. the wavelength λ, is being shortened. In case (c), it’s the other way around: the phase appears to be advanced, which translates into a bigger distance between wave crests, or a lengthening of the wavelength, which translates into an apparent higher speed of the transmitted wave.

So here we just have a mathematical relationship between the (apparent) speed of a wave and its wavelength. The wavelength is the (apparent) speed of the wave (that’s the speed with which the nodes of the wave travel through space, or the phase velocity) divided by the frequency: λ = vp/f. However, from the illustration above, it is obvious that the signal, i.e. the start of the wave, is not earlier – or later – for wave (b) or (c). In fact, the start of the wave, in time, is exactly the same for all three cases. Hence, the electromagnetic signal travels at the same speed c, always.

While this may seem obvious, it’s quite confusing, and therefore I’ll insert one more illustration below. What happens when the various wave fronts of the traveling field hit the glass plate (coming from the top-left hand corner), let’s say at time t = t0, as shown below, is that the wave crests will have the same spacing along the surface. That’s obvious because we have a regular wave with a fixed frequency and, hence, a fixed wavelength λ0, here. Now, these wave crests must also travel together as the wave continues its journey through the glass, which is what is shown by the red and green arrows below: they indicate where the wave crest is after one and two periods (T and 2T) respectively.

Wave crest and frequency

To understand what’s going on, you should note that the frequency f of the wave that is going through the glass sheet and, hence, its period T, has not changed. Indeed, the driven oscillation, which was illustrated for the two possible cases above (n > 1 and n < 1), after the transient has settled down, has the same frequency (f) as the driving source. It must. Always. That being said, the driven oscillation does have that phase delay (remember: we’re in the (b) case here, but we can make a similar analysis for the (c) case). In practice, that means that the (shortest) distance between the crests of the wave fronts at time t = t0 and the crests at time t0 + T will be smaller. Now, that (shortest) distance between the crests of a wave is, of course, the wavelength, which is the speed of propagation divided by the frequency: λ = vp/f, with vp the speed of propagation, i.e. the phase velocity, of the wave, and f = 1/T. [The frequency f is the reciprocal of the period T – always. When studying physics, I found it’s useful to keep track of a few relationships that hold always, and this is one of them. :-)]

Now, the frequency is the same, but the wavelength is shortened as the wave travels through the various layers of electron oscillators, each causing a delay of phase – and, hence, a shortening of the wavelength, as shown above. But, if f is the same, and the wavelength is shorter, then vp cannot be equal to the speed of the incoming light, so vp ≠ c. The apparent speed of the wave traveling through the glass, and the associated shortening of the wavelength, can be calculated using Snell’s Law. Indeed, using the value n ≈ 1.33 that we calculated above, we get an apparent speed of light through the plate of v = c/n ≈ 0.75c and, therefore, a wavelength of the wave in the plate of λ ≈ 0.75·λ0.

OK. I’ve been way too lengthy here. Let’s sum it all up:

  • The field in the glass sheet must have the shape that’s depicted above: there is no other way. So that means the direction of ‘propagation’ has been changed. As mentioned above, however, the direction of propagation is a ‘mathematical’ property of the field: it’s not the speed of the ‘signal’.
  • Because the direction of propagation is normal to the wave front, it implies that the bending of light rays comes about because the effective speed of the waves is different in the various materials or, to be even more precise, because the electron oscillators cause a delay of phase.
  • While the speed and direction of propagation of the wave, i.e. the phase velocity, accurately describes the behavior of the field, it is not the speed with which the signal is traveling (see above). That is why it can be larger or smaller than c, and so it should not raise any eyebrows. For x-rays in particular, we have a refractive index smaller than one. [It’s only slightly less than one, though, and, hence, x-ray images still have a very good resolution. So don’t worry about your doctor getting a bad image of your broken leg. 🙂 In case you want to know more about this: just Google x-ray optics, and you’ll find loads of information. :-)]

Calculating the field

Are you still there? Probably not. If you are, I am afraid you won’t be there ten or twenty minutes from now. Indeed, you ain’t done nothing yet. All of the above was just setting the stage: we’re now ready for the pièce de résistance, as they say in French. We’re back at that illustration of the glass plate and the various fields in front and behind the plate. So we have electron oscillators in the glass plate. Indeed, as Feynman notes: “As far as problems involving light are concerned, the electrons behave as though they were held by springs. So we shall suppose that the electrons have a linear restoring force which, together with their mass m, makes them behave like little oscillators, with a resonant frequency ω0.”

So here we go:

1. From everything I wrote about oscillators in previous posts, you should remember that the equation for this motion can be written as m·[d²x/dt² + ω0²·x] = F. That’s just Newton’s Law. Now, the driving force F comes from the electric field and will be equal to F = qe·Es.

Now, we assume that we can choose the origin of time (i.e. the moment from which we start counting) such that the field is Es = E0·cos(ωt). To make calculations easier, we look at this as the real part of a complex function Es = E0·e^(iωt). So we get:

m·[d²x/dt² + ω0²·x] = qe·E0·e^(iωt)

We’ve solved this before: its solution is x = x0·e^(iωt). We can just substitute this in the equation above to find x0 (just substitute and take the first- and then second-order derivative of x indeed): x0 = qe·E0/[m·(ω0² − ω²)]. That, then, gives us the first piece in this lengthy derivation:

x = qe·E0·e^(iωt)/[m·(ω0² − ω²)]

Just to make sure you understand what we’re doing: this piece gives us the motion of the electrons in the plate. That’s all.
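If you don’t trust me (or Feynman) on that solution, you can let a computer do the differentiation. A minimal check with sympy, using purely symbolic quantities:

```python
import sympy as sp

t, omega, omega_0, q_e, m, E_0 = sp.symbols('t omega omega_0 q_e m E_0', positive=True)

x0 = q_e * E_0 / (m * (omega_0**2 - omega**2))   # the claimed amplitude
x  = x0 * sp.exp(sp.I * omega * t)               # trial solution x = x0 * e^(i*omega*t)

# Equation of motion: m * (d^2x/dt^2 + omega_0^2 * x) = q_e * E_0 * e^(i*omega*t)
lhs = m * (sp.diff(x, t, 2) + omega_0**2 * x)
rhs = q_e * E_0 * sp.exp(sp.I * omega * t)
print(sp.simplify(lhs - rhs))   # prints 0, so the trial solution works
```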

2. Now, we need an equation for the field produced by a plane of oscillating charges, because that’s what we’ve got here: a plate, or a plane, of oscillating charges. That’s a complicated derivation in its own right, which I won’t do here. I’ll just refer to another chapter of Feynman’s Lectures (Vol. I-30-7) and give you the result (if I didn’t do that, this post would be even longer than it already is):

E = −η·qe·v(t − z/c)/(2ε0·c), with v the velocity of the charges

This formula introduces just one new variable, η, which is the number of charges per unit area of the plate (as opposed to N, which was the number of charges per unit volume in the plate), so that’s quite straightforward. Less straightforward is the formula itself: this formula says that the magnitude of the field is proportional to the velocity of the charges at time t – z/c, with z the shortest distance from P to the plane of charges. That’s a bit odd, actually, but so that’s the way it comes out: “a rather simple formula”, as Feynman puts it.

In any case, let’s use it. Differentiating x to get the velocity of the charges, and plugging it into the formula above yields:

Ea = −[η·qe²·iω/(2ε0·m·c·(ω0² − ω²))]·E0·e^(iω(t − z/c))

Note that this is only Ea, the additional field generated by the oscillating charges in the glass plate. To get the total electric field at P, we still have to add Es, i.e. the field generated by the source itself. This may seem odd, because you may think that the glass plate sort of ‘shields’ the original field but, no, as Feynman puts it: “The total electric field in any physical circumstance is the sum of the fields from all the charges in the universe.”

3. As mentioned above, z is the distance from P to the plate. Let’s look at the set-up here once again. The transmitted wave, or Eafter the plate as we shall denote it, consists of two components: Es and Ea. Es here will be equal to (the real part of) Es = E0·e^(iω(t − z/c)). Why t − z/c instead of just t? Well… We’re looking at Es here as measured in P, not at Es at the glass plate itself.

radiation and transparent sheet

Now, we know that the wave ‘travels slower’ through the glass plate, in the sense that its phase velocity is less, as should be clear from the rather lengthy explanation on phase delay above (or, if n were smaller than one, we’d have a phase advance). So if the glass plate is of thickness Δz, and the phase velocity is v = c/n, then the time it will take to travel through the glass plate will be Δz/(c/n) instead of Δz/c (speed is distance divided by time and, hence, time is distance divided by speed). So the additional time that is needed is Δt = Δz/(c/n) – Δz/c = nΔz/c – Δz/c = (n−1)Δz/c. That, then, implies that Eafter the plate is equal to a rather monstrously looking expression:

Eafter plate = E0·e^(iω[t − (n−1)Δz/c − z/c]) = e^(−iω(n−1)Δz/c)·E0·e^(iω(t − z/c))

We get this by just replacing t with t − Δt.

So what? Well… We have a product of two complex numbers here and so we know that this involves adding angles – or subtracting angles in this case, rather, because we’ve got a minus sign in the exponent of the first factor. So, all that we are saying here is that the insertion of the glass plate retards the phase of the field by an amount equal to ω(n−1)Δz/c. What about that sum Eafter the plate = Es + Ea that we were supposed to get?

Well… We’ll use the formula for a first-order (linear) approximation of an exponential once again: e^x ≈ 1 + x. Yes. We can do that because Δz is assumed to be very small, infinitesimally small in fact. [If it is not, then we’ll just have to assume that the plate consists of a lot of very thin plates.] So we can write that e^(−iω(n−1)Δz/c) ≈ 1 − iω(n−1)Δz/c, and then we, finally, get that sum we wanted:

Eafter plate = E0·e^(iω(t − z/c)) − iω(n−1)Δz·E0·e^(iω(t − z/c))/c

The first term is the original Es field, and the second term is the Ea field. Geometrically, they can be represented as follows:

Addition of fields

Why is Ea perpendicular to Es? Well… Look at the –i = 1/i factor. Multiplication with –i amounts to a clockwise rotation by 90°, and then just note that the magnitude of the vector must be small because of the ω(n-1)Δz/c factor.
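You can check that geometry numerically in a couple of lines. In the little sketch below, ε is just a small made-up number standing in for ω(n−1)Δz/c:

```python
import cmath

eps = 0.01                      # stands in for omega*(n-1)*dz/c, assumed small
E_s = 1.0 + 0.0j                # the source field phasor (amplitude 1, phase 0)
E_a = -1j * eps * E_s           # the small additional field: -i rotates E_s by 90 degrees clockwise

E_total = E_s + E_a
print(cmath.phase(E_total))     # approximately -eps: the sum is the original phasor, slightly retarded in phase
print(abs(E_total))             # approximately 1: the amplitude is (almost) unchanged
```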

4. By now, you’ve either stopped reading (most probably) or, else, you wonder what I am getting at. Well… We have two formulas for Ea now:

Ea = −[η·qe²·iω/(2ε0·m·c·(ω0² − ω²))]·E0·e^(iω(t − z/c))

and Ea = −iω(n−1)Δz·E0·e^(iω(t − z/c))/c

Equating both yields:

(n − 1)·Δz = η·qe²/[2ε0·m·(ω0² − ω²)]

But η, the number of charges per unit area, must be equal to NΔz, with N the number of charges per unit volume. Substituting and then cancelling the Δz finally gives us the formula we wanted, and that’s the classical dispersion relation whose properties we explored above:

n = 1 + N·qe²/[2ε0·m·(ω0² − ω²)]
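Just to get a feel for the formula we’ve derived, here’s a small numerical sketch. The material parameters (the density N and the resonant frequency ω0) are invented round numbers, picked only to show the qualitative behavior: n is slightly above one below resonance, and slightly below one above it.

```python
from scipy.constants import e, m_e, epsilon_0

N       = 3e25          # assumed number of electrons per m^3 (made-up, gas-like density)
omega_0 = 2e16          # assumed resonant (angular) frequency, somewhere in the UV

def n_of(omega):
    """Classical dispersion relation: n = 1 + N*qe^2 / (2*eps0*m*(omega_0^2 - omega^2))."""
    return 1 + N * e**2 / (2 * epsilon_0 * m_e * (omega_0**2 - omega**2))

for omega in (3e15, 1e16, 3e16):       # below, near and above resonance
    print(omega, n_of(omega))
```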

Absorption and the absorption index

The model we used to explain the index of refraction had electron oscillators at its center. In the analysis we did, we did not introduce any damping factor. That’s obviously not correct: it would mean that a glass plate, once it had been illuminated, would continue to emit radiation, because the electrons would oscillate forever. When introducing damping, the denominator in our dispersion relation becomes m(ω0² − ω² + iγω), instead of m(ω0² − ω²). We derived this in our posts on oscillators. What it means is that the oscillator continues to oscillate with the same frequency as the driving force (i.e. not its natural frequency) – so that doesn’t change – but that there is an envelope curve, ensuring the oscillation dies out when the driving force is no longer being applied. The γ factor is the damping factor and, hence, determines how fast the damping happens.

We can see what it means by writing the complex index of refraction as n = n’ – in’’, with n’ and n’’ real numbers, describing the real and imaginary part of n respectively. Putting that complex n in the equation for the electric field behind the plate yields:

Eafter plate = e^(−ωn’’Δz/c)·e^(−iω(n’−1)Δz/c)·E0·e^(iω(t − z/c))

This is the same formula that we had derived already, but with an extra exponential factor: e^(−ωn’’Δz/c). It’s an exponential factor with a real exponent, because there were two i‘s that cancelled. The e^(−x) function has a familiar shape: e^(−x) equals 1 for x = 0, and lies between 0 and 1 for any positive x. Here, that value will depend on the thickness of the glass sheet. Hence, it is obvious that the glass sheet weakens the wave as it travels through it. Hence, the wave must also come out with less energy (the energy being proportional to the square of the amplitude). That’s no surprise: the damping we put in for the electron oscillators is a friction force and, hence, must cause a loss of energy.

Note that it is the n’’ term – i.e. the imaginary part of the refractive index n – that determines the degree of absorption (or attenuation, if you want). Hence, n’’ is usually referred to as the “absorption index”.
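Numerically, the attenuation is a one-liner. The values of ω, n’’ and the thickness Δz below are purely illustrative:

```python
import math

c      = 3e8        # speed of light (m/s)
omega  = 3e15       # angular frequency of the light (rad/s), roughly visible range
n_imag = 1e-7       # assumed absorption index n'' (made-up, very weakly absorbing material)
dz     = 0.01       # plate thickness in meters

amplitude_factor = math.exp(-omega * n_imag * dz / c)
intensity_factor = amplitude_factor**2          # intensity goes as the square of the amplitude
print(amplitude_factor, intensity_factor)
```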

The complete dispersion relation

We need to add one more thing in order to get a fully complete dispersion relation. It’s the last thing: then we have a formula which can really be used to describe real-life phenomena. The one thing we need to add is that atoms have several resonant frequencies – even an atom with only one electron, like hydrogen! In addition, we’ll usually want to take into account the fact that a ‘material’ actually consists of various chemical substances, so that’s another reason to consider more than one resonant frequency. The formula is easily derived from our first formula (see the previous post), where we assumed there was only one resonant frequency. Indeed, when we have Nk electrons per unit of volume, whose natural frequency is ωk and whose damping factor is γk, then we can just add the contributions of all oscillators and write:

n = 1 + (qe²/2ε0·m)·Σk Nk/(ωk² − ω² + iγkω)

The index described by this formula yields the following curve:

Several resonant frequencies

So we have a curve with a positive slope, and a value n > 1, for most frequencies, except for a very small range of ω’s for which the slope is negative, and for which the index of refraction has a value n < 1. As Feynman notes, these ω’s – and the negative slope – are sometimes referred to as ‘anomalous’ dispersion but, in fact, there’s nothing ‘abnormal’ about it.

The interesting thing is the iγkω term in the denominator, i.e. the imaginary component of the index, and how that compares with the (real) “resonance term” ωk² − ω². If the resonance term becomes very small compared to iγkω, then the index will become almost completely imaginary, which means that the absorption effect becomes dominant. We can see that effect in the spectrum of light that we receive from the sun: there are ‘dark lines’, i.e. frequencies that have been strongly absorbed at the resonant frequencies of the atoms in the Sun and its ‘atmosphere’, and that allows us to actually tell what the Sun’s ‘atmosphere’ (or that of other stars) actually consists of.
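If you want to play with this, here is a small numerical sketch of that complete dispersion relation. The two resonances (their densities Nk, frequencies ωk and damping factors γk) are entirely invented; the point is only to show the behavior described above: the real part dips below one just above each resonance, and the absorption index n’’ peaks right at it.

```python
from scipy.constants import e, m_e, epsilon_0

# two invented oscillator species: (N_k per m^3, omega_k in rad/s, gamma_k in rad/s)
oscillators = [(2e25, 1.5e16, 1e14), (1e25, 3.0e16, 2e14)]

def n_complex(omega):
    """n = 1 + (qe^2 / 2*eps0*m) * sum_k N_k / (omega_k^2 - omega^2 + i*gamma_k*omega)."""
    s = sum(N_k / (omega_k**2 - omega**2 + 1j * gamma_k * omega)
            for N_k, omega_k, gamma_k in oscillators)
    return 1 + e**2 / (2 * epsilon_0 * m_e) * s

for omega in (1.0e16, 1.5e16, 2.0e16):      # below, at and between the resonances
    n = n_complex(omega)
    print(omega, n.real, -n.imag)           # -Im(n) is the absorption index n''
```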

So… There we are. I am aware of the fact that this has been the longest post of all I’ve written. I apologize. But so it’s quite complete now. The only piece that’s missing is something on energy and, perhaps, some more detail on these electron oscillators. But I don’t think that’s so essential. It’s time to move on to another topic, I think.

Some content on this page was disabled on June 17, 2020 as a result of a DMCA takedown notice from Michael A. Gottlieb, Rudolf Pfeiffer, and The California Institute of Technology. You can learn more about the DMCA here:

https://wordpress.com/support/copyright-and-the-dmca/

Euler’s spiral

Pre-scriptum (dated 26 June 2020): Most of the relevant illustrations in this post were removed as a result of an attack by the dark force. Too bad, because I liked this post. In any case, despite the removal of the illustrations, you should be able to reconstruct the main story line.

Original post:

When talking diffraction, one of the more amusing curves is the curve showing the intensity of light near the edge of a shadow. It is shown below.

Fig 30-9

Light becomes more intense as we move away from the edge, then it overshoots (so it is brighter than further away), then the intensity wobbles and oscillates, to finally ‘settle’ at the intensity of the light elsewhere.

How do we get a curve like that? We get it through another amusing curve: the Cornu spiral (which was re-named as the Euler spiral for some reason I don’t understand), which we’ve also encountered when adding probability amplitudes. Let me first depict the ‘real’ situation below: we have an opaque object AB, so no light goes through AB itself. However, the light that goes past it casts a shadow on a screen, which is denoted as QPR here. And so the curve above shows the intensity of the light near the edge of that shadow.

Fig 30-7

The first weird thing to note is what I said about diffraction of light through a slit (or a hole – in somewhat less respectful language) in my previous post: the diffraction patterns can be explained if we assume that there are sources distributed, with uniform density, across the open holes. This is a deep mystery, which I’ll attempt to explain later. As for now, I can only state what Feynman has to say about it: “Of course, actually there are no sources at the holes. In fact, that is the only place that there are certainly no sources. Nevertheless, we get the correct diffraction pattern by considering the holes to be the only places where there are sources.”

So we do the same here. We assume that we have a series of closely spaced ‘antennas’, or sources, starting from B, up to D, E, C and all the way up to infinity, and so we need to add the contributions – or the waves – from these sources to calculate the intensity at all of the points on the screen. Let’s start with the (random) point P. P defines the inflection point D: we’ll say the phase there is zero (because we can, of course, choose our point in time so as to make it zero). So we’ll associate the contribution from D with a tiny vector (an infinitesimal vector) with angle zero. That is shown below: it’s the ‘flat’ (horizontal) vector pointing straight east at the very center of this so-called Cornu spiral.

Fig 30-8

Now, in the neighborhood of D, i.e. just below or above point D, the phase difference will be very small, because the distance from those points near D to P will not differ much from the distance between D and P (i.e. the distance DP). However, as h increases, the phase difference will become larger and larger. It will not increase linearly with h: because of the geometry involved, the path difference – and, hence, the phase difference (remember – from the previous post – that the phase difference was the product of the wave number and the difference in distance) – will increase proportionally to the square of h. In fact, using similar triangles once again, we can show that this path difference EF can be approximated by EF ≈ h²/2s. However, don’t lose sleep if you wouldn’t manage to figure that out. 🙂

The point to note is that, when you look at that spiral above, the angle of each vector that we’re adding increases more and more, and that’s why we get a spiral, and not a polygon in a circle such as the one we encountered in our previous post: the phase differences there were proportional to the distance and, hence, each vector added a constant angle to the previous one. Likewise, if we go down from D, to the edge B, the angles will decrease. Of course, if we’re adding contributions to get the amplitude or intensity for point P, we will not get any contributions from points below B. The last (or, I should say, the first) contribution that we get is denoted by the vector BP on that spiral curve, so if we want to get the total contribution, then we have to start adding vectors from there. [Don’t worry: you’ll understand why the other vectors, ‘down south’, are there in a few minutes.]

So we start from BP and go all the way… Well… You see that, once we’re ‘up north’, in the center of the upper-most spiral, we’re not adding much anymore, because the additional vectors are just sharply changing direction and going round and round and round. In short, most of the contribution to the amplitude of the resultant vector BP∞ is given by points near D. Now, we have chosen point P randomly, and you can easily see from that Cornu spiral that the amplitude – or the intensity rather (which is the square of the amplitude) – of that vector BP∞ increases initially, to reach some maximum, depending upon where P is located above B, but then it falls and oscillates indeed, producing the curve with which we started this post.
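For those who want to reproduce that curve, the spiral is given by the so-called Fresnel integrals, which scipy happens to provide. The sketch below is a bare-bones version: the scaling of the dimensionless coordinate v (which is proportional to the height h above the edge) is assumed, and the point is only to reproduce the overshoot-and-wobble behavior qualitatively.

```python
import numpy as np
from scipy.special import fresnel

# v is a dimensionless coordinate along the spiral (proportional to the height h above the edge)
v = np.linspace(-2, 6, 400)
S, C = fresnel(v)

# For edge diffraction, the resultant vector runs from the lower 'eye' of the spiral,
# at (-1/2, -1/2), to the point (C(v), S(v)); its squared length gives the intensity.
amplitude_sq = (C + 0.5)**2 + (S + 0.5)**2
intensity = 0.5 * amplitude_sq       # normalized so the far-from-the-edge intensity tends to 1

print(intensity[0])                  # deep in the shadow: close to 0
print(intensity.max())               # the overshoot: about 1.37, brighter than the open-field value
print(intensity[-1])                 # far from the edge: oscillating around 1
```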

OK. […] So what else do we have here? Well… That Cornu spiral also shows how we should add arrows to get the intensity at point Q. We’d be adding arrows in the upper-most spiral only and, hence, we would not get much of a total contribution as a result. That’s what’s marked by vector BQ. On the other hand, if we were adding contributions to calculate the intensity at a point much higher than P, i.e. R, then we’d be using pretty much all of the arrows, down from the spiral ‘south’ all the way up to the spiral ‘north’. So that’s BR obviously and, as you can see, most of the contribution comes, once again, from points near D, so that’s the points near the edge. [So now you know why we have an infinite number of arrows in both directions: we need to be able to calculate the intensity at any point on the screen really, below or above P.]

OK. What else? Well… Nothing. This is it really − for the moment that is. Just note that we’re not adding probability amplitudes here (unlike what we did a couple of months ago). We’re adding vectors representing something real here: electric field vectors. [As for how ‘real’ they are: I’ll entertain you about that later. :-)]

This was rather short, wasn’t it? I hope you liked it because… Well… What will follow is actually much more boring, because it involves a lot more formulas. However, these formulas will help us get where we want to get, and that is to understand – somehow, if only from a classical perspective – why that empty space acts like an array of electromagnetic radiation sources.

Indeed, when everything is said and done, that’s the deep mystery of light really. Really really deep.

Some content on this page was disabled on June 17, 2020 as a result of a DMCA takedown notice from Michael A. Gottlieb, Rudolf Pfeiffer, and The California Institute of Technology. You can learn more about the DMCA here:

https://wordpress.com/support/copyright-and-the-dmca/

Diffraction gratings

Pre-scriptum (dated 26 June 2020): Some of the relevant illustrations in this post were removed as a result of an attack by the dark force. Too bad, because I liked this post. In any case, despite the removal of the illustrations, I think you will still be able to reconstruct the main story line.

Original post:

Diffraction gratings are fascinating. The iridescent reflections from the grooves of a compact disc (CD), or from oil films, soap bubbles: it is all the same principle (or closely related – to be precise). In my April, 2014 posts, I introduced Feynman’s ‘arrows’ to explain it. Those posts talked about probability amplitudes, light as a bundle of photons, quantum electrodynamics. They were not wrong. In fact, the quantum-electrodynamical explanation is actually the only one that’s 100% correct (as far as we ‘know’, of course). But it is also more complicated than the classical explanation, which just explains light as waves.

To understand the classical explanation, one first needs to understand how electromagnetic waves interfere. That’s easy, you’ll say. It’s all about adding waves, isn’t it? And we have done that before, haven’t we? Yes. We’ve done it for sinusoidal waves. We also noted that, from a math point of view, the easiest way to go about it was to use vectors or complex numbers, and equate the real parts of the complex numbers with the actual physical quantities, i.e. the electric field in this case.

You’re right. Let’s continue to work with sinusoidal waves, but instead of having just two waves, we’ll consider a whole array of sources, because that’s what we’ll need to analyze when analyzing a diffraction grating.

First the simple case: two sources

Let’s first re-analyze the simple situation: two sources – or two dipole radiators as I called them in my previous post. The illustration below gives a top view of two such oscillators. They are separated, in the north-south direction, by a distance d.

Fig 29-10

Is that realistic? It is for radio waves: the wavelength of a 1 megahertz radio wave is 300 m (remember: λ = c/f). So, yes, we can separate two sources by a distance in the same order of magnitude as the wavelength of the radiation, but, as Feynman writes: “We cannot make little optical-frequency radio stations and hook them up with infinitesimal wires and drive them all with a given phase.”

For light, it will work differently – and we’ll describe how, but not now. As for now, we should continue with our radio waves.

The illustration above assumes that the radiation from the two sources is sinusoidal and has the same (maximum) amplitude A, but that the two sources might be out of phase: we’ll denote the difference by α. Hence, we can represent the radiation emitted by the two sources by the real part of the complex numbers Aeiωt and Aei(ωt + α) respectively. Now, we can move our detector around to measure the intensity of the radiation from these two antennas. If we place our detector at some point P, sufficiently far away from the sources, then the angle θ will result in another phase difference, due to the difference in distance from point P to the two oscillators. From simple geometry, we know that this difference will be equal to d·sinθ. The phase difference due to the distance difference will then be equal to the product of the wave number k (i.e. the rate of change of the phase (expressed in radians) with distance, i.e. per meter) and that distance d·sinθ. So the phase difference at arrival (i.e. at point P) would be

Φ2 – Φ1 = α + k· d·sinθ = α + (2π/λ)·d·sinθ

That’s pretty obvious, but let’s play a bit with this, in order to make sure we understand what’s going on. The illustration below gives two examples: α = 0 and α = π.

Fig 29-5

How do we get these numbers 0, 2 and 4, which indicate the intensity, i.e. the amount of energy that the field carries past per second, which is proportional to the square of the field, averaged in time? [If it would be (visible) light, instead of radio waves, the intensity would be the brightness of the light.]

Well… In the first case, we have α = 0 and d = λ/2 and, hence, at an angle of 30 degrees, we have d·sin(30°) = (λ/2)(1/2) = λ/4. Therefore, Φ2 – Φ1 = α + (2π/λ)·d·sinθ = 0 + (2π/λ)·(λ/4) = π/2. So what? Well… Let’s add the waves. We will have some combined wave with amplitude AR and phase ΦR:

AR·e^(i(ωt + ΦR)) = A1·e^(i(ωt + Φ1)) + A2·e^(i(ωt + Φ2))

Now, to calculate the length of this ‘vector’, i.e. the amplitude AR, we take the product of this complex number and its complex conjugate, and that will give us the length squared, and then we multiply it all out and so on and so on. To make a long story short, we’ll find that

AR² = A1² + A2² + 2A1A2·cos(Φ2 – Φ1)

The last term in this sum is the interference effect, and so that’s equal to zero in the case we’ve been studying above (α = 0, d = λ/2 and θ = 30°), so we get twice the intensity of one oscillator only. The other cases can be worked out in the same way.
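Here’s the same example in a few lines of code, just adding the two phasors; it reproduces the 0, 2 and 4 values mentioned above:

```python
import cmath, math

def intensity(alpha, d_over_lambda, theta_deg, A=1.0):
    """Intensity (relative to one oscillator) of two equal oscillators with intrinsic
    phase difference alpha and spacing d, seen at angle theta from the normal."""
    phi = alpha + 2 * math.pi * d_over_lambda * math.sin(math.radians(theta_deg))
    E = A * cmath.exp(0j) + A * cmath.exp(1j * phi)   # add the two phasors
    return abs(E)**2

print(intensity(0, 0.5, 0))    # 4: both in phase, twice the field, four times the intensity
print(intensity(0, 0.5, 30))   # 2: a 90-degree phase difference, no net interference term
print(intensity(0, 0.5, 90))   # ~0: opposite phase, complete cancellation
```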

Now, you should not think that the pattern is always symmetric, or simple, as the two illustrations below make clear.

Fig 29-6

Fig 29-7

With more oscillators, the patterns become even more interesting. The illustration below shows part of the intensity pattern of a six-dipole antenna array:

Fig 29-8

Let’s look at that now indeed: arrays with n oscillators.

Arrays with n oscillators

If we have six oscillators, like in the illustration above, we have to add something like this:

R = A[cos(ωt) + cos(ωt + Φ) + cos(ωt + 2Φ) + … + cos(ωt + 5Φ)]

From what we wrote above, it is obvious that the phase difference Φ can have two causes: the oscillators may be driven differently in phase, or we may be looking at them at an angle so that there is a difference in time delay. Hence, we have the same formula as the one above:

Φ = α + (2π/λ)·d·sinθ

Now, we have an interesting geometrical approach to finding the net amplitude AR. We can, once again, consider the various waves as vectors and add them, as shown below.

Fig 30-1

The length of all vectors is the same (A), and then we have the phase difference, i.e. the different angles: zero for A1, Φ for A2, 2Φ for A3, etcetera. So as we’re adding these vectors, we’re going around and forming an equiangular polygon with n sides, with the vertices (corner points) lying on a circle with radius r. It requires just a bit of trigonometry to establish that the following equality must hold: A = 2r·sin(Φ/2). So that fixes r. We also have that the large angle OQT equals nΦ and, hence, AR = 2r·sin(nΦ/2). We can now combine the results to find the following amplitude and intensity formula:

AR = A·sin(nΦ/2)/sin(Φ/2) and, hence, I = I0·sin²(nΦ/2)/sin²(Φ/2)

This formula is obvious for n = 1 and for n = 2: it gives us the results which were shown above already. But here we want to know how this thing behaves for large n. It is easy to see that the numerator above, i.e. sin²(nΦ/2), oscillates much faster than the denominator, sin²(Φ/2), and that both are – obviously – smaller than or equal to 1. It can be demonstrated that this function of the angle Φ reaches its maximum value for Φ = 0. Indeed, taking the limit gives us I = I0·n². [We can intuitively see this because, if we express the angle in radians, we can substitute Φ/2 and nΦ/2 for sin(Φ/2) and sin(nΦ/2), and then we can eliminate the (Φ/2)² factor to get n².]

It’s a bit more difficult to understand what happens next. If Φ becomes a bit larger, the ratio of the two sines begins to fall off (so it becomes smaller than n²). Note that the numerator, i.e. sin²(nΦ/2), will be equal to one if nΦ/2 = π/2, i.e. if Φ = π/n, and the ratio sin²(nΦ/2)/sin²(Φ/2) then becomes sin²(π/2)/sin²(π/2n) = 1/sin²(π/2n). Again, if we assume that n is (very) large, we can approximate and write that this ratio is more or less equal to 1/(π²/4n²) = 4n²/π². That means that the intensity there will be 4/π² times the intensity of the beam at the maximum, i.e. 40.53% of it. That’s the point at nΦ/2π = 0.5 on the graph below.

Fig 30-2

The graph above has a re-scaled vertical as well as a re-scaled horizontal axis. Indeed, instead of I, the vertical axis shows I/n²I0, so the maximum value is 1. And the horizontal axis does not show Φ but nΦ/2π, so if Φ = π/n, then nΦ/2π = 0.5 indeed. [Don’t worry about the dotted curve: that’s the solid-line curve multiplied by 10: it’s there to make sure you see what’s going on, as this ratio of those sines becomes very small very rapidly indeed.]

So, once we’re past that 40.53% point, we get at our first minimum, which is reached at nΦ/2π = 1 or Φ = 2π/n. The numerator sin²(nΦ/2) equals sin²(π) = 0 there indeed, so the whole ratio becomes zero. Then it goes up again, to our second maximum, which we get when our numerator comes close to one again, i.e. when sin²(nΦ/2) ≈ 1. That happens when nΦ/2 = 3π/2, or Φ = 3π/n. Again, when n is (very) large, Φ will be very small, and so we can substitute Φ²/4 for the denominator sin²(Φ/2). We then get a ratio equal to 4n²/9π², or an intensity equal to 4n²I0/9π², i.e. only 4.5% of the intensity at the (first) maximum. So that’s tiny. [Well… All is relative, of course. :-)] We can go on and on like that but that’s not the point here: the point is that we have a very sharp central maximum with very weak subsidiary maxima on the sides.
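A quick numerical check of those ratios – with n made large, so that the small-angle approximations used above hold:

```python
import math

def relative_intensity(phi, n):
    """I/I0 = sin^2(n*phi/2) / sin^2(phi/2) for an array of n equal oscillators."""
    return (math.sin(n * phi / 2) / math.sin(phi / 2))**2

n = 1000
print(relative_intensity(1e-9, n) / n**2)            # ~1: the central maximum is n^2 * I0
print(relative_intensity(math.pi / n, n) / n**2)     # ~0.405: the 4/pi^2 point
print(relative_intensity(2 * math.pi / n, n) / n**2) # ~0: the first minimum
print(relative_intensity(3 * math.pi / n, n) / n**2) # ~0.045: the first subsidiary maximum
```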

But what about that big lobe at 30 degrees on that graph with the six-dipole antenna? Relax. We’re not done yet with this ‘quick’ analysis. Let’s look at the general case from yet another angle, so to say. 🙂

The general case

To focus our minds, we’ve depicted that array with n oscillators below. Once again, we note that the phase difference between two sources, one to the next, will depend on (1) the intrinsic phase difference between them, which we denote by α, and (2) the time delay because we’re observing the system in a given direction θ from the normal, which effect we calculated as equal to (2π/λ)·d·sinθ. So the whole effect is Φ = α + (2π/λ)·d·sinθ = α + k·d·sinθ, with k the wave number.

To make things simple, let’s first assume that α = 0. We’re then in the case that we described above: we’ll have a sharp maximum at Φ = 0, so that means θ = 0. It’s easy to see why: all oscillators are in phase and so we have maximum positive (or constructive) interference.

Let’s now examine the first minimum. When looking back at that geometrical interpretation, with the polygon, the first minimum occurs when all the arrows come back to the starting point: we’ve completed a full circle, which happens when nΦ = 2π, i.e. when Φ = 2π/n. So what’s going on here? Well… If we put that value in our formula Φ = α + (2π/λ)·d·sinθ, we get 2π/n = 0 + (2π/λ)·d·sinθ or, getting rid of the 2π factor, n·d·sinθ = λ.

Now, n·d is the total length of the array, i.e. L, and, from the illustration above, we see that n·d·sinθ = L·sinθ = Δ. So we have that n·d·sinθ = λ = Δ. Hence, Δ is equal to one wavelength. That means that the total phase difference between the first and the last oscillator is equal to 2π, and the contributions of all the oscillators in-between are uniformly distributed in phase between 0° and 360°. The net result is a vector AR with amplitude AR = 0 and, hence, the intensity is zero as well.

OK, you’ll say, you’re just repeating yourself here. What about the other lobe or lobes? Well… Let’s go back to that maximum. We had it at Φ = 0, but we will also have it at Φ = 2π, and at Φ = 4π, and at Φ = 6π etcetera, etcetera. We’ll have such a sharp maximum – the maximum, in fact – at any Φ = m⋅2π, where m is any integer. Now, plugging that into the Φ = α + (2π/λ)·d·sinθ formula (again, assuming that α = 0), we get m⋅2π = (2π/λ)·d·sinθ or d·sinθ = mλ.

While that looks very similar to our n·d·sinθ = λ = Δ condition for the (first) minimum, we’re now not looking at that total path difference Δ but at the path difference δ between two successive sources, and we have δ = Δ/n = mλ. What’s being said here is that each successive source is out of phase by a whole number of cycles (m·360°) and, because being out of phase by 360° (or a multiple of it) obviously means that you’re in phase once again, all sources are, once again, contributing in phase and produce a maximum that is just as good as the one we had for m = 0. Now, these maxima will also have a (first) minimum described by that other formula above, and so that’s how we get that pattern of lobes with weak ‘side lobes’.

Conditions

Now, the conditions presented above for maxima and minima obviously all depend on the distance d, i.e. the spacing of the array, and the wavelength λ. That brings us to an interesting point: if d is smaller than λ (so if the spacing is smaller than one wavelength), then (d/λ)·sinθ = m < 1, so the only solution for m is m = 0. So we only have one beam in that case, the so-called zero-order beam centered at θ = 0. [Note that we also have a beam in the opposite direction.]

The point to note is that we can only have subsidiary great maxima if the spacing d of the array is greater than the wavelength λ. If we have such subsidiary great maxima, we’ll call them first-order, second-order etcetera beams, according to the value m.
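Putting some numbers on these conditions: the beams come out at angles θ with sinθ = mλ/d, and only the orders with mλ/d ≤ 1 actually exist. The grating spacing and the wavelength in the sketch below are just illustrative choices (they’re not taken from the text above):

```python
import math

d   = 1e-3 / 600        # grating spacing: 600 lines per millimeter (illustrative)
lam = 633e-9            # wavelength: red helium-neon laser light (illustrative)

m = 0
while m * lam / d <= 1:
    theta = math.degrees(math.asin(m * lam / d))
    print(f"order m = {m}: beam at {theta:.1f} degrees")
    m += 1
# with these numbers only m = 0, 1 and 2 exist: m = 3 would need sin(theta) > 1
```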

Diffraction gratings

We are now, finally, ready to discuss diffraction gratings. A diffraction grating, in its simplest form, is a plane glass sheet with scratches on it: several hundred grooves, or several thousand even, to the millimeter. That is because the spacing has to be of the same order of magnitude as the wavelength of light, so that’s 400 to 700 nanometer (nm) indeed – with the 400-500 nm range corresponding to violet-blue light, and the longer wavelengths, up to about 700 nm, corresponding to red light. Remember, a nanometer is a billionth of a meter (1×10⁻⁹ m), so even one thousandth of a millimeter is 1000 nanometer, i.e. longer than the wavelength of red light. Of course, from what we wrote above, it is obvious that the spacing d must be wider than the wavelength of interest to cause second- and third-order beams and, therefore, diffraction but, still, the order of magnitude must be the same to produce anything of interest. Isn’t it amazing that scientists were able to produce such diffraction experiments at the end of the 18th century already? One of the earliest apparatuses, made in 1785 by the first director of the United States Mint, used hair strung between two finely threaded screws. In any case, let’s go back to the physics of it.

In my previous post, I already noted Feynman’s observation that “we cannot literally make little optical-frequency radio stations and hook them up with infinitesimal wires and drive them all with a given phase.” What happens is something similar to the following set-up, and I’ll quote Feynman again (Vol. I, p. 30-3), just because it’s easier to quote than to paraphrase: “Suppose that we had a lot of parallel wires, equally spaced at a spacing d, and a radio-frequency source very far away, practically at infinity, which is generating an electric field which arrives at each one of the wires at the same phase. Then the external electric field will drive the electrons up and down in each wire. That is, the field which is coming from the original source will shake the electrons up and down, and in moving, these represent new generators. This phenomenon is called scattering: a light wave from some source can induce a motion of the electrons in a piece of material, and these motions generate their own waves.”

When Feynman says “light” here, he means electromagnetic radiation in general. But so what’s happening with visible light? Well… All of the glass in that piece that makes up our diffraction grating scatters light, but the notches in it scatter differently than the rest of the glass. The light going through the ‘rest of the glass’ goes straight through (a phenomenon which should be explained in itself, but we won’t do that here), but the notches act as sources and produce secondary or even tertiary beams, as illustrated by the picture below, which shows a flash of light seen through such a grating, showing three diffracted orders: the order m = 0 corresponds to a direct transmission of light through the grating, while the first-order beams (m = +1 and m = -1) show colors with increasing wavelengths (from violet-blue to red), being diffracted at increasing angles.

The ‘mechanics’ are very complicated, and the correct explanation in physics involves a good understanding of quantum electrodynamics, which we touched upon in our April, 2014 posts. I won’t do that here, because here we are introducing the so-called classical theory only. This classical theory does away with all of the complexity of a quantum-electrodynamical explanation and replaces it by what is now known as the Huygens-Fresnel Principle, which was first formulated in 1678 (!), and which basically states that “every point which a luminous disturbance reaches becomes a source of a spherical wave, and the sum of these secondary waves determines the form of the wave at any subsequent time.”

500px-Refraction_-_Huygens-Fresnel_principle

This comes from Wikipedia, as do the illustrations below. It does not only ‘explain’ diffraction gratings, but it also ‘explains’ what happens when light goes through a slit, cf. the second (animated) illustration.

500px-Refraction_on_an_aperture_-_Huygens-Fresnel_principle

Huygens_Fresnel_Principle

Now that, light being diffracted as it is going through a slit, is obviously much more mysterious than a diffraction grating – and, you’ll admit, a diffraction grating is already mysterious enough, because it’s rather strange that only certain points in the grating (i.e. the notches) would act as sources, isn’t it? Now, if that’s difficult to understand, it’s even more difficult to understand why an empty space, i.e. a slit, would act as a diffraction grating! However, because this post has become way too long already, we’ll leave this discussion for later.

Some content on this page was disabled on June 17, 2020 as a result of a DMCA takedown notice from Michael A. Gottlieb, Rudolf Pfeiffer, and The California Institute of Technology. You can learn more about the DMCA here:

https://wordpress.com/support/copyright-and-the-dmca/

Light and radiation

Pre-scriptum (dated 26 June 2020): Most of the relevant illustrations in this post were removed as a result of an attack by the dark force. In any case, you will probably prefer to read how my ideas on the theory of light and matter have evolved. If anything, posts like this document the historical path to them.

Original post:

Introduction: Scale Matters

One of the points which Richard Feynman, as a great physics teacher, does admirably well is to point out why scale matters. In fact, ‘old’ physics is not incorrect per se. It’s just that ‘new’ physics analyzes stuff at a much smaller scale.

For example, Snell’s Law, or Fermat’s Principle of Least Time, which were ‘discovered’ some 400 years ago – and the underlying idea is actually older, because they formalize something that the Greeks had already found out: refraction of light, as it travels from one medium (air, for example) into another (water, for example) – are still fine when studying focusing lenses and mirrors, i.e. geometrical optics. The dimensions of the analysis, or the equipment involved (i.e. the lenses or the mirrors), are huge as compared to the wavelength of the light and, hence, we can effectively look at light as a beam that travels from one point to another in a straight line, that bounces off a surface, or that gets refracted when it passes from one medium to another.

However, when we let the light pass through very narrow slits, it starts behaving like a wave. Geometrical optics does not help us, then, to understand its behavior: we will, effectively, analyze light as a wave-like thing at that scale, and analyze wave-like phenomena, such as interference, the Doppler effect and what have you. That level of analysis is referred to as the classical theory of electromagnetic radiation, and it’s what we’ll be introducing in this post.

The analysis of light as photons, i.e. as a bunch of ‘particles’ described by some kind of ‘wave function’ (which does not describe any real wave, but only some ‘probability amplitude’), is the third and final level of analysis, referred to as quantum mechanics or, to be more precise, as quantum electrodynamics (QED). [Note the terminology: quantum mechanics describes the behavior of matter particles, such as protons and electrons, while quantum electrodynamics (QED) describes the nature of photons, a force-carrying particle, and their interaction with matter particles.]

But so we’ll focus on the second level of analysis in this post.

Different mathematical approaches

One other thing which Feynman points out in his Lectures is that, even within a well-agreed level of analysis, there are different mathematical approaches to a problem. In fact, while, at any level of analysis, there’s (probably) only one fully mathematically correct analysis, approximate approaches may actually be easier to work with, not only because they actually allow us to solve a practical problem, but also because they help us to understand what’s going on.

Feynman’s treatment of electromagnetic radiation (Volume I, Chapters 28 to 34) is a case in point. While he notes that Maxwell’s field equations are actually the ones to be used, he writes them in a mathematical form that we can understand more easily, and then simplifies that mathematical form even further, in order to derive all that a sophomore student is supposed to know about electromagnetic radiation (EMR), which, of course, not only includes what we call light but also radio waves, radar waves, infrared waves and, on the other side of the spectrum, x-rays and gamma rays.

But let’s get down to business now.

The oscillating charge

Radiation is caused by some far-away electric charge (q) that’s moving in various directions in a non-uniform way, i.e. it is accelerating or decelerating, and perhaps reversing direction in the process. From our point of view (P), we draw a unit vector er’ in the direction of the charge. [If you want a drawing, there’s one further down.]

We write r’ (r prime), not r, because it is the retarded distance: when we look at the charge, we see where it was r’/c seconds ago: r’/c is indeed the time that’s needed for some influence to travel from the charge to the here and now, i.e. to P. So now we can write Coulomb’s Law:

E1 = –q·e_r′/(4πε₀·r′²)

This formula can quickly be explained as follows:

  1. The minus sign makes the direction of the force come out alright: like charges do not attract but repel, unlike gravitation. [Indeed, for gravitation, there’s only one ‘charge’, a mass, and masses always attract. Hence, for gravitation, the force law is that like charges attract, but so that’s not the case here.]
  2. E and er’ and, hence, the electric force, are all directed along the line of sight.
  3. The Coulomb force is proportional to the amount of charge, and the factor of proportionality is 1/(4πε₀·r′²).
  4. Finally, and most importantly in this context (study of EMR), the influence quickly diminishes with the distance: it varies inversely as the square of the distance (i.e. it varies as the inverse square).

Coulomb's Law is not all that comes out of Maxwell's field equations. Maxwell's equations also cover electrodynamics. Fortunately so, because we are, indeed, talking moving charges here: electrostatics is only part of the picture and, in fact, the least important one in this case. 🙂 That's why I wrote E1, with 1 as a subscript, above – not E.

So we have a second term, and I’ll actually be introducing a third term in a minute or so. But let’s first look at the second term. I am not sure how Feynman derives it from Maxwell’s equations – I am sure I’ll see the light 🙂 when reading Volume II – but, from Maxwell’s equations, he does, somehow, derive the following, secondary, effect:

[Formula image removed (DMCA takedown): the second term of the field expression.]

This is a term I struggled with in a first read, and I still do. As mentioned above, I need to read Feynman’s Volume II, I guess. But, while I still don’t understand the why, I now understand what this expression catches. The term between brackets is the Coulomb effect, which we mentioned above already, and the time derivative is the rate of change. We multiply that with the time delay (i.e. r’/c). So what’s going on? As Feynman writes it: “Nature seems to be attempting to guess what the field at the present time is going to be, by taking the rate of change and multiplying by the time that is delayed.” 
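
For reference – and as a hedged reconstruction only, pieced together from the description above (the bracketed term is the Coulomb field, we take its rate of change, and we multiply by the delay r′/c) – the second term should read:

$$\mathbf{E}_2 \;=\; \frac{r'}{c}\,\frac{d}{dt}\!\left[\frac{-q\,\mathbf{e}_{r'}}{4\pi\varepsilon_0\,r'^{2}}\right]$$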

OK. As said, I don’t really understand where this formula comes from but it makes sense, somehow. As for now, we just need to answer another question in order to understand what’s going on: in what direction is the Coulomb field changing?

It could be either: if the charge is moving along the line of sight, e_r′ won't change but r′ will. However, if r′ does not change, then it's e_r′ that changes direction, and that change will be perpendicular to the line of sight, or transverse (as opposed to radial), as Feynman puts it. Or, of course, it could be a combination of both. [Don't worry too much if you're not getting this: we will need this again in just a minute or so, and then I will also give you a drawing so you'll see what I mean.]

The point is, these first two terms are actually not important because electromagnetic radiation is given by the third effect, which is written as:

[Formula image removed (DMCA takedown): the third term of the field expression – the radiation term.]
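
As a hedged reconstruction, based on the analysis that follows (no r′ or r′² in sight, a 1/c² factor, and the second-order time derivative of the unit vector), this third term should be:

$$\mathbf{E}_3 \;=\; \frac{-q}{4\pi\varepsilon_0\,c^{2}}\,\frac{d^{2}\mathbf{e}_{r'}}{dt^{2}}$$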

Wow! This looks even more complicated, doesn't it? Let's analyze it. The first thing to note is that there is no r′ or r′² in this equation. However, that's an optical illusion of sorts, because r′ does matter when looking at that second-order derivative. How? Well… Let's go step by step and first look at that second-order derivative. It's the acceleration (or deceleration) of e_r′. Indeed, visualize e_r′ wiggling about, trying to follow the charge by pointing at where the charge was r′/c seconds ago. Let me help you here by, finally, inserting that drawing I promised you.

[Illustration removed (DMCA takedown): the unit vector e_r′ pointing from P towards the retarded position of the charge.]

This acceleration will have a transverse as well as a radial component: we can imagine the end of e_r′ (i.e. the point of the arrow) being on the surface of a unit sphere indeed. So as it wiggles about, the tip of the arrow moves back a bit from the tangential line. That's the radial component of the acceleration. It's easy to see that it's quite small as compared to the transverse component, which is the component along the line that's tangent to the surface (i.e. perpendicular to e_r′).

Now, we need to watch out: we are not talking displacement or velocity here but acceleration. Hence, even if the displacement of the charge is very small, and even if velocities would not be phenomenal either (i.e. non-relativistic), the acceleration involved can take on any value really. Hence, even with small displacements, we can have large accelerations, so the radial component is small relative to the transverse component only, not in an absolute sense.

That being said, it's easy to see that both the transverse and the radial component depend on the distance r′, but in a different way. I won't bother you with the geometrical proof (it's not that obvious). Just accept that the radial component varies, more or less, as the inverse square of the distance. Hence, we will simplify and say that we're considering large distances r′ only – i.e. large in comparison to the length of the unit vector, which just means large in comparison to one (1) – and then it's only the transverse component of the acceleration that matters, which we'll denote by a_x.

However, if we drop that radial component, then we should drop E1 as well, because the Coulomb effect will be very small as compared to the radiation effect (i.e. E3). And, then, if we drop E1, we can drop the ‘correction’ E2 as well, of course. Indeed, that’s what Feynman does. He ends up with this third term only, which he terms the law of radiation:

[Formula image removed (DMCA takedown): the law of radiation.]
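
A hedged reconstruction of that law, using the pieces discussed below (the retarded time t – r/c, the 1/c² factor, and the transverse acceleration a_x divided by r):

$$E_x(t) \;=\; \frac{-q\,a_x(t - r/c)}{4\pi\varepsilon_0\,c^{2}\,r}$$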

So there we are. That’s all I wanted to introduce here. But let’s analyze it a bit more. Just to make sure we’re all getting it here.

The dipole radiator

All that simplification business above is tricky, you’ll say. First, why do we write t – r/c for the retarded time (t’)? It should be t – r’/c, no? You’re right. There’s another simplification here: we fix the delay time, assuming that the charge only moves very small distances at an effectively constant distance r. Think of some far-away antenna indeed.

Hmm… But then we have that 1/c² factor, so that should reduce the effect to zilch, shouldn't it? And then… Hey! Wait a minute! Where does that r suddenly come from? Well, we've replaced d²e_r′/dt² by the lateral acceleration of the charge itself (i.e. its component perpendicular to the line of sight, denoted by a_x) divided by r. That's just similar triangles.

Phew! That’s a lot of simplifications and/or approximations indeed. How do we know this law really works? And, if it does, for what distance? When is that 1/r part (i.e. E3) so large as compared to the other two terms (E1 and E2) that the latter two don’t matter anymore? Well… That seems to depend on the wavelength of the radiation, but we haven’t introduced that concept yet. Let me conclude this first introduction by just noting this ‘law’ can easily be confirmed by experiment.

A so-called dipole oscillator or radiator can be constructed, as shown below: a generator drives electrons up and down in two wires (A and B). Why do we put the generator in the middle? Because we want a net effect: the radiation effect of the electrons in the wires connecting the generator with A and B will cancel out, because the electrons there move right next to each other in opposite directions. With the generator in the middle, A and B form one antenna, which we'll denote by G (for generator).

[Illustration removed (DMCA takedown): the dipole radiator G, with the generator between the two wires A and B.]

Now, another antenna can act as a receiver, and we can amplify the signal to hear it. That’s the D (for detector) shown below. Now, one of the consequences of the above ‘law’ for electromagnetic radiation is, obviously, that the strength of the received signal should become weaker as we turn the detector. The strongest signal should be when D is parallel to G. At point 2, there is a projection effect and, hence, the strength of the field should be less. Indeed, remember that the strength of the field is proportional to the acceleration of the charge projected perpendicular to the line of sight. Hence, at point 3, it should be zero, because the projection is zero.

[Illustration removed (DMCA takedown): the detector D at positions 1, 2 and 3 around the radiator G.]

Now, that’s what an experiment like this would indeed confirm. [I am tempted now to explain how a radio receiver works, but I will resist the temptation.]

I just need to make a last point here in order to make sure that we understand the formula above and – more importantly – that we can use it in subsequent chapters without having to wonder where it comes from. The formula above implies that the direction of the field is at right angles to the line of sight. Now, if a charge is just accelerating up and down, in a motion of very small amplitude, i.e. like the motion in that antenna, then the magnitude (or strength, let's say) of the field will be given by the following formula:

[Formula image removed (DMCA takedown): the magnitude of the radiation field for a charge oscillating along one axis.]
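
Again a hedged reconstruction only, from the description just given and the angle θ defined right below:

$$|\mathbf{E}| \;=\; \frac{q\,a(t - r/c)\,\sin\theta}{4\pi\varepsilon_0\,c^{2}\,r}$$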

θ, in this formula, is the angle between the axis of motion and the line of sight, as illustrated below:

[Figure removed (DMCA takedown): the angle θ between the axis of motion of the charge and the line of sight (Feynman's Fig. 29-1).]

So… That’s all we need to know for now. We’re done. As for now that is. This was quite technical, I guess, but I am afraid the next post will be even more technical. Sorry for that. I guess this is just a piece we need to get through.

Post scriptum:

You’ll remember that, with moving and accelerating charges, we should also have a magnetic field, usually denoted by B. That’s correct. If we have a changing electric field, then we will also have a magnetic field. There’s a formula for B:

B = –e_r′×E/c = –|e_r′|·|E|·c⁻¹·sin(e_r′, E)·n = –(E/c)·n

This is a vector cross-product. The angle between the unit vector e_r′ and E is π/2, so the sine is one. The vector n is the vector normal to both, as defined by the right-hand screw rule. [As for the minus sign, note that –a×b = b×a, so we could have reversed the vectors: the minus sign just reverses the direction of the normal vector.] In short, the magnetic field vector B is perpendicular to E, but its magnitude is tiny: E/c. That's why Feynman neglects it here, but we will come back to it in later posts.


Logarithms: a bit of history (and the main rules)

Pre-scriptum (dated 26 June 2020): This post did not suffer much – if at all – from the attack by the dark force—which is good because I still like it. Enjoy !

Original post:

This post will probably be of little or no interest to you. I wrote it to get somewhat more acquainted with logarithms myself. Indeed, I struggle with them. I think they come across as difficult because we don’t learn about the logarithmic function when we learn about the exponential function: we only learn logarithms later – much later. And we don’t use them a lot: exponential functions pop up everywhere, but logarithms not so much. Therefore, we are not as familiar with them as we should be.

The second issue is notation: x = log_a(y) looks more terrifying than y = a^x because… Well… Too many letters. It would be more logical to apply the same economy of symbols. We could just write x = ay instead of log_a(y), for example, using a subscript in front of the variable – as opposed to a superscript behind the variable, as we do for the exponential function. Or, else, we could be equally verbose for the exponential function and write y = exp_a(x) instead of y = a^x. In fact, you'll find such more explicit expressions in spreadsheets and other software, because these don't take subscripts or superscripts. And then, of course, we also have the use of Euler's number e in e^x and ln(x). While e is just a real number, it is not as familiar to us as π, and that's again because we learned trigonometry before we learned advanced calculus.

Historically, however, the exponential and logarithmic functions were 'invented', so to say, around the same time and by the same people: they are associated with John Napier, a Scot (1550–1617), and Henry Briggs, an Englishman (1561–1630). Briggs is best known for the so-called common (i.e. base-10) logarithm tables, which he published in 1624 as the Arithmetica Logarithmica. It is logical that the mathematical formalism needed to deal with both was invented around the same time, because they are each other's inverse: if y = a^x, then x = log_a(y).

These Briggs tables were used, in their original format more or less, until computers took over. Indeed, it’s funny to read what Feynman writes about these tables in 1965: “We are all familiar with the way to multiply numbers if we have a table of logarithms.” (Feynman’s Lectures, p. 22-4). Well… Not any more. And those slide rules, or slipsticks as they were called in the US, have disappeared as well, although you can still find circular slide rules on some expensive watches, like the one below.

It’s a watch for aviators, and it allows them to rapidly multiply numbers indeed: the time multiplied by the speed will give a pilot the distance covered. Of course, there’s all kinds of intricacies here because we’ll measure time in minutes (or even seconds), and speed in knots or miles per hour, and so that explains all the other fancy markings on it. 🙂 In case you have one, now you know what you’re paying for! A real aviator watch! 🙂

[Image removed: an aviator's watch (a Breitling Navitimer) with a circular slide-rule bezel.]

How does it work? Well… These slide rules can be used for a number of things, but their most basic function is to multiply numbers indeed, and that function is based on the rule log_b(ac) = log_b(a) + log_b(c). In fact, this works for any base, so we can just write log(ac) = log(a) + log(c). So the numbers on the slide rule below are the a, b and c. Note that the scales start at 1 because we're working with positive numbers only and log(1) = 0, so that corresponds with the zero point indeed. The example below is simple (2 times 3 is six, obviously): it would have been better to demonstrate 1.55×2.35 or something. But you see how it goes: we add log(2) and log(3) to get log(6) = log(2×3). For 1.55×2.35, the slider would show a position between 3.6 and 3.7. The calculator on my $30 Nokia phone gives me 3.6425. So, yes, it's not far off. Still, it's hard to imagine that engineers and scientists actually used these slide rules over the past 300 years or so, if not longer.

[Image: a slide rule set to multiply 2 × 3 = 6 (source: Wikipedia).]
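
Just to make the log-adding mechanics explicit, here's a minimal Python sketch of what the slide rule does: add the two log 'lengths' and convert back. The function name is mine, of course – it's not anything standard:

```python
import math

def slide_rule_multiply(a, c, base=10):
    """Multiply a and c the way a slide rule does: add the two log 'lengths',
    then convert the total length back into a number."""
    length = math.log(a, base) + math.log(c, base)   # log(a·c) = log(a) + log(c)
    return base ** length

print(slide_rule_multiply(2, 3))        # ≈ 6.0
print(slide_rule_multiply(1.55, 2.35))  # ≈ 3.6425, the example mentioned above
```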

Of course, Briggs' tables are more accurate. It's quite amazing really: he calculated the logarithms of 30,000 (natural) numbers to fourteen decimal places. It's quite instructive to check how he did that: all he did, basically, was to calculate successive square roots of 10.

Huh?

Yes. The secret behind it is the basic rule of exponentiation: exponentiation is repeated multiplication, and so we can write a^(m+n) = a^m·a^n and, more importantly, a^(m–n) = a^m·a^(–n) = a^m/a^n. Because Briggs used the common base 10, we should write 10^(m–n) = 10^m/10^n. Now, Briggs had a table with the successive square roots of 10, like the one below (it's only six significant digits behind the decimal point, not fourteen, but I just want to demonstrate the principle here), and so that's basically what he used to calculate the logarithm (to base 10) of 30,000 numbers! Talk about patience! Can you imagine him doing that, day after day, week after week, month after month, year after year? Wow!

[Table removed: the successive square roots of 10, i.e. 10^(1/2) = 3.162278, 10^(1/4) = 1.778279, 10^(1/8), …, down to 10^(1/1024), to six decimal places.]

So how did he do it? Well… Let's do it for x = log₁₀(2) = log(2). So we need to find some x for which 10^x = 2. From the table above, it's obvious that log(2) cannot be 1/2 (= 0.5), because 10^(1/2) = 3.162278, so that's too big (bigger than 2). Hence, x = log(2) must be smaller than 0.5 = 1/2. On the other hand, we can see that x will be bigger than 1/4 = 0.25 because 10^(1/4) = 1.778279, and so that's less than 2.

In short, x = log(2) will be between 0.25 (= 1/4) and 0.5. What Briggs did then, is to take that 10^(1/4) factor out, using the 10^(m–n) = 10^m/10^n formula indeed:

10^(x–0.25) = 10^x/10^0.25 = 2/1.778279 = 1.124683

If you’re panicking already, relax. Just sit back. What we’re doing here, in this first step, is to write 2 as

2 = 10^x = 10^[0.25 + (x–0.25)] = 10^(1/4)·10^(x–0.25) = (1.778279)·(1.124683)

[If you're in doubt, just check using your calculator.] We now need log(10^(x–0.25)) = log(1.124683). Now, 1.124683 is between 1.154782 and 1.074608 in the table. So we'll use the lower value (10^(1/32)) to take another factor out. Hence, we do another division: 1.124683/1.074608 = 1.046598. So now we have 2 = 10^x = 10^[1/4 + 1/32 + (x – 1/4 – 1/32)] = (1.778279)·(1.074608)·(1.046598).

We now need log(10^(x–1/4–1/32)) = log(1.046598). We check the table once again, and see that 1.046598 is bigger than the value for 10^(1/64), so now we can take that 10^(1/64) factor out by doing another division: (10^(x–1/4–1/32))/10^(1/64) = 1.046598/1.036633 = 1.009613. Wow, this is getting small! However, we can still take an additional factor out, because it's larger than the 1.009035 value (i.e. 10^(1/256)) in the table. So we do another division: 1.009613/1.009035 = 1.000573. So now we have 2 = 10^x = 10^[1/4 + 1/32 + 1/64 + 1/256 + (x – 1/4 – 1/32 – 1/64 – 1/256)] = 10^(1/4)·10^(1/32)·10^(1/64)·10^(1/256)·10^(x–1/4–1/32–1/64–1/256) = (1.778279)·(1.074608)·(1.036633)·(1.009035)·(1.000573).

Now, the last factor is outside of the range of our table: it's too small to find a fraction for. However, we had a linear approximation based on the gradient for very small exponents r: 10^r ≈ 1 + 2.302585·r. So, in this case, we have 1.000573 = 1 + 2.302585·r and, hence, we can calculate r as 0.000248. [I won't show where this approximation comes from here: just check my previous posts if you want to know. It's not difficult.] So, now, we can finally write the result of our iterations:

2 = 10^x ≈ 10^(1/4 + 1/32 + 1/64 + 1/256 + 0.000248)

So log(2) is approximated by 0.25 + 0.03125 + 0.015625 + 0.00390625 + 0.000248 = 0.30103. Now, you can check this easily: it's correct to five decimal places!
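
If you want to replay Briggs' procedure without his table, here's a minimal Python sketch of it. I compute the successive square roots of 10 on the fly rather than looking them up, and I reuse the 10^r ≈ 1 + 2.302585·r approximation for the last little factor; the function name is mine:

```python
import math

def briggs_log10(y, depth=10):
    """Approximate log10(y) the way Briggs did: peel off successive
    square roots of 10 (10**(1/2), 10**(1/4), ...) and add up the exponents."""
    log_y, remainder = 0.0, y
    for k in range(1, depth + 1):
        root = 10 ** (1 / 2 ** k)        # the k-th entry of Briggs' table
        if remainder >= root:
            remainder /= root            # take the factor 10**(1/2**k) out
            log_y += 1 / 2 ** k
    # last bit: the linear approximation 10**r ≈ 1 + 2.302585·r for tiny r
    log_y += (remainder - 1) / 2.302585
    return log_y

print(briggs_log10(2))   # ≈ 0.30103
print(math.log10(2))     # 0.30102999... for comparison
```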

Hmm… But how did Briggs calculate these square roots of 10? Well… That was done ‘by cut and try’ apparently! Pf-ff ! Talk of patience indeed ! I think it’s amazing ! And I am sure he must have kept this table with the square roots of 10 in a very safe place ! 🙂

So, why did I show this? Well… I don't know. Just to pay homage to those 17th-century mathematicians, I guess. 🙂 But there's another point as well. While the argument above basically demonstrated the a^(m+n) = a^m·a^n formula or, to be more precise, the a^(m–n) = a^m/a^n formula, it also shows the so-called product rule for logarithms:

log_b(ac) = log_b(a) + log_b(c)

Indeed, we wrote 2 as a product of individual factors 10^r, and then we could see that the exponents r of all of these individual factors add up to x = log(2). However, the more formal proof is interesting, and much shorter too: 🙂

  1. Let m = log_a(x) and n = log_a(y)
  2. Write in exponent form: x = a^m and y = a^n
  3. Multiply x and y: xy = a^m·a^n = a^(m+n)
  4. Now take log_a of both sides: log_a(xy) = log_a(a^(m+n)) = (m+n)·log_a(a) = m + n = log_a(x) + log_a(y)

You’ll notice that we used another rule in this proof, and that’s the so-called power rule for logarithms:

log_a(x^n) = n·log_a(x)

This power rule is proved as follows:

  1. Let m = log_a(x)
  2. Write in exponent form: x = a^m
  3. Raise both sides to the power of n: x^n = (a^m)^n = a^(mn)
  4. Convert back to a logarithmic equation: log_a(x^n) = mn
  5. Substitute m = log_a(x): log_a(x^n) = n·log_a(x)

Are there any other rules?

Yes. Of course, we have the quotient rule:

log_a(x/y) = log_a(x) – log_a(y)

The proof of this follows the proof of the product rule, and so I’ll let you work on that.

Finally, we have the 'change-of-base' rule, which shows us how we can easily switch from one base to another indeed:

log_a(b) = log_c(b)/log_c(a)

The proof is as follows:

  1. Let x = log_a(b)
  2. Write in exponent form: a^x = b
  3. Take log_c of both sides and evaluate: log_c(a^x) = log_c(b), hence x·log_c(a) = log_c(b), hence x = log_c(b)/log_c(a)

[I copied these rules and proofs from onlinemathlearning.com, so let me acknowledge that here. :-)]
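
If you'd rather just see the four rules at work numerically, here's a quick Python check – the particular numbers are arbitrary, of course:

```python
import math

a, c, x, y, n, b = 2.0, 10.0, 5.0, 7.0, 3.0, 8.0

# product rule: log_a(xy) = log_a(x) + log_a(y)
print(math.log(x * y, a), math.log(x, a) + math.log(y, a))

# power rule: log_a(x^n) = n·log_a(x)
print(math.log(x ** n, a), n * math.log(x, a))

# quotient rule: log_a(x/y) = log_a(x) - log_a(y)
print(math.log(x / y, a), math.log(x, a) - math.log(y, a))

# change of base: log_a(b) = log_c(b)/log_c(a)
print(math.log(b, a), math.log(b, c) / math.log(a, c))
```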

Is that it? Well… Yes. Or no. Let me add a few more lines on those logarithmic scales that you often encounter in various graphs. It's the same kind of scale as the one on the slide rule we showed above, but it covers several orders of magnitude, all equally spaced: 1, 10, 100, 1000, etcetera, instead of 0, 1, 2, 3, etcetera. So each unit increase on the scale corresponds to a unit increase of the exponent for a given base (base 10 in this case): 10^1, 10^2, 10^3, etcetera. The illustration below (which I took from Wikipedia) compares logarithmic scales to linear ones, for one or both axes.

[Image: linear versus logarithmic scales, on one or both axes (source: Wikipedia).]

So, on a logarithmic scale, the distance from 1 to 100 is the same as the distance from 10 to 1000, or the distance from 0.1 to 10, or the distance between any two points where one is 100 (= 10²) times the other. This is easily explained by the product rule, or the quotient rule rather:

log(10) – log(0.1) = log(10/0.1) = log(10²) = 2

= log(1000) – log(10) = log(1000/10) = log(10²) = 2

= log(100) – log(1) = log(100/1) = log(10²) = 2

And, of course, we could say the same for the distance between 1 and 1000, or between 0.1 and 100. The distance on the scale is 3 units here, because one point is 1000 = 10³ times the other point.

Why would we use logarithmic scales? Well… Large quantities are often better expressed like that. For example, the Richter scale used to measure the magnitude of an earthquake is just a base–10 logarithmic scale. With magnitude, we mean the amplitude of the seismic waves here. So an earthquake that registers 5.0 units on the Richter scale has a ‘shaking amplitude’ that is 10 times greater than that of an earthquake that registers 4.0. Both are fairly light earthquakes, however: magnitude 7, 8 or 9 are the big killers. Note that, theoretically, we could have earthquakes of a magnitude higher than 10 on the Richter scale: scientists think that the asteroid that created the Chicxulub crater created a cataclysm that would have measured 13 on Richter’s scale, and they associate it with the extinction of the dinosaurs.

The decibel, measuring the level of sound, is another logarithmic unit: the power associated with 40 decibels is not two times but one hundred times that of 20 decibels!
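
Just to make the arithmetic behind those two examples explicit – a small sketch, with helper names that are mine:

```python
def richter_amplitude_ratio(m1, m2):
    """Shaking-amplitude ratio between two magnitudes on the (base-10) Richter scale."""
    return 10 ** (m1 - m2)

def decibel_power_ratio(db1, db2):
    """Power ratio between two sound levels: every 10 dB is a factor of 10 in power."""
    return 10 ** ((db1 - db2) / 10)

print(richter_amplitude_ratio(5.0, 4.0))   # 10.0
print(decibel_power_ratio(40, 20))         # 100.0
```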

Now that we’re talking sound, it seems that logarithmic scales are more ‘natural’ when it comes to human perception in general, but I’ll let you have fun googling some more stuff on that! 🙂

Real exponentials and double roots: a post for my kids

Pre-scriptum (dated 26 June 2020): This post did not suffer much – if at all – from the attack by the dark force—which is good because I still like it. Enjoy !

Original post:

There is one loose end related to exponentials that I want to tie up here. It’s the issue of multiple roots (or multiple-valuedness as it’s called in the context of inverse functions).

Introduction

You'll remember that, for integer exponents n, we had two inverse operations for a^n:

  1. The logarithm: the instruction here is to find n (i.e. the exponent) given the value a^n and given a (i.e. the base).
  2. The 'nth root' function: the instruction here is to find a (i.e. the base) given the value a^n and given n (i.e. the exponent).

We have two inverse operations because the exponentiation operation is not commutative: while a + b = b + a (and a×b = b×a, so addition and multiplication are commutative), a^n is surely not the same as n^a (except if a = n, of course).

Having two inverse operations is somewhat confusing, of course. However, when we expand the domain of the exponential function to also include rational exponents, the 'nth root' function becomes an exponential function itself: a^(1/n). That's nice, because it tidies things up: we only have one inverse operation now, the logarithm.

Now, my kids understand exponentials, but they find logarithms weird. There are two reasons for that. The most important one is that we don’t learn about the logarithm function when we learn about the exponential function. We only learn logarithms later – much later. Therefore, we are not as familiar with them as we should be. There is no good reason for that but that’s what it is. [I guess I am like Euler here: I’d suggest logarithms and complex numbers should be taught earlier in life. Then we would have less trouble understanding them.]

The second one is notation, I think. Indeed, x = log_a(y) looks much more frightening than y = a^x because… Well… Too many letters. It would be more logical to apply the same economy of symbols. We could just write x = ay instead of log_a(y), for example, using a subscript in front of the variable – as opposed to a superscript behind the variable, as we do for the exponential function. Or, else, we could be equally verbose for the exponential function and write y = exp_a(x) instead of y = a^x. In fact, you'll find such more explicit expressions in spreadsheets and other software, because these don't take subscripts or superscripts.

In any case, that's not the point here. I will come back to the logarithmic function later. The point that I want to discuss here is that, while we sort of merged our 'nth root' function with our exponential function as we allowed for rational exponents as well (as opposed to integers only), we're actually still taking roots, so to say, and then we note another problem: the square root yields not one but two numbers when the base (a) is real and positive: ±a^(1/2).

In fact, that’s a more general problem.

Odd and even rational exponents 

You’ll remember the following rules for exponentiation:

1. For a positive real number a, we always have two real nth roots when n is even: ±a^(1/n). That's obviously a consequence of having two real square roots ±a^(1/2), because the definition of even parity is that n can be written as n = 2k with k any integer, i.e. k ∈ Z (so k can be negative). Hence, a^(1/n) can then be written as a^(1/(2k)) = (a^(1/k))^(1/2). Hence, whatever the value of a^(1/k) (if k is even, then we have two kth roots once again, but that doesn't matter), we will have two real roots: plus (a^(1/k))^(1/2) and minus (a^(1/k))^(1/2).

2. If n is uneven (or odd, I should say), so n ∈ {2k+1: k ∈ Z}, we have only one real root a^(1/(2k+1)): that root is positive when a is positive and negative when a is negative.

3. For the sake of completeness, let me add the third case: a is negative and n is even. We know there's no real nth root of a in that case. That's why mathematicians invented i: we'll associate an even root of a negative real number with two complex-valued roots: ±i·|a|^(1/n).

The first and second case are illustrated below for n = 2 and n = 3 respectively. The complex roots of the third case cannot be visualized because y is a real axis. Of course, we could imagine the complex roots ±i·|a|^(1/n) if we would flip or mirror the blue and red graph (i.e. the graphs for n = 2) along the vertical axis and re-label that axis as the iy-axis, i.e. the imaginary axis. But so I'll leave that to your imagination indeed.

[Graph removed: the two real branches of y = x^(1/2) and the single real branch of y = x^(1/3).]

How does this parity business turn out for rational exponents?

If r is a rational number r = m/n, we'll have to express it as an irreducible fraction first, so the numerator m and denominator n have no other common divisors than 1, or –1 when considering negative numbers. But let's look at positive numbers first. If we write r as an irreducible fraction m/n, then m and n cannot both be even. Why not? Because m and n could then both be divided by 2, and m/n would not be an irreducible fraction in that case. Let's assume m is even. Hence, n must be odd in that case. We can then write a^(2k/n) as (a^(k/n))². This number will always be positive, because we are squaring something. So it doesn't matter whether a^(k/n) has one or two roots: we'll square them, and so the result will always be positive.

Now let's assume the second possibility: m is odd. We can then write a^(m/n) as (a^(1/n))^m. So now it will depend on whether or not n is even: if n is even, we have two real roots; if n is uneven, we have only one. Let's work a few examples:

  • 8^(2/3) = (8^(1/3))² = 2² = 4
  • 4^(3/2) = (4^(1/2))³ = (±2)³ = ±2³ = ±8
  • 16^(1/4) = (16^(1/2))^(1/2) = (±4)^(1/2) = ±4^(1/2) = ±2
  • (–8)^(5/3) = ((–8)^(1/3))⁵ = (–2)⁵ = –32

So we have two roots if m is odd and n is even, and only one root in all other cases. However, we said that m and n cannot both be even, hence, if n is even, m must be odd. In short, we can say that a rational exponent m/n is even (i.e. there will be two roots), if n is even. Does that work for complex roots as well? Let’s work that out with an example:

(–4)^(3/2) = ((–4)^(1/2))³ = (±2i)³ = ∓8i – so, once again, two roots: ±8i

So, yes! It works for complex roots as well. 🙂

OK. But let’s ask the obvious question now: where are these even numbers on the real line?

Well… They are everywhere: we can start from 1/2 and then change the numerator: 3/2, 5/2, etcetera. It’s all fine, as long as we use an odd number. However, we can also go down and change the denominator: 1/4, 1/6, 1/8 etcetera. And then we can, of course, take odd multiples of these fractions once again, such as 1025/1024 = 1.0009765625, for example, or on the other side, 1023/1024 = 0.9990234375. So we have two even numbers here right next to the odd number 1. We may increase the precision: we could take ± 1/3588 for example. 🙂

Of course, you may have noticed something here. The first thing, of course, is that we've defined these two even numbers 1.0009765625 and 0.9990234375 with a precision of 10 digits behind the decimal point, i.e. 1/1024 = 1/2^10 = 0.0009765625. The second point to note is that the last digit of these two rational coefficients, when expressed as a decimal, was 5. Now, you may think that should always be the case because of that 1/2 factor. But it's not true: 1/6, for example, is a rational number that, written in decimal form, will yield 0.166666… That's an expression with a recurring decimal. And 1/10, of course, just yields 0.1. So there's no easy rule here. You need to look at the fraction itself: rational numbers are either finite decimals or infinite repeating decimals. Of course, there are rules for that, but this is not a post on number theory, so I won't write anything more on this: you can Google some more stuff yourself if you're interested.

Irrational exponents

How does this business of parity work out for irrational exponents? The gist of the rather long story above can be summarized easily. We can write a^(m/n) as a^(m·(1/n)) = (a^(1/n))^m = a^(1/n)·a^(1/n)·…·a^(1/n) (m times), and so whether or not we have multiple roots (two instead of one) depends on whether or not n is even. Indeed, remember – once again – that exponentiation is repeated multiplication, and so, for the sign of the result, what matters is whether the number of times that we do that multiplication is even or odd – not only for integer but for rational exponents as well.

For irrational exponents, we also have repeated multiplication, but now we have an infinite expression, not a finite one:

a^r = a^(r·(1/Δ + 1/Δ + 1/Δ + 1/Δ + …)) = a^(r/Δ)·a^(r/Δ)·a^(r/Δ)·a^(r/Δ)·…

I explained this expression in my previous post: 1/Δ is an infinitesimally small fraction. In fact, I calculated rational powers of e using the fraction 1/Δ = 1/1024 = 1/2^10. I used that fraction because I had started backwards, taking successive square roots of e, so e^(1/2), and then e^(1/4), e^(1/8), e^(1/16), etcetera.

However, as I mentioned when I started doing that, there was no compelling reason to cut things up by dividing them in 2. We could use 1/3 as the fraction to start with and, then, of course, our fraction 1/Δ would have been equal to 1/3^10 = 1/59049, so we would have an odd number in the denominator here. So that's one problem: we cannot say if Δ is even or odd. And the second problem, of course, is that it's an infinite expression and, hence, we cannot say if we multiplied 1/Δ an even or an odd number of times.

That leads to the third problem: we cannot say if r itself is even or uneven, which is basically what we were looking at: can we define irrational exponents as even or odd?

In short, the answer is no. In practice, that means that we will associate a^r with one 'rth root' only.

Hmm… That obviously makes a lot of sense but how do we ‘justify’ it from a more formal point of view? Where do these negative roots (for even powers) go? I am not sure. I guess there must be some more formal argument but I’ll leave that to you to look it up. I am fairly happy with what Wikipedia writes on that:

“[Real] Powers of a positive real number are always positive real numbers. […]  If the definition of exponentiation of real numbers is extended to allow negative results then the result is no longer well behaved.”

In fact, the article actually does give a somewhat more formal argument, as it writes:

  • Neither the logarithm method nor the rational exponent method can be used to define b^r as a real number for a negative real number b and an arbitrary real number r. Indeed, e^r is positive for every real number r, so ln(b) is not defined as a real number for b ≤ 0.
  • As for the rational exponent method, that cannot be used for negative values of b because it relies on continuity. The function f(r) = b^r has a unique continuous extension from the rational numbers to the real numbers for each b > 0. But when b < 0, the function f is not even continuous on the set of rational numbers r for which it is defined.

I am not quite sure I fully understand the last line, but I guess this refers to what I pointed out above: all these even and odd numbers that are so close to each other. When we go from rational to irrational exponents, we can no longer define odd or even.

The bottom line

The bottom line is that, in practice, we will only work with positive real bases. Hence, if b is negative, then we will define b^r as –(–b)^r. Huh?

Yes. Think about it. If b is negative, we'll just multiply it with –1 to ensure that the base is a positive real number. And then we just put a minus in front, to get a graph such as, for example, that x^(1/3) function for the negative side of the x-axis as well.
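
Here's a minimal Python sketch of that convention – for negative b, compute b^r as –(–b)^r – next to the single 'principal' complex value you'd get by going through the complex logarithm instead. The function name is mine, of course:

```python
import cmath

def real_power(b, r):
    """b**r with a real result: for negative b, use the convention b**r := -(-b)**r."""
    return b ** r if b >= 0 else -((-b) ** r)

print(real_power(-8, 1/3))            # ≈ -2.0, the real cube root
print(cmath.exp(cmath.log(-8) / 3))   # ≈ (1+1.732j), the principal complex cube root
```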

You should also note that most applications, like the one I use to draw simple graphs like the ones above (rechneronline.de/function-graphs), are not capable of showing you both roots. They do check whether the exponent is even or odd, though: they plot the function x^(1/3) on both sides of the zero point, and the x^(1/2) graph on the positive side only. They're just not capable of associating more than one y value with one x value. [In case you're curious to see what it does with an irrational exponent, go and check it yourself: you can put in x^pi or x^e. Will it give function values for negative values of x as well? What's your guess? :-)]

You’ll wonder why I am emphasizing this point. Well… I just wanted to note that we should be aware of the fact that, as we go from rational to irrational exponents, we sort of deliberately ‘forget’ about the second (negative) root. The point to note is that the issue of multiple-valued functions – such as discussed in the context of, for example, Riemann surfaces – is not necessarily related to complex-valued functions. We have it here (double roots), and we also have it, in general, for periodic functions.

But that's for a next post. And there we'll use our 'natural' exponential e^x, and its inverse function, ln(x), an awful lot. So I'll just conclude here with their graphs, noting, as Wikipedia does, that, nowadays, the term 'exponential function' is almost exclusively used as a shortcut for the natural exponential function e^x. But, to my kids, I say: it's good that you know where it comes from. 🙂

[Graph removed: the natural exponential function e^x and its inverse, ln(x).]

Post scriptum:

When thinking about such minor things, it's always good to think about why we are manipulating all these symbols. Exponentiation is repeated multiplication. What does it mean to multiply something with a negative number? A minus sign is an instruction to reverse direction, to turn around, 180 degrees. So we multiply the magnitudes of both numbers a and b, but we change the direction: if we were walking down the positive real axis, we're now walking down the negative axis.

So repeated multiplication with a negative real number means we're switching back and forth, wildly jumping from the positive to the negative side of the zero point and then back again. You'll admit you would appreciate being told in advance how many times we need to do the multiplication if the multiplier is negative: if n is even, then we'll end up going in the same direction: (–1)^n = 1. No sign reversal. If n is uneven, then we know that, besides the 'booster' effect (i.e. the exponentiation operation), we're expected to speed in the opposite direction: (–1)^n = –1.

Hence, if b happens to be a negative real number, then defining b^r as –(–b)^r – or assuming that, in general, our base will be a positive real number – makes sense. Of course, the math has to keep track of the theoretical possibility that, if the exponent happens to be even, b might be a negative number, but you can see it's more of a theoretical possibility indeed. Not something we'd associate with something happening in real life.

In that sense, I should note that multiplication with a complex multiplier is much more ‘real-life’, so to say. Multiplying something with a complex number does the same to the magnitude of both numbers as real multiplication: it multiplies the magnitudes, thereby changing the scale. So the product of a vector that’s 2 units long and a vector that’s 3 units long will still be 6 units long. However, complex numbers also allow for a more gradual change of direction. Instead of just a gear to move forward and backward, we also get a steering wheel so to say: multiplying two complex numbers also adds their angles (as measured from some kind of zero direction obviously), besides multiplying their magnitudes. For example, suppose that the zero direction is east, and we have a vector pointing east indeed (that means its imaginary part is zero) that we need to multiply with a vector pointing north (so that’s a vector with a zero real part, along the imaginary axis), then the final vector will be pointing north.

However, with that subtlety comes complexity as well. With real numbers, you can go in the same direction by reversing direction two times, and so that’s why we have two 2nd roots (i.e. two square roots) of 1: (a) +1, so then we just stay where we are, and (b) –1, so then we rotate two times a full 180 degrees around the zero point: indeed, (–1)(–1) corresponds to two successive rotations by 180 degrees (or π in radians)–clockwise or counterclockwise, it doesn’t matter: one full loop around the zero point will get us back to square one, or point 1, I should say. 🙂

[Illustration removed: multiplication by –1 as a rotation by 180 degrees in the complex plane.]

With complex numbers, it all depends. The 3rd root (i.e. the cube root) of 1 was only 1 in the real space but, in the complex space, we have three cube roots of unity. The first one (W0 = W3) is the root we're used to: unity itself, so the angle here is zero, i.e. straight ahead. In fact, with 1, we just stay where we are: 1×1×1 = 1³ = 1 indeed. But that's not the only way. The illustration below shows two other ways to end up where we are (i.e. at point 1); see also the short sketch after this list:

  • The second cube root is W2: 120 degrees. You can see we get back at 1 by making three successive turns of 120 degrees indeed, so that's one full loop around the origin. Using complex numbers (in polar notation), we write e^(i2π/3)·e^(i2π/3)·e^(i2π/3) = e^(i6π/3) = e^(i2π) = e^0 = 1.
  • The third cube root is W1: that's 240 degrees! Indeed, here we get back at square one by making three successive turns of 4π/3 radians, i.e. by making two loops, in total, around the origin: e^(i4π/3)·e^(i4π/3)·e^(i4π/3) = e^(i12π/3) = e^(i4π) = e^0 = 1.
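
A quick Python sketch of those roots of unity – just to check that each of the three cube roots, when cubed, indeed comes back to 1:

```python
import cmath

def roots_of_unity(n):
    """The n complex nth roots of 1: angles 0, 2π/n, 4π/n, ... around the unit circle."""
    return [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]

for w in roots_of_unity(3):
    print(w, w ** 3)   # each root and its cube (the cube is 1, up to rounding)
```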

In short, we gain flexibility (of course, we have four 4th roots (with which we make 0, 1, 2 and 3 loops around the origin respectively), 5th roots, and so on), and the great Leonhard Euler was obviously fully right: complex numbers are more ‘natural’ numbers as they allow us to model real-life situations much better.

However, if you think that double roots are a problem… Well… Think again ! With complex numbers, the problem of multiple-valuedness is much more ‘real’, I’d say. 🙂

[Illustration removed: the three cube roots of unity on the unit circle.]

P.S.: As mentioned in my previous post, I talk about that problem of multiple-valuedness when talking about Riemann surfaces in my October–November 2013 posts, so I won't repeat what I wrote there. It's about time I get back to both Feynman as well as Penrose. 🙂

Just one last (philosophical) question to test your understanding. Negative real numbers have no real square root. That includes –1, obviously. Why is that? Why do we have two real square roots for +1 and no (none!) real square roots for –1?

[…] No? Come on!

[…] OK. Let me tell you: it’s all a question of definition. What’s implicit here is that we have only one real direction: from zero to infinity along the positive axis, and then –1 is nothing but a reversal of direction. So it’s an operation really, not a ‘real’ number. In a philosophical sense, of course: negative numbers don’t exist, so to say! Indeed, ask yourself: what is a negative number? It’s an operation: we subtract things when we use the minus sign, and we reverse direction when multiplying numbers with –1. So, if we multiply something with –1 two times in succession, we are back where we are.

Of course, we could say that the negative direction is the 'real' direction and, hence, that it's the positive numbers that don't 'really' exist. Indeed, math doesn't care about what we say, so let's say that the negative axis is the 'real' one, in a physical sense. What happens then? Well… Let's see… Let's do what we did before. We still define –1 as a reversal of direction, or a rotation by 180 degrees and, hence, doing that two times should bring us back where we want to be, so that's –1 now. OK. So we have (–1)(–1)(–1) = (–1). But so that means that (–1)(–1) = 1, and… How can we write something like that for –1? What number a gives us the result that a×a = –1? Hmm… Only that imaginary number: i×i = i² = –1. So, no matter how hard you try: the way we use symbols is pretty consistent, and so you will find that (–1)(–1) = 1×1 = 1 (so we have two square roots of 1), but you will not find that 1×1 = –1. If you would want to do that, you'd have to define +1 as a reversal of direction, so that basically means that the + sign would take over the function of the – sign. Huh?

🙂 You must think I’ve gone crazy. I don’t think so. The idea I want to convey here is that, no matter how abstract math may seem to be – when everything is said and done – it’s intimately connected to our most basic notions of space, and our motion in that space. We go from here to there, or backwards, we change direction, we count things, we measure lengths or distances,… All that math does is to capture that in a non-ambiguous and consistent way. That also results in terse ‘truths’ such as: 1 has two real square roots, +1 and –1, but the square roots of –1 are only imaginary: ± i.

However, that terse statement hides another fun ‘truth’: +i and −i are as real as –1. Indeed, they are a rotation by 90 degrees, counterclockwise (+i) or clockwise (−i), as opposed to, for example, a rotation by 180 degrees (–1), or a full loop (1). 🙂

Euler’s formula revisited

Pre-scriptum (dated 26 June 2020): This post – part of a series of rather simple posts on elementary math and physics – did not suffer much from the attack by the dark force—which is good because I still like it. Enjoy !

Original post:

This post intends to take some of the magic out of Euler's formula. In fact, I started doing that in my previous post, but I think that, in this post, I've done a better job at organizing the chain of thought. [Just to make sure: with 'Euler's formula', I mean e^(ix) = cos(x) + i·sin(x). Euler produced a lot of formulas, indeed, but this one is, for math, what E = mc² is for physics. :-)]

The grand idea is to start with an initial linear approximation for the value of the complex exponential e^(is) near s = 0 (to be precise, we'll use the e^(iε) ≈ 1 + iε formula) and then show how the 'magic' of i – through the i² = –1 factor – gives us the sine and cosine functions. What we are going to do, basically, is to construct the sine and cosine functions algebraically.

Let us, as a starting point – just to get us focused – graph (i) the real exponential function e^x, i.e. the blue graph, and (ii) the real and imaginary part of the complex exponential function e^(ix) = cos(x) + i·sin(x), i.e. the red and green graph—the cosine and sine function.

[Graph removed: e^x together with cos(x) and sin(x).]

From these graphs, it's clear that e^x and e^(ix) are two very different beasts.

1. e^x is just a real-valued function of x, so it 'maps' the real number x to some other real number y = e^x. That y value 'rockets' away, thereby demonstrating the power of exponential growth. There's nothing really 'special' about e^x. Indeed, writing e^x instead of 10^x obviously looks better when you're doing a blog on math or physics but, frankly, there's no real reason to use that strange number e ≈ 2.718 when all you need is just a standard real exponential. In fact, if you're a high school student and you want to attract attention with some paper involving something that grows or shrinks, I'd recommend the use of π^x. 🙂

2. e^(ix) is something that's very different. It's a complex-valued function of x and it's not about exponential growth (though it obviously is about exponentiation, i.e. repeated multiplication): y = e^(ix) does not 'explode'. On the contrary: y is just a periodic 'thing' with two components: a sine and a cosine. [Note that we could also change the base, to 10, for example: then we write 10^(ix). We'd also get something periodic, but let's not get lost before we even start.]

Two different beasts, indeed. How can the addition of one tiny symbol – the little i in e^(ix) – make such a big difference?

The two beasts have one thing in common: the value of the function near x = 0 can be approximated by the same linear formula:

f(x) ≈ f(0) + f′(0)·x (for x near 0)

In case you wonder where this comes from, it's basically the definition of the derivative of a function, as illustrated below.

[Illustration removed: the derivative of a function as the slope of the tangent line at a point.]

This is nothing special. It's a so-called first-order approximation of a function. The point to note is that we have a similar-looking formula for the complex-valued e^(ix) function. Indeed, its derivative is d(e^(ix))/dx = i·e^(ix) and, when we evaluate that derivative at x = 0, we get i·e^0 = i. So… Yes, the grand result is that we can effectively write:

e^(iε) ≈ 1 + iε for small ε

Of course, 1 + iε is also a different ‘beast’ than 1 +  ε. Indeed, 1 + ε is just a continuation of our usual walk along the real axis, but 1 +  iε points in a different direction (see below). This post will show you where it’s headed.

[Illustration removed: the points 1 + ε and 1 + iε near the point 1 in the complex plane.]

Let's first work with e^x again, and think about a value for ε. We could take any value, of course, like 0.1 or some fraction 1/n. We'll use a fraction—for reasons that will become clear in a moment. So the question now is: what value should we use for n in that 1/n fraction? Well… Because we are going to use this approximation as the initial value in a series of calculations—be patient: I'll explain in a moment—we'd like to have a sufficiently small fraction, so our subsequent calculations based on that initial value are not too far off. But what's sufficiently small? Is it 1/10, or 1/100,000, or 1/10^100? What gives us 'good enough' results? In fact, how do we define 'good enough'?

Good question! In order to try to define what's 'good enough', I'll turn the whole thing on its head. In the table below, I calculate backwards from e^1 = e by taking successive square roots of e. Huh? What? Patience, please! Just go along with me for a while. First, I calculate e^(1/2), so our fraction ε, which I'll just write as x, is equal to 1/2 here, and the approximation for e^(1/2) is 1 + 1/2 = 1.5. That's off. How much? Well… The actual value of e^(1/2) is about 1.648721 (see the table below, or use a calculator or spreadsheet yourself). Now, 1.648721 is 1.5 + 0.148721, so our approximation (1.5) is about 9% off (as compared to the actual value). Not all that much, but let's see how we can improve. Let's take the square root once again: (e^(1/2))^(1/2) = e^(1/4), so x = 1/4. And then I do that again, so I get e^(1/8), and so on and so on. All the way down to x = 1/1024 = 1/2^10, so that's ten iterations. Our approximation 1 + x (see the fifth/last column in the table below) is then equal to 1 + 1/1024 = 1 + 0.0009765625, which we rounded to 1.000977 in the table.

[Table removed: x = 1/2, 1/4, …, 1/1024, the actual value of e^x, the slope (e^x – 1)/x, and the approximation 1 + x.]

The actual value of e^(1/1024) is also about 1.000977, as you can see in the third column of the table. Not exactly, of course, but… Well… The accuracy of our approximation here is six digits behind the decimal point, so that's equivalent to one part in a million. That's not bad, but is it 'good enough'? Hmm… Let's think about it, but let's first calculate some other things. The fourth column in the table above calculates the slope of that AB line in the illustration above: its value converges to one, as we would expect, because that's the slope of the tangent line at x = 0. [So that's the value of the derivative of e^x at x = 0. Just check it: d(e^x)/dx = e^x, obviously, and e^0 = 1.] Note that our 1 + x approximation also converges to 1—as it should!
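
Since the table itself was removed, here's a small Python sketch that reproduces its columns (the layout is mine):

```python
import math

x = 1.0
for _ in range(10):
    x /= 2                    # 1/2, 1/4, ..., 1/1024: ten successive square roots of e
    ex = math.e ** x
    print(f"x = {x:12.10f}   e^x = {ex:.6f}   slope (e^x - 1)/x = {(ex - 1) / x:.6f}   1 + x = {1 + x:.6f}")
```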

So… Well… Let's now just assume we're happy with that approximation that's accurate to one part in a million, so let's just continue to work with this fraction 1/1024 for x. Hence, we will write that e^(1/1024) ≈ 1 + 1/1024, and now we will use that value also for the complex exponential. Huh? What? Why? Just hang in here for a while. Be patient. 🙂 So we'll just add the i again and, using that e^(iε) ≈ 1 + iε expression, we write:

e^(i/1024) ≈ 1 + i/1024

It’s quite obvious that 1 + i/1024 is a complex number: its real part is 1, and its imaginary part is 1/1024 = 0.0009765625.

Let's now work our way up again by using that complex number 1 + i/1024 = 1 + i·0.0009765625 to calculate e^(i/512), e^(i/256), e^(i/128), etcetera. All the way back up to x = 1, i.e. e^i. I'll just use a different symbol for x: in the table below, I'll substitute s for x, because I'll refer to the real part of our complex numbers as 'x' from time to time (even if I write a and b in the table below), and so I can't use the symbol x to denote the fraction. [I could have started with s, but then… Well… Real numbers are usually denoted by x, and so it was easier to start that way.] In any case…

The thing to note is how I calculate those values e^(i/512), e^(i/256), e^(i/128), etcetera. I am doing it by squaring, i.e. I just multiply the (complex) number by itself. To be very explicit, note that e^(i/512) = (e^(i/1024))² = e^(i·2/1024) = (e^(i/1024))·(e^(i/1024)). So all that I am doing in the table below is multiply the complex number that I have with itself, and then I have a new result, and then I square that once again, and then again, and again, and again, etcetera. In other words, when going back up, I am just taking the square of a (complex) number. Of course, you know how to multiply a number with itself but, because we're talking complex numbers here, we should actually write it out:

(a + i·b)² = a² – b² + i·2ab = a² – b² + 2ab·i

[It would be good to always separate the imaginary unit from real numbers like a, b, or ab, but then I am lazy, and so I hope you'll always recognize that i is the imaginary unit.] In any case… When we're going back up (by squaring), the real part of the next number (i.e. the 'x' in x + iy) is a² – b² and the imaginary part (the 'y') is 2ab. So that's what's shown below—in the fourth and fifth column, that is.

[Table removed: squaring up from e^(i/1024) to e^i: for each s, the real part a, the imaginary part b, the next values a² – b² and 2ab, and the actual cos(s) and sin(s).]

Look at what happens. The x goes to zero and then becomes negative, and the y increases to one. Now, we went down from e^1 to e^(1/1024), but we could have started with e^2, or e^4, or whatever. Hence, I should actually continue the calculations above so you can see what happens when s goes to 2, and then to 3, and then to 4, and so on and so on. What you'd see is that the value of the real and imaginary part of this complex exponential goes up and down between –1 and +1. You'd see both are periodic functions, like the sine and cosine functions, which I added in the last two columns of the table above. Now compare those a and b values (i.e. the second and third column) with the cosine and sine values (i.e. the last two columns). […] Do you see it? Do you see how close they are? Only a few parts in a million, indeed.

You need to let this sink in for a while. And I'd recommend you make a spreadsheet yourself, so you really 'get' what's going on here. It's all there is to the so-called 'magic' of Euler's formula. That simple (a + ib)² = a² – b² + 2ab·i formula shows us why (and how) the real and imaginary part oscillate between –1 and +1, just like the cosine and sine function. In fact, the values are so close that it's easy to understand what follows. They are the same—in the limit, of course.
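
If a spreadsheet feels like too much work, here's the same exercise as a small Python sketch. Note that it only uses the starting value 1 + i/1024 and the squaring rule – there's no cos or sin anywhere in the construction itself (they are printed only for comparison):

```python
import math

z, s = complex(1, 1 / 1024), 1 / 1024     # our starting point: e^(i/1024) ≈ 1 + i/1024
while s < 1:
    z, s = z * z, 2 * s                    # squaring the number doubles the 'angle' s
    print(f"s = {s:10.6f}   a = {z.real:9.6f}  (cos s = {math.cos(s):9.6f})"
          f"   b = {z.imag:9.6f}  (sin s = {math.sin(s):9.6f})")
```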

Indeed, these values a² – b² and 2ab, i.e. the real and imaginary part of the next complex number in our series, are what Feynman refers to as the algebraic cosine and sine, because we calculate them as (a + ib)² = a² – b² + 2ab·i. These algebraic cosine and sine values are close to the real cosine and sine values, especially for small fractions s. Of course, there is a discrepancy because – when everything is said and done – we do carry a little error with us from the start, because we stopped at 1/n = 1/1024 before going back up.

There's actually a much more obvious way to appreciate the error: we know that e^(i/1024) should be some point on the unit circle itself. Therefore, we should not equate a with 1 if we have some value b > 0. Or – what amounts to saying the same – if b is slightly bigger than 0, then a should be slightly smaller than 1. So the e^(iε) ≈ 1 + iε formula is an approximation only. It cannot be exact for positive values of ε. It's only exact when ε = 0.

So we’re off—but not far off as you can see. In addition, you should note that the error becomes bigger and bigger for larger s. For example, in the line for s = 1, we calculated the values of the algebraic cosine and sine for s = 2 (see the a^2 – b^2 and 2ab column) as –0.416553 and 0.910186, but the actual values are cos(2) = –0.416146 and sin(2) = 0.909297, which shows our algebraic cosine and sine function is gradually losing accuracy indeed (we’re off like one part in a thousand here, instead of one part in a million). That’s what we’d expect, of course, as we’re multiplying the errors as we move ‘back up’.

The graph below plots the values of the table.

[Graph removed: the algebraic cosine and sine values from the table, plotted in the complex plane.]

This graph also shows that, as we're doubling our fraction s all the time, the data points are being spaced out more and more. This 'spacing out' gets a lot worse when further increasing s: from s = 1 (that's the 'highest' point in the graph above), we'd go to s = 2, and then to s = 4, s = 8, etcetera. Now, these values are not shown above but you can imagine where they are: for s = 2, we're somewhere in the second quadrant, for s = 4, we're in the third, etcetera. So that does not make for a smooth graph. We need points in-between. So let's 'fix' this problem by taking just one value for s out of the table (s = 1/4, for example) and we'll continue to use that value as a multiplier.

That’s what’s done in the table below. It looks somewhat daunting at first but it’s simple really. First, we multiply the value we got for e^(i/4) with itself once again, so that gives us a real and an imaginary part for e^(i/2) (we had that already in the table above and you can check: we get the same here). We then take that value (i.e. e^(i/2)) not to multiply it with itself but with e^(i/4) once again. Of course, because the complex numbers are not the same, we cannot use the (a + ib)² = a² – b² + 2abi rule any more. We must now use the more general rule for multiplying different complex numbers: (a + ib)(c + id) = (ac – bd) + i(ad + bc). So that’s why I have an a, b, c and d column in this table: a and b are the components of the first number, and c and d of the second (i.e. e^(i/4) = 0.969031 + 0.247434i).
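Here’s the same construction as a short Python sketch (again, my own variable names): we keep multiplying by the approximate value for e^(i/4) – Python’s complex type applies the (a + ib)(c + id) = (ac – bd) + i(ad + bc) rule for us – and compare with the actual cosine and sine as s runs from 0 to 7 in steps of 1/4.

```python
import math

# Approximate value for e^(i/4), as in the table
step = complex(0.969031, 0.247434)

z = 1 + 0j
s = 0.0
for _ in range(29):    # s runs from 0 to 7 in steps of 1/4
    print(f"s = {s:5.2f}   algebraic: {z.real:9.6f}, {z.imag:9.6f}   "
          f"actual: {math.cos(s):9.6f}, {math.sin(s):9.6f}")
    z *= step          # one more multiplication by e^(i/4): the angles just add up
    s += 0.25
```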

[Table: building e^(is) in steps of s = 1/4, using the (a + ib)(c + id) rule]

In the table above, I let s range from zero (0) to seven (7) in steps of 0.25 (= 1/4). Once again, I’ve added the real cosine and sine values for these angles (they are, of course, expressed in radians), because that’s what s is here: an angle, also known as the phase of the complex number. So you can compare.

The table confirms, once again, that we’re slowly losing accuracy (we’re now 3 to 4 parts in a thousand off), but it is very slowly only indeed: we’d need to do many ‘loops’ around the center before we could actually see the difference on a graph. Hey! Let’s do a graph. [Excel is such a great tool, isn’t it?] Here we are: the thick black line describing a circle on the graph below connects the actual cosine and sine values associated with an angle of 1/4, 1/2, 3/4 etcetera, all the way up to 7 (7 is about 2.23π, so we’re some 40 degrees past our original point after the ‘loop’), while the little ‘+‘ marks are the data points for the algebraic cosine and sine. They match perfectly because our eye cannot see the little discrepancy.

[Graph: the algebraic cosine and sine data points (‘+’ marks) against the actual unit circle]

So… That’s it. End of story.

What?

Yes. That’s it. End of story. I’ve done what I promised to do. I constructed the sine and cosine functions algebraically. No compass. 🙂 Just plain arithmetic, including one extra rule only: i² = –1. That’s it.

So I hope I succeeded. The goal was to take some of the magic out of Euler’s formula by showing how that e^(iε) ≈ 1 + iε approximation and the definition i² = –1 give us the cosine and sine function itself as we move around the unit circle starting from the unity point on the real axis, as shown in that little graph:

[Graph: the unit circle, with the small step iε at the starting point 1 on the real axis]

Of course, the ε we were working with was much smaller than the size of the arrow suggests (it was equal to 1/1024 ≈ 0.000977 to be precise) but that’s just to show how differentials work. 🙂 Pretty good, isn’t it? 🙂

Post scriptum:

I. If anything, all this post did was to demonstrate multiplication of complex numbers. Indeed, when everything is said and done, exponentiation is repeated multiplication–both for real as well as for complex exponents. The only difference is–well… Complex exponents give us these oscillating things, because a complex exponent effectively throws a sine and cosine function in.

Now, we can do all kinds of things with that. In this post, we constructed a circle without a compass. Now, that’s not as good as squaring the circle 🙂 but, still, it would have awed Pythagoras. Below, I construct a spiral doing the same kind of math: I start off with a complex number again but now it’s somewhat more off the unit circle (1 + 0.247434i). In fact, I took the same sine value as the one we had for e^(i/4) but I replaced the cosine value (0.969031) with 1 exactly. In other words, my ε is a lot bigger here.

Then I multiply that complex number 1 + 0.247434i with itself to get the next number (0.938776 + 0.494868i), and then I multiply that result once again with my first number (1 + 0.247434i), just like we did when we were constructing the circle. And then it goes on and on and on. So the only difference is the initial value: that’s a bit more off the unit circle. [When we constructed the circle, our initial value was also a bit off but much less. Here we go for a much larger difference.]

[Table and graph: successive powers of 1 + 0.247434i, tracing out a spiral]

So you can see what happens: multiplying complex numbers amounts to adding angles and multiplying magnitudes: α·e^(iβ)·γ·e^(iδ) = α·γ·e^(i(β+δ)). So, because we started off with a complex number with magnitude slightly bigger than 1 (you calculate it using Pythagoras’ theorem: it’s 1.03, more or less, which is 3% off, as opposed to less than one part in a million for the 1 + 0.000977i number), the next point is, of course, slightly off the unit circle too, and by somewhat more than 3% actually. And so that goes on and on and on and the ‘some more’ becomes bigger and bigger in the process.
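A minimal Python sketch of the spiral, if you want to play with it (the starting value is the one above; the number of steps is my own choice): every multiplication adds about 0.243 radians to the angle and multiplies the magnitude by about 1.03, which is exactly why the points spiral outwards.

```python
import cmath

z0 = complex(1.0, 0.247434)                       # starting value: a bit off the unit circle
print("magnitude of starting value:", abs(z0))    # ≈ 1.03

z = z0
for step in range(1, 21):
    z *= z0                                       # add the angle, multiply the magnitude
    r, phi = cmath.polar(z)
    print(f"step {step:2d}:  {z.real:9.4f} + {z.imag:9.4f}i   |z| = {r:8.4f}   angle = {phi:7.4f}")
```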

Constructing a graph like this one is like doing the kind of silly stuff I did when programming little games with our Commodore 64 in the 1980s, so I shouldn’t dwell too much on this. In fact, now that I think of it: I should have started near –i, then my spiral would have resembled an e. 🙂 And, yes – for family reading this – this is also like the favorite hobby of our dad: calculating a better value for π. 🙂

However… The only thing I should note, perhaps, is that this kind of iterative process resembles – to some extent – the kind of process that iterated function systems (IFSs) use to create fractals. So… Well… It’s just nice, I guess. [OK. That’s just an excuse. Sorry.]

II. The other thing that I demonstrated in this post may seem to be trivial but I’ll emphasize it here because it helped me (not sure about you though) to understand the essence of real exponentials much better than I did before. So, what is it?

Well… It’s that rather remarkable fact that calculating (real) irrational powers amounts to doing some infinite iteration. What do I mean with that?

Well… Remember that we kept on taking the square root of e, so we calculated e^(1/2), and then (e^(1/2))^(1/2) = e^(1/4), and then (e^(1/4))^(1/2) = e^(1/8), and then we went on: e^(1/16), e^(1/32), e^(1/64), all the way down to e^(1/1024), where we stopped. That was 10 iterations only. However, it was clear we could go on and on and on, to find that limit we know so well: e^(1/Δ) tends to 1 (not to zero (0), and not to e either!) for Δ → ∞.

Now, e = e^1 is an exponential itself and so we can switch to another base, base-10 for example, using the general a^s = (b^k)^s = b^(ks) = b^t formula, with k = log_b(a). Let’s do base-10: we get e^1 = [10^(log_10(e))]^1 = 10^(0.434294…etcetera). Now, because e is an irrational number, log_10(e) is irrational too, so we indeed have an infinite number of decimals behind the decimal point in 0.434294…etcetera. In fact, e is not only irrational but transcendental: it is not the root of any polynomial with rational coefficients, so we can’t get at it algebraically. Most irrational numbers are like that, by the way, so don’t think that being ‘transcendental’ is very special. In any case… That’s a finer point that doesn’t matter much here. You get the idea, I hope. It’s the following:

  1. When we have a rational power a^(m/n), it helps to think of it as a product of m factors a^(1/n) (and surely if we would want to calculate a^(m/n) without using a calculator, which, I admit, is not very fashionable anymore and so nobody ever does it: too bad, because the manual work involved does help to better understand things). Let’s write it down: a^(m/n) = a^(m·(1/n)) = (a^(1/n))^m = a^(1/n)·a^(1/n)·a^(1/n)·… (m times). That’s simple indeed: exponentiation is repeated multiplication. [Of course, if m is negative, then we just write a^(m/n) as 1/a^(|m|/n), but so that doesn’t change the general idea of exponentiation.]
  2. However, it is much more difficult to see why, and how, exponentiation with irrational powers amounts to repeated multiplication too. The rather lengthy exposé above shows… Well, perhaps not why, but surely how. [And in math, if we can show how, that usually amounts to showing why also, doesn’t it? :-)] Indeed, when we think of a^r (i.e. an irrational power of some (real) number a), we can think of it as a product of an infinite number of factors a^(r/Δ). Indeed, we can write a^r as:

a^r = a^(r·(1/Δ + 1/Δ + 1/Δ + 1/Δ + …)) = a^(r/Δ)·a^(r/Δ)·a^(r/Δ)·a^(r/Δ)·…

Not convinced? Let’s work an example: 10^π = [e^(ln10)]^π = e^(ln10·π) = e^(7.233784…). Of course, if you take your calculator, you’ll find something like 1385.455731, both for 10^π and e^7.233784 (hopefully!), but so that’s not the point here. We’ve shown that e is an infinite product e^(1/Δ)·e^(1/Δ)·e^(1/Δ)·e^(1/Δ)·… = e^(1/Δ + 1/Δ + 1/Δ + 1/Δ + …) = e^(Δ/Δ), with Δ some infinitely large (but integer) number. In our example, we stopped the calculation at Δ = 1024, but you see the logic: we could have gone on forever. Therefore, we can write e^(7.233784…) as

e^(7.233784…) = e^(7.233784…·(1/Δ + 1/Δ + 1/Δ + 1/Δ + …)) = e^(7.233784…/Δ)·e^(7.233784…/Δ)·e^(7.233784…/Δ)·…

Still not convinced? Let’s revert back to base 10. We can write the factors e^(7.233784…/Δ) as e^((ln10·π)/Δ) = [e^(ln10)]^(π/Δ) = 10^(π/Δ). So our original power 10^π is equal to: 10^π = 10^(π/Δ)·10^(π/Δ)·10^(π/Δ)·10^(π/Δ)·10^(π/Δ)·10^(π/Δ)·… = 10^(π·(Δ/Δ)), and of course, 10^(1/Δ) also tends to 1 as Δ goes to infinity (not to zero, and not to 10 either). 🙂 So, yes, we can do this for any real number a and for any r really.
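You can check this numerically in a couple of lines of Python (Δ = 1024 here, just like in the tables): the direct value of 10^π, the base-switched value e^(π·ln10), and the product of 1024 equal factors 10^(π/Δ) all agree.

```python
import math

delta = 1024
direct      = 10 ** math.pi                        # 10^π
base_switch = math.exp(math.pi * math.log(10))     # e^(π·ln10) = e^7.233784…

product = 1.0
for _ in range(delta):
    product *= 10 ** (math.pi / delta)             # one factor 10^(π/Δ) at a time

print(direct, base_switch, product)                # all ≈ 1385.4557…
```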

Again, this may look very trivial to the trained mathematical eye but, as a novice in Mathematical Wonderland, I felt I had to go through this to truly understand irrational powers. So it may or may not help you, depending on where you are in MW.

[Proving that the limit for Δ/Δ goes to 1 as Δ goes to ∞ should not be necessary, I hope? 🙂 But, just in case you wonder how the formula for rational and irrational powers could possibly be related, we can just write a^(m/n) = a^((m/n)·(1/Δ + 1/Δ + 1/Δ + 1/Δ + …)) = a^(m/(nΔ))·a^(m/(nΔ))·a^(m/(nΔ))·a^(m/(nΔ))·… = (a^(1/Δ + 1/Δ + 1/Δ + 1/Δ + …))^(m/n) = a^(m/n), as we would expect. :-)]

III. So how does that a^r = a^(r/Δ)·a^(r/Δ)·a^(r/Δ)·a^(r/Δ)·… formula work for complex exponentials? We just add the i, so we write a^(ir), but we know what effect that has: we have a different beast now. A complex-valued function of r, or… Well… If we keep the exponent fixed, then it’s a complex-valued function of a! Indeed, do remember we have a choice here (and two inverse functions as well!).

However, note that we can write a^(ir) in two slightly different ways. We have two interpretations here really:

A. The first interpretation is the easiest one: we write a^(ir) as a^(ir) = (a^r)^i = (a^(r/Δ + r/Δ + r/Δ + r/Δ + …))^i.

So we have a real power here, a^r, and so that’s some real number, and then we raise it to the power i to create that new beast: a complex-valued function with two components, one imaginary and one real. And then we know how to relate these to the sine and cosine function: we just change the base to e and then we’re done.

In fact, now that we’re here, let’s go all the way and do it. As mentioned in my previous post – it follows out of that a^s = (e^k)^s = e^(ks) = e^t formula, with k = ln(a) – the only effect of a change of base is a change of scale of the horizontal axis: the graph of a^s is fully identical to the graph of e^t indeed: we just need to substitute s by t = k·s = ln(a)·s. That’s all. So we actually have our ‘Euler formula for a^(is)’ here. For example, for base 10, we have 10^(is) = cos[ln(10)·s] + i·sin[ln(10)·s].
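A quick numerical check of that last formula, if you want one (base 10, and a few arbitrary values of s of my own choosing):

```python
import math

# Check: a^(is) = cos(ln(a)·s) + i·sin(ln(a)·s), here for a = 10
a = 10.0
for s in (0.25, 1.0, 2.0):
    lhs = a ** (1j * s)
    rhs = complex(math.cos(math.log(a) * s), math.sin(math.log(a) * s))
    print(s, lhs, rhs)    # the two columns agree
```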

But let’s not get lost in the nitty-gritty here. The idea here is that we let i ‘act’ on a^r, so to say. And then, of course, we can write a^r as we want, but that doesn’t change the essence of what we’re dealing with.

B. The second interpretation is somewhat more tricky: we write a^(ir) as a^(ir) = a^(ir/Δ)·a^(ir/Δ)·a^(ir/Δ)·a^(ir/Δ)·…

So that’s a product of an (infinite) number of complex factors a^(ir/Δ). Now, that is a very different interpretation than the one above, even if the mathematical result when putting real numbers in for a and r will – obviously – have to be the same. If the result is the same, then what am I saying really? Well… Nothing much, I guess. Just that the interpretation of an exponentiation as repeated multiplication makes sense for complex exponentials as well:

  • For rational r, we’ll have a finite number of complex factors: a^(im/n) = a^(i/n)·a^(i/n)·a^(i/n)·a^(i/n)·… (m times).
  • For irrational r, we’ll have an infinite number of complex factors: a^(ir) = a^(ir/Δ)·a^(ir/Δ)·a^(ir/Δ)·a^(ir/Δ)·… etcetera.

So the difference with the first interpretation is that, instead of looking at a^(ir) as a real number a^r that’s being raised to the complex power i, we’re looking at a^(ir) as a complex number a^i that’s being raised to the real power r. As said, the mathematical result when putting real numbers in for a and r will – obviously – have to be the same. [Otherwise we’d be in serious trouble of course: math is math. We can’t have the same thing being associated with two different results.] But, as said, we can effectively interpret a^(ir) in two ways.

[…]

What I am doing here, of course, is contemplating all kinds of mathematical operations – including exponentiation – on the complex space, rather than on the real space. So the first step is to raise a complex number to a real power (as opposed to raising a real number to a complex power). The next step will be to raise a complex number to a complex power. So then we’re talking complex-valued functions of complex variables.

Now, that’s what complex analysis is all about, and I’ve written very extensively about that in my October-November 2013 posts. So I would encourage you to re-read those, now that you’ve got, hopefully, a bit more of an ‘intuitive’ understanding of complex numbers with the background given in this and my previous post.

Complex analysis involves mapping (i.e. mapping from one complex space to another) and that, in turn, involves the concept of so-called analytic and/or holomorphic functions. Understanding those advanced concepts is, in turn, essential to understanding the kind of things that Penrose is writing about in Chapters 9 to 12 of his Road to Reality. […] I’ll probably re-visit these chapters myself in the coming weeks, as I realize I might understand them somewhat better now. If I could get through these, I’d be at page 250 or so, so that’s only one quarter of the total volume. Just an indication of how long that Road to Reality really is. 🙂

And then I am still not sure if it really leads to ‘reality’ because, when everything is said and done, those new theories (supersymmetry, M-theory, or string theory in general) are quite speculative, aren’t they? 🙂

Reflecting on complex numbers (again)

Pre-scriptum (dated 26 June 2020): This post – part of a series of rather simple posts on elementary math and physics – did not suffer much from the attack by the dark force—which is good because I still like it. Enjoy !

Original post:

This will surely be not my most readable post – if only because it’s soooooo long and – at times – quite ‘philosophical’. Indeed, it’s not very rigorous or formal, unlike those posts on complex analysis I wrote last year. At the same time, I think this post digs ‘deeper’, in a sense. Indeed, I really wanted to get to the heart of the ‘magic’ behind complex numbers. I’ll let you judge if I achieved that goal.

Complex numbers: why are they useful?

The previous post demonstrated the power of complex numbers (i.e. what they are used for), but it didn’t say much about what they really are. Indeed, we had a simple differential equation–an expression modeling an oscillator (read: a spring with a mass on it), with two terms only: d²x/dt² = –ω²x–but so we could not solve it because of the minus sign in front of the term with the x.

Indeed, the so-called characteristic equation for this differential equation is r² = –ω² and so we’re in trouble here because there is no real-valued r that solves this. However, allowing complex-valued roots (r = ±iω) to solve the characteristic equation does the trick. Let’s analyze what we did (and don’t worry if you don’t ‘get’ this: it’s not essential to understand what follows):

  • Using those complex roots, we wrote the general solution for the differential equation as A·e^(iωt) + B·e^(–iωt). Now, note that everything is complex in this general solution, not only the e^(iωt) and e^(–iωt) ‘components’ but also the (random) coefficients A and B.
  • However, because we wanted to find a real-valued function in the end (remember: x is a vertical displacement from an equilibrium position x = 0, so that’s ‘real’ indeed), we imposed the condition that A·e^(iωt) and B·e^(–iωt) had to be each other’s complex conjugate. Hence, B must be equal to A* and our ‘general’ (real-valued) solution was A·e^(iωt) + A*·e^(–iωt). So we only have one complex (but equally random) coefficient now – A – and we get the other one (A*) for free, so to say.
  • Writing A in polar notation, i.e. substituting A = A0·e^(iΔ), which implies that A* = A0·e^(–iΔ), yields A0·e^(iΔ)·e^(iωt) + A0·e^(–iΔ)·e^(–iωt) = A0[e^(i(ωt + Δ)) + e^(–i(ωt + Δ))].
  • Expanding this, using Euler’s formula (and the fact that cos(–α) = cos(α) but sin(–α) = –sin(α)) then gives us, finally, the following (real-valued) functional form for x:

A0[cos(ωt + Δ) + i·sin(ωt + Δ) + cos(ωt + Δ) – i·sin(ωt + Δ)]

= 2A0·cos(ωt + Δ) = x0·cos(ωt + Δ) (with x0 ≡ 2A0)

That’s easy enough to follow, I guess (everything is relative of course), but do we really understand what we’re doing here? Let me rephrase what’s going on here:

  • In the initial problem, our dependent variable x(t) was the vertical displacement, so that was a real-valued function of a real-valued (independent) variable (time).
  • Now, we kept the independent variable t real – time is always real, never imaginary 🙂 – but so we made x = x(t) a complex (dependent) variable by equating x(t) with the complex-valued exponential e^(rt). So we’re doing a substitution here really.
  • Now, if e^(rt) is complex-valued, it means, of course, that r is complex and so that allows us to equate r with the square root of a negative number (r = ±iω).
  • We then plug these imaginary roots back in and get a general complex-valued solution (as expected).
  • However, we then impose the condition that the imaginary part of our solution should be zero.

In other words, we had a family of complex-valued functions as a general solution for the differential equation, but we limited the solution set to a somewhat less general solution including real-valued functions only.
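If you want to see, on a computer, that the imaginary parts really do cancel out, here’s a minimal sketch (the numerical values for A0, Δ and ω are just illustrative picks of mine):

```python
import cmath, math

A0, delta, omega = 1.5, 0.3, 2.0          # illustrative amplitude, phase and frequency
A = A0 * cmath.exp(1j * delta)            # A = A0·e^(iΔ), so A* = A0·e^(–iΔ)

for t in (0.0, 0.5, 1.0, 2.0):
    x_complex = A * cmath.exp(1j * omega * t) + A.conjugate() * cmath.exp(-1j * omega * t)
    x_real    = 2 * A0 * math.cos(omega * t + delta)
    print(t, x_complex, x_real)           # the imaginary part of x_complex is (numerically) zero
```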

OK. We all get this. But it doesn’t mean we ‘understand’ complex numbers. Let’s try to take the magic out of those complex numbers.

Complex numbers: what are they?

I’ve devoted two or three posts to this already (October-November 2013) but let’s go back to basics. Let’s start with that imaginary unit i. The essence of i – and, yes, I am using the term ‘essence’ in a very ‘philosophical’ sense here I guess: i‘s intrinsic nature, so to speak – is that its square is equal to minus one: i² = –1.

That’s it really. We don’t need more. Of course, we can associate i with lots of other things if we would want to (and we will, of course!), such as Euler’s formula for example, but these associations are not essential – or not as essential as this definition I should say. Indeed, while that ‘rule’ or ‘definition’ is totally weird and – at first sight – totally random, it’s the only one we need: all other arithmetic rules do not change and, in fact, it’s just that one extra rule that allows us to deal with any algebraic equation – so that’s literally every equation involving addition, multiplication and exponentiation (so that’s every polynomial basically). However, stating that i² = –1 still doesn’t answer the question: what is a complex number really?

In order to not get too confused, I’ve started to think we should just take complex numbers at face value: it’s the sum of (i) some real number and (ii) a so-called imaginary part, which consists of another real number multiplied with i. [So the only ‘imaginary’ bit is, once again, i: all the rest is real! ] Now, when I say the ‘sum’, then that’s not some kind of ‘new’ sum. Well… Let me qualify that. It’s not some kind of ‘new’ sum because we’re just adding two things the way we’re used to: two and two apples are four apples, and one orange plus two more is three. However, it is true that we’re adding two separate beasts now, so to say, and so we do keep the things with an i in them separate from the real bits. In short, we do keep the apples and the oranges separate.

Now, I would like to be able to say that multiplication of complex numbers is just as straightforward as adding them, but that’s not true. When we multiply complex numbers, that i2= –1 rule kicks in and produces some ‘effects’ that are logical but not all that ‘straightforward’ I’d say.

Let’s take a simple example–but a significant one (if only because we’ll use the result later): let’s multiply a complex number with itself, i.e. let’s take the square of a complex number. We get (a + bi)² = (a + bi)(a + bi) = a·a + a·(bi) + (bi)·a + (bi)·(bi) = a² + 2abi + b²i² = a² + 2abi – b². That’s very different from the square of a real sum a + b: (a + b)² = a² + 2ab + b². How? Just look at it: we’ve got a real bit (a² – b²) and then an imaginary bit (2abi). So what?

Well… The thumbnail graph below illustrates the difference for a = b: it maps x to (a) 4x² [i.e. (x + x)²] and to (b) 2x² [i.e. the size of (x + ix)²] respectively. Indeed, when we’re squaring real numbers, we get (a + b)² = 4a²–i.e. a ‘real bit’ only, of course!–but when we’re squaring complex numbers, we need to keep track of two components: the real part and the imaginary part. However, the real part (a² – b²) is zero in this case (a = b), and so it’s only the imaginary part 2abi = 2a²i that counts!
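In a quick Python sketch (Python’s complex type does the bookkeeping for us), the difference looks like this:

```python
# Squaring a real sum versus squaring a complex sum, for a = b = x
for x in (0.5, 1.0, 2.0):
    real_square    = (x + x) ** 2          # (a + b)^2 = 4a^2, a real number
    complex_square = (x + x * 1j) ** 2     # (a + ib)^2 = a^2 - b^2 + 2abi = 2a^2·i here
    print(x, real_square, complex_square)
```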

[Graph: 4x² (the real square) versus 2x² (the size of the complex square)]

That’s kids’ stuff, you’ll say… In fact, when you’re a mathematician, you’ll say it’s a nonsensical graph. Why? Because it compares an apple and an orange really: we want to show 2ix² really, not 2x².

That’s true. However, that’s why the graph is actually useful. The red graph introduces a new idea, and with a ‘new’ idea I mean something that’s not inherent in the i² = –1 identity: it associates i with the vertical axis in the two-dimensional plane.

Hmm… This is an idea that is ‘nice’ – very nice actually – but, once again, I should note that it’s not part of i‘s essence. Indeed, the Italian mathematicians who first ‘invented’ complex numbers in the 16th century (Tartaglia (‘the Stammerer’) and Cardano) introduced roots of –1 because they needed them to solve algebraic equations. That’s it. Full stop. It was only much later (a couple of hundred years later, that is!) that mathematicians such as Wessel, Argand and Gauss associated imaginary numbers (like 2ix²) with the vertical coordinate axis. To my readers who have managed not to fall asleep while reading this: please continue till the end, and you will understand why I am saying the idea of a geometrical interpretation is ‘not essential’.

To the same readers, I’ll also say the following, however: if we do associate complex numbers with a second dimension, then we can associate the algebraic operations with things we can visualize in space. Most of you–all of you I should say–know that already, obviously, but let’s just have a look at that to make sure we’re on the same page.

A very basic thing in physical mathematics is reversing the direction of something. Things go in one direction, but we should be able to visualize them going in the opposite direction. We may associate this with a variable going from 0 to infinity (+∞): it may be time (t), or a time-dependent variable x, y or z. Of course, we know what we have here: we think of the positive real axis. So, what we do when we multiply with –1 is reversing its direction, and so then we’re talking the negative real axis: a variable going from 0 to minus infinity (–∞). Therefore, we can associate multiplication by –1 with a rotation around the center (i.e. around the zero point) by 180 degrees (i.e. by π, in radians).

[Illustration: multiplication by –1 and by ±i as rotations in the complex plane]

You may think that’s a weird way of looking at multiplication by minus one. Well… Yes and no. But think of it: the concept of negative numbers is actually as ‘weird’ as the concept of the imaginary unit in a way. I mean… Think about it: we’re used to using negative numbers because we learned about them when we were very small kids but what are they really? What does it mean to have minus three apples? You know the answer of course: it probably means that you owe someone three apples but that you don’t have any right now. 🙂 […] But that’s not the point here. I hope you see what I mean: negative numbers are weird too, in a sense. Indeed, we should be aware of the fact that we often look at concepts as being ‘weird’ because we weren’t exposed to them early enough: the great mathematician Leonhard Euler thought complex numbers were so ‘essential’ to math and, hence, so ‘natural’ that he thought kids should learn complex numbers as soon as they started learning ‘real’ numbers. In fact, he probably thought we should only be using complex numbers because… Well… They make the arithmetic space complete, so to say. […] But then I guess that’s because Euler understood complex numbers in a way we don’t, which is why I am writing about them here. 🙂

OK. Back to the main story line. In order to understand complex numbers somewhat better, it is actually useful – but, again, not necessarily essential – to think of i as a halfway rotation, i.e. a rotation by 90 degrees only, clockwise or counterclockwise, as illustrated above: multiplication with i means a counterclockwise rotation by 90 degrees (or π/2 radians) and multiplication with –i means a clockwise rotation by the same amount. Again, the minus sign gives the direction here: clockwise or counterclockwise. It works indeed: i·i =(-i)·(-i) = –1.
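You can see the rotations at work with Python’s built-in complex type (the starting point 3 + 2i is just an arbitrary pick of mine): each extra factor i adds 90 degrees (π/2) to the phase and leaves the magnitude untouched.

```python
import cmath

z = 3 + 2j                          # an arbitrary starting point
for k in range(5):                  # multiply by i zero, one, two, three, four times
    rotated = (1j ** k) * z
    # cmath.phase returns the angle in (–π, π], so it wraps around after half a turn
    print(k, rotated, abs(rotated), cmath.phase(rotated))
```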

OK. Let’s wrap this up: we might say that

  • a positive real number is associated with some (absolute) quantity (i.e. a magnitude);
  • a minus sign says: “Go the opposite way! Go back! Subtract!”– so it’s associated with the opposite direction or the opposite of something in general; and, finally,
  • the imaginary unit adds a second dimension: instead of moving on a line only, we can now walk around on a plane.

Once we understand that, it’s easy to understand why, in most applications of complex numbers, you’ll see the polar notation for complex numbers. Indeed, instead of writing a complex number z as z = a + ib, we’ll usually see it written as:

z = r·e^(iθ) with e^(iθ) = cos(θ) + i·sin(θ)

Huh? Well… Yes. Let me throw it in here straight away. You know this formula: it’s Euler’s formula. The so-called ‘magical’ formula! Indeed, Feynman calls it ‘our jewel’: the ‘most remarkable formula in mathematics’ as he puts it. Waw ! If he says so, it must be right. 🙂 So let’s try to understand it.

Is it magical really? Well… I guess the answer is ‘Yes’ and ‘No’ at the same time:

  • No. There is no ‘magic’ here. Associating the real part a and the imaginary part b with a magnitude r and an angle θ (a = r·cos(θ) and b = r·sin(θ)) is actually just an application of the Pythagorean theorem, so that’s ‘magic’ you learnt when you were very little and, hence, it does not look like magic anymore. [Although you should try to appreciate its ‘magic’ once again, I feel. Remember that you heard about the Pythagorean theorem because your teacher wanted to tell you what the square root of 2 actually is: a so-called irrational number that we get by taking the ‘one-half power’ of 2, i.e. 2^(1/2) = 2^0.5, or, what amounts to the same, the square root of 2. Of course, you and I are both used to irrational numbers now, like 2^(1/2), but they are also ‘weird’. As weird as i. In fact, it is said that the Greek mathematician who claimed their existence was exiled, because these irrational numbers did not fit into the (early) Pythagorean school of thought. Indeed, that school of thought wanted to reduce geometry to whole numbers and their ratios only. So there was no place for irrational numbers there!]
  • Yes. It is ‘magical’. Associating e^(iθ) – so that’s a complex exponential function really! – with the unit circle is something you learnt much later in life only, if ever. It’s a strange thing indeed: we have a real (but, I admit, irrational) number here – e is 2.718 followed by an infinite number of decimals as you know, just like π – and then we raise it to the power iθ, so that’s i once again multiplied by a real number θ (i.e. the so-called phase or – to put it simply – the angle). By now, we know what it means to multiply something with i, and–of course–we also know what exponentiation is (it’s just a shorthand for repeated multiplication), but we haven’t defined complex exponentials yet.

In fact… That’s what we’re going to do here. But in a rather ‘weird’ way as you will see: we won’t define them really but we’ll calculate them. For the moment, however, we’ll leave it at this and just note that, through Euler’s relation, we can see how a fraction or a multiple of i, e.g. 0.1i or 2.3i, corresponds to a fraction or a multiple of the angle associated with i, i.e. 0.1 times π/2 or 2.3 times π/2. In other words, Euler’s formula shows how the second (spatial) dimension is associated with the concept of the angle.

[…] And then the third (spatial) dimension is, of course, easy to add: it’s just an angle in another direction. What direction? Well… An angle away from the plane that we just formed by introducing that first angle. 🙂 […] So, from our zero point (here and now), we use a ruler to draw lines, and then a compass to measure angles away from that line, and then we create a plane, and then we can just add dimensions as we please by adding more ‘angles’ away from what we already have (a line, or a plane, and any higher-dimensional thing really).

Dimensions

I feel I need to digress briefly here, just to make sure we’re on the same page. Dimensions. What is a dimension in physics or in math? What do we mean if we say that spacetime is a four-dimensional continuum? From what we wrote above, the concept of a spatial dimension should be obvious: we have three dimensions in space (the x, y and z direction), and so we need three numbers indeed to describe the position of an object, from our point of view that is (i.e. in our reference frame).

But so we also have a fourth number: time. By now, you also know that, just like position and/or motion in space, time is relative too: that is relative to some frame of reference indeed. So, yes, we need four numbers, i.e. four dimensions, to describe an event in spacetime. That being said, time is obviously still something different (I mean different than space), despite the fact that Einstein’s relativity theory relates it to space: indeed, we showed in our post on (special) relativity that there’s no such thing as absolute time. However, that actually reinforces the point: a point in time is something fundamentally different than a point in space. Despite the fact that

  1. Time is just like a space dimension in the physical-mathematical meaning of the term ‘dimension’ (a dimension of a space or an object is one of the coordinates that is needed to specify a point within that space, or to ‘locate’ the object – both in time and space that is); and that,
  2. We can express distance and time in the same units because the speed of light is absolute (so that allows us to express time in meter, despite the fact that time is relative or “local”, as Hendrik Lorentz called it); and that, finally,
  3. If we do that (i.e. if we express time and distance in equivalent units), the equations for space and time in the Lorentz transformation equations mirror each other nicely – ‘mixing’ the space and time variables in the same way, so to say – and, therefore, space and time do form a ‘kind of union’, as Minkowski famously said;

Despite all that, time and space are fundamentally different things. Perhaps not for God – because He (or She, or It?) is said to be Everywhere Always – but surely for us, humans. For us, humans, always busy constructing that mental space with our ruler and our compass, time is and remains the one and only truly independent variable. Indeed, for us, mortal beings, the clocks just tick (locally indeed – that’s why I am using a plural: clocks – but that doesn’t change the fact they’re ticking, and in one direction only).

And so things happen and equations such as the one we started with – i.e. the differential equation modeling the behavior of an oscillator – show us how they happen. In one of my previous posts, I also showed why the laws of physics do not allow us to reverse time, but I won’t talk about that here. Let’s get back to complex numbers. Indeed, I am only talking about dimensions here because, despite all I wrote above about the imaginary axis in the complex plane, the thing to note here is that we did not use complex numbers in the physical-mathematical problem above to bring in an extra spatial dimension.

We just did it because we could not solve the equation with one-dimensional numbers only: we needed to take the square root of a negative number and we couldn’t. That was it basically. So there was no intention of bringing in a y- or z-dimension, and we didn’t. If we would have wanted to do that, we would have had to insert more dependent variables in the differential equation, and so it would have become a system of differential equations in two or three dependent variables (x, y and z), with time – once again – as the independent variable (t). [A differential equation involving derivatives with respect to one independent variable only, like the ones we’re used to now, is referred to as an ordinary differential equation, as opposed to… no, not an extraordinary, but a partial differential equation, which involves partial derivatives with respect to several independent variables.]

In fact, if we would have generalized to two- or three-dimensional space, we would have run into the same type of problem (roots of negative numbers) when trying to solve the equations and so we would have needed complex-valued variables to solve them analytically in this case too. So we would have three ‘dimensions’ but each ‘dimension’ would be associated with complex (i.e. ‘two-dimensional’) numbers. Is this getting complicated? I guess so.

The point is that, when studying physics or math, we will have to get used to the fact that these ‘two-dimensional numbers’ which we introduced, i.e. complex numbers, are actually more ‘natural’ ‘numbers’ to work with from a purely analytic point of view (as for the meaning of ‘analytic’, just read it as ‘logical problem-solving’), especially when we write them in their polar form, i.e. as complex exponentials. We can then take advantage of that wonderful property that they already are a functional form (z = r·e^(iθ)), so to speak, and that their first, second etcetera derivative is easy to calculate because that ‘functional form’ is an exponential, and exponentials come back to themselves when taking the derivative (with the coefficient in the exponent in front). That makes the differential equation a simple algebraic equation (i.e. without derivatives involved), which is easy to solve.

In short, we should just look at complex numbers here (i.e. in the context of my three previous posts, or in the context of differential equations in general) as a computational device, not as an attempt to add an extra spatial dimension to the analysis.

Now, that’s probably the reason why Feynman inserts a chapter on ‘algebra’ that, at first, does not seem to make much sense. As usual, however, I worked through it and then found it to be both instructive as well as intriguing because it makes the point that complex exponentials are, first and foremost, an algebraic thing, not a geometrical thing.

I’ll try to present his argument here but don’t worry if you can’t or don’t want to follow it all the way through because… Well… It’s a bit ‘weird’ indeed, and I must admit I haven’t quite come to terms with it myself. On the other hand, if you’re ready for some thinking ‘outside of the box’, I assure you that I haven’t found anything like this in a math textbook or on the Web. This proves the fact that Feynman was a bit of a maverick… Well… In any case, I’ll let you judge. Now that you’re here, I would really encourage you to read the whole thing, as loooooooong as it is.

Complex exponentials from an algebraic point of view: introduction

Exponentiation is nothing but repeated multiplication. That’s easy to understand when the exponents are integers: a to the power n (a^n) is a×a×a×a×… etcetera – repeated n times, so we have n factors (all equal to a) in the product. That’s very straightforward.

Now, to understand rational exponents (so that’s an m/n exponent, with m and n integers), we just need to understand one thing more, and that is the inverse operation of exponentiation, i.e. the nth root. We then get a^(m/n) = (a^m)^(1/n). So, that’s easy too. […] Well… No. Not that easy. In fact, our problems start right here:

  • If n is even, and a is a positive real number, we have two (real) nth roots: ±a^(1/n).
  • However, if a is negative (and n is still even obviously), then we have a problem. There’s no real nth root of a in that case. That’s why Cardano invented i: we’ll associate an even root of a negative real number with two complex-valued roots.
  • What if n is odd? Then we have only one real root: it’s positive when a is positive, and negative when a is negative. Done.

But let’s not complicate matters from the start. The point here is to do some algebra that should help us to understand complex exponentials. However, I will make one small digression, and that’s on logarithmic functions. It’s not essential but, again, useful. […] Well… Maybe. 🙂 I hope so. 🙂

We know that exponentials are actually associated with two inverse operations:

  1. Given some value y and some number n, we can take the nth root of y (y^(1/n)) to find the original base x for which y = x^n.
  2. Given some value y and some number a, we can take the logarithm (to base a) of y to find the original exponent x for which y = a^x.

In the first case, the problem is: given n, find x for which y = x^n. In the second case, the problem is: given a, find x for which y = a^x. Is that complicated? Probably. In order to further confuse you, I’ve inserted a thumbnail graph with y = 2^x (so that’s the exponential function with base 2) and y = log_2(x) (so that’s the logarithmic function with base 2). You can see these two functions mirror each other, with the x = y line as the mirror axis.

[Graph: y = 2^x and y = log_2(x), mirrored in the line y = x]

We usually find logarithms more ‘difficult’ than roots (I do, for sure), but that’s just because we usually learn about them much later in life–like in a senior high school class, for example, as opposed to a junior high school class (I am just guessing, but you know what I mean).

In addition, we have these extra symbols ‘log‘–L-O-G :-)–to express the function. Indeed, we use just two symbols to write the y = 2^x function: 2 and x – and then the meaning is clear from where we write these: we write 2 in normal script and x as a superscript and so we know that’s exponentiation. But so we’re not so economical for the logarithmic function. Not at all. In fact, we use three symbols for the logarithmic function: (1) ‘log’ (which is quite verbose as a symbol in itself, because it consists of three letters), (2) 2 and (3) x. That’s not economical at all! Indeed, why don’t we just write y = ₂x or something? So that’s a little subscript 2 in front of the x, instead of a superscript x behind the 2. It would work. It’s just a matter of getting used to it, i.e. it’s just a convention in other words.

Of course, I am joking a bit here but you get my point: in essence, the logarithmic function should not come across as being more ‘difficult’ or less ‘natural’ than the exponential function: exponentiation involves two numbers – a base and an exponent – and, hence, it’s logical that we have two inverse operations, rather than one. [You’ll say that a sum or a product involves (at least) two terms or two factors as well, so why don’t they have two inverse operations? Well… Addition and multiplication are commutative operations: a + b = b + a, and a·b = b·a. Exponentiation isn’t (in general): a^n ≠ n^a. That’s why. Check it: 2×3 = 3×2, but 2³ = 8 ≠ 3² = 9.]

Now, apart from us ‘liking’ exponential functions more than logarithmic functions because of the non-relevant fact that we learned about log functions only much later in our life, we will usually also have a strong preference for one or the other base for an exponential. The most preferred base is, obviously, ten (10). We use that base in so-called scientific notations for numbers. For example: the charge of an electron is approximately –1.6×10^(–19) coulomb. […] Oh… We have a minus sign in the exponent here (–19). So what’s that? Sorry. I forgot to mention that. But it’s easy: a^(–n) = (a^n)^(–1) = 1/a^n.

Our most preferred base is 10 because we have a decimal system, and we have a decimal system because we have ten fingers. Indeed, the Maya used a base-20 system because they used their toes to count as well (so they counted in twenties instead of tens), and it also seems that some tribes had octal (base-8) systems because they used the spaces between their fingers, rather than the fingers themselves. And, of course, we all know that computers use a base-2 system because… Well… Because they’re computers. In any case, 10 is called the common base, because… Well… Because it’s common.

However, by now you know that, in physics and mathematics, we prefer that strange number e as a base. However, remember it’s not that strange: it’s just a number, like π. Why do we call it ‘natural’? Because of that nice property: the derivative of the exponential function e^x comes back to itself: d(e^x)/dx = e^x. That’s not the case for 10^x. In fact, taking the derivative of 10^x is pretty easy too: we just need to put a coefficient in front. To be specific, we need to put the logarithm (to base e) of the base of our exponential function (i.e. 10) in front: d(10^x)/dx = 10^x·ln(10). [Ln(10) is yet another notation that has been introduced, it seems, to confuse young kids and ensure they hate logarithms: ln(10) is just log_e(10) or, if I would have had my way in terms of conventions (which would ensure an ‘economic’ use of symbols), we could also write ln(10) = ₑ10, with the e as a little subscript in front. :-)]

Stop! I am going way too fast here. We first need to define what irrational powers are! Indeed, from all that I’ve written so far, you can imagine what a^(m/n) is (a^(m/n) = (a^m)^(1/n)), but what if the exponent is not a rational number? What if it equals the square root of 2, for example? In other words, what is 10^x or e^x or 2^x or whatever for irrational exponents x?

We all sort of ‘know’ what irrationals are: it involves limits, infinitesimals, fractions of fractions, Dedekind cuts. Whatever. Even if you don’t understand a word of what I am writing here, you do – intuitively: irrationals can be approximated by fractions of fractions. The grand idea is that we divide some number by 2, and then we divide by 2 once again (so we divide by 4), and then once again (so we take 1/8), and again (1/16), and so on and so on. That gives us ever finer rational approximations. Of course, dividing by two is a pretty random way of cutting things up. Why don’t we divide by three, or by four, for example? Well… It’s the same as with those other ‘natural’ numbers: we have to start somewhere and so this ‘binary’ way of cutting things up is probably the most ‘natural’. 🙂 [Have you noticed how many ‘natural’ numbers we’ve mentioned already: 10, e, π, 2… And one (1) itself of course. :-)]

So we’ll use something like Dedekind cuts for irrational powers as well. We’ll define them as a sort of limit (in fact, that’s exactly what they are) and so we have to find some approximation (or convergence) process that allows us to do so.

We’ll start with base 10 here because, as mentioned above, base 10 comes across as more ‘natural’ (or ‘common’) to us non-mathematicians than the so-called ‘natural’ base e. However, I should note that the base doesn’t matter much because it’s quite easy to switch from one base to another. Indeed, we can always write a^s = (b^k)^s = b^(k·s) = b^t with a = b^k and t = k·s (as for k, it is obviously equal to log_b(a)). From this simple formula, you can see that changing base amounts to changing the horizontal scale: we replace s by t = k·s. That’s it. So don’t worry about our choice of base. 🙂
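A two-line check, if you want one (a = e and b = 10 here, but any pair of positive bases would do):

```python
import math

a, b = math.e, 10.0
k = math.log(a, b)                  # k = log_b(a) = log_10(e) ≈ 0.434294…
for s in (0.5, 1.0, 2.0, 3.7):
    print(a ** s, b ** (k * s))     # a^s = b^(k·s): same values, only the scale of s changed
```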

Complex exponentials from an algebraic point of view: well… Not the introduction 🙂

Ouf! So much stuff! But so here we go. We take base 10 and see what such an approximation of an irrational power of 10 (10^x) looks like. Of course, we can write any irrational number x as some (positive or negative) integer plus an endless series of decimals after the zero (e.g. e = 2 + 0.7182818284590452… etc). So let’s just focus on numbers between 0 and 1 for now (so we’ll take the integer out of the total, so to speak). In fact, before we start, I’ll cheat and show you the result, just to make sure you can follow the argument a bit.

[Graph: the smooth 10^x curve]

Yes. That’s what 10^x looks like, but so we don’t know that yet because we don’t know what irrational powers are, and so we can’t make a graph like that–yet. We only know very general things right now, such as:

  • 10^0 = 1 and 10^1 = 10 etcetera.
  • Most importantly, we know that 10^(m/n) = (10^m)^(1/n) = (10^(1/n))^m for integer m and n.

In fact, we’ll use the second fact to calculate 10^x for x = 1/2, 1/4, 1/8, 1/16, and so on and so on. We’ll go all the way down to where x becomes a fraction very close to zero: that’s the table below. Note that the x values in the table are rational fractions 1/2, 1/4, 1/8 etcetera indeed, so x is not an irrational exponent: x is a real number but rational, so x can be expressed either as a fraction of two integers m and n (m = 1 and n = 2, 4, 8, 16, 32 and so on here), or as a decimal number with a finite number of decimals behind the decimal point (0.5, 0.25, 0.125, 0.0625 etcetera).

[Table: successive square roots of 10, i.e. 10^x for x = 1/2, 1/4, …, 1/1024]

The third column gives the value 10^x for these fractions x = 1/2, 1/4, 1/8 etcetera. How do we get these? Hmm… It’s true. I am jumping over another hurdle here. The key assumption behind the table is that we know how to take the square root of a number, so that we can calculate 10^(1/2), to quite some precision indeed, as 10^(1/2) = 3.162278 (and there’s more decimals but we’re not too interested in them right now), and then that we can take the square root of that value (3.162278). That’s quite an assumption indeed.

However, if we don’t want this post to become a book in itself, then I must assume we can do that. In fact, I’ve done it with a calculator here but, before there were calculators, this kind of calculations could and had to be done with a table of logarithms. That’s because of a very convenient property of logarithms: log_c(a·b) = log_c(a) + log_c(b). However, as said, I should be writing a post here only, not a book. [Already now, this post beats the record in terms of length and verbosity…] So I’ll just ask you to accept that – at this stage – we know how to calculate the square root of something and, therefore, to accept that we can take the square root not only of 10 but of any number really, including 3.162278, and then the root of that number, and then the root of that result, and so on and so on. So that gives us the values in the third column of the table above: they’re successive square roots. [Please do double-check! It will help you to understand what I am writing about here.]

So… Back to the main story. What we are doing in the table above is to take the square root in succession, so that’s (10^(1/2))^(1/2) = 10^(1/4), and then again: (10^(1/4))^(1/2) = 10^(1/8), and then again: (10^(1/8))^(1/2) = 10^(1/16), so we get 10^(1/2), 10^(1/4), 10^(1/8), 10^(1/16), 10^(1/32) and so on and so on. All the way down. Well… Not all the way down. In fact, in the table above, we stop after ten iterations already, so that’s when x = 1/1024. [Note that 1/1024 is 2 to the power minus 10: 2^(–10) = 1/2^10 = 1/1024. I am just throwing that in here because that little ‘fact’ will come in handy later.]
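If you want to reproduce that third column without a spreadsheet, a few lines of Python do it (ten square roots in succession, starting from 10):

```python
import math

x, value = 1.0, 10.0
for _ in range(10):
    value = math.sqrt(value)     # (10^x)^(1/2) = 10^(x/2)
    x = x / 2
    print(f"10^(1/{int(round(1 / x))}) = {value:.6f}")
```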

Why do we stop after ten iterations? Well… Actually, there’s no real good reason to stop at exactly ten iterations. We could have 15 iterations: then x would be 1/2^15 = 1/32768. Or 20 (x = 1/1048576). Or 39 (x = 1/too many digits to write down). Whatever. However, we start to notice something interesting that actually allows us to stop. We note that 10 to the power x (10^x) tends to one as x becomes very small.

Now you’re laughing. Well… Surely ! That’s what we’d expect, isn’t it? 10^0 = 1. Is that the grand conclusion?

No.

The question is how small should x be? That’s where the fourth column of the table above comes in. We’re calculating a number there that converges to some value quite near to 2.3 as x goes to zero and – importantly – it converges rather quickly. In fact, if you’d do the calculations yourself, you’d see that it converges to 2.302585 after a while. [With Excel or some similar application, you can do 20 or more iterations in no time, and so that’s what you’ll find.]

Of course, we can keep going and continue adding zillions of decimals to this number but we don’t want to do that: 2.302585 is fine. We don’t need any more decimals. Why? Well… We’re going to use this number to approximate 10^x near x = 0: it turns out that we can get a real good approximation of 10^x near x = 0 using that 2.302585 factor, so we can write that

10^x ≈ 1 + 2.302585·x

That approximation is the last column in the table above. In order to show you how good it is as an ‘approximation’, I’ve plotted the actual values for 10^x (blue markers) and the approximated values for 10^x (black markers) using that 1 + 2.302585·x formula. You can see it’s a pretty good match indeed if x is small. And ‘small’ here is not that small: a ratio like x = 1/8 (i.e. x = 0.125) is good enough already! In fact, the graph below shows that 1/16 = 0.0625 is almost perfect! So we don’t need to ‘go down’ too far: ten iterations is plenty!

[Graph: 10^x versus the linear approximation 1 + 2.302585·x near x = 0]

I’ve probably ‘lost’ you by now. What are we doing here really? How did we get that linear approximation formula, and why do we need it? Well… See the fourth column: we calculate (10^x – 1)/x, so that’s the difference between 10^x and 1 divided by the (fractional) exponent x, and we see, indeed, that that number converges to a value very near to 2.302585. Why? Well… What we are actually doing is calculating the gradient of 10^x, i.e. the slope of the tangent line to the (non-linear) 10^x curve. That’s what’s shown in the graph below.

[Graph: the tangent line to the 10^x curve at x = 0]

Working backwards, we can then re-write (10^x – 1)/x ≈ 2.302585 as 10^x ≈ 1 + 2.302585·x indeed.
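Numerically, the convergence looks like this (a small Python sketch; the 2.302585 value is, of course, ln(10)):

```python
import math

x = 1.0
for _ in range(10):
    x = x / 2
    actual = 10 ** x
    approx = 1 + 2.302585 * x
    print(f"x = {x:10.6f}   (10^x - 1)/x = {(actual - 1) / x:.6f}   "
          f"10^x = {actual:.6f}   1 + 2.302585·x = {approx:.6f}")

print("ln(10) =", math.log(10))   # 2.302585…
```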

So what we’ve got here is quite standard: we know we can approximate a non-linear curve with a linear curve, using the gradient near the point that we’re observing (and so that’s near the point x = 0 in this case) and so that‘s what we’re doing here.

Of course, you should remember that we cannot actually plot a smooth curve like that, for the moment that is, because we can only calculate 10^x for rational exponents x. However, it’s easy to generalize and just ‘fill the gaps’ so to speak, and so that’s how irrational powers are defined really.

Hmm… So what’s the next step? Well… The next step is not to continue and continue and continue and continue etcetera to show that the smooth curve above is, indeed, the graph of 10^x. No. The next step is to use that linear approximation to algebraically calculate the value of 10^(is), so that’s a power of 10 with a complex exponent.

HUH!? 

Yes. That’s the gem I found in Feynman’s 1965 Lectures. [Well… One of the gems, I should say. There are many. :-)]

It’s quite interesting. In his little chapter on ‘algebra’ (Lectures, I-22), Feynman just assumes that this ‘law’ that 10^x ≈ 1 + 2.302585·x is not only ‘correct’ for small real fractions x but also for very small complex fractions, and then he just reverses the procedure above to calculate 10^(is) for larger and larger values of the exponent. Let’s see how that goes.

However, let’s first switch the variable from x to s, because we’re talking complex numbers now. Indeed, I can’t use the symbol x as I used it above anymore because x is now the real part of some complex number 10^(is). In addition, I should note that Feynman introduces this delta (Δ). The idea behind it is to make things somewhat easier to read by relating s to an integer: Δ = 1024·s, so Δ = 1, 2, 4, 8,… 1024 for s = 1/1024, 1/512, 1/256 etcetera (see the second column in the table below). I am not entirely sure why he does that: Feynman must think fractions are harder to ‘read’. [Frankly, the introduction of this Δ makes Feynman’s exposé somewhat harder to ‘read’ IMHO – but that’s just a matter of taste, I guess.] Of course, the approximation then becomes

10^s ≈ 1 + 2.302585·Δ/1024 = 1 + 0.0022486·Δ.

The table below is the one that Feynman uses. The important thing is that you understand the first line in this table: 10^(i/1024) ≈ 1 + 0.00225i·Δ = 1 + 0.00225i·1 = 1 + 0.00225i. And then we go to the second line: 10^(i/512) = 10^(i/1024)·10^(i/1024) = 10^(2i/1024), so we’re doing the reverse thing here: we don’t take square roots but we square what we’ve found already. So we multiply 1 + 0.00225i with itself and get (1 + 0.00225i)(1 + 0.00225i) = 1 + 2·0.00225i + 0.00225²·i² = 1 – 0.000005 + 0.0045i ≈ 0.999995 + 0.0045i ≈ 1 + 0.0045i.

[Table: squaring our way up from 10^(i/1024) to 10^i]

Let’s go to the third line now. In fact, what we’re doing here is working our way back up, i.e. all the way from s = 1/1024 to s = 1. And that’s where the ‘magic’ of i (i.e. the fact that i² = –1) is starting to show: (0.999995 + 0.0045i)² = 0.99999 + 2·0.999995·0.0045i + 0.0045²·i² ≈ 0.99997 + 0.009i. So the real part of 10^(is) is changing as well – it is decreasing in fact! Why is that? Because of the term with the i² factor! [I write 0.99997 instead of 0.99996 because I round up here, while Feynman consistently rounds down.]

So now the game is clear: we take larger and larger fractions in the exponent (i/512, i/256, i/128, etcetera), and calculate 10^(is) by squaring the previous result. After ten iterations, we get the grand result for s = 1, i.e. for 10^i:

10^i ≈ –0.66928 + 0.74332i (more or less, that is)

Note the minus sign in front of the real part, and look at the intermediate values for x and y too. Isn’t that remarkable?
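The whole procedure fits in a handful of lines of Python, if you want to double-check the numbers (I start from 1 + 2.302585i/1024 rather than from the rounded 1 + 0.00225i, so the figures differ slightly from Feynman’s table, but you’ll recognize the result):

```python
import cmath, math

z = 1 + 2.302585j / 1024      # 10^(i/1024) ≈ 1 + 0.0022486i
s = 1 / 1024
for _ in range(10):
    z = z * z                 # squaring doubles the exponent
    s = 2 * s                 # … so after ten squarings s = 1

print("10^i by squaring:", z)                               # ≈ -0.670 + 0.746i
print("10^i exactly:    ", cmath.exp(1j * math.log(10)))    # cos(ln10) + i·sin(ln10)
```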

OK. Waw ! But… So what? What’s next?

Well… To graph 10^(is), we should not just keep squaring things because that amounts to doubling the exponent again and again and so that means the exponent s is just making larger and larger jumps along the positive real axis really (see that graph that I made above: the distance between the successive values of s gets larger and larger, and so that’s a bad recipe for a smooth graph).

So what can we do? Well… We should just take a sufficiently small power, i/8 for example, and multiply that with 1, 2, 3 etcetera so we get something more ‘regular’. That’s what’s done in the table below and what’s represented in the graph underneath (to get the scale of the horizontal axis, note that s = p/8).

[Table and graph: 10^(is) for s = p/8, with p = 1, 2, 3, …]

Hey! Look at that! There we are! That’s the graph we were looking for: it shows a (complex) exponential (10^(is)) as a periodic (complex-valued) function, with the real part behaving like a cosine function and the imaginary part behaving like a sine function.

Note the upper and lower bounds: +1 and –1. Indeed, it doesn’t seem to matter whether we use 10 or e as a base: the x and y part oscillate between –1 and +1. So, whatever the base, we’ll see the same pattern: the base only changes the scale of the horizontal axis (i.e. s). However, that being said, because of this scale factor, I do need to say like a cosine/sine function when discussing that graph above. So I cannot say they are a cosine and a sine function. Feynman calls these functions the algebraic sine and cosine functions.
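Here’s a short Python sketch that generates the same kind of data points, staying in the spirit of the construction (the step 10^(i/8) is built up from the small-fraction approximation, not from any ready-made sine or cosine):

```python
step = (1 + 2.302585j / 1024) ** 128   # ≈ 10^(i/8), built up from 10^(i/1024) ≈ 1 + 0.0022486i
z = 1 + 0j
for p in range(25):
    s = p / 8
    print(f"s = {s:5.3f}   x = {z.real:9.6f}   y = {z.imag:9.6f}")   # x and y oscillate between -1 and +1
    z *= step                          # one more multiplication by 10^(i/8)
```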

But – remember! – we can always switch base through a clever substitution, so 10^(is) = e^(it) with t = ln(10)·s, and recalculate stuff to whatever number of decimals behind the decimal point we’d want. So let’s do that: let’s switch to base e. WOW! What happens?

We then [Finally! you’ll say!] get values that – Surprise ! Surprise ! – correspond to the real cosine and sine function. That then, in turn, allows us to just substitute the ‘algebraic’ cosine and sine function for the ‘real’ cosine and sine in an equation that – Yes! – is Euler’s formula itself:

e^(it) = cos(t) + i·sin(t)

So that’s it. End of story.

[…]

You’ll say: So what? Well… Not sure what to say. I think this is rather remarkable. This is not the formal mathematical proof of Euler’s formula (at least not of the kind that you’ll find in a textbook or on Wikipedia). No, we are just calculating the values x and y of e^(it) = x + iy using an approximation process used to calculate real powers and then, well… Just some bold assumption involving infinitesimals really.

I think this is amazing stuff (even if I’ll downplay that statement a bit in my post scriptum). I really don’t understand these things the way I would like to understand them. I guess I just haven’t got the right kind of brain for these things. 😦 Indeed, just think about it: when we have the real exponential e^x, then we’ve got that typical ‘rocket’ graph (i.e. the blue one in the graph below): just something blasting away indeed. But when we put i in the exponent (e^(ix)), then we get two components oscillating up and down like the cosine and sine function. Well… Not only like the cosine and sine function: the green and red line – i.e. the real and imaginary part of e^(ix)! – actually are the cosine and sine function!

[Graph: the real exponential e^x (blue) together with the real and imaginary parts of e^(ix) (green and red) – original image removed]

Do you understand this in an intuitive way? Yes? You do? Waw ! Please write me and tell me how. I don’t. 😦

Oh well… The good thing about it is… Well… At least complex numbers will always stay ‘magical’ to me. 🙂

Post scriptum: When I write, above, that I don’t understand this in an intuitive way, I don’t mean to say it’s not logical. In fact, it is. It has to be, of course, because we’re talking math here! 🙂

The logic is pretty clear indeed. We have an exponential function here (y = 10^x) and we’re evaluating that function in the neighborhood of x = 0 (we do it on the positive side only but we could, of course, do the same analysis on the other side as well). So then we use that very general mathematical procedure of calculating approximate values for the (non-linear) 10^x curve using the gradient. So we plug in some differential value for x (in differential terms, we’d write Δx – but the delta symbol here has nothing to do with Feynman’s Δ above) and, of course, we find Δy = 2.302585·Δx. So we add that to 1 (the value of 10^x at the point x = 0) and, then, we go through these iterations, not using that linear equation any more, but the very fundamental property of an exponential function that 10^(2x) = (10^x)². So we start with an approximate value, but then the value we plug into these iterative calculations is the square of the previous value. So, to calculate the next points, we do not use an approximation method any more: we just square the first result, and then the second, and so on and so on, and that’s just calculation, not approximation.

[In fact, you may still wonder and think that it’s quite remarkable that the points we calculate using this process are so accurate, but that’s due to the rapid convergence of that value we found for the gradient. Well… Yes and no. Here I must admit that Feynman (and I) cheated a bit because we used a rather precise value for the gradient: 2.302585, so that’s six significant digits after the decimal point. Now, that value is actually calculated based on twenty (rather than 10) iterations when ‘going down’. But that little factoid is not embarrassing because it doesn’t change much: the argument itself is sound. Very sound.]

OK… That’s easy enough to understand. The thing that is not easy to understand – intuitively that is – is that we can just insert some complex differential Δs into that Δy = 2.302585·Δx equation. Isn’t it ‘weird’, indeed, that we can just use a complex fraction s = i/1024 to calculate our first point, instead of a real fraction x = 1/1024? It is. That’s the only thing really. Indeed, once we’ve done that, it’s plain sailing again: we just square the result to get the next result, and then we square that again, and so on and so on. However, that being said, the difference is that the ‘magic’ of i comes into play indeed. When squaring, we do not just get some bigger (real) number: we get (a+bi)² = a² – b² + 2abi. So it’s that minus sign and the i that give an entirely different ‘dynamic’ to how the function evolves from there (i.e. different as compared to working with a real base only). It’s all quite remarkable really because we start off with a really tiny value b here: 0.00225 to be precise, so that’s roughly 1/445! [Of course, the real part a, at the point from where we start doing these iterations, is one.]

But so that first step is ‘weird’ indeed. Why is it no problem whatsoever to insert the complex fraction s = i/1024 into 1 + 2.302585·s, instead of the real fraction 1/1024, and then, afterwards, to square these complex numbers that we’re getting, instead of real numbers?

It just doesn’t feel right, does it? I must admit that, at first, I felt that Feynman was doing something ‘illegal’ too. But, obviously, he’s not. It’s plain mathematical logic. We have two functions here: one is linear (y = 1 + 2.302585·x), and the other is quadratic (y = x²), and so what’s happening really is that, at the point x = 0, we change the function. We do not really substitute ix for x: we substitute y = 10^(ix) for y = 10^x. So we still have an independent real variable x but, instead of a real-valued y = 10^x function, we now have a complex-valued y = 10^(ix) function.

However, the ‘output’ of that function, of course, is a complex y, not a real y. In our case, because we’re plotting a function really – to be precise, we’re calculating the exponential function y = 10^(ix) through all these iterations – we get a complex-valued function of the shape that, by now, we know so well.

So it is ‘discontinuous’ in a way, and so I can’t say all that much about it. Look at the graph below where, once again, we have the real exponential function e^x and then the two components of the complex exponential e^(ix). This time, I’ve plotted them on both sides of the zero point because they’re continuous on both sides indeed. Imagine we’re walking along this blue e^x curve from some negative x to zero. We’re familiar with the path. It has, for instance, that property we exploited above: as we doubled the ‘input’ (so from x we went to 2x), the ‘output’ went up not as the double but as the square of the original value: e^(2x) = (e^x)². And then we also know that, around the point x = 0, we can approximate it with a linear function. In fact, in this case, the linear approximation is super-simple: y = 1 + x. Indeed, the gradient of e^x at the point x = 0 is equal to 1! So, yes, we know and understand that blue curve. But then we arrive at the point x = 0 and we decide something radical: we change the function!

[Graph: e^x and the real and imaginary parts of e^(ix), plotted on both sides of x = 0 – original image removed]

Yes. That’s what we’re really doing in that very lengthy story above: e^(ix) is a complex-valued function of the real variable x. That’s something different. However, we continue to say that the approximation y = 1 + x must also be valid for complex x and y. So we say that e^(ix) ≈ 1 + ix. Is that wrong? No. Not at all. Functional forms are functional forms and gradients are gradients: d(e^(ix))/dx = i·e^(ix), and i·e^(ix) at x = 0 is equal to i·e^0 = i! Hence, e^(ix) ≈ 1 + ix is a perfectly legitimate linear approximation. And then it’s just the same thing again: we use that iteration mechanism to calculate successive squares of complex numbers because, for complex exponentials as well, we have e^(2ix) = (e^(ix))².

So. The ‘magic’ is a lot of ‘confusion’ really. The point to note is that we do have a different function here: e^(ix) and e^x ‘look’ similar – it’s just that i, right? – but, in fact, when we replace x by ix in the exponent of e, that’s quite a radical change. We can use the same linear approximation at x = 0 but then it’s over. Our blue graph stops: we’re no longer walking along it. I can’t even say it bifurcates, so to say, into the red and the green one, because it doesn’t. We’re talking apples and oranges indeed, and so the comparison is quickly done: they’re different. Full stop.

Is there any geometrical relationship between all these curves? Well… Yes and no. I can see one, at the very start: the gradient of our e^x function at x = 0 is equal to unity (i.e. 1), and so that’s the same gradient as the gradient of the imaginary part of our new e^(ix) function (the gradient of the real part is zero there, before it turns negative). But that’s just… I mean… That just comes out of Euler’s formula: e^(i·0) = cos(0) + i·sin(0). Honestly, it’s no use to try to be smart here and think about stuff like that. We’re no longer walking on the blue curve. We’re looking at a new function: a complex-valued function e^(ix) (instead of a real-valued function e^x) of a real variable (x). That’s it. Just don’t try to relate the two too much: you switched functions. Full stop. It’s like changing trains! 🙂

So… What’s the conclusion? Well… I’d say: “Complex numbers can be analyzed as extensions of real numbers, so to say, but – frankly – they are different.”

[…]

I’ll probably never understand complex numbers in the way I would like to understand them – that is, like I understand that one plus one is two. However, this rather lengthy foray into the complex forest has helped me somewhat. I hope it helped you too.

Some content on this page was disabled on June 16, 2020 as a result of a DMCA takedown notice from The California Institute of Technology. You can learn more about the DMCA here:

https://wordpress.com/support/copyright-and-the-dmca/

Differential equations revisited: the math behind oscillators

Pre-scriptum (dated 26 June 2020): This post – part of a series of rather simple posts on elementary math and physics – does not seem to have been targeted in the attack by the dark force—which is good because I still like it. While my views on the true nature of light, matter and the force or forces that act on them have evolved significantly as part of my explorations of a more realist (classical) explanation of quantum mechanics, I think most (if not all) of the analysis in this post remains valid and fun to read. In fact, I would dare to say the whole Universe consists of oscillators!

Original post:

When wrapping up my previous post, I said that I might be tempted to write something about how to solve these differential equations. The math behind them is pretty essential indeed. So let’s revisit the oscillator from a formal-mathematical point of view.

Modeling the problem

The simplest equation we used was the one for a hypothetical ‘ideal’ oscillator without friction and without any external driving force. The equation for a mechanical oscillator (i.e. a mass on a spring) is m·d²x/dt² = –kx. The k in this equation is a factor of proportionality: the force pulling back is assumed to be proportional to the amount of stretch, and the minus sign is there because the force is pulling back indeed. As for the equation itself, it’s just Newton’s Law: the mass times the acceleration equals the force: ma = F.

You’ll remember we preferred to write this as d²x/dt² = –(k/m)x = –ω0²x with ω0² = k/m. You’ll also remember that ω0 is an angular frequency, which we referred to as the natural frequency of the oscillator (because it determines the natural motion of the spring indeed). We also gave the general solution to the differential equation: x(t) = x0·cos(ω0t + Δ). That solution basically states that, if we just let go of that spring, it will oscillate with frequency ω0 and some (maximum) amplitude x0, the value of which depends on the initial conditions. As for the Δ term, that’s just a phase shift depending on where x is when we start counting time: if x would happen to pass through the equilibrium point at time t = 0, then Δ would be π/2. So Δ allows us to shift the beginning of time, so to speak.

In my previous posts, I just presented that general equation as a fait accompli, noting that a cosine (or sine) function does indeed have that ‘nice’ property of coming back to itself, with a minus sign in front, after taking the derivative two times: d²[cos(ω0t)]/dt² = –ω0²·cos(ω0t). We could also write x(t) as a sine function because the sine and cosine function are basically the same except for a phase shift: x0·cos(ω0t + Δ) = x0·sin(ω0t + Δ + π/2).

Now, the point to note is that the sine or cosine function actually has two properties that are ‘nice’ (read ‘essential’ in the context of this discussion):

  1. Sinusoidal functions are periodic functions and so that’s why they represent an oscillation–because that’s something periodic too!
  2. Sinusoidal functions come back to themselves when we differentiate them twice, and so that’s why they effectively solve our second-order differential equation.

However, in my previous post, I also mentioned in passing that sinusoidal functions share that second property with exponential functions: d²(e^t)/dt² = d[d(e^t)/dt]/dt = d(e^t)/dt = e^t. So, if we would not have had that minus sign in our differential equation, our solution would have been some exponential function, instead of a sine or a cosine function. So what’s going on here?

Solving differential equations using exponentials

Let’s scrap that minus sign and assume our problem would indeed be to solve the d²x/dt² = ω0²x equation. So we know we should use some exponential function, but we have that coefficient ω0². Well… That’s actually easy to deal with: we know that, when differentiating an exponential function, we bring the coefficient in the exponent down in front: d[e^(ω0t)]/dt = ω0·e^(ω0t). If we do it two times, we get d²[e^(ω0t)]/dt² = ω0²·e^(ω0t), so we can immediately see that e^(ω0t) is a solution indeed.

But it’s not the only one: e^(–ω0t) is a solution too: d²[e^(–ω0t)]/dt² = (–ω0)(–ω0)·e^(–ω0t) = ω0²·e^(–ω0t). So e^(–ω0t) solves the equation too. It is easy to see why: ω0² has two square roots – one positive, and one negative.

But we have more: in fact, every linear combination c1·e^(ω0t) + c2·e^(–ω0t) is also a solution to that second-order differential equation. Just check it by writing it all out: you’ll find that d²[c1·e^(ω0t) + c2·e^(–ω0t)]/dt² = ω0²·[c1·e^(ω0t) + c2·e^(–ω0t)] and so, yes, we have a whole family of functions here that are all solutions to our differential equation.

Now, you may or may not remember that we had the same thing with first-order differential equations: we would find a whole family of functions, but only one would be the actual solution or the ‘real‘ solution I should say. So what’s the real solution here?

Well… That depends on the initial conditions: we need to know the value of x at time t = 0 (or at some other point t = t1). And that’s not enough: we have two coefficients (c1 and c2) and, therefore, we need one more initial condition (it takes two equations to solve for two variables). That could be another value for x at some other point in time (e.g. t2) but, when solving problems like this, you’ll usually get the other ‘initial condition’ expressed in terms of the first derivative, so that’s in terms of dx/dt = v. For example, it is not illogical to assume that the initial velocity v0 would be zero. Indeed, we can imagine we pull or push the spring and then let it go. In fact, that’s what we’ve been assuming here all along in our example! Assuming that v0 = 0 is equivalent to writing that

d[c1·e^(ω0t) + c2·e^(–ω0t)]/dt = 0 for t = 0

⇒ ω0·c1 – ω0·c2 = 0 (because e^0 = 1) ⇔ c1 = c2

Now we need the other initial condition. Let’s assume the initial value of x is equal to x0 = 2 (it’s just an example: we could take any value, including negative values). Then we get:

c1·e^(ω0t) + c2·e^(–ω0t) = 2 for t = 0 ⇔ c1 + c2 = 2 (again, note that e^0 = 1)

Combining the two gives us the grand result that c1 = c2 = 1 and, hence, the ‘real’ or actual solution is x = e^(ω0t) + e^(–ω0t). The graph below plots that function for ω0 = 1 and ω0 = 0.5 respectively. We could take other values for ω0 but, whatever the value, we’ll always get an exponential function like the ones below. It basically graphs what we expect to happen: the mass just accelerates away from its equilibrium point. Indeed, the differential equation is just a description of an accelerating object. Indeed, the e^(–ω0t) term quickly goes to zero, and then it’s the e^(ω0t) term that rockets that object sky-high – literally. [Note that the acceleration is actually not constant: the force is equal to kx and, hence, the force (and, therefore, the acceleration) actually increases as the mass goes further and further away from its equilibrium point. Also note that if the initial position would have been minus 2, i.e. x0 = –2, then the object would accelerate away in the other direction, i.e. downwards. Just check it to make sure you understand the equations.]

[Graphs: x = e^(ω0t) + e^(–ω0t) for ω0 = 1 and ω0 = 0.5 – original images removed]
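If you want to check that claim numerically rather than algebraically, here’s a small sketch (ω0 = 1 is just an example value) that verifies the initial conditions and the differential equation itself with finite differences:

```python
import math

w0 = 1.0                                        # example value for ω0
x = lambda t: math.exp(w0 * t) + math.exp(-w0 * t)

h = 1e-5
print(x(0.0))                                   # x(0) = 2
print((x(h) - x(-h)) / (2 * h))                 # x'(0) ≈ 0

# Check d²x/dt² = ω0²·x at an arbitrary point, say t = 0.7:
t = 0.7
d2x = (x(t + h) - 2 * x(t) + x(t - h)) / h**2   # finite-difference second derivative
print(d2x, w0**2 * x(t))                        # both ≈ 2.51
```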

The point to note is our general solution. More formally, and more generally, we get it as follows:

  • If we have a linear second-order differential equation a·x” + b·x’ + c·x = 0 (because of the zero on the right-hand side, we call such an equation homogeneous, so it’s quite a mouthful: a linear and homogeneous DE of the second order), then we can find an exponential function e^(rt) that will be a solution for it.
  • If such a function is a solution, then plugging it in yields a·r²·e^(rt) + b·r·e^(rt) + c·e^(rt) = 0 or (ar² + br + c)·e^(rt) = 0.
  • Now, we can read that as a condition, and the condition amounts to ar² + br + c = 0. So that’s a quadratic equation we need to solve for r to find two specific solutions r1 and r2, which, in turn, will then yield our general solution:

 x(t) = c1·e^(r1·t) + c2·e^(r2·t)

Note that the general solution is based on the principle of superposition: any linear combination of two specific solutions will be a solution as well. I am mentioning this here because we’ll use that principle more than once.
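As a small illustration of that recipe – not of any particular physics problem – here’s a sketch that solves the characteristic equation with the standard quadratic formula, using complex arithmetic so it also covers the negative-discriminant case we’ll discuss next:

```python
import cmath

def general_solution(a, b, c):
    """For a·x'' + b·x' + c·x = 0: return the roots r1, r2 of a·r² + b·r + c = 0
    and the general solution x(t) = c1·e^(r1·t) + c2·e^(r2·t)."""
    disc = cmath.sqrt(b * b - 4 * a * c)    # complex square root handles b² - 4ac < 0
    r1 = (-b + disc) / (2 * a)
    r2 = (-b - disc) / (2 * a)
    return r1, r2, lambda t, c1, c2: c1 * cmath.exp(r1 * t) + c2 * cmath.exp(r2 * t)

# Example: x'' - x = 0 gives r = ±1, while x'' + x = 0 gives r = ±i.
print(general_solution(1, 0, -1)[:2])   # ((1+0j), (-1+0j))
print(general_solution(1, 0, 1)[:2])    # (1j, -1j)
```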

Complex roots

The steps as described above implicitly assume that the quadratic equation ar² + br + c = 0, which is better known as the characteristic equation, does yield two real and distinct roots r1 and r2. In fact, it amounts to assuming that that exponential e^(rt) is a real-valued exponential function. We know how to find these real roots from our high school math classes: r = (–b ± [b² – 4ac]^(1/2))/2a. However, what happens if the discriminant b² – 4ac is negative?

If the discriminant is negative, we will still have two roots, but they will be complex roots. In fact, we can write these two complex roots as r = α ± βi, with i the imaginary unit. Hence, the two complex roots are each other’s complex conjugate and our e^(r1·t) and e^(r2·t) can be written as:

e^(r1·t) = e^((α+βi)t) and e^(r2·t) = e^((α–βi)t)

Also, the general solution based on these two particular solutions will be c1·e^((α+βi)t) + c2·e^((α–βi)t).

[You may wonder why the complex roots have to be complex conjugates of each other. Indeed, that’s not so obvious from the raw r = (–b ± [b² – 4ac]^(1/2))/2a formula. But you can re-write it as r = –b/2a ± [b² – 4ac]^(1/2)/2a and, if b² – 4ac is negative, as r = –b/2a ± i·[(4ac − b²)^(1/2)/2a]. So that gives you the α and β and shows that the two roots are, in effect, each other’s complex conjugate.]

We should briefly pause here to think about what we are doing here really: if we allow r to be complex, then what we’re doing really is allow a complex-valued function (to be precise: we’re talking the complex exponential functions e^((α±βi)t), or any linear combination of the two) of a real variable (the time variable t) to be part of our ‘solution set’ as well.

Now, we’ve analyzed complex exponential functions before – a long time ago: you can check out some of my posts from last year (November 2013). In fact, we analyzed even more complex functions there – in fact, I should say more complicated rather than more complex here: complex numbers don’t need to be complicated! 🙂 – because we were talking complex-valued functions of complex variables there! That’s not the case here: the argument t (i.e. the input into our function) is real, not complex, but the output – or the function itself – is complex-valued. Now, any complex exponential e^((α+βi)t) can be written as e^(αt)·e^(iβt), and so that’s easy enough to understand:

1. The first factor (i.e. e^(αt)) is just a real-valued exponential function, and so we should be familiar with that. Depending on the value of α (negative or positive: see the graphs below), it’s a factor that will create an envelope for our function. Indeed, when α is negative, the damping will cause the oscillation to stop after a while. When α is positive, we’ll have a solution resembling the second graph below: we have an amplitude that’s getting bigger and bigger, despite the friction factor (that’s obviously possible only because we keep reinforcing the movement, so we’re not switching off the force in that case). When α is equal to zero, then e^(αt) is equal to unity and so the amplitude will not change as the spring goes up and down over time: we have no friction in that case.

[Graphs: a damped oscillation (negative α) and a growing oscillation (positive α), each with its exponential envelope – original images removed]

2. The second factor (i.e. e^(iβt)) is our periodic function. Indeed, e^(iβt) is just e^(iθ) with θ = βt, and so just remember Euler’s formula to see what it really is:

e^(iθ) = cos(θ) + i·sin(θ)

The two graphs below represent the idea: as the phase θ = ωt + Δ (the angular frequency or velocity times the time, plus or minus some phase shift) goes round and round and round (i.e. increases with time), the two components of e^(iθ), i.e. the real and imaginary part of e^(iθ), oscillate between –1 and 1 because they are both sinusoidal functions (a cosine and a sine respectively). Now, we could amplify the amplitude by putting another (real) factor in front (a magnitude different from 1) and write r·e^(iθ) = r·cos(θ) + i·r·sin(θ), but that wouldn’t change the nature of this thing.

[Graphs: e^(iθ) on the unit circle and its cosine and sine components as functions of θ – original images removed]

But so how does all of this relate to that other ‘general’ solution which we found for our oscillator, i.e. the one we got without considering these complex-valued exponential functions as solutions? Indeed, what’s the relation between that x = x0·cos(ω0t + Δ) equation and that rather frightening c1·e^((α+βi)t) + c2·e^((α–βi)t) equation? Perhaps we should look at x = x0·cos(ω0t + Δ) as the real part of that monster? Yes and no. More no than yes actually. Actually… No. We are not going to take some complex exponential and then just forget about the imaginary part. What we will do, though, is find that general solution – i.e. a family of complex-valued functions – but then we’ll only consider those functions for which the imaginary part is zero, so that’s the subset of real-valued functions only.

I guess this must sound like Chinese. Let’s go step by step.

Using complex roots to find real-valued functions

If we re-write d²x/dt² = –ω0²x in the more general a·x” + b·x’ + c·x = 0 form, then we get x” + ω0²x = 0, and so the discriminant b² – 4ac is equal to –4ω0², and so that’s a negative number. So we need to go for these complex roots. However, before solving this, let’s first restate what we’re actually doing. We have a differential equation that, ultimately, depends on a real variable (the time variable t), but so now we allow complex-valued functions e^(r1·t) = e^((α+βi)t) and e^(r2·t) = e^((α–βi)t) as solutions. To be precise: these are complex-valued functions x of the real variable t.

That being said, it’s fine to note that real numbers are a subset of the complex numbers, and so we can just shrug our shoulders and say all that we’re doing is switch to complex-valued functions because we got stuck with that negative discriminant and so we had to allow for complex roots. However, in the end, we do want a real-valued solution x(t). So our x(t) = c1·e^((α+βi)t) + c2·e^((α–βi)t) has to be a real-valued function, not a complex-valued function.

That means that we have to take a subset of the family of functions that we’ve found. In other words, the imaginary part of c1·e^((α+βi)t) + c2·e^((α–βi)t) has to be zero. How can it be zero? Well… It basically means that c1·e^((α+βi)t) and c2·e^((α–βi)t) have to be complex conjugates.

OK… But how do we do that? We need to find a way to write that c1·e^((α+βi)t) + c2·e^((α–βi)t) sum in a more manageable form, with its real and imaginary parts separated. We can do that by using Euler’s formula once again to re-write those two complex exponentials as follows:

  • e^((α+βi)t) = e^(αt)·e^(iβt) = e^(αt)·[cos(βt) + i·sin(βt)]
  • e^((α–βi)t) = e^(αt)·e^(–iβt) = e^(αt)·[cos(–βt) + i·sin(–βt)] = e^(αt)·[cos(βt) – i·sin(βt)]

Note that, for the e^((α–βi)t) expression, we’ve used the fact that cos(–θ) = cos(θ) and that sin(–θ) = –sin(θ). Also note that α and β are real numbers, so they do not have an imaginary part – unlike c1 and c2, which may or may not have an imaginary part (i.e. they could be pure real numbers, but they could be complex as well).

We can then re-write that c1·e^((α+βi)t) + c2·e^((α–βi)t) sum as:

c1·e^((α+βi)t) + c2·e^((α–βi)t) = c1·e^(αt)·[cos(βt) + i·sin(βt)] + c2·e^(αt)·[cos(βt) – i·sin(βt)]

= (c1 + c2)·e^(αt)·cos(βt) + (c1 – c2)·i·e^(αt)·sin(βt)

So what? Well, we want that imaginary part in our solution to disappear and so it’s easy to see that the imaginary part will indeed disappear if c1 – c2 = 0, i.e. if c1 = c2 = c. So we have a fairly general real-valued solution x(t) = 2c·e^(αt)·cos(βt) here, with c some real number. [Note that c has to be a real number because, if we would assume that c1 and c2 (and, therefore, c) would be equal complex numbers, then the c1 – c2 factor would still disappear, but then we would have a complex c1 + c2 sum in front of the e^(αt)·cos(βt) factor, and that would defeat the purpose of finding a real-valued function as a solution, because (c1 + c2)·e^(αt)·cos(βt) would still be complex! […] Are you still with me? :-)]

So, OK, we’ve got the solution and so that should be it, isn’t it? Well… No. Wait. Not yet. Because these coefficients c1 and c2 may be complex, there’s another solution as well. Look at that formula above. Let us suppose that c1 would be equal to some (real) number c divided by i (so c1 = c/i), and that c2 would be its opposite, so c2 = –c1 (i.e. minus c1). Then we would have two complex numbers consisting of an imaginary part only: c1 = c/i and c2 = –c1 = –c/i, and they would be each other’s complex conjugate. Indeed, note that 1/i = i^(–1) = –i and so we can write c1 = –c·i and c2 = c·i. Then we’d get the following for that c1·e^((α+βi)t) + c2·e^((α–βi)t) sum:

(c1 + c2)·e^(αt)·cos(βt) + (c1 – c2)·i·e^(αt)·sin(βt)

= (c/i – c/i)·e^(αt)·cos(βt) + (c/i + c/i)·i·e^(αt)·sin(βt) = 2c·e^(αt)·sin(βt)

So, while c1 and c2 are complex, our grand result is a real-valued function once again or – to be precise – another family of real-valued functions (that’s because c can take on any real value).

Are we done? Yes. There are no other possibilities. So now we just need to remember to apply the principle of superposition: any (real) linear combination of 2c·e^(αt)·cos(βt) and 2c·e^(αt)·sin(βt) will also be a (real-valued) solution, so the general (real-valued) solution for our problem is:

x(t) = a·2c·e^(αt)·cos(βt) + b·2c·e^(αt)·sin(βt) = A·e^(αt)·cos(βt) + B·e^(αt)·sin(βt)

= e^(αt)·[A·cos(βt) + B·sin(βt)]

So what do we have here? Well, the first factor is, once again, an ‘envelope’ function: depending on the value of α, (i) negative, (ii) positive or (iii) zero, we have an oscillation that (i) damps out, (ii) goes out of control, or (iii) keeps oscillating in the same steady way forever.

The second part is equivalent to our ‘general’ x(t) = x0·cos(ω0t + Δ) solution. Indeed, that x(t) = x0·cos(ω0t + Δ) solution is somewhat less ‘general’ than the one above because it does not have the e^(αt) factor. However, the x(t) = x0·cos(ω0t + Δ) solution is equivalent to the A·cos(βt) + B·sin(βt) factor. How’s that? We can show how they are related by using the trigonometric formula for adding angles: cos(u + v) = cos(u)·cos(v) – sin(u)·sin(v) (the u and v here have, of course, nothing to do with any of the other symbols we’ve been using). Indeed, we can write:

x0·cos(ω0t + Δ) = x0·cos(Δ)·cos(ω0t) – x0·sin(Δ)·sin(ω0t) = A·cos(βt) + B·sin(βt)

with A = x0·cos(Δ), B = –x0·sin(Δ) and, finally, β = ω0

Are you convinced now? If not… Well… Nothing much I can do, I feel. In that case, I can only encourage you to do a full ‘work-out’ by reading the excellent overview of all possible situations in Paul’s Online MathNotes (tutorial.math.lamar.edu/Classes/DE/Vibrations.aspx).
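If a numerical ‘work-out’ is more your thing, here’s a tiny sketch (the coefficients a, b, c and the constants A, B are made-up example values with a negative discriminant) which checks that e^(αt)·[A·cos(βt) + B·sin(βt)] does indeed satisfy a·x” + b·x’ + c·x = 0:

```python
import math

a, b, c = 1.0, 0.3, 4.0                         # made-up coefficients with b² - 4ac < 0
alpha = -b / (2 * a)                            # real part of the complex roots
beta = math.sqrt(4 * a * c - b * b) / (2 * a)   # imaginary part of the complex roots
A, B = 1.0, 0.5                                 # arbitrary example coefficients

x = lambda t: math.exp(alpha * t) * (A * math.cos(beta * t) + B * math.sin(beta * t))

# Finite-difference check that a·x'' + b·x' + c·x = 0 at an arbitrary time:
t, h = 1.7, 1e-5
dx  = (x(t + h) - x(t - h)) / (2 * h)
d2x = (x(t + h) - 2 * x(t) + x(t - h)) / h**2
print(a * d2x + b * dx + c * x(t))              # ≈ 0 (up to rounding noise)
```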

Feynman’s treatment of second-order differential equations

Feynman takes a somewhat different approach in his Lectures. He solves these equations in a much more general way. At first, I thought his treatment was too confusing and, hence, I would not have mentioned it. However, I like the logic behind it, even if his approach is somewhat more messy in terms of notations and all that. Let’s first look at the differential equation once again. Let’s take a system with a friction force that’s proportional to the speed: Ff = –c·dx/dt. [See my previous post for some comments on that assumption: the assumption is, generally speaking, too much of a simplification, but it makes for a ‘nice’ linear equation and so that’s why physicists present it that way.] To ease the math, c is usually written as c = mγ. Hence, γ = c/m is the friction per unit of mass. That makes sense, I’d think. In addition, we need to remember that ω0² = k/m, so k = mω0². Our differential equation then becomes m·d²x/dt² = –γm·dx/dt – kx (mass times acceleration is the sum of the forces) or m·d²x/dt² + γm·dx/dt + mω0²·x = 0. Dividing the mass factor away gives us an even simpler form:

d²x/dt² + γ·dx/dt + ω0²·x = 0

You’ll remember this differential equation from the previous post: we used it to calculate the (stored) energy and the Q of a mechanical oscillator. However, we didn’t show you how. You now understand why: the stuff above is not easy–the length of the arguments involved is why I am devoting an entire post to it!

Now, instead of assuming some exponential e^(rt) as a solution, real- or complex-valued, Feynman assumes a much more general complex-valued function as the solution: he substitutes x = A·e^(iαt), with A a complex number as well, so we can write A as A = A0·e^(iΔ). That more general assumption allows for the inclusion of a phase shift straight from the start. Indeed, we can write x as x = A0·e^(iΔ)·e^(iαt) = A0·e^(i(αt+Δ)). Does that look complicated? It probably does, because we also have to remember that α is a complex number! So we’ve got a very general complex-valued exponential function indeed here!

However, let’s not get ahead of ourselves and follow Feynman. So he plugs in that complex-valued x = A·e^(iαt) and we get:

(–α² + iγα + ω0²)·A·e^(iαt) = 0

So far, so good. The logic now is more or less the same as the logic we developed above. We’ve got two factors here: (1) a quadratic expression –α² + iγα + ω0² (with one complex coefficient iγ) and (2) a complex exponential function A·e^(iαt). The second factor (A·e^(iαt)) cannot be zero, because that’s x and we assume our oscillator is not standing still. So it’s the first factor (i.e. the quadratic expression in α with the complex coefficient iγ) which has to be zero. So we solve for the roots α and find

α = [–iγ ± ((iγ)² + 4ω0²)^(1/2)]/(–2) = iγ/2 ± (4ω0² – γ²)^(1/2)/2

= iγ/2 ± (ω0² – γ²/4)^(1/2) = iγ/2 ± ωγ

[We get the last expressions by working out (iγ)² = –γ² and bringing the factor 2 inside of the square root. It’s not very straightforward, but you should be able to figure it out.]

So that’s an interesting expression: the imaginary part of α is γ/2 and its real part is (ω0² – γ²/4)^(1/2), which we denoted as ωγ in the expression above. [Note that we assume there’s no problem with the square root expression: γ²/4 should be smaller than ω0², so ωγ is supposed to be some real positive number.] And so we’ve got the two solutions x1 and x2:

x1 = A·e^(i(iγ/2 + ωγ)t) = A·e^(–γt/2 + iωγt) = A·e^(–γt/2)·e^(iωγt)

x2 = B·e^(i(iγ/2 – ωγ)t) = B·e^(–γt/2 – iωγt) = B·e^(–γt/2)·e^(–iωγt)

Note, once again, that A and B can be any (complex) number and that, because of the principle of superposition, any linear combination of these two solutions will also be a solution. So the general solution is

x = A·e^(–γt/2)·e^(iωγt) + B·e^(–γt/2)·e^(–iωγt) = e^(–γt/2)·(A·e^(iωγt) + B·e^(–iωγt))

Now, we recognize the shape of this: a (real-valued) envelope function e^(–γt/2) and then a linear combination of two exponentials. But so we want something real-valued in the end so, once again, we need to impose the condition that A·e^(iωγt) and B·e^(–iωγt) are complex conjugates of each other. Now, we can see that e^(iωγt) and e^(–iωγt) are complex conjugates, but what does this say about A and B? Well… The complex conjugate of a product is the product of the complex conjugates of the factors involved: (z1z2)* = (z1*)(z2*). That implies that B has to be the complex conjugate of A: B = A*. So the final (real-valued) solution becomes:

x = e^(–γt/2)·(A·e^(iωγt) + A*·e^(–iωγt))

Now, I’ll leave it to you to prove that the second factor in the product above (A·e^(iωγt) + A*·e^(–iωγt)) is a real-valued function of the real variable t. Writing A as A0·e^(iΔ), it works out to 2A0·cos(ωγt + Δ) = 2A0·cos(Δ)·cos(ωγt) – 2A0·sin(Δ)·sin(ωγt), and that gives you a graph like the one below. However, I can readily imagine that, by now, you’re just thinking: Oh well… Whatever! 🙂

[Graph: the transient x = e^(–γt/2)·cos(ωγt) and its exponential envelope – original image removed]
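If you don’t feel like doing that little proof with pen and paper, a numerical check is easy enough. With made-up values γ = 0.3 and ω0 = 2, and some arbitrary complex A = A0·e^(iΔ), the imaginary part of A·e^(iωγt) + A*·e^(–iωγt) vanishes and the sum equals 2A0·cos(ωγt + Δ):

```python
import cmath, math

gamma, w0 = 0.3, 2.0                          # made-up example values
w_g = math.sqrt(w0**2 - gamma**2 / 4)         # ωγ = (ω0² - γ²/4)^(1/2)
A0, delta = 0.8, 0.5                          # A = A0·e^(iΔ), also made-up
A = A0 * cmath.exp(1j * delta)

for t in (0.0, 0.9, 2.4):
    z = A * cmath.exp(1j * w_g * t) + A.conjugate() * cmath.exp(-1j * w_g * t)
    print(round(z.imag, 12),                                       # 0.0: the sum is real-valued
          round(z.real - 2 * A0 * math.cos(w_g * t + delta), 12))  # 0.0: equals 2A0·cos(ωγt + Δ)
```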

So the difference between Feynman’s approach and the one I presented above (which is the one you’ll find in most textbooks) lies in the assumed form of the solution: instead of substituting e^(rt) for x and allowing r to take on complex values, Feynman substitutes A·e^(iαt) for x, and allows both A and α to take on complex values. It makes the calculations more complicated but, when everything is said and done, I think Feynman’s approach is more consistent because it is more encompassing. However, that’s a matter of taste, and I gather, from comments on the Web, that many people think that this chapter in Feynman’s Lectures is not his best. So… Well… I’ll leave it to you to make the final judgment.

Note: The one critique that is relevant, in regard to Feynman’s treatment of the matter, is that he devotes quite a bit of time and space to explaining how these oscillatory or periodic displacements can be viewed as the real part of a complex exponential. Indeed, cos(ωt) is the real part of e^(iωt). But that’s something different from (1) expanding the realm of possible solutions to a second-order differential equation from real-valued functions to complex-valued functions in order to (2) then, once we’ve found the general solution, consider only real-valued functions once again as ‘allowable’ solutions to that equation. I think that’s the gist of the matter really. It took me a while to fully ‘get’ this. I hope this post helps you to understand it somewhat quicker than I did. 🙂

Conclusion

I guess the only thing that I should do now is to work out some examples. However, I’ll refer you to Paul’s Online Math Notes for that once again (see the reference above). Indeed, it is about time I end my rather lengthy exposé (three posts on the same topic!) on oscillators and resonance. I hope you enjoyed it, although I can readily imagine that it’s hard to appreciate the math involved.

It is not easy indeed: I actually struggled with it, despite the fact that I think I understand complex analysis somewhat. However, the good thing is that, once we’re through it, we can really solve a lot of problems. As Feynman notes: “Linear (differential) equations are so important that perhaps fifty percent of the time we are solving linear equations in physics and engineering.” So, bearing that in mind, we should move on to the next topic.

Some content on this page was disabled on June 16, 2020 as a result of a DMCA takedown notice from The California Institute of Technology. You can learn more about the DMCA here:

https://wordpress.com/support/copyright-and-the-dmca/

The electric oscillator

Pre-scriptum (dated 26 June 2020): This post – part of a series of rather simple posts on elementary math and physics – has suffered only a little bit from the attack by the dark force—which is good because I still like it. One illustration seems to have been removed because of perceived ‘unfair use’, but you will be able to google equivalent stuff. While my views on the true nature of light, matter and the force or forces that act on them have evolved significantly as part of my explorations of a more realist (classical) explanation of quantum mechanics, I think most (if not all) of the analysis in this post remains valid and fun to read. In fact, I would dare to say the whole Universe is all about resonance phenomena!

Original post:

My previous post was too short to do justice to the topic (resonance phenomena). That’s why I’ll approach the topic using the relatively easy example of an electric oscillator. In addition, in this post I’ll also talk about the Q of an oscillator and the concept of a transient.

[…] Oh… Well… I admit there’s no real reason to write this post. It’s not essential – or not as essential as understanding something about complex numbers, for example. In fact, I admit the reason for writing this post is entirely emotional: my father was a rather distant figure, and we never got along, I guess, although he did patch things up near the end of his life–but I realize now, at the age of 45 (so that’s the age I associate with him), that we have a lot in common, including this desire to catch up with things physical and mathematical. He would not have been able to read a lot of what I am writing about in this blog, because he had gone to school only until 18 and, hence, differential equations and complex numbers must have frightened him more than they frighten me. In fact, even then, he actually might have understood something about differential equations, and perhaps something about complex numbers too. I don’t know. I should try to find the books he read. In any case, he surely did not have much of a clue about relativity theory or so. That being said, he sure knew a lot more about electric circuits than I ever will, and I guess that’s the real reason why I want to do a post on the electric oscillator here.

My father knew everything about electric motors, for example. Single-phase, split-phase, three-phase; synchronous or asynchronous; with two, four, six or eight poles; wound rotors or squirrel-cage rotors; centrifugal switches, capacitors… Electric motors (and engines in general) had no secrets for him. While I would understand the basic principle of the electric motor (he actually helped me build a little one – just using copper wire, a horseshoe magnet, a huge nail, and a piece of iron – to demonstrate in school), I had difficulty with the huge number of wires coming out of these things. [We had plenty of motors, because my father would bring old washing machines home to get the parts out.] Part of the problem was that he would never take the time to explain to me how the capacitor that one needs to start a single-phase motor actually works.

Now I know, because I looked it up: single-phase electric (induction) motors have an auxiliary winding because, on their own, they have no starting torque. The magnetic field does not rotate: it just pulsates between 0 and 180 degrees and, hence, the rotor doesn’t know in which direction to go and, hence, if there’s no fuse to protect it, the wiring will start burning. [To explain why the wiring does not get (too) hot when the rotor is actually rotating is another story – which I won’t tell here because it involves changing electric and magnetic fields and so that’s a bit more complicated.] So now I have a bit more of an inkling of why there are so many wires coming out of a simple (single-phase) electric motor:

  • We have wires coming from the rotor (or, to be precise, from the carbon brushes). [Not always though: a lot of those old electric motors had so-called squirrel-cage rotors, instead of wound rotors.]
  • We have wires going to the ‘run’ or ‘main’ winding in the stator (i.e. the stationary part of the motor).
  • We have wires going to the ‘start’ or ‘auxiliary’ winding. In fact, with single phase, the ‘run’ and ‘start’ winding will share one common ‘end’ and so there will be three wires only: usually black, brown and blue in Europe and, to make things complicated, the same wires will usually be red, yellow and black in the US. 🙂
  • We have wires coming from the capacitor and, most probably, also from some fuse somewhere, and then there’s a centrifugal switch to switch the auxiliary winding off once the motor is running, so that’s one or two more wires.
  • And then we also need to control the speed of the motor and so that implies even more wires and little control boxes.

Phew! Things become complicated here. The primitive way to change the speed of a single-phase motor is to change the number of poles. How? Well… We can separate the windings and/or place taps in-between. In short, more wires. A motor with two poles only will run at 3000 rpm when supplied with 50 Hz power, but we can also have 4, 6, 8 and more poles. More poles means a slower speed. For example, if we switch to 10 poles, then the motor will run at 600 rpm (yes, 10/2 = 3000/600 = 5, so it’s the same factor). However, changing the number of poles while the motor is running is rather impractical so, in practice, speed control is done through a device referred to as a variable frequency drive (VFD). But so my father would just cut the wires and we’d end up with a motor running at one speed only – not very handy because these things spin incredibly fast – and with too many wires.

I have to admire him for making sense of all those wires. He would do so by measuring the resistance of all the circuits. So he’d just pick two wires and measure the resistance from one end to the other. For example, the main winding has less resistance – typically less than 5 Ω (ohm) – than the auxiliary winding – typically 10 to 20 Ω (ohm). Why? The wiring used to run the motor will typically be thicker and, hence, offer less resistance. With a bit of knowledge like that, he’d figure out the wiring in no time, while I would just sit and stare and wonder how he did it.

In any case, let me explain here what I would have liked my father to explain to me, and that’s the components of an electric circuit, and how an electric oscillator works–more or less at least.

The electric oscillator

In an electric circuit, we can have passive and active elements. An example of an active element would be a generator. That’s not passive. So what’s passive?

First, we have a resistor. A resistor is any piece of some substance through which we have some current flowing and which offers resistance to that flow of electric current. What does that mean? The resistance (denoted by the symbol R) will determine the flow of current (I) through the circuit as a function of the potential difference (i.e. the voltage) V across it. In fact, the resistance is defined as the factor of proportionality between V and I. So that’s Ohm’s Law really:

V = RI = R(dq/dt)

As for the current (I) being equal to I = dq/dt, that’s the definition of electric current itself: a current transports electric charge through a wire, so we can measure the current at any point in the circuit as the time-rate of change dq/dt. Current is measured in coulomb per second, i.e. in amperes. One ampere amounts to 6.241×10^18 unit charges (electrons) – i.e. one coulomb – passing through the wire per second, so 1 A = 1 C/s.

As for voltage, we’ve encountered that in previous posts. It’s a difference in potential indeed. Potential is that scalar number Φ which we associated with the potential energy U of a particle with charge q: Φ = U/q. So it’s like the potential energy of the unit charge, and we calculated it by using the electric field vector E to calculate the amount of work we needed to do to bring a unit charge to some point r: Φ(r) = –∫E·ds (the minus sign is there because we’re doing work against the electromagnetic force). We’ve actually calculated the difference in potential, or the voltage (difference), for something that’s called a capacitor: two parallel plates with a lack of electrons on one, and too many on the other (see below). As a result, there’s a strong electric field between both, and a difference in potential, and we’ve calculated the voltage as V = ΔΦ = σd/ε0 = qd/(ε0A), with d the plate separation (i.e. the distance between the two plates), σ the (surface) charge per unit area, ε0 the electric constant and A the area of the plates. So it’s like a battery… For now at least – I’ll correct this statement later.

[Illustration: a charged parallel-plate capacitor – original image removed]
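Just to put some numbers on that formula – the plate area, separation and charge below are made up purely for illustration:

```python
eps0 = 8.854e-12     # the electric constant, in F/m
A = 0.01             # plate area in m² (made-up value)
d = 1e-3             # plate separation in m (made-up value)
q = 10e-9            # charge on the plates in C (made-up value)

sigma = q / A                 # surface charge density σ
V = sigma * d / eps0          # V = σ·d/ε0 = q·d/(ε0·A)
C = eps0 * A / d              # the capacitance, so V = q/C gives the same number
print(V, q / C)               # both ≈ 112.9 volt
```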

If we connect the two plates with a wire, i.e. a conductor, then we’ll have a current. Increasing the resistance of the circuit, by putting a resistor in, for example, will reduce the current and, hence, save the battery life somewhat. Of course, the resistor could be something that actually does work for us, a lamp, for example, or an electric motor.

Let me now correct that statement about a capacitor being like a battery. That statement is true and not true–but I must immediately add that it’s much more not true than true. 🙂 A battery is an active circuit element: it generates a voltage because of a chemical reaction that drives electrons through the circuit, and it will continue to provide power until all the reagents have been used up and the chemical reaction stops. In contrast, a capacitor is not active. There is a voltage only because charge has been stored on it (or, to be precise, because charges have been separated on it). Hence, when you connect the capacitor to a passive circuit,  the current will only flow until all of the charge has been drained. So there’s no active element. Also, unlike a battery, the voltage on a capacitor is variable: it’s proportional to the amount of charge stored on it.

OK. So we’ve got a resistor, a capacitor and a voltage source, e.g. a battery but, because we want to look at resonance phenomena, we’ll not have a battery but a voltage source that drives the circuit with a nice sine wave oscillation. Why a sine wave? Well… First, it makes the mathematical analysis easier (we’ll have second-order differential equations again, and so d²[cos(t)]/dt² = –cos(t) and so that’s nice). Second, the AC current that comes into our houses is a nice sine wave indeed. So let’s put it all together now, including our AC generator (instead of a battery). The circuit can then be represented as follows:

[Diagram: a series circuit with a resistor R, a capacitor C, a self-inductance L and an AC voltage source – original image removed]

In this circuit, the charge q on the capacitor is analogous to the displacement x of the mass on that oscillating spring we analyzed in the previous post. Likewise:

  • I = dq/dt is analogous to the velocity v = dx/dt
  • The resistance R is analogous to the resistive coefficient c = mγ
  • From our formula V = ΔΦ = σd/ε0, it is easy to see that V is proportional to the charge q: V = q/C, with 1/C the factor of proportionality (C itself is known as the capacitance of the capacitor). In other words, 1/C is analogous to the spring constant k.

But we’re missing something: what’s the analogue of the mass or inertia factor in this circuit? Well… There’s one passive element in this circuit which we haven’t explained as yet: the self-inductance L. The phenomenon of self-inductance is the following: a changing electric current in a coil builds up a changing magnetic field, and that induces a voltage that opposes the change in the primary current. So it resists the change in current and, as such, it’s analogous to mass indeed. The illustration below explains how it works. I’ve also inserted a diagram showing how transformers work, because that’s based on the same principle of changing currents inducing changing magnetic fields that, in turn, generate another current. What’s going on in transformers is referred to as mutual inductance and note, indeed, that it doesn’t work with DC (i.e. steady) current.

[Illustrations: self-inductance in a coil, and the mutual inductance between the primary and secondary windings of a transformer – original images removed]

Now, I know that’s not all that easy to understand, but I should limit myself here to just giving the formula: the induced voltage in such a coil is proportional to the time-rate of change of the current I = dq/dt. So we have a second-order derivative here:

V = L·dI/dt = L·(d²q/dt²)

So now we’re finally ready to put it all together. In that ‘basic electric circuit’ above, we’ve got the three passive circuit elements – resistor, capacitor and self-inductance – connected in series, and so then we apply a sine wave voltage to the whole circuit. Of course, all the voltages – i.e. over the resistor, over the capacitor, and over the self-inductance – must add up to the total voltage we apply to the circuit (which we’ll denote by V(t), as it’s a changing voltage), taking into account their sign. We have: VR = R·I = R·(dq/dt); VC = q/C; and VL = L·(dI/dt) = L·(d²q/dt²). Hence, we get:

L·(d²q/dt²) + R·(dq/dt) + q/C = V(t)

This is, once again, a differential equation of the second order, and its mathematical form is the same as that of the equation for the oscillating spring (with a driving force and damping). [I repeat that equation below, in the section on the Q and the energy of an oscillator, so you don’t need to scroll too far.] So the solution is going to be the same, and we’re going to have resonance if the angular frequency ω of our sine wave (i.e. the AC voltage generated by our generator) is close or equal to some kind of natural frequency characterizing the circuit. So what is that natural frequency ω0? Well… Just like ω0² was equal to k/m for our mechanical oscillator, we here get the grand result that ω0² = 1/(LC), and our friction parameter γ corresponds to R/L.
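To make that result concrete, here’s a tiny sketch with made-up component values (1 mH, 10 Ω and 100 nF – purely illustrative):

```python
import math

L = 1e-3      # self-inductance in henry (made-up value)
R = 10.0      # resistance in ohm (made-up value)
C = 100e-9    # capacitance in farad (made-up value)

w0 = 1 / math.sqrt(L * C)       # natural (angular) frequency: ω0² = 1/(LC)
gamma = R / L                   # the 'friction' parameter γ = R/L
print(w0, w0 / (2 * math.pi))   # 100000 rad/s, i.e. about 15.9 kHz
print(gamma)                    # 10000 per second
```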

The Q and the energy of an oscillator

There’s another point I did not develop in my previous post, and that was the energy of an oscillator. To explain that, we’ll take the example of our mechanical spring once again. The equation for that one was:

m·(d²x/dt²) + γm·(dx/dt) + mω0²·x = F(t)

Now, from my posts on energy concepts, you’ll remember that a force does work, and that the work done is the product of the force and the displacement (i.e. the distance over which the force is doing work). Work done is energy, potential or kinetic (one gets converted into the other). In addition, you may or may not remember that the work done per second gives us the power, so the concept of power relates energy to time, rather than distance.

For infinitesimal quantities (i.e. using differentials), we can write that the differential work done in a time dt is equal to F·dx. The power that’s expended by the force is then F·dx/dt, so that turns out to be the product of the force and the velocity (dx/dt = v): P = F·v. Now, if we substitute the left-hand side of that differential equation above for F, and re-arrange the terms a bit, we get a fairly monstrous-looking equation:

P = F·(dx/dt) = m·[(d²x/dt²)·(dx/dt) + ω0²·x·(dx/dt)] + γm·(dx/dt)²

Now it turns out that we can write the first part of this monstrous equation – i.e. the m·[…] term – as d/dt[m·(dx/dt)²/2 + mω0²·x²/2]. So we have a time derivative here of a sum of two terms we recognize: the first is the kinetic energy (mv²/2) and the second (mω0²·x²/2) is the potential energy of the spring. [I would need to show that to you, but I hope you believe me here.] Both of them taken together are the energy that’s stored in the oscillation, i.e. the stored energy. Now, in the long run, the driving force will not add any more energy to this quantity (the spring will oscillate back and forth, but we’ll have stable motion and that’s it really). In other words, this derivative must be zero on average.

But so that driving force continues to do work, and so the power must go somewhere. Where? It must all go to that other term: γm·(dx/dt)². What is that term? Well… It’s the energy that gets lost in friction: these are so-called resistive losses, and they usually get dissipated through heating. Hence, what happens is that most of the power of an external force is first used to build up the oscillation, thereby storing energy in the oscillator, but, once that’s done, the system only needs a little bit of energy to compensate for the heating (resistive) losses. Now the interesting thing is to calculate how much energy an oscillator can store. We can calculate that as follows:

  • The energy carried by a physical wave is proportional to the square of its amplitude: E ∝ A². Now, if it is a sinusoidal wave, we’ll need to take the average of the square of a sine or cosine function. Because sin²x + cos²x = 1, and because sin²x and cos²x are the same function except for a phase difference of π/2 (so they must have the same average), we can see that the average value of both functions must be 0.5 = 1/2. Hence, for any function A·cos(x), the average value of the squared amplitude will be A²/2.
  • From your statistics classes, you may also remember that the mean of a product of a variable and some constant (e.g. γm·(dx/dt)²) will be equal to the product of that constant and the mean of the variable. So we can write 〈γm·(dx/dt)²〉 = γm·〈(dx/dt)²〉. Now, taking into account that the solution x of the differential equation is a cosine function x = x0·cos(ωt + Δ), its derivative will also be a sinusoidal function, but with ω in the amplitude as well. To make a long story short, 〈(dx/dt)²〉 is equal to ω²x0²/2, and so we can write 〈γm·(dx/dt)²〉 = γmω²x0²/2.
  • So the expression above gives the average power being absorbed by the oscillator on a permanent basis, and we’ll denote that by 〈P〉 = γmω²x0²/2. How much energy is stored?
  • Now that we’ve calculated 〈(dx/dt)²〉, we can calculate that too. We’ll denote it by 〈E〉, and so 〈E〉 = 〈m·(dx/dt)²/2 + mω0²·x²/2〉 = (1/2)m·〈(dx/dt)²〉 + (1/2)mω0²·〈x²〉 = m(ω² + ω0²)x0²/4. So what? Well… From the previous chapter, we know that x0 becomes very large if ω is near to ω0 (that’s what resonance is all about) and, hence, the stored energy will be quite large in that case. So the point is that we can get a large stored energy from a relatively small force, which is what you’d expect. [A short numerical check of these averages follows right after this list.]
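Here’s that quick check – the values of m, γ, ω, ω0 and x0 are made up, and the 1/2 average is approximated by brute force:

```python
import math

# The average of cos² over a full cycle is indeed 1/2:
N = 100000
avg_cos2 = sum(math.cos(2 * math.pi * k / N) ** 2 for k in range(N)) / N
print(avg_cos2)                                    # ≈ 0.5

m, gamma, w, w0, x0 = 1.0, 0.1, 2.0, 2.2, 0.5      # made-up example values
P_avg = gamma * m * w**2 * x0**2 / 2               # 〈P〉: average power absorbed
E_avg = m * (w**2 + w0**2) * x0**2 / 4             # 〈E〉: average stored energy
print(P_avg, E_avg)                                # 0.05 and ≈ 0.5525
print(w * E_avg / P_avg, (w**2 + w0**2) / (2 * gamma * w))   # both ≈ 22.1: the Q discussed below
```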

Now, the last thing I need to explain is the Q of an oscillator. The Q of an oscillator compares the stored energy with the amount of work that is done per cycle, multiplied by 2π for some historical reason I don’t understand too well:

Q = 2π·〈E〉/[〈P〉·2π/ω] = ω·〈E〉/〈P〉 = (ω² + ω0²)/(2γω)

Note that 2π/ω is the period, i.e. the time T0 that is needed to go through one cycle of the oscillation. As mentioned above, I am not sure about that 2π factor but it doesn’t matter too much: it’s just a constant and so we could divide by 2π and the result would not be substantially different: the Q is a relative number obviously, used to compare the efficiency of various oscillators when it comes to storing energy. Indeed, Q stands for quality: higher Q indicates a lower rate of energy loss relative to the stored energy of the resonator. So it implies that you do not need a lot of power to keep the oscillation going and, if the external driving force stops, that the oscillations will die out much more slowly. For example, a pendulum on a high-quality bearing, oscillating in air, will have a high Q, while a pendulum immersed in oil will have a low one.

But let me go back to the electric oscillator: we substitute L for m, R for mγ, and 1/C for mω0², and then we can see that, for ω = ω0 (so we calculate the Q at resonance), we find that Q = Lω0/R, with ω0 the resonance frequency. Again, a circuit with a high Q means that the circuit can store a very large amount of energy as compared to the work done per cycle of the voltage driving the oscillation.
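With the same made-up component values as above (1 mH, 10 Ω, 100 nF), that works out as follows:

```python
import math

L, R, C = 1e-3, 10.0, 100e-9    # same made-up values as above
w0 = 1 / math.sqrt(L * C)
Q = L * w0 / R                  # Q at resonance = ω0·L/R = ω0/γ
print(Q)                        # 10.0
```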

An application of the Q: transients

Throughout this and my previous posts, I’ve managed to skirt around a more rigorous (i.e. mathematical) treatment of the subject-matter by not actually solving these second-order differential equations. So I won’t suddenly change tack and try to do that now. So this will, once again, be a rather intuitive approach. If you’d want a formal treatment, let me refer you to Paul’s Online Math Notes and, more in particular, the chapter on second-order DEs, which he wraps up with an overview of all differential equations you could possibly encounter when analyzing mechanical springs. But so here we go for the informal approach.

Above, we noted that the Q of a system is the ratio of (1) the stored energy (E) and (2) the work done per cycle, multiplied by 2π. Now, if we’d suddenly switch off the force, then no more work is being done, but the system will lose energy. Let’s suppose we have a system – an oscillating mechanical spring – for which we have a Q equal to 1000·2π, so we have Q/2π = 1000. So that means that the work done per cycle – when that driving force is still on – is one thousandth of its total energy. Hence, it’s not unreasonable to suggest that such a system would also lose one thousandth of its total energy per cycle if we would just switch off the force and let go of it. Writing that assumption in terms of differential changes yields the following simple (first-order) differential equation:

dE/dt = –ωE/Q

Huh? Yes. Just think about it. A differential dE is associated with a differential dt. Now, the number of radians that the phase goes through during the infinitesimally short time interval dt is ω·dt, and losing a fraction 2π/Q of the energy per cycle (i.e. per 2π radians) amounts to losing a fraction 1/Q per radian. So the change in energy must be equal to dE = –ω·dt·(E/Q) (the minus sign is there because we’re talking an energy loss obviously). So that gives us the equation above.

But what about ω? Well… If we just let that oscillator do what we would expect it to do, then it is not unreasonable to assume it would oscillate at its natural frequency. Hence, ω is likely to equal ω0. Combining these two assumptions (i.e. that differential equation above and the ω = ω0 assumption) gives us the following formula for E:

E = E0·e^(–ω0t/Q) = E0·e^(–γt)

[Note that γ is the same friction coefficient as before: Q = (ω² + ω0²)/(2γω) and, hence, if ω = ω0, then we get ω0/Q = γ indeed.]

Now, the energy goes as the square of the amplitude of the oscillation (i.e. the displacement x), so we would expect to find the square root of that e^(–γt) factor in the solution for x, so that’s an e^(–γt/2) factor. If we’d formally solve it, we’d find the following solution for x indeed:

x = A0·e^(–γt/2)·cos(ω0t + Δ)

The diagram below shows the envelope curve e^(–γt/2) as well as the x = e^(–γt/2)·cos(ω0t) curve (A0 and Δ depend on the initial conditions obviously). So that’s what’s called a transient: a solution of the differential equation when there is no force present.

[Graph: the transient x = e^(–γt/2)·cos(ω0t) and its envelope e^(–γt/2) – original image removed]
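To see such a transient numerically – γ = 0.1, ω0 = 2, A0 = 1 and Δ = 0 are, once again, made-up values – you can tabulate the envelope, the displacement and the energy ratio E/E0 = e^(–γt):

```python
import math

gamma, w0, A0, delta = 0.1, 2.0, 1.0, 0.0          # made-up example values

for t in range(0, 60, 10):
    envelope = A0 * math.exp(-gamma * t / 2)
    x = envelope * math.cos(w0 * t + delta)
    energy_ratio = math.exp(-gamma * t)             # stored energy relative to E0
    print(t, round(envelope, 4), round(x, 4), round(energy_ratio, 4))

# After t = 10/γ = 100, only e^(-10), i.e. about 0.005%, of the initial energy is left.
```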

Now, I could bombard you with even more equations, more concepts (like the concept of impedance indeed), but I won’t do that here. I hope this post managed to get the most important ideas across and, hence, I’ll conclude this mini-series (i.e. two successive posts) on resonance. As for my next post, I may be tempted to treat the topic of second-order differential equations more formally, that is from a purely mathematical perspective. But let’s see. 🙂

Post scriptum:

The idea of applying only a little bit of power to build up a large amount of stored energy may trigger some thoughts on how a photo flash works and, if so, you're right: a photo flash uses both a transformer (to step up the voltage) and an oscillator circuit to store up energy. You can find the details on the Web. See, for example, http://electronics.howstuffworks.com/camera-flash3.htm 🙂

Some content on this page was disabled on June 16, 2020 as a result of a DMCA takedown notice from The California Institute of Technology. You can learn more about the DMCA here:

https://wordpress.com/support/copyright-and-the-dmca/

Resonance phenomena

Pre-scriptum (dated 26 June 2020): This post – part of a series of rather simple posts on elementary math and physics – has suffered only a little bit from the attack by the dark force—which is good because I still like it. A few illustrations were removed because of perceived ‘unfair use’, but you will be able to google equivalent stuff. While my views on the true nature of light, matter and the force or forces that act on them have evolved significantly as part of my explorations of a more realist (classical) explanation of quantum mechanics, I think most (if not all) of the analysis in this post remains valid and fun to read. In fact, I would dare to say the whole Universe is all about resonance!

Original post:

One of the most common behaviors of physical systems is the phenomenon of resonance: a body (not only a tuning fork but any body really, such as a body of water – the ocean, for example) or a system (e.g. an electric circuit) will have a so-called natural frequency, and an external driving force will cause it to oscillate. How it will behave, then, can be modeled using a simple differential equation, and the so-called resonance curve will usually look the same, regardless of what we are looking at. Besides the standard example of an electric circuit consisting of (i) a capacitor, (ii) a resistor and (iii) an inductor, Feynman also gives the following non-standard examples:

1. When the Earth’s atmosphere was disturbed as a result of the Krakatoa volcano explosion in 1883, it resonated at its own natural frequency, and its period was measured to be 10 hours and 20 minutes.

[In case you wonder how one can measure that: an explosion such as that one creates all kinds of waves, but the so-called infrasonic waves are the ones we are talking about here. They circled the globe at least seven times, shattering windows hundreds of miles away. They were not only felt locally, but they were also recorded worldwide, and that's how they could be measured a second, third, etc. time. How? There was no wind or anything like that, but the infrasonic waves (i.e. 'sounds' beneath the lowest limit of human hearing (about 16 or 17 Hz), down to 0.001 Hz) of such an oscillation cause minute changes in the atmospheric pressure which can be measured by microbarometers. So the 'ringing' of the atmosphere was measurable indeed. A nice article on infrasound waves is journal.borderlands.com/1997/infrasound. Of course, the surface of the Earth was 'ringing' as well, and such seismic shocks then produce tsunami waves, which can also be analyzed in terms of natural frequencies.]

2. Crystals can be made to oscillate in response to a changing external electric field, and this crystal resonance phenomenon is used in quartz clocks: the quartz crystal resonator in a basic quartz wristwatch is usually in the shape of a very small tuning fork. Literally: there's a tiny tuning fork in your wristwatch, made of quartz, that has been laser-trimmed to vibrate at exactly 32,768 Hz, i.e. 2¹⁵ cycles per second (see the little division sketch after this list).

3. Some quantum-mechanical phenomena can be analyzed in terms of resonance as well, but then it’s the energy of the interfering particles that assumes the role of the frequency of the external driving force when analyzing the response of the system. Feynman gives the example of gamma radiation from lithium as a function of the energy of protons bombarding the lithium nuclei to provoke the reaction. Indeed, when graphing the intensity of the gamma radiation emitted as a function of the energy, one also gets a resonance curve, as shown below. [Don’t you just love the fact it’s so old? A Physical Review article of 1948! There’s older stuff as well, because this journal actually started in 1893.]

[Illustration removed: resonance curve of the gamma-ray intensity as a function of the energy of the protons bombarding the lithium nuclei (Physical Review, 1948).]
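As a tiny aside to the quartz-clock example above (this is just an illustration of the arithmetic, not a description of any particular watch chip): the whole point of picking 32,768 Hz is that fifteen successive divide-by-two stages bring it down to exactly one tick per second.

```python
# Toy illustration: halving the 32,768 Hz crystal frequency fifteen times yields 1 Hz.
f = 32_768          # Hz, the crystal's resonant frequency (2**15)
for stage in range(15):
    f //= 2         # each flip-flop stage halves the frequency
print(f)            # prints 1, i.e. one pulse per second
```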

However, let us analyze the phenomenon first in its most classical appearance: an oscillating spring.

Basics

We’ve seen the equation for an oscillating spring before. From a math point of view, it’s a differential equation (because one of the terms is a derivative of the dependent variable x) of the second order (because the derivative involved is of the second order):

m(d²x/dt²) = –kx

What’s written here is simply Newton’s Law: the force is –kx (the minus sign is there because the force is directed opposite to the displacement from the equilibrium position), and the force has to equal the oscillating mass on the spring times its acceleration: F = ma.

Now, this can be written as d²x/dt² = –(k/m)x = –ω₀²x, with ω₀² = k/m. This ω₀ symbol uses the Greek omega once again, which we used for the angular velocity of a rotating body. While we do not have anything that's rotating here, ω₀ is still an angular velocity or, to be more precise, an angular frequency. Indeed, the solution to the differential equation above is

x = x₀cos(ω₀t + Δ)

The x₀ factor is the maximum amplitude and that's, quite simply, determined by how far we pulled or pushed the spring when we started the motion. Now, ω₀t + Δ = θ is referred to as the phase of the motion, and it's easy to see that ω₀ is an angular frequency indeed, because ω₀ equals the time derivative dθ/dt. Hence, ω₀ is the phase change, measured in radians, per second, and that's the definition of angular frequency or angular velocity. Finally, we have Δ. That's just a phase shift, and it basically depends on our t = 0 point.

Something on the math

I’ll do a separate post on the math that’s associated with this (second-order differential equations) but, in this case, we can solve the equation in a simple and intuitive way. Look at it: d²x/dt² = –ω₀²x. It’s obvious that x has to be a function that comes back to itself after two derivatives, but with a minus sign in front, and then we also have that coefficient –ω₀². Hmm… What can we think of? An exponential function comes back to itself, and if there’s a coefficient in the exponent, then it will end up as a coefficient in front too: d(e^(at))/dt = ae^(at) and, hence, d²(e^(at))/dt² = a²e^(at). Wow! That’s close. In fact, that’s the same equation as the one above, except for the minus sign.

In fact, if you’d quickly look at Paul’s Online Math Notes, you’ll see that we can indeed get the general solution for such a second-order differential equation (to be precise: it’s a so-called linear and homogeneous second-order DE with constant coefficients) using that remarkable property of exponentials. However, because of the minus sign, our solution for the equation above will involve complex exponentials, and so we’ll get a general solution in a complex variable, from which we’ll then take the subset of real-valued solutions only. Don’t worry about that here now. There’s an easier way.

Apart from the exponential function, there are two other functions that come back to themselves after two derivatives: the sine and cosine functions. Indeed, d²cos(t)/dt² = –cos(t) and d²sin(t)/dt² = –sin(t). In fact, the sine and cosine functions are obviously the same except for a phase shift equal to π/2: cos(t) = sin(t + π/2), so we can choose either. Let’s work with the cosine for now (we can always convert it to a sine function using that cos(t) = sin(t + π/2) identity). The nice thing about the cosine (and sine) function is that we do get that minus sign when taking the derivative twice, and we also get that coefficient in front. Indeed: d²cos(ω₀t)/dt² = –ω₀²cos(ω₀t). In short, cos(ω₀t) is the right function. The only things we need to add are x₀ and Δ, i.e. the amplitude and some phase shift but, as mentioned above, it is easy to understand these will depend on the initial conditions (i.e. the value of x at point t = 0 and the initial pull or push on the spring). In short, x = x₀cos(ω₀t + Δ) is the complete general solution of the simple (differential) equation we started with (i.e. m(d²x/dt²) = –kx).
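If you want to double-check that claim without doing the algebra by hand, here's a small sketch (assuming you have SymPy installed) that verifies the proposed solution satisfies the original equation:

```python
# Verify symbolically that x(t) = x0*cos(w0*t + Delta) satisfies m*x'' = -k*x, with w0^2 = k/m.
import sympy as sp

t, x0, Delta, m, k = sp.symbols('t x0 Delta m k', positive=True)
w0 = sp.sqrt(k / m)                       # natural angular frequency
x = x0 * sp.cos(w0 * t + Delta)           # proposed solution

residual = m * sp.diff(x, t, 2) + k * x   # should vanish identically
print(sp.simplify(residual))              # prints 0
```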

Introducing a driving force

Now, most real-life oscillating systems will be driven by an external force, permanently or just for a short while, and they will also lose some of their energy in a so-called dissipative process: friction or, in an electric circuit, electrical resistance will cause the oscillation to slowly lose amplitude, thereby damping it.

Let’s look at the friction coefficient first. The friction will often be proportional to the speed with which the object moves. Indeed, in the case of a mass on a spring, the drag (i.e. the force that acts on a body as it travels through air or a fluid) is dependent on a lot of things: first and foremost, there’s the fluid itself (e.g. a thick liquid will create more drag than water), and then there’s also the size, shape and velocity of the object. I am following the treatment you’ll find in most textbooks here and so that includes an assumption that the resistance force is proportional to the velocity: F_f = –cv = –c(dx/dt). Furthermore, the constant of proportionality c will usually be written as a product of the mass and some other coefficient γ, so we have F_f = –cv = –mγ(dx/dt). That makes sense because we can look at γ = c/m as the friction per unit of mass.

That being said, the simplification as a whole (i.e. the assumption of proportionality with speed) is rather strange in light of the fact that drag forces are actually proportional to the square of the velocity. If you look it up, you’ll find a formula resembling F_D = ρC_D·A·v²/2, with ρ the fluid density, C_D the drag coefficient (determined by the shape of the object and a so-called Reynolds number, which is determined from experiments), and A the cross-section area. It’s also rather strange to relate drag to mass by writing c as c = mγ, because drag has nothing to do with mass. What about dry friction, then? That would be kinetic friction between two surfaces, like when the mass is sliding on a surface. Well… In that case, mass would play a role but velocity wouldn’t, because kinetic friction is independent of the sliding velocity.
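Just to put a number on that quadratic drag formula (all values below are assumed, purely for illustration – think of a smooth little sphere moving through air):

```python
# Worked example of F_D = rho*C_D*A*v**2/2 with assumed values for a small sphere in air.
import math

rho = 1.2             # kg/m^3, approximate density of air
C_D = 0.47            # typical drag coefficient for a smooth sphere
r = 0.05              # m, sphere radius (assumed)
A = math.pi * r**2    # m^2, cross-section area
v = 2.0               # m/s, speed (assumed)

F_D = rho * C_D * A * v**2 / 2
print(f"F_D = {F_D:.4f} N")   # about 0.0089 N
```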

So why do physicists use this simplification? One reason is that it works for electric circuits: the equivalent of the velocity in electrical resonance is the current I = dq/dt, so that’s the time derivative of the charge on the capacitor. Now, I is proportional to the voltage difference V, and the proportionality coefficient is the resistance R, so we have V = RI = R(dq/dt). So, in short, the resonance curve we’re actually going to derive below is one for electric circuits too. The other reason is that this assumption makes it easier to solve the differential equation that’s involved: it makes for a linear differential equation indeed. In fact, that’s the main reason. After all, professors are professors and so they have to give their students stuff that’s not too difficult to solve. In any case, let’s not be bothered too much and so we’ll just go along with it.

Modeling the driving force is easy: we’ll just assume it’s a sinusoidal force with angular frequency ω (and ω is, more likely than not, somewhat different from the natural frequency ω₀). If F is a sinusoidal force, we can write it as F = F₀cos(ωt + Δ). [So we also assume there is some phase shift Δ.] So now we can write the full equation for our oscillating spring as:

m(d²x/dt²) + γm(dx/dt) + kx = F ⇔ (d²x/dt²) + γ(dx/dt) + ω₀²x = F/m
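As a side note (my addition, just to make the electric-circuit analogy mentioned above explicit): with the charge q in the role of the displacement x, a series RLC circuit driven by a voltage V obeys exactly the same type of equation, term by term:

```latex
m\frac{d^2x}{dt^2} + m\gamma\frac{dx}{dt} + kx = F
\quad\longleftrightarrow\quad
L\frac{d^2q}{dt^2} + R\frac{dq}{dt} + \frac{q}{C} = V
```

So m ↔ L, mγ ↔ R, k = mω₀² ↔ 1/C and F ↔ V, which is exactly the substitution that turns the mechanical resonance formulas into the electrical ones.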

How do we solve something like that for x? Well, it’s a differential equation once again. In fact, it’s, once again, a linear differential equation with constant coefficients, and so there’s a general solution method for that. As I mentioned above, that general solution method will involve exponentials and, in general, complex exponentials. I won’t walk you through that. Indeed, I’ll just write the solution because this is not an exercise in solving differential equations. I just want you to understand the solution:

x = ρF₀cos(ωt + Δ + θ)

ρ in this equation has nothing to do with some density or so. It’s a factor which depends on m, γ, ω and ω₀, in a fairly complicated way in fact:

[Formula image removed.]
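For reference, the standard textbook result for this magnification factor of a damped, driven oscillator – presumably what the removed image showed – is:

```latex
\rho = \frac{1}{m\sqrt{(\omega_0^2 - \omega^2)^2 + \gamma^2\omega^2}}
```

You can check that, without friction (γ = 0), it blows up as ω approaches ω₀.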

As we can see from the equation above, the (maximum) amplitude of the oscillation is equal to ρF0. So we have the magnitude of the force F here multiplied by ρ. Hence, ρ is a magnification factor which, multiplied with F0, gives us the ‘amount’ of oscillation.  

As for the θ in the equation above: we’re using this Greek letter (theta) not to refer to the phase, as we usually do, because the phase here is the whole ωt + Δ + θ expression, not just theta! The theta (θ) here is a phase shift as compared to the original force phase ωt + Δ, and θ also depends on γ, ω and ω₀. Again, I won’t show how we derived this solution but just accept it for now:

[Formula image removed.]
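Again for reference, the standard result (consistent with the remark further down that the lag is –π/2 at resonance) is:

```latex
\tan\theta = -\frac{\gamma\omega}{\omega_0^2 - \omega^2}
```

with θ running from 0 (for ω much smaller than ω₀) down to –π (for ω much larger than ω₀).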

These three equations, taken together, should allow you to understand what’s going on really. We’ve got an oscillation x = ρF₀cos(ωt + Δ + θ), so that’s an equation with this amplification or magnification factor ρ and some phase shift θ. Both depend on the difference between ω₀ and ω, and the two graphs below show how exactly.

[Illustrations removed: the resonance curve (ρ² as a function of ω) and the phase shift θ as a function of ω.]
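Since the original graphs are no longer available, here's a small sketch (with assumed values for m, γ and ω₀) that regenerates both curves from the formulas quoted above:

```python
# Sketch with assumed parameters: the resonance curve rho^2(omega) and the phase lag theta(omega).
import numpy as np
import matplotlib.pyplot as plt

m, gamma, w0 = 1.0, 0.2, 1.0   # assumed mass, friction coefficient and natural frequency
w = np.linspace(0.01, 2.0, 1000)

rho2 = 1.0 / (m**2 * ((w0**2 - w**2)**2 + gamma**2 * w**2))  # magnification factor squared
theta = -np.arctan2(gamma * w, w0**2 - w**2)                 # phase shift, from 0 down to -pi

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.plot(w, rho2)
ax1.set_ylabel("ρ²")
ax2.plot(w, theta)
ax2.set_ylabel("θ (rad)")
ax2.set_xlabel("ω")
plt.show()
```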

The first graph shows the resonance phenomenon and, hence, it’s what’s referred to as the resonance curve: if the difference between ω₀ and ω is small, we get an enormous amplification effect. It would actually go to infinity if it weren’t for the frictional force (but, of course, if the frictional force was not there, the spring would just break as the oscillation builds up and the swings get bigger and bigger).

The second graph shows the phase shift θ. It is interesting to note that the lag θ is equal to –π/2 when ω₀ is equal to ω, but I’ll let you figure out why this makes sense. [It’s got something to do with that cos(t) = sin(t + π/2) identity, so it’s nothing ‘deep’ really.]

I guess I should, perhaps, also write something about the energy that gets stored in an oscillator like this because, in that resonance curve above, we actually have ρ squared on the vertical axis, and that’s because energy is proportional to the square of the amplitude: E ∝ A². I should also explain a concept that’s closely related to energy: the so-called Q of an oscillator. It’s an interesting topic, if only because it helps us to understand why, for instance, the waves of the sea are such tremendous stores of energy! Furthermore, I should also write something about transients, i.e. oscillations that die out because the driving force was turned off, so to say. However, I’ll leave all of that for you to look up if you’re interested in the topic. Here, I just wanted to present the essentials.

[…] Hey ! I managed to keep this post quite short for a change. Isn’t that good? 🙂

Some content on this page was disabled on June 16, 2020 as a result of a DMCA takedown notice from The California Institute of Technology. You can learn more about the DMCA here:

https://wordpress.com/support/copyright-and-the-dmca/