Babushka thinking

What is it that we are trying to understand? As a kid, when I first heard about atoms consisting of a nucleus with electrons orbiting around it, I had this vision of worlds inside worlds, like a set of babushka dolls, one inside the other. Now I know that this model – which is basically just the 1911 Rutherford model – is plain wrong, even if it continues to be used in the logos of the International Atomic Energy Agency and the US Atomic Energy Commission.

[Images: the IAEA logo and the US Atomic Energy Commission logo.]

Electrons are not planet-like things orbiting around some center. If one wants to understand something about the reality of electrons, one needs to familiarize oneself with complex-valued wave functions, whose values represent a weird quantity referred to as a probability amplitude and, contrary to what you may think (unless you read my blog, or you just happen to know a thing or two about quantum mechanics), the relation between that amplitude and the concept of probability tout court is not very straightforward.

Familiarizing oneself with the math involved in quantum mechanics is not an easy task, as evidenced by all those convoluted posts I’ve been writing. In fact, I’ve been struggling with these things for almost a year now and I’ve started to realize that Roger Penrose’s Road to Reality (or should I say Feynman’s Lectures?) may lead nowhere – in terms of that rather spiritual journey of trying to understand what it’s all about. If anything, they made me realize that the worlds inside worlds are not the same. They are different – very different.

When everything is said and done, I think that's what's nagging us as common mortals. What we are all looking for is some kind of 'Easy Principle' that explains All and Everything, and we just can't find it. The point is: scale matters. At the macro-scale, we usually analyze things using some kind of 'billiard-ball model'. At a smaller scale – let's say the so-called wave zone – our 'law' of radiation holds, and we can analyze things in terms of electromagnetic or gravitational fields. But then, when we reduce the scale further, by another order of magnitude really – when we try to get very close to the source of radiation, or when we try to analyze what is actually oscillating – we get into deep trouble: our easy laws no longer hold, and the equally easy math – easy is relative of course 🙂 – that we use to analyze fields or interference phenomena becomes totally useless.

Religiously inclined people would say that God does not want us to understand all or, taking a somewhat less selfish picture of God, they would say that Reality (with a capital R to underline its transcendental aspects) just can't be understood. Indeed, it is rather surprising – in my humble view at least – that things do seem to get more difficult as we drill down: in physics, it's not the bigger things – like understanding thermonuclear fusion in the Sun, for example – but the smallest things which are difficult to understand. Of course, that's partly because physics leaves some of the bigger things which are actually very difficult to understand – like how a living cell works, for example, or how our eye or our brain works – to other sciences (biology and biochemistry for cells, neuroscience for vision and the brain). In that respect, physics may actually be described as the science of the smallest things. The surprising thing, then, is that the smallest things are not necessarily the simplest things – on the contrary.

Still, that being said, I can't help feeling some sympathy for the simpler souls who think that, if God exists, he seems to throw up barriers as mankind tries to advance its knowledge. Isn't it strange, indeed, that the math describing the 'reality' of electrons and photons (i.e. quantum mechanics and quantum electrodynamics), as complicated as it is, becomes even more complicated – and, important to note, also much less accurate – when it's used to describe the behavior of quarks and gluons? Additional 'variables' are needed (physicists call these 'variables' quantum numbers but, when everything is said and done, that's what quantum numbers actually are: variables in a theory), and the agreement between experimental results and predictions in QCD is not as good as it is in QED.

Frankly, I don't know much about quantum chromodynamics – nothing at all, to be honest – but when I read statements such as "analytic or perturbative solutions in low-energy QCD are hard or impossible due to the highly nonlinear nature of the strong force" (I just took this one line from the Wikipedia article on QCD), I instinctively feel that QCD is, in fact, a different world as well – different, I mean, from QED, in which analytic or perturbative solutions are the norm. Hence, I already know that, once I've mastered Feynman's Volume III, it won't help me all that much to get to the next level of understanding: understanding quantum chromodynamics will be yet another long grind. In short, understanding quantum mechanics is only a first step.

Of course, that should not surprise us, because we're talking very different orders of magnitude here: femtometers (10^−15 m) in the case of electrons, as opposed to attometers (10^−18 m) or even zeptometers (10^−21 m) when we're talking quarks. Hence, if past experience (I mean the evolution of scientific thought) is any guide, we actually should expect an entirely different world. Babushka thinking is not the way forward.

Babushka thinking

What's babushka thinking? You know what babushkas are, don't you? These dolls inside dolls. [The term 'babushka' is actually Russian for an old woman or grandmother, which is what these dolls usually depict.] Babushka thinking is the fallacy of thinking that worlds inside worlds are the same. It's what I did as a kid. It's what many of us still do. It's thinking that, when everything is said and done, it's just a matter of not being able to 'see' small things and that, if we had the appropriate equipment, we actually would find the same doll within the larger doll – the same but smaller – and then again the same doll within that smaller doll. In Asia, they have this funny expression: "Same-same but different." Well… That's what babushka thinking is all about: thinking that you can apply the same concepts, tools and techniques to what is, in fact, an entirely different ballgame.


Let me illustrate it. We discussed interference. We could assume that the laws of interference, as described by superimposing various waves, always hold, at every scale, and that it's just the crudeness of our detection apparatus that prevents us from seeing what's going on. Take two light sources, for example, and let's say they are a billion wavelengths apart – so that's anything between 400 and 700 meters for visible light (because the wavelength of visible light is 400 to 700 billionths of a meter). We won't see any interference then, because we can't register it. In fact, none of the standard equipment can. The interference term oscillates wildly up and down, from positive to negative and back again, if we move the detector just a tiny bit left or right – not more than the thickness of a hair (i.e. 0.07 mm or so). Hence, the range of angles θ (remember that the angle θ was the key variable when calculating solutions for the resultant wave in previous posts) that is covered by our eye – or by any standard sensor really – is so wide that the positive and negative interference averages out: all that we 'see' is the sum of the intensities of the two lights. However, we are still essentially correct in assuming there actually is interference: we just cannot see it – but it's there.
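To make that concrete, here's a quick numerical sketch (my own toy numbers: a 500 nm wavelength, sources a billion wavelengths apart, and a micro-radian of detector aperture): the interference term goes through a thousand full cycles over that tiny range of angles, so the average intensity is just the sum of the two individual intensities.

```python
import numpy as np

# Two equal unit-amplitude sources, a billion wavelengths apart.
# Far-field intensity at angle theta: I = 2 + 2*cos(k*d*sin(theta)),
# i.e. the sum of the intensities (2) plus the interference term.
wavelength = 500e-9            # 500 nm, green light
d = 1e9 * wavelength           # source separation: 500 m
k = 2 * np.pi / wavelength

theta = np.linspace(0.0, 1e-6, 200_001)   # one micro-radian of aperture
I = 2 + 2 * np.cos(k * d * np.sin(theta))

# The interference term averages out over the detector's angular range:
print(np.mean(I))   # ~2.0, i.e. just the sum of the two intensities
```

The interference is there at every angle, but any detector that integrates over even a micro-radian only sees the average.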

Reinforcing the point, I should also note that, apart from this issue of distance scale, there is also the scale of time. Our eye has a tenth-of-a-second averaging time. That's a huge amount of time when talking fundamental physics: remember that an atomic oscillator – despite its incredibly high Q – emits radiation for only about 10^−8 seconds, i.e. a hundred-millionth of a second. Then another atom takes over, and another – and that's why we get unpolarized light: it's all the same frequency (because the electron oscillators radiate at their resonant frequency), but there is no fixed phase difference between all of these pulses. The interference between them should result in 'beats' – as they interfere positively or negatively – but it all cancels out for us, because it's too fast.

Indeed, while the 'sensors' in the retina of the human eye (there are actually four kinds of cells there, but the principal ones are referred to as 'rod' and 'cone' cells respectively) are, apparently, sensitive enough to register individual photons, the tenth-of-a-second averaging time means that the cells – which are interconnected and really 'pre-process' light – will just amalgamate all those individual pulses into one signal of a certain color (frequency) and a certain intensity (energy). As one scientist puts it: "The neural filters only allow a signal to pass to the brain when at least about five to nine photons arrive within less than 100 ms." Hence, that signal will not keep track of the spacing between those photons.

In short, information gets lost. But that, in itself, does not invalidate babushka thinking. Let me visualize the point with a not-very-rigorous illustration. Suppose that we have some very regular wave train coming in, like the one below: one wave train consisting of three 'groups' separated by 'nodes'.

[Figure: a modulated wave train consisting of three 'groups' separated by nodes.]
All will depend on the period of the wave as compared to that one-tenth-of-a-second averaging time. In fact, we have two 'periods': the periodicity of the group – which is related to the concept of group velocity – and, hence, I'll associate a 'group wavelength' and a 'group period' with that. [In case you haven't heard of these terms before, don't worry: I hadn't either. :-)] Now, if one tenth of a second covers two or all three of the groups between the nodes (which means that one tenth of a second is a multiple of the group period Tg), then even the envelope of the wave does not matter much in terms of 'signal': our brain will just get one pulse that averages it all out. We will see none of the detail of this wave train. Our eye will just get light in (remember that the intensity of the light is the square of the amplitude, so negative amplitudes contribute too) but we cannot distinguish any particular pulse: it's just one signal. This is the most common situation when we are talking about electromagnetic radiation: many photons arrive but our eye just sends one signal to the brain: "Hey Boss! Light of color X and intensity Y coming from direction Z."

In fact, it's quite remarkable that our eye can distinguish colors in light of the fact that the wavelengths of the various colors (violet, blue, green, yellow, orange and red) differ by only 30 to 40 billionths of a meter! Better still: if the signal lasts long enough, we can distinguish shades whose wavelengths differ by only 10 or 15 nm, so that's a difference of just 1% or 2%. In case you wonder how it works: Feynman devotes no fewer than two chapters of his Lectures to the physiology of the eye – not something you'll find in other physics handbooks! There are apparently three pigments in the cells in our eyes, each sensitive to color in a different way, and it is "the spectral absorption in those three pigments that produces the color sense." So it's a bit like the RGB system in a television – but more complicated, of course!

But let's go back to our wave train and analyze the second possibility. If a tenth of a second covers less than that 'group wavelength', then it's different: we will actually see the individual groups as two or three separate pulses. Hence, in that case, our eye – or whatever detector (another detector will just have another averaging time) – will average over a group, but not over the whole wave train. [Just in case you wonder how we humans compare with other living beings: from what I wrote above, it's obvious we can see 'flicker' only if the oscillation is in the range of 10 or 20 Hz. The eye of a bee is made to see the vibrations of the feet and wings of other bees and, hence, its averaging time is much shorter – like a hundredth of a second – so it can see flicker up to 200 oscillations per second! In addition, the eye of a bee is sensitive over a much wider range of 'color' – it sees UV light down to a wavelength of 300 nm (whereas we don't see light with a wavelength below 400 nm) – and, to top it all off, it has a special sensitivity for polarized light, so light that gets reflected or diffracted looks different to the bee.]

Let's go to the third and final case. If a tenth of a second covers less than the period of the so-called carrier wave, i.e. the actual oscillation, then we will be able to distinguish the individual peaks and troughs of the carrier wave!
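These three cases can be sketched numerically. The frequencies below are made up for illustration (a 200 Hz 'carrier' with a 5 Hz 'group' envelope, rather than anything optical): we square the wave to get an intensity and then average it over windows of different lengths, mimicking a detector's averaging time.

```python
import numpy as np

# Toy wave train: a carrier modulated by a slower 'group' envelope.
fs = 100_000                                   # samples per second
t = np.arange(fs) / fs                         # one second of signal
f_carrier, f_group = 200.0, 5.0                # Hz; group period = 0.2 s
intensity = (np.sin(2*np.pi*f_group*t) * np.sin(2*np.pi*f_carrier*t))**2

def averaged(x, seconds):
    """Running mean over a window of the given length (the 'averaging time')."""
    n = int(seconds * fs)
    return np.convolve(x, np.ones(n)/n, mode='valid')

# Case 1: window >> group period -> everything washes out into one flat signal.
# Case 2: carrier period << window < group period -> the groups survive.
# Case 3: window < carrier period -> the individual oscillations survive.
flat   = averaged(intensity, 0.5)
groups = averaged(intensity, 0.05)
peaks  = averaged(intensity, 0.001)
print(np.ptp(flat), np.ptp(groups), np.ptp(peaks))
```

The peak-to-peak spread of the averaged intensity grows as the window shrinks: the long window flattens everything into one signal, the intermediate one resolves the groups, and the short one resolves the individual oscillations of the carrier.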

Of course, this discussion is not limited to our eye as a sensor: any instrument will be able to measure individual phenomena only within a certain range, with an upper and a lower limit, i.e. the 'biggest' thing it can see, and the 'smallest'. That explains the so-called resolution of an optical or an electron microscope: whatever the instrument, it cannot really 'see' stuff that's smaller than the wavelength of the 'light' (real light or – in the case of an electron microscope – electron beams) it uses to 'illuminate' the object it is looking at. [The actual formula for the resolution of a microscope is obviously a bit more complicated, but this statement does reflect the gist of it.]
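The textbook version of that 'bit more complicated' formula is Abbe's diffraction limit, d = λ/(2·NA), with NA the numerical aperture of the objective. A quick sketch with illustrative numbers:

```python
# Abbe's diffraction limit: the smallest resolvable detail is roughly
# d = wavelength / (2 * NA), with NA the numerical aperture of the lens.
def abbe_limit_nm(wavelength_nm, numerical_aperture):
    return wavelength_nm / (2 * numerical_aperture)

# Green light through a good oil-immersion objective (NA ~ 1.4):
print(round(abbe_limit_nm(550, 1.4)))   # ~196 nm
# Switching to violet light helps a little, but not by an order of magnitude:
print(round(abbe_limit_nm(400, 1.4)))   # ~143 nm
```

So with visible light you will never resolve anything much below 150–200 nm, which is exactly why electron microscopes – with their far shorter wavelengths – were invented.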

However, all that I am writing above suggests that we can think of what's going on here as 'waves within waves', with the wave between nodes not being any different – in substance, that is – from the wave as a whole: we've got something that's oscillating, and within each individual oscillation, we find another oscillation. From a math point of view, babushka thinking is thinking we can analyze the world using Fourier's machinery to decompose some function (see my posts on Fourier analysis). Indeed, in the example above, we have a modulated carrier wave (it is an example of amplitude modulation – the old-fashioned way of transmitting radio signals), and we see a wave within a wave and, hence, just like with the Rutherford model of the atom, you may think there will always be 'a wave within a wave'.
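From that Fourier point of view, by the way, the 'wave within a wave' holds no mystery: an amplitude-modulated carrier is just the sum of two pure sine waves, at the carrier frequency plus and minus the modulation frequency. A small sketch, reusing illustrative (non-optical) frequencies:

```python
import numpy as np

# sin(2*pi*5*t) * sin(2*pi*200*t) = (1/2)[cos(2*pi*195*t) - cos(2*pi*205*t)],
# so the spectrum has exactly two lines: the sidebands at 195 and 205 Hz.
fs = 1000
t = np.arange(fs) / fs                      # one second -> 1 Hz resolution
signal = np.sin(2*np.pi*5.0*t) * np.sin(2*np.pi*200.0*t)

spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1/fs)
top = sorted(float(f) for f in freqs[np.argsort(spectrum)[-2:]])
print(top)   # [195.0, 205.0]
```

The envelope (the 'group') is not a separate wave hiding inside the carrier: it is just the beat between those two pure frequencies.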

In this regard, you may think of fractals too: fractals are repeating or self-similar patterns that are always there, at every scale. However, the point to note is that fractals do not represent an accurate picture of how reality is actually structured: worlds within worlds are not the same.

Reality is no onion

Reality is not some kind of onion, from which you peel off a layer and then you find some other layer, similar to the first: “same-same but different”, as they’d say in Asia. The Coast of Britain is, in fact, finite, and the grain of sand you’ll pick up at one of its beaches will not look like the coastline when you put it under a microscope. In case you don’t believe me: I’ve inserted a real-life photo below. The magnification factor is a rather modest 300 times. Isn’t this amazing? [The credit for this nice picture goes to a certain Dr. Gary Greenberg. Please do google his stuff. It’s really nice.]

[Photo: a grain of sand magnified 300 times, by Dr. Gary Greenberg.]
In short, fractals are wonderful mathematical structures but – in reality – there are limits to how small things get: we cannot carve a babushka doll out of the cellulose and lignin molecules that make up most of what we call wood. Likewise, the atoms that make up the D-glucose chains in the cellulose will never resemble those chains. Hence, the babushka doll, the D-glucose chains that make up wood, and the atoms that make up the molecules within those macro-molecules are three different worlds. They're not like layers of the same onion. Scale matters. The worlds inside worlds are different, and fundamentally so: not "same-same but different" but just plain different. Electrons are no longer point-like negative charges when we look at them at close range.

In fact, that’s the whole point: we can’t look at them at close range because we can’t ‘locate’ them. They aren’t particles. They are these strange ‘wavicles’ which we described, physically and mathematically, with a complex wave function relating their position (or their momentum) with some probability amplitude, and we also need to remember these funny rules for adding these amplitudes, depending on whether or not the ‘wavicle’ obeys Fermi or Bose statistics.

Weird, but – come to think of it – no more weird, in terms of mathematical description, than these electromagnetic waves. Indeed, when jotting down all these equations and developing all those mathematical arguments, one often tends to forget that we are not talking about some physical wave here. The field vector E (or B) is a mathematical construct: it tells us what force a charge will feel when we put it here or there. It's not like a water or sound wave that makes some medium (water or air) actually move. The field is an influence that travels through empty space. But how can something actually travel through empty space? When it's truly empty, you can't travel through it, can you?

Oh – you'll say – but we've got these photons, don't we? Waves are not actually waves: they come in little packets of energy – photons. Yes. You're right. But, as mentioned above, these photons aren't little bullets – or particles, if you want. They're as weird as the wave and, in any case, even a billiard-ball view of the world is not very satisfying: what happens exactly when two billiard balls collide in a so-called elastic collision? What are the springs on the surface of those balls – in light of the quick reaction, they must be more like little explosive charges that detonate on impact, mustn't they? – that make the two balls recoil from each other?

So any mathematical description of reality becomes ‘weird’ when you keep asking questions, like that little child I was – and I still am, in a way, I guess. Otherwise I would not be reading physics at the age of 45, would I? 🙂


Let me wrap up here. All of what I've been blogging about over the past few months concerns the classical world of physics. It consists of waves and fields on the one hand, and solid particles on the other – electrons and nucleons. But we know it's not like that when we have more sensitive apparatuses, like the apparatus used in that 2012 double-slit electron interference experiment at the University of Nebraska–Lincoln, which I described at length in one of my earlier posts. That apparatus allowed control of two slits – each not more than 62 nanometers wide (so that's the difference between the wavelengths of dark-blue and light-blue light!) – and the monitoring of single-electron detection events. Back in 1963, Feynman already knew what this experiment would yield as a result. He was sure about it, even if he thought such an instrument could never be built. [To be fully correct, he did anticipate a new science – what we now call 'nanotechnology', a term coined later, even if Feynman's famous 1959 talk inspired the field – but what we can do today surpasses, most probably, all his expectations at the time. Too bad he died too young to see his dreams come true.]

The point to note is that this apparatus does not show us another layer of the same onion: it shows an entirely different world. While it’s part of reality, it’s not ‘our’ reality, nor is it the ‘reality’ of what’s being described by classical electromagnetic field theory. It’s different – and fundamentally so, as evidenced by those weird mathematical concepts one needs to introduce to sort of start to ‘understand’ it.

So… What do I want to say here? Nothing much. I just had to remind myself where I am right now. I myself often still fall prey to babushka thinking. We shouldn't. We should wonder about the wood these dolls are made of. In physics, the wood seems to be math. The models I've presented in this blog are weird: what are those fields? And just how do they exert a force on some charge? What's the mechanics behind them? To these questions, classical physics does not really have an answer.

But, of course, quantum mechanics does not have a very satisfactory answer either: what does it mean when we say that the wave function collapses? Out of all of the possibilities in that wonderful indeterminate world ‘inside’ the quantum-mechanical universe, one was ‘chosen’ as something that actually happened: a photon imparts momentum to an electron, for example. We can describe it, mathematically, but – somehow – we still don’t really understand what’s going on.

So what's going on? We open a doll, and we do not find another doll that is smaller but similar. No. What we find is a completely different toy. However – surprise, surprise! – it's something that can be 'opened' as well, to reveal even weirder stuff, for which we need even weirder 'tools' to somehow understand how it works (like lattice QCD, if you want an example: just google it if you want to get an inkling of what that's about). Where is this going to end? Did it end with the 'discovery' of the Higgs particle? I don't think so.

However, with the ‘discovery’ (or, to be generous, let’s call it an experimental confirmation) of the Higgs particle, we may have hit a wall in terms of verifying our theories. At the center of a set of babushka dolls, you’ll usually have a little baby: a solid little thing that is not like the babushkas surrounding it: it’s young, male and solid, as opposed to the babushkas. Well… It seems that, in physics, we’ve got several of these little babies inside: electrons, photons, quarks, gluons, Higgs particles, etcetera. And we don’t know what’s ‘inside’ of them. Just that they’re different. Not “same-same but different”. No. Fundamentally different. So we’ve got a lot of ‘babies’ inside of reality, very different from the ‘layers’ around them, which make up ‘our’ reality. Hence, ‘Reality’ is not a fractal structure. What is it? Well… I’ve started to think we’ll never know. For all of the math and wonderful intellectualism involved, do we really get closer to an ‘understanding’ of what it’s all about?

I am not sure. The more I ‘understand’, the less I ‘know’ it seems. But then that’s probably why many physicists still nurture an acute sense of mystery, and why I am determined to keep reading. 🙂

Post scriptum: On the issue of the 'mechanistic universe' and the (related) issue of determinability and indeterminability: that's not what I wanted to write about above, because I consider that solved. This post is meant to convey some wonder – about the different models of understanding that we need to apply at different scales. It's got little to do with determinability or not. I think that issue got solved a long time ago, and I'll let Feynman summarize that discussion:

“The indeterminacy of quantum mechanics has given rise to all kinds of nonsense and questions on the meaning of freedom of will, and of the idea that the world is uncertain. […] Classical physics is also indeterminate. It is true, classically, that if we knew the position and the velocity of every particle in the world, or in a box of gas, we could predict exactly what would happen. And therefore the classical world is deterministic. Suppose, however, we have a finite accuracy and do not know exactly where just one atom is, say to one part in a billion. Then as it goes along it hits another atom, and because we did not know the position better than one part in a billion, we find an even larger error in the position after the collision. And that is amplified, of course, in the next collision, so that if we start with only a tiny error it rapidly magnifies to a very great uncertainty. […] Speaking more precisely, given an arbitrary accuracy, no matter how precise, one can find a time long enough that we cannot make predictions valid for that long a time. That length of time is not very large. It is not that the time is millions of years if the accuracy is one part in a billion. The time goes only logarithmically with the error. In only a very, very tiny time – less than the time it took to state the accuracy – we lose all our information. It is therefore not fair to say that from the apparent freedom and indeterminacy of the human mind, we should have realized that classical ‘deterministic’ physics could not ever hope to understand, and to welcome quantum mechanics as a release from a completely ‘mechanistic’ universe. For already in classical mechanics, there was indeterminability from a practical point of view.” (Feynman, Lectures, 1963, p. 38-10)

That really says it all, I think. I’ll just continue to keep my head down – i.e. stay away from philosophy as for now – and try to find a way to open the toy inside the toy. 🙂

Light: relating waves to photons

This is a concluding note on my ‘series’ on light. The ‘series’ gave you an overview of the ‘classical’ theory: light as an electromagnetic wave. It was very complete, including relativistic effects (see my previous post). I could have added more – there’s an equivalent for four-vectors, for example, when we’re dealing with frequencies and wave numbers: quantities that transform like space and time under the Lorentz transformations – but you got the essence.

One point we never touched upon, though, was the magnetic field vector. It is there. It is tiny because of that 1/c factor, but it's there. We wrote it as

B = –er′ × E/c

All symbols in bold are vectors, of course. The force is another vector cross-product: F = qv×B, and you need to apply the usual right-hand screw rule to find the direction of the force. As it turns out, that force – as tiny as it is – is actually oriented in the direction of propagation, and it is what is responsible for the so-called radiation pressure.

So, yes, there is a ‘pushing momentum’. How strong is it? What power can it deliver? Can it indeed make space ships sail? Well… The magnitude of the unit vector er’ is obviously one, so it’s the values of the other vectors that we need to consider. If we substitute and average F, the thing we need to find is:

〈F〉 = q〈v·E〉/c

But the charge q times the field is the electric force, and the force on the charge times the velocity is the work dW/dt being done on the charge. So that should equal the energy being absorbed from the light per second. Now, I didn't look into that much – it's actually one of the very few things I left aside – but I'll refer you to Feynman's Lectures if you want to find out more: there's a fine section on light scattering, introducing the notion of the Thomson scattering cross section, but – as said – I think you've had enough for now. Just note that 〈F〉 = [dW/dt]/c and, hence, that the momentum that light delivers is equal to the energy that is absorbed (dW/dt) divided by c.

So the momentum carried is 1/c times the energy. Now, you may remember that Planck solved the 'problem' of black-body radiation – an anomaly that physicists couldn't explain at the end of the 19th century – by assuming that light is emitted and absorbed in discrete quanta: what we now call photons. But photons are not the kind of 'particles' that the Greek and medieval corpuscular theories of light envisaged. They have a particle-like character – just as much as they have a wave-like character – but they are actually neither, and they are physically and mathematically described by these wave functions, which, in turn, are functions describing probability amplitudes. But I won't entertain you with that here, because I've written about that in other posts. Let's just go along with the 'corpuscular' theory of photons for a while.

Photons also have energy (which we’ll write as W instead of E, just to be consistent with the symbols above) and momentum (p), and Planck’s Law says how much:

W = hf and p = W/c

So that's good: we find the same multiplier 1/c here for the momentum of a photon. In fact, this is more than just a coincidence: the "wave theory" of light and Planck's "corpuscular theory" must of course link up, because they are both supposed to help us understand real-life phenomena.
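To get a feel for the orders of magnitude, here's a back-of-the-envelope calculation using W = hf and p = W/c – for an ordinary 1 mW green laser pointer, an example of my own choosing:

```python
# Photon energy and momentum from Planck's relations W = h*f and p = W/c.
h = 6.62607015e-34      # Planck constant, J*s
c = 299_792_458         # speed of light, m/s

wavelength = 532e-9     # 532 nm, a common green laser pointer
f = c / wavelength      # frequency of the light
W = h * f               # energy per photon: ~3.7e-19 J
p = W / c               # momentum per photon: ~1.2e-27 kg*m/s

power = 1e-3            # a 1 mW beam
print(power / W)        # photons per second: ~2.7e15
print(power / c)        # momentum delivered per second, i.e. force: ~3.3e-12 N
```

So even a laser pointer pours out millions of billions of photons per second, while the 'pushing momentum' it delivers amounts to a few piconewtons – which is why solar sails need to be huge.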

There are even more nice surprises. We spoke about polarized light, and we showed how the end of the electric field vector describes a circular or elliptical motion as the wave travels through space. It turns out that we can actually relate that to some kind of angular momentum of the wave (I won't go into the details though – the previous posts have been heavy enough on equations and complicated mathematical arguments) and that we can also relate it to a model of photons carrying angular momentum, "like spinning rifle bullets" – as Feynman puts it.

However, he also adds: "But this 'bullet' picture is as incomplete as the 'wave' picture." And that's true, and that should be it. And it will be it: I will really end this 'series' now. It was quite a journey for me, making my way through all of these complicated models and explanations of how things are supposed to work. But a fascinating one. And it sure gives me a much better feel for the 'concepts' that are hastily explained in all of these 'popular' books dealing with science and physics, hopefully preparing me better for what I should be doing next, and that's reading Penrose's advanced mathematical theories.

Radiation and relativity

Now we are really going to do some very serious analysis: relativistic effects in radiation. Fasten your seat belts please.

The Doppler effect for physical waves traveling through a medium

In one of my posts – I don't remember which one – I wrote about the Doppler effect for sound waves, or any physical wave traveling through a medium. I also said it had nothing to do with relativity. What really happens is that the observer sort of catches up with the wave or – the other way around – falls back and, because the velocity of a wave in a medium is always the same (that also has nothing to do with relativity: it's just a general principle), the frequency of the physical wave will appear to be different. [That's why the siren of an ambulance sounds different as it moves past you.] Wikipedia has an excellent article on that – so I'll refer you there – but that article says nothing – or nothing much – about the Doppler effect when electromagnetic radiation is involved. So that's what we'll talk about here. Before we do that, though, let me quickly jot down the formula for the Doppler effect for a physical wave traveling through a medium, so we are clear about the differences between the two 'Doppler effects':

fr = fs·(vp + vr)/(vp + vs)
In this formula, vp is the propagation speed of the wave, which – as mentioned above – depends on the medium. Now, the source and the receiver will have a velocity with respect to the medium as well (positive or negative, depending on the direction in which they're moving), and that's vr (r for receiver) and vs (s for source) respectively. So we're adding speeds here to calculate some relative speed and then we take the ratio. Some people think that's what relativity theory is all about. It's not. Everything I've written so far is just Galilean relativity – stuff that's been known for centuries.
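As a quick sanity check on that formula (with one common sign convention – vr positive when the receiver moves toward the source, vs positive when the source moves away from the receiver – and illustrative numbers for the classic ambulance siren):

```python
def doppler(f_source, v_wave, v_receiver=0.0, v_source=0.0):
    """Observed frequency of a wave in a medium (classical, non-relativistic).
    v_receiver > 0: receiver moves toward the source.
    v_source > 0: source moves away from the receiver."""
    return f_source * (v_wave + v_receiver) / (v_wave + v_source)

v_sound = 343.0   # m/s in air at room temperature

# A 700 Hz siren on an ambulance driving at 25 m/s:
print(round(doppler(700.0, v_sound, v_source=-25.0), 1))  # approaching: 755.0 Hz
print(round(doppler(700.0, v_sound, v_source=+25.0), 1))  # receding:   652.4 Hz
```

Note that the shift is asymmetric – the pitch drops by more on the way out than it rose on the way in – and that only the velocities relative to the medium matter, which is precisely what will no longer be true for light.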

Relativity is weird: one aspect of it is length contraction – not often discussed – but the other aspect is better known: there is no such thing as absolute time, so when we talk about velocities, we need to specify according to whose time we are measuring.

The Doppler effect for electromagnetic radiation

One thing that's not relative – and which makes things look somewhat similar to what I wrote above – is that the speed of light is always equal to c. That was the startling fact that came out of Maxwell's equations (startling because it is not consistent with Galilean relativity: it says that we cannot 'catch up' with a light wave!), around which Einstein built all of the "new physics", and so it's something we use rather matter-of-factly in all that we'll write below.

In all of the preceding posts about light, I wrote – more than once actually – that the movement of the oscillating charge (i.e. the source of the radiation) along the line of sight did not matter: the only thing that mattered was its acceleration in the xy-plane, which is perpendicular to our line of sight, which we’ll call the z-axis. Indeed, let me remind you of the two equations defining electromagnetic radiation (see my post on light and radiation about a week ago):

Formula 3

Formula 4

The first formula gives the electromagnetic effect that dominates in the so-called wave zone, i.e. a few wavelengths away from the source – because the Coulomb force varies as the inverse of the square of the distance r, unlike this ‘radiation’ effect, which varies inversely as the distance only (E ∝ 1/r), so it falls off much less rapidly.

Now, that general observation still holds when we’re considering an oscillating charge that is moving towards or away from us with some relativistic speed (i.e. a speed getting close enough to c to produce relativistic length contraction and time dilation effects) but, because of the fact that we need to consider local times, our formula for the retarded time is no longer correct.

Huh? Yes. The matter is quite complicated, and Feynman starts with jotting down the derivatives for the displacement in the x- and y-directions, but I think I’ll skip that. I’ll go for the geometrical picture straight away, which is given below. As said, it’s going to be difficult, but try to hang in here, because it’s necessary to understand the Doppler effect in a correct way (I myself have been fooled by quite a few nonsensical explanations of it in popular books) and, as a bonus, you also get to understand synchrotron radiation and other exciting stuff.

Doppler effect with text

So what’s going on here? Well… Don’t look at the illustration above just yet. We’ll come back to it. Let’s first build up the logic. We’ve got a charge moving vertically – from our point of view, that is (we are the observer) – but also moving in and out, i.e. in the z-direction. Indeed, note the arrow pointing to the observer: that’s us! So it could indeed be us looking at electrons going round and round and round – at phenomenal speeds – in a synchrotron indeed – but then one that’s been turned 90 degrees (probably easier for us to just lie down on the ground and look sideways). In any case, I hope you can imagine the situation. [If not, try again.] Now, if that charge were not moving at relativistic speeds (e.g. 0.94c, which is actually the number that Feynman uses for the graph above), then we would not have to worry about the difference between ‘our’ time t and the ‘local’ time τ. Huh? Local time? Yes.

We denote the time as measured in the reference frame of the moving charge as τ. Hence, as we are counting t = 1, 2, 3 etcetera, the electron is counting τ = 1, 2, 3 etcetera as it goes round and round and round. If the charge would not be moving at relativistic speeds, we’d do the standard thing and that’s to calculate the retarded acceleration of the charge a'(t) = a(t − r/c). [Remember that we used a prime to mark functions for which we should use a retarded argument and, yes, I know that the term ‘retarded’ sounds a bit funny, but that’s how it is. In any case, we’d have a'(t) = a(t − r/c) – so the prime vanishes as we put in the retarded argument.] Indeed, from the ‘law’ of radiation, we know that the field now and here is given by the acceleration of the charge at the retarded time, i.e. t – r/c. To sum it all up, we would, quite simply, relate t and τ as follows:

τ = t – r/c or, what amounts to the same, t = τ + r/c

So the effect that we see now, at time t, was produced at a distance r at time τ = t − r/c. That should make sense. [Again, if it doesn’t: read again. I can’t explain it in any other way.]

The crucial link in this rather complicated chain of reasoning comes now. You’ll remember that one of the assumptions we used to derive our ‘law’ of radiation was the assumption that “r is practically constant.” That no longer holds. Indeed, that electron moving around in the synchrotron comes in and out at us at crazy speeds (0.94c is a bit more than 280,000 km per second), so r goes up and down too, and the relevant distance is not r but r + z(τ). This means the retardation effect is actually a bit larger: it’s not r/c but [r + z(τ)]/c = r/c + z(τ)/c. So we write:

τ = t – r/c – z(τ)/c or, what amounts to the same, t = τ + r/c + z(τ)/c

Hmm… You’ll say this is rather fishy business. Why would we use the actual distance and not the distance a while ago? Well… I’ll let you mull over that. We’ve got two points in space-time here and so they are separated both by distance as well as time. It makes sense to use the actual distance to calculate the actual separation in time, I’d say. If you’re not convinced, I can only refer you to those complicated derivations that Feynman briefly does before introducing this ‘easier’ geometric explanation. This brings us to the next point. We can measure time in seconds, but also in equivalent distance, i.e. light-seconds: the distance that light (remember: always at absolute speed c) travels in one second, i.e. approximately 299,792,458 meter. It’s just another ‘time unit’ to get used to.

Now that’s what’s being done above. To be sure, we get rid of the constant r/c, which is a constant indeed: that amounts to a shift of the origin by some constant (so we start counting earlier or later). In short, we have a new variable ‘t’ really that’s equal to t = τ + z(τ)/c. But we’re going to count time in meter (well – in units of c meter really), so we will just multiply this and we get:

ct = cτ + z(τ)

Why the different unit? Well… We’re talking relativistic speeds here, don’t we? And so the second is just an inappropriate unit. When we’re finished with this example, we’ll give you an even simpler example: a source just moving in on us, with an equally astronomical speed, so the functional shape of z(τ) will be some fraction of c (let’s say kc) times τ, so kcτ. So, to simplify things, just think of it as re-scaling the time axis in units that makes sense as compared to the speeds we are talking about.
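By the way, you can generate that black curve yourself: sample the local time τ, compute the charge’s actual position, and then compute the observer time ct = cτ + z(τ) at which that position is ‘seen’. Here is a minimal Python sketch – the circular orbit, its radius and the sampling are my own simplifications:

```python
import math

c = 1.0            # units where c = 1, so time is measured in distance (ct)
v = 0.94 * c       # Feynman's example speed
R = 1.0            # hypothetical orbit radius (my assumption)
omega = v / R      # angular velocity of the circling charge

def apparent_curve(n=13):
    """Sample the retarded transverse position x'(t) against ct.

    For each local time tau we take the charge's actual transverse
    position x(tau) and compute the observer time at which that
    position is 'seen': ct = c*tau + z(tau), dropping the constant r/c.
    """
    pts = []
    for k in range(n):
        tau = (2 * math.pi * k / (n - 1)) / omega  # one full revolution
        x = R * math.sin(omega * tau)              # transverse (up-down)
        z = R * math.cos(omega * tau)              # along the line of sight
        pts.append((c * tau + z, x))
    return pts

curve = apparent_curve()
# The ct values bunch together where the charge comes towards us
# (z decreasing fast): that bunching is what produces the near-cusp.
```

Plotting the (ct, x) pairs reproduces the compressed, cusp-like shape discussed next.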

Now we can finally analyze that graph on the right-hand side. If we kept r fixed – i.e. if we didn’t care about the charge moving in and out of us – the plot of x'(t) – i.e. the retarded position indeed – against ct would yield the sinusoidal graph plotted by the red numbers 1 to 13 here. In fact, instead of a sinusoidal graph, it resembles a normal distribution, but that’s just because we’re looking at one revolution only. In any case, the point to note is that – when everything is said and done – we need to calculate the retarded acceleration, so that’s the second derivative of x'(t). The animated illustration shows how that works: the second derivative (not the first) turns from positive to negative – and vice versa – at inflection points, when the curve goes from convex to concave, and vice versa. So, on the segment of that sinusoidal function marked by the red numbers, it’s positive at first (the slope of the tangent becomes steeper and steeper), then negative (cf. that tangent line turning from blue to green in the illustration below), and then it becomes positive again, as the negative slope becomes less negative after the second inflection point.

Animated_illustration_of_inflection_pointThat should be straightforward. However, the actual x'(t) curve is the black curve with the cusp. A curve like that is called a hypocycloid. Let me reproduce it once again for ease of reference.

Doppler effect with textWe relate x'(t) to ct (this is nothing but our new unit for t) by noting that ct = cτ + z(τ). Capito? It’s not easy to grasp: the instinct is to just equate t and τ and write x'(t) = x'(τ), but that would not be correct. No. We must measure in ‘our’ time and get a functional form for x’ as a function of t, not of τ. In fact, x'(t) − i.e. the retarded vertical position at time t – is not equal to x'(τ) but to x(τ), i.e. that’s the actual (instead of retarded) position at (local) time τ, and so that’s what the black graph above shows.

I admit it’s not easy. I myself am not getting it in an ‘intuitive’ way. But the logic is solid and leads us where it leads us. Perhaps it helps to think in terms of the curvature of this graph. In fact, we have to think in terms of the curvature of this graph in order to understand what’s happening in terms of radiation. When the charge is moving away from us, i.e. during the first three ‘seconds’ (so that’s the 1-2-3 count), we see that the curvature is less than what it would be – and also doesn’t change very much – if the displacement were given by the sinusoidal function, which means there’s very little radiation (because there’s little acceleration – negative or positive). However, as the charge moves towards us, we get that sharp cusp and, hence, sharp curvature, which results in a sharp pulse of the electric field, rather than the regular – equally sinusoidal – amplitude we’d have if that electron were not moving in and out at us at relativistic speeds. In fact, that’s what synchrotron radiation is: we get these sharp pulses indeed. Feynman shows how they are measured – in very much detail – using a diffraction grating, but that would just be another diversion and so I’ll spare you that.

Hmm… This has nothing to do with the Doppler effect, you’ll say. Well… Yes and no. The discussion above basically set the stage for that discussion. So let’s turn to that now. However, before I do that, I want to insert another graph for an oscillating charge moving in and out at us in some irregular way – rather than the nice circular route described above.

Doppler irregular

The Doppler effect

The illustration below is a similar diagram as the ones above – but it looks much simpler. It shows what happens when an oscillating charge (which we assume to oscillate at its natural or resonant frequency ω0) moves towards us at some relativistic speed v (whatever speed – fairly close to c, so the ratio v/c is substantial). Note that the movement is from A to B – and that the observer (that’s us!) is, once again, at the left – and, hence, the distance traveled is AB = vτ. So what’s the impact on the frequency? That’s shown on the x'(t) graph on the right: the curvature of the sinusoidal motion is much sharper, which means that its angular frequency as we see or measure it (we’ll denote that by ω1) will be higher: if it’s a larger object emitting ‘white’ light (i.e. a mix of everything), then the light will no longer be ‘white’ but will have shifted towards the violet end of the spectrum. If it moves away from us, it will appear ‘more red’.

Doppler moving in

What’s the frequency change? Well… The z(τ) function is rather simple here: z(τ) = vτ. Let’s use f and f0 for a moment, instead of the angular frequencies ω and ω0, as we know they only differ by a factor 2π (ω = 2π/T = 2π·f, with f = 1/T, i.e. the reciprocal of the period). Hence, in a given time Δτ, the source emits f0Δτ oscillations. During that time, the source itself travels a distance vΔτ, while the first wavefront travels cΔτ, so the wave train carrying those oscillations is compressed into a length of (c − v)Δτ. The time the observer needs to receive that train is the time light needs to travel that distance: Δt = (c − v)Δτ/c = (1 − v/c)Δτ. Hence, the frequency f will be equal to f0Δτ (the number of oscillations) divided by Δt = (1 − v/c)Δτ, and so we get this relatively simple equation:

f = f0/(1 − v/c) and ω = ω0/(1 − v/c)

 Is that it? It’s not quite the same as the formula we had for the Doppler effect of physical waves traveling through a medium, but it’s simple enough indeed. And it also seems to use relative speed. Where’s the Lorentz factor? Why did we need all that complicated machinery?

You are a smart ass! You’re right. In fact, this is exactly the same formula: if we equate the propagation speed with c, set the velocity of the receiver to zero, and substitute −v for the velocity of the source (the minus sign because the source is moving towards us), then we get what we got above:

f = f0·(c + 0)/(c − v) = f0/(1 − v/c)

The thing we need to add is that the natural frequency of an atomic oscillator, as we measure it, is not the same as the one measured in the oscillator’s own frame: the time dilation effect kicks in. If ω0 is the ‘true’ natural frequency (i.e. measured locally, so to say), then the natural frequency as corrected for the time dilation effect will be ω1 = ω0·√(1 − v²/c²). Therefore, the grand final relativistic formula for the Doppler effect for electromagnetic radiation is:

ω = ω0·√(1 − v²/c²)/(1 − v/c)
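Numerically, the combined effect – the 1/(1 − v/c) compression corrected by the √(1 − v²/c²) time-dilation factor – behaves as you would expect. A quick sketch (the function name is mine); note that reversing the sign of v turns the blueshift into its exact reciprocal redshift:

```python
import math

def relativistic_doppler(omega0, beta):
    """Observed angular frequency, omega = omega0*sqrt(1 - beta^2)/(1 - beta),
    for a source approaching at v = beta*c (beta < 0 if it recedes).
    The 1/(1 - beta) factor is the wave-train compression; the square
    root is the time-dilation correction of the natural frequency."""
    return omega0 * math.sqrt(1 - beta**2) / (1 - beta)

print(relativistic_doppler(1.0, 0.0))   # 1.0: no motion, no shift
print(relativistic_doppler(1.0, 0.5))   # blueshifted for an approaching source
print(relativistic_doppler(1.0, -0.5))  # redshifted for a receding source
```

The approach and recession factors multiply to one, because the formula can be rewritten as ω = ω0·√[(1 + β)/(1 − β)].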

You may feel cheated now: did you really have to suffer going through that story on the synchrotron radiation to get the formula above? I’d say: yes and no. No, because you could be happy with the Doppler formula alone. But, yes, because you don’t get the story about those sharp pulses just from the relativistic Doppler formula alone. So the final answer is: yes. I hope you felt it was worth the suffering 🙂

Loose ends: on energy of radiation and polarized light

I said I would move on to another topic, but let me wrap up some loose ends in this post. It will say a few things about the energy of a field; then it will analyze these electron oscillators in some more detail; and, finally, I’ll say a few words about polarized light.

The energy of a field

You may or may not remember, from our discussions on oscillators and energy, that the total energy in a linear oscillator is a constant: the sum of the kinetic energy mv²/2 and the potential energy (i.e. the energy stored in the spring as it expands and contracts) kx²/2 (remember that the force is −kx). So the kinetic energy is proportional to the square of the velocity, and the potential energy to the square of the displacement. Now, from the general solution that we had obtained for a linear oscillator – damped or not – we know that the displacement x, its velocity dx/dt, and even its acceleration are all proportional to the magnitude of the field – with different factors of proportionality of course. Indeed, we have x = qeE0·e^(iωt)/[m(ω0² − ω²)], and so every time we take a derivative, we bring an iω factor down (and so we have another factor of proportionality), but the E0 factor is still the same, and a factor of proportionality multiplied by some constant is still a factor of proportionality. Hence, the energy should be proportional to the square of the amplitude of the motion, i.e. to E0². What more can we say about it?

The first thing to note is that, for a field emanating from a point source, the magnitude of the field vector E will vary inversely with r. That’s clear from our formula for radiation:

Formula 5

Hence, the energy flux (i.e. the energy per unit area) will vary inversely as the square of the distance. That implies that the energy we can take out of a wave, within a given conical angle, will always be the same, no matter how far away we are. What we have is an energy flux spreading over a greater and greater effective area. That’s what’s illustrated below: the energy flowing within the cone OABCD is independent of the distance r at which it is measured.

Energy cone

However, these considerations do not answer the question: what is that factor of proportionality? What’s its value? What does it depend on?

We know that our formula for radiation is an approximate formula, but it’s accurate for what is called the “wave zone”, i.e. for all of space as soon as we are more than a few wavelengths away from the source. Likewise, Feynman derives an approximate formula only for the energy carried by a wave using the same framework that was used to derive the dispersion relation. It’s a bit boring – and you may just want to go to the final result – but, well… It’s kind of illustrative of how physics analyzes physical situations and derives approximate formulas to explain them.

Let’s look at that framework again: we had a wave coming in, and then a wave being transmitted. In-between, the plate absorbed some of the energy, i.e. there was some damping. The situation is shown below, and the exact formulas were derived in the previous post.

radiation and transparent sheet

Now, we can write the following energy equation for a unit area:

Energy in per second = energy out per second + work done per second

That’s simple, you’ll say. Yes, but let’s see where we get with this. For the energy that’s going in (per second), we can write that as α〈Es²〉, so that’s the averaged square of the amplitude of the electric field emanating from the source multiplied by a factor α. What factor α? Well… That’s exactly what we’re trying to find out: be patient.

For the energy that’s going out per second, we have α〈(Es + Ea)²〉. Why the same α? Well… The transmitted wave is traveling through the same medium as the incoming wave (air, most likely), so it should be the same factor of proportionality. Now, α〈(Es + Ea)²〉 = α[〈Es²〉 + 2〈EsEa〉 + 〈Ea²〉]. However, we know that we’re looking at a very thin plate here, so the amplitude Ea must be small compared to Es. So we can leave its averaged square 〈Ea²〉 out. Indeed, as mentioned above, we’re looking at an approximation here: any term that’s proportional to NΔz, we’ll leave in (so we’ll keep 〈EsEa〉), but terms that are proportional to (NΔz)² or a higher power can be left out. [That’s, in fact, also the reason why we don’t bother to analyze the reflected wave.]

So we now have the last term: the work done per second in the plate. Work done is force times distance, and so the work done per second (i.e. the power being delivered) is the force times the velocity. [In fact, we should do a dot product but the force and the velocity point in the same direction – except for a possible minus sign – and so that’s alright.] So, for each electron oscillator, the work done per second will be 〈qeEsv〉 and, hence, for a unit area, we’ll have NΔzqe〈Esv〉. So our energy equation becomes:

α〈Es²〉 = α〈Es²〉 + 2α〈EsEa〉 + NΔzqe〈Esv〉

⇔ –2α〈EsEa〉 = NΔzqe〈Esv〉

Now, we had a formula for Ea (we didn’t do the derivation of this one though: just accept it):

Formula 8

We can substitute this in the energy equation, noting that the average 〈EsEa〉 does not depend on time. So the left-hand side of our energy equation becomes:


Formula 9

However, Es(at z) is Es(at the atoms) retarded by z/c, so we can insert the same argument. But then, now that we’ve made sure we have the same argument for Es and v, we know that such an average is independent of time and, hence, it will be equal to the 〈Esv〉 factor on the right-hand side of our energy equation, which means this factor can be scrapped. The NΔzqe factor (and the 2 in the numerator and denominator) can be scrapped as well, of course. We then get the remarkably simple result that

α = ε0c

Hence, the energy carried in an electric wave per unit area and per unit time, which is also referred to as the intensity of the wave, equals:

〈S〉 = ε0c〈E²〉

The rate of radiation of energy

Plugging our formula for radiation above into this formula, we get an expression for the power per square meter radiated in the direction θ:

Formula 10

In this formula, a’ is, of course, the retarded acceleration, i.e. the value of a at time t – r/c. The formula makes it clear that the power varies inversely as the square of the distance, as it should, from what we wrote above. I’ll spare you the derivation (you’ve had enough of these derivations, I am sure), but we can use this formula to calculate the total energy radiated in all directions, by integrating over all directions. We get the following general formula:

Formula 10-5

This formula is no longer dependent on the distance r – which is also in line with what we said above: in a given cone, the energy flux is the same. In this case, the ‘cone’ is actually a sphere around the oscillating charge, as illustrated below.

Power out of a sphere

Now, we usually assume we have a nice sinusoidal function for the displacement of the charge and, hence, for the acceleration, so we’ll often assume that the acceleration equals a = –ω²x0·e^(iωt). In that case, we can average over a cycle (note that the average of the square of a cosine is one-half) and we get:

Formula 11

Now, historically, physicists used a value written as e², not to be confused with the transcendental number e, equal to e² = qe²/4πε0, which – when inserted above – yields the older form of the formula:

P = 2e²a²/3c³
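For what it’s worth, the formula is easy to evaluate. A minimal sketch in SI units, writing e² = qe²/4πε0 as above – in plain SI notation the result is the same as the textbook P = q²a²/(6πε0c³):

```python
import math

EPSILON_0 = 8.854e-12   # F/m
C = 2.998e8             # m/s
Q_E = 1.602e-19         # elementary charge, C

E_SQ = Q_E**2 / (4 * math.pi * EPSILON_0)  # the 'historical' e^2 factor, in J*m

def power_radiated(a):
    """Total power P = 2*e^2*a^2 / (3*c^3) radiated by a charge
    undergoing acceleration a (in m/s^2); result in watts."""
    return 2 * E_SQ * a**2 / (3 * C**3)
```

Note the units work out: e² carries joule·meter, so e²a²/c³ is joules per second.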

In fact, we actually worked with that e² factor already, when we were talking about potential energy and calculated the potential energy between a proton and an electron at distance r: that potential energy was equal to e²/r – but that was a while ago indeed, so you’ll probably not remember.

Atomic oscillators

Now, I can imagine you’ve had enough of all these formulas. So let me conclude by giving some actual numbers and values for things. Let’s look at these atomic oscillators and put some values in indeed. Let’s start with calculating the Q of an atomic oscillator.

You’ll remember what the Q of an oscillator is: it is a measure of the ‘quality’ (that’s what the Q stands for really) of a particular oscillator. A high Q implies that, if we ‘hit’ the oscillator, it will ‘ring’ for many cycles, so its decay time will be quite long. It also means that the peak of its ‘frequency response’ will be quite tall and narrow. Huh? The illustrations below will refresh your memory.

The first one (below) gives a very general form for a typical resonance: we have a fixed frequency f0 (which defines the period T, and vice versa), and so this oscillator ‘rings’ indeed, and slowly dies out. An associated concept is the decay time (τ) of an oscillation: that’s the time it takes for the amplitude of the oscillation to fall to 1/e = 1/2.7182… ≈ 36.8% of its original value.

decay time

The second illustration (below) gives the frequency response curve. That assumes there is a continuous driving force, and we know that the oscillator will react to that driving force by oscillating – after an initial transient – at the same frequency as the driving force, but its amplitude will be determined by (i) the difference between the frequency of the driving force and the oscillator’s natural frequency (f0) as well as (ii) the damping factor. We will not prove it here, but the ‘peak height’ is equal to the low-frequency response (C) multiplied by the Q of the system, and the peak width is f0 divided by Q.

frequency response

But what is the Q for an atomic oscillator? Well… The Q of any system is the total energy content of the oscillator divided by the work done (or the energy lost) per radian. [If we define it per cycle, then we need to throw in an additional 2π factor – that’s just how the Q has been defined!] So we write:

Q = W/(dW/dΦ)

Now, dW/dΦ = (dW/dt)/(dΦ/dt) = (dW/dt)/ω, so Q = ωW/(dW/dt), which can be re-written as the first-order differential equation dW/dt = -(ω/Q)W. Now, that equation has the general solution

W = W0·e^(–ωt/Q), with W0 the initial energy.
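A quick numerical check that this exponential really solves the differential equation (the parameter values are, of course, arbitrary):

```python
import math

def energy_decay(W0, omega, Q, t):
    """Energy of a freely ringing oscillator: W = W0 * exp(-omega*t/Q)."""
    return W0 * math.exp(-omega * t / Q)

# Verify numerically that this solves dW/dt = -(omega/Q) * W,
# using a central difference for the derivative:
W0, omega, Q, t, h = 1.0, 5.0, 100.0, 2.0, 1e-6
lhs = (energy_decay(W0, omega, Q, t + h) - energy_decay(W0, omega, Q, t - h)) / (2 * h)
rhs = -(omega / Q) * energy_decay(W0, omega, Q, t)
assert abs(lhs - rhs) < 1e-8
```

So the energy decays exponentially, with ω/Q setting the decay rate.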

Using our energy equation – and assuming that our atomic oscillators are radiating at some natural (angular) frequency ω0, which we’ll relate to the wavelength λ = 2πc/ω0 – we can calculate the Q. But what do we use for W0? Well… The kinetic energy of the oscillator is mv²/2. Assuming the displacement x has that nice sinusoidal shape, we get mω²x0²/4 for the mean kinetic energy, which we have to double to get the total energy (remember that, on average, the total energy of an oscillator is half kinetic, and half potential), so we get W = mω²x0²/2. Using me (the electron mass) for m, we can then plug it all in, divide and cancel what we need to divide and cancel, and we get the grand result:

Q = ωW/(dW/dt) = 3λmec²/4πe² or 1/Q = 4πe²/3λmec²

The second form is preferred because it allows substituting e²/mec² for yet another ‘historical’ constant, referred to as the classical electron radius r0 = e²/mec² ≈ 2.82×10⁻¹⁵ m. However, that’s yet another diversion, and I’ll try to spare you here. Indeed, we’re almost done, so let’s sprint to the finish.

So all we need now is a value for λ. Well… Let’s just take one: a sodium atom emits light with a wavelength of approximately 600 nanometer. Yes, that’s the yellow-orange light emitted by low-pressure sodium-vapor lamps used for street lighting. So that’s a typical wavelength and we get a Q equal to

Q = 3λ/4πr0 ≈ 5×10⁷.

So what? Well… This is great! We can finally calculate things like the decay time – for our atomic oscillators! Now, there is a formula for the decay time: τ = 2Q/ω. This is a formula we can also write in terms of the wavelength λ because ω and λ are related through the speed of light: ω = 2πf = 2πc/λ. So we can write τ = Qλ/πc. In this case, we get τ ≈ 3.2×10⁻⁸ seconds (but please do check my calculation). That seems to correspond to experimental fact: light, as emitted by all these atomic oscillators, basically consists of very sharp pulses: one atom emits a pulse, and then another one takes over, etcetera. That’s why light is usually unpolarized – I’ll talk about that in a minute.

In addition, we can calculate the peak width Δf = f0/Q. In fact, we’ll not use frequency but wavelength: Δλ = λ/Q ≈ 1.2×10⁻¹⁴ m. This also seems to correspond to the width of the so-called spectral lines of light-emitting sodium atoms.
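All three numbers – the Q, the decay time and the line width – can be checked in a few lines of Python (the constants are the rounded values used above):

```python
import math

C = 2.998e8        # speed of light, m/s
R_0 = 2.82e-15     # classical electron radius, m
LAMBDA = 600e-9    # sodium's yellow-orange line, m (approximately)

Q = 3 * LAMBDA / (4 * math.pi * R_0)   # Q = 3*lambda/(4*pi*r0)
tau = Q * LAMBDA / (math.pi * C)       # decay time: tau = 2Q/omega = Q*lambda/(pi*c)
d_lambda = LAMBDA / Q                  # spectral line width

print(f"Q        = {Q:.1e}")           # ~5e7
print(f"tau      = {tau:.1e} s")       # ~3.2e-8 s
print(f"d_lambda = {d_lambda:.1e} m")  # ~1.2e-14 m
```

So the arithmetic holds up: a Q of about fifty million, a decay time of a few tens of nanoseconds, and a line width fourteen orders of magnitude below the wavelength itself.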

Isn’t this great? With a few simple formulas, we’ve illustrated the strange world of atomic oscillators and electromagnetic radiation. I’ve covered an awful lot of ground here, I feel.

There is one more “loose end” which I’ll quickly throw in here. It’s the topic of polarization – as promised – and then we’re done really. I promise. 🙂

Polarized light
One of the properties of the ‘law’ of radiation as derived by Feynman is that the direction of the electric field is perpendicular to the line of sight. That’s – quite simply – because it’s only the component ax perpendicular to the line of sight that’s important. So if we have a source – i.e. an accelerating electric charge – moving in and out straight at us, we will not get a signal.

That being said, while the field is perpendicular to the line of sight – which we identify with the z-axis – the field still can have two components and, in fact, it is likely to have two components: an x- and a y-component. We show a beam with such x- and y-component below (so that beam ‘vibrates’ not only up and down but also sideways), and we assume it hits an atom – i.e. an electron oscillator – which, in turn, emits another beam. As you can see from the illustration, the light scattered at right angles to the incident beam will only ‘vibrate’ up and down: not sideways. We call such light ‘polarized’. The physical explanation is quite obvious from the illustration below: the motion of the electron oscillator is perpendicular to the z-direction only and, therefore, any radiation measured from a direction that’s perpendicular to that z-axis must be ‘plane polarized’ indeed.

Light can be polarized in various ways. In fact, if we have a ‘regular’ wave, it will always be polarized. With ‘regular’, we mean that both the vibration in the x- and y-direction will be sinusoidal: the phase may or may not be the same, that doesn’t matter. But both vibrations need to be sinusoidal. In that case, there are two broad possibilities: either the oscillations are ‘in phase’, or they are not. When the x- and y-vibrations are in phase, then the superposition of their amplitudes will look like the examples below. You should imagine here that you are looking at the end of the electric field vector, and so the electric field oscillates on a straight line.

Polarization in phase

When they are in phase (or exactly out of phase), the tip of the field vector oscillates along a straight line. When there is some other constant phase difference between the x- and y-vibrations, as shown in the examples below, the tip of the field vector traces out a nice ellipse and, hence, such beams are referred to as being ‘elliptically polarized’.

Polarization out of phase
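The claim that a constant phase difference produces an ellipse is easy to verify numerically. Here is a sketch (all names and parameter values are just illustrative): the quantity (x/A)² + (y/B)² − 2(x/A)(y/B)·cosδ stays equal to sin²δ along the whole motion – and that is precisely the equation of an ellipse, degenerating to a straight line when δ = 0 or π.

```python
import math

def field_tip(A, B, delta, omega, t):
    """x- and y-components of the field vector for two sinusoidal
    vibrations of the same frequency with phase difference delta."""
    x = A * math.cos(omega * t)
    y = B * math.cos(omega * t + delta)
    return x, y

def ellipse_invariant(x, y, A, B, delta):
    """(x/A)^2 + (y/B)^2 - 2*(x/A)*(y/B)*cos(delta): equals sin(delta)^2
    for every point of the motion, i.e. the tip traces an ellipse
    (a straight line in the degenerate cases delta = 0 or pi)."""
    u, w = x / A, y / B
    return u * u + w * w - 2 * u * w * math.cos(delta)

A, B, delta, omega = 2.0, 1.0, math.pi / 3, 1.0  # arbitrary illustrative values
for t in [0.0, 0.7, 1.9, 4.2]:
    x, y = field_tip(A, B, delta, omega, t)
    assert abs(ellipse_invariant(x, y, A, B, delta) - math.sin(delta) ** 2) < 1e-12
```

With δ = 0 the invariant collapses to zero, i.e. (x/A − y/B)² = 0: the straight-line (plane-polarized) case.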

So what’s unpolarized light then? Well… That’s light that’s – quite simply – not polarized. So it’s irregular. Most light is unpolarized because it was emitted by electron oscillators. From what I explained above, you now know that such electron oscillators emit light during a fraction of a second only – the window is of the order of 10⁻⁸ seconds only actually – so that’s very short indeed (a hundred millionth of a second!). It’s a sharp little pulse basically, quickly followed by another pulse as another atom takes over, and then another and so on. So the light that’s being emitted cannot have a steady phase for more than 10⁻⁸ seconds. In that sense, such light will be ‘out of phase’.

In fact, that’s why two light sources don’t interfere. Indeed, we’ve been talking about interference effects all of the time but you may have noticed 🙂 that – in daily life – the combined intensity of light from two sources is just the sum of the intensities of the two lights: we don’t see interference. So there you are. [Now you will, of course, wonder why physics studies phenomena we don’t observe in daily life – but that’s an entirely different matter, and you would actually not be reading this post if you thought that.]

Now, with polarization, we can explain a number of things that we couldn’t explain before. One of them is birefringence: a material may have a different index of refraction depending on whether the light is linearly polarized in one direction rather than another. That explains the amusing property of Iceland spar, a crystal that doubles the image of anything seen through it. But we won’t play with that here. You can look it up yourself.

Refraction and Dispersion of Light

In this post, we go right at the heart of classical physics. It’s going to be a very long post – and a very difficult one – but it will really give you a good ‘feel’ of what classical physics is all about. To understand classical physics – in order to compare it, later, with quantum mechanics – it’s essential, indeed, to try to follow the math in order to get a good feel for what ‘fields’ and ‘charges’ and ‘atomic oscillators’ actually represent.

As for the topic of this post itself, we’re going to look at refraction again: light gets dispersed as it travels from one medium to another, as illustrated below. 


Dispersion literally means “distribution over a wide area”, and so that’s what happens as the light travels through the prism: the various frequencies (i.e. the various colors that make up natural ‘white’ light) are being separated out over slightly different angles. In physics jargon, we say that the index of refraction depends on the frequency of the wave – but so we could also say that the breaking angle depends on the color. But that sounds less scientific, of course. In any case, it’s good to get the terminology right. Generally speaking, the term refraction (as opposed to dispersion) is used to refer to the bending (or ‘breaking’) of light of a specific frequency only, i.e. monochromatic light, as shown in the photograph below. […] OK. We’re all set now.


It is interesting to note that the photograph above shows how the monochromatic light is actually being obtained: if you look carefully, you’ll see two secondary beams on the left-hand side (with an intensity that is much less than the central beam – barely visible in fact). That suggests that the original light source was sent through a diffraction grating designed to filter only one frequency out of the original light beam. That beam is then sent through a block of transparent material (plastic in this case) and comes out again, displaced parallel to itself. So the block of plastic ‘offsets’ the beam. So how do we explain that in classical physics?

The index of refraction and the dispersion equation

As I mentioned in my previous post, the Greeks had already found out, experimentally, what the index of refraction was. To be more precise, they had measured the θ1 and θ2 – depicted below – for light going from air to water. For example, if the angle in air (θ1) is 20°, then the angle in the water (θ2) will be 15°. If the angle in air is 70°, then the angle in the water will be 45°.


Of course, it should be noted that a lot of the light will also be reflected from the water surface (yes, imagine the romance of the image of the moon reflected on the surface of a glacial lake while you’re feeling damn cold) – but so that’s a phenomenon which is better explained by introducing probability amplitudes, and looking at light as a bundle of photons, which we will not do here. I did that in previous posts, and so here, we will just acknowledge that there is a reflected beam but not say anything about it.

In any case, we should go step by step, and I am not doing that right now. Let’s first define the index of refraction. It is a number n which relates the angles above through the following relationship, which is referred to as Snell’s Law:

sinθ1 = n sinθ2

Using the numbers given above, we get: sin(20°) = n sin(15°), and sin(70°) = n sin(45°), so n must be equal to n = sin(20°)/sin(15°) = sin(70°)/sin(45°) ≈ 1.33. Just for the record, Willebrord Snell was a 17th-century Dutch astronomer but, according to Wikipedia, some smart Persian, Ibn Sahl, had already jotted this down in a treatise – “On Burning Mirrors and Lenses” – while he was serving the Abbasid court of Baghdad, back in 984, i.e. more than a thousand years ago! What to say? It was obviously a time when the Sunni-Shia divide did not matter, and Arabs and ‘Persians’ were leading civilization. I guess I should just salute the Islamic Golden Age here, regret the time lost during Europe’s Dark Ages and, most importantly, regret where Baghdad is right now! And, as for the ‘burning’ adjective, it just refers to the fact that large convex lenses can concentrate the sun’s rays to a very small area indeed, thereby causing ignition. [It seems that story about Archimedes burning Roman ships with a ‘death ray’ using mirrors – in all likelihood: something that did not happen – fascinated them as well.]
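Just to check Ibn Sahl’s (or Snell’s) numbers, here’s a minimal sketch in Python – the angles are the ones quoted above, nothing else is assumed:

```python
import math

def refraction_index(theta_air_deg, theta_water_deg):
    """Estimate n from a measured pair of angles via Snell's law."""
    return math.sin(math.radians(theta_air_deg)) / math.sin(math.radians(theta_water_deg))

# The two Greek-style measurements quoted above:
n1 = refraction_index(20, 15)
n2 = refraction_index(70, 45)
print(n1, n2)  # both come out near 1.33
```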

But let’s get back at it. Where were we? Oh – yes – the refraction index. It’s (usually) a positive number written as n = 1 + some other number which may be positive or negative, and which depends on the properties of the material. To be more specific, it depends on the resonant frequencies of the atoms (or, to be precise, I should say: the resonant frequencies of the electrons bound by the atom, because it’s the charges that generate the radiation). Plus a whole bunch of natural constants that we have encountered already, most of which are related to electrons. Let me jot down the formula – and please don’t be scared away now (you can stop a bit later, but not now 🙂 please):

n = 1 + Nqe²/[2ε0m(ω0² – ω²)]

N is just the number of charges (electrons) per unit volume of the material (e.g. the water, or that block of plastic), and qe and m are just the charge and mass of the electron. And then you have that electric constant once again, ε0, and… Well, that’s it ! That’s not too terrible, is it? So the only variables on the right-hand side are ω0 and ω, so that’s (i) the resonant frequency of the material (or the atoms – well, the electrons bound to the nucleus, to be precise, but then you know what I mean and so I hope you’ll allow me to use somewhat less precise language from time to time) and (ii) the frequency of the incoming light.

The equation above is referred to as the dispersion relation. It’s easy to see why: it relates the frequency of the incoming light to the index of refraction which, in turn, determines that angle θ. So the formula does indeed determine how light gets dispersed, as a function of the frequencies in it, by some medium indeed (glass, air, water,…).

So the objective of this post is to show how we can derive that dispersion relation using classical physics only. As usual, I’ll follow Feynman – arguably the best physics teacher ever. 🙂 Let me warn you though: it is not a simple thing to do. However, as mentioned above, it goes to the heart of the “classical world view” in physics and so I do think it’s worth the trouble. Before we get going, however, let’s look at the properties of that formula above, and relate it some experimental facts, in order to make sure we more or less understand what it is that we are trying to understand. 🙂

First, we should note that the index of refraction has nothing to do with transparency. In fact, throughout this post, we’ll assume that we’re looking at very transparent materials only, i.e. materials that do not absorb the electromagnetic radiation that tries to go through them, or only absorb it a tiny little bit. In reality, we will have, of course, some – or, in the case of opaque (i.e. non-transparent) materials, a lot – of absorption going on, but so we will deal with that later. So, let me repeat: the index of refraction has nothing to do with transparency. A material can have a (very) high index of refraction but be fully transparent. In fact, diamond is a case in point: it has one of the highest indexes of refraction (2.42) of any material that’s naturally available, but it’s – obviously – perfectly transparent. [In case you’re interested in jewellery, the refraction index of its most popular substitute, cubic zirconia, comes very close (2.15-2.18) and, moreover, zirconia actually works better as a prism, so its disperses light better than diamond, which is why it reflects more colors. Hence, real diamond actually sparkles less than zirconia! So don’t be fooled! :-)]

Second, it’s obvious that the index of refraction depends on two variables indeed: the natural, or resonant, frequency ω0, and the frequency ω of the incoming light. For most of the ordinary gases, including those that make up air (i.e. nitrogen (78%) and oxygen (21%), plus some water vapor (averaging 1%) and the so-called noble gas argon (0.93%) – noble because, just like helium and neon, it’s colorless, odorless and doesn’t react easily), the natural frequencies of the electron oscillators are close to the frequency of ultraviolet light. [The greenhouse gases are a different story – which is why we’re in trouble on this planet. Anyway…] So that’s why air absorbs most of the UV, especially the cancer-causing ultraviolet-C light (UVC), which is formally classified as a carcinogen by the World Health Organization. The wavelength of UVC light is 100 to 300 nanometer – as opposed to visible light, which has a wavelength ranging from 400 to 700 nm – and, hence, the frequency of UV light is in the 1000 to 3000 terahertz range (1 THz = 10¹² oscillations per second) – as opposed to visible light, which has a frequency in the range of 400 to 800 THz. So, because we’re squaring those frequencies in the formula, ω² can then be disregarded in comparison with ω0²: for example, 1500² = 2,250,000 and that’s not very different from 1500² – 500² = 2,000,000. Hence, if we leave the ω² out, we are still dividing by a very large number. That’s why n is very close to one for visible light entering the atmosphere from space (i.e. the vacuum). Its value is, in fact, around 1.000292 for incoming light with a wavelength of 589.3 nm (that odd value is the mean wavelength of so-called sodium D light, a pretty common yellow-orange light (street lights!), which is why it’s used as a reference value – however, don’t worry about it).

That being said, while the n of air is close to one for all visible light, the index is still slightly higher for blue light as compared to red light, and that’s why the sky is blue, except in the morning and evening, when it’s reddish. Indeed, the illustration below is a bit silly, but it gives you the idea. [I borrowed the illustration, so I’ll refer you to the original source for the full narrative on that. :-)]


Where are we in this story? Oh… Yes. Two frequencies. So we should also note that – because we have two frequency variables – it also makes sense to talk about, for instance, the index of refraction of graphite (i.e. carbon in its most natural occurrence, like in coal) for x-rays. Indeed, coal is definitely not transparent to visible light (that has to do with the absorption phenomenon, which we’ll discuss later) but it is very ‘transparent’ to x-rays. Hence, we can talk about how graphite bends x-rays, for example. In fact, the frequency of x-rays is much higher than the natural frequency of the carbon atoms and, hence, in this case we can neglect the ω0² term, so we get a denominator that is negative (because only the –ω² remains relevant), and so we get a refraction index that is (a bit) smaller than 1. [Of course, our body is transparent to x-rays too – to a large extent – but in different degrees, and that’s why we can take x-ray photographs of, for example, a broken rib or leg.]

OK. […] So that’s just to note that we can have a refraction index that is smaller than one and that’s not ‘anomalous’ – even if that’s a historical term that has survived. 
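Before moving on, we can play with the dispersion relation numerically. The sketch below uses the real electron constants, but a made-up charge density N and a made-up resonant frequency ω0 in the UV – it just illustrates the sign flip: n > 1 below resonance (visible light) and n < 1 far above it (x-rays):

```python
import math

EPS0 = 8.854e-12      # electric constant (F/m)
QE   = 1.602e-19      # electron charge (C)
ME   = 9.109e-31      # electron mass (kg)

def index_of_refraction(omega, omega0, N):
    """Classical dispersion relation: n = 1 + N*qe^2 / (2*eps0*m*(omega0^2 - omega^2))."""
    return 1 + N * QE**2 / (2 * EPS0 * ME * (omega0**2 - omega**2))

# Hypothetical numbers, just to show the sign flip around resonance:
omega0 = 2 * math.pi * 2000e12   # resonance in the UV (~2000 THz)
N = 5e25                         # charges per m^3 (made-up density)

n_visible = index_of_refraction(2 * math.pi * 500e12, omega0, N)   # below resonance
n_xray    = index_of_refraction(2 * math.pi * 1e17, omega0, N)     # far above resonance

print(n_visible > 1, n_xray < 1)  # True True
```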

Finally, last but not least as they say, you may have heard that scientists and engineers have managed to construct so-called negative index metamaterials. That matter is (much) more complicated than you might think, however, and so I’ll refer you to the Web if you want to find out more about that.

Light going through a glass plate: the classical idea

OK. We’re now ready to crack the nut. We’ll closely follow my ‘Great Teacher’ Feynman (Lectures, Vol. I-31) as he derives that formula above. Let me warn you again: the narrative below is quite complicated, but really worth the trouble – I think. The key to it all is the illustration below. The idea is that we have some electromagnetic radiation emanating from a far-away source hitting a glass plate – or whatever other transparent material. [Of course, nothing is to scale here: it’s just to make sure you get the theoretical set-up.] 

radiation and transparent sheet

So, as I explained in my previous post, the source creates an oscillating electromagnetic field which will shake the electrons up and down in the glass plate, and then these shaking electrons will generate their own waves. So we look at the glass as an assembly of little “optical-frequency radio stations” indeed, that are all driven with a given phase. It creates two new waves: one reflecting back, and one modifying the original field.

Let’s be more precise. What do we have here? First, we have the field that’s generated by the source, which is denoted by Es above. Then we have the “reflected” wave (or field – not much difference in practice), so that’s Eb. As mentioned above, this is the classical theory, not the quantum-electrodynamical one, so we won’t say anything about this reflection really: just note that the classical theory acknowledges that some of the light is effectively being reflected.

OK. Now we go to the other side of the glass. What do we expect to see there? If we would not have the glass plate in-between, we’d have the same Es field obviously, but so we don’t: there is a glass plate. 🙂 Hence, the “transmitted” wave, or the field that’s arriving at point P let’s say, will be different from Es. Feynman writes it as Es + Ea.

Hmm… OK. So what can we say about that? Not easy…

The index of refraction and the apparent speed of light in a medium

Snell’s Law – or Ibn Sahl’s Law – was re-formulated, by a 17th century French lawyer with an interest in math and physics, Pierre de Fermat, as the Principle of Least Time. It is a way of looking at things really – but it’s very confusing actually. Fermat assumed that light traveling through a medium (water or glass, for instance) would travel slower, by a certain factor n, which – indeed – turns out to be the index of refraction. But let’s not run before we can walk. The Principle is illustrated below. If light has to travel from point S (the source) to point D (the detector), then the fastest way is not the straight line from S to D, but the broken S-L-D line. Now, I won’t go into the geometry of this but, with a bit of trial and error, you can verify for yourself that it turns out that the factor n will indeed be the same factor n as the one which was ‘discovered’ by Ibn Sahl: sinθ1 = n sinθ2.

Least time principle
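You can actually check the Principle of Least Time by brute force: put S above the surface and D below it, crawl along the surface to find the crossing point that minimizes the travel time, and verify that Snell’s law pops out at the minimum. A minimal sketch – the geometry and the n = 1.33 are my own choices:

```python
import math

def travel_time(x_cross, x_s=0.0, y_s=1.0, x_d=1.0, y_d=-1.0, n=1.33):
    """Time from S (in air) to D (in water), crossing the surface y=0 at x_cross.
    Speed is c above the surface and c/n below; we set c = 1."""
    t_air   = math.hypot(x_cross - x_s, y_s)
    t_water = math.hypot(x_d - x_cross, y_d) * n
    return t_air + t_water

# Brute-force the crossing point that minimizes the travel time:
best_x = min((x / 100000 for x in range(100001)), key=travel_time)

# At the minimum, Snell's law should hold: sin(theta1) = n * sin(theta2)
sin1 = best_x / math.hypot(best_x, 1.0)
sin2 = (1.0 - best_x) / math.hypot(1.0 - best_x, 1.0)
print(abs(sin1 - 1.33 * sin2) < 1e-3)  # True
```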

What we have then, is that the apparent speed of the wave in the glass plate that we’re considering here will be equal to v = c/n. The apparent speed? So does that mean it is not the real speed? Hmm… That’s actually the crux of the matter. The answer is: yes and no. What? An ambiguous answer in physics? Yes. It’s ambiguous indeed. What’s the speed of a wave? We mentioned above that n could be smaller than one. Hence, in that case, we’d have a wave traveling faster than the speed of light. How can we make sense of that?

We can make sense of that by noting that the wave crests or nodes may be traveling faster than c, but that the wave itself – as a signal – cannot travel faster than light. It’s related to what we said about the difference between the group and phase velocity of a wave. The phase velocity – i.e. the nodes, which are mathematical points only – can travel faster than light, but the signal as such, i.e. the wave envelope in the illustration below, cannot.

Wave_group (1)

What is happening really is the following. A wave will hit one of these electron oscillators and start a so-called transient, i.e. a temporary response preceding the ‘steady state’ solution (which is not steady but dynamic – confusing language once again – so sorry!). So the transient settles down after a while and then we have an equilibrium (or steady state) oscillation which is likely to be out of phase with the driving field. That’s because there is damping: the electron oscillators resist before they go along with the driving force (and they continue to put up resistance, so the oscillation will die out when the driving force stops!). The illustration below shows how it works for the various cases:

delay and advance of phase

In case (b), the phase of the transmitted wave will appear to be delayed, which results in the wave appearing to travel slower, because the distance between the wave crests, i.e. the wavelength λ, is being shortened. In case (c), it’s the other way around: the phase appears to be advanced, which translates into a bigger distance between wave crests, or a lengthening of the wavelength, which translates into an apparent higher speed of the transmitted wave.

So here we just have a mathematical relationship between the (apparent) speed of a wave and its wavelength. The wavelength is the (apparent) speed of the wave (that’s the speed with which the nodes of the wave travel through space, or the phase velocity) divided by the frequency: λ = vp/f. However, from the illustration above, it is obvious that the signal, i.e. the start of the wave, is not earlier – or later – for either wave (b) or (c). In fact, the start of the wave, in time, is exactly the same for all three cases. Hence, the electromagnetic signal travels at the same speed c, always.
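That phase delay (case (b)) and phase advance (case (c)) come straight out of the driven-oscillator amplitude x0 ∼ 1/(ω0² – ω² + iγω), which we’ll meet again when we introduce damping. A quick sketch with made-up numbers:

```python
import cmath, math

def steady_state_phase(omega, omega0, gamma):
    """Phase of the driven oscillation relative to the driving field,
    from the complex amplitude x0 ~ 1/(omega0^2 - omega^2 + i*gamma*omega)."""
    return cmath.phase(1 / (omega0**2 - omega**2 + 1j * gamma * omega))

omega0, gamma = 10.0, 0.5   # made-up units

lag_below = steady_state_phase(8.0, omega0, gamma)    # driving below resonance
lag_above = steady_state_phase(12.0, omega0, gamma)   # driving above resonance

# Below resonance the response lags only slightly; above resonance the lag
# approaches 180 degrees, which is what flips the sign of the n - 1 term.
print(lag_below < 0 and abs(lag_below) < math.pi / 2)   # True
print(abs(lag_above) > math.pi / 2)                     # True
```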

While this may seem obvious, it’s quite confusing, and therefore I’ll insert one more illustration below. What happens when the various wave fronts of the traveling field hit the glass plate (coming from the top-left hand corner), let’s say at time t = t0, as shown below, is that the wave crests will have the same spacing along the surface. That’s obvious because we have a regular wave with a fixed frequency and, hence, a fixed wavelength λ0, here. Now, these wave crests must also travel together as the wave continues its journey through the glass, which is what is shown by the red and green arrows below: they indicate where the wave crest is after one and two periods (T and 2T) respectively.

Wave crest and frequency

To understand what’s going on, you should note that the frequency f of the wave that is going through the glass sheet and, hence, its period T, has not changed. Indeed, the driven oscillation, which was illustrated for the two possible cases above (n > 1 and n < 1), after the transient has settled down, has the same frequency (f) as the driving source. It must. Always. That being said, the driven oscillation does have that phase delay (remember: we’re in the (b) case here, but we can make a similar analysis for the (c) case). In practice, that means that the (shortest) distance between the crests of the wave fronts at time t = t0 and the crests at time t0 + T will be smaller. Now, that (shortest) distance between the crests of a wave is, obviously, the wavelength, i.e. the speed of propagation divided by the frequency: λ = vp/f, with vp the phase velocity of the wave, and f = 1/T. [The frequency f is the reciprocal of the period T – always. When studying physics, I found out it’s useful to keep track of a few relationships that hold always, and so this is one of them. :-)]

Now, the frequency is the same, but so the wavelength is shortened as the wave travels through the various layers of electron oscillators, each causing a delay of phase – and, hence, a shortening of the wavelength, as shown above. But, if f is the same, and the wavelength is shorter, then vp cannot be equal to the speed of the incoming light, so vp ≠ c. The apparent speed of the wave traveling through the glass, and the associated shortening of the wavelength, can be calculated using Snell’s Law. Indeed, taking the n ≈ 1.33 we found above as an example, we can calculate the apparent speed of light through the plate as v = c/n ≈ 0.75c and, therefore, the wavelength of the wave inside it as λ = λ0/n ≈ 0.75λ0.

OK. I’ve been way too lengthy here. Let’s sum it all up:

  • The field in the glass sheet must have the shape that’s depicted above: there is no other way. So that means the direction of ‘propagation’ has been changed. As mentioned above, however, the direction of propagation is a ‘mathematical’ property of the field: it’s not the speed of the ‘signal’.
  • Because the direction of propagation is normal to the wave front, it implies that the bending of light rays comes about because the effective speed of the waves is different in the various materials or, to be even more precise, because the electron oscillators cause a delay of phase.
  • While the speed and direction of propagation of the wave, i.e. the phase velocity, accurately describes the behavior of the field, it is not the speed with which the signal is traveling (see above). That is why it can be larger or smaller than c, and so it should not raise any eyebrow. For x-rays in particular, we have a refractive index smaller than one. [It’s only slightly less than one, though, and, hence, x-ray images still have a very good resolution. So don’t worry about your doctor getting a bad image of your broken leg. 🙂 In case you want to know more about this: just Google x-ray optics, and you’ll find loads of information. :-)]  

Calculating the field

Are you still there? Probably not. If you are, I am afraid you won’t be there ten or twenty minutes from now. Indeed, you ain’t done nothing yet. All of the above was just setting the stage: we’re now ready for the pièce de résistance, as they say in French. We’re back at that illustration of the glass plate and the various fields in front and behind the plate. So we have electron oscillators in the glass plate. Indeed, as Feynman notes: “As far as problems involving light are concerned, the electrons behave as though they were held by springs. So we shall suppose that the electrons have a linear restoring force which, together with their mass m, makes them behave like little oscillators, with a resonant frequency ω0.”

So here we go:

1. From everything I wrote about oscillators in previous posts, you should remember that the equation for this motion can be written as m(d²x/dt² + ω0²x) = F. That’s just Newton’s Law. Now, the driving force F comes from the electric field and will be equal to F = qeEs.

Now, we assume that we can choose the origin of time (i.e. the moment from which we start counting) such that the field Es = E0cos(ωt). To make calculations easier, we look at this as the real part of a complex function Es = E0e^(iωt). So we get:

m(d²x/dt² + ω0²x) = qeE0e^(iωt)

We’ve solved this before: its solution is x = x0e^(iωt). We can just substitute this in the equation above to find x0 (just substitute and take the first- and then second-order derivative of x indeed): x0 = qeE0/[m(ω0² – ω²)]. That, then, gives us the first piece in this lengthy derivation:

x = qeE0e^(iωt)/[m(ω0² – ω²)]

Just to make sure you understand what we’re doing: this piece gives us the motion of the electrons in the plate. That’s all.
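If you want to convince yourself that this solution is right, just plug it back into the equation of motion numerically – the values of ω0, ω and t below are arbitrary:

```python
import cmath

qe, m, E0 = 1.6e-19, 9.1e-31, 1.0          # electron charge/mass; unit field amplitude
omega0, omega, t = 5.0e15, 3.0e15, 1e-16   # made-up frequencies and some instant t

x0 = qe * E0 / (m * (omega0**2 - omega**2))
x  = x0 * cmath.exp(1j * omega * t)

# The second derivative of x0*exp(i*omega*t) is -omega^2 * x, so the left-hand
# side m*(d2x/dt2 + omega0^2*x) should equal the driving force qe*E0*exp(i*omega*t):
lhs = m * (-omega**2 * x + omega0**2 * x)
rhs = qe * E0 * cmath.exp(1j * omega * t)
print(abs(lhs - rhs) / abs(rhs) < 1e-12)  # True
```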

2. Now, we need an equation for the field produced by a plane of oscillating charges, because that’s what we’ve got here: a plate or a plane of oscillating charges. That’s a complicated derivation in its own right, which I won’t do here. I’ll just refer to another chapter of Feynman’s Lectures (Vol. I-30-7) and give you the solution (if I didn’t do that, this post would be even longer than it already is):

Ea = –[ηqe/(2ε0c)]·v(t – z/c)

This formula introduces just one new variable, η, which is the number of charges per unit area of the plate (as opposed to N, which was the number of charges per unit volume in the plate), so that’s quite straightforward. Less straightforward is the formula itself: this formula says that the magnitude of the field is proportional to the velocity of the charges at time t – z/c, with z the shortest distance from P to the plane of charges. That’s a bit odd, actually, but so that’s the way it comes out: “a rather simple formula”, as Feynman puts it.

In any case, let’s use it. Differentiating x to get the velocity of the charges, and plugging it into the formula above yields:

Ea = –i[ηqe²ω/(2ε0mc(ω0² – ω²))]·E0e^(iω(t – z/c))

Note that this is only Ea, the additional field generated by the oscillating charges in the glass plate. To get the total electric field at P, we still have to add Es, i.e. the field generated by the source itself. This may seem odd, because you may think that the glass plate sort of ‘shields’ the original field but, no, as Feynman puts it: “The total electric field in any physical circumstance is the sum of the fields from all the charges in the universe.”

3. As mentioned above, z is the distance from P to the plate. Let’s look at the set-up here once again. The transmitted wave, or Eafter the plate as we shall denote it, consists of two components: Es and Ea. Es here will be equal to (the real part of) Es = E0e^(iω(t – z/c)). Why t – z/c instead of just t? Well… We’re looking at Es here as measured in P, not at Es at the glass plate itself.

radiation and transparent sheet

Now, we know that the wave ‘travels slower’ through the glass plate (in the sense that its phase velocity is less, as should be clear from the rather lengthy explanation on phase delay above – or, if n were less than one, we’d have a phase advance). So if the glass plate is of thickness Δz, and the phase velocity is v = c/n, then the time it will take to travel through the glass plate will be Δz/(c/n) instead of Δz/c (speed is distance divided by time and, hence, time is distance divided by speed). So the additional time that is needed is Δt = Δz/(c/n) – Δz/c = nΔz/c – Δz/c = (n–1)Δz/c. That, then, implies that Eafter the plate is equal to a rather monstrously looking expression:

Eafter plate = E0e^(iω(t – (n–1)Δz/c – z/c)) = e^(–iω(n–1)Δz/c)·E0e^(iω(t – z/c))

We get this by just replacing t with t – Δt.

So what? Well… We have a product of two complex numbers here and so we know that this involves adding angles – or subtracting angles in this case, rather, because we’ve got a minus sign in the exponent of the first factor. So, all that we are saying here is that the insertion of the glass plate retards the phase of the field by an amount equal to ω(n–1)Δz/c. What about that sum Eafter the plate = Es + Ea that we were supposed to get?

Well… We’ll use the formula for a first-order (linear) approximation of an exponential once again: e^x ≈ 1 + x. Yes. We can do that because Δz is assumed to be very small, infinitesimally small in fact. [If it is not, then we’ll just have to assume that the plate consists of a lot of very thin plates.] So we can write that e^(–iω(n–1)Δz/c) ≈ 1 – iω(n–1)Δz/c, and then we, finally, get that sum we wanted:

Eafter plate = E0e^(iω(t – z/c)) – iω(n–1)Δz·E0e^(iω(t – z/c))/c

The first term is the original Es field, and the second term is the Ea field. Geometrically, they can be represented as follows:

Addition of fields

Why is Ea perpendicular to Es? Well… Look at the –i = 1/i factor. Multiplication with –i amounts to a clockwise rotation by 90°, and then just note that the magnitude of the vector must be small because of the ω(n-1)Δz/c factor.  
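Both little facts – the first-order approximation of the exponential and the 90° rotation by –i – are easy to verify numerically:

```python
import cmath, math

theta = 0.05                       # omega*(n-1)*dz/c, assumed small
exact  = cmath.exp(-1j * theta)    # the full phase factor
approx = 1 - 1j * theta           # the first-order approximation used above

print(abs(exact - approx) < theta**2)  # True: the error is second order in theta

# Multiplying a phasor by -i rotates it clockwise by 90 degrees:
Es = cmath.exp(1j * 0.3)           # some field phasor
Ea = -1j * theta * Es              # the small correction term
angle_between = cmath.phase(Ea / Es)
print(abs(angle_between + math.pi / 2) < 1e-12)  # True: exactly -90 degrees
```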

4. By now, you’ve either stopped reading (most probably) or, else, you wonder what I am getting at. Well… We have two formulas for Ea now:

Ea = –i[ηqe²ω/(2ε0mc(ω0² – ω²))]·E0e^(iω(t – z/c))

and Ea = –iω(n–1)Δz·E0e^(iω(t – z/c))/c

Equating both yields:

n – 1 = ηqe²/[2ε0mΔz(ω0² – ω²)]

But η, the number of charges per unit area, must be equal to NΔz, with N the number of charges per unit volume. Substituting and then cancelling the Δz finally gives us the formula we wanted, and that’s the classical dispersion relation whose properties we explored above:

n = 1 + Nqe²/[2ε0m(ω0² – ω²)]
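As a sanity check on the algebra, we can compute Ea both ways – from the plane-of-charges formula (with η = NΔz) and from the phase-delay argument, with n given by the dispersion relation – and verify that they coincide. All material parameters below are made up:

```python
import cmath

EPS0, QE, ME, C = 8.854e-12, 1.602e-19, 9.109e-31, 2.998e8

# Made-up material and wave parameters:
N, omega0, omega = 5e25, 1.2e16, 3.0e15
dz, E0, t, z = 1e-6, 1.0, 0.0, 0.1

n = 1 + N * QE**2 / (2 * EPS0 * ME * (omega0**2 - omega**2))

carrier = E0 * cmath.exp(1j * omega * (t - z / C))

# Ea from the plane-of-charges formula (with eta = N*dz and v = i*omega*x):
eta = N * dz
Ea_charges = -(eta * QE / (2 * EPS0 * C)) * 1j * omega \
             * QE / (ME * (omega0**2 - omega**2)) * carrier

# Ea from the phase-delay argument:
Ea_delay = -1j * omega * (n - 1) * dz * carrier / C

print(abs(Ea_charges - Ea_delay) / abs(Ea_delay) < 1e-10)  # True
```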

Absorption and the absorption index

The model we used to explain the index of refraction had electron oscillators at its center. In the analysis we did, we did not introduce any damping factor. That’s obviously not correct: it would mean that a glass plate, once it had been illuminated, would continue to emit radiation, because the electrons would oscillate forever. When introducing damping, the denominator in our dispersion relation becomes m(ω0² – ω² + iγω), instead of m(ω0² – ω²). We derived this in our posts on oscillators. What it means is that the oscillator continues to oscillate with the same frequency as the driving force (i.e. not its natural frequency) – so that doesn’t change – but that there is an envelope curve, ensuring the oscillation dies out when the driving force is no longer being applied. The γ factor is the damping factor and, hence, determines how fast the damping happens.

We can see what it means by writing the complex index of refraction as n = n’ – in’’, with n’ and n’’ real numbers, describing the real and imaginary part of n respectively. Putting that complex n in the equation for the electric field behind the plate yields:

Eafter plate = e^(–ωn’’Δz/c)·e^(–iω(n’–1)Δz/c)·E0e^(iω(t – z/c))

This is the same formula that we had derived already, but so we have an extra exponential factor: e^(–ωn’’Δz/c). It’s an exponential factor with a real exponent, because there were two i‘s that cancelled. The e^(–x) function has a familiar shape: e^(–x) is 1 for x = 0, and between 0 and 1 for any positive x. That value will depend on the thickness of the glass sheet. Hence, it is obvious that the glass sheet weakens the wave as it travels through it. Hence, the wave must also come out with less energy (the energy being proportional to the square of the amplitude). That’s no surprise: the damping we put in for the electron oscillators is a friction force and, hence, must cause a loss of energy.

Note that it is the n’’ term – i.e. the imaginary part of the refractive index n – that determines the degree of absorption (or attenuation, if you want). Hence, n’’ is usually referred to as the “absorption index”.
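Here’s a small sketch of that attenuation, with a made-up absorption index n’’ and sheet thickness – note how the intensity (amplitude squared) compounds exponentially with thickness:

```python
import math

C = 2.998e8

def intensity_fraction(n_imag, omega, dz):
    """Fraction of the intensity left after a sheet of thickness dz, given the
    imaginary part n'' of the index. The amplitude picks up a factor
    exp(-omega*n''*dz/c); the intensity is the amplitude squared."""
    return math.exp(-omega * n_imag * dz / C) ** 2

# Made-up numbers: green light through a weakly absorbing millimeter-thick sheet
omega = 2 * math.pi * 550e12
f1 = intensity_fraction(1e-7, omega, 1e-3)
f2 = intensity_fraction(1e-7, omega, 2e-3)   # twice as thick

print(0 < f2 < f1 < 1)          # True: thicker sheet, more absorption
print(abs(f2 - f1**2) < 1e-12)  # True: attenuation compounds exponentially
```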

The complete dispersion relation

We need to add one more thing in order to get a fully complete dispersion relation. It’s the last thing: then we have a formula which can really be used to describe real-life phenomena. The one thing we need to add is that atoms have several resonant frequencies – even an atom with only one electron, like hydrogen ! In addition, we’ll usually want to take into account the fact that a ‘material’ actually consists of various chemical substances, so that’s another reason to consider more than one resonant frequency. The formula is easily derived from our first formula (see the previous post), when we assumed there was only one resonant frequency. Indeed, when we have Nk electrons per unit of volume, whose natural frequency is ωk and whose damping factor is γk, then we can just add the contributions of all oscillators and write:

n = 1 + (qe²/2ε0m)·Σk [Nk/(ωk² – ω² + iγkω)]

The index described by this formula yields the following curve:

Several resonant frequencies

So we have a curve with a positive slope, and a value n > 1, for most frequencies, except for a very small range of ω’s for which the slope is negative, and for which the index of refraction has a value n < 1. As Feynman notes, these ω’s – and the negative slope – are sometimes referred to as ‘anomalous’ dispersion but, in fact, there’s nothing ‘abnormal’ about it.

The interesting thing is the iγkω term in the denominator, i.e. the imaginary component of the index, and how that compares with the (real) “resonance term” ωk² – ω². If the resonance term becomes very small compared to iγkω, then the index will become almost completely imaginary, which means that the absorption effect becomes dominant. We can see that effect in the spectrum of light that we receive from the sun: there are ‘dark lines’, i.e. frequencies that have been strongly absorbed at the resonant frequencies of the atoms in the Sun and its ‘atmosphere’, and that allows us to actually tell what the Sun’s ‘atmosphere’ (or that of other stars) actually consists of.
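The multi-resonance formula is easy to explore numerically. The sketch below uses two hypothetical oscillator species (all numbers made up) and checks that the index is essentially real off resonance, while the imaginary (absorption) part dominates the resonance term right on a resonance:

```python
import cmath

def complex_index(omega, resonances, prefactor=1.0):
    """n(omega) = 1 + prefactor * sum_k Nk / (omega_k^2 - omega^2 + i*gamma_k*omega).
    `resonances` is a list of (Nk, omega_k, gamma_k) tuples; all numbers made up."""
    return 1 + prefactor * sum(
        Nk / (wk**2 - omega**2 + 1j * gk * omega)
        for Nk, wk, gk in resonances
    )

res = [(1.0, 10.0, 0.2), (0.5, 25.0, 0.2)]   # two hypothetical oscillator species

n_off = complex_index(5.0, res)    # well below both resonances
n_on  = complex_index(10.0, res)   # right on the first resonance

# Off resonance the index is essentially real and > 1; on resonance the
# imaginary (absorption) part dominates:
print(n_off.real > 1 and abs(n_off.imag) < 0.01 * n_off.real)  # True
print(abs(n_on.imag) > abs(n_on.real - 1))                     # True
```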

So… There we are. I am aware of the fact that this has been the longest post of all I’ve written. I apologize. But so it’s quite complete now. The only piece that’s missing is something on energy and, perhaps, some more detail on these electron oscillators. But I don’t think that’s so essential. It’s time to move on to another topic, I think.

Euler’s spiral

When talking diffraction, one of the more amusing curves is the curve showing the intensity of light near the edge of a shadow. It is shown below.

Fig 30-9

Light becomes more intense as we move away from the edge, then it overshoots (so it is brighter than further away), then the intensity wobbles and oscillates, to finally ‘settle’ at the intensity of the light elsewhere.

How do we get a curve like that? We get it through another amusing curve: the Cornu spiral (which was re-named as the Euler spiral for some reason I don’t understand), which we’ve encountered also when adding probability amplitudes. Let me first depict the ‘real’ situation below: we have an opaque object AB, so no light goes through AB itself. However, the light that goes past it, casts a shadow on a screen, which is denoted as QPR here. And so the curve above shows the intensity of the light near the edge of that shadow.

Fig 30-7

The first weird thing to note is what I said about diffraction of light through a slit (or a hole – in somewhat less respectful language) in my previous post: the diffraction patterns can be explained if we assume that there are sources distributed, with uniform density, across the open holes. This is a deep mystery, which I’ll attempt to explain later. As for now, I can only state what Feynman has to say about it: “Of course, actually there are no sources at the holes. In fact, that is the only place that there are certainly no sources. Nevertheless, we get the correct diffraction pattern by considering the holes to be the only places where there are sources.”

So we do the same here. We assume that we have a series of closely spaced ‘antennas’, or sources, starting from B, up to D, E, C and all the way up to infinity, and so we need to add the contributions – or the waves – from these sources to calculate the intensity at all of the points on the screen. Let’s start with the (random) point P. P defines the inflection point D: we’ll say the phase there is zero (because we can, of course, choose our point in time so as to make it zero). So we’ll associate the contribution from D with a tiny vector (an infinitesimal vector) with angle zero. That is shown below: it’s the ‘flat’ (horizontal) vector pointing straight east at the very center of this so-called Cornu spiral.

Fig 30-8

Now, in the neighborhood of D, i.e. just below or above point D, the phase difference will be very small, because the distance from those points near D to P will not differ much from the distance between D and P (i.e. the distance DP). However, as h increases, the phase difference will become larger and larger. It does not increase linearly with h: because of the geometry involved, the path difference – and, hence, the phase difference (remember – from the previous post – that the phase difference is the product of the wave number and the difference in distance) – increases proportionally with the square of h. In fact, using similar triangles once again, we can easily show that this path difference EF can be approximated by EF ≈ h²/s. However, don’t lose sleep if you wouldn’t manage to figure that out. 🙂

The point to note is that, when you look at that spiral above, the angle of each vector we're adding increases more and more, and that's why we get a spiral, and not a polygon inscribed in a circle, such as the one we encountered in our previous post: there, the phase differences increased linearly and, hence, each vector added a constant angle to the previous one. Likewise, if we go down from D to the edge B, the angles will decrease. Of course, if we're adding contributions to get the amplitude or intensity for point P, we will not get any contributions from points below B. The last (or, I should say, the first) contribution that we get is denoted by the vector BP on that spiral curve, so if we want to get the total contribution, then we have to start adding vectors from there. [Don't worry: you'll understand why the other vectors, 'down south', are there in a few minutes.]

So we start from BP and go all the way… Well… You see that, once we're 'up north', in the center of the uppermost spiral, we're not adding much anymore, because the additional vectors are just sharply changing direction, going round and round and round. In short, most of the contribution to the amplitude of the resultant vector BP∞ comes from points near D. Now, we have chosen point P randomly, and you can easily see from that Cornu spiral that the amplitude – or the intensity rather, which is the square of the amplitude – of that vector BP∞ increases initially, reaches some maximum depending upon where P is located above B, but then falls and oscillates, producing the curve with which we started this post.
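For those who like to see the spiral emerge numerically, here's a minimal Python sketch (the function names, the midpoint-rule scheme and the cutoff value are my own choices, not anything from the text): each tiny arrow has a phase proportional to the square of the source position, so the running sum traces out the Fresnel integrals C(u) and S(u), i.e. the Cornu spiral.

```python
import numpy as np

# A numerical sketch of the Cornu spiral: each tiny arrow has a phase
# proportional to the square of the (dimensionless) source position t,
# so the running sum traces out the Fresnel integrals C(u), S(u).
def cornu_point(u, n=4000):
    """Approximate C(u) and S(u) by the midpoint rule."""
    t = np.linspace(0.0, u, n + 1)
    mid = (t[:-1] + t[1:]) / 2
    dt = u / n
    c = float(np.sum(np.cos(np.pi * mid**2 / 2)) * dt)
    s = float(np.sum(np.sin(np.pi * mid**2 / 2)) * dt)
    return c, s

# The spiral winds ever more tightly into its 'north' eye at (0.5, 0.5):
# far-away arrows just go round and round and add almost nothing.
c_inf, s_inf = cornu_point(8.0)
print(c_inf, s_inf)   # both approach 0.5, slowly, with small oscillations
```

The slow, oscillating approach to (0.5, 0.5) is exactly why the contributions far from D hardly matter: they circle the 'eye' of the spiral.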

OK. […] So what else do we have here? Well… That Cornu spiral also shows how we should add arrows to get the intensity at point Q. We'd be adding arrows in the uppermost spiral only and, hence, we would not get much of a total contribution as a result. That's what's marked by the vector BQ. On the other hand, if we were adding contributions to calculate the intensity at a point much higher than P, say R, then we'd be using pretty much all of the arrows, from the spiral 'south' all the way up to the spiral 'north'. So that's BR obviously and, as you can see, most of the contribution comes, once again, from points near D, i.e. the points near the edge. [So now you know why we have an infinite number of arrows in both directions: we need to be able to calculate the intensity at any point on the screen, below or above P.]

OK. What else? Well… Nothing. This is it really − for the moment that is. Just note that we’re not adding probability amplitudes here (unlike what we did a couple of months ago). We’re adding vectors representing something real here: electric field vectors. [As for how ‘real’ they are: I’ll entertain you about that later. :-)]

This was rather short, wasn't it? I hope you liked it because… Well… What follows is actually much more boring, because it involves a lot more formulas. However, these formulas will help us get where we want to get, and that is to understand – somehow, if only from a classical perspective – why that empty space acts like an array of electromagnetic radiation sources.

Indeed, when everything is said and done, that’s the deep mystery of light really. Really really deep. 

Diffraction gratings

Diffraction gratings are fascinating. The iridescent reflections from the grooves of a compact disc (CD), or from oil films, soap bubbles: it is all the same principle (or closely related – to be precise). In my April, 2014 posts, I introduced Feynman’s ‘arrows’ to explain it. Those posts talked about probability amplitudes, light as a bundle of photons, quantum electrodynamics. They were not wrong. In fact, the quantum-electrodynamical explanation is actually the only one that’s 100% correct (as far as we ‘know’, of course). But it is also more complicated than the classical explanation, which just explains light as waves.

To understand the classical explanation, one first needs to understand how electromagnetic waves interfere. That’s easy, you’ll say. It’s all about adding waves, isn’t it? And we have done that before, haven’t we? Yes. We’ve done it for sinusoidal waves. We also noted that, from a math point of view, the easiest way to go about it was to use vectors or complex numbers, and equate the real parts of the complex numbers with the actual physical quantities, i.e. the electric field in this case.

You’re right. Let’s continue to work with sinusoidal waves, but instead of having just two waves, we’ll consider a whole array of sources, because that’s what we’ll need to analyze when analyzing a diffraction grating.

First the simple case: two sources

Let’s first re-analyze the simple situation: two sources – or two dipole radiators as I called them in my previous post. The illustration below gives a top view of two such oscillators. They are separated, in the north-south direction, by a distance d.

Fig 29-10

Is that realistic? It is for radio waves: the wavelength of a 1 megahertz radio wave is 300 m (remember: λ = c/f). So, yes, we can separate two sources by a distance in the same order of magnitude as the wavelength of the radiation, but, as Feynman writes: “We cannot make little optical-frequency radio stations and hook them up with infinitesimal wires and drive them all with a given phase.”

For light, it will work differently – and we’ll describe how, but not now. As for now, we should continue with our radio waves.

The illustration above assumes that the radiation from the two sources is sinusoidal and has the same (maximum) amplitude A, but that the two sources might be out of phase: we'll denote the difference by α. Hence, we can represent the radiation emitted by the two sources by the real part of the complex numbers A·e^(iωt) and A·e^(i(ωt + α)) respectively. Now, we can move our detector around to measure the intensity of the radiation from these two antennas. If we place our detector at some point P, sufficiently far away from the sources, then the angle θ will result in another phase difference, due to the difference in distance from point P to the two oscillators. From simple geometry, we know that this difference will be equal to d·sinθ. The phase difference due to the distance difference will then be equal to the product of the wave number k (i.e. the rate of change of the phase (expressed in radians) with distance, i.e. per meter) and that distance d·sinθ. So the phase difference at arrival (i.e. at point P) would be

Φ₂ – Φ₁ = α + k·d·sinθ = α + (2π/λ)·d·sinθ

That's pretty obvious, but let's play a bit with this, in order to make sure we understand what's going on. The illustration below gives two examples: α = 0 and α = π.

Fig 29-5

How do we get these numbers 0, 2 and 4, which indicate the intensity, i.e. the amount of energy that the field carries past per second? The intensity is proportional to the square of the field, averaged over time. [If it were (visible) light, instead of radio waves, the intensity would be the brightness of the light.]

Well… In the first case, we have α = 0 and d = λ/2 and, hence, at an angle of 30 degrees, we have d·sin(30°) = (λ/2)·(1/2) = λ/4. Therefore, Φ₂ – Φ₁ = α + (2π/λ)·d·sinθ = 0 + (2π/λ)·(λ/4) = π/2. So what? Well… Let's add the waves. We will have some combined wave with amplitude AR and phase ΦR:

Formula 1

Now, to calculate the length of this ‘vector’, i.e. the amplitude AR, we take the product of this complex number and its complex conjugate, and that will give us the length squared, and then we multiply it all out and so on and so on. To make a long story short, we’ll find that

AR² = A₁² + A₂² + 2A₁A₂·cos(Φ₂ – Φ₁)

The last term in this sum is the interference effect, and it's equal to zero in the case we've been studying above (α = 0, d = λ/2 and θ = 30°), because cos(π/2) = 0: we get twice the intensity of a single oscillator. The other cases can be worked out in the same way.
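If you want to check this numerically, here's a small Python sketch (the function name and the choice of unit amplitudes are my own): it adds the two phasors directly and confirms that, for α = 0, d = λ/2 and θ = 30°, the interference term vanishes.

```python
import numpy as np

# Add two unit phasors with phase difference
# phi = alpha + (2*pi/lam)*d*sin(theta) and square the resultant.
def two_source_intensity(alpha, d_over_lam, theta):
    phi = alpha + 2 * np.pi * d_over_lam * np.sin(theta)
    return abs(1.0 + np.exp(1j * phi)) ** 2   # A1 = A2 = 1

# alpha = 0, d = lambda/2, theta = 30 degrees -> phi = pi/2, and the
# interference term 2*cos(pi/2) vanishes: twice a single oscillator.
I_two = two_source_intensity(0.0, 0.5, np.radians(30.0))
print(I_two)   # ≈ 2.0
```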

Now, you should not think that the pattern is always symmetric, or simple, as the two illustrations below make clear.

Fig 29-6

Fig 29-7

With more oscillators, the patterns become even more interesting. The illustration below shows part of the intensity pattern of a six-dipole antenna array:

Fig 29-8

Let’s look at that now indeed: arrays with n oscillators.

Arrays with n oscillators

If we have six oscillators, like in the illustration above, we have to add something like this:

R = A[cos(ωt) + cos(ωt + Φ) + cos(ωt + 2Φ) + … + cos(ωt + 5Φ)]

From what we wrote above, it is obvious that the phase difference Φ can have two causes: the oscillators may be driven differently in phase, or we may be looking at them at an angle so that there is a difference in time delay. Hence, we have the same formula as the one above:

Φ = α + (2π/λ)·d·sinθ

Now, we have an interesting geometrical approach to finding the net amplitude AR. We can, once again, consider the various waves as vectors and add them, as shown below.

Fig 30-1

The length of all vectors is the same (A), and then we have the phase difference, i.e. the different angles: zero for A₁, Φ for A₂, 2Φ for A₃, etcetera. So as we're adding these vectors, we're going around and forming an equiangular polygon with n sides, with the vertices (corner points) lying on a circle with radius r. It requires just a bit of trigonometry to establish that the following equality must hold: A = 2r·sin(Φ/2). So that fixes r. We also have that the large angle OQT equals nΦ and, hence, AR = 2r·sin(nΦ/2). We can now combine the results to find the following amplitude and intensity formula:

Formula 6

This formula is obvious for n = 1 and for n = 2: it gives us the results which were shown above already. But here we want to know how this thing behaves for large n. Both the numerator, sin²(nΦ/2), and the denominator, sin²(Φ/2), are – obviously – smaller than or equal to 1, but near Φ = 0 the numerator is much larger than the denominator. In fact, it can be demonstrated that this function of the angle Φ reaches its maximum value for Φ = 0: taking the limit gives us I = n²I₀. [We can intuitively see this because, if we express the angle in radians, we can replace sin(Φ/2) and sin(nΦ/2) by Φ/2 and nΦ/2 for small angles, and the ratio (nΦ/2)²/(Φ/2)² reduces to n².]
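As a quick numerical sanity check (a sketch of my own, in Python), we can compare the sin²(nΦ/2)/sin²(Φ/2) formula with a brute-force sum of n unit phasors, and verify the n² limit:

```python
import numpy as np

# Two ways to compute I/I0 for n equal oscillators with successive
# phase difference phi: a brute-force phasor sum and the closed formula.
def intensity_ratio(n, phi):
    direct = abs(np.sum(np.exp(1j * phi * np.arange(n)))) ** 2
    formula = (np.sin(n * phi / 2) / np.sin(phi / 2)) ** 2
    return direct, formula

direct, formula = intensity_ratio(6, 0.3)
print(direct, formula)            # the two ways agree
near_zero, _ = intensity_ratio(6, 1e-9)
print(near_zero)                  # ≈ 36, i.e. n^2 for n = 6
```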

It's a bit more difficult to understand what happens next. If Φ becomes a bit larger, the ratio of the two sines begins to fall off (so it becomes smaller than n²). Note that the numerator, sin²(nΦ/2), will be equal to one if nΦ/2 = π/2, i.e. if Φ = π/n, and the ratio sin²(nΦ/2)/sin²(Φ/2) then becomes sin²(π/2)/sin²(π/2n) = 1/sin²(π/2n). Again, if we assume that n is (very) large, we can approximate and write that this ratio is more or less equal to 1/(π²/4n²) = 4n²/π². That means that the intensity there will be 4/π² times the intensity of the beam at the maximum, i.e. 40.53% of it. That's the point at nΦ/2π = 0.5 on the graph below.

Fig 30-2

The graph above has a re-scaled vertical as well as a re-scaled horizontal axis. Indeed, instead of I, the vertical axis shows I/n2I0, so the maximum value is 1. And the horizontal axis does not show Φ but nΦ/2π, so if Φ = π/n, then nΦ/2π = 0.5 indeed. [Don’t worry about the dotted curve: that’s the solid-line curve multiplied by 10: it’s there to make sure you see what’s going on, as this ratio of those sines becomes very small very rapidly indeed.]

So, once we're past that 40.53% point, we get to our first minimum, which is reached at nΦ/2π = 1, or Φ = 2π/n. The numerator sin²(nΦ/2) equals sin²(π) = 0 there indeed, so the whole ratio becomes zero. Then it goes up again, to our second maximum, which we get when our numerator comes close to one again, i.e. when sin²(nΦ/2) ≈ 1. That happens when nΦ/2 = 3π/2, or Φ = 3π/n. Again, when n is (very) large, Φ will be very small, and so we can replace the denominator sin²(Φ/2) by Φ²/4. We then get a ratio equal to 1/(9π²/4n²) = 4n²/(9π²), or an intensity equal to 4n²I₀/(9π²), i.e. only 4.5% of the intensity at the (first) maximum. So that's tiny. [Well… All is relative, of course. 🙂] We can go on and on like that but that's not the point here: the point is that we have a very sharp central maximum with very weak subsidiary maxima on the sides.
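These percentages are easy to verify numerically. Here's a short Python sketch (the choice n = 1000 is just an arbitrary 'large' n of my own):

```python
import numpy as np

# Scaled intensity I/(n^2 I0) = sin^2(n*phi/2) / (n^2 * sin^2(phi/2)),
# evaluated at the points discussed in the text, for large n.
n = 1000
def scaled_intensity(phi):
    return (np.sin(n * phi / 2) / np.sin(phi / 2)) ** 2 / n**2

at_half = scaled_intensity(np.pi / n)          # n*phi/2pi = 0.5
first_min = scaled_intensity(2 * np.pi / n)    # n*phi/2pi = 1
second_max = scaled_intensity(3 * np.pi / n)   # near n*phi/2pi = 1.5
print(at_half)      # ≈ 4/pi^2  ≈ 0.4053
print(first_min)    # ≈ 0
print(second_max)   # ≈ 4/(9*pi^2) ≈ 0.045
```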

But what about that big lobe at 30 degrees on that graph with the six-dipole antenna? Relax. We’re not done yet with this ‘quick’ analysis. Let’s look at the general case from yet another angle, so to say. 🙂

The general case

To focus our minds, we've depicted that array with n oscillators below. Once again, we note that the phase difference between two sources, one to the next, will depend on (1) the intrinsic phase difference between them, which we denote by α, and (2) the time delay because we're observing the system in a given direction θ from the normal, which effect we calculated as being equal to (2π/λ)·d·sinθ. So the whole effect is Φ = α + (2π/λ)·d·sinθ = α + k·d·sinθ, with k the wave number.

To make things simple, let’s first assume that α = 0. We’re then in the case that we described above: we’ll have a sharp maximum at Φ = 0, so that means θ = 0. It’s easy to see why: all oscillators are in phase and so we have maximum positive (or constructive) interference.

Let's now examine the first minimum. Looking back at that geometrical interpretation, with the polygon, the first minimum occurs when all the arrows come back to the starting point: we've completed a full circle. That happens when nΦ = 2π, i.e. when Φ = 2π/n. So what's going on here? Well… If we put that value in our formula Φ = α + (2π/λ)·d·sinθ, we get 2π/n = 0 + (2π/λ)·d·sinθ or, getting rid of the 2π factor, n·d·sinθ = λ.

Now, n·d is the total length of the array, i.e. L, and, from the illustration above, we see that n·d·sinθ = L·sinθ = Δ. So we have that n·d·sinθ = λ = Δ. Hence, Δ is equal to one wavelength. That means that the total phase difference between the first and the last oscillator is equal to 2π, and the contributions of all the oscillators in between are uniformly distributed in phase between 0° and 360°. The net result is a resultant vector with amplitude AR = 0 and, hence, the intensity is zero as well.

OK, you'll say, you're just repeating yourself here. What about the other lobe or lobes? Well… Let's go back to that maximum. We had it at Φ = 0, but we will also have it at Φ = 2π, and at Φ = 4π, and at Φ = 6π, etcetera. We'll have such a sharp maximum – the maximum, in fact – at any Φ = m·2π, where m is any integer. Now, plugging that into the Φ = α + (2π/λ)·d·sinθ formula (again, assuming that α = 0), we get m·2π = (2π/λ)·d·sinθ, or d·sinθ = mλ.

While that looks very similar to our n·d·sinθ = λ = Δ condition for the (first) minimum, we're now looking at the path difference δ between two successive sources, not at Δ: δ = Δ/n = d·sinθ = mλ. What's being said here is that each successive source is out of phase with the next by a whole number of cycles, i.e. by m·360°, and being out of phase by 360° (or a multiple of it) obviously means that you're in phase once again. Hence, all sources are, once again, contributing in phase and produce a maximum that is just as good as the one we had for m = 0. Now, these maxima will also have a (first) minimum described by that other formula above, and so that's how we get that pattern of lobes with weak 'side lobes'.


Now, the conditions presented above for maxima and minima obviously all depend on the distance d, i.e. the spacing of the array, and the wavelength λ. That brings us to an interesting point: if d is smaller than λ (so if the spacing is smaller than one wavelength), then sinθ = mλ/d has no solution for m ≥ 1, so we only have one solution: m = 0. So we only have one beam in that case, the so-called zero-order beam centered at θ = 0. [Note that we also have a beam in the opposite direction.]

The point to note is that we can only have subsidiary great maxima if the spacing d of the array is greater than the wavelength λ. If we have such subsidiary great maxima, we'll call them first-order, second-order, etcetera beams, according to the value of m.
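The counting of orders can be captured in a few lines of Python (the helper below is my own illustrative construction, not anything from the text): d·sinθ = mλ only has a solution when |m| ≤ d/λ.

```python
import numpy as np

# List the beam angles (in degrees) for a given spacing-to-wavelength
# ratio d/lambda: one entry per order m with |m| <= d/lambda.
def beam_angles(d_over_lam):
    m_max = int(np.floor(d_over_lam))
    return {m: float(np.degrees(np.arcsin(m / d_over_lam)))
            for m in range(-m_max, m_max + 1)}

print(beam_angles(0.8))   # spacing < wavelength: only the zero-order beam
print(beam_angles(2.5))   # orders m = -2 ... +2, e.g. m = 2 near 53.13°
```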

Diffraction gratings

We are now, finally, ready to discuss diffraction gratings. A diffraction grating, in its simplest form, is a plane glass sheet with scratches on it: several hundred grooves, or even several thousand, to the millimeter. That is because the spacing has to be of the same order of magnitude as the wavelength of light, i.e. 400 to 700 nanometer (nm) – with the 400-500 nm range corresponding to violet-blue light, and the longer wavelengths, around 700 nm, corresponding to red light. Remember, a nanometer is a billionth of a meter (1×10⁻⁹ m), so even one thousandth of a millimeter is 1000 nanometer, i.e. longer than the wavelength of red light. Of course, from what we wrote above, it is obvious that the spacing d must be wider than the wavelength of interest to produce first- and higher-order beams and, therefore, diffraction but, still, the order of magnitude must be the same to produce anything of interest. Isn't it amazing that scientists were able to set up such diffraction experiments towards the end of the 18th century already? One of the earliest gratings, made in 1785 by the first director of the United States Mint, used hair strung between two finely threaded screws. In any case, let's go back to the physics of it.

In my previous post, I already noted Feynman’s observation that “we cannot literally make little optical-frequency radio stations and hook them up with infinitesimal wires and drive them all with a given phase.” What happens is something similar to the following set-up, and I’ll quote Feynman again (Vol. I, p. 30-3), just because it’s easier to quote than to paraphrase: “Suppose that we had a lot of parallel wires, equally spaced at a spacing d, and a radio-frequency source very far away, practically at infinity, which is generating an electric field which arrives at each one of the wires at the same phase. Then the external electric field will drive the electrons up and down in each wire. That is, the field which is coming from the original source will shake the electrons up and down, and in moving, these represent new generators. This phenomenon is called scattering: a light wave from some source can induce a motion of the electrons in a piece of material, and these motions generate their own waves.”

When Feynman says "light" here, he means electromagnetic radiation in general. But so what's happening with visible light? Well… All of the glass in the piece that makes up our diffraction grating scatters light, but the notches in it scatter it differently than the rest of the glass. The light going through the 'rest of the glass' goes straight through (a phenomenon that should be explained in itself, but we won't do that here), while the notches act as sources and produce secondary or even tertiary beams, as illustrated by the picture below, which shows a flash of light seen through such a grating, showing three diffracted orders: the order m = 0 corresponds to a direct transmission of light through the grating, while the first-order beams (m = +1 and m = –1) show colors with increasing wavelengths (from violet-blue to red) being diffracted at increasing angles.

The 'mechanics' are very complicated, and the correct explanation in physics involves a good understanding of quantum electrodynamics, which we touched upon in our April 2014 posts. I won't do that here, because here we are introducing the so-called classical theory only. This classical theory does away with all of the complexity of a quantum-electrodynamical explanation and replaces it by what is now known as the Huygens-Fresnel Principle, which was first formulated in 1678 (!), and which basically states that "every point which a luminous disturbance reaches becomes a source of a spherical wave, and the sum of these secondary waves determines the form of the wave at any subsequent time."


This comes from Wikipedia, as do the illustrations below. It does not only ‘explain’ diffraction gratings, but it also ‘explains’ what happens when light goes through a slit, cf. the second (animated) illustration.



Now that – light being diffracted as it goes through a slit – is obviously much more mysterious than a diffraction grating. And, you'll admit, a diffraction grating is already mysterious enough, because it's rather strange that only certain points in the grating (i.e. the notches) would act as sources, isn't it? Now, if that's difficult to understand, it's even more difficult to understand why empty space, i.e. a slit, would act as a diffraction grating! However, because this post has become way too long already, we'll leave this discussion for later.

Light and radiation

Introduction: Scale Matters

One of the things which Richard Feynman, as a great physics teacher, does admirably well is to point out why scale matters. In fact, 'old' physics is not incorrect per se. It's just that 'new' physics analyzes stuff at a much smaller scale.

For example, Snell's Law, or Fermat's Principle of Least Time, which were 'discovered' centuries ago – and they formalize something that the Greeks had already found out: the refraction of light as it travels from one medium (air, for example) into another (water, for example) – are still fine when studying focusing lenses and mirrors, i.e. geometrical optics. The dimensions of the analysis, or of the equipment involved (i.e. the lenses or the mirrors), are huge as compared to the wavelength of the light and, hence, we can effectively look at light as a beam that travels from one point to another in a straight line, that bounces off a surface, or that gets refracted when it passes from one medium to another.

However, when we let the light pass through very narrow slits, it starts behaving like a wave. Geometrical optics does not help us, then, to understand its behavior: we will, effectively, analyze light as a wave-like thing at that scale, and analyze wave-like phenomena, such as interference, the Doppler effect and what have you. That level of analysis is referred to as the classical theory of electromagnetic radiation, and it’s what we’ll be introducing in this post.

The analysis of light as photons, i.e. as a bunch of 'particles' described by some kind of 'wave function' (which does not describe any real wave, but only some 'probability amplitude'), is the third and final level of analysis, referred to as quantum mechanics or, to be more precise, as quantum electrodynamics (QED). [Note the terminology: quantum mechanics describes the behavior of matter particles, such as protons and electrons, while quantum electrodynamics (QED) describes the nature of photons – force-carrying particles – and their interaction with matter particles.]

But so we’ll focus on the second level of analysis in this post.

Different mathematical approaches

One other thing which Feynman points out in his Lectures is that, even within a well-agreed level of analysis, there are different mathematical approaches to a problem. In fact, while, at any level of analysis, there’s (probably) only one fully mathematically correct analysis, approximate approaches may actually be easier to work with, not only because they actually allow us to solve a practical problem, but also because they help us to understand what’s going on. 

Feynman’s treatment of electromagnetic radiation (Volume I, Chapters 28 to 34) is a case in point. While he notes that Maxwell’s field equations are actually the ones to be used, he writes them in a mathematical form that we can understand more easily, and then simplifies that mathematical form even further, in order to derive all that a sophomore student is supposed to know about electromagnetic radiation (EMR), which, of course, not only includes what we call light but also radio waves, radar waves, infrared waves and, on the other side of the spectrum, x-rays and gamma rays. 

But let’s get down to business now.

The oscillating charge

Radiation is caused by some far-away electric charge (q) that’s moving in various directions in a non-uniform way, i.e. it is accelerating or decelerating, and perhaps reversing direction in the process. From our point of view (P), we draw a unit vector er’ in the direction of the charge. [If you want a drawing, there’s one further down.]

We write r’ (r prime), not r, because it is the retarded distance: when we look at the charge, we see where it was r’/c seconds ago: r’/c is indeed the time that’s needed for some influence to travel from the charge to the here and now, i.e. to P. So now we can write Coulomb’s Law:

E₁ = –q·er′/(4πε₀r′²)

This formula can quickly be explained as follows:

  1. The minus sign makes the direction of the force come out alright: like charges do not attract but repel, unlike gravitation. [Indeed, for gravitation, there’s only one ‘charge’, a mass, and masses always attract. Hence, for gravitation, the force law is that like charges attract, but so that’s not the case here.]
  2. E and er’ and, hence, the electric force, are all directed along the line of sight.
  3. The Coulomb force is proportional to the amount of charge, and the factor of proportionality is 1/(4πε₀r′²).
  4. Finally, and most importantly in this context (study of EMR), the influence quickly diminishes with the distance: it varies inversely as the square of the distance (i.e. it varies as the inverse square).
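As a tiny numerical illustration of that last point (the helper and the choice of distances are my own, using SI constants), we can check the inverse-square fall-off for an electron's charge:

```python
import numpy as np

# Inverse-square behaviour of the Coulomb term E1 = q/(4*pi*eps0*r^2).
eps0 = 8.8541878128e-12   # vacuum permittivity, F/m
q = 1.602176634e-19       # elementary charge, C

def coulomb_field(r):
    """Magnitude of the Coulomb field of charge q at distance r, in V/m."""
    return q / (4 * np.pi * eps0 * r**2)

print(coulomb_field(1.0) / coulomb_field(2.0))   # ≈ 4: doubling r quarters E
```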

Coulomb's Law is not all that comes out of Maxwell's field equations: Maxwell's equations also cover electrodynamics. Fortunately so, because we are, indeed, talking moving charges here: electrostatics is only part of the picture and, in fact, the least important one in this case. 🙂 That's why I wrote E₁, with a subscript, above – not E.

So we have a second term, and I’ll actually be introducing a third term in a minute or so. But let’s first look at the second term. I am not sure how Feynman derives it from Maxwell’s equations – I am sure I’ll see the light 🙂 when reading Volume II – but, from Maxwell’s equations, he does, somehow, derive the following, secondary, effect:

Formula 1

This is a term I struggled with in a first read, and I still do. As mentioned above, I need to read Feynman’s Volume II, I guess. But, while I still don’t understand the why, I now understand what this expression catches. The term between brackets is the Coulomb effect, which we mentioned above already, and the time derivative is the rate of change. We multiply that with the time delay (i.e. r’/c). So what’s going on? As Feynman writes it: “Nature seems to be attempting to guess what the field at the present time is going to be, by taking the rate of change and multiplying by the time that is delayed.” 

OK. As said, I don’t really understand where this formula comes from but it makes sense, somehow. As for now, we just need to answer another question in order to understand what’s going on: in what direction is the Coulomb field changing?

It could be either: if the charge is moving along the direction of sight, er′ won't change but r′ will. However, if r′ does not change, then it's er′ that changes direction, and that change will be perpendicular to the line of sight, or transverse (as opposed to radial), as Feynman puts it. Or, of course, it could be a combination of both. [Don't worry too much if you're not getting this: we will need it again in just a minute or so, and then I will also give you a drawing so you'll see what I mean.]

The point is, these first two terms are actually not important because electromagnetic radiation is given by the third effect, which is written as:

Formula 3

Wow! This looks even more complicated, doesn't it? Let's analyze it. The first thing to note is that there is no r′ or r′² in this equation. However, that's an optical illusion of sorts, because r′ does matter when looking at that second-order derivative. How? Well… Let's go step by step and first look at that second-order derivative. It's the acceleration (or deceleration) of er′. Indeed, visualize er′ wiggling about, trying to follow the charge by pointing at where the charge was r′/c seconds ago. Let me help you here by, finally, inserting that drawing I promised you.


This acceleration will have a transverse as well as a radial component: we can imagine the end of er’ (i.e. the point of the arrow) being on the surface of a unit sphere indeed. So as it wiggles about, the tip of the arrow moves back a bit from the tangential line. That’s the radial component of the acceleration. It’s easy to see that it’s quite small as compared to the transverse component, which is the component along the line that’s tangent to the surface (i.e. perpendicular to er’).

Now, we need to watch out: we are not talking displacement or velocity here but acceleration. Hence, even if the displacement of the charge is very small, and even if velocities would not be phenomenal either (i.e. non-relativistic), the acceleration involved can take on any value really. Hence, even with small displacements, we can have large accelerations, so the radial component is small relative to the transverse component only, not in an absolute sense.

That being said, it's easy to see that both the transverse and the radial component depend on the distance r′, but in a different way. I won't bother you with the geometrical proof (it's not that obvious). Just accept that the radial component varies, more or less, as the inverse square of the distance, while the transverse component falls off only as the inverse of the distance itself. Hence, we will simplify and say that we're considering large distances r′ only – i.e. large in comparison to the length of the unit vector, which just means large in comparison to one (1) – and then it's only the transverse component of a that matters, which we'll denote by ax.

However, if we drop that radial component, then we should drop E1 as well, because the Coulomb effect will be very small as compared to the radiation effect (i.e. E3). And, then, if we drop E1, we can drop the ‘correction’ E2 as well, of course. Indeed, that’s what Feynman does. He ends up with this third term only, which he terms the law of radiation:

Formula 4

So there we are. That’s all I wanted to introduce here. But let’s analyze it a bit more. Just to make sure we’re all getting it here.

The dipole radiator

All that simplification business above is tricky, you’ll say. First, why do we write t – r/c for the retarded time (t’)? It should be t – r’/c, no? You’re right. There’s another simplification here: we fix the delay time, assuming that the charge only moves very small distances at an effectively constant distance r. Think of some far-away antenna indeed.

Hmm… But then we have that 1/c² factor, so that should reduce the effect to zilch, shouldn't it? And then… Hey! Wait a minute! Where does that r suddenly come from? Well, we've replaced d²er′/dt² by the lateral acceleration of the charge itself (i.e. its component perpendicular to the line of sight, denoted by ax) divided by r. That's just similar triangles.

Phew! That's a lot of simplifications and/or approximations indeed. How do we know this law really works? And, if it does, at what distance? When is that 1/r part (i.e. E₃) so large as compared to the other two terms (E₁ and E₂) that the latter two don't matter anymore? Well… That depends on the wavelength of the radiation, but we haven't introduced that concept yet. Let me conclude this first introduction by just noting that this 'law' can easily be confirmed by experiment.

A so-called dipole oscillator or radiator can be constructed, as shown below: a generator drives electrons up and down in two wires (A and B). Why do we put the generator in the middle? That's because we want a net effect: the radiation effect of the electrons in the wires connecting the generator with A and B will be neutral, because the electrons there move right next to each other in opposite directions. With the generator in the middle, A and B form one antenna, which we'll denote by G (for generator).

dipole radiator

Now, another antenna can act as a receiver, and we can amplify the signal to hear it. That’s the D (for detector) shown below. Now, one of the consequences of the above ‘law’ for electromagnetic radiation is, obviously, that the strength of the received signal should become weaker as we turn the detector. The strongest signal should be when D is parallel to G. At point 2, there is a projection effect and, hence, the strength of the field should be less. Indeed, remember that the strength of the field is proportional to the acceleration of the charge projected perpendicular to the line of sight. Hence, at point 3, it should be zero, because the projection is zero.

dipole radiator - field

Now, that’s what an experiment like this would indeed confirm. [I am tempted now to explain how a radio receiver works, but I will resist the temptation.]

I just need to make a last point here in order to make sure that we understand the formula above and – more importantly – that we can use it in subsequent posts without having to wonder where it comes from. The formula above implies that the direction of the field is at right angles to the line of sight. Now, if a charge is just accelerating up and down, in a motion of very small amplitude, i.e. like the motion in that antenna, then the magnitude (or strength, let's say) of the field will be given by the following formula:

Formula 5

θ, in this formula, is the angle between the axis of motion and the line of sight, as illustrated below:

Fig 29-1
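Here's a trivial numerical sketch of that sinθ dependence (my own notation; the constant factor in front is dropped, so the values are relative): the field is maximal broadside to the motion and vanishes along the axis of motion.

```python
import numpy as np

# Relative radiated field strength at angle theta from the axis of
# motion: proportional to sin(theta); the constant prefactor is dropped.
def relative_field(theta_deg):
    return float(np.sin(np.radians(theta_deg)))

print(relative_field(90.0))   # broadside to the motion: maximum, 1.0
print(relative_field(30.0))   # oblique: reduced by the projection effect
print(relative_field(0.0))    # along the axis of motion: 0.0, no signal
```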

So… That’s all we need to know for now. We’re done. As for now that is. This was quite technical, I guess, but I am afraid the next post will be even more technical. Sorry for that. I guess this is just a piece we need to get through.

Post scriptum:

You’ll remember that, with moving and accelerating charges, we should also have a magnetic field, usually denoted by B. That’s correct. If we have a changing electric field, then we will also have a magnetic field. There’s a formula for B:

B = –er′×E/c = –|er′||E|·c⁻¹·sin(er′, E)·n = –(E/c)·n

This is a vector cross product. The angle between the unit vector er′ and E is π/2, so the sine is one. The vector n is the vector normal to both, as defined by the right-hand screw rule. [As for the minus sign, note that –a×b = b×a, so we could have reversed the vectors: the minus sign just reverses the direction of the normal vector.] In short, the magnetic field vector B is perpendicular to E, but its magnitude is tiny: E/c. That's why Feynman neglects it, but we will come back to that in later posts.
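To see the geometry of that cross product, here's a small numerical sketch with made-up numbers (the vectors chosen are my own illustration): with er′ along z (the line of sight) and E along x (transverse), B must come out along –y with magnitude |E|/c.

```python
import numpy as np

# B = -(e_r' x E)/c: perpendicular to both e_r' and E, magnitude |E|/c.
c = 299_792_458.0                    # speed of light, m/s
e_r = np.array([0.0, 0.0, 1.0])      # unit vector toward the charge
E_vec = np.array([5.0, 0.0, 0.0])    # an arbitrary 5 V/m transverse field

B = -np.cross(e_r, E_vec) / c
print(B)   # [0, -5/c, 0]: perpendicular to both vectors, and tiny
```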