Light: relating waves to photons

Pre-scriptum (dated 26 June 2020): Some of the relevant illustrations in this post were removed as a result of an attack by the dark force. In any case, my ideas on the nature of light and photons have evolved considerably, so you should probably read my papers instead of these old blog posts.

Original post:

This is a concluding note on my ‘series’ on light. The ‘series’ gave you an overview of the ‘classical’ theory: light as an electromagnetic wave. It was very complete, including relativistic effects (see my previous post). I could have added more – there’s an equivalent for four-vectors, for example, when we’re dealing with frequencies and wave numbers: quantities that transform like space and time under the Lorentz transformations – but you got the essence.

One point we never touched upon, though, was that magnetic field vector. It is there. It is tiny because of that 1/c factor, but it’s there. We wrote it as

B = –e_r’×E/c

All symbols in bold are vectors, of course. The force is another vector cross-product: F = qv×B, and you need to apply the usual right-hand screw rule to find the direction of the force. As it turns out, that force – as tiny as it is – is actually oriented in the direction of propagation, and it is what is responsible for the so-called radiation pressure.

So, yes, there is a ‘pushing momentum’. How strong is it? What power can it deliver? Can it indeed make space ships sail? Well… The magnitude of the unit vector e_r’ is obviously one, so it’s the values of the other vectors that we need to consider. If we substitute and average F, the thing we need to find is:

〈F〉 = q〈vE〉/c

But the charge q times the field is the electric force, and the force on the charge times the velocity is the work dW/dt being done on the charge. So that should equal the energy that is being absorbed from the light per second. Now, I didn’t look at that much. It’s actually one of the very few things I left out – but I’ll refer you to Feynman’s Lectures if you want to find out more: there’s a fine section on light scattering, introducing the notion of the Thomson scattering cross section, but – as said – I think you’ve had enough for now. Just note that 〈F〉 = [dW/dt]/c and, hence, that the momentum that light delivers per second is equal to the energy that is absorbed per second (dW/dt) divided by c.
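To get a feel for the numbers, here is a minimal sketch of 〈F〉 = [dW/dt]/c for a surface that absorbs all the light falling on it. The solar intensity of about 1361 W/m² near Earth and the size of the sail are illustrative assumptions, not values taken from the discussion above.

```python
# Radiation force on a perfectly absorbing surface: F = (dW/dt) / c.
C = 299_792_458.0  # speed of light in m/s (exact by definition)


def radiation_force(absorbed_power_watts: float) -> float:
    """Force in newtons exerted by light whose energy is absorbed at the given rate."""
    return absorbed_power_watts / C


# Assumed, illustrative inputs (not from the text above):
SOLAR_INTENSITY = 1361.0   # W/m^2, approximate intensity of sunlight near Earth
SAIL_AREA = 100.0 * 100.0  # m^2, a hypothetical 100 m x 100 m sail

force = radiation_force(SOLAR_INTENSITY * SAIL_AREA)
print(f"Force on the sail: {force:.4f} N")
```

So the push is real but tiny: a few hundredths of a newton on a full hectare of sail. A sail that reflects the light instead of absorbing it would get up to twice that.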

So the momentum carried is 1/c times the energy. Now, you may remember that Planck solved the ‘problem’ of black-body radiation – an anomaly that physicists couldn’t explain at the end of the 19th century – by assuming that the energy of light comes in discrete packets, an idea which Einstein then turned into a new corpuscular theory of light: he said light consisted of photons. We all know that photons are not the kind of ‘particles’ that the Greek and medieval corpuscular theories of light envisaged but, well… They have a particle-like character – just as much as they have a wave-like character. They are actually neither, and they are physically and mathematically described by these wave functions – which, in turn, are functions describing probability amplitudes. But I won’t entertain you with that here, because I’ve written about that in other posts. Let’s just go along with the ‘corpuscular’ theory of photons for a while.

Photons also have energy (which we’ll write as W instead of E, just to be consistent with the symbols above) and momentum (p), and Planck’s Law says how much:

W = hf and p = W/c

So that’s good: we find the same multiplier 1/c here for the momentum of a photon. In fact, this is more than just a coincidence of course: the “wave theory” of light and Planck’s “corpuscular theory” must of course link up, because they are both supposed to help us understand real-life phenomena.
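Here is a small sketch that puts numbers on W = hf and p = W/c for a single photon. The wavelength of 600 nm is just an assumed example (visible, yellow-orange light), and the function names are mine.

```python
# Energy and momentum of a single photon: W = h*f and p = W/c.
H = 6.62607015e-34  # Planck constant in J*s (exact by definition)
C = 299_792_458.0   # speed of light in m/s (exact by definition)


def photon_energy(wavelength_m: float) -> float:
    """W = h*f, with the frequency f = c / wavelength."""
    return H * C / wavelength_m


def photon_momentum(wavelength_m: float) -> float:
    """p = W/c, the same 1/c factor as in the wave picture."""
    return photon_energy(wavelength_m) / C


wavelength = 600e-9  # assumed example wavelength in metres
W = photon_energy(wavelength)
p = photon_momentum(wavelength)
print(f"W = {W:.3e} J, p = {p:.3e} kg*m/s")
```

The energy comes out at about 3.3×10^–19 joule per photon, which is why we only notice the wave: everyday light consists of an enormous number of them.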

There are even more nice surprises. We spoke about polarized light, and we showed how the end of the electric field vector describes a circular or elliptical motion as the wave travels through space. It turns out that we can actually relate that to some kind of angular momentum of the wave (I won’t go into the details though – because I think the previous posts have really been too heavy on equations and complicated mathematical arguments) and that we could also relate it to a model of photons carrying angular momentum, “like spinning rifle bullets” – as Feynman puts it.

However, he also adds: “But this ‘bullet’ picture is as incomplete as the ‘wave’ picture.” And so that’s true and that should be it. And it will be it. I will really end this ‘series’ now. It was quite a journey for me, as I am making my way through all of these complicated models and explanations of how things are supposed to work. But a fascinating one. And it sure gives me a much better feel for the ‘concepts’ that are hastily explained in all of these ‘popular’ books dealing with science and physics, hopefully preparing me better for what I should be doing, and that’s to read Penrose’s advanced mathematical theories.

Some content on this page was disabled on June 20, 2020 as a result of a DMCA takedown notice from Michael A. Gottlieb, Rudolf Pfeiffer, and The California Institute of Technology. You can learn more about the DMCA here:

https://wordpress.com/support/copyright-and-the-dmca/

Radiation and relativity

Pre-scriptum (dated 26 June 2020): Some of the relevant illustrations in this post were removed as a result of an attack by the dark force. Too bad, because I liked this post. In any case, despite the removal of the illustrations, you should be able to reconstruct the main story line.

Original post:

Now we are really going to do some very serious analysis: relativistic effects in radiation. Fasten your seat belts please.

The Doppler effect for physical waves traveling through a medium

In one of my posts – I don’t remember which one – I wrote about the Doppler effect for sound waves, or any physical wave traveling through a medium. I also said it had nothing to do with relativity. What happens really is that the observer sort of catches up with the wave or – the other way around – that he falls back and, because the velocity of a wave in a medium is always the same (that also has nothing to do with relativity: it’s just a general principle that’s true – always), the frequency of the physical wave will appear to be different. [So that’s why a siren of an ambulance sounds different as it moves past you.] Wikipedia has an excellent article on that – so I’ll refer you to that – but that article does not say anything – or nothing much – about the Doppler effect when electromagnetic radiation is involved. So that’s what we’ll talk about here. Before we do that, though, let me quickly jot down the formula for the Doppler effect for a physical wave traveling through a medium, so we are clear about the differences between the two ‘Doppler effects’:

f = f₀·(v_p + v_r)/(v_p + v_s)

In this formula, v_p is the propagation speed of the wave in the medium which – as mentioned above – depends on the medium. Now the source and the receiver will have a velocity with respect to the medium as well (positive or negative – depending on whether they’re moving in the same direction as the wave or not), and so that’s v_r (r for receiver) and v_s (s for source) respectively. So we’re adding speeds here to calculate some relative speed and then we take the ratio. Some people think that’s what relativity theory is all about. It’s not. Everything I’ve written so far is just Galilean relativity – so stuff that’s been known for centuries already.
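To make the ‘adding speeds and taking the ratio’ concrete, here is a minimal sketch. It assumes the usual textbook form f = f₀·(v_p + v_r)/(v_p + v_s), with v_r counted positive when the receiver moves towards the source and v_s counted positive when the source moves away from the receiver. That sign convention, the siren frequency and the speeds are my assumptions for the example.

```python
def doppler_medium(f0: float, v_p: float, v_r: float = 0.0, v_s: float = 0.0) -> float:
    """Observed frequency of a wave traveling through a medium.

    f0  : frequency emitted by the source
    v_p : propagation speed of the wave in the medium
    v_r : receiver speed relative to the medium, positive towards the source
    v_s : source speed relative to the medium, positive away from the receiver
    """
    if v_p + v_s <= 0:
        raise ValueError("source moves towards the receiver at or above the wave speed")
    return f0 * (v_p + v_r) / (v_p + v_s)


# Assumed example: a 700 Hz siren on an ambulance doing 25 m/s, sound at 343 m/s.
approaching = doppler_medium(700.0, 343.0, v_s=-25.0)
receding = doppler_medium(700.0, 343.0, v_s=+25.0)
print(f"approaching: {approaching:.1f} Hz, receding: {receding:.1f} Hz")
```

The pitch drops from about 755 Hz to about 652 Hz as the ambulance passes, and nothing in the calculation involves anything but ordinary addition of velocities.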

Relativity is weird: one aspect of it is length contraction – not often discussed – but the other thing is better known: there is no such thing as absolute time, so when we talk velocities, you need to specify according to whose time?

The Doppler effect for electromagnetic radiation

One thing that’s not relative – and which makes things look somewhat similar to what I wrote above – is that the speed of light is always equal to c. That was the startling fact that came out of Maxwell’s equations (startling because it is not consistent with Galilean relativity: it says that we cannot ‘catch up’ with a light wave!) around which Einstein built all of “new physics”, and so it’s something we use rather matter-of-factly in all that we’ll write below.

In all of the preceding posts about light, I wrote – more than once actually – that the movement of the oscillating charge (i.e. the source of the radiation) along the line of sight did not matter: the only thing that mattered was its acceleration in the xy-plane, which is perpendicular to our line of sight, which we’ll call the z-axis. Indeed, let me remind you of the two equations defining electromagnetic radiation (see my post on light and radiation about a week ago):

The first formula gives the electromagnetic effect that dominates in the so-called wave zone, i.e. a few wavelengths away from the source – because the Coulomb force varies as the inverse of the square of the distance r, unlike this ‘radiation’ effect, which varies inversely as the distance only (E ∝ 1/r), so it falls off much less rapidly.

Now, that general observation still holds when we’re considering an oscillating charge that is moving towards or away from us with some relativistic speed (i.e. a speed getting close enough to c to produce relativistic length contraction and time dilation effects) but, because of the fact that we need to consider local times, our formula for the retarded time is no longer correct.

Huh? Yes. The matter is quite complicated, and Feynman starts with jotting down the derivatives for the displacement in the x- and y-directions, but I think I’ll skip that. I’ll go for the geometrical picture straight away, which is given below. As said, it’s going to be difficult, but try to hang in there, because it’s necessary to understand the Doppler effect in a correct way (I myself have been fooled by quite a few nonsensical explanations of it in popular books) and, as a bonus, you also get to understand synchrotron radiation and other exciting stuff.

So what’s going on here? Well… Don’t look at the illustration above. We’ll come back to it. Let’s first build up the logic. We’ve got a charge moving vertically – from our point of view that is (the observer) – but also moving towards and away from us, i.e. in the z-direction. Indeed, note the arrow pointing to the observer: that’s us! So it could indeed be us looking at electrons going round and round and round – at phenomenal speeds – in a synchrotron – but then one that’s been turned 90 degrees (probably easier for us to just lie down on the ground and look sideways). In any case, I hope you can imagine the situation. [If not, try again.] Now, if that charge was not moving at relativistic speeds (the graph above uses 0.94c, which is the number that Feynman uses), then we would not have to worry about ‘our’ time t and the ‘local’ time τ. Huh? Local time? Yes.

We denote the time as measured in the reference frame of the moving charge as τ. Hence, as we are counting t = 1, 2, 3 etcetera, the electron is counting τ = 1, 2, 3 etcetera as it goes round and round and round. If the charge were not moving at relativistic speeds, we’d do the standard thing and that’s to calculate the retarded acceleration of the charge a'(t) = a(t − r/c). [Remember that we used a prime to mark functions for which we should use a retarded argument and, yes, I know that the term ‘retarded’ sounds a bit funny, but that’s how it is. So the prime vanishes as we put in the retarded argument.] Indeed, from the ‘law’ of radiation, we know that the field now and here is given by the acceleration of the charge at the retarded time, i.e. t – r/c. To sum it all up, we would, quite simply, relate t and τ as follows:

τ = t – r/c or, what amounts to the same, t = τ + r/c

So the effect that we see now, at time t, was produced at a distance r at time τ = t − r/c. That should make sense. [Again, if it doesn’t: read again. I can’t explain it in any other way.]

The crucial link in this rather complicated chain of reasoning comes now. You’ll remember that one of the assumptions we used to derive our ‘law’ of radiation was that “r is practically constant.” That no longer holds. Indeed, that electron moving around in the synchrotron comes towards us and goes away from us at crazy speeds (0.94c is a bit more than 280,000 km per second), so r goes up and down too, and the relevant distance is not r but r + z(τ). This means the retardation effect is actually a bit larger: it’s not r/c but [r + z(τ)]/c = r/c + z(τ)/c. So we write:

τ = t – r/c – z(τ)/c or, what amounts to the same, t = τ + r/c + z(τ)/c

Hmm… You’ll say this is rather fishy business. Why would we use the actual distance and not the distance a while ago? Well… I’ll let you mull over that. We’ve got two points in space-time here and so they are separated both by distance as well as time. It makes sense to use the actual distance to calculate the actual separation in time, I’d say. If you’re not convinced, I can only refer you to those complicated derivations that Feynman briefly does before introducing this ‘easier’ geometric explanation. This brings us to the next point. We can measure time in seconds, but also in equivalent distance, i.e. light-seconds: the distance that light (remember: always at absolute speed c) travels in one second, i.e. exactly 299,792,458 meters. It’s just another ‘time unit’ to get used to.

Now that’s what’s being done above. To be sure, we get rid of the r/c term, which is a constant indeed: that amounts to a shift of the origin by some constant (so we start counting earlier or later). In short, we have a new variable ‘t’ really that’s equal to t = τ + z(τ)/c. But we’re going to count time in meters (well – in units of c meters really), so we will just multiply this by c and we get:

ct = cτ + z(τ)

Why the different unit? Well… We’re talking relativistic speeds here, aren’t we? And so the second is just an inappropriate unit. When we’re finished with this example, we’ll give you an even simpler example: a source just moving in on us, with an equally astronomical speed, so the functional shape of z(τ) will be some fraction of c (let’s say kc) times τ, so kcτ. So, to simplify things, just think of it as re-scaling the time axis in units that make sense as compared to the speeds we are talking about.

Now we can finally analyze that graph on the right-hand side. If we would keep r fixed – so if we’d not care about the charge moving towards and away from us – the plot of x'(t) – i.e. the retarded position indeed – against ct would yield the sinusoidal graph plotted by the red numbers 1 to 13 here. In fact, instead of a sinusoidal graph, it resembles a normal distribution, but that’s just because we’re looking at one revolution only. In any case, the point to note is that – when everything is said and done – we need to calculate the retarded acceleration, so that’s the second derivative of x'(t). The animated illustration shows how that works: the second derivative (not the first) turns from positive to negative – and vice versa – at inflection points, when the curve goes from convex to concave, and vice versa. So, on the segment of that sinusoidal function marked by the red numbers, it’s positive at first (the slope of the tangent becomes steeper and steeper), then negative (cf. that tangent line turning from blue to green in the illustration below), and then it becomes positive again, as the negative slope becomes less negative at the second inflection point.

That should be straightforward. However, the actual x'(t) curve is the black curve with the cusp. A curve like that is called a hypocycloid. Let me reproduce it once again for ease of reference.

We relate x'(t) to ct (this is nothing but our new unit for t) by noting that ct = cτ + z(τ). Capito? It’s not easy to grasp: the instinct is to just equate t and τ and write x'(t) = x'(τ), but that would not be correct. No. We must measure in ‘our’ time and get a functional form for x’ as a function of t, not of τ. In fact, x'(t) − i.e. the retarded vertical position at time t – is not equal to x'(τ) but to x(τ), i.e. that’s the actual (instead of retarded) position at (local) time τ, and so that’s what the black graph above shows.

I admit it’s not easy. I myself am not getting it in an ‘intuitive’ way. But the logic is solid and leads us where it leads us. Perhaps it helps to think in terms of the curvature of this graph. In fact, we have to think in terms of curvature of this graph in order to understand what’s happening in terms of radiation. When the charge is moving away from us, i.e. during the first three ‘seconds’ (so that’s the 1-2-3 count), we see that the curvature is less than what it would be – and also doesn’t change very much – if the displacement was given by the sinusoidal function, which means there’s very little radiation (because there’s little acceleration – negative or positive). However, as the charge moves towards us, we get that sharp cusp and, hence, we also get sharp curvature, which results in a sharp pulse of the electric field, rather than the regular – equally sinusoidal – amplitude we’d have if that electron was not moving towards and away from us at relativistic speeds. In fact, that’s what synchrotron radiation is: we get these sharp pulses indeed. Feynman shows how they are measured – in very much detail – using a diffraction grating, but that would just be another diversion and so I’ll spare you that.
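The curvature argument can be checked with a few lines of code. The sketch below assumes a charge going round a circle of radius R at v = 0.94c, with x(τ) = R·cos(ωτ) and z(τ) = –R·sin(ωτ), so that at τ = 0 it sits at the top and moves straight towards us. It then uses ct = cτ + z(τ) and the chain rule to get the second derivative of x with respect to ct. The parametrization, the unit radius and the choice c = 1 are my assumptions.

```python
import math

C = 1.0        # units in which c = 1, so ct and c*tau are plain numbers
V = 0.94       # speed of the charge as a fraction of c
R = 1.0        # assumed radius of the orbit
OMEGA = V / R  # angular frequency of the orbit in local time tau


def apparent_acceleration(tau: float) -> float:
    """Second derivative of x with respect to ct, where ct = c*tau + z(tau)."""
    dx = -R * OMEGA * math.sin(OMEGA * tau)      # dx/dtau
    d2x = -R * OMEGA**2 * math.cos(OMEGA * tau)  # d2x/dtau2
    dT = C - V * math.cos(OMEGA * tau)           # d(ct)/dtau = c + dz/dtau
    d2T = V * OMEGA * math.sin(OMEGA * tau)      # d2(ct)/dtau2
    # Chain rule for a curve given parametrically as (ct(tau), x(tau)).
    return (d2x * dT - dx * d2T) / dT**3


towards_us = apparent_acceleration(0.0)                # charge moving towards us
away_from_us = apparent_acceleration(math.pi / OMEGA)  # half a turn later
print(towards_us, away_from_us, abs(towards_us / away_from_us))
```

With these assumptions the apparent acceleration at the cusp is roughly a thousand times larger in magnitude than on the far side of the orbit, because d(ct)/dτ shrinks to c – v in one case and grows to c + v in the other. That is the sharp pulse.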

Hmm… This has nothing to do with the Doppler effect, you’ll say. Well… Yes and no. The discussion above basically set the stage for that discussion. So let’s turn to that now. However, before I do that, I want to insert another graph for an oscillating charge moving in and out at us in some irregular way – rather than the nice circular route described above.

The Doppler effect

The illustration below is a similar diagram to the ones above – but looks much simpler. It shows what happens when an oscillating charge (which we assume to oscillate at its natural or resonant frequency ω₀) moves towards us at some relativistic speed v (whatever speed – fairly close to c, so the ratio v/c is substantial). Note that the movement is from A to B – and that the observer (we!) are, once again, at the left – and, hence, the distance traveled is AB = vτ. So what’s the impact on the frequency? That’s shown on the x'(t) graph on the right: the curvature of the sinusoidal motion is much sharper, which means that its angular frequency as we see or measure it (and we’ll denote that by ω₁) will be higher: if it’s a larger object emitting ‘white’ light (i.e. a mix of everything), then the light will no longer be ‘white’ but it will have shifted towards the violet end of the spectrum. If it moves away from us, it will appear ‘more red’.

What’s the frequency change? Well… The z(τ) function is rather simple here: z(τ) = vτ. Let’s use f and f₀ for a moment, instead of the angular frequency ω and ω₀, as we know they only differ by the factor 2π (ω = 2π/T = 2π·f, with f = 1/T, i.e. the reciprocal of the period). Hence, in a given time Δτ, the number of oscillations will be f₀Δτ. These oscillations will be spread over a distance vΔτ, and the time needed to travel that distance is Δτ – of course! For the observer, however, the same number of oscillations now is compressed over a distance (c − v)Δτ. The time needed to travel that distance corresponds to a time interval Δt = (c − v)Δτ/c = (1 − v/c)Δτ. Now, hence, the frequency f will be equal to f₀Δτ (the number of oscillations) divided by Δt = (1 − v/c)Δτ. Hence, we get this relatively simple equation:

f = f₀/(1 − v/c) and ω = ω₀/(1 − v/c)

Is that it? It’s not quite the same as the formula we had for the Doppler effect of physical waves traveling through a medium, but it’s simple enough indeed. And it also seems to use relative speed. Where’s the Lorentz factor? Why did we need all that complicated machinery?

You are a smart ass! You’re right. In fact, this is exactly the same formula: if we equate the speed of propagation with c, set the velocity of the receiver to zero, and substitute v (with a minus sign obviously) for the speed of the source, then we get what we get above:

f = f₀·c/(c − v) = f₀/(1 − v/c)

The thing we need to add is that the natural frequency of a moving atomic oscillator, as we measure it, is not the same as the one it has when standing still: the time dilation effect kicks in. If ω₀ is the ‘true’ natural frequency (so measured locally, so to say), then the modified natural frequency – as corrected for the time dilation effect – will be ω₁ = ω₀(1 – v²/c²)^1/2. Therefore, the grand final relativistic formula for the Doppler effect for electromagnetic radiation is:

ω = ω₁/(1 – v/c) = ω₀(1 – v²/c²)^1/2/(1 – v/c)

You may feel cheated now: did you really have to suffer going through that story on the synchrotron radiation to get the formula above? I’d say: yes and no. No, because you could be happy with the Doppler formula alone. But, yes, because you don’t get the story about those sharp pulses just from the relativistic Doppler formula alone. So the final answer is: yes. I hope you felt it was worth the suffering 🙂
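For those who like to play with numbers, here is a minimal sketch of the relativistic Doppler shift for a source moving along the line of sight, i.e. the compression factor 1/(1 − v/c) multiplied by the time dilation factor (1 – v²/c²)^1/2. The function name and the convention that β = v/c is positive for an approaching source are my own choices.

```python
import math


def doppler_light(omega0: float, beta: float) -> float:
    """Observed angular frequency for a source with natural frequency omega0.

    beta = v/c, positive when the source moves towards the observer.
    """
    if not -1.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between -1 and 1")
    time_dilation = math.sqrt(1.0 - beta**2)  # slows the moving oscillator
    compression = 1.0 / (1.0 - beta)          # the 'catching up' factor
    return omega0 * time_dilation * compression


blue = doppler_light(1.0, 0.6)   # source approaching at 0.6c
red = doppler_light(1.0, -0.6)   # source receding at 0.6c
print(blue, red)
```

At β = 0.6 the approaching source is seen at twice its natural frequency and the receding one at half of it. The same function can also be written as ω₀·[(1 + β)/(1 − β)]^1/2, which is just a bit of algebra on the formula above.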

Some content on this page was disabled on June 17, 2020 as a result of a DMCA takedown notice from Michael A. Gottlieb, Rudolf Pfeiffer, and The California Institute of Technology. You can learn more about the DMCA here:

https://wordpress.com/support/copyright-and-the-dmca/

Loose ends: on energy of radiation and polarized light

Original post:

I said I would move on to another topic, but let me wrap up some loose ends in this post. It will say a few things about the energy of a field; then it will analyze these electron oscillators in some more detail; and, finally, I’ll say a few words about polarized light.

The energy of a field

You may or may not remember, from our discussions on oscillators and energy, that the total energy in a linear oscillator is a constant sum of two variables: the kinetic energy mv²/2 and the potential energy (i.e. the energy stored in the spring as it expands and contracts) kx²/2 (remember that the force is -kx). So the kinetic energy is proportional to the square of the velocity, and the potential energy to the square of the displacement. Now, from the general solution that we had obtained for a linear oscillator – damped or not – we know that the displacement x, its velocity dx/dt, and even its acceleration are all proportional to the magnitude of the field – with different factors of proportionality of course. Indeed, we have x = q_eE₀e^(iωt)/[m(ω₀² – ω²)], and so every time we take a derivative, we’ll bring an iω factor down (and so we’ll have another factor of proportionality), but the E₀ factor is still the same, and a factor of proportionality multiplied with some constant is still a factor of proportionality. Hence, the energy should be proportional to the square of the field amplitude E₀. What more can we say about it?

The first thing to note is that, for a field emanating from a point source, the magnitude of the field vector E will vary inversely with r. That’s clear from our formula for radiation:

Hence, the energy that the source can deliver will vary inversely as the square of the distance. That implies that the energy we can take out of a wave, within a given conical angle, will always be the same, no matter how far away we are. What we have is an energy flux spreading over a greater and greater effective area. That’s what’s illustrated below: the energy flowing within the cone OABCD is independent of the distance r at which it is measured.

However, these considerations do not answer the question: what is that factor of proportionality? What’s its value? What does it depend on?

We know that our formula for radiation is an approximate formula, but it’s accurate for what is called the “wave zone”, i.e. for all of space as soon as we are more than a few wavelengths away from the source. Likewise, Feynman derives an approximate formula only for the energy carried by a wave using the same framework that was used to derive the dispersion relation. It’s a bit boring – and you may just want to go to the final result – but, well… It’s kind of illustrative of how physics analyzes physical situations and derives approximate formulas to explain them.

Let’s look at that framework again: we had a wave coming in, and then a wave being transmitted. In-between, the plate absorbed some of the energy, i.e. there was some damping. The situation is shown below, and the exact formulas were derived in the previous post.

Now, we can write the following energy equation for a unit area:

Energy in per second = energy out per second + work done per second

That’s simple, you’ll say. Yes, but let’s see where we get with this. For the energy that’s going in (per second), we can write that as α〈E_s²〉, so that’s the averaged square of the amplitude of the electric field emanating from the source multiplied by a factor α. What factor α? Well… That’s exactly what we’re trying to find out: be patient.

For the energy that’s going out per second, we have α〈(E_s + E_a)²〉. Why the same α? Well… The transmitted wave is traveling through the same medium as the incoming wave (air, most likely), so it should be the same factor of proportionality. Now, α〈(E_s + E_a)²〉 = α[〈E_s²〉 + 2〈E_sE_a〉 + 〈E_a²〉]. However, we know that we’re looking at a very thin plate here only, and so the amplitude E_a must be small as compared to E_s. So we can leave its averaged square 〈E_a²〉 value out. Indeed, as mentioned above, we’re looking at an approximation here: any term that’s proportional to NΔz, we’ll leave in (and so we’ll leave 〈E_sE_a〉 in), but terms that are proportional to (NΔz)² or a higher power can be left out. [That’s, in fact, also the reason why we don’t bother to analyze the reflected wave.]

So we now have the last term: the work done per second in the plate. Work done is force times distance, and so the work done per second (i.e. the power being delivered) is the force times the velocity. [In fact, we should do a dot product but the force and the velocity point along the same direction – except for a possible minus sign – and so that’s alright.] So, for each electron oscillator, the work done per second will be 〈q_eE_sv〉 and, hence, for a unit area, we’ll have NΔzq_e〈E_sv〉. So our energy equation becomes:

α〈E_s²〉 = α〈E_s²〉 + 2α〈E_sE_a〉 + NΔzq_e〈E_sv〉

⇔ –2α〈E_sE_a〉 = NΔzq_e〈E_sv〉

Now, we had a formula for E_a (we didn’t do the derivation of this one though: just accept it):

We can substitute this in the energy equation, noting that the average is not dependent on time. So the left-hand side of our energy equation becomes:

However, E_s(at z) is E_s(at atoms) retarded by z/c, so we can insert the same argument. But then, now that we’ve made sure that we got the same argument for E_s and v, we know that such average is independent of time and, hence, it will be equal to the 〈E_sv〉 factor on the right-hand side of our energy equation, which means this factor can be scrapped. The NΔzq_e (and that 2 in the numerator and denominator) can be scrapped as well, of course. We then get the remarkably simple result that

α = ε₀c
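For readers who want to see the cancellation spelled out, here is a sketch of that last step. It assumes the field of the plate has the form Feynman gives for a thin sheet of oscillating charges, E_a = –(NΔzq_e/2ε₀c)·v (with v retarded by z/c). That formula is not reproduced in this post, so take it as an assumption.

```latex
\begin{align*}
-2\alpha\,\langle E_s E_a\rangle
  &= 2\alpha\,\frac{N\,\Delta z\,q_e}{2\varepsilon_0 c}\,\langle E_s v\rangle
   = \frac{\alpha}{\varepsilon_0 c}\,N\,\Delta z\,q_e\,\langle E_s v\rangle \\
\text{energy equation:}\quad
\frac{\alpha}{\varepsilon_0 c}\,N\,\Delta z\,q_e\,\langle E_s v\rangle
  &= N\,\Delta z\,q_e\,\langle E_s v\rangle
  \;\Longrightarrow\; \alpha = \varepsilon_0 c
\end{align*}
```

The two minus signs cancel, the 2 in the numerator cancels the 2 in the denominator, and the common factor NΔzq_e〈E_sv〉 drops out on both sides.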

Hence, the energy carried in an electric wave per unit area and per unit time, which is also referred to as the intensity of the wave, equals:

〈S〉 = ε₀c〈E²〉
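As a quick numerical illustration, the sketch below turns an intensity into a field amplitude, reading the intensity as ε₀c times the mean square of the field. It assumes a plain sinusoidal wave, for which the mean square is half the squared amplitude E₀. The intensity of about 1361 W/m² for sunlight near Earth is an outside, approximate figure, not something derived here.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity in F/m
C = 299_792_458.0        # speed of light in m/s


def intensity(e0: float) -> float:
    """Mean intensity in W/m^2 of a sinusoidal wave with field amplitude e0 in V/m."""
    return EPS0 * C * e0**2 / 2.0  # <E^2> = e0^2 / 2 for a sinusoid


def field_amplitude(s: float) -> float:
    """Field amplitude in V/m of a sinusoidal wave with mean intensity s in W/m^2."""
    return math.sqrt(2.0 * s / (EPS0 * C))


e0_sun = field_amplitude(1361.0)  # assumed intensity of sunlight near Earth
print(f"E0 = {e0_sun:.0f} V/m")
```

So full sunlight corresponds to an electric field amplitude of roughly a thousand volts per meter.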

The rate of radiation of energy

Plugging our formula for radiation above into this formula, we get an expression for the power per square meter radiated in the direction θ:

In this formula, a’ is, of course, the retarded acceleration, i.e. the value of a at time t – r/c. The formula makes it clear that the power varies inversely as the square of the distance, as it should, from what we wrote above. I’ll spare you the derivation (you’ve had enough of these derivations, I am sure), but we can use this formula to calculate the total energy radiated in all directions, by integrating the formula over all directions. We get the following general formula:

This formula is no longer dependent on the distance r – which is also in line with what we said above: in a given cone, the energy flux is the same. In this case, the ‘cone’ is actually a sphere around the oscillating charge, as illustrated below.

Now, we usually assume we have a nice sinusoidal function for the displacement of the charge and, hence, for the acceleration, so we’ll often assume that the acceleration a equals a = –ω²x₀e^(iωt). In that case, we can average over a cycle (note that the average of the square of a cosine is one-half) and we get:

Now, historically, physicists used a value written as e², not to be confused with the transcendental number e, equal to e² = q_e²/4πε₀, which – when inserted above – yields the older form of the formula above:

P = 2e²a²/3c³

In fact, we actually worked with that e² factor already, when we were talking about potential energy and calculated the potential energy between a proton and an electron at distance r: that potential energy was equal to e²/r but that was a while ago indeed – and so you’ll probably not remember.
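The two ways of writing the radiated power can be compared numerically. The sketch below assumes that the general formula referred to above is the standard Larmor result, P = q²a²/6πε₀c³ in SI units, and checks that it equals 2e²a²/3c³ once we put e² = q_e²/4πε₀. The acceleration used is an arbitrary test value.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity in F/m
Q_E = 1.602176634e-19    # elementary charge in C (exact by definition)
C = 299_792_458.0        # speed of light in m/s

E2 = Q_E**2 / (4.0 * math.pi * EPS0)  # the 'historical' e^2, in J*m


def power_si(a: float) -> float:
    """Total radiated power, assumed SI form: q^2 a^2 / (6 pi eps0 c^3)."""
    return Q_E**2 * a**2 / (6.0 * math.pi * EPS0 * C**3)


def power_historical(a: float) -> float:
    """Total radiated power in the older notation: 2 e^2 a^2 / (3 c^3)."""
    return 2.0 * E2 * a**2 / (3.0 * C**3)


a_test = 1.0e20  # arbitrary test acceleration in m/s^2
print(power_si(a_test), power_historical(a_test))
```

Both functions return the same number, as they should: the e² notation only hides the 4πε₀ factor.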

Atomic oscillators

Now, I can imagine you’ve had enough of all these formulas. So let me conclude by giving some actual numbers and values for things. Let’s look at these atomic oscillators and put some values in indeed. Let’s start with calculating the Q of an atomic oscillator.

You’ll remember what the Q of an oscillator is: it is a measure of the ‘quality’ (that’s what the Q stands for really) of a particular oscillator. A high Q implies that, if we ‘hit’ the oscillator, it will ‘ring’ for many cycles, so its decay time will be quite long. It also means that the peak of its ‘frequency response’ will be quite tall and narrow. Huh? The illustrations below will refresh your memory.

The first one (below) gives a very general form for a typical resonance: we have a fixed frequency f₀ (which defines the period T, and vice versa), and so this oscillator ‘rings’ indeed, and slowly dies out. An associated concept is the decay time (τ) of an oscillation: that’s the time it takes for the amplitude of the oscillation to fall to 1/e = 1/2.7182… ≈ 36.8% of the original value.

The second illustration (below) gives the frequency response curve. That assumes there is a continuous driving force, and we know that the oscillator will react to that driving force by oscillating – after an initial transient – at the same frequency as the driving force, but its amplitude will be determined by (i) the difference between the frequency of the driving force and the oscillator’s natural frequency (f₀) as well as (ii) the damping factor. We will not prove it here, but the ‘peak height’ is equal to the low-frequency response (C) multiplied by the Q of the system, and the peak width is f₀ divided by Q.

But what is the Q for an atomic oscillator? Well… The Q of any system is the ratio of the total energy content of the oscillator to the work done (or the energy loss) per radian. [If we define it per cycle, then we need to throw an additional 2π factor in – that’s just how the Q has been defined!] So we write:

Q = W/(dW/dΦ)

Now, dW/dΦ = (dW/dt)/(dΦ/dt) = (dW/dt)/ω, so Q = ωW/(dW/dt), with dW/dt standing for the energy lost per second here. Because the energy W itself goes down at that rate, this can be re-written as the first-order differential equation dW/dt = –(ω/Q)W, with dW/dt now the actual (negative) rate of change of W. Now, that equation has the general solution

W = W₀e^(–ωt/Q), with W₀ the initial energy.

Using our energy equation – and assuming that our atomic oscillators are radiating at some natural (angular) frequency ω₀, which we’ll relate to the wavelength λ = 2πc/ω₀ – we can calculate the Q. But what do we use for W₀? Well… The kinetic energy of the oscillator is mv²/2. Assuming the displacement x has that nice sinusoidal shape, we get mω²x₀²/4 for the mean kinetic energy, which we have to double to get the total energy (remember that, on average, the total energy of an oscillator is half kinetic, and half potential), so then we get W = mω²x₀²/2. Using m_e (the electron mass) for m, we can then plug it all in, divide and cancel what we need to divide and cancel, and we get the grand result:

Q = ωW/(dW/dt) = 3λm_ec²/4πe² or 1/Q = 4πe²/3λm_ec²

The second form is preferred because it allows substituting e²/m_ec² for yet another ‘historical’ constant, referred to as the classical electron radius r₀ = e²/m_ec² = 2.82×10^–15 m. However, that’s yet another diversion, and I’ll try to spare you here. Indeed, we’re almost done so let’s sprint to the finish.

So all we need now is a value for λ. Well… Let’s just take one: a sodium atom emits light with a wavelength of approximately 600 nanometer. Yes, that’s the yellow-orange light emitted by low-pressure sodium-vapor lamps used for street lighting. So that’s a typical wavelength and we get a Q equal to

Q = 3λ/4πr₀ ≈ 5×10⁷.

So what? Well… This is great ! We can finally calculate things like the decay time now – for our atomic oscillators ! Now, there is a formula for the decay time: τ = 2Q/ω. This is a formula we can also write in terms of the wavelength λ because ω and λ are related through the speed of light: ω = 2πf = 2πc/λ. So we can write τ = Qλ/πc. In this case, we get τ ≈ 3.2×10^–8 seconds (but please do check my calculation). It seems that that corresponds to experimental fact: light, as emitted by all these atomic oscillators, basically consists of very sharp pulses: one atom emits a pulse, and then another one takes over, etcetera. That’s why light is usually unpolarized – I’ll talk about that in a minute.

In addition, we can calculate the peak width Δf = f₀/Q. In fact, we’ll not use frequency but wavelength: Δλ = λ/Q ≈ 1.2×10^–14 m. This also seems to correspond with the width of the so-called spectral lines of light-emitting sodium atoms.
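Since I asked you to check my calculation, here is a minimal Python sketch that does just that. It only uses the formulas quoted above (Q = 3λ/4πr₀, τ = Qλ/πc and Δλ = λ/Q); the function names are my own, and the 600 nm wavelength is the rough sodium value used in the text.

```python
import math

# Constants (SI units)
C = 2.998e8      # speed of light, m/s
R0 = 2.82e-15    # classical electron radius, m (value quoted in the text)

def atomic_q(wavelength):
    """Q of a classical atomic oscillator: Q = 3*lambda / (4*pi*r0)."""
    return 3.0 * wavelength / (4.0 * math.pi * R0)

def decay_time(q, wavelength):
    """Amplitude decay time: tau = 2Q/omega = Q*lambda / (pi*c)."""
    return q * wavelength / (math.pi * C)

def line_width(q, wavelength):
    """Width of the spectral line in wavelength terms: lambda / Q."""
    return wavelength / q

wavelength = 600e-9                 # sodium light, roughly 600 nm
q = atomic_q(wavelength)
tau = decay_time(q, wavelength)
dlam = line_width(q, wavelength)

# Energy left after one decay time, from W = W0*exp(-omega*t/Q) with t = tau.
# The exponent is -2, so the amplitude (square root of the energy) is down by 1/e.
omega = 2.0 * math.pi * C / wavelength
energy_fraction = math.exp(-omega * tau / q)

print(f"Q            = {q:.2e}")
print(f"tau          = {tau:.2e} s")
print(f"delta lambda = {dlam:.2e} m")
print(f"W(tau)/W0    = {energy_fraction:.3f}")
```

The last line ties the two definitions together: the energy decays as e^{–ωt/Q}, so after τ = 2Q/ω the energy is down by 1/e² and the amplitude by 1/e.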

Isn’t this great? With a few simple formulas, we’ve illustrated the strange world of atomic oscillators and electromagnetic radiation. I’ve covered an awful lot of ground here, I feel.

There is one more “loose end” which I’ll quickly throw in here. It’s the topic of polarization – as promised – and then we’re done really. I promise. 🙂

Polarization

One of the properties of the ‘law’ of radiation as derived by Feynman is that the direction of the electric field is perpendicular to the line of sight. That’s – quite simply – because it’s only the component a_x perpendicular to the line of sight that’s important. So if we have a source – i.e. an accelerating electric charge – moving in and out straight at us, we will not get a signal.

That being said, while the field is perpendicular to the line of sight – which we identify with the z-axis – the field still can have two components and, in fact, it is likely to have two components: an x- and a y-component. We show a beam with such x- and y-component below (so that beam ‘vibrates’ not only up and down but also sideways), and we assume it hits an atom – i.e. an electron oscillator – which, in turn, emits another beam. As you can see from the illustration, the light scattered at right angles to the incident beam will only ‘vibrate’ up and down: not sideways. We call such light ‘polarized’. The physical explanation is quite obvious from the illustration below: the motion of the electron oscillator is perpendicular to the z-direction only and, therefore, any radiation measured from a direction that’s perpendicular to that z-axis must be ‘plane polarized’ indeed.

Light can be polarized in various ways. In fact, if we have a ‘regular’ wave, it will always be polarized. With ‘regular’, we mean that both the vibration in the x- and y-direction will be sinusoidal: the phase may or may not be the same, that doesn’t matter. But both vibrations need to be sinusoidal. In that case, there are two broad possibilities: either the oscillations are ‘in phase’, or they are not. When the x- and y-vibrations are in phase, then the superposition of their amplitudes will look like the examples below. You should imagine here that you are looking at the end of the electric field vector, and so the electric field oscillates on a straight line.

When they are in phase, it means that the x- and y-vibrations, which have the same frequency, reach their maxima at the same time. Now, that may not be the case, as shown in the examples below: the two vibrations may have the same frequency but a different phase. However, even these ‘out of phase’ x- and y-vibrations produce a nice elliptical motion and, hence, such beams are referred to as being ‘elliptically polarized’.
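To make the difference between the ‘in phase’ and ‘out of phase’ cases concrete, here is a small Python sketch that traces the tip of the electric field vector. It assumes, as a simplification, that the x- and y-components have the same frequency and the same unit amplitude and differ only by a phase δ; the function names are mine.

```python
import math

def field_tip(delta, steps=200):
    """Trace the tip of the E-vector over one period.

    x = cos(wt), y = cos(wt + delta): same frequency and amplitude,
    with a phase difference delta between the two components.
    """
    points = []
    for k in range(steps):
        wt = 2.0 * math.pi * k / steps
        points.append((math.cos(wt), math.cos(wt + delta)))
    return points

def ellipse_residual(points, delta):
    """Largest deviation from x^2 - 2xy*cos(delta) + y^2 = sin(delta)^2."""
    return max(
        abs(x * x - 2.0 * x * y * math.cos(delta) + y * y - math.sin(delta) ** 2)
        for x, y in points
    )

in_phase = field_tip(0.0)           # y equals x: a straight line at 45 degrees
quarter = field_tip(math.pi / 2)    # x^2 + y^2 = 1: a circle, a special ellipse
generic = field_tip(math.pi / 4)    # a tilted ellipse

print(max(abs(x - y) for x, y in in_phase))
print(ellipse_residual(quarter, math.pi / 2))
print(ellipse_residual(generic, math.pi / 4))
```

All three printed numbers are zero up to rounding: for δ = 0 the ellipse collapses into a straight line, and for any other δ the tip stays on the ellipse x² – 2xy·cos δ + y² = sin²δ.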

So what’s unpolarized light then? Well… That’s light that’s – quite simply – not polarized. So it’s irregular. Most light is unpolarized because it was emitted by electron oscillators. From what I explained above, you now know that such electron oscillators emit light during a fraction of a second only – the window is of the order of 10^–8 seconds only actually – so that’s very short indeed (a hundred millionth of a second!). It’s a sharp little pulse basically, quickly followed by another pulse as another atom takes over, and then another and so on. So the light that’s being emitted cannot have a steady phase for more than 10^–8 seconds. In that sense, such light will be ‘out of phase’.

In fact, that’s why two light sources don’t interfere. Indeed, we’ve been talking about interference effects all of the time but you may have noticed 🙂 that – in daily life – the combined intensity of light from two sources is just the sum of the intensities of the two lights: we don’t see interference. So there you are. [Now you will, of course, wonder why physics studies phenomena we don’t observe in daily life – but that’s an entirely different matter, and you would actually not be reading this post if you thought that.]

Now, with polarization, we can explain a number of things that we couldn’t explain before. One of them is birefringence: a material may have a different index of refraction depending on whether the light is linearly polarized in one direction rather than another, which explains the amusing property of Iceland spar, a crystal that doubles the image of anything seen through it. But we won’t play with that here. You can look that up yourself.

Some content on this page was disabled on June 17, 2020 as a result of a DMCA takedown notice from Michael A. Gottlieb, Rudolf Pfeiffer, and The California Institute of Technology. You can learn more about the DMCA here:

https://wordpress.com/support/copyright-and-the-dmca/

Refraction and Dispersion of Light

Pre-scriptum (dated 26 June 2020): Some of the relevant illustrations in this post were removed as a result of an attack by the dark force. Too bad, because I liked this post. In any case, despite the removal of the illustrations, I think you will still be able to reconstruct the main story line.

Original post:

In this post, we go right at the heart of classical physics. It’s going to be a very long post – and a very difficult one – but it will really give you a good ‘feel’ of what classical physics is all about. To understand classical physics – in order to compare it, later, with quantum mechanics – it’s essential, indeed, to try to follow the math in order to get a good feel for what ‘fields’ and ‘charges’ and ‘atomic oscillators’ actually represent.

As for the topic of this post itself, we’re going to look at refraction again: light gets dispersed as it travels from one medium to another, as illustrated below.

Dispersion literally means “distribution over a wide area”, and so that’s what happens as the light travels through the prism: the various frequencies (i.e. the various colors that make up natural ‘white’ light) are being separated out over slightly different angles. In physics jargon, we say that the index of refraction depends on the frequency of the wave – but so we could also say that the breaking angle depends on the color. But that sounds less scientific, of course. In any case, it’s good to get the terminology right. Generally speaking, the term refraction (as opposed to dispersion) is used to refer to the bending (or ‘breaking’) of light of a specific frequency only, i.e. monochromatic light, as shown in the photograph below. […] OK. We’re all set now.

$Refraction_photo$

It is interesting to note that the photograph above shows how the monochromatic light is actually being obtained: if you look carefully, you’ll see two secondary beams on the left-hand side (with an intensity that is much less than the central beam – barely visible in fact). That suggests that the original light source was sent through a diffraction grating designed to filter only one frequency out of the original light beam. That beam is then sent through a block of transparent material (plastic in this case) and comes out again, but displaced parallel to itself. So the block of plastic ‘offsets’ the beam. So how do we explain that in classical physics?

The index of refraction and the dispersion equation

As I mentioned in my previous post, the Greeks had already found out, experimentally, what the index of refraction was. To be more precise, they had measured the θ₁ and θ₂ – depicted below – for light going from air to water. For example, if the angle in air (θ₁) is 20°, then the angle in the water (θ₂) will be 15°. If the angle in air is 70°, then the angle in the water will be 45°.

$Refraction_at_interface$

Of course, it should be noted that a lot of the light will also be reflected from the water surface (yes, imagine the romance of the image of the moon reflected on the surface of a glacial lake while you’re feeling damn cold) – but so that’s a phenomenon which is better explained by introducing probability amplitudes, and looking at light as a bundle of photons, which we will not do here. I did that in previous posts, and so here, we will just acknowledge that there is a reflected beam but not say anything about it.

In any case, we should go step by step, and I am not doing that right now. Let’s first define the index of refraction. It is a number n which relates the angles above through the following relationship, which is referred to as Snell’s Law:

sinθ₁ = n sinθ₂

Using the numbers given above, we get: sin(20°) = n sin(15°), and sin(70°) = n sin(45°), so n must be equal to n = sin(20°)/sin(15°) = sin(70°)/sin(45°) ≈ 1.33. Just for the record, Willebrord Snell was a Dutch astronomer of the early 17th century but, according to Wikipedia, some smart Persian, Ibn Sahl, had already jotted this down in a treatise – “On Burning Mirrors and Lenses” – while he was serving the Abbasid court of Baghdad, back in 984, i.e. more than a thousand years ago! What to say? It was obviously a time when the Sunni-Shia divide did not matter, and Arabs and ‘Persians’ were leading civilization. I guess I should just salute the Islamic Golden Age here, regret the time lost during Europe’s Dark Ages and, most importantly, regret where Baghdad is right now! And, as for the ‘burning’ adjective, it just refers to the fact that large convex lenses can concentrate the sun’s rays to a very small area indeed, thereby causing ignition. [It seems that story about Archimedes burning Roman ships with a ‘death ray’ using mirrors – in all likelihood: something that did not happen – fascinated them as well.]
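The arithmetic behind that n ≈ 1.33 is easy to check. The Python sketch below simply applies Snell’s Law, sinθ₁ = n sinθ₂, to the two pairs of angles quoted above; the function names are my own, and the rounded angles mean the two estimates of n agree only approximately.

```python
import math

def index_from_angles(theta1_deg, theta2_deg):
    """Index of refraction from Snell's Law: n = sin(theta1) / sin(theta2)."""
    return math.sin(math.radians(theta1_deg)) / math.sin(math.radians(theta2_deg))

def refraction_angle(theta1_deg, n):
    """Angle in the medium (degrees) for a given angle in air and index n."""
    return math.degrees(math.asin(math.sin(math.radians(theta1_deg)) / n))

# The two (air, water) angle pairs quoted in the text
for theta1, theta2 in [(20.0, 15.0), (70.0, 45.0)]:
    n = index_from_angles(theta1, theta2)
    print(f"theta1 = {theta1:4.0f} deg, theta2 = {theta2:4.0f} deg -> n = {n:.3f}")

# Going the other way: predict the angle in water from n = 1.33
print(f"predicted angle in water for 70 deg in air: {refraction_angle(70.0, 1.33):.1f} deg")
```

Both pairs give a value within a few hundredths of 1.33, which is about as good as angles rounded to the nearest degree allow.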

But let’s get back to it. Where were we? Oh – yes – the refraction index. It’s (usually) a positive number written as n = 1 + some other number which may be positive or negative, and which depends on the properties of the material. To be more specific, it depends on the resonant frequencies of the atoms (or, to be precise, I should say: the resonant frequencies of the electrons bound by the atom, because it’s the charges that generate the radiation). Plus a whole bunch of natural constants that we have encountered already, most of which are related to electrons. Let me jot down the formula – and please don’t be scared away now (you can stop a bit later, but not now 🙂 please):

n = 1 + Nq_e²/[2ε₀m(ω₀² – ω²)]

N is just the number of charges (electrons) per unit volume of the material (e.g. the water, or that block of plastic), and q_e and m are just the charge and mass of the electron. And then you have that electric constant once again, ε₀, and… Well, that’s it! That’s not too terrible, is it? So the only variables on the right-hand side are ω₀ and ω, so that’s (i) the resonant frequency of the material (or the atoms – well, the electrons bound to the nucleus, to be precise, but then you know what I mean and so I hope you’ll allow me to use somewhat less precise language from time to time) and (ii) the frequency of the incoming light.

The equation above is referred to as the dispersion relation. It’s easy to see why: it relates the frequency of the incoming light to the index of refraction which, in turn, determines that angle θ. So the formula does indeed determine how light gets dispersed, as a function of the frequencies in it, by some medium indeed (glass, air, water,…).
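To get a feel for what the dispersion relation does, here is a rough Python sketch. It assumes the undamped form given in Feynman’s Lectures (Vol. I, chapter 31), n = 1 + Nq_e²/[2ε₀m(ω₀² – ω²)], with a single resonant frequency. The density N and the resonance at 1500 THz are illustrative values I picked for a dilute gas, not measured data, so the output should be read as an order of magnitude only.

```python
import math

# Physical constants (SI units)
Q_E = 1.602e-19     # electron charge, C
M_E = 9.109e-31     # electron mass, kg
EPS0 = 8.854e-12    # electric constant, F/m
C = 2.998e8         # speed of light, m/s

def refractive_index(omega, omega0, n_density):
    """n = 1 + N*q_e^2 / (2*eps0*m*(omega0^2 - omega^2)), damping ignored."""
    return 1.0 + n_density * Q_E**2 / (2.0 * EPS0 * M_E * (omega0**2 - omega**2))

def angular_frequency(wavelength):
    """omega = 2*pi*c / lambda."""
    return 2.0 * math.pi * C / wavelength

# Illustrative (assumed) inputs for a dilute gas
N_DENSITY = 2.5e25                    # oscillators per cubic metre (assumed)
OMEGA0 = 2.0 * math.pi * 1500e12      # one resonance at 1500 THz, in the UV (assumed)

n_red = refractive_index(angular_frequency(700e-9), OMEGA0, N_DENSITY)
n_blue = refractive_index(angular_frequency(400e-9), OMEGA0, N_DENSITY)
n_xray = refractive_index(angular_frequency(0.1e-9), OMEGA0, N_DENSITY)

print(f"n - 1 (red, 700 nm)    = {n_red - 1.0:.3e}")
print(f"n - 1 (blue, 400 nm)   = {n_blue - 1.0:.3e}")
print(f"n - 1 (x-rays, 0.1 nm) = {n_xray - 1.0:.3e}")
```

With these inputs the index is a few parts in ten thousand above one for visible light, slightly higher for blue than for red, and a tiny bit below one for x-rays, because the frequency is then far above the resonance and the denominator turns negative.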

So the objective of this post is to show how we can derive that dispersion relation using classical physics only. As usual, I’ll follow Feynman – arguably the best physics teacher ever. 🙂 Let me warn you though: it is not a simple thing to do. However, as mentioned above, it goes to the heart of the “classical world view” in physics and so I do think it’s worth the trouble. Before we get going, however, let’s look at the properties of that formula above, and relate it to some experimental facts, in order to make sure we more or less understand what it is that we are trying to understand. 🙂

First, we should note that the index of refraction has nothing to do with transparency. In fact, throughout this post, we’ll assume that we’re looking at very transparent materials only, i.e. materials that do not absorb the electromagnetic radiation that tries to go through them, or only absorb it a tiny little bit. In reality, we will have, of course, some – or, in the case of opaque (i.e. non-transparent) materials, a lot – of absorption going on, but so we will deal with that later. So, let me repeat: the index of refraction has nothing to do with transparency. A material can have a (very) high index of refraction but be fully transparent. In fact, diamond is a case in point: it has one of the highest indexes of refraction (2.42) of any material that’s naturally available, but it’s – obviously – perfectly transparent. [In case you’re interested in jewellery, the refraction index of its most popular substitute, cubic zirconia, comes very close (2.15-2.18) and, moreover, zirconia actually works better as a prism, so it disperses light better than diamond, which is why it reflects more colors. Hence, real diamond actually sparkles less than zirconia! So don’t be fooled! :-)]

Second, it’s obvious that the index of refraction depends on two variables indeed: the natural, or resonant frequency, ω₀, and the frequency ω, which is the frequency of the incoming light. For most of the ordinary gases, including those that make up air (i.e. nitrogen (78%) and oxygen (21%), plus some vapor (averaging 1%) and the so-called noble gas argon (0.93%) – noble because, just like helium and neon, it’s colorless, odorless and doesn’t react easily), the natural frequencies of the electron oscillators are close to the frequency of ultraviolet light. [The greenhouse gases are a different story – which is why we’re in trouble on this planet. Anyway…] So that’s why air absorbs most of the UV, especially the cancer-causing ultraviolet-C light (UVC), which is formally classified as a carcinogen by the World Health Organization. The wavelength of UVC light is 100 to 300 nanometer – as opposed to visible light, which has a wavelength ranging from 400 to 700 nm – and, hence, the frequency of UV light is in the 1000 to 3000 Terahertz range (1 THz = 10¹² oscillations per second) – as opposed to visible light, which has a frequency in the range of 400 to 800 THz. So, because we’re squaring those frequencies in the formula, ω² can then be disregarded in comparison with ω₀²: for example, 1500² = 2,250,000 and that’s not very different from 1500² – 500² = 2,000,000. Hence, if we leave the ω² out, we are still dividing by a very large number. That’s why n is very close to one for visible light entering the atmosphere from space (i.e. the vacuum). Its value is, in fact, around 1.000292 for incoming light with a wavelength of 589.3 nm (the odd value is the mean of so-called sodium D light, a pretty common yellow-orange light (street lights!), so that’s why it’s used as a reference value – however, don’t worry about it).

That being said, while the n of air is close to one for all visible light, the index is still slightly higher for blue light as compared to red light, and that’s why the sky is blue, except in the morning and evening, when it’s reddish. Indeed, the illustration below is a bit silly, but it gives you the idea. [I took this from http://mathdept.ucr.edu/ so I’ll refer you to that for the full narrative on that. :-)]

Where are we in this story? Oh… Yes. Two frequencies. So we should also note that – because we have two frequency variables – it also makes sense to talk about, for instance, the index of refraction of graphite (i.e. carbon in its most natural occurrence, like in coal) for x-rays. Indeed, coal is definitely not transparent to visible light (that has to do with the absorption phenomenon, which we’ll discuss later) but it is very ‘transparent’ to x-rays. Hence, we can talk about how graphite bends x-rays, for example. In fact, the frequency of x-rays is much higher than the natural frequency of the carbon atoms and, hence, in this case we can neglect the ω₀² term, so we get a denominator that is negative (because only the –ω² remains relevant), so we get a refraction index that is (a bit) smaller than 1. [Of course, our body is transparent to x-rays too – to a large extent – but in different degrees, and that’s why we can take x-ray photographs of, for example, a broken rib or leg.]

OK. […] So that’s just to note that we can have a refraction index that is smaller than one and that’s not ‘anomalous’ – even if that’s a historical term that has survived.

Finally, last but not least as they say, you may have heard that scientists and engineers have managed to construct so-called negative index metamaterials. That matter is (much) more complicated than you might think, however, and so I’ll refer you to the Web if you want to find out more about that.

Light going through a glass plate: the classical idea

OK. We’re now ready to crack the nut. We’ll closely follow my ‘Great Teacher’ Feynman (Lectures, Vol. I-31) as he derives that formula above. Let me warn you again: the narrative below is quite complicated, but really worth the trouble – I think. The key to it all is the illustration below. The idea is that we have some electromagnetic radiation emanating from a far-away source hitting a glass plate – or whatever other transparent material. [Of course, nothing is to scale here: it’s just to make sure you get the theoretical set-up.]

So, as I explained in my previous post, the source creates an oscillating electromagnetic field which will shake the electrons up and down in the glass plate, and then these shaking electrons will generate their own waves. So we look at the glass as an assembly of little “optical-frequency radio stations” indeed, that are all driven with a given phase. It creates two new waves: one reflecting back, and one modifying the original field.

Let’s be more precise. What do we have here? First, we have the field that’s generated by the source, which is denoted by E_s above. Then we have the “reflected” wave (or field – not much difference in practice), so that’s E_b. As mentioned above, this is the classical theory, not the quantum-electrodynamical one, so we won’t say anything about this reflection really: just note that the classical theory acknowledges that some of the light is effectively being reflected.

OK. Now we go to the other side of the glass. What do we expect to see there? If we would not have the glass plate in-between, we’d have the same E_s field obviously, but so we don’t: there is a glass plate. 🙂 Hence, the “transmitted” wave, or the field that’s arriving at point P let’s say, will be different than E_s. Feynman writes it as E_s + E_a.

Hmm… OK. So what can we say about that? Not easy…

The index of refraction and the apparent speed of light in a medium

Snell’s Law – or Ibn Sahl’s Law – was re-formulated, by a 17th-century French lawyer with an interest in math and physics, Pierre de Fermat, as the Principle of Least Time. It is a way of looking at things really – but it’s very confusing actually. Fermat assumed that light traveling through a medium (water or glass, for instance) would travel slower, by a certain factor n, which – indeed – turns out to be the index of refraction. But let’s not run before we can walk. The Principle is illustrated below. If light has to travel from point S (the source) to point D (the detector), then the fastest way is not the straight line from S to D, but the broken S-L-D line. Now, I won’t go into the geometry of this but, with a bit of trial and error, you can verify for yourself that it turns out that the factor n will indeed be the same factor n as the one which was ‘discovered’ by Ibn Sahl: sinθ₁ = n sinθ₂.

What we have then, is that the apparent speed of the wave in the glass plate that we’re considering here will be equal to v = c/n. The apparent speed? So does that mean it is not the real speed? Hmm… That’s actually the crux of the matter. The answer is: yes and no. What? An ambiguous answer in physics? Yes. It’s ambiguous indeed. What’s the speed of a wave? We mentioned above that n could be smaller than one. Hence, in that case, we’d have a wave traveling faster than the speed of light. How can we make sense of that?

We can make sense of that by noting that the wave crests or nodes may be traveling faster than c, but that the wave itself – as a signal – cannot travel faster than light. It’s related to what we said about the difference between the group and phase velocity of a wave. The phase velocity – i.e. the nodes, which are mathematical points only – can travel faster than light, but the signal as such, i.e. the wave envelope in the illustration below, cannot.

What is happening really is the following. A wave will hit one of these electron oscillators and start a so-called transient, i.e. a temporary response preceding the ‘steady state’ solution (which is not steady but dynamic – confusing language once again – so sorry!). So the transient settles down after a while and then we have an equilibrium (or steady state) oscillation which is likely to be out of phase with the driving field. That’s because there is damping: the electron oscillators resist before they go along with the driving force (and they continue to put up resistance, so the oscillation will die out when the driving force stops!). The illustration below shows how it works for the various cases:

In case (b), the phase of the transmitted wave will appear to be delayed, which results in the wave appearing to travel slower, because the distance between the wave crests, i.e. the wavelength λ, is being shortened. In case (c), it’s the other way around: the phase appears to be advanced, which translates into a bigger distance between wave crests, or a lengthening of the wavelength, which translates into an apparent higher speed of the transmitted wave.

So here we just have a mathematical relationship between the (apparent) speed of a wave and its wavelength. The wavelength is the (apparent) speed of the wave (that’s the speed with which the nodes of the wave travel through space, or the phase velocity) divided by the frequency: λ = v_p/f. However, from the illustration above, it is obvious that the signal, i.e. the start of the wave, is not earlier – or later – for either wave (b) or (c). In fact, the start of the wave, in time, is exactly the same for all three cases. Hence, the electromagnetic signal travels at the same speed c, always.

While this may seem obvious, it’s quite confusing, and therefore I’ll insert one more illustration below. What happens when the various wave fronts of the traveling field hit the glass plate (coming from the top-left hand corner), let’s say at time t = t₀, as shown below, is that the wave crests will have the same spacing along the surface. That’s obvious because we have a regular wave with a fixed frequency and, hence, a fixed wavelength λ₀, here. Now, these wave crests must also travel together as the wave continues its journey through the glass, which is what is shown by the red and green arrows below: they indicate where the wave crest is after one and two periods (T and 2T) respectively.

To understand what’s going on, you should note that the frequency f of the wave that is going through the glass sheet and, hence, its period T, has not changed. Indeed, the driven oscillation, which was illustrated for the two possible cases above (n > 1 and n < 1), after the transient has settled down, has the same frequency (f) as the driving source. It must. Always. That being said, the driven oscillation does have that phase delay (remember: we’re in the (b) case here, but we can make a similar analysis for the (c) case). In practice, that means that the (shortest) distance between the crests of the wave fronts at time t = t₀ and the crests at time t₀ + T will be smaller. Now, the (shortest) distance between the crests of a wave is, obviously, the wavelength, which is the phase velocity divided by the frequency: λ = v_p/f, with v_p the speed of propagation, i.e. the phase velocity, of the wave, and f = 1/T. [The frequency f is the reciprocal of the period T – always. When studying physics, I found out it’s useful to keep track of a few relationships that hold always, and so this is one of them. :-)]

Now, the frequency is the same, but so the wavelength is shortened as the wave travels through the various layers of electron oscillators, each causing a delay of phase – and, hence, a shortening of the wavelength, as shown above. But, if f is the same, and the wavelength is shorter, then v_p cannot be equal to the speed of the incoming light, so v_p ≠ c. The apparent speed of the wave traveling through the glass, and the associated shortening of the wavelength, can be calculated using Snell’s Law. Indeed, knowing that n ≈ 1.33, we can calculate the apparent speed of light through the glass as v = c/n ≈ 0.75c and, therefore, we can calculate the wavelength of the wave in the glass as λ = λ₀/n ≈ 0.75λ₀.
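The relationships in the last two paragraphs fit in a few lines of Python. This sketch assumes nothing beyond v_p = c/n and λ = λ₀/n; the function names are mine, and the 589.3 nm input is just the sodium reference wavelength mentioned earlier.

```python
C = 2.998e8    # speed of light in vacuum, m/s

def phase_velocity(n):
    """Apparent (phase) velocity in a medium with index n: v_p = c/n."""
    return C / n

def wavelength_in_medium(wavelength0, n):
    """Wavelength inside the medium: lambda = lambda0 / n."""
    return wavelength0 / n

n = 1.33                   # index used in the text
wavelength0 = 589.3e-9     # vacuum wavelength, m (sodium D light)

v_p = phase_velocity(n)
lam = wavelength_in_medium(wavelength0, n)

# The frequency f = v_p / lambda must be the same inside and outside
f_outside = C / wavelength0
f_inside = v_p / lam

print(f"v_p / c          = {v_p / C:.3f}")
print(f"lambda / lambda0 = {lam / wavelength0:.3f}")
print(f"f outside        = {f_outside:.4e} Hz")
print(f"f inside         = {f_inside:.4e} Hz")
```

The point of the last two lines is the one made above: the wavelength and the phase velocity both shrink by the factor n, so their ratio, the frequency, does not change.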

OK. I’ve been way too lengthy here. Let’s sum it all up:

The field in the glass sheet must have the shape that’s depicted above: there is no other way. So that means the direction of ‘propagation’ has been changed. As mentioned above, however, the direction of propagation is a ‘mathematical’ property of the field: it’s not the speed of the ‘signal’.
Because the direction of propagation is normal to the wave front, it implies that the bending of light rays comes about because the effective speed of the waves is different in the various materials or, to be even more precise, because the electron oscillators cause a delay of phase.
While the speed and direction of propagation of the wave, i.e. the phase velocity, accurately describe the behavior of the field, it is not the speed with which the signal is traveling (see above). That is why it can be larger or smaller than c, and so it should not raise any eyebrow. For x-rays in particular, we have a refractive index smaller than one. [It’s only slightly less than one, though, and, hence, x-ray images still have a very good resolution. So don’t worry about your doctor getting a bad image of your broken leg. 🙂 In case you want to know more about this: just Google x-ray optics, and you’ll find loads of information. :-)]

Calculating the field

Are you still there? Probably not. If you are, I am afraid you won’t be there ten or twenty minutes from now. Indeed, you ain’t done nothing yet. All of the above was just setting the stage: we’re now ready for the pièce de résistance, as they say in French. We’re back at that illustration of the glass plate and the various fields in front and behind the plate. So we have electron oscillators in the glass plate. Indeed, as Feynman notes: “As far as problems involving light are concerned, the electrons behave as though they were held by springs. So we shall suppose that the electrons have a linear restoring force which, together with their mass m, makes them behave like little oscillators, with a resonant frequency ω₀.”

So here we go:

1. From everything I wrote about oscillators in previous posts, you should remember that the equation for this motion can be written as m[d²x/dt² + ω₀²x] = F. That’s just Newton’s Law. Now, the driving force F comes from the electric field and will be equal to F = q_eE_s.

Now, we assume that we can choose the origin of time (i.e. the moment from which we start counting) such that the field E_s = E₀cos(ωt). To make calculations easier, we look at this as the real part of a complex function E_s = E₀e^{iωt}. So we get:

m[d²x/dt² + ω₀²x] = q_eE₀e^{iωt}

We’ve solved this before: its solution is x = x₀e^{iωt}. We can just substitute this in the equation above to find x₀ (just substitute and take the first- and then second-order derivative of x indeed): x₀ = q_eE₀/m(ω₀² – ω²). That, then, gives us the first piece in this lengthy derivation:

x = q_eE₀e^{iωt}/m(ω₀² – ω²)
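If you don’t want to take that substitution on faith, the Python sketch below checks it numerically. It plugs x = x₀e^{iωt}, with x₀ = q_eE₀/m(ω₀² – ω²), into the equation of motion m[d²x/dt² + ω₀²x] = q_eE₀e^{iωt}, using a finite-difference second derivative. The units are dimensionless and the numbers (m = q_e = E₀ = 1, ω₀ = 2, ω = 1) are arbitrary choices of mine, made only to keep the arithmetic well-behaved.

```python
import cmath

def amplitude(q, e0, m, omega0, omega):
    """Steady-state amplitude x0 = q*E0 / (m*(omega0^2 - omega^2))."""
    return q * e0 / (m * (omega0**2 - omega**2))

def displacement(t, q, e0, m, omega0, omega):
    """Complex displacement x(t) = x0 * exp(i*omega*t)."""
    return amplitude(q, e0, m, omega0, omega) * cmath.exp(1j * omega * t)

def residual(t, q, e0, m, omega0, omega, h=1e-4):
    """|m*(x'' + omega0^2*x) - q*E0*exp(i*omega*t)|, x'' by central differences."""
    x_prev = displacement(t - h, q, e0, m, omega0, omega)
    x_now = displacement(t, q, e0, m, omega0, omega)
    x_next = displacement(t + h, q, e0, m, omega0, omega)
    accel = (x_next - 2.0 * x_now + x_prev) / h**2
    force = q * e0 * cmath.exp(1j * omega * t)
    return abs(m * (accel + omega0**2 * x_now) - force)

# Arbitrary dimensionless test values (assumed, for illustration only)
print(amplitude(1.0, 1.0, 1.0, 2.0, 1.0))        # driven below resonance
print(amplitude(1.0, 1.0, 1.0, 1.0, 2.0))        # driven above resonance
print(residual(0.7, 1.0, 1.0, 1.0, 2.0, 1.0))    # should be close to zero
```

The sign flip between the first two numbers is the thing to notice: below the resonant frequency the electron moves in step with the driving force, above it the electron moves against it, and that is what will later make n larger or smaller than one.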

Just to make sure you understand what we’re doing: this piece gives us the motion of the electrons in the plate. That’s all.

2. Now, we need an equation for the field produced by a plane of oscillating charges, because that’s what we’ve got here: a plate or a plane of oscillating charges. That’s a complicated derivation on its own, which I won’t do here. I’ll just refer to another chapter of Feynman’s Lectures (Vol. I-30-7) and give you the solution for it (if I wouldn’t do that, this post would be even longer than it already is):

Total field at P = –(ηq_e/2ε₀c)·[velocity of the charges at time t – z/c]

This formula introduces just one new variable, η, which is the number of charges per unit area of the plate (as opposed to N, which was the number of charges per unit volume in the plate), so that’s quite straightforward. Less straightforward is the formula itself: this formula says that the magnitude of the field is proportional to the velocity of the charges at time t – z/c, with z the shortest distance from P to the plane of charges. That’s a bit odd, actually, but so that’s the way it comes out: “a rather simple formula”, as Feynman puts it.

In any case, let’s use it. Differentiating x to get the velocity of the charges, and plugging it into the formula above yields:

E_a = –(ηq_e/2ε₀c)·[iωq_eE₀/m(ω₀² – ω²)]·e^{iω(t – z/c)}

Note that this is only E_a, the additional field generated by the oscillating charges in the glass plate. To get the total electric field at P, we still have to add E_s, i.e. the field generated by the source itself. This may seem odd, because you may think that the glass plate sort of ‘shields’ the original field but, no, as Feynman puts it: “The total electric field in any physical circumstance is the sum of the fields from all the charges in the universe.”

3. As mentioned above, z is the distance from P to the plate. Let’s look at the set-up here once again. The transmitted wave, or E_{after the plate} as we shall note it, consists of two components: E_s and E_a. E_s here will be equal to (the real part of) E_s = E₀eⁱ^ω(t^-z/c). Why t – z/c instead of just t? Well… We’re looking at E_s here as measured in P, not at E_s at the glass plate itself.

Now, we know that the wave ‘travels slower’ through the glass plate (in the sense that its phase velocity is less, as should be clear from the rather lengthy explanation on phase delay above; if n were less than one, we would get a phase advance instead). So if the glass plate is of thickness Δz, and the phase velocity is v = c/n, then the time it will take to travel through the glass plate will be Δz/(c/n) instead of Δz/c (speed is distance divided by time and, hence, time = distance divided by speed). So the additional time that is needed is Δt = Δz/(c/n) – Δz/c = nΔz/c – Δz/c = (n-1)Δz/c. That, then, implies that E_{after the plate} is equal to a rather monstrous-looking expression:

E_{after plate} = E₀e^{iω[t – (n–1)Δz/c – z/c]} = e^{–iω(n–1)Δz/c}·E₀e^{iω(t – z/c)}

We get this by just substituting t – Δt for t.

So what? Well… We have a product of two complex numbers here and so we know that this involves adding angles – or subtracting angles in this case, rather, because we’ve got a minus sign in the exponent of the first factor. So, all that we are saying here is that the insertion of the glass plate retards the phase of the field by an amount equal to ω(n-1)Δz/c. What about that sum E_{after the plate} = E_s + E_a that we were supposed to get?

Well… We’ll use the formula for a first-order (linear) approximation of an exponential once again: e^x ≈ 1 + x. Yes. We can do that because Δz is assumed to be very small, infinitesimally small in fact. [If it is not, then we’ll just have to assume that the plate consists of a lot of very thin plates.] So we can write that e^{–iω(n–1)Δz/c} ≈ 1 – iω(n-1)Δz/c, and then we, finally, get that sum we wanted:

E_{after plate} = E₀e^{iω(t – z/c)} − iω(n-1)Δz·E₀e^{iω(t – z/c)}/c

The first term is the original E_s field, and the second term is the E_a field. Geometrically, they can be represented as follows:

Why is E_a perpendicular to E_s? Well… Look at the –i = 1/i factor. Multiplication with –i amounts to a clockwise rotation by 90°, and then just note that the magnitude of the vector must be small because of the ω(n-1)Δz/c factor.
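To get a feel for the numbers, here is a minimal Python sketch of the argument above. The wavelength, the index n = 1.5 and the 1 nm layer thickness are illustrative assumptions of mine, and `plate_factor` is just a hypothetical helper name. It compares the exact factor e^{–iω(n–1)Δz/c} with its linear approximation, and shows that the small added term is purely imaginary and negative, i.e. turned 90° clockwise relative to E_s.

```python
import cmath
import math

C = 299_792_458.0  # speed of light in vacuum, m/s

def plate_factor(omega, n, dz):
    """Return (exact, first_order) phase factors for a thin plate.

    exact       = exp(-i*omega*(n - 1)*dz/c)
    first_order = 1 - i*omega*(n - 1)*dz/c
    """
    phase = omega * (n - 1.0) * dz / C
    return cmath.exp(-1j * phase), 1.0 - 1j * phase

# Illustrative (assumed) values: 500 nm light, n = 1.5, a 1 nm thin layer.
wavelength = 500e-9
omega = 2.0 * math.pi * C / wavelength
exact, approx = plate_factor(omega, n=1.5, dz=1e-9)

e_a = approx - 1.0            # the small added term, i.e. E_a divided by E_s
print(abs(exact - approx))    # error made by the linear approximation
print(e_a.real, e_a.imag)     # real part zero, imaginary part negative
```

The error of the approximation is of the order of the phase shift squared, which is why the argument only works for a very thin plate.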

4. By now, you’ve either stopped reading (most probably) or, else, you wonder what I am getting at. Well… We have two formulas for E_a now:

and E_a = – iω(n-1)Δz·E₀e^{iω(t – z/c)}/c

Equating both yields:

But η, the number of charges per unit area, must be equal to NΔz, with N the number of charges per unit volume. Substituting and then cancelling the Δz finally gives us the formula we wanted, and that’s the classical dispersion relation whose properties we explored above:

Absorption and the absorption index

The model we used to explain the index of refraction had electron oscillators at its center. In the analysis we did, we did not introduce any damping factor. That’s obviously not correct: it means that a glass plate, once it had been illuminated, would continue to emit radiation, because the electrons would oscillate forever. When introducing damping, the denominator in our dispersion relation becomes m(ω₀² – ω² + iγω), instead of m(ω₀² – ω²). We derived this in our posts on oscillators. What it means is that the oscillator continues to oscillate with the same frequency as the driving force (i.e. not its natural frequency) – so that doesn’t change – but that there is an envelope curve, ensuring the oscillation dies out when the driving force is no longer being applied. The γ factor is the damping factor and, hence, determines how fast the damping happens.

We can see what it means by writing the complex index of refraction as n = n’ – in’’, with n’ and n’’ real numbers, describing the real and imaginary part of n respectively. Putting that complex n in the equation for the electric field behind the plate yields:

E_{after plate} = e^{–ωn’’Δz/c}·e^{–iω(n’–1)Δz/c}·E₀e^{iω(t – z/c)}

This is the same formula that we had derived already, but now we have an extra exponential factor: e^{–ωn’’Δz/c}. It’s an exponential factor with a real exponent, because the two factors of i multiplied to give –1. The e^-x function has a familiar shape (see below): e^-x is 1 for x = 0, and between 0 and 1 for any positive x. That value will depend on the thickness of the glass sheet. Hence, it is obvious that the glass sheet weakens the wave as it travels through it. Hence, the wave must also come out with less energy (the energy being proportional to the square of the amplitude). That’s no surprise: the damping we put in for the electron oscillators is a friction force and, hence, must cause a loss of energy.

Note that it is the n’’ term – i.e. the imaginary part of the refractive index n – that determines the degree of absorption (or attenuation, if you want). Hence, n’’ is usually referred to as the “absorption index”.
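As a rough illustration of that attenuation factor, the sketch below evaluates e^{–ωn’’Δz/c} for a few plate thicknesses. The wavelength and the value n’’ = 10⁻⁷ are assumptions picked only to give readable numbers, and `transmitted_fraction` is a hypothetical helper name.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s

def transmitted_fraction(omega, n_imag, dz):
    """Amplitude and intensity fractions left after a plate of thickness dz.

    n_imag is n'' in n = n' - i*n''. The amplitude factor is
    exp(-omega*n''*dz/c); the intensity goes as the amplitude squared.
    """
    amplitude = math.exp(-omega * n_imag * dz / C)
    return amplitude, amplitude ** 2

# Assumed, illustrative numbers: 500 nm light, n'' = 1e-7, thickness in metres.
omega = 2.0 * math.pi * C / 500e-9
for dz in (0.0, 0.01, 0.1, 1.0):
    amp, inten = transmitted_fraction(omega, 1e-7, dz)
    print(f"dz = {dz:5.2f} m  amplitude = {amp:.4f}  intensity = {inten:.4f}")
```

Because the intensity is the square of the amplitude, it falls off twice as fast in the exponent as the field itself.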

The complete dispersion relation

We need to add one more thing in order to get a complete dispersion relation. It’s the last thing: then we have a formula which can really be used to describe real-life phenomena. The one thing we need to add is that atoms have several resonant frequencies – even an atom with only one electron, like hydrogen! In addition, we’ll usually want to take into account the fact that a ‘material’ actually consists of various chemical substances, so that’s another reason to consider more than one resonant frequency. The formula is easily derived from our first formula (see the previous post), when we assumed there was only one resonant frequency. Indeed, when we have N_k electrons per unit of volume, whose natural frequency is ω_k and whose damping factor is γ_k, then we can just add the contributions of all oscillators and write:

The index described by this formula yields the following curve:

So we have a curve with a positive slope, and a value n > 1, for most frequencies, except for a very small range of ω’s for which the slope is negative, and for which the index of refraction has a value n < 1. As Feynman notes, these ω’s – and the negative slope – are sometimes referred to as ‘anomalous’ dispersion but, in fact, there’s nothing ‘abnormal’ about it.

The interesting thing is the iγ_kω term in the denominator, i.e. the imaginary component of the index, and how that compares with the (real) “resonance term” ω_k²– ω². If the resonance term becomes very small compared to iγ_kω, then the index will become almost completely imaginary, which means that the absorption effect becomes dominant. We can see that effect in the spectrum of light that we receive from the sun: there are ‘dark lines’, i.e. frequencies that have been strongly absorbed at the resonant frequencies of the atoms in the Sun and its ‘atmosphere’, and that allows us to actually tell what the Sun’s ‘atmosphere’ (or that of other stars) actually consists of.
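The behaviour described above can be explored numerically. The sketch below assumes the usual dilute-medium form n = 1 + (q_e²/2ε₀m)·Σ N_k/(ω_k² – ω² + iγ_kω), which matches the denominator discussed in the text; the single ultraviolet resonance and its N, ω_k and γ_k values are made-up numbers for illustration only, and `refractive_index` is a hypothetical name.

```python
Q_E = 1.602176634e-19     # elementary charge, C
M_E = 9.1093837015e-31    # electron mass, kg
EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m

def refractive_index(omega, oscillators):
    """Complex index n = n' - i*n'' for a list of (N_k, omega_k, gamma_k).

    Assumed form (dilute medium):
        n = 1 + q^2/(2*eps0*m) * sum_k N_k / (omega_k^2 - omega^2 + i*gamma_k*omega)
    """
    total = 0j
    for n_k, omega_k, gamma_k in oscillators:
        total += n_k / (omega_k ** 2 - omega ** 2 + 1j * gamma_k * omega)
    return 1.0 + Q_E ** 2 / (2.0 * EPS0 * M_E) * total

# Hypothetical single resonance; the numbers are for illustration only.
osc = [(1e25, 2.0e16, 1.0e14)]
for omega in (1.0e16, 1.9e16, 2.0e16, 2.1e16):
    n = refractive_index(omega, osc)
    print(f"omega = {omega:.2e}  n' = {n.real:.6f}  n'' = {-n.imag:.6f}")
```

Below the resonance the real part is larger than one, right at the resonance the deviation from one is purely imaginary (absorption dominates), and just above it the real part drops below one.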

So… There we are. I am aware of the fact that this has been the longest post of all I’ve written. I apologize. But so it’s quite complete now. The only piece that’s missing is something on energy and, perhaps, some more detail on these electron oscillators. But I don’t think that’s so essential. It’s time to move on to another topic, I think.

Some content on this page was disabled on June 17, 2020 as a result of a DMCA takedown notice from Michael A. Gottlieb, Rudolf Pfeiffer, and The California Institute of Technology. You can learn more about the DMCA here:

https://wordpress.com/support/copyright-and-the-dmca/

Euler’s spiral

Pre-scriptum (dated 26 June 2020): Most of the relevant illustrations in this post were removed as a result of an attack by the dark force. Too bad, because I liked this post. In any case, despite the removal of the illustrations, you should be able to reconstruct the main story line.

Original post:

When talking about diffraction, one of the more amusing curves is the curve showing the intensity of light near the edge of a shadow. It is shown below.

Light becomes more intense as we move away from the edge, then it overshoots (so it is brighter than further away), then the intensity wobbles and oscillates, to finally ‘settle’ at the intensity of the light elsewhere.

How do we get a curve like that? We get it through another amusing curve: the Cornu spiral (which was re-named as the Euler spiral for some reason I don’t understand), which we’ve encountered also when adding probability amplitudes. Let me first depict the ‘real’ situation below: we have an opaque object AB, so no light goes through AB itself. However, the light that goes past it, casts a shadow on a screen, which is denoted as QPR here. And so the curve above shows the intensity of the light near the edge of that shadow.

The first weird thing to note is what I said about diffraction of light through a slit (or a hole – in somewhat less respectful language) in my previous post: the diffraction patterns can be explained if we assume that there are sources distributed, with uniform density, across the open holes. This is a deep mystery, which I’ll attempt to explain later. As for now, I can only state what Feynman has to say about it: “Of course, actually there are no sources at the holes. In fact, that is the only place that there are certainly no sources. Nevertheless, we get the correct diffraction pattern by considering the holes to be the only places where there are sources.”

So we do the same here. We assume that we have a series of closely spaced ‘antennas’, or sources, starting from B, up to D, E, C and all the way up to infinity, and so we need to add the contributions – or the waves – from these sources to calculate the intensity at all of the points on the screen. Let’s start with the (random) point P. P defines the inflection point D: we’ll say the phase there is zero (because we can, of course, choose our point in time so as to make it zero). So we’ll associate the contribution from D with a tiny vector (an infinitesimal vector) with angle zero. That is shown below: it’s the ‘flat’ (horizontal) vector pointing straight east at the very center of this so-called Cornu spiral.

Now, in the neighborhood of D, i.e. just below or above point D, the phase difference will be very small, because the distance from those points near D to P will not differ much from the distance between D and P (i.e. the distance DP). However, as h increases, the phase difference will become larger and larger. It will not increase linearly with h: because of the geometry involved, the path difference – and, hence, the phase difference (remember – from the previous post – that the phase difference was the product of the wave number and the difference in distance) – will increase proportionally with the square of h. In fact, using similar triangles once again, we can easily show that this path difference EF can be approximated by EF ≈ h²/2s, with s the distance DP. However, don’t lose sleep if you wouldn’t manage to figure that out. 🙂

The point to note is that, when you look at that spiral above, the angle of each vector that we’re adding, increases more and more, so that’s why we get a spiral, and not a polygon in a circle, such as the one we encountered in our previous post: the phase differences there were linearly proportional and, hence, each vector added a constant angle to the previous one. Likewise, if we go down from D, to the edge B, the angles will decrease. Of course, if we’re adding contributions to get the amplitude or intensity for point P, we will not get any contributions from points below B. The last (or, I should say, the first) contribution that we get is denoted by the vector B_P on that spiral curve, so if we want to get the total contribution, then we have to start adding vectors from there. [Don’t worry: you’ll understand why the other vectors, ‘down south’, are there in a few minutes.]

So we start from B_P and go all the way… Well… You see that, once, we’re ‘up north’, in the center of the upper-most spiral, we’re not adding much anymore, because the additional vectors are just sharply changing direction and going round and round and round. In short, most of the contribution to the amplitude of the resultant vector B_P∞ is given by points near D. Now, we have chosen point P randomly, and you can easily see from that Cornu spiral that the amplitude, or the intensity rather (which is the square of the amplitude) of that vector B_P∞, increases initially, to reach some maximum, depending upon where P is located above B, but then it falls and oscillates indeed, producing the curve with which we started this post.

OK. […] So what else do we have here? Well… That Cornu spiral also shows how we should add arrows to get the intensity at point Q. We’d be adding arrows in the upper-most spiral only and, hence, we would not get much of a total contribution as a result. That’s what is marked by vector B_Q. On the other hand, if we’d be adding contributions to calculate the intensity at a point much higher than P, i.e. R, then we’d be using pretty much all of the arrows, down from the spiral ‘south’ all the way up to the spiral ‘north’. So that’s B_R obviously and, as you can see, most of the contribution comes, once again, from points near D, so that’s the points near the edge. [So now you know why we have an infinite number of arrows in both directions: we need to be able to calculate the intensity from any point on the screen really, below or above P.]
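The adding of arrows along the spiral can be mimicked with a short Python sketch. It rests on assumptions of my own: the position on the screen is expressed through a dimensionless variable v, scaled so that the phase of each contribution is πv²/2, and the arrows are added with a plain Simpson integration of the Fresnel integrals. The function names are hypothetical.

```python
import math

def fresnel(v, steps=2000):
    """Fresnel integrals C(v), S(v): integrals of cos and sin(pi*t^2/2) from 0 to v.

    Plain Simpson's rule; `steps` must be even.
    """
    if v == 0.0:
        return 0.0, 0.0
    sign = 1.0 if v > 0 else -1.0     # both integrals are odd functions of v
    v = abs(v)
    h = v / steps
    c_sum = s_sum = 0.0
    for i in range(steps + 1):
        t = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        c_sum += weight * math.cos(math.pi * t * t / 2.0)
        s_sum += weight * math.sin(math.pi * t * t / 2.0)
    return sign * c_sum * h / 3.0, sign * s_sum * h / 3.0

def edge_intensity(v):
    """Intensity relative to the unobstructed beam at scaled position v.

    v > 0 is the lit side, v < 0 the geometric shadow. The resultant
    arrow runs from the spiral point for the edge, (-C(v), -S(v)),
    to the upper end of the spiral at (0.5, 0.5).
    """
    c, s = fresnel(v)
    return 0.5 * ((c + 0.5) ** 2 + (s + 0.5) ** 2)

for v in (-3.0, -1.0, 0.0, 1.2, 2.0, 5.0):
    print(f"v = {v:5.1f}  I/I0 = {edge_intensity(v):.3f}")
```

Right at the geometric edge (v = 0) only half of the spiral contributes, so the amplitude is one half and the intensity one quarter of the unobstructed value; a little further into the light the intensity overshoots before it settles down.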

OK. What else? Well… Nothing. This is it really − for the moment that is. Just note that we’re not adding probability amplitudes here (unlike what we did a couple of months ago). We’re adding vectors representing something real here: electric field vectors. [As for how ‘real’ they are: I’ll entertain you about that later. :-)]

This was rather short, wasn’t it? I hope you liked it because… Well… What will follow is actually much more boring, because it involves a lot more formulas. However, these formulas will help us get where we want to get, and that is to understand – somehow, if only from a classical perspective – why that empty space acts like an array of electromagnetic radiation sources.

Indeed, when everything is said and done, that’s the deep mystery of light really. Really really deep.

Some content on this page was disabled on June 17, 2020 as a result of a DMCA takedown notice from Michael A. Gottlieb, Rudolf Pfeiffer, and The California Institute of Technology. You can learn more about the DMCA here:

https://wordpress.com/support/copyright-and-the-dmca/

Diffraction gratings

Pre-scriptum (dated 26 June 2020): Some of the relevant illustrations in this post were removed as a result of an attack by the dark force. Too bad, because I liked this post. In any case, despite the removal of the illustrations, I think you will still be able to reconstruct the main story line.

Original post:

Diffraction gratings are fascinating. The iridescent reflections from the grooves of a compact disc (CD), or from oil films, soap bubbles: it is all the same principle (or closely related – to be precise). In my April, 2014 posts, I introduced Feynman’s ‘arrows’ to explain it. Those posts talked about probability amplitudes, light as a bundle of photons, quantum electrodynamics. They were not wrong. In fact, the quantum-electrodynamical explanation is actually the only one that’s 100% correct (as far as we ‘know’, of course). But it is also more complicated than the classical explanation, which just explains light as waves.

To understand the classical explanation, one first needs to understand how electromagnetic waves interfere. That’s easy, you’ll say. It’s all about adding waves, isn’t it? And we have done that before, haven’t we? Yes. We’ve done it for sinusoidal waves. We also noted that, from a math point of view, the easiest way to go about it was to use vectors or complex numbers, and equate the real parts of the complex numbers with the actual physical quantities, i.e. the electric field in this case.

You’re right. Let’s continue to work with sinusoidal waves, but instead of having just two waves, we’ll consider a whole array of sources, because that’s what we’ll need to analyze when analyzing a diffraction grating.

First the simple case: two sources

Let’s first re-analyze the simple situation: two sources – or two dipole radiators as I called them in my previous post. The illustration below gives a top view of two such oscillators. They are separated, in the north-south direction, by a distance d.

Is that realistic? It is for radio waves: the wavelength of a 1 megahertz radio wave is 300 m (remember: λ = c/f). So, yes, we can separate two sources by a distance in the same order of magnitude as the wavelength of the radiation, but, as Feynman writes: “We cannot make little optical-frequency radio stations and hook them up with infinitesimal wires and drive them all with a given phase.”

For light, it will work differently – and we’ll describe how, but not now. As for now, we should continue with our radio waves.

The illustration above assumes that the radiation from the two sources is sinusoidal and has the same (maximum) amplitude A, but that the two sources might be out of phase: we’ll denote the difference by α. Hence, we can represent the radiation emitted by the two sources by the real part of the complex numbers Ae^{iωt} and Ae^{i(ωt + α)} respectively. Now, we can move our detector around to measure the intensity of the radiation from these two antennas. If we place our detector at some point P, sufficiently far away from the sources, then the angle θ will result in another phase difference, due to the difference in distance from point P to the two oscillators. From simple geometry, we know that this difference will be equal to d·sinθ. The phase difference due to the distance difference will then be equal to the product of the wave number k (i.e. the rate of change of the phase (expressed in radians) with distance, i.e. per meter) and that distance d·sinθ. So the phase difference at arrival (i.e. at point P) would be

Φ₂ – Φ₁ = α + k· d·sinθ = α + (2π/λ)·d·sinθ

That’s pretty obvious, but let’s play a bit with this, in order to make sure we understand what’s going on. The illustration below gives two examples: α = 0 and α = π.

How do we get these numbers 0, 2 and 4, which indicate the intensity, i.e. the amount of energy that the field carries past per second, which is proportional to the square of the field, averaged in time? [If it were (visible) light, instead of radio waves, the intensity would be the brightness of the light.]

Well… In the first case, we have α = 0 and d = λ/2 and, hence, at an angle of 30 degrees, we have d·sin(30°) = (λ/2)(1/2) = λ/4. Therefore, Φ₂ – Φ₁ = α + (2π/λ)·d·sinθ = 0 + (2π/λ)·(λ/4) = π/2. So what? Well… Let’s add the waves. We will have some combined wave with amplitude A_R and phase Φ_R:

Now, to calculate the length of this ‘vector’, i.e. the amplitude A_R, we take the product of this complex number and its complex conjugate, and that will give us the length squared, and then we multiply it all out and so on and so on. To make a long story short, we’ll find that

A_R² = A₁² + A₂² + 2A₁A₂cos(Φ₂ – Φ₁)

The last term in this sum is the interference effect, and so that’s equal to zero in the case we’ve been studying above (α = 0, d = λ/2 and θ = 30°), so we get twice the intensity of one oscillator only. The other cases can be worked out in the same way.
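The other cases are quickly checked with a few lines of Python. This is only a sketch: `two_source_intensity` is a hypothetical helper, the amplitudes are set to one (so a single source gives intensity 1), and the spacing is passed as the ratio d/λ.

```python
import math

def two_source_intensity(alpha, d_over_lambda, theta_deg, a1=1.0, a2=1.0):
    """Intensity at angle theta, in units where one source alone gives a1**2.

    Uses A_R^2 = A1^2 + A2^2 + 2*A1*A2*cos(phi), with
    phi = alpha + 2*pi*(d/lambda)*sin(theta).
    """
    phi = alpha + 2.0 * math.pi * d_over_lambda * math.sin(math.radians(theta_deg))
    return a1 ** 2 + a2 ** 2 + 2.0 * a1 * a2 * math.cos(phi)

# d = lambda/2, for the two intrinsic phase differences discussed above.
for alpha, label in ((0.0, "alpha = 0"), (math.pi, "alpha = pi")):
    values = [two_source_intensity(alpha, 0.5, theta) for theta in (0, 30, 90)]
    print(label, [round(v, 6) for v in values])
```

For d = λ/2 and α = 0, the intensity goes from 4 at θ = 0° over 2 at 30° to 0 at 90°; with α = π the sequence is reversed.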

Now, you should not think that the pattern is always symmetric, or simple, as the two illustrations below make clear.

With more oscillators, the patterns become even more interesting. The illustration below shows part of the intensity pattern of a six-dipole antenna array:

Let’s look at that now indeed: arrays with n oscillators.

Arrays with n oscillators

If we have six oscillators, like in the illustration above, we have to add something like this:

R = A[cos(ωt) + cos(ωt + Φ) + cos(ωt + 2Φ) + … + cos(ωt + 5Φ)]

From what we wrote above, it is obvious that the phase difference Φ can have two causes: the oscillators may be driven differently in phase, or we may be looking at them at an angle so that there is a difference in time delay. Hence, we have the same formula as the one above:

Φ = α + (2π/λ)·d·sinθ

Now, we have an interesting geometrical approach to finding the net amplitude A_R. We can, once again, consider the various waves as vectors and add them, as shown below.

The length of all vectors is the same (A), and then we have the phase difference, i.e. the different angles: zero for A₁, Φ for A₂, 2Φ for A₃, etcetera. So as we’re adding these vectors, we’re going around and forming an equiangular polygon with n sides, with the vertices (corner points) lying on a circle with radius r. It requires just a bit of trigonometry to establish that the following equality must hold: A = 2rsin(Φ/2). So that fixes r. We also have that the large angle OQT equals nΦ and, hence, A_R = 2rsin(nΦ/2). We can now combine the results to find the following amplitude and intensity formula:

A_R = A·sin(nΦ/2)/sin(Φ/2) and, hence, I = I₀·sin²(nΦ/2)/sin²(Φ/2)

This formula is obvious for n = 1 and for n = 2: it gives us the results which were shown above already. But here we want to know how this thing behaves for large n. Both the numerator, i.e. sin²(nΦ/2), and the denominator, sin²(Φ/2), are – obviously – smaller than or equal to 1, and near Φ = 0 the numerator is the larger of the two. It can be demonstrated that this function of the angle Φ reaches its maximum value for Φ = 0. Indeed, taking the limit gives us I = I₀n². [We can intuitively see this because, if we express the angle in radians, we can substitute Φ/2 and nΦ/2 for sin(Φ/2) and sin(nΦ/2) when Φ is small, and then we can eliminate the (Φ/2)² factor to get n².]

It’s a bit more difficult to understand what happens next. If Φ becomes a bit larger, the ratio of the two sines begins to fall off (so it becomes smaller than n²). Note that the numerator, i.e. sin²(nΦ/2), will be equal to one if nΦ/2 = π/2, i.e. if Φ = π/n, and the ratio sin²(nΦ/2)/sin²(Φ/2) then becomes sin²(π/2)/sin²(π/2n) = 1/sin²(π/2n). Again, if we assume that n is (very) large, we can approximate and write that this ratio is more or less equal to 1/(π²/4n²) = 4n²/π². That means that the intensity there will be 4/ π² times the intensity of the beam at the maximum, i.e. 40.53% of it. That’s the point at nΦ/2π = 0.5 on the graph below.

The graph above has a re-scaled vertical as well as a re-scaled horizontal axis. Indeed, instead of I, the vertical axis shows I/n²I₀, so the maximum value is 1. And the horizontal axis does not show Φ but nΦ/2π, so if Φ = π/n, then nΦ/2π = 0.5 indeed. [Don’t worry about the dotted curve: that’s the solid-line curve multiplied by 10: it’s there to make sure you see what’s going on, as this ratio of those sines becomes very small very rapidly indeed.]

So, once we’re past that 40.53% point, we get to our first minimum, which is reached at nΦ/2π = 1 or Φ = 2π/n. The numerator sin²(nΦ/2) equals sin²(π) = 0 there indeed, so the whole ratio becomes zero. Then it goes up again, to our second maximum, which we get when our numerator comes close to one again, i.e. when sin²(nΦ/2) ≈ 1. That happens when nΦ/2 = 3π/2, or Φ = 3π/n. Again, when n is (very) large, Φ will be very small, and so we can replace the denominator sin²(Φ/2) by Φ²/4. We then get a ratio equal to 1/(9π²/4n²), or an intensity equal to 4n²I₀/9π², i.e. only 4.5% of the intensity at the (first) maximum. So that’s tiny. [Well… All is relative, of course. :-)] We can go on and on like that but that’s not the point here: the point is that we have a very sharp central maximum with very weak subsidiary maxima on the sides.
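The percentages quoted above are easy to check. The sketch below evaluates the ratio sin²(nΦ/2)/sin²(Φ/2) at Φ = π/n, 2π/n and 3π/n; the choice n = 1000 is an assumption standing in for ‘large n’, and `array_intensity` is a hypothetical helper name.

```python
import math

def array_intensity(n, phi):
    """I/I0 = sin^2(n*phi/2) / sin^2(phi/2) for n equal, equally spaced sources."""
    denom = math.sin(phi / 2.0) ** 2
    if denom < 1e-30:              # phi is a multiple of 2*pi: use the limit n^2
        return float(n * n)
    return math.sin(n * phi / 2.0) ** 2 / denom

n = 1000                           # assumed 'large' n for the check
peak = array_intensity(n, 0.0)     # equals n^2
for label, phi in (("phi = pi/n", math.pi / n),
                   ("phi = 2pi/n", 2.0 * math.pi / n),
                   ("phi = 3pi/n", 3.0 * math.pi / n)):
    print(f"{label:12s} I/I_max = {array_intensity(n, phi) / peak:.4f}")
```

The three ratios come out close to 4/π², zero and 4/9π² respectively, so the first side lobe is indeed more than twenty times weaker than the main beam.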

But what about that big lobe at 30 degrees on that graph with the six-dipole antenna? Relax. We’re not done yet with this ‘quick’ analysis. Let’s look at the general case from yet another angle, so to say. 🙂

The general case

To focus our minds, we’ve depicted that array with n oscillators below. Once again, we note that the phase difference between two sources, one to the next, will depend on (1) the intrinsic phase difference between them, which we denote by α, and (2) the time delay because we’re observing the system in a given direction θ from the normal, which effect we calculated as equal to (2π/λ)·d·sinθ. So the whole effect is Φ = α + (2π/λ)·d·sinθ = α + k·d·sinθ, with k the wave number.

To make things simple, let’s first assume that α = 0. We’re then in the case that we described above: we’ll have a sharp maximum at Φ = 0, so that means θ = 0. It’s easy to see why: all oscillators are in phase and so we have maximum positive (or constructive) interference.

Let’s now examine the first minimum. When looking back at that geometrical interpretation, with the polygon, all the arrows come back to the starting point: we’ve completed a full circle. Indeed, n times Φ gives nΦ = n·2π/n = 2π. So what’s going on here? Well… If we put that value in our formula Φ = α + (2π/λ)·d·sinθ, we get 2π/n = 0 + (2π/λ)·d·sinθ or, getting rid of the 2π factor, n·d·sinθ = λ.

Now, n·d is the total length of the array, i.e. L, and, from the illustration above, we see that n·d·sinθ = L·sinθ = Δ. So we have that n·d·sinθ = λ = Δ. Hence, Δ is equal to one wavelength. That means that the total phase difference between the first and the last oscillator is equal to 2π, and the contributions of all the oscillators in-between are uniformly distributed in phase between 0° and 360°. The net result is a vector A_R with amplitude A_R = 0 and, hence, the intensity is zero as well.

OK, you’ll say, you’re just repeating yourself here. What about the other lobe or lobes? Well… Let’s go back to that maximum. We had it at Φ = 0, but we will also have it at Φ = 2π, and at Φ = 4π, and at Φ = 6π etcetera, etcetera. We’ll have such a sharp maximum – the maximum, in fact – at any Φ = m⋅2π, where m is any integer. Now, plugging that into the Φ = α + (2π/λ)·d·sinθ formula (again, assuming that α = 0), we get m⋅2π = (2π/λ)·d·sinθ or d·sinθ = mλ.

While that looks very similar to our n·d·sinθ = λ = Δ condition for the (first) minimum, we’re not looking at that Δ but at δ, the path difference between two successive sources, and so we have δ = Δ/n = d·sinθ = mλ. What’s being said here is that each successive source is out of phase by a multiple of 360° and, because being out of phase by 360° obviously means that you’re in phase once again, all sources are, once again, contributing in phase and produce a maximum that is just as good as the one we had for m = 0. Now, these maxima will also have a (first) minimum described by that other formula above, and so that’s how we get that pattern of lobes with weak ‘side lobes’.
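Both statements, the zero at Φ = 2π/n and the full-strength maximum at Φ = 2π, can be checked by literally adding the arrows. A minimal sketch, with unit amplitudes and a hypothetical `phasor_sum` helper:

```python
import cmath
import math

def phasor_sum(n, phi):
    """Add n unit arrows, each turned by phi relative to the previous one."""
    return sum(cmath.exp(1j * k * phi) for k in range(n))

n = 6   # six oscillators, as in the antenna array example
for label, phi in (("phi = 0", 0.0),
                   ("phi = 2*pi/n", 2.0 * math.pi / n),
                   ("phi = 2*pi", 2.0 * math.pi)):
    print(f"{label:13s} |A_R|/A = {abs(phasor_sum(n, phi)):.6f}")
```

At Φ = 2π/n the arrows close the polygon and the resultant vanishes (up to rounding), while at Φ = 0 and Φ = 2π all six arrows line up and the amplitude is six times that of a single source.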

Conditions

Now, the conditions presented above for maxima and minima obviously all depend on the distance d, i.e. the spacing of the array, and the wavelength λ. That brings us to an interesting point: if d is smaller than λ (so if the spacing is smaller than one wavelength), we have (d/λ)·sinθ = m < 1, so we only have one solution for m: m = 0. So we only have one beam in that case, the so-called zero-order beam centered at θ = 0. [Note that we also have a beam in the opposite direction.]

The point to note is that we can only have subsidiary great maxima if the spacing d of the array is greater than the wavelength λ. If we have such subsidiary great maxima, we’ll call them first-order, second-order etcetera beams, according to the value m.
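The condition d·sinθ = mλ is simple enough to turn into a small sketch that lists which orders exist for a given spacing. It assumes α = 0, measures d in units of the wavelength, and uses hypothetical helper names.

```python
import math

def allowed_orders(d, wavelength):
    """Integers m for which d*sin(theta) = m*wavelength has a solution."""
    m_max = math.floor(d / wavelength)
    return list(range(-m_max, m_max + 1))

def beam_angle_deg(m, d, wavelength):
    """Angle theta (degrees from the normal) of the order-m beam, alpha = 0."""
    ratio = max(-1.0, min(1.0, m * wavelength / d))   # guard against round-off
    return math.degrees(math.asin(ratio))

# Spacings in units of the wavelength (illustrative choices).
for d in (0.5, 1.5, 2.5):
    orders = allowed_orders(d, 1.0)
    angles = [round(beam_angle_deg(m, d, 1.0), 1) for m in orders]
    print(f"d = {d} lambda: orders {orders}, angles {angles}")
```

With d = 0.5λ only the zero-order beam survives; first-order beams appear once d exceeds λ, and second-order beams once d exceeds 2λ.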

Diffraction gratings

We are now, finally, ready to discuss diffraction gratings. A diffraction grating, in its simplest form, is a plane glass sheet with scratches on it: several hundred grooves, or several thousand even, to the millimeter. That is because the spacing has to be of the same order of magnitude as the wavelength of light, so that’s 400 to 700 nanometer (nm) indeed – with the 400-500 nm range corresponding to violet-blue light, and the (longer) 700+ nm range corresponding to red light. Remember, a nanometer is a billionth of a meter (1×10⁻⁹ m), so even one thousandth of a millimeter is 1000 nanometer, i.e. longer than the wavelength of red light. Of course, from what we wrote above, it is obvious that the spacing d must be wider than the wavelength of interest to cause first- and higher-order beams and, therefore, diffraction but, still, the order of magnitude must be the same to produce anything of interest. Isn’t it amazing that scientists were able to produce such diffraction experiments towards the end of the 18th century already? One of the earliest apparatuses, made in 1785, by the first director of the United States Mint, used hair strung between two finely threaded screws. In any case, let’s go back to the physics of it.

In my previous post, I already noted Feynman’s observation that “we cannot literally make little optical-frequency radio stations and hook them up with infinitesimal wires and drive them all with a given phase.” What happens is something similar to the following set-up, and I’ll quote Feynman again (Vol. I, p. 30-3), just because it’s easier to quote than to paraphrase: “Suppose that we had a lot of parallel wires, equally spaced at a spacing d, and a radio-frequency source very far away, practically at infinity, which is generating an electric field which arrives at each one of the wires at the same phase. Then the external electric field will drive the electrons up and down in each wire. That is, the field which is coming from the original source will shake the electrons up and down, and in moving, these represent new generators. This phenomenon is called scattering: a light wave from some source can induce a motion of the electrons in a piece of material, and these motions generate their own waves.”

When Feynman says “light” here, he means electromagnetic radiation in general. But so what’s happening with visible light? Well… All of the glass in that piece that makes up our diffraction grating scatters light, but so the notches in it scatter differently than the rest of the glass. The light going through the ‘rest of the glass’ goes straight through (a phenomenon which should be explained in itself, but so we don’t do that here), but the notches act as sources and produce secondary or even tertiary beams, as illustrated by the picture below, which shows a flash of light seen through such grating, showing three diffracted orders: the order m = 0 corresponds to a direct transmission of light through the grating, while the first-order beams (m = +1 and m = -1), show colors with increasing wavelengths (from violet-blue to red), being diffracted at increasing angles.
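To see why the colors fan out, one can compute the first-order angle from d·sinθ = λ for a few wavelengths. The sketch below assumes light falling straight onto the grating (normal incidence) and a hypothetical grating with 600 grooves per millimeter; both are assumptions of mine, not values from the picture.

```python
import math

def first_order_angle_deg(wavelength_nm, grooves_per_mm):
    """First-order (m = 1) diffraction angle from d*sin(theta) = lambda."""
    d_nm = 1e6 / grooves_per_mm          # groove spacing in nanometres
    ratio = wavelength_nm / d_nm
    if ratio > 1.0:
        raise ValueError("no first-order beam: wavelength exceeds the spacing")
    return math.degrees(math.asin(ratio))

# Assumed grating: 600 grooves per millimetre (hypothetical value).
for colour, wavelength_nm in (("violet", 400), ("green", 550), ("red", 700)):
    angle = first_order_angle_deg(wavelength_nm, 600)
    print(f"{colour:6s} {wavelength_nm} nm -> {angle:.1f} degrees")
```

The longer the wavelength, the larger the angle, which is the violet-to-red ordering of the first-order beams described above.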

The ‘mechanics’ are very complicated, and the correct explanation in physics involves a good understanding of quantum electrodynamics, which we touched upon in our April, 2014 posts. I won’t do that here, because here we are introducing the so-called classical theory only. This classical theory does away with all of the complexity of a quantum-electrodynamical explanation and replaces it by what is now known as the Huygens-Fresnel Principle, which was first formulated in 1678 (!), and which basically states that “every point which a luminous disturbance reaches becomes a source of a spherical wave, and the sum of these secondary waves determines the form of the wave at any subsequent time.”

$500px-Refraction_-_Huygens-Fresnel_principle$

This comes from Wikipedia, as do the illustrations below. It does not only ‘explain’ diffraction gratings, but it also ‘explains’ what happens when light goes through a slit, cf. the second (animated) illustration.

$500px-Refraction_on_an_aperture_-_Huygens-Fresnel_principle$

Now that, light being diffracted as it is going through a slit, is obviously much more mysterious than a diffraction grating – and, you’ll admit, a diffraction grating is already mysterious enough, because it’s rather strange that only certain points in the grating (i.e. the notches) would act as sources, isn’t it? Now, if that’s difficult to understand, it’s even more difficult to understand why an empty space, i.e. a slit, would act as a diffraction grating! However, because this post has become way too long already, we’ll leave this discussion for later.
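The ‘sum of secondary waves’ idea can be made concrete with a few lines of Python. What follows is only a minimal sketch under my own assumptions: the notches are treated as identical in-phase point sources, the numbers (20 notches, 2 micron spacing, 550 nm light) are made up for illustration, and the grating equation d·sin θ = m·λ is used to find the first-order angle.

```python
import cmath
import math

def grating_intensity(theta, n_sources, spacing, wavelength):
    """Relative intensity at angle theta (radians) from n_sources in-phase
    point sources spaced `spacing` apart, found by summing unit phasors."""
    # The path difference between neighbouring sources is spacing*sin(theta),
    # so the phase step between them is 2*pi*spacing*sin(theta)/wavelength.
    phase_step = 2 * math.pi * spacing * math.sin(theta) / wavelength
    total = sum(cmath.exp(1j * k * phase_step) for k in range(n_sources))
    return abs(total) ** 2 / n_sources ** 2  # normalised so the maximum is 1

# Hypothetical numbers, chosen for illustration only.
N, d, lam = 20, 2.0e-6, 550e-9
first_order = math.asin(lam / d)  # grating equation with m = 1

for label, theta in [("m = 0", 0.0), ("m = 1", first_order),
                     ("halfway", first_order / 2)]:
    print(label, round(grating_intensity(theta, N, d, lam), 4))
```

The secondary waves add up fully in the m = 0 and m = 1 directions and very nearly cancel in between, which is the grating behaviour described above.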

Some content on this page was disabled on June 17, 2020 as a result of a DMCA takedown notice from Michael A. Gottlieb, Rudolf Pfeiffer, and The California Institute of Technology. You can learn more about the DMCA here:

https://wordpress.com/support/copyright-and-the-dmca/

Light and radiation

Pre-scriptum (dated 26 June 2020): Most of the relevant illustrations in this post were removed as a result of an attack by the dark force. In any case, you will probably prefer to read how my ideas on the theory of light and matter have evolved. If anything, posts like this document the historical path to them.

Original post:

Introduction: Scale Matters

One of the points which Richard Feynman, as a great physics teacher, does admirably well is to point out why scale matters. In fact, ‘old’ physics are not incorrect per se. It’s just that ‘new’ physics analyzes stuff at a much smaller scale.

For example, Snell’s Law, or Fermat’s Principle of Least Time, which were ‘discovered’ some 400 years ago – and they are actually older, because they formalize something that the Greeks had already found out: refraction of light, as it travels from one medium (air, for example) into another (water, for example) – are still fine when studying focusing lenses and mirrors, i.e. geometrical optics. The dimensions of the analysis, or the equipment involved (i.e. the lenses or the mirrors), are huge as compared to the wavelength of the light and, hence, we can effectively look at light as a beam that travels from one point to another in a straight line, that bounces off a surface, or as a beam that gets refracted when it passes from one medium to another.

However, when we let the light pass through very narrow slits, it starts behaving like a wave. Geometrical optics does not help us, then, to understand its behavior: we will, effectively, analyze light as a wave-like thing at that scale, and analyze wave-like phenomena, such as interference, the Doppler effect and what have you. That level of analysis is referred to as the classical theory of electromagnetic radiation, and it’s what we’ll be introducing in this post.

The analysis of light as photons, i.e. as a bunch of ‘particles’ described by some kind of ‘wave function’ (which does not describe any real wave, but only some ‘probability amplitude’), is the third and final level of analysis, referred to as quantum mechanics or, to be more precise, as quantum electrodynamics (QED). [Note the terminology: quantum mechanics describes the behavior of matter particles, such as protons and electrons, while quantum electrodynamics (QED) describes the nature of photons, a force-carrying particle, and their interaction with matter particles.]

But so we’ll focus on the second level of analysis in this post.

Different mathematical approaches

One other thing which Feynman points out in his Lectures is that, even within a well-agreed level of analysis, there are different mathematical approaches to a problem. In fact, while, at any level of analysis, there’s (probably) only one fully mathematically correct analysis, approximate approaches may actually be easier to work with, not only because they actually allow us to solve a practical problem, but also because they help us to understand what’s going on.

Feynman’s treatment of electromagnetic radiation (Volume I, Chapters 28 to 34) is a case in point. While he notes that Maxwell’s field equations are actually the ones to be used, he writes them in a mathematical form that we can understand more easily, and then simplifies that mathematical form even further, in order to derive all that a sophomore student is supposed to know about electromagnetic radiation (EMR), which, of course, not only includes what we call light but also radio waves, radar waves, infrared waves and, on the other side of the spectrum, x-rays and gamma rays.

But let’s get down to business now.

The oscillating charge

Radiation is caused by some far-away electric charge (q) that’s moving in various directions in a non-uniform way, i.e. it is accelerating or decelerating, and perhaps reversing direction in the process. From our point of view (P), we draw a unit vector e_r’ in the direction of the charge. [If you want a drawing, there’s one further down.]

We write r’ (r prime), not r, because it is the retarded distance: when we look at the charge, we see where it was r’/c seconds ago: r’/c is indeed the time that’s needed for some influence to travel from the charge to the here and now, i.e. to P. So now we can write Coulomb’s Law:

E₁ = –q·e_r’/(4πε₀r’²)

This formula can quickly be explained as follows:

The minus sign makes the direction of the force come out alright: like charges do not attract but repel, unlike gravitation. [Indeed, for gravitation, there’s only one ‘charge’, a mass, and masses always attract. Hence, for gravitation, the force law is that like charges attract, but so that’s not the case here.]
E and e_r’ and, hence, the electric force, are all directed along the line of sight.
The Coulomb force is proportional to the amount of charge, and the factor of proportionality is 1/(4πε₀r’²).
Finally, and most importantly in this context (study of EMR), the influence quickly diminishes with the distance: it varies inversely as the square of the distance (i.e. it varies as the inverse square).

Coulomb’s Law is not all that comes out of Maxwell’s field equations. Maxwell’s equations also cover electrodynamics, fortunately, because we are, indeed, talking moving charges here: electrostatics is only part of the picture and, in fact, the least important one in this case. 🙂 That’s why I wrote E₁, with 1 as subscript, above – not E.

So we have a second term, and I’ll actually be introducing a third term in a minute or so. But let’s first look at the second term. I am not sure how Feynman derives it from Maxwell’s equations – I am sure I’ll see the light 🙂 when reading Volume II – but, from Maxwell’s equations, he does, somehow, derive the following, secondary, effect:

E₂ = –(q/4πε₀)·(r’/c)·d/dt[e_r’/r’²]

This is a term I struggled with in a first read, and I still do. As mentioned above, I need to read Feynman’s Volume II, I guess. But, while I still don’t understand the why, I now understand what this expression catches. The term between brackets is the Coulomb effect, which we mentioned above already, and the time derivative is the rate of change. We multiply that with the time delay (i.e. r’/c). So what’s going on? As Feynman writes it: “Nature seems to be attempting to guess what the field at the present time is going to be, by taking the rate of change and multiplying by the time that is delayed.”

OK. As said, I don’t really understand where this formula comes from but it makes sense, somehow. As for now, we just need to answer another question in order to understand what’s going on: in what direction is the Coulomb field changing?

It could be either: if the charge is moving along the direction of sight e_r’ won’t change but r’ will. However, if r’ does not change, then it’s e_r’ that changes direction, and that change will be perpendicular to the line of sight, or transverse (as opposed to radial), as Feynman puts it. Or, of course, it could be a combination of both. [Don’t worry too much if you’re not getting this: we will need this again in just a minute or so, and then I will also give you a drawing so you’ll see what I mean.]

The point is, these first two terms are actually not important because electromagnetic radiation is given by the third effect, which is written as:

E₃ = –(q/4πε₀c²)·d²e_r’/dt²

Wow! This looks even more complicated, doesn’t it? Let’s analyze it. The first thing to note is that there is no r’ or r’² in this equation. However, that’s an optical illusion of sorts, because r’ does matter when looking at that second-order derivative. How? Well… Let’s go step by step and first look at that second-order derivative. It’s the acceleration (or deceleration) of e_r’. Indeed, visualize e_r’ wiggling about, trying to follow the charge by pointing at where the charge was r’/c seconds ago. Let me help you here by, finally, inserting that drawing I promised you.

This acceleration will have a transverse as well as a radial component: we can imagine the end of e_r’ (i.e. the point of the arrow) being on the surface of a unit sphere indeed. So as it wiggles about, the tip of the arrow moves back a bit from the tangential line. That’s the radial component of the acceleration. It’s easy to see that it’s quite small as compared to the transverse component, which is the component along the line that’s tangent to the surface (i.e. perpendicular to e_r’).

Now, we need to watch out: we are not talking displacement or velocity here but acceleration. Hence, even if the displacement of the charge is very small, and even if velocities would not be phenomenal either (i.e. non-relativistic), the acceleration involved can take on any value really. Hence, even with small displacements, we can have large accelerations, so the radial component is small relative to the transverse component only, not in an absolute sense.

That being said, it’s easy to see that both the transverse as well as the radial component depend on the distance r’ but in a different way. I won’t bother you with the geometrical proof (it’s not that obvious). Just accept that the radial component varies, more or less as the inverse square of the distance. Hence, we will simplify and say that we’re considering large distances r’ only – i.e. large in comparison to the length of the unit vector, which just means large in comparison to one (1) – and then it’s only the transverse component of a that matters, which we’ll denote by a_x.

However, if we drop that radial component, then we should drop E₁ as well, because the Coulomb effect will be very small as compared to the radiation effect (i.e. E₃). And, then, if we drop E₁, we can drop the ‘correction’ E₂ as well, of course. Indeed, that’s what Feynman does. He ends up with this third term only, which he terms the law of radiation:

E(t) = –q·a_x(t – r/c)/(4πε₀c²r)

So there we are. That’s all I wanted to introduce here. But let’s analyze it a bit more. Just to make sure we’re all getting it here.

The dipole radiator

All that simplification business above is tricky, you’ll say. First, why do we write t – r/c for the retarded time (t’)? It should be t – r’/c, no? You’re right. There’s another simplification here: we fix the delay time, assuming that the charge only moves very small distances at an effectively constant distance r. Think of some far-away antenna indeed.

Hmm… But then we have that 1/c² factor, so that should reduce the effect to zilch, isn’t it? And then… Hey! Wait a minute! Where does that r suddenly come from? Well, we’ve replaced d²e_r’/dt² by the lateral acceleration of the charge itself (i.e. its component perpendicular to the line of sight, denoted by a_x) divided by r. That’s just similar triangles.

Phew! That’s a lot of simplifications and/or approximations indeed. How do we know this law really works? And, if it does, for what distance? When is that 1/r part (i.e. E₃) so large as compared to the other two terms (E₁ and E₂) that the latter two don’t matter anymore? Well… That seems to depend on the wavelength of the radiation, but we haven’t introduced that concept yet. Let me conclude this first introduction by just noting this ‘law’ can easily be confirmed by experiment.
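To get a feel for the ‘for what distance?’ question, here is a rough numerical sketch. It rests on assumptions of mine, not on anything Feynman says: I take the law of radiation in the form E = q·a_x/(4πε₀c²r), compare peak magnitudes only, ignore E₂ altogether, and use a hypothetical single charge oscillating over 1 mm at 100 MHz.

```python
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity in F/m
C = 299_792_458.0         # speed of light in m/s
Q = 1.602176634e-19       # elementary charge in C (magnitudes only)

def coulomb_magnitude(q, r):
    """Size of the 1/r^2 (Coulomb) term at distance r."""
    return q / (4 * math.pi * EPS0 * r ** 2)

def radiation_magnitude(q, a_x, r):
    """Size of the 1/r (radiation) term for transverse acceleration a_x."""
    return q * a_x / (4 * math.pi * EPS0 * C ** 2 * r)

def ratio(a_x, r):
    """Radiation term divided by Coulomb term; algebraically a_x*r/c^2."""
    return radiation_magnitude(Q, a_x, r) / coulomb_magnitude(Q, r)

# Hypothetical oscillation: amplitude 1 mm, frequency 100 MHz.
amplitude, frequency = 1.0e-3, 1.0e8
a_peak = (2 * math.pi * frequency) ** 2 * amplitude  # peak acceleration
crossover = C ** 2 / a_peak                          # distance where ratio = 1

for r in (1.0, crossover, 1.0e4):
    print(f"r = {r:10.1f} m   radiation/Coulomb = {ratio(a_peak, r):.4f}")
```

The ratio grows linearly with distance, so beyond the crossover distance the 1/r term wins, and it keeps winning by more and more.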

A so-called dipole oscillator or radiator can be constructed, as shown below: a generator drives electrons up and down in two wires (A and B). Why do we put the generator in the middle? That’s because we want a net effect: the radiation effect of the electrons in the wires connecting the generator with A and B will be neutral, because the electrons move right next to each other in opposite direction. With the generator in the middle, A and B form one antenna, which we’ll denote by G (for generator).

Now, another antenna can act as a receiver, and we can amplify the signal to hear it. That’s the D (for detector) shown below. Now, one of the consequences of the above ‘law’ for electromagnetic radiation is, obviously, that the strength of the received signal should become weaker as we turn the detector. The strongest signal should be when D is parallel to G. At point 2, there is a projection effect and, hence, the strength of the field should be less. Indeed, remember that the strength of the field is proportional to the acceleration of the charge projected perpendicular to the line of sight. Hence, at point 3, it should be zero, because the projection is zero.

Now, that’s what an experiment like this would indeed confirm. [I am tempted now to explain how a radio receiver works, but I will resist the temptation.]

I just need to make a last point here in order to make sure that we understand the formula above and – more importantly – that we can use it in subsequent chapters without having to wonder where it comes from. The formula above implies that the direction of the field is at right angles to the line of sight. Now, if a charge is just accelerating up and down, in a motion of very small amplitude, i.e. like the motion in that antenna, then the magnitude (or strength let’s say) of the field will be given by the following formula:

E(t) = –q·a(t – r/c)·sin θ/(4πε₀c²r)

θ, in this formula, is the angle between the axis of motion and the line of sight, as illustrated below:
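The projection effect in the detector experiment is just that sin θ factor. A tiny sketch, assuming the field strength is proportional to sin θ with everything else held constant; the 45 degree value for the intermediate position is my own pick, because the original drawing is gone.

```python
import math

def relative_field_strength(theta_deg):
    """Field strength relative to its maximum, for angle theta (in degrees)
    between the axis of motion and the line of sight."""
    return abs(math.sin(math.radians(theta_deg)))

# 90 degrees: line of sight perpendicular to the antenna (strongest signal).
# 0 degrees: looking along the antenna axis (no signal).
for theta in (90, 45, 0):
    print(theta, round(relative_field_strength(theta), 3))
```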

So… That’s all we need to know for now. We’re done. As for now that is. This was quite technical, I guess, but I am afraid the next post will be even more technical. Sorry for that. I guess this is just a piece we need to get through.

Post scriptum:

You’ll remember that, with moving and accelerating charges, we should also have a magnetic field, usually denoted by B. That’s correct. If we have a changing electric field, then we will also have a magnetic field. There’s a formula for B:

B = –e_r’×E/c = –|e_r’||E|c⁻¹·sin(e_r’, E)·n = –(E/c)·n

This is a vector cross-product. The angle between the unit vector e_r’ and E is π/2, so the sine is one. The vector n is the vector normal to both vectors as defined by the right-hand screw rule. [As for the minus sign, note that –a×b = b×a, so we could have reversed the vectors: the minus sign just reverses the direction of the normal vector.] In short, the magnetic field vector B is perpendicular to E, but its magnitude is tiny: E/c. That’s why Feynman neglects it, but we will come back to that in later posts.
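If you want to check the cross-product for yourself, a few lines of Python will do. This is a sketch with axes and a field value that I chose arbitrarily: e_r’ points along x (from P toward the source) and E is a 100 V/m field along z.

```python
C = 299_792_458.0  # speed of light in m/s

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

e_r = (1.0, 0.0, 0.0)    # unit vector from P toward the source (assumed axes)
E = (0.0, 0.0, 100.0)    # transverse electric field in V/m (made-up value)

B = tuple(-component / C for component in cross(e_r, E))  # B = -e_r x E / c

print("B =", B)
print("|B| / |E| =", dot(B, B) ** 0.5 / dot(E, E) ** 0.5)
print("E x B =", cross(E, B))  # x-component is negative: from source toward P
```

B comes out perpendicular to both e_r’ and E, its magnitude is E/c, and E×B points from the source to the observer, i.e. along the direction of propagation.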

Some content on this page was disabled on June 17, 2020 as a result of a DMCA takedown notice from Michael A. Gottlieb, Rudolf Pfeiffer, and The California Institute of Technology. You can learn more about the DMCA here:

https://wordpress.com/support/copyright-and-the-dmca/

Logarithms: a bit of history (and the main rules)

Pre-scriptum (dated 26 June 2020): This post did not suffer much – if at all – from the attack by the dark force—which is good because I still like it. Enjoy !

Original post:

This post will probably be of little or no interest to you. I wrote it to get somewhat more acquainted with logarithms myself. Indeed, I struggle with them. I think they come across as difficult because we don’t learn about the logarithmic function when we learn about the exponential function: we only learn logarithms later – much later. And we don’t use them a lot: exponential functions pop up everywhere, but logarithms not so much. Therefore, we are not as familiar with them as we should be.

The second issue is notation: x = log_a(y) looks more terrifying than y = a^x because… Well… Too many letters. It would be more logical to apply the same economy of symbols. We could just write x = _ay instead of log_a(y), for example, using a subscript in front of the variable – as opposed to a superscript behind the variable, as we do for the exponential function. Or, else, we could be equally verbose for the exponential function and write y = exp_a(x) instead of y = a^x. In fact, you’ll find such more explicit expressions in spreadsheets and other software, because these don’t take subscripts or superscripts. And then, of course, we also have the use of the Euler number e in e^x and ln(x). While it’s just a real number, e is not as familiar to us as π, and that’s again because we learned trigonometry before we learned advanced calculus.

Historically, however, the exponential and logarithmic functions were ‘invented’, so to say, around the same time and by the same people: they are associated with John Napier, a Scot (1550–1617), and Henry Briggs, an Englishman (1561–1630). Briggs is best known for the so-called common (i.e. base 10) logarithm tables, which he published in 1624 as the Arithmetica Logarithmica. It is logical that the mathematical formalism needed to deal with both was invented around the same time, because they are each other’s inverse: if y = a^x, then x = log_a(y).

These Briggs tables were used, in their original format more or less, until computers took over. Indeed, it’s funny to read what Feynman writes about these tables in 1965: “We are all familiar with the way to multiply numbers if we have a table of logarithms.” (Feynman’s Lectures, p. 22-4). Well… Not any more. And those slide rules, or slipsticks as they were called in the US, have disappeared as well, although you can still find circular slide rules on some expensive watches, like the one below.

It’s a watch for aviators, and it allows them to rapidly multiply numbers indeed: the time multiplied by the speed will give a pilot the distance covered. Of course, there’s all kinds of intricacies here because we’ll measure time in minutes (or even seconds), and speed in knots or miles per hour, and so that explains all the other fancy markings on it. 🙂 In case you have one, now you know what you’re paying for! A real aviator watch! 🙂

How does it work? Well… These slide rules can be used for a number of things but their most basic function is to multiply numbers indeed, and that function is based on the rule log_b(ac) = log_b(a) + log_b(c). In fact, this works for any base so we can just write log(ac) = log(a) + log(c). So the numbers on the slide rule below are the a, c and ac. Note that the slides start with 1 because we’re working with positive numbers only and log(1) = 0, so that corresponds with the zero point indeed. The example below is simple (2 times 3 is 6, obviously): it would have been better to demonstrate 1.55×2.35 or something. But you see how it goes: we add log(2) and log(3) to get log(6) = log(2×3). For 1.55×2.35, the slider would show a position between 3.6 and 3.7. The calculator on my $30 Nokia phone gives me 3.6425. So, yes, it’s not far off. However, it’s hard to imagine that engineers and scientists actually used these slide rules over the past 300 years or so, if not longer.
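In code, the slide rule trick looks like this. It’s a minimal sketch: the function name is mine, and base 10 is an arbitrary choice because the rule works for any base.

```python
import math

def slide_rule_multiply(a, c):
    """Multiply two positive numbers the slide-rule way: add their logs,
    then read off the number whose log equals that sum."""
    total_length = math.log10(a) + math.log10(c)  # lengths laid end to end
    return 10 ** total_length

print(slide_rule_multiply(2, 3))
print(slide_rule_multiply(1.55, 2.35))
```

A real slide rule only gives you two or three significant digits, of course: the precision here comes from floating-point arithmetic, not from the method.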

Of course, Briggs’ tables are more accurate. It’s quite amazing really: he calculated the logarithms of 30,000 (natural) numbers to fourteen decimal places. It’s quite instructive to check how he did that: all he did, basically, was to calculate successive square roots of 10.

Huh?

Yes. The secret behind it is the basic rule of exponentiation: exponentiation is repeated multiplication, and so we can write: a^(m+n) = a^m·aⁿ and, more importantly, a^(m–n) = a^m·a^(–n) = a^m/aⁿ. Because Briggs used the common base 10, we should write 10^(m–n) = 10^m/10ⁿ. Now Briggs had a table with the successive square roots of 10, like the one below (it’s only six significant digits behind the decimal point, not fourteen, but I just want to demonstrate the principle here), and so that’s basically what he used to calculate the logarithm (to base 10) of 30,000 numbers! Talking patience ! Can you imagine him doing that, day after day, week after week, month after month, year after year? Waw !

So how did he do it? Well… Let’s do it for x = log₁₀(2) = log(2). So we need to find some x for which 10^x = 2. From the table above, it’s obvious that log(2) cannot be 1/2 (= 0.5), because 10^(1/2) = 3.162278, so that’s too big (bigger than 2). Hence, x = log(2) must be smaller than 0.5 = 1/2. On the other hand, we can see that x will be bigger than 1/4 = 0.25 because 10^(1/4) = 1.778279, and so that’s less than 2.

In short, x = log(2) will be between 0.25 (= 1/4) and 0.5. What Briggs did then, is to take that 10^(1/4) factor out using the 10^(m–n) = 10^m/10ⁿ formula indeed:

10^(x–0.25) = 10^x/10^0.25 = 2/1.778279 = 1.124683

If you’re panicking already, relax. Just sit back. What we’re doing here, in this first step, is to write 2 as

2 = 10^x = 10^[0.25 + (x–0.25)] = 10^(1/4)·10^(x–0.25) = (1.778279)(1.124683)

[If you’re in doubt, just check using your calculator.] We now need log(10^(x–0.25)) = log(1.124683). Now, 1.124683 is between 1.154782 and 1.074608 in the table. So we’ll use the lowest value (10^(1/32)) to take another factor out. Hence, we do another division: 1.124683/1.074608 = 1.046598. So now we have 2 = 10^x = 10^[1/4 + 1/32 + (x – 1/4 – 1/32)] = (1.778279)(1.074608)(1.046598).

We now need log(10^(x–1/4–1/32)) = log(1.046598). We check the table once again, and see that 1.046598 is bigger than the value for 10^(1/64), so now we can take that 10^(1/64) value out by doing another division: 10^(x–1/4–1/32)/10^(1/64) = 1.046598/1.036633 = 1.009613. Waw, this is getting small! However, we can still take an additional factor out because it’s larger than the 1.009035 value in the table. So we can do another division: 1.009613/1.009035 = 1.000573. So now we have 2 = 10^x = 10^[1/4 + 1/32 + 1/64 + 1/256 + (x – 1/4 – 1/32 – 1/64 – 1/256)] = 10^(1/4)·10^(1/32)·10^(1/64)·10^(1/256)·10^(x–1/4–1/32–1/64–1/256) = (1.778279)(1.074608)(1.036633)(1.009035)(1.000573).

Now, the last factor is outside of the range of our table: it’s too small to find a fraction. However, we had a linear approximation based on the gradient for very small fractions r: 10^r ≈ 1 + 2.302585·r. So, in this case, we have 1.000573 = 1 + 2.302585·r and, hence, we can calculate r as 0.000248. [I have shown where this approximation comes from: just check my previous posts if you want to know. It’s not difficult.] So, now, we can finally write the result of our iterations:

2 = 10^x ≈ 10^(1/4 + 1/32 + 1/64 + 1/256 + 0.000248)

So log(2) is approximated by 0.25 + 0.03125 + 0.015625 + 0.00390625 + 0.000248 = 0.30103. Now, you can check this easily: it’s essentially correct, to an accuracy of six digits that is!
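The whole procedure fits in a short Python function. This is a sketch of the method as described above, not a reconstruction of Briggs’ actual worksheets: the function name, the eight-entry table (exponents 1/2 down to 1/256) and the use of math.sqrt instead of ‘cut and try’ are all my own shortcuts.

```python
import math

def briggs_log10(y, n_roots=8):
    """Approximate log10(y) for 1 <= y < 10 by dividing out successive
    square roots of 10. Returns (logarithm, list of exponents used)."""
    # Build the table: exponents 1/2, 1/4, 1/8, ... and the matching 10**exponent.
    table = []
    root, exponent = 10.0, 1.0
    for _ in range(n_roots):
        root, exponent = math.sqrt(root), exponent / 2
        table.append((exponent, root))

    log_value, remainder, used = 0.0, y, []
    for exponent, root in table:
        if remainder >= root:        # this factor can be taken out
            remainder /= root
            log_value += exponent
            used.append(exponent)

    # What is left is very close to 1: use 10**r ≈ 1 + 2.302585*r.
    log_value += (remainder - 1.0) / 2.302585
    return log_value, used

log2, exponents = briggs_log10(2.0)
print(exponents)                  # the fractions that were taken out
print(round(log2, 6), round(math.log10(2), 6))
```

For y = 2 the function takes out exactly the factors we found by hand: 1/4, 1/32, 1/64 and 1/256.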

Hmm… But how did Briggs calculate these square roots of 10? Well… That was done ‘by cut and try’ apparently! Pf-ff ! Talk of patience indeed ! I think it’s amazing ! And I am sure he must have kept this table with the square roots of 10 in a very safe place ! 🙂

So, why did I show this? Well… I don’t know. Just to pay homage to those 17th century mathematicians, I guess. 🙂 But there’s another point as well. While the argument above basically demonstrated the a^(m+n) = a^m·aⁿ formula or, to be more precise, the a^(m–n) = a^m/aⁿ formula, it also shows the so-called product rule for logarithms:

log_b(ac) = log_b(a) + log_b(c)

Indeed, we wrote 2 as a product of individual factors 10^r and then we could see that the exponents r in all of these individual factors add up to x = log(2). However, the more formal proof is interesting, and much shorter too: 🙂

Let m = log_a(x) and n = log_a(y)
Write in exponent form: x = a^m and y = aⁿ
Multiply x and y: xy = a^m·aⁿ = a^(m+n)
Now take log_a of both sides: log_a(xy) = log_a(a^(m+n)) = (m+n)·log_a(a) = m+n = log_a(x) + log_a(y)

You’ll notice that we used another rule in this proof, and that’s the so-called power rule for logarithms:

log_a(xⁿ) = n·log_a(x)

This power rule is proved as follows:

Let m = log_a(x)
Write in exponent form: x = a^m
Raise both sides to the power of n: xⁿ = (a^m)ⁿ
Convert back to a logarithmic equation: log_a(xⁿ) = mn
Substitute for m = log_a(x): log_a(xⁿ) = n·log_a(x)

Are there any other rules?

Yes. Of course, we have the quotient rule:

log_a(x/y) = log_a(x) – log_a(y)

The proof of this follows the proof of the product rule, and so I’ll let you work on that.
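If you don’t feel like doing the proofs, you can at least check the three rules numerically. A quick sketch, with a base and numbers that I picked at random:

```python
import math

def check_log_rules(a, x, y, n):
    """Numerically check the product, power and quotient rules for base a."""
    log = lambda value: math.log(value, a)   # log_a(value)
    return {
        "product": math.isclose(log(x * y), log(x) + log(y)),
        "power": math.isclose(log(x ** n), n * log(x)),
        "quotient": math.isclose(log(x / y), log(x) - log(y)),
    }

print(check_log_rules(a=3.0, x=5.0, y=7.0, n=4))
```

That’s not a proof, obviously: it only tells you the rules hold, up to rounding, for these particular numbers.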

Finally, we have the ‘change-of-base’ rule, which shows us how we can easily switch from one base to another indeed:

log_a(b) = log_c(b)/log_c(a)

The proof is as follows:

Let x = log_a b
Write in exponent form: a^x= b
Take log_c of both sides and evaluate:

log_c(a^x) = log_c(b)
x·log_c(a) = log_c(b)
x = log_c(b)/log_c(a)
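The change-of-base rule is what your calculator uses when it only has a log and an ln button. A small sketch (the function name and the example numbers are mine):

```python
import math

def log_base(b, a, c=10.0):
    """log_a(b), computed by going through some third base c."""
    return math.log(b, c) / math.log(a, c)

x_via_10 = log_base(8.0, 2.0)              # through base 10
x_via_e = log_base(8.0, 2.0, c=math.e)     # through base e
print(x_via_10, x_via_e, 2.0 ** x_via_10)
```

Whatever third base c you go through, you get the same x, and a^x gives you b back.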

[I copied these rules and proofs from onlinemathlearning.com, so let me acknowledge that here. :-)]

Is that it? Well… Yes. Or no. Let me add a few more lines on these logarithmic scales that you often encounter in various graphs. It is the same kind of scale as the logarithmic scales used on that slide rule we showed above, but it covers several orders of magnitude, all equally spaced: 1, 10, 100, 1000, etcetera, instead of 0, 1, 2, 3, etcetera. So each unit increase on the scale corresponds to a unit increase of the exponent for a given base (base 10 in this case): 10¹, 10², 10³, etcetera. The illustration below (which I took from Wikipedia) compares logarithmic scales to linear ones, for one or both axes.

So, on a logarithmic scale, the distance from 1 to 100 is the same as the distance from 10 to 1000, or the distance from 0.1 to 10, or the distance between any point that’s 100 (= 10²) times another point. This is easily explained by the product rule, or the quotient rule rather:

log(10) – log(0.1) = log(10¹/10⁻¹) = log(10²) = 2

= log(1000) – log(10) = log(10³/10¹) = log(10²) = 2

= log(100) – log(1) = log(10²/10⁰) = log(10²) = 2

And, of course, we could say the same for the distance between 1 and 1000, and 0.1 and 100. The distance on the scale is 3 units here, while the one point is 1000 = 10³ times the other point.
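Again, this is easy to check for yourself. A minimal sketch, assuming a base-10 scale on which one unit corresponds to one factor of 10:

```python
import math

def scale_distance(p, q):
    """Distance between p and q on a base-10 logarithmic scale, in units."""
    return math.log10(q) - math.log10(p)

for p, q in [(0.1, 10), (10, 1000), (1, 100), (1, 1000), (0.1, 100)]:
    print(p, q, round(scale_distance(p, q), 6))
```

The distance only depends on the ratio q/p, which is the quotient rule at work.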

Why would we use logarithmic scales? Well… Large quantities are often better expressed like that. For example, the Richter scale used to measure the magnitude of an earthquake is just a base–10 logarithmic scale. With magnitude, we mean the amplitude of the seismic waves here. So an earthquake that registers 5.0 units on the Richter scale has a ‘shaking amplitude’ that is 10 times greater than that of an earthquake that registers 4.0. Both are fairly light earthquakes, however: magnitude 7, 8 or 9 are the big killers. Note that, theoretically, we could have earthquakes of a magnitude higher than 10 on the Richter scale: scientists think that the asteroid that created the Chicxulub crater created a cataclysm that would have measured 13 on Richter’s scale, and they associate it with the extinction of the dinosaurs.

The decibel, measuring the level of sound, is another logarithmic unit, so the power associated with 40 decibel is not two times but one hundred times that of 20 decibel!
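To turn those logarithmic units back into plain ratios, you just exponentiate. The sketch below assumes the usual definitions: a decibel level is 10 times the base-10 logarithm of a power ratio, and one Richter unit is one factor of 10 in amplitude. The function names are mine.

```python
import math

def power_ratio(db_high, db_low):
    """Ratio of sound powers for two levels given in decibel."""
    return 10 ** ((db_high - db_low) / 10)

def amplitude_ratio(magnitude_high, magnitude_low):
    """Ratio of shaking amplitudes for two Richter magnitudes."""
    return 10 ** (magnitude_high - magnitude_low)

print(power_ratio(40, 20))        # 40 decibel versus 20 decibel
print(amplitude_ratio(5.0, 4.0))  # magnitude 5.0 versus 4.0
```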

Now that we’re talking sound, it seems that logarithmic scales are more ‘natural’ when it comes to human perception in general, but I’ll let you have fun googling some more stuff on that! 🙂

Real exponentials and double roots: a post for my kids

Pre-scriptum (dated 26 June 2020): This post did not suffer much – if at all – from the attack by the dark force—which is good because I still like it. Enjoy !

Original post:

There is one loose end related to exponentials that I want to tie up here. It’s the issue of multiple roots (or multiple-valuedness as it’s called in the context of inverse functions).

Introduction

You’ll remember that, for integer exponents n, we had two inverse operations for aⁿ:

The logarithm: the instruction here is to find n (i.e. the exponent) given the value aⁿ and given a (i.e. the base).
The ‘nth root’ function: the instruction here is to find a (i.e. the base) given the value aⁿ and given n (i.e. the exponent).

We have two inverse operations because the exponentiation operation is not commutative: while a + b = b + a (and, therefore, a×b = b×a, so multiplication is commutative as well), aⁿ is surely not the same as n^a (except if a = n, of course).

Having two inverse operations is somewhat confusing, of course. However, when we expand the domain of the exponential function to also include rational exponents, the ‘nth root’ function becomes an exponential function itself: a^(1/n). That’s nice, because it tidies things up. We only have one inverse operation now: the logarithm.

Now, my kids understand exponentials, but they find logarithms weird. There are two reasons for that. The most important one is that we don’t learn about the logarithm function when we learn about the exponential function. We only learn logarithms later – much later. Therefore, we are not as familiar with them as we should be. There is no good reason for that but that’s what it is. [I guess I am like Euler here: I’d suggest logarithms and complex numbers should be taught earlier in life. Then we would have less trouble understanding them.]

The second one is notation, I think. Indeed, x = log_a(y) looks much more frightening than y = a^x because… Well… Too many letters. It would be more logical to apply the same economy of symbols. We could just write x = _ay instead of log_a(y), for example, using a subscript in front of the variable – as opposed to a superscript behind the variable, as we do for the exponential function. Or, else, we could be equally verbose for the exponential function and write y = exp_a(x) instead of y = a^x. In fact, you’ll find such more explicit expressions in spreadsheets and other software, because these don’t take subscripts or superscripts.

In any case, that’s not the point here. I will come back to the logarithmic function later. The point that I want to discuss here is that, while we sort of merged our ‘nth root function’ with our exponential function as we allowed for rational exponents as well (as opposed to integers only), we’re actually still taking roots, so to say, and then we note another problem: the square root function yields not one but two numbers when the base (a) is real and positive: ± a^(1/2).

In fact, that’s a more general problem.

Odd and even rational exponents

You’ll remember the following rules for exponentiation:

1. For a positive real number a, we always have two real nth roots when n is even: ± a^(1/n). That’s obviously a consequence of having two real square roots ± a^(1/2), because the definition of even parity is that n can be written as n = 2k with k any integer, i.e. k ∈ Z (so k can be negative). Hence, a^(1/n) can then be written as a^(1/(2k)) = (a^(1/k))^(1/2). Hence, whatever the value of a^(1/k) (if k is even, then we have two kth roots once again, but that doesn’t matter), we will have two real roots: plus (a^(1/k))^(1/2) and minus (a^(1/k))^(1/2).

2. If n is odd, so n ∈ {2k+1: k ∈ Z}, we have only one real root a^(1/(2k+1)): that root is positive when a is positive and negative when a is negative.

3. For the sake of completeness, let me add the third case: a is negative and n is even. We know there’s no real nth root of a in that case. That’s why mathematicians invented i: we’ll associate an even root of a negative real number with two complex-valued roots: ± i·|a|^(1/n). [Strictly speaking, this works as written for n = 2: (± i·|a|^(1/2))² = –|a| = a. For n = 4 it does not, because i⁴ = +1.]

The first and second case are illustrated below for n = 2 and n = 3 respectively. The complex roots of the third case cannot be visualized because y is a real axis. Of course, we could imagine the complex roots ± i·|a|^(1/n) if we would flip or mirror the blue and red graph (i.e. the graphs for n = 2) along the vertical axis and re-label that axis as the iy-axis, i.e. the imaginary axis. But so I’ll leave that to your imagination indeed.

How does this parity business turn out for rational exponents?

If r is a rational number r = m/n, we’ll have to express it as an irreducible fraction first, so the numerator m and denominator n have no other common divisors than 1, or –1 when considering negative numbers. But let’s look at positive numbers first. If we write r as an irreducible fraction m/n, then m and n cannot both be even. Why not? Because m and n can then both be divided by 2 and m/n is not an irreducible fraction in that case. Let’s assume m is even, so m = 2k. Hence, n must be odd in that case. We can then write a^(2k/n) as (a^(k/n))². This number will always be positive, because we are squaring something. So it doesn’t matter if a^(k/n) has one or two roots: we’ll square them and so the result will always be positive.

Now let’s assume the second possibility: m is odd. We can then write a^(m/n) as (a^(1/n))^m. So now it will depend on whether or not n is even. If n is even, we have two real roots; if n is odd, then we have only one. Let’s work a few examples:

8^(2/3) = (8^(1/3))² = 2² = 4
4^(3/2) = (4^(1/2))³ = (±2)³ = ±8
16^(1/4) = (16^(1/2))^(1/2) = ±(4^(1/2)) = ±2
(–8)^(5/3) = ((–8)^(1/3))⁵ = (–2)⁵ = –32

So we have two roots if m is odd and n is even, and only one root in all other cases. However, we said that m and n cannot both be even, hence, if n is even, m must be odd. In short, we can say that a rational exponent m/n is even (i.e. there will be two roots), if n is even. Does that work for complex roots as well? Let’s work that out with an example:

(–4)^(3/2) = ((–4)^(1/2))³ = (±2i)³ = (±2)³·i³ = ±8i

So, yes! It works for complex roots as well. 🙂
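The rule above can be checked mechanically. What follows is a minimal Python sketch, not a definitive implementation: the helper names real_roots and rational_power are made up for this illustration, and the code only tracks real roots, just like the first four worked examples do.

```python
from fractions import Fraction


def real_roots(a, n):
    """Return the real n-th roots of a (n is a positive integer)."""
    if a == 0:
        return [0.0]
    if n % 2 == 0:
        if a < 0:
            return []  # no real even root of a negative number
        r = a ** (1.0 / n)
        return [-r, r]
    # Odd n: exactly one real root, with the same sign as a.
    r = abs(a) ** (1.0 / n)
    return [r if a > 0 else -r]


def rational_power(a, m, n):
    """Real values of a^(m/n), computed as (a^(1/n))^m for every real root."""
    q = Fraction(m, n)  # reduce m/n to an irreducible fraction first
    values = {round(root ** q.numerator, 9) for root in real_roots(a, q.denominator)}
    return sorted(values)


print(rational_power(8, 2, 3))    # m even, n odd: one value
print(rational_power(4, 3, 2))    # n even: two values
print(rational_power(16, 1, 4))   # n even: two values
print(rational_power(-8, 5, 3))   # n odd: one value
```

The rounding to nine decimals is only there so that floating-point noise does not turn one root into two nearly identical ones.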

OK. But let’s ask the obvious question now: where are these even numbers on the real line?

Well… They are everywhere: we can start from 1/2 and then change the numerator: 3/2, 5/2, etcetera. It’s all fine, as long as we use an odd number. However, we can also go down and change the denominator: 1/4, 1/6, 1/8 etcetera. And then we can, of course, take odd multiples of these fractions once again, such as 1025/1024 = 1.0009765625, for example, or on the other side, 1023/1024 = 0.9990234375. So we have two even numbers here right next to the odd number 1. We may increase the precision: we could take ± 1/3588 for example. 🙂
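To make the ‘even’ and ‘odd’ labels concrete, here is a small Python sketch. The function name is_even_exponent is just a label for this illustration; it applies the rule from the text (reduce the fraction, then look at the denominator) and nothing more.

```python
from fractions import Fraction


def is_even_exponent(r):
    """True if r = m/n, written in lowest terms, has an even denominator n."""
    return Fraction(r).denominator % 2 == 0


examples = (Fraction(1023, 1024), Fraction(1), Fraction(1025, 1024), Fraction(1, 3588))
for r in examples:
    label = "even" if is_even_exponent(r) else "odd"
    print(r, float(r), label)
```

Note that Fraction reduces automatically, so Fraction(2, 6) is treated as 1/3 (odd) and Fraction(2, 4) as 1/2 (even).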

Of course, you may have noticed something here. The first thing, of course, is that we’ve defined these two even numbers 1.0009765625 and 0.9990234375 with a precision of 10 digits behind the decimal point, i.e. 1/1024 = 1/2¹⁰ = 0.0009765625. The second point to note is that the last digit of these two rational coefficients, when expressed as a decimal, was 5. Now, you may think that should always be the case because of that 1/2 factor. But it’s not true: 1/6, for example, is a rational number that, written in decimal form, will yield 0.166666… This is an expression with a recurring decimal. And 1/10, of course, just yields 0.1. So there’s no easy rule here. You need to look at the fraction itself, and rational numbers are written either as a finite decimal or as an infinite repeating decimal. Of course, there are rules for that, but this is not a post on number theory, so I won’t write anything more on this: you can Google some more stuff yourself if you’re interested in this.

Irrational exponents

How does the business of parity work for irrational exponents? The gist of the rather long story above can be summarized easily. We can write a^(m/n) as a^(m/n) = a^(m·(1/n)) = (a^(1/n))^m = a^(1/n)·a^(1/n)·a^(1/n)·… (m times) and so whether or not we have multiple roots (two instead of one) depends on whether or not n is even. Indeed, remember – once again – that exponentiation is repeated multiplication, and so for the sign of the result, what matters is whether or not the number of times that we do that multiplication is even or odd, not only for integer but for rational exponents as well.

For irrational exponents, we also have repeated multiplication, but now we have an infinite expression, not a finite one:

a^r = a^(r·(1/Δ + 1/Δ + 1/Δ + 1/Δ + …)) = a^(r/Δ)·a^(r/Δ)·a^(r/Δ)·a^(r/Δ)·…

I explained this expression in my previous post: 1/Δ is an infinitesimally small fraction. In fact, I calculated rational powers of e using the fraction 1/Δ = 1/1024 = 1/2¹⁰. I used that fraction because I had started backwards, taking successive square roots of e, so e^(1/2), and then e^(1/4), e^(1/8), e^(1/16), etcetera.

However, as I mentioned when I started doing that, there was no compelling reason to cut things up by dividing them in 2. We could use 1/3 as the fraction to start with and then, of course, our fraction 1/Δ would have been equal to 1/3¹⁰ = 1/59049, so we have an odd number in the denominator here. So that’s one problem: we cannot say if Δ is even or odd. And the second problem, of course, is that it’s an infinite expression and, hence, we cannot say if we multiplied 1/Δ an even or an odd number of times.

That leads to the third problem: we cannot say if r itself is even or odd, which is basically what we were looking at: can we define irrational exponents as even or odd?

In short, the answer is no. In practice, that means that we will associate a^r with one ‘rth root’ only.

Hmm… That obviously makes a lot of sense but how do we ‘justify’ it from a more formal point of view? Where do these negative roots (for even powers) go? I am not sure. I guess there must be some more formal argument but I’ll leave that to you to look it up. I am fairly happy with what Wikipedia writes on that:

“[Real] Powers of a positive real number are always positive real numbers. […] If the definition of exponentiation of real numbers is extended to allow negative results then the result is no longer well behaved.”

In fact, the article actually does give a somewhat more formal argument, as it writes:

Neither the logarithm method nor the rational exponent method can be used to define b^r as a real number for a negative real number b and an arbitrary real number r. Indeed, e^r is positive for every real number r, so ln(b) is not defined as a real number for b ≤ 0.
As for the rational exponent method, that cannot be used for negative values of b because it relies on continuity. The function f(r) = b^r has a unique continuous extension from the rational numbers to the real numbers for each b > 0. But when b < 0, the function f is not even continuous on the set of rational numbers r for which it is defined.

I am not quite sure I fully understand the last line, but I guess this refers to what I pointed out above: all these even and odd numbers that are so close to each other. When we go from rational to irrational exponents, we can no longer define odd or even.

The bottom line

The bottom line is that, in practice, we will only work with positive real bases. Hence, if b is negative, then we will define b^r as –(–b)^r. Huh?

Yes. Think about it. If b is negative, we’ll just multiply it with –1 to ensure that the base is a positive real number. And then we just put a minus in front to get a graph such as, for example, that x^(1/3) function for the negative side of the x-axis as well.

You should also note that most applications, like the one I use to draw simple graphs like the ones above (rechneronline.de/function-graphs), are not capable of showing you both roots. They do check whether the exponent is even or odd though, because that tool plots the function x^(1/3) on both sides of the zero point, and the x^(1/2) graph on the positive side only: it’s just not capable of associating more than one y value with one x value. [In case you’re curious to see what it does with an irrational exponent, go and check it yourself: you can put in x^pi or x^e. Will it give function values for negative values of x as well? What’s your guess? :-)]
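For what it’s worth, here is how the convention above looks in a few lines of Python. This is only a sketch: signed_power is a name invented for this illustration, and it says nothing about how the graphing site itself works. The last part shows what Python’s standard math.pow does with a negative base and an irrational exponent.

```python
import math


def signed_power(b, r):
    """Convention from the text: for negative b, define b^r as -((-b)^r)."""
    if b < 0:
        return -((-b) ** r)
    return b ** r


print(signed_power(-8.0, 1.0 / 3.0))   # the x^(1/3) graph on the negative side
print(signed_power(-2.0, math.pi))     # a value by convention only

# math.pow refuses a negative base combined with a non-integer exponent.
try:
    math.pow(-2.0, math.pi)
except ValueError as err:
    print("math.pow(-2, pi) raised ValueError:", err)
```

The convention is only meant to reproduce graphs like the odd-root one: applied blindly with r = 2, it would return –4 for b = –2, not 4.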

You’ll wonder why I am emphasizing this point. Well… I just wanted to note that we should be aware of the fact that, as we go from rational to irrational exponents, we sort of deliberately ‘forget’ about the second (negative) root. The point to note is that the issue of multiple-valued functions – such as discussed in the context of, for example, Riemann surfaces – is not necessarily related to complex-valued functions. We have it here (double roots), and we also have it, in general, for periodic functions.

But that’s for a next post. And there we’ll use our ‘natural’ exponential e^x, and its inverse function, ln(x), an awful lot. So I’ll just conclude here with their graphs, noting, as Wikipedia does, that, nowadays, the term ‘exponential function’ is almost exclusively used as a shortcut for describing the natural exponential function e^x. But, to my kids, I say: it’s good that you know where it comes from. 🙂

Post scriptum:

When thinking about such minor things, it’s always good to think about why we are manipulating all these symbols. Exponentiation is repeated multiplication. What does it mean to multiply something with a negative number? A minus sign is an instruction to reverse direction, to turn around, 180 degrees. So we multiply the magnitudes of both numbers a and b, but we change the direction: if we’re walking down the positive real axis, then now we’re walking down the negative axis.

So repeated multiplication with a negative real number means we’re switching back and forth, wildly jumping from the positive to the negative side of the zero point and then back again. You’ll admit you would appreciate being told in advance how many times we need to do the multiplication if the multiplier is negative: if n is even, then we’ll end up going in the same direction: (–1)ⁿ = 1. No sign reversal. If n is odd, then we know that, besides the ‘booster’ effect (i.e. the exponentiation operation), we’re expected to speed in the opposite direction: (–1)ⁿ = –1.

Hence, if b would happen to be a negative real number, then defining b^r as –(–b)^r, or assuming that, in general, our base will be a positive real number makes sense. Of course, the math has to keep track of the theoretical possibility that, if the exponent would happen to be even, b might be a negative number, but you can see it’s more of a theoretical possibility indeed. Not something we’d associate with something happening in real life.

In that sense, I should note that multiplication with a complex multiplier is much more ‘real-life’, so to say. Multiplying something with a complex number does the same to the magnitude of both numbers as real multiplication: it multiplies the magnitudes, thereby changing the scale. So the product of a vector that’s 2 units long and a vector that’s 3 units long will still be 6 units long. However, complex numbers also allow for a more gradual change of direction. Instead of just a gear to move forward and backward, we also get a steering wheel so to say: multiplying two complex numbers also adds their angles (as measured from some kind of zero direction obviously), besides multiplying their magnitudes. For example, suppose that the zero direction is east, and we have a vector pointing east indeed (that means its imaginary part is zero) that we need to multiply with a vector pointing north (so that’s a vector with a zero real part, along the imaginary axis), then the final vector will be pointing north.
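The east-times-north example can be checked with Python’s built-in complex numbers. It is a minimal sketch: the variable names east and north and the lengths 2 and 3 are simply taken from the wording above.

```python
import cmath
import math

east = complex(2, 0)    # 2 units long, pointing east (angle 0)
north = complex(0, 3)   # 3 units long, pointing north (angle 90 degrees)

product = east * north

print(abs(product))                         # magnitudes multiply: 2 * 3
print(math.degrees(cmath.phase(product)))   # angles add: 0 + 90
```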

However, with that subtlety comes complexity as well. With real numbers, you can go in the same direction by reversing direction two times, and so that’s why we have two 2nd roots (i.e. two square roots) of 1: (a) +1, so then we just stay where we are, and (b) –1, so then we rotate two times by 180 degrees around the zero point: indeed, (–1)(–1) corresponds to two successive rotations by 180 degrees (or π in radians) – clockwise or counterclockwise, it doesn’t matter: one full loop around the zero point will get us back to square one, or point 1, I should say. 🙂

With complex numbers, it all depends. The 3rd root (i.e. the cube root) of 1 was only 1 in the real space but, in the complex space, we have three cube roots of unity. The first one (W⁰ = W³) is the root we’re used to: unity itself, so the angle here is zero, i.e. straight ahead. In fact, with 1, we just stay where we are: 1×1×1 = 1³ = 1 indeed. But that’s not the only way. The illustration below shows two other ways to end up where we are (i.e. at point 1):

The second cube root is W²: 120 degrees. You can see we get back at 1 by making three successive turns of 120 degrees indeed, so that’s one full loop around the origin. Using complex numbers (in polar notation), we write e^(i2π/3)×e^(i2π/3)×e^(i2π/3) = e^(i6π/3) = e^(i2π) = e⁰ = 1.
The third cube root is W¹: that’s 240 degrees! Indeed, here we get back at square one by making three successive turns of 4π/3 radians, i.e. by making two loops, in total, around the origin: e^(i4π/3)×e^(i4π/3)×e^(i4π/3) = e^(i12π/3) = e^(i4π) = e⁰ = 1.
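The three cube roots can be computed directly from their angles. The sketch below uses Python’s cmath module; the index k is my own label for the number of full loops that three successive turns make around the origin, and it is not meant to match the W labels of the illustration.

```python
import cmath
import math

roots = []
for k in range(3):
    angle_deg = 120 * k                        # 0, 120 and 240 degrees
    root = cmath.exp(2j * math.pi * k / 3)     # point on the unit circle at that angle
    roots.append(root)
    # Cubing the root triples the angle: k full loops, ending at 1 again.
    print(angle_deg, root, root ** 3)
```

Up to floating-point noise, the last column is 1 for all three roots.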

In short, we gain flexibility (of course, we have four 4th roots (with which we make 0, 1, 2 and 3 loops around the origin respectively), five 5th roots, and so on), and the great Leonhard Euler was obviously fully right: complex numbers are more ‘natural’ numbers as they allow us to model real-life situations much better.

However, if you think that double roots are a problem… Well… Think again ! With complex numbers, the problem of multiple-valuedness is much more ‘real’, I’d say. 🙂

P.S: As mentioned in my previous post, I talk about that problem of multiple-valuedness when talking about Riemann surfaces in my October-November 2013 posts, so I won’t repeat what I wrote there. It’s about time I get back to both Feynman as well as Penrose. 🙂

Just one last (philosophical) question to test your understanding. Negative real numbers have no real square root. That includes –1 obviously. Why is that? Why do we have two real square roots for +1 and no (none!) real square roots for –1?

[…] No? Come on!

[…] OK. Let me tell you: it’s all a question of definition. What’s implicit here is that we have only one real direction: from zero to infinity along the positive axis, and then –1 is nothing but a reversal of direction. So it’s an operation really, not a ‘real’ number. In a philosophical sense, of course: negative numbers don’t exist, so to say! Indeed, ask yourself: what is a negative number? It’s an operation: we subtract things when we use the minus sign, and we reverse direction when multiplying numbers with –1. So, if we multiply something with –1 two times in succession, we are back where we were.

Of course, we could say that the negative direction is the ‘real’ direction and, hence, that it’s the positive numbers that don’t ‘really’ exist. Indeed, math doesn’t care about what we say, so let’s say that the negative axis is the ‘real’ one, in a physical sense. What happens then? Well… Let’s see… Let’s do what we did before. We still define –1 as a reversal of direction, or a rotation by 180 degrees and, hence, doing that two times should bring us back where we want to be, so that’s –1 now. OK. So we have (–1)(–1)(–1) = (–1). But so that means that (–1)(–1) = 1, and… How can we write something like that for –1? What number a gives us the result that a×a = –1? Hmm… Only this imaginary number: i×i = i² = –1. So, no matter how hard you try: the way we use symbols is pretty consistent, and so you will find that (–1)(–1) = 1×1 = 1 (so we have two square roots of 1), but we will not find that 1×1 = –1. If you would want to do that, you’d have to define +1 as a reversal of direction, so that basically means that the + sign would take the function of the – sign. Huh?

🙂 You must think I’ve gone crazy. I don’t think so. The idea I want to convey here is that, no matter how abstract math may seem to be – when everything is said and done – it’s intimately connected to our most basic notions of space, and our motion in that space. We go from here to there, or backwards, we change direction, we count things, we measure lengths or distances,… All that math does is to capture that in a non-ambiguous and consistent way. That also results in terse ‘truths’ such as: 1 has two real square roots, +1 and –1, but the square roots of –1 are only imaginary: ± i.

However, that terse statement hides another fun ‘truth’: +i and −i are as real as –1. Indeed, they are a rotation by 90 degrees, counterclockwise (+i) or clockwise (−i), as opposed to, for example, a rotation by 180 degrees (–1), or a full loop (1). 🙂

Euler’s formula revisited

Pre-scriptum (dated 26 June 2020): This post – part of a series of rather simple posts on elementary math and physics – did not suffer much from the attack by the dark force—which is good because I still like it. Enjoy !

Original post:

This post intends to take some of the magic out of Euler’s formula. In fact, I started doing that in my previous post but I think that, in this post, I’ve done a better job at organizing the chain of thought. [Just to make sure: with ‘Euler’s formula’, I mean e^(ix) = cos(x) + i·sin(x). Euler produced a lot of formulas, indeed, but this one is, for math, what E = mc² is for physics. :-)]

The grand idea is to start with an initial linear approximation for the value of the complex exponential e^(is) near s = 0 (to be precise, we’ll use the e^(iε) ≈ 1 + iε formula) and then show how the ‘magic’ of i – through the i² = –1 factor – gives us the sine and cosine functions. What we are going to do, basically, is to construct the sine and cosine functions algebraically.

Let us, as a starting point – just to get us focused – graph (i) the real exponential function e^x, i.e. the blue graph, and (ii) the real and imaginary part of the complex exponential function e^(ix) = cos(x) + i·sin(x), i.e. the red and green graph – the cosine and sine function. From these graphs, it’s clear that e^x and e^(ix) are two very different beasts.

1. e^x is just a real-valued function of x, so it ‘maps’ the real number x to some other real number y = e^x. That y value ‘rockets’ away, thereby demonstrating the power of exponential growth. There’s nothing really ‘special’ about e^x. Indeed, writing e^x instead of 10^x obviously looks better when you’re doing a blog on math or physics but, frankly, there’s no real reason to use that strange number e ≈ 2.718 when all you need is just a standard real exponential. In fact, if you’re a high school student and you want to attract attention with some paper involving something that grows or shrinks, I’d recommend the use of π^x. 🙂

2. e^(ix) is something that’s very different. It’s a complex-valued function of x and it’s not about exponential growth (though it obviously is about exponentiation, i.e. repeated multiplication): y = e^(ix) does not ‘explode’. On the contrary: y is just a periodic ‘thing’ with two components: a sine and a cosine. [Note that we could also change the base, to 10, for example: then we write 10^(ix). We’d also get something periodic, but let’s not get lost before we even start.]

Two different beasts, indeed. How can the addition of one tiny symbol – the little i in e^(ix) – make such a big difference?

The two beasts have one thing in common: the value of the function near x = 0 can be approximated by the same linear formula:

In case you wonder where this comes from, it’s basically the definition of the derivative of a function, as illustrated below. This is nothing special. It’s a so-called first-order approximation of a function. The point to note is that we have a similar-looking formula for the complex-valued e^(ix) function. Indeed, its derivative is d(e^(ix))/dx = i·e^(ix) and when we evaluate that derivative at x = 0, then we get i·e⁰ = i. So… Yes, the grand result is that we can effectively write:

e^(iε) ≈ 1 + iε for small ε

Of course, 1 + iε is also a different ‘beast’ than 1 + ε. Indeed, 1 + ε is just a continuation of our usual walk along the real axis, but 1 + iε points in a different direction (see below). This post will show you where it’s headed.

Let’s first work with e^x again, and think about a value for ε. We could take any value, of course, like 0.1 or some fraction 1/n. We’ll use a fraction – for reasons that will become clear in a moment. So the question now is: what value should we use for n in that 1/n fraction? Well… Because we are going to use this approximation as the initial value in a series of calculations – be patient: I’ll explain in a moment – we’d like to have a sufficiently small fraction, so our subsequent calculations based on that initial value are not too far off. But what’s sufficiently small? Is it 1/10, or 1/100,000, or 1/10¹⁰⁰? What gives us ‘good enough’ results? In fact, how do we define ‘good enough’?

Good question! In order to try to define what’s ‘good enough’, I’ll turn the whole thing on its head. In the table below, I calculate backwards from e¹ = e by taking successive square roots of e. Huh? What? Patience, please! Just go along with me for a while. First, I calculate e^(1/2), so our fraction ε, which I’ll just write as x, is equal to 1/2 here, so the approximation for e^(1/2) is 1 + 1/2 = 1.5. That’s off. How much? Well… The actual value of e^(1/2) is about 1.648721 (see the table below, or use a calculator or spreadsheet yourself: note that, because I copied the table from Excel, the exponential is shown there in plain-text form, as e^x). Now, 1.648721 is 1.5 + 0.148721, so our approximation (1.5) is about 9% off (as compared to the actual value). Not all that much, but let’s see how we can improve. Let’s take the square root once again: (e^(1/2))^(1/2) = e^(1/4), so x = 1/4. And then I do that again, so I get e^(1/8), and so on and so on. All the way down to x = 1/1024 = 1/2¹⁰, so that’s ten iterations. Our approximation 1 + x (see the fifth/last column in the table below) is then equal to 1 + 1/1024 = 1 + 0.0009765625, which we rounded to 1.000977 in the table.

The actual value of e^(1/1024) is also about 1.000977, as you can see in the third column of the table. Not exactly, of course, but… Well… The accuracy of our approximation here is six digits behind the decimal point, so that’s equivalent to one part in a million. That’s not bad, but is it ‘good enough’? Hmm… Let’s think about it, but let’s first calculate some other things. The fourth column in the table above calculates the slope of that AB line in the illustration above: its value converges to one, as we would expect, because that’s the slope of the tangent line at x = 0. [So that’s the value of the derivative of e^x at x = 0. Just check it: d(e^x)/dx = e^x, obviously, and e⁰ = 1.] Note that our 1 + x approximation also converges to 1 – as it should!

So… Well… Let’s now just assume we’re happy with that approximation that’s accurate to one part in a million, so let’s just continue to work with this fraction 1/1024 for x. Hence, we will write that e^(1/1024) ≈ 1 + 1/1024 and now we will use that value also for the complex exponential. Huh? What? Why? Just hang in here for a while. Be patient. 🙂 So we’ll just add the i again and, using that e^(iε) ≈ 1 + iε expression, we write:

e^(i/1024) ≈ 1 + i/1024
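Instead of a spreadsheet, the same kind of table can be built with a few lines of Python. This is a sketch under one assumption that is mine: I take the slope column to be the slope of the chord from the point (0, 1) to the point (x, e^x), which is how I read the AB line.

```python
import math

rows = []
value = math.e      # start from e^1
x = 1.0
for _ in range(10):
    value = math.sqrt(value)    # successive square roots: e^(1/2), e^(1/4), ...
    x /= 2
    approx = 1 + x              # first-order approximation of e^x
    slope = (value - 1) / x     # assumed definition: chord from (0, 1) to (x, e^x)
    rows.append((x, value, approx, slope))

for x, value, approx, slope in rows:
    print(f"{x:.10f}  {value:.6f}  {approx:.6f}  {slope:.6f}")
```

The last row is the one for x = 1/1024, where e^x and 1 + x agree to about six decimals.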

It’s quite obvious that 1 + i/1024 is a complex number: its real part is 1, and its imaginary part is 1/1024 = 0.0009765625.

Let’s now work our way up again by using that complex number 1 + i/1024 = 1 + i·0.0009765625 to calculate e^(i/512), e^(i/256), e^(i/128) etcetera. All the way back up to x = 1, i.e. e^i. I’ll just use a different symbol for x: in the table below, I’ll substitute s for x because I’ll refer to the real part of our complex numbers as ‘x’ from time to time (even if I write a and b in the table below), and so I can’t use the symbol x to denote the fraction. [I could have started with s, but then… Well… Real numbers are usually denoted by x, and so it was easier to start that way.] In any case…

The thing to note is how I calculate those values e^(i/512), e^(i/256), e^(i/128) etcetera. I am doing it by squaring, i.e. I just multiply the (complex) number by itself. To be very explicit, note that e^(i/512) = (e^(i/1024))² = e^(i·2/1024) = (e^(i/1024))(e^(i/1024)). So all that I am doing in the table below is multiply the complex number that I have with itself, and then I have a new result, and then I square that once again, and then again, and again, and again etcetera. In other words, when going back up, I am just taking the square of a (complex) number. Of course, you know how to multiply a number with itself but, because we’re talking complex numbers here, we should actually write it out:

(a + i·b)² = a² – b² + i·2ab = a² – b² + 2abi

[It would be good to always separate the imaginary unit i from real numbers like a, b, or ab, but then I am lazy and so I hope you’ll always recognize that i is the imaginary unit.] In any case… When we’re going back up (by squaring), the real part of the next number (i.e. the ‘x’ in x + iy) is a² – b² and the imaginary part (the ‘y’) is 2ab. So that’s what’s shown below – in the fourth and fifth column, that is.

Look at what happens. The x goes to zero and then becomes negative, and the y increases to one. Now, we went down from e^(1/n) = e¹ = e^(1/1) to e^(1/n) = e^(1/1024), but we could have started with e², or e^(4/n), or whatever. Hence, I should actually continue the calculations above so you can see what happens when s goes to 2, and then to 3, and then to 4, and so on and so on. What you’d see is that the value of the real and imaginary part of this complex exponential goes up and down between –1 and +1. You’d see both are periodic functions, like the sine and cosine functions, which I added in the last two columns of the table above. Now compare those a and b values (i.e. the second and third column) with the cosine and sine values (i.e. the last two columns). […] Do you see it? Do you see how close they are? Only a few parts in a million, indeed.

You need to let this sink in for a while. And I’d recommend you make a spreadsheet yourself, so you really ‘get’ what’s going on here. It’s all there is to the so-called ‘magic’ of Euler’s formula. That simple (a + ib)² = a² – b² + 2abi formula shows us why (and how) the real and imaginary part oscillate between –1 and +1, just like the cosine and sine function. In fact, the values are so close that it’s easy to understand what follows. They are the same – in the limit, of course.

Indeed, these values a² – b² and 2ab, i.e. the real and imaginary part of the next complex number in our series, are what Feynman refers to as the algebraic cosine and sine functions, because we calculate them as (a + ib)² = a² – b² + 2abi. These algebraic cosine and sine values are close to the real cosine and sine values, especially for small fractions s. Of course, there is a discrepancy because – when everything is said and done – we do carry a little error with us from the start, because we stopped at 1/n = 1/1024, before going back up.

There’s actually a much more obvious way to appreciate the error: we know that e^(i/1024) should be some point on the unit circle itself. Therefore, we should not equate a with 1 if we have some value b > 0. Or – what amounts to saying the same – if b is slightly bigger than 0, then a should be slightly smaller than 1. So the e^(iε) ≈ 1 + iε formula is an approximation only. It cannot be exact for positive values of ε. It’s only exact when ε = 0.
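If you would rather not build the spreadsheet, the squaring procedure is short enough to sketch in Python. The code uses only the a² – b² and 2ab rule and the starting value 1 + i/1024; the list name table and the choice to stop at s = 2 are mine.

```python
import math

a, b = 1.0, 1.0 / 1024    # real and imaginary part of 1 + i/1024
s = 1.0 / 1024
table = [(s, a, b)]
for _ in range(11):       # ten squarings reach s = 1, one more reaches s = 2
    a, b = a * a - b * b, 2 * a * b    # (a + ib)^2 = (a^2 - b^2) + i(2ab)
    s *= 2
    table.append((s, a, b))

for s, a, b in table:
    print(f"s={s:<12.10g} a={a:+.6f} b={b:+.6f} "
          f"cos={math.cos(s):+.6f} sin={math.sin(s):+.6f}")
```

The a and b columns are the algebraic cosine and sine; the last two columns are the values from Python’s math library, for comparison.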

So we’re off—but not far off as you can see. In addition, you should note that the error becomes bigger and bigger for larger s. For example, in the line for s = 1, we calculated the values of the algebraic cosine and sine for s = 2 (see the a^2 – b^2 and 2ab column) as –0.416553 and 0.910186, but the actual values are cos(2) = –0.416146 and sin(2) = 0.909297, which shows our algebraic cosine and sine function is gradually losing accuracy indeed (we’re off like one part in a thousand here, instead of one part in a million). That’s what we’d expect, of course, as we’re multiplying the errors as we move ‘back up’.

The graph below plots the values of the table.

This graph also shows that, as we’re doubling our fraction s all the time, the data points are being spaced out more and more. This ‘spacing out’ gets a lot worse when further increasing s: from s = 1 (that’s the ‘highest’ point in the graph above), we’d go to s = 2, and then to s = 4, s = 8, etcetera. Now, these values are not shown above but you can imagine where they are: for s = 2, we’re somewhere in the second quadrant, for s = 4, we’re in the third, etcetera. So that does not make for a smooth graph. We need points in-between. So let’s ‘fix’ this problem by taking just one value for s out of the table (s = 1/4, for example) and we’ll continue to use that value as a multiplier.

That’s what’s done in the table below. It looks somewhat daunting at first but it’s simple really. First, we multiply the value we got for e^(i/4) with itself once again, so that gives us a real and an imaginary part for e^(i/2) (we had that already in the table above and you can check: we get the same here). We then take that value (i.e. e^(i/2)) not to multiply it with itself but with e^(i/4) once again. Of course, because the complex numbers are not the same, we cannot use the (a + ib)² = a² – b² + 2abi rule any more. We must now use the more general rule for multiplying different complex numbers: (a + ib)(c + id) = (ac – bd) + i(ad + bc). So that’s why I have an a, b, c and d column in this table: a and b are the components of the first number, and c and d of the second (i.e. e^(i/4) = 0.969031 + 0.247434i).

In the table above, I let s range from zero (0) to seven (7) in steps of 0.25 (= 1/4). Once again, I’ve added the real cosine and sine values for these angles (they are, of course, expressed in radians), because that’s what s is here: an angle, aka the phase of the complex number. So you can compare.

The table confirms, once again, that we’re slowly losing accuracy (we’re now 3 to 4 parts in a thousand off), but it is very slowly only indeed: we’d need to do many ‘loops’ around the center before we could actually see the difference on a graph. Hey! Let’s do a graph. [Excel is such a great tool, isn’t it?] Here we are: the thick black line describing a circle on the graph below connects the actual cosine and sine values associated with an angle of 1/4, 1/2, 3/4 etcetera, all the way up to 7 (7 is about 2.2π, so we’re some 40 degrees past our original point after the ‘loop’), while the little ‘+‘ marks are the data points for the algebraic cosine and sine. They match perfectly because our eye cannot see the little discrepancy.
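The stepping procedure can be sketched in Python as well. The multiplier 0.969031 + 0.247434i is the value quoted above for s = 1/4; the list name points is mine, and the code uses nothing but the (ac – bd) + i(ad + bc) rule.

```python
import math

c, d = 0.969031, 0.247434    # algebraic cosine and sine for s = 1/4, as quoted above
a, b = 1.0, 0.0              # start at s = 0, i.e. at the point 1 on the real axis
points = [(0.0, a, b)]
for step in range(1, 29):    # s = 0.25, 0.50, ..., 7.00
    a, b = a * c - b * d, a * d + b * c    # (a + ib)(c + id)
    points.append((step * 0.25, a, b))

for s, a, b in points:
    print(f"s={s:4.2f} a={a:+.6f} b={b:+.6f} "
          f"cos={math.cos(s):+.6f} sin={math.sin(s):+.6f}")

s, a, b = points[-1]
print("distance from the unit circle at s = 7:", math.hypot(a, b) - 1)
```

The last line measures how far the final point has drifted off the unit circle, which is one way to see the slowly accumulating error.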

So… That’s it. End of story.

What?

Yes. That’s it. End of story. I’ve done what I promised to do. I constructed the sine and cosine functions algebraically. No compass. 🙂 Just plain arithmetic, including one extra rule only: i²= –1. That’s it.

So I hope I succeeded. The goal was to take some of the magic out of Euler’s formula by showing how that e^(iε) = 1 + iε approximation and the definition i² = –1 give us the cosine and sine functions themselves as we move around the unit circle, starting from the unity point on the real axis, as shown in that little graph:

Of course, the ε we were working with was much smaller than the size of the arrow suggests (it was equal to 1/1024 ≈ 0.000977 to be precise) but that’s just to show how differentials work. 🙂 Pretty good, isn’t it? 🙂

Post scriptum:

I. If anything, all this post did was to demonstrate multiplication of complex numbers. Indeed, when everything is said and done, exponentiation is repeated multiplication–both for real as well as for complex exponents. The only difference is–well… Complex exponents give us these oscillating things, because a complex exponent effectively throws a sine and cosine function in.

Now, we can do all kinds of things with that. In this post, we constructed a circle without a compass. Now, that’s not as good as squaring the circle 🙂 but, still, it would have awed Pythagoras. Below, I construct a spiral doing the same kind of math: I start off with a complex number again but now it’s somewhat more off the unit circle (1 + 0.247434i). In fact, I took the same sine value as the one we had for e^(i/4) but I replaced the cosine value (0.969031) with exactly 1. In other words, my ε is a lot bigger here.

Then I multiply that complex number 1 + 0.247434i with itself to get the next number (0.938776 + 0.494868i), and then I multiply that result once again with my first number (1 + 0.247434i), just like we did when we were constructing the circle. And then it goes on and on and on. So the only difference is the initial value: that’s a bit more off the unit circle. [When we constructed the circle, our initial value was also a bit off but much less. Here we go for a much larger difference.]

So you can see what happens: multiplying complex numbers amounts to adding angles and multiplying magnitudes: αe^(iβ)·γe^(iδ) = αγe^(i(β+δ)) = |αe^(iβ)|·|γe^(iδ)|·e^(i(β+δ)) = |α||γ|e^(i(β+δ)). So, because we started off with a complex number with a magnitude slightly bigger than 1 (you calculate it using Pythagoras’ theorem: it’s 1.03, more or less, which is 3% off, as opposed to less than one part in a million for the 1 + 0.000977i number), the next point is, of course, slightly off the unit circle too, and some more than 3% actually. And so that goes on and on and on and the ‘some more’ becomes bigger and bigger in the process.
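For those who prefer code to spreadsheets, here is a minimal sketch of that spiral. It uses Python’s built-in complex type instead of separate a, b, c and d columns, and the number of steps (28, as for the circle) is just an assumption for the illustration.

```python
# Starting value from the post: the cosine part is replaced by exactly 1.
z0 = complex(1.0, 0.247434)

spiral = [z0]
for _ in range(27):
    spiral.append(spiral[-1] * z0)   # multiply by the same number each time

# Angles add up, magnitudes multiply: |z0^n| = |z0|^n, so the points drift outwards.
for n, z in enumerate(spiral[:4], start=1):
    print(n, f"{z.real:.6f} + {z.imag:.6f}i", f"|z| = {abs(z):.4f}")
```

Because the magnitude of the starting value is about 1.03 rather than 1, each multiplication pushes the next point a little further from the origin, which is what turns the circle into a spiral.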

Constructing a graph like this one is like doing the kind of silly stuff I did when programming little games with our Commodore 64 in the 1980s, so I shouldn’t dwell too much on this. In fact, now that I think of it: I should have started near –i, then my spiral would have resembled an e. 🙂 And, yes – for family reading this – this is also like the favorite hobby of our dad: calculating a better value for π. 🙂

However… The only thing I should note, perhaps, is that this kind of iterative process resembles – to some extent – the kind of process that iterated function systems (IFSs) use to create fractals. So… Well… It’s just nice, I guess. [OK. That’s just an excuse. Sorry.]

II. The other thing that I demonstrated in this post may seem to be trivial but I’ll emphasize it here because it helped me (not sure about you though) to understand the essence of real exponentials much better than I did before. So, what is it?

Well… It’s that rather remarkable fact that calculating (real) irrational powers amounts to doing some infinite iteration. What do I mean with that?

Well… Remember that we kept on taking the square root of e, so we calculated e^(1/2), and then (e^(1/2))^(1/2) = e^(1/4), and then (e^(1/4))^(1/2) = e^(1/8), and then we went on: e^(1/16), e^(1/32), e^(1/64), all the way down to e^(1/1024), where we stopped. That was 10 iterations only. However, it was clear we could go on and on and on, to find that limit we know so well: e^(1/Δ) tends to 1 (not to zero (0), and not to e either!) for Δ → ∞.

Now, e = e¹ is an exponential itself and so we can switch to another base, base-10 for example, using the general a^s = (b^k)^s = b^(ks) = b^t formula, with k = log_b(a). Let’s do base-10: we get e¹ = [10^(log₁₀(e))]¹ = 10^(0.434294…). Now, because e is an irrational number, log₁₀(e) is irrational too, so we indeed have an infinite number of decimals behind the decimal point in 0.434294… In fact, e is not only irrational but transcendental: we can’t calculate it algebraically, i.e. as the root of some polynomial with rational coefficients. Most irrational numbers are like that, by the way, so don’t think that being ‘transcendental’ is very special. In any case… That’s a finer point that doesn’t matter much here. You get the idea, I hope. It’s the following:

When we have a rational power a^(m/n), it helps to think of it as a product of m factors a^(1/n) (and surely if we would want to calculate a^(m/n) without using a calculator, which, I admit, is not very fashionable anymore and so nobody ever does that: too bad, because the manual work involved does help to better understand things). Let’s write it down: a^(m/n) = a^(m·(1/n)) = (a^(1/n))^m = a^(1/n)·a^(1/n)·a^(1/n)·… (m times). That’s simple indeed: exponentiation is repeated multiplication. [Of course, if m is negative, then we just write a^(m/n) as 1/a^(|m|/n), but that doesn’t change the general idea of exponentiation.]
However, it is much more difficult to see why, and how, exponentiation with irrational powers amounts to repeated multiplication too. The rather lengthy exposé above shows… Well, perhaps not why, but surely how. [And in math, if we can show how, that usually amounts to showing why also, doesn’t it? :-)] Indeed, when we think of a^r (i.e. an irrational power of some (real) number a), we can think of it as a product of an infinite number of factors a^(r/Δ). Indeed, we can write a^r as:

a^r = a^(r·(1/Δ + 1/Δ + 1/Δ + 1/Δ + …)) = a^(r/Δ)·a^(r/Δ)·a^(r/Δ)·a^(r/Δ)·…

Not convinced? Let’s work an example: 10^π = [e^(ln10)]^π = e^(ln10·π) = e^(7.233784…). Of course, if you take your calculator, you’ll find something like 1385.455731, both for 10^π and e^(7.233784…) (hopefully!), but that’s not the point here. We’ve shown that e is an infinite product e^(1/Δ)·e^(1/Δ)·e^(1/Δ)·e^(1/Δ)·… = e^(1/Δ + 1/Δ + 1/Δ + 1/Δ + …) = e^(Δ/Δ) with Δ some infinitely large (but integer) number. In our example, we stopped the calculation at Δ = 1024, but you see the logic: we could have gone on forever. Therefore, we can write e^(7.233784…) as

e^(7.233784…) = e^(7.233784…·(1/Δ + 1/Δ + 1/Δ + 1/Δ + …)) = e^(7.233784…/Δ)·e^(7.233784…/Δ)·e^(7.233784…/Δ)·…

Still not convinced? Let’s revert back to base 10. We can write the factors e^(7.233784…/Δ) as e^((ln10·π)/Δ) = [e^(ln10)]^(π/Δ) = 10^(π/Δ). So our original power 10^π is equal to: 10^π = 10^(π/Δ)·10^(π/Δ)·10^(π/Δ)·10^(π/Δ)·… = 10^(π·(Δ/Δ)), and of course, 10^(1/Δ) also tends to 1 as Δ goes to infinity (not to zero, and not to 10 either). 🙂 So, yes, we can do this for any real number a and for any r really.
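The bookkeeping in this argument can be checked numerically. The sketch below takes Δ = 1024 (the value at which the calculation in this post stopped) and multiplies Δ identical factors 10^(π/Δ) together. Note the assumption: each small factor is obtained from Python’s own power operator, so this only illustrates that the product of the factors gives back 10^π. It is not an independent way of computing an irrational power.

```python
import math

a, r = 10.0, math.pi
delta = 1024                      # the post stops at 1024; any large integer works

factor = a ** (r / delta)         # one small factor a^(r/delta), close to 1
product = 1.0
for _ in range(delta):
    product *= factor             # delta identical factors multiplied together

print(factor)                     # slightly above 1
print(product, a ** r, math.exp(math.log(10) * math.pi))
```

The three numbers on the last line agree to many decimals and match the calculator value quoted above.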

Again, this may look very trivial to the trained mathematical eye but, as a novice in Mathematical Wonderland, I felt I had to go through this to truly understand irrational powers. So it may or may not help you, depending on where you are in MW.

[Proving that Δ/Δ goes to 1 as Δ goes to ∞ should not be necessary, I hope? 🙂 But, just in case you wonder how the formula for rational and irrational powers could possibly be related, we can just write a^(m/n) = a^((m/n)·(1/Δ + 1/Δ + 1/Δ + 1/Δ + …)) = a^(m/(nΔ))·a^(m/(nΔ))·a^(m/(nΔ))·a^(m/(nΔ))·… = (a^(1/Δ + 1/Δ + 1/Δ + 1/Δ + …))^(m/n) = a^(m/n), as we would expect. :-)]

III. So how does that a^r = a^(r/Δ)·a^(r/Δ)·a^(r/Δ)·a^(r/Δ)·… formula work for complex exponentials? We just add the i, so we write a^(ir), but we know what effect that has: we have a different beast now. A complex-valued function of r, or… Well… If we keep the exponent fixed, then it’s a complex-valued function of a! Indeed, do remember we have a choice here (and two inverse functions as well!).

However, note that we can write a^(ir) in two slightly different ways. We have two interpretations here really:

A. The first interpretation is the easiest one: we write a^(ir) as a^(ir) = (a^r)^i = (a^(r/Δ + r/Δ + r/Δ + r/Δ + …))^i.

So we have a real power here, a^r, and so that’s some real number, and then we raise it to the power i to create that new beast: a complex-valued function with two components, one imaginary and one real. And then we know how to relate these to the sine and cosine function: we just change the base to e and then we’re done.

In fact, now that we’re here, let’s go all the way and do it. As mentioned in my previous post – it follows out of that a^s = (e^k)^s = e^(ks) = e^t formula, with k = ln(a) – the only effect of a change of base is a change of scale of the horizontal axis: the graph of a^s is fully identical to the graph of e^t indeed: we just need to substitute s by t = ks = ln(a)·s. That’s all. So we actually have our ‘Euler formula’ for a^(is) here: a^(is) = cos[ln(a)·s] + i·sin[ln(a)·s]. For example, for base 10, we have 10^(is) = cos[ln(10)·s] + i·sin[ln(10)·s].
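A quick numerical check of that change of base, as a sketch only: the helper base_a_euler is a made-up name, and the test value s = 0.7 is arbitrary. The left-hand side uses Python’s own complex power, the right-hand side the cosine and sine of the rescaled angle t = ln(a)·s.

```python
import math

def base_a_euler(a, s):
    """a^(is) written as cos(ln(a)*s) + i*sin(ln(a)*s), for real a > 0."""
    t = math.log(a) * s            # change of scale: t = ln(a)*s
    return complex(math.cos(t), math.sin(t))

s = 0.7
lhs = 10 ** (1j * s)               # Python's own complex power
rhs = base_a_euler(10, s)
print(lhs, rhs)
```

Both values lie on the unit circle, at an angle of ln(10)·0.7 ≈ 1.61 radians.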

But let’s not get lost in the nitty-gritty here. The idea here is that we let i ‘act’ on a^r, so to say. And then, of course, we can write a^r as we want, but that doesn’t change the essence of what we’re dealing with.

B. The second interpretation is somewhat more tricky: we write a^(ir) as a^(ir) = a^(ir/Δ)·a^(ir/Δ)·a^(ir/Δ)·a^(ir/Δ)·…

So that’s a product of an (infinite) number of complex factors a^(ir/Δ). Now, that is a very different interpretation than the one above, even if the mathematical result when putting real numbers in for a and r will – obviously – have to be the same. If the result is the same, then what am I saying really? Well… Nothing much, I guess. Just that the interpretation of an exponentiation as repeated multiplication makes sense for complex exponentials as well:

For rational r, we’ll have a finite number of complex factors: a^(im/n) = a^(i/n)·a^(i/n)·a^(i/n)·a^(i/n)·… (m times).
For irrational r, we’ll have an infinite number of complex factors: a^(ir) = a^(ir/Δ)·a^(ir/Δ)·a^(ir/Δ)·a^(ir/Δ)·… etcetera.

So the difference with the first interpretation is that, instead of looking at a^(ir) as a real number a^r that’s being raised to the complex power i, we’re looking at a^(ir) as a complex number a^i that’s being raised to the real power r. As said, the mathematical result when putting real numbers in for a and r will – obviously – have to be the same. [Otherwise we’d be in serious trouble of course: math is math. We can’t have the same thing being associated with two different results.] But, as said, we can effectively interpret a^(ir) in two ways.
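The two readings are easy to compare numerically. The sketch below uses illustrative values only (a = 10 and r = 2.5 = 5/2) and relies on Python’s built-in complex power; it also builds the rational case as a finite product of m factors a^(i/n).

```python
a, r = 10.0, 2.5                   # illustrative values; r = m/n with m = 5, n = 2

first = (a ** r) ** 1j             # interpretation A: the real number a^r raised to the power i
second = (a ** 1j) ** r            # interpretation B: the complex number a^i raised to the real power r

# Rational exponent m/n: a finite product of m identical complex factors a^(i/n).
m, n = 5, 2
factor = a ** (1j / n)
product = 1 + 0j
for _ in range(m):
    product *= factor

print(first, second, product)
```

All three numbers coincide here. One caveat on the software side: a library’s power function works with the principal value of the angle, so for bases larger than e^π (about 23.1) and non-integer r, the second expression as typed above can land on a different branch. That is a matter of how the computer picks angles, not a contradiction in the math.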

[…]

What I am doing here, of course, is contemplating all kinds of mathematical operations – including exponentiation – on the complex space, rather than on the real space. So the first step is to raise a complex number to a real power (as opposed to raising a real number to a complex power). The next step will be to raise a complex number to a complex power. So then we’re talking complex-valued functions of complex variables.

Now, that’s what complex analysis is all about, and I’ve written very extensively about that in my October-November 2013 posts. So I would encourage you to re-read those, now that you’ve got, hopefully, a bit more of an ‘intuitive’ understanding of complex numbers with the background given in this and my previous post.

Complex analysis involves mapping (i.e. mapping from one complex space to another) and that, in turn, involves the concept of so-called analytic and/or holomorphic functions. Understanding those advanced concepts is, in turn, essential to understanding the kind of things that Penrose is writing about in Chapters 9 to 12 of his Road to Reality. […] I’ll probably re-visit these chapters myself in the coming weeks, as I realize I might understand them somewhat better now. If I could get through these, I’d be at page 250 or so, so that’s only one quarter of the total volume. Just an indication of how long that Road to Reality really is. 🙂

And then I am still not sure if it really leads to ‘reality’ because, when everything is said and done, those new theories (supersymmetry, M-theory, or string theory in general) are quite speculative, aren’t they? 🙂

Reflecting on complex numbers (again)

Original post:

This will surely be not my most readable post – if only because it’s soooooo long and – at times – quite ‘philosophical’. Indeed, it’s not very rigorous or formal, unlike those posts on complex analysis I wrote last year. At the same time, I think this post digs ‘deeper’, in a sense. Indeed, I really wanted to get to the heart of the ‘magic’ behind complex numbers. I’ll let you judge if I achieved that goal.

Complex numbers: why are they useful?

The previous post demonstrated the power of complex numbers (i.e. what they are used for), but it didn’t say much about what they are really. Indeed, we had a simple differential equation – an expression modeling an oscillator (read: a spring with a mass on it), with two terms only: d²x/dt² = –ω²x – but we could not solve it because of the minus sign in front of the term with the x.

Indeed, the so-called characteristic equation for this differential equation is r² = –ω² and so we’re in trouble here because there is no real-valued r that solves this. However, allowing complex-valued roots (r = ±iω) to solve the characteristic equation does the trick. Let’s analyze what we did (and don’t worry if you don’t ‘get’ this: it’s not essential to understand what follows):

Using those complex roots, we wrote the general solution for the differential equation as Ae^(iωt) + Be^(–iωt). Now, note that everything is complex in this general solution, not only the e^(iωt) and e^(–iωt) ‘components’ but also the (random) coefficients A and B.
However, because we wanted to find a real-valued function in the end (remember: x is a vertical displacement from an equilibrium position x = 0, so that’s ‘real’ indeed), we imposed the condition that Ae^(iωt) and Be^(–iωt) had to be each other’s complex conjugate. Hence, B must be equal to A* and our ‘general’ (real-valued) solution was Ae^(iωt) + A*e^(–iωt). So we only have one complex (but equally random) coefficient now – A – and we get the other one (A*) for free, so to say.
Writing A in polar notation, i.e. substituting A = A₀e^(iΔ), which implies that A* = A₀e^(–iΔ), yields A₀e^(iΔ)e^(iωt) + A₀e^(–iΔ)e^(–iωt) = A₀[e^(i(ωt + Δ)) + e^(–i(ωt + Δ))]. [We’ll write x₀ for 2A₀ in the final result.]
Expanding this, using Euler’s formula (and the fact that cos(-α) = cosα but sin(-α) = –sinα) then gives us, finally, the following (real-valued) functional form for x:

A₀[cos(ωt + Δ) + isin(ωt + Δ) + cos(ωt + Δ) – isin(ωt + Δ)]

= 2A₀cos(ωt + Δ) = x₀cos(ωt + Δ)
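The cancellation of the imaginary parts can be verified numerically. The following is a minimal sketch with made-up values for ω, A₀ and Δ: it adds the two conjugate terms and compares the sum with 2A₀cos(ωt + Δ).

```python
import cmath
import math

omega, A0, Delta = 2.0, 0.75, 0.3      # illustrative values only
A = A0 * cmath.exp(1j * Delta)         # complex coefficient in polar form

def x(t):
    """A*e^(i*omega*t) + conj(A)*e^(-i*omega*t): two complex-conjugate terms."""
    return A * cmath.exp(1j * omega * t) + A.conjugate() * cmath.exp(-1j * omega * t)

for t in (0.0, 0.5, 1.0):
    print(t, x(t), 2 * A0 * math.cos(omega * t + Delta))
```

The imaginary part of x(t) vanishes (up to rounding) at every t, and the real part equals the cosine expression with amplitude x₀ = 2A₀.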

That’s easy enough to follow, I guess (everything is relative of course), but do we really understand what we’re doing here? Let me rephrase what’s going on here:

In the initial problem, our dependent variable x(t) was the vertical displacement, so that was a real-valued function of a real-valued (independent) variable (time).
Now, we kept the independent variable t real – time is always real, never imaginary 🙂 – but so we made x = x(t) a complex (dependent) variable by equating x(t) with the complex-valued exponential e^rt. So we’re doing a substitution here really.
Now, if e^rt is complex-valued, it means, of course, that r is complex and so that allows us to equate r with the square root of a negative number (r = ±iω).
We then plug these imaginary roots back in and get a general complex-valued solution (as expected).
However, we then impose the condition that the imaginary part of our solution should be zero.

In other words, we had a family of complex-valued functions as a general solution for the differential equation, but we limited the solution set to a somewhat less general solution including real-valued functions only.

OK. We all get this. But it doesn’t mean we ‘understand’ complex numbers. Let’s try to take the magic out of those complex numbers.

Complex numbers: what are they?

I’ve devoted two or three posts to this already (October-November 2013) but let’s go back to basics. Let’s start with that imaginary unit i. The essence of i – and, yes, I am using the term ‘essence’ in a very ‘philosophical’ sense here I guess: i’s intrinsic nature, so to speak – is that its square is equal to minus one: i² = –1.

That’s it really. We don’t need more. Of course, we can associate i with lots of other things if we would want to (and we will, of course!), such as Euler’s formula for example, but these associations are not essential – or not as essential as this definition I should say. Indeed, while that ‘rule’ or ‘definition’ is totally weird and – at first sight – totally random, it’s the only one we need: all other arithmetic rules do not change and, in fact, it’s just that one extra rule that allows us to deal with any algebraic equation – so that’s literally every equation involving addition, multiplication and exponentiation (so that’s every polynomial basically). However, stating that i²= –1 still doesn’t answer the question: what is a complex number really?

In order to not get too confused, I’ve started to think we should just take complex numbers at face value: it’s the sum of (i) some real number and (ii) a so-called imaginary part, which consists of another real number multiplied with i. [So the only ‘imaginary’ bit is, once again, i: all the rest is real! ] Now, when I say the ‘sum’, then that’s not some kind of ‘new’ sum. Well… Let me qualify that. It’s not some kind of ‘new’ sum because we’re just adding two things the way we’re used to: two and two apples are four apples, and one orange plus two more is three. However, it is true that we’re adding two separate beasts now, so to say, and so we do keep the things with an i in them separate from the real bits. In short, we do keep the apples and the oranges separate.

Now, I would like to be able to say that multiplication of complex numbers is just as straightforward as adding them, but that’s not true. When we multiply complex numbers, that i²= –1 rule kicks in and produces some ‘effects’ that are logical but not all that ‘straightforward’ I’d say.

Let’s take a simple example–but a significant one (if only because we’ll use the result later): let’s multiply a complex number with itself, i.e. let’s take the square of a complex number. We get (a + bi)²= (a + bi)(a + bi) = a·a + a·(bi) + (bi)·a + (bi)·(bi) = a²+ 2abi + b²i²= a² + 2abi – b². That’s very different as compared to the square of a real sum a + b: (a + b)²= a²+ 2ab + b². How? Just look at it: we’ve got a real bit (a² – b²) and then an imaginary bit (2abi). So what?

Well… The thumbnail graph below illustrates the difference for a = b: it maps x to (a) 4x²[i.e. (x + x)²] and to (b) 2x² [i.e. (x + ix)²] respectively. Indeed, when we’re squaring real numbers, we get (a + b)²= 4a²–i.e. a ‘real bit’ only, of course!–but when we’re squaring complex numbers, we need to keep track of two components: the real part and the imaginary part. However, the real part (a² – b²) is zero in this case (a = b), and so it’s only the imaginary part 2abi = 2a²i that counts!
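The two squares can be put side by side in code. This is only a sketch: the helper square is a hypothetical name, and it returns the real and imaginary bits separately so that the apples and the oranges stay apart.

```python
def square(a, b):
    """(a + bi)^2 = (a^2 - b^2) + (2ab)i, returned as (real, imaginary)."""
    return (a * a - b * b, 2 * a * b)

for x in (1.0, 2.0, 3.0):
    real_case = (x + x) ** 2          # (a + b)^2 with a = b = x, gives 4x^2
    re, im = square(x, x)             # (x + ix)^2, gives 0 + 2x^2*i
    print(x, real_case, (re, im))
```

For a = b the real bit is always zero and only the imaginary bit 2x² survives, while the real sum gives 4x².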

That’s kids’ stuff, you’ll say… In fact, when you’re a mathematician, you’ll say it’s a nonsensical graph. Why? Because it compares an apple and an orange really: we want to show 2ix² really, not 2x².

That’s true. However, that’s why the graph is actually useful. The red graph introduces a new idea, and with a ‘new’ idea I mean something that’s not inherent in the i²= –1 identity: it associates i with the vertical axis in the two-dimensional plane.

Hmm… This is an idea that is ‘nice’ – very nice actually – but, once again, I should note that it’s not part of i’s essence. Indeed, the Italian mathematicians who first ‘invented’ complex numbers in the 16th century (Tartaglia (‘the Stammerer’) and Cardano, whose father was a friend of da Vinci) introduced roots of –1 because they needed them to solve algebraic equations. That’s it. Full stop. It was only much later (more than two hundred years later, that is!) that Wessel and Argand associated imaginary numbers (like 2ix²) with the vertical coordinate axis. To my readers who have managed not to fall asleep while reading this: please continue till the end, and you will understand why I am saying the idea of a geometrical interpretation is ‘not essential’.

To the same readers, I’ll also say the following, however: if we do associate complex numbers with a second dimension, then we can associate the algebraic operations with things we can visualize in space. Most of you–all of you I should say–know that already, obviously, but let’s just have a look at that to make sure we’re on the same page.

A very basic thing in physical mathematics is reversing the direction of something. Things go in one direction, but we should be able to visualize them going in the opposite direction. We may associate this with a variable going from 0 to infinity (+∞): it may be time (t), or a time-dependent variable x, y or z. Of course, we know what we have here: we think of the positive real axis. So, what we do when we multiply with –1 is reversing its direction, and so then we’re talking the negative real axis: a variable going from 0 to minus infinity (–∞). Therefore, we can associate multiplication by –1 with a rotation around the center (i.e. around the zero point) by 180 degrees (i.e. by π, in radians).

You may think that’s a weird way of looking at multiplication by minus one. Well… Yes and no. But think of it: the concept of negative numbers is actually as ‘weird’ as the concept of the imaginary unit in a way. I mean… Think about it: we’re used to negative numbers because we learned about them when we were very small kids but what are they really? What does it mean to have minus three apples? You know the answer of course: it probably means that you owe someone three apples but that you don’t have any right now. 🙂 […] But that’s not the point here. I hope you see what I mean: negative numbers are weird too, in a sense. Indeed, we should be aware of the fact that we often look at concepts as being ‘weird’ because we weren’t exposed to them early enough: the great mathematician Leonhard Euler thought complex numbers were so ‘essential’ to math and, hence, so ‘natural’ that he thought kids should learn complex numbers as soon as they started learning ‘real’ numbers. In fact, he probably thought we should only be using complex numbers because… Well… They make the arithmetic space complete, so to say. […] But then I guess that’s because Euler understood complex numbers in a way we don’t, which is why I am writing about them here. 🙂

OK. Back to the main story line. In order to understand complex numbers somewhat better, it is actually useful – but, again, not necessarily essential – to think of multiplication by i as half of that 180-degree rotation, i.e. a rotation by 90 degrees only, clockwise or counterclockwise, as illustrated above: multiplication with i means a counterclockwise rotation by 90 degrees (or π/2 radians) and multiplication with –i means a clockwise rotation by the same amount. Again, the minus sign gives the direction here: clockwise or counterclockwise. It works indeed: i·i = (–i)·(–i) = –1.
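The rotation picture can be checked with any point in the plane. In the sketch below, the point 3 + i is an arbitrary choice, and cmath.phase is used to measure the angle before and after multiplying by i.

```python
import cmath
import math

z = complex(3, 1)                       # an arbitrary point in the plane
print(z * 1j)                           # (-1+3j): turned 90 degrees counterclockwise
print(z * -1j)                          # (1-3j): turned 90 degrees clockwise
print(z * 1j * 1j, z * -1)              # two quarter turns equal one reversal

quarter_turn = cmath.phase(z * 1j) - cmath.phase(z)
print(math.degrees(quarter_turn))       # the angle gained by one multiplication with i
```

The magnitude of z does not change in any of these products: only the direction does.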

OK. Let’s wrap this up: we might say that

a positive real number is associated with some (absolute) quantity (i.e. a magnitude);
a minus sign says: “Go the opposite way! Go back! Subtract!”– so it’s associated with the opposite direction or the opposite of something in general; and, finally,
the imaginary unit adds a second dimension: instead of moving on a line only, we can now walk around on a plane.

Once we understand that, it’s easy to understand why, in most applications of complex numbers, you’ll see the polar notation for complex numbers. Indeed, instead of writing a complex number z as z = a+ ib, we’ll usually see it written as:

z = re^iθ with e^iθ = cosθ + isinθ

Huh? Well… Yes. Let me throw it in here straight away. You know this formula: it’s Euler’s formula. The so-called ‘magical’ formula! Indeed, Feynman calls it ‘our jewel’: the ‘most remarkable formula in mathematics’ as he puts it. Waw ! If he says so, it must be right. 🙂 So let’s try to understand it.

Is it magical really? Well… I guess the answer is ‘Yes’ and ‘No’ at the same time:

No. There is no ‘magic’ here. Associating the real part a and the imaginary part b with a magnitude r and an angle θ (a = rcosθ and b = rsinθ) is actually just an application of the Pythagorean theorem, so that’s ‘magic’ you learnt when you were very little and, hence, it does not look like magic anymore. [Although you should try to appreciate its ‘magic’ once again, I feel. Remember that you heard about the Pythagorean theorem because your teacher wanted to tell you what the square root of 2 actually is: a so-called irrational number that we get by taking the ‘one-half power’ of 2, i.e. 2^(1/2) = 2^0.5, or, what amounts to the same, the square root of 2. Of course, you and I are both used to irrational numbers now, like 2^(1/2), but they are also ‘weird’. As weird as i. In fact, it is said that the Greek mathematician who claimed their existence was exiled, because these irrational numbers did not fit into the (early) Pythagorean school of thought. Indeed, that school of thought wanted to reduce geometry to whole numbers and their ratios only. So there was no place for irrational numbers there!]
Yes. It is ‘magical’. Associating e^iθ – so that’s a complex exponential function really! – with the unit circle is something you learnt much later in life only, if ever. It’s a strange thing indeed: we have a real (but, I admit, irrational) number here – e is 2.718 followed by an infinite number of decimals as you know, just like π – and then we raise to the power iθ, so that’s i once again multiplied by a real number θ (i.e. the so-called phase or – to put it simply – the angle). By now, we know what it means to multiply something with i, and–of course–we also know what exponentiation is (it’s just a shorthand for repeated multiplication), but we haven’t defined complex exponentials yet.
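Going back and forth between the a + ib form and the polar form takes only a few lines. The sketch below uses the standard-library functions cmath.polar and cmath.exp; the sample number 1 + i is just an illustration.

```python
import cmath
import math

z = complex(1.0, 1.0)                  # a = 1, b = 1
r, theta = cmath.polar(z)              # r = sqrt(a^2 + b^2), theta = angle in radians

a = r * math.cos(theta)                # real part recovered from (r, theta)
b = r * math.sin(theta)                # imaginary part recovered from (r, theta)

print(r, theta)                        # sqrt(2) and pi/4 for this z
print(a, b)
print(r * cmath.exp(1j * theta))       # z = r*e^(i*theta) again
```

The cosine goes with the real part and the sine with the imaginary part, and the magnitude r is the hypotenuse of the right triangle with sides a and b.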

In fact… That’s what we’re going to do here. But in a rather ‘weird’ way as you will see: we won’t define them really but we’ll calculate them. For the moment, however, we’ll leave it at this and just note that, through Euler’s relation, we can see how a fractional or multiple power of i, e.g. i^0.1 or i^2.3, corresponds to a fraction or a multiple of the angle associated with i, i.e. 0.1 times π/2 or 2.3 times π/2. In other words, Euler’s formula shows how the second (spatial) dimension is associated with the concept of the angle.

[…] And then the third (spatial) dimension is, of course, easy to add: it’s just an angle in another direction. What direction? Well… An angle away from the plane that we just formed by introducing that first angle. 🙂 […] So, from our zero point (here and now), we use a ruler to draw lines, and then a compass to measure angles away from that line, and then we create a plane, and then we can just add dimensions as we please by adding more ‘angles’ away from what we already have (a line, or a plane, and any higher-dimensional thing really).

Dimensions

I feel I need to digress briefly here, just to make sure we’re on the same page. Dimensions. What is a dimension in physics or in math? What do we mean if we say that spacetime is a four-dimensional continuum? From what we wrote above, the concept of a spatial dimension should be obvious: we have three dimensions in space (the x, y and z direction), and so we need three numbers indeed to describe the position of an object, from our point of view that is (i.e. in our reference frame).

But so we also have a fourth number: time. By now, you also know that, just like position and/or motion in space, time is relative too: that is relative to some frame of reference indeed. So, yes, we need four numbers, i.e. four dimensions, to describe an event in spacetime. That being said, time is obviously still something different (I mean different than space), despite the fact that Einstein’s relativity theory relates it to space: indeed, we showed in our post on (special) relativity that there’s no such thing as absolute time. However, that actually reinforces the point: a point in time is something fundamentally different than a point in space. Despite the fact that

Time is just like a space dimension in the physical-mathematical meaning of the term ‘dimension’ (a dimension of a space or an object is one of the coordinates that is needed to specify a point within that space, or to ‘locate’ the object – both in time and space that is); and that,
We can express distance and time in the same units because the speed of light is absolute (so that allows us to express time in meter, despite the fact that time is relative or “local”, as Hendrik Lorentz called it); and that, finally,
If we do that (i.e. if we express time and distance in equivalent units), the equations for space and time in the Lorentz transformation equations mirror each other nicely – ‘mixing’ the space and time variables in the same way, so to say – and, therefore, space and time do form a ‘kind of union’, as Minkowski famously said;

Despite all that, time and space are fundamentally different things. Perhaps not for God – because He (or She, or It?) is said to be Everywhere Always – but surely for us, humans. For us, humans, always busy constructing that mental space with our ruler and our compass, time is and remains the one and only truly independent variable. Indeed, for us, mortal beings, the clocks just tick (locally indeed – that’s why I am using a plural: clocks – but that doesn’t change the fact they’re ticking, and in one direction only).

And so things happen and equations such as the one we started with – i.e. the differential equation modeling the behavior of an oscillator – show us how they happen. In one of my previous posts, I also showed why the laws of physics do not allow us to reverse time, but I won’t talk about that here. Let’s get back to complex numbers. Indeed, I am only talking about dimensions here because, despite all I wrote above about the imaginary axis in the complex plane, the thing to note here is that we did not use complex numbers in the physical-mathematical problem above to bring in an extra spatial dimension.

We just did it because we could not solve the equation with one-dimensional numbers only: we needed to take the square root of a negative number and we couldn’t. That was it basically. So there was no intention of bringing in a y- or z-dimension, and we didn’t. If we would have wanted to do that, we would have had to insert more dependent variables in the analysis, and so we would have ended up with a system of differential equations in two or three dependent variables (x, y and z), with time – once again – as the independent variable (t). [Differential equations with one independent variable only (real- or complex-valued), like the ones we’re used to now, are referred to as ordinary differential equations, as opposed to… no, not extraordinary, but partial differential equations, which involve derivatives with respect to more than one independent variable.]

In fact, if we would have generalized to two- or three-dimensional space, we would have run into the same type of problem (roots of negative numbers) when trying to solve those equations and so we would have needed complex-valued variables to solve them analytically in this case too. So we would have three ‘dimensions’ but each ‘dimension’ would be associated with complex (i.e. ‘two-dimensional’) numbers. Is this getting complicated? I guess so.

The point is that, when studying physics or math, we will have to get used to the fact that these ‘two-dimensional numbers’ which we introduced, i.e. complex numbers, are actually more ‘natural’ ‘numbers’ to work with from a purely analytic point of view (as for the meaning of ‘analytic’, just read it as ‘logical problem-solving’), especially when we write them in their polar form, i.e. as complex exponentials. We can then take advantage of that wonderful property that they already are a functional form (z =re^iθ), so to speak, and that their first, second etcetera derivative is easy to calculate because that ‘functional form’ is an exponential, and exponentials come back to themselves when taking the derivative (with the coefficient in the exponent in front). That makes the differential equation a simple algebraic equation (i.e. without derivatives involved), which is easy to solve.
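To see how the exponential turns the differential equation into algebra, one can differentiate e^(rt) numerically and compare with r²·e^(rt). The sketch below is only an illustration under stated assumptions: ω = 3 is an arbitrary value, and second_derivative is a hypothetical helper based on a central finite difference.

```python
import cmath

omega = 3.0                              # illustrative angular frequency
r = 1j * omega                           # one root of r^2 = -omega^2

def x(t):
    return cmath.exp(r * t)              # trial solution e^(rt)

def second_derivative(f, t, h=1e-4):
    """Central finite-difference estimate of f''(t)."""
    return (f(t + h) - 2 * f(t) + f(t - h)) / (h * h)

t = 0.8
print(second_derivative(x, t))           # numerical x''(t)
print(r * r * x(t))                      # r^2*x(t), which is -omega^2*x(t)
```

Since each derivative just brings down a factor r, the equation d²x/dt² = –ω²x reduces to r² = –ω², with no derivatives left in it.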

In short, we should just look at complex numbers here (i.e. in the context of my three previous posts, or in the context of differential equations in general) as a computational device, not as an attempt to add an extra spatial dimension to the analysis.

Now, that’s probably the reason why Feynman inserts a chapter on ‘algebra’ that, at first, does not seem to make much sense. As usual, however, I worked through it and then found it to be both instructive as well as intriguing because it makes the point that complex exponentials are, first and foremost, an algebraic thing, not a geometrical thing.

I’ll try to present his argument here but don’t worry if you can’t or don’t want to follow it all the way through because… Well… It’s a bit ‘weird’ indeed, and I must admit I haven’t quite come to terms with it myself. On the other hand, if you’re ready for some thinking ‘outside of the box’, I assure you that I haven’t found anything like this in a math textbook or on the Web. This proves the fact that Feynman was a bit of a maverick… Well… In any case, I’ll let you judge. Now that you’re here, I would really encourage you to read the whole thing, as loooooooong as it is.

Complex exponentials from an algebraic point of view: introduction

Exponentiation is nothing but repeated multiplication. That’s easy to understand when the exponents are integers: a to the power n (aⁿ) is a×a×a×a×… etcetera – repeated n times, so we have n factors (all equal to a) in the product. That’s very straightforward.

Now, to understand rational exponents (so that’s an m/n exponent, with m and n integers), we just need to understand one thing more, and that is the inverse operation of exponentiation, i.e. the n-th root. We then get a^(m/n) = (a^m)^(1/n). So, that’s easy too. […] Well… No. Not that easy. In fact, our problems start right here:

If n is even, and a is a positive real number, we have two (real) n-th roots: ± a^(1/n).
However, if a is negative (and n is still even obviously), then we have a problem. There’s no real n-th root of a in that case. That’s why Cardano invented i: we’ll associate an even root of a negative real number with complex-valued roots (two of them in the case of a square root).
What if n is odd? Then we have only one real root: it’s positive when a is positive, and negative when a is negative. Done.
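The three cases above can be checked numerically. The sketch below is mine, not Feynman's: the helper names nth_roots and real_roots are made up for illustration, and the code simply lists all n complex n-th roots of a real number and then keeps the real ones.

```python
import cmath
import math

def nth_roots(a, n):
    """Return all n complex n-th roots of the non-zero real number a."""
    radius = abs(a) ** (1.0 / n)
    angle = cmath.phase(complex(a, 0.0))  # 0 for a > 0, pi for a < 0
    return [cmath.rect(radius, (angle + 2 * math.pi * k) / n) for k in range(n)]

def real_roots(a, n, tol=1e-9):
    """Keep only the roots whose imaginary part vanishes (up to rounding)."""
    return [z.real for z in nth_roots(a, n) if abs(z.imag) < tol]

print(real_roots(16, 4))   # n even, a positive: two real roots
print(real_roots(-16, 4))  # n even, a negative: no real root at all
print(real_roots(-8, 3))   # n odd: exactly one real root
```

Note that the complex roots never go away: they are there in all three cases, and it is only the number of real ones that changes.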

But let’s not complicate matters from the start. The point here is to do some algebra that should help us to understand complex exponentials. However, I will make one small digression, and that’s on logarithmic functions. It’s not essential but, again, useful. […] Well… Maybe. 🙂 I hope so. 🙂

We know that exponentials are actually associated with two inverse operations:

Given some value y and some number n, we can take the n-th root of y (y^(1/n)) to find the original base x for which y = xⁿ.
Given some value y and some number a, we can take the logarithm (to base a) of y to find the original exponent x for which y = a^x.

In the first case, the problem is: given n, find x for which y = xⁿ. In the second case, the problem is: given a, find x for which y = a^x. Is that complicated? Probably. In order to further confuse you, I’ve inserted a thumbnail graph with y = 2^x (so that’s the exponential function with base 2) and y = log₂x (so that’s the logarithmic function with base 2). You can see these two functions mirror each other, with the x = y line as the mirror axis.
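For those who prefer to see the two inverse operations at work, here is a minimal sketch. The numbers (y = 8, n = 3, a = 2) are just an example I picked.

```python
import math

y = 8.0

# Root: given the exponent n, recover the base x for which y = x**n.
n = 3
base = y ** (1.0 / n)

# Logarithm: given the base a, recover the exponent x for which y = a**x.
a = 2.0
exponent = math.log(y, a)

print(f"{n}-th root of {y}: {base}")
print(f"log to base {a} of {y}: {exponent}")
```

Same y, two different questions, two different answers: the root gives back the base, the logarithm gives back the exponent.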

We usually find logarithms more ‘difficult’ than roots (I do, for sure), but that’s just because we usually learn about them much later in life–like in a senior high school class, for example, as opposed to a junior high school class (I am just guessing, but you know what I mean).

In addition, we have these extra symbols ‘log’–L-O-G :-)–to express the function. Indeed, we use just two symbols to write the y = 2^x function: 2 and x – and then the meaning is clear from where we write these: we write 2 in normal script and x as a superscript and so we know that’s exponentiation. But so we’re not so economical for the logarithmic function. Not at all. In fact, we use three symbols for the logarithmic function: (1) ‘log’ (which is quite verbose as a symbol in itself, because it consists of three letters), (2) 2 and (3) x. That’s not economical at all! Indeed, why don’t we just write y = ₂x or something? So that’s a subscript in front, instead of a superscript behind. It would work. It’s just a matter of getting used to it, i.e. it’s just a convention in other words.

Of course, I am joking a bit here but you get my point: in essence, the logarithmic function should not come across as being more ‘difficult’ or less ‘natural’ than the exponential function: exponentiation involves two numbers – a base and an exponent – and, hence, it’s logical that we have two inverse operations, rather than one. [You’ll say that a sum or a product involves (at least) two terms or two factors as well, so why don’t they have two inverse operations? Well… Addition and multiplication are commutative operations: a+b = b+a, and a·b = b·a. Exponentiation isn’t: aⁿ ≠ n^a. That’s why. Check it: 2×3 = 3×2, but 2³ = 8 ≠ 3² = 9.]

Now, apart from us ‘liking’ exponential functions more than logarithmic functions because of the non-relevant fact that we learned about log functions only much later in our life, we will usually also have a strong preference for one or the other base for an exponential. The most preferred base is, obviously, ten (10). We use that base in so-called scientific notation for numbers. For example: the elementary charge (i.e. the charge of an electron) is approximately –1.6×10⁻¹⁹ coulombs. […] Oh… We have a minus sign in the exponent here (–19). So what’s that? Sorry. I forgot to mention that. But it’s easy: a^(–n) = (aⁿ)^(–1) = 1/aⁿ.

Our most preferred base is 10 because we have a decimal system, and we have a decimal system because we have ten fingers. Indeed, the Maya used a base-20 system because they used their toes to count as well (so they counted in twenties instead of tens), and it also seems that some tribes had octal (base-8) systems because they used the spaces between their fingers, rather than the fingers themselves. And, of course, we all know that computers use a base-2 system because… Well… Because they’re computers. In any case, 10 is called the common base, because… Well… Because it’s common.

However, by now you know that, in physics and mathematics, we prefer that strange number e as a base. However, remember it’s not that strange: it’s just a number like π. Why do we call it ‘natural’? Because of that nice property: the derivative of the exponential function e^x comes back to itself: d(e^x)/dx = e^x. That’s not the case for 10^x. In fact, taking the derivative of 10^x is pretty easy too: we just need to put a coefficient in front. To be specific, we need to put the logarithm (to base e) of the base of our exponential function (i.e. 10) in front: d(10^x)/dx = 10^x·ln(10). [Ln(10) is yet another notation that has been introduced, it seems, to confuse young kids and ensure they hate logarithms: ln(10) is just log_e(10) or, if I would have had my way in terms of conventions (which would ensure an ‘economic’ use of symbols), we could also write ln(10) = _e10. :-)]
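If you don’t want to take my word for those two derivatives, a finite-difference check is quickly done. This is only a sketch: the evaluation point x = 0.7 and the step size h are arbitrary choices of mine.

```python
import math

def numerical_derivative(f, x, h=1e-6):
    """Central-difference estimate of the slope of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 0.7  # arbitrary evaluation point

slope_e = numerical_derivative(math.exp, x)
slope_10 = numerical_derivative(lambda u: 10.0 ** u, x)

print(f"slope of e^x at x:  {slope_e:.6f}   versus e^x itself:    {math.exp(x):.6f}")
print(f"slope of 10^x at x: {slope_10:.6f}   versus 10^x·ln(10): {10.0 ** x * math.log(10):.6f}")
```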

Stop! I am going way too fast here. We first need to define what irrational powers are! Indeed, from all that I’ve written so far, you can imagine what a^(m/n) is (a^(m/n) = (a^m)^(1/n)), but what if the exponent is not a ratio of two integers? What if it equals the square root of 2, for example? In other words, what is 10^x or e^x or 2^x or whatever for irrational exponents?

We all sort of ‘know’ what irrationals are: it involves limits, infinitesimals, fractions of fractions, Dedekind cuts. Whatever, even if you don’t understand a word of what I am writing here, you do – intuitively: irrationals can be approximated by fractions of fractions. The grand idea is that we divide some number by 2, and then we divide by 2 once again (so we divide by 4), and then once again (so we take 1/8), and again (1/16), and so on and so on. That is the spirit of Dedekind cuts, even if a cut is, strictly speaking, a split of the rational numbers into two sets rather than a sequence of halvings. Of course, dividing by two is a pretty random way of cutting things up. Why don’t we divide by three, or by four, for example? Well… It’s the same as with those other ‘natural’ numbers: we have to start somewhere and so this ‘binary’ way of cutting things up is probably the most ‘natural’. 🙂 [Have you noticed how many ‘natural’ numbers we’ve mentioned already: 10, e, π, 2… And one (1) itself of course. :-)]

So we’ll use something like Dedekind cuts for irrational powers as well. We’ll define them as a sort of limit (in fact, that’s exactly what they are) and so we have to find some approximation (or convergence) process that allows us to do so.

We’ll start with base 10 here because, as mentioned above, base 10 comes across as more ‘natural’ (or ‘common’) to us non-mathematicians than the so-called ‘natural’ base e. However, I should note that the base doesn’t matter much because it’s quite easy to switch from one base to another. Indeed, we can always write a^s = (b^k)^s = b^(ks) = b^t with a = b^k and t = k·s (as for k, k is obviously equal to log_b(a)). From this simple formula, you can see that changing base amounts to changing the horizontal scale: we replace s by t = k·s. That’s it. So don’t worry about our choice of base. 🙂
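A quick numerical illustration of that base switch, with a = 10 and b = e. The value s = 0.3 is just an arbitrary example.

```python
import math

a = 10.0    # the 'common' base
b = math.e  # the 'natural' base
s = 0.3     # arbitrary exponent, chosen for illustration only

k = math.log(a, b)  # k = log_b(a), so that a = b**k
t = k * s           # the rescaled exponent

lhs = a ** s
rhs = b ** t

print(f"k = {k:.6f}")
print(f"a^s = {lhs:.9f}")
print(f"b^t = {rhs:.9f}")
```

The k that comes out for these two bases is ln(10), which is the number that will keep popping up in the rest of this post.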

Complex exponentials from an algebraic point of view: well… Not the introduction 🙂

Ouf! So much stuff! But so here we go. We take base 10 and see what such an approximation of an irrational power of 10 (10^x) looks like. Of course, we can write any irrational number x as some (positive or negative) integer plus an endless series of decimals after the zero (e.g. e = 2 + 0.7182818284590452… etc). So let’s just focus on numbers between 0 and 1 as for now (so we’ll take the integer out of the total, so to speak). In fact, before we start, I’ll cheat and show you the result, just to make sure you can follow the argument a bit.

Yes. That’s what 10^x looks like, but so we don’t know that yet because we don’t know what irrational powers are, and so we can’t make a graph like that–yet. We only know very general things right now, such as:

10⁰ = 1 and 10¹ = 10 etcetera.
Most importantly, we know that 10^(m/n) = (10^m)^(1/n) = (10^(1/n))^m for integer m and n.

In fact, we’ll use the second fact to calculate 10^x for x = 1/2, 1/4, 1/8, 1/16, and so on and so on. We’ll go all the way down to where x becomes a fraction very close to zero: that’s the table below. Note that the x values in the table are rational fractions 1/2, 1/4, 1/8 etcetera indeed, so x is not an irrational exponent: x is a real number but rational, so x can be expressed as a fraction of two integers m and n (m = 1 and n = 2, 4, 8, 16, 32 and so on here) and, for these particular fractions, also as a decimal number with a finite number of decimals behind the decimal point (0.5, 0.25, 0.125, 0.0625 etcetera).

The third column gives the value 10^x for these fractions x = 1/2, 1/4, 1/8 etcetera. How do we get these? Hmm… It’s true. I am jumping over another hurdle here. The key assumption behind the table is that we know how to take the square root of a number, so that we can calculate 10^(1/2), to quite some precision indeed, as 10^(1/2) = 3.162278 (and there are more decimals but we’re not too interested in them right now), and then that we can take the square root of that value (3.162278). That’s quite an assumption indeed.

However, if we don’t want this post to become a book in itself, then I must assume we can do that. In fact, I’ve done it with a calculator here but, before there were calculators, this kind of calculation could and had to be done with a table of logarithms. That’s because of a very convenient property of logarithms: log_c(ab) = log_c(a) + log_c(b). However, as said, I should be writing a post here only, not a book. [Already now, this post beats the record in terms of length and verbosity…] So I’ll just ask you to accept that – at this stage – we know how to calculate the square root of something and, therefore, to accept that we can take the square root not only of 10 but of any number really, including 3.162278, and then the root of that number, and then the root of that result, and so on and so on. So that gives us the values in the third column of the table above: they’re successive square roots. [Please do double-check! It will help you to understand what I am writing about here.]

So… Back to the main story. What we are doing in the table above is to take the square root in succession, so that’s (10^(1/2))^(1/2) = 10^(1/4), and then again: (10^(1/4))^(1/2) = 10^(1/8), and then again: (10^(1/8))^(1/2) = 10^(1/16), so we get 10^(1/2), 10^(1/4), 10^(1/8), 10^(1/16), 10^(1/32) and so on and so on. All the way down. Well… Not all the way down. In fact, in the table above, we stop after ten iterations already, so that’s when x = 1/1024. [Note that 1/1024 is 2 to the power minus 10: 2^(–10) = 1/2¹⁰ = 1/1024. I am just throwing that in here because that little ‘fact’ will come in handy later.]

Why do we stop after ten iterations? Well… Actually, there’s no real good reason to stop at exactly ten iterations. We could have 15 iterations: then x would be 1/2¹⁵ = 1/32768. Or 20 (x = 1/1048576). Or 39 (x = 1/too many digits to write down). Whatever. However, we start to notice something interesting that actually allows us to stop. We note that 10 to the power x (10^x) tends to one as x becomes very small.

Now you’re laughing. Well… Surely ! That’s what we’d expect, isn’t it? 10⁰= 1. Is that the grand conclusion?

No.

The question is how small should x be? That’s where the fourth column of the table above comes in. We’re calculating a number there that converges to some value quite near to 2.3 as x goes to zero and – importantly – it converges rather quickly. In fact, if you’d do the calculations yourself, you’d see that it converges to 2.302585 after a while. [With Excel or some similar application, you can do 20 or more iterations in no time, and so that’s what you’ll find.]
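Since the original table is no longer shown here, the following sketch rebuilds its main columns: x, the successive square roots 10^x, and the ratio (10^x – 1)/x. It is my own reconstruction, not Feynman’s table itself; the function name successive_roots and the choice of twenty iterations are mine.

```python
import math

def successive_roots(base=10.0, iterations=20):
    """Take square roots in succession and track the ratio (base^x - 1)/x."""
    rows = []
    value = base
    for n in range(1, iterations + 1):
        value = math.sqrt(value)  # value is now base^(1/2^n)
        x = 1.0 / 2 ** n
        rows.append((x, value, (value - 1.0) / x))
    return rows

rows = successive_roots()
for x, value, ratio in rows:
    print(f"x = 1/{round(1 / x):<8d} 10^x = {value:.6f}   (10^x - 1)/x = {ratio:.6f}")

print(f"for comparison, ln(10) = {math.log(10):.6f}")
```

The ratio in the last column settles down near the natural logarithm of 10, which is why the script prints math.log(10) for comparison.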

Of course, we can keep going and continue adding zillions of decimals to this number but we don’t want to do that: 2.302585 is fine. We don’t need any more decimals. Why? Well… We’re going to use this number to approximate 10^x near x = 0: it turns out that we can get a real good approximation of 10^x near x = 0 using that 2.302585 factor, so we can write that

10^x ≈ 1 + 2.302585x

That approximation is the last column in the table above. In order to show you how good it is as an ‘approximation’, I’ve plotted the actual values for 10^x (blue markers) and the approximated values for 10^x (black markers) using that 1 + 2.302585x formula. You can see it’s a pretty good match indeed if x is small. And ‘small’ here is not that small: a ratio like x = 1/8 (i.e. x = 0.125) is good enough already! In fact, the graph below shows that 1/16 = 0.0625 is almost perfect! So we don’t need to ‘go down’ too far: ten iterations is plenty!

I’ve probably ‘lost’ you by now. What are we doing here really? How did we get that linear approximation formula, and why do we need it? Well… See the last column: we calculate (10^x – 1)/x, so that’s the difference between 10^x and 1 divided by the (fractional) exponent x and we see, indeed, that that number converges to a value very near to 2.302585. Why? Well… What we are actually doing is calculating the gradient of 10^x, i.e. the slope of the tangent line to the (non-linear) 10^x curve. That’s what’s shown in the graph below.

Working backwards, we can then re-write (10^x–1)/x ≈ 2.302585 as 10^x≈ 1 + 2.302585x indeed.
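To put a number on how good that linear formula is, one can compare it with the library value of 10^x for the same fractions x = 1/2, 1/4, … 1/1024. A minimal sketch follows; the names SLOPE, linear_approx and relative_error are mine.

```python
SLOPE = 2.302585  # the gradient value quoted in the text

def linear_approx(x):
    """Tangent-line approximation of 10^x near x = 0."""
    return 1.0 + SLOPE * x

def relative_error(x):
    exact = 10.0 ** x
    return abs(linear_approx(x) - exact) / exact

# One entry per halving: n = 1 is x = 1/2, n = 10 is x = 1/1024.
errors = {n: relative_error(1.0 / 2 ** n) for n in range(1, 11)}

for n, err in errors.items():
    print(f"x = 1/{2 ** n:<5d} relative error = {err:.2e}")
```

The error shrinks every time x is halved. How small is small enough depends, of course, on how many decimals you care about.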

So what we’ve got here is quite standard: we know we can approximate a non-linear curve with a linear curve, using the gradient near the point that we’re observing (and so that’s near the point x = 0 in this case) and so that‘s what we’re doing here.

Of course, you should remember that we cannot actually plot a smooth curve like that, for the moment that is, because we can only calculate 10^x for rational real numbers. However, it’s easy to generalize and just ‘fill the gaps’ so to speak, and so that’s how irrational powers are defined really.

Hmm… So what’s the next step? Well… The next step is not to continue and continue and continue and continue etcetera to show that the smooth curve above is, indeed, the graph of 10^x. No. The next step is to use that linear approximation to algebraically calculate the value of 10^(is), so that’s a power of 10 with a complex exponent.

HUH!?

Yes. That’s the gem I found in Feynman’s 1965 Lectures. [Well… One of the gems, I should say. There are many. :-)]

It’s quite interesting. In his little chapter on ‘algebra’ (Lectures, I-22), Feynman just assumes that this ‘law’ that 10^x ≈ 1 + 2.302585x is not only ‘correct’ for small real fractions x but also for very small complex fractions, and then he just reverses the procedure above to calculate 10^(ix) for larger values of x. Let’s see how that goes.

However, let’s first switch the variable from x to s, because we’re talking complex numbers now. Indeed, I can’t use the symbol x as I used it above anymore because x is now the real part of some complex number 10^(is). In addition, I should note that Feynman introduces this delta (Δ). The idea behind it is to make things somewhat easier to read by relating s to an integer: Δ = 1024s, so Δ = 1, 2, 4, 8,… 1024 for s = 1/1024, 1/512, 1/256 etcetera (see the second column in the table below). I am not entirely sure why he does that: Feynman must think fractions are harder to ‘read’. [Frankly, the introduction of this Δ makes Feynman’s exposé somewhat harder to ‘read’ IMHO – but that’s just a matter of taste, I guess.] Of course, the approximation then becomes

10^s ≈ 1 + 2.302585·Δ/1024 = 1 + 0.0022486Δ.

The table below is the one that Feynman uses. The important thing is that you understand the first line in this table: 10^(i/1024) = 1 + 0.00225i·Δ = 1 + 0.00225i·1 = 1 + 0.00225i. And then we go to the second line: 10^(i/512) = 10^(i/1024)·10^(i/1024) = 10^(2i/1024), so we’re doing the reverse thing here: we don’t take square roots but we square what we’ve found already. So we multiply 1 + 0.00225i with itself and get (1 + 0.00225i)(1 + 0.00225i) = 1 + 2·0.00225i + 0.00225²i² = 1 – 0.000005 + 0.0045i = 0.999995 + 0.0045i ≈ 1 + 0.0045i.

Let’s go to the third line now. In fact, what we’re doing here is working our way back up, i.e. all the way from s = 1/1024 to s = 1. And that’s where the ‘magic’ of i (i.e. the fact that i² = –1) is starting to show: (0.999995 + 0.0045i)² = 0.99999 + 2·0.999995·0.0045i + 0.0045²i² = 0.99997 + 0.009i. So the real part of 10^(is) is changing as well – it is decreasing in fact! Why is that? Because of the term with the i² factor! [I write 0.99997 instead of 0.99996 because I round up here, while Feynman consistently rounds down.]

So now the game is clear: we take larger and larger exponents is (i/512, i/256, i/128, etcetera), and calculate 10^(is) by squaring the previous result. After ten iterations, we get the grand result for s = 1, i.e. for the exponent is = i:

10^(is) = –0.66928 + 0.74332i (more or less that is)
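The whole ‘working our way back up’ procedure fits in a few lines. The sketch below is my own rendering of it, not Feynman’s table: it starts from 1 + 2.302585·(i/1024) and squares ten times, without any intermediate rounding. Because nothing is rounded, and because the starting value has a modulus that is a tiny bit larger than one, the end result lands close to, but not exactly on, the figures quoted above. The library value of 10^i is printed for comparison.

```python
import cmath
import math

SLOPE = 2.302585  # gradient of 10^x at x = 0, as quoted in the text

# First point: 10^(i/1024) is approximated by 1 + SLOPE * (i/1024).
z = complex(1.0, SLOPE / 1024)
steps = [(1024, z)]

# Squaring doubles the exponent: i/1024 -> i/512 -> ... -> i/1.
denominator = 1024
while denominator > 1:
    z = z * z
    denominator //= 2
    steps.append((denominator, z))

for denominator, value in steps:
    print(f"10^(i/{denominator}) ~ {value.real:+.5f} {value.imag:+.5f}i")

reference = cmath.exp(1j * math.log(10))  # library value of 10^i
print(f"library value:  {reference.real:+.5f} {reference.imag:+.5f}i")
```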

Note the minus sign in front of the real part, and look at the intermediate values for x and y too. Isn’t that remarkable?

OK. Waw ! But… So what? What’s next?

Well… To graph 10^(is), we should not just keep squaring things because that amounts to doubling the exponent again and again and so that means the argument is just making larger and larger jumps along the positive real axis really (see that graph that I made above: the distance between the successive values of x gets larger and larger, and so that’s a bad recipe for a smooth graph).

So what can we do? Well… We should just take a sufficiently small power, i/8 for example, and multiply that with 1, 2, 3 etcetera so we get something more ‘regular’. That’s what’s done in the table below and what’s represented in the graph underneath (to get the scale of the horizontal axis, note that s = p/8).

Hey! Look at that! There we are! That’s the graph we were looking for: it shows a (complex) exponential (10^(is)) as a periodic (complex-valued) function with the real part behaving like a cosine function and the imaginary part behaving like a sine function.
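The table and graph referred to here can be regenerated along the following lines. Again, this is a sketch under my own assumptions: I obtain 10^(i/8) by squaring the starting value 1 + 2.302585·(i/1024) seven times, I then multiply by it repeatedly (p = 0, 1, 2, … 24, so s = p/8 runs from 0 to 3), and I compare the result with the library cosine and sine of s·ln(10).

```python
import math

SLOPE = 2.302585  # gradient of 10^x at x = 0, as quoted in the text

# Build an approximation of 10^(i/8): seven squarings take i/1024 to i/8.
step = complex(1.0, SLOPE / 1024)
for _ in range(7):
    step = step * step

# Multiply by 10^(i/8) again and again: the exponent grows in equal steps.
points = []
z = complex(1.0, 0.0)  # 10^0
for p in range(25):
    s = p / 8
    points.append((s, z.real, z.imag))
    z = z * step

for s, re, im in points:
    angle = s * math.log(10)
    print(f"s = {s:5.3f}  x = {re:+.4f} (cos: {math.cos(angle):+.4f})  "
          f"y = {im:+.4f} (sin: {math.sin(angle):+.4f})")
```

Equal steps in s give evenly spaced points, which is what a smooth graph needs.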

Note the upper and lower bounds: +1 and –1. Indeed, it doesn’t seem to matter whether we use 10 or e as a base: the x and y part oscillate between −1 and +1. So, whatever the base, we’ll see the same pattern: the base only changes the scale of the horizontal axis (i.e. s). However, that being said, because of this scale factor, I do need to say like a cosine/sine function when discussing that graph above. So I cannot say they are a cosine and a sine function. Feynman calls these functions algebraic sine and cosine functions.

But – remember! – we can always switch base through a clever substitution so 10^(is) = e^(it) and recalculate stuff to whatever number of decimals behind the decimal point we’d want. So let’s do that: let’s switch to base e. WOW! What happens?

We then [Finally! you’ll say!] get values that – Surprise ! Surprise ! – correspond to the real cosine and sine function. That then, in turn, allows us to just replace the ‘algebraic’ cosine and sine functions with the ‘real’ cosine and sine in an equation that – Yes! – is Euler’s formula itself:

e^(it) = cos(t) + i·sin(t)
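In base e the gradient at zero is just 1, so the starting value becomes 1 + i·t/N for some large N, and repeated squaring takes us back up to the exponent i·t. The sketch below uses N = 2²⁰; that number, and the function name exp_i, are my own choices. It compares the outcome with the library cosine and sine.

```python
import math

def exp_i(t, doublings=20):
    """Approximate e^(it): start from 1 + i*t/2^doublings, then square repeatedly."""
    z = complex(1.0, t / 2 ** doublings)
    for _ in range(doublings):
        z = z * z
    return z

for t in (0.5, 1.0, math.pi):
    z = exp_i(t)
    print(f"t = {t:.4f}: x = {z.real:+.6f} (cos t = {math.cos(t):+.6f}), "
          f"y = {z.imag:+.6f} (sin t = {math.sin(t):+.6f})")
```

For t = π, the real part comes out very near –1 and the imaginary part very near 0, which is the famous special case of the formula.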

So that’s it. End of story.

[…]

You’ll say: So what? Well… Not sure what to say. I think this is rather remarkable. This is not the formal mathematical proof of Euler’s formula (at least not of the kind that you’ll find in a textbook or on Wikipedia). No, we are just calculating the values x and y of e^(it) = x + iy using an approximation process used to calculate real powers and then, well… Just some bold assumption involving infinitesimals really.

I think this is amazing stuff (even if I’ll downplay that statement a bit in my post scriptum). I really don’t understand these things the way I would like to understand them. I guess I just haven’t got the right kind of brain for these things. 😦 Indeed, just think about it: when we have the real exponential e^x, then we’ve got that typical ‘rocket’ graph (i.e. the blue one in the graph below): just something blasting away indeed. But when we put i in the exponent (e^(ix)), then we get two components oscillating up and down like the cosine and sine function. Well… Not only like the cosine and sine function: the green and red line – i.e. the real and imaginary part of e^(ix)! – actually are the cosine and sine function!

Do you understand this in an intuitive way? Yes? You do? Waw ! Please write me and tell me how. I don’t. 😦

Oh well… The good thing about it is… Well… At least complex numbers will always stay ‘magical’ to me. 🙂

Post scriptum: When I write, above, that I don’t understand this in an intuitive way, I don’t mean to say it’s not logical. In fact, it is. It has to be, of course, because we’re talking math here! 🙂

The logic is pretty clear indeed. We have an exponential function here (y = 10^x) and we’re evaluating that function in the neighborhood of x = 0 (we do it on the positive side only but we could, of course, do the same analysis on the other side as well). So then we use that very general mathematical procedure of calculating approximate values for the (non-linear) 10^x curve using the gradient. So we plug in some differential value for x (in differential terms, we’d write Δx – but so the delta symbol here has nothing to do with Feynman’s Δ above) and, of course, we find Δy = 2.302585·Δx. So we add that to 1 (the value of 10^x at point x = 0) and, then, we go through these iterations, not using that linear equation any more, but the very fundamental property of an exponential function that 10^(2x) = (10^x)². So we start with an approximate value, but then the value we plug into these iterative calculations is the square of the previous value. So, to calculate the next points, we do not use an approximation method any more, but we just square the first result, and then the second and so on and so on, and that’s just calculation, not approximation.

[In fact, you may still wonder and think that it’s quite remarkable that the points we calculate using this process are so accurate, but that’s due to the rapid convergence of that value we found for the gradient. Well… Yes and no. Here I must admit that Feynman (and I) cheated a bit because we used a rather precise value for the gradient: 2.302585, so that’s six significant digits after the decimal point. Now, that value is actually calculated based on twenty (rather than 10) iterations when ‘going down’. But that little factoid is not embarrassing because it doesn’t change much: the argument itself is sound. Very sound.]

OK… That’s easy enough to understand. The thing that is not easy to understand – intuitively that is – is that we can just insert some complex differential Δs into that Δy = 2.302585·Δx equation. Isn’t it ‘weird’, indeed, that we can just use a complex fraction i·s = i/1024 to calculate our first point, instead of a real fraction x = 1/1024? It is. That’s the only thing really. Indeed, once we’ve done that, it’s plain sailing again: we just square the result to get the next result, and then we square that again, and so on and so on. However, that being said, the difference is that the ‘magic’ of i comes into play indeed. When squaring, we do not get a plain a² result but (a + bi)² = a² – b² + 2abi. So it’s that minus sign and the i that give an entirely different ‘dynamic’ to how the function evolves from there (i.e. different as compared to working with a real exponent only). It’s all quite remarkable really because we start off with a really tiny value b here: 0.00225 to be precise, so that’s roughly 1/445 ! [Of course, the real part a, at the point from where we start doing these iterations, is one.]

But so that first step is ‘weird’ indeed. Why is it no problem whatsoever to insert the complex fraction i·s = i/1024 into 1 + 2.302585·s, instead of the real fraction 1/1024, and then afterwards, to square these complex numbers that we’re getting, instead of real numbers?

It just doesn’t feel right, does it? I must admit that, at first, I felt that Feynman was doing something ‘illegal’ too. But, obviously, he’s not. It’s plain mathematical logic. We have two functions here: one is linear (y = 1 + 2.302585·x), and the other is quadratic (y = x²) and so what’s happening really is that, at the point x = 0, we change the function. What we replace is not really x by ix, but the function y = 10^x by the function y = 10^(ix). So we still have an independent real variable x but, instead of a real-valued y = 10^x function, we now have a complex-valued y = 10^(ix) function.

However, the ‘output’ of that function, of course, is a complex y, not a real y. In our case, because we’re plotting a function really–to be precise, we’re calculating the exponential function y = 10^(ix) through all these iterations–we get a complex-valued function of the shape that, by now, we know so well.

So it is ‘discontinuous’ in a way, and so I can’t say all that much about it. Look at the graph below where, once again, we have the real exponential function e^x and then the two components of the complex exponential e^(ix). This time, I’ve plotted them on both sides of the zero point because they’re continuous on both sides indeed. Imagine we’re walking along this blue e^x curve from some negative x to zero. We’re familiar with the path. It has, for instance, that property we exploited above: as we doubled the ‘input’ (so from x we went to 2x), the ‘output’ went up not as the double but as the square of the original value: e^(2x) = (e^x)². And then we also know that, around the point x = 0, we can approximate it with a linear function. In fact, in this case, the linear approximation is super-simple: y = 1 + x. Indeed, the gradient for e^x at point x = 0 is equal to 1! So, yes, we know and understand that blue curve. But then we arrive at point x = 0 and we decide something radical: we change the function!

Yes. That’s what we’re really doing in that very lengthy story above: e^(ix) is a complex-valued function of the real variable x. That’s something different. However, we continue to say that the approximation y = 1 + x must also be valid for complex x and y. So we say that e^(ix) ≈ 1 + ix. Is that wrong? No. Not at all. Functional forms are functional forms and gradients are gradients: d(e^(ix))/dx = ie^(ix), and ie^(ix) at x = 0 is equal to ie⁰ = i! Hence, e^(ix) ≈ 1 + ix is a perfectly legitimate linear approximation. And then it’s just the same thing again: we use that iteration mechanism to calculate successive squares of complex numbers because, for complex exponentials as well, we have e^(2ix) = (e^(ix))².
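That gradient can be checked with a finite difference too. The following is a minimal sketch using Python’s cmath library; the step size h and the test value 0.001 are arbitrary choices of mine.

```python
import cmath

def f(x):
    """The complex-valued function e^(ix) of the real variable x."""
    return cmath.exp(1j * x)

h = 1e-6  # small real step
slope_at_zero = (f(h) - f(-h)) / (2 * h)  # central-difference gradient at x = 0

small_x = 0.001
linear = 1 + 1j * small_x  # the linear approximation 1 + ix

print(f"gradient of e^(ix) at x = 0: {slope_at_zero}")
print(f"e^(ix) at x = {small_x}: {f(small_x)}")
print(f"1 + ix at x = {small_x}: {linear}")
```

The gradient comes out as a purely imaginary number, which is the ‘radical change’ in a nutshell: the real exponential leaves x = 0 with a real slope, the complex one with an imaginary slope.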

So. The ‘magic’ is a lot of ‘confusion’ really. The point to note is that we do have a different function here: e^(ix) and e^x ‘look’ similar – it’s just that i, right? – but, in fact, when we replace x by ix in the exponent of e, that’s quite a radical change. We can use the same linear approximation at x = ix = 0 but then it’s over. Our blue graph stops: we’re no longer walking along it. I can’t even say it bifurcates, so to say, into the red and the green one, because it doesn’t. We’re talking apples and oranges indeed, and so the comparison is quickly done: they’re different. Full stop.

Is there any geometrical relationship between all these curves? Well… Yes and no. I can see one, at the very start: the gradient of our e^x function at x = 0 is equal to unity (i.e. 1), and so that’s the same gradient as the gradient of the imaginary part of our new e^(ix) function (the gradient of the real part is zero, before it becomes negative). But that’s just… I mean… That just comes out of Euler’s formula: e⁰ = cos(0) + i·sin(0). Honestly, it’s no use to try to be smart here and think about stuff like that. We’re no longer walking on the blue curve. We’re looking at a new function: a complex-valued function e^(ix) (instead of a real-valued function e^x) of a real variable (x). That’s it. Just don’t try to relate the two too much: you switched functions. Full stop. It’s like changing trains! 🙂

So… What’s the conclusion? Well… I’d say: “Complex numbers can be analyzed as extensions of real numbers, so to say, but – frankly – they are different.”

[…]

I’ll probably never understand complex numbers in the way I would like to understand them–that is like I understand that one plus one is two. However, this rather lengthy forage in the complex forest has helped me somewhat. I hope it helped you too.

Some content on this page was disabled on June 16, 2020 as a result of a DMCA takedown notice from The California Institute of Technology. You can learn more about the DMCA here:

https://wordpress.com/support/copyright-and-the-dmca/

Differential equations revisited: the math behind oscillators

Pre-scriptum (dated 26 June 2020): This post – part of a series of rather simple posts on elementary math and physics – does not seem to have been targeted in the attack by the dark force, which is good because I still like it. While my views on the true nature of light, matter and the force or forces that act on them have evolved significantly as part of my explorations of a more realist (classical) explanation of quantum mechanics, I think most (if not all) of the analysis in this post remains valid and fun to read. In fact, I would dare to say the whole Universe consists of oscillators!

Original post:

When wrapping up my previous post, I said that I might be tempted to write something about how to solve these differential equations. The math behind them is pretty essential indeed. So let’s revisit the oscillator from a formal-mathematical point of view.

Modeling the problem

The simplest equation we used was the one for a hypothetical ‘ideal’ oscillator without friction and without any external driving force. The equation for a mechanical oscillator (i.e. a mass on a spring) is md²x/dt² = –kx. The k in this equation is a factor of proportionality: the force pulling back is assumed to be proportional to the amount of stretch, and the minus sign is there because the force is pulling back indeed. As for the equation itself, it’s just Newton’s Law: the mass times the acceleration equals the force: ma = F.

You’ll remember we preferred to write this as d²x/dt² = –(k/m)x = –ω₀²x with ω₀² = k/m. You’ll also remember that ω₀ is an angular frequency, which we referred to as the natural frequency of the oscillator (because it determines the natural motion of the spring indeed). We also gave the general solution to the differential equation: x(t) = x₀cos(ω₀t + Δ). That solution basically states that, if we just let go of that spring, it will oscillate with frequency ω₀ and some (maximum) amplitude x₀, the value of which depends on the initial conditions. As for the Δ term, that’s just a phase shift depending on where x is when we start counting time: if x happened to pass through the equilibrium point at time t = 0, then Δ would be π/2. So Δ allows us to shift the beginning of time, so to speak.

In my previous posts, I just presented that general equation as a fait accompli, noting that a cosine (or sine) function does indeed have that ‘nice’ property of coming back to itself with a minus sign in front after taking the derivative two times: d²[cos(ω₀t)]/dt² = –ω₀²cos(ω₀t). We could also write x(t) as a sine function because the sine and cosine function are basically the same except for a phase shift: x₀cos(ω₀t + Δ) = x₀sin(ω₀t + Δ + π/2).
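If you would rather check that ‘coming back to itself with a minus sign’ property numerically than take it on faith, here is a minimal Python sketch. The values for x₀, ω₀ and Δ are arbitrary assumptions, and the second derivative is only a finite-difference estimate, so the residual is small rather than exactly zero.

```python
import math

def x(t, x0=2.0, omega0=1.5, delta=0.3):
    """Proposed solution x(t) = x0*cos(omega0*t + delta); parameter values are arbitrary."""
    return x0 * math.cos(omega0 * t + delta)

def second_derivative(f, t, h=1e-4):
    """Central finite-difference estimate of f''(t)."""
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / (h * h)

def residual(t, omega0=1.5):
    """x''(t) + omega0^2 * x(t), which should vanish if x solves x'' = -omega0^2 x."""
    return second_derivative(x, t) + omega0 ** 2 * x(t)

for t in (0.0, 0.7, 2.0):
    print(t, residual(t))  # tiny numbers, limited by the finite-difference error
```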

Now, the point to note is that the sine or cosine function actually has two properties that are ‘nice’ (read ‘essential’ in the context of this discussion):

Sinusoidal functions are periodic functions and so that’s why they represent an oscillation–because that’s something periodic too!
Sinusoidal functions come back to themselves (with a minus sign in front) when we differentiate them two times and so that’s why they effectively solve our second-order differential equation.

However, in my previous post, I also mentioned in passing that sinusoidal functions share that second property with exponential functions: d²e^t/dt² = d[de^t/dt]/dt = de^t/dt = e^t. So, if we had not had that minus sign in our differential equation, our solution would have been some exponential function, instead of a sine or a cosine function. So what’s going on here?

Solving differential equations using exponentials

Let’s scrap that minus sign and assume our problem would indeed be to solve the d²x/dt² = ω₀²x equation. So we know we should use some exponential function, but we have that coefficient ω₀². Well… That’s actually easy to deal with: we know that, when differentiating an exponential function, we should bring the coefficient in the exponent down as a factor: d[e^ω₀t]/dt = ω₀e^ω₀t. If we do it two times, we get d²[e^ω₀t]/dt² = ω₀²e^ω₀t, so we can immediately see that e^ω₀t is a solution indeed.

But it’s not the only one: e^–ω₀t is a solution too: d²[e^–ω₀t]/dt² = (–ω₀)(–ω₀)e^–ω₀t = ω₀²e^–ω₀t. So e^–ω₀t solves the equation too. It is easy to see why: ω₀² has two square roots, one positive and one negative.

But we have more: in fact, every linear combination c₁e^ω₀t + c₂e^–ω₀t is also a solution to that second-order differential equation. Just check it by writing it all out: you’ll find that d²[c₁e^ω₀t + c₂e^–ω₀t]/dt² = ω₀²[c₁e^ω₀t + c₂e^–ω₀t] and so, yes, we have a whole family of functions here, that are all solutions to our differential equation.

Now, you may or may not remember that we had the same thing with first-order differential equations: we would find a whole family of functions, but only one would be the actual solution or the ‘real‘ solution I should say. So what’s the real solution here?

Well… That depends on the initial conditions: we need to know the value of x at time t₀= 0 (or some other point t = t₁). And that’s not enough: we have two coefficients (c₁and c₂), and, therefore, we need one more initial condition (it takes two equations to solve for two variables). That could be another value for x at some other point in time (e.g. t₂) but, when solving problems like this, you’ll usually get the other ‘initial condition’ expressed in terms of the first derivative, so that’s in terms of dx/dt = v. For example, it is not illogical to assume that the initial velocity v₀ would be zero. Indeed, we can imagine we pull or push the spring and then let it go. In fact, that’s what we’ve been assuming here all along in our example! Assuming that v₀ = 0 is equivalent to writing that

d[c₁e^ω₀t+ c₂e^–ω₀t]/dt = 0 for t = 0

⇒ ω₀c₁– ω₀c₂= 0 (e⁰= 1) ⇔ c₁= c₂

Now we need the other initial condition. Let’s assume the initial value of x is equal to x₀= 2 (it’s just an example: we could take any value, including negative values). Then we get:

c₁e^ω₀t+ c₂e^–ω₀t = 2 for t = 0 ⇔ c₁+ c₂= 2 (again, note that e⁰= 1)

Combining the two gives us the grand result that c₁ = c₂ = 1 and, hence, the ‘real’ or actual solution is x = e^ω₀t + e^–ω₀t. The graph below plots that function for ω₀ = 1 and ω₀ = 0.5 respectively. We could take other values for ω₀ but, whatever the value, we’ll always get an exponential function like the ones below. It basically graphs what we expect to happen: the mass just accelerates away from its equilibrium point. Indeed, the differential equation is just a description of an accelerating object: the e^–ω₀t term quickly goes to zero, and then it’s the e^ω₀t term that rockets that object sky-high – literally. [Note that the acceleration is actually not constant: the force is equal to kx and, hence, the force (and, therefore, the acceleration) actually increases as the mass goes further and further away from its equilibrium point. Also note that if the initial position would have been minus 2, i.e. x₀ = –2, then the object would accelerate away in the other direction, i.e. downwards. Just check it to make sure you understand the equations.]
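The two initial conditions can be turned into a small Python sketch. It solves c₁ + c₂ = x₀ and ω₀(c₁ – c₂) = v₀ for the two coefficients; the function names are my own, and the default values are just the example used above (x₀ = 2, v₀ = 0, ω₀ = 1).

```python
import math

def coefficients(x0, v0, omega0):
    """Solve c1 + c2 = x0 and omega0*(c1 - c2) = v0 for c1 and c2."""
    c1 = 0.5 * (x0 + v0 / omega0)
    c2 = 0.5 * (x0 - v0 / omega0)
    return c1, c2

def x(t, x0=2.0, v0=0.0, omega0=1.0):
    """Particular solution c1*e^(omega0*t) + c2*e^(-omega0*t) of x'' = omega0^2 x."""
    c1, c2 = coefficients(x0, v0, omega0)
    return c1 * math.exp(omega0 * t) + c2 * math.exp(-omega0 * t)

print(coefficients(2.0, 0.0, 1.0))   # both coefficients equal 1 for this example
print(x(0.0), x(1.0), x(3.0))        # starts at 2 and then runs away
```

With x₀ = –2 instead, both coefficients become –1 and the same run-away happens in the other direction.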

The point to note is our general solution. More formally, and more generally, we get it as follows:

If we have a linear second-order differential equation ax” + bx’ + cx = 0 (because of the zero on the right-hand side, we call such an equation homogeneous, so it’s quite a mouthful: a linear and homogeneous DE of the second order), then we can find an exponential function e^rt that will be a solution for it.
If such a function is a solution, then plugging it in yields ar²e^rt + bre^rt + ce^rt = 0 or (ar² + br + c)e^rt = 0.
Now, we can read that as a condition and, because e^rt is never zero, the condition amounts to ar² + br + c = 0. So that’s a quadratic equation we need to solve for r to find two specific solutions r₁ and r₂, which, in turn, will then yield our general solution:

x(t) = c₁e^r₁t+ c₂e^r₂t

Note that the general solution is based on the principle of superposition: any linear combination of two specific solutions will be a solution as well. I am mentioning this here because we’ll use that principle more than once.
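The recipe above can be sketched in a few lines of Python. The function name is an assumption of mine; I use cmath.sqrt so the same code does not break when the expression under the square root happens to be negative.

```python
import cmath

def characteristic_roots(a, b, c):
    """Roots r1, r2 of a*r^2 + b*r + c = 0, returned as complex numbers."""
    disc = cmath.sqrt(b * b - 4 * a * c)
    return (-b + disc) / (2 * a), (-b - disc) / (2 * a)

# x'' - omega0^2 x = 0 with omega0 = 1, i.e. a = 1, b = 0, c = -1
print(characteristic_roots(1, 0, -1))   # the two roots +1 and -1

# x'' + 3x' + 2x = 0
print(characteristic_roots(1, 3, 2))    # the two roots -1 and -2
```

The general solution is then c₁e^r₁t + c₂e^r₂t with those two roots.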

Complex roots

The steps as described above implicitly assume that the quadratic equation above (i.e. ar² + br + c = 0), which is better known as the characteristic equation, does yield two real and distinct roots r₁ and r₂. In fact, it amounts to assuming that that exponential e^rt is a real-valued exponential function. We know how to find these real roots from our high school math classes: r = (–b ± [b² – 4ac]^1/2)/2a. However, what happens if the discriminant b² – 4ac is negative?

If the discriminant is negative, we will still have two roots, but they will be complex roots. In fact, we can write these two complex roots as r = α ± βi, with i the imaginary unit. Hence, the two complex roots are each other’s complex conjugate and our e^r₁t and e^r₂t can be written as:

e^r₁t= e^(α+βi)t and e^r₂t= e^(α–βi)t

Also, the general solution based on these two particular solutions will be c₁e^(α+βi)t+ c₂e^(α–βi)t.

[You may wonder why complex roots have to be complex conjugates of each other. Indeed, that’s not so obvious from the raw r = (–b ± [b² – 4ac]^1/2)/2a formula. But you can re-write it as r = –b/2a ± [b² – 4ac]^1/2/2a and, if b² – 4ac is negative, as r = –b/2a ± i·[(4ac – b²)^1/2/2a]. So that gives you the α and β, i.e. α = –b/2a and β = (4ac – b²)^1/2/2a, and shows that the two roots are, in effect, each other’s complex conjugate.]

We should briefly pause here to think about what we are really doing here: if we allow r to be complex, then what we’re really doing is allowing a complex-valued function (to be precise: we’re talking about the complex exponential functions e^(α±βi)t, or any linear combination of the two) of a real variable (the time variable t) to be part of our ‘solution set’ as well.
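As a small illustration of the re-written formula, here is a Python sketch that returns α and β directly for real coefficients a, b and c. The function name and the example equation x” + 2x’ + 5x = 0 are my own choices, not something from the posts.

```python
import math

def alpha_beta(a, b, c):
    """Real and imaginary part of the roots alpha ± beta*i when b^2 - 4ac < 0."""
    disc = b * b - 4 * a * c
    if disc >= 0:
        raise ValueError("discriminant is not negative: the roots are real")
    alpha = -b / (2 * a)
    beta = math.sqrt(-disc) / (2 * a)
    return alpha, beta

alpha, beta = alpha_beta(1, 2, 5)    # x'' + 2x' + 5x = 0
print(alpha, beta)                   # roots are -1 ± 2i

r = complex(alpha, beta)
print(r * r + 2 * r + 5)             # plugging the root back in gives zero
```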

Now, we’ve analyzed complex exponential functions before–long time ago: you can check out some of my posts last year (November 2013). In fact, we analyzed even more complex – in fact, I should say more complicated rather than more complex here: complex numbers don’t need to be complicated! 🙂 – because we were talking complex-valued functions of complex variables there! That’s not the case here: the argument t (i.e. the input into our function) is real, not complex, but the output – or the function itself – is complex-valued. Now, any complex exponential e^(α+βi)t can be written as e^αte^iβt, and so that’s easy enough to understand:

1. The first factor (i.e. e^αt) is just a real-valued exponential function and so we should be familiar with that. Depending on the value of α (negative or positive: see the graph below), it’s a factor that will create an envelope for our function. Indeed, when α is negative, the damping will cause the oscillation to stop after a while. When α is positive, we’ll have a solution resembling the second graph below: we have an amplitude that’s getting bigger and bigger, despite the friction factor (that’s obviously possible only because we keep reinforcing the movement, so we’re not switching off the force in that case). When α is equal to zero, then e^αt is equal to unity and so the amplitude will not change as the spring goes up and down over time: we have no friction in that case.

2. The second factor (i.e. e^iβt) is our periodic function. Indeed, e^iβt is the same as e^iθ and so just remember Euler’s formula to see what it is really:

e^iθ = cos(θ) + i·sin(θ)

The two graphs below represent the idea: as the phase θ = ωt + Δ (the angular frequency or velocity times the time is equal to the phase, plus or minus some phase shift) goes round and round and round (i.e. increases with time), the two components of e^iθ, i.e. the real and imaginary part of e^iθ, oscillate between –1 and 1 because they are both sinusoidal functions (cosine and sine respectively). Now, we could amplify the amplitude by putting another (real) factor in front (a magnitude different than 1) and write re^iθ = r·cos(θ) + i·r·sin(θ) but that wouldn’t change the nature of this thing.

But so how does all of this relate to that other ‘general’ solution which we’ve found for our oscillator, i.e. the one we got without considering these complex-valued exponential functions as solutions? Indeed, what’s the relation between that x = x₀cos(ω₀t + Δ) equation and that rather frightening c₁e^(α+βi)t + c₂e^(α–βi)t equation? Perhaps we should look at x = x₀cos(ω₀t + Δ) as the real part of that monster? Yes and no. More no than yes actually. Actually… No. We are not going to have some complex exponential and then forget about the imaginary part. What we will do, though, is to find that general solution – i.e. a family of complex-valued functions – but then we’ll only consider those functions for which the imaginary part is zero, so that’s the subset of real-valued functions only.

I guess this must sound like Chinese. Let’s go step by step.

Using complex roots to find real-valued functions

If we re-write d²x/dt² = –ω₀²x in the more general ax” + bx’ + cx = 0 form, then we get x” + ω₀²x = 0 and so the discriminant b²– 4ac is equal to –4ω₀², and so that’s a negative number. So we need to go for these complex roots. However, before solving this, let’s first restate what we’re actually doing. We have a differential equation that, ultimately, depends on a real variable (the time variable t), but so now we allow complex-valued functions e^r₁t= e^(α+βi)t and e^r₂t= e^(α–βi)t as solutions. To be precise: these are complex-valued functions x of the real variable t.

That being said, it’s fine to note that real numbers are a subset of the complex numbers and so we can just shrug our shoulders and say all that we’re doing is switch to complex-valued functions because we got stuck with that negative discriminant and so we had to allow for complex roots. However, in the end, we do want a real-valued solution x(t). So our x(t) = c₁e^(α+βi)t + c₂e^(α–βi)t has to be a real-valued function, not a complex-valued function.

That means that we have to take a subset of the family of functions that we’ve found. In other words, the imaginary part of c₁e^(α+βi)t + c₂e^(α–βi)t has to be zero. How can it be zero? Well… It basically means that c₁e^(α+βi)t and c₂e^(α–βi)t have to be complex conjugates.

OK… But how do we do that? We need to find a way to write that c₁e^(α+βi)t + c₂e^(α–βi)t sum in a more manageable ζ + i·η form. We can do that by using Euler’s formula once again to re-write those two complex exponentials as follows:

e^(α+βi)t = e^αte^iβt = e^αt[cos(βt) + isin(βt)]
e^(α–βi)t = e^αte^–iβt = e^αt[cos(–βt) + isin(–βt)] = e^αt[cos(βt) – isin(βt)]

Note that, for the e^(α–βi)t expression, we’ve used the fact that cos(–θ) = cos(θ) and that sin(–θ) = –sin(θ). Also note that α and β are real numbers, so they do not have an imaginary part–unlike c₁and c₂, which may or may not have an imaginary part (i.e. they could be pure real numbers, but they could be complex as well).

We can then re-write that c₁e^(α+βi)t+ c₂e^(α–βi)t sum as:

c₁e^(α+βi)t+ c₂e^(α–βi)t = c₁e^αt[cos(βt) + isin(βt)] + c₂e^αt[cos(βt) – isin(βt)]

= (c₁ + c₂)e^αtcos(βt) + (c₁ – c₂)ie^αtsin(βt)

So what? Well, we want that imaginary part in our solution to disappear and so it’s easy to see that the imaginary part will indeed disappear if c₁ – c₂ = 0, i.e. if c₁= c₂= c. So we have a fairly general real-valued solution x(t) = 2c·e^αtcos(βt) here, with c some real number. [Note that c has to be some real number because, if we would assume that c₁and c₂(and, therefore, c) would be equal complex numbers, then the c₁ – c₂ factor would also disappear, but then we would have a complex c₁ + c₂ sum in front of the e^αtcos(βt) factor, so that would defeat the purpose of finding real-valued function as a solution because (c₁ + c₂)e^αtcos(βt) would still be complex! […] Are you still with me? :-)]

So, OK, we’ve got the solution and so that should be it, isn’t it? Well… No. Wait. Not yet. Because these coefficients c₁ and c₂ may be complex, there’s another solution as well. Look at that formula above. Let us suppose that c₁ would be equal to some (real) number c divided by i (so c₁= c/i), and that c₂would be its opposite, so c₂= –c₁(i.e. minus c₁). Then we would have two complex numbers consisting of an imaginary part only: c₁= c/i and c₂= –c₁= –c/i, and they would be each other’s complex conjugate. Indeed, note that 1/i = i^–1= –i and so we can write c₁= –c·i and c₂= c·i. Then we’d get the following for that c₁e^(α+βi)t+ c₂e^(α–βi)t sum:

(c₁ + c₂)e^αtcos(βt) + (c₁ – c₂)ie^αtsin(βt)

= (c/i – c/i)e^αtcos(βt) + (c/i + c/i)ie^αtsin(βt) = 2c·e^αtsin(βt)

So, while c₁and c₂ are complex, our grand result is a real-valued function once again or – to be precise – another family of real-valued functions (that’s because c can take on any value).

Are we done? Yes. There are no other possibilities. So now we just need to remember to apply the principle of superposition: any (real) linear combination of 2c·e^αtcos(βt) and 2c·e^αtsin(βt) will also be a (real-valued) solution, so the general (real-valued) solution for our problem is:

x(t) = a·2c·e^αtcos(βt) + b·2c·e^αtsin(βt) = Ae^αtcos(βt) + Be^αtsin(βt)

= e^αt[Acos(βt) + Bsin(βt)]
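Here is a hedged numerical check of that result in Python. It takes c₂ to be the complex conjugate of c₁, which covers both cases above and any real combination of them, and compares the complex sum with the real form e^αt[Acos(βt) + Bsin(βt)]. The identification A = 2·Re(c₁) and B = –2·Im(c₁) is my own bookkeeping, and the numbers are arbitrary.

```python
import cmath
import math

def complex_form(t, c1, alpha, beta):
    """c1*e^((alpha+beta*i)t) + c2*e^((alpha-beta*i)t) with c2 the conjugate of c1."""
    c2 = c1.conjugate()
    return (c1 * cmath.exp((alpha + 1j * beta) * t)
            + c2 * cmath.exp((alpha - 1j * beta) * t))

def real_form(t, c1, alpha, beta):
    """e^(alpha*t) * [A*cos(beta*t) + B*sin(beta*t)] with A = 2Re(c1), B = -2Im(c1)."""
    A = 2.0 * c1.real
    B = -2.0 * c1.imag
    return math.exp(alpha * t) * (A * math.cos(beta * t) + B * math.sin(beta * t))

c1, alpha, beta = 0.5 - 1.25j, -0.3, 2.0   # arbitrary example values
for t in (0.0, 0.8, 3.1):
    z = complex_form(t, c1, alpha, beta)
    print(t, z.imag, z.real - real_form(t, c1, alpha, beta))  # both differences are ~0
```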

So what do we have here? Well, the first factor is, once again, an ‘envelope’ function: depending on the value of α, (i) negative, (ii) positive or (iii) zero, we have an oscillation that (i) damps out, (ii) goes out of control, or (iii) keeps oscillating in the same steady way forever.

The second part is equivalent to our ‘general’ x(t) = x₀cos(ω₀t + Δ) solution. Indeed, that x(t) = x₀cos(ω₀t + Δ) solution is somewhat less ‘general’ than the one above because it does not have the e^αt factor. However, x(t) = x₀cos(ω₀t + Δ) solution is equivalent to the Acos(βt) + Bsin(βt) factor. How’s that? We can show how they are related by using the trigonometric formula for adding angles: cos(α + β) = cos(α)cos(β) – sin(α)sin(β). Indeed, we can write:

x₀cos(ω₀t + Δ) = x₀cos(Δ)cos(ω₀t) – x₀sin(Δ)sin(ω₀t) = Acos(βt) + Bsin(βt)

with A = x₀cos(Δ), B = – x₀sin(Δ) and, finally, β = ω₀
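Going the other way, from A and B back to x₀ and Δ, is just a matter of inverting those two relations. A minimal Python sketch (the function name is an assumption of mine):

```python
import math

def amplitude_phase(A, B):
    """Invert A = x0*cos(delta), B = -x0*sin(delta) to get (x0, delta)."""
    x0 = math.hypot(A, B)
    delta = math.atan2(-B, A)
    return x0, delta

A, B, omega0 = 1.0, -1.0, 2.0          # arbitrary example values
x0, delta = amplitude_phase(A, B)
for t in (0.0, 0.4, 1.9):
    lhs = x0 * math.cos(omega0 * t + delta)
    rhs = A * math.cos(omega0 * t) + B * math.sin(omega0 * t)
    print(t, lhs - rhs)                # differences are ~0
```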

Are you convinced now? If not… Well… Nothing much I can do, I feel. In that case, I can only encourage you to do a full ‘work-out’ by reading the excellent overview of all possible situations in Paul’s Online MathNotes (tutorial.math.lamar.edu/Classes/DE/Vibrations.aspx).

Feynman’s treatment of second-order differential equations

Feynman takes a somewhat different approach in his Lectures. He solves them in a much more general way. At first, I thought his treatment was too confusing and, hence, I would not have mentioned it. However, I like the logic behind it, even if his approach is somewhat messier in terms of notation and all that. Let’s first look at the differential equation once again. Let’s take a system with a friction factor that’s proportional to the speed: F_f = –c·dx/dt. [See my previous post for some comments on that assumption: the assumption is, generally speaking, too much of a simplification but it makes for a ‘nice’ linear equation and so that’s why physicists present it that way.] To ease the math, c is usually written as c = mγ. Hence, γ = c/m is the friction per unit of mass. That makes sense, I’d think. In addition, we need to remember that ω₀² = k/m, so k = mω₀². Our differential equation then becomes m·d²x/dt² = –γm·dx/dt – kx (mass times acceleration is the sum of the forces) or m·d²x/dt² + γm·dx/dt + mω₀²·x = 0. Dividing the mass factor away gives us an even simpler form:

d²x/dt² + γdx/dt + ω₀²x = 0

You’ll remember this differential equation from the previous post: we used it to calculate the (stored) energy and the Q of a mechanical oscillator. However, we didn’t show you how. You now understand why: the stuff above is not easy–the length of the arguments involved is why I am devoting an entire post to it!

Now, instead of assuming some exponential e^rt as a solution, real- or complex-valued, Feynman assumes a much more general complex-valued function as solution: he substitutes Ae^iαt for x, with A a complex number as well so we can write A as A = A₀e^iΔ. That more general assumption allows for the inclusion of a phase shift straight from the start. Indeed, we can write x as x = A₀e^iΔe^iαt = A₀e^i(αt+Δ). Does that look complicated? It probably does, because we also have to remember that α is a complex number! So we’ve got a very general complex-valued exponential function indeed here!

However, let’s not get ahead of ourselves and follow Feynman. So he plugs in that complex-valued x = Ae^iαt and we get:

(–α²+ iγα + ω₀²)Ae^iαt = 0

So far, so good. The logic now is more or less the same as the logic we developed above. We’ve got two factors here: (1) a quadratic equation –α²+ iγα + ω₀² (with one complex coefficient iγ) and (2) a complex exponential function Ae^iαt. The second factor (Ae^iαt) cannot be zero, because that’s x and we assume our oscillator is not standing still. So it’s the first factor (i.e. the quadratic equation in α with a complex coefficient iγ) which has to be zero. So we solve for the roots α and find

α = –iγ/(–2) ± i·[(–(iγ)²–4ω₀²)^1/2/(-2)] = iγ/2 ± i·[(γ²–4ω₀²)^1/2/(-2)]

= iγ/2 ± (ω₀²– γ²/4)^1/2= iγ/2 ± ω_γ

[We get this by bringing i and –2 inside of the square root expression. It’s not very straightforward but you should be able to figure it out.]
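If the algebra with that square root feels slippery, plugging the two roots back into the quadratic is a quick sanity check. The Python sketch below does just that; the function names and the values γ = 0.4 and ω₀ = 3 are arbitrary assumptions (with γ²/4 smaller than ω₀²).

```python
import math

def feynman_roots(gamma, omega0):
    """The two roots alpha = i*gamma/2 ± omega_gamma, with omega_gamma = sqrt(omega0^2 - gamma^2/4)."""
    omega_g = math.sqrt(omega0 ** 2 - gamma ** 2 / 4.0)
    return complex(omega_g, gamma / 2.0), complex(-omega_g, gamma / 2.0)

def quadratic(alpha, gamma, omega0):
    """Left-hand side -alpha^2 + i*gamma*alpha + omega0^2, which should be zero at a root."""
    return -alpha ** 2 + 1j * gamma * alpha + omega0 ** 2

gamma, omega0 = 0.4, 3.0
for alpha in feynman_roots(gamma, omega0):
    print(alpha, abs(quadratic(alpha, gamma, omega0)))  # second number is ~0
```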

So that’s an interesting expression: the imaginary part of α is γ/2 (that’s the iγ/2 term) and its real part is ±(ω₀² – γ²/4)^1/2, the square root being what we denoted as ω_γ in the expression above. [Note that we assume there’s no problem with the square root expression: γ²/4 should be smaller than ω₀² so ω_γ is supposed to be some real positive number.] And so we’ve got the two solutions x₁ and x₂:

x₁= Ae^{i(iγ/2 + ω_γ)t} = Ae^{–γt/2+iω_γt}= Ae^–γt/2e^iω_γt

x₂= Be^{i(iγ/2 – ω_γ)t} = Be^{–γt/2–iω_γt}= Be^–γt/2e^–iω_γt

Note, once again, that A and B can be any (complex) number and that, because of the principle of superposition, any linear combination of these two solutions will also be a solution. So the general solution is

x= Ae^–γt/2e^iω_γt+ Be^–γt/2e^–iω_γt= e^–γt/2(Ae^iω_γt+ Be^–iω_γt)

Now, we recognize the shape of this: a (real-valued) envelope function e^–γt/2 and then a linear combination of two exponentials. But so we want something real-valued in the end so, once again, we need to impose the condition that Ae^iω_γt and Be^–iω_γt are complex conjugates of each other. Now, we can see that e^iω_γt and e^–iω_γt are complex conjugates but what does this say about A and B? Well… The complex conjugate of a product is the product of the complex conjugates of the factors involved: (z₁z₂)* = (z₁*)(z₂*). That implies that B has to be the complex conjugate of A: B = A*. So the final (real-valued) solution becomes:

x= e^–γt/2(Ae^iω_γt+ A*e^–iω_γt)

Now, I’ll leave it to you to prove that the second factor in the product above (Ae^iω_γt + A*e^–iω_γt) is a real-valued function of the real variable t. It should have the same form as x₀cos(Δ)cos(ω₀t) – x₀sin(Δ)sin(ω₀t), but with ω_γ taking the place of ω₀, and that gives you a graph like the one below. However, I can readily imagine that, by now, you’re just thinking: Oh well… Whatever! 🙂
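For those who prefer a numerical hint before attempting the proof, here is a small Python sketch. Writing A = A₀e^iΔ, it compares the complex expression with 2A₀e^–γt/2·cos(ω_γt + Δ); that closed form is my own way of writing the result, and the parameter values are arbitrary.

```python
import cmath
import math

def x_complex(t, A, gamma, omega_g):
    """e^(-gamma*t/2) * (A*e^(i*omega_g*t) + A*·e^(-i*omega_g*t)), computed with complex numbers."""
    return math.exp(-gamma * t / 2.0) * (
        A * cmath.exp(1j * omega_g * t) + A.conjugate() * cmath.exp(-1j * omega_g * t)
    )

def x_real(t, A, gamma, omega_g):
    """Same thing as a damped cosine: 2*A0*e^(-gamma*t/2)*cos(omega_g*t + delta)."""
    A0, delta = abs(A), cmath.phase(A)
    return 2.0 * A0 * math.exp(-gamma * t / 2.0) * math.cos(omega_g * t + delta)

A, gamma, omega_g = 1.0 + 0.5j, 0.4, 2.9   # arbitrary example values
for t in (0.0, 1.0, 4.2):
    z = x_complex(t, A, gamma, omega_g)
    print(t, z.imag, z.real - x_real(t, A, gamma, omega_g))  # both are ~0
```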

So the difference between Feynman’s approach and the one I presented above (which is the one you’ll find in most textbooks) is the assumption in terms of the specific solution: instead of substituting x for e^rt, with allowing r to take on complex values, Feynman substitutes x for Ae^iαt, and allows both A and α to take on complex values. It makes the calculations more complicated but, when everything is said and done, I think Feynman’s approach is more consistent because more encompassing. However, that’s subject to taste, and I gather, from comments on the Web, that many people think that this chapter in Feynman’s Lectures is not his best. So… Well… I’ll leave it to you to make the final judgment.

Note: The one critique that is relevant, in regard to Feynman’s treatment of the matter, is that he devotes quite a bit of time and space to explain how these oscillatory or periodic displacements can be viewed as being the real part of a complex exponential. Indeed, cos(ωt) is the real part of e^iωt. But so that’s something different than (1) expanding the realm of possible solutions to a second-order differential equation from real-valued functions to complex-valued functions in order to (2) then, once we’ve found the general solution, consider only real-valued functions once again as ‘allowable’ solutions to that equation. I think that’s the gist of the matter really. It took me a while to fully ‘get’ this. I hope this post helps you to understand it somewhat quicker than I did. 🙂

Conclusion

I guess the only thing that I should do now is to work some examples. However, I’ll refer you to Paul’s Online Math Notes for that once again (see the reference above). Indeed, it is about time I end my rather lengthy exposé (three posts on the same topic!) on oscillators and resonance. I hope you enjoyed it, although I can readily imagine that it’s hard to appreciate the math involved.

It is not easy indeed: I actually struggled with it, despite the fact that I think I understand complex analysis somewhat. However, the good thing is that, once we’re through it, we can really solve a lot of problems. As Feynman notes: “Linear (differential) equations are so important that perhaps fifty percent of the time we are solving linear equations in physics and engineering.” So, bearing that in mind, we should move on to the next.


The electric oscillator

Pre-scriptum (dated 26 June 2020): This post – part of a series of rather simple posts on elementary math and physics – has suffered only a little bit from the attack by the dark force, which is good because I still like it. One illustration seems to have been removed because of perceived ‘unfair use’, but you will be able to google equivalent stuff. While my views on the true nature of light, matter and the force or forces that act on them have evolved significantly as part of my explorations of a more realist (classical) explanation of quantum mechanics, I think most (if not all) of the analysis in this post remains valid and fun to read. In fact, I would dare to say the whole Universe is all about resonance phenomena!

Original post:

My previous post was too short to do justice to the topic (resonance phenomena). That’s why I’ll approach the topic using the relatively easy example of an electric oscillator. In addition, in this post I’ll also talk about the Q of an oscillator and the concept of a transient.

[…] Oh… Well… I admit there’s no real reason to write this post. It’s not essential – or not as essential as understanding something about complex numbers, for example. In fact, I admit the reason for writing this post is entirely emotional: my father was a rather distant figure, and we never got along, I guess, although he did patch things up near the end of his life–but I realize now, at the age of 45 (so that’s the age I associate with him), that we have a lot in common, including this desire to catch up with things physical and mathematical. He would not have been able to read a lot of what I am writing about in this blog, because he had gone to school only until 18 and, hence, differential equations and complex numbers must have frightened him more than they frighten me. In fact, even then, he actually might have understood something about differential equations, and perhaps something about complex numbers too. I don’t know. I should try to find the books he read. In any case, he surely did not have much of a clue about relativity theory or so. That being said, he sure knew a lot more about electric circuits than I ever will, and I guess that’s the real reason why I want to do a post on the electric oscillator here.

My father knew everything about electric motors, for example. Single-phase, split-phase, three-phase; synchronous or asynchronous; with two, four, six or eight poles; wound rotors or squirrel-cage rotors; centrifugal switches, capacitors… Electric motors (and engines in general) had no secrets for him. While I would understand the basic principle of the electric motor (he actually helped me build a little one – just using copper wire, a horseshoe magnet and a huge nail, and a piece of iron – to demonstrate in school), I had difficulty with the huge number of wires coming out of these things. [We had plenty of motors, because my father would bring old washing machines home to get the parts out.] Part of the problem was that he would never take the time to explain to me how the capacitor that one needs to start a single-phase motor actually works.

Now I know, because I looked it up: single-phase electric (induction) motors have an auxiliary winding because, without it, they do not have a starting torque. The magnetic field does not rotate: it just pulsates between 0 and 180 degrees and, hence, the rotor doesn’t know in which direction to go and, hence, if there’s no fuse to protect it, the wiring will start burning. [To explain why the wiring does not get (too) hot when it’s rotating is another story, which I won’t tell here because it involves changing electric and magnetic fields and so that’s a bit more complicated.] So now I have a bit more of an inkling of why there are so many wires coming out of a simple (single-phase) electric motor:

We have wires coming from the rotor (or, to be precise, from the carbon brushes). [Not always though: a lot of those old electric motors had so-called squirrel-cage rotors, instead of wound rotors.]
We have wires going to the ‘run’ or ‘main’ winding in the stator (i.e. the stationary part of the motor).
We have wires going to the ‘start’ or ‘auxiliary’ winding. In fact, with single phase, the ‘run’ and ‘start’ winding will share one common ‘end’ and so there will be three wires only: usually black, brown and blue in Europe and, to make things complicated, the same wires will usually be red, yellow and black in the US. 🙂
We have wires coming from the capacitor and, most probably, also from some fuse somewhere, and then there’s a centrifugal switch to switch the auxiliary winding off once the motor is running, so that’s one or two more wires.
And then we also need to control the speed of the motor and so that implies even more wires and little control boxes.

Phew! Things become complicated here. The primitive way to change the speed of a single-phase motor is to change the number of poles. How? Well… We can separate the windings and/or place taps in-between. In short, more wires. A motor with two poles only will run at 3000 rpm when supplied with 50 Hz power, but we can also have 4, 6, 8 and more poles. More poles means a lower speed. For example, if we switch to 10 poles, then the motor will run at 600 rpm (yes, 10/2 = 3000/600 = 5, so it’s the same factor). However, changing the number of poles while the motor is running is rather impractical so, in practice, speed control is done through a device referred to as a variable frequency drive (VFD). But so my father would just cut the wires and we’d end up with a motor running at one speed only – not very handy because these things spin incredibly fast – and with too many wires.

I have to admire him for making sense of all those wires. He would do so by measuring the resistance of all the circuits. So he’d just pick two wires and measure the resistance from one end to the other. For example, the main winding has less resistance – typically less than 5 Ω (Ohm) – than the auxiliary winding – typically 10 to 20 Ω (Ohm). Why? The wiring used to run the motor will typically be thicker and, hence, offer less resistance. With a bit of knowledge like that, he’d figure out the wiring in no time, while I would just sit and stare and wonder how he did it.
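The numbers above follow from the usual rule for the synchronous speed, i.e. rpm = 120·f/p with f the supply frequency and p the number of poles. Here is a minimal Python sketch of that rule; the function name is my own, and note that a real induction motor runs a bit slower than this because of slip, which the sketch ignores.

```python
def synchronous_rpm(frequency_hz, poles):
    """Synchronous speed in rpm for a given supply frequency and (even) number of poles."""
    if poles <= 0 or poles % 2 != 0:
        raise ValueError("the number of poles must be a positive even number")
    return 120.0 * frequency_hz / poles

for poles in (2, 4, 6, 8, 10):
    print(poles, synchronous_rpm(50, poles))
```

For 2 poles at 50 Hz this gives 3000 rpm, and for 10 poles 600 rpm, which are the two figures quoted above.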

In any case, let me explain here what I would have liked my father to explain to me, and that’s the components of an electric circuit, and how an electric oscillator works–more or less at least.

The electric oscillator

In an electric circuit, we can have passive and active elements. An example of an active element would be a generator. That’s not passive. So what’s passive?

First, we have a resistor. A resistor is any piece of some substance through which we have some current flowing and which offers resistance to that flow of electric current. What does that mean? The resistance (denoted by the symbol R) will determine the flow of current (I) through the circuit as a function of the potential difference (i.e. the voltage) V across it. In fact, the resistance is defined as the factor of proportionality between V and I. So that’s Ohm’s Law really:

V = RI = R(dq/dt)

As for the current (I) being equal to I = dq/dt, that’s the definition of electric current itself: a current transports electric charge through a wire, so we can measure the current at any point in the electric current as the time-rate of change dq/dt. Current is Coulomb per second, i.e. in amperes. One ampere amounts to 6.241×10¹⁸ unit charges (electrons), i.e. one Coulomb passing through the wire per second, so 1 A = 1 C/s.
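To put some numbers on Ohm’s Law and on that 6.241×10¹⁸ figure, here is a short Python sketch. The function names and the 12 V / 4 Ω example are my own assumptions; the elementary charge is the exact SI value.

```python
ELEMENTARY_CHARGE = 1.602176634e-19  # coulomb per unit charge (exact SI value)

def current_from_voltage(voltage, resistance):
    """Ohm's Law solved for the current: I = V/R, in amperes."""
    return voltage / resistance

def electrons_per_second(current):
    """Number of unit charges passing a point per second for a given current in amperes."""
    return current / ELEMENTARY_CHARGE

print(electrons_per_second(1.0))      # about 6.24e18 for one ampere
i = current_from_voltage(12.0, 4.0)   # hypothetical 12 V across 4 ohm
print(i, electrons_per_second(i))
```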

As for voltage, we’ve encountered that in previous posts. It’s a difference in potential indeed. Potential is that scalar number Φ which we associated with the potential energy U of a particle with charge q: Φ = U/q. So it’s like the potential energy of the unit charge, and we calculated it by using the electric field vector to calculate the amount of work we needed to do to bring a unit charge to some point r: Φ(r) = –∫E·ds (the minus sign is there because we’re doing work against the electromagnetic force). We’ve actually calculated the difference in potential, or the voltage (difference), for something that’s called a capacitor: two parallel plates with a lack of electrons on one, and too many on the other (see below). As a result, there’s a strong electric field between both, and a difference in potential, and we’ve calculated the voltage as V = ΔΦ = σd/ε₀ = qd/ε₀A, with d the plate separation (distance between the two plates), σ the (surface) charge per unit area, ε₀ the electric constant and A the area of the plates. So it’s like a battery… For now at least: I’ll correct this statement later.
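To get a feel for the numbers, the parallel-plate formula V = qd/ε₀A can be evaluated directly. This is a minimal sketch: the function names and the plate dimensions are hypothetical, and the capacitance helper simply rearranges the same formula (V = q/C with C = ε₀A/d).

```python
EPSILON_0 = 8.8541878128e-12  # electric constant, in farads per metre


def plate_voltage(charge_c: float, separation_m: float, area_m2: float) -> float:
    """Voltage between two parallel plates: V = q*d / (epsilon_0 * A)."""
    return charge_c * separation_m / (EPSILON_0 * area_m2)


def plate_capacitance(separation_m: float, area_m2: float) -> float:
    """Capacitance implied by the same formula: C = epsilon_0 * A / d."""
    return EPSILON_0 * area_m2 / separation_m


# Hypothetical example: 1 nC on plates of 100 cm², 1 mm apart.
q, d, area = 1e-9, 1e-3, 1e-2
print(plate_voltage(q, d, area), "V")
print(plate_capacitance(d, area), "F")
```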

If we connect the two plates with a wire, i.e. a conductor, then we’ll have a current. Increasing the resistance of the circuit, by putting a resistor in, for example, will reduce the current and, hence, save the battery life somewhat. Of course, the resistor could be something that actually does work for us, a lamp, for example, or an electric motor.

Let me now correct that statement about a capacitor being like a battery. That statement is true and not true–but I must immediately add that it’s much more not true than true. 🙂 A battery is an active circuit element: it generates a voltage because of a chemical reaction that drives electrons through the circuit, and it will continue to provide power until all the reagents have been used up and the chemical reaction stops. In contrast, a capacitor is not active. There is a voltage only because charge has been stored on it (or, to be precise, because charges have been separated on it). Hence, when you connect the capacitor to a passive circuit, the current will only flow until all of the charge has been drained. So there’s no active element. Also, unlike a battery, the voltage on a capacitor is variable: it’s proportional to the amount of charge stored on it.

OK. So we’ve got a resistor, a capacitor and a voltage source, e.g. a battery but, because we want to look at resonance phenomena, we’ll not have a battery but a voltage source that drives the circuit with a nice sine wave oscillation. Why a sine wave? Well… First, it makes the mathematical analysis easier (we’ll have second-order differential equations again and d²cos(t)/dt² = –cos(t), so that’s nice). Second, the AC current that comes into our houses is a nice sine wave indeed. So let’s put it all together now, including our AC generator (instead of a battery). The circuit can then be represented as follows:

In this circuit, the charge q on the capacitor is analogous to the displacement x of the mass on that oscillating spring we analyzed in the previous post. Likewise:

I = dq/dt is analogous to the velocity v = dx/dt
The resistance R is analogous to the resistive coefficient γ
From our formula V = ΔΦ = σd/ε₀, it is easy to see that V is proportional to the charge q: V = q/C, with 1/C the factor of proportionality and C what is known as the capacitance of the capacitor. In other words, 1/C is analogous to the spring constant k.

But we’re missing something: what’s the analogy to the mass or inertia factor in this circuit? Well… There’s one passive element in this circuit which we haven’t explained as yet: the self-inductance L. The phenomenon of self-inductance is the following: a changing electric current in a coil builds up a changing magnetic field, and that induces a current (and, hence, a voltage) that’s opposite to the primary current (and, hence, an opposite voltage). So it resists the change in current and, as such, it’s analogous to mass indeed. The illustration below explains how it works. I’ve also inserted a diagram showing how transformers work, because that’s based on the same principle of changing currents inducing changing magnetic fields that, in turn, generate another current. What’s going on in transformers is referred to as mutual inductance and note, indeed, that it doesn’t work with DC (i.e. steady) current.

Now, I know that’s not all that easy to understand, but I should limit myself here to just giving the formula: the induced voltage in such a coil is proportional to the time-rate of change of the current I = dq/dt. So we have a second-order derivative here:

V = LdI/dt = L(d²q/dt²)

So now we’re finally ready to put it all together. In that ‘basic electric circuit’ above, we’ve got the three passive circuit elements – resistor, capacitor and self-inductance – connected in series, and so then we apply a sine wave voltage to the whole circuit. Of course, all the voltages – i.e. over the resistor, over the capacitor, and over the self-inductance – must add up to the total voltage we apply to the circuit (which we’ll denote by V(t), as it’s a changing voltage), taking into account their sign. We have: V_R = RI = R(dq/dt); V_C = q/C; and V_L = L(dI/dt) = L(d²q/dt²). Hence, we get:

L(d²q/dt²) + R(dq/dt) + q/C = V(t)

This is, once again, a differential equation of the second order, and its mathematical form is the same as that equation for the oscillating spring (with a driving force and damping). [I repeat the equation below (in the section on the Q and the energy of an oscillator), so you don’t need to scroll too far.] So the solution is going to be the same and we’re going to have resonance if the angular frequency ω of our sine wave (i.e. the AC voltage generated by our generator) is close to or equal to some kind of natural frequency characterizing the circuit. So what is that natural frequency ω₀? Well… Just like ω₀² was equal to k/m for our mechanical oscillator, we here get the grand result that ω₀² = 1/LC, and our friction parameter γ corresponds to R/L.
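The correspondence with the spring can be made concrete with a short sketch. The component values below are hypothetical (picked only to give round numbers) and the function names are mine. The last lines illustrate what makes ω₀ special: for a trial charge q = q₀cos(ω₀t), the voltage over the self-inductance and the voltage over the capacitor cancel, so only the resistor term is left to balance the driving voltage.

```python
import math


def natural_angular_frequency(inductance_h: float, capacitance_f: float) -> float:
    """omega_0 = 1/sqrt(L*C), in radians per second."""
    return 1.0 / math.sqrt(inductance_h * capacitance_f)


def damping_coefficient(resistance_ohm: float, inductance_h: float) -> float:
    """gamma = R/L, the circuit analogue of the friction parameter."""
    return resistance_ohm / inductance_h


def voltage_drops(t, q0, omega, inductance_h, resistance_ohm, capacitance_f):
    """Return (V_L, V_R, V_C) for a trial charge q(t) = q0*cos(omega*t)."""
    q = q0 * math.cos(omega * t)
    dq_dt = -q0 * omega * math.sin(omega * t)
    d2q_dt2 = -q0 * omega ** 2 * math.cos(omega * t)
    return inductance_h * d2q_dt2, resistance_ohm * dq_dt, q / capacitance_f


# Hypothetical component values, chosen only for illustration.
L_H, R_OHM, C_F = 10e-3, 5.0, 1e-6
omega_0 = natural_angular_frequency(L_H, C_F)
gamma = damping_coefficient(R_OHM, L_H)
v_l, v_r, v_c = voltage_drops(0.0, 1e-6, omega_0, L_H, R_OHM, C_F)

print("omega_0 =", omega_0, "rad/s, gamma =", gamma, "1/s")
print("V_L + V_C at omega_0:", v_l + v_c)  # the two terms cancel
```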

The Q and the energy of an oscillator

There’s another point I did not develop in my previous post, and that was the energy of an oscillator. To explain that, we’ll take the example of our mechanical spring once again. The equation for that one was:

m(d²x/dt²) + γm(dx/dt) + mω₀²x = F(t)

Now, from my posts on energy concepts, you’ll remember that a force does work, and that the work done is the product of the force and the displacement (i.e. the distance over which the force is doing work). Work done is energy, potential or kinetic (one gets converted into the other). In addition, you may or may not remember that the work done per second gives us the power, so the concept of power relates energy to time, rather than distance.

For infinitesimal quantities (i.e. using differentials), we can write that the differential work done in a time dt is equal to F·dx. The power that’s expended by the force is then F·dx/dt, so that turns out to be the product of the force and the velocity (dx/dt = v): P = F·v. Now, if we substitute the left-hand side of that differential equation above for F, and re-arrange the terms a bit, we get a fairly monstrous-looking equation:

P = F·(dx/dt) = m[(d²x/dt²)(dx/dt) + ω₀²x(dx/dt)] + γm(dx/dt)²

Now it turns out that we can write the first two terms on the right-hand side of this monstrous equation as d/dt[m(dx/dt)²/2 + mω₀²x²/2]. So we have a time derivative here of a sum of two terms we recognize: the first is the kinetic energy (mv²/2) and the second (mω₀²x²/2) is the potential energy of the spring. [I would need to show that to you but I hope you believe me here.] Both of them taken together are the energy that’s stored in the oscillation, i.e. the stored energy. Now, in the long run, this driving force will not add any more energy to this quantity (the spring will oscillate back and forth, but so we’ll have stable motion and that’s it really). In other words, this derivative must, on average, be zero.
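If you don’t want to take that on faith: the chain rule gives d[(dx/dt)²/2]/dt = (dx/dt)(d²x/dt²) and d[x²/2]/dt = x(dx/dt), which are exactly the two terms between the square brackets. The sketch below checks this numerically for a trial motion x = x₀cos(ωt); all parameter values are hypothetical and the function names are mine.

```python
import math

# Hypothetical parameters for the check.
M, OMEGA0, OMEGA, X0 = 2.0, 3.0, 2.5, 0.1


def x(t):
    return X0 * math.cos(OMEGA * t)


def v(t):
    return -X0 * OMEGA * math.sin(OMEGA * t)


def a(t):
    return -X0 * OMEGA ** 2 * math.cos(OMEGA * t)


def stored_energy(t):
    """Kinetic plus potential energy: m*v^2/2 + m*omega0^2*x^2/2."""
    return 0.5 * M * v(t) ** 2 + 0.5 * M * OMEGA0 ** 2 * x(t) ** 2


def energy_rate_from_terms(t):
    """The first two terms of the power equation: m*[a*v + omega0^2*x*v]."""
    return M * (a(t) * v(t) + OMEGA0 ** 2 * x(t) * v(t))


def energy_rate_numeric(t, h=1e-6):
    """Central-difference estimate of d(stored energy)/dt."""
    return (stored_energy(t + h) - stored_energy(t - h)) / (2 * h)


t = 0.7
print(energy_rate_from_terms(t), energy_rate_numeric(t))
```

The two printed numbers should agree to within the accuracy of the finite-difference estimate.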

But so that driving force continues to do work and so the power must go somewhere. Where? It must all go to that other term: γm(dx/dt)². What is that term? Well… It’s the energy that gets lost in friction: these are so-called resistive losses, and they usually get dissipated through heating. Hence, what happens is that most of the power of an external force is first used to build up the oscillation, thereby storing energy in the oscillator, but, once that’s done, the system only needs a little bit of energy to compensate for the heating (resistive) losses. Now the interesting thing is to calculate how much energy an oscillator can store. We can calculate that as follows:

The energy carried by a physical wave is proportional to the square of its amplitude: E ∝ A². Now, if it is a sinusoidal wave, we’ll need to take the average of the square of a sine or cosine function. Because sin²x and cos²x are really the same function except for a phase difference of π/2, they must have the same average and, because sin²x + cos²x = 1, that average must be 0.5 = 1/2. Hence, for any function Acosx, the average value of its square will be A²/2.
From your statistics classes, you may also remember that the mean of a product of a variable and some constant (e.g. γm(dx/dt)²) will be equal to the product of that constant and the mean of the variable. So we can write 〈γm(dx/dt)²〉 = γm〈(dx/dt)²〉. Now, taking into account that the solution x for the differential equation is a cosine function x = x₀cos(ωt+Δ), its derivative will also be a sinusoidal function but with ω in the amplitude as well. To make a long story short, 〈(dx/dt)²〉 is equal to ω²x₀²/2, and so we can write 〈γm(dx/dt)²〉 = γmω²x₀²/2.
So the expression above gives the energy being absorbed by the oscillator per second on a permanent basis, i.e. the average power, and we’ll denote that by 〈P〉 = γmω²x₀²/2. How much energy is stored?
Now that we’ve calculated 〈(dx/dt)²〉, we can calculate that too. We’ll denote it by 〈E〉, and so 〈E〉 = 〈m(dx/dt)²/2 + mω₀²x²/2〉 = (1/2)m〈(dx/dt)²〉 + (1/2)mω₀²〈x²〉 = m(ω² + ω₀²)x₀²/4 (because 〈x²〉 = x₀²/2). So what? Well… From the previous chapter, we know that x₀ becomes very large if ω is near to ω₀ (that’s what resonance is all about) and, hence, the stored energy will be quite large in that case. So the point is that we can get a large stored energy from a relatively small force, which is what you’d expect.
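These averages are easy to check by brute force: sample the motion over one full period and take the mean. The sketch below does that for hypothetical parameter values (the names are mine) and compares the result with γmω²x₀²/2 for the power and with (1/2)m〈(dx/dt)²〉 + (1/2)mω₀²〈x²〉 for the stored energy.

```python
import math

# Hypothetical oscillator parameters.
M, GAMMA, OMEGA0, OMEGA, X0 = 2.0, 0.5, 3.0, 2.5, 0.1


def average_over_period(f, omega, samples=10000):
    """Mean of f(t) over one period 2*pi/omega, using evenly spaced samples."""
    period = 2 * math.pi / omega
    return sum(f(k * period / samples) for k in range(samples)) / samples


def x(t):
    return X0 * math.cos(OMEGA * t)


def v(t):
    return -X0 * OMEGA * math.sin(OMEGA * t)


mean_power = average_over_period(lambda t: GAMMA * M * v(t) ** 2, OMEGA)
mean_energy = average_over_period(
    lambda t: 0.5 * M * v(t) ** 2 + 0.5 * M * OMEGA0 ** 2 * x(t) ** 2, OMEGA
)

print(mean_power, GAMMA * M * OMEGA ** 2 * X0 ** 2 / 2)
print(mean_energy, 0.25 * M * (OMEGA ** 2 + OMEGA0 ** 2) * X0 ** 2)
```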

Now, the last thing I need to explain is the Q of an oscillator. The Q of an oscillator compares the stored energy with the amount of work that is done per cycle, multiplied by 2π for some historical reason I don’t understand too well:

Q = 2π·〈E〉/[〈P〉·2π/ω] = (ω² + ω₀²)/2γω

Note that 2π/ω is the period, i.e. the time T₀ that is needed to go through one cycle of the oscillation. As mentioned above, I am not sure about that 2π factor but it doesn’t matter too much: it’s just a constant and so we could divide by 2π and the result would not be substantially different: the Q is a relative number obviously, used to compare the efficiency of various oscillators when it comes to storing energy. Indeed, Q stands for quality: higher Q indicates a lower rate of energy loss relative to the stored energy of the resonator. So it implies that you do not need a lot of power to keep the oscillation going and, if the external driving force stops, that the oscillations will die out much more slowly. For example, a pendulum on a high-quality bearing, oscillating in air, will have a high Q, while a pendulum immersed in oil will have a low one.

But let me go back to the electric oscillator: we substitute L for m, R for mγ, and 1/C for mω₀², and then we can see that, for ω = ω₀ (so we calculate the Q at resonance), we find that Q = Lω₀/R, with ω₀ the resonance frequency. Again, a circuit with high Q means that the circuit can store a very large amount of energy as compared to the work done per cycle of the voltage driving the oscillation.
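As a small illustration, here is the Q formula in code, once in its general form and once for a series circuit at resonance. The component values are hypothetical and the function names are mine.

```python
import math


def quality_factor(omega: float, omega0: float, gamma: float) -> float:
    """General expression from the text: Q = (omega^2 + omega0^2)/(2*gamma*omega)."""
    return (omega ** 2 + omega0 ** 2) / (2 * gamma * omega)


def rlc_quality_factor_at_resonance(inductance_h, resistance_ohm, capacitance_f):
    """Q of a series circuit driven at omega0 = 1/sqrt(L*C): Q = omega0*L/R."""
    omega0 = 1.0 / math.sqrt(inductance_h * capacitance_f)
    return omega0 * inductance_h / resistance_ohm


# Hypothetical component values: 10 mH, 5 ohm, 1 microfarad.
L_H, R_OHM, C_F = 10e-3, 5.0, 1e-6
omega0 = 1.0 / math.sqrt(L_H * C_F)

print(rlc_quality_factor_at_resonance(L_H, R_OHM, C_F))
print(quality_factor(omega0, omega0, R_OHM / L_H))  # same number, via gamma = R/L
```

Halving the resistance doubles the Q, which fits the idea that Q measures how little energy is lost per cycle.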

An application of the Q: transients

Throughout this and my previous posts, I’ve managed to skirt around a more rigorous (i.e. mathematical) treatment of the subject-matter by not actually solving these second-order differential equations. So I won’t suddenly change tack and try to do that now. So this will, once again, be a rather intuitive approach. If you’d want a formal treatment, let me refer you to Paul’s Online Math Notes and, more in particular, the chapter on second-order DEs, which he wraps up with an overview of all differential equations you could possibly encounter when analyzing mechanical springs. But so here we go for the informal approach.

Above, we noted that the Q of a system is the ratio of (1) the stored energy (E) and (2) the work done per cycle, multiplied by 2π. Now, if we’d suddenly switch off the force, then no more work is being done, but the system will lose energy. Let’s suppose we have a system – an oscillating mechanical spring – for which we have a Q equal to 1000·2π, so we have Q/2π = 1000. So that means that the work done per cycle – when that driving force is still on – is one thousandth of its total energy. Hence, it’s not unreasonable to suggest that such a system would also lose one thousandth of its total energy per cycle if we would just switch off the force and let go of it. Writing that assumption in terms of differential changes yields the following simple (first-order) differential equation:

dE/dt = –ωE/Q

Huh? Yes. Just think about it. A differential dE is associated with a differential dt. Now, the number of radians that the phase will go through during the infinitesimally short time interval dt is ωdt, and the energy lost per radian is E/Q (that’s the loss per cycle, 2πE/Q, divided by 2π), so the change in energy must be equal to dE = –ωdt·(E/Q) (the minus sign is there because we’re talking about an energy loss obviously). So that gives us the equation above.

But what about ω? Well… If we just let that oscillator do what we would expect it to do, then it is not unreasonable to assume it would oscillate at its natural frequency. Hence, ω is likely to equal ω₀. Combining these two assumptions (i.e. that differential equation above and the ω = ω₀ assumption) gives us the following formula for E:

E = E₀e^(–ω₀t/Q) = E₀e^(–γt)

[Note that γ is the same friction coefficient: Q = (ω² + ω₀²)/2γω and, hence, if ω = ω₀, then we get ω₀/Q = γ indeed.]
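To see what this exponential decay means in terms of cycles, note that n cycles at the natural frequency take a time t = 2πn/ω₀, so ω₀t/Q = 2πn/Q. The sketch below (function name mine, Q taken from the example above) evaluates the fraction of energy that is left.

```python
import math

Q_EXAMPLE = 1000 * 2 * math.pi  # the Q used in the example above


def energy_fraction_remaining(cycles: float, q_factor: float) -> float:
    """E/E0 after a number of free cycles, from dE/dt = -omega*E/Q."""
    return math.exp(-2 * math.pi * cycles / q_factor)


print(1 - energy_fraction_remaining(1, Q_EXAMPLE))   # loss in the first cycle
print(energy_fraction_remaining(1000, Q_EXAMPLE))    # what is left after 1000 cycles
```

The loss in the first cycle comes out very close to one thousandth, and after 1000 cycles the energy is down by a factor e rather than down to zero, which is the difference between losing a fixed fraction and losing a fixed amount per cycle.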

Now, the energy goes as the square of the amplitude of the oscillation (i.e. the displacement x), so we would expect to find the square root of that e^(–γt) in the solution for x, so that’s an e^(–γt/2) factor. If we’d formally solve it, we’d find the following solution for x indeed:

x = A₀e^(–γt/2)cos(ω₀t + Δ)

The diagram below shows the envelope curve e^(–γt/2) as well as the x = e^(–γt/2)cos(ω₀t) curve (A₀ and Δ depend on the initial conditions obviously). So that’s what’s called a transient: a solution of the differential equation when there is no force present.

Now, I could bombard you with even more equations, more concepts (like the concept of impedance indeed), but I won’t do that here. I hope this post managed to get the most important ideas across and, hence, I’ll conclude this mini-series (i.e. two successive posts) on resonance. As for my next post, I may be tempted to treat the topic of second-order differential equations more formally, that is from a purely mathematical perspective. But let’s see. 🙂

Post scriptum:

The idea of applying only a little bit of power to build up large amounts of stored energy may or may not trigger some thoughts on how a photo flash works and, in fact, you’re right. A photo flash uses both a transformer (to step up voltage) as well as an oscillator circuit to store up energy. You can find the details on the Web. See, for example, http://electronics.howstuffworks.com/camera-flash3.htm 🙂

Some content on this page was disabled on June 16, 2020 as a result of a DMCA takedown notice from The California Institute of Technology. You can learn more about the DMCA here:

https://wordpress.com/support/copyright-and-the-dmca/

Resonance phenomena

Pre-scriptum (dated 26 June 2020): This post – part of a series of rather simple posts on elementary math and physics – has suffered only a little bit from the attack by the dark force—which is good because I still like it. A few illustrations were removed because of perceived ‘unfair use’, but you will be able to google equivalent stuff. While my views on the true nature of light, matter and the force or forces that act on them have evolved significantly as part of my explorations of a more realist (classical) explanation of quantum mechanics, I think most (if not all) of the analysis in this post remains valid and fun to read. In fact, I would dare to say the whole Universe is all about resonance!

Original post:

One of the most common behaviors of physical systems is the phenomenon of resonance: a body (not only a tuning fork but any body really, such as a body of water, such as the ocean for example) or a system (e.g. an electric circuit) will have a so-called natural frequency, and an external driving force will cause it to oscillate. How it will behave, then, can be modeled using a simple differential equation, and the so-called resonance curve will usually look the same, regardless of what we are looking at. Besides the standard example of an electric circuit consisting of (i) a capacitor, (ii) a resistor and (iii) an inductor, Feynman also gives the following non-standard examples:

1. When the Earth’s atmosphere was disturbed as a result of the Krakatoa volcano explosion in 1883, it resonated at its own natural frequency, and its period was measured to be 10 hours and 20 minutes.

[In case you wonder how one can measure that: an explosion such as that one creates all kinds of waves, but the so-called infrasonic waves are the ones we are talking about here. They circled the globe at least seven times, shattering windows hundreds of miles away. They did not only shatter windows, but they were also recorded worldwide. That’s how they could be measured a second, third, etc. time. How? There was no wind or so, but the infrasonic waves (i.e. ‘sounds’ beneath the lowest limits of human hearing (about 16 or 17 Hz), down to 0.001 Hz) of such an oscillation cause minute changes in the atmospheric pressure which can be measured by microbarometers. So the ‘ringing’ of the atmosphere was measurable indeed. A nice article on infrasound waves is journal.borderlands.com/1997/infrasound. Of course, the surface of the Earth was ‘ringing’ as well, and such seismic shocks then produce tsunami waves, which can also be analyzed in terms of natural frequencies.]

2. Crystals can be made to oscillate in response to a changing external electric field, and this crystal resonance phenomenon is used in quartz clocks: the quartz crystal resonator in a basic quartz wristwatch is usually in the shape of a very small tuning fork. Literally: there’s a tiny tuning fork in your wristwatch, made of quartz, that has been laser-trimmed to vibrate at exactly 32,768 Hz, i.e. 2¹⁵ cycles per second.

3. Some quantum-mechanical phenomena can be analyzed in terms of resonance as well, but then it’s the energy of the interfering particles that assumes the role of the frequency of the external driving force when analyzing the response of the system. Feynman gives the example of gamma radiation from lithium as a function of the energy of protons bombarding the lithium nuclei to provoke the reaction. Indeed, when graphing the intensity of the gamma radiation emitted as a function of the energy, one also gets a resonance curve, as shown below. [Don’t you just love the fact it’s so old? A Physical Review article of 1948! There’s older stuff as well, because this journal actually started in 1893.]

However, let us analyze the phenomenon first in its most classical appearance: an oscillating spring.

Basics

We’ve seen the equation for an oscillating spring before. From a math point of view, it’s a differential equation (because one of the terms is a derivative of the dependent variable x) of the second order (because the derivative involved is of the second order):

m(d²x/dt²) = –kx

What’s written here is simply Newton’s Law: the force is –kx (the minus sign is there because the force is directed opposite to the displacement from the equilibrium position), and the force has to equal the oscillating mass on the spring times its acceleration: F = ma.

Now, this can be written as d²x/dt² = –(k/m)x = –ω₀²x with ω₀² = k/m. This ω₀ symbol uses the Greek omega once again, which we used for the angular velocity of a rotating body. While we do not have anything that’s rotating here, ω₀ is still an angular velocity or, to be more precise, it’s an angular frequency. Indeed, the solution to the differential equation above is

x = x₀cos(ω₀t + Δ)

The x₀ factor is the maximum amplitude and that’s, quite simply, determined by how far we pulled or pushed the spring when we started the motion. Now, ω₀t + Δ = θ is referred to as the phase of the motion, and it’s easy to see that ω₀ is an angular frequency indeed, because ω₀ equals the time derivative dθ/dt. Hence, ω₀ is the phase change, measured in radians, per second, and that’s the definition of angular frequency or angular velocity. Finally, we have Δ. That’s just a phase shift, and it basically depends on our t = 0 point.

Something on the math

I’ll do a separate post on the math that’s associated with this (second-order differential equations) but, in this case, we can solve the equation in a simple and intuitive way. Look at it: d²x/dt² = –ω₀²x. It’s obvious that x has to be a function that comes back to itself after two differentiations, but with a minus sign in front, and then we also have that coefficient –ω₀². Hmm… What can we think of? An exponential function comes back to itself, and if there’s a coefficient in the exponent, then it will end up as a coefficient in front too: d(e^(at))/dt = ae^(at) and, hence, d²(e^(at))/dt² = a²e^(at). Wow! That’s close. In fact, that’s the same equation as the one above, except for the minus sign.

In fact, if you’d quickly look at Paul’s Online Math Notes, you’ll see that we can indeed get the general solution for such second-order differential equation (to be precise: it’s a so-called linear and homogeneous second-order DE with constant coefficients) using that remarkable property of exponentials indeed. However, because of the minus sign, our solution for the equation above will involve complex exponentials, and so we’ll get a general function in a complex variable. However, we’ll then impose that our solution has to be real only and, hence, we’ll take a subset of our more general solution. However, don’t worry about that here now. There’s an easier way.

Apart from the exponential function, there are two other functions that come back to themselves after two derivatives, and with a minus sign in front this time: the sine and cosine functions. Indeed, d²cos(t)/dt² = –cos(t) and d²sin(t)/dt² = –sin(t). In fact, the sine and cosine function are obviously the same except for a phase shift equal to π/2: cos(t) = sin(t + π/2), so we can choose either. Let’s work with the cosine as for now (we can always convert it to a sine function using that cos(t) = sin(t + π/2) identity). The nice thing about the cosine (and sine) function is that we do get that minus sign when deriving it two times, and we also get that coefficient in front. Indeed: d²cos(ω₀t)/dt² = –ω₀²cos(ω₀t). In short, cos(ω₀t) is the right function. The only things we need to add are x₀ and Δ, i.e. the amplitude and some phase shift but, as mentioned above, it is easy to understand these will depend on the initial conditions (i.e. the value of x at point t = 0 and the initial pull or push on the spring). In short, x = x₀cos(ω₀t + Δ) is the complete general solution of the simple (differential) equation we started with (i.e. m(d²x/dt²) = –kx).
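If you’d rather see that than believe it, you can let a computer step through m(d²x/dt²) = –kx and compare the result with the cosine. The sketch below uses a simple velocity-Verlet stepping scheme (my choice, nothing specific to this problem) with hypothetical values for k, m and the starting position; the spring is released from rest, so Δ = 0.

```python
import math


def simulate_spring(k, m, x_start, v_start, dt, steps):
    """Integrate m*x'' = -k*x with the velocity-Verlet scheme; return (x, v)."""
    x, v = x_start, v_start
    for _ in range(steps):
        a = -(k / m) * x
        v_half = v + 0.5 * dt * a        # half step in velocity
        x = x + dt * v_half              # full step in position
        a_new = -(k / m) * x
        v = v_half + 0.5 * dt * a_new    # second half step in velocity
    return x, v


# Hypothetical values: k = 4 N/m and m = 1 kg, so omega_0 = 2 rad/s.
K, M, X_START = 4.0, 1.0, 0.1
DT, STEPS = 1e-4, 10000
omega_0 = math.sqrt(K / M)
t_end = DT * STEPS

x_num, v_num = simulate_spring(K, M, X_START, 0.0, DT, STEPS)
x_exact = X_START * math.cos(omega_0 * t_end)

print(x_num, x_exact)
```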

Introducing a driving force

Now, most real-life oscillating systems will be driven by an external force, permanently or just for a short while, and they will also lose some of their energy in a so-called dissipative process: friction or, in an electric circuit, electrical resistance will cause the oscillation to slowly lose amplitude, thereby damping it.

Let’s look at the friction coefficient first. The friction will often be proportional to the speed with which the object moves. Indeed, in the case of a mass on a spring, the drag (i.e. the force that acts on a body as it travels through air or a fluid) is dependent on a lot of things: first and foremost, there’s the fluid itself (e.g. a thick liquid will create more drag than water), and then there’s also the size, shape and velocity of the object. I am following the treatment you’ll find in most textbooks here and so that includes an assumption that the resistance force is proportional to the velocity: F_f = –cv = –c(dx/dt). Furthermore, the constant of proportionality c will usually be written as a product of the mass and some other coefficient γ, so we have F_f = –cv = –mγ(dx/dt). That makes sense because we can look at γ = c/m as the friction per unit of mass.

That being said, the simplification as a whole (i.e. the assumption of proportionality with speed) is rather strange in light of the fact that drag forces are, except at very low speeds, proportional to the square of the velocity. If you look it up, you’ll find a formula resembling F_D = ρC_DAv²/2, with ρ the fluid density, C_D the drag coefficient (determined by the shape of the object and a so-called Reynolds number, which is determined from experiments), and A the cross-section area. It’s also rather strange to relate drag to mass by writing c as c = mγ because drag has nothing to do with mass. What about dry friction? So that would be kinetic friction between two surfaces, like when the mass is sliding on a surface? Well… In that case, mass would play a role but velocity wouldn’t, because kinetic friction is independent of the sliding velocity.
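The difference between the two models is easy to see in numbers. The sketch below puts the textbook friction force mγv next to the drag formula ρC_DAv²/2; the function names and all parameter values are hypothetical, chosen only to show how each force scales when the speed doubles.

```python
def linear_friction(mass_kg: float, gamma_per_s: float, speed_m_s: float) -> float:
    """Magnitude of the textbook friction force: m*gamma*v."""
    return mass_kg * gamma_per_s * speed_m_s


def quadratic_drag(density, drag_coefficient, area_m2, speed_m_s):
    """Magnitude of the drag force: rho*C_D*A*v^2/2."""
    return 0.5 * density * drag_coefficient * area_m2 * speed_m_s ** 2


# Hypothetical values: a 0.5 kg mass with gamma = 0.2 per second,
# moving through air (about 1.2 kg/m^3) with C_D = 0.5 and A = 0.01 m^2.
for speed in (1.0, 2.0, 4.0):
    print(
        speed,
        linear_friction(0.5, 0.2, speed),
        quadratic_drag(1.2, 0.5, 0.01, speed),
    )
```

Doubling the speed doubles the first force but quadruples the second, which is why the linear assumption is a simplification rather than a description of real drag.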

So why do physicists use this simplification? One reason is that it works for electric circuits: the equivalent of the velocity in electrical resonance is the current I = dq/dt, so that’s the time derivative of the charge on the capacitor. Now, I is proportional to the voltage difference V, and the proportionality coefficient is the resistance R, so we have V = RI = R(dq/dt). So, in short, the resonance curve we’re actually going to derive below is one for electric circuits. The other reason is that this assumption makes it easier to solve the differential equation that’s involved: it makes for a linear differential equation indeed. In fact, that’s the main reason. After all, professors are professors and so they have to give their students stuff that’s not too difficult to solve. In any case, let’s not be bothered too much and so we’ll just go along with it.

Modeling the driving force is easy: we’ll just assume it’s a sinusoidal force with angular frequency ω (and ω is, obviously, more likely than not somewhat different from the natural frequency ω₀). If F is a sinusoidal force, we can write it as F = F₀cos(ωt + Δ). [So we also assume there is some phase shift Δ.] So now we can write the full equation for our oscillating spring as:

m(d²x/dt²) + γm(dx/dt) + kx = F ⇔ (d²x/dt²) + γ(dx/dt) + ω₀²x = F/m

How do we solve something like that for x? Well, it’s a differential equation once again. In fact, it’s, once again, a linear differential equation with constant coefficients, and so there’s a general solution method for that. As I mentioned above, that general solution method will involve exponentials and, in general, complex exponentials. I won’t walk you through that. Indeed, I’ll just write the solution because this is not an exercise in solving differential equations. I just want you to understand the solution:

x = ρF₀cos(ωt + Δ + θ)

ρ in this equation has nothing to do with some density or so. It’s a factor which depends on m, γ, ω and ω₀, in a fairly complicated way in fact:

ρ² = 1/{m²[(ω₀² – ω²)² + γ²ω²]}

As we can see from the equation above, the (maximum) amplitude of the oscillation is equal to ρF₀. So we have the magnitude of the force F here multiplied by ρ. Hence, ρ is a magnification factor which, multiplied with F₀, gives us the ‘amount’ of oscillation.

As for the θ in the equation above, we’re using this Greek letter (theta) not to refer to the phase, as we usually do, because the phase here is the whole ωt + Δ + θ expression, not just theta! The theta (θ) here is a phase shift as compared to the original force phase ωt + Δ, and θ also depends on γ, ω and ω₀. Again, I won’t show how we derived this solution but just accept it as for now:

tanθ = –γω/(ω₀² – ω²)

These three equations, taken together, should allow you to understand what’s going on really. We’ve got an oscillation x = ρF₀cos(ωt + Δ + θ), so that’s an equation with this amplification or magnification factor ρ and some phase shift θ. Both depend on the difference between ω₀ and ω, and the two graphs below show how exactly.

The first graph shows the resonance phenomenon and, hence, it’s what’s referred to as the resonance curve: if the difference between ω₀ and ω is small, we get an enormous amplification effect. It would actually go to infinity if it weren’t for the frictional force (but, of course, if the frictional force was not there, the spring would just break as the oscillation builds up and the swings get bigger and bigger).

The second graph shows the phase shift θ. It is interesting to note that the lag θ is equal to –π/2 when ω₀ is equal to ω, but I’ll let you figure out why this makes sense. [It’s got something to do with that cos(t) = sin(t + π/2) identity, so it’s nothing ‘deep’ really.]
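Since the graphs themselves may be missing here, the two curves can be tabulated with a few lines of code. Note the assumption: the expressions used are the standard textbook results for this driven, damped oscillator, ρ² = 1/{m²[(ω₀² – ω²)² + γ²ω²]} and tanθ = –γω/(ω₀² – ω²), and the function names and parameter values are mine.

```python
import math


def magnification_squared(omega, omega0, gamma, m):
    """rho^2 = 1 / (m^2 * [(omega0^2 - omega^2)^2 + gamma^2 * omega^2])."""
    return 1.0 / (m ** 2 * ((omega0 ** 2 - omega ** 2) ** 2 + (gamma * omega) ** 2))


def phase_shift(omega, omega0, gamma):
    """theta between -pi and 0, with tan(theta) = -gamma*omega/(omega0^2 - omega^2)."""
    return -math.atan2(gamma * omega, omega0 ** 2 - omega ** 2)


# Hypothetical oscillator: omega0 = 3 rad/s, gamma = 0.2 per second, m = 1 kg.
OMEGA0, GAMMA, M = 3.0, 0.2, 1.0
for omega in (1.0, 2.0, 2.9, 3.0, 3.1, 4.0, 5.0):
    rho2 = magnification_squared(omega, OMEGA0, GAMMA, M)
    theta = phase_shift(omega, OMEGA0, GAMMA)
    print(f"omega = {omega:.1f}  rho^2 = {rho2:8.4f}  theta = {theta:7.4f} rad")
```

The table shows ρ² peaking sharply near ω₀, while θ runs from close to zero at low frequencies, through –π/2 at ω = ω₀, towards –π at high frequencies.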

I guess I should, perhaps, also write something about the energy that gets stored in an oscillator like this because, in that resonance curve above, we actually have ρ squared on the vertical axis, and that’s because energy is proportional to the square of the amplitude: E ∝ A². I should also explain a concept that’s closely related to energy: the so-called Q of an oscillator. It’s an interesting topic, if only because it helps us to understand why, for instance, the waves of the sea are such tremendous stores of energy! Furthermore, I should also write something about transients, i.e. oscillations that dampen because the driving force was turned off so to say. However, I’ll leave that for you to look it up if you’re interested in this topic. Here, I just wanted to present the essentials.

[…] Hey ! I managed to keep this post quite short for a change. Isn’t that good? 🙂


Understanding gyroscopes

Pre-scriptum (dated 26 June 2020): This post – part of a series of rather simple posts on elementary math and physics – has suffered only a little bit from the attack by the dark force—which is good because I still like it. Only one or two illustrations were removed because of perceived ‘unfair use’, but you will be able to google equivalent stuff. While my views on the true nature of light, matter and the force or forces that act on them have evolved significantly as part of my explorations of a more realist (classical) explanation of quantum mechanics, I think most (if not all) of the analysis in this post remains valid and fun to read. Understanding the dynamics of rotations is extremely important in any realist interpretation of quantum physics. In fact, I would dare to say it is all about rotation!

Original post:

You know a gyroscope: it’s a spinning wheel or disk mounted in a frame that itself is free to alter in direction, so the axis of rotation is not affected as the mounting tilts or moves about. Therefore, gyroscopes are used to provide stability or maintain a reference direction in navigation systems. Understanding a gyroscope itself is simple enough: it only involves a good understanding of the so-called moment of inertia. Indeed, in the previous post, we introduced a lot of concepts related to rotational motion, notably the concepts of torque and angular momentum but, because that post was getting too long, I did not talk about the moment of inertia and gyroscopes. Let me do that now. However, I should warn you: you will not be able to understand this post if you haven’t read or didn’t understand the previous post. So, if you can’t follow, please go back: it’s probably because you didn’t get the other post.

The moment of inertia and angular momentum are related but not quite the same. Let’s first recapitulate angular momentum. Angular momentum is the equivalent of linear momentum for rotational motion:

If we want to change the linear motion of an object, as measured by its momentum p = mv, we’ll need to apply a force. Changing the linear motion means changing either (a) the speed (v), i.e. the magnitude of the velocity vector v, (b) the direction, or (c) both. This is expressed in Newton’s Law, F = m(dv/dt), and so we note that the mass is just a factor of proportionality measuring the inertia to change.
The same goes for angular momentum (denoted by L): if we want to change it, we’ll need to apply a force, or a torque as it’s referred to when talking rotational motion, and such torque can change either (a) L’s magnitude (L), (b) L’s direction or (c) both.

Just like linear momentum, angular momentum is also a product of two factors: the first factor is the angular velocity ω, and the second factor is the moment of inertia. The moment of inertia is denoted by I so we write L = Iω. But what is I? If we’re analyzing a rigid body (which is what we usually do), then it will be calculated as follows:

I = Σ m_i·r_i² (summing over all particles i, with m_i the mass of particle i and r_i its distance from the axis of rotation)

This is easy enough to understand: the inertia for turning will depend not just on the masses of all of the particles that make up the object, but also on their distance from the axis of rotation–and note that we need to square these distances. The L = Iω formula, combined with the formula for I above, explains why a spinning skater doing a ‘scratch spin’ speeds up tremendously when drawing in his or her arms and legs. Indeed, the total angular momentum has to remain the same, but I becomes much smaller as a result of that r² factor in the formula. Hence, if I becomes smaller, then ω has to go up significantly in order to conserve angular momentum.
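To put some numbers on the skater, here is a minimal sketch that treats him or her as a handful of point masses and uses the standard point-mass formula I = Σ m_i·r_i². The masses, distances and starting spin rate are made-up values chosen only for illustration, and the function name is mine.

```python
def moment_of_inertia(masses, radii):
    """I = sum of m_i * r_i**2 for point masses at distance r_i from the axis."""
    return sum(m * r**2 for m, r in zip(masses, radii))

# Hypothetical skater: a 50 kg trunk close to the axis plus two 5 kg "arms".
masses = [50.0, 5.0, 5.0]      # kg (assumed)
arms_out = [0.1, 0.8, 0.8]     # distances from the axis in metres (assumed)
arms_in = [0.1, 0.2, 0.2]      # same masses, arms drawn in (assumed)

I_out = moment_of_inertia(masses, arms_out)
I_in = moment_of_inertia(masses, arms_in)

omega_out = 2.0                # initial angular velocity in rad/s (assumed)
L = I_out * omega_out          # angular momentum, conserved without external torque
omega_in = L / I_in            # angular velocity after drawing the arms in

print(f"I:     {I_out:.2f} -> {I_in:.2f} kg m^2")
print(f"omega: {omega_out:.2f} -> {omega_in:.2f} rad/s")
```

Because the distances enter squared, pulling two light arms in from 0.8 m to 0.2 m is enough to cut I by a large factor, and ω goes up by that same factor.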

Finally, we note that angular momentum and linear momentum can be easily related through the following equation:

That’s all kids’ stuff. To understand gyroscopes, we’ll have to go beyond that and do some vector analysis. In the previous post, we explained that rotational motion is usually analyzed in terms of torques rather than forces, and we detailed the relations between force and torque. More specifically, we introduced a torque vector τ with the following components:

τ = (τ_yz, τ_zx, τ_xy) = (τ_x, τ_y, τ_z) with

τ_x = τ_yz = yF_z – zF_y

τ_y = τ_zx = zF_x – xF_z

τ_z = τ_xy = xF_y – yF_x.

We also noted that this torque vector could be written as a cross product of a radius vector and the force: τ = r×F. Finally, we also pointed out the relation between the x-, y- and z-components of the torque vector and the plane of rotation:

(1) τ_x = τ_yz is rotational motion about the x-axis (i.e. motion in the yz-plane)

(2) τ_y = τ_zx is rotational motion about the y-axis (i.e. motion in the zx-plane)

(3) τ_z = τ_xy is rotational motion about the z-axis (i.e. motion in the xy-plane)

For rotation about a fixed axis, the angular momentum vector L lies along the same axis as the torque vector, but it’s the cross product of the radius vector and the momentum vector: L = r×p. For clarity, I reproduce the animation I used in my previous post once again.

How do we get that vector cross product for L? We noted that τ (i.e. the Greek tau) = dL/dt. So we need to take the time derivative of all three components of L. What are the components of L? They look very similar to those of τ:

L = (L_yz, L_zx, L_xy) = (L_x, L_y, L_z) with

L_x = L_yz = yp_z – zp_y

L_y = L_zx = zp_x – xp_z

L_z = L_xy = xp_y – yp_x.

Now, just check the time derivatives of L_x, L_y, and L_z and you’ll find the components of the torque vector τ. Together with the formulas above, that should be sufficient to convince you that L is, indeed, a vector cross product of r and p: L = r×p.
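If you’d rather let a computer do the checking, here is a small numerical sketch. It takes one particle at one instant, lets it move for a tiny time step, and compares the change in L = r×p per unit time with r×F. The mass, position, momentum and force are arbitrary made-up numbers, and the helper function is my own.

```python
def cross(a, b):
    """Component formula for the cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

m = 2.0                    # mass (assumed)
r = (1.0, 2.0, 3.0)        # position (assumed)
p = (0.5, -1.0, 2.0)       # momentum (assumed)
F = (3.0, 0.0, -1.0)       # force (assumed)
dt = 1e-6                  # tiny time step

L_now = cross(r, p)

# One small step: dr = v*dt = (p/m)*dt and dp = F*dt.
r_next = tuple(ri + pi / m * dt for ri, pi in zip(r, p))
p_next = tuple(pi + Fi * dt for pi, Fi in zip(p, F))
L_next = cross(r_next, p_next)

dL_dt = tuple((after - before) / dt for before, after in zip(L_now, L_next))
torque = cross(r, F)

print("dL/dt :", dL_dt)
print("r x F :", torque)
```

The two printed vectors agree up to an error of the order of dt. The reason is that the extra term in the derivative, v×p, vanishes because v and p are parallel.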

Again, if you feel this is too difficult, please read or re-read my previous post. But if you do understand everything, then you are ready for a much more difficult analysis, and that’s an explanation of why a spinning top does not fall as it rotates about.

In order to understand that explanation, we’ll first analyze the situation below. It resembles the experiment with the swivel chair that’s often described on ‘easy physics’ websites: the man below holds a spinning wheel with its axis horizontal, and then turns this axis into the vertical. As a result, the man starts to turn himself in the opposite direction.

Let’s now look at the forces and torques involved. These are shown below.

This looks very complicated, you’ll say! You’re right: it is quite complicated–but not impossible to understand. First note the vectors involved in the starting position: we have an angular momentum vector L₀ and an angular velocity vector ω₀. These are both axial vectors, as I explained in my previous post: their direction is perpendicular to the plane of motion, i.e. they are arrows along the axis of rotation. This is in line with what we wrote above: if an object is rotating in the zx-plane (which is the case here), then the angular momentum vector will have a y-component only, and so it will be directed along the y-axis. Which side? That’s determined by the right-hand screw rule. [Again, please do read my previous post for more details if you need them.]

So now we have explained L₀ and ω₀. What about all the other vectors? First note that there would be no torque if the man did not try to turn the axis. In that case, the angular momentum would just remain what it is, i.e. dL/dt = 0. Indeed, remember that τ = dL/dt, just like F = dp/dt, so if dL/dt = 0, then τ = 0. But the man is turning the axis of rotation and, hence, τ = dL/dt ≠ 0. What’s changing here is not the magnitude of the angular momentum but its direction. As usual, the analysis is in terms of differentials.

As the man turns the spinning wheel, the directional change of the angular momentum is defined by the angle Δθ, and we get a new angular momentum vector L₁. The difference between L₁ and L₀ is given by the vector ΔL. This ΔL vector is a tiny vector in the L₀L₁ plane and, because we’re looking at a differential displacement only, we can say that, for all practical purposes, this ΔL is orthogonal to L₀ (as we move from L₀ to L₁, we’re actually moving along an arc and, hence, ΔL is a tangential vector). Therefore, simple trigonometry allows us to say that its magnitude ΔL will be equal to L₀Δθ. [We should actually write sin(Δθ) but, because we’re talking differentials and measuring angles in radians (so the value reflects arc lengths), we can equate sin(Δθ) with Δθ.]

Now, the torque vector τ has the same direction as the ΔL vector (that’s obvious from their definitions), but what is its magnitude? That’s an easy question to answer: τ = ΔL/Δt = L₀Δθ/Δt = L₀(Δθ/Δt). Now, this result induces us to define another axial vector which we’ll denote using the same Greek letter omega, but written as a capital letter instead of in lowercase: Ω. The direction of Ω is determined by using that right-hand screw rule which we’ve always been using, and Ω’s magnitude is equal to Ω = Δθ/Δt. So, in short, Ω is an angular velocity vector just like ω: its magnitude is the speed with which the man is turning the axis of rotation of the spinning wheel, and its direction is determined using the same rules. If we do that, we get the rather remarkable result that we can write the torque vector τ as the cross product of Ω and L₀:

τ = Ω×L₀

Now, this is not an obvious result, so you should check it yourself. When doing that, you’ll note that the two vectors are orthogonal and so we have τ = Ω×L₀ = |Ω||L₀|sin(π/2)n = ΩL₀n with n the normal unit vector given, once again, by the right-hand screw rule. [Note how the order of the two factors in a cross product matters: a×b = –b×a.]
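One way to do that check is numerically. The sketch below puts L₀ along the y-axis, turns it through a tiny angle about the x-axis (so the axis starts tipping towards the vertical), and compares ΔL/Δt with Ω×L₀. The magnitudes of L₀ and Ω are made up, the choice of the x-axis as the turning axis is my own assumption about the geometry, and the helper functions are mine.

```python
import math

def cross(a, b):
    """Component formula for the cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def rotate_about_x(v, angle):
    """Rotate vector v by `angle` radians about the x-axis (right-hand rule)."""
    x, y, z = v
    c, s = math.cos(angle), math.sin(angle)
    return (x, c * y - s * z, s * y + c * z)

L0 = (0.0, 5.0, 0.0)       # angular momentum along the y-axis (assumed magnitude)
Omega = (0.3, 0.0, 0.0)    # the axis is being turned about x at 0.3 rad/s (assumed)
dt = 1e-6

L1 = rotate_about_x(L0, Omega[0] * dt)          # L after a tiny turn
dL_dt = tuple((b - a) / dt for a, b in zip(L0, L1))
torque = cross(Omega, L0)

print("dL/dt      :", dL_dt)
print("Omega x L0 :", torque)
```

Both vectors point along the z-axis with magnitude ΩL₀ = 1.5, which is the τ = ΩL₀n result above. Swapping the factors, i.e. computing L₀×Ω, flips the sign.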

You’re probably tired of this already, and so you’ll say: so what?

Well… We have a torque. A torque is produced by forces, and a torque vector along the z-axis is associated with rotation about the z-axis, i.e. rotation in the xy-plane. Such rotation is caused by the forces F and –F that produce the torque, as shown in the illustration. [Again, their direction is determined by the right-hand screw rule – but I’ll stop repeating that from now on.] But… Wait a minute. First, the direction is wrong, isn’t it? The man turns the other way in reality. And, second, where do these forces come from? Well… The man produces them, and the direction of the forces is not wrong: as the man applies these forces, with his hands, as he holds the spinning wheel and turns it into the vertical direction, equal and opposite forces act on him (cf. the action-reaction principle), and so he starts to turn in the opposite direction.

So there we are: we have explained this complex situation fully in terms of torques and forces now. So that’s good. [If you don’t believe the thing about those forces, just take one of the wheels off your mountain bike, let it spin, and try to change the plane in which it is spinning: you’ll see you’ll need a bit of force. Not much, but enough, and it’s exactly the kind of force that the man in the illustration is experiencing.]

Now, what if we were not holding the spinning wheel? What if we let it pivot, for example? Well… It would just pivot, as shown below.

But… Why doesn’t it fall? Hah! There we are! Now we are finally ready for the analysis we really want to do, i.e. explaining why these spinning tops (or gyros as they’re referred to in physics) don’t fall.

Such a spinning top is shown in the illustration below. It’s similar to the spinning wheel: there’s a rotational axis, and we have the force of gravity trying to change the direction of that axis, so it’s like the man turning that spinning wheel indeed, but now it’s gravity exerting the force that’s needed to change the angular momentum. Let’s associate the vertical direction with the z-axis, and the horizontal plane with the xy-plane, and let’s go step-by-step:

The gravitational force wants to pull that spinning top down. So the ΔL vector points downward this time, not upward. Hence, the torque vector will point downward too. But so it’s a torque pointing along the z-axis.
Such torque along the z-axis is associated with a rotation in the xy-plane, so that’s why the spinning top will slowly revolve about the z-axis, parallel to the xy-plane. This process is referred to as precession, and so there’s a precession torque and a precession angular velocity.

So that explains precession and so that’s all there is to it. Now you’ll complain, and rightly so: what I write above, does not explain why the spinning top does not actually fall. I only explained that precession movement. So what’s going on? That spinning top should fall as it precesses, shouldn’t it?

It actually does fall. The point to note, however, is that the precession movement itself changes the direction of the angular momentum vector as well. So we have a new ΔL vector pointing sideways, i.e. a vector in the horizontal plane–so not along the z axis. Hence, we should have a torque in the horizontal plane, and so that implies that we should have two equal and opposite forces acting along the z-axis.

In fact, the right-hand screw rule gives us the direction of those forces: if these forces were effectively applied to the spinning top, it would fall even faster! However, the point to note is that there are no such forces. Indeed, it is not like the man with the spinning wheel: no one (or nothing) is pushing or applying the forces that should produce the torque associated with this change in angular momentum. Hence, because these forces are absent, the spinning top begins to ‘fall’ in the opposite direction of the lacking force, thereby counteracting the gravitational force in such a way that the spinning top just spins about the z-axis without actually falling.

Now, this is, most probably, very difficult to understand in the way you would like to understand it, so just let it sink in and think about it for a while. In this regard, and to help the understanding, it’s probably worth noting that the actual process of reaching equilibrium is somewhat messy. It is illustrated below: if we hold a spinning gyro for a while and then, suddenly, we let it fall (yes, just let it go), it will actually fall. However, as it’s falling, it also starts turning and then, because it starts turning, it also starts ‘falling’ upwards, as explained in that story of the ‘missing force’ above. Initially, the upward movement will overshoot the equilibrium position, thereby slowing the gyro’s speed in the horizontal plane. And so then, because its horizontal speed becomes smaller, it stops ‘falling upward’, and so that means it’s falling down again. But then it starts turning again, and so on and so on. I hope you grasp this–more or less at least. Note that frictional effects will cause the up-and-down movement to dampen out, and so we get a so-called cycloidal motion dampening down to the steady motion we associate with spinning tops and gyros.

That, then, is the ‘miracle’ of a spinning top explained. Is it less of a ‘miracle’ now that we have explained it in terms of torques and missing forces? That’s an appreciation which each of us has to make for him- or herself. I actually find it all even more wonderful now that I can explain it more or less using the kind of math I used above–but then you may have a different opinion.

In any case, let us – to wrap it all up – ask some simple questions about some other spinning objects. What about the Earth for example? It has an axis of rotation too, and it revolves around the Sun. Is there anything like precession going on?

The first answer is: no, not really. The axis of rotation of the Earth changes little with respect to the stars. Indeed, why would it change? Changing it would require a torque, and where would the required force for such torque come from? The Earth is not like a gyro on a pivot being pulled down by some force we cannot see. The Sun attracts the Earth as a whole indeed. It does not change its axis of rotation. That’s why we have a fairly regular day and night cycle.

The more precise answer is: yes, there actually is a very slow axial precession. The whole precessional cycle takes approximately 26,000 years, and it causes the position of stars – as perceived by us, earthlings, that is – to slowly change. Over this cycle, the Earth’s north axial pole moves from where it is now, in a circle with an angular radius of about 23.5 degrees, as illustrated below.
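To get a feel for how slow that is, here is a back-of-the-envelope calculation. It uses only the round figure of 26,000 years quoted above, so the result is an approximation; the variable names are mine.

```python
period_years = 26_000                  # length of one precessional cycle (round figure)
arcsec_per_circle = 360 * 3600         # one full circle expressed in arcseconds

rate_arcsec_per_year = arcsec_per_circle / period_years
years_per_degree = period_years / 360

print(f"about {rate_arcsec_per_year:.1f} arcseconds per year")
print(f"about {years_per_degree:.0f} years per degree")
```

So the shift is roughly one degree in a human lifetime, which is why it takes long and careful records of star positions to notice it.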

What is this precession caused by? There must be some torque. There is. The Earth is not perfectly spherical: it bulges outward at the equator, and the gravitational tidal forces of the Moon and Sun apply some torque here, attempting to pull the equatorial bulge into the plane of the ecliptic, but instead causing it to precess. So it’s quite a subtle motion, but it’s there, and it also has something to do with the gravitational force. The Earth has no pivot, of course, but the principle is the same as for a spinning top: a torque acting on a spinning body makes its axis precess rather than tilt. [The most amazing thing about this, in my opinion, is that, despite the fact that the precessional movement is so tiny, the Greeks had already discovered it: indeed, the Greek astronomer and mathematician Hipparchus of Nicaea gave a pretty precise figure for this so-called ‘precession of the equinoxes’ in 127 BC.]

What about electrons? Are they like gyros rotating around some pivot? Here the answer is very simple and very straightforward: No, not at all! First, there are no pivots in an atom. Second, the current understanding of an electron – i.e. the quantum-mechanical understanding of an electron – is not compatible with the classical notion of spin. Let me just copy an explanation from Georgia State University’s HyperPhysics website. It basically says it all:

“Experimental evidence like the hydrogen fine structure and the Stern-Gerlach experiment suggest that an electron has an intrinsic angular momentum, independent of its orbital angular momentum. These experiments suggest just two possible states for this angular momentum, and following the pattern of quantized angular momentum, this requires an angular momentum quantum number of 1/2. With this evidence, we say that the electron has spin 1/2. An angular momentum and a magnetic moment could indeed arise from a spinning sphere of charge, but this classical picture cannot fit the size or quantized nature of the electron spin. The property called electron spin must be considered to be a quantum concept without detailed classical analogy.”

So… I guess this should conclude my exposé on rotational motion. I am not sure what I am going to write about next, but I’ll see. 🙂

Post scriptum:

The above treatment is largely based on Feynman’s Lectures (Vol. I, Chapters 18, 19 and 20). The subject could also be discussed using the concept of a force couple, aka pure moment. A force couple is a system of forces with a resultant moment but no resultant force. Hence, it causes rotation without translation or, more generally, without any acceleration of the centre of mass. In such analysis, we can say that gravity produces a force couple on the spinning top. The two forces of this couple are equal and opposite, and they pull at opposite ends. However, because one end of the top is fixed (friction forces keep the tip fixed to the ground), the force at the other end makes the top go about the vertical axis.

The situation we have is that gravity causes such a force couple to appear, just like the man tilting the spinning wheel causes such a force couple to appear. Now, the analysis above shows that the direction of the new force is perpendicular to the plane in which the axis of rotation changes, or wants to change in the case of our spinning top. So gravity wants to pull the top down and causes it to move sideways. This horizontal movement will, in turn, create another force couple. The direction of the resultant force, at the free end of the axis of rotation of the top, will, once again, be vertical, but it will oppose the gravity force. So, in a very simplified explanation of things, we could say:

Gravity pulls the top downwards, and causes a force that will make the top move sideways. So the new force, which causes the precession movement, is orthogonal to the gravitation force, i.e. it’s a horizontal force.
That horizontal force will, in turn, cause another force to appear. That force will also be orthogonal to the horizontal force. As we made two 90 degrees turns, so to say, i.e. 180 degrees in total, it means that this third force will be opposite to the gravitational force.
In equilibrium, we have three forces: gravity, the force causing the precession and, finally, a force neutralizing gravity as the spinning top precesses about the vertical axis.

This approach allows for a treatment that is somewhat more intuitive than Feynman’s concept of the ‘missing force.’

Some content on this page was disabled on June 16, 2020 as a result of a DMCA takedown notice from The California Institute of Technology. You can learn more about the DMCA here:

https://wordpress.com/support/copyright-and-the-dmca/

Spinning: the essentials

Pre-scriptum (dated 26 June 2020): These posts on elementary math and physics have not suffered much (if at all) from the attack by the dark force—which is good because I still like them. While my views on the true nature of light, matter and the force or forces that act on them have evolved significantly as part of my explorations of a more realist (classical) explanation of quantum mechanics, I think most (if not all) of the analysis in this post remains valid and fun to read. In fact, I find the simplest stuff is often the best. 🙂

Original post:

When introducing mirror symmetry (P-symmetry) in one of my older posts (time reversal and CPT-symmetry), I also introduced the concept of axial and polar vectors in physics. Axial vectors have to do with rotations, or spinning objects. Because spin – i.e. turning motion – is such an important concept in physics, I’d suggest we re-visit the topic here.

Of course, I should be clear from the outset that the discussion below is entirely classical. Indeed, as Wikipedia puts it: “The intrinsic spin of elementary particles (such as electrons) is quantum-mechanical phenomenon that does not have a counterpart in classical mechanics, despite the term spin being reminiscent of classical phenomena such as a planet spinning on its axis.” Nevertheless, if we don’t understand what spin is in the classical world – i.e. our world for all practical purposes – then we won’t get even near to appreciating what it might be in the quantum-mechanical world. Besides, it’s just plain fun: I am sure you have played, as a kid or even as an adult, with one of those magical spinning tops or toy gyroscopes and so you probably wonder how it really works in physics. So that’s what this post is all about.

The essential concept is the concept of torque. For rotations in space (i.e. rotational motion), the torque is what the force is for linear motion:

It’s the torque (τ) that makes an object spin faster or slower, just like the force would accelerate or decelerate that very same object when it is moving along some curve (as opposed to spinning around some axis).
There’s also a similar ‘law of Newton’ for torque: you’ll remember that the force equals the time rate-of-change of a vector quantity referred to as (linear) momentum: F = dp/dt = d(mv)/dt = ma (the mass times the acceleration). Likewise, we have a vector quantity that is referred to as angular momentum (L), and we can write: τ (i.e. the Greek tau) = dL/dt.
Finally, instead of linear velocity, we’ll have an angular velocity ω (omega), which is the time rate-of-change of the angle θ defining how far the object has gone around (as opposed to the distance in linear dynamics, describing how far the object has gone along). So we have ω = dθ/dt. This is actually easy to visualize because we know that θ is the length of the corresponding arc on the unit circle. Hence, the equivalence with the linear distance traveled is easily ascertained.

There are numerous other equivalences. For example, we also have an angular acceleration: α = dω/dt = d²θ/dt²; and we should also note that, just like the force, the torque is doing work – in its conventional definition as used in physics – as it turns an object:

ΔW = τ·Δθ

However, we also need to point out the differences. The animation below does that very well, as it relates the ‘new’ concepts – i.e. torque and angular momentum – to the ‘old’ concepts – i.e. force and linear momentum.

So what do we have here? We have vector quantities once again, denoted by symbols in bold-face. However, τ, L and ω are special vectors: axial vectors indeed, as opposed to the polar vectors F, p and v. Axial vectors are directed along the axis of spin – so that is, strangely enough, at right angles to the direction of spin, or perpendicular to the ‘plane of the twist’ as Feynman calls it – and the direction of the axial vector is determined by the direction of spin through one of two conventions: the ‘right-hand screw rule’ or the ‘left-hand screw rule’. Physicists have settled on the former.

If you feel very confused now (I did when I first looked at it), just step back and go through the full argument as I develop it here. It helps to think of torque (also known, for some obscure reason, as the moment of the force) as a twist on an object or a plane indeed: the torque’s magnitude is equal to the tangential component of the force, i.e. F·sin(Δθ), times the distance between the object and the axis of rotation (we’ll denote this distance by r). This quantity is also equal to the product of the magnitude of the force itself and the length of the so-called lever arm, i.e. the perpendicular distance from the axis to the line of action of the force (this lever arm length is denoted by r₀). So we can write τ as:

The product of the tangential component of the force times the distance r: τ = r·F_t = r·F·sin(Δθ)
The product of the length of the lever arm times the force: τ = r₀·F
The torque is the work done per unit of angle turned: τ = ΔW/Δθ or τ = dW/dθ in the limit.
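The three expressions above should, of course, give the same number. Here is a quick sketch that checks this for one made-up case: a 10 N force applied 0.5 m from the axis, at 30 degrees to the radius. All values and variable names are my own assumptions.

```python
import math

F = 10.0                        # magnitude of the force in newtons (assumed)
r = 0.5                         # distance from the axis in metres (assumed)
angle = math.radians(30.0)      # angle between the radius and the force (assumed)

# (1) Tangential component of the force times the distance r.
F_t = F * math.sin(angle)
tau_tangential = r * F_t

# (2) Full force times the lever arm r0, the perpendicular distance
#     from the axis to the line of action of the force.
r0 = r * math.sin(angle)
tau_lever = r0 * F

# (3) Work per unit of angle: over a small angle dtheta the point moves
#     a distance r*dtheta along its arc, and only F_t does work on it.
dtheta = 0.01
dW = F_t * (r * dtheta)
tau_work = dW / dtheta

print(tau_tangential, tau_lever, tau_work)
```

All three come out at 2.5 newton-metres (up to rounding), as they should.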

So… These are actually only the basics, which you should remember from your high-school physics course. If not, have another look at it. We now need to go from scalar quantities to vector quantities to understand that animation above. Torque is not a vector like force or velocity, not a priori at least. However, we can associate torque with a vector of a special type, an axial vector. Feynman calls vectors such as force or (linear) velocity ‘honest’ or ‘real’ vectors. The mathematically correct term for such ‘honest’ or ‘real’ vectors is polar vector. Hence, axial vectors are not ‘honest’ or ‘real’ in some sense: we derive them from the polar vectors. They are, in effect, a so-called cross product of two ‘honest’ vectors. Here we need to explain the difference between a dot and a cross product between two vectors once again:

(1) A dot product, which we denoted by a little dot (·), yields a scalar quantity: a·b = |a||b|cosα = a·b·cosα with α the angle between the two vectors a and b. Note that the dot product of two orthogonal vectors is equal to zero, so take care: τ = r·F_t = r·F·sin(Δθ) is not a dot product of two vectors. It’s a simple product of two scalar quantities: we only use the dot as a mark of separation, which may be quite confusing. In fact, some authors use ∗ for a product of scalars to avoid confusion: that’s not a bad idea, but it’s not a convention as yet. Omitting the dot when multiplying scalars (as I do when I write |a||b|cosα) is also possible, but it makes it a bit difficult to read formulas I find. Also note, once again, how important the difference between bold-face and normal type is in formulas like this: it distinguishes vectors from scalars – and these are two very different things indeed.

(2) A cross product, which we denote by using a cross (×), yields another vector: τ = r×F =|r|·|F|·sinα·n = r·F·sinα·n with n the normal unit vector given by the right-hand rule. Note how a cross product involves a sine, not a cosine – as opposed to a dot product. Hence, if r and F are orthogonal vectors (which is not unlikely), then this sine term will be equal to 1. If the two vectors are not perpendicular to each other, then the sine function will assure that we use the tangential component of the force.

But, again, how do we go from torque as a scalar quantity (τ = r·F_t) to the vector τ = r×F? Well… Let’s suppose, first, that, in our (inertial) frame of reference, we have some object spinning around the z-axis only. In other words, it spins in the xy-plane only. So we have a torque around (or about) the z-axis, i.e. in the xy-plane. The work that will be done by this torque can be written as:

ΔW = F_xΔx + F_yΔy = (xF_y – yF_x)Δθ

Huh? Yes. This results from a simple two-dimensional analysis of what’s going on in the xy-plane: the force has an x- and a y-component, and the distance traveled in the x- and y-direction is Δx = –yΔθ and Δy = xΔθ respectively. I won’t go into the details of this (you can easily find these elsewhere) but just note the minus sign for Δx and the way the x and y get switched in the expressions.
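If you don’t want to take those expressions for Δx and Δy on faith, you can check them numerically. The sketch below rotates a point in the xy-plane through a tiny angle and compares the work done by the force with (xF_y – yF_x)Δθ. The coordinates and force components are arbitrary made-up numbers.

```python
import math

x, y = 2.0, 1.0            # position of the particle in the xy-plane (assumed)
Fx, Fy = 3.0, 4.0          # components of the force (assumed)
dtheta = 1e-6              # tiny rotation angle in radians

# Rotate the point about the z-axis by dtheta.
c, s = math.cos(dtheta), math.sin(dtheta)
x_new = c * x - s * y
y_new = s * x + c * y

dx = x_new - x             # should be close to -y * dtheta
dy = y_new - y             # should be close to  x * dtheta

work = Fx * dx + Fy * dy
torque_xy = x * Fy - y * Fx

print("dx/dtheta:", dx / dtheta, " expected:", -y)
print("dy/dtheta:", dy / dtheta, " expected:", x)
print("work/dtheta:", work / dtheta, " x*Fy - y*Fx:", torque_xy)
```

The agreement is only approximate, with an error of the order of Δθ, which is exactly what we expect from an argument in terms of differentials.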

So the torque in the xy-plane is given by τ_xy = ΔW/Δθ = xF_y – yF_x. Likewise, if the object would be spinning about the x-axis – or, what amounts to the same, in the yz-plane – we’d get τ_yz = yF_z – zF_y. Finally, for some object spinning about the y-axis (i.e. in the zx-plane – and please note I write zx, not xz, so as to be consistent as we switch the order of the x, y and z coordinates in the formulas), then we’d get τ_zx = zF_x – xF_z. Now we can appreciate the fact that a torque in some other plane, at some angle with our Cartesian planes, would be some combination of these three torques, so we’d write:

(1) τ_xy = xF_y – yF_x

(2) τ_yz = yF_z – zF_y and

(3) τ_zx = zF_x – xF_z.

Another observer with his Cartesian x’, y’ and z’ axes in some other direction (we’re not talking some observer moving away from us but, quite simply, a reference frame that’s being rotated itself around some axis not necessarily coinciding with any of the x-, y-, z- or x’-, y’- and z’-axes mentioned above) would find other values as he calculates these torques, but the formulas would look the same:

(1’) τ_x’y’ = x’F_y’ – y’F_x’

(2’) τ_y’z’ = y’F_z’ – z’F_y’ and

(3’) τ_z’x’ = z’F_x’ – x’F_z’.

Now, of course, there must be some ‘nice’ relationship that expresses the τ_x’y’, τ_y’z’ and τ_z’x’ values in terms of τ_xy, τ_yz and τ_zx, just like there was some ‘nice’ relationship between the x’, y’ and z’ components of a vector in one coordinate system (the x’, y’ and z’ coordinate system) and the x, y, z components of that same vector in the x, y and z coordinate system. Now, I won’t go into the details but that ‘nice’ relationship is, in fact, given by transformation expressions involving a rotation matrix. I won’t write that one down here, because it looks pretty formidable, but just google ‘axis-angle representation of a rotation’ and you’ll get all the details you want.

The point to note is that, in both sets of equations above, we have an x-, y- and z-component of some mathematical vector that transform just like a ‘real’ vector. Now, if it behaves like a vector, we’ll just call it a vector, and that’s how, in essence, we define torque, angular momentum (and angular velocity too) as axial vectors. We should note how it works exactly though:

(1) τ_xy and τ_x’y’ will transform like the z-component of a vector (note that we were talking rotational motion about the z-axis when introducing this quantity);

(2) τ_yz and τ_y’z’ will transform like the x-component of a vector (note that we were talking rotational motion about the x-axis when introducing this quantity);

(3) τ_zx and τ_z’x’ will transform like the y-component of a vector (note that we were talking rotational motion about the y-axis when introducing this quantity). So we have

τ = (τ_yz, τ_zx, τ_xy) = (τ_x, τ_y, τ_z) with

τ_x = τ_yz = yF_z – zF_y

τ_y = τ_zx = zF_x – xF_z

τ_z = τ_xy = xF_y – yF_x.

[This may look very difficult to remember but just look at the order: all we do is respect the cyclic order x, y, z, x, y, z, x, etc. when jotting down the x, y and z subscripts.]
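The claim that these three numbers transform like the components of a vector can be checked numerically. The sketch below takes some r and F, computes τ in the original frame, and then gets the torque in a rotated frame in two ways: by applying the same formulas to the primed components of r and F, and by simply rotating τ itself. The rotated frame (a turn about the z-axis followed by a turn about the new x-axis), the angles and the vectors are arbitrary choices of mine.

```python
import math

def cross(a, b):
    """tau-style component formula: (yF_z - zF_y, zF_x - xF_z, xF_y - yF_x)."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def frame_rot_z(v, phi):
    """Components of v in a frame rotated by phi about the z-axis."""
    x, y, z = v
    c, s = math.cos(phi), math.sin(phi)
    return (c * x + s * y, -s * x + c * y, z)

def frame_rot_x(v, phi):
    """Components of v in a frame rotated by phi about the x-axis."""
    x, y, z = v
    c, s = math.cos(phi), math.sin(phi)
    return (x, c * y + s * z, -s * y + c * z)

def to_primed(v):
    """Hypothetical primed frame: turn 0.7 rad about z, then -1.2 rad about x."""
    return frame_rot_x(frame_rot_z(v, 0.7), -1.2)

r = (1.0, 2.0, 3.0)        # position (assumed)
F = (3.0, 0.0, -1.0)       # force (assumed)

tau = cross(r, F)
tau_from_primed_formulas = cross(to_primed(r), to_primed(F))
tau_rotated_as_vector = to_primed(tau)

print(tau_from_primed_formulas)
print(tau_rotated_as_vector)
```

The two printed triples agree to within rounding error, so (τ_yz, τ_zx, τ_xy) does indeed behave like the (x, y, z) components of a vector under rotations of the frame.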

Now we are, finally, well equipped to once again look at that vector representation of rotation. I reproduce it once again below so you don’t have to scroll back to that animation:

We have rotation in the zx-plane here (i.e. rotation about the y-axis) driven by an oscillating force F, and so, yes, we can see that the torque vector oscillates along the y-axis only: its x- and z-components are zero. We also have L here, the angular momentum. That’s a vector quantity as well. We can write it as

L = (L_yz, L_zx, L_xy) = (L_x, L_y, L_z) with

L_x = L_yz = yp_z – zp_y (i.e. the angular momentum about the x-axis)

L_y = L_zx = zp_x – xp_z (i.e. the angular momentum about the y-axis)

L_z = L_xy = xp_y – yp_x (i.e. the angular momentum about the z-axis),

And we note, once again, that only the y-component is non-zero in this case, because the rotation is about the y-axis.

We should now remember the rules for a cross product. Above, we wrote that τ = r×F = |r|·|F|·sinα·n = r·F·sinα·n, with n the normal unit vector given by the right-hand rule. However, a vector product can also be written in terms of its components: c = a×b if and only if

c_x = a_yb_z – a_zb_y,

c_y = a_zb_x – a_xb_z, and

c_z = a_xb_y – a_yb_x.

Again, if this looks difficult, remember the trick above: respect the clockwise order when jotting down the x, y and z subscripts. I’ll leave it to you to work out r×F and r×p in terms of components but, when you write it all out, you’ll see it corresponds to the formulas above. In addition, I will also leave it to you to show that the velocity of some particle in a rotating body can be given by a similar vector product: v = ω×r, with ω being defined as another axial vector (aka pseudovector) pointing along the direction of the axis of rotation, i.e. not in the direction of motion. [Is that strange? No. As it’s rotational motion, there is no ‘direction of motion’ really: the object, or any particle in that object, goes round and round and round indeed and, hence, defining some normal vector using the right-hand rule to denote angular velocity makes a lot of sense.]
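If you would rather check the component formulas numerically than by hand, here is a minimal Python sketch. The function name cross and the numbers chosen for r, F and ω are my own illustrative assumptions (they are not taken from the animation); the formulas are just the three component expressions for c = a×b.

```python
def cross(a, b):
    """Return c = a × b, written out component by component."""
    ax, ay, az = a
    bx, by, bz = b
    return (
        ay * bz - az * by,  # c_x
        az * bx - ax * bz,  # c_y
        ax * by - ay * bx,  # c_z
    )


# Assumed example: position and force both lie in the zx-plane.
r = (2.0, 0.0, 1.0)    # (x, y, z)
F = (0.5, 0.0, -3.0)   # (F_x, F_y, F_z)
tau = cross(r, F)      # torque τ = r × F
print(tau)             # only the y-component is non-zero

# Assumed example: rotation about the z-axis, particle on the x-axis.
omega = (0.0, 0.0, 2.0)
position = (1.0, 0.0, 0.0)
v = cross(omega, position)  # velocity v = ω × r
print(v)                    # points along y, i.e. tangent to the circle
```

Because r and F are both in the zx-plane, the torque only has a y-component, which is what the discussion of rotation about the y-axis says it should be.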

I could continue to write and write and write, but I need to stop here. Indeed, I actually wanted to tell you how gyroscopes work, but I notice that this introduction has already taken several pages. Hence, I’ll leave the gyroscope for a separate post. So, be warned, you’ll need to read and understand this one before reading my next one.

On (special) relativity: what’s relative?

Pre-scriptum (dated 26 June 2020): These posts on elementary math and physics have not suffered much the attack by the dark force—which is good because I still like them. While my views on the true nature of light, matter and the force or forces that act on them have evolved significantly as part of my explorations of a more realist (classical) explanation of quantum mechanics, I think most (if not all) of the analysis in this post remains valid and fun to read. In fact, I find the simplest stuff is often the best. 🙂

Original post:

This is my third and final post about special relativity. In the previous posts, I introduced the general idea and the Lorentz transformations. I present these Lorentz transformations once again below, next to their Galilean counterparts. [Note that I continue to assume, for simplicity, that the two reference frames move with respect to each other along the x-axis only, so the y- and z-components of u are zero. It is not all that difficult to generalize to three dimensions (especially not when using vectors) but it makes an intuitive understanding of what relativity is all about more difficult.]

Lorentz: x’ = (x – ut)/√(1 – u²/c²), y’ = y, z’ = z and t’ = (t – ux/c²)/√(1 – u²/c²)

Galilean: x’ = x – ut, y’ = y, z’ = z and t’ = t

As you can see, under a Lorentz transformation, the new ‘primed’ space and time coordinates are a mixture of the ‘unprimed’ ones. Indeed, the new x’ is a mixture of x and t, and the new t’ is a mixture as well. You don’t have that under a Galilean transformation: in the Newtonian world, space and time are neatly separated, and time is absolute, i.e. it is the same regardless of the reference frame. In Einstein’s world – our world – that’s not the case: time is relative, or local as Hendrik Lorentz termed it, and so it’s space-time – i.e. ‘some kind of union of space and time’ as Minkowski termed it – that transforms. In practice, physicists will use so-called four-vectors, i.e. vectors with four coordinates, to keep track of things. These four-vectors incorporate both the three-dimensional space vector as well as the time dimension. However, we won’t go into the mathematical details of that here.

What else is relative? Everything, except the speed of light. Of course, velocity is relative, just like in the Newtonian world, but the equation to go from a velocity as measured in one reference frame to a velocity as measured in the other, is different: it’s not a matter of just adding or subtracting speeds. In addition, besides time, mass becomes a relative concept as well in Einstein’s world, and that was definitely not the case in the Newtonian world.

What about energy? Well… We mentioned that velocities are relative in the Newtonian world as well, so momentum and kinetic energy were relative in that world as well: what you would measure for those two quantities would depend on your reference frame as well. However, here also, we get a different formula now. In addition, we have this weird equivalence between mass and energy in Einstein’s world, about which I should also say something more.

But let’s tackle these topics one by one. We’ll start with velocities.

Relativistic velocity

In the Newtonian world, it was easy. From the Galilean transformation equations above, it’s easy to see that

v’ = dx’/dt’ = d(x – ut)/dt = dx/dt – d(ut)/dt = v – u

So, in the Newtonian world, it’s just a matter of adding/subtracting speeds indeed: if my car goes 100 km/h (v), and yours goes 120 km/h (u), then you will see my car falling behind at a speed of (minus) 20 km/h. That’s it. In Einstein’s world, it is not so simple. Let’s take the spaceship example once again. So we have a man on the ground (the inertial or ‘unprimed’ reference frame) and a man in the spaceship (the primed reference frame), which is moving away from us with velocity u.

Now, suppose an object is moving inside the spaceship (along the x-axis as well) with a (uniform) velocity v_x’, as measured from the point of view of the man inside the spaceship. Then the displacement x’ will be equal to x’ = v_x’t’. To know how that looks from the man on the ground, we just need to use the opposite Lorentz transformations: just replace u by –u everywhere (to the man in the spaceship, it’s like the man on the ground moves away with velocity –u), and note that the Lorentz factor does not change because we’re squaring and (–u)² = u². So we get:

x = γ(x’ + ut’), y = y’, z = z’ and t = γ(t’ + ux’/c²), with γ = 1/√(1 – u²/c²)

Hence, x’ = v_x’t’ can be written as x = γ(v_x’t’ + ut’). Now we should also substitute t’, because we want to measure everything from the point of view of the man on the ground. Now, t = γ(t’ + uv_x’t’/c²). Because we’re talking uniform velocities, v_x (i.e. the velocity of the object as measured by the man on the ground) will be equal to x divided by t (so we don’t need to take the time derivative of x), and then, after some simplifying and re-arranging (note, for instance, how the t’ factor miraculously disappears), we get:

v_x = (v_x’ + u)/(1 + uv_x’/c²)

What does this rather complicated formula say? Just put in some numbers:

Suppose the object is moving at half the speed of light, so 0.5c, and that the spaceship is moving itself also at 0.5c, then we get the rather remarkable result that, from the point of view of the observer on the ground, that object is not going as fast as light, but only at v_x = (0.5c + 0.5c)/(1 + 0.5·0.5) = 0.8c.
Or suppose we’re looking at a light beam inside the spaceship, so something that’s traveling at speed c itself in the spaceship. How does that look to the man on the ground? Just put in the numbers: v_x = (0.5c + c)/(1 + 0.5·1) = c ! So the speed of light is not dependent on the reference frame: it looks the same – both to the man in the ship as well as to the man on the ground. As Feynman puts it: “This is good, for it is, in fact, what the Einstein theory of relativity was designed to do in the first place–so it had better work!”
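If you want to play with other numbers, the two calculations above are easy to reproduce. The sketch below is just the formula v_x = (v_x’ + u)/(1 + uv_x’/c²) in Python; the function name add_velocities is my own choice, and I assume all velocities are expressed as fractions of c (so c = 1 by default).

```python
def add_velocities(v_prime, u, c=1.0):
    """Velocity seen from the ground for an object moving at v_prime
    inside a ship that itself moves at u (both along the x-axis)."""
    return (v_prime + u) / (1.0 + u * v_prime / c**2)


# Object at 0.5c inside a ship moving at 0.5c.
print(add_velocities(0.5, 0.5))   # 0.8

# Light beam (speed c) inside a ship moving at 0.5c.
print(add_velocities(1.0, 0.5))   # 1.0

# Everyday speeds (as fractions of c): the result is practically v' + u.
print(add_velocities(1e-7, 1e-7))
```

The last line shows why the Newtonian rule of simply adding speeds works so well for cars: the correction term uv_x’/c² is negligible at such speeds.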

It’s interesting to note that, even if u has no y– or z-component, velocity in the y direction will be affected too. Indeed, if an object is moving upward in the spaceship, then the distance of travel of that object to the man on the ground will appear to be larger. See the triangle below: if that object travels a distance Δs’ = Δy’ = Δy = v’Δt’ with respect to the man in the spaceship, then it will have traveled a distance Δs = vΔt to the man on the ground, and that distance is longer.

I won’t go through the process of substituting and combining the Lorentz equations (you can do that yourself) but the grand result is the following:

v_y = (1/γ)v_y’

1/γ is the reciprocal of the Lorentz factor, and I’ll leave it to you to work out a few numeric examples. When you do that, you’ll find the rather remarkable result that v_y is actually less than v_y’. For example, for u = 0.6c, 1/γ will be equal to 0.8, so v_y will be 20% less than v_y’. How is that possible? The vertical distance is what it is (Δy’ = Δy), and that distance is not affected by the ‘length contraction’ effect (y’ = y). So how can the vertical velocity be smaller? The answer is easy to state, but not so easy to understand: it’s the time dilation effect: time in the spaceship goes slower. Hence, the object will cover the same vertical distance indeed – for both observers – but, from the point of view of the observer on the ground, the object will apparently need more time to cover that distance than the time measured by the man in the spaceship: Δt > Δt’. Hence, the logical conclusion is that the vertical velocity of that object will appear to be less to the observer on the ground.

How much less? The time dilation factor is the Lorentz factor. Hence, Δt = γΔt’. Now, if u = 0.6c, then γ will be equal to 1.25 and Δt = 1.25Δt’. Hence, if that object would need, say, one second to cover that vertical distance, then, from the point of view of the observer on the ground, it would need 1.25 seconds to cover the same distance. Hence, its speed as observed from the ground is indeed only 1/(5/4) = 4/5 = 0.8 of its speed as observed by the man in the spaceship.
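Here is a small Python sketch of that calculation, in case you want to try other values of u. The function names are my own, velocities are assumed to be fractions of c, and I assume (as in the text) that the object has no horizontal velocity inside the spaceship: the simple v_y = (1/γ)v_y’ formula only holds in that case.

```python
import math


def lorentz_factor(u, c=1.0):
    """γ = 1/√(1 – u²/c²) for a frame moving at speed u."""
    return 1.0 / math.sqrt(1.0 - (u / c) ** 2)


def vertical_velocity_from_ground(v_y_prime, u, c=1.0):
    """v_y = v_y'/γ, assuming the object moves purely vertically in the ship."""
    return v_y_prime / lorentz_factor(u, c)


u = 0.6                      # ship speed as a fraction of c
gamma = lorentz_factor(u)    # about 1.25
print(gamma)

# One second on the ship's clock takes γ seconds on the ground clock.
print(gamma * 1.0)

# A vertical velocity of 0.1c in the ship looks like about 0.08c from the ground.
print(vertical_velocity_from_ground(0.1, u))
```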

Is that hard to understand? Maybe. You have to think through it. One common mistake is that people think that length contraction and/or time dilation are, somehow, related to the fact that we are looking at things from a distance and that light needs time to reach us. Indeed, on the Web, you can find complicated calculations using the angle of view and/or the line of sight (and tons of trigonometric formulas) as, for example, shown in the drawing below. These have nothing to do with relativity theory and you’ll never get the Lorentz transformation out of them. They are plain nonsense: they are rooted in an inability of these youthful authors to go beyond Galilean relativity. Length contraction and/or time dilation are not some kind of visual trick or illusion. If you want to see how one can derive the Lorentz factor geometrically, you should look for a good description of the Michelson-Morley experiment in a good physics handbook such as, yes :-), Feynman’s Lectures.

So, I repeat: illustrations that try to explain length contraction and time dilation in terms of line of sight and/or angle of view are useless and will not help you to understand relativity. On the contrary, they will only confuse you. I will let you think through this and move on to the next topic.

Relativistic mass and relativistic momentum

Einstein actually stated two principles in his (special) relativity theory:

The first is the Principle of Relativity itself, which is basically just the same as Newton’s principle of relativity. So that was nothing new actually: “If a system of coordinates K is chosen such that, in relation to it, physical laws hold good in their simplest form, then the same laws must hold good in relation to any other system of coordinates K’ moving in uniform translation relatively to K.” Hence, Einstein did not change the principle of relativity – quite on the contrary: he re-confirmed it – but he did change Newton’s Laws, as well as the Galilean transformation equations that came with them. He also introduced a new ‘law’, which is stated in the second ‘principle’, and that is the more revolutionary one really:
The Principle of Invariant Light Speed: “Light is always propagated in empty space with a definite velocity [speed] c which is independent of the state of motion of the emitting body.”

As mentioned above, the most notable change in Newton’s Laws – the only change, in fact – is Einstein’s relativistic formula for mass:

m_v = γm₀

This formula implies that the inertia of an object, i.e. its mass, also depends on the reference frame of the observer. If the object moves (but velocity is relative as we know: an object will not be moving if we move with it), then its mass increases. This affects its momentum. As you may or may not remember, the momentum of an object is the product of its mass and its velocity. It’s a vector quantity and, hence, momentum has not only a magnitude but also a direction:

p_v = m_vv = γm₀v

As evidenced from the formula above, the momentum formula is a relativistic formula as well, as it’s dependent on the Lorentz factor too. So where do I want to go from here? Well… In this section (relativistic mass and momentum), I just want to show that Einstein’s mass formula is not some separate law or postulate: it just comes with the Lorentz transformation equations (and the above-mentioned consequences in terms of measuring horizontal and vertical velocities).

Indeed, Einstein’s relativistic mass formula can be derived from the momentum conservation principle, which is one of the ‘physical laws’ that Einstein refers to. Look at the elastic collision between two billiard balls below. These balls are equal – same mass and same speed from the point of view of an inertial observer – but not identical: one is red and one is blue. The two diagrams show the collision from two different points of view: left, we have the inertial reference frame, and, right, we have a reference frame that is moving with a velocity equal to the horizontal component of the velocity of the blue ball.

The points to note are the following:

The total momentum of such elastic collision before and after the collision must be the same.
Because the two balls have equal mass (in the inertial reference frame at least), the collision will be perfectly symmetrical. Indeed, we may just turn the diagram ‘upside down’ and change the colors of the balls, as we do below, and the values w, u and v (as well as the angle α) are the same.

As mentioned above, the velocity of the blue and red ball and, hence, their momentum, will depend on the frame of reference. In the diagram on the left, we’re moving with a velocity equal to the horizontal component of the velocity of the blue ball and, therefore, in this particular frame of reference, the velocity (and the momentum) of the blue ball consists of a vertical component only, which we refer to as w.

From this point of view (i.e. the reference frame moving with the blue ball), the velocity (and, hence, the momentum) of the red ball will have both a horizontal as well as a vertical component. If we denote the horizontal component by u, then it’s easy to show that the vertical velocity of the red ball must be equal to sin(α)v. Now, because u = cos(α)v, this vertical component will be equal to tan(α)u. But so what is tan(α)u? Now, you’ll say, that is quite evident: tan(α)u must be equal to w, right?

No. That’s Newtonian physics. The red ball is moving horizontally with speed u with respect to the blue ball and, hence, its vertical velocity will not be quite equal to w. Its vertical velocity will be given by the formula which we derived above: v_y = (1/γ)v_y’, so it will be a little bit slower than the w we see in the diagram on the right which is, of course, the same w as in the diagram on the left. [If you look carefully at my drawing above, then you’ll notice that the w vector is a bit longer indeed.]

Huh? Yes. Just think about it: tan(α)u = (1/γ)w. But then… How can momentum be conserved if these speeds are not the same? Isn’t the momentum conservation principle supposed to conserve both horizontal as well as vertical momentum? It is, and momentum is being conserved. Why? Because of the relativistic mass factor.

Indeed, the change in vertical momentum (Δp) of the blue ball in the diagram on the left or – which amounts to the same – the red ball in the diagram on the right (i.e. the vertically moving ball) is equal to Δp_blue = 2m_w·w. [The factor 2 is there because the ball goes down and then up (or vice versa) and, hence, the total change in momentum must be twice the m_w·w amount.] Now, that amount must be equal to Δp_red, which is equal to Δp_red = 2m_v·(1/γ)·w. Equating both yields the following grand result:

m_v/m_w = γ ⇔ m_v = γm_w

What does this mean? It means that mass of the red ball in the diagram on the left is larger than the mass of the blue ball. So here we have actually derived Einstein’s relativistic mass formula from the momentum conservation principle !

Of course you’ll say: not quite. This formula is not the m_u = γm₀ formula that we’re used to! Indeed, it’s not. The blue ball has some velocity w itself, and so the formula links two velocities v and w. However, we can derive the m_u = γm₀ formula as a limit of m_v = γm_w for w going to zero. How can w become infinitesimally small? If the angle α becomes infinitesimally small. It’s obvious, then, that v and u will be practically equal. In fact, if w goes to zero, then m_w will be equal to m₀ in the limiting case, and m_v will be equal to m_u. So, then, indeed, we get the familiar formula as a limiting case:

m_u = γm₀
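If this still looks fishy, a numerical consistency check may help. The sketch below is not a derivation: it simply assumes the mass formula m = γm₀ for both balls and then checks that the ratio m_v/m_w indeed equals the Lorentz factor for the horizontal speed u, as the momentum argument requires. The values for u and w are arbitrary choices of mine, and speeds are expressed as fractions of c.

```python
import math


def gamma(speed):
    """Lorentz factor for a speed given as a fraction of c."""
    return 1.0 / math.sqrt(1.0 - speed**2)


u = 0.6   # horizontal speed of the red ball (assumed value)
w = 0.3   # vertical speed of the blue ball (assumed value)

# Vertical speed of the red ball: reduced by 1/γ because it also moves at u.
red_vertical = w / gamma(u)

# Total speed v of the red ball (horizontal and vertical components combined).
v = math.hypot(u, red_vertical)

# With m = γ·m0 for both balls, the rest mass m0 cancels in the ratio m_v/m_w.
mass_ratio = gamma(v) / gamma(w)
print(mass_ratio, gamma(u))   # the two numbers agree

# Vertical momentum per unit rest mass is then the same for both balls.
print(gamma(w) * w, gamma(v) * red_vertical)
```

The check works for any u and w below c because 1 – v² factorizes as (1 – u²)(1 – w²). Letting w go to zero makes v equal to u and m_w equal to m₀, which is the limiting case described above.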

Hmm… You’ll probably find all of this quite fishy. I’d suggest you just think about it. What I presented above, is actually Feynman’s presentation of the subject, but with a bit more verbosity. Let’s move on to the final topic.

Relativistic energy

From what I wrote above (and from what I wrote in my two previous posts on this topic), it should be obvious, by now, that energy also depends on the reference frame. Indeed, mass and velocity depend on the reference frame (moving or not), and both appear in the formula for kinetic energy which, as you’ll remember, is

K.E. = mc² – m₀c² = (m – m₀)c² = γm₀c² – m₀c² = m₀c²(γ – 1).

Now, if you go back to the post where I presented that formula, you’ll see that we’re actually talking the change in kinetic energy here: if the mass is at rest, its kinetic energy is zero (because m = m₀), and it’s only when the mass is moving, that we can observe the increase in mass. [If you wonder how, think about the example of the fast-moving electrons in an electron beam: we see it as an increase in the inertia: applying the same force no longer yields the same acceleration.]

Now, in that same post, I also noted that Einstein added an equivalent rest mass energy (E₀ = m₀c²) to the kinetic energy above, to arrive at the total energy of an object:

E = E₀ + K.E. = mc²
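To get a feel for these formulas, you can compare the relativistic kinetic energy m₀c²(γ – 1) with the familiar Newtonian ½m₀v². The following is a minimal Python sketch: the function names, the 1 kg rest mass and the chosen speeds are illustrative assumptions of mine.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def lorentz_factor(v):
    """γ = 1/√(1 – v²/c²) for a speed v in m/s."""
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)


def kinetic_energy(m0, v):
    """Relativistic kinetic energy: K.E. = m0·c²·(γ – 1)."""
    return m0 * C**2 * (lorentz_factor(v) - 1.0)


def total_energy(m0, v):
    """Total energy: E = E0 + K.E. = γ·m0·c² = m·c²."""
    return lorentz_factor(v) * m0 * C**2


m0 = 1.0  # rest mass in kg (assumed value)
for fraction in (0.01, 0.1, 0.6, 0.9):
    v = fraction * C
    newtonian = 0.5 * m0 * v**2
    # Ratio of relativistic to Newtonian kinetic energy at this speed.
    print(fraction, kinetic_energy(m0, v) / newtonian)
```

The ratio is very close to one at 1% of the speed of light and grows as v approaches c, which is another way of seeing the increase in inertia.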

Now, what does this equivalence actually mean? Is mass energy? Can we equate them really? The short answer to that is: yes.

Indeed, in one of my older posts (Loose Ends), I explained that protons and neutrons are made of quarks and, hence, that quarks are the actual matter particles, not protons and neutrons. However, the mass of a proton – which consists of two up quarks and one down quark – is 938 MeV/c² (don’t worry about the units I am using here: because protons are so tiny, we don’t measure their mass in grams), but the mass figure you get when you add the rest mass of two u’s and one d, is 9.6 MeV/c² only: about one percent of 938! So where’s the difference?

The difference is the equivalent mass (or inertia) of the binding energy between the quarks. Indeed, the so-called ‘mass’ that gets converted into energy when a nuclear bomb explodes is not the mass of quarks. Quarks survive: nuclear power is binding energy between quarks that gets converted into heat and radiation and kinetic energy and whatever else a nuclear explosion unleashes.

In short, 99% of the ‘mass’ of a proton or a neutron is due to the strong force. So that’s ‘potential’ energy that gets unleashed in a nuclear chain reaction. In other words, the rest mass of the proton is actually the inertia of the system of moving quarks and gluons that make up the particle. In such an atomic system, even the energy of massless particles (e.g. the virtual photons that are being exchanged between the nucleus and its electron shells) is measured as part of the rest mass of the system. So, yes, mass is energy. As Feynman put it, long before the quark model was confirmed and generally accepted:

“We do not have to know what things are made of inside; we cannot and need not justify, inside a particle, which of the energy is rest energy of the parts into which it is going to disintegrate. It is not convenient and often not possible to separate the total mc² energy of an object into (1) rest energy of the inside pieces, (2) kinetic energy of the pieces, and (3) potential energy of the pieces; instead we simply speak of the total energy of the particle. We ‘shift the origin’ of energy by adding a constant m₀c² to everything, and say that the total energy of a particle is the mass in motion times c², and when the object is standing still, the energy is the mass at rest times c².” (Richard Feynman’s Lectures on Physics, Vol. I, p. 16-9)

So that says it all, I guess, and, hence, that concludes my little ‘series’ on (special) relativity. I hope you enjoyed it.

Post scriptum:

Feynman describes the concept of space-time with a nice analogy: “When we move to a new position, our brain immediately recalculates the true width and depth of an object from the ‘apparent’ width and depth. But our brain does not immediately recalculate coordinates and time when we move at high speed, because we have had no effective experience of going nearly as fast as light to appreciate the fact that time and space are also of the same nature. It is as though we were always stuck in the position of having to look at just the width of something, not being able to move our heads appreciably one way or the other; if we could, we understand now, we would see some of the other man’s time—we would see “behind”, so to speak, a little bit. Thus, we shall try to think of objects in a new kind of world, of space and time mixed together, in the same sense that the objects in our ordinary space-world are real, and can be looked at from different directions. We shall then consider that objects occupying space and lasting for a certain length of time occupy a kind of a “blob” in a new kind of world, and that when we look at this “blob” from different points of view when we are moving at different velocities. This new world, this geometrical entity in which the “blobs” exist by occupying position and taking up a certain amount of time, is called space-time.”

If none of what I wrote could convey the general idea, then I hope the above quote will. 🙂 Apart from that, I should also note that physicists will prefer to re-write the Lorentz transformation equations by measuring time and distance in so-called equivalent units: velocities will be expressed not in km/h but as a ratio of c and, hence, c = 1 (a pure number) and so u will also be a pure number between 0 and 1. That can be done by expressing distance in light-seconds (a light-second is the distance traveled by light in one second) or, alternatively, by expressing time in ‘meter’. Both are equivalent but, in most textbooks, it will be time that will be measured in the ‘new’ units. So how do we express time in meter?

It’s quite simple: we multiply the old seconds with c and then we get: time (expressed in meters) = time (expressed in seconds) multiplied by 3×10⁸ meters per second. Hence, as the ‘second’ in the first factor and the ‘per second’ in the second factor cancel out, the dimension of the new time unit will effectively be the meter. Now, if both time and distance are expressed in meter, then velocity becomes a pure number without any dimension, because we are dividing distance expressed in meter by time expressed in meter, and it should be noted that it will be a pure number between 0 and 1 (0 ≤ u ≤ 1), because 1 ‘time second’ = 3×10⁸ ‘time meters’ (or, the other way around, 1 ‘time meter’ = 1/(3×10⁸) second). Also, c itself becomes the pure number 1. The Lorentz transformation equations then become:

x’ = (x – ut)/√(1 – u²), y’ = y, z’ = z and t’ = (t – ux)/√(1 – u²)

They are easy to remember in this form (cf. the symmetry between x – ut and t – ux) and, if needed, we can always convert back to the old units to recover the original formulas.
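The symmetry is also easy to see in code. Below is a minimal Python sketch of the transformation in these equivalent units (c = 1), with x’ = (x – ut)/√(1 – u²) and t’ = (t – ux)/√(1 – u²). The function name and the sample event are my own assumptions. As a bonus, the sketch checks that t² – x² comes out the same in both frames, which is one way of expressing that x and t are components of one and the same space-time vector.

```python
import math


def lorentz(x, t, u):
    """Lorentz transformation along x in units where c = 1.
    Returns (x', t') for a frame moving at velocity u (|u| < 1)."""
    gamma = 1.0 / math.sqrt(1.0 - u * u)
    return gamma * (x - u * t), gamma * (t - u * x)


# Assumed sample event: x = 3 and t = 5, both in the same unit (e.g. meters).
x, t, u = 3.0, 5.0, 0.6
x_prime, t_prime = lorentz(x, t, u)
print(x_prime, t_prime)

# The combination t² – x² does not depend on the reference frame.
print(t**2 - x**2, t_prime**2 - x_prime**2)

# Transforming back is the same operation with –u instead of u.
print(lorentz(x_prime, t_prime, -u))
```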

I personally think there is no better way to illustrate how space and time are ‘mere shadows’ of the same thing indeed: if we express both time and space in the same dimension (meter), we can see how, as result of that, velocity becomes a dimensionless number between zero and one and, more importantly, how the equations for x’ and t’ then mirror each other nicely. I am not sure what ‘kind of union’ between space and time Minkowski had in mind, but this must come pretty close, no?

Final note: I noted the equivalence of mass and energy above. In fact, mass and energy can also be expressed in the same units, and we actually do that above already. If we say that an electron has a rest mass of 0.511 MeV/c² (a bit less than a quarter of the mass of the u quark), then we express the mass in terms of energy. Indeed, the eV is an energy unit and so we’re actually using the m = E/c² formula when we express mass in such units. Expressing mass and energy in equivalent units allows us to derive similar ‘Lorentz transformation equations’ for the energy and the momentum of an object as measured under an inertial versus a moving reference frame. Hence, energy and momentum also transform like our space-time four-vectors and – likewise – the energy and the momentum itself, i.e. the components of the (four-)vector, are less ‘real’ than the vector itself. However, I think this post has become way too long and, hence, I’ll just jot these four equations down – please note, once again, the nice symmetry between (1) and (2) – but then leave it at that and finish this post. 🙂

On (special) relativity: the Lorentz transformations

Original post:

I just skyped to my kids (unfortunately, we’re separated by circumstances) and they did not quite get the two previous posts (on energy and (special) relativity). The main obstacle is that they don’t know much – nothing at all actually – about integrals. So I should avoid integrals. That’s hard but I’ll try to do so in this post, in which I want to introduce special relativity as it’s usually done, and so that’s not by talking about Einstein’s mass-energy equivalence relation first.

Galilean/Newtonian relativity

A lot of people think they understand relativity theory but they often confuse it with Galilean (aka Newtonian) relativity and, hence, they actually do not understand it at all. Indeed, Galilean or Newtonian relativity is as old as Galileo and Newton (so that’s like 400 years old), who stated the principle of relativity as a corollary to the laws of motion: “The motions of bodies included in a given space are the same amongst themselves, whether that space is at rest or moves uniformly forward in a straight line.”

The Galilean or Newtonian principle of relativity is about adding and subtracting speeds: if I am driving at 120 km/h on some highway, but you overtake me at 140 km/h, then I will see you go past me at the rather modest speed of 20 km/h. That’s all there is to it.

Now, that’s not what Einstein’s relativity theory is about. Indeed, the relationship between your and my reference frame (yours is moving with respect to mine, and mine is moving with respect to yours but with opposite velocity) is very simple in this example. It involves a so-called Galilean transformation only: if my coordinate system is (x, y, z, t), and yours is (x’, y’, z’, t’), then we can write:

(1) x’ = x – ut (or x = x’ + ut), (2) y’ = y, (3) z’ = z and (4) t’ = t

To continue the example above: if we start counting at t = t’ = 0 when you are overtaking me, and if we both consider ourselves to be at the center of our reference frame (i.e. x = 0 where I am and x’ = 0 where you are), then you will be at x = 10 km after 30 minutes from my point of view, and I will be at x’ = –10 km (so that’s 10 km behind) from your point of view. So x’ = x – ut indeed, with u = 20 km/h.
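For those who like to see numbers, here is the same example as a minimal Python sketch. The function name galilean is my own; distances are in km and time in hours, as in the example.

```python
def galilean(x, t, u):
    """Galilean transformation along x: returns (x', t') with x' = x - u*t, t' = t."""
    return x - u * t, t


u = 20.0   # your speed relative to me, in km/h
t = 0.5    # half an hour after you overtook me

# Where am I (x = 0 in my frame) in your frame?
print(galilean(0.0, t, u))      # (-10.0, 0.5): 10 km behind you

# Where are you (x = u*t in my frame) in your own frame?
print(galilean(u * t, t, u))    # (0.0, 0.5): at your own origin
```

Note that the time coordinate just passes through unchanged: t’ = t, which is precisely what will no longer be true once we get to Einstein.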

Again, that’s not what Einstein’s principle of relativity is about. They knew that very well in the 17th century already. In fact, they actually knew that much earlier but Descartes formalized his Cartesian coordinate system only in the first half of the 17th century and, hence, it’s only from that time onwards that scientists such as Newton and Huygens started using it to transform the laws of physics from one frame of reference to another. What they found is that those laws remained invariant.

For example, the conservation law for momentum remains valid even if, as illustrated below, an inertial observer will see an elastic collision, such as the one illustrated, differently than an observer who’s moving along: for the observer who’s moving along, the (horizontal) speed of the blue ball will be zero, and the (horizontal) speed of the red ball will be twice the speed as observed by the inertial observer. That being said, both observers will find that momentum (i.e. the product of mass and velocity: p = mv) is being conserved in such collisions.

But, again, that’s Galilean relativity only: the laws of Newton are of the same form in a moving system as in a stationary system and, therefore, it is impossible to tell, by making experiments, whether our system is moving or not. In other words: there is no such thing as ‘absolute speed’. But, so – let me repeat it again – that is not what Einstein’s relativity theory is about.

Let me give a more interesting example of Galilean relativity, and then we can see what’s wrong with it. The speed of a sound wave is not dependent on the motion of the source: the sound of a siren of an ambulance or a noisy car engine will always travel at a speed of 343 meters per second, regardless of the motion of the ambulance. So, while we’ll experience a so-called Doppler effect when the ambulance is moving – i.e. a higher pitch when it’s approaching than when it’s receding – this Doppler effect does not have any impact on the speed of the sound wave. It only affects the frequency as we hear it. The speed of the wave depends on the medium only, i.e. air in this case.

Indeed, the speed of sound will be different in another gas, or in a fluid, or in a solid, and there’s a surprisingly simple function for that – the so-called Newton-Laplace equation: v_sound = √(k/ρ). In this equation, k is a coefficient of ‘stiffness’ of the medium (even if ‘stiffness’ sounds somewhat strange as a concept to apply to gases), and ρ is the density of the medium (so lower or higher air density will increase/decrease the speed of sound).
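As a quick plausibility check of the Newton-Laplace equation, v_sound = √(k/ρ), we can plug in some numbers for air. The sketch below is mine, and so are the input values: I assume dry air at about 20 °C and normal atmospheric pressure, and I take the ‘stiffness’ k of a gas to be its adiabatic index times its pressure (the usual textbook choice for sound in an ideal gas).

```python
import math


def speed_of_sound(stiffness, density):
    """Newton-Laplace equation: v = sqrt(k / rho), in m/s for SI inputs."""
    return math.sqrt(stiffness / density)


# Assumed values for dry air at about 20 degrees Celsius and 1 atmosphere.
ADIABATIC_INDEX = 1.4     # dimensionless
PRESSURE = 101_325.0      # Pa
DENSITY = 1.204           # kg/m^3

stiffness = ADIABATIC_INDEX * PRESSURE   # 'stiffness' k of the gas, in Pa
v = speed_of_sound(stiffness, DENSITY)

print(v)          # in m/s, close to the 343 m/s quoted above
print(v * 3.6)    # the same speed in km/h
```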

This has nothing to do with speed being absolute. No. The Galilean relativity principle does come into play, as one would expect: it is actually possible to catch up with a sound wave (or with any wave traveling through some medium). In fact, that’s what supersonic planes do: they catch up with their own sound waves. However, in essence, planes are not any different from cars in terms of their relationship with the sound that they produce. It’s just that they are faster: the sound wave they produce also travels at a speed of 1,235 km/h, and so cars can’t match that, but supersonic planes can!

[As for the shock wave that is being produced as these planes accelerate and actually ‘break’ the ‘sound barrier’, that has to do with the pressure waves the plane creates in front of itself (just like any traveling object compresses the air in front of it). These pressure waves also travel at the speed of sound. Now, as the speed of the object increases, the waves are forced together, or compressed, because they cannot get out of the way of each other. Eventually they merge into one single shock wave, and so that’s what happens and creates the ‘sonic boom’, which also travels at the speed of sound. However, that should not concern us here. For more information on this, I’d refer to Wikipedia, as I got these illustrations from that source, and I quite like the way they present the topic.]

The Doppler effect looks somewhat different (it’s illustrated above) but so, once again, this phenomenon has nothing to do with Einstein’s relativity theory. Why not? Because we are still talking Galilean relativity here. Indeed, let’s suppose our plane travels at twice the speed of sound (i.e. Mach 2 or almost 2,500 km/h). For us, as inertial observers, the speed of the sound wave originating at point 0 in the illustration above (i.e. the reference frame of the inertial observer) will be equal to dx/dt = 1235 km/h. However, for the pilot, the speed of that wave will be equal to

dx’/dt = d(x – ut)/dt = dx/dt – d(ut)/dt = 1235 km/h – u

= 1235 km/h – 2470 km/h = –1235 km/h

In short, from the point of view of the pilot, he sees the wave front of the wave created at point 0 traveling away from him (cf. the negative value) at 1235 km/h, i.e. the speed of sound. That makes sense obviously, because he travels twice as fast. However – I cannot repeat it enough – this phenomenon has nothing to do with Einstein’s theory of relativity: if they could have imagined supersonic travel, Galileo, Newton and Huygens would have predicted that too.
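
The pilot’s calculation is simple enough to put in a few lines of Python. This is just a sketch of the Galilean rule v’ = v – u; the function name is mine.

```python
def galilean_velocity(v, u):
    """Velocity of something moving at v, as seen from a frame moving at u (same axis)."""
    return v - u

SOUND_KMH = 1235.0              # speed of sound in air, in km/h
plane_kmh = 2 * SOUND_KMH       # a plane flying at Mach 2

wave_seen_by_pilot = galilean_velocity(SOUND_KMH, plane_kmh)
print(wave_seen_by_pilot)       # negative: the wave front falls behind the plane
```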

So what’s Einstein’s theory of (special) relativity about?

Einstein’s principle of relativity

In 1865, the Scottish mathematical physicist James Clerk Maxwell – I guess it’s important to note he’s Scottish with that referendum coming 🙂 – finally discovered that light was nothing but electromagnetic radiation – so radio waves, (visible) light, X-rays, gamma rays,… It’s all the same: electromagnetic radiation, also known as light tout court.

Now, the equations that describe how electromagnetic radiation (i.e. light) travels through space are beautiful but involve operators which you may not recognize and, hence, I will not write them down. The point to note is that Maxwell’s equations were very elegant but… There were two major difficulties with them:

They did not respect Galilean relativity: if we transform them using the above-mentioned Galilean transformation (x’ = x – ut, y’ = y, z’ = z and t’ = t) then we do not get some relative speed of light. On the contrary, according to Maxwell’s equations, from whatever reference frame you look at light, it should always travel at the same (absolute) speed of light c = 299,792 km/s. So c is a constant, and the same constant, ALWAYS.
Scientists did not have any clue about the medium in which light was supposed to travel. The second half of the 19th century saw lots of experiments trying to discover evidence of a hypothetical ‘luminiferous aether’ in which light was supposed to travel, and which should also have some ‘stiffness’ and ‘density’, but so they could not find any trace of it. No one ever did, and so now we’ve finally accepted that light can actually travel in a vacuum, i.e. in plain nothing.

So what? Well… Let’s first look at the first point. Just as with a sound wave, the motion of the source does not have any impact on the speed of light: it goes out in all directions at the same speed c, whether it is emitted from a fast-moving car or from some beacon near the sea. However, unlike what we saw for sound waves, Maxwell’s equations imply that we cannot catch up with a light wave. That’s troublesome, very troublesome, because, according to the above-mentioned Galilean transformation rules,

i.e. v’ = dx’/dt = dx/dt – u = v – u,

some light beam that is traveling at speed v = c past a spaceship that itself is traveling at speed u – let’s say u = 0.2c for example – should have a speed of c’ = c – 0.2c = 0.8c = 239,834 km/s only with respect to the spaceship. However, that’s not what Maxwell’s equations say when you substitute x, y, z and t for x’, y’, z’ and t’ using those four simple equations x’ = x – ut, y’ = y, z’ = z and t’ = t. After you do the substitution, the transformed Maxwell equations will once again yield that c’ = c = 299,792 km/s, and not c’ = 0.8×299,792 km/s = 239,834 km/s.

That’s weird ! Why? Well… If you don’t think that this is weird, then you’re actually not thinking at all ! Just compare it with the example of our sound wave. There is just no logic to it !

The discovery startled all scientists because there could only be two possible solutions to the paradox:

Either Maxwell’s equations were wrong (because they did not observe the principle of Galilean relativity) or, else,
Newton’s equations (and the Galilean transformation rules – i.e. the Galilean relativity principle) were wrong.

Obviously, scientists and experimenters first tried to prove that Maxwell had it all wrong – if only because no experiment had ever shown Newton’s Laws to be wrong, and so it was probably hard – if not impossible – to try to come up with one that would ! So, instead, experimenters invented all kinds of wonderful apparatuses trying to show that the speed of the light was actually not absolute.

Basically, these experiments assumed that the speed of the Earth, as it orbits the Sun at a speed of 108,000 km per hour, would result in measurable differences of c that would depend on the direction of the apparatus. More specifically, the speed of the light beam, as measured, would be different if the light beam would be traveling parallel to the motion of the Earth, as opposed to the light beam traveling at right angle to the motion of the Earth. Why? Well… It’s the same idea as the car chasing its own light beams, but I’ll refer you to other descriptions of the experiment, because explaining these set-ups would take too much time and space. 🙂 I’ll just say that, because 108,000 km/h (on average) is only about 30 km per second (i.e. 0.0001 times c), these experiments relied on (expected) interference effects. The technical aspect of these experiments is really quite interesting. However, as mentioned above, I’ll refer you to Wikipedia or other sources if you’d want more detail.
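
To get a feel for how small these numbers are, here is a quick back-of-the-envelope sketch in Python. The variable names are mine, and I use the more precise value c = 299,792.458 km/s.

```python
C_KM_S = 299_792.458           # speed of light in km/s

earth_kmh = 108_000.0          # orbital speed of the Earth, in km/h
earth_km_s = earth_kmh / 3600  # the same speed in km/s

beta = earth_km_s / C_KM_S     # the Earth's speed as a fraction of c
beta_squared = beta ** 2       # the square of that fraction is smaller still

print(f"{earth_km_s:.0f} km/s, u/c = {beta:.2e}, (u/c)^2 = {beta_squared:.2e}")
```

With a ratio u/c of about one part in ten thousand, you can see why a direct timing measurement was hopeless and why the experimenters had to rely on interference.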

Just note the most famous of those experiments: the 1887 Michelson-Morley experiment, also known as ‘the most famous failed experiment in history’ because, indeed, it failed to find any interference effects: the speed of light always was the speed of light, regardless of the direction of the beam with respect to the direction of motion of the Earth.

The Lorentz transformations

Once the scientists had recovered from this startling news (Michelson himself suffered from a nervous breakdown for a while, because he really wanted to find that interference effect in order to disprove Maxwell’s Laws), they suggested solutions.

The math was solved first. Indeed, just before the turn of the century, the Dutch physicist Hendrik Antoon Lorentz suggested that, if material bodies would contract in the direction of their motion with a factor (1 – u²/c²)^1/2 and, in addition, if time would also be dilated with a factor (1 – u²/c²)^–1/2, then the Michelson-Morley results could be explained. Of course, scientists objected to this ‘explanation’ as being very much ‘ad hoc’.

So then came Einstein. He just took the math for granted, so Einstein basically accepted the so-called Lorentz transformations that resulted from it, and corrected Newton’s Law in order to set physics right again.

And so that was it. As it turned out, all that was needed in fact, was to do away with the assumption that the inertia (or mass) of an object is a constant and, hence, that it does not vary with its velocity. For us, today, it seems obvious: mass also varies, and the factor involved is the very same Lorentz factor that we mentioned above: γ = (1 – u²/c²)^–1/2. Hence, the m in Newton’s Second Law (F = d(mv)/dt) is not a constant but equal to m = γm₀. For all speeds that we, human beings, can imagine (including the astronomical speed of the Earth in orbit around the Sun), the ‘correction’ is too small to be noticeable, or negligible, but so it’s there, as evidenced by the Michelson-Morley experiment, and, some hundred years later, we can actually verify it in particle accelerators.

As said, for us, today, it’s obvious (in my previous post, I mention a few examples: I explain how the mass of electrons in an electron beam is impacted by their speed, and how the lifetime of muons increases because of their speed) but one hundred years ago, it was not. Not at all – and so that’s why Einstein was a genius: he dared to explore and accept the non-obvious.

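Now, what then are the correct transformations from one reference frame to another? They are referred to as the Lorentz transformations, and they can be written down (in a simplified form, assuming relative motion in the x direction only) as follows:

(1) x’ = γ(x – ut)
(2) y’ = y
(3) z’ = z
(4) t’ = γ(t – ux/c²)

with γ = (1 – u²/c²)^–1/2 the Lorentz factor mentioned above.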

Now, I could point out many interesting implications, or come up with examples, but I will resist the temptation. I will only note two things about them:

1. These Lorentz transformations actually re-establish the principle of relativity: the Laws of Nature – including the Laws of Newton as corrected by Einstein’s relativistic mass formula – are of the same form in a moving system as in a stationary system, and therefore it is impossible to tell, by making experiments, whether the system is moving or not.

2. The second thing I should note is that the equations above imply that the idea of absolute time is no longer valid: there is no such thing as ‘absolute’ or ‘universal’ time. Indeed, Lorentz’ concept of ‘local time’ is a most profound departure from Newtonian mechanics that is implicit in these equations.

Indeed, space and time are entangled in these equations as you can see from the –ut and –ux/c² terms in the equation for x’ and t’ respectively and, hence, the idea of simultaneity has to be abandoned: what happens simultaneously in two separated places according to one observer, does not happen at the same time as viewed by an observer moving with respect to the first. Let me quickly show how.

Suppose that in my world I see two events happening at the same time t₀ but at two different places x₁ and x₂. Now, if you are moving away from me at a (uniform) speed u, then equation (4) tells us that you will see these two events happen at two different times t₁’ and t₂’, with the time difference equal to t₂’ – t₁’ = γ[u(x₁ – x₂)/c²], with γ the above-mentioned Lorentz factor. [Just do the calculation for yourself using equation 4.]
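
If you’d rather let a computer do that calculation, here is a minimal sketch in Python. It assumes the Lorentz transformations in their usual form for motion along x, i.e. x’ = γ(x – ut) and t’ = γ(t – ux/c²), so that the time difference works out as t₂’ – t₁’ = γu(x₁ – x₂)/c². The speed u = 0.2c and the positions are just values I picked. As a bonus, the same function shows that a light flash still travels at c in the moving frame.

```python
import math

C = 299_792.458  # speed of light in km/s

def lorentz(x, t, u):
    """Return (x', t') for an event (x, t), as seen from a frame moving at speed u along x."""
    gamma = 1.0 / math.sqrt(1.0 - (u / C) ** 2)
    return gamma * (x - u * t), gamma * (t - u * x / C ** 2)

u = 0.2 * C  # hypothetical speed of the moving observer

# 1. A light flash leaves the origin at t = 0; one second later it is at x = c.
x_light, t_light = lorentz(C * 1.0, 1.0, u)
light_speed_moving = x_light / t_light      # comes out as c, not 0.8c

# 2. Two events that are simultaneous for me (at t0), at places x1 and x2 (in km).
t0, x1, x2 = 0.0, 0.0, 300_000.0
_, t1_prime = lorentz(x1, t0, u)
_, t2_prime = lorentz(x2, t0, u)

gamma = 1.0 / math.sqrt(1.0 - (u / C) ** 2)
predicted_gap = gamma * u * (x1 - x2) / C ** 2

print(light_speed_moving, t2_prime - t1_prime, predicted_gap)
```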

Of course, the effect is negligible for most speeds that we, as human beings, can imagine, but it’s there. So we do not have three separate space coordinates and one time coordinate, but four space-time coordinates that transform together, fully entangled, when applying those four equations above.

That observation led the German mathematician Hermann Minkowski, who helped Einstein to develop his theory of four-dimensional space-time, to famously state that “Space of itself, and time of itself, will sink into mere shadows, and only a kind of union between them shall survive.”

Post scriptum: I did not elaborate on the second difficulty when I mentioned Maxwell’s equations: the lack of a need for a medium for light to travel through. I will let that rest for the moment (or, else, you can just Google some stuff on it). Just note that (1) it is kinda convenient that electromagnetic radiation does not need any medium (I can’t see how one would incorporate that in relativity theory) and (2) that light does seem to slow down in a medium. However, the explanation for that (i.e. for light to have an apparently lower speed in a medium) is to be found in quantum mechanics and so we won’t touch upon that complex matter here (for now that is). The point to note is that this slowing down is caused by light interacting with the matter it encounters as it travels through the medium. It does not actually go slower. However, I need to stop here as this is, yet again, a post which has become way too long. On the other hand, I am hopeful my kids will actually understand this one, because it does not involve integrals. 🙂

Another post for my kids: introducing (special) relativity

Original post:

In my previous post, I talked about energy, and I tried to keep it simple – but also accurate. However, to be completely accurate, one must, of course, introduce relativity at some point. So how does that work? What’s ‘relativistic’ energy? Well… Let me try to convey a few ideas here.

The first thing to note is that the energy conservation law still holds: special theory or not, the sum of the kinetic and potential energies in a (closed) system is always equal to some constant C. What constant? That doesn’t matter: Nature does not care about our zero point and, hence, we can add or subtract any (other) constant to the equation K.E. + P.E. = T + U = C.

That being said, in my previous post, I pointed out that the constant depends on the reference point for the potential energy term U: we will usually take infinity as the reference point (for a force that attracts) and associate it with zero potential (U = 0). We then get a function U(x) like the one below: for gravitational energy we have U(x) = –GMm/x, and for electrical charges, we have U(x) = q₁q₂/4πε₀x. The mathematical shape is exactly the same but, in the case of the electromagnetic forces, you have to remember that likes repel, and opposites attract, so we don’t need the minus sign: the sign of the charges takes care of it.

Minus sign? In case you wonder why we need that minus sign for the potential energy function, well… I explained that in my previous post and so I’ll be brief on that here: potential energy is measured by doing work against the force. That’s why. So we have an infinite sum (i.e. an integral) over some trajectory or path looking like this: U = – ∫F·ds.

For kinetic energy, we don’t need any minus sign: as an object picks up speed, it’s the force itself that is doing the work as its potential energy is converted into kinetic energy, so the change in kinetic energy will equal the change in potential energy, but with opposite sign: as the object loses potential energy, it gains kinetic energy. Hence, we write ΔT = –ΔU = ∫F·ds.

That’s all kids stuff obviously. Let’s go beyond this and ask some questions. First, why can we add or subtract any constant to the potential energy but not to the kinetic energy? The answer is… Well… We actually can add or subtract a ‘constant’ to the kinetic energy as well. Now you will shake your head: Huh? Didn’t we have that T = mv²/2 formula for kinetic energy? So how and why could one add or subtract some number to that?

Well… That’s where relativity comes into play. The velocity v depends on your reference frame. If another observer would move with and/or alongside the object, at the same speed, that observer would observe a velocity equal to zero and, hence, its kinetic energy – as that observer would measure it – would also be zero. You will object to that, saying that a change of reference frame does not change the force, and you’re right: the force will cause the object to accelerate or decelerate, and if the observer is not subject to the same force, then he’ll see the object accelerate or decelerate indeed, regardless of whether his reference frame is a moving or inertial frame. Hence, both the inertial as well as the moving observer will see an increase (or decrease) in its kinetic energy and, therefore, both will conclude that its potential energy decreases (or increases) accordingly. In short, it’s the change in energy that matters, both for the potential as well as for the kinetic energy. The reference point itself, i.e. the point from where we start counting so to say, does not: that’s relative. [This also shows in the derivation for kinetic energy which I’ll do below.]

That brings us to the second question. We all learned in high school that mass and energy are related through Einstein’s mass-energy relation, E = mc², which establishes an equivalence between the two: the mass of an object that’s picking up speed increases, and so we need to look at both speed and mass as a function of time. Indeed, remember Newton’s Law: force is the time rate of change of momentum: F = d(mv)/dt. When the speed is low (i.e. non-relativistic), then we can just treat m as a constant and write that F = mdv/dt = ma (the mass times the acceleration). Treating m as a constant also allows us to derive the classical (Newtonian) formula for kinetic energy:
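
The derivation I have in mind is the standard textbook one. As a sketch, assuming a constant mass m and an object moving from point O (where its speed is v_O) to point P (where its speed is v), it goes like this:

```latex
\Delta T = \int_O^P \mathbf{F}\cdot d\mathbf{s}
         = \int_O^P m\,\frac{d\mathbf{v}}{dt}\cdot\mathbf{v}\,dt
         = m\int_O^P \frac{d}{dt}\!\left(\frac{v^2}{2}\right)dt
         = \frac{m v^2}{2} - \frac{m v_O^2}{2}
```

The first step uses F = m·dv/dt and ds = v·dt, and the second uses the fact that the time derivative of v·v is 2v·(dv/dt).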

So if we assume that the velocity of the object at point O is equal to zero (so v_o = 0), then ΔT will be equal to T and we get what we were looking for: the kinetic energy at point P will be equal to T = mv²/2.

Now, you may wonder why we can’t do that same derivation for a non-constant mass? The answer to that question is simple: taking the m factor out of the integral can only be done if we assume it is a constant. If not, then we should leave it inside. It’s similar to taking a derivative. If m would not be constant, then we would have to apply the product rule to calculate d(mv)/dt, so we’d write d(mv)/dt = (dm/dt)v + m(dv/dt). So we have two terms here and it’s only when m is constant that we can reduce it to d(mv)/dt = m(dv/dt).

So we have our classical kinetic energy function. However, when the velocity gets really high – i.e. if it’s like the same order of magnitude as the velocity of light – then we cannot assume that mass is constant. Indeed, the same high-school course in physics that taught you that E = mc² equation will probably also have taught you that an object can never go faster than light, regardless of the reference frame. Hence, as the object goes faster and faster, it will pick up more momentum, but its rate of acceleration should (and will) go down in such a way that the object can never actually reach the speed of light. Indeed, if Newton’s Law is to remain valid, we need to correct it in such a way that m is no longer constant: m itself will increase as a function of its velocity and, hence, as a function of time. You’ll remember the formula for that:

m = m₀/(1 – v²/c²)^1/2

This is often written as m = γm₀, with m₀ denoting the mass of the object at rest (in your reference frame that is) and γ = (1 – v²/c²)^–1/2 the so-called Lorentz factor. The Lorentz factor is named after a Dutch physicist who introduced it near the end of the 19th century in order to explain why the speed of light is always c, regardless of the frame of reference (moving or not), or – in other words – why the speed of light is not relative. Indeed, while you’ll remember that there is no such thing as an absolute velocity according to the (special) theory of relativity, the velocity of light actually is absolute ! That means you will always see light traveling at speed c regardless of your reference frame. To put it simply, you can never catch up with light and, if you would be traveling away from some star in a spaceship with a velocity of 200,000 km per second, and a light beam from that star would pass you, you’d measure the speed of that light beam to be equal to 300,000 km/s, not 100,000 km/s. So c is an absolute speed that acts as an absolute speed limit regardless of your reference frame. [Note that we’re talking only about reference frames moving at a uniform speed: when acceleration comes into play, then we need to refer to the general theory of relativity and that’s a somewhat different ball game.]

The graph below shows how γ varies as a function of v. As you can see, the mass increase only becomes significant at speeds of like 100,000 km per second. Indeed, for v = 0.3c, the Lorentz factor is 1.048, so the increase is about 5% only. For v = 0.5c, it’s still limited to an increase of some 15%. But then it goes up rapidly: for v = 0.9c, the mass is more than twice the rest mass: m ≈ 2.3m₀; for v = 0.99c, the mass increase is 600%: m ≈ 7m₀; and so on. For v = 0.999c – so when the speed of the object differs from c only by 1 part in 1,000 – the mass of the object will be more than twenty-two times the rest mass (m ≈ 22.4m₀).
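
You can reproduce these numbers yourself with a few lines of Python. This is just a sketch of γ = (1 – v²/c²)^–1/2, with the speed given as a fraction of c; the function name is mine.

```python
import math

def lorentz_factor(beta):
    """Lorentz factor for a speed v = beta * c (with 0 <= beta < 1)."""
    return 1.0 / math.sqrt(1.0 - beta ** 2)

for beta in (0.3, 0.5, 0.9, 0.99, 0.999):
    print(f"v = {beta}c  ->  m/m0 = {lorentz_factor(beta):.3f}")
```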

You probably know that we can actually reach such speeds and, hence, verify Einstein’s correction of Newton’s Law in particle accelerators: the electrons in an electron beam in a particle accelerator usually get pretty close to c and have a mass that’s like 2000 times their rest mass. How do we know that? Because the magnetic field needed to deflect them is like 2000 times as strong as the field that would be needed if their mass were just their rest mass. So how fast do they go? For their mass to be 2000 times m₀, 1 – v²/c² must be equal to 1/4,000,000. Hence, their velocity v differs from c only by one part in 8,000,000. You’ll have to admit that’s very close.
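
Here is the same little calculation as a Python sketch. The only trick is that I write 1 – v/c as (1 – v²/c²)/(1 + v/c), which is the same thing but avoids subtracting two nearly equal numbers on a computer.

```python
import math

gamma = 2000.0                         # mass is 2000 times the rest mass

one_minus_beta_sq = 1.0 / gamma ** 2   # 1 - v^2/c^2 = 1/4,000,000
beta = math.sqrt(1.0 - one_minus_beta_sq)

# 1 - beta = (1 - beta^2) / (1 + beta): the same number, but numerically safer.
one_minus_beta = one_minus_beta_sq / (1.0 + beta)
parts = 1.0 / one_minus_beta           # v differs from c by one part in this many

print(f"v differs from c by about one part in {parts:,.0f}")
```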

Other effects of relativistic speeds

So we mentioned the thing that’s best known about Einstein’s (special) theory of relativity: the mass of an object, as measured by the inertial observer, increases with its speed. Now, you may or may not be familiar with two other things that come out of relativity theory as well:

The first is length contraction: objects are measured to be shortened in the direction of motion with respect to the (inertial) observer. The formula to be used incorporates the reciprocal of the Lorentz factor: L = (1/γ)L₀. For example, a meter stick in a space ship moving at a velocity v = 0.6c will appear to be only 80 cm to the external/inertial observer seeing it whizz past… That is if he can see anything at all of course: he’d have to take like a photo-finish picture as it zooms past ! 🙂
The second is time dilation, which is also rather well known – just like the mass increase effect – because of the so-called twin paradox: time will appear to be slower in that space ship and, hence, if you send one of two twins away on a space journey, traveling at such relativistic speed, he will come back younger than his brother. The formula here is a bit more complicated, but that’s only because we’re used to measuring time in seconds. If we would take a more natural unit, i.e. the time it takes light to travel a distance of 1 m, then the formula will look the same as our mass formula: t = γt₀ and, hence, one ‘second’ in the space ship will be measured as 1.25 ‘seconds’ by the external observer. Hence, the moving clock will appear to run slower – to the external (inertial) observer that is.
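
Both examples use the same speed, v = 0.6c, and you can check the 80 cm and the 1.25 ‘seconds’ with a short Python sketch (the variable names are mine):

```python
import math

beta = 0.6                                # speed of the space ship as a fraction of c
gamma = 1.0 / math.sqrt(1.0 - beta ** 2)  # Lorentz factor

rest_length_m = 1.0                       # the meter stick, measured in the ship
contracted_length_m = rest_length_m / gamma   # L = (1/gamma) * L0

ship_time = 1.0                           # one 'second' on the ship's clock
observer_time = gamma * ship_time         # t = gamma * t0

print(f"gamma = {gamma:.2f}, stick = {contracted_length_m:.2f} m, "
      f"clock = {observer_time:.2f} 'seconds'")
```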

Again, the reality of this can be demonstrated. You’ll remember that we introduced the muon in previous posts: muons resemble electrons in the sense that they have the same charge, but their mass is more than 200 times the mass of an electron. As compared to other unstable particles, their average lifetime is quite long: 2.2 microseconds. Still, that would not be enough to travel more than 600 meters or so – even at the speed of light (2.2 μs × 300,000 km/s = 660 m). But so we do detect muons in detectors down here that come all the way down from the stratosphere, where they are created when cosmic rays hit the Earth’s atmosphere some 10 kilometers up. So how do they get here if they decay so fast? Well, those that actually end up in those detectors, do indeed travel very close to the speed of light and, hence, while from their own point of view they live only like two millionths of a second, they live considerably longer from our point of view.
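
To put a number on ‘considerably longer’, here is a rough Python sketch. The speed of 0.999c is just an assumption of mine for illustration (real cosmic-ray muons come in at all kinds of speeds), and I use the rounded value of 300,000 km/s for c, as in the text.

```python
import math

C_KM_S = 300_000.0         # speed of light, rounded as in the text
MUON_LIFETIME_S = 2.2e-6   # average lifetime in the muon's own frame

def mean_range_km(beta):
    """Average distance covered in our frame by a muon moving at beta * c."""
    gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
    return gamma * MUON_LIFETIME_S * beta * C_KM_S

naive_range_km = MUON_LIFETIME_S * C_KM_S   # no time dilation, speed equal to c
dilated_range_km = mean_range_km(0.999)     # assumed speed: 0.999c

print(f"without dilation: {naive_range_km:.2f} km, "
      f"with dilation at 0.999c: {dilated_range_km:.1f} km")
```

So, at that speed, the average muon can cover the 10 kilometers or so without any problem.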

Relativistic energy: E = mc²

Let’s go back to our main story line: relativistic energy. We wrote above that it’s the change of energy that matters really. So let’s look at that.

You may or may not remember that the concept of work in physics is closely related to the concept of power. In fact, you may actually remember that power, in physics at least, is defined as the work done per second. Indeed, we defined work as the (dot) product of the force and the distance. Now, when we’re talking about a differential distance only (i.e. an infinitesimally small change only), then we can write dT = F·ds, but when we’re talking about something larger, then we have to do that integral: ΔT = ∫F·ds. However, we’re interested in the time rate of change of T here, and so that’s the time derivative dT/dt which, as you can easily verify, will be equal to dT/dt = (F·ds)/dt = F·(ds/dt) = F·v and so we can use that differential formula and we don’t need the integral. Now, that (dot) product of the force and the velocity vectors is what’s referred to as the power. [Note that only the component of the force in the direction of motion contributes to the work done and, hence, to the power.]

OK. What am I getting at? Well… I just want to show an interesting derivation: if we assume, with Einstein, that mass and energy are equivalent and, hence, that the total energy of a body always equals E = mc², then we can actually derive Einstein’s mass formula from that. How? Well… If the time rate of change of the energy of an object is equal to the power expended by the forces acting on it, then we can write:

dE/dt = d(mc²)/dt = F·v

Now, we cannot take the mass out of those brackets after the differential operator (d) because the mass is not a constant in this case (relativistic speeds) and, hence, dm/dt ≠ 0. However, we can take out c² (that’s an absolute constant, remember?) and we can also substitute F using Newton’s Law (F = d(mv)/dt), again taking care to leave m between the brackets, not outside. So then we get:

d(mc²)/dt = c²dm/dt = [d(mv)/dt]·v = v·d(mv)/dt

In case you wonder why we can replace the vectors (bold face) v and d(mv) by their magnitudes (or lengths) v and d(mv): v and mv have the same direction and, hence, the angle θ between them is zero, and so v·v = │v││v│cosθ = v². Likewise, d(mv) and v also have the same direction and so we can just replace the dot product by the product of the magnitudes of those two vectors.

Now, let’s not forget the objective: we need to solve this equation for m and, hopefully, we’ll find Einstein’s mass formula, which we need to correct Newton’s Law. How do we do that? We’ll first multiply both sides by 2m. Why? Because we can then apply another mathematical trick, as shown below:

c²(2m)·dm/dt = 2mv·d(mv)/dt ⇔ d(m²c²)/dt = d(m²v²)/dt

However, if the derivatives of two quantities are equal, then the quantities themselves can only differ by a constant, say C. So we integrate both sides and get:

m²c² = m²v² + C

Be patient: we’re almost there. The above equation must be true for all velocities v and, hence, we can choose the special case where v = 0 and call this mass m₀, and then substitute, so we get m₀²c² = m₀²·0² + C = C. Now we put this particular value for C back in the more general equation above and we get:

m²c² = m²v² + m₀²c² ⇔ m² = m²v²/c² + m₀² ⇔ m²(1 – v²/c²) = m₀² ⇔ m = m₀/(1 – v²/c²)^1/2
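
If you don’t trust the algebra, you can also check the result numerically. The Python sketch below takes the relativistic mass formula m = m₀/(1 – v²/c²)^1/2 and verifies that it satisfies the equation we started from, c²·dm/dt = v·d(mv)/dt. Because both sides contain the same factor dv/dt, it is enough to compare the derivatives with respect to v. I use units in which c = 1 and m₀ = 1, and a simple central difference for the derivatives; all of that is my choice, just for illustration.

```python
import math

C = 1.0    # speed of light in 'natural' units
M0 = 1.0   # rest mass

def mass(v):
    """Relativistic mass m = m0 / sqrt(1 - v^2/c^2)."""
    return M0 / math.sqrt(1.0 - (v / C) ** 2)

def momentum(v):
    return mass(v) * v

def derivative(f, v, h=1e-6):
    """Central-difference estimate of df/dv."""
    return (f(v + h) - f(v - h)) / (2.0 * h)

def energy_side(v):
    return C ** 2 * derivative(mass, v)   # c^2 * dm/dv

def power_side(v):
    return v * derivative(momentum, v)    # v * d(mv)/dv

for v in (0.1, 0.5, 0.9):
    print(v, energy_side(v), power_side(v))
```

The two columns should agree up to the small error of the numerical derivative.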

So there we are: we have just shown that we get the relativistic mass formula (it’s on the right-hand side above) if we assume that Einstein’s mass-energy equivalence relation holds.

Now, you may wonder why that’s significant. Well… If you’re disappointed, then, at the very least, you’ll have to admit that it’s nice to show how everything is related to everything in this theory: from E = mc², we get m = m₀/(1 – v²/c²)^1/2. I think that’s kinda neat!

In addition, let us analyze that mass-energy relation in another way. It actually allows us to re-define kinetic energy as the excess of a particle’s energy over its rest mass energy or – it’s the same expression really – the difference between its total energy and its rest energy.

How does that work? Well… When we’re looking at high-speed or high-energy particles, we will write the kinetic energy as:

K.E. = mc² – m₀c² = (m – m₀)c² = γm₀c² – m₀c² = m₀c²(γ – 1).

Now, we can expand that Lorentz factor γ = (1 – v²/c²)^–1/2 into a binomial series (the binomial series is an infinite Taylor series, so it’s not to be confused with the (finite) binomial expansion: just check it online if you’re in doubt). If we do that, we can write γ as an infinite sum of the following terms:

γ = 1 + (1/2)v²/c² + (3/8)v⁴/c⁴ + (5/16)v⁶/c⁶ + …

Now, when we plug this back into our (relativistic) kinetic energy equation, we can scrap a few things (just do it) to get where I wanted to get:

K.E. = (1/2)m₀v² + (3/8)m₀v⁴/c² + (5/16)m₀v⁶/c⁴ + …

Again, you’ll wonder: so what? Well… See how the non-relativistic formula for kinetic energy (K.E. = m₀v²/2) appears here as the first term of this series and, hence, how the formula above shows that our ‘Newtonian’ formula is just an approximation. Of course, at low speeds, the second, third etcetera terms represent close to nothing and, hence, our Newtonian ‘approximation’ is obviously pretty good !
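
To see how good (or bad) the approximation is, here is a small Python sketch that compares the exact expression m₀c²(γ – 1) with the first one, two or three terms of the series. I work in units where m₀ = 1 and c = 1, and the speeds are values I picked for illustration.

```python
import math

SERIES_COEFFS = (1 / 2, 3 / 8, 5 / 16)   # coefficients of v^2/c^2, v^4/c^4, v^6/c^6

def ke_exact(m0, v, c):
    """Relativistic kinetic energy m0 * c^2 * (gamma - 1)."""
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return m0 * c ** 2 * (gamma - 1.0)

def ke_series(m0, v, c, terms=1):
    """First `terms` terms of the series; terms=1 is the Newtonian m0*v^2/2."""
    beta_sq = (v / c) ** 2
    total = sum(k * beta_sq ** (n + 1) for n, k in enumerate(SERIES_COEFFS[:terms]))
    return m0 * c ** 2 * total

for beta in (0.01, 0.1, 0.5):
    exact = ke_exact(1.0, beta, 1.0)
    newton = ke_series(1.0, beta, 1.0, terms=1)
    three = ke_series(1.0, beta, 1.0, terms=3)
    print(f"v = {beta}c: Newton/exact = {newton / exact:.5f}, "
          f"3 terms/exact = {three / exact:.7f}")
```

At one tenth of the speed of light, the Newtonian formula is still off by less than one percent.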

OK… But… Now you’ll say: that’s fine, but how did Einstein get inspired to write E = mc² in the first place? Well, truth be told, the relativistic mass formula was derived first (i.e. before Einstein wrote his E = mc² equation), out of a derivation involving the momentum conservation law and the formulas we must use to convert the space-time coordinates from one reference frame to another when looking at phenomena (i.e. the so-called Lorentz transformations). And it was only afterwards that Einstein noted, when expanding the relativistic mass formula, that the increase in mass of a body appeared to be equal to the increase in kinetic energy divided by c² (Δm = Δ(K.E.)/c²). Now, that, in turn, inspired him to also assign an equivalent energy to the rest mass of that body: E₀ = m₀c². […] At least that’s how Feynman tells the story in his 1965 Lectures… But so we’ve actually been doing it the other way around here!

Hmm… You will probably find all of this rather strange, and you may also wonder what happened to our potential energy. Indeed, that concept sort of ‘disappeared’ in this story: from the story above, it’s clear that kinetic energy has an equivalent mass, but what about potential energy?

That’s a very interesting question but, unfortunately, I can only give a rather rudimentary answer to that. Let’s suppose that we have two masses M and m. According to the potential energy formula above, the potential energy U between these two masses will then be equal to U = –GMm/r. Now, that energy is not interpreted as energy of either M or m, but as energy that is part of the (M, m) system, which includes the system’s gravitational field. So that energy is considered to be stored in that gravitational field. If the two masses would sit right on top of each other, then there would be no potential energy in the (M, m) system and, hence, the system as a whole would have less energy. In contrast, when we separate them further apart, then we increase the energy of the system as a whole, and so the system’s gravitational field then increases. So, yes, the potential energy does impact the (equivalent) mass of the system, but not the individual masses M and m. Does that make sense?

For me, it does, but I guess you’re a bit tired by now and, hence, I think I should wrap up here. In my next (and probably last) post on relativity, I’ll present those Lorentz transformations that allow us to ‘translate’ the space and time coordinates from one reference frame to another, and in that post I’ll also present the other derivation of Einstein’s relativistic mass formula, which is actually based on those transformations. In fact, I realize I should have probably started with that (as mentioned above, that’s how Feynman does it in his Lectures) but, then, for some reason, I find the presentation above more interesting, and so that’s why I am telling the story starting from another angle. I hope you don’t mind. In any case, it should be the same, because everything is related to everything in physics – just like in math. That’s why it’s important to have a good teacher. 🙂

A post for my kids: on energy and potential

Pre-scriptum (dated 26 June 2020): These posts on elementary math and physics for my kids (they are 21 and 23 now and no longer need such explanations) have not suffered much the attack by the dark force—which is good because I still like them. While my views on the true nature of light, matter and the force or forces that act on them have evolved significantly as part of my explorations of a more realist (classical) explanation of quantum mechanics, I think most (if not all) of the analysis in this post remains valid and fun to read. In fact, I find the simplest stuff is often the best. 🙂

Original post:

We’ve been juggling with a lot of advanced concepts in the previous post. Perhaps it’s time I write something that my kids can understand too. One of the things I struggled with when re-learning elementary physics is the concept of energy. What is energy really? I always felt my high school teachers did a poor job in trying to explain it. So let me try to do a better job here.

A high-school level course usually introduces the topic using the gravitational force, i.e. Newton’s law of universal gravitation: F = GmM/r². This law states that the force of attraction is proportional to the product of the masses m and M, and inversely proportional to the square of the distance r between those two masses. The factor of proportionality is equal to G, i.e. the so-called universal gravitational constant, aka the ‘big G’ (G ≈ 6.674×10^-11 N·m²/kg²), as opposed to the ‘little g’, which is the acceleration due to gravity at the Earth’s surface (g ≈ 9.80665 m/s²). As far as I am concerned, it is at this point where my high-school teacher failed.

Indeed, he would just go on and simplify Newton’s law of gravitation by writing F = mg, noting that g = GM/r² and that, for all practical purposes, this g factor is constant, because we are talking small distances as compared to the radius of the Earth. Hence, we should just remember that the gravitational force is proportional to the mass only, and that one kilogram amounts to a weight of about 10 newtons (9.80665 N, i.e. 9.80665 kg·m/s², to be precise). That simplification would then be followed by another simplification: if we are lifting an object with mass m, we are doing work against the gravitational force. How much work? Well, he’d say, work is – quite simply – the force times the distance in physics, and the work done against the force is the potential energy (usually denoted by U) of that object. So he would write U = Fh = mgh, with h the height of the object (as measured from the surface of the Earth), and he would draw a nice linear graph like the one below (I set m to 10 kg here, and h ranges from 0 to 100 m).

Note that the slope of this line is slightly less than 45 degrees (and also note, of course, that it’s only approximately 45 degrees because of our choice of scale: dU/dh is equal to 98.0665 J/m, so if the x and y axes had the same scale, we’d have a line that’s almost vertical).

So what’s wrong with this graph? Nothing. It’s just that this graph sort of got stuck in my head, and it complicated a more accurate understanding of energy. Indeed, with examples like the one above, one tends to forget that:

Such linear graphs are an approximation only. In reality, the gravitational field, and force fields in general, are not uniform and, hence, g is not a constant: the graph below shows how g varies with the height (but the height is expressed in kilometers this time, not in meters).
Not only is potential energy usually not a linear function but – equally important – it is usually not a positive real number either. In fact, in physics, U will usually take on a negative value. Why? Because we’re indeed measuring and defining it by the work done against the force.
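To get a feel for how good (or bad) the linear approximation is, here is a minimal Python sketch that compares U = mgh with the exact energy needed to lift a mass from the Earth’s surface. The values for the mass and the radius of the Earth are my own assumptions (they are not given in this post), and the function names are just illustrative.

```python
# Compare the linear approximation U = m*g*h with the exact work needed
# to lift a mass m from the Earth's surface to a height h.
G = 6.674e-11        # gravitational constant, N*m^2/kg^2
M_EARTH = 5.972e24   # mass of the Earth in kg (assumed value)
R_EARTH = 6.371e6    # mean radius of the Earth in m (assumed value)


def g_at_height(h):
    """Gravitational acceleration g = GM/r^2 at height h above the surface."""
    return G * M_EARTH / (R_EARTH + h) ** 2


def u_linear(m, h):
    """Linear approximation: treat g as constant (its surface value)."""
    return m * g_at_height(0.0) * h


def u_exact(m, h):
    """Exact work against gravity, from U(r) = -GMm/r between R and R + h."""
    return G * M_EARTH * m * (1.0 / R_EARTH - 1.0 / (R_EARTH + h))


for h in (100.0, 1.0e4, 1.0e6):
    ratio = u_exact(10.0, h) / u_linear(10.0, h)
    print(f"h = {h:>9.0f} m  g = {g_at_height(h):.4f} m/s^2  exact/linear = {ratio:.6f}")
```

The ratio of the exact to the linear value is R/(R + h), so the straight line is an excellent approximation over 100 m, but not over hundreds of kilometers.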

So what’s the more accurate view of things? Well… Let’s start by noting that potential energy is defined in relation to some reference point and, taking a more universal point of view, that reference point will usually be infinity when discussing the gravitational (or electromagnetic) force of attraction. Now, the potential energy of the point(s) at infinity – i.e. the reference point – will, usually, be equated with zero. Hence, the potential energy curve will then take the shape of the graph below (y = –1/x), so U will vary from zero (0) to minus infinity (–∞) , as we bring the two masses closer together. You can readily see that the graph below makes sense: its slope is positive and, hence, as such it does capture the same idea as that linear mgh graph above: moving a mass from point 1 to point 2 requires work and, hence, the potential energy at point 2 is higher than at point 1, even if both values U(2) and U(1) are negative numbers, unlike the values of that linear mgh curve.

How do you get a curve like that? Well… I should first note another convention which is essential for making the sign come out alright: if the force is gravity, then we should write F = –GmMr/r³. So we have a minus sign here. And please do note that F and the r in the numerator are vectors, and vectors have both a direction and magnitude – and so that’s why they are usually denoted by a bold letter, as opposed to the scalar quantities G, m, M or r (i.e. the distance in the denominator).

Back to the minus sign. Why do we have that here? Well… It has to do with the direction of the force, which, in case of attraction, will be opposite to the so-called radius vector r. Just look at the illustration below, which shows, first, the direction of the force between two opposite electric charges (top) and then (bottom), the force between two masses, let’s say the Earth and the Moon.

So it’s a matter of convention really.

Now, when we’re talking the electromagnetic force, you know that likes repel and opposites attract, so two charges with the same sign will repel each other, and two charges with opposite sign will attract each other. So F₁₂, i.e. the force on q₂ because of the presence of q₁, will be equal to F₁₂ = q₁q₂r/r³. No minus sign is needed here: if q₁ and q₂ are opposite, the sign of their product is negative, so the direction of F comes out alright: it’s opposite to the direction of the radius vector r. In short, no minus sign needed here because we already have one. Of course, the original charge q₁ will be subject to the very same force, in the opposite direction, and so we should write F₂₁ = –q₁q₂r/r³. So we’ve got that minus sign again now. In general, however, we’ll write F_ij = q_iq_jr/r³ when dealing with the electromagnetic force, so that’s without a minus sign, because the convention is to draw the radius vector from charge i to charge j and, hence, the radius vector r in the formula for F₂₁ would point in the other direction and, hence, the minus sign is not needed.

In short, because of the way that the electromagnetic force works, the sign always comes out right: there is no need for a minus sign in front. However, for gravity, there are no opposite charges: masses are always alike, and so likes actually attract when we’re talking gravity, and so that’s why we need the minus sign when dealing with the gravitational force: the force between a mass i and another mass j will always be written as F_ij = –Gm_im_jr/r³, so here we do have to put the minus sign, because the direction of the force needs to be opposite to the direction of the radius vector and so the sign of the ‘charges’ (i.e. the masses in this case), in the case of gravity, does not take care of that.

One last remark here may be useful: always watch out to not double-count forces when considering a system with many charges or many masses: both charges (or masses) feel the same force, but with opposite direction. OK. Let’s move on. If you are confused, don’t worry. Just remember that (1) it’s very important to be consistent when drawing that radius vector (it goes from the charge (or mass) causing the force field to the other charge (or mass) that is being brought in), and (2) that the gravitational and electromagnetic forces have a lot in common in terms of ‘geometry’ – notably that inverse proportionality relation with the square of the distance between the two charges or masses – but that we need to put a minus sign when we’re dealing with the gravitational force because, with gravitation, likes do not repel but attract each other, as opposed to electric charges.
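The sign conventions above can be made concrete with a small Python sketch. It is an illustration only: I am assuming units in which the constants (G and 1/4πε₀) are equal to one, and the function names are mine.

```python
import math


def radius_vector(source, target):
    """Vector from the charge (or mass) causing the field to the other one."""
    return tuple(t - s for s, t in zip(source, target))


def inverse_square_force(strength, source, target):
    """Return strength * r / |r|^3, i.e. the force on the object at `target`."""
    r = radius_vector(source, target)
    dist = math.sqrt(sum(c * c for c in r))
    return tuple(strength * c / dist ** 3 for c in r)


def electric_force(q_source, q_target, source, target):
    """No minus sign: the sign of the product of the charges does the job."""
    return inverse_square_force(q_source * q_target, source, target)


def gravitational_force(m_source, m_target, source, target):
    """Explicit minus sign: masses are always positive, yet they attract."""
    return inverse_square_force(-m_source * m_target, source, target)


origin, point = (0.0, 0.0, 0.0), (2.0, 0.0, 0.0)
print(electric_force(1.0, -1.0, origin, point))       # opposite charges: points back to the source
print(electric_force(1.0, 1.0, origin, point))        # like charges: points away from the source
print(gravitational_force(1.0, 1.0, origin, point))   # masses: points back to the source
```

Swapping source and target flips the radius vector, so the force on the other charge (or mass) comes out equal in magnitude and opposite in direction, as it should.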

Now, let’s move on indeed and get back to our discussion of potential energy. Let me copy that potential energy curve again and let’s assume we’re talking electromagnetics here, and that we have two opposite charges, so the force is one of attraction.

Hence, if we move one charge away from the other, we are doing work against the force. Conversely, if we bring them closer to each other, we’re working with the force and, hence, its potential energy will go down – from zero (i.e. the reference point) to… Well… Some negative value. How much work is being done? Well… The force changes all the time, so it’s not constant and so we cannot just calculate the force times the distance (Fs). We need to do one of those infinite sums, i.e. an integral, and so, for point 1 in the graph above, we can write:

U(1) = –∫F·ds (with the integral taken from the reference point, i.e. point 0, to point 1)

Why the minus sign? Well… As said, we’re not increasing potential energy: we’re decreasing it, from zero to some negative value. If we’d move the charge from point 1 to the reference point (infinity), then we’d be doing work against the force and we’d be increasing potential energy. So then we’d have a positive value. If this is difficult, just think it through for a while and you’ll get there.

Now, this integral is somewhat special because F and s are vectors, and the F·ds product above is a so-called dot product between two vectors. The integral itself is a so-called path integral and so you may not have learned how to solve this one. But let me explain the dot product at least: the dot product of two vectors is the product of the magnitudes of those two vectors (i.e. their length) times the cosine of the angle between the two vectors:

F·ds =│F││ds│cosθ

Why that cosine? Well… To go from one point to another (from point 0 to point 1, for example), we can take any path really. [In fact, it is actually not so obvious that all paths will yield the same value for the potential energy: it is the case for so-called conservative forces only. But so gravity and the electromagnetic force are conservative forces and so, yes, we can take any path and we will find the same value.] Now, if the direction of the force and the direction of the displacement are the same, then that angle θ will be equal to zero and, hence, the dot product is just the product of the magnitudes (cos(0) = 1). However, if the direction of the force and the direction of the displacement are not the same, then it’s only the component of the force in the direction of the displacement that’s doing work, and the magnitude of that component is Fcosθ. So there you are: that explains why we need that cosine function.
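The claim that any path gives the same result can be checked numerically. The sketch below is a rough illustration under my own assumptions: two dimensions only, constants set to one, the product q₁q₂ set to –1 (opposite charges), and a simple midpoint rule to approximate the sum of the F·ds products.

```python
import math


def force(q1q2, p):
    """Inverse-square force q1*q2 * r / |r|^3 at point p (source at the origin)."""
    r = math.hypot(p[0], p[1])
    return (q1q2 * p[0] / r ** 3, q1q2 * p[1] / r ** 3)


def straight_path(start, end, n):
    """n + 1 equally spaced points on the straight line from start to end."""
    return [(start[0] + (end[0] - start[0]) * i / n,
             start[1] + (end[1] - start[1]) * i / n) for i in range(n + 1)]


def work_along(q1q2, points):
    """Sum of the dot products F . ds, with F evaluated at each segment midpoint."""
    total = 0.0
    for a, b in zip(points, points[1:]):
        mid = ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)
        ds = (b[0] - a[0], b[1] - a[1])
        f = force(q1q2, mid)
        total += f[0] * ds[0] + f[1] * ds[1]
    return total


q1q2 = -1.0
start, end = (1.0, 0.0), (0.0, 2.0)

direct = straight_path(start, end, 4000)
detour = straight_path(start, (3.0, 3.0), 4000) + straight_path((3.0, 3.0), end, 4000)[1:]

w_direct = work_along(q1q2, direct)
w_detour = work_along(q1q2, detour)
print(w_direct, w_detour)
```

Both numbers should agree with q₁q₂(1/r₁ – 1/r₂) for r₁ = 1 and r₂ = 2, which is the work done by the force; the work we do against the force is the same number with the opposite sign.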

Now, solving that ‘special’ integral takes a bit of care because the reference point (point 0) lies at infinity: the integral is a so-called improper integral, so we need to put in the formula for F, find the primitive, and then take the limit for the starting point going to infinity. I won’t go through all of that here. Just accept the general result here for U(r):

U(r) = q₁q₂/4πε₀r
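For those who do want to see where this comes from, here is a sketch of the calculation. I am assuming a purely radial path that comes in from the reference point at infinity, and R is just an auxiliary (large) starting distance that we let go to infinity at the end:

```latex
\begin{aligned}
U(r) &= -\int_{\infty}^{r} \mathbf{F}\cdot d\mathbf{s}
      = -\lim_{R\to\infty}\int_{R}^{r} \frac{q_1 q_2}{4\pi\varepsilon_0\, r'^2}\, dr' \\
     &= \lim_{R\to\infty} \frac{q_1 q_2}{4\pi\varepsilon_0}\left(\frac{1}{r} - \frac{1}{R}\right)
      = \frac{q_1 q_2}{4\pi\varepsilon_0\, r}
\end{aligned}
```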

You can immediately see that, because we’re dealing with opposite charges, U(r) will always be negative, while the limit of this function for r going to infinity is equal to zero indeed. Conversely, its limit equals –∞ for r going to zero. As for the 4πε₀ factor in this formula, that factor plays the same role as the G-factor for gravity. Indeed, ε₀ is a ubiquitous electric constant: ε₀ ≈ 8.854×10^-12 F/m, but it can be included in the value of the charges by choosing another unit and, hence, it’s often omitted – and that’s what I’ll also do here. Now, the same formula obviously applies to point 2 in the graph as well, and so now we can calculate the difference in potential energy between point 1 and point 2:
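Written out with the U(r) formula above, and keeping the 4πε₀ factor for completeness (just drop it if you absorb it into the charges), the difference works out as follows. The last step assumes the charges are opposite, so that q₁q₂ = –|q₁q₂|:

```latex
\Delta U = U(1) - U(2)
  = \frac{q_1 q_2}{4\pi\varepsilon_0}\left(\frac{1}{r_1} - \frac{1}{r_2}\right)
  = \frac{q_1 q_2}{4\pi\varepsilon_0}\cdot\frac{r_2 - r_1}{r_1 r_2}
  = -\frac{|q_1 q_2|}{4\pi\varepsilon_0}\cdot\frac{r_2 - r_1}{r_1 r_2}
```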

Does that make sense? Yes. We’re, once again, doing work against the force when moving the charge from point 1 to point 2. So that’s why we have a minus sign in front. As for the signs of q₁ and q₂, remember these are opposite. As for the value of the (r₂ – r₁) factor, that’s obviously positive because r₂ > r₁. Hence, ΔU = U(1) – U(2) is negative. How do we interpret that? U(2) and U(1) are both negative values, so how can the difference between those two values, i.e. U(1) – U(2), be negative as well? Well… Just remember that ΔU is minus the work done to move the charge from point 1 to point 2. Hence, the change in potential energy (ΔU) is some negative value because the amount of work that needs to be done to move the charge from point 1 to point 2 is decidedly positive. Hence, yes, the charge has a higher energy level (albeit negative – but that’s just because of our convention which equates potential energy at infinity with zero) at point 2 as compared to point 1.

What about gravity? Well… That linear graph above is an approximation, we said, and it also takes r = h = 0 as the reference point but it assigns a value of zero for the potential energy there (as opposed to the –∞ value for the electromagnetic force above). So that graph is actually a linearization of a graph resembling the one below: we only start counting when we are on the Earth’s surface, so to say.

However, in a more advanced physics course, you will probably see the following potential energy function for gravity: U(r) = –GMm/r, and the graph of this function looks exactly the same as that graph we found for the potential energy between two opposite charges: the curve starts at point (0, –∞) and ends at point (∞, 0).

OK. Time to move on to another illustration or application: the covalent bond between two hydrogen atoms.

Application: the covalent bond between two hydrogen atoms

The graph below shows the potential energy as a function of the distance between two hydrogen atoms. Don’t worry about its exact mathematical shape: just try to understand it.

Natural hydrogen comes in H₂ molecules, so there is a bond between two hydrogen atoms as a result of mutual attraction. What holds them together is a chemical bond: the two hydrogen atoms share their so-called valence electron, thereby forming a so-called covalent bond (which is a form of chemical bond indeed, as you should remember from your high-school courses). However, one cannot push two hydrogen atoms too close, because then the positively charged nuclei will start repelling each other, and so that’s what is depicted above: the potential energy goes up very rapidly because the two atoms will repel each other very strongly.

The right half of the graph shows how the force of attraction vanishes as the two atoms are separated. After a while, the potential energy does not increase any more and so then the two atoms are free.

Again, the reference point does not matter very much: in the graph above, the potential energy is assumed to be zero at infinity (i.e. the ‘free’ state) but we could have chosen another reference point: it would only shift the graph up or down.

This brings us to another point: the law of energy conservation. For that, we need to introduce the concept of kinetic energy once again.

The formula for kinetic energy

In one of my previous posts, I defined the kinetic energy of an object as the excess energy over its rest energy:

K.E. = T = mc² – m₀c² = γm₀c² – m₀c² = (γ – 1)m₀c²

γ is the Lorentz factor in this formula (γ = (1 – v²/c²)^-1/2), and I derived the T = mv²/2 formula for the kinetic energy from a Taylor expansion of the formula above, noting that K.E. = mv²/2 is actually an approximation for non-relativistic speeds only, i.e. speeds that are much less than c and, hence, have no impact on the mass of the object: so, non-relativistic means that, for all practical purposes, m = m₀. Now, if m = m₀, then mc² – m₀c² is equal to zero! So how do we derive the kinetic energy formula for non-relativistic speeds then? Well… We must apply another method, using Newton’s Second Law: the force equals the time rate of change of the momentum of an object. The momentum of an object is denoted by p (it’s a vector quantity) and is the product of its mass and its velocity (p = mv), so we can write

F = d(mv)/dt (again, all bold letters denote vectors).

When the speed is low (i.e. non-relativistic), then we can just treat m as a constant and so we can write F = mdv/dt = ma (the mass times the acceleration). If m would not be constant, then we would have to apply the product rule: d(mv)/dt = (dm/dt)v + m(dv/dt), and so then we would have two terms instead of one. Treating m as a constant also allows us to derive the classical (Newtonian) formula for kinetic energy:
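One standard route, sketched here on the assumption that the rate at which the kinetic energy changes equals the power F·v delivered by the force, and that T = 0 for an object at rest:

```latex
\frac{dT}{dt} = \mathbf{F}\cdot\mathbf{v}
  = m\,\frac{d\mathbf{v}}{dt}\cdot\mathbf{v}
  = \frac{d}{dt}\left(\frac{1}{2}\, m\, \mathbf{v}\cdot\mathbf{v}\right)
  = \frac{d}{dt}\left(\frac{1}{2}\, m v^2\right)
  \quad\Longrightarrow\quad T = \frac{1}{2}\, m v^2
```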

Energy conservation

Now, the total energy – potential and kinetic – of an object (or a system) has to remain constant, so we have E = T + U = constant. As a consequence, the time derivative of the total energy must equal zero. So we have:

E = T + U = constant, and dE/dt = 0

Can we prove that with the formulas T = mv²/2 and U = q₁q₂/4πε₀r? Yes, but the proof is a bit lengthy and so I won’t prove it here. [We need to take the derivatives ∂T/∂t and ∂U/∂t and show that these derivatives are equal except for the sign, which is opposite, and so the sum of those two derivatives equals zero. Note that ∂T/∂t = (dT/dv)(dv/dt) and that ∂U/∂t = (dU/dr)(dr/dt), so you have to use the chain rule for derivatives here.] So just take a mental note of that and accept the result:

(1) mv²/2 + q₁q₂/4πε₀r = constant when the electromagnetic force is involved (no minus sign, because the sign of the charges makes things come out alright), and
(2) mv²/2 – GMm/r = constant when the gravitational force is involved (note the minus sign, for the reason mentioned above: when the gravitational force is involved, we need to reverse the sign).
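For the curious, the argument can be compressed into one line. This is a sketch only: I am assuming a single object moving in the field of another charge (or mass) that stays put, so that U depends on the position of the moving object only, and I am using the fact that the force is minus the gradient of the potential energy (F = –∇U):

```latex
\frac{dE}{dt} = \frac{d}{dt}\left(\frac{1}{2} m v^2\right) + \frac{dU}{dt}
  = m\,\frac{d\mathbf{v}}{dt}\cdot\mathbf{v} + \nabla U\cdot\frac{d\mathbf{r}}{dt}
  = \mathbf{F}\cdot\mathbf{v} - \mathbf{F}\cdot\mathbf{v} = 0
```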

We can also take another example: an oscillating spring. When you try to compress a (linear) spring, the spring will push back with a force whose magnitude equals kx (so we write F = –kx, because the force is opposite to the displacement x). Hence, the energy needed to compress a (linear) spring a distance x from its equilibrium position can be calculated from the same integral/infinite sum formula: you will get U = kx²/2 as a result. Indeed, this is an easy integral (not a path integral), and so let me quickly solve it:
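With the restoring force written as F = –kx, and x′ as a dummy integration variable running from the equilibrium position to x, the calculation is:

```latex
U(x) = -\int_{0}^{x} F\, dx' = -\int_{0}^{x} (-k x')\, dx' = \frac{1}{2}\, k x^2
```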

While that U = kx²/2 formula looks similar to the kinetic energy formula, you should note that it’s a function of the position, not of velocity, and that the formula does not involve the mass of the object we’re attaching to the spring. So it’s a different animal altogether. However, because of the energy conservation law, the graphs of the potential and kinetic energy will obviously mirror each other, just like the energy graphs of a swinging pendulum, as shown below. We have:

T + U = mv²/2 + kx²/2 = C

Note: The graph above mentions an ‘ideal’ pendulum because, in reality, there will be an energy loss due to friction and, hence, the pendulum will slowly stop, as shown below. Hence, in reality, energy is conserved, but it leaks out of the system we are observing here: it gets lost as heat, which is another form of kinetic energy actually.
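For the ideal (frictionless) case, the T + U = C relation is easy to check with a few lines of Python. The sketch below assumes the textbook solution x(t) = A·cos(ωt) with ω = √(k/m) for a mass on a linear spring; the numbers for m, k and A are arbitrary illustrative values.

```python
import math


def spring_energies(m, k, amplitude, t):
    """Kinetic and potential energy of an ideal mass-spring system at time t."""
    omega = math.sqrt(k / m)                        # angular frequency
    x = amplitude * math.cos(omega * t)             # position
    v = -amplitude * omega * math.sin(omega * t)    # velocity
    kinetic = 0.5 * m * v ** 2
    potential = 0.5 * k * x ** 2
    return kinetic, potential


m, k, amplitude = 2.0, 8.0, 0.5
for t in (0.0, 0.2, 0.4, 0.6, 0.8):
    kinetic, potential = spring_energies(m, k, amplitude, t)
    print(f"t = {t:.1f}  T = {kinetic:.4f}  U = {potential:.4f}  T + U = {kinetic + potential:.4f}")
```

The two energies trade places over time, but their sum stays equal to kA²/2 throughout.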

Another application: estimating the radius of an atom

A very nice application of the energy concepts introduced above is the so-called Bohr model of a hydrogen atom. Feynman introduces that model as an estimate of the size (or radius) of an atom (see Feynman’s Lectures, Vol. III, p. 2-6). The argument is the following.

The radius of an atom is more or less the spread (usually denoted by Δ or σ) in the position of the electron, so we can write that Δx = a. In words, the uncertainty about the position is the radius a. Now, we know that the uncertainty about the position (x) also determines the uncertainty about the momentum (p = mv) of the electron because of the Uncertainty Principle ΔxΔp ≥ ħ/2 (ħ ≈ 6.6×10^-16 eV·s). The principle is illustrated below, and in a previous post I proved the relationship. [Note that k in the left graph actually represents the wave number of the de Broglie wave, but wave number and momentum are related through the de Broglie relation p = ħk.]

Hence, the order of magnitude of the momentum of the electron will – very roughly – be p ≈ ħ/a. [Note that Feynman doesn’t care about factors 2 or π or even 2π (h = 2πħ): the idea is just to get the order of magnitude (Feynman calls it a ‘dimensional analysis’), and that he actually equates p with p = h/a, so he doesn’t use the reduced Planck constant (ħ).]

Now, the electron’s potential energy will be given by that U(r) = q₁q₂/4πε₀r formula above, with q₁ = e (the charge of the proton) and q₂ = –e (i.e. the charge of the electron), so we can simplify this to –e²/a.

The kinetic energy of the electron is given by the usual formula: T = mv²/2. This can be written as T = mv²/2 = m²v²/2m = p²/2m = h²/2ma². Hence, the total energy of the electron is given by

E = T + U = h²/2ma² – e²/a

What does this say? It says that the potential energy becomes smaller as a gets smaller (that’s because of the minus sign: when we say ‘smaller’, we actually mean a larger negative value). However, as the electron gets closer to the nucleus, its kinetic energy increases. In fact, the shape of this function is similar to that graph depicting the potential energy of a covalent bond as a function of the distance, but you should note that the blue graph below is the total energy (so it’s not only potential energy but kinetic energy as well).

I guess you can now anticipate the rest of the story. The electron will be there where its total energy is minimized. Why? Well… We could call it the minimum energy principle, but that’s usually used in another context (thermodynamics). Let me just quote Feynman here, because I don’t have a better explanation: “We do not know what a is, but we know that the atom is going to arrange itself to make some kind of compromise so that the energy is as little as possible.”

He then calculates, as expected, the derivative dE/da, which equals dE/da = –h²/ma³ + e²/a². Setting dE/da equal to zero, we get the ‘optimal’ value for a:

a₀ = h²/me² = 0.528×10^-10 m = 0.528 Å (angstrom)

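Note that this calculation depends on the value one uses for e: to be correct, we need to put the 4πε₀ factor back in. Note also that the 0.528 value is what you get when the h in the formula is read as the reduced Planck constant ħ: with h = 2πħ, the result would be (2π)² ≈ 39 times larger. You also need to ensure you use proper and compatible units for all factors. Just try a couple of times and you should find that 0.528 value.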
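If you want to try the numbers yourself, here is a small Python sketch. It puts the 4πε₀ factor back in and uses the reduced Planck constant ħ; the SI values of the constants below are standard reference values that I am supplying myself (they are not given in this post).

```python
import math

# Physical constants in SI units (standard reference values, supplied here)
HBAR = 1.054571817e-34       # reduced Planck constant, J*s
M_E = 9.1093837015e-31       # electron mass, kg
E_CHARGE = 1.602176634e-19   # elementary charge, C
EPS0 = 8.8541878128e-12      # electric constant, F/m

# a0 = hbar^2 / (m * e^2), with e^2 replaced by e^2 / (4*pi*eps0) in SI units
a0 = 4.0 * math.pi * EPS0 * HBAR ** 2 / (M_E * E_CHARGE ** 2)

# E0 = hbar^2 / (2 m a0^2) - e^2 / (4*pi*eps0*a0), converted from joule to eV
e0_joule = HBAR ** 2 / (2.0 * M_E * a0 ** 2) - E_CHARGE ** 2 / (4.0 * math.pi * EPS0 * a0)
e0_ev = e0_joule / E_CHARGE

print(f"a0 = {a0 * 1e10:.4f} angstrom")
print(f"E0 = {e0_ev:.2f} eV")
```

With these modern values, the radius comes out at about 0.529 Å and the energy at about –13.6 eV.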

Of course, the question is whether or not this back-of-the-envelope calculation resembles anything real? It does: this number is very close to the so-called Bohr radius, which is the most probable distance between the proton and the electron in a hydrogen atom (in its ground state) indeed. The Bohr radius is an actual physical constant and has been measured to be about 0.529 angstrom. Hence, for all practical purposes, the above calculation corresponds with reality. [Of course, while Feynman started with writing that we shouldn’t trust our answer within factors like 2, π, etcetera, he concludes his calculation by noting that he used all constants in such a way that it happens to come out the right number. :-)]

The corresponding energy for this value for a can be found by putting the value a₀ back into the total energy equation, and then we find:

E₀ = –me⁴/2h² = –13.6 eV
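The substitution itself takes only a couple of steps. With a₀ = h²/me² (same symbols and units as in the formulas above), we get:

```latex
E_0 = \frac{h^2}{2 m a_0^2} - \frac{e^2}{a_0}
    = \frac{h^2}{2m}\cdot\frac{m^2 e^4}{h^4} - e^2\cdot\frac{m e^2}{h^2}
    = \frac{m e^4}{2 h^2} - \frac{m e^4}{h^2}
    = -\frac{m e^4}{2 h^2}
```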

Again, this corresponds to reality, because this is the energy that is needed to kick an electron out of its orbit or, to use proper language, this is the energy that is needed to ionize a hydrogen atom (it’s referred to as a Rydberg of energy). By way of conclusion, let me quote Feynman on what this negative energy actually means: “[Negative energy] means that the electron has less energy when it is in the atom than when it is free. It means it is bound. It means it takes energy to kick the electron out.”

That being said, as we pointed out above, it is all a matter of choosing our reference point: we can add or subtract any constant C to the energy equation: E + C = T + U + C will still be constant and, hence, respect the energy conservation law. But so I’ll conclude here and – of course – check if my kids understand any of this.

And what about potential?

Oh – yes. I forgot. The title of this post suggests that I would also write something on what is referred to as ‘potential’, and it’s not the same as potential energy. So let me quickly do that.

By now, you are surely familiar with the idea of a force field. If we put a charge or a mass somewhere, then it will create a condition such that another charge or mass will feel a force. That ‘condition’ is referred to as the field, and one represents a field by field vectors. For a gravitational field, we can write:

F = mC

C is the field vector, and F is the force on the mass that we would ‘supply’ to the field for it to act on. Now, we can obviously re-write that integral for the potential energy as

U = –∫F·ds = –m∫C·ds = mΨ with Ψ (read: psi) = –∫C·ds = the potential

So we can say that the potential Ψ is the potential energy of a unit charge or a unit mass that would be placed in the field. Both C (a vector) as well as Ψ (a scalar quantity, i.e. a real number) obviously vary in space and in time and, hence, are a function of the space coordinates x, y and z as well as the time coordinate t. However, let’s leave time out for the moment, in order to not make things too complex. [And, of course, I should note that this psi has nothing to do with the probability wave function we introduced in previous posts. Nothing at all. It just happens to be the same symbol.]

Now, U is an integral, and so it can be shown that, if we know the potential energy, we also know the force. Indeed, the x-, y- and z-components of the force are equal to:

F_x = –∂U/∂x, F_y = –∂U/∂y, F_z = –∂U/∂z or, using the grad (gradient) operator: F = –∇U

Likewise, we can recover the field vectors C from the potential function Ψ:

C_x = –∂Ψ/∂x, C_y = –∂Ψ/∂y, C_z = –∂Ψ/∂z, or C = –∇Ψ

That grad operator is nice: it makes a vector function out of a scalar function.
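To see the C = –∇Ψ relation at work, the Python sketch below takes the gravitational potential of a point mass, Ψ = –GM/r, estimates the gradient numerically (with central differences), and compares the result with the field vector C = –GMr/r³. The value of GM (roughly that of the Earth) and the test point are my own illustrative choices.

```python
import math

GM = 3.986e14  # G times the mass of the Earth, m^3/s^2 (assumed value)


def potential(p):
    """Gravitational potential psi = -GM/r at point p = (x, y, z)."""
    x, y, z = p
    return -GM / math.sqrt(x * x + y * y + z * z)


def field_from_potential(p, step=10.0):
    """Field vector C = -grad(psi), estimated with central differences."""
    components = []
    for i in range(3):
        forward, backward = list(p), list(p)
        forward[i] += step
        backward[i] -= step
        slope = (potential(forward) - potential(backward)) / (2.0 * step)
        components.append(-slope)
    return tuple(components)


def field_exact(p):
    """Field vector from the inverse-square law: C = -GM * r / |r|^3."""
    r = math.sqrt(sum(c * c for c in p))
    return tuple(-GM * c / r ** 3 for c in p)


point = (7.0e6, 1.0e6, 2.0e6)
print(field_from_potential(point))
print(field_exact(point))
```

The two vectors agree to many digits, and both point back towards the origin, i.e. towards the mass that creates the field.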

In the ‘electrical case’, we will write:

F = qE

And, likewise,

U = –∫F·ds = –q∫E·ds = qΦ with Φ (read: phi) = –∫E·ds = the electrical potential.

Unlike the ‘psi’ potential, the ‘phi’ potential is well known to us, if only because it’s expressed in volts. In fact, when we say that a battery or a capacitor is charged to a certain voltage, we actually mean the voltage difference between the parallel plates of which the capacitor or battery consists, so we are actually talking about the difference in electrical potential ΔΦ = Φ₁ – Φ₂, which we also express in volts, just like the electrical potential itself.

Post scriptum:

The model of the atom that is implied in the above derivation is referred to as the so-called Bohr model. It is a rather primitive model (Wikipedia calls it a ‘first-order approximation’) but, despite its limitations, it’s a proper quantum-mechanical view of the hydrogen atom and, hence, Wikipedia notes that “it is still commonly taught to introduce students to quantum mechanics.” Indeed, that’s why Feynman also uses it in one of his first Lectures on Quantum Mechanics (Vol. III, Chapter 2), before he moves on to more complex things.

Some content on this page was disabled on June 20, 2020 as a result of a DMCA takedown notice from Michael A. Gottlieb, Rudolf Pfeiffer, and The California Institute of Technology. You can learn more about the DMCA here:

https://wordpress.com/support/copyright-and-the-dmca/