Music and Physics

Pre-scriptum (dated 26 June 2020): These posts on elementary math and physics have not suffered much from the attack by the dark force, which is good because I still like them. While my views on the true nature of light, matter and the force or forces that act on them have evolved significantly as part of my explorations of a more realist (classical) explanation of quantum mechanics, I think most (if not all) of the analysis in this post remains valid and fun to read. In fact, I find the simplest stuff is often the best. 🙂

Original post:

My first working title for this post was Music and Modes. Yes. Modes. Not moods. The relation between music and moods is an interesting research topic as well, but it’s not what I am going to write about. 🙂

It started with me thinking I should write something on modes indeed, because the concept of a mode of a wave, or of any oscillator really, is quite central to physics, both in classical physics and in quantum physics (quantum-mechanical systems are analyzed as oscillators too!). But I wondered how to approach it, as it’s a rather boring topic if you look at the math only. Then I was flying back from Europe to Asia, where I live and, as I also play a bit of guitar, I suddenly wanted to know why we like music. And then I thought that’s a question you may have asked yourself at some point in time too! So I thought I should write about modes as part of a more interesting story: a story about music or, to be precise, a story about the physics behind music. So… Let’s go for it.

Philosophy versus physics

There is, of course, a very simple answer to the question of why we like music: we like music because it is music. If it were not music, we would not like it. That’s a rather philosophical answer, and it probably satisfies most people. However, for someone studying physics, that answer can surely not be sufficient. What’s the physics behind it? I reviewed Feynman’s Lecture on sound waves on the plane, combined it with some other stuff I googled when I arrived, and then I wrote this post, which gives you a much less philosophical answer. 🙂

The observation at the center of the discussion is deceptively simple: why is it that similar strings (i.e. strings made of the same material, with the same thickness, etc.), under the same tension but differing in length, sound ‘pleasant’ when sounded together if – and only if – the ratio of the lengths of the strings is 1:2, 2:3, 3:4, 3:5, 4:5, etc. (i.e. some ratio of two small integers)?

You probably wonder: is that the question, really? It is. The question is deceptively simple indeed because, as you will see in a moment, the answer is quite complicated. So complicated, in fact, that the Pythagoreans didn’t have any answer. Nor did anyone else, for that matter, until the 18th century or so, when musicians, physicists and mathematicians alike started to realize that a string (of a guitar, or a piano, or whatever instrument Pythagoras was thinking of at the time), or a column of air (in a pipe organ or a trumpet, for example), or whatever other thing that creates the musical tone, actually oscillates at numerous frequencies simultaneously.

The Pythagoreans did not suspect that a string, in itself, is a rather complicated thing – something which physicists refer to as a harmonic oscillator – and that its sound, therefore, is actually produced by many frequencies, instead of only one. The concept of a pure note, i.e. a tone that is free of harmonics (i.e. free of all other frequencies, except for the fundamental frequency) also didn’t exist at the time. And if it had, they would not have been able to produce a pure tone anyway: producing pure tones – or notes, as I’ll call them, somewhat inaccurately (I should say: a pure pitch) – is remarkably complicated, and they do not exist in Nature. If the Pythagoreans had been able to produce pure tones, they would have observed that pure tones do not give any sensation of consonance or dissonance, even if their relative frequencies respect those simple ratios. Indeed, repeated experiments, in which such pure tones are being produced, have shown that human beings can’t really say whether it’s a musical sound or not: it’s just sound, and it’s neither pleasant (or consonant, we should say) nor unpleasant (i.e. dissonant).

The Pythagorean observation is valid, however, for actual (i.e. non-pure) musical tones. In short, we need to distinguish between tones and notes (i.e. pure tones): they are two very different things, and the gist of the whole argument is that musical tones coming out of one (or more) string(s) under tension are full of harmonics and, as I’ll explain in a minute, that’s what explains the observed relation between the lengths of those strings and the phenomenon of consonance (i.e. sounding ‘pleasant’) or dissonance (i.e. sounding ‘unpleasant’).

Of course, it’s easy to say what I say above: it’s 2015 now, and so we have the benefit of hindsight. Back then – so that’s more than 2,500 years ago! – the simple but remarkable fact that the lengths of similar strings should respect some simple ratio if they are to sound ‘nice’ together triggered a fascination with number theory (in fact, the Pythagoreans actually established the foundations of what is now known as number theory). Indeed, Pythagoras felt that similar relationships should also hold for other natural phenomena! To mention just one example, the Pythagoreans believed that the orbits of the planets would also respect such simple numerical relationships, which is why they talked of the ‘music of the spheres’ (Musica Universalis).

We now know that the Pythagoreans were wrong. The proportions in the movements of the planets around the Sun do not respect simple ratios and, with the benefit of hindsight once again, it is regrettable that it took many courageous and brilliant people, such as Galileo Galilei and Copernicus, to convince the Church of that fact. 😦 Also, while Pythagoras’ observations in regard to the sounds coming out of whatever strings he was looking at were correct, his conclusions were wrong: the observation does not imply that the frequencies of musical notes should all be in some simple ratio one to another.

Let me repeat what I wrote above: the frequencies of musical notes are not in some simple relationship one to another. The frequency scale for all musical tones is logarithmic and, while that implies that we can, effectively, do some tricks with ratios based on the properties of the logarithmic scale (as I’ll explain in a moment), the so-called ‘Pythagorean’ tuning system, which is based on simple ratios, was plain wrong, even if it – or some variant of it (from about 1510 onwards, musicians started to favor pure 5:4 major thirds over pure 3:2 fifths) – was generally used until the 18th century! In short, Pythagoras was wrong indeed, in this regard at least: we can’t do much with those simple ratios.

Having said that, Pythagoras’ basic intuition was right, and that intuition is still very much what drives physics today: it’s the idea that Nature can be described, or explained (whatever that means), by quantitative relationships only. Let’s have a look at how it actually works for music.

Tones, noise and notes

Let’s first define and distinguish tones and notes. A musical tone is the opposite of noise, and the difference between the two is that musical tones are periodic waveforms, so they have a period T, as illustrated below. In contrast, noise is a non-periodic waveform. It’s as simple as that.

Now, from previous posts, you know we can write any periodic function as the sum of a potentially infinite number of simple harmonic functions, and that this sum is referred to as the Fourier series. I am just noting it here, so don’t worry about it for now. I’ll come back to it later.

You also know we have seven musical notes: Do-Re-Mi-Fa-Sol-La-Si or, as is more common in the English-speaking world, C-D-E-F-G-A-B. And then it starts again with C (or Do). So we have two notes, separated by an interval which is referred to as an octave (from the Greek octo, i.e. eight), with six notes in-between, so that’s eight notes in total. However, you also know that there are notes in-between, except between E and F and between B and C. The steps on this finer grid are referred to as semitones or half-steps. I prefer the term ‘half-step’ over ‘semitone’, because we’re talking notes really, not tones.

We have, for example, F-sharp (denoted by F#), which we can also call G-flat (denoted by Gb). It’s the same thing: a sharp # raises a note by a semitone (aka half-step), and a flat b lowers it by the same amount, so F# is Gb. That’s what’s shown below: in an octave, we have eight notes but twelve half-steps.

Let’s now look at the frequencies. The frequency scale above (expressed in oscillations per second, i.e. in hertz) is a logarithmic scale: frequencies double as we go from one octave to the next. The frequency of the C4 note above (the so-called middle C) is 261.626 Hz, while the frequency of the next C note (C5) is double that: 523.251 Hz. [Just in case you’d want to know: the 4 and the 5 refer to the note’s position on a standard 88-key piano keyboard: C4 is the fourth C key on the piano.]

Now, if we equate the interval between C4 and C5 with 1 (so the octave is our musical ‘unit’), then each of the twelve half-steps is, obviously, 1/12 of that unit. Why? Because we have 12 half-steps in our musical unit. You can also easily verify that, because of the way logarithms work, the ratio of the frequencies of two notes that are separated by one half-step (between D# and E, for example) will be equal to 2^(1/12). Likewise, the ratio of the frequencies of two notes that are separated by n half-steps is equal to 2^(n/12). [In case you’d doubt, just do an example. For instance, if we denote the frequency of C4 as f₀, the frequency of C# as f₁ and so on (so the frequency of D is f₂, the frequency of C5 is f₁₂, and everything else is in-between), then we can write the f₂/f₀ ratio as f₂/f₀ = (f₂/f₁)·(f₁/f₀) = 2^(1/12)·2^(1/12) = 2^(2/12) = 2^(1/6). I must assume you’re smart enough to generalize this result yourself, and that f₁₂/f₀ is, obviously, equal to 2^(12/12) = 2¹ = 2, which is what it should be!]
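The half-step arithmetic can be checked in a few lines of Python. This is a minimal sketch, assuming 12-tone equal temperament and middle C at 261.626 Hz; the helper name `half_step_ratio` is hypothetical, and the list index n simply means ‘n half-steps above C4’.

```python
def half_step_ratio(n: int) -> float:
    """Frequency ratio between two notes n half-steps apart (12-tone equal temperament)."""
    return 2 ** (n / 12)


f0 = 261.626  # C4 (middle C), in Hz
# f[0] = C4, f[1] = C#4, f[2] = D4, ..., f[12] = C5
f = [f0 * half_step_ratio(n) for n in range(13)]

# Ratios compose by multiplication: f2/f0 = (f2/f1)·(f1/f0) = 2^(2/12) = 2^(1/6)
print(f[2] / f[0], (f[2] / f[1]) * (f[1] / f[0]), 2 ** (1 / 6))

# Twelve half-steps make one octave, so f12/f0 = 2
print(f[12] / f[0])  # 2.0
```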

Now, because the frequencies of the various C notes are expressed as a number involving some decimal fraction (like 523.251 Hz, and the 0.251 is actually an approximation only), and because they are, therefore, a bit hard to read and/or work with, I’ll illustrate the next idea – i.e. the concept of harmonics – with the A instead of the C. 🙂

Harmonics

The lowest A on a piano is denoted by A0, and its frequency is 27.5 Hz. Lower A notes exist (we have one at 13.75 Hz, for instance) but we don’t use them, because they are near (or actually beyond) the limit of the lowest frequencies we can hear. So let’s stick to our grand piano and start with that 27.5 Hz frequency. The next A note is A1, and its frequency is 55 Hz. We then have A2, which is like the A on my (or your) guitar: its frequency is equal to 2×55 = 110 Hz. The next is A3, for which we double the frequency once again: we’re at 220 Hz now. The next one is the A in the illustration of the C scale above: A4, with a frequency of 440 Hz.

[Let me, just for the record, note that the A4 note is the standard tuning pitch in Western music. Why? Well… There’s no good reason really, except convention. Indeed, we can derive the frequency of any other note from that A4 note using our formula for the ratio of frequencies but, because of the properties of a logarithmic function, we could do the same using whatever other note really. It’s an important point: there’s no such thing as an absolute reference point in music: once we define our musical ‘unit’ (so that’s the so-called octave in Western music), and how many steps we want to have in-between (so that’s 12 steps—again, in Western music, that is), we get all the rest. That’s just how logarithms work. So music is all about structure, i.e. mathematical relationships. Again, Pythagoras’ conclusions were wrong, but his intuition was right.]
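To illustrate that there is no absolute reference point, here is a small sketch that derives every note from one reference pitch. It assumes scientific pitch notation and 12-tone equal temperament; `half_steps` and `frequency` are hypothetical helper names. Taking middle C instead of A4 = 440 Hz as the reference gives the same A4 back, because only the number of half-steps between two notes matters.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def half_steps(name: str, octave: int) -> int:
    """Position of a note on one long half-step scale, counting from C0 (scientific pitch notation)."""
    return 12 * octave + NOTE_NAMES.index(name)


def frequency(name: str, octave: int, ref=("A", 4), ref_hz: float = 440.0) -> float:
    """Frequency in Hz of a note, derived from one reference note via 2^(n/12)."""
    n = half_steps(name, octave) - half_steps(*ref)
    return ref_hz * 2 ** (n / 12)


for octave in range(5):
    print(f"A{octave}: {frequency('A', octave):.1f} Hz")  # 27.5, 55.0, 110.0, 220.0, 440.0
print(f"C4: {frequency('C', 4):.3f} Hz, C5: {frequency('C', 5):.3f} Hz")  # 261.626, 523.251

# Same scale, different reference: derive A4 from middle C instead
c4 = frequency("C", 4)
print(f"A4 from C4: {frequency('A', 4, ref=('C', 4), ref_hz=c4):.3f} Hz")  # 440.000
```

Changing `ref_hz` multiplies every frequency by the same factor, so the structure of the scale stays exactly the same.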

Now, the notes we are talking about here are all so-called pure tones. In fact, when I say that the A on our guitar is referred to as A2 and that it has a frequency of 110 Hz, then I am actually making a huge simplification. Worse, I am lying when I say that: when you play a string on a guitar, or when you strike a key on a piano, all kinds of other frequencies – so-called harmonics – will resonate as well, and that’s what gives the quality to the sound: it’s what makes it sound beautiful. So the fundamental frequency (aka the first harmonic) is 110 Hz alright, but we’ll also have second, third, fourth, etc. harmonics with frequencies of 220 Hz, 330 Hz, 440 Hz, etcetera. In music, the basic or fundamental frequency is referred to as the pitch of the tone and, as you can see, I often use the term ‘note’ (or pure tone) as a synonym for pitch, which is more or less OK, but not quite correct actually. [However, don’t worry about it: my sloppiness here does not affect the argument.]

What’s the physics behind? Look at the illustration below (I borrowed it from the Physics Classroom site). The thick black line is the string, and the wavelength of its fundamental frequency (i.e. the first harmonic) is twice its length, so we write λ₁ = 2·L or, the other way around, L = (1/2)·λ₁. Now that’s the so-called first mode of the string. [One often sees the term fundamental or natural or normal mode, but the adjective is not necessary really. In fact, I find it confusing, although I sometimes find myself using it too.]

We also have a second, third, etc mode, depicted below, and these modes correspond to the second, third, etc harmonic respectively.

For the second, third, etc. mode, the relationship between the wavelength and the length of the string is, obviously, the following: L = (2/2)·λ₂ = λ₂, L = (3/2)·λ₃, etc. More generally, for the n-th mode, L = (n/2)·λ_n, with n = 1, 2, 3, etc. In fact, because L is supposed to be some fixed length, we should write it the other way around: λ_n = (2/n)·L.

What does it imply for the frequencies? We know that the speed of the wave – let’s denote it by c – as it travels up and down the string, is a property of the string, and it’s a property of the string only. In other words, it does not depend on the frequency. Now, the wave velocity is equal to the frequency times the wavelength, always, so we have c = f·λ. To take the example of the (classical) guitar string: its length is 650 mm, i.e. 0.65 m. Hence, the identities λ₁ = (2/1)·L, λ₂ = (2/2)·L, λ₃ = (2/3)·L etc become λ₁ = (2/1)·0.65 = 1.3 m, λ₂ = (2/2)·0.65 = 0.65 m, λ₃ = (2/3)·0.65 = 0.433.. m and so on. Now, combining these wavelengths with the above-mentioned frequencies, we get the wave velocity c = (110 Hz)·(1.3 m) = (220 Hz)·(0.65 m) = (330 Hz)·(0.433.. m) = 143 m/s.
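The arithmetic above can be checked with a few lines of Python. This sketch assumes an ideal string fixed at both ends, with the 650 mm length and the 110 Hz fundamental used in the text; `string_modes` is a hypothetical helper name. It computes λ_n = (2/n)·L and f_n = n·f₁ and confirms that f·λ gives the same wave speed c for every mode.

```python
def string_modes(length_m: float, f1_hz: float, n_max: int = 4):
    """(n, wavelength, frequency, wave speed) for the first n_max modes of an ideal string."""
    modes = []
    for n in range(1, n_max + 1):
        wavelength = 2 * length_m / n   # λ_n = (2/n)·L
        freq = n * f1_hz                # f_n = n·f₁
        modes.append((n, wavelength, freq, freq * wavelength))  # c = f·λ
    return modes


for n, lam, f, c in string_modes(0.65, 110.0):
    print(f"mode {n}: λ = {lam:.3f} m, f = {f:.0f} Hz, c = f·λ = {c:.1f} m/s")
```

Every line ends with c = 143.0 m/s: the wave speed is a property of the string, not of the mode.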

Let me now get back to Pythagoras’ string. You should note that the frequencies of the harmonics produced by a simple guitar string are related to each other by simple whole-number ratios. Indeed, the frequencies of the first and second harmonics are in a simple 2 to 1 ratio (2:1). The second and third harmonics have a 3:2 frequency ratio. The third and fourth harmonics a 4:3 ratio. The fourth and fifth harmonics a 5:4 ratio, and so on and so on. They have to be. Why? Because the harmonics are whole-number multiples of the fundamental frequency. Now that is what’s really behind Pythagoras’ observation: when he was sounding similar strings with the same tension and lengths in a small-integer ratio, he was making sounds that share many of their harmonics. Nothing more, nothing less.
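Here is a small sketch of that claim. It assumes ideal similar strings under the same tension, so the wave speed is the same and each fundamental is inversely proportional to the string length; `shared_harmonics` is a hypothetical helper name. It lists which of the first twelve harmonics of string A coincide exactly with one of the first twelve harmonics of string B.

```python
from fractions import Fraction


def shared_harmonics(len_a: int, len_b: int, max_harmonic: int = 12) -> list:
    """Harmonics of string A (as multiples of its fundamental) that coincide with
    harmonics of string B. Similar strings under the same tension have the same
    wave speed, so the fundamental is inversely proportional to the length."""
    f_b = Fraction(len_a, len_b)  # fundamental of B, in units of A's fundamental
    harmonics_a = {Fraction(n) for n in range(1, max_harmonic + 1)}
    harmonics_b = {m * f_b for m in range(1, max_harmonic + 1)}
    return sorted(int(h) for h in harmonics_a & harmonics_b)


print(shared_harmonics(2, 1))     # lengths 2:1 (an octave): [2, 4, 6, 8, 10, 12]
print(shared_harmonics(3, 2))     # lengths 3:2 (a fifth): [3, 6, 9, 12]
print(shared_harmonics(100, 99))  # lengths 100:99: []
```

Exact fractions are used on purpose: with floating-point numbers, ‘coinciding’ harmonics would have to be compared within some tolerance.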

Let me be quite explicit here, because the point that I am trying to make is somewhat subtle. Pythagoras’ string is Pythagoras’ string: he talked about similar strings. So we’re not talking about some actual guitar or piano or whatever other string instrument. The strings on (modern) string instruments are not similar, and they do not have the same tension. For example, the six strings of a guitar do not differ in length (they’re all 650 mm) but they’re different in tension. The six strings on a classical guitar also have a different diameter, and the first three strings are plain strings, as opposed to the bottom strings, which are wound. So the strings are not similar but very different indeed. To illustrate the point, I copied the values below for just one of the many commercially available guitar string sets. It’s the same for piano strings. While they are somewhat simpler (they’re all made of piano wire, which is basically very high-quality steel wire), they also differ, not only in length but in diameter as well, typically ranging from 0.85 mm for the highest treble strings to 8.5 mm (so that’s ten times 0.85 mm) for the lowest bass notes.

In short, Pythagoras was not playing the guitar or the piano (or whatever other more sophisticated string instrument the Greeks surely must have had too) when he was thinking of these harmonic relationships. The physical explanation behind his famous observation is, therefore, quite simple: musical tones that share harmonics sound pleasant or, we should say, consonant, from the Latin con-sonare, which literally means ‘to sound together’ (from sonare = to sound and con = with). And otherwise… Well… Then they do not sound pleasant: they are dissonant.

To drive the point home, let me emphasize that, when we’re plucking a string, we produce a sound consisting of many frequencies, all in one go. One can see it in practice: if you strike a lower A string on a piano – let’s say the 110 Hz A2 string – then its second harmonic (220 Hz) will make the A3 string vibrate too, because it’s got the same frequency! And then its fourth harmonic will make the A4 string vibrate too, because they’re both at 440 Hz. Of course, the strength of these other vibrations (or their amplitude, we should say) will depend on the strength of the other harmonics, and we should expect that the fundamental frequency (i.e. the first harmonic) will absorb most of the energy. So we strike one string, and so we’ve got one sound, one tone only, but numerous notes at the same time!

In this regard, you should also note that the third harmonic of our 110 Hz A2 string corresponds to the fundamental frequency of the E4 tone: both are 330 Hz (approximately, at least: in equal temperament, E4 is 329.63 Hz)! And, of course, the harmonics of E, such as its second harmonic (2·330 Hz = 660 Hz), correspond to higher harmonics of A too! To be specific, the second harmonic of our E4 is equal to the sixth harmonic of our A2 string. If your guitar is any good, and if your strings are of reasonable quality too, you’ll actually see it: the (lower) E and A strings co-vibrate if you play the A major chord by striking the upper four strings only. So we’ve got energy, motion really, being transferred from the four strings you do strike to the two strings you do not strike! You’ll say: so what? Well… If you’ve got any better proof of the actuality (or reality) of various frequencies being present at the same time, please tell me! 🙂

So that’s why A and E sound very good together (A, E and C#, played together, make up the so-called A major chord): our ear likes matching harmonics. And so that’s why we like musical tones, or why we define those tones as being musical! 🙂 Let me summarize it once more: musical tones are composite sound waves, consisting of a fundamental frequency and so-called harmonics (so we’ve got many notes or pure tones altogether in one musical tone). Now, when other musical tones have harmonics that are shared, and we sound those tones too, we get the sensation of harmony, i.e. the combination sounds consonant.

Now, it’s not difficult to see that we will always have such shared harmonics if we have similar strings, with the same tension but lengths in a simple ratio, being sounded together. In short, what Pythagoras observed has nothing much to do with notes, but with tones. Let’s go a bit further in the analysis now by introducing some more math. And, yes, I am very sorry: it’s the dreaded Fourier analysis indeed! 🙂

Fourier analysis

You know that we can decompose any periodic function into a sum of a (potentially infinite) series of simple sinusoidal functions, as illustrated below. I took the illustration from Wikipedia: the red function s₆(x) is the sum of six sine functions of different amplitudes and (harmonically related) frequencies. The so-called Fourier transform S(f) (in blue) relates the six frequencies with the respective amplitudes.

In light of the discussion above, it is easy to see what this means for the sound coming from a plucked string. Using the angular frequency notation (so we write everything using ω instead of f), we know that the normal or natural modes of oscillation have frequencies ω = 2π/T = 2πf (so that’s the fundamental frequency or first harmonic), 2ω (second harmonic), 3ω (third harmonic), and so on and so on.

Now, there’s no reason to assume that all of the sinusoidal functions that make up our tone should have the same phase: some phase shift Φ may be there and, hence, we should write our sinusoidal function not as cos(ωt), but as cos(ωt + Φ) in order to ensure our analysis is general enough. [Why not a sine function? It doesn’t matter: the cosine and sine functions are the same, except for another phase shift of 90° = π/2.] Now, from our trigonometry classes, we know that we can re-write cos(ωt + Φ) as

cos(ωt + Φ) = [cos(Φ)cos(ωt) – sin(Φ)sin(ωt)]

We have a lot of these functions, of course (one for each harmonic, in fact) and, hence, we should use subscripts, which is what we do in the formula below, which says that any function f(t) that is periodic with the period T can be written mathematically as:

f(t) = a₀ + a₁·cos(ωt) + b₁·sin(ωt) + a₂·cos(2ωt) + b₂·sin(2ωt) + a₃·cos(3ωt) + b₃·sin(3ωt) + …   (with ω = 2π/T)

You may wonder: what’s that period T? It’s the period of the fundamental mode, i.e. the first harmonic. Indeed, the periods of the second, third, etc. harmonics are only one half, one third, etc. of the period of the first harmonic: T₂ = (2π)/(2ω) = (1/2)·(2π)/ω = (1/2)·T₁, and T₃ = (2π)/(3ω) = (1/3)·(2π)/ω = (1/3)·T₁, and so on. However, it’s easy to see that the second, third, etc. harmonics complete exactly two, three, etc. of their own periods in the time T₁, so they also repeat themselves after T₁, and so does the whole sum. So all is alright, and the general idea behind the Fourier analysis is further illustrated below. [Note that both the formula as well as the illustration below (which I took from Feynman’s Lectures) add a ‘zero-frequency term’ a₀ to the series. That zero-frequency term will usually be zero for a musical tone, because the ‘zero’ level of our tone will be zero indeed. Also note that, if the n-th harmonic has amplitude A_n and phase Φ_n, the a_n and b_n coefficients are equal to a_n = A_n·cos Φ_n and b_n = –A_n·sin Φ_n, so you can relate the illustration and the formula easily.]
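A minimal sketch of the Fourier series in plain Python (standard library only). The three harmonics, with their amplitudes A_n and phases Φ_n, are made-up values for illustration, and `synthesize` and `fourier_coefficients` are hypothetical helper names. The sketch builds a tone from a_n = A_n·cos Φ_n and b_n = –A_n·sin Φ_n, checks that this equals the sum of the A_n·cos(nωt + Φ_n) terms and repeats after T, and then recovers the coefficients from one period of samples, which is what Fourier analysis does.

```python
import math


def synthesize(t, T, a0, a, b):
    """f(t) = a0 + sum over n of [a_n cos(nωt) + b_n sin(nωt)], with ω = 2π/T."""
    w = 2 * math.pi / T
    return a0 + sum(a[n - 1] * math.cos(n * w * t) + b[n - 1] * math.sin(n * w * t)
                    for n in range(1, len(a) + 1))


def fourier_coefficients(samples, n_max):
    """Estimate a0, a_n, b_n from N equally spaced samples taken over one period T."""
    N = len(samples)
    a0 = sum(samples) / N
    a = [2 / N * sum(s * math.cos(2 * math.pi * n * k / N) for k, s in enumerate(samples))
         for n in range(1, n_max + 1)]
    b = [2 / N * sum(s * math.sin(2 * math.pi * n * k / N) for k, s in enumerate(samples))
         for n in range(1, n_max + 1)]
    return a0, a, b


T = 1 / 110                               # period of the fundamental (A2), in seconds
amps = [1.0, 0.5, 0.25]                   # A_1, A_2, A_3 (hypothetical)
phases = [0.0, math.pi / 4, math.pi / 3]  # Φ_1, Φ_2, Φ_3 (hypothetical)
a = [A * math.cos(p) for A, p in zip(amps, phases)]   # a_n = A_n cos Φ_n
b = [-A * math.sin(p) for A, p in zip(amps, phases)]  # b_n = -A_n sin Φ_n

# The two ways of writing the tone agree, and the tone repeats after T
t = 0.0013
direct = sum(A * math.cos((n + 1) * 2 * math.pi / T * t + p)
             for n, (A, p) in enumerate(zip(amps, phases)))
print(synthesize(t, T, 0.0, a, b), direct, synthesize(t + T, T, 0.0, a, b))

# Fourier analysis: sample one period and recover the coefficients
N = 64
samples = [synthesize(k * T / N, T, 0.0, a, b) for k in range(N)]
a0_est, a_est, b_est = fourier_coefficients(samples, n_max=3)
print([round(x, 6) for x in a_est], [round(x, 6) for x in b_est])
```

With 64 equally spaced samples per period and only three harmonics, the discrete sums pick out each coefficient exactly (up to rounding), so the recovered a_n and b_n match the ones we started from.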

You’ll say: What the heck! Why do we need the mathematical gymnastics here? It’s just to understand that other characteristic of a musical tone: its quality (as opposed to its pitch). A so-called rich tone will have strong harmonics, while a pure tone will only have the first harmonic. All other characteristics – the difference between a tone produced by a violin as opposed to a piano – are then related to the ‘mix’ of all those harmonics.

So we have it all now: pitch, quality and, finally, loudness, which is, of course, related to the magnitude of the air pressure changes as our waveform moves through the air. That’s what makes a musical tone. 🙂

Dissonance

As mentioned above, if the sounds are not consonant, they’re dissonant. But what is dissonance really? What’s going on? The answer is the following: when the ratio of two frequencies is close to a simple fraction, but not exactly equal to it, we get so-called beats, which our ear does not like.

Huh? Relax. The illustration below, which I copied from the Wikipedia article on piano tuning, illustrates the phenomenon. The blue wave is the sum of the red and the green wave, which are initially identical. But then the frequency of the green wave is increased, so the two waves drift in and out of phase, and the interference results in a beating pattern. Of course, our musical tone involves different frequencies and, hence, different periods T₁, T₂, T₃, etc., but you get the idea: the higher harmonics also repeat with period T₁ and, if the frequencies of two tones are not in some exact ratio, then harmonics that almost coincide will give us a similar problem: beats, and our ear will not like the sound.
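A rough sketch of where such beats come from, assuming ideal harmonics at exact integer multiples of each fundamental. The hypothetical helper `beat_rates` lists pairs of harmonics of two tones that lie within an arbitrary 10 Hz window of each other, together with the beat rate |n·f_a − m·f_b| they would produce. It compares A2 (110 Hz) with an E at exactly 330 Hz, as in the example above, and with E4 in equal temperament (about 329.63 Hz).

```python
def beat_rates(f_a, f_b, max_harmonic=8, window_hz=10.0):
    """Pairs of harmonics n·f_a and m·f_b closer than window_hz (an arbitrary cut-off),
    with the beat rate |n·f_a - m·f_b| in beats per second."""
    pairs = []
    for n in range(1, max_harmonic + 1):
        for m in range(1, max_harmonic + 1):
            diff = abs(n * f_a - m * f_b)
            if diff < window_hz:
                pairs.append((n, m, round(diff, 2)))
    return pairs


e4_equal = 440 * 2 ** (-5 / 12)     # E4 in equal temperament, about 329.63 Hz
print(beat_rates(110.0, 330.0))     # exact 3:1 ratio: [(3, 1, 0.0), (6, 2, 0.0)]
print(beat_rates(110.0, e4_equal))  # equal-tempered E4: [(3, 1, 0.37), (6, 2, 0.74)]
```

With the exact 3:1 ratio, the matching harmonics coincide and there is nothing to beat; with the equal-tempered E4, the same pairs beat slowly, at roughly 0.37 and 0.74 times per second.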

Of course, you’ll wonder: why don’t we like beats in tones? We can ask that, can’t we? It’s like asking why we like music, isn’t it? […] Well… It is and it isn’t. It’s like asking why our ear (or our brain) likes harmonics. We don’t know. That’s how we are wired. The ‘physical’ explanation of what is musical and what isn’t only goes so far, I guess. 😦

Pythagoras versus Bach

From all of what I wrote above, it is obvious that the frequencies of the harmonics of a musical tone are, indeed, related by simple ratios of small integers: the frequencies of the first and second harmonics are in a simple 2 to 1 ratio (2:1); the second and third harmonics have a 3:2 frequency ratio; the third and fourth harmonics a 4:3 ratio; the fourth and fifth harmonics a 5:4 ratio, etcetera. That’s it. Nothing more, nothing less.

In other words, Pythagoras was observing musical tones: he could not observe the pure tones behind them, i.e. the actual notes. However, aesthetics led Pythagoras, and all musicians after him – until the mid-18th century – to think that the frequencies of the notes within an octave should also be in simple ratios. From what I explained above, it’s obvious that it should not work that way: the ratio of the frequencies of two notes separated by n half-steps is 2^(n/12) and, unless n is a multiple of 12, 2^(n/12) is not a simple ratio: it is not a ratio of integers at all. [Why? Just take your pocket calculator and calculate the value of 2^(1/12): it’s 2^0.08333… = 1.0594630943… and so on… It’s an irrational number: its decimals never end and never repeat. Now, 2^(n/12) is equal to 2^(1/12)·2^(1/12)·…·2^(1/12) (n times). Why would you expect that product to be equal to some simple ratio?]
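To make the comparison concrete, here is a short sketch that puts 2^(n/12) next to the small-integer ratios mentioned at the start of this post (2:1, 3:2, 4:3, 5:3, 5:4). It expresses the difference in cents, a standard logarithmic unit that divides the octave into 1200 equal parts, so one equal-tempered half-step is 100 cents. The pairing of each ratio with a number of half-steps is the usual one for the octave, fifth, fourth, major sixth and major third.

```python
import math
from fractions import Fraction

# (number of half-steps, simple ratio) for some familiar intervals
INTERVALS = {
    "major third": (4, Fraction(5, 4)),
    "fourth": (5, Fraction(4, 3)),
    "fifth": (7, Fraction(3, 2)),
    "major sixth": (9, Fraction(5, 3)),
    "octave": (12, Fraction(2, 1)),
}


def cents(ratio: float) -> float:
    """Size of a frequency ratio in cents: 1200 cents per octave."""
    return 1200 * math.log2(ratio)


for name, (n, simple) in INTERVALS.items():
    equal = 2 ** (n / 12)
    diff = cents(equal) - cents(float(simple))
    print(f"{name:12s} 2^({n}/12) = {equal:.4f}   {simple} = {float(simple):.4f}   "
          f"difference: {diff:+.1f} cents")
```

Only the octave comes out exact. The fifth and the fourth are off by about 2 cents, which is why 3/2 looked like such a good approximation, while the thirds and sixths are off by considerably more.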

So – I said it already – Pythagoras was wrong, not only in this but also in other regards, such as when he espoused his views on the solar system, for example. Again, I am sorry to have to say that, but it is what it is: the Pythagoreans did seem to prefer mathematical ideas over physical experiment. 🙂 Having said that, musicians obviously didn’t know about any alternative to Pythagoras, and they had surely never heard about logarithmic scales at the time. So… Well… They did use the so-called Pythagorean tuning system. To be precise, they tuned their instruments by equating the frequency ratio between the first and the fifth tone in the C scale (i.e. the C and G, as they did not include the C#, D# and F# semitones when counting) with the ratio 3/2, and then they used other so-called harmonic ratios for the notes in-between.

Now, the 3/2 ratio is actually almost correct, because the actual frequency ratio is 2^(7/12) (counting the semitones, G is seven half-steps above C), and so that’s 1.4983, approximately. Now, that’s pretty close to 3/2 = 1.5, I’d say. 🙂 Using that approximation (which, I admit, is fairly accurate indeed), the tuning of the other strings would then also be done assuming certain ratios should be respected, like the ones below.

So it was all quite good. Having said that, good musicians, and some great mathematicians, felt something was wrong, if only because there were several so-called just intonation systems around (for an overview, check out the Wikipedia article on just intonation). More importantly, they felt it was quite difficult to transpose music using the Pythagorean tuning system. Transposing music amounts to changing the so-called key of a musical piece: what one does, basically, is move the whole piece up or down in pitch by some constant interval that is not equal to an octave. Today, transposing music is a piece of cake, in Western music at least. But that’s only because all Western music is played on instruments that are tuned using that logarithmic scale (technically, it’s referred to as the 12-tone equal temperament (12-TET) system). When you use one of the Pythagorean systems for tuning, a transposed piece does not sound quite right.
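One way to see the transposition problem is the following sketch. It assumes one common 5-limit just-intonation C-major scale (ratios 1, 9/8, 5/4, 4/3, 3/2, 5/3, 15/8 relative to C), which is just one of the several just systems mentioned above, not the specific system any given musician used. It measures the fifth starting from different notes of that scale: C-G and E-B come out as a pure 3/2, but D-A does not, so a passage that relies on pure fifths sounds off when it is moved to a key where D-A has to serve as one. In 12-TET, every fifth is the same 2^(7/12).

```python
import math
from fractions import Fraction

# One common 5-limit just-intonation C-major scale (an assumption for illustration)
JUST_C_MAJOR = {
    "C": Fraction(1), "D": Fraction(9, 8), "E": Fraction(5, 4), "F": Fraction(4, 3),
    "G": Fraction(3, 2), "A": Fraction(5, 3), "B": Fraction(15, 8),
}


def cents(r) -> float:
    """Size of a frequency ratio in cents (1200 per octave)."""
    return 1200 * math.log2(float(r))


# The 'same' interval, a fifth, starting from different notes of the scale
for low, high in [("C", "G"), ("E", "B"), ("D", "A")]:
    r = JUST_C_MAJOR[high] / JUST_C_MAJOR[low]
    print(f"{low}-{high}: ratio {r} = {cents(r):.1f} cents")

print(f"12-TET fifth: {cents(2 ** (7 / 12)):.1f} cents, in every key")
```

The D-A fifth comes out as 40/27, about 680.4 cents instead of about 702.0, whereas the equal-tempered fifth is 700.0 cents wherever it starts.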

The first mathematician who really seemed to know what was wrong (and, hence, who also knew what to do) was Simon Stevin, who wrote a manuscript based on the ‘12th root of 2’ principle around AD 1600. It shouldn’t surprise us: the thinking of this mathematician from Bruges would inspire John Napier’s work on logarithms. Unfortunately, while that manuscript describes the basic principles behind the 12-TET system, it didn’t get published (Stevin had to run away from Bruges to Holland, because he was a Protestant and the Spanish rulers at the time didn’t like that). Hence, musicians, while not quite understanding the math (or the physics, I should say) behind their own music, kept trying other tuning systems, as they felt it made their music sound better indeed.

One of these ‘other systems’ is the so-called ‘good’ temperament, which you surely heard about, as it’s referred to in Bach’s famous composition, Das Wohltemperierte Klavier, which he finalized in the first half of the 18th century. What is that ‘good’ temperament really? Well… It is what it is: it’s one of those tuning systems which made musicians feel better about their music for a number of reasons, all of which are well described in the Wikipedia article on it. But the main reason is that the tuning system that Bach recommended was a great deal better when it came to playing the same piece in another key. However, it still wasn’t quite right, as it wasn’t the equal temperament system (i.e. the 12-TET system) that’s in place now (in the West at least—the Indian music scale, for instance, is still based on simple ratios).

Why do I mention this piece of Bach? The reason is simple: you probably heard of it because it’s one of the main reference points in a rather famous book: Gödel, Escher, Bach: an Eternal Golden Braid. If not, then just forget about it. I am mentioning it because one of my brothers loves it. It’s on artificial intelligence. I haven’t read it, but I must assume Bach’s masterpiece is analyzed there because of its structure, not because of the tuning system that one’s supposed to use when playing it. So… Well… I’d say: don’t make that composition any more mystical than it already is. 🙂 The ‘magic’ behind it is related to what I said about A4 being the ‘reference point’ in music: since we’re using a universal logarithmic scale now, there’s no such thing as an absolute reference point any more: once we define our musical ‘unit’ (so that’s the so-called octave in Western music), and also define how many steps we want to have in-between (so that’s 12, in Western music, that is), we get all the rest. That’s just how logarithms work.

So, in short, music is all about structure, i.e. it’s all about mathematical relations, and about mathematical relations only. Again, Pythagoras’ conclusions were wrong, but his intuition was right. And, of course, it’s his intuition that gave birth to science: the simple ‘models’ he made – of how notes are supposed to be related to each other, or about our solar system – were, obviously, just the start of it all. And what a great start it was! Looking back once again, it’s rather sad that conservative forces (such as the Church) often got in the way of progress. In fact, I suddenly wonder: if scientists had not been bothered by those conservative forces, could mankind have sent people around the time that Charles V was born, i.e. around A.D. 1500 already? 🙂

Post scriptum: My example of the (lower) E and A guitar strings co-vibrating when playing the A major chord by striking the upper four strings only is somewhat tricky. The (lower) E and A strings are associated with lower pitches, and we said overtones (i.e. the second, third, fourth, etc. harmonics) are multiples of the fundamental frequency. So why is it that the lower strings co-vibrate? The answer is easy: they oscillate at the higher frequencies only. If you have a guitar, just try it. The two strings you do not pluck do vibrate, and very visibly so, but you will not hear the low fundamental tones they produce when you strike them. In short, they resonate at the higher frequencies only. 🙂

The example that Feynman gives is much more straightforward: his example mentions the lower C (or A, B, etc.) notes on a piano causing vibrations in the higher C strings (or in the higher A, B, etc. strings, respectively). For example, striking the C2 key (and, hence, the C2 string inside the piano) will make the (higher) C3 string vibrate too. But few of us have a grand piano at home, I guess. That’s why I prefer my guitar example. 🙂

Maxwell-Boltzmann, Bose-Einstein and Fermi-Dirac statistics

Pre-scriptum added much later: We have advanced much in our understanding since we wrote this post. If you are reading it because you want to understand more about the boson-fermion distinction, then you shouldn’t be here. The general distinction between bosons and fermions is a useless theoretical generalization which actually prevents you from understanding what is really going on. I am keeping this post online for documentation purposes only. It is interesting from a math point of view but you are not here to learn math, are you?

Jean Louis Van Belle, 20 May 2020

Original post:

I’ve discussed statistics, in the context of quantum mechanics, a couple of times already (see, for example, my post on amplitudes and statistics). However, I never took the time to properly explain those distribution functions which are referred to as the Maxwell-Boltzmann, Bose-Einstein and Fermi-Dirac distribution functions respectively. Let me try to do that now—without, hopefully, getting lost in too much math! It should be a nice piece, as it connects quantum mechanics with statistical mechanics, i.e. two topics I had nicely separated so far. 🙂

You know the Boltzmann Law now, which says that the probabilities of different conditions of energy are proportional to e^{−E/kT} = 1/e^{E/kT}. Different ‘conditions of energy’ can be anything: density, molecular speeds, momenta, whatever. The point is: we have some probability density function f, and it’s a function of the energy E, so we write:

f(E) = C·e^{−E/kT} = C/e^{E/kT}

C is just a normalization constant (all probabilities have to add up to one, so the integral of this function over its domain must be one), and k and T are also the usual suspects: T is the (absolute) temperature, and k is the Boltzmann constant, which relates the temperature to the kinetic energy of the particles involved. We also know the shape of this function. For example, when we applied it to the density of the atmosphere at various heights (which are related to the potential energy, as P.E. = m·g·h), assuming constant temperature, we got the following graph. The shape of this graph is that of an exponential decay function (we’ll encounter it again, so just take a mental note of it).
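The exponential decay of the atmosphere’s density is easy to put numbers on. A minimal sketch, assuming an isothermal atmosphere made of nitrogen (N₂, 28 atomic mass units) at 288 K and a constant g = 9.81 m/s² (all three are my illustrative choices): Boltzmann’s Law gives n(h)/n₀ = e^{−m·g·h/kT}, so the density falls by a factor e every kT/(m·g) metres, which comes out at roughly 8.7 km.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant, J/K
AMU = 1.66053906660e-27   # atomic mass unit, kg
G = 9.81                  # gravitational acceleration, m/s^2 (assumed constant)


def density_ratio(height_m: float, mass_kg: float, temp_k: float) -> float:
    """n(h)/n0 = exp(-m*g*h / kT) for an isothermal atmosphere."""
    return math.exp(-mass_kg * G * height_m / (K_B * temp_k))


def scale_height(mass_kg: float, temp_k: float) -> float:
    """Height over which the density drops by a factor e: kT/(m*g)."""
    return K_B * temp_k / (mass_kg * G)


if __name__ == "__main__":
    m_n2 = 28.0 * AMU   # assumed: the gas is pure N2
    T = 288.0           # assumed: a constant 288 K at all heights
    print(f"scale height ≈ {scale_height(m_n2, T) / 1000:.1f} km")
    for h in (0, 2000, 5000, 10000):
        print(f"{h:>6} m   n/n0 = {density_ratio(h, m_n2, T):.3f}")
```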


A more interesting application is the quantum-mechanical approach to the theory of gases, which I introduced in my previous post. To explain the behavior of gases under various conditions, we assumed that gas molecules are like oscillators but that they can only take on discrete levels of energy. [That’s what quantum theory is about!] We denoted the various energy levels, i.e. the energies of the various molecular states, by E₀, E₁, E₂,…, E_i,…, and if Boltzmann’s Law applies, then the probability of finding a molecule in the particular state E_i is proportional to e^{−E_i/kT}. We can then calculate relative probabilities: the probability of being in state E_i, relative to the probability of being in state E₀, is:

P_i/P₀ = e^{−E_i/kT}/e^{−E₀/kT} = e^{−(E_i−E₀)/kT} = 1/e^{(E_i−E₀)/kT}

Now, P_i obviously equals n_i/N, i.e. the ratio of the number of molecules in state E_i (n_i) to the total number of molecules (N). Likewise, P₀ = n₀/N and, therefore, we can write:

n_i/n₀ = e^{−(E_i−E₀)/kT} = 1/e^{(E_i−E₀)/kT}

This formulation is just another Boltzmann Law, but it’s nice in that it introduces the idea of a ground state, i.e. the state with the lowest energy level. We may or may not want to equate E₀ with zero. It doesn’t matter really: we can always shift all energies by some arbitrary constant because we get to choose the reference point for the potential energy.
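Both points (ratios relative to the ground state, and the freedom to shift the zero of energy) show up in a two-line sketch. The energy levels below are hypothetical values in units of kT, and the function name is mine.

```python
import math


def relative_populations(energies, kT):
    """Return n_i/n_0 = exp(-(E_i - E_0)/kT) for every level, E_0 being the first."""
    e0 = energies[0]
    return [math.exp(-(e - e0) / kT) for e in energies]


levels = [0.0, 1.0, 2.0, 3.0]          # hypothetical levels, in units of kT
shifted = [e + 5.0 for e in levels]    # the same levels, with a different zero of energy

print(relative_populations(levels, kT=1.0))
print(relative_populations(shifted, kT=1.0))   # same ratios: only differences matter
```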

So that’s the so-called Maxwell-Boltzmann distribution. Now, in my post on amplitudes and statistics, I had jotted down the formulas for the other distributions, i.e. the distributions when we’re not talking classical particles but fermions and/or bosons. As you know, fermions are particles governed by the Pauli exclusion principle: identical fermions cannot be together in the same state. For bosons, it’s the other way around: having one in some quantum state actually increases the chance of finding another one there, and we can have any number of them in the same state.

We also know that fermions and bosons are the real world: fermions are the matter-particles, bosons are the force-carriers, and our ‘Boltzmann particles’ are nothing but a classical approximation of the real world. Hence, even if we can’t see them in the actual world, the Fermi-Dirac and Bose-Einstein distributions are the real-world distributions. 🙂 Let me jot down the equations once again:

Fermi-Dirac (for fermions): f(E) = 1/[A·e^{(E − E_F)/kT} + 1]

Bose-Einstein (for bosons): f(E) = 1/[A·e^{E/kT} − 1]

We’ve got some other normalization constant here (A), which we shouldn’t be too worried about, for the time being at least. Now, to see how these distributions differ from the Maxwell-Boltzmann distribution (which we should re-write as f(E) = C·e^{−E/kT} = 1/[A·e^{E/kT}] so as to make all formulas directly comparable), we should just make a graph. Please go online to find a graph tool (I found a new one recently, and it’s really easy to use), and just do it. You’ll see they all look like that exponential decay function. However, in order to make a proper comparison, we would actually need to calculate the normalization coefficients and, for the Fermi-Dirac distribution, we would also need the Fermi energy E_F (note that, for simplicity, we did equate E₀ with zero). Now, we could give it a try, but it’s much easier to google and find an example online.
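If you’d rather not hunt for an online graph tool, a few lines of Python can tabulate the three functions instead. This is a minimal sketch that sets the normalization constant A to 1 (an assumption, as discussed above) and measures energy from μ (or E_F), using the dimensionless variable x = (E − μ)/kT.

```python
import math


def maxwell_boltzmann(x: float) -> float:
    """Classical occupancy e^(-x), with x = (E - mu)/kT and A = 1."""
    return math.exp(-x)


def fermi_dirac(x: float) -> float:
    """1/(e^x + 1): never exceeds 1, in line with the exclusion principle."""
    return 1.0 / (math.exp(x) + 1.0)


def bose_einstein(x: float) -> float:
    """1/(e^x - 1): only defined for x > 0, and it blows up as x approaches 0."""
    return 1.0 / (math.exp(x) - 1.0)


if __name__ == "__main__":
    print(f"{'x':>4} {'FD':>9} {'MB':>9} {'BE':>9}")
    for x in (0.5, 1, 2, 4, 8):
        print(f"{x:>4} {fermi_dirac(x):9.5f} {maxwell_boltzmann(x):9.5f} {bose_einstein(x):9.5f}")
```

At x = 0.5 the three columns differ a lot (BE > MB > FD); by x = 8 they agree to better than 0.1%, which is the classical limit (E − μ) >> kT.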

The HyperPhysics website of Georgia State University gives us one: the example assumes 6 particles sharing 9 units of energy (so the energy levels run from 0 to 9), and the table and graph below compare the Maxwell-Boltzmann and Bose-Einstein distributions for the model.

Now that is an interesting example, isn’t it? In this example (but all depends on its assumptions, of course), the Maxwell-Boltzmann and Bose-Einstein distributions are almost identical. Having said that, we can clearly see that the lower energy states are, indeed, more probable with Bose-Einstein statistics than with the Maxwell-Boltzmann statistics. While the difference is not dramatic at all in this example, the difference does become very dramatic, in reality, with large numbers (i.e. high matter density) and, more importantly, at very low temperatures, at which bosons can condense into the lowest energy state. This phenomenon is referred to as Bose-Einstein condensation: it causes superfluidity and superconductivity, and it’s real indeed: it has been observed with supercooled He-4, which is not an everyday substance, but real nevertheless!

What about the Fermi-Dirac distribution for this example? The Fermi-Dirac distribution is given below: the lowest energy state is now less probable, the mid-range energies much more, and none of the six particles occupy any of the four highest energy levels. Again, while the difference is not dramatic in this example, it can become very dramatic, in reality, with large numbers (read: high matter density) and very low temperatures: at absolute zero, all of the possible energy states up to the Fermi energy level will be occupied, and all the levels above the Fermi energy will be vacant.

What can we make of all this? First, you may wonder why we actually have more than one particle in one state above: doesn’t that contradict the Pauli exclusion principle? No. We need to distinguish micro- and macro-states. In fact, the example assumes we’re talking electrons here, and so we can have two particles in each energy state, with opposite spin, however. At the same time, it’s true we cannot have three, or more, in any state. That results, in the example we’re looking at here, in five possible distributions only, as shown below.

The diagram is an interesting one: if the particles were to be classical particles, or bosons, then 26 combinations are possible, including the five Fermi-Dirac combinations, as shown above. Note the little numbers above the 26 possible combinations (e.g. 6, 20, 30,… 180): they are proportional to the likelihood of occurring under the Maxwell-Boltzmann assumption (so if we assume the particles are ‘classical’ particles). Let me introduce you to the math behind the example by using the diagram below, which shows three possible distributions/combinations (I know the terminology is quite confusing—sorry for that!).

If we could distinguish the particles, then we’d have 2002 micro-states, which is the total of all those little numbers on top of the combinations (6+60+180+…). However, the assumption is that we cannot distinguish the particles. Therefore, the first combination in the diagram above, with five particles in the zero energy state and one particle in state 9, occurs 6 times out of 2002 and, hence, it has a probability of 6/2002 ≈ 0.003 only. In contrast, the second combination is 10 times more likely (60/2002), and the third one is 30 times more likely (180/2002)! In any case, the point is, in the classical situation (and in the Bose-Einstein hypothesis as well), we have 26 possible macro-states, as opposed to 5 only for fermions, and so that leads to a very different density function. Capito?

No? Well, this blog is not a textbook on physics and, therefore, I should refer you to the mentioned site once again, which references a 1992 textbook on physics (Frank Blatt, Modern Physics, 1992) as the source of this example. However, I won’t do that: you’ll find the details in the Post Scriptum to this post. 🙂

Let’s first focus on the fundamental stuff, however. The most burning question is: if the real world consists of fermions and bosons, why is it that we only see the Maxwell-Boltzmann distribution in our actual (non-real?) world? 🙂 The answer is that both the Fermi-Dirac and Bose-Einstein distributions approach the Maxwell-Boltzmann distribution at higher temperatures and lower particle densities. In other words, we cannot see the Fermi-Dirac distributions (all matter is fermionic, except for weird stuff like superfluid helium-4 at 1 or 2 kelvin), but they are there!

Let’s approach it mathematically: the most general formula, encompassing both Fermi-Dirac and Bose-Einstein statistics, is:

N_i(E_i) ∝ 1/[e^{(E_i − μ)/kT} ± 1]

If you’d google, you’d find a formula involving an additional coefficient, g_i, which is the so-called degeneracy of the energy level E_i. I included it in the formula I used in the above-mentioned post of mine. However, I don’t want to make it any more complicated than it already is and, therefore, I omitted it this time. What you need to look at are the two terms in the denominator: e^{(E_i − μ)/kT} and ± 1.

From a math point of view, it is obvious that the values of e^{(E_i − μ)/kT} + 1 (Fermi-Dirac) and e^{(E_i − μ)/kT} − 1 (Bose-Einstein) will approach each other if e^{(E_i − μ)/kT} is much larger than 1, so if e^{(E_i − μ)/kT} >> 1. In that case, both denominators are practically equal to e^{(E_i − μ)/kT}, and both distributions reduce to the Maxwell-Boltzmann form e^{−(E_i − μ)/kT}. That’s the case, obviously, if the (E_i − μ)/kT ratio is large, so if (E_i − μ) >> kT. In fact, (E_i − μ) should be much larger than kT for the lowest energy levels too! Now, the conditions under which that is the case are associated with the classical situation (such as a cylinder filled with gas, for example). Why?

Well… […] Again, I have to say that this blog can’t substitute for a proper textbook. Hence, I am afraid I have to leave it to you to do the necessary research to see why. 🙂 The non-mathematical approach is to simply note that quantum effects, i.e. the ±1 term, only matter if the concentration of particles is high enough. Indeed, quantum effects appear if the concentration of particles is higher than the so-called quantum concentration. Only when the quantum concentration is reached do particles start interacting according to what they are, i.e. as bosons or as fermions. At higher temperatures, that concentration will not be reached, except in massive objects such as a white dwarf (white dwarfs are stellar remnants with a mass comparable to that of the Sun but a volume comparable to that of the Earth). So, in general, we can say that at higher temperatures and at low concentrations we will not have any quantum effects. That should settle the matter, for now at least.
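To get a feel for the numbers, one can compare a particle density n with the quantum concentration n_Q = (m·k·T/2πħ²)^{3/2}, the standard textbook expression; quantum effects become important roughly when n is comparable to or larger than n_Q. The sketch below does this for helium-4, assuming room-temperature gas at atmospheric pressure and, for contrast, liquid helium at about 2 K with a density of roughly 145 kg/m³. Both scenarios are my illustrative choices, and the liquid case is only an order-of-magnitude comparison.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant, J/K
HBAR = 1.054571817e-34    # reduced Planck constant, J*s
AMU = 1.66053906660e-27   # atomic mass unit, kg


def quantum_concentration(mass_kg: float, temp_k: float) -> float:
    """n_Q = (m*k*T / (2*pi*hbar^2))**(3/2), in particles per m^3."""
    return (mass_kg * K_B * temp_k / (2 * math.pi * HBAR ** 2)) ** 1.5


M_HE4 = 4.0026 * AMU

# Helium gas at 300 K and 1 atm, treated as an ideal gas: n = P/kT.
n_gas = 101325.0 / (K_B * 300.0)
ratio_gas = n_gas / quantum_concentration(M_HE4, 300.0)

# Liquid helium near 2 K, assuming a density of roughly 145 kg/m^3.
n_liquid = 145.0 / M_HE4
ratio_liquid = n_liquid / quantum_concentration(M_HE4, 2.0)

print(f"He gas, 300 K:  n/n_Q ≈ {ratio_gas:.1e}")     # far below 1: classical
print(f"liquid He, 2 K: n/n_Q ≈ {ratio_liquid:.1f}")  # above 1: quantum regime
```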

You’ll have one last question: we derived Boltzmann’s Law from the kinetic theory of gases, but how do we derive that N_i(E_i) = 1/[A·e^{(E_i − μ)/kT} ± 1] expression? Good question but, again, we’d need more than a few pages to explain that! The answer is: quantum mechanics, of course! Go check it out in Feynman’s third Volume of Lectures! 🙂

Post scriptum: combinations, permutations and multiplicity

The mentioned example from HyperPhysics is really interesting, if only because it shows you also need to master a bit of combinatorics to get into quantum mechanics. Let’s go through the basics. If we have n distinct objects, we can order them in n! ways, with n! (read: n factorial) equal to n·(n–1)·(n–2)·…·3·2·1. Note that 0! is equal to 1, by definition. We’ll need that definition.

For example, a red, blue and green ball can be ordered in 3·2·1 = 6 ways. Each way is referred to as a permutation.

Besides permutations, we also have the concept of a k-permutation, which we can denote in a number of ways but let’s choose P(n, k). [The P stands for permutation here, not for probability.] P(n, k) is the number of ways to pick k objects, in order, out of a set of n objects. Again, the objects are supposed to be distinguishable. The formula is P(n, k) = n·(n–1)·(n–2)·…·(n–k+1) = n!/(n–k)!. That’s easy to understand intuitively: on your first pick you have n choices; on your second, n–1; on your third, n–2, etcetera. When n = k, we obviously get n! again.

There is a third concept: the k-combination (as opposed to the k-permutation), which we’ll denote by C(n, k). That’s when the order within our subset doesn’t matter: an ace, a queen and a jack taken out of some card deck are the same hand as a queen, a jack and an ace: we don’t care about the order. If we have k objects, there are k! ways of ordering them and, hence, we just have to divide P(n, k) by k! to get C(n, k). So we write: C(n, k) = P(n, k)/k! = n!/[(n–k)!k!]. You recognize C(n, k): it’s the binomial coefficient.
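Python’s standard library has all three counts built in (math.factorial, plus math.perm and math.comb since Python 3.8), so the formulas above can be checked directly. The card-deck numbers are just an illustration.

```python
import math
from itertools import permutations

# n! orderings: a red, a blue and a green ball can be lined up in 3! = 6 ways.
balls = ["red", "blue", "green"]
orderings = list(permutations(balls))
print(len(orderings), math.factorial(3))

# k-permutations and k-combinations, checked against the formulas in the text.
n, k = 52, 3  # e.g. drawing three cards from a standard 52-card deck
p = math.perm(n, k)   # P(n, k) = n!/(n-k)!: order matters
c = math.comb(n, k)   # C(n, k) = n!/[(n-k)! k!]: order ignored
assert p == math.factorial(n) // math.factorial(n - k)
assert c == p // math.factorial(k)
print(p, c)
```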

Now, the HyperPhysics example illustrating the three mentioned distributions (Maxwell-Boltzmann, Bose-Einstein and Fermi-Dirac) is a bit more complicated: we need to distribute q units of energy over N particles. Every possible configuration is referred to as a micro-state, and the total number of possible micro-states is referred to as the multiplicity of the system, denoted by Ω(N, q). The formula for Ω(N, q) is another binomial coefficient: Ω(N, q) = (q+N–1)!/[q!(N–1)!]. In our example, with N = 6 and q = 9, we get Ω(6, 9) = (9+6–1)!/[9!(6–1)!] = 14!/(9!·5!) = 2002.
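Because the numbers are small, we can also check Ω(6, 9) = 2002 by brute force: give each of the six (distinguishable) particles between 0 and 9 units of energy and count the assignments whose total is exactly 9. The function names are mine; the brute-force loop runs through a million candidates, so it takes a moment.

```python
import math
from itertools import product


def multiplicity(n_particles: int, q_quanta: int) -> int:
    """Omega(N, q) = (q + N - 1)! / [q! (N - 1)!]."""
    return math.comb(q_quanta + n_particles - 1, q_quanta)


def brute_force(n_particles: int, q_quanta: int) -> int:
    """Count assignments of 0..q units to each distinguishable particle, total q."""
    return sum(1 for levels in product(range(q_quanta + 1), repeat=n_particles)
               if sum(levels) == q_quanta)


print(multiplicity(6, 9), brute_force(6, 9))
```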

In our example, however, we do not have distinct particles and, therefore, we only have 26 macro-states (as opposed to 2002 micro-states), which are also referred to, confusingly, as distributions or combinations.

Now, the number of micro-states associated with the same macro-state is given by yet another formula: it is equal to N!/[n₀!·n₁!·n₂!·…·n_q!], with n_j the number of particles in level j. [See why we need the 0! = 1 definition? It ensures unoccupied levels do not affect the calculation.] So that’s how we get those numbers 6, 60 and 180 for those three macro-states: 6!/(5!·1!) = 6, 6!/(3!·2!·1!) = 60, and 6!/(2!·2!·1!·1!) = 180.
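The 26 macro-states and their micro-state counts can be generated rather than read off a diagram. The sketch below, with helper names of my own, lists each macro-state as the sorted energy levels of the six particles and applies the N!/[n₀!·n₁!·…] formula; summing over all macro-states should give back Ω(6, 9) = 2002. The three example tuples at the end are my own picks with the 6, 60 and 180 counts; the diagram’s macro-states may use different levels with the same occupancy pattern.

```python
import math
from collections import Counter


def macro_states(n_particles, q_quanta, max_level=None):
    """Yield macro-states as non-increasing tuples of particle levels summing to q.

    Since the particles are indistinguishable, only the sorted tuple matters.
    """
    if max_level is None:
        max_level = q_quanta
    if n_particles == 0:
        if q_quanta == 0:
            yield ()
        return
    for top in range(min(max_level, q_quanta), -1, -1):
        for rest in macro_states(n_particles - 1, q_quanta - top, top):
            yield (top,) + rest


def micro_states(state):
    """N!/(n_0! n_1! ... n_q!), where n_j is the number of particles in level j."""
    occupancy = Counter(state)  # unoccupied levels are simply absent (0! = 1)
    denominator = math.prod(math.factorial(n_j) for n_j in occupancy.values())
    return math.factorial(len(state)) // denominator


states = list(macro_states(6, 9))
print(len(states), sum(micro_states(s) for s in states))
print(micro_states((9, 0, 0, 0, 0, 0)), micro_states((7, 1, 1, 0, 0, 0)),
      micro_states((5, 2, 1, 1, 0, 0)))
```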

But how do we calculate those average numbers of particles for each energy level? In other words, how do we calculate the probability densities under the Maxwell-Boltzmann, Fermi-Dirac and Bose-Einstein hypothesis respectively?

For the Maxwell-Boltzmann distribution, we proceed as follows: for each energy level j (or E_j, I should say), we calculate n_j = ∑n_ij·P_i over all macro-states i. In this summation, n_ij is the number of particles in energy level j in macro-state i, while P_i is the probability of macro-state i, calculated as the ratio of (i) the number of micro-states associated with macro-state i and (ii) the total number of micro-states. For P_i, we gave the example of 6/2002 ≈ 0.3%. For 60 and 180, we get 60/2002 ≈ 3% and 180/2002 ≈ 9%. Calculating all the n_j’s for j ranging from 0 to 9 should yield the numbers and the graph below indeed.

OK. That’s how it works for Maxwell-Boltzmann. Now, it is obvious that the Fermi-Dirac and the Bose-Einstein distribution should not be calculated in the same way because, if they were, they would not be different from the Maxwell-Boltzmann distribution! The trick is as follows.

For the Bose-Einstein distribution, we give all macro-states equal weight (a weight of one, as shown below). Hence, the probability P_i is, quite simply, 1/26 ≈ 3.85% for all 26 macro-states. So we use the same n_j = ∑n_ij·P_i formula but with P_i = 1/26.

Finally, I already explained how we get the Fermi-Dirac distribution: we can only have (i) one, (ii) two, or (iii) zero fermions for each energy level—not more than two! Hence, out of the 26 macro-states, only five are actually possible under the Fermi-Dirac hypothesis, as illustrated below once more. So it’s a very different distribution indeed!
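Putting the three recipes together: the sketch below (helper names are mine) enumerates the 26 macro-states and weights them by their micro-state counts for Maxwell-Boltzmann and equally for Bose-Einstein. For Fermi-Dirac, it keeps only the macro-states with at most two particles per level and weights those equally too, which is my assumption, mirroring the equal-weight treatment used for Bose-Einstein. Exact fractions make it easy to check that each distribution still accounts for all six particles.

```python
import math
from collections import Counter
from fractions import Fraction

N, Q = 6, 9  # six particles sharing nine units of energy: levels 0..9


def macro_states(n, q, max_level):
    """Non-increasing tuples of particle levels that add up to q."""
    if n == 0:
        if q == 0:
            yield ()
        return
    for top in range(min(max_level, q), -1, -1):
        for rest in macro_states(n - 1, q - top, top):
            yield (top,) + rest


def micro_states(state):
    """N!/(n_0! n_1! ... n_q!) for one macro-state."""
    counts = Counter(state).values()
    return math.factorial(len(state)) // math.prod(math.factorial(c) for c in counts)


def average_occupancy(states, weights):
    """n_j = sum over macro-states i of n_ij * P_i, with P_i proportional to weights."""
    total = sum(weights)
    n = [Fraction(0)] * (Q + 1)
    for state, w in zip(states, weights):
        for level, n_ij in Counter(state).items():
            n[level] += Fraction(n_ij * w, total)
    return n


states = list(macro_states(N, Q, Q))
mb = average_occupancy(states, [micro_states(s) for s in states])
be = average_occupancy(states, [1] * len(states))
fd_states = [s for s in states if max(Counter(s).values()) <= 2]
fd = average_occupancy(fd_states, [1] * len(fd_states))

for j in range(Q + 1):
    print(f"E{j}: MB {float(mb[j]):.3f}  BE {float(be[j]):.3f}  FD {float(fd[j]):.3f}")
```

Under these assumptions the ground-state occupancy comes out at about 2.14 (MB), 2.27 (BE) and 1.80 (FD), and the four highest levels are empty in the Fermi-Dirac case, in line with the description above.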

Now, you’ll probably still have questions. For example, why does the assumption, for the Bose-Einstein analysis, that macro-states have equal probability favor the lower energy states? The answer is that the model also integrates other constraints: first, when associating a particle with an energy level, we do not favor one energy level over another, so all energy levels have equal probability. However, at the same time, the whole system has a fixed total energy, and so we cannot put the particles in the higher energy levels only! At the same time, we know that, if we have N particles, and the probability of a particle having some energy level j is the same for all j, then they are unlikely to be all at the same energy level: they’ll be spread out, as evidenced by the very low chance (0.3% only) of having 5 particles in the ground state and 1 particle at a higher level, as opposed to the 3% and 9% chance of the other two combinations shown in that diagram with three possible Maxwell-Boltzmann (MB) combinations.

So what happens when assigning an equal probability to all 26 possible combinations (with value 1/26) is that the combinations that were previously rather unlikely – because they did have a rather heavy concentration of particles in the ground state only – are now much more likely. So that’s why the Bose-Einstein distribution, in this example at least, is skewed towards the lowest energy level—as compared to the Maxwell-Boltzmann distribution, that is.

So that’s what’s behind it, and that should also answer the other question you surely have when looking at those five acceptable Fermi-Dirac configurations: why don’t we have the same five configurations starting from the top down, rather than from the bottom up? Now you know: such a configuration would have a much higher total energy, and so it’s not allowed under this particular model.

There’s also this other question: we said the particles were indistinguishable, but then we suddenly say there can be two at any energy level, because their spin is opposite. It’s obvious this is rather ad hoc as well. However, if we’d allow only one particle at any energy level, we’d have no allowable combinations (six particles in six different levels need at least 0+1+2+3+4+5 = 15 units of energy, more than the 9 available) and, hence, we’d have no Fermi-Dirac distribution at all in this example.

In short, the example is rather intuitive, which is actually why I like it so much: it shows how bosonic and fermionic behavior appear rather gradually, as a consequence of variables that are defined at the system level, such as density, or temperature. So, yes, you’re right if you think the HyperPhysics example lacks rigor. That’s why I think it’s such a wonderful pedagogic device. 🙂

The Quantum-Mechanical Gas Law

Pre-script (dated 26 June 2020): This post has become less relevant (even irrelevant, perhaps) because my views on all things quantum-mechanical have evolved significantly as a result of my progression towards a more complete realist (classical) interpretation of quantum physics. The text also got mutilated because of the removal of material by the dark force. I keep blog posts like these mainly because I want to keep track of where I came from. I might review them one day, but I currently don’t have the time or energy for it. 🙂

Original post:

In my previous posts, it was mentioned repeatedly that the kinetic theory of gases is not quite correct: the experimentally measured values of the so-called specific heat ratio (γ) vary with temperature and, more importantly, their values differ, in general, from what classical theory would predict. It works, more or less, for noble gases, which do behave as ideal gases and for which γ is what the kinetic theory of gases would want it to be: γ = 5/3—but we get in trouble immediately, even for simple diatomic gases like oxygen or hydrogen, as illustrated below: the theoretical value is 9/7 (so that’s 1.286, more or less), but the measured value is very different.

Let me quickly remind you how we get the theoretical number. According to classical theory, a diatomic molecule like oxygen can be represented as two atoms connected by a spring. Each of the atoms absorbs kinetic energy, and for each direction of motion (x, y and z), that energy is equal to kT/2, so the kinetic energy of both atoms, added together, is 2·3·kT/2 = 3kT. However, I should immediately add that not all of that energy is to be associated with the center-of-mass motion of the whole molecule, which determines the temperature of the gas: that energy is and remains equal to 3kT/2, always. We also have rotational and vibratory motion. The molecule can rotate in two independent directions (and any combination of these directions, of course) and, hence, rotational motion is to absorb an amount of energy equal to 2·kT/2 = kT. Finally, the vibratory motion is to be analyzed as any other oscillation, so like a spring really. There is only one dimension involved and, hence, the kinetic energy here is just kT/2. However, we know that the total energy in an oscillator is the sum of the kinetic and potential energy, which adds another kT/2 term. Putting it all together, we find that the average energy for each diatomic molecule is (or should be) equal to 7·kT/2 = (7/2)kT.

Now, as mentioned above, the mean energy of the center-of-mass motion is proportional to the temperature of the gas (in fact, that’s how temperature is defined), with the constant of proportionality equal to 3k/2. Hence, for monatomic ideal gases, we can write: U = N·(3k/2)T and, therefore, PV = NkT = (2/3)·U. Now, γ appears as follows in the ideal gas law: PV = (γ–1)U. Therefore, γ = 2/3 + 1 = 5/3, but that’s for monatomic ideal gases only! The total internal energy of a gas of diatomic molecules is U = N·(7k/2)T and, therefore, PV = (2/7)·U. So γ must be γ = 2/7 + 1 = 9/7 ≈ 1.286 for diatomic gases, like oxygen and hydrogen.
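The bookkeeping above boils down to one rule: if every molecule carries f energy terms of kT/2 each, then U = (f/2)·NkT, and PV = NkT = (γ–1)U gives γ = 1 + 2/f. A small sketch with exact fractions (the function name and the labels are mine) reproduces the numbers: f = 3 for a monatomic gas, f = 6 if we only count the kinetic energy of the two atoms, and f = 7 once the vibrational potential energy is added.

```python
from fractions import Fraction


def gamma(energy_terms: int) -> Fraction:
    """gamma = 1 + 2/f, from U = (f/2)NkT and PV = NkT = (gamma - 1)U."""
    return 1 + Fraction(2, energy_terms)


cases = {
    "monatomic (3 translational terms)": 3,
    "diatomic, kinetic energy only (2 atoms x 3)": 6,
    "diatomic, plus vibrational potential energy": 7,
}
for label, f in cases.items():
    print(f"{label}: gamma = {gamma(f)} ≈ {float(gamma(f)):.3f}")
```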

Phew! So that’s the theory. However, as we can see from the diagram, γ approaches that value only when we heat the gas to a few thousand degrees! So what’s wrong? One assumption is that certain kinds of motions “freeze out” as the temperature falls, although it’s kinda weird to think of something ‘freezing out’ at a thousand kelvin! In any case, at the end of the 19th century, that was the assumption that was advanced, very reluctantly, by scientists such as James Jeans. However, the mystery was about to be solved then, as Max Planck, even more reluctantly, presented his quantum theory of energy at the turn of the century itself.

But the quantum theory was confirmed and so we should now see how we can apply it to the behavior of gas. In my humble view, it’s a really interesting analysis, because we’re applying quantum theory here to a phenomenon that’s usually being analyzed as a classical problem only.

Boltzmann’s Law

We derived Boltzmann’s Law in our post on the First Principles of Statistical Mechanics. To be precise, we gave Boltzmann’s Law for the density of a gas (which we denoted by n = N/V) in a force field, like a gravitational field, or in an electromagnetic field (assuming our gas particles are electrically charged, of course). We noted, however, Boltzmann’s Law was also applicable to much more complicated situations, like the one below, which shows a potential energy function for two molecules that is quite characteristic of the way molecules actually behave: when they come very close together, they repel each other but, at larger distances, there’s a force of attraction. We don’t really know the forces behind but we don’t need to: as long as these forces are conservative, they can combine in whatever way they want to combine, and Boltzmann’s Law will be applicable. [It should be obvious why. If you hesitate, just think of the definition of work and how it affects potential energy and all that. Work is force times distance, but when doing work, we’re also changing potential energy indeed! So if we’ve got a potential energy function, we can get all the rest.]

Boltzmann’s Law itself is illustrated by the graph below, which also gives the formula for it: n = n₀·e^{−P.E./kT}.

It’s a graph starting at n = n₀ for P.E. = 0, and it then decreases exponentially. [Funny expression, isn’t it? So as to respect mathematical terminology, I should say that it decays exponentially.] In any case, if anything, Boltzmann’s Law shows the natural exponential function is quite ‘natural’ indeed, because Boltzmann’s Law pops up in Nature everywhere! Indeed, Boltzmann’s Law is not limited to functions of potential energy only. For example, Feynman derives another Boltzmann Law for the distribution of molecular speeds or, so as to ensure the formula is also valid in relativity, the distribution of molecular momenta. In case you forgot, momentum (p) is the product of mass (m) and velocity (u), and the relevant Boltzmann Law is:

f(p)·dp = C·e^{−K.E./kT}·dp

The argument is not terribly complicated but somewhat lengthy, and so I’ll refer you to the link for more details. As for the f(p) function (and the dp factor on both sides of the equation), that’s because we’re not talking exact values of p but some range equal to dp and some probability of finding particles that have a momentum within that range. The principle is illustrated below for molecular speeds (denoted by u = p/m), so we have a velocity distribution below. The illustration for p would look the same: just substitute p for u.
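The Boltzmann factor for kinetic energy can also be checked by simulation. A minimal sketch, in arbitrary units with kT = m = 1 unless stated otherwise (my choice): if each velocity component is drawn from a Gaussian with variance kT/m, which is exactly what a factor e^{−m·u_x²/2kT} per component amounts to, then the average kinetic energy should come out close to the familiar 3kT/2. The function names and the fixed random seed are illustrative.

```python
import random


def sample_velocity(rng: random.Random, kT: float, m: float):
    """Draw (ux, uy, uz), each component Gaussian with variance kT/m,
    as implied by the Boltzmann factor exp(-m*u^2 / (2kT)) per component."""
    sigma = (kT / m) ** 0.5
    return tuple(rng.gauss(0.0, sigma) for _ in range(3))


def mean_kinetic_energy(n_samples: int, kT: float = 1.0, m: float = 1.0,
                        seed: int = 0) -> float:
    """Average of m*u^2/2 over n_samples randomly drawn molecules."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        ux, uy, uz = sample_velocity(rng, kT, m)
        total += 0.5 * m * (ux * ux + uy * uy + uz * uz)
    return total / n_samples


print(mean_kinetic_energy(100_000))   # should be close to 1.5, i.e. 3kT/2
```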

Boltzmann’s Law can be stated, much more generally, as follows:

The probability of different conditions of energy (E), potential or kinetic, is proportional to e^−E/kT.

As Feynman notes, “This is a rather beautiful proposition, and a very easy thing to remember too!” It is, and we’ll need it for the next bit.

The quantum-mechanical theory of gases

According to quantum theory, energy comes in discrete packets, quanta, and any system, like an oscillator, will only have a discrete set of energy levels, i.e. states of different energy. An energy state is, obviously, a condition of energy and, hence, Boltzmann’s Law applies. More specifically, if we denote the various energy levels, i.e. the energies of the various molecular states, by E₀, E₁, E₂,…, E_i,…, and if Boltzmann’s Law applies, then the probability of finding a molecule in the particular state E_i will be proportional to e^{−E_i/kT}.

Now, we know we’ve got some constant there, but we can get rid of that by calculating relative probabilities. For example, the probability of being in state E₁, relative to the probability of being in state E₀, is:

P₁/P₀ = e^{−E₁/kT}/e^{−E₀/kT} = e^{−(E₁−E₀)/kT}

But the probability P₁ should, obviously, also be equal to the ratio n₁/N, i.e. the ratio of the number of molecules in state E₁ to the total number of molecules. Likewise, P₀ = n₀/N. Hence, P₁/P₀ = n₁/n₀ and, therefore, we can write:

n₁ = n₀·e^{−(E₁−E₀)/kT}

What can we do with that? Remember we want to explain the behavior of non-monatomic gas—like diatomic gas, for example. Now we need some other assumption, obviously. As it turns out, the assumption that we can represent a system as some kind of oscillation still makes sense! In fact, the assumption that our diatomic molecule is like a spring is equally crucial to our quantum-theoretical analysis of gases as it is to our classical kinetic theory of gases. To be precise, in both theories, we look at it as a harmonic oscillator.

Don’t panic. A harmonic oscillator is, quite simply, a system that, when displaced from its equilibrium position, experiences some kind of restoring force. Now, for it to be harmonic, the force needs to be linear. For example, when talking springs, the restoring force F will be proportional to the displacement x. It basically means we can use a linear differential equation to analyze the system, like m·(d²x/dt²) = –kx. […] I hope you recognize this equation, because you should! It’s Newton’s Law: F = m·a with F = –k·x. If you remember the equation, you’ll also remember that harmonic oscillations were sinusoidal oscillations with a constant amplitude and a constant frequency. That frequency did not depend on the amplitude: because of the sinusoidal function involved, it was easier to write that frequency as an angular frequency, which we denoted by ω₀ and which, in the case of our spring, was equal to ω₀ = (k/m)^{1/2}. So it’s a property of the system. Indeed, ω₀ is the square root of the ratio of (1) k, which characterizes the spring (it’s its stiffness), and (2) m, i.e. the mass on the spring. Solving the differential equation yielded x = A·cos(ω₀t + Δ) as a general solution, with A the (maximum) amplitude, and Δ some phase shift determined by our t = 0 point.

Let me quickly jot down two more formulas: the potential energy in the spring is kx²/2, while its kinetic energy is mv²/2, as usual (so the kinetic energy depends on the mass and its velocity, while the potential energy only depends on the displacement and the spring’s stiffness). Of course, kinetic and potential energy add up to the total energy of the system, which is constant and proportional to the square of the (maximum) amplitude: K.E. + P.E. = E ∝ A². To be precise, E = kA²/2.
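A direct numerical check of those claims: integrate m·(d²x/dt²) = –kx step by step and verify that, after one period 2π/ω₀, the mass is back at x = A and the total energy has stayed at kA²/2. The sketch uses the velocity Verlet scheme (my choice of integrator) and illustrative values k = 4, m = 1, A = 0.5, so ω₀ = 2 and the period is π. Note that k here is the spring stiffness, not Boltzmann’s constant.

```python
import math


def simulate_spring(k: float, m: float, amplitude: float, t_end: float, dt: float = 1e-4):
    """Integrate m*x'' = -k*x with velocity Verlet, starting at x = A, v = 0.
    Returns the final position and the largest relative energy error seen."""
    x, v = amplitude, 0.0
    e0 = 0.5 * k * amplitude ** 2            # E = kA^2/2
    worst = 0.0
    a = -k / m * x
    for _ in range(round(t_end / dt)):
        x += v * dt + 0.5 * a * dt * dt
        a_new = -k / m * x
        v += 0.5 * (a + a_new) * dt
        a = a_new
        energy = 0.5 * k * x * x + 0.5 * m * v * v   # P.E. + K.E.
        worst = max(worst, abs(energy - e0) / e0)
    return x, worst


k, m, A = 4.0, 1.0, 0.5                  # illustrative values
omega0 = math.sqrt(k / m)                # 2 rad/s
period = 2 * math.pi / omega0            # pi seconds
x_end, energy_error = simulate_spring(k, m, A, period)
print(x_end, A * math.cos(omega0 * period))   # both very close to A after one period
print(energy_error)                           # tiny: the energy stays at kA^2/2
```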

That’s simple enough. Let’s get back to our molecular oscillator. While the total energy of an oscillator in classical theory can take on any value, Planck challenged that assumption: according to quantum theory, it can only take up energy in units of ħω at a time. [Note that we use the so-called reduced Planck constant here (i.e. h-bar), because we’re dealing with angular frequencies.] Hence, according to quantum theory, we have an oscillator with equally spaced energy levels, and the difference between them is ħω. Now, ħω is terribly tiny, but it’s there. Let me visualize what I just wrote:

So our expression for P₁/P₀ becomes P₁/P₀ = e^{−ħω/kT}/e^{−0/kT} = e^{−ħω/kT}. More generally, we have P_i/P₀ = e^{−i·ħω/kT}. So what? Well… We’ve got a function here which gives the chance of finding a molecule in state E_i relative to that of finding it in state E₀, and it’s a function of temperature. Now, the graph below illustrates the general shape of that function. It’s a bit peculiar, but you can see that the relative probability rises as the temperature goes up and falls as it goes down. The graph makes it clear that, at extremely low temperatures, most particles will be in state E₀ and, of course, the internal energy of our body of gas will be close to nil.
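Tabulating P_i/P₀ = e^{−i·ħω/kT} makes the temperature dependence concrete. In the sketch below, temperature is expressed in the dimensionless form t = kT/ħω (my choice of units), so no actual value of ħω is needed; the function name is mine.

```python
import math


def relative_population(i: int, t: float) -> float:
    """P_i/P_0 = exp(-i*hbar*omega/kT), with t = kT/(hbar*omega)."""
    return math.exp(-i / t)


for t in (0.2, 1.0, 5.0):   # temperatures in units of hbar*omega/k
    row = "  ".join(f"{relative_population(i, t):.4f}" for i in range(5))
    print(f"kT = {t} hbar*omega:  {row}")
```

At t = 0.2 practically everything sits in E₀; at t = 5 the first few excited states are almost as likely as the ground state.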

Now, we can look at the oscillators in the bottom state (i.e. particles in the molecular energy state E₀) as being effectively ‘frozen’: they don’t contribute to the specific heat. However, as we increase the temperature, our molecules gradually begin to have an appreciable probability of being in the second state, and then in the next state, and so on, and so the internal energy of the gas effectively increases. Now, when the probability is appreciable for many states, the spacing between the quantized states becomes negligible compared to kT and, hence, the situation is like classical physics: it is nearly indistinguishable from a continuum of energies.
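The ‘freezing out’ can be made quantitative. For the equally spaced levels E_i = i·ħω used above, weighting each level with its Boltzmann factor gives an average energy of ħω/(e^(ħω/kT) − 1) per oscillator, and differentiating that with respect to T gives each oscillator’s contribution to the heat capacity. The sketch below is mine, not Feynman’s, and it writes everything in terms of x = ħω/kT (an assumption for convenience). At high temperature the contribution goes to k, the classical value for an oscillator (kT/2 for the kinetic and kT/2 for the potential energy), and at low temperature it vanishes.

```python
import math

def mean_energy_over_kt(x):
    """Average energy per oscillator divided by kT, with x = ħω/kT: <E>/kT = x / (e^x - 1).

    Levels E_i = i·ħω with Boltzmann weights e^(-E_i/kT).
    """
    return x / math.expm1(x)

def heat_capacity_over_k(x):
    """d<E>/dT per oscillator, in units of k: x² e^x / (e^x - 1)²."""
    return x**2 * math.exp(x) / math.expm1(x) ** 2

for kt in (0.1, 0.5, 1.0, 2.0, 10.0):   # kT in units of ħω
    x = 1.0 / kt
    print(f"kT = {kt:>4}·ħω: <E>/kT = {mean_energy_over_kt(x):.3f}, C/k = {heat_capacity_over_k(x):.3f}")
```

The `math.expm1` call computes e^x − 1 accurately for small x, which is exactly the high-temperature (classical) end of the range.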

Now, while you can imagine such analysis should explain why the specific heat ratio for oxygen and hydrogen varies as it does in the very first graph of this post, you can also imagine the details of that analysis fill quite a few pages! In fact, even Feynman doesn’t include it in his Lectures. What he does include is the analysis of the blackbody radiation problem, which is remarkably similar. So… Well… For more details on that, I’ll refer you to Feynman indeed. 🙂

I hope you appreciated this little ‘lecture’, as it sort of wraps up my ‘series’ of posts on statistical mechanics, thermodynamics and, central to both, the classical theory of gases. Have fun with it all!

Entropy, energy and enthalpy

Original post:

Phew! I am quite happy I got through Feynman’s chapters on thermodynamics. Now is a good time to review the math behind it. We thoroughly understand the gas equation now:

PV = NkT = (γ–1)U

The gamma (γ) in this equation is the specific heat ratio: it’s 5/3 for ideal monatomic gases (so that’s about 1.667) and, theoretically, 4/3 ≈ 1.333 or 9/7 ≈ 1.286 for diatomic gases, depending on the degrees of freedom we associate with diatomic molecules. More complicated molecules have even more degrees of freedom and, hence, can absorb even more energy, so γ gets closer to one, according to the kinetic gas theory, that is. While we know that the kinetic gas theory is not quite accurate (an approach involving molecular energy states is a better match for reality), that doesn’t matter here. As for the term specific heat ratio, I’ll explain that later. [I promise. 🙂 You’ll see it’s quite logical.]
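Where do these values come from? Under the kinetic-theory assumption used in this series (each degree of freedom takes up kT/2 per molecule, so U = f·NkT/2, with f my label for the number of degrees of freedom), the PV = NkT = (γ–1)U equation gives γ = 1 + 2/f. A minimal sketch with exact fractions:

```python
from fractions import Fraction

def gamma_from_degrees_of_freedom(f):
    """gamma = 1 + 2/f, from U = f·N·k·T/2 and PV = N·k·T = (gamma - 1)·U."""
    return 1 + Fraction(2, f)

for f in (3, 6, 7, 20):
    g = gamma_from_degrees_of_freedom(f)
    print(f"f = {f:2d}: gamma = {g} ≈ {float(g):.3f}")
```

f = 3 gives the monatomic 5/3, f = 6 and f = 7 give the two theoretical diatomic values quoted above, and a large f pushes γ towards one.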

The point to note is that this body of gas (or whatever substance) stores an amount of energy U that is directly proportional to the temperature (T), and Nk/(γ–1) is the constant of proportionality. We can also phrase it the other way around: the temperature is directly proportional to the energy, with (γ–1)/Nk the constant of proportionality. It means temperature and energy are in a linear relationship. [Yes, direct proportionality implies linearity.] The graph below shows the T = [(γ–1)/Nk]·U relationship for three different values of γ, ranging from 5/3 (i.e. the maximum value, which characterizes monatomic noble gases such as helium, neon or krypton) to a value close to 1, which is characteristic of more complicated molecular arrangements indeed, such as heptane (γ = 1.06) or methyl butane (γ = 1.08). The illustration shows that, unlike a monatomic gas, more complicated molecular arrangements allow the gas to absorb a lot of (heat) energy with only a relatively moderate rise in temperature.
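To put numbers on that last sentence, the sketch below applies T = [(γ–1)/Nk]·U to the same energy input for the γ values quoted above. The amounts (1000 J going into one mole of gas) are illustrative choices of mine, and γ is treated as a constant, as in the text.

```python
BOLTZMANN_K = 1.380649e-23      # J/K (exact SI value)
AVOGADRO = 6.02214076e23        # particles per mole (exact SI value)

def temperature_rise(delta_u, gamma, n_particles=AVOGADRO):
    """ΔT = (γ - 1)·ΔU / (N·k), from T = [(γ - 1)/Nk]·U with γ held constant."""
    return (gamma - 1) * delta_u / (n_particles * BOLTZMANN_K)

for name, gamma in [("monatomic", 5 / 3), ("methyl butane", 1.08), ("heptane", 1.06)]:
    print(f"{name:>14}: ΔT = {temperature_rise(1000.0, gamma):6.2f} K for 1000 J into 1 mol")
```

The monatomic gas warms up by about 80 K, while the heptane vapour warms up by about 7 K: more than ten times less for the same energy.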

We’ll soon encounter another variable, enthalpy (H), which is also linearly related to energy: H = γU. From a math point of view, these linear relationships don’t mean all that much: they just show these variables (temperature, energy and enthalpy) are all directly related and, hence, can be defined in terms of each other.

We can invent other variables, like the Gibbs energy, or the Helmholtz energy. In contrast, entropy, while often being mentioned as just some other state function, is something different altogether. In fact, the term ‘state function’ causes a lot of confusion: pressure and volume are state variables too. The term is used to distinguish these variables from so-called process functions, notably heat and work. Process functions describe how we go from one equilibrium state to another, as opposed to the state variables, which describe the equilibrium situation itself. Don’t worry too much about the distinction—for now, that is.

Let’s look at non-linear stuff. The PV = NkT = (γ–1)U equation says that, for a given temperature, pressure (P) and volume (V) are inversely proportional to each other, and so that’s a non-linear relationship. [Yes, inverse proportionality is non-linear.] To help you visualize things, I inserted a simple volume-pressure diagram below, which shows how pressure and volume are related for three different values of U (or, what amounts to the same, three different values of T).

The curves are simple hyperbolas which have the x- and y-axis as their horizontal and vertical asymptotes respectively. If you’ve studied social sciences (like me!), so if you know a tiny little bit of the ‘dismal science’, i.e. economics (like me!), you’ll note they look like indifference curves. The x- and y-axis then represent the quantity of some good X and some good Y respectively, and the curves closer to the origin are associated with lower utility. How much X and Y we will buy then depends on (a) their price and (b) our budget, which we represent by a linear budget line tangent to the curve we can reach with our budget, and then we are a little bit happy, very happy or extremely happy, depending on our budget. Hence, our budget determines our happiness. From a math point of view, however, we can also look at it the other way around: our happiness determines our budget. [Now that’s a nice one, isn’t it? Think about it! 🙂 And, in the process, think about hyperbolas too: the y = 1/x function holds the key to understanding both infinity and nothingness. :-)]
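The isotherms in that volume-pressure diagram are easy to generate yourself. A minimal sketch, assuming one mole of ideal gas and three temperatures I picked for illustration (200, 300 and 400 K): along each curve the product PV stays equal to NkT, doubling the volume halves the pressure, and hotter curves lie farther from the origin.

```python
BOLTZMANN_K = 1.380649e-23   # J/K (exact SI value)
N_PARTICLES = 6.02214076e23  # one mole of particles: an illustrative choice

def isotherm_pressure(volume, temperature, n_particles=N_PARTICLES):
    """P = NkT / V: for a fixed temperature this is a hyperbola in the V-P plane."""
    return n_particles * BOLTZMANN_K * temperature / volume

volumes = [0.01, 0.02, 0.03, 0.04, 0.05]                     # m³
isotherms = {t: [isotherm_pressure(v, t) for v in volumes]   # Pa
             for t in (200.0, 300.0, 400.0)}                 # K
for t, pressures in isotherms.items():
    products = [round(p * v, 3) for p, v in zip(pressures, volumes)]
    print(f"T = {t:.0f} K: PV = {products} J")               # PV is the same all along an isotherm
```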

U is a state function but, as mentioned above, we’ve got quite a few state variables in physics. Entropy, of course, denoted by S—and enthalpy too, denoted by H. Let me remind you of the basics of the entropy concept:

The internal energy U changes because (a) we add or remove some heat from the system (ΔQ), (b) because some work is being done (by the gas on its surroundings or the other way around), or (c) because of both. Using the differential notation, we write: dU = dQ – dW, always. The (differential) work that’s being done is PdV. Hence, we have dU = dQ – PdV.
When we transfer heat to a system at a certain temperature, the system’s entropy changes. Remember that illustration of Feynman’s in my post on entropy: we go from one point to another on the temperature-volume diagram, taking infinitesimally small steps along the curve, and, at each step, an infinitesimal amount of work dW is done, and an infinitesimal amount of entropy dS = dQ/T is being delivered.
The total change in entropy, ΔS, is a line integral: ΔS = ∫_L dQ/T = ∫_L dS.

That’s somewhat tougher to understand than economics, and so that’s why it took me more time to come to terms with it. 🙂 Just go through Feynman’s Lecture on it, or through that post I referenced above. If you don’t want to do that, then just note that, while entropy is a very mysterious concept, it’s deceptively simple from a math point of view: dS = dQ/T, so the (infinitesimal) change in entropy is, quite simply, the ratio of (1) the (infinitesimal or incremental) amount of heat that is being added or removed as the system goes from one state to another through a reversible process and (2) the temperature at which the heat is being transferred. However, I am not writing this post to discuss entropy once again. I am writing it to give you an idea of the math behind the system.
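The ‘sum of dQ/T’ idea can be made concrete with a small sketch. It assumes reversible heating at constant volume with a constant heat capacity C_V (12.5 J/K is an illustrative value of mine, close to 3R/2 for a mole of monatomic gas): each small chunk of heat dQ = C_V·dT is divided by the temperature at which it goes in, and the total is compared to the closed-form ∫C_V·dT/T = C_V·ln(T₂/T₁).

```python
import math

def entropy_from_heating(c_v, t1, t2, steps=10_000):
    """ΔS = Σ dQ/T for reversible heating at constant volume, with dQ = C_V·dT.

    Each small chunk of heat is divided by the temperature at which it goes in
    (the midpoint of its step); C_V is assumed constant.
    """
    dt = (t2 - t1) / steps
    return sum(c_v * dt / (t1 + (i + 0.5) * dt) for i in range(steps))

c_v = 12.5                                      # J/K, illustrative
s_numeric = entropy_from_heating(c_v, 300.0, 400.0)
s_closed_form = c_v * math.log(400.0 / 300.0)   # ∫ C_V dT / T = C_V ln(T2/T1)
heat_added = c_v * (400.0 - 300.0)              # 1250 J in total
print(s_numeric, s_closed_form, heat_added / 300.0, heat_added / 400.0)
```

The same 1250 J would deliver more entropy if it all went in at 300 K and less if it all went in at 400 K; heating in between lands in between.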

So dS = dQ/T. Hence, we can re-write dU = dQ – dW as:

dU = TdS – PdV ⇔ dU + d(PV) = TdS – PdV + d(PV)

⇔ d(U + PV) = dH = TdS – PdV + PdV + VdP = TdS + VdP

The U + PV quantity on the left-hand side of the equation is the so-called enthalpy of the system, which I mentioned above. It’s denoted by H indeed, and it’s just another state variable, like energy: same-same but different, as they say in Asia. We encountered it in our previous post also, where we said that chemists prefer to analyze the behavior of substances using temperature and pressure as ‘independent variables’, rather than temperature and volume. Independent variables? What does that mean, exactly?

According to the PV = NkT equation, we only have two independent variables: if we assign some value to two variables, we’ve got a value for the third one. Indeed, remember that other equation we got when we took the total differential of U. We wrote U as U(V, T) and, taking the total differential, we got:

dU = (∂U/∂T)dT + (∂U/∂V)dV

We did not need to add a (∂U/∂P)dP term, because the pressure is determined by the volume and the temperature. We could also have written U = U(P, T) and, therefore, that dU = (∂U/∂T)dT + (∂U/∂P)dP. However, when working with temperature and pressure as the ‘independent’ variables, it’s easier to work with H rather than U. The point to note is that it’s all quite flexible really: we have only two independent variables in the system. The third one (and all of the other variables really, like energy or enthalpy or whatever) depends on the other two. In other words, from a math point of view, we only have two degrees of freedom in the system here: only two variables are actually free to vary. 🙂

Let’s look at that dH = TdS + VdP equation. That’s a differential equation in which not temperature and pressure, but entropy (S) and pressure (P) are ‘independent’ variables, so we write:

dH(S, P) = TdS + VdP

Now, it is not very likely that we will have some problem to solve with data on entropy and pressure. At our level of understanding, any problem that’s likely to come our way will probably come with data on more common variables, such as the heat, the pressure, the temperature, and/or the volume. So we could continue with the expression above but we don’t do that. It makes more sense to re-write the expression substituting TdS for dQ once again, so we get:

dH = dQ + VdP

That resembles our dU = dQ – PdV expression: it just substitutes V for –P. And, yes, you guessed it: it’s because the two expressions resemble each other that we like to work with H now. 🙂 Indeed, we’re talking the same system and the same infinitesimal changes and, therefore, we can use all the formulas we derived already by just substituting H for U, V for –P, and dP for dV. Huh? Yes. It’s a rather tricky substitution. If we switch V for –P (or vice versa) in a partial derivative involving T, we also need to include the minus sign. However, we do not need to include the minus sign when substituting dV and dP, and we also don’t need to change the sign of the partial derivatives of U and H when going from one expression to another! It’s a subtle and somewhat weird point, but a very important one! I’ll explain it in a moment. Just read on for now. Let’s do the substitution using our rules:

dU = (∂Q/∂T)_V dT + [T(∂P/∂T)_V − P]dV becomes:

dH = (∂Q/∂T)_P dT + (∂H/∂P)_T dP = C_P dT + [–T·(∂V/∂T)_P + V]dP

Note that, just as we referred to (∂Q/∂T)_V as the specific heat capacity of a substance at constant volume, which we denoted by C_V, we now refer to (∂Q/∂T)_P as the specific heat capacity at constant pressure, which we’ll denote, logically, as C_P. Dropping the subscripts of the partial derivatives, we re-write the expression above as:

dH = C_P dT + [–T·(∂V/∂T) + V]dP

So we’ve got what we wanted: we switched from an expression involving derivatives assuming constant volume to an expression involving derivatives assuming constant pressure. [In case you wondered what we wanted, this is it: we wanted an equation that helps us to solve another type of problem—another formula for a problem involving a different set of data.]

As mentioned above, it’s good to use subscripts with the partial derivatives to emphasize what changes and what is constant when calculating those partial derivatives but, strictly speaking, it’s not necessary, and you will usually not find the subscripts when googling other texts. For example, in the Wikipedia article on enthalpy, you’ll find the expression written as:

dH = C_P dT + V(1 – αT)dP with α = (1/V)(∂V/∂T)

Just write it all out and you’ll find it’s the same thing, exactly. It just introduces another coefficient, α, i.e. the coefficient of (cubic) thermal expansion. If you find this formula is easier to remember, then please use this one. It doesn’t matter.
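A quick numerical way to ‘write it all out’: the sketch below estimates (∂V/∂T)_P by a central difference and evaluates (∂H/∂P)_T in both forms, V – T·(∂V/∂T)_P and V(1 – αT). It uses one mole of ideal gas at 300 K and 1 bar (illustrative values), plus a hypothetical non-ideal gas with V = NkT/P + b, where b is an illustrative excluded volume of my own choosing. For the ideal gas both forms come out as zero, which is a result we’ll need further down; for the hypothetical gas both come out as b.

```python
BOLTZMANN_K = 1.380649e-23   # J/K (exact SI value)
N_PARTICLES = 6.02214076e23  # one mole of particles: an illustrative choice

def ideal_gas_volume(temperature, pressure):
    """V(T, P) = NkT / P for an ideal gas."""
    return N_PARTICLES * BOLTZMANN_K * temperature / pressure

def excluded_volume_gas(temperature, pressure, b=3e-5):
    """Hypothetical non-ideal gas: V = NkT/P + b, with b an illustrative excluded volume (m³)."""
    return ideal_gas_volume(temperature, pressure) + b

def dh_dp_both_forms(volume_fn, temperature, pressure, h=1e-3):
    """(∂H/∂P)_T computed as V - T·(∂V/∂T)_P and as V(1 - αT), with α = (1/V)(∂V/∂T)_P."""
    v = volume_fn(temperature, pressure)
    dv_dt = (volume_fn(temperature + h, pressure)
             - volume_fn(temperature - h, pressure)) / (2 * h)   # central difference, P held fixed
    alpha = dv_dt / v                                             # coefficient of thermal expansion
    return v - temperature * dv_dt, v * (1 - alpha * temperature)

print(dh_dp_both_forms(ideal_gas_volume, 300.0, 1.0e5))      # both ≈ 0 for an ideal gas
print(dh_dp_both_forms(excluded_volume_gas, 300.0, 1.0e5))   # both ≈ b for the hypothetical gas
```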

Now, let’s explain that funny business with the minus signs in the substitution. I’ll do so by going back to that infinitesimal analysis of the reversible cycle in my previous post, in which we had that formula involving ΔQ for the work done by the gas during an infinitesimally small reversible cycle: ΔW = ΔVΔP = ΔQ·(ΔT/T). Now, we can either write that as:

ΔQ = T·(ΔP/ΔT)·ΔV = dQ = T·(∂P/∂T)_V·dV, which is what we did for our analysis of (∂U/∂V)_T, or, alternatively, as
ΔQ = T·(ΔV/ΔT)·ΔP = dQ = T·(∂V/∂T)_P·dP, which is what we’ve got to do here, for our analysis of (∂H/∂P)_T.

Hence, dH = dQ + VdP becomes dH = T·(∂V/∂T)_P·dP + V·dP, and dividing all by dP gives us what we want to get: dH/dP = (∂H/∂P)_T = T·(∂V/∂T)_P + V.

[…] Well… NO! We don’t have the minus sign in front of T·(∂V/∂T)_P, so we must have done something wrong or, else, that formula above is wrong.

The formula is right (it’s in Wikipedia, so it must be right :-)), so we are wrong. Indeed! The thing is: substituting dT, dV and dP for ΔT, ΔV and ΔP is somewhat tricky. The geometric analysis (illustrated below) makes sense but we need to watch the signs.

We’ve got a volume increase and, hence, a pressure drop along the isothermal step in which the heat ΔQ is taken in: the volume goes from V to V+ΔV, while the pressure goes from P to P–ΔP (and both go back again later in the cycle, of course). In the ΔQ = T·(ΔV/ΔT)·ΔP formula, all of the Δs are positive quantities: ΔV/ΔT is the ratio of the volume gap and the temperature gap between the two isotherms at the same pressure, so it’s just the derivative (∂V/∂T)_P, while ΔP is the size of the pressure drop. The actual (signed) change in pressure is, therefore, dP = –ΔP and, as we replace ΔP by dP, we should add a minus sign: ΔQ = –T·(∂V/∂T)_P·dP. Now that gives us what we want: dH/dP = (∂H/∂P)_T = –T·(∂V/∂T)_P + V, and, therefore, we can, indeed, write what we wrote above:

dU = (∂Q/∂T)_V dT + [T(∂P/∂T)_V − P]dV becomes:

dH = (∂Q/∂T)_P dT + [–T·(∂V/∂T)_P + V]dP = C_P dT + [–T·(∂V/∂T)_P + V]dP

Now, in case you still wonder: what’s the use of all these different expressions stating the same thing? The answer is simple: it depends on the problem and what information we have. Indeed, note that all derivatives we use in our expression for dH assume constant pressure, so if we’ve got that kind of data, we’ll use the chemists’ representation of the system. If we’ve got data describing performance at constant volume, we’ll need the physicists’ formulas, which are given in terms of derivatives assuming constant volume. It all looks complicated but, in the end, it’s the same thing: the PV = NkT equation gives us two ‘independent’ variables and one ‘dependent’ variable. Which one is which will determine our approach.

Now, we left one thing unexplained. Why do we refer to γ as the specific heat ratio? The answer is: it is the ratio of the specific heat capacities indeed, so we can write:

γ = C_P/C_V

However, it is important to note that that’s valid for ideal gases only. In that case, we know that the (∂U/∂V)_T derivative in our dU = (∂U/∂T)_V dT + (∂U/∂V)_T dV expression is zero: we can change the volume, but if the temperature remains the same, the internal energy remains the same. Hence, dU = (∂U/∂T)_V dT = C_V dT, and dU/dT = C_V. Likewise, the (∂H/∂P)_T derivative in our dH = (∂H/∂T)_P dT + (∂H/∂P)_T dP expression is zero, for ideal gases, that is. Hence, dH = (∂H/∂T)_P dT = C_P dT, and dH/dT = C_P. Hence,

C_P/C_V = (dH/dT)/(dU/dT) = dH/dU

Does that make sense? If dH/dU = γ, then H must be some linear function of U. More specifically, H must be some function H = γU + c, with c some constant (it’s the so-called constant of integration). Now, γ is supposed to be constant too, of course. That’s all perfectly fine: indeed, combining the definition of H (H = U + PV), and using the PV = (γ–1)U relation, we have H = U + (γ–1)U = γU (hence, c = 0). So, yes, dH/dU = γ, and γ = C_P/C_V.
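The same bookkeeping can be checked with exact fractions. This sketch assumes an ideal gas with f degrees of freedom (the kinetic-theory picture of this series, so U = (f/2)·NkT) and works in units where Nk = 1. Since H = U + PV = U + NkT, we get C_P = C_V + Nk, and both C_P/C_V and H/U come out as the same γ = 1 + 2/f.

```python
from fractions import Fraction

def heat_capacity_ratio_and_h_over_u(f, temperature):
    """Ideal gas with f degrees of freedom, in units where N·k = 1.

    U = (f/2)·NkT and H = U + PV = U + NkT, so C_V = dU/dT and C_P = dH/dT.
    Returns (C_P / C_V, H / U) as exact fractions.
    """
    c_v = Fraction(f, 2)          # dU/dT, in units of Nk
    c_p = c_v + 1                 # dH/dT = C_V + Nk, because PV = NkT
    u = c_v * temperature
    h = u + temperature           # H = U + PV with PV = NkT (still in units where Nk = 1)
    return c_p / c_v, h / u

for f in (3, 6, 7):
    ratio, h_over_u = heat_capacity_ratio_and_h_over_u(f, Fraction(300))
    print(f"f = {f}: C_P/C_V = {ratio}, H/U = {h_over_u}")
```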

Note the qualifier, however: we’re assuming γ is constant (which does not imply the gas has to be ideal, so the interpretation is less restrictive than you might think it is). If γ is not a constant, it’s a different ballgame. […] So… Is γ actually constant? The illustration below shows γ is not constant for common diatomic gases like hydrogen or (somewhat less common) oxygen. It’s the same for other gases: when mentioning γ, we need to state the temperature at which we measured it too. 😦 However, the illustration also shows the assumption of γ being constant holds fairly well if temperature varies only slightly (like plus or minus 100 °C), so that’s OK. 🙂

I told you so: the kinetic gas theory is not quite accurate. An approach involving molecular energy states works much better (and is actually correct, as it’s consistent with quantum theory). But so we are where we are and I’ll save the quantum-theoretical approach for later. 🙂

So… What’s left? Well… If you’d google the Wikipedia article on enthalpy in order to check if I am not writing nonsense, you’ll find it gives γ as the ratio of H and U itself: γ = H/U. That’s not wrong, obviously (γ = H/U = γU/U = γ), but that formula doesn’t really explain why γ is referred to as the specific heat ratio, which is what I wanted to do here.

OK. We’ve covered a lot of ground, but let’s reflect some more. We did not say a lot about entropy, and/or the relation between energy and entropy. Too bad… The relationship between entropy and energy is obviously not as simple as that between enthalpy and energy. Indeed, because of that easy H = γU relationship, enthalpy emerges as just some auxiliary variable: some temporary variable we need to calculate something. Entropy is, obviously, something different. Unlike enthalpy, entropy involves very complicated thinking, involving (ir)reversibility and all that. So it’s quite deep, I’d say, but I’ll write more about that later. I think this post has gone as far as it should. 🙂

The Ideal versus the Actual Gas Law

Original post:

In previous posts, we referred, repeatedly, to the so-called ideal gas law, for which we have various expressions. The expression we derived from analyzing the kinetics involved when individual gas particles (atoms or molecules) move and collide was P·V = N·k·T, in which the variables are P (pressure), V (volume), N (the number of particles in the given volume), T (temperature) and k (the Boltzmann constant). We also wrote it as P·V = (2/3)·U, in which U represents the total energy, i.e. the sum of the energies of all gas particles. We also said the P·V = (2/3)·U formula was only valid for monatomic gases, in which case U is the kinetic energy of the center-of-mass motion of the atoms.

In order to provide some more generality, the equation is often written as P·V = (γ–1)·U. Hence, for monatomic gases, we have γ = 5/3. For a diatomic gas, we’ll also have vibrational and rotational kinetic energy. As we pointed out in a previous post, each independent direction of motion, i.e. each degree of freedom in the system, will absorb an amount of energy equal to k·T/2 per particle. For monatomic gases, we have three independent directions of motion (x, y, z) and, hence, the total energy is U = 3·N·k·T/2, so P·V = N·k·T = (2/3)·U.

Finally, when we’re considering adiabatic expansion/compression only, so when we do not add or remove any heat to/from the gas, we can also write the ideal gas law as PV^γ = C, with C some constant. [It is important to note that this PV^γ = C relation can be derived from the more general P·V = (γ–1)·U expression, but that the two expressions are not equivalent. Please have a look at the P.S. to this post on this, which shows how we get that PV^γ = constant expression, and talks a bit about its meaning.]
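Here is a numerical version of that derivation, as a sketch under the same assumptions (reversible process, no heat exchanged, constant γ). With dQ = 0 we have dU = –PdV and, since U = PV/(γ–1), V·dP + P·dV = –(γ–1)·P·dV, i.e. dP/dV = –γ·P/V. Integrating that step by step (with a fourth-order Runge-Kutta scheme, my choice) and checking that PV^γ stays put is a good test; the numbers (γ = 1.4, compressing 1 m³ at 1 bar to half its volume) are illustrative.

```python
def adiabatic_pressure(p1, v1, v2, gamma, steps=1_000):
    """Pressure after a reversible adiabatic volume change from v1 to v2.

    With no heat exchanged, dU = -P dV, and U = PV/(gamma - 1) gives
    V dP + P dV = -(gamma - 1) P dV, i.e. dP/dV = -gamma * P / V.
    The ODE is integrated with the classical fourth-order Runge-Kutta method.
    """
    def slope(volume, pressure):
        return -gamma * pressure / volume

    h = (v2 - v1) / steps
    p = p1
    for i in range(steps):
        v = v1 + i * h
        k1 = slope(v, p)
        k2 = slope(v + h / 2, p + h * k1 / 2)
        k3 = slope(v + h / 2, p + h * k2 / 2)
        k4 = slope(v + h, p + h * k3)
        p += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return p

gamma = 1.4                       # illustrative, constant gamma assumed
p1, v1, v2 = 1.0e5, 1.0, 0.5      # compress to half the volume (Pa, m³)
p2 = adiabatic_pressure(p1, v1, v2, gamma)
print(p2, p1 * v1**gamma, p2 * v2**gamma, p1 * v1 / v2)
```

The last number printed is what an isothermal compression would give: the adiabatic pressure ends up higher, because the work done on the gas also raises its temperature.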

So what’s the gas law for diatomic gas, like O₂, i.e. oxygen? The key to the analysis of diatomic gases is, basically, a model which represents the oxygen molecule as two atoms connected by a spring, but with a force law that’s not as simplistic as Hooke’s law: we’re not looking at some linear force, but a force that’s referred to as a van der Waals force. The image below gives a vague idea of what that might imply. Remember: when moving an object in a force field, we change its potential energy, and the work done, as we move with or against the force, is equal to the change in potential energy. The graph below shows the force is anything but linear.

The illustration above is a graph of potential energy for two molecules, but we can also apply it to the ‘spring’ model for two atoms within a single molecule. For the detail, I’ll refer you to Feynman’s Lecture on this. It’s not that the full story is too complicated: it’s just too lengthy to reproduce it in this post. Just note the key point of the whole story: one arrives at a theoretical value for γ that is equal to γ = 9/7 ≈ 1.286. Wonderful! Yes. Except that this value does not correspond to what is measured in reality: the experimentally confirmed value for γ for oxygen (O₂) is about 1.40.

What about other gases? When measuring the value for other diatomic gases, like iodine (I₂) or bromine (Br₂), we get a value closer to the theoretical value (1.30 and 1.32 respectively) but, still, there’s a variation to be explained here. The value for hydrogen H₂ is about 1.4, so that’s like oxygen again. For other gases, we again get different values. Why? What’s the problem?

It cannot be explained using classical theory. In addition, doing the measurements for oxygen and hydrogen at various temperatures also reveals that γ is a function of temperature, as shown below. Now that’s another experimental fact that does not line up with our kinetic theory of gases!

Reality is right, always. Hence, our theory must be wrong. Our analysis of the independent directions of motion inside of a molecule doesn’t work, even for the simple case of a diatomic molecule. Great minds such as James Clerk Maxwell couldn’t solve the puzzle in the 19th century and, hence, had to admit classical theory was in trouble. Indeed, popular belief has it that the black-body radiation problem was the only thing classical theory couldn’t explain in the late 19th century, but that’s not true: there were many more problems keeping physicists awake. But so we’ve got a problem here. As Feynman writes: “We might try some force law other than a spring but it turns out that anything else will only make γ higher. If we include more forms of energy, γ approaches unity more closely, contradicting the facts. All the classical theoretical things that one can think of will only make it worse. The fact is that there are electrons in each atom, and we know from their spectra that there are internal motions; each of the electrons should have at least kT/2 of kinetic energy, and something for the potential energy, so when these are added in, γ gets still smaller. It is ridiculous. It is wrong.”

So what’s the answer? The answer is to be found in quantum mechanics. Indeed, one can develop a model distinguishing various molecular states with various energy levels E₀, E₁, E₂,…, E_i,…, and then associate a probability distribution which gives us the probability of finding a molecule in a particular state. Some more assumptions, all quite similar to the assumptions used by Planck when he solved the black-body radiation problem, then give us what we want: to put it simply, it is like some of the motions ‘freeze out’ at lower temperatures. As a result, γ goes up as we go down in temperature.

Hence, quantum mechanics saves the day, again. However, that’s not what I want to write about here. What I want to do here is to give you an equation for the internal energy of a gas which is based on what we can actually measure, so that’s pressure, volume and temperature. I’ll refer to it as the Actual Gas Law, because it takes into account that γ is not some fixed value (so it’s not some natural constant, like Planck’s or Boltzmann’s constant), and it also takes into account that we’re not always dealing with gas (ideal or actual gas) but also with liquids and solids.

Now, we have many inter-connected variables here, and so the analysis is quite complicated. In fact, it’s a great opportunity to learn more about partial derivatives and how we can use them. So the lesson is as much about math as it is about physics. In fact, it’s probably more about math. 🙂 Let’s see what we can make out of it.

Energy, work, force, pressure and volume

First, I should remind you that work is something that is done by a force on some object in the direction of the displacement of that object. Hence, work is force times distance. Now, because the force may actually vary as our object is being displaced and while the work is being done, we represent work as a line integral:

W = ∫F·ds

We write F and s in bold-face and, hence, we’ve got a vector dot product here, which ensures we only consider the component of the force in the direction of the displacement: F·Δs = |F|·|Δs|·cosθ, with θ the angle between the force and the displacement.
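The line integral is nothing but the sum of many small F·Δs dot products, and a minimal sketch makes that literal. The spring force F = –k·x (pointing along x only), the values k = 4 N/m and A = 0.5 m, and the two paths are all illustrative choices of mine. Both paths end at the same point, and both give W = –kA²/2: the sideways wandering of the second path contributes nothing, because the force has no component along it.

```python
import math

def line_integral_work(force, path, n_segments=10_000):
    """W = ∫ F·ds along a 2-D path, approximated as a sum of F·Δs over short segments.

    force(x, y) returns (Fx, Fy); path(t) returns (x, y) for t running from 0 to 1.
    The force is evaluated at the midpoint of each segment.
    """
    work = 0.0
    x0, y0 = path(0.0)
    for i in range(1, n_segments + 1):
        x1, y1 = path(i / n_segments)
        fx, fy = force((x0 + x1) / 2, (y0 + y1) / 2)
        work += fx * (x1 - x0) + fy * (y1 - y0)   # dot product: only the component along Δs counts
        x0, y0 = x1, y1
    return work

k, A = 4.0, 0.5                      # illustrative spring stiffness (N/m) and stretch (m)

def spring_force(x, y):
    return (-k * x, 0.0)             # restoring force along x only

def straight(t):
    return (A * t, 0.0)

def detour(t):
    return (A * t, math.sin(math.pi * t))   # same end points, wanders in y

w_straight = line_integral_work(spring_force, straight)
w_detour = line_integral_work(spring_force, detour)
print(w_straight, w_detour, -k * A**2 / 2)
```

The spring does –kA²/2 of work as it is stretched, which is the kx²/2 potential energy from the harmonic oscillator discussion with the sign flipped.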

As for the relationship between energy and work, you know that one: as we do work on an object, we change its energy, and that’s what we are looking at here: the (internal) energy of our substance. Indeed, when we have a volume of gas exerting pressure, it’s the same thing: some force is involved (pressure is the force per unit area, so we write: P = F/A) and, using the model of the box with the frictionless piston (illustrated below), we write:

dW = F·(–dx) = –PA·dx = –PdV

The dW = –PdV formula is the one we use when looking at infinitesimal changes: it is the work done on the gas, which is positive when we compress it (dV < 0). When going through the full thing, we should integrate, as the volume (and the pressure) changes over the trajectory, so we write the work done by the gas as:

W = ∫PdV

Now, it is very important to note that the formulas above (dW = – PdV and W = ∫PdV) are always valid. Always? Yes. We don’t care whether or not the compression (or expansion) is adiabatic or isothermal. [To put it differently, we don’t care whether or not heat is added to (or removed from) the gas as it expands (or decreases in volume).] We also don’t keep track of the temperature here. It doesn’t matter. Work is work.

Now, as you know, an integral is some area under a graph so I can rephrase our result as follows: the work that is being done by a gas, as it expands (or the work that we need to put in in order to compress it), is the area under the pressure-volume graph, always.

Of course, as we go through a so-called reversible cycle, getting work out of it, and then putting some work back in, we’ll have some overlapping areas cancelling each other. That’s how we derived the amount of useful (i.e. net) work that can be done by an ideal gas engine (illustrated below) as it goes through a Carnot cycle, taking in some amount of heat Q₁ from one reservoir (which is usually referred to as the boiler) and delivering some other amount of heat (Q₂) to another reservoir (usually referred to as the condenser). As I don’t want to repeat myself too much, I’ll refer you to one of my previous posts for more details. Hereunder, I just present the diagram once again. If you want to understand anything of what follows, you need to understand it thoroughly.

It’s important to note that work is being done in each of the four steps of the cycle, and that the work done by the gas is positive when it expands, and negative when its volume is being reduced. So, let me repeat: the W = ∫PdV formula is valid for both adiabatic and isothermal expansion/compression. We just need to be careful about the sign and see in which direction it goes. Having said that, it’s obvious adiabatic and isothermal expansion/compression are two very different things and, hence, their impact on the (internal) energy of the gas is quite different:

Adiabatic compression/expansion assumes that no (external) heat energy (Q) is added or removed and, hence, all the work done goes into changing the internal energy (U). Hence, we can write: W = PΔV = –ΔU and, therefore, ΔU = –PΔV. Of course, adiabatic compression/expansion must involve a change in temperature, as the kinetic energy of the gas molecules is being transferred from/to the piston. Hence, the temperature (which is a measure of the average kinetic energy of the molecules) changes.
In contrast, isothermal compression/expansion (i.e. a volume change without any change in temperature) must involve an exchange of heat energy with the surroundings so as to allow the temperature to remain constant. So ΔQ ≠ 0 in this case.
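The ‘area under the pressure-volume graph’ rule can be used to compare the two cases directly. The sketch below is an illustration with numbers of my own choosing: an ideal monatomic gas (γ = 5/3) with NkT = 2500 J (roughly one mole at 300 K) doubles its volume, once along an isotherm and once along an adiabat, and W = ∫PdV is computed with the trapezoidal rule. Along the isotherm ΔU = 0, so all of that work must be supplied as heat; along the adiabat the work equals –ΔU, with ΔU = Δ(PV)/(γ–1).

```python
import math

def work_by_gas(pressure_of_volume, v1, v2, n=10_000):
    """W = ∫ P dV from v1 to v2 by the trapezoidal rule (positive when the gas expands)."""
    h = (v2 - v1) / n
    total = 0.5 * (pressure_of_volume(v1) + pressure_of_volume(v2))
    total += sum(pressure_of_volume(v1 + i * h) for i in range(1, n))
    return total * h

nkt = 2500.0                   # N·k·T in joules (illustrative, roughly 1 mol at 300 K)
gamma = 5.0 / 3.0              # monatomic ideal gas
v1, v2 = 0.025, 0.050          # expand to double the volume (m³)
p1 = nkt / v1

def isothermal(v):
    return nkt / v                       # PV = NkT with T fixed

def adiabatic(v):
    return p1 * (v1 / v) ** gamma        # PV^gamma = constant

w_iso = work_by_gas(isothermal, v1, v2)  # heat taken in: Q = W, since ΔU = 0
w_adi = work_by_gas(adiabatic, v1, v2)   # no heat: W = -ΔU
delta_u_adi = (adiabatic(v2) * v2 - p1 * v1) / (gamma - 1)   # ΔU = Δ(PV)/(γ - 1)
print(w_iso, nkt * math.log(2.0), w_adi, -delta_u_adi)
```

The adiabatic expansion delivers less work than the isothermal one: without heat coming in, the gas cools and its pressure falls faster.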

The grand but simple formula capturing all is, obviously:

ΔU = ΔQ – PΔV

It says what we’ve said already: the internal energy of a substance (a gas) changes because some work is being done as its volume changes and/or because some heat is added or removed.

Now we have to get serious about partial derivatives, which relate one variable (the so-called ‘dependent’ variable) to another (the ‘independent’ variable). Of course, in reality, all depends on all and, hence, the distinction is quite artificial. Physicists tend to treat temperature and volume as the ‘independent’ variables, while chemists seem to prefer to think in terms of pressure and temperature. In math, it doesn’t matter all that much: we simply take the reciprocal and there you go: dy/dx = 1/(dx/dy). We go from one to another. Well… OK… We’ve got a lot of variables here, so… Yes. You’re right. It’s not going to be that simple, obviously! 🙂

Differential analysis

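If we have some function f in two variables, x and y, then we can write: Δf = f(x + Δx, y + Δy) – f(x, y). We can then write the following clever thing:

Δf = [f(x + Δx, y + Δy) – f(x, y + Δy)] + [f(x, y + Δy) – f(x, y)] = Δx·[f(x + Δx, y + Δy) – f(x, y + Δy)]/Δx + Δy·[f(x, y + Δy) – f(x, y)]/Δy ≈ Δx·(∂f/∂x) + Δy·(∂f/∂y)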

What’s being said here is that we can approximate Δf using the partial derivatives ∂f/∂x and ∂f/∂y. Note that the formula above actually implies that we’re evaluating the (partial) ∂f/∂x derivative at point (x, y+Δy), rather than the point (x, y) itself. It’s a minor detail, but I think it’s good to signal it: this ‘clever thing’ is just pedagogical. [Feynman is the greatest teacher of all time! :-)] The mathematically correct approach is to simply give the formal definition of partial derivatives, and then just get on with it:

∂f/∂x = lim_(Δx→0) [f(x + Δx, y) – f(x, y)]/Δx and ∂f/∂y = lim_(Δy→0) [f(x, y + Δy) – f(x, y)]/Δy

Now, let us apply that Δf formula to what we’re interested in, and that’s the change in the (internal) energy U. So we write:

ΔU = ΔT·(∂U/∂T)_V + ΔV·(∂U/∂V)_T

Now, we can’t do anything with this, in practice, because we cannot directly measure the two partial derivatives. So, while this is an actual gas law (which is what we want), it’s not a practical one, because we can’t use it. 🙂 Let’s see what we can do about that. We need to find some formula for those partial derivatives. Let’s have a look at the (∂U/∂T)_V factor first. That factor is defined and referred to as the specific heat capacity at constant volume, and it’s usually denoted by C_V. Hence, we write:

C_V = specific heat capacity at constant volume = (∂U/∂T)_V

Heat capacity? But we’re talking internal energy here? It’s the same. Remember that ΔU = ΔQ – PΔV formula: if we keep the volume constant, then ΔV = 0 and, hence, ΔU = ΔQ. Hence, all of the change in internal energy (and I really mean all of the change) is the heat energy we’re adding to or removing from the gas. Hence, we can also write C_V in its more usual definitional form:

C_V = (∂Q/∂T)_V

As for its interpretation, you should look at it as a ratio: C_V is the amount of heat one must put into (or remove from) a substance in order to change its temperature by one degree with the volume held constant. Note that the ‘specific heat capacity’ is usually just called the ‘specific heat’, as that’s shorter and simpler. However, you can see it’s some kind of ‘capacity’ indeed. More specifically, it’s the capacity of a substance to absorb heat. Now that’s stuff we can actually measure and, hence, we’re done with the first term in that ΔU = ΔT·(∂U/∂T)_V + ΔV·(∂U/∂V)_T expression, which we can now write as:

ΔT·(∂U/∂T)_V = ΔT·(∂Q/∂T)_V = ΔT·C_V

OK. So we’re done with the first term. Just to make sure we’re on the right track here, let’s have a quick look at the units: the unit in which we should measure C_V is, obviously, joule per degree (kelvin), i.e. J/K. And then we multiply by ΔT, which is measured in kelvin, and we get some amount in joule. Fine. We’re done, indeed. 🙂

Let’s look at the second term now, i.e. the ΔV·(∂U/∂V)_T term. Now, you may think that we could define C_T = (∂U/∂V)_T as the specific heat capacity at constant temperature because… Well… Hmm… It is the amount of heat one must put into (or remove from) a substance in order to change its volume by one unit with the temperature held constant, isn’t it? So we write C_T = (∂U/∂V)_T = (∂Q/∂V)_T and we’re done here too, aren’t we?

NO! HUGE MISTAKE!

It’s not that simple. Two very different things are happening here. Indeed, the change in (internal) energy ΔU, as the volume changes by ΔV while keeping the temperature constant (we’re looking at that (∂U/∂V)_T factor here, and I’ll remind you of that subscript T a couple of times), consists of two parts:

First, the volume is not being kept constant and, hence, the internal energy (U) changes because work is being done.
Second, the internal energy (U) also changes because heat is being put in, so the temperature can be kept constant indeed.

So we cannot simplify. We’re stuck with the full thing: ΔU = ΔQ – PΔV, in which – PΔV is the (infinitesimal amount of) work that’s being done on the substance, and ΔQ is the (infinitesimal amount of) heat that’s being put in. What can we do? How can we relate this to actual measurables?

Now, the logic is quite abstruse, so please be patient and bear with me. The key to the analysis is that diagram of the reversible Carnot cycle, with the shaded area representing the net work that’s being done, except that we’re now talking infinitesimally small changes in volume, temperature and pressure. So we redraw the diagram and get something like this:

Now, you can easily see the equivalence between the shaded area and the ΔPΔV rectangle below:

So the work done by the gas is the shaded area, which is equal to ΔPΔV. […] But… Hey, wait a minute! You should object: we are not talking ideal engines here and, hence, we are not going through a full Carnot cycle, are we? We’re calculating the change in internal energy when the temperature changes by ΔT, the volume changes by ΔV, and the pressure changes by ΔP. Full stop. So we’re not going back to where we came from and, hence, we should not be analyzing this thing using the Carnot cycle, should we? Well… Yes and no. More yes than no. Remember we’re looking at the second term only here: ΔV·(∂U/∂V)_T. So we are changing the volume (and, hence, the internal energy) but the subscript in the (∂U/∂V)_T term makes it clear we’re doing so at constant temperature. In practice, that means we’re looking at a theoretical situation here that assumes a complete and fully reversible cycle indeed. Hence, the conceptual idea is, indeed, that we put some heat in, that the gas does some work as it expands, and that we then actually put some work back in to bring the gas back to its original temperature T. So, in short, yes, the reversible cycle idea applies.

[…] I know, it’s very confusing. I am actually struggling with the analysis myself, so don’t be too hard on yourself. Think about it, but don’t lose sleep over it. 🙂 I added a note on it in the P.S. to this post, so you can check that out too. However, I need to get back to the analysis itself here. From our discussion of the Carnot cycle and ideal engines, we know that the work done is equal to the difference between the heat that’s being put in and the heat that’s being delivered: W = Q₁ – Q₂. Now, because we’re talking reversible processes here, we also know that Q₁/T₁ = Q₂/T₂. Hence, Q₂ = (T₂/T₁)Q₁ and, therefore, the work done is also equal to W = Q₁ – (T₂/T₁)Q₁ = Q₁(1 – T₂/T₁) = Q₁[(T₁ – T₂)/T₁] = Q₁(ΔT/T₁). Let’s now drop the subscripts by equating Q₁ with ΔQ and T₁ with T, so we have:

W = ΔQ(ΔT/T)

You should note that ΔQ is not the difference between Q₁ and Q₂. It is not. ΔQ is the heat we put in as the gas expands isothermally from volume V to volume V + ΔV. I am explicit about it because the Δ symbol usually denotes some difference between two values. In case you wonder how we can do away with Q₂, think about it. […] The answer is that we did not really do away with it: the information is captured in the ΔT factor, as T–ΔT is the final temperature reached by the gas as it expands adiabatically on the second leg of the cycle, and the change in temperature obviously depends on Q₂! Again, it’s all quite confusing because we’re looking at infinitesimal changes only, but the analysis is valid. [Again, go through the P.S. of this post if you want more remarks on this, although I am not sure they’re going to help you much. The logic is really very deep.]
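As a sanity check, here’s a minimal Python sketch that evaluates both expressions for the work, the ΔPΔV area and ΔQ(ΔT/T), for an ideal gas. It assumes P = NkT/V (with the product Nk lumped into a hypothetical parameter `nk`), takes ΔQ as the heat absorbed in the isothermal expansion from V to V + ΔV, which for an ideal gas is NkT·ln[(V + ΔV)/V], and takes ΔP as the pressure gap at constant volume between the isotherms at T and T – ΔT. The numbers are arbitrary.

```python
import math


def pressure(nk, t, v):
    """Ideal gas law P = NkT/V, with nk standing for the product N*k."""
    return nk * t / v


def carnot_sides(nk, t, v, dt, dv):
    """Return (dP*dV, dQ*dT/T) for a small reversible cycle of an ideal gas."""
    # Heat absorbed in the isothermal expansion from v to v + dv at temperature t
    dq = nk * t * math.log((v + dv) / v)
    # Pressure gap between the isotherms at t and t - dt, at constant volume v
    dp = pressure(nk, t, v) - pressure(nk, t - dt, v)
    return dp * dv, dq * dt / t


for step in (1e-2, 1e-3, 1e-4):
    area, heat_side = carnot_sides(nk=1.0, t=300.0, v=1.0, dt=step, dv=step)
    print(f"step={step:g}  dP*dV={area:.6e}  dQ*dT/T={heat_side:.6e}  ratio={heat_side / area:.6f}")
```

In this setup the ratio of the two sides works out to ln(1 + ΔV/V)/(ΔV/V), so it creeps towards one as the steps shrink, which is what ‘equal for infinitesimal changes’ means in practice.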

[…] OK… I know you’re getting tired, but we’re almost done. Hang in there. So what do we have now? The work done by the gas as it goes through this infinitesimally small cycle is the shaded area in the diagram above, and it is equal to:

W = ΔPΔV = ΔQ(ΔT/T)

From this, it follows that ΔQ = T·ΔV·ΔP/ΔT. Now, you should look at the diagram once again to check what ΔP actually stands for: it’s the change in pressure when the temperature changes at constant volume. Hence, using our partial derivative notation, we write:

ΔP/ΔT = (∂P/∂T)_V

We can now write ΔQ = T·ΔV·(∂P/∂T)_V and, therefore, we can re-write ΔU = ΔQ – PΔV as:

ΔU = T·ΔV·(∂P/∂T)_V – PΔV

Now, dividing both sides by ΔV, and writing all using the partial derivative notation, we get:

ΔU/ΔV = (∂U/∂V)_T = T·(∂P/∂T)_V – P

So now we know how to calculate the (∂U/∂V)_T factor, from measurable stuff, in that ΔU = ΔT·(∂U/∂T)_V + ΔV·(∂U/∂V)_T expression, and so we’re done. Let’s write it all out:

ΔU = ΔT·(∂U/∂T)_V + ΔV·(∂U/∂V)_T = ΔT·C_V + ΔV·[T·(∂P/∂T)_V – P]

Phew! That was tough, wasn’t it? It was. Very tough. As far as I am concerned, this is probably the toughest of all I’ve written so far.
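The final expression lends itself to a small numerical sketch. The Python below is hypothetical helper code: it takes any equation of state P(T, V) as a function, estimates (∂P/∂T)_V with a central difference, and assembles ΔU = ΔT·C_V + ΔV·[T·(∂P/∂T)_V – P]. To have something to plug in, it assumes one mole of a monatomic ideal gas (Nk = R ≈ 8.314 J/K and C_V = (3/2)R); for that case the bracketed term should come out as zero, so ΔU reduces to ΔT·C_V.

```python
R = 8.314  # gas constant in J/(mol*K), i.e. N*k for one mole (assumption)


def ideal_gas_pressure(t, v):
    """Equation of state P(T, V) = NkT/V for one mole of ideal gas."""
    return R * t / v


def du_dv_at_const_t(pressure, t, v, h=1e-3):
    """(dU/dV)_T = T*(dP/dT)_V - P, with (dP/dT)_V from a central difference."""
    dp_dt = (pressure(t + h, v) - pressure(t - h, v)) / (2 * h)
    return t * dp_dt - pressure(t, v)


def delta_u(pressure, c_v, t, v, dt, dv):
    """First-order change in internal energy: dT*C_V + dV*[T*(dP/dT)_V - P]."""
    return dt * c_v + dv * du_dv_at_const_t(pressure, t, v)


C_V = 1.5 * R  # monatomic ideal gas (assumption)
print(du_dv_at_const_t(ideal_gas_pressure, 300.0, 0.0224))         # close to 0
print(delta_u(ideal_gas_pressure, C_V, 300.0, 0.0224, 0.5, 1e-4))  # close to 0.5 * C_V
```

Trying another equation of state is just a matter of passing a different `pressure` function; the bracketed term then measures how far that gas is from ideal.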

Dependent and independent variables

Let’s pause to take stock of what we’ve done here. The expressions above should make it clear we’re actually treating temperature and volume as the independent variables, and pressure and energy as the dependent variables, or as functions of (other) variables, I should say. Let’s jot down the key equations once more:

ΔU = ΔQ – PΔV
ΔU = ΔT·(∂U/∂T)_V + ΔV·(∂U/∂V)_T
(∂U/∂T)_V = (∂Q/∂T)_V = C_V
(∂U/∂V)_T = T·(∂P/∂T)_V – P

It looks like Chinese, doesn’t it? 🙂 What can we do with this? Plenty. The first equation, especially, is really handy for analyzing and solving various practical problems. The second equation is much more difficult and, hence, less practical. But let’s try to apply this equation for actual gases to an ideal gas, just to see if we’re getting our ideal gas law once again. 🙂 We know that, for an ideal gas, the internal energy depends on temperature, not on V. Indeed, if we change the volume but we keep the temperature constant, the internal energy should be the same, as it only depends on the motion of the molecules and their number. Hence, (∂U/∂V)_T must equal zero and, hence, T·(∂P/∂T)_V – P = 0. Replacing the partial derivative with an ordinary one (not forgetting that the volume is kept constant), we get:

T·(dP/dT)– P = 0 (constant volume)

⇔ (1/P)·(dP/dT) = 1/T (constant volume)

Integrating both sides yields: lnP = lnT + constant. This, in turn, implies that P = T × constant. [Just re-write the first constant as the (natural) logarithm of some other constant, i.e. the second constant, obviously.] Now that’s consistent with our ideal gas law P = NkT/V, because N, k and V are all constant. So, yes, the ideal gas law is a special case of our more general thermodynamical expression. Fortunately! 🙂
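If you’d rather not take the integration on trust, here’s a small Python sketch that integrates dP/dT = P/T (the constant-volume equation above) numerically with the classic fourth-order Runge–Kutta scheme and compares the result with P = T × constant. The starting state (1 bar at 273.15 K) and the end temperature are arbitrary, hypothetical choices.

```python
def rk4(f, y0, x0, x1, steps=1000):
    """Integrate dy/dx = f(x, y) from x0 to x1 with the classic RK4 scheme."""
    h = (x1 - x0) / steps
    x, y = x0, y0
    for _ in range(steps):
        k1 = f(x, y)
        k2 = f(x + h / 2, y + h * k1 / 2)
        k3 = f(x + h / 2, y + h * k2 / 2)
        k4 = f(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return y


def dp_dt(t, p):
    """T*(dP/dT) - P = 0 at constant volume, rewritten as dP/dT = P/T."""
    return p / t


p0, t0, t1 = 100_000.0, 273.15, 373.15   # hypothetical start: 1 bar at 273.15 K
p1 = rk4(dp_dt, p0, t0, t1)
print(p1, p0 * t1 / t0)   # numerical result vs. P proportional to T
```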

That’s not very exciting, you’ll say—and you’re right. You may be interested – although I doubt it 🙂 – in the chemists’ world view: they usually have performance data (read: values for derivatives) measured under constant pressure. The equations above then transform into:

ΔH = Δ(U + P·V) = ΔQ + VΔP
ΔH = ΔT·(∂H/∂T)_P + ΔP·(∂H/∂P)_T
(∂H/∂P)_T = –T·(∂V/∂T)_P + V

H? Yes. H is another so-called state variable, so it’s like entropy or internal energy but different. As they say in Asia: “Same-same but different.” 🙂 It’s defined as H = U + PV and its name is enthalpy. Why do we need it? Because some clever man noted that, if you take the total differential of P·V, i.e. Δ(P·V) = P·ΔV + V·ΔP, and add it, side by side, to our ΔU = ΔQ – P·ΔV expression, you get Δ(U + P·V) = ΔQ + VΔP. So we’ve swapped the roles of P and V, so as to please the chemists, and all our equations hold provided we substitute H for U, P for V and, importantly, –V for P. [Note the sign switch is to be applied to derivatives as well: if we substitute –V for P, then ∂P/∂T becomes ∂(–V)/∂T = –(∂V/∂T)!]
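The chemists’ version can be checked the same way. Here’s a minimal Python sketch, assuming one mole of ideal gas written as V(T, P) = NkT/P with Nk = R ≈ 8.314 J/K: it estimates (∂V/∂T)_P by a central difference and evaluates (∂H/∂P)_T = –T·(∂V/∂T)_P + V, which should vanish for an ideal gas, just like (∂U/∂V)_T did.

```python
R = 8.314  # N*k for one mole of gas, in J/K (assumption)


def ideal_gas_volume(t, p):
    """Ideal gas law solved for the volume: V(T, P) = NkT/P."""
    return R * t / p


def dh_dp_at_const_t(volume, t, p, h=1e-3):
    """(dH/dP)_T = -T*(dV/dT)_P + V, with (dV/dT)_P from a central difference."""
    dv_dt = (volume(t + h, p) - volume(t - h, p)) / (2 * h)
    return -t * dv_dt + volume(t, p)


print(dh_dp_at_const_t(ideal_gas_volume, 300.0, 100_000.0))  # close to 0
```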

So that’s the chemists’ model of the world, and they’ll usually measure the specific heat capacity at constant pressure, rather than at constant volume. Indeed, one can show the following:

(∂H/∂T)_P = (∂Q/∂T)_P = C_P = the specific heat capacity at constant pressure

In short, while we referred to γ as the specific heat ratio in our previous posts, assuming we’re talking ideal gases only, we can now appreciate the fact there is actually no such thing as the specific heat: there are various variables and, hence, various definitions. Indeed, it’s not only pressure or volume: the heat capacity of some substance will usually also be expressed per unit of mass (i.e. per kg), per number of particles (i.e. per mole), or per unit of volume (i.e. per m³). In the last two cases, we talk about the molar or volumetric heat capacity respectively, while the per-kg version, expressed in joule per kelvin and per kg (J/(kg·K)), is the specific heat capacity proper. So we’ve got three different concepts here, and two ways of measuring them: at constant pressure or at constant volume. No wonder one gets quite confused when googling tables listing the actual values! 🙂
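To see how the three ‘per something’ versions hang together, here’s a small Python sketch converting a specific heat capacity (per kg) into its molar and volumetric counterparts: multiply by the molar mass (kg/mol) or by the density (kg/m³). The water figures below are rounded textbook values for liquid water near room temperature at constant pressure, used for illustration only.

```python
def molar_heat_capacity(c_specific, molar_mass):
    """J/(kg*K) times kg/mol gives J/(mol*K)."""
    return c_specific * molar_mass


def volumetric_heat_capacity(c_specific, density):
    """J/(kg*K) times kg/m^3 gives J/(m^3*K)."""
    return c_specific * density


# Rounded values for liquid water near room temperature (illustrative only)
C_WATER = 4184.0      # specific heat capacity at constant pressure, J/(kg*K)
M_WATER = 0.018015    # molar mass, kg/mol
RHO_WATER = 1000.0    # density, kg/m^3 (rounded)

print(molar_heat_capacity(C_WATER, M_WATER))         # roughly 75.4 J/(mol*K)
print(volumetric_heat_capacity(C_WATER, RHO_WATER))  # roughly 4.18e6 J/(m^3*K)
```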

Now, there’s one question left: why is γ being referred to as the specific heat ratio? The answer is simple: it actually is the ratio of the specific heat capacities C_P and C_V. Hence, γ is equal to:

γ = C_P/C_V

I could show you how that works. However, I would just be copying the Wikipedia article on it, so I won’t do that: you’re sufficiently knowledgeable now to check it out yourself, and verify it’s actually true. Good luck with it! In the process, please also do check why C_P is always larger than C_V, so you can explain why γ is always larger than one. 🙂
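As a hint for that exercise, here’s a minimal Python sketch for the ideal-gas case only. It assumes the standard textbook results that, per mole, C_V = (f/2)·R for a gas whose molecules have f degrees of freedom (f = 3 for a monatomic gas, f = 5 for a rigid diatomic one) and that C_P = C_V + R (Mayer’s relation). The gas constant R then cancels out of γ = C_P/C_V, so exact fractions are enough.

```python
from fractions import Fraction


def gamma_ideal(f):
    """gamma = C_P/C_V for an ideal gas with f degrees of freedom per molecule.

    Per mole, C_V = (f/2)*R and C_P = C_V + R (Mayer's relation), so both are
    expressed here in units of R, which cancels out of the ratio.
    """
    c_v = Fraction(f, 2)
    c_p = c_v + 1
    return c_p / c_v


print(gamma_ideal(3))  # monatomic gas: 5/3
print(gamma_ideal(5))  # rigid diatomic gas: 7/5
```

In this model γ = 1 + R/C_V = 1 + 2/f, which is larger than one for any ideal gas.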

Post scriptum: Feynman’s Lectures were, once more, the inspiration here. Now, Feynman has a habit of ‘integrating’ expressions and, frankly, I never found a satisfactory answer to a pretty elementary question: integration with respect to which variable? His exposé on both the ideal as well as the actual gas law has enlightened me. The answer is simple: it doesn’t matter. 🙂 Let me show that by analyzing the following argument of Feynman:

So… What is that ‘integration’ that ‘yields’ that γlnV + lnP = lnC expression? Are we solving some differential equation here? Well… Yes. But let’s be practical and take the derivative of the expression in regard to V, P and T respectively. Let’s first see where we come from. The fundamental equation is PV = (γ–1)U. That means we’ve got two ‘independent’ variables, and one that ‘depends’ on the others: if we fix P and V, we have U, or if we fix U, then P and V are in an inversely proportional relationship. That’s easy enough. We’ve got three ‘variables’ here: U, P and V—or, in differential form, dU, dP and dV. However, Feynman eliminates one by noting that dU = –PdV. He rightly notes we can only do that because we’re talking adiabatic expansion/compression here: all the work done while expanding/compressing the gas goes into changing the internal energy: no heat is added or removed. Hence, there is no dQ term here.

So we are left with two ‘variables’ only now: P and V, or dP and dV when talking differentials. So we can choose: P depends on V, or V depends on P. If we think of V as the independent variable, we can write:

d[γ·lnV + lnP]/dV = γ·(1/V)·(dV/dV) + (1/P)·(dP/dV), while d[lnC]/dV = 0

So we have γ·(1/V)·(dV/dV) + (1/P)·(dP/dV) = 0, and we can then multiply both sides by dV to get:

(γ·dV/V) + (dP/P) = 0,

which is the core equation in this argument, so that’s the one we started off with. Picking P as the ‘independent’ variable and, hence, differentiating with respect to P yields the same:

d[γ·lnV + lnP]/dP = γ·(1/V)·(dV/dP) + (1/P)·(dP/dP), while d[lnC]/dP = 0

Multiplying both sides by dP yields the same thing: (γ·dV/V) + (dP/P) = 0. So it doesn’t matter, indeed. But let’s be smart and assume both P and V, or dP and dV, depend on some implicit variable (a parameter, really). The obvious candidate is temperature (T). So we’ll now differentiate with respect to T. We get:

d[γ·lnV + lnP]/dT = γ·(1/V)·(dV/dT) + (1/P)·(dP/dT), while d[lnC]/dT = 0

We can, once again, multiply both sides by dT and – surprise, surprise! – we get the same result:

(γ·dV/V) + (dP/P) = 0

The point is that the γlnV + lnP = lnC expression is damn valid, and C or lnC or whatever is ‘the constant of integration’ indeed, in regard to whatever variable: it doesn’t matter. So then we can, indeed, take the exponential of both sides (which is much more straightforward than ‘integrating both sides’), so we get:

e^{γlnV + lnP} = e^{lnC} = C

It then doesn’t take too much intelligence to see that e^{γlnV + lnP} = e^{ln(V^γ) + lnP} = e^{ln(V^γ)}·e^{lnP} = V^γ·P = P·V^γ. So we’ve got the grand result we wanted: PV^γ = C, with C some constant determined by the situation we’re in (think of the size of the box, or the density of the gas).
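We can also go the other way and check the result numerically: integrate γ·dV/V + dP/P = 0, i.e. dP/dV = –γ·P/V, step by step and see whether P·V^γ stays put. The Python below does this with a fourth-order Runge–Kutta scheme, assuming a monatomic gas (γ = 5/3) and an arbitrary, hypothetical starting state (P, V) = (1, 1) in whatever units you like.

```python
GAMMA = 5 / 3  # monatomic ideal gas (assumption)


def slope(v, p):
    """dP/dV from gamma*dV/V + dP/P = 0."""
    return -GAMMA * p / v


def adiabat_pressure(p0, v0, v1, steps=10_000):
    """Pressure after an adiabatic change from v0 to v1, integrated with RK4."""
    h = (v1 - v0) / steps
    v, p = v0, p0
    for _ in range(steps):
        k1 = slope(v, p)
        k2 = slope(v + h / 2, p + h * k1 / 2)
        k3 = slope(v + h / 2, p + h * k2 / 2)
        k4 = slope(v + h, p + h * k3)
        p += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        v += h
    return p


p0, v0 = 1.0, 1.0
for v1 in (0.5, 2.0, 3.0):
    p1 = adiabat_pressure(p0, v0, v1)
    print(f"V={v1}: P={p1:.6f}  P*V^gamma={p1 * v1 ** GAMMA:.9f}")  # stays at 1
```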

So, yes, we’ve got a ‘law’ here. We should just remind ourselves, always, that it’s only valid when we’re talking adiabatic compression or expansion: we do not add or remove heat energy or, as Feynman puts it, much more succinctly, “no heat is being lost”. And, of course, we’re also talking ideal gases only, which excludes a number of real substances. 🙂

It’s a weird formula: the pressure times the volume to the 5/3 power is a constant for a monatomic gas. But it works: as long as individual atoms are not bound to each other, the law holds. As mentioned above, when various molecular states, with associated energy levels, are at play, it becomes an entirely different ballgame. 🙂

I should add one final note as to the functional form of PV^γ = C. We can re-write it as P = C/V^γ. The shape of that graph is similar to the P = NkT/V relationship we started off with. Putting the two equations side by side makes it clear our constant and temperature are obviously related one to another, but they are not directly proportional to each other. In fact, as the graphs below clearly show, the P = NkT/V equation gives us the isothermal lines on the pressure-volume graph (i.e. they show how P and V are related at constant temperature), while the P = C/V^γ equation gives us the adiabatic lines. Just google an online function graph tool, and you can now draw your own diagrams of the Carnot cycle! Just change the numerator (i.e. the constant C in P = C/V^γ, or the temperature T in P = NkT/V). 🙂
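If you don’t have a graphing tool at hand, a small table does the job too. The Python sketch below (arbitrary, hypothetical units; γ = 5/3 assumed) tabulates an isotherm P = NkT/V and an adiabat P = C/V^γ chosen to pass through the same point (V, P) = (1, 1): on expansion the adiabat drops below the isotherm, and on compression it rises above it.

```python
GAMMA = 5 / 3  # monatomic ideal gas (assumption)


def isotherm(v, nkt):
    """P = NkT/V, with nkt standing for the product N*k*T."""
    return nkt / v


def adiabat(v, c):
    """P = C/V^gamma."""
    return c / v ** GAMMA


# Both curves pass through the hypothetical state (V, P) = (1, 1)
V0, P0 = 1.0, 1.0
NKT = P0 * V0
C = P0 * V0 ** GAMMA

for v in (0.5, 1.0, 1.5, 2.0, 3.0):
    print(f"V={v:3.1f}  isotherm P={isotherm(v, NKT):6.3f}  adiabat P={adiabat(v, C):6.3f}")
```

At the shared point the adiabat is steeper by exactly a factor γ, because dP/dV = –P/V on the isotherm and –γ·P/V on the adiabat.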

Now, I promised I would say something more about that infinitesimal Carnot cycle: why is it there? Why don’t we limit the analysis to just the first two steps? In fact, the shortest and best explanation I can give is something like this: think of the whole cycle as the first step in a reversible process really. We put some heat in (ΔQ) and the gas does some work, but that heat has to go through the whole body of gas, and the energy has to go somewhere too. In short, the heat and the work are not being absorbed by the surroundings but it all stays in the ‘system’ that we’re analyzing, so to speak, and that’s why we’re going through the full cycle, not the first two steps only. Now, this ‘answer’ may or may not satisfy you, but I can’t do better. You may want to check Feynman’s explanation itself, but he’s very short on this and, hence, I think it won’t help you much either. 😦

The Strange Theory of Light and Matter (III)

Pre-script (dated 26 June 2020): This post has become less relevant (even irrelevant, perhaps) because my views on all things quantum-mechanical have evolved significantly as a result of my progression towards a more complete realist (classical) interpretation of quantum physics. I keep blog posts like these mainly because I want to keep track of where I came from. I might review them one day, but I currently don’t have the time or energy for it. 🙂

Original post:

This is my third and final set of comments on Feynman’s popular little booklet: The Strange Theory of Light and Matter, also known as Feynman’s Lectures on Quantum Electrodynamics (QED).

The origin of this short lecture series is quite moving: the death of Alix G. Mautner, a good friend of Feynman’s. She was always curious about physics but her career was in English literature and so she did not master the math. Hence, Feynman introduces this 1985 publication by writing: “Here are the lectures I really prepared for Alix, but unfortunately I can’t tell them to her directly, now.”

Alix Mautner died from a brain tumor, and it is her husband, Leonard Mautner, who sponsored the QED lecture series at UCLA, which Ralph Leighton transcribed and published as the booklet that we’re talking about here. Feynman himself died a few years later, at the relatively young age of 69. Tragic coincidence: he died of cancer too. Despite all this weirdness, Feynman’s QED never quite got the same iconic status as, let’s say, Stephen Hawking’s Brief History of Time. I wonder why, but the answer to that question is probably in the realm of chaos theory. 🙂 I actually just saw the movie on Stephen Hawking’s life (The Theory of Everything), and I noted another strange coincidence: Jane Wilde, Hawking’s first wife, also has a PhD in literature. It strikes me that, while the movie documents that Jane Wilde gave Hawking three children, after which he divorced her to marry his nurse, Elaine, the movie does not mention that he separated from Elaine too, and that he has some kind of ‘working relationship’ with Jane again.

Hmm… What to say? I should get back to quantum mechanics here or, to be precise, to quantum electrodynamics.

One reason why Feynman’s Strange Theory of Light and Matter did not sell like Hawking’s Brief History of Time might well be that, in some places, the text is not entirely accurate. Why? Who knows? It would make for an interesting PhD thesis in History of Science. Unfortunately, I have no time for such a PhD thesis. Hence, I must assume that Richard Feynman simply didn’t have much time or energy left to correct some of the writing of Ralph Leighton, who transcribed and edited these four short lectures a few years before Feynman’s death. Indeed, when everything is said and done, Ralph Leighton is not a physicist and, hence, I think he did compromise – just a little bit – on accuracy for the sake of readability. Ralph Leighton’s father, Robert Leighton, an eminent physicist who worked with Feynman, would probably have done a much better job.

I feel that one should not compromise on accuracy, even when trying to write something reader-friendly. That’s why I am writing this blog, and why I am writing three posts specifically on this little booklet. Indeed, I’d warmly recommend that little book on QED as an excellent non-mathematical introduction to the weird world of quantum mechanics, but I’d also say that, while Ralph Leighton’s story is great, it’s also, in some places, not entirely accurate.

So… Well… I want to do better than Ralph Leighton here. Nothing more. Nothing less. 🙂 Let’s go for it.

I. Probability amplitudes: what are they?

The greatest achievement of that little QED publication is that it manages to avoid any reference to wave functions and other complicated mathematical constructs: all of the complexity of quantum mechanics is reduced to three basic events or actions and, hence, three basic amplitudes which are represented as ‘arrows’—literally.

Now… Well… You may or may not know that a (probability) amplitude is actually a complex number, but it’s not so easy to intuitively understand the concept of a complex number. In contrast, everyone easily ‘gets’ the concept of an ‘arrow’. Hence, from a pedagogical point of view, representing complex numbers by some ‘arrow’ is truly a stroke of genius.

Whatever we call it, a complex number or an ‘arrow’, a probability amplitude is something with (a) a magnitude and (b) a phase. As such, it resembles a vector, but it’s not quite the same, if only because we’ll impose some restrictions on the magnitude. But I shouldn’t get ahead of myself. Let’s start with the basics.

A magnitude is some real non-negative number, like a length, but you should not associate it with some spatial dimension in physical space: it’s just a number. As for the phase, we could associate that concept with some direction but, again, you should just think of it as a direction in a mathematical space, not in the real (physical) space.

Let me insert a parenthesis here. If I say the ‘real’ or ‘physical’ space, I mean the space in which the electrons and photons and all other real-life objects that we’re looking at exist and move. That’s a non-mathematical definition. In fact, in math, the real space is defined as a coordinate space, with sets of real numbers (vectors) as coordinates, so… Well… That’s a mathematical space only, not the ‘real’ (physical) space. So the real (vector) space is not real. 🙂 The mathematical real space may, or may not, accurately describe the real (physical) space. Indeed, you may have heard that physical space is curved because of the presence of massive objects, which means that the real coordinate space will actually not describe it very accurately. I know that’s a bit confusing but I hope you understand what I mean: if mathematicians talk about the real space, they do not mean the real space. They refer to a vector space, i.e. a mathematical construct. To avoid confusion, I’ll use the term ‘physical space’ rather than ‘real’ space in the future. So I’ll let the mathematicians get away with using the term ‘real space’ for something that isn’t real actually. 🙂

End of digression. Let’s discuss these two mathematical concepts – magnitude and phase – somewhat more in detail.

A. The magnitude

Let’s start with the magnitude or ‘length’ of our arrow. We know that we have to square these lengths to find some probability, i.e. some real number between 0 and 1. Hence, the length of our arrows cannot be larger than one. That’s the restriction I mentioned already, and this ‘normalization’ condition reinforces the point that these ‘arrows’ do not have any spatial dimension (not in any real space anyway): they represent a function. To be specific, they represent a wavefunction.

If we’d be talking complex numbers instead of ‘arrows’, we’d say the absolute value of the complex number cannot be larger than one. We’d also say that, to find the probability, we should take the absolute square of the complex number, so that’s the square of the magnitude or absolute value of the complex number indeed. We cannot just square the complex number: it has to be the square of the absolute value.

Why? Well… Just write it out. [You can skip this section if you’re not interested in complex numbers, but I would recommend you try to understand. It’s not that difficult. Indeed, if you’re reading this, you most likely understand something of complex numbers and, hence, you should be able to work your way through it. Just remember that a complex number is like a two-dimensional number, which is why it’s sometimes written in boldface rather than in regular font. However, I should immediately add this convention is usually not followed. I like the boldface though, and so I’ll try to use it in this post.] The square of a complex number z = a + bi is equal to z² = a² + 2abi – b², while the square of its absolute value (i.e. the absolute square) is |z|² = [√(a² + b²)]² = a² + b². So you can immediately see that the square and the absolute square of a complex number are two very different things indeed: it’s not only the 2abi term, but there’s also the minus sign in the first expression, because of the i² = –1 factor. In case of doubt, always remember that the square of a complex number may actually yield a negative number, as evidenced by the definition of the imaginary unit itself: i² = –1.
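Python has complex numbers built in (it writes the imaginary unit as j), so it’s easy to see the difference between the square and the absolute square for yourself. The arrow below, with a = 0.3 and b = 0.4, is just an arbitrary example whose length is smaller than one.

```python
z = 0.3 + 0.4j             # an 'arrow' with a = 0.3 and b = 0.4

square = z * z             # a^2 - b^2 + 2ab*i: still a complex number
abs_square = abs(z) ** 2   # a^2 + b^2: a real number, usable as a probability
via_conjugate = z * z.conjugate()  # also a^2 + b^2, with zero imaginary part

print(square)         # approximately -0.07 + 0.24i
print(abs_square)     # approximately 0.25
print(via_conjugate)  # approximately 0.25 + 0i
```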

End of digression. Feynman and Leighton manage to avoid any reference to complex numbers in that short series of four lectures and, hence, all they need to do is explain how one squares a length. Kids learn how to do that when making a square out of rectangular paper: they’ll fold one corner of the paper until it meets the opposite edge, forming a triangle first. They’ll then cut or tear off the extra paper, and then unfold. Done. [I could note that the folding is a 90-degree rotation of the original length (or width, I should say) which, in mathematical terms, is equivalent to multiplying that length with the imaginary unit (i). But I am sure the kids involved would think I am crazy if I’d say this. 🙂] So let me get back to Feynman’s arrows.

B. The phase

Feynman and Leighton’s second pedagogical stroke of genius is the metaphor of the ‘stopwatch’ and the ‘stopwatch hand’ for the variable phase. Indeed, although I think it’s worth explaining why z = a + bi = r·cosφ + i·r·sinφ in the illustration below can be written as z = re^{iφ} = |z|e^{iφ}, understanding Euler’s representation of a complex number as a complex exponential requires swallowing a very substantial piece of math and, if you’d want to do that, I’ll refer you to one of my posts on complex numbers.
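For those who want to see Euler’s representation at work without the full derivation, Python’s standard cmath module converts between the a + bi form and the magnitude-and-phase form. The sketch below uses an arbitrary arrow z = 0.3 + 0.4i and checks that r·e^{iφ} gives back z, and that a = r·cosφ and b = r·sinφ.

```python
import cmath
import math

z = 0.3 + 0.4j                     # an arbitrary arrow
r, phi = cmath.polar(z)            # magnitude |z| and phase phi (in radians)

euler = r * cmath.exp(1j * phi)    # r * e^(i*phi)
trig = r * math.cos(phi) + 1j * r * math.sin(phi)  # r*cos(phi) + i*r*sin(phi)

print(r, math.degrees(phi))           # about 0.5 and about 53.13 degrees
print(abs(euler - z), abs(trig - z))  # both differences are tiny (rounding only)
```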

The metaphor of the stopwatch represents a periodic function. To be precise, it represents a sinusoid, i.e. a smooth repetitive oscillation. Now, the stopwatch hand represents the phase of that function, i.e. the φ angle in the illustration above. That angle is a function of time: the speed with which the stopwatch turns is related to some frequency, i.e. the number of oscillations per unit of time (i.e. per second).

You should now wonder: what frequency? What oscillations are we talking about here? Well… As we’re talking photons and electrons here, we should distinguish the two:

For photons, the frequency is given by Planck’s energy-frequency relation, which relates the energy (E) of a photon (1.5 to 3.5 eV for visible light) to its frequency (ν). It’s a simple proportional relation, with Planck’s constant (h) as the proportionality constant: E = hν, or ν = E/h.
For electrons, we have the de Broglie relation, which looks similar to the Planck relation (E = hf, or f = E/h) but, as you know, it’s something different. Indeed, these so-called matter waves are not so easy to interpret because there actually is no precise frequency f. In fact, the matter wave representing some particle in space will consist of a potentially infinite number of waves, all superimposed one over another, as illustrated below.

For the sake of accuracy, I should mention that the animation above has its limitations: the wavetrain is complex-valued and, hence, has a real as well as an imaginary part, so it’s something like the blob underneath. Two functions in one, so to speak: the imaginary part follows the real part with a phase difference of 90 degrees (or π/2 radians). Indeed, if the wavefunction is a regular complex exponential re^{iθ}, then its imaginary part is r·sinθ = r·cos(θ – π/2), i.e. the real part r·cosθ shifted by π/2, which proves the point: we have two functions in one here. 🙂 I am actually just repeating what I said before already: the probability amplitude, or the wavefunction, is a complex number. You’ll usually see it written as Ψ (psi) or Φ (phi). Here also, writing Ψ or Φ in boldface would usefully remind the reader that we’re talking something ‘two-dimensional’ (in mathematical space, that is), but this convention is usually not followed.
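Here’s a minimal Python check of that 90-degree relation: for ψ = r·e^{iθ}, the imaginary part equals r·cos(θ – π/2) at every θ, i.e. it is the real part shifted by a quarter of a cycle. The amplitude r = 1 and the sampled angles are arbitrary.

```python
import cmath
import math


def lag_error(theta, r=1.0):
    """|Im(r*e^(i*theta)) - r*cos(theta - pi/2)|, zero up to rounding."""
    psi = r * cmath.exp(1j * theta)
    return abs(psi.imag - r * math.cos(theta - math.pi / 2))


worst = max(lag_error(k * 0.1) for k in range(63))  # theta from 0 to 6.2 rad
print(worst)  # tiny: rounding error only
```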

In any case… Back to frequencies. The point to note is that, when it comes to analyzing electrons (or any other matter-particle), we’re dealing with a range of frequencies f really (or, what amounts to the same, a range of wavelengths λ) and, hence, we should write Δf = ΔE/h, which is just one of the many expressions of the Uncertainty Principle in quantum mechanics.
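To put some numbers on the photon side of this, here’s a short Python sketch that converts the photon energies quoted earlier (1.5 to 3.5 eV for visible light) into frequencies using ν = E/h. It assumes the exact SI values of Planck’s constant and of the electronvolt.

```python
H = 6.62607015e-34     # Planck constant in J*s (exact in the SI since 2019)
EV = 1.602176634e-19   # one electronvolt in joule (exact in the SI since 2019)


def photon_frequency(energy_ev):
    """Planck relation solved for the frequency: nu = E/h, with E given in eV."""
    return energy_ev * EV / H


for energy in (1.5, 3.5):  # the rough visible-light range quoted in the text
    print(f"{energy} eV -> {photon_frequency(energy):.3e} Hz")
```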

Now, that’s just one of the complications. Another difficulty is that matter-particles, such as electrons, have some rest mass, and so that enters the energy equation as well (literally). Last but not least, one should distinguish between the group velocity and the phase velocity of matter waves. As you can imagine, that makes for a very complicated relationship between ‘the’ wavelength and ‘the’ frequency. In fact, what I write above should make it abundantly clear that there’s no such thing as the wavelength, or the frequency: it’s a range really, related to the fundamental uncertainty in quantum physics. I’ll come back to that, and so you shouldn’t worry about it here. Just note that the stopwatch metaphor doesn’t work very well for an electron!

In his lectures in memory of Alix Mautner, Feynman avoids all these complications. Frankly, I think that’s a missed opportunity because I do not think it’s all that incomprehensible. In fact, I write all that follows because I do want you to understand the basics of waves. It’s not difficult. High-school math is enough here. Let’s go for it.

One turn of the stopwatch corresponds to one cycle, which covers 360 degrees or, to use a more natural unit, 2π radians. A frequency of 1 Hz means one such cycle (i.e. one oscillation) per second. [Why is the radian a more natural unit? Because it measures an angle in terms of the distance unit itself, rather than in arbitrary 1/360 cuts of a full circle. Indeed, remember that the circumference of the unit circle is 2π.] So our frequency ν (expressed in cycles per second) corresponds to a so-called angular frequency ω = 2πν. From this formula, it should be obvious that ω is measured in radians per second.

We can also link this formula to the period of the oscillation, T, i.e. the duration of one cycle. T = 1/ν and, hence, ω = 2π/T. It’s all nicely illustrated below. [And, yes, it’s an animation from Wikipedia: nice and simple.]

The easy math above now allows us to formally write the phase of a wavefunction – let’s denote the wavefunction as φ (phi), and the phase as θ (theta) – as a function of time (t) using the angular frequency ω. So we can write: θ = ωt = 2π·ν·t. Now, the wave travels through space, and the two illustrations above (i.e. the one with the super-imposed waves, and the one with the complex wave train) would usually represent a wave shape at some fixed point in time. Hence, the horizontal axis is not t but x. Hence, we can and should write the phase not only as a function of time but also of space. So how do we do that? Well… If the hypothesis is that the wave travels through space at some fixed speed c, then its frequency ν will also determine its wavelength λ. It’s a simple relationship: c = λν (the number of oscillations per second times the length of one wavelength should give you the distance traveled per second, so that’s, effectively, the wave’s speed).

Now that we’ve expressed the frequency in radians per second, we can also express the wavelength in radians per unit distance. That’s what the wavenumber does: think of it as the spatial frequency of the wave. We denote the wavenumber by k, and write: k = 2π/λ. [Just do a numerical example when you have difficulty following. For example, assume the wavelength is 5 units of distance (i.e. 5 meters), which is typical for VHF radio: ν = (3×10⁸ m/s)/(5 m) = 0.6×10⁸ Hz = 60 MHz. That wavelength corresponds to (2π radians)/(5 m) ≈ 1.2566 radians per meter. Of course, we can also express the wavenumber in oscillations per unit distance. In that case, we’d have to divide k by 2π, because one cycle corresponds to 2π radians. So we get the reciprocal of the wavelength: 1/λ. In our example, 1/λ is, of course, 1/5 = 0.2, so that’s a fifth of a full cycle per meter. You can also think of it as the number of waves (or wavelengths) per meter: if the wavelength is λ, then one can fit 1/λ waves in a meter.]
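Those numbers are easy to reproduce. The Python sketch below assumes c = 3×10⁸ m/s and λ = 5 m, as in the example, and also computes the angular frequency ω = 2πν and the period T = 1/ν; the last line checks that λν gives back c.

```python
import math

C = 3.0e8     # propagation speed, m/s (rounded speed of light)
LAM = 5.0     # wavelength, m

nu = C / LAM              # frequency in Hz: 6e7 Hz = 60 MHz
omega = 2 * math.pi * nu  # angular frequency, rad/s
period = 1 / nu           # duration of one cycle, s
k = 2 * math.pi / LAM     # wavenumber, rad/m (about 1.2566)
per_meter = 1 / LAM       # cycles per meter: 0.2

print(nu, omega, period, k, per_meter)
print(LAM * nu)           # gives back c
```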

Now, from the ω = 2πν, c = λν and k = 2π/λ relations, it’s obvious that k = 2π/λ = 2π/(c/ν) = (2πν)/c = ω/c. To sum it all up, frequencies and wavelengths, in time and in space, are all related through the speed of propagation of the wave c. More specifically, they’re related as follows:

c = λν = ω/k

From that, it’s easy to see that k = ω/c, which we’ll use in a moment. Now, the periodicity of the wave implies that we can find the same phase by going one oscillation (or a multiple number of oscillations) back or forward in time, or in space. In fact, we can also find the same phase by letting both time and space vary. However, if we want to do that, space and time have to move in step: for a wave traveling in the positive x-direction, going forward in time by Δt has to be matched by going forward in space by cΔt (or back in both). In other words, if we want to get the same phase, then time and space sort of substitute for each other: adding a little time has the same effect on the phase as subtracting a little distance. Let me quote Feynman on this: “This is easily seen by considering the mathematical behavior of a(t − r/c). Evidently, if we add a little time Δt, we get the same value for a(t − r/c) as we would have if we had subtracted a little distance: Δr = −cΔt.” The variable a stands for the acceleration of an electric charge here, causing an electromagnetic wave, but the same logic is valid for the phase, with a minor twist though: we’re talking a nice periodic function here, and so we need to put the angular frequency in front. Hence, the rate of change of the phase with respect to time is measured by the angular frequency ω. In short, we write:

θ = ω(t–x/c) = ωt–kx

Hence, we can re-write the wavefunction, in terms of its phase, as follows:

φ(θ) = φ[θ(x, t)] = φ[ωt–kx]

Note that, if the wave were traveling in the ‘other’ direction (i.e. in the negative x-direction), we’d write φ(θ) = φ[kx+ωt]. Time travels in one direction only, of course, so it’s the sign of the kx term that tells us the direction of travel: for a wave moving in the positive x-direction, the minus sign has to be there because of the logic involved in adding time and subtracting distance. You can work out an example (with a sine or cosine wave, for example) for yourself.
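Here is one such example, as a small Python sketch of the ‘add time, subtract distance’ logic. It assumes a cosine as the wave shape and reuses the 60 MHz numbers from the VHF example; the sample point and time step are arbitrary choices:

```python
import math

c = 3.0e8                   # propagation speed (m/s), rounded
omega = 2 * math.pi * 60e6  # ω for the 60 MHz example (rad/s)
k = omega / c               # k = ω/c (rad/m)

def wave_plus(x, t):
    """Cosine wave moving in the +x direction: φ = cos(ωt − kx)."""
    return math.cos(omega * t - k * x)

def wave_minus(x, t):
    """Cosine wave moving in the −x direction: φ = cos(ωt + kx)."""
    return math.cos(omega * t + k * x)

x0, t0, dt = 1.7, 2.0e-9, 3.0e-9  # arbitrary point and time step
dx = c * dt                       # distance light covers in dt (0.9 m)

# Adding dt gives the same value as subtracting dx.
check_swap = math.isclose(wave_plus(x0, t0 + dt), wave_plus(x0 - dx, t0), abs_tol=1e-9)
# Hence the phase is unchanged if we move forward in time AND in space.
check_ride = math.isclose(wave_plus(x0 + dx, t0 + dt), wave_plus(x0, t0), abs_tol=1e-9)
# For the −x wave, moving forward in time must be matched by moving back in space.
check_minus = math.isclose(wave_minus(x0 - dx, t0 + dt), wave_minus(x0, t0), abs_tol=1e-9)
print(check_swap, check_ride, check_minus)  # True True True
```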

So what, you’ll say? Well… Nothing. I just hope you agree that all of this isn’t rocket science: it’s just high-school math. But so it shows you what that stopwatch really is and, hence, I – but who am I? – would have put at least one or two footnotes on this in a text like Feynman’s QED.

Now, let me make a much longer and more serious digression:

Digression 1: on relativity and spacetime

As you can see from the argument (or phase) of that wave function φ(θ) = φ[θ(x, t)] = φ[ωt–kx] = φ[–k(x–ct)], any wave equation establishes a deep relation between the wave itself (i.e. the ‘thing’ we’re describing) and space and time. In fact, that’s what the whole wave equation is all about! So let me say a few things more about that.

Because you know a thing or two about physics, you may ask: when we’re talking time, whose time are we talking about? Indeed, if we’re talking photons going from A to B, these photons will be traveling at or near the speed of light and, hence, their clock, as seen from our (inertial) frame of reference, doesn’t move. Likewise, according to the photon, our clock seems to be standing still.

Let me put the issue to bed immediately: we’re looking at things from our point of view. Hence, we’re obviously using our clock, not theirs. Having said that, the analysis is actually fully consistent with relativity theory. Why? Well… What do you expect? If it wasn’t, the analysis would obviously not be valid. 🙂 To illustrate that it’s consistent with relativity theory, I can mention, for example, that the (probability) amplitude for a photon to travel from point A to B depends on the spacetime interval, which is invariant. Here, A and B are four-dimensional points in spacetime, involving both spatial as well as time coordinates: A = (x_A, y_A, z_A, t_A) and B = (x_B, y_B, z_B, t_B). And so the ‘distance’ – as measured through the spacetime interval – is invariant.

Now, having said that, we should draw some attention to the intimate relationship between space and time which, let me remind you, results from the absoluteness of the speed of light. Indeed, one will always measure the speed of light c as being equal to 299,792,458 m/s, always and everywhere. It does not depend on your reference frame (inertial or moving). That’s why the constant c anchors all laws in physics, and why we can write what we write above, i.e. include both distance (x) as well as time (t) in the wave function φ = φ(x, t) = φ[ωt–kx] = φ[–k(x–ct)]. The k and ω are related through the ω/k = c relationship: the speed of light links the frequency in time (ν = ω/2π = 1/T) with the frequency in space (i.e. the wavenumber or spatial frequency k). There is only one degree of freedom here: the frequency. Whether we express it in time or in space doesn’t matter, because ω and k are not independent. [As noted above, the relationship between the frequency in time and in space is not so obvious for electrons, or for matter waves in general: for those matter-waves, we need to distinguish group and phase velocity, and so we don’t have a unique frequency.]

Let me make another small digression within the digression here. Thinking about travel at the speed of light invariably leads to paradoxes. In previous posts, I explained the mechanism of light emission: a photon is emitted – one photon only – when an electron jumps back to its ground state after being excited. Hence, we may imagine a photon as a transient electromagnetic wave, something like what’s pictured below. Now, the decay time of this transient oscillation (τ) is measured in nanoseconds, i.e. billionths of a second (1 ns = 1×10⁻⁹ s): the decay time for sodium light, for example, is some 30 ns only.

However, because of the tremendous speed of light, that still makes for a wavetrain that’s like ten meters long, at least (30×10⁻⁹ s times 3×10⁸ m/s is nine meters, but you should note that the decay time measures the time for the oscillation to die out by a factor 1/e, so the oscillation itself lasts longer than that). Those nine or ten meters cover some 15 to 17 million oscillations (the wavelength of sodium light is about 600 nm and, hence, 10 meters fit almost 17 million oscillations indeed). Now, how can we reconcile the image of a photon as a ten-meter long wavetrain with the image of a photon as a point particle?
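The arithmetic is easy to redo. A minimal Python sketch, using the rounded values from the text (30 ns decay time, 600 nm wavelength, c ≈ 3×10⁸ m/s):

```python
c = 3.0e8            # speed of light (m/s), rounded
tau = 30e-9          # decay time of the sodium transition, as quoted (s)
wavelength = 600e-9  # sodium light, rounded as in the text (m)

train_length = c * tau                # distance covered during one decay time
for length in (train_length, 10.0):   # nine meters, and the 'ten meter' estimate
    n_osc = length / wavelength       # number of wavelengths that fit
    print(f"{length:.1f} m -> {n_osc / 1e6:.1f} million oscillations")
# 9.0 m -> 15.0 million oscillations
# 10.0 m -> 16.7 million oscillations
```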

The answer to that question is paradoxical: from our perspective, anything traveling at the speed of light – including this nine or ten meter ‘long’ photon – will have zero length because of the relativistic length contraction effect. Length contraction? Yes. I’ll let you look it up, because… Well… It’s not easy to grasp. Indeed, of the three measurable effects on objects moving at relativistic speeds – i.e. (1) an increase of the mass (the energy needed to further accelerate particles in particle accelerators increases dramatically at speeds nearer to c), (2) time dilation, i.e. a slowing down of the (internal) clock (because of their relativistic speeds when entering the Earth’s atmosphere, the measured half-life of muons is five times that when at rest), and (3) length contraction – length contraction is probably the most paradoxical of all.

Let me end this digression with yet another short note. I said that one will always measure the speed of light c as being equal to 299,792,458 m/s, always and everywhere and, hence, that it does not depend on your reference frame (inertial or moving). Well… That’s true and not true at the same time. I actually need to nuance that statement a bit in light of what follows: an individual photon does have an amplitude to travel faster or slower than c, and when discussing matter waves (such as the wavefunction that’s associated with an electron), we can have phase velocities that are faster than light! However, when calculating those amplitudes, c is a constant.

That doesn’t make sense, you’ll say. Well… What can I say? That’s how it is unfortunately. I need to move on and, hence, I’ll end this digression and get back to the main story line. Part I explained what probability amplitudes are—or at least tried to do so. Now it’s time for part II: the building blocks of all of quantum electrodynamics (QED).

II. The building blocks: P(A to B), E(A to B) and j

The three basic ‘events’ (and, hence, amplitudes) in QED are the following:

1. P(A to B)

P(A to B) is the (probability) amplitude for a photon to travel from point A to B. However, I should immediately note that A and B are points in spacetime. Therefore, we associate them not only with some specific (x, y, z) position in space, but also with some specific time t. Now, quantum-mechanical theory gives us an easy formula for P(A to B): it depends on the so-called (spacetime) interval between the two points A and B, i.e. I = Δr² – Δt² = (x₂–x₁)² + (y₂–y₁)² + (z₂–z₁)² – (t₂–t₁)². The point to note is that the spacetime interval takes both the distance in space as well as the ‘distance’ in time into account. As I mentioned already, this spacetime interval does not depend on our reference frame and, hence, it’s invariant (as long as we’re talking reference frames that move with constant speed relative to each other). Also note that we should measure time and distance in equivalent units when using that Δr² – Δt² formula for I. So we either measure distance in light-seconds or, else, we measure time in units that correspond to the time that’s needed for light to travel one meter. If no equivalent units are adopted, the formula is I = Δr² – c²·Δt².
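To make the interval concrete, here is a minimal Python sketch of I = Δr² – c²·Δt² in SI units. The helper name `interval` and the sample events are illustrative choices; the sign convention (space positive, time negative) follows the text:

```python
C = 299_792_458.0  # speed of light (m/s)

def interval(a, b, c=C):
    """Spacetime interval I = Δr² − c²Δt² between events a and b, each (x, y, z, t) in SI units."""
    dx, dy, dz, dt = (bi - ai for ai, bi in zip(a, b))
    return dx**2 + dy**2 + dz**2 - (c * dt) ** 2

A = (0.0, 0.0, 0.0, 0.0)
B = (C * 1.0, 0.0, 0.0, 1.0)       # one light-second away, one second later
B_slow = (C * 0.5, 0.0, 0.0, 1.0)  # only half a light-second away, one second later

print(interval(A, B))           # 0.0: a light-like interval
print(interval(A, B_slow) < 0)  # True: a time-like interval
```

With c = 1 (time measured in light-meters, or distance in light-seconds), the same function reduces to the Δr² – Δt² form.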

Now, in quantum theory, anything is possible and, hence, not only do we allow for crooked paths, but we also allow the time difference between A and B to differ from the time you’d expect a photon to need to travel along some curve (whose length we’ll denote by l), i.e. l/c. Hence, our photon may actually travel slower or faster than the speed of light c! There is one lucky break, however, that makes it all come out alright: it’s easy to show that the amplitudes associated with the odd paths and strange timings generally cancel each other out. [That’s what the QED booklet shows.] Hence, what remains are the paths that correspond to, or, importantly, are very near to, the so-called ‘light-like’ intervals in spacetime. The net result is that light – even one single photon – effectively uses a (very) small core of space as it travels, as evidenced by the fact that even one single photon interferes with itself when traveling through a slit or a small hole!

[If you now wonder what it means for a photon to interfere with itself, let me just give you the easy explanation: it may change its path. We assume it was traveling in a straight line – if only because it left the source at some point in time and then arrived at the slit obviously – but it no longer travels in a straight line after going through the slit. So that’s what we mean here.]

2. E(A to B)

E(A to B) is the (probability) amplitude for an electron to travel from point A to B. The formula for E(A to B) is much more complicated, and it’s the one I want to discuss somewhat more in detail in this post. It depends on some complex number j (see the next remark) and some real number n.

3. j

Finally, an electron could emit or absorb a photon, and the amplitude associated with this event is denoted by j, for junction number. It’s the same number j as the one mentioned when discussing E(A to B) above.

Now, this junction number is often referred to as the coupling constant or the fine-structure constant. However, the truth is, as I pointed out in my previous post, that these numbers are related, but they are not quite the same: α is the square of j, so we have α = j². There is also one more, related, number: the gauge parameter, which is denoted by g (despite the g notation, it has nothing to do with gravitation). The value of g is the square root of 4πε₀α, so g² = 4πε₀α. I’ll come back to this. Let me first make an awfully long digression on the fine-structure constant. It will be awfully long. So long, in fact, that it’s part of the ‘core’ of this post.

Digression 2: on the fine-structure constant, Planck units and the Bohr radius

The value for j is approximately –0.08542454.

How do we know that?

The easy answer to that question is: physicists measured it. In fact, they usually publish the measured value as the square of (the absolute value of) j, which is that fine-structure constant α. Its value is published (and updated) by the US National Institute of Standards and Technology. To be precise, the currently accepted value of α is 7.29735257×10⁻³. In case you doubt, just check that square root:

j = –0.08542454 ≈ –√0.00729735257 = –√α

As noted in Feynman’s (or Leighton’s) QED, older and/or more popular books will usually mention 1/α as the ‘magical’ number, so the ‘special’ number you may have seen is the inverse fine-structure constant, which is about 137, but not quite:

1/α = 137.035999074 ± 0.000000044

I am adding the standard uncertainty just to give you an idea of how precise these measurements are. 🙂 About 0.32 parts per billion (just divide the uncertainty by the 137.035999074 number). So that’s the number that excites popular writers, including Leighton. Indeed, as Leighton puts it:
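The numbers above can be checked in a few lines. A minimal Python sketch, using only the values quoted in the text:

```python
import math

alpha = 7.29735257e-3      # fine-structure constant, value quoted in the text
j = -math.sqrt(alpha)      # junction number j = −√α

inv_alpha = 137.035999074  # 1/α, value quoted in the text
u_inv_alpha = 0.000000044  # its standard uncertainty

print(f"j = {j:.8f}")                           # j = -0.08542454
print(f"1/alpha = {1 / alpha:.6f}")             # close to 137.036
ppb = u_inv_alpha / inv_alpha * 1e9             # relative uncertainty in parts per billion
print(f"relative uncertainty = {ppb:.2f} ppb")  # relative uncertainty = 0.32 ppb
```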

“Where does this number come from? Nobody knows. It’s one of the greatest damn mysteries of physics: a magic number that comes to us with no understanding by man. You might say the “hand of God” wrote that number, and “we don’t know how He pushed his pencil.” We know what kind of a dance to do experimentally to measure this number very accurately, but we don’t know what kind of dance to do on the computer to make this number come out, without putting it in secretly!”

Is it Leighton, or did Feynman really say this? Not sure. While the fine-structure constant is a very special number, it’s not the only ‘special’ number. In fact, we derive it from other ‘magical’ numbers. To be specific, I’ll show you how we derive it from the fundamental properties – as measured, of course – of the electron. So, in fact, I should say that we do know how to make this number come out, which makes me doubt whether Feynman really said what Leighton said he said. 🙂

So we can derive α from some other numbers. That brings me to the more complicated answer to the question as to what the value of j really is: j‘s value is the electron charge expressed in Planck units, which I’ll denote by –e_P:

j = –e_P

[You may want to reflect on this, and quickly verify on the Web. The Planck unit of electric charge, expressed in coulombs, is about 1.87555×10⁻¹⁸ C. If you multiply that by j = –e_P, i.e. by –0.08542454, you get the right answer: the electron charge is about –0.160217×10⁻¹⁸ C.]
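The same check in code: a minimal Python sketch that computes the Planck charge as q_P = √(4πε₀ħc) (i.e. with k_e = 1/4πε₀ set to 1) from CODATA 2018 SI values, and then the electron charge in Planck units:

```python
import math

eps0 = 8.8541878128e-12  # vacuum permittivity ε₀ (F/m), CODATA 2018
hbar = 1.054571817e-34   # reduced Planck constant ħ (J·s)
c = 299_792_458.0        # speed of light (m/s)
e = 1.602176634e-19      # elementary charge (C), exact in the 2019 SI

q_P = math.sqrt(4 * math.pi * eps0 * hbar * c)  # Planck charge (C)
e_P = e / q_P                                   # electron charge magnitude in Planck units

print(f"q_P = {q_P:.5e} C")     # should be close to 1.87555e-18 C
print(f"e_P = {e_P:.8f}")       # should be close to 0.08542454
print(f"e_P**2 = {e_P**2:.9e}") # should be close to α ≈ 7.297e-3
```

Note that e_P² is, by construction, e²/(4πε₀ħc), which is one of the textbook formulas for α.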

Now that is strange.

Why? Well… For starters, when doing all those quantum-mechanical calculations, we like to think of j as a dimensionless number: a coupling constant. But so here we do have a dimension: electric charge.

Let’s look at the basics. If j is –√α, and it’s also equal to –e_P, then the fine-structure constant must also be equal to the square of the electron charge e_P, so we can write:

α = e_P²

You’ll say: yes, so what? Well… I am pretty sure that, if you’ve ever seen a formula for α, it’s surely not this simple j = –e_P or α = e_P² formula. What you’ve seen, most likely, is one or more of the following expressions below:
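Written out in conventional notation (a sketch, matched to the order in which the constants are discussed next: ε₀, μ₀, k_e, R_K, and finally r_e and m_e), these identities are:

```latex
\alpha \;=\; \frac{e^2}{4\pi\varepsilon_0\hbar c}
       \;=\; \frac{\mu_0 e^2 c}{4\pi\hbar}
       \;=\; \frac{k_e e^2}{\hbar c}
       \;=\; \frac{c\,\mu_0}{2R_K}
       \;=\; \frac{r_e m_e c}{\hbar}
```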

That’s a pretty impressive collection of physical constants, isn’t it? 🙂 They’re all different but, somehow, when we combine them in one or the other ratio (we have not less than five different expressions here (each identity is a separate expression), and I could give you a few more!), we get the very same number: α. Now that is what I call strange. Truly strange. Incomprehensibly weird!

You’ll say… Well… Those constants must all be related… Of course! That’s exactly the point I am making here. They are, but look how different they are: m_e measures mass, r_e measures distance, e is a charge, and so these are all very different numbers with very different dimensions. Yet, somehow, they are all related through this α number. Frankly, I do not know of any other expression that better illustrates some kind of underlying unity in Nature than the one with those five identities above.

Let’s have a closer look at those constants. You know most of them already. The only constants you may not have seen before are μ₀, R_K and, perhaps, r_e as well as m_e. However, these can easily be defined as some simple function of the constants that you did see before, so let me quickly do that:

The μ₀ constant is the so-called magnetic constant. It’s something similar to ε₀ and it’s referred to as the magnetic permeability of the vacuum. So it’s just like the (electric) permittivity of the vacuum (i.e. the electric constant ε₀) and the only reason why this blog hasn’t mentioned this constant before is because I haven’t really discussed magnetic fields so far. I only talked about the electric field vector. In any case, you know that the electric and magnetic force are part and parcel of the same phenomenon (i.e. the electromagnetic interaction between charged particles) and, hence, they are closely related. To be precise, μ₀ε₀ = 1/c² = c⁻². So that shows the first and second expression for α are, effectively, fully equivalent. [Just in case you’d doubt that μ₀ε₀ = 1/c², let me give you the values: μ₀ = 4π·10⁻⁷ N/A², and ε₀ = (1/4π·c²)·10⁷ C²/N·m². Just plug them in, and you’ll see it’s bang on. Moreover, note that the ampere (A) unit is equal to the coulomb per second unit (C/s), so even the units come out alright. 🙂 Of course they do!]
The k_e constant is the Coulomb constant and, from its definition k_e = 1/4πε₀, it’s easy to see how those two expressions are, in turn, equivalent with the third expression for α.
The R_K constant is the so-called von Klitzing constant. Huh? Yes. I know. I am pretty sure you’ve never ever heard of that one before. Don’t worry about it. It’s, quite simply, equal to R_K = h/e². Hence, substituting (and don’t forget that h = 2πħ) will demonstrate the equivalence of the fourth expression for α.
Finally, the r_e factor is the classical electron radius, which is usually written as a function of m_e, i.e. the electron mass: r_e = e²/4πε₀m_e·c². Note that this also implies that r_e·m_e = e²/4πε₀c². In words: the product of the electron mass and the electron radius is equal to some constant involving the electron charge (e), the electric constant (ε₀), and c (the speed of light).
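The relations above can be turned into a small numerical check. The Python sketch below starts from CODATA 2018 values for e, h, c, μ₀ and m_e, derives ε₀, k_e, R_K and r_e from the definitions just listed, and evaluates the five expressions for α. Because those four constants are computed from the definitions, the agreement confirms the relations rather than independent measurements:

```python
import math

# CODATA 2018 values (SI)
e = 1.602176634e-19       # elementary charge (C)
h = 6.62607015e-34        # Planck constant (J·s)
hbar = h / (2 * math.pi)  # reduced Planck constant (J·s)
c = 299_792_458.0         # speed of light (m/s)
mu0 = 1.25663706212e-6    # magnetic constant μ₀ (N/A²)
m_e = 9.1093837015e-31    # electron mass (kg)

# Derived from the definitions in the text
eps0 = 1 / (mu0 * c**2)                         # from μ₀ε₀ = 1/c²
k_e = 1 / (4 * math.pi * eps0)                  # Coulomb constant
R_K = h / e**2                                  # von Klitzing constant
r_e = e**2 / (4 * math.pi * eps0 * m_e * c**2)  # classical electron radius

alphas = [
    e**2 / (4 * math.pi * eps0 * hbar * c),  # with ε₀
    mu0 * e**2 * c / (4 * math.pi * hbar),   # with μ₀
    k_e * e**2 / (hbar * c),                 # with k_e
    c * mu0 / (2 * R_K),                     # with R_K
    r_e * m_e * c / hbar,                    # with r_e and m_e
]
for a in alphas:
    print(f"alpha = {a:.10e}, 1/alpha = {1 / a:.6f}")
```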

I am sure you’re under some kind of ‘formula shock’ now. But you should just take a deep breath and read on. The point to note is that all these very different things are all related through α.

So, again, what is that α really? Well… A strange number indeed. It’s dimensionless (so we don’t measure in kg, m/s, eV·s or whatever) and it pops up everywhere. [Of course, you’ll say: “What’s everywhere? This is the first time I‘ve heard of it!” :-)]

Well… Let me start by explaining the term itself. The fine structure in the name refers to the splitting of the spectral lines of atoms. That’s a very fine structure indeed. 🙂 We also have a so-called hyperfine structure. Both are illustrated below for the hydrogen atom. The numbers n, J, I, and F are quantum numbers used in the quantum-mechanical explanation of the emission spectrum, which is also depicted below, but note that the illustration gives you the so-called Balmer series only, i.e. the colors in the visible light spectrum (there are many more ‘colors’ in the high-energy ultraviolet and the low-energy infrared range).

To be precise: (1) n is the principal quantum number: here it takes the values 1 or 2, and we could say these are the principal shells; (2) the S, P, D,… orbitals (which are usually written in lower case: s, p, d, f, g, h and i) correspond to the (orbital) angular momentum quantum number l = 0, 1, 2,…, so we could say it’s the subshell; (3) the J values correspond to the total angular momentum quantum number, which combines the orbital angular momentum l with the electron’s spin (for a single electron, J = l ± 1/2); (4) the fourth quantum number is the spin angular momentum s. I’ve copied another diagram below so you see how it works, more or less, that is.

Now, our fine-structure constant is related to these quantum numbers. How exactly is a bit of a long story, and so I’ll just copy Wikipedia’s summary on this: “The gross structure of line spectra is the line spectra predicted by the quantum mechanics of non-relativistic electrons with no spin. For a hydrogenic atom, the gross structure energy levels only depend on the principal quantum number n. However, a more accurate model takes into account relativistic and spin effects, which break the degeneracy of the energy levels and split the spectral lines. The scale of the fine structure splitting relative to the gross structure energies is on the order of (Zα)², where Z is the atomic number and α is the fine-structure constant.” There you go. You’ll say: so what? Well… Nothing. If you aren’t amazed by that, you should stop reading this.

It is an ‘amazing’ number, indeed, and, hence, it does qualify for being “one of the greatest damn mysteries of physics”, as Feynman and/or Leighton put it. Having said that, I would not go as far as to write that it’s “a magic number that comes to us with no understanding by man.” In fact, I think Feynman/Leighton could have done a much better job when explaining what it’s all about. So, yes, I hope to do better than Leighton here and, as he’s still alive, I actually hope he reads this. 🙂

The point is: α is not the only weird number. What’s particular about it, as a physical constant, is that it’s dimensionless, because it relates a number of other physical constants in such a way that the units fall away. Having said that, the Planck or Boltzmann constant are at least as weird.

So… What is this all about? Well… You’ve probably heard about the so-called fine-tuning problem in physics and, if you’re like me, your first reaction will be to associate fine-tuning with fine-structure. However, the two terms have nothing in common, except for four letters. 🙂 OK. Well… I am exaggerating here. The two terms are actually related, to some extent at least, but let me explain how.

The term fine-tuning refers to the fact that all the parameters or constants in the so-called Standard Model of physics are, indeed, all related to each other in the way they are. We can’t sort of just turn the knob of one and change it, because everything falls apart then. So, in essence, the fine-tuning problem in physics is more like a philosophical question: why is the value of all these physical constants and parameters exactly what it is? So it’s like asking: could we change some of the ‘constants’ and still end up with the world we’re living in? Or, if it would be some different world, what would it look like? What if c was some other number? What if k_e or ε₀ was some other number? In short, and in light of those expressions for α, we may rephrase the question as: why is α what it is?

Of course, that’s a question one shouldn’t try to answer before answering some other, more fundamental, question: how many degrees of freedom are there really? Indeed, we just saw that k_e and ε₀ are intimately related through some equation, and other constants and parameters are related too. So the question is like: what are the ‘dependent’ and the ‘independent’ variables in this so-called Standard Model?

There is no easy answer to that question. In fact, one of the reasons why I find physics so fascinating is that one cannot easily answer such questions. There are the obvious relationships, of course. For example, the k_e = 1/4πε₀ relationship, and the context in which they are used (Coulomb’s Law) does, indeed, strongly suggest that both constants are actually part and parcel of the same thing. Identical, I’d say. Likewise, the μ₀ε₀ = 1/c² relation also suggests there’s only one degree of freedom here, just like there’s only one degree of freedom in that ω/k = c relationship (if we set a value for ω, we have k, and vice versa). But… Well… I am not quite sure how to phrase this, but… What physical constants could be ‘variables’ indeed?

It’s pretty obvious that the various formulas for α cannot answer that question: you could stare at them for days and weeks and months and years really, but I’d suggest you use your time to read more of Feynman’s real Lectures instead. 🙂 One point that may help to come to terms with this question – to some extent, at least – is what I casually mentioned above already: the fine-structure constant is equal to the square of the electron charge expressed in Planck units: α = e_P².

Now, that’s very remarkable because Planck units are some kind of ‘natural units’ indeed (for the detail, see my previous post: among other things, it explains what these Planck units really are) and, therefore, it is quite tempting to think that we’ve actually got only one degree of freedom here: α itself. All the rest should follow from it.

[…]

It should… But… Does it?

The answer is: yes and no. To be frank, it’s more no than yes because, as I noted a couple of times already, the fine-structure constant relates a lot of stuff but it’s surely not the only significant number in the Universe. For starters, I said that our E(A to B) formula has two ‘variables’:

We have that complex number j, which, as mentioned, is equal to the electron charge expressed in Planck units. [In case you wonder why –e_P ≈ –0.08542454 is said to be an amplitude, i.e. a complex number or an ‘arrow’… Well… Complex numbers include the real numbers and, hence, –0.08542454 is both real and complex. When combining ‘arrows’ or, to be precise, when multiplying some complex number with –0.08542454, we will (a) shrink the original arrow to about 8.5% of its original value (8.542454% to be precise) and (b) rotate it over an angle of plus or minus 180 degrees. In other words, we’ll reverse its direction. Hence, using Euler’s notation for complex numbers, we can write: –1 = e^(iπ) = e^(–iπ) and, hence, –0.085 = 0.085·e^(iπ) = 0.085·e^(–iπ). So, in short, yes, j is a complex number, or an ‘arrow’, if you prefer that term.]
We also have some real number n in the E(A to B) formula. So what’s the n? Well… Believe it or not, it’s the electron mass! Isn’t that amazing?
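The ‘shrink and reverse’ behavior of j is easy to see with Python’s built-in complex numbers. A minimal sketch; the arrow 0.5 + 0.5i is an arbitrary example, and the name `j_number` is used only to avoid confusion with Python’s `1j` literal:

```python
import cmath
import math

j_number = -0.08542454   # junction number j, a negative real number
arrow = 0.5 + 0.5j       # an arbitrary 'arrow' (complex amplitude)

j_polar = abs(j_number) * cmath.exp(1j * math.pi)  # |j|·e^(iπ): the same number in polar form
result = j_number * arrow

shrink = abs(result) / abs(arrow)                # about 0.0854: the length shrinks to ~8.5%
turn = cmath.phase(result) - cmath.phase(arrow)  # rotation angle in radians
print(shrink, math.degrees(turn))                # rotation of ±180 degrees (up to rounding)
```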

You’ll say: “Well… Hmm… I suppose so.” But then you may – and actually should – also wonder: the electron mass? In what units? Planck units again? And are we talking relativistic mass (i.e. its total mass, including the equivalent mass of its kinetic energy) or its rest mass only? And we were talking α here, so can we relate it to α too, just like the electron charge?

These are all very good questions. Let’s start with the second one. We’re talking rather slow-moving electrons here, so the relativistic mass (m) and its rest mass (m₀) are more or less the same. Indeed, the Lorentz factor γ in the m = γm₀ equation is very close to 1 for electrons moving at their typical speed. So… Well… That question doesn’t matter very much. Really? Yes. OK. Because you’re doubting, I’ll quickly show it to you. What is their ‘typical’ speed?

We know we shouldn’t attach too much importance to the concept of an electron in orbit around some nucleus (we know it’s not like some planet orbiting around some star) and, hence, to the concept of speed or velocity (velocity is speed with direction) when discussing an electron in an atom. The concept of momentum (i.e. velocity combined with mass or energy) is much more relevant. There’s a very easy mathematical relationship that gives us some clue here: the Uncertainty Principle. In fact, we’ll use the Uncertainty Principle to relate the momentum of an electron (p) to the so-called Bohr radius r (think of it as the size of a hydrogen atom) as follows: p ≈ ħ/r. [I’ll come back to this in a moment, and show you why this makes sense.]

Now we also know its kinetic energy (K.E.) is mv²/2, which we can write as p²/2m. Substituting our p ≈ ħ/r conjecture, we get K.E. = mv²/2 = ħ²/2mr². This is equivalent to m²v² = ħ²/r² (just multiply both sides by 2m). From that, we get v = ħ/mr. Now, one of the many relations we can derive from the formulas for the fine-structure constant is r_e = α²r. [I haven’t showed you that yet, but I will shortly. It’s a really amazing expression. However, as for now, just accept it as a simple formula for interim use in this digression.] Hence, r = r_e/α². The r_e factor in this expression is the so-called classical electron radius. So we can now write v = ħα²/mr_e. Let’s now throw c in: v/c = α²ħ/mcr_e. However, from that fifth expression for α (α = r_e·m·c/ħ), we know that ħ/mcr_e = 1/α, so we get v/c = α. We have another amazing result here: the v/c ratio for an electron (i.e. its speed expressed as a fraction of the speed of light) is equal to that fine-structure constant α. So that’s about 1/137, so that’s less than 1% of the speed of light. Now… I’ll leave it to you to calculate the Lorentz factor γ but… Well… It’s obvious that it will be very close to 1. 🙂 Hence, the electron’s speed – however we want to visualize that – doesn’t matter much indeed, so we should not worry about relativistic corrections in the formulas.
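For those who do want to calculate γ: a minimal Python sketch that computes v = ħ/(m·r) with r set to the Bohr radius (CODATA 2018 SI values), compares v/c with α, and evaluates the Lorentz factor:

```python
import math

# CODATA 2018 values (SI)
hbar = 1.054571817e-34   # reduced Planck constant (J·s)
m_e = 9.1093837015e-31   # electron mass (kg)
c = 299_792_458.0        # speed of light (m/s)
a0 = 5.29177210903e-11   # Bohr radius r (m)
alpha = 7.2973525693e-3  # fine-structure constant

v = hbar / (m_e * a0)                # v = ħ/(m·r), from p ≈ ħ/r
ratio = v / c                        # should come out equal to α
gamma = 1 / math.sqrt(1 - ratio**2)  # Lorentz factor

print(f"v = {v:.4e} m/s, v/c = {ratio:.6e}, alpha = {alpha:.6e}")
print(f"gamma - 1 = {gamma - 1:.2e}")  # about 2.7e-5: negligible
```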

Let’s now look at the question in regard to the Planck units. If you know nothing at all about them, I would advise you to read what I wrote about them in my previous post. Let me just note we get those Planck units by equating no fewer than five fundamental physical constants to 1, notably (1) the speed of light, (2) Planck’s (reduced) constant, (3) Boltzmann’s constant, (4) Coulomb’s constant and (5) Newton’s constant (i.e. the gravitational constant). Hence, we have a set of five equations here (c = ħ = k_B = k_e = G = 1), and so we can solve that to get the five base Planck units, i.e. the Planck length unit, the Planck time unit, the Planck mass unit, the Planck charge unit and, finally (oft forgotten), the Planck temperature unit. Other units, like the Planck energy unit, follow from these. Of course, you should note that all mass and energy units are directly related because of the mass-energy equivalence relation E = mc², which simplifies to E = m if c is equated to 1. [I could also say something about the relation between temperature and (kinetic) energy, but I won’t, as it would only further confuse you.]
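As a sketch of what ‘solving’ those equations gives, the Python snippet below evaluates the standard closed-form expressions for the Planck units from CODATA 2018 SI values. The Planck charge uses k_e = 1/(4πε₀) set to 1, as in the text:

```python
import math

# CODATA 2018 values (SI)
c = 299_792_458.0        # speed of light (m/s)
hbar = 1.054571817e-34   # reduced Planck constant (J·s)
G = 6.67430e-11          # gravitational constant (m³/(kg·s²))
k_B = 1.380649e-23       # Boltzmann constant (J/K)
eps0 = 8.8541878128e-12  # electric constant (F/m); k_e = 1/(4πε₀)

l_P = math.sqrt(hbar * G / c**3)                # Planck length (m)
t_P = l_P / c                                   # Planck time (s)
m_P = math.sqrt(hbar * c / G)                   # Planck mass (kg)
E_P = m_P * c**2                                # Planck energy (J), derived from E = mc²
q_P = math.sqrt(4 * math.pi * eps0 * hbar * c)  # Planck charge (C)
T_P = E_P / k_B                                 # Planck temperature (K)

for name, value, unit in [("length", l_P, "m"), ("time", t_P, "s"), ("mass", m_P, "kg"),
                          ("energy", E_P, "J"), ("charge", q_P, "C"), ("temperature", T_P, "K")]:
    print(f"Planck {name}: {value:.4e} {unit}")
```

The energy line comes out at roughly 2×10⁹ J, which is the ‘2 gigajoules’ mentioned below.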

Now, you may or may not remember that the Planck time and length units are unimaginably small, but that the Planck mass unit is actually quite sizable—at the atomic scale, that is. Indeed, the Planck mass is something huge, like the mass of an eyebrow hair, or a flea egg. Is that huge? Yes. Because if you’d want to pack it in a Planck-sized particle, it would make for a tiny black hole. 🙂 No kidding. That’s the physical significance of the Planck mass and the Planck length and, yes, it’s weird. 🙂

Let me give you some values. First, the Planck mass itself: it’s about 2.1765×10⁻⁸ kg. Again, if you think that’s tiny, think again. From the E = mc² equivalence relationship, we get that this is equivalent to 2 gigajoules, approximately. Just to give an idea, that’s like the monthly electricity consumption of an average American family. So that’s huge indeed! 🙂 [Many people think that nuclear energy involves the conversion of mass into energy, but the story is actually more complicated than that. In any case… I need to move on.]

Let me now give you the electron mass expressed in the Planck mass unit:

Measured in our old-fashioned super-sized SI kilogram unit, the electron mass is m_e = 9.1×10⁻³¹ kg.
The Planck mass is m_P = 2.1765×10⁻⁸ kg.
Hence, the electron mass expressed in Planck units is m_{e_P} = m_e/m_P = (9.1×10⁻³¹ kg)/(2.1765×10⁻⁸ kg) = 4.181×10⁻²³.

We can, once again, write that as some function of the fine-structure constant. More specifically, we can write:

m_{e_P} = α/r_{e_P} = α/(α²·r_P) = 1/(α·r_P)

So… Well… Yes: yet another amazing formula involving α.

In this formula, we have r_{e_P} and r_P, which are the (classical) electron radius and the Bohr radius expressed in Planck (length) units respectively. So you can see what’s going on here: we have all kinds of numbers here expressed in Planck units: a charge, a radius, a mass,… And we can relate all of them to the fine-structure constant.

Why? Who knows? I don’t. As Leighton puts it: that’s just the way “God pushed His pencil.” 🙂

Note that the beauty of natural units ensures that we get the same number for the (equivalent) energy of an electron. Indeed, from the E = mc² relation, we know the mass of an electron can also be written as 0.511 MeV/c². Hence, the equivalent energy is 0.511 MeV (so that’s, quite simply, the same number but without the 1/c² factor). Now, the Planck energy E_P (in eV) is 1.22×10²⁸ eV, so we get E_{e_P} = E_e/E_P = (0.511×10⁶ eV)/(1.22×10²⁸ eV) ≈ 4.19×10⁻²³. Apart from rounding, that’s the same as the electron mass expressed in Planck units: with more precise values for the electron mass and the Planck energy, both ratios come out at about 4.185×10⁻²³. Isn’t that nice? 🙂
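
The same comparison in energy terms, as a small sketch: the rounded values quoted above next to more precise ones. The precise electron rest energy, Planck energy and masses are CODATA-based figures I am assuming here.

```python
# Electron rest energy and Planck energy, both in eV
E_e_rounded, E_P_rounded = 0.511e6, 1.22e28            # rounded values from the text
E_e_precise, E_P_precise = 0.51099895e6, 1.220890e28   # assumed CODATA 2018 values

# Mass ratio in kilograms (assumed CODATA 2018 electron and Planck masses)
m_e_kg, m_P_kg = 9.1093837015e-31, 2.176434e-8

print(f"E_e/E_P (rounded) = {E_e_rounded / E_P_rounded:.4e}")
print(f"E_e/E_P (precise) = {E_e_precise / E_P_precise:.4e}")
print(f"m_e/m_P (precise) = {m_e_kg / m_P_kg:.4e}")
```

The last two lines agree, which is the point: in natural units, the electron's mass and its rest energy are one and the same number.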

Now, are all these numbers dimensionless, just like α? The answer to that question is complicated. Yes, and… Well… No:

Yes. They’re dimensionless because they measure something in natural units, i.e. Planck units, and, hence, that’s some kind of relative measure indeed so… Well… Yes, dimensionless.
No. They’re not dimensionless because they do measure something, like a charge, a length, or a mass, and when you choose some kind of relative measure, you still need to define some gauge, i.e. some kind of standard measure. So there’s some ‘dimension’ involved there.

So what’s the final answer? Well… The Planck units are not dimensionless. All we can say is that they are closely related, physically. I should also add that we’ll use the electron charge and mass (expressed in Planck units) in our amplitude calculations as simple (dimensionless) numbers whose magnitudes lie between zero and one. So the correct answer to the question as to whether these numbers have any dimension is: expressing some quantities in Planck units sort of normalizes them, so we can use them directly in dimensionless calculations, like when we multiply and add amplitudes.

Hmm… Well… I can imagine you’re not very happy with this answer but it’s the best I can do. Sorry. I’ll let you further ponder that question. I need to move on.

Note that that 4.181×10⁻²³ is still a very small number (22 zeroes after the decimal point!), even if it’s like 46 million times larger than the electron mass measured in our conventional SI unit (i.e. 9.1×10⁻³¹ kg). Does such a small number make any sense? The answer is: yes, it does. When we finally start discussing that E(A to B) formula (I’ll give it to you in a moment), you’ll see that a very small number for n makes a lot of sense.

Before diving into it all, let’s first see if that formula for that alpha, that fine-structure constant, still makes sense with m_e expressed in Planck units. Just to make sure. 🙂 To do that, we need to use the fifth (last) expression for α, i.e. the one with r_e in it. Now, in my previous post, I also gave some formula for r_e: r_e = e²/4πε₀m_ec², which we can re-write as r_e·m_e = e²/4πε₀c². If we substitute that expression for r_e·m_e in the formula for α, we can calculate α from the electron charge, which indicates both the electron radius and its mass are not some random God-given variable, or “some magic number that comes to us with no understanding by man”, as Feynman – well… Leighton, I guess – puts it. No. They are magic numbers alright, one related to another through the equally ‘magic’ number α, but so I do feel we actually can create some understanding here.

At this point, I’ll digress once again, and insert some quick back-of-the-envelope argument from Feynman’s very serious Caltech Lectures on Physics, in which, as part of the introduction to quantum mechanics, he calculates the so-called Bohr radius from Planck’s (reduced) constant ħ. Let me quickly explain: the Bohr radius is, roughly speaking, the size of the simplest atom, i.e. an atom with one electron (so that’s hydrogen really). So it’s not the classical electron radius r_e. However, both are also related to that ‘magical number’ α. To be precise, if we write the Bohr radius as r, then r_e = α²r ≈ 0.000053… times r, which we can re-write as:

$$\alpha = \sqrt{r_e/r} = (r_e/r)^{1/2}$$

So that’s yet another amazing formula involving the fine-structure constant. In fact, it’s the formula I used as an ‘interim’ expression to calculate the relative speed of electrons. I just used it without any explanation there, but I am coming back to it here. Alpha again…

Just think about it for a while. In case you’d still doubt the magic of that number, let me write what we’ve discovered so far:

(1) α is the square of the electron charge expressed in Planck units: α = e_P².

(2) α is the square root of the ratio of (a) the classical electron radius and (b) the Bohr radius: α = √(r_e/r). You’ll see this more often written as r_e = α²r. Also note that this is an equation that does not depend on the units, in contrast to equation 1 (above), and 4, 5 and 6 (below), which require you to switch to Planck units. It’s the square root of a ratio and, hence, the units don’t matter. They fall away.

(3) α is the (relative) speed of an electron: α = v/c. [The relative speed is the speed as measured against the speed of light. Note that the ‘natural’ unit of speed in the Planck system of units is equal to c. Indeed, if you divide one Planck length by one Planck time unit, you get (1.616×10⁻³⁵ m)/(5.391×10⁻⁴⁴ s) ≈ 2.998×10⁸ m/s, i.e. c. However, this is another equation, just like (2), that does not depend on the units: we can express v and c in whatever unit we want, as long as we’re consistent and express both in the same units.]

(4) Finally – I’ll show you in a moment – α is also equal to the product of (a) the electron mass (which I’ll simply write as m_e here) and (b) the classical electron radius r_e (if both are expressed in Planck units): α = m_e·r_e. Now I think that’s, perhaps, the most amazing of all of the expressions for α. If you don’t think that’s amazing, I’d really suggest you stop trying to study physics. 🙂

Note that, from (2) and (4), we find that:

(5) The electron mass (in Planck units) is equal to m_e = α/r_e = α/(α²r) = 1/(αr). So that gives us an expression, using α once again, for the electron mass as a function of the Bohr radius r expressed in Planck units.

Finally, we can also substitute (1) in (5) to get:

(6) The electron mass (in Planck units) is equal to m_e = α/r_e = e_P²/r_e. Using the Bohr radius, we get m_e = 1/(αr) = 1/(e_P²r).
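
All of these expressions should give the same number. Here is a minimal Python sketch that checks (1) to (5), plus the remark in (3) that one Planck length per Planck time unit is c, assuming CODATA 2018 values for the SI constants. For (3), it takes v = ħ/(m_e·r), i.e. the rough p = mv ≈ ħ/r estimate used further below, with r the Bohr radius.

```python
import math

# Assumed CODATA 2018 values (SI units)
c = 299_792_458.0            # speed of light, m/s
hbar = 1.054571817e-34       # reduced Planck constant, J·s
G = 6.67430e-11              # Newton's constant, m³/(kg·s²)
eps0 = 8.8541878128e-12      # electric constant, F/m
e = 1.602176634e-19          # elementary charge, C
m_e = 9.1093837015e-31       # electron mass, kg

# Planck units
planck_length = math.sqrt(hbar * G / c**3)
planck_time = math.sqrt(hbar * G / c**5)
planck_mass = math.sqrt(hbar * c / G)
planck_charge = math.sqrt(4 * math.pi * eps0 * hbar * c)

# Bohr radius, classical electron radius, and the speed from p = m·v ≈ ħ/r
a_0 = 4 * math.pi * eps0 * hbar**2 / (m_e * e**2)
r_e = e**2 / (4 * math.pi * eps0 * m_e * c**2)
v = hbar / (m_e * a_0)

# The same quantities expressed in Planck units
e_P = e / planck_charge
m_eP = m_e / planck_mass
r_eP = r_e / planck_length
a_0P = a_0 / planck_length

alpha_forms = {
    "(1) e_P^2": e_P**2,
    "(2) sqrt(r_e/r)": math.sqrt(r_e / a_0),
    "(3) v/c": v / c,
    "(4) m_e*r_e": m_eP * r_eP,
    "(5) 1/(m_e*r)": 1 / (m_eP * a_0P),
}
for label, value in alpha_forms.items():
    print(f"{label:<16} {value:.10f}  (= 1/{1 / value:.3f})")
print(f"l_P/t_P = {planck_length / planck_time:.6e} m/s")
```

Each line should print 0.0072973… (about 1/137.036). The agreement is not a numerical coincidence: once the Bohr radius and the classical electron radius are written out in terms of e, ε₀, ħ, m_e and c, each expression reduces algebraically to e²/(4πε₀ħc).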

So… As you can see, this fine-structure constant really links ALL of the fundamental properties of the electron: its charge, its radius, its distance to the nucleus (i.e. the Bohr radius), its velocity, its mass (and, hence, its energy),… In short,

IT IS ALL IN ALPHA!

Now that should answer the question in regard to the degrees of freedom we have here, doesn’t it? It looks like we’ve got only one degree of freedom here. Indeed, if we’ve got some value for α, then we have the electron charge, and from the electron charge, we can calculate the Bohr radius r (as I will show below), and if we have r, we have m_e and r_e. And then we can also calculate v, which gives us its momentum (mv) and its kinetic energy (mv²/2). In short,

ALPHA GIVES US EVERYTHING!

Isn’t that amazing? Hmm… You should reserve your judgment as for now, and carefully go over all of the formulas above and verify my statement. If you do that, you’ll probably struggle to find the Bohr radius from the charge (i.e. from α). So let me show you how you do that, because it will also show you why you should, indeed, reserve your judgment. In other words, I’ll show you why alpha does NOT give us everything! The argument below will, finally, prove some of the formulas that I didn’t prove above. Let’s go for it:

1. If we assume that (a) an electron takes some space – which I’ll denote by r 🙂 – and (b) that it has some momentum p because of its mass m and its velocity v, then the ΔxΔp = ħ relation (i.e. the Uncertainty Principle in its roughest form) suggests that the order of magnitude of r and p should be related in the very same way. Hence, let’s just boldly write r ≈ ħ/p and see what we can do with that. So we equate Δx with r and Δp with p. As Feynman notes, this is really more like a ‘dimensional analysis’ (he obviously means something very ‘rough’ with that) and so we don’t care about factors like 2 or 1/2. [Indeed, note that the more precise formulation of the Uncertainty Principle is σ_x·σ_p ≥ ħ/2.] In fact, we didn’t even bother to define r very rigorously. We just don’t care about precise statements at this point. We’re only concerned about orders of magnitude. [If you’re appalled by the rather crude approach, I am sorry for that, but just try to go along with it.]

2. From our discussions on energy, we know that the kinetic energy is mv²/2, which we can write as p²/2m so we get rid of the velocity factor. [Why? Because we can’t really imagine what it is anyway. As I said a couple of times already, we shouldn’t think of electrons as planets orbiting around some star. That model doesn’t work.] So… What’s next? Well… Substituting our p ≈ ħ/r conjecture, we get K.E. = ħ²/2mr². So that’s a formula for the kinetic energy. Next is potential.

3. Unfortunately, the discussion on potential energy is a bit more complicated. You’ll probably remember that we had an easy and very comprehensible formula for the energy that’s needed (i.e. the work that needs to be done) to bring two charges together from a large distance (i.e. infinity). Indeed, we derived that formula directly from Coulomb’s Law (and Newton’s law of force) and it’s U = q₁q₂/4πε₀r₁₂. [If you think I am going too fast, sorry, please check for yourself by reading my other posts.] Now, we’re talking about the size of an atom here (as in my previous post), so one charge is the proton (+e) and the other is the electron (–e), so the potential energy is U = P.E. = –e²/4πε₀r, with r the ‘distance’ between the proton and the electron, so that’s the Bohr radius we’re looking for!

[In case you’re struggling a bit with those minus signs when talking potential energy – I am not ashamed to admit I did! – let me quickly help you here. It has to do with our reference point: the reference point for measuring potential energy is at infinity, and it’s zero there (that’s just our convention). Now, to separate the proton and the electron, we’d have to do quite a lot of work. To use an analogy: imagine we’re somewhere deep down in a cave, and we have to climb back to the zero level. You’ll agree that’s likely to involve some sweat, won’t you? Hence, the potential energy associated with us being down in the cave is negative. Likewise, if we write the potential energy between the proton and the electron as U(r), and the potential energy at the reference point as U(∞) = 0, then the work to be done to separate the charges, i.e. the potential difference U(∞) – U(r), will be positive. So U(∞) – U(r) = 0 – U(r) > 0 and, hence, U(r) < 0. If you still don’t ‘get’ this, think of the electron being in some (potential) well, i.e. below the zero level, and so its potential energy is less than zero. Huh? Sorry. I have to move on. :-)]

4. We can now write the total energy (which I’ll denote by E, but don’t confuse it with the electric field vector!) as

$$E = \text{K.E.} + \text{P.E.} = \frac{\hbar^2}{2mr^2} - \frac{e^2}{4\pi\varepsilon_0 r}$$

Now, the electron (whatever it is) is, obviously, in some kind of equilibrium state. Why is that obvious? Well… Otherwise our hydrogen atom wouldn’t or couldn’t exist. 🙂 Hence, it’s in some kind of energy ‘well’ indeed, at the bottom. Such an equilibrium point ‘at the bottom’ is characterized by its derivative (with respect to whatever variable) being equal to zero. Now, the only ‘variable’ here is r (all the other symbols are physical constants), so we have to solve for dE/dr = 0. Writing it all out yields:

$$\frac{dE}{dr} = -\frac{\hbar^2}{m r^3} + \frac{e^2}{4\pi\varepsilon_0 r^2} = 0 \iff r = \frac{4\pi\varepsilon_0\hbar^2}{m e^2}$$
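
To see that setting dE/dr = 0 really picks out the bottom of the energy well, here is a minimal sketch that finds the root of dE/dr by bisection and compares it with the closed-form result. The constants are assumed CODATA 2018 values, the function names are my own, and bisection is just one simple root-finding choice.

```python
import math

# Assumed CODATA 2018 values (SI units)
hbar = 1.054571817e-34        # J·s
m = 9.1093837015e-31          # electron mass, kg
e = 1.602176634e-19           # C
eps0 = 8.8541878128e-12       # F/m
k = 1 / (4 * math.pi * eps0)  # Coulomb's constant

def energy(r: float) -> float:
    """Rough total energy E(r) = ħ²/(2mr²) − e²/(4πε₀r)."""
    return hbar**2 / (2 * m * r**2) - k * e**2 / r

def d_energy(r: float) -> float:
    """Its derivative dE/dr = −ħ²/(mr³) + e²/(4πε₀r²)."""
    return -hbar**2 / (m * r**3) + k * e**2 / r**2

# dE/dr < 0 at 1 pm and > 0 at 1 nm, so the root lies in between: bisect it.
lo, hi = 1e-12, 1e-9
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if d_energy(mid) < 0:
        lo = mid
    else:
        hi = mid
r_numeric = 0.5 * (lo + hi)

r_formula = 4 * math.pi * eps0 * hbar**2 / (m * e**2)
print(f"bisection: r = {r_numeric:.6e} m")
print(f"formula:   r = {r_formula:.6e} m")
```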

You’ll say: so what? Well… We’ve got a nice formula for the Bohr radius here, and we got it in no time! 🙂 But the analysis was rough, so let’s check if it’s any good by putting the values in:

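$$r = \frac{4\pi\varepsilon_0\hbar^2}{m e^2}$$

$$= \frac{\left[1/(9\times10^{9})\ \mathrm{C^2/(N\,m^2)}\right]\cdot\left(1.055\times10^{-34}\ \mathrm{J\,s}\right)^2}{\left(9.1\times10^{-31}\ \mathrm{kg}\right)\cdot\left(1.6\times10^{-19}\ \mathrm{C}\right)^2}$$

$$\approx 53\times10^{-12}\ \mathrm{m} = 53\ \text{pico-meter (pm)}$$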

So what? Well… Double-check it on the Internet: the Bohr radius is, effectively, about 53 trillionths of a meter indeed! So we’re right on the spot!

[In case you wonder about the units, note that mass is a measure of inertia: one kg is the mass of an object which, subject to a force of 1 newton, will accelerate at the rate of 1 m/s per second. Hence, we write F = m·a, which is equivalent to m = F/a. Hence, the kg, as a unit, is equivalent to 1 N/(m/s²). If you make this substitution, we get r in the unit we want to see: (C²/N·m²)·(N²·m²·s²)/[(N·s²/m)·C²] = m.]

Moreover, if we take that value for r and put it in the (total) energy formula above, we’d find that the energy of the electron is –13.6 eV. [Don’t forget to convert from joule to electronvolt when doing the calculation!] Now you can check that on the Internet too: 13.6 eV is exactly the amount of energy that’s needed to ionize a hydrogen atom (i.e. the energy that’s needed to kick the electron out of that energy well)!
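
Here is a minimal sketch of that calculation, plugging in the same rounded values as above and, for comparison, more precise CODATA 2018 values (my assumption). The helper name `rough_hydrogen` is my own.

```python
def rough_hydrogen(hbar: float, m: float, e: float, k: float) -> tuple[float, float]:
    """Return (r, E) for the rough model E(r) = ħ²/(2mr²) − k·e²/r, with k = 1/(4πε₀)."""
    r = hbar**2 / (k * m * e**2)              # same as 4πε₀ħ²/(m e²)
    kinetic = hbar**2 / (2 * m * r**2)
    potential = -k * e**2 / r
    return r, kinetic + potential

J_PER_EV = 1.602176634e-19  # joule per electronvolt

# Rounded inputs, as in the calculation above
r_rounded, E_rounded = rough_hydrogen(1.055e-34, 9.1e-31, 1.6e-19, 9e9)
# More precise inputs (assumed CODATA 2018 values)
r_precise, E_precise = rough_hydrogen(1.054571817e-34, 9.1093837015e-31,
                                      1.602176634e-19, 8.9875517923e9)

for label, r, E in [("rounded", r_rounded, E_rounded), ("precise", r_precise, E_precise)]:
    print(f"{label}: r = {r * 1e12:.2f} pm, E = {E / J_PER_EV:.2f} eV")
```

With the rounded inputs this gives about 53.1 pm and −13.5 eV; the more precise inputs give about 52.9 pm and −13.6 eV. Note also that, at the minimum, the potential energy is exactly minus twice the kinetic energy, so the total energy is simply E = −ħ²/2mr², which is a quick way to double-check the result.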

Wow! Isn’t it great that such simple calculations yield such great results? 🙂 [Of course, you’ll note that the omission of the 1/2 factor in the Uncertainty Principle was quite strategic. :-)] Using the r = 4πε₀ħ²/me² formula for the Bohr radius, you can now easily check the r_e = α²r formula. You should find what we jotted down already: the classical electron radius is equal to r_e = e²/4πε₀m_ec². To be precise, r_e = (53×10⁻⁶)·(53×10⁻¹² m) = 2.8×10⁻¹⁵ m. Now that’s again something you should check on the Internet. Guess what? […] It’s right on the spot again. 🙂

We can now also check that α = m·r_e formula: α = m·r_e = 4.181×10⁻²³ times… Hey! Wait! We have to express r_e in Planck units as well, of course! Now, (2.81794×10⁻¹⁵ m)/(1.616×10⁻³⁵ m) ≈ 1.7438×10²⁰. So now we get 4.181×10⁻²³ times 1.7438×10²⁰ = 7.29×10⁻³ = 0.00729 ≈ 1/137. Bingo! We got the magic number once again. 🙂

So… Well… Doesn’t that confirm we actually do have it all with α?

Well… Yes and no… First, you should note that I had to use ħ in that calculation of the Bohr radius. Moreover, the other physical constants (most notably c and the Coulomb constant) were actually there as well, ‘in the background’ so to speak, because one needs them to derive the formulas we used above. And then we have the equations themselves, of course, most notably that Uncertainty Principle… So… Well…

It’s not like God gave us one number only (α) and that all the rest flows out of it. We have a whole bunch of ‘fundamental’ relations and ‘fundamental’ constants here.

Having said that, none of this diminishes the magic of alpha.

Hmm… Now you’ll wonder: how many? How many constants do we need in all of physics?

Well… I’d say, you should not only ask about the constants: you should also ask about the equations: how many equations do we need in all of physics? [Just for the record, I had to smile when the Hawking of the movie says that he’s actually looking for one formula that sums up all of physics. Frankly, that’s a nonsensical statement. Hence, I think the real Hawking never said anything like that. Or, if he did, that it was one of those statements one needs to interpret very carefully.]

But let’s look at a few constants indeed. For example, if we have c, h and α, then we can calculate the electric charge e and, hence, the electric constant ε₀ = e²/(2αhc). From that, we get Coulomb’s constant k_e, because k_e is defined as 1/4πε₀… But…

Hey! Wait a minute! How do we know that k_e = 1/4πε₀? Well… From experiment. But… Yes? That means 1/4π is some fundamental proportionality coefficient too, isn’t it?

Wow! You’re smart. That’s a good and valid remark. In fact, we use the so-called reduced Planck constant ħ in a number of calculations, and so that involves a 2π factor too (ħ = h/2π). Hence… Well… Yes, perhaps we should consider 2π as some fundamental constant too! And, then, well… Now that I think of it, there are a few other mathematical constants out there, like Euler’s number e, for example, which we use in complex exponentials.

?!?

I am joking, right? I am not saying that 2π and Euler’s number are fundamental ‘physical’ constants, am I? [Note that it’s a bit of a nuisance we’re also using the e symbol for Euler’s number, but so we’re not talking the electron charge here: we’re talking that 2.71828…etc number that’s used in so-called ‘natural’ exponentials and logarithms.]

Well… Yes and no. They’re mathematical constants indeed, rather than physical, but… Well… I hope you get my point. What I want to show here is that it’s quite hard to say what’s fundamental and what isn’t. We can actually pick and choose a bit among all those constants and all those equations. As one physicist puts it: it depends on how we slice it. The one thing we know for sure is that a great many things are related, in a physical way (α connects all of the fundamental properties of the electron, for example) and/or in a mathematical way (2π connects not only the circumference of the unit circle with the radius but quite a few other constants as well!), but… Well… What to say? It’s a tough discussion and I am not smart enough to give you an unambiguous answer. From what I gather on the Internet, when looking at the whole Standard Model (including the strong force, the weak force and the Higgs field), we’ve got a few dozen physical ‘fundamental’ constants, and then a few mathematical ones as well.
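
Going back to the relation between ε₀ and α mentioned above: here is a small sketch that recovers ε₀ = e²/(2αhc) and Coulomb’s constant k_e = 1/4πε₀ from e, h, c and α, using assumed CODATA 2018 values. It only shows that these relations are consistent with each other; it does not settle which constants are ‘fundamental’.

```python
import math

# Assumed CODATA 2018 values
e = 1.602176634e-19          # elementary charge, C
h = 6.62607015e-34           # Planck's constant, J·s
c = 299_792_458.0            # speed of light, m/s
alpha = 7.2973525693e-3      # fine-structure constant

eps0 = e**2 / (2 * alpha * h * c)    # electric constant, F/m
k_e = 1 / (4 * math.pi * eps0)       # Coulomb's constant, N·m²/C²
print(f"eps0 = {eps0:.10e} F/m")
print(f"k_e  = {k_e:.10e} N·m²/C²")
```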

That’s a lot, you’ll say. Yes. At the same time, it’s not an awful lot. Whatever number it is, it does raise a very fundamental question: why are they what they are? That brings us back to that ‘fine-tuning’ problem. Now, I can’t make this post too long (it’s way too long already), so let me just conclude this discussion by copying Wikipedia on that question, because what it has on this topic is not so bad:

“Some physicists have explored the notion that if the physical constants had sufficiently different values, our Universe would be so radically different that intelligent life would probably not have emerged, and that our Universe therefore seems to be fine-tuned for intelligent life. The anthropic principle states a logical truism: the fact of our existence as intelligent beings who can measure physical constants requires those constants to be such that beings like us can exist.“

I like this. But the article then adds the following, which I do not like so much, because I think it’s a bit too ‘frivolous’:

“There are a variety of interpretations of the constants’ values, including that of a divine creator (the apparent fine-tuning is actual and intentional), or that ours is one universe of many in a multiverse (e.g. the many-worlds interpretation of quantum mechanics), or even that, if information is an innate property of the universe and logically inseparable from consciousness, a universe without the capacity for conscious beings cannot exist.”

Hmm… As said, I am quite happy with the logical truism: we are there because alpha (and a whole range of other stuff) is what it is, and we can measure alpha (and a whole range of other stuff) as what it is, because… Well… Because we’re here. Full stop. As for the ‘interpretations’, I’ll let you think about that for yourself. 🙂

I need to get back to the lesson. Indeed, this was just a ‘digression’. My post was about the three fundamental events or actions in quantum electrodynamics, and so I was talking about that E(A to B) formula. However, I had to do that digression on alpha to ensure you understand what I want to write about that. So let me now get back to it. End of digression. 🙂

The E(A to B) formula

Indeed, I must assume that, with all these digressions, you are truly despairing now. Don’t. We’re there! We’re finally ready for the E(A to B) formula! Let’s go for it.

We’ve now got those two numbers measuring the electron charge and the electron mass in Planck units respectively. They’re fundamental indeed and so let’s loosen up on notation and just write them as e and m respectively. Let me recap:

1. The value of e is approximately –0.08542455, and it corresponds to the so-called junction number j, which is the amplitude for an electron-photon coupling. When multiplying it with another amplitude (to find the amplitude for an event consisting of two sub-events, for example), it corresponds to a ‘shrink’ to less than one-tenth (something like 8.5% indeed, corresponding to the magnitude of e) and a ‘rotation’ (or a ‘turn’) over 180 degrees, as mentioned above.

Please note what’s going on here: we have a physical quantity, the electron charge (expressed in Planck units), and we use it in a quantum-mechanical calculation as a dimensionless (complex) number, i.e. as an amplitude. So… Well… That’s what physicists mean when they say that the charge of some particle (usually the electric charge but, in quantum chromodynamics, it will be the ‘color’ charge of a quark) is a ‘coupling constant’.
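
The ‘shrink and turn’ picture of multiplying by j can be made concrete with Python’s complex numbers. A minimal sketch, in which the arrow being multiplied (length 0.5 at 30°) is a hypothetical example and α is the assumed CODATA 2018 value:

```python
import cmath
import math

alpha = 7.2973525693e-3    # assumed CODATA 2018 value
j = -math.sqrt(alpha)      # junction number: the electron charge in Planck units (negative)

arrow = cmath.rect(0.5, math.radians(30))   # hypothetical amplitude: length 0.5, angle 30°
new_arrow = j * arrow

shrink = abs(new_arrow) / abs(arrow)        # ≈ 0.0854, i.e. about 8.5%
turn = math.degrees(cmath.phase(new_arrow) - cmath.phase(arrow)) % 360  # ≈ 180°
print(f"j = {j:.8f}, shrink = {shrink:.4f}, turn = {turn:.1f} degrees")
```

Whatever arrow you start from, the result is the same: a shrink to about 8.5% of the original length and a turn over 180 degrees.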

2. We also have m, the electron mass, and we’ll use it in the same way, i.e. as some dimensionless amplitude. As compared to j, it is a very tiny number: approximately 4.181×10⁻²³. So if you look at it as an amplitude, indeed, then it corresponds to an enormous ‘shrink’ (but no turn) of the amplitude(s) that we’ll be combining it with.

So… Well… How do we do it?

Well… At this point, Leighton goes a bit off-track. Just a little bit. 🙂 From what he writes, it’s obvious that he assumes the frequency (or, what amounts to the same, the de Broglie wavelength) of an electron is just like the frequency of a photon. Frankly, I just can’t imagine why and how Feynman let this happen. It’s wrong. Plain wrong. As I mentioned in my introduction already, an electron traveling through space is not like a photon traveling through space.

For starters, an electron is much slower (because it’s a matter-particle: hence, it’s got mass). Secondly, the de Broglie wavelength and/or frequency of an electron is not like that of a photon. For example, if we take an electron and a photon having the same energy, let’s say 1 eV (that corresponds to infrared light), then the de Broglie wavelength of the electron will be 1.23 nano-meter (i.e. 1.23 billionths of a meter). Now that’s about one thousand times smaller than the wavelength of our 1 eV photon, which is about 1240 nm. You’ll say: how is that possible? If they have the same energy, then the f = E/h and ν = E/h should give the same frequency and, hence, the same wavelength, no?

Well… No! Not at all! Because an electron, unlike the photon, has a rest mass indeed – measured as not less than 0.511 MeV/c², to be precise (note the rather particular MeV/c² unit: it’s from the E = mc² formula) – one should use a different energy value! Indeed, we should include the rest mass energy, which is 0.511 MeV. So, almost all of the energy here is rest mass energy! There’s also another complication. For the photon, there is an easy relationship between the wavelength and the frequency: it has no mass and, hence, all its energy is kinetic, or movement so to say, and so we can use that ν = E/h relationship to calculate its frequency ν: it’s equal to ν = E/h = (1 eV)/(4.13567×10⁻¹⁵ eV·s) ≈ 0.242×10¹⁵ Hz = 242 tera-hertz (1 THz = 10¹² oscillations per second). Now, knowing that light travels at the speed of light, we can check the result by calculating the wavelength using the λ = c/ν relation. Let’s do it: (2.998×10⁸ m/s)/(242×10¹² Hz) ≈ 1240 nm. So… Yes, done!
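
The photon numbers above, and the 1.23 nm electron wavelength mentioned earlier, can be reproduced with a few lines of Python. A minimal sketch, assuming CODATA 2018 constants and treating the 1 eV electron non-relativistically (all of its 1 eV is kinetic energy, so p = √(2mK) and λ = h/p):

```python
import math

# Assumed CODATA 2018 values
h = 6.62607015e-34           # Planck's constant, J·s
h_eVs = 4.135667696e-15      # Planck's constant, eV·s
c = 299_792_458.0            # speed of light, m/s
m_e = 9.1093837015e-31       # electron mass, kg
J_PER_EV = 1.602176634e-19   # joule per electronvolt

E = 1.0  # energy in eV

# Photon: ν = E/h, then λ = c/ν
nu_photon = E / h_eVs
lambda_photon = c / nu_photon

# Electron with E as (non-relativistic) kinetic energy: p = sqrt(2mK), λ = h/p
p_electron = math.sqrt(2 * m_e * E * J_PER_EV)
lambda_electron = h / p_electron

print(f"photon:   nu = {nu_photon / 1e12:.0f} THz, lambda = {lambda_photon * 1e9:.0f} nm")
print(f"electron: lambda = {lambda_electron * 1e9:.3f} nm")
print(f"ratio: {lambda_photon / lambda_electron:.0f}")
```

The ratio comes out at roughly one thousand, as stated above.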

But so we’re talking photons here. For the electron, the story is much more complicated. That wavelength I mentioned was calculated using the other of the two de Broglie relations: λ = h/p. So that uses the momentum of the electron which, as you know, is the product of its mass (m) and its velocity (v): p = mv. You can amuse yourself and check if you find the same wavelength (1.23 nm): you should! From the other de Broglie relation, f = E/h, you can also calculate its frequency: using its total energy (i.e. including the 0.511 MeV rest energy), for an electron moving at non-relativistic speeds, it’s about 0.123×10²¹ Hz, so that’s like 500,000 times the frequency of the photon we were looking at! When multiplying the frequency and the wavelength, we should get its speed. However, that’s where we get in trouble. Here’s the problem with matter waves: they have a so-called group velocity and a so-called phase velocity. The idea is illustrated below: the green dot travels with the wave packet – and, hence, its velocity corresponds to the group velocity – while the red dot travels with the oscillation itself, and so that’s the phase velocity. [You should also remember, of course, that the matter wave is some complex-valued wavefunction, so we have both a real as well as an imaginary part oscillating and traveling through space.]

To be precise, the phase velocity will be superluminal. Indeed, using the usual relativistic formula, we can write that p = γm₀v and E = γm₀c², with v the (classical) velocity of the electron and c what it always is, i.e. the speed of light. Hence, λ = h/γm₀v and f = γm₀c²/h, and so λf = c²/v. Because v is (much) smaller than c, we get a superluminal velocity. However, that’s the phase velocity indeed, not the group velocity, which corresponds to v. OK… I need to end this digression.
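
The electron’s frequency and its two velocities can also be computed. A minimal sketch, assuming CODATA 2018 values and an electron with 1 eV of kinetic energy on top of its 0.511 MeV rest energy. It uses the relativistic relations E = γm₀c² and p = γm₀v, written in terms of energies so that the (small) velocity is computed without rounding trouble:

```python
import math

c = 299_792_458.0            # speed of light, m/s
h_eVs = 4.135667696e-15      # Planck's constant, eV·s (assumed CODATA 2018)
rest_eV = 0.51099895e6       # electron rest energy m₀c², eV (assumed CODATA 2018)
K_eV = 1.0                   # kinetic energy, eV

E_total = rest_eV + K_eV                      # E = γm₀c²
pc = math.sqrt(K_eV * (K_eV + 2 * rest_eV))   # pc from E² = (pc)² + (m₀c²)², in eV
v_group = c * pc / E_total                    # v = pc²/E, the classical velocity
f = E_total / h_eVs                           # de Broglie frequency f = E/h
wavelength = h_eVs * c / pc                   # λ = h/p (h in eV·s, p·c in eV)
v_phase = wavelength * f                      # should equal c²/v

print(f"f        = {f:.4e} Hz")
print(f"lambda   = {wavelength * 1e9:.3f} nm")
print(f"v_group  = {v_group:.4e} m/s")
print(f"v_phase  = {v_phase:.4e} m/s  (c²/v = {c**2 / v_group:.4e})")
```

The group velocity comes out at a few hundred kilometers per second, while the phase velocity λf is hundreds of times the speed of light, exactly c²/v.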

So what? Well, to make a long story short, the ‘amplitude framework’ for electrons is different. Hence, the story that I’ll be telling here is different from what you’ll read in Feynman’s QED. I will use his drawings, though, and his concepts. Indeed, despite my misgivings above, the conceptual framework is sound, and so the corrections to be made are relatively minor.

So… We’re looking at E(A to B), i.e. the amplitude for an electron to go from point A to B in spacetime, and I said the conceptual framework is exactly the same as that for a photon. Hence, the electron can follow any path really. It may go in a straight line and travel at a speed that’s consistent with what we know of its momentum (p), but it may also follow other paths. So, just like the photon, we’ll have some so-called propagator function, which gives you amplitudes based on the distance in space as well as on the distance in ‘time’ between two points. Now, Ralph Leighton identifies that propagator function with the propagator function for the photon, i.e. P(A to B), but that’s wrong: it’s not the same.

The propagator function for an electron depends on its mass and its velocity, and/or on the combination of both (like its momentum p = mv and/or its kinetic energy: K.E. = mv²/2 = p²/2m). So we have a different propagator function here. However, I’ll use the same symbol for it: P(A to B).

So, the bottom line is that, because of the electron’s mass (which, remember, is a measure for inertia), momentum and/or kinetic energy (which, remember, are conserved in physics), the straight line is definitely the most likely path, but (big but!), just like the photon, the electron may follow some other path as well.

So how do we formalize that? Let’s first associate an amplitude P(A to B) with an electron traveling from point A to B in a straight line and in a time that’s consistent with its velocity. Now, as mentioned above, the P here stands for propagator function, not for photon, so we’re talking a different P(A to B) here than that P(A to B) function we used for the photon. Sorry for the confusion. 🙂 The left-hand diagram below then shows what we’re talking about: it’s the so-called ‘one-hop flight’, and so that’s what the P(A to B) amplitude is associated with.

Now, the electron can follow other paths. For photons, we said the amplitude depended on the spacetime interval I: when the interval was negative or positive (i.e. for paths that are not associated with the photon traveling in a straight line and/or at the speed of light), the contribution of those paths to the final amplitude (or ‘final arrow’, as it was called) was smaller.

For an electron, we have something similar, but it’s modeled differently. We say the electron could take a ‘two-hop flight’ (via point C or C’), or a ‘three-hop flight’ (via D and E) from point A to B. Now, it makes sense that these paths should be associated with amplitudes that are much smaller. Now that’s where that n-factor comes in. We just put some real number n in the formula for the amplitude for an electron to go from A to B via C, which we write as:

P(A to C)∗n²∗P(C to B)

Note what’s going on here. We multiply two amplitudes, P(A to C) and P(C to B), which is OK, because that’s what the rules of quantum mechanics tell us: if an ‘event’ consists of two sub-events, we need to multiply the amplitudes (not the probabilities) in order to get the amplitude that’s associated with both sub-events happening. However, we add an extra factor: n². Note that it must be some very small number because we have lots of alternative paths and, hence, they should not be very likely! So what’s the n? And why n² instead of just n?

Well… Frankly, I don’t know. Ralph Leighton boldly equates n to the mass of the electron. Now, because he obviously means the mass expressed in Planck units, that’s the same as saying n is the electron’s energy (again, expressed in Planck’s ‘natural’ units), so n should be that number m = m_{e_P} = E_{e_P} = 4.181×10⁻²³. However, I couldn’t find any confirmation on the Internet, or elsewhere, of the suggested n = m identity, so I’ll assume n = m indeed, but… Well… Please check for yourself. It seems the answer is to be found in a mathematical theory that helps physicists to actually calculate j and n from experiment. It’s referred to as perturbation theory, and it’s the next thing on my study list. As for now, however, I can’t help you much. I can only note that the equation makes sense.

Of course it does: inserting a tiny little number n, close to zero, ensures that those other amplitudes don’t contribute too much to the final ‘arrow’. And it also makes a lot of sense to associate it with the electron’s mass: if mass is a measure of inertia, then it should be some factor reducing the amplitude that’s associated with the electron following such a crooked path. So let’s go along with it, and see what comes out of it.

A three-hop flight is even weirder and uses that n² factor two times:

P(A to E)∗n²∗P(E to D)∗n²∗P(D to B)

So we have an (n²)² = n⁴ factor here, which is good, because a three-hop flight should be much less likely than a two-hop one. So what do we get? Well… (4.181×10⁻²³)⁴ ≈ 305×10⁻⁹². Pretty tiny, huh? 🙂 Of course, any point in space is a potential hop for the electron’s flight from point A to B and, hence, there’s a lot of paths and a lot of amplitudes (or ‘arrows’ if you want), which, again, is consistent with a very tiny value for n indeed.
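
The size of these extra factors is easy to tabulate. A tiny sketch, taking n = m ≈ 4.181×10⁻²³ as assumed in the text:

```python
n = 4.181e-23   # electron mass in Planck units, used as the hop factor (n = m, as assumed above)

# A k-hop flight has k - 1 intermediate points, each contributing a factor n²
for hops in (1, 2, 3):
    print(f"{hops}-hop flight: factor n^{2 * (hops - 1)} = {n ** (2 * (hops - 1)):.3e}")
```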

So, to make a long story short, E(A to B) will be a giant sum (i.e. some kind of integral indeed) of a lot of different ways an electron can go from point A to B. It will be a series of terms P(A to B) + P(A to C)∗n²∗P(C to B) + P(A to E)∗n²∗P(E to D)∗n²∗P(D to B) + … for all possible intermediate points C, D, E, and so on.
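
To make the structure of that sum concrete, here is a toy Python sketch. It is NOT the real electron propagator: `toy_propagator` is a hypothetical stand-in (an arrow that turns and shrinks with distance), the points lie on a one-dimensional line with time ignored, and the grid of intermediate points is arbitrary. What it does show is how the one-hop, two-hop and three-hop terms are built and added, and how little the multi-hop terms contribute when n is as small as 4.181×10⁻²³.

```python
import cmath

def toy_propagator(a: float, b: float, k: float = 2.0) -> complex:
    """HYPOTHETICAL stand-in for P(a to b): an arrow that turns with distance
    and shrinks slowly. It is not the real electron propagator."""
    d = abs(b - a)
    return cmath.exp(1j * k * d) / (1.0 + d)

def toy_E(A: float, B: float, n: float, points: list[float]) -> tuple[complex, complex, complex]:
    """One-hop, two-hop and three-hop terms of E(A to B), summed over `points`."""
    P = toy_propagator
    one_hop = P(A, B)
    two_hop = sum(P(A, C) * n**2 * P(C, B) for C in points)
    three_hop = sum(P(A, E) * n**2 * P(E, D) * n**2 * P(D, B)
                    for E in points for D in points)
    return one_hop, two_hop, three_hop

n = 4.181e-23                                 # hop factor, n = m as assumed in the text
grid = [i * 0.1 for i in range(-20, 41)]      # hypothetical intermediate points
terms = toy_E(0.0, 2.0, n, grid)
total = sum(terms)

for label, term in zip(("one-hop", "two-hop", "three-hop"), terms):
    print(f"{label:<10} |term| = {abs(term):.3e}")
print(f"|E(A to B)| = {abs(total):.6f}")
```

Replacing n with a much larger, purely illustrative value (say 0.1) makes the two-hop and three-hop terms visible in the total.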

What about the j? The junction number or coupling constant. How does that show up in the E(A to B) formula? Well… Those alternative paths with hops here and there are actually the easiest bit of the whole calculation. Apart from taking some strange path, electrons can also emit and/or absorb photons during the trip. In fact, they do that constantly. Indeed, the image of an electron ‘in orbit’ around the nucleus is that of an electron exchanging so-called ‘virtual’ photons constantly, as illustrated below. So our image of an electron absorbing and then emitting a photon (see the diagram on the right-hand side) is really like the tiny tip of a giant iceberg: most of what’s going on is underneath! So that’s where our junction number j comes in, i.e. the charge (e) of the electron.

So, when you hear that a coupling constant is actually equal to the charge, then this is what it means: you should just note it’s the charge expressed in Planck units. But it’s a deep connection, isn’t it? When everything is said and done, a charge is something physical, but so here, in these amplitude calculations, it just shows up as some dimensionless negative number, used in multiplications and additions of amplitudes. Isn’t that remarkable?

The situation becomes even more complicated when more than one electron is involved. For example, two electrons can go in a straight line from points 1 and 2 to points 3 and 4 respectively, but there are two ways in which this can happen, and they might exchange photons along the way, as shown below. If there are two alternative ways in which one event can happen, you know we have to add amplitudes, rather than multiply them. Hence, the formula for E(A to B) becomes even more complicated.

Moreover, a single electron may first emit and then absorb a photon itself, so there’s no need for other particles to be there to have lots of j factors in our calculation. In addition, that photon may briefly disintegrate into an electron and a positron, which then annihilate each other to again produce a photon: in case you wondered, that’s what those little loops in the diagrams depicting the exchange of virtual photons are supposed to represent. So, every single junction (i.e. every emission and/or absorption of a photon) involves a multiplication with that junction number j, so if there are two couplings involved, we have a j² factor, and so that’s 0.08542455² = α ≈ 0.0073. Four couplings imply a factor of 0.08542455⁴ ≈ 0.000053.
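The link between j, the charge and α can be checked numerically as well. The sketch below assumes the CODATA 2018 values for e, ħ, c and ε₀, computes α = e²/(4πε₀ħc), and takes j = –√α, i.e. the charge in Planck units with the negative sign used above. It is an illustration of the numbers quoted in the text, not Feynman’s own calculation:

```python
import math

# CODATA 2018 values in SI units
e = 1.602176634e-19        # elementary charge, C
hbar = 1.054571817e-34     # reduced Planck constant, J·s
c = 299792458.0            # speed of light, m/s
eps0 = 8.8541878128e-12    # electric constant, F/m

alpha = e**2 / (4 * math.pi * eps0 * hbar * c)   # fine-structure constant
j = -math.sqrt(alpha)      # junction number: the charge in Planck units, negative sign

print(f"alpha = {alpha:.7f} (1/alpha = {1 / alpha:.2f})")
print(f"j     = {j:.8f}")
print(f"j^2   = {j**2:.5f}")   # two couplings
print(f"j^4   = {j**4:.6f}")   # four couplings
```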

Just as an example, I copy two diagrams involving four, five or six couplings indeed. They all have some ‘incoming’ photon, because Feynman uses them to explain something else (the so-called magnetic moment of the electron), but it doesn’t matter: the same illustrations can serve multiple purposes.

Now, it’s obvious that the contributions of the alternatives with many couplings add almost nothing to the final amplitude – just like the ‘many-hop’ flights add almost nothing – but… Well… As tiny as these contributions are, they are all there, and so they all have to be accounted for. So… Yes. You can easily appreciate how messy it all gets, especially in light of the fact that there are so many points that can serve as a ‘hop’ or a ‘coupling’ point!

So… Well… Nothing. That’s it! I am done! I realize this has been another long and difficult story, but I hope you appreciated it and that it shed some light on what’s really behind those simplified stories of what quantum mechanics is all about. It’s all weird and, admittedly, not so easy to understand, but I wouldn’t say an understanding is really beyond the reach of us, common mortals. 🙂

Post scriptum: When you’ve reached here, you may wonder: so where’s the final formula for E(A to B)? Well… I have no easy formula for you. From what I wrote above, it should be obvious that we’re talking about some really awful-looking integral and, because it’s so awful, I’ll let you find it yourself. 🙂

I should also note another reason why I am reluctant to identify n with m. The formulas in Feynman’s QED are definitely not the standard ones. The more standard formulations will use the gauge coupling parameter about which I talked already. I sort of discussed it, indirectly, in my first comments on Feynman’s QED, when I criticized some other part of the book, notably its explanation of the phenomenon of diffraction of light, which basically boiled down to: “When you try to squeeze light too much [by forcing it to go through a small hole], it refuses to cooperate and begins to spread out”, because “there are not enough arrows representing alternative paths.”

Now that raises a lot of questions, and very sensible ones, because that simplification is nonsensical. Not enough arrows? That statement doesn’t make sense. We can subdivide space into as many paths as we want, and probability amplitudes don’t take up any physical space. We can cut up space into smaller and smaller pieces (so we analyze more paths within the same space). The consequence – in terms of arrows – is that the directions of our arrows won’t change but their length will be much smaller as we’re analyzing many more paths. That’s because of the normalization constraint. However, when adding them all up – a lot of very tiny ones, or a smaller bunch of bigger ones – we’ll still get the same ‘final’ arrow. That’s because the direction of those arrows depends on the length of the path, and the length of the path doesn’t change simply because we suddenly decide to use some other ‘gauge’.

Indeed, the real question is: what’s a ‘small’ hole? What’s ‘small’ and what’s ‘large’ in quantum electrodynamics? Now, I gave an intuitive answer to that question in that post of mine, and I think it’s much more accurate than Feynman’s, or Leighton’s. The answer to that question is: there’s some kind of natural ‘gauge’, and it’s related to the wavelength. So the wavelength of a photon, or an electron, in this case, comes with some kind of scale indeed. That’s why the fine-structure constant is often written in yet another form:

α = 2π·r_e/λ_e = r_e·k_e

λ_e and k_e are the Compton wavelength and wavenumber of the electron (so k_e is not the Coulomb constant here). The Compton wavelength is the de Broglie wavelength of the electron. [You’ll find that Wikipedia defines it as “the wavelength that’s equivalent to the wavelength of a photon whose energy is the same as the rest-mass energy of the electron”, but that’s a very confusing definition, I think.]
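A quick numerical check of α = 2π·r_e/λ_e = r_e·k_e, assuming the CODATA 2018 values for the classical electron radius r_e and the Compton wavelength λ_e (the variable names are mine):

```python
import math

# CODATA 2018 values in SI units
r_e = 2.8179403262e-15          # classical electron radius, m
lambda_e = 2.42631023867e-12    # Compton wavelength of the electron, m

k_e = 2 * math.pi / lambda_e    # Compton wavenumber (not the Coulomb constant), 1/m

alpha_from_wavelength = 2 * math.pi * r_e / lambda_e   # α = 2π·r_e/λ_e
alpha_from_wavenumber = r_e * k_e                      # α = r_e·k_e

print(alpha_from_wavelength, alpha_from_wavenumber, 1 / alpha_from_wavelength)
```

Both forms give the same number, about 0.0072974, i.e. roughly 1/137.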

The point to note is that the spatial dimension in both the analysis of photons as well as of matter waves, especially in regard to studying diffraction and/or interference phenomena, is related to the frequencies, wavelengths and/or wavenumbers of the wavefunctions involved. There’s a certain ‘gauge’ involved indeed, i.e. some measure that is relative, like the gauge pressure illustrated below. So that’s where that gauge parameter g comes in. And the fact that it’s yet another number that’s closely related to that fine-structure constant is… Well… Again… That alpha number is a very magic number indeed… 🙂

Post scriptum (5 October 2015):

Much stuff in physics is quite ‘magical’, but it’s never ‘too magical’. I mean: there’s always an explanation. So there is a very logical explanation for the above-mentioned deep connection between the charge of an electron, its energy and/or mass, its various radii (or physical dimensions) and the coupling constant too. I wrote a piece about that, much later than when I wrote the piece above. I would recommend you read that piece too. It’s a piece in which I do take the magic out of ‘God’s number’. Understanding it involves a deep understanding of electromagnetism, however, and that requires some effort. It’s surely worth the effort, though.

Fields and charges (I)

Pre-script (dated 26 June 2020): This post has become less relevant (even irrelevant, perhaps) because my views on all things quantum-mechanical have evolved significantly as a result of my progression towards a more complete realist (classical) interpretation of quantum physics. In addition, some of the material was removed by a dark force (that also created problems with the layout, I see now). In any case, we recommend you read our recent papers. I keep blog posts like these mainly because I want to keep track of where I came from. I might review them one day, but I currently don’t have the time or energy for it. 🙂

Original post:

My previous posts focused mainly on photons, so this one should be focused more on matter-particles, things that have a mass and a charge. However, I will use it more as an opportunity to talk about fields and present some results from electrostatics using our new vector differential operators (see my posts on vector analysis).

Before I do so, let me note something that is obvious but… Well… Think about it: photons carry the electromagnetic force, but have no electric charge themselves. Likewise, electromagnetic fields have energy and are caused by charges, but they don’t carry any charge either. So… Fields act on a charge, and photons interact with electrons, but it’s only matter-particles (notably the electron and the proton, which is made of quarks) that actually carry electric charge. Does that make sense? It should. 🙂

Another thing I want to remind you of, before jumping into it all head first, are the basic units and relations that are valid always, regardless of what we are talking about. They are represented below:

Let me recapitulate the main points:

The speed of light is always the same, regardless of the reference frame (inertial or moving), and nothing can travel faster than light (except mathematical points, such as the phase velocity of a wavefunction).
This universal rule is the basis of relativity theory and the mass-energy equivalence relation E = mc².
The constant speed of light also allows us to redefine the units of time and/or distance such that c = 1. For example, if we re-define the unit of distance as the distance traveled by light in one second, or the unit of time as the time light needs to travel one meter, then c = 1.
Newton’s laws of motion define a force as the product of a mass and its acceleration: F = m·a. Hence, mass is a measure of inertia, and the unit of force is 1 newton (N) = 1 kg·m/s².
The momentum of an object is the product of its mass and its velocity: p = m·v. Hence, its unit is 1 kg·m/s = 1 N·s. Therefore, the concept of momentum combines force (N) as well as time (s).
Energy is defined in terms of work: 1 Joule (J) is the work done when applying a force of one newton over a distance of one meter: 1 J = 1 N·m. Hence, the concept of energy combines force (N) and distance (m).
Relativity theory establishes the relativistic energy-momentum relation pc = Ev/c, which can also be written as E² = p²c² + m₀²c⁴, with m₀ the rest mass of an object (i.e. its mass when the object would be at rest, relative to the observer, of course). These equations reduce to m = E and E² = p² + m₀² when choosing time and/or distance units such that c = 1. The mass m is the total mass of the object, including its rest mass as well as the equivalent mass of its kinetic energy.
The relationships above establish (a) energy and time and (b) momentum and position as complementary variables and, hence, the Uncertainty Principle can be expressed in terms of both. The Uncertainty Principle, as well as the Planck-Einstein relation and the de Broglie relation (not shown on the diagram), establish a quantum of action, h, whose dimension combines force, distance and time (h ≈ 6.626×10⁻³⁴ N·m·s). This quantum of action (Wirkung) can be defined in various ways, as it pops up in more than one fundamental relation, but one of the more obvious approaches is to define h as the proportionality constant between the energy of a photon (i.e. the ‘light particle’) and its frequency: h = E/ν.
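The two forms of the energy-momentum relation, and their c = 1 versions, can be checked with a short Python sketch. The speed of 0.6c is an arbitrary choice, the electron rest mass is the CODATA 2018 value, and the helper energy_momentum is a hypothetical name used only for illustration:

```python
import math

def energy_momentum(m0, v, c):
    """Return (E, p, m) for an object with rest mass m0 moving at speed v."""
    gamma = 1 / math.sqrt(1 - (v / c) ** 2)
    m = gamma * m0     # total mass: rest mass plus the mass equivalent of kinetic energy
    E = m * c**2       # E = mc²
    p = m * v          # p = mv
    return E, p, m

# SI units: an electron (CODATA 2018 rest mass) at an arbitrary 60% of c
c_si = 299792458.0
m0 = 9.1093837015e-31
v = 0.6 * c_si
E, p, m = energy_momentum(m0, v, c_si)
print(math.isclose(p * c_si, E * v / c_si))                        # pc = Ev/c
print(math.isclose(E**2, (p * c_si) ** 2 + (m0 * c_si**2) ** 2))   # E² = p²c² + m₀²c⁴

# Units with c = 1 (rest mass also set to 1): E = m and E² = p² + m₀²
E1, p1, m1 = energy_momentum(1.0, 0.6, 1.0)
print(E1 == m1, math.isclose(E1**2, p1**2 + 1.0))
```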

Note that we talked about forces and energy above, but we didn’t say anything about the origin of these forces. That’s what we are going to do now, even if we’ll limit ourselves to the electromagnetic force only.

Electrostatics

According to Wikipedia, electrostatics deals with the phenomena and properties of stationary or slow-moving electric charges with no acceleration. Feynman usually uses the term when talking about stationary charges only. If a current is involved (i.e. slow-moving charges with no acceleration), the term magnetostatics is preferred. However, the distinction does not matter all that much because – remarkably! – with stationary charges and steady currents, the electric and magnetic fields (E and B) can be analyzed as separate fields: there is no interconnection whatsoever! That shows, mathematically, as a neat separation between (1) Maxwell’s first and second equations and (2) Maxwell’s third and fourth equations:

Electrostatics: (i) ∇•E = ρ/ε₀ and (ii) ∇×E = 0.
Magnetostatics: (iii) c²∇×B = j/ε₀ and (iv) ∇•B = 0.

Electrostatics: The ρ in equation (i) is the so-called charge density, which describes the distribution of electric charges in space: ρ = ρ(x, y, z). To put it simply: ρ is the ‘amount of charge’ (which we’ll denote by Δq) per unit volume at a given point. As for ε₀, that’s a constant which ensures all units are ‘compatible’. Equation (i) basically says we have some flux of E, the exact amount of which is determined by the charge density ρ or, more generally, by the charge distribution in space. As for equation (ii), i.e. ∇×E = 0, we can sort of forget about that. It means the curl of E is zero: everywhere, and always. So there’s no circulation of E. Hence, E is a so-called curl-free field, in this case at least, i.e. when only stationary charges and steady currents are involved.

Magnetostatics: The j in (iii) represents a steady current indeed, causing some circulation of B. The c² factor is related to the fact that magnetism is actually only a relativistic effect of electricity, but I can’t dwell on that here. I’ll just refer you to what Feynman writes about this in his Lectures, and warmly recommend reading it. Oh… Equation (iv), ∇•B = 0, means that the divergence of B is zero: everywhere, and always. So there’s no flux of B. None. So B is a divergence-free field.

Because of the neat separation, we’ll just forget about B and talk about E only.

The electric potential

OK. Let’s try to go through the motions as quickly as we can. As mentioned in my introduction, energy is defined in terms of work done. So we should just multiply the force and the distance, right? 1 Joule = 1 newton × 1 meter, right? Well… Yes and no. In discussions like this, we talk about potential energy, i.e. energy stored in the system, so to say. That means that we’re looking at work done against the force, like when we carry a bucket of water up to the third floor or, to use a somewhat more scientific description of what’s going on, when we are separating two masses. Because we’re doing work against the force, we put a minus sign in front of our integral:

W = –∫_a^b F•ds

Now, the electromagnetic force works pretty much like gravity, except that, when discussing gravity, we only have positive ‘charges’ (the mass of some object is always positive). In electromagnetics, we have positive as well as negative charge, and please note that two like charges repel (that’s not the case with gravity). Hence, doing work against the electromagnetic force may involve bringing like charges together or, alternatively, separating opposite charges. We can’t say. Fortunately, when it comes to the math of it, it doesn’t matter: we will have the same minus sign in front of our integral. The point is: we’re doing work against the force, and so that’s what the minus sign stands for. So it has nothing to do with the specifics of the law of attraction and repulsion in this case (electromagnetism as opposed to gravity) and/or the fact that electrons carry negative charge. No.

Let’s get back to the integral. Just in case you forgot, the integral sign ∫ stands for an S: the S of summa, i.e. sum in Latin, and we’re using these integrals because we’re adding an infinite number of infinitesimally small contributions to the total effort here indeed. You should recognize it, because it’s a general formula for energy or work. It is, once again, a so-called line integral, so it’s a bit different from the ∫f(x)dx stuff you learned in high school. Not very different, but different nevertheless. What’s different is that we have a vector dot product F•ds after the integral sign here, so that’s not like f(x)dx. In case you forgot, that f(x)dx product represents the area of an infinitesimally thin rectangle, as shown below: we make the base of the rectangle smaller and smaller, so dx becomes an infinitesimal indeed. And then we add them all up and get the area under the curve. If f(x) is negative, then the contributions will be negative.

But so we don’t have little rectangles here. We have two vectors, F and ds, and their vector dot product, F•ds, which will give you… Well… I am tempted to write: the tangential component of the force along the path, but that’s not quite correct. If ds were a unit vector, it would be true, because then it’s just like that h•n product I introduced in our first vector calculus class. However, ds is not a unit vector: it’s an infinitesimal vector and, hence, if we write the tangential component of the force along the path as F_t, then F•ds = |F||ds|cosθ = F·cosθ·ds = F_t·ds. So this F•ds is the tangential component of the force multiplied by the length of an infinitesimally small segment of the curve. In short, it’s an infinitesimally small contribution to the total amount of work done indeed. You can make sense of this by looking at the geometrical representation of the situation below.

I am just saying this so you know what that integral stands for. Note that we’re not adding arrows once again, like we did when calculating amplitudes. It’s all much more straightforward really: a vector dot product is a scalar, so it’s just some real number, just like any component of a vector (tangential, normal, in the direction of one of the coordinate axes, or in whatever direction) is not a vector but a real number. Hence, W is also just some real number. It can be positive or negative because… Well… When we’d be going down the stairs with our bucket of water, our minus sign doesn’t disappear. Indeed, our convention to put that minus sign there should obviously not depend on which points a and b we’re talking about, so we may actually be going along the direction of the force when going from a to b.

As a matter of fact, you should note that’s actually the situation depicted above. So then we get a negative number for W. Does that make sense? Of course it does: we’re obviously not doing any work here as we’re moving along the direction of the force, so we’re surely not adding any (potential) energy to the system. On the contrary, we’re taking energy out of the system. Hence, we are reducing its (potential) energy and, hence, we should have a negative value for W indeed. So, just think of the minus sign being there to ensure we add potential energy to the system when going against the force, and reduce it when going with the force.
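The sign convention can be made concrete with a small numerical sketch. It assumes a hypothetical uniform force of 2 N along the x-axis and a straight path, approximates W = –∫F•ds by summing F•ds over many small segments (sampling F at each segment’s midpoint), and shows that W is negative when we move with the force, positive when we move against it, and zero when we move at right angles to it:

```python
def work_against_force(force, a, b, steps=10_000):
    """Approximate W = -∫ F•ds along the straight segment from a to b.

    force maps a point (x, y, z) to a force vector (Fx, Fy, Fz).
    The path is cut into `steps` small vectors ds; F is sampled at each midpoint.
    """
    ds = [(bi - ai) / steps for ai, bi in zip(a, b)]
    W = 0.0
    for k in range(steps):
        t = (k + 0.5) / steps
        point = [ai + t * (bi - ai) for ai, bi in zip(a, b)]
        F = force(point)
        W -= sum(Fi * dsi for Fi, dsi in zip(F, ds))   # minus sign: work done against F
    return W

def uniform_force(point):
    """Hypothetical uniform force: 2 N along +x, everywhere."""
    return (2.0, 0.0, 0.0)

print(work_against_force(uniform_force, (0, 0, 0), (3, 0, 0)))  # with the force: about -6 J
print(work_against_force(uniform_force, (3, 0, 0), (0, 0, 0)))  # against the force: about +6 J
print(work_against_force(uniform_force, (0, 0, 0), (0, 3, 0)))  # at right angles: 0 J
```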

OK. You get this. You probably also know we’ll re-define W as a difference in potential between two points, which we’ll write as Φ(b) – Φ(a). Now that should remind you of your high school integral ∫f(x)dx once again. For a definite integral over a line segment [a, b], you’d have to find the antiderivative of f(x), which you’d write as F(x), and then you’d take the difference F(b) – F(a) too. Now, you may or may not remember that this antiderivative was actually a family of functions F(x) + k, and k could be any constant – 5/9, 6π, 3.6×10¹²⁴, 0.86, whatever! – because such a constant vanishes when taking the derivative.

Here we have the same thing: we can define an infinite number of functions Φ(r) + k, of which the gradient will yield… Stop! I am going too fast here. First, we need to re-write that W function above in order to ensure we’re calculating stuff in terms of the unit charge, so we write:

W(unit) = –∫_a^b E•ds

Huh? Well… Yes. I am using the definition of the field E here really: E is the force (F) when putting a unit charge in the field. Hence, if we want the work done per unit charge, i.e. W(unit), then we have to integrate the vector dot product E•ds over the path from a to b. But so now you see what I want to do. It makes the comparison with our high school integral complete. Instead of taking a derivative with respect to one variable only, i.e. dF(x)/dx = f(x), we have a function Φ here not in one but in three variables: Φ = Φ(x, y, z) = Φ(r) and, therefore, we have to take the vector derivative (or gradient as it’s called) of Φ to get E:

∇Φ(x, y, z) = (∂Φ/∂x, ∂Φ/∂y, ∂Φ/∂z) = –E(x, y, z)

But it’s the same principle as the one you learned to use for your high school integral. Now, you’ll usually see the expression above written as:

E = –∇Φ

Why so short? Well… We all just love these mysterious abbreviations, don’t we? 🙂 Jokes aside, it’s true some of those vector equations pack an awful lot of information. Just take Feynman’s advice here: “If it helps to write out the components to be sure you understand what’s going on, just do it. There is nothing inelegant about that. In fact, there is often a certain cleverness in doing just that.” So… Let’s move on.
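In that spirit, here is a minimal Python sketch that writes out the three components of –∇Φ by finite differences for the potential of a single point charge, and compares them with the Coulomb field computed directly. The charge q = 1 nC, the test point, the step h and the helper names are illustrative assumptions; 1/4πε₀ is the CODATA 2018 value:

```python
import math

K = 8.9875517923e9  # 1/(4πε₀), N·m²/C² (CODATA 2018)

def phi(x, y, z, q=1e-9):
    """Potential of a point charge q at the origin: Φ = q/(4πε₀r)."""
    return K * q / math.sqrt(x * x + y * y + z * z)

def minus_grad(f, x, y, z, h=1e-6):
    """–∇f, written out component by component with central differences."""
    return (
        -(f(x + h, y, z) - f(x - h, y, z)) / (2 * h),   # –∂f/∂x
        -(f(x, y + h, z) - f(x, y - h, z)) / (2 * h),   # –∂f/∂y
        -(f(x, y, z + h) - f(x, y, z - h)) / (2 * h),   # –∂f/∂z
    )

def coulomb_E(x, y, z, q=1e-9):
    """Coulomb field of the same charge: magnitude q/(4πε₀r²), pointing radially outward."""
    r = math.sqrt(x * x + y * y + z * z)
    return tuple(K * q * comp / r**3 for comp in (x, y, z))

point = (0.3, -0.4, 1.2)   # an arbitrary test point, in meters (r = 1.3 m)
print(minus_grad(phi, *point))
print(coulomb_E(*point))
```

Both lines print the same vector to many decimal places, which is just E = –∇Φ written out in components.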

I should mention that we can only apply this more sophisticated version of the ‘high school trick’ because Φ and E are like temperature (T) and heat flow (h): they are fields. T is a scalar field and h is a vector field, and so that’s why we can and should apply our new trick: if we have the scalar field, we can derive the vector field. In case you want more details, I’ll just refer you to our first vector calculus class. Indeed, our so-called First Theorem in vector calculus was just about the more sophisticated version of the ‘high school trick’: if we have some scalar field ψ (like temperature or potential, for example: just substitute T or Φ for the ψ in the equation below), then we’ll always find that:

ψ(2) – ψ(1) = ∫_Γ ∇ψ•ds

The Γ here is the curve between points 1 and 2, so that’s the path along which we’re going, and ∇ψ must represent some vector field.

Let’s go back to our W integral. I should mention that it doesn’t matter what path we take: we’ll always get the same value for W. That’s why the illustration above showed two possible paths. Again, that’s because E is the gradient of a scalar field. To be precise, the electrostatic field is a so-called conservative vector field, which means that we can’t get energy out of the field by first carrying some charge along one path, and then carrying it back along another. You’ll probably find that’s obvious, and it is. Just note it somewhere in the back of your mind.
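Path independence is easy to see numerically. The sketch below assumes a hypothetical 1 nC charge at the origin, approximates W(unit) = –∫E•ds along a straight path and along a detour through two arbitrary points (chosen so that the detour stays well away from the charge), and compares both with Φ(b) – Φ(a) for Φ = q/4πε₀r. All names and points are illustrative:

```python
import math

K = 8.9875517923e9   # 1/(4πε₀), N·m²/C²
Q = 1e-9             # hypothetical source charge at the origin, C
ORIGIN = (0.0, 0.0, 0.0)

def E(p):
    """Coulomb field of Q at point p."""
    r = math.dist(p, ORIGIN)
    return [K * Q * comp / r**3 for comp in p]

def phi(p):
    """Potential of Q at point p, with Φ = 0 at infinity."""
    return K * Q / math.dist(p, ORIGIN)

def work_per_unit_charge(path, steps=20_000):
    """W(unit) = -∫E•ds along a polyline given as a list of points (midpoint rule)."""
    W = 0.0
    for a, b in zip(path, path[1:]):
        ds = [(bi - ai) / steps for ai, bi in zip(a, b)]
        for k in range(steps):
            t = (k + 0.5) / steps
            p = [ai + t * (bi - ai) for ai, bi in zip(a, b)]
            W -= sum(Ei * dsi for Ei, dsi in zip(E(p), ds))
    return W

a, b = (1.0, 0.0, 0.0), (0.0, 2.0, 0.5)
direct = work_per_unit_charge([a, b])
detour = work_per_unit_charge([a, (3.0, 3.0, -1.0), (-1.0, 1.0, 2.0), b])

print(direct, detour, phi(b) - phi(a))   # all three agree
```

The result is negative here because b is farther from the positive charge than a, so we move with the field.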

So we’re done. We should just substitute ∇Φ for E, shouldn’t we? Well… Yes. Minus ∇Φ, that is. Another minus sign. Why? Well… It makes that W(unit) integral come out alright. Indeed, we want a formula like W = Φ(b) – Φ(a), not like Φ(a) – Φ(b). Look at it. We could, indeed, define E as the (positive) gradient of some scalar field ψ = –Φ, and so we could write E = ∇ψ, but then we’d find that W = –[ψ(b) – ψ(a)] = ψ(a) – ψ(b).

You’ll say: so what? Well… Nothing much. It’s just that our field vectors would point from lower to higher values of ψ, so they would be flowing uphill, so to say. Now, we don’t want that in physics. Why? It just doesn’t look good. We want our field vectors to be directed from higher potential to lower potential, always. Just think of it: heat (h) flows from higher temperature (T) to lower, and Newton’s apple falls from greater to lower height. Likewise, when putting a unit charge in the field, we want to see it move from higher to lower electric potential. Now, we can’t change the direction of E: that’s the direction of the force, and Nature doesn’t care about our conventions. But we can choose our convention. So that’s why we put a minus sign in front of ∇Φ when writing E = –∇Φ. It makes everything come out alright. 🙂 That’s also why we have a minus sign in the differential heat flow equation: h = –κ∇T.

So now we have the easy W(unit) = Φ(b) – Φ(a) formula that we wanted all along. Now, note that, when we say a unit charge, we mean a plus one charge. Yes: +1. So that’s the charge of the proton (it’s denoted by e) so you should stop thinking about moving electrons around! [I am saying this because I used to confuse myself by doing that. You end up with the same formulas for W and Φ but it just takes you longer to get there, so let me save you some time here. :-)]

But… Yes? In reality, it’s electrons going through a wire, isn’t it? Not protons. Yes. But it doesn’t matter. Units are units in physics, and they’re always +1, for whatever (time, distance, charge, mass, spin, etcetera). Always. For whatever. Also note that in laboratory experiments, or particle accelerators, we often use protons instead of electrons, so there’s nothing weird about it. Finally, and most fundamentally, if we have a –e charge moving through a neutral wire in one direction, then that’s exactly the same as a +e charge moving in the other direction.

Just to make sure you get the point, let’s look at that illustration once again. We already said that we have F and, hence, E pointing from a to b and we’ll be reducing the potential energy of the system when moving our unit charge from a to b, so W was some negative value. Now, taking into account we want field lines to point from higher to lower potential, Φ(a) should be larger than Φ(b), and so… Well… Yes. It all makes sense: we have a negative difference Φ(b) – Φ(a) = W(unit), which amounts, of course, to the reduction in potential energy.

The last thing we need to take care of now is the reference point. Indeed, any Φ(r) + k function will do, so which one do we take? The approach here is to take a reference point P₀ at infinity. What’s infinity? Well… Hard to say. It’s a place that’s very far away from all of the charges we’ve got lying around here. Very far away indeed. So far away we can say there is nothing there really. No charges whatsoever. 🙂 Something like that. 🙂 In any case. I need to move on. So Φ(P₀) is zero and so we can finally jot down the grand result for the electric potential Φ(P) (aka the electrostatic or electric field potential):

Φ(P) = –∫ E•ds (integrated along any path from P₀ to P)

So now we can calculate all potentials, i.e. when we know where the charges are at least. I’ve shown an example below. As you can see, besides having zero potential at infinity, we will usually also have one or more equipotential surfaces with zero potential. One could say these zero potential lines sort of ‘separate’ the positive and negative space. That’s not a very scientifically accurate description but you know what I mean.
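Both ideas can be illustrated with a rough Python sketch. It approximates the reference point P₀ at infinity by a large but finite distance r_far (an assumption), integrates –E•ds radially inward for a hypothetical 1 nC point charge, and compares the result with q/4πε₀r. It then adds up the potentials of a hypothetical dipole (+Q and –Q a distance d apart on the z-axis) to show the zero-potential plane halfway between the two charges:

```python
import math

K = 8.9875517923e9   # 1/(4πε₀), N·m²/C²
Q = 1e-9             # hypothetical point charge at the origin, C

def potential_from_infinity(r, r_far=1e6, steps=100_000):
    """Φ at distance r from Q, as -∫E•ds along a radial path from P₀ (at r_far) in to P.

    On the way in, ds points against the outward radial field, so -E•ds = +E_r·|ds|.
    We integrate E_r(s) over [r, r_far] with the substitution s = r·e^u (ds = s·du),
    which spreads the sample points evenly on a logarithmic scale.
    """
    du = math.log(r_far / r) / steps
    total = 0.0
    for k in range(steps):
        s = r * math.exp((k + 0.5) * du)
        total += (K * Q / s**2) * s * du
    return total

print(potential_from_infinity(0.5), K * Q / 0.5)   # numerical vs. closed form q/(4πε₀r)

# Superposition for a hypothetical dipole: +Q at z = +d/2 and -Q at z = -d/2
d = 0.1
charges = [(Q, (0.0, 0.0, d / 2)), (-Q, (0.0, 0.0, -d / 2))]

def dipole_potential(p):
    return sum(K * qi / math.dist(p, pos) for qi, pos in charges)

print(dipole_potential((0.3, -0.2, 0.0)))   # on the midplane z = 0: zero potential
print(dipole_potential((0.0, 0.0, 0.2)))    # closer to +Q: positive potential
```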

Let me make a few final notes about the units. First, let me, once again, note that our unit charge is plus one, and it will flow from positive to negative potential indeed, as shown below, even if we know that, in an actual electric circuit, and so now I am talking about a copper wire or something similar, that means the (free) electrons will move in the other direction.

If you’re smart (and you are), you’ll say: what about the right-hand rule for the magnetic force? Well… We’re not discussing the magnetic force here but, because you insist, rest assured it comes out alright. Look at the illustration below of the magnetic force on a wire with a current, which is a pretty standard one.

So we have a given B, because of the bar magnet, and then v, the velocity vector for the… Electrons? No. You need to be consistent. It’s the velocity vector for the unit charges, which are positive (+e). Now just calculate the force F = qv×B = ev×B using the right-hand rule for the vector cross product, as illustrated below. So v is the thumb and B is the index finger in this case. All you need to do is tilt your hand, and it comes out alright.

But… We know it’s electrons going the other way. Well… If you insist. But then you have to put a minus sign in front of the q, because we’re talking minus e (–e). So now v is in the other direction and so v×B is in the other direction indeed, but our force F = qv×B = –ev×B is not. Fortunately not, because physical reality should not depend on our conventions. 🙂 So… What’s the conclusion? Nothing. You may or may not want to remember that, when we say that our current j flows in this or that direction, we might actually be talking about electrons (with charge minus one) flowing in the opposite direction, but then it doesn’t matter. In addition, as mentioned above, in laboratory experiments or accelerators, we may actually be talking protons instead of electrons, so don’t assume electromagnetism is the business of electrons only.
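The sign bookkeeping can be checked with a plain cross product. In this sketch the field B, the drift velocity v and the helper cross are illustrative assumptions; it computes F = qv×B once for +e charges moving along v and once for –e charges moving along –v:

```python
def cross(a, b):
    """Vector cross product a×b for 3-component sequences."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

e = 1.602176634e-19      # elementary charge, C
B = (0.0, 0.0, 0.5)      # hypothetical field from the magnet, T
v = (1e-4, 0.0, 0.0)     # hypothetical drift velocity of the positive unit charges, m/s

F_plus = tuple(e * comp for comp in cross(v, B))              # +e moving along v
v_electrons = tuple(-comp for comp in v)                      # electrons drift the other way
F_minus = tuple(-e * comp for comp in cross(v_electrons, B))  # -e moving along -v

print(F_plus)
print(F_minus)   # the same force: both sign flips cancel
```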

To conclude this disproportionately long introduction (we’re finally ready to talk more difficult stuff), I should just make a note on the units. Electric potential is measured in volts, as you know. However, it’s obvious from all that I wrote above that it’s the difference in potential that matters really. From the definition above, it should be measured in the same unit as our unit for energy, or for work, so that’s the joule. To be precise, it should be measured in joule per unit charge. But here we have one of the very few inconsistencies in physics when it comes to units. The proton charge is said to be the unit charge (e), but its actual value is measured in coulomb (C). To be precise: +1 e = 1.602176565(35)×10⁻¹⁹ C. So we do not measure voltage – sorry, potential difference 🙂 – in joule but in joule per coulomb (J/C).

Now, we usually use another term for the joule/coulomb unit. You guessed it (because I said it): it’s the volt (V). One volt is one joule/coulomb: 1 V = 1 J/C. That’s not fair, you’ll say. You’re right, but the proton charge e is not a so-called SI unit. Is the coulomb an SI unit? Yes. It’s derived from the ampere (A) which, believe it or not, is actually an SI base unit. One ampere is 6.241×10¹⁸ electrons (i.e. one coulomb) per second. You may wonder how the ampere (or the coulomb) can be a base unit. Can they be expressed in terms of the kilogram, meter and second, like the newton and the joule? The answer is yes but, as you can imagine, it’s a bit of a complex description and so I’ll refer you to the Web for that.
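The unit arithmetic fits in a few lines. This sketch uses e = 1.602176634×10⁻¹⁹ C, which is exact in the SI since the 2019 redefinition (the value quoted above is the slightly older CODATA 2010 one); the 1.5 V potential difference is a hypothetical example:

```python
e = 1.602176634e-19   # elementary charge in coulomb (exact in the 2019 SI)

# One coulomb is 1/e elementary charges; one ampere is one coulomb per second
charges_per_second_in_one_ampere = 1 / e
print(f"{charges_per_second_in_one_ampere:.4e}")   # about 6.2415e+18

# 1 V = 1 J/C, so (work per unit charge) × (charge) gives joules: W = q·ΔΦ
delta_phi = 1.5          # hypothetical potential difference, V
print(1.0 * delta_phi)   # work on one coulomb, in J
print(e * delta_phi)     # work on one proton charge (+e), in J
```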

The Poisson equation

I started this post by saying that I’d talk about fields and present some results from electrostatics using our ‘new’ vector differential operators, so it’s about time I did that. The first equation is a simple one. Using our E = –∇Φ formula, we can re-write the ∇•E = ρ/ε₀ equation as:

∇•E = –∇•∇Φ = –∇²Φ = ρ/ε₀, or ∇²Φ = –ρ/ε₀

This is a so-called Poisson equation. The ∇² operator is referred to as the Laplacian and is sometimes also written as Δ, but I don’t like that because Δ is also the symbol we use for a difference or finite change (like the Δq above), and that’s definitely not the same thing. The formula for the Laplacian is given below. Note that it acts on a scalar field (i.e. the potential function Φ in this case).

∇² = ∂²/∂x² + ∂²/∂y² + ∂²/∂z²

As Feynman notes: “The entire subject of electrostatics is merely the study of the solutions of this one equation.” However, I should note that this doesn’t prevent Feynman from devoting at least a dozen of his Lectures to it, and they’re not the easiest ones to read. [In case you doubt this statement, just have a look at his lecture on electric dipoles, for example.] In short: don’t think the ‘study of this one equation’ is easy. All I’ll do is note some of the most fundamental results of this ‘study’.
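As a small sanity check of the Poisson equation, the sketch below takes one known solution, the potential inside a uniformly charged sphere, Φ = ρ(3R² – r²)/(6ε₀) (with Φ = 0 at infinity), and evaluates ∇²Φ with the standard seven-point finite-difference stencil. The charge density, radius, test point and step size are illustrative assumptions:

```python
eps0 = 8.8541878128e-12   # electric constant ε₀, F/m
rho = 1e-6                # hypothetical uniform charge density, C/m³
R = 1.0                   # hypothetical sphere radius, m

def phi_inside(x, y, z):
    """Potential inside a uniformly charged sphere, with Φ = 0 at infinity."""
    r2 = x * x + y * y + z * z
    return rho * (3 * R**2 - r2) / (6 * eps0)

def laplacian(f, x, y, z, h=1e-3):
    """∇²f ≈ sum of the six neighbors minus 6× the center value, divided by h²."""
    center = f(x, y, z)
    return (
        f(x + h, y, z) + f(x - h, y, z)
        + f(x, y + h, z) + f(x, y - h, z)
        + f(x, y, z + h) + f(x, y, z - h)
        - 6 * center
    ) / h**2

print(laplacian(phi_inside, 0.2, -0.1, 0.4), -rho / eps0)   # ∇²Φ = -ρ/ε₀
```

Because Φ is quadratic here, the stencil is exact apart from rounding, so the two numbers agree closely.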

Also note that ∇•E is one of our ‘new’ vector differential operators indeed: it’s the vector dot product of our del operator (∇) with E. That’s something very different from, let’s say, ∇Φ. A little dot and some bold-face type make an enormous difference here. 🙂 You may or may not remember that we referred to the ∇• operator as the divergence (div) operator (see my post on that).

Gauss’ Law

Gauss’ Law is not to be confused with Gauss’ Theorem, about which I wrote elsewhere. It gives the flux of E through a closed surface S, any closed surface S really, as the sum of all charges inside the surface divided by the electric constant ε₀ (but then you know that constant is just there to make the units come out alright).

The derivation of Gauss’ Law is a bit lengthy, which is why I won’t reproduce it here, but you should note its derivation is based, mainly, on the fact that (a) surface areas are proportional to r² (so if we double the distance from the source, the surface area will quadruple), and (b) the magnitude of E is given by an inverse-square law, so it decreases as 1/r². That explains why, if the surface S describes a sphere, the number we get from Gauss’ Law is independent of the radius of the sphere. The diagram below (credit goes to Wikipedia) illustrates the idea.

The diagram can be used to show how a field and its flux can be represented. Indeed, the lines represent the flux of E emanating from a charge. Now, the total number of flux lines depends on the charge but is constant with increasing distance because the force is radial and spherically symmetric. A greater density of flux lines (lines per unit area) means a stronger field, with the density of flux lines (i.e. the magnitude of E) following an inverse-square law indeed, because the surface area of a sphere increases with the square of the radius. Hence, in Gauss’ Law, the two effects cancel out: the two factors vary with distance, but their product is a constant.
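Gauss’ Law can also be checked with a brute-force surface integral. This sketch assumes a hypothetical 1 nC charge placed off-center at z = 0.3 m, sums E•n dA over spheres of several radii that all enclose the charge (midpoint rule in θ and φ), and compares each flux with q/ε₀. Moving the charge off-center shows that the result does not depend on the charge sitting at the center:

```python
import math

eps0 = 8.8541878128e-12          # electric constant ε₀, F/m (CODATA 2018)
K = 1 / (4 * math.pi * eps0)     # Coulomb constant
q = 1e-9                         # hypothetical charge, C
charge_pos = (0.0, 0.0, 0.3)     # off-center, but inside every sphere below

def flux_through_sphere(radius, n_theta=300, n_phi=64):
    """Midpoint-rule estimate of the flux of E through a sphere centered at the origin."""
    d_theta = math.pi / n_theta
    d_phi = 2 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        sin_t, cos_t = math.sin(theta), math.cos(theta)
        dA = radius**2 * sin_t * d_theta * d_phi             # area of one surface patch
        for k in range(n_phi):
            ph = (k + 0.5) * d_phi
            normal = (sin_t * math.cos(ph), sin_t * math.sin(ph), cos_t)  # outward unit normal
            point = [radius * nc for nc in normal]
            sep = [pc - cc for pc, cc in zip(point, charge_pos)]          # charge -> patch
            r = math.sqrt(sum(s * s for s in sep))
            E = [K * q * s / r**3 for s in sep]                            # Coulomb field
            total += sum(Ei * ni for Ei, ni in zip(E, normal)) * dA        # E•n dA
    return total

for radius in (0.5, 1.0, 4.0):
    print(radius, flux_through_sphere(radius), q / eps0)
```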

Now, if we describe the location of charges in terms of charge densities (ρ), then we can write Q_int as:

$$Q_{\text{int}} = \int_V \rho\, dV$$

Now, Gauss’ Law also applies to an infinitesimal cubical surface and, in one of my posts on vector calculus, I showed that the flux of E out of such a cube is given by ∇•E·dV. At this point, it’s probably a good idea to remind you of what this ‘new’ vector differential operator ∇•, i.e. our ‘divergence’ operator, stands for: the divergence of E (i.e. ∇• applied to E, so that’s ∇•E) represents the volume density of the flux of E out of an infinitesimal volume around a given point. Hence, it’s the flux per unit volume, as opposed to the flux out of the infinitesimal cube itself, which is the product of ∇•E and dV, i.e. ∇•E·dV.
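Here’s a small numerical check of the ‘flux per unit volume’ idea, as a Python sketch. The field E = (x², yz, z) is a hypothetical one, picked only because its divergence, 2x + z + 1, is easy to work out by hand. The flux through each face of the cube is approximated by the field at the centre of that face times the face area h². For this particular field that approximation happens to be exact (apart from rounding), so the ratio matches the divergence for every h; for a general field it only converges as h goes to zero.

```python
def E(x, y, z):
    """A hypothetical smooth field, chosen only because its divergence is easy:
    div E = ∂(x²)/∂x + ∂(yz)/∂y + ∂z/∂z = 2x + z + 1."""
    return (x * x, y * z, z)


def flux_out_of_cube(field, x0, y0, z0, h):
    """Approximate outward flux through the six faces of a cube of side h
    centred on (x0, y0, z0), using the field value at each face centre."""
    a = h / 2
    area = h * h
    flux = 0.0
    flux += (field(x0 + a, y0, z0)[0] - field(x0 - a, y0, z0)[0]) * area  # the two x-faces
    flux += (field(x0, y0 + a, z0)[1] - field(x0, y0 - a, z0)[1]) * area  # the two y-faces
    flux += (field(x0, y0, z0 + a)[2] - field(x0, y0, z0 - a)[2]) * area  # the two z-faces
    return flux


x0, y0, z0 = 1.0, 2.0, 3.0
for h in (0.1, 0.01, 0.001):
    dV = h**3
    print(h, flux_out_of_cube(E, x0, y0, z0, h) / dV)  # flux per unit volume
print("div E =", 2 * x0 + z0 + 1)
```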

So what? Well… Gauss’ Law applied to our infinitesimal volume gives us the following equality:

$$\nabla\cdot\mathbf{E}\, dV = \frac{\rho\, dV}{\epsilon_0}$$

That, in turn, simplifies to:

$$\nabla\cdot\mathbf{E} = \frac{\rho}{\epsilon_0}$$

So that’s Maxwell’s first equation once again, which is equivalent to our Poisson equation: because E = –∇Φ, we have ∇•E = –∇²Φ, so ∇•E = ρ/ε₀ says exactly the same as ∇²Φ = –ρ/ε₀. So what are we doing here? Just listing equivalent formulas? Yes. I should also note they can be derived from Coulomb’s law of force, which is probably the one you learned in high school. So… Yes. It’s all consistent. But then that’s what we should expect, of course. 🙂

The energy in a field

All these formulas look very abstract. It’s about time we use them for something. A lot of what’s written in Feynman’s Lectures on electrostatics is applied stuff indeed: it focuses, among other things, on calculating the potential in various circumstances and for various distributions of charge. Now, funnily enough, while that ∇•E = ρ/ε₀ equation is equivalent to Coulomb’s law and, obviously, much more compact to write down, Coulomb’s law is easier to start with for basic calculations. Let me first write Coulomb’s law. You’ll probably recognize it from your high school days:

$$\mathbf{F}_1 = \frac{q_1 q_2}{4\pi\epsilon_0 r_{12}^2}\,\mathbf{e}_{12} = -\mathbf{F}_2$$

F₁ is the force on charge q₁, and F₂ is the force on charge q₂. Now, q₁ and q₂ may attract or repel each other but, in both cases, the forces will be equal and opposite. [In case you wonder, yes, that’s basically the law of action and reaction.] The e₁₂ vector is the unit vector from q₂ to q₁, not from q₁ to q₂, as one might expect. That’s because we’re not talking gravity here: like charges do not attract but repel so, with e₁₂ pointing from q₂ to q₁, a positive product q₁q₂ gives a force F₁ that pushes q₁ away from q₂. Having said that, that’s basically the only peculiar thing about the equation. All the rest is standard:

The force is inversely proportional to the square of the distance and so we have an inverse-square law here indeed.
The force is proportional to the charge(s).
Finally, we have a proportionality constant, 1/4πε₀, which makes the units come out alright. You may wonder why it’s written the way it’s written, i.e. with that 4π factor, but that factor (4π or 2π) actually disappears in a number of calculations, so then we will be left with just a 1/ε₀ or a 1/2ε₀ factor. So don’t worry about it.
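The bookkeeping with e₁₂ is easy to get wrong, so here’s a short Python sketch of the vector form of Coulomb’s law. The charges and positions are hypothetical, K stands for 1/4πε₀ (with ε₀ taken as the CODATA 2018 value), and coulomb_force_on_1 is just a name I made up. Swapping the roles of the two charges gives F₂, and the two forces come out equal and opposite.

```python
import math

EPS0 = 8.8541878128e-12        # electric constant, F/m
K = 1 / (4 * math.pi * EPS0)   # the 1/4πε₀ factor, about 8.99e9 N·m²/C²


def coulomb_force_on_1(q1, r1, q2, r2):
    """Force on charge q1 at position r1 due to charge q2 at position r2.

    e12 is the unit vector pointing from q2 to q1, so a positive
    product q1*q2 (like charges) pushes q1 away from q2.
    """
    d = [a - b for a, b in zip(r1, r2)]        # vector from q2 to q1
    r12 = math.sqrt(sum(c * c for c in d))
    e12 = [c / r12 for c in d]
    magnitude = K * q1 * q2 / r12**2           # negative for unlike charges
    return [magnitude * c for c in e12]


q1, r1 = 1e-6, (0.0, 0.0, 0.0)     # hypothetical charges (C) and positions (m)
q2, r2 = -2e-6, (0.3, 0.4, 0.0)
F1 = coulomb_force_on_1(q1, r1, q2, r2)
F2 = coulomb_force_on_1(q2, r2, q1, r1)   # same formula with the roles swapped
print(F1, F2)
```

Because the charges have opposite signs here, F₁ points from q₁ toward q₂: attraction, as it should be.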

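We want to calculate potentials and all that, so the first thing we’ll do is calculate the force on a unit charge. So we’ll divide that equation by q₁, to calculate E(1) = F₁/q₁:

$$\mathbf{E}(1) = \frac{\mathbf{F}_1}{q_1} = \frac{q_2}{4\pi\epsilon_0 r_{12}^2}\,\mathbf{e}_{12}$$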

Piece of cake. But… What’s E(1) really? Well… It’s the force on the unit charge (+e), but so it doesn’t matter whether or not that unit charge is actually there, so it’s the field E caused by a charge q₂. [If that doesn’t make sense to you, think again.] So we can drop the subscripts and just write:

$$E = \frac{q}{4\pi\epsilon_0 r^2}$$

What a relief, isn’t it? The simplest formula ever: the magnitude of the field as a simple function of the charge q and its distance (r) from the point that we’re looking at, which we’ll write as P = (x, y, z). But what origin are we using to measure x, y and z? Don’t be surprised: the origin is q.

Now that’s a formula we can use in the Φ(P) integral. Indeed, the antiderivative is ∫(q/4πε₀r²)dr. Now, we can bring q/4πε₀ out and so we’re left with ∫(1/r²)dr. Now ∫(1/r²)dr is equal to –1/r + k, and so the whole antiderivative is –q/4πε₀r + k. However, the minus sign cancels out with the minus sign in front of the Φ(P) = Φ(x, y, z) integral, and so we get:

$$\Phi(P) = \Phi(x, y, z) = \frac{q}{4\pi\epsilon_0 r}$$

You should just do the integral to check this result. It’s the same integral but with P₀ (infinity) as point a and P as point b in the integral, so we have ∞ as start value and r as end value. The integral itself then yields –(q/4πε₀)(1/r – 1/∞). [The k constant falls away when we subtract the value of the antiderivative at ∞ from its value at r.] But 1/∞ = 0, and we had a minus sign in front of the integral, which cancels the minus sign in –q/4πε₀, so Φ(P) – Φ(P₀) = q/4πε₀r. With the potential at infinity taken to be zero, that’s the wonderfully simple result above. Also please do quickly check if it makes sense in terms of sign: the unit charge is +e, so that’s a positive charge. Hence, Φ(x, y, z) will be positive if the sign of q is also positive, but negative if q happens to be negative. So that’s OK.
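If you’d rather not do the integral by hand, here’s a crude numerical version in Python (a sketch; the charge, the distances, the step count and the function names are my own choices). A computer can’t integrate all the way in from infinity, so it integrates from r out to some large distance r_far with the midpoint rule and compares the result with (q/4πε₀)(1/r – 1/r_far). As r_far grows, the 1/r_far term goes to zero and we’re left with q/4πε₀r.

```python
import math

EPS0 = 8.8541878128e-12  # electric constant, F/m (CODATA 2018)


def E_point(q, r):
    """Field magnitude of a point charge q at distance r."""
    return q / (4 * math.pi * EPS0 * r**2)


def work_per_unit_charge(q, r, r_far, n=100_000):
    """Midpoint-rule estimate of ∫_r^r_far E dr', i.e. Φ(r) − Φ(r_far).

    This equals −∫_{r_far}^{r} E dr': the work per unit charge to bring a
    test charge in from r_far to r against the field.
    """
    h = (r_far - r) / n
    return sum(E_point(q, r + (k + 0.5) * h) for k in range(n)) * h


q, r = 1e-9, 1.0  # hypothetical 1 nC charge, potential evaluated at 1 m
for r_far in (10.0, 100.0, 1000.0):
    numeric = work_per_unit_charge(q, r, r_far)
    exact = q / (4 * math.pi * EPS0) * (1 / r - 1 / r_far)
    print(f"r_far = {r_far:6.0f} m  numeric = {numeric:.6f} V  exact = {exact:.6f} V")
print("q/(4*pi*eps0*r) =", q / (4 * math.pi * EPS0 * r), "V")
```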

Also note that the potential – which, remember, represents the amount of work to be done when bringing a unit charge (e) from infinity to some distance r from a charge q – is proportional to q. We also know that the force and, hence, the work is proportional to the charge that we are bringing in (that’s how we calculated the work per unit charge in the first place: by dividing the total amount of work by the charge). Hence, if we’d not bring some unit charge but some other charge q₂, the work done would also be proportional to q₂. Now, we need to make sure we understand what we’re writing and so let’s tidy up and re-label our first charge once again as q₁, and the distance r as r₁₂, because that’s what r is: the distance between the two charges. We then have another obvious but nice result: the work done in bringing two charges together from a large distance (infinity) is

$$U = \frac{q_1 q_2}{4\pi\epsilon_0 r_{12}}$$

Now, one of the many nice properties of fields (scalar or vector fields) and the associated energies (because that’s what we are talking about here) is that we can simply add up contributions. For example, if we’d have many charges and we’d want to calculate the potential Φ at a point which we call 1, we can use the same Φ(r) = q/4πε₀r formula which we had derived for one charge only, for all charges, and then we simply add the contributions of each to get the total potential:

$$\Phi(1) = \sum_{j=2}^{n} \frac{q_j}{4\pi\epsilon_0 r_{1j}}$$
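Superposition is easy to put into code. Here’s a minimal Python sketch: each charge is stored as a (q, position) pair, which is just a layout I chose for illustration, and the potential at a point is the sum of the q_j/4πε₀r₁ⱼ contributions. The three charges are hypothetical, and math.dist needs Python 3.8 or later.

```python
import math

EPS0 = 8.8541878128e-12        # electric constant, F/m
K = 1 / (4 * math.pi * EPS0)   # 1/4πε₀


def potential_at(point, charges):
    """Φ at `point` as the sum of q_j / (4πε₀ r_1j) over all source charges.

    `charges` is a list of (q, (x, y, z)) tuples.
    """
    total = 0.0
    for q, pos in charges:
        r = math.dist(point, pos)   # distance from the source charge to the point
        total += K * q / r
    return total


charges = [
    (1e-9, (1.0, 0.0, 0.0)),
    (-1e-9, (-1.0, 0.0, 0.0)),
    (2e-9, (0.0, 2.0, 0.0)),
]
print(potential_at((0.0, 0.0, 0.0), charges))
```

At the origin, the first two contributions cancel, so only the third charge is left over.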

Now that we’re here, I should, of course, also give the continuum version of this formula, i.e. the formula used when we’re talking charge densities rather than individual charges. The sum then becomes an infinite sum (i.e. an integral), and q_j (note that j goes from 2 to n) becomes the charge ρ(2)dV₂ in a small volume element dV₂ around a point which we call 2. We get:

$$\Phi(1) = \int_{\text{all space}} \frac{\rho(2)\, dV_2}{4\pi\epsilon_0 r_{12}}$$

Going back to the discrete situation, we get the same type of sum when bringing multiple pairs of charges q_i and q_j together. Hence, the total electrostatic energy U is the sum of the energies of all possible pairs of charges:

$$U = \sum_{\text{all pairs}} \frac{q_i q_j}{4\pi\epsilon_0 r_{ij}}$$

It’s been a while since you’ve seen any diagram or so, so let me insert one just to reassure you it’s as simple as that indeed:

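Now, we have to be aware of the risk of double-counting, of course. We should not be adding q_iq_j/4πε₀r_ij twice. That’s why we write ‘all pairs’ under the ∑ summation sign, instead of the usual i, j subscripts. The continuum version of this equation integrates over all points 1 and all points 2 and, hence, does count every pair twice, which is why it needs an explicit 1/2 factor:

$$U = \frac{1}{2}\int\!\!\int \frac{\rho(1)\,\rho(2)}{4\pi\epsilon_0 r_{12}}\, dV_1\, dV_2$$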
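To see the double-counting issue in practice, here’s a Python sketch that computes U for a few hypothetical point charges in two ways: once over distinct pairs only (itertools.combinations visits each pair exactly once), and once over all ordered (i, j) with i ≠ j, which visits every pair twice and therefore needs the 1/2 factor. The charges, positions and function names are made up for the example.

```python
import itertools
import math

EPS0 = 8.8541878128e-12        # electric constant, F/m
K = 1 / (4 * math.pi * EPS0)   # 1/4πε₀


def energy_all_pairs(charges):
    """U = Σ over distinct pairs of q_i q_j / (4πε₀ r_ij); each pair counted once."""
    U = 0.0
    for (qi, ri), (qj, rj) in itertools.combinations(charges, 2):
        U += K * qi * qj / math.dist(ri, rj)
    return U


def energy_half_double_sum(charges):
    """The same energy from the unrestricted double sum over i ≠ j.

    Every pair appears twice (as i, j and as j, i), hence the factor 1/2.
    """
    U = 0.0
    for i, (qi, ri) in enumerate(charges):
        for j, (qj, rj) in enumerate(charges):
            if i != j:
                U += K * qi * qj / math.dist(ri, rj)
    return U / 2


charges = [(1e-9, (0.0, 0.0, 0.0)), (1e-9, (1.0, 0.0, 0.0)), (-2e-9, (0.0, 1.0, 0.0))]
print(energy_all_pairs(charges), energy_half_double_sum(charges))
```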

Hmm… What kind of integral is that? It’s a so-called double integral because we integrate over two volume elements (dV₁ and dV₂) here. Not easy. However, there’s a lucky break. We can use the continuum version of our formula for Φ(1) to get rid of the ρ(2) and dV₂ variables and reduce the whole thing to a more standard ‘single’ integral. Indeed, we can write:

$$U = \frac{1}{2}\int \rho(1)\,\Phi(1)\, dV_1$$

Now, because our point (2) no longer appears, we can actually write that more elegantly as:

$$U = \frac{1}{2}\int \rho\,\Phi\, dV$$

That looks nice, doesn’t it? But do we understand it? Just to make sure, let me explain it. The potential energy of the charge ρdV is the product of this charge and the potential at the same point. The total energy is therefore the integral over ΦρdV, but then we are counting energies twice, so that’s why we need the 1/2 factor. Now, we can write this even more beautifully as:

$$U = \frac{\epsilon_0}{2}\int \mathbf{E}\cdot\mathbf{E}\, dV$$

Isn’t this wonderful? We have an expression for the energy of a field, not in terms of the charges or the charge distribution, but in terms of the field they produce.
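Here’s a numerical illustration of that claim, as a Python sketch, for a case we can check: a ball of radius R with a uniformly distributed charge Q. It computes U twice, once as ½∫ρΦ dV (only the inside of the ball contributes, because ρ is zero outside) and once as (ε₀/2)∫E² dV (where the field outside the ball contributes too), and compares both with the textbook result U = 3Q²/20πε₀R. The potential inside the ball, Φ(r) = Q(3 – r²/R²)/8πε₀R, and the field profile are standard results that I take as given here; Q, R, the step counts and the cut-off radius are arbitrary choices.

```python
import math

EPS0 = 8.8541878128e-12  # electric constant, F/m (CODATA 2018)


def energy_from_rho_phi(Q, R, n=20_000):
    """U = ½∫ρΦ dV for a ball of radius R carrying a uniform charge Q.

    ρ is zero outside the ball, so only the inside contributes. Inside, we use
    the standard result Φ(r) = Q(3 − r²/R²)/(8πε₀R), taken here as given.
    """
    rho = 3 * Q / (4 * math.pi * R**3)
    h = R / n
    total = 0.0
    for k in range(n):
        r = (k + 0.5) * h                                  # midpoint of shell k
        phi = Q * (3 - r**2 / R**2) / (8 * math.pi * EPS0 * R)
        total += rho * phi * 4 * math.pi * r**2 * h        # dV = 4πr² dr
    return total / 2


def field(Q, R, r):
    """|E| for the same ball: linear in r inside, inverse-square outside."""
    if r < R:
        return Q * r / (4 * math.pi * EPS0 * R**3)
    return Q / (4 * math.pi * EPS0 * r**2)


def energy_from_field(Q, R, reach=1e6, n=20_000):
    """U = (ε₀/2)∫E² dV over (almost) all space.

    Inside the ball we step uniformly in r; outside we step uniformly in
    t = ln(r/R) up to r = reach·R, so that dr = r dt. The tail beyond that
    radius, roughly a fraction 1/reach of the outside energy, is neglected.
    """
    def density_times_area(r):
        return EPS0 / 2 * field(Q, R, r) ** 2 * 4 * math.pi * r**2   # u·4πr²

    h = R / n
    inside = sum(density_times_area((k + 0.5) * h) for k in range(n)) * h
    dt = math.log(reach) / n
    outside = 0.0
    for k in range(n):
        r = R * math.exp((k + 0.5) * dt)
        outside += density_times_area(r) * r * dt
    return inside + outside


Q, R = 1e-9, 0.1   # hypothetical: 1 nC spread uniformly over a ball of radius 10 cm
exact = 3 * Q**2 / (20 * math.pi * EPS0 * R)
print(energy_from_rho_phi(Q, R), energy_from_field(Q, R), exact)
```

Note how the first integral only ‘sees’ the charge, while the second one also picks up the energy stored in the field outside the ball. The totals agree anyway.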

I am pretty sure that, by now, you must be suffering from ‘formula overload’, so you probably are just gazing at this without even bothering to try to understand. Too bad, and you should take a break then or just go do something else, like biking or so. 🙂

First, you should note that you know this E•E expression already: E•E is just the square of the magnitude of the field vector E, so E•E = E². That makes sense because we know, from what we know about waves, that the energy is always proportional to the square of an amplitude, and so we’re just writing the same here but with a little proportionality constant (ε₀).

OK, you’ll say. But you probably still wonder what use this formula could possibly have. What is that number we get from some integration over all space? So we associate the Universe with some number U and then what? Well… Isn’t that just nice? 🙂 Jokes aside, we’re actually looking at that E•E = E² product inside of the integral as representing an energy density (i.e. the energy per unit volume). We’ll denote that with a lower-case u symbol and so we write:

$$u = \frac{\epsilon_0}{2}\,\mathbf{E}\cdot\mathbf{E} = \frac{\epsilon_0}{2}E^2$$

Just to make sure you ‘get’ what we’re talking about here: u is the energy density in the little cube dV in the rather simplistic (and, therefore, extremely useful) illustration below (which, just like most of what I write above, I got from Feynman).

Now that should make sense to you, I hope. 🙂 In any case, if you’re still with me, and if you’re not all formula-ed out, you may wonder how we get that ε₀E•E = ε₀E² expression from that ρΦ expression. Of course, you know that E = –∇Φ, and we also have the Poisson equation ∇²Φ = –ρ/ε₀, but that doesn’t get you very far. It’s one of those examples where an easy-looking formula requires a lot of gymnastics. However, as the objective of this post is to do some of that, let me take you through the derivation.

Let’s do something with that Poisson equation first, so we’ll re-write it as ρ = –ε₀∇²Φ, and then we can substitute this for ρ in the integral of the ρΦ product. So we get:

$$U = \frac{1}{2}\int \rho\,\Phi\, dV = -\frac{\epsilon_0}{2}\int \Phi\,\nabla^2\Phi\, dV$$

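Now, you should check out those fancy formulas with our new vector differential operators which we listed in our second class on vector calculus, but, unfortunately, none of them apply. So we have to write it all out and see what we get:

$$\Phi\,\nabla^2\Phi = \sum_{i = x, y, z}\left[\frac{\partial}{\partial i}\left(\Phi\,\frac{\partial\Phi}{\partial i}\right) - \left(\frac{\partial\Phi}{\partial i}\right)^{2}\right] = \nabla\cdot(\Phi\nabla\Phi) - \nabla\Phi\cdot\nabla\Phi$$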
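If you don’t trust the algebra, you can check the identity numerically. Here’s a Python sketch that uses central finite differences (step H, an arbitrary small number) on a made-up smooth function Φ and compares Φ∇²Φ with ∇•(Φ∇Φ) – ∇Φ•∇Φ at one point. It only shows agreement up to the finite-difference error, of course, so it’s a sanity check rather than a proof.

```python
import math

H = 1e-4  # finite-difference step (an arbitrary small choice)


def phi(x, y, z):
    """A hypothetical smooth potential, used only to test the identity."""
    return math.exp(-(x * x + 2 * y * y + 3 * z * z)) + x * y * z


def partial(f, i, p):
    """Central-difference estimate of ∂f/∂x_i at point p (i = 0, 1, 2)."""
    plus, minus = list(p), list(p)
    plus[i] += H
    minus[i] -= H
    return (f(*plus) - f(*minus)) / (2 * H)


def laplacian(f, p):
    """Sum of central second differences along x, y and z."""
    total = 0.0
    for i in range(3):
        plus, minus = list(p), list(p)
        plus[i] += H
        minus[i] -= H
        total += (f(*plus) - 2 * f(*p) + f(*minus)) / H**2
    return total


def div_phi_grad_phi(p):
    """∇·(Φ∇Φ): differentiate each component Φ·∂Φ/∂x_i once more along x_i."""
    total = 0.0
    for i in range(3):
        def component(x, y, z, i=i):
            return phi(x, y, z) * partial(phi, i, (x, y, z))
        total += partial(component, i, p)
    return total


p = (0.3, -0.2, 0.5)
lhs = phi(*p) * laplacian(phi, p)
grad = [partial(phi, i, p) for i in range(3)]
rhs = div_phi_grad_phi(p) - sum(g * g for g in grad)
print(lhs, rhs)  # agree up to finite-difference error
```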

Now that looks horrendous and so you’ll surely think we won’t get anywhere with that. Well… Physicists don’t despair as easily as we do, it seems, and so they do substitute it in the integral which, of course, becomes an even more monstrous expression, because we now have two volume integrals instead of one! Indeed, we get:

$$U = \frac{\epsilon_0}{2}\int \nabla\Phi\cdot\nabla\Phi\, dV - \frac{\epsilon_0}{2}\int \nabla\cdot(\Phi\nabla\Phi)\, dV$$

But if ∇Φ is a vector field (it’s minus E, remember!), then Φ∇Φ is a vector field too, and we can then apply Gauss’ Theorem, which we mentioned in our first class on vector calculus, and which – mind you! – has nothing to do with Gauss’ Law. Indeed, Gauss produced so much it’s difficult to keep track of it all. 🙂 So let me remind you of this theorem. [Φ∇Φ is a vector field because, at every point, it’s just the vector ∇Φ multiplied by the scalar Φ.] Gauss’ Theorem basically shows how we can go from a volume integral to a surface integral:

$$\int_V \nabla\cdot\mathbf{C}\, dV = \oint_S \mathbf{C}\cdot\mathbf{n}\, da$$

If we apply this to the second integral in our U expression, we get:

$$\int_V \nabla\cdot(\Phi\nabla\Phi)\, dV = \oint_S \Phi\,\nabla\Phi\cdot\mathbf{n}\, da$$

So what? Where are we going with this? Relax. Be patient. What volume and surface are we talking about here? To make sure we have all charges and influences, we should integrate over all space and, hence, the surface goes to infinity. So we’re talking a (spherical) surface of enormous radius R whose center is the origin of our coordinate system. I know that sounds ridiculous but, from a math point of view, it is just the same as bringing a charge in from infinity, which is what we did to calculate the potential. So if we don’t have difficulty with infinite line integrals, we should not have difficulty with infinite surfaces and infinite volumes either. That’s all I can say, so… Well… Let’s do it.

Let’s look at that product Φ∇Φ•n in the surface integral. Φ is a scalar and ∇Φ is a vector, and so… Well… ∇Φ•n is a scalar too: it’s the normal component of ∇Φ = –E. [Just to make sure: n is the outward unit normal used in Gauss’ Theorem, so ∇Φ•n = |∇Φ||n|cosθ = |∇Φ|cosθ, with θ the angle between ∇Φ = –E and n. Its sign depends on the charges: for a positive net charge, E points outward, so ∇Φ = –E points inward and ∇Φ•n is negative. The sign doesn’t matter for what follows, though: all we need to know is how fast its magnitude falls off.]

So, we have a product of two scalars here. What happens with them if R goes to infinity? Well… The potential varies as 1/r as we’re going to infinity. That’s obvious from that Φ = (q/4πε₀)(1/r) formula: just think of q as the total charge now, which works because we assume all charges are located within some finite distance, while we’re going to infinity. What about ∇Φ•n? Well… Again assuming that we’re reasonably far away from the charges, we’re talking the density of flux lines here (i.e. the magnitude of E) which, as shown above, follows an inverse-square law, because the surface area of a sphere increases with the square of the radius. So ∇Φ•n varies not as 1/r but as 1/r². To make a long story short, the whole product Φ∇Φ•n falls off as 1/r³ as r goes to infinity. Now, we shouldn’t forget we’re integrating a surface integral here, with r = R, and so it’s R going to infinity. The area of the sphere only grows as R², so the surface integral as a whole falls off as R²/R³ = 1/R. So that surface integral has to go to zero when we include all space. The volume integral still stands however, so our formula for U now consists of one term only, i.e. the volume integral, and so we now have:

$$U = \frac{\epsilon_0}{2}\int \nabla\Phi\cdot\nabla\Phi\, dV = \frac{\epsilon_0}{2}\int \mathbf{E}\cdot\mathbf{E}\, dV$$

Done!
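Here’s a quick numerical illustration of why the surface term disappears, as a Python sketch. Far away, it treats the charges as a single point charge q at the origin (the same approximation as in the argument above), so Φ and the normal component of ∇Φ are constant on the sphere and the surface integral is just their product times 4πR². It only looks at the magnitude, because the argument only needs to know how fast the term shrinks. The charge value and the function name are my own.

```python
import math

EPS0 = 8.8541878128e-12  # electric constant, F/m


def surface_term_magnitude(q, R):
    """|∮ Φ ∇Φ·n da| over a sphere of radius R, far from a total charge q.

    Far away, the charges look like a single point charge q at the centre,
    so Φ = q/(4πε₀R) and |∇Φ·n| = |E| = |q|/(4πε₀R²) are constant on the
    sphere, and the integral is just their product times the area 4πR².
    """
    phi = q / (4 * math.pi * EPS0 * R)
    grad_n = q / (4 * math.pi * EPS0 * R**2)
    area = 4 * math.pi * R**2
    return abs(phi * grad_n) * area


q = 1e-9  # hypothetical total charge
for R in (1.0, 10.0, 100.0, 1000.0):
    print(f"R = {R:7.1f} m   surface term = {surface_term_magnitude(q, R):.3e}")
```

Each time R grows by a factor of ten, the surface term shrinks by a factor of ten: the 1/R fall-off from the argument above.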

What’s left?

In electrostatics? Lots. Electric dipoles (like polar molecules), electrolytes, plasma oscillations, ionic crystals, electricity in the atmosphere (like lightning!), dielectrics and polarization (including condensers), ferroelectricity,… As soon as we try to apply our theory to matter, things become hugely complicated. But the theory works. Fortunately! 🙂 I have to refer you to textbooks, though, in case you’d want to know more about it. [I am sure you don’t, but then one never knows.]

What I wanted to do is to give you some feel for those vector and field equations in the electrostatic case. We now need to bring the magnetic field back into the picture and, most importantly, move to electrodynamics, in which the electric and magnetic fields do not appear as completely separate things. No! In electrodynamics, they are fully interconnected through the time derivatives ∂E/∂t and ∂B/∂t. That shows they’re part and parcel of the same thing really: electromagnetism.

But we’ll try to tackle that in future posts. Goodbye for now!

The wave-particle duality revisited

Pre-script (dated 26 June 2020): This post has become less relevant (even irrelevant, perhaps) because my views on all things quantum-mechanical have evolved significantly as a result of my progression towards a more complete realist (classical) interpretation of quantum physics. I keep blog posts like these mainly because I want to keep track of where I came from. I might review them one day, but I currently don’t have the time or energy for it. 🙂

Original post:

As an economist, having some knowledge of what’s around in my field (social science), I think I am well-placed to say that physics is not an easy science. Its ‘first principles’ are complicated, and I am not ashamed to say that, after more than a year of study now, I haven’t reached what I would call a ‘true understanding’ of it.

Sometimes, the teachers are to be blamed. For example, I just found out that, in regard to the question of the wave function of a photon, the answer of two nuclear scientists was plain wrong. Photons do have a de Broglie wave, and there is a fair amount of research and actual experimenting going on trying to measure it. One scientific article which I liked in particular, and which I hope to fully understand a year from now or so, is on such ‘direct measurement of the (quantum) wavefunction’. For me, it drove home the message that these idealized ‘thought experiments’ that are supposed to make autodidacts like me understand things better are surely instructive in regard to the key point, but confusing in other respects.

A typical example of such an idealized thought experiment is the double-slit experiment with ‘special detectors’ near the slits, which may or may not detect a photon. Depending on whether or not the detectors are switched on, and on their accuracy, we get full interference (a), no interference (b), or a mixture of (a) and (b), as shown in (c) and (d).

I took the illustrations from Feynman’s lovely little book, QED – The Strange Theory of Light and Matter, and he surely knows what he’s talking about. Having said that, the set-up raises a key question in regard to these detectors: how do they work, exactly? More importantly, how do they disturb the photons?

I googled for actual double-slit experiments with such ‘special detectors’ near the slits, but only found such experiments for electrons. One of these, a 2010 experiment of an Italian team, suggests that it’s the interaction between the detector and the electron wave that may cause the interference pattern to disappear. The idea is shown below. The electron is depicted as an incoming plane wave, which breaks up as it goes through the slits. The slit on the left has no ‘filter’ (which you may think of as a detector) and, hence, the plane wave goes through as a cylindrical wave. The slit on the right-hand side is covered by a ‘filter’ made of several layers of ‘low atomic number material’, so the electron goes through but, at the same time, the barrier creates a spherical wave as it goes through. The researchers note that “the spherical and cylindrical wave do not have any phase correlation, and so even if an electron passed through both slits, the two different waves that come out cannot create an interference pattern on the wall behind them.” [Needless to say, while being represented as ‘real’ waves here, the ‘waves’ are, in fact, complex-valued psi functions.]

In fact, to be precise, there actually still was an interference effect if the filter was thin enough. Let me quote the reason for that: “The thicker the filter, the greater the probability for inelastic scattering. When the electron suffers inelastic scattering, it is localized. This means that its wavefunction collapses and, after the measurement act, it propagates roughly as a spherical wave from the region of interaction, with no phase relation at all with other elastically or inelastically scattered electrons. If the filter is made thick enough, the interference effects cancels out almost completely.”

This, of course, doesn’t solve the mystery. The mystery, in such experiments, is that, when we put detectors near the slits, it is either the detector at A or the detector at B that goes off. They should never go off together or, as Feynman puts it, “at half strength, perhaps?” That’s why I used italics when writing “even if an electron passed through both slits.” The electron, or the photon in a similar set-up, is not supposed to do that. As mentioned above, the wavefunction collapses or reduces. Now that’s where these so-called ‘weak measurement’ experiments come in: they indicate the interaction doesn’t have to be that way. It’s not all or nothing: our observations need not destroy the wavefunction. So, who knows, perhaps we will be able, one day, to show that the wavefunction does go through both slits, as it should (otherwise the interference pattern cannot be explained), and then we will have resolved the paradox.

I am pretty sure that, when that’s done, physicists will also be able to relate the image of a photon as a transient electromagnetic wave (first diagram below), being emitted by an atomic oscillator for a few tens of nanoseconds only (we gave the example for sodium light, for which the decay time was 3.2×10⁻⁸ seconds), with the image of a photon as a de Broglie wave (second diagram below). I look forward to that day. I think it will come soon.

Spin

Pre-script (dated 26 June 2020): This post has become less relevant (even irrelevant, perhaps) because my views on all things quantum-mechanical have evolved significantly as a result of my progression towards a more complete realist (classical) interpretation of quantum physics. In addition, some of the material was removed by a dark force (that also created problems with the layout, I see now). In any case, we recommend you read our recent papers. I keep blog posts like these mainly because I want to keep track of where I came from. I might review them one day, but I currently don’t have the time or energy for it. 🙂

Original post:

In the previous posts, I showed how the ‘real-world’ properties of photons and electrons emerge out of very simple mathematical notions and shapes. The basic notions are time and space. The shape is the wavefunction.

Let’s recall the story once again. Space is an infinite number of three-dimensional points (x, y, z), and time is a stopwatch hand going round and round: a cyclical thing. All points in space are connected by an infinite number of paths – straight or crooked, whatever – of which we measure the length. And then we have ‘photons’ that move from A to B, but so we don’t know what is actually moving in space here. We just associate each and every possible path (in spacetime) between A and B with an amplitude: an ‘arrow’ whose length and direction depend on (1) the length of the path l (i.e. the ‘distance’ in space measured along the path, be it straight or crooked), and (2) the difference in time between the departure (at point A) and the arrival (at point B) of our photon (i.e. the ‘distance in time’ as measured by that stopwatch hand).

Now, in quantum theory, anything is possible and, hence, not only do we allow for crooked paths, but we also allow for the difference in time to differ from l/c. Hence, our photon may actually travel slower or faster than the speed of light c! There is one lucky break, however, that makes it all come out alright: the arrows associated with the odd paths and strange timings cancel each other out. Hence, what remains are the nearby paths in spacetime only, i.e. the ‘light-like’ intervals: a small core of space which our photon effectively uses as it travels through empty space. And when it encounters an obstacle, like a sheet of glass, it may or may not interact with the other elementary particle: the electron. And then we multiply and add the arrows – or amplitudes as we call them – to arrive at a final arrow, whose square is what physicists want to find, i.e. the likelihood of the event that we are analyzing (such a photon going from point A to B, in empty space, through two slits, or through a sheet of glass, for example) effectively happening.

The combining of arrows leads to diffraction, refraction or – to use the more general description of what’s going on – interference patterns:

Adding two identical arrows that are ‘lined up’ yields a final arrow with twice the length of either arrow alone and, hence, a square (i.e. a probability) that is four times as large. This is referred to as ‘positive’ or ‘constructive’ interference.
Two arrows of the same length but with opposite direction cancel each other out and, hence, yield zero: that’s ‘negative’ or ‘destructive’ interference.
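The arrow arithmetic is easy to reproduce with complex numbers, which is what the arrows stand for. Here’s a minimal Python sketch: cmath.rect(length, angle) builds an arrow, we add the arrows, and the probability is the squared length of the final arrow. The length and angle of the example arrow are arbitrary.

```python
import cmath


def probability(*arrows):
    """Add the arrows (complex amplitudes) and return the squared length of the result."""
    final = sum(arrows)
    return abs(final) ** 2


a = cmath.rect(0.5, cmath.pi / 3)   # hypothetical arrow: length 0.5, angle 60°
same = a                            # an identical, lined-up arrow
opposite = -a                       # same length, opposite direction

print(probability(a))               # one arrow alone
print(probability(a, same))         # constructive: four times as large
print(probability(a, opposite))     # destructive: zero
```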

Both photons and electrons are represented by wavefunctions, whose argument is the position in space (x, y, z) and time (t), and whose value is an amplitude or ‘arrow’ indeed, with a specific direction and length. But here we get a bifurcation. When photons interact with each other, their wavefunctions interact just like amplitudes: we simply add them. However, when electrons interact with each other, we have to apply a different rule: we’ll take a difference. Indeed, anything is possible in quantum mechanics and so we combine arrows (or amplitudes, or wavefunctions) in two different ways: we can either add them or, as shown below, subtract one from the other.

There are actually four distinct logical possibilities, because we may also change the order of A and B in the operation, but when calculating probabilities, all we need is the square of the final arrow, so we’re interested in its final length only, not in its direction (unless we want to use that arrow in yet another calculation). And so… Well… The fundamental duality in Nature between light and matter is based on this dichotomy only: identical (elementary) particles behave in one of two ways: their wavefunctions interfere either constructively or destructively, and that’s what distinguishes bosons (i.e. force-carrying particles, such as photons) from fermions (i.e. matter-particles, such as electrons). The mathematical description is complete and respects Occam’s Razor. There is no redundancy. One cannot further simplify: every logical possibility in the mathematical description reflects a physical possibility in the real world.
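The add-or-subtract rule can be written down in a few lines of Python. This is only a sketch: the two amplitudes are arbitrary complex numbers, and calling them ‘direct’ and ‘exchanged’ (particle 1 going to one place and particle 2 to the other, or the other way around) is my own labeling. It shows that swapping the order of the two amplitudes never changes the probability, so the four logical possibilities give only two outcomes, and that the fermion rule gives zero when the two amplitudes are the same.

```python
import cmath


def combine(amp_direct, amp_exchanged, kind):
    """Combine the 'direct' and 'exchanged' amplitudes for two identical particles.

    Bosons (e.g. photons): add them. Fermions (e.g. electrons): subtract them.
    Returns the probability, i.e. the squared length of the final arrow.
    """
    if kind == "boson":
        final = amp_direct + amp_exchanged
    elif kind == "fermion":
        final = amp_direct - amp_exchanged
    else:
        raise ValueError("kind must be 'boson' or 'fermion'")
    return abs(final) ** 2


a = cmath.rect(0.6, 0.4)   # hypothetical amplitudes (length, angle in radians)
b = cmath.rect(0.3, 1.9)

# Swapping the order of a and b at most flips the final arrow, so the
# probability (its squared length) is unchanged: four cases, two outcomes.
print(combine(a, b, "boson"), combine(b, a, "boson"))
print(combine(a, b, "fermion"), combine(b, a, "fermion"))
# When the two amplitudes are identical, the fermion arrows cancel completely.
print(combine(a, a, "boson"), combine(a, a, "fermion"))
```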

Having said that, there is more to an electron than just Fermi-Dirac statistics, of course. What about its charge, and this weird number, its spin?

Well… That’s what this post is about. As Feynman puts it: “So far we have been considering only spin-zero electrons and photons, fake electrons and fake photons.”

I wouldn’t call them ‘fake’, because they do behave like real photons and electrons already but… Yes. We can make them more ‘real’ by including charge and spin in the discussion. Let’s go for it.

Charge and spin

From what I wrote above, it’s clear that the dichotomy between bosons and fermions (i.e. between ‘matter-particles’ and ‘force-carriers’ or, to put it simply, between light and matter) is not based on the (electric) charge. It’s true we cannot pile atoms or molecules on top of each other because of the repulsive forces between the electron clouds—but it’s not impossible, as nuclear fusion proves: nuclear fusion is possible because the electrostatic repulsive force can be overcome, and then the nuclear force is much stronger (and, remember, no quarks are being destroyed or created: all nuclear energy that’s being released or used is nuclear binding energy).

It’s also true that the force-carriers we know best, notably photons and gluons, do not carry any (electric) charge, as shown in the table below. So that’s another reason why we might, mistakenly, think that charge somehow defines matter-particles. However, we can see that matter-particles, first, carry very different charges (positive or negative, and with very different values: 1/3, 2/3 or 1) and can even be neutral, like the neutrinos. So, if there’s a relation, it’s very complex. In addition, one of the two force-carriers for the weak force, the W boson, can have positive or negative charge too, so that doesn’t make sense, does it? [I admit the weak force is a bit of a ‘special’ case, and so I should leave it out of the analysis.] The point is: the electric charge is what it is, but it’s not what defines matter. It’s just one of the possible charges that matter-particles can carry. [The other charge, as you know, is the color charge but, to confuse the picture once again, that’s a charge that can also be carried by gluons, i.e. the carriers of the strong force.]

So what is it, then? Well… From the table above, you can see that the property of ‘spin’ (i.e. the third number in the top left-hand corner) matches the above-mentioned dichotomy in behavior, i.e. the two different types of interference (bosons versus fermions or, to use a heavier term, Bose-Einstein statistics versus Fermi-Dirac statistics): all matter-particles are so-called spin-1/2 particles, while all force-carriers (gauge bosons) have spin one. [Never mind the Higgs particle: that’s ‘just’ a mechanism to give (most) elementary particles some mass.]

So why is that? Why are matter-particles spin-1/2 particles and force-carriers spin-1 particles? To answer that question, we need to answer the question: what’s this spin number? And to answer that question, we first need to answer the question: what’s spin?

Spin in the classical world

In the classical world, it’s, quite simply, the momentum associated with a spinning or rotating object, which is referred to as the angular momentum. We’ve analyzed the math involved in another post, and so I won’t dwell on that here, but you should note that, in classical mechanics, we distinguish two types of angular momentum:

Orbital angular momentum: that’s the angular momentum an object gets from circling in an orbit, like the Earth around the Sun.
Spin angular momentum: that’s the angular momentum an object gets from spinning around its own axis, just like the Earth, in addition to rotating around the Sun, is rotating around its own axis (which is what causes day and night, as you know).

The math involved in both is pretty similar, but it’s still useful to distinguish the two, if only because we’ll distinguish them in quantum mechanics too! Indeed, when I analyzed the math in the above-mentioned post, I showed how we represent angular momentum by a vector that’s perpendicular to the plane of rotation, with its direction given by the ubiquitous right-hand rule, as in the illustration below, which shows both the angular momentum (L) and the torque (τ) for a rotating mass. The formulas are given too: the angular momentum L is the vector cross product of the position vector r and the linear momentum p, while the magnitude of the torque τ is given by the product of the length of the lever arm and the applied force. An alternative approach is to define the angular velocity ω and the moment of inertia I, and we get the same result: L = Iω.
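Here’s a small Python sketch of the L = r×p formula for a point mass going around in a circle in the xy-plane. The mass, radius, angular velocity and position along the orbit are arbitrary numbers. The point is that the cross product points along the z-axis (the right-hand rule, for counterclockwise motion) and that its length equals Iω, with I = mr² for a point mass.

```python
import math


def cross(u, v):
    """Vector cross product u × v."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


m, r, omega = 2.0, 1.5, 3.0   # hypothetical mass (kg), radius (m), angular velocity (rad/s)
theta = 0.7                   # any point along the circular orbit in the xy-plane

position = (r * math.cos(theta), r * math.sin(theta), 0.0)
velocity = (-r * omega * math.sin(theta), r * omega * math.cos(theta), 0.0)  # tangential, counterclockwise
momentum = tuple(m * c for c in velocity)

L = cross(position, momentum)  # L = r × p
I = m * r**2                   # moment of inertia of a point mass
print(L, I * omega)            # L points along +z, with magnitude Iω
```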

Of course, the illustration above shows orbital angular momentum only and, as you know, we no longer have a ‘planetary model’ (aka the Rutherford model) of an atom. So should we be looking at spin angular momentum only?

Well… Yes and no. More yes than no, actually. But it’s ambiguous. In addition, the analogy between the concept of spin in quantum mechanics, and the concept of spin in classical mechanics, is somewhat less than straightforward. Well… It’s not straightforward at all actually. But let’s get on with it and use more precise language. Let’s first explore it for light, not because it’s easier (it isn’t) but… Well… Just because. 🙂

The spin of a photon

I talked about the polarization of light in previous posts (see, for example, my post on vector analysis): when we analyze light as a traveling electromagnetic wave (so we’re still in the classical analysis here, not talking about photons as ‘light particles’), we know that the electric field vector oscillates up and down and is, in fact, likely to rotate in the xy-plane (with z being the direction of propagation). The illustration below shows the idealized (aka limiting) case of perfectly circular polarization: if there’s polarization, it is more likely to be elliptical. The other limiting case is plane polarization: in that case, the electric field vector just goes up and down in one direction only. [In case you wonder whether ‘real’ light is polarized, it often is: there’s an easy article on that on the Physics Classroom site.]

The illustration above uses Dirac’s bra-ket notation |L〉 and |R〉 to distinguish the two possible ‘states’, which are left- or right-handed polarization respectively. In case you forgot about bra-ket notations, let me quickly remind you: an amplitude is usually denoted by 〈x|s〉, in which 〈x| is the so-called ‘bra’, i.e. the final condition, and |s〉 is the so-called ‘ket’, i.e. the starting condition, so 〈x|s〉 could mean: a photon leaves at s (from source) and arrives at x. It doesn’t matter much here. We could have used any notation, as we’re just describing some state, which is either |L〉 (left-handed polarization) or |R〉 (right-handed polarization). The more intriguing extras in the illustration above, besides the formulas, are the values: ± ħ = ±h/2π. So that’s plus or minus the (reduced) Planck constant which, as you know, is a very tiny constant. I’ll come back to that. So what exactly is being represented here?

At first, you’ll agree it looks very much like the momentum of light (p) which, in a previous post, we calculated from the (average) energy (E) as p = E/c. Now, we know that E is related to the (angular) frequency of the light through the Planck-Einstein relation E = hν = ħω. Now, ω is the speed of light (c) times the wave number (k), so we can write: p = E/c = ħω/c = ħck/c = ħk. The wave number is the ‘spatial frequency’, expressed either in cycles per unit distance (1/λ) or, more usually, in radians per unit distance (k = 2π/λ), so we can also write p = ħk = h/λ. Whatever way we write it, we find that this momentum (p) depends on the energy or, what amounts to the same, on the frequency or the wavelength of the light.
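A quick numerical check of the chain p = E/c = ħk = h/λ may help here. This is a minimal Python sketch; the 600 nm wavelength is only an illustrative input (roughly the sodium-light value used further below), and the constants are the exact SI values of h and c.

```python
import math

# Exact SI values
h = 6.62607015e-34          # Planck constant, J·s
c = 299_792_458.0           # speed of light, m/s
hbar = h / (2 * math.pi)    # reduced Planck constant, J·s

def photon_momentum_routes(wavelength_m):
    """Return the photon momentum computed three ways: E/c, hbar*k and h/lambda."""
    nu = c / wavelength_m               # frequency, cycles per second
    omega = 2 * math.pi * nu            # angular frequency, rad/s
    k = 2 * math.pi / wavelength_m      # wave number, rad/m
    E = hbar * omega                    # Planck-Einstein relation, J
    return E / c, hbar * k, h / wavelength_m

p1, p2, p3 = photon_momentum_routes(600e-9)
print(p1, p2, p3)   # all three agree, about 1.1e-27 kg·m/s
```

Doubling the wavelength halves all three values, which is the point of the paragraph: this momentum always carries the wavelength (or energy) with it, unlike the bare ±ħ in the illustration.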

So… Well… The momentum of light is not just h or ħ, i.e. what’s written in that illustration above. So it must be something different. In addition, I should remind you this momentum was calculated from the magnetic field vector, as shown below (for more details, see my post on vector calculus once again), so it had nothing to do with polarization really.

Finally, last but not least, the dimensions of ħ and p = h/λ are also different (when one is confused, it’s always good to do a dimensional analysis in physics):

The dimension of Planck’s constant (both h as well as ħ = h/2π) is energy multiplied by time (J·s or eV·s) or, equivalently, momentum multiplied by distance. It’s referred to as the dimension of action in physics, and h is, effectively, the so-called quantum of action.
The dimension of (linear) momentum is… Well… Let me think… Mass times velocity (mv)… But what’s the mass in this case? Light doesn’t have any mass. However, we can use the mass-energy equivalence: 1 eV/c² = 1.7826×10⁻³⁶ kg. [10⁻³⁶? Well… Yes. An electronvolt is a very tiny measure of energy.] So, using the eV (strictly speaking, eV/c²) as a unit of mass, we can express p in eV·m/s units.

Hmm… We can check: momentum times distance gives us the dimension of Planck’s constant again: (eV/c²)·(m/s)·m = eV·s, because c² has the dimension m²/s². OK. That’s good… […] But… Well… All of this nonsense doesn’t make us much smarter, does it? 🙂 Well… It may or may not be more useful to note that the dimension of action is, effectively, the same as the dimension of angular momentum. Huh? Why? Well… From our classical L = r×p formula, we find L should be expressed in m·(eV·m/s) = eV·m²/s units, so that’s… What? Well… Here we need to use a little trick and re-express our ‘eV’ mass unit in kilogram. We can then write L in kg·m²/s units and, because 1 newton (N) is 1 kg⋅m/s², the kg·m²/s unit is equivalent to the N·m·s = J·s unit. Done!
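The dimensional bookkeeping above can be mechanized. Here is a minimal sketch that tracks only the exponents of the SI base units kilogram, meter and second; the helper names `mul` and `inv` are made up for this illustration.

```python
# A unit is a tuple of exponents of (kg, m, s).
# Multiplying units adds exponents; inverting a unit negates them.
def mul(*units):
    return tuple(sum(exps) for exps in zip(*units))

def inv(unit):
    return tuple(-e for e in unit)

KG = (1, 0, 0)
M = (0, 1, 0)
S = (0, 0, 1)

velocity = mul(M, inv(S))                  # m/s
momentum = mul(KG, velocity)               # kg·m/s
newton = mul(KG, M, inv(S), inv(S))        # kg·m/s²
joule = mul(newton, M)                     # N·m
action = mul(joule, S)                     # J·s, the dimension of h
angular_momentum = mul(M, momentum)        # |r × p|: m·kg·m/s

print(action == mul(momentum, M))          # True: momentum × distance
print(action == angular_momentum)          # True: same as angular momentum
print(action == mul(newton, M, S))         # True: N·m·s
```

All three comparisons reduce to the same exponent tuple (1, 2, −1), i.e. kg·m²/s.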

Having said that, all of this still doesn’t answer the question: are the linear momentum of light, i.e. our p, and those two angular momentum ‘states’, |L〉 and |R〉, related? Can we relate |L〉 and |R〉 to that L = r×p formula?

The answer is simple: no. The |L〉 and |R〉 states represent spin angular momentum indeed, while the angular momentum we would derive from the linear momentum of light using that L = r×p is orbital angular momentum. Let’s introduce the proper symbols: orbital angular momentum is denoted by L, while spin angular momentum is denoted by S. And then the total angular momentum is, quite simply, J = L + S.

L and S can both be calculated using either a vector cross product r × p (but using different values for r and p, of course) or, alternatively, using the moment of inertia tensor I and the angular velocity ω. The illustrations below (which I took from Wikipedia) show how, and also how L and S are added to yield J = L + S.

So what? Well… Nothing much. The illustration above shows that the analysis – which is entirely classical, so far – is pretty complicated. [You should note, for example, that in the S = Iω and L = Iω formulas, we don’t use the simple (scalar) moment of inertia but the moment of inertia tensor (so that’s a matrix denoted by I, instead of the scalar I), because S (or L) and ω are not necessarily pointing in the same direction.]
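To see why the tensor matters, here is a minimal sketch computing L = Iω for a body whose principal moments of inertia are unequal. The diagonal values are hypothetical, chosen only to make the point: the resulting L is not parallel to ω.

```python
def mat_vec(matrix, vec):
    """Multiply a 3×3 matrix (list of rows) by a 3-vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def cross(a, b):
    """Vector cross product a × b."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

# Hypothetical inertia tensor (kg·m²), principal axes along x, y, z
inertia = [[1.0, 0.0, 0.0],
           [0.0, 2.0, 0.0],
           [0.0, 0.0, 3.0]]

omega = [1.0, 1.0, 0.0]        # rad/s: spinning about a tilted axis
L = mat_vec(inertia, omega)    # [1.0, 2.0, 0.0]

print(L, cross(L, omega))      # a nonzero cross product means L is not parallel to omega
```

With an isotropic tensor (all three diagonal entries equal) the cross product would vanish, and L = Iω would reduce to the familiar scalar formula.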

By now, you’re probably very confused and wondering what’s wiggling really. The answer for the orbital angular momentum is: it’s the linear momentum vector p. Now…

Hey! Stop! Why would that vector wiggle?

You’re right. Perhaps it doesn’t. The linear momentum p is supposed to be directed in the direction of travel of the wave, isn’t it? It is. In vector notation, we have p = ħk, and that k vector (i.e. the wavevector) points in the direction of travel of the wave indeed and so… Well… No. It’s not that simple. The wave vector is perpendicular to the surfaces of constant phase, i.e. the so-called wave fronts, as shown in the illustration below (see the direction of e_k, which is a unit vector in the direction of k).

So, yes, if we’re analyzing light moving in a straight one-dimensional line only, or we’re talking about a plane wave, as illustrated below, then the orbital angular momentum vanishes.

But the orbital angular momentum L does not vanish when we’re looking at a real light beam, like the ones below. Real waves? Well… OK… The ones below are idealized wave shapes as well, but let’s say they are somewhat more real than a plane wave. 🙂

So what do we have here? We have wavefronts that are shaped as helices, except for the one in the middle (marked by m = 0) which is, once again, an example of a plane wave, so for that one (m = 0), we have zero orbital angular momentum indeed. But look, very carefully, at the m = ±1 and m = ±2 situations. For m = ±1, we have one helical surface with a step length equal to the wavelength λ. For m = ±2, we have two intertwined helical surfaces with the step length of each helix surface equal to 2λ. [Don’t worry too much about the second and third column: they show a beam cross-section (so that’s not a wave front but a so-called phase front) and the (averaged) light intensity, again of a beam cross-section.] Now, we can further generalize and analyze waves composed of |m| helices with the step length of each helix surface equal to |m|λ. The Wikipedia article on OAM (orbital angular momentum of light), from which I got this illustration, gives the following formula to calculate the OAM:

L = ε₀ Σ_{i=x,y,z} ∫ E_i (r × ∇) A_i d³r

The same article also notes that the quantum-mechanical equivalent of this formula, i.e. the orbital angular momentum of the photons one would associate with the not-cylindrically-symmetric waves above (i.e. all those for which m ≠ 0), is equal to:

L_z = mħ

So what? Well… I guess we should just accept that as a very interesting result. For example, I duly note that L_z is the component along the direction of propagation of the wave (as indicated by the z subscript), and I also note the very interesting fact that, apparently, L_z can be either positive or negative. Now, I am not quite sure how such a result is consistent with the idea of radiation pressure, but I am sure there must be some logical explanation for that. The other point you should note is that, once again, any reference to the energy (or to the frequency or wavelength) of our photon has disappeared. Hmm… I’ll come back to this, as I promised above already.
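The helical geometry described for the m = ±1 and m = ±2 beams can be checked with a short sketch. It assumes the beam’s phase has the form kz + mθ, with θ the azimuthal angle around the beam axis (an assumed convention; signs differ between texts), and it also prints the per-photon L_z = mħ quoted above.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J·s

def helix_pitch(m, wavelength):
    """Axial distance one constant-phase surface advances per full turn.

    Assumes a phase of the form k*z + m*theta (sign conventions vary).
    """
    k = 2 * math.pi / wavelength
    # On a surface of constant phase, k*dz + m*dtheta = 0, so dz/dtheta = -m/k.
    # One full turn (dtheta = 2*pi) advances the surface by |m| * 2*pi / k:
    return abs(m) * 2 * math.pi / k

for m in (1, 2, 3):
    pitch = helix_pitch(m, 1.0)   # wavelength set to 1 (arbitrary units)
    # |m| intertwined surfaces, each of pitch |m|*lambda, so adjacent
    # sheets along the axis are still one wavelength apart.
    print(m, pitch, pitch / m, m * HBAR)   # last column: L_z = m*hbar
```

Note that the wavelength drops out of L_z = mħ entirely, which is the ‘disappearing energy’ the paragraph points at.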

The thing is that this rather long digression on orbital angular momentum doesn’t help us much in trying to understand what that spin angular momentum (SAM) is all about. So, let me just copy the final conclusion of the Wikipedia article on the orbital angular momentum of light: the OAM is the component of angular momentum of light that is dependent on the field spatial distribution, not on the polarization of light.

So, again, what’s the spin angular momentum? Well… The only guidance we have is that same little drawing again and, perhaps, another illustration that’s supposed to compare SAM with OAM (underneath).

Now, the Wikipedia article on SAM (spin angular momentum), from which I took the illustrations above, gives a similar-looking formula for it:

S = ε₀ ∫ E × A d³r

When I say ‘similar-looking’, I don’t mean it’s the same. [Of course not! Spin and orbital angular momentum are two different things!]. So what’s different in the two formulas? Well… We don’t have any del operator (∇) in the SAM formula, and we also don’t have any position vector (r) in the integral kernel (or integrand, if you prefer that term). However, we do find both the electric field vector (E) as well as the (magnetic) vector potential (A) in the equation again. Hence, the SAM (also) takes both the electric as well as the magnetic field into account, just like the OAM. [According to the author of the article, the expression also shows that the SAM is nonzero when the light polarization is elliptical or circular, and that it vanishes if the light polarization is linear, but I think that’s much more obvious from the illustration than from the formula… However, I realize I really need to move on here, because this post is, once again, becoming way too long. So…]

OK. What’s the equivalent of that formula in quantum mechanics?

Well… In quantum mechanics, the SAM becomes a ‘quantum observable’, described by a corresponding operator which has only two eigenvalues:

S_z = ± ħ

So that corresponds to the two possible values for J_z, as mentioned in the illustration, and we can understand, intuitively, that these two values correspond to two ‘idealized’ photons which describe a left- and right-handed circularly polarized wave respectively.

So… Well… There we are. That’s basically all there is to say about it. So… OK. So far, so good.

But… Yes? Why do we call a photon a spin-one particle?

That has to do with convention. A so-called spin-zero particle has no degrees of freedom in regard to polarization. The implied ‘geometry’ is that a spin-zero particle is completely symmetric: no matter in what direction you turn it, it will always look the same. In short, it really behaves like a (zero-dimensional) mathematical point. As you can see from the overview of all elementary particles, it is only the Higgs boson which has spin zero. That’s why the Higgs field is referred to as a scalar field: it has no direction. In contrast, spin-one particles, like photons, are also ‘point particles’, but they do come with one or the other kind of polarization, as evident from all that I wrote above. To be specific, they are polarized in the xy-plane, and can have one of two directions. So, when rotating them, you need a full rotation of 360° if you want them to look the same again.

Now that I am here, let me exhaust the topic (to a limited extent only, of course, as I don’t want to write a book here) and mention that, in theory, we could also imagine spin-2 particles, which would look the same after half a rotation (180°). However, as you can see from the overview, none of the elementary particles has spin-2. A spin-2 particle could be like some straight stick, as that looks the same even after it is rotated 180 degrees. I am mentioning the theoretical possibility because the graviton, if it would effectively exist, is expected to be a massless spin-2 boson. [Now why do I mention this? Not sure. I guess I am just noting this to remind you of the fact that the Higgs boson is definitely not the (theoretical) graviton, and/or that we have no quantum theory for gravity.]

Oh… That’s great, you’ll say. But what about all those spin-1/2 particles in the table? You said that all matter-particles are spin 1/2 particles, and that it’s this particular property that actually makes them matter-particles. So what’s the geometry here? What kind of ‘symmetries’ do they respect?

Well… As strange as it sounds, a spin-1/2 particle needs two full rotations (2×360° = 720°) until it is again in the same state. Now, in regard to that particularity, you’ll often read something like: “There is nothing in our macroscopic world which has a symmetry like that.” Or, worse, “Common sense tells us that something like that cannot exist, that it simply is impossible.” [I won’t quote the site from which I took these quotes, because it is, in fact, the site of a very respectable research center!] Bollocks! The Wikipedia article on spin has this wonderful animation: look at how the spirals flip between clockwise and counterclockwise orientations, and note that it’s only after spinning a full 720 degrees that this ‘point’ returns to its original configuration.
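The 360° versus 720° story can be made concrete with a small sketch. It uses the standard quantum rotation operator exp(−iθS_z/ħ), which this post does not introduce, applied to the state with S_z = +sħ: that state simply picks up the phase factor e^(−isθ). Keep in mind that an overall phase of −1 only shows up through interference, so this is a sketch of the bookkeeping, not of what a single measurement ‘sees’.

```python
import cmath
import math

def rotation_phase(spin, angle):
    """Phase factor exp(-i * spin * angle) picked up by the state with
    S_z = +spin*hbar under a rotation by `angle` radians about z."""
    return cmath.exp(-1j * spin * angle)

for spin in (0.5, 1, 2):
    for degrees in (180, 360, 720):
        z = rotation_phase(spin, math.radians(degrees))
        print(f"spin {spin}: {degrees:3d} deg -> {z.real:+.3f} {z.imag:+.3f}i")
```

For spin 1/2 the factor is −1 after 360° and back to +1 only after 720°; for spin 1 it is +1 after 360°, and for spin 2 already after 180°, matching the rotation ‘geometry’ described above.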

So, yes, we can actually imagine spin-1/2 particles, and with not all that much imagination, I’d say. But… OK… This is all great fun, but we have to move on. So what’s the ‘spin’ of these spin-1/2 particles and, more in particular, what’s the concept of ‘spin’ of an electron?

The spin of an electron

When starting to read about it, I thought that the angular momentum of an electron would be easier to analyze than that of a photon. Indeed, while a photon has no mass and no electric charge, that analysis with those E and B vectors is damn complicated, even when sticking to a strictly classical analysis. For an electron, the classical picture seems to be much more straightforward—but only at first indeed. It quickly becomes equally weird, if not more.

We can look at an electron in orbit as a rotating electrically charged ‘cloud’ indeed. Now, from Maxwell’s equations (or from your high school classes even), you know that a rotating electrically charged body creates a magnetic dipole. So an electron should behave just like a tiny bar magnet. Of course, we have to make certain assumptions about the distribution of the charge in space but, in general, we can write that the magnetic dipole moment μ is equal to:

μ = –(e/2m)·L

In case you want to know, in detail, where this formula comes from, let me refer you to Feynman once again, but trust me – for once 🙂 – it’s quite straightforward indeed: the L in this formula is the angular momentum, which may be the spin angular momentum, the orbital angular momentum, or the total angular momentum. The e and m are, of course, the charge and mass of the electron respectively (with e taken as a magnitude: the minus sign reflects the fact that the electron is negatively charged, so μ points opposite to L).

So that’s a good and nice-looking formula, and it’s actually even correct, except for the spin angular momentum as measured in experiments. [You’ll wonder how we can measure orbital and spin angular momentum respectively, but I’ll talk about a 1921 experiment in a few minutes, and so that will give you some clue as to that mystery. :-)] To be precise, it turns out that one has to multiply the above formula for μ with a factor which is referred to as the g-factor. [And, no, it’s got nothing to do with the gravitational constant or… Well… Nothing. :-)] So, for the spin angular momentum (S), the formula should be:

μ = –g·(e/2m)·S

Experimental physicists are constantly checking that value, and they now measure it to be something like g = 2.00231930419922 ± 1.5×10⁻¹². So what’s the explanation for that g? Where does it come from? There is, in fact, a classical explanation for it, which I’ll copy hereunder (yes, from Wikipedia). This classical explanation is based on assuming that the distribution of the electric charge of the electron and its mass does not coincide:

Why do I mention this classical explanation? Well… Because, in most popular books on quantum mechanics (including Feynman’s delightful QED), you’ll read that (a) the value for g can be derived from a quantum-theoretical equation known as Dirac’s equation (or ‘Dirac theory’, as it’s referred to above) and, more importantly, that (b) physicists call the “accurate prediction of the electron g-factor” from quantum theory (i.e. ‘Dirac’s theory’ in this case) “one of the greatest triumphs” of the theory.

So what about it? Well… Whatever the merits of both explanations (classical or quantum-mechanical), they are surely not the reason why physicists abandoned the classical theory. So what was the reason then? What a stupid question! You know that already! The Rutherford model was, quite simply, not consistent: according to classical theory, electrons should just radiate their energy away and spiral into the nucleus. More in particular, there was yet another experiment that wasn’t consistent with classical theory, and it’s one that’s very relevant for the discussion at hand: it’s the so-called Stern-Gerlach experiment.

It was just as ‘revolutionary’ as the Michelson-Morley experiment (which failed to detect any change in the speed of light due to the Earth’s motion through the supposed aether), or the discovery of the positron in 1932. The Stern-Gerlach experiment was done in 1921, so that’s a few years before quantum theory replaced classical theory and, hence, it’s not one of those experiments confirming quantum theory. No. Quite the contrary. It was, in fact, one of the experiments that triggered the so-called quantum revolution. Let me insert the experimental set-up and summarize it (below).


The German scientists Otto Stern and Walther Gerlach produced a beam of electrically-neutral silver atoms and let it pass through a (non-uniform) magnetic field. Why silver atoms? Well… Silver atoms are easy to handle (in a lab, that is) and easy to detect with a photoplate.
These atoms came out of an oven (literally), in which the silver was being evaporated (yes, one can evaporate silver), so they had no special orientation in space and, so Stern and Gerlach thought, the magnetic moment (or spin) of the outer electrons in these atoms would point in all possible directions in space.
As expected, the magnetic field did deflect the silver atoms, just like it would deflect little dipole magnets if you would shoot them through the field. However, the pattern of deflection was not the one which they expected. Instead of hitting the plate all over the place, within some contour, of course, only the contour itself was hit by the atoms. There was nothing in the middle!
And… Well… It’s a long story, but I’ll make it short. There was only one possible explanation for that behavior, and that’s that the magnetic moments – and, therefore, the spins – had only two orientations in space, and two possible values only which – Surprise, surprise! – are ±ħ/2 (so that’s half the value of the spin angular momentum of photons, which explains the spin-1/2 terminology).
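The contrast between what Stern and Gerlach expected and what they saw can be sketched as a toy simulation. The assumptions are loud ones: the deflection is taken to be simply proportional to the z-component of the magnetic moment (in units of its maximum), and beam width, velocity spread and slit geometry, which shape the actual pattern on the plate, are ignored. For moments pointing in random directions, that z-component is uniformly distributed on [−1, 1].

```python
import random

def classical_mu_z(n, rng):
    """Moments pointing in random directions: the z-component of a unit
    vector uniform on the sphere is itself uniform on [-1, 1]."""
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

def quantum_mu_z(n, rng):
    """Spin-1/2 picture: only two outcomes, 'up' or 'down'."""
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]

def histogram(values, bins=5):
    """Count values of [-1, 1] into equal-width bins (a crude 'photoplate')."""
    counts = [0] * bins
    for v in values:
        i = min(int((v + 1.0) / 2.0 * bins), bins - 1)
        counts[i] += 1
    return counts

rng = random.Random(1921)
print(histogram(classical_mu_z(10_000, rng)))  # roughly flat: a continuous smear
print(histogram(quantum_mu_z(10_000, rng)))    # only the two outer bins are hit
```

The classical model fills the middle of the plate; the two-valued model leaves it empty, which is the ‘nothing in the middle’ result.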

The spin angular momentum of an electron is more popularly known as ‘up’ or ‘down’.

So… What about it? Well… It explains why an atomic orbital can hold two electrons (one spin ‘up’ and one spin ‘down’), rather than one only and, as such, the behavior of the electron here is the basis of the so-called periodic table, which explains all properties of the chemical elements. So… Yes. Quantum theory is relevant, I’d say. 🙂

Conclusion

This has been a terribly long post, and you may no longer remember what I promised to do. What I promised to do, is to write some more about the difference between a photon and an electron and, more in particular, I said I’d write more about their charge, and that “weird number”: their spin. I think I lived up to that promise. The summary is simple:

Photons have no (electric) charge, but they do have spin. Their spin is linked to their polarization in the xy-plane (if z is the direction of propagation) and, because of the strangeness of quantum mechanics (i.e. the quantization of (quantum) observables), the value for this spin is either +ħ or –ħ, which explains why they are referred to as spin-one particles (because either value is one unit of the reduced Planck constant ħ).
Electrons have both electric charge as well as spin. Their spin is different and is, in fact, related to their electric charge: it can be interpreted in terms of a magnetic dipole moment, which results from the fact that we have a spinning charge here. However, again, because of the strangeness of quantum mechanics, their spin is quantized: measured along any given direction, it can take only one of two values, ±ħ/2 (and the magnetic dipole moment is quantized accordingly), which is why they are referred to as spin-1/2 particles.

So now you know everything you need to know about photons and electrons, and then I mean real photons and electrons, including their properties of charge and spin. So they’re no longer ‘fake’ spin-zero photons and electrons now. Isn’t that great? You’ve just discovered the real world! 🙂

So… I am done—for the moment, that is… 🙂 If anything, I hope this post shows that even those ‘weird’ quantum numbers are rooted in ‘physical reality’ (or in physical ‘geometry’ at least), and that quantum theory may be ‘crazy’ indeed, but that it ‘explains’ experimental results. Again, as Feynman says:

“We understand how Nature works, but not why Nature works that way. Nobody understands that. I can’t explain why Nature behaves in this particular way. You may not like quantum theory and, hence, you may not accept it. But physicists have learned to realize that whether they like a theory or not is not the essential question. Rather, it is whether or not the theory gives predictions that agree with experiment. The theory of quantum electrodynamics describes Nature as absurd from the point of view of common sense. But it agrees fully with experiment. So I hope you can accept Nature as She is—absurd.”

Frankly speaking, I am not quite prepared to accept Nature as absurd: I hope that some more familiarization with the underlying mathematical forms and shapes will make it look somewhat less absurd. Moreover, I hope that such familiarization will, in the end, make everything look just as ‘logical’, or ‘natural’ as the two ways in which amplitudes can ‘interfere’.

Post scriptum: I said I would come back to the fact that, in the analysis of orbital and spin angular momentum of a photon (OAM and SAM), the frequency or energy variable sort of ‘disappears’. So why’s that? Let’s look at those expressions for |L〉 and |R〉 once again:

|L〉 = (1/√2)·(1, i, 0)ᵀ·e^{i(kz–ωt)}
|R〉 = (1/√2)·(1, –i, 0)ᵀ·e^{i(kz–ωt)}

What’s written here really? If |L〉 and |R〉 are supposed to be equal to either +ħ or –ħ, then that product of e^{i(kz–ωt)} with the 3×1 matrix (which is a ‘column vector’ in this case) does not seem to make much sense, does it? Indeed, you’ll remember that e^{i(kz–ωt)} is just a regular wave function. To be precise, its phase is φ = kz–ωt (with z the direction of propagation of the wave), and we can write it in terms of its real and imaginary parts as e^{iφ} = cos(φ) + i·sin(φ) = a + bi. Multiplying it with the 3×1 column vector (1, i, 0) or (1, –i, 0) just yields another 3×1 column vector. To be specific, we get:

The 3×1 ‘vector’ (a + bi, –b+ai, 0) for |L〉, and
The 3×1 ‘vector’ (a + bi, b–ai, 0) for |R〉.

So we have two new ‘vectors’ whose components are complex numbers. Furthermore, we can note that their ‘x’-component is the same, their ‘y’-component is each other’s opposite –b+ai = –(b–ai), and their ‘z’-component is 0.
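The component arithmetic can be verified directly with Python’s complex numbers. In this sketch the phase value 0.3 is arbitrary, and the 1/√2 prefactor is left out, as in the component lists above.

```python
import cmath

def polarization_components(phi):
    """Multiply the scalar wave e^{i*phi} = a + bi into the column vectors
    (1, i, 0) and (1, -i, 0) used for |L> and |R> (no 1/sqrt(2) prefactor)."""
    psi = cmath.exp(1j * phi)        # a + bi
    left = (psi, 1j * psi, 0)        # (a + bi, -b + ai, 0)
    right = (psi, -1j * psi, 0)      # (a + bi,  b - ai, 0)
    return left, right

left, right = polarization_components(0.3)
a, b = cmath.exp(0.3j).real, cmath.exp(0.3j).imag
print(left[1], complex(-b, a))       # the same number: -b + ai
print(left[0] == right[0])           # True: same 'x'-component
print(left[1] == -right[1])          # True: 'y'-components are opposites
```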

So… Well… In regard to their ‘y’-component, I should note that’s just the result of the multiplication with i and/or –i: multiplying a complex number with i amounts to a 90° counterclockwise rotation, while multiplication with –i amounts to the same but clockwise. Hence, we must arrive at two complex numbers that are each other’s opposite. [Indeed, in complex analysis, the value –1 = e^{iπ} = e^{–iπ} is a 180° rotation, either clockwise (φ = –π) or counterclockwise (φ = +π), of course!]

Hmm… Still… What does it all mean really? The truth is that it takes some more advanced math to interpret the result. To be precise, pure quantum states, such as |L〉 and |R〉 here, are represented by so-called ‘state vectors’ in a Hilbert space over complex numbers. So that’s what we’ve got here. So… Well… I can’t say much more about this right now: we’ll just need to study some more before we’ll ‘understand’ those expressions for |L〉 and |R〉. So let’s not worry about it right now. We’ll get there.

Just for the record, I should note that, initially, I thought the 1/√2 factor in front gave some clue as to what’s going on here: 1/√2 ≈ 0.707 is the factor that’s used to calculate the root mean square (RMS) value of a sine wave from its peak amplitude. It’s illustrated below. The RMS value is a ‘special average’ one can use to calculate the energy or power (i.e. energy per time unit) of a wave. [Using the term ‘average’ is misleading, because the average of a sine wave (with unit amplitude) is 2/π ≈ 0.637 over half a cycle, and 0 over a full cycle (as you can easily see from the shape of the function). But I guess you know what I mean.]
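The three ‘averages’ of a unit-amplitude sine wave are easy to check numerically. This minimal sketch uses a plain midpoint-rule average; the helper name `sample_mean` is made up for the illustration.

```python
import math

def sample_mean(f, start, stop, n=100_000):
    """Midpoint-rule average of f over the interval [start, stop]."""
    step = (stop - start) / n
    return sum(f(start + (i + 0.5) * step) for i in range(n)) / n

rms = math.sqrt(sample_mean(lambda t: math.sin(t) ** 2, 0, 2 * math.pi))
half_cycle_mean = sample_mean(math.sin, 0, math.pi)
full_cycle_mean = sample_mean(math.sin, 0, 2 * math.pi)

print(rms, 1 / math.sqrt(2))          # both about 0.7071
print(half_cycle_mean, 2 / math.pi)   # both about 0.6366
print(full_cycle_mean)                # essentially 0
```

The RMS value squares the signal before averaging, which is why it, and not the plain average, tracks energy and power.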

Indeed, you’ll remember that the energy (E) of a wave is proportional to the square of its amplitude (A): E ∼ A². For example, when we have a constant current I through a resistor, the power P will be proportional to its square: P ∼ I². With a varying current (I) and voltage (V), the formula is more complicated, but we can simplify it using the RMS values: for a purely resistive load (current and voltage in phase), P_avg = V_RMS·I_RMS.

So… Looking at that formula, should we think of h and/or ħ as some kind of ‘average’ energy, like the energy of a photon per cycle or per radian? That’s an interesting idea so let’s explore it. If the energy of a photon is equal to E = h·ν = h·ω/2π = ħω, then we can also write:

h = E/ν and/or ħ = E/ω

So, yes: h is the energy of a photon per cycle, obviously, and, because the phase covers 2π radians during each cycle, ħ must be the energy of the photon per radian! That’s a great result, isn’t it? It also gives a wonderfully simple interpretation to Planck’s quantum of action!

Well… No. We made at least two mistakes here. The first mistake is that if we think of a photon as a wave train being radiated by an atom – which, as we calculated in another post, lasts about 3.2×10⁻⁸ seconds – the graph for its energy is going to resemble the graph of its amplitude: it’s going to die out, and each oscillation will carry less and less energy. Indeed, the decay time given here (τ = 3.2×10⁻⁸ seconds) was the time it takes for the radiation (we assumed sodium light with a wavelength of 600 nanometer) to die out by a factor 1/e. To be precise, the shape of the energy curve is E = E₀·e^{–t/τ}, and so it’s an envelope resembling the A(t) curve below.

Indeed, remember, the energy of a wave is determined not only by its frequency (or wavelength) but also by its amplitude, and so we cannot assume the amplitude of a ‘photon wave’ is going to be the same everywhere. Just for the record: note that the energy of a wave is proportional to the frequency (doubling the frequency doubles the energy) but, when linking it to the amplitude, we should remember that the energy is proportional to the square of the amplitude, so we write E ∼ A².

The second mistake is that both ν and ω are the frequency of the light (expressed in cycles or radians respectively) per second, i.e. per time unit. So that’s not the number of cycles or radians that we should associate with the wavetrain! We should use the number of cycles (or radians) packed into one photon. We can calculate that easily from the value for the decay time τ. Indeed, for sodium light, which has a frequency of 500 THz (500×10¹² oscillations per second) and a wavelength of 600 nm (600×10⁻⁹ meter), we said the radiation lasts about 3.2×10⁻⁸ seconds (that’s actually the time it takes for the radiation’s energy to die out by a factor 1/e, so the wavetrain will actually last (much) longer, but the amplitude becomes quite small after that time), and so that makes for some 16 million oscillations, and a ‘length’ of the wavetrain of about 9.6 meter! Now, the energy of a sodium light photon is about 2 eV (h·ν ≈ 4×10⁻¹⁵ electronvolt·second times 0.5×10¹⁵ cycles/sec = 2 eV) and so we could say the average energy of each of those 16 million oscillations would be 2/(16×10⁶) eV = 0.125×10⁻⁶ eV. But, from all that I wrote above, it’s obvious that this number doesn’t mean all that much, because the wavetrain is not likely to be shaped very regularly.
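The arithmetic in this paragraph is easy to reproduce. The inputs in this sketch are the post’s own rounded sodium-light figures, plus the standard value of h in eV·s; nothing else is assumed.

```python
# The post's rounded figures for sodium light
nu = 500e12             # frequency, oscillations per second
wavelength = 600e-9     # wavelength, m (note: nu * wavelength = 3e8 m/s)
tau = 3.2e-8            # decay time, s
c = 3.0e8               # speed of light, m/s (rounded)
h_eV = 4.135667696e-15  # Planck constant, eV·s

oscillations = nu * tau                # about 1.6e7: some 16 million
train_length = c * tau                 # about 9.6 m
photon_energy = h_eV * nu              # about 2.07 eV, i.e. 'about 2 eV'
per_oscillation = 2.0 / oscillations   # 1.25e-7 eV, using E of about 2 eV

print(oscillations, train_length, photon_energy, per_oscillation)
```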

So, in short, we cannot say that h is the photon energy per cycle or that ħ is the photon energy per radian! That’s not only simplistic but, worse, false. Planck’s constant is what it is: a factor of proportionality for which there is no simple ‘arithmetic’ and/or ‘geometric’ explanation. It’s just there, and we’ll need to study some more math to truly understand the meaning of those two expressions for |L〉 and |R〉.

Having said that, and having thought about it all some more, I find there’s, perhaps, a more interesting way to re-write E = h·ν:

h = E/ν = (λ/c)E = T·E

T? Yes. T is the period, so that’s the time needed for one oscillation: T is just the reciprocal of the frequency (T = 1/ν = λ/c). It’s a very tiny number, because we divide (1) a very small number (the wavelength of light measured in meter) by (2) a very large number (the distance, in meter, traveled by light in one second). For sodium light, T is equal to 2×10⁻¹⁵ seconds, so that’s two femtoseconds, i.e. two quadrillionths of a second.
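Here is a minimal check of h = T·E for the post’s rounded 600 nm sodium wavelength, using the exact SI values of h, c and the electronvolt.

```python
import math

h = 6.62607015e-34      # Planck constant, J·s (exact SI value)
c = 299_792_458.0       # speed of light, m/s (exact SI value)
eV = 1.602176634e-19    # joule per electronvolt (exact SI value)

wavelength = 600e-9     # the post's rounded sodium wavelength, m
T = wavelength / c      # period: the time for one oscillation, s
E = h * c / wavelength  # photon energy, J

print(T)                        # about 2.0e-15 s, i.e. two femtoseconds
print(E / eV)                   # about 2.07 eV
print(math.isclose(T * E, h))   # True: h = T·E
```

Of course, T·E = (λ/c)·(hc/λ) = h holds for any wavelength, which is why the check is an identity rather than a discovery.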

Now, we can think of the period as a fraction of a second, and smaller fractions are, obviously, associated with higher frequencies and, what amounts to the same, shorter wavelengths (and, hence, higher energies). However, when writing T = λ/c, we can also think of T being another kind of fraction: λ/c can also be written as the ratio of the wavelength and the distance traveled by light in one second, i.e. a light-second (remember that light-seconds are measures of distance, not of time). The two fractions are the same when we express time and distance in equivalent units indeed (i.e. distance in light-seconds or, equivalently, time in units of the time light needs to travel one meter).

So that links h to both time as well as distance, and we may look at h as some kind of fraction or energy ‘density’ even (although the term ‘density’ in this context is not quite accurate). In the same vein, I should note that, if there’s anything that should make you think about h, it is the fact that its numerical value depends on how we measure time and distance. For example, if we’d measure time in other units (for example, the more ‘natural’ unit defined by the time light needs to travel one meter), then we get a different number (and unit) for h. And, of course, you also know we can relate energy to distance (1 J = 1 N·m). But that’s something that’s obvious from h‘s dimension (J·s), and so I shouldn’t dwell on that.

Hmm… Interesting thoughts. I think I’ll develop these things a bit further in one of my next posts. As for now, however, I’ll leave you with your own thoughts on it.

Note 1: As you’re trying to follow what I am writing above, you may have wondered whether or not the duration of the wavetrain that’s emitted by an atom is a constant, or whether or not it packs some constant number of oscillations. I’ve thought about that myself as I wrote down the following formula at some point of time:

h = (the duration of the wave)·(the energy of the photon)/(the number of oscillations in the wave)

As mentioned above, interpreting h as some kind of average energy per oscillation is not a great idea but, having said that, it would be a good exercise for you to try to answer that question in regard to the duration of these wavetrains, and/or the number of oscillations packed into them, yourself. There are various formulas for the Q of an atomic oscillator, but the simplest one is the one expressed in terms of the so-called classical electron radius r₀:

Q = 3λ/4πr₀

As you can see, the Q depends on λ: longer wavelengths (so lower energy) are associated with higher Q. In fact, the relationship is directly proportional: twice the wavelength will give you twice the Q. Now, the formula for the decay time τ is also dependent on the wavelength. Indeed, τ = 2Q/ω = Qλ/πc. Combining the two formulas yields (if I am not mistaken):

τ = 3λ²/4π²r₀c.

Hence, the decay time is proportional to the square of the wavelength. Hmm… That’s an interesting result. But I really need to learn how to be a bit shorter, and so I’ll really let you think now about what all this means or could mean.
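The combined formula can be checked numerically. This sketch follows the post’s convention τ = 2Q/ω = Qλ/πc and uses the CODATA 2018 classical electron radius; for the 600 nm example it reproduces the 3.2×10⁻⁸ s figure used above, and it confirms the λ² scaling.

```python
import math

r0 = 2.8179403262e-15   # classical electron radius, m (CODATA 2018)
c = 299_792_458.0       # speed of light, m/s

def q_factor(wavelength):
    """Q = 3*lambda / (4*pi*r0)."""
    return 3 * wavelength / (4 * math.pi * r0)

def decay_time(wavelength):
    """tau = 2Q/omega = Q*lambda/(pi*c) = 3*lambda**2 / (4*pi**2*r0*c)."""
    return q_factor(wavelength) * wavelength / (math.pi * c)

tau_600 = decay_time(600e-9)
print(q_factor(600e-9))                 # about 5.1e7
print(tau_600)                          # about 3.2e-8 s
print(decay_time(1200e-9) / tau_600)    # about 4: doubling lambda quadruples tau
```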

Note 2: If that 1/√2 factor has nothing to do with some kind of rms calculation, where does it come from? I am not sure. It’s related to state vector math, it seems, and I haven’t started that as yet. I just copy a formula from Wikipedia here, which shows the same factor in front:

The formula above is said to represent the “superposition of joint spin states for two particles”. My gut instinct tells me the 1/√2 factor has to do with the normalization condition and/or with the fact that we have to take the (absolute) square of the (complex-valued) amplitudes to get the probability.
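The normalization hunch can at least be tested on the |L〉 column vector (1, i, 0) from this post scriptum. This minimal sketch computes the sum of the squared magnitudes of the components with and without the 1/√2 prefactor. It is consistent with the normalization reading but doesn’t, by itself, prove that’s the whole story.

```python
import math

def norm_squared(vec):
    """Sum of |component|^2: the squared length of a complex vector."""
    return sum(abs(z) ** 2 for z in vec)

raw_L = (1, 1j, 0)                                   # (1, i, 0), no prefactor
scaled_L = tuple(z / math.sqrt(2) for z in raw_L)    # with the 1/sqrt(2) factor

print(norm_squared(raw_L))      # 2.0
print(norm_squared(scaled_L))   # 1.0 (up to rounding): a unit vector
```

With the factor in place, the squared magnitudes add up to one, which is exactly what we need if they are to be read as probabilities.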

The Strange Theory of Light and Matter (II)

If we limit our attention to the interaction between light and matter (i.e. the behavior of photons and electrons only, so we’re not talking quarks and gluons here), then the ‘crazy ideas’ of quantum mechanics can be summarized as follows:

At the atomic or sub-atomic scale, we can no longer look at light as an electromagnetic wave. It consists of photons, and photons come in blobs. Hence, to some extent, photons are ‘particle-like’.
At the atomic or sub-atomic scale, electrons don’t behave like particles. For example, if we send them through a slit that’s small enough, we’ll observe a diffraction pattern. Hence, to some extent, electrons are ‘wave-like’.

In short, photons aren’t waves, but they aren’t particles either. Likewise, electrons aren’t particles, but they aren’t waves either. They are neither. The weirdest thing of all, perhaps, is that, while light and matter are two very different things in our daily experience (light and matter are opposite concepts, I’d say, just like particles and waves are opposite concepts), they look pretty much the same in quantum physics: they are both represented by a wavefunction.

Let me immediately make a little note on terminology here. The term ‘wavefunction’ is a bit ambiguous, in my view, because it makes one think of a real wave, like a water wave, or an electromagnetic wave. Real waves are described by real-valued wave functions describing, for example, the motion of a ball on a spring, or the displacement of a gas (e.g. air) as a sound wave propagates through it, or – in the case of an electromagnetic wave – the strength of the electric and magnetic fields.

You may have questions about the ‘reality’ of fields, but electromagnetic waves – i.e. the classical description of light – are quite ‘real’ too, even if:

Light doesn’t travel in a medium (like water or air: there is no aether), and
The magnitudes of the electric and magnetic fields (usually denoted by E and B) depend on your reference frame: if you calculate the fields using a moving coordinate system, you will get a different mixture of E and B. Therefore, E and B may not feel very ‘real’ when you look at them separately, but they are very real when we think of them as representing one physical phenomenon: the electromagnetic interaction between particles. So the E and B mix is, indeed, a dual representation of one reality. I won’t dwell on that, as I’ve done that in another post of mine.

How ‘real’ is the quantum-mechanical wavefunction?

The quantum-mechanical wavefunction is not like any of these real waves. In fact, I’d rather use the term ‘probability wave’ but, apparently, that’s used only by bloggers like me 🙂 and so it’s not very scientific. That’s for a good reason, because it’s not quite accurate either: the wavefunction in quantum mechanics represents probability amplitudes, not probabilities. So we should, perhaps, be consistent and term it a ‘probability amplitude wave’ – but then that’s too cumbersome obviously, so the term ‘probability wave’ may be confusing, but it’s not so bad, I think.

Amplitudes and probabilities are related as follows:

Probabilities are real numbers between 0 and 1: they represent the probability of something happening, e.g. a photon moves from point A to B, or a photon is absorbed (and emitted) by an electron (i.e. a ‘junction’ or ‘coupling’, as you know).
Amplitudes are complex numbers, or ‘arrows’ as Feynman calls them: they have a length (or magnitude) and a direction.
We get the probabilities by taking the (absolute) square of the amplitudes.
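The three points above can be illustrated with Python’s built-in complex numbers, which behave exactly like Feynman’s ‘arrows’. This is a minimal sketch: the length 0.6 and the direction of 120° are arbitrary values I chose. It shows that the probability is the squared length of the arrow, and that the direction of a single arrow, on its own, doesn’t affect the probability at all.

```python
import cmath
import math

# An amplitude as an "arrow": length 0.6, direction 120 degrees (illustrative values).
arrow = cmath.rect(0.6, math.radians(120))
length, direction = cmath.polar(arrow)

probability = abs(arrow) ** 2  # the absolute square, i.e. the squared length
print(f"length={length:.2f}, direction={math.degrees(direction):.0f} deg, P={probability:.2f}")

# Turning the same arrow to any other direction leaves the probability unchanged.
for degrees in (0, 45, 200):
    turned = cmath.rect(0.6, math.radians(degrees))
    assert math.isclose(abs(turned) ** 2, probability)
```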

So photons aren’t waves, but they aren’t particles either. Likewise, electrons aren’t particles, but they aren’t waves either. They are neither. So what are they? We don’t have words to describe what they are. Some use the term ‘wavicle’ but that doesn’t answer the question, because who knows what a ‘wavicle’ is? So we don’t know what they are. But we do know how they behave. As Feynman puts it, when comparing the behavior of light and then of electrons in the double-slit experiment—struggling to find language to describe what’s going on: “There is one lucky break: electrons behave just like light.”

He says so because of that wave function: the mathematical formalism is the same, for photons and for electrons. Exactly the same? […] But that’s such a weird thing to say, isn’t it? We can’t help thinking of light as waves, and of electrons as particles. They can’t be the same. They’re different, aren’t they? They are.

Scales and senses

To some extent, the weirdness can be explained because the scale of our world is not atomic or sub-atomic. Therefore, we ‘see’ things differently. Let me say a few words about the instrument we use to look at the world: our eye.

Our eye is particular. The retina has two types of receptors: the so-called cones are used in bright light, and distinguish color, but when we are in a dark room, the so-called rods become sensitive, and it is believed that they actually can detect a single photon of light. However, neural filters only allow a signal to pass to the brain when at least five photons arrive within less than a tenth of a second. A tenth of a second is, roughly, the averaging time of our eye. So, as Feynman puts it: “If we were evolved a little further so we could see ten times more sensitively, we wouldn’t have this discussion—we would all have seen very dim light of one color as a series of intermittent little flashes of equal intensity.” In other words, the ‘particle-like’ character of light would have been obvious to us.

Let me make a few more remarks here, which you may or may not find useful. The sense of ‘color’ is not something ‘out there’: colors, like red or brown, are experiences in our eye and our brain. There are ‘pigments’ in the cones (cones are the receptors that work only if the intensity of the light is high enough) and these pigments absorb the light spectrum somewhat differently, as a result of which we ‘see’ color. Different animals see different things. For example, a bee can distinguish between white paper using zinc white versus lead white, because they reflect light differently in the ultraviolet spectrum, which the bee can see but we can’t. Bees can also tell the direction of the sun without seeing the sun itself, because they are sensitive to polarized light, and the scattered light of the sky (i.e. the blue sky as we see it) is polarized. The bee can also notice flicker up to 200 oscillations per second, while we see it only up to 20, because our averaging time is something like a tenth of a second, while the averaging time of the bee is much shorter. So we cannot see the quick leg movements and/or wing vibrations of bees, but the bee can!

Sometimes we can’t see any color. For example, we see the night sky in ‘black and white’ because the light intensity is very low, and so it’s our rods, not the cones, that process the signal, and so these rods can’t ‘see’ color. So those beautiful color pictures of nebulae are not artificial (although the pictures are often enhanced). It’s just that the camera that is used to take those pictures (film or, nowadays, digital) is much more sensitive than our eye.

Regardless, color is a quality which we add to our experience of the outside world ourselves. What’s out there are electromagnetic waves with this or that wavelength (or, what amounts to the same, this or that frequency). So when critics of the exact sciences say so much is lost when looking at (visible) light as an electromagnetic wave in the range of 430 to 790 terahertz, they’re wrong. Those critics will say that physics reduces reality. That is not the case.
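To see that frequency and wavelength really are two ways of saying the same thing, here is a tiny Python sketch converting the two ends of that 430–790 THz range into vacuum wavelengths with λ = c/f. You should get roughly 700 nm (red) and roughly 380 nm (violet).

```python
C = 2.99792458e8  # speed of light in vacuum, m/s

def wavelength_nm(frequency_thz):
    """Convert a frequency in terahertz to a vacuum wavelength in nanometres (lambda = c/f)."""
    return C / (frequency_thz * 1e12) * 1e9

for f in (430, 790):
    print(f"{f} THz -> {wavelength_nm(f):.0f} nm")
```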

What’s going on is that our senses process the signal that they are receiving, especially when it comes to vision. As Feynman puts it: “None of the other senses involves such a large amount of calculation, so to speak, before the signal gets into a nerve that one can make measurements on. The calculations for all the rest of the senses usually happen in the brain itself, where it is very difficult to get at specific places to make measurements, because there are so many interconnections. Here, with the visual sense, we have the light, three layers of cells making calculations, and the results of the calculations being transmitted through the optic nerve.”

Hence, things like color and all of the other sensations that we have are the object of study of other sciences, including biochemistry and neurobiology, or physiology. For all we know, what’s ‘out there’ is, effectively, just ‘boring’ stuff, like electromagnetic radiation, energy and ‘elementary particles’—whatever they are. No colors. Just frequencies. 🙂

Light versus matter

If we accept the crazy ideas of quantum mechanics, then the what and the how become one and the same. Hence we can say that photons and electrons are a wavefunction somewhere in space. Photons, of course, are always traveling, because they have energy but no rest mass. Hence, all their energy is in the movement: it’s kinetic, not potential. Electrons, on the other hand, usually stick around some nucleus. And, let’s not forget, they have an electric charge, so their energy is not only kinetic but also potential.

But, otherwise, it’s the same type of ‘thing’ in quantum mechanics: a wavefunction, like those below.

Why diagram A and B? It’s just to emphasize the difference between a real-valued wave function and those ‘probability waves’ we’re looking at here (diagram C to H). A and B represent a mass on a spring, oscillating at more or less the same frequency but a different amplitude. The amplitude here means the displacement of the mass. The function describing the displacement of a mass on a spring (so that’s diagram A and B) is an example of a real-valued wave function: it’s a simple sine or cosine function, as depicted below. [Note that a sine and a cosine are the same function really, except for a phase difference of 90°.]

Let’s now go back to our ‘probability waves’. Photons and electrons, light and matter… The same wavefunction? Really? How can the sunlight that warms us up in the morning and makes trees grow be the same as our body, or the tree? The light-matter duality that we experience must be rooted in very different realities, mustn’t it?

Well… Yes and no. If we’re looking at one photon or one electron only, it’s the same type of wavefunction indeed. The same type… OK, you’ll say. So they are the same family or genus perhaps, as they say in biology. Indeed, both of them are, obviously, being referred to as ‘elementary particles’ in the so-called Standard Model of physics. But so what makes an electron and a photon specific as a species? What are the differences?

There are quite a few, obviously:

1. First, as mentioned above, a photon is a traveling wave function and, because it has no rest mass, it travels at the ultimate speed, i.e. the speed of light (c). An electron usually sticks around or, if it travels through a wire, it travels at very low speeds. Indeed, you may find it hard to believe, but the drift velocity of the free electrons in a standard copper wire is measured in cm per hour, so that’s very slow indeed—and while the electrons in an electron microscope beam may be accelerated up to 70% of the speed of light, and close to c in those huge accelerators, you’re not likely to find an electron microscope or accelerator in Nature. In fact, you may want to remember that a simple thing like electricity going through copper wires in our houses is a relatively modern invention. 🙂
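If you find the ‘cm per hour’ claim hard to believe, you can check it with the textbook relation v = I/(nAe), where n is the number of free electrons per cubic metre, A the cross-section of the wire and e the elementary charge. The Python sketch below assumes a current of 1 A through a wire with a cross-section of 1 mm², and the usual approximate value n ≈ 8.5 × 10²⁸ m⁻³ for copper. Those inputs are my own example values, not measurements. The result comes out at roughly 26 cm per hour.

```python
E_CHARGE = 1.602176634e-19  # elementary charge, in coulomb
N_COPPER = 8.5e28           # free electrons per m^3 in copper (approximate textbook value)

def drift_velocity(current_a, area_m2, n=N_COPPER):
    """v = I / (n * A * e): average drift speed of the free electrons, in m/s."""
    return current_a / (n * area_m2 * E_CHARGE)

# Assumed example: 1 A through a copper wire with a 1 mm^2 cross-section.
v = drift_velocity(current_a=1.0, area_m2=1e-6)
cm_per_hour = v * 100 * 3600
print(f"{v:.2e} m/s, or about {cm_per_hour:.0f} cm per hour")
```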

So, yes, those oscillating wave functions in those diagrams above are likely to represent some electron, rather than a photon. To be precise, the wave functions above are examples of standing (or stationary) waves, while a photon is a traveling wave: just extend that sine and cosine function in both directions if you’d want to visualize it or, even better, think of a sine and cosine function in an envelope traveling through space, such as the one depicted below.

Indeed, while the wave function of our photon is traveling through space, it is likely to be limited in space because, when everything is said and done, our photon is not everywhere: it must be somewhere.

At this point, it’s good to pause and think about what is traveling through space. It’s the oscillation. But what’s the oscillation? There is no medium here, and even if there were some medium (like water or air or something like aether, which, let me remind you, isn’t there!), the medium itself would not be moving, or – I should be precise here – it would only move up and down as the wave propagates through space, as illustrated below. To be fully complete, I should add we also have longitudinal waves, like sound waves (pressure waves): in that case, the particles oscillate back and forth along the direction of wave propagation. But you get the point: the medium does not travel with the wave.

When talking electromagnetic waves, we have no medium. These E and B vectors oscillate but it is very wrong to assume they use ‘some core of nearby space’, as Feynman puts it. They don’t. Those field vectors represent a condition at one specific point (admittedly, a point along the direction of travel) in space but, for all we know, an electromagnetic wave travels in a straight line and, hence, we can’t talk about its diameter or so.

Still, as mentioned above, we can imagine, more or less, what E and B stand for (we can use field lines to visualize them, for instance), even if we have to take into account their relativity (calculating their values from a moving reference frame results in different mixtures of E and B). But what are those amplitudes? How should we visualize them?

The honest answer is: we can’t. They are what they are: two mathematical quantities which, taken together, form a two-dimensional vector, which we square to find a value for a real-life probability, which is something that – unlike the amplitude concept – does make sense to us. Still, that representation of a photon above (i.e. the traveling envelope with a sine and cosine inside) may help us to ‘understand’ it somehow. Again, you absolutely have to get rid of the idea that these ‘oscillations’ would somehow occupy some physical space. They don’t. The wave itself has some definite length, for sure, but that’s a measurement in the direction of travel, which is often denoted as x when discussing uncertainty in its position, for example, as in the famous Uncertainty Principle (ΔxΔp ≥ ħ/2).

You’ll say: Oh! But then, at the very least, we can talk about the ‘length’ of a photon, can’t we? So then a photon is one-dimensional at least, not zero-dimensional! The answer is yes and no. I’ve talked about this before and so I’ll be short(er) on it now. A photon is emitted by an atom when an electron jumps from one energy level to another. It thereby emits a wave train that lasts about 10⁻⁸ seconds. That’s not very long but, taking into account the rather spectacular speed of light (3×10⁸ m/s), that still makes for a wave train with a length of not less than 3 meters. […] That’s quite a length, you’ll say. You’re right. But you forget that light travels at the speed of light and, hence, we will see this length as zero because of the relativistic length contraction effect. So… Well… Let me get back to the question: if photons and electrons are both represented by a wavefunction, what makes them different?
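The arithmetic behind that 3 meter figure is easy to check, and the same numbers tell you how many oscillations are packed into such a wave train. The Python sketch below assumes a duration of exactly 10⁻⁸ s and a visible wavelength of 500 nm, both example values of my own. It gives a length of about 3 m and about 6 million oscillations. It also shows that the expression h = (duration)·(energy)/(number of oscillations) from the note at the start of this section holds by construction: E = hf and N = f·T, so the frequency cancels out.

```python
C = 2.99792458e8      # speed of light, m/s
H = 6.62607015e-34    # Planck constant, J*s

duration = 1e-8       # s, the order of magnitude quoted in the text
wavelength = 500e-9   # m, an assumed visible wavelength

frequency = C / wavelength
length = C * duration                 # spatial length of the wave train
oscillations = frequency * duration   # number of cycles packed into it
energy = H * frequency                # photon energy E = h*f

print(f"length ~ {length:.1f} m, oscillations ~ {oscillations:.1e}")
print(f"duration * energy / oscillations = {duration * energy / oscillations:.4e} J*s")
```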

2. A more fundamental difference between photons and electrons is how they interact with each other.

From what I’ve written above, you understand that probability amplitudes are complex numbers, or ‘arrows’, or ‘two-dimensional vectors’. [Note that all of these terms have precise mathematical definitions and so they’re actually not the same, but the difference is too subtle to matter here.] Now, there are two ways of combining amplitudes, which are referred to as ‘positive’ and ‘negative’ interference respectively. I should immediately note that there’s actually nothing ‘positive’ or ‘negative’ about the interaction: we’re just putting two arrows together, and there are two ways to do that. That’s all.

The diagrams below show you these two ways. You’ll say: there are four! However, remember that we square an arrow to get a probability. Hence, the direction of the final arrow doesn’t matter when we’re taking the square: we get the same probability. It’s the direction of the individual amplitudes that matters when combining them. So the square of A+B is the same as the square of –(A+B) = –A+(–B) = –A–B. Likewise, the square of A–B is the same as the square of –(A–B) = –A+B.
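The same point can be made with two concrete arrows. In the Python sketch below, A and B are arbitrary complex numbers I picked for illustration. Adding and subtracting them gives two genuinely different probabilities, while flipping the sign of the final arrow changes nothing. So, of the four combinations in the diagrams, only two are really different.

```python
import math

A = complex(0.3, 0.4)   # illustrative amplitude arrows
B = complex(-0.1, 0.5)

def prob(z):
    """Probability as the absolute square of an amplitude."""
    return abs(z) ** 2

plus = prob(A + B)    # 'positive' interference
minus = prob(A - B)   # 'negative' interference
print(plus, minus)    # the two genuinely different combinations

assert math.isclose(prob(-A - B), plus)   # -(A+B) gives the same probability as A+B
assert math.isclose(prob(-A + B), minus)  # -(A-B) gives the same probability as A-B
```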

These are the only two logical possibilities for combining arrows. I’ve written ad nauseam about this elsewhere: see my post on amplitudes and statistics, and so I won’t go into too much detail here. Or, in case you’d want something less than a full mathematical treatment, I can refer you to my previous post also, where I talked about the ‘stopwatch’ and the ‘phase’: the convention for the stopwatch is to have its hand turn clockwise (obviously!) while, in quantum physics, the phase of a wave function will turn counterclockwise. But so that’s just convention and it doesn’t matter, because it’s the phase difference between two amplitudes that counts. To use plain language: it’s the difference in the angles of the arrows, and so that difference is just the same if we reverse the direction of both arrows (which is equivalent to putting a minus sign in front of the final arrow).
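Here is a short Python sketch of that ‘only the phase difference counts’ argument. The arrow lengths and angles are arbitrary example values. Turning both arrows by the same angle, or mirroring both of them (which amounts to letting the phase turn the other way, as with the clockwise stopwatch convention), leaves the probability unchanged. Changing the angle between the two arrows, by contrast, changes it completely.

```python
import cmath
import math

A = cmath.rect(0.5, math.radians(30))    # illustrative arrows
B = cmath.rect(0.5, math.radians(100))

def prob(z):
    """Probability as the absolute square of an amplitude."""
    return abs(z) ** 2

reference = prob(A + B)

# Turning both arrows by the same angle does not change the probability.
for turn in (0, 90, 180, 275):
    rot = cmath.exp(1j * math.radians(turn))
    assert math.isclose(prob(rot * A + rot * B), reference)

# Reversing the sense of rotation (mirroring both arrows) does not change it either.
assert math.isclose(prob(A.conjugate() + B.conjugate()), reference)

# What does matter is the phase difference between the two arrows.
for diff in (0, 90, 180):
    combined = 0.5 + cmath.rect(0.5, math.radians(diff))
    print(diff, round(prob(combined), 3))
```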

OK. Let me get back to the lesson. The point is: this logical or mathematical dichotomy distinguishes bosons (i.e. force-carrying ‘particles’, like photons, which carry the electromagnetic force) from fermions (i.e. ‘matter-particles’, such as electrons and quarks, which make up protons and neutrons). Indeed, the so-called ‘positive’ and ‘negative’ interference leads to two very different behaviors:

The probability of getting a boson where there are already n present, is n+1 times stronger than it would be if there were none before.
In contrast, the probability of getting two electrons into exactly the same state is zero.
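The simplest case of the two rules above involves just two particles ending up in the same state. It can be sketched in a few lines of Python, following the two-particle argument in Feynman’s Lectures (Vol. III, chapter 4). The amplitudes a and b are arbitrary example values. For distinguishable particles, the ‘direct’ and ‘exchanged’ processes are different outcomes, so we add probabilities. For identical bosons, we add amplitudes, which doubles the probability (n+1 with n = 1). For identical fermions, the exchanged amplitude comes in with a minus sign, so the two terms cancel exactly.

```python
import math

# Amplitudes for particle 1 and particle 2 to end up in the same final state
# (illustrative complex values).
a = complex(0.4, 0.2)
b = complex(0.1, -0.3)

# Distinguishable particles: the direct and exchanged processes are different
# final conditions, so their probabilities add.
p_distinguishable = abs(a * b) ** 2 + abs(b * a) ** 2

# Identical bosons: the two processes cannot be told apart, so amplitudes add.
p_bosons = abs(a * b + b * a) ** 2

# Identical fermions: the exchanged amplitude enters with a minus sign.
p_fermions = abs(a * b - b * a) ** 2

print(p_bosons / p_distinguishable)  # twice as likely: n + 1 with n = 1
print(p_fermions)                    # zero: no two electrons in the same state
```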

The behavior of photons makes lasers possible: we can pile zillions of photons on top of each other, and then release all of them in one powerful burst. [The ‘flickering’ of a laser beam is due to the quick succession of such light bursts. If you want to know how it works in detail, check my post on lasers.]

The behavior of electrons is referred to as Pauli’s exclusion principle: it is only because real-life electrons can have one of two spin polarizations (i.e. two opposite directions of angular momentum, which are referred to as ‘up’ or ‘down’, but they might as well have been referred to as ‘left’ or ‘right’) that we can find two electrons (instead of just one) in an atomic or molecular orbital.

So, yes, while both photons and electrons can be described by a similar-looking wave function, their behavior is fundamentally different indeed. How is that possible? Adding and subtracting ‘arrows’ is a very similar operation, isn’t it?

It is and it isn’t. From a mathematical point of view, I’d say: yes. From a physics point of view, it’s obviously not very ‘similar’, as it does lead to these two very different behaviors: the behavior of photons allows for laser shows, while the behavior of electrons explains (almost) all the peculiarities of the material world, including us walking into doors. 🙂 If you want to check it out for yourself, just check Feynman’s Lectures for more details on this or, else, re-read my posts on it indeed.

3. Of course, there are even more differences between photons and electrons than the two key differences I mentioned above. Indeed, I’ve simplified a lot when I wrote what I wrote above. The wavefunctions of electrons in orbit around a nucleus can take very weird shapes, as shown in the illustration below—and please do google a few others if you’re not convinced. As mentioned above, they’re so-called standing waves, because they occupy a well-defined position in space only, but standing waves can look very weird. In contrast, traveling plane waves, or envelope curves like the one above, are much simpler.

In short: yes, the mathematical representation of photons and electrons (i.e. the wavefunction) is very similar, but photons and electrons are very different animals indeed.

Potentiality and interconnectedness

I guess that, by now, you agree that quantum theory is weird but, as you know, quantum theory does explain all of the stuff that couldn’t be explained before: “It works like a charm”, as Feynman puts it. In fact, he’s often quoted as having said the following:

“It is often stated that of all the theories proposed in this century, the silliest is quantum theory. Some say the only thing that quantum theory has going for it, in fact, is that it is unquestionably correct.”

Silly? Crazy? Uncommon-sensy? Truth be told, you do get used to thinking in terms of amplitudes after a while. And, when you get used to them, those ‘complex’ numbers are no longer complicated. 🙂 Most importantly, when one thinks long and hard enough about it (as I am trying to do), it somehow all starts making sense.

For example, we’ve done away with dualism by adopting a unified mathematical framework, but the distinction between bosons and fermions still stands: an ‘elementary particle’ is either this or that. There are no ‘split personalities’ here. So the dualism just pops up at a different level of description, I’d say. In fact, I’d go one step further and say it pops up at a deeper level of understanding.

But what about the other assumptions in quantum mechanics? Some of them don’t make sense, do they? Well… I struggled for quite a while with the assumption that, in quantum mechanics, anything is possible really. For example, a photon (or an electron) can take any path in space, and it can travel at any speed (including speeds that are lower or higher than light). The probability may be extremely low, but it’s possible.

Now that is a very weird assumption. Why? Well… Think about it. If you enjoy watching soccer, you’ll agree that flying objects (I am talking about the soccer ball here) can have amazing trajectories. Spin, lift, drag, whatever—the result is a weird trajectory, like the one below:

But, frankly, a photon taking the ‘southern’ route in the illustration below? What are the ‘wheels and gears’ there? There’s nothing sensible about that route, is there?

In fact, there are at least three issues here:

First, you should note that strange curved paths in the real world (such as the trajectories of billiard or soccer balls) are possible only because there’s friction involved—between the felt of the pool table cloth and the ball, or between the balls, or, in the case of soccer, between the ball and the air. There’s no friction in the vacuum. Hence, in empty space, all things should go in a straight line only.
While it’s quite amazing what’s possible in the real world in terms of ‘weird trajectories’, even the weirdest trajectories of a billiard or soccer ball can be described by a ‘nice’ mathematical function. We obviously can’t say the same of that ‘southern route’ which a photon could follow, in theory that is. Indeed, you’ll agree the function describing that trajectory cannot be ‘nice’. So even if we allow all kinds of ‘weird’ trajectories, shouldn’t we limit ourselves to ‘nice’ trajectories only? I mean: it doesn’t make sense to allow the photons traveling from your computer screen to your retina to take some trajectory to the Sun and back, does it?
Finally, and most fundamentally perhaps, even when we would assume that there’s some mechanism combining (a) internal ‘wheels and gears’ (such as spin or angular momentum) with (b) felt or air or whatever medium to push against, what would be the mechanism determining the choice of the photon in regard to these various paths? In Feynman’s words: How does the photon ‘make up its mind’?

Feynman answers these questions, fully or partially (I’ll let you judge), when discussing the double-slit experiment with photons:

“Saying that a photon goes this or that way is false. I still catch myself saying, “Well, it goes either this way or that way,” but when I say that, I have to keep in mind that I mean in the sense of adding amplitudes: the photon has an amplitude to go one way, and an amplitude to go the other way. If the amplitudes oppose each other, the light won’t get there—even though both holes are open.”

It’s probably worth recalling the results of that experiment here, if only to help you judge whether or not Feynman fully answers those questions above!

The set-up is shown below. We have a source S, two slits (A and B), and a detector D. The source sends photons out, one by one. In addition, we have two special detectors near the slits, which may or may not detect a photon, depending on whether or not they’re switched on as well as on their accuracy.

First, we close one of the slits, and we find that 1% of the photons go through the other (so that’s one photon for every 100 photons that leave S). Now, we open both slits to study interference. You know the results already:

If we switch the detectors off (so we have no way of knowing where the photon went), we get interference. The interference pattern depends on the distance between A and B and varies from 0% to 4%, as shown in diagram (a) below. That’s pretty standard. As you know, classical theory can explain that too, assuming light is an electromagnetic wave. But so we have blobs of energy – photons – traveling one by one. So it’s really like that double-slit experiment with electrons, or whatever other microscopic particles (as you know, they’ve done these interference experiments with large molecules as well, and they get the same result!). We get the interference pattern by using those quantum-mechanical rules to calculate probabilities: we first add the amplitudes, and it’s only when we’re finished adding those amplitudes that we square the resulting arrow to get the final probability.
If we switch those special detectors on, and if they are 100% reliable (i.e. all photons going through are being detected), then our photon suddenly behaves like a particle, instead of as a wave: it will go through one of the slits only, i.e. either through A, or, alternatively, through B. So the two special detectors never go off together. Hence, as Feynman puts it: we shouldn’t think there is a “sneaky way that the photon divides in two and then comes back together again.” It’s one or the other way, and there’s no interference: the detector at D goes off 2% of the time, which is the simple sum of the probabilities for A and B (i.e. 1% + 1%).
When the special detectors near A and B are not 100% reliable (and, hence, do not detect all photons going through), we have three possible final conditions: (i) A and D go off, (ii) B and D go off, and (iii) D goes off alone (none of the special detectors went off). In that case, we have a final curve that’s a mixture, as shown in diagram (c) and (d) below. We get it using the same quantum-mechanical rules: we add amplitudes first, and then we square to get the probabilities.
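The three cases above can be reproduced with a small Python sketch. I take the 1% per slit from the text, so each arrow has length 0.1. The way I model imperfect detectors is my own simplifying assumption: both detectors catch a passing photon with the same probability (‘efficiency’), and a missed photon leaves no trace, so only in that case do we add amplitudes before squaring. With efficiency 0 you get the 0% to 4% interference pattern, with efficiency 1 a flat 2%, and anything in between gives a mixture.

```python
import cmath
import math

P_ONE_SLIT = 0.01                 # 1% through either slit alone (from the text)
AMP = math.sqrt(P_ONE_SLIT)       # so each arrow has length 0.1

def prob_at_d(phase_diff, efficiency):
    """Probability that D fires, summed over the three final conditions.

    phase_diff: phase difference between the A and B arrows at D, in radians.
    efficiency: assumed probability that a slit detector catches a passing photon
                (taken to be the same for both detectors).
    """
    a = AMP
    b = AMP * cmath.exp(1j * phase_diff)
    caught_at_a = efficiency * abs(a) ** 2        # (i) A and D go off
    caught_at_b = efficiency * abs(b) ** 2        # (ii) B and D go off
    missed = (1 - efficiency) * abs(a + b) ** 2   # (iii) D alone: add amplitudes first
    return caught_at_a + caught_at_b + missed

phases = [k * math.pi / 4 for k in range(9)]
for eff in (0.0, 1.0, 0.5):
    values = [prob_at_d(p, eff) for p in phases]
    print(f"efficiency {eff}: min {min(values):.3f}, max {max(values):.3f}")
```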

Now, I think you’ll agree with me that Feynman doesn’t answer my (our) question in regard to the ‘weird paths’. In fact, all of the diagrams he uses assume straight or nearby paths. Let me re-insert two of those diagrams below, to show you what I mean.

So where are all the strange non-linear paths here? Let me, in order to make sure you get what I am saying here, insert that illustration with the three crazy routes once again. What we’ve got above (Figure 33 and 34) is not like that. Not at all: we’ve got only straight lines there! Why? The answer to that question is easy: the crazy paths don’t matter because their amplitudes cancel each other out, and so that allows Feynman to simplify the whole situation and show all the relevant paths as straight lines only.

Now, I struggled with that for quite a while. Not because I can’t see the math or the geometry involved. No. Feynman does a great job showing why those amplitudes cancel each other out indeed (if you want a summary, see my previous post once again). My ‘problem’ is something else. It’s hard to phrase it, but let me try: why would we even allow for the logical or mathematical possibility of ‘weird paths’ (and let me again insert that stupid diagram below) if our ‘set of rules’ ensures that the truly ‘weird’ paths (like that photon traveling from your computer screen to your eye doing a detour taking it to the Sun and back) cancel each other out anyway? Does that respect Occam’s Razor? Can’t we devise some theory including ‘sensible’ paths only?

Of course, I am just an autodidact with limited time, and I know hundreds (if not thousands) of the best scientists have thought long and hard about this question and, hence, I readily accept the answer is quite simply: no. There is no better theory. I accept that answer, ungrudgingly, not only because I think I am not so smart as those scientists but also because, as I pointed out above, one can’t explain any path that deviates from a straight line really, as there is no medium, so there are no ‘wheels and gears’. The only path that makes sense is the straight line, and that’s only because…

Well… Thinking about it… We think the straight path makes sense because we have no good theory for any of the other paths. Hmm… So, from a logical point of view, assuming that the straight line is the only reasonable path is actually pretty random too. When push comes to shove, we have no good theory for the straight line either!

You’ll say I’ve just gone crazy. […] Well… Perhaps you’re right. 🙂 But… Somehow, it starts to make sense to me. We allow for everything and then weed out the crazy paths using our interference theory, and so we end up with what we’re ending up with: some kind of vague idea of “light not really traveling in a straight line but ‘smelling’ all of the neighboring paths around it and, hence, using a small core of nearby space”, as Feynman puts it.

Hmm… It brings me back to Richard Feynman’s introduction to his wonderful little book, in which he says we should just be happy to know how Nature works and not aspire to know why it works that way. In fact, he’s basically saying that, when it comes to quantum mechanics, the ‘how’ and the ‘why’ are one and the same, so asking ‘why’ doesn’t make sense, because we know ‘how’. He compares quantum theory with the system of calculation used by the Maya priests, which was based on a system of bars and dots, which helped them to do complex multiplications and divisions, for example. He writes the following about it: “The rules were tricky, but they were a much more efficient way of getting an answer to complicated questions (such as when Venus would rise again) than by counting beans.”

When I first read this, I thought the comparison was flawed: if a common Maya Indian did not want to use the ‘tricky’ rules of multiplication and what have you (or, more likely, if he didn’t understand them), he or she could still resort to counting beans. But how do we count beans in quantum mechanics? We have no ‘simpler’ rules than those weird rules about adding amplitudes and taking the (absolute) square of complex numbers so… Well… We actually are counting beans here then:

We allow for any possibility—any path: straight, curved or crooked. Anything is possible.
But all those possibilities are inter-connected. Also note that every path has a mirror image: for every route ‘south’, there is a similar route ‘north’, so to say, except for the straight line, which is a mirror image of itself.
And then we have some clock ticking. Time goes by. It ensures that the paths that are too far removed from the straight line cancel each other. [Of course, you’ll ask: what is too far? But I answered that question – convincingly, I hope – in my previous post: it’s not about the ‘number of arrows’ (as suggested in the caption under that Figure 34 above), but about the frequency and, hence, the ‘wavelength’ of our photon.]
And so… Finally, what’s left is a limited number of possibilities that interfere with each other, which results in what we ‘see’: light seems to use a small core of space indeed–a limited number of nearby paths.
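Here is a toy Python version of that ‘bean counting’. It is my own simplification, not Feynman’s calculation. I only use one family of paths: straight from the source to a turning point halfway, at a sideways distance y, and then straight on to the detector. Every path gets an arrow of the same length, and the arrow turns one full cycle per wavelength of path length. The sketch adds up the arrows for all these paths, for the paths close to the straight line only, and for the far-out paths only. The far-out arrows almost completely cancel, while the paths near the straight line account for practically all of the final arrow. The width of that ‘core’ shrinks roughly with the square root of the wavelength, which is in line with the point about frequency and wavelength above.

```python
import cmath
import math

def arrow_sum(y_min, y_max, wavelength, distance=1.0, dy=1e-5):
    """Add up equal-length arrows for a toy family of bent paths.

    Each path goes straight from the source (0, 0) to a turning point
    (distance/2, y) and then straight on to the detector (distance, 0).
    Its arrow turns by one full cycle per wavelength of path length.
    """
    total = 0j
    steps = int(round((y_max - y_min) / dy))
    for k in range(steps):
        y = y_min + (k + 0.5) * dy
        path_length = 2.0 * math.hypot(distance / 2.0, y)
        total += cmath.exp(2j * math.pi * path_length / wavelength) * dy
    return total

wavelength = 1e-3  # assumed, in units of the source-detector distance
everything = arrow_sum(-0.5, 0.5, wavelength)
near_straight = arrow_sum(-0.1, 0.1, wavelength)
far_out = arrow_sum(-0.5, -0.2, wavelength) + arrow_sum(0.2, 0.5, wavelength)

print(f"all paths:            {abs(everything):.4f}")
print(f"paths with |y| < 0.1: {abs(near_straight):.4f}")
print(f"paths with |y| > 0.2: {abs(far_out):.4f} (would be 0.6 if they all lined up)")
```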

You’ll say… Well… That still doesn’t ‘explain’ why the interference pattern disappears with those special detectors or – what amounts to the same – why the special detectors at the slits never click simultaneously.

You’re right. How do we make sense of that? I don’t know. You should try to imagine what happens for yourself. Everyone has his or her own way of ‘conceptualizing’ stuff, I’d say, and you may well be content and just accept all of the above without trying to ‘imagine’ what’s happening really when a ‘photon’ goes through one or both of those slits. In fact, that’s the most sensible thing to do. You should not try to imagine what happens and just follow the crazy calculus rules.

However, when I think about it, I do have some image in my head. The image is of one of those ‘touch-me-not’ weeds. I quickly googled one of these images, but I couldn’t quite find what I was looking for: it would be more like something that, when you touch it, curls up in a little ball. In any case… You know what I mean, I hope.

You’ll shake your head now and solemnly confirm that I’ve gone mad. Touch-me-not weeds? What’s that got to do with photons?

Well… It’s obvious you and I cannot really imagine what a photon looks like. But I think of it as a blob of energy indeed, which is inseparable, and which effectively occupies some space (in three dimensions, that is). I also think that, whatever it is, it actually does travel through both slits, because, as it interferes with itself, the interference pattern does depend on the space between the two slits as well as the width of those slits. In short, the whole ‘geometry’ of the situation matters, and so the ‘interaction’ is some kind of ‘spatial’ thing. [Sorry for my awfully imprecise language here.]

Having said that, I think it’s being detected by one detector only because only one of them can sort of ‘hook’ it, somehow. Indeed, because it’s interconnected and inseparable, it’s the whole blob that gets hooked, not just one part of it. [You may or may not imagine that the detector that’s got the best hold of it gets it, but I think that’s pushing the description too much.] In any case, the point is that a photon is surely not like a lizard dropping its tail while trying to escape. Perhaps it’s some kind of unbreakable ‘string’ indeed – and sorry for summarizing string theory so unscientifically here – but then a string oscillating in dimensions we can’t imagine (or in some dimension we can’t observe, as the Kaluza-Klein theory suggests). It’s something, for sure, and something that stores energy in some kind of oscillation, I think.

What it is, exactly, we can’t imagine, and we’ll probably never find out—unless we accept that the how of quantum mechanics is not only the why, but also the what. 🙂

Does this make sense? Probably not but, if anything, I hope it fired your imagination at least. 🙂

Applied vector analysis (II)

Pre-script (dated 26 June 2020): This post has become less relevant (even irrelevant, perhaps) because my views on all things quantum-mechanical have evolved significantly as a result of my progression towards a more complete realist (classical) interpretation of quantum physics. In addition, some of the material was removed by a dark force (that also created problems with the layout, I see now). In any case, we recommend you read our recent papers. I keep blog posts like these mainly because I want to keep track of where I came from. I might review them one day, but I currently don’t have the time or energy for it. 🙂

Original post:

We’ve covered a lot of ground in the previous post, but we’re not quite there yet. We need to look at a few more things in order to gain some kind of ‘physical’ understanding of Maxwell’s equations, as opposed to a merely ‘mathematical’ one. That will probably disappoint you. In fact, you probably wonder why one needs to know about Gauss’ and Stokes’ Theorems if the only objective is to ‘understand’ Maxwell’s equations.

To some extent, your skepticism is justified. It’s already quite something to get some feel for those two new operators we’ve introduced in the previous post, i.e. the divergence (div) and curl operators, denoted by ∇• and ∇× respectively. By now, you understand that these two operators act on a vector field, such as the electric field vector E, or the magnetic field vector B, or, in the example we used, the heat flow h, so we should write ∇•(a vector) and ∇×(a vector). And, as for that del operator – i.e. ∇ without the dot (•) or the cross (×) – if there’s one diagram you should be able to draw off the top of your head, it’s the one below, which shows:

The heat flow vector h, whose magnitude is the thermal energy that passes, per unit time and per unit area, through an infinitesimally small isothermal surface, so we write: h = |h| = ΔJ/ΔA.
The gradient vector ∇T, whose direction is opposite to that of h, and whose magnitude is proportional to h, so we can write the so-called differential equation of heat flow: h = –κ∇T.
The components of the vector dot product ΔT = ∇T•ΔR = |∇T|·ΔR·cosθ.

You should also remember that we can re-write that ΔT = ∇T•ΔR = |∇T|·ΔR·cosθ equation – which we can also write as ΔT/ΔR = |∇T|·cosθ – in a more general form:

Δψ/ΔR = |∇ψ|·cosθ

That equation says that the component of the gradient vector ∇ψ along a small displacement ΔR is equal to the rate of change of ψ in the direction of ΔR. And then we had three important theorems, but I can imagine you don’t want to hear about them anymore. So what can we do without them? Let’s have a look at Maxwell’s equations again and explore some linkages.
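Before moving on, here’s a small numerical sketch (in Python) of the relations above. It uses a made-up temperature field T, an assumed conductivity κ = 0.6, and an arbitrary point and displacement ΔR; none of these numbers come from the diagram. It approximates ∇T by central differences, forms h = –κ∇T, and checks that ΔT ≈ ∇T•ΔR and ΔT/ΔR ≈ |∇T|·cosθ for a small ΔR.

```python
import math


def grad(f, x, y, z, step=1e-5):
    """Central-difference approximation of the gradient of a scalar field f at (x, y, z)."""
    return (
        (f(x + step, y, z) - f(x - step, y, z)) / (2 * step),
        (f(x, y + step, z) - f(x, y - step, z)) / (2 * step),
        (f(x, y, z + step) - f(x, y, z - step)) / (2 * step),
    )


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def norm(v):
    return math.sqrt(dot(v, v))


def T(x, y, z):
    """Hypothetical temperature field, for illustration only."""
    return 300.0 + 5.0 * x - 2.0 * y + 0.5 * x * z


kappa = 0.6                 # assumed thermal conductivity
p = (1.0, 2.0, 3.0)         # point where we evaluate everything
dR = (1e-3, 2e-3, -1e-3)    # small displacement ΔR

gT = grad(T, *p)                        # ∇T at p
h_vec = tuple(-kappa * g for g in gT)   # heat flow h = –κ∇T, opposite to ∇T

# ΔT from the field itself versus the linear estimate ∇T•ΔR
dT_exact = T(*(a + b for a, b in zip(p, dR))) - T(*p)
dT_linear = dot(gT, dR)

# ΔT/ΔR = |∇T|·cosθ, with θ the angle between ∇T and ΔR
cos_theta = dot(gT, dR) / (norm(gT) * norm(dR))
rate = norm(gT) * cos_theta

print("grad T:", gT)
print("h:", h_vec)
print("dT exact vs linear:", dT_exact, dT_linear)
print("dT/dR vs |grad T| cos(theta):", dT_exact / norm(dR), rate)
```

Note that ΔT and ∇T•ΔR agree only to first order: the small leftover comes from the x·z term in T, which is why the relation is stated for an infinitesimally small ΔR.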

Curl-free and divergence-free fields

From what I wrote in my previous post, you should remember that:

The curl of a vector field C (i.e. ∇×C) represents its circulation, i.e. its (infinitesimal) rotation.
Its divergence (i.e. ∇•C) represents the outward flux out of an (infinitesimal) volume around the point we’re considering.

Back to Maxwell’s equations:

(1) ∇•E = ρ/ε₀
(2) ∇×E = –∂B/∂t
(3) ∇•B = 0
(4) c²∇×B = ∂E/∂t + j/ε₀

Let’s start at the bottom, i.e. with equation (4). It says that a changing electric field (i.e. ∂E/∂t ≠ 0) and/or a (steady) electric current (j/ε₀) will cause some circulation of B, i.e. the magnetic field. It’s important to note that (a) the electric field has to change and/or (b) that electric charges (positive or negative) have to move in order to cause some circulation of B: a steady electric field will not result in any magnetic effects.

This brings us to the first and easiest of all the circumstances we can analyze: the static case. In that case, the time derivatives ∂E/∂t and ∂B/∂t are zero, and Maxwell’s equations reduce to:

∇•E = ρ/ε₀. In this equation, we have ρ, which represents the so-called charge density, which describes the distribution of electric charges in space: ρ = ρ(x, y, z). To put it simply: ρ is the ‘amount of charge’ (which we’ll denote by Δq) per unit volume at a given point. Hence, if we consider a small volume (ΔV) located at point (x, y, z) in space – an infinitesimally small volume, in fact (as usual) – then we can write: Δq = ρ(x, y, z)ΔV. [As for ε₀, you already know this is a constant which ensures all units are ‘compatible’.] This equation basically says we have some flux of E, the exact amount of which is determined by the charge density ρ or, more generally, by the charge distribution in space.
∇×E = 0. That means that the curl of E is zero: everywhere, and always. So there’s no circulation of E. We call this a curl-free field.
∇•B = 0. That means that the divergence of B is zero: everywhere, and always. So there’s no flux of B. None. We call this a divergence-free field.
c²∇×B = j/ε₀. So here we have steady current(s) causing some circulation of B, the exact amount of which is determined by the (total) current j. [What about that c² factor? Well… We talked about that before: magnetism is, basically, a relativistic effect, and so that’s where that factor comes from. I’ll just refer you to what Feynman writes about this in his Lectures, and warmly recommend reading it, because it’s really quite interesting: it gave me at least a much deeper understanding of what it’s all about, and so I hope it will help you as much.]
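The first of these static equations, ∇•E = ρ/ε₀, has an integral counterpart via Gauss’ Theorem: the total outward flux of E through any closed surface equals the enclosed charge divided by ε₀. Here’s a rough numerical sketch of that in Python for a single point charge and a sphere, using a simple midpoint-rule integration over the sphere. The charge value, positions and grid sizes are arbitrary choices for illustration, and the agreement is approximate (well within a tenth of a percent here), not exact.

```python
import math

EPS0 = 8.8541878128e-12          # vacuum permittivity, F/m
K = 1 / (4 * math.pi * EPS0)     # Coulomb constant
Q = 1e-9                         # assumed point charge, C


def point_charge_field(src, r):
    """Coulomb field at position r due to the charge Q located at src."""
    d = [r[i] - src[i] for i in range(3)]
    dist = math.sqrt(sum(c * c for c in d))
    return [K * Q * c / dist**3 for c in d]


def flux_through_sphere(src, radius, n_theta=200, n_phi=200):
    """Midpoint-rule estimate of the outward flux of E through a sphere centred at the origin."""
    d_theta = math.pi / n_theta
    d_phi = 2 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            n = (math.sin(theta) * math.cos(phi),   # outward unit normal
                 math.sin(theta) * math.sin(phi),
                 math.cos(theta))
            E = point_charge_field(src, [radius * c for c in n])
            dA = radius**2 * math.sin(theta) * d_theta * d_phi
            total += sum(E[k] * n[k] for k in range(3)) * dA
    return total


inside = flux_through_sphere((0.1, 0.1, -0.2), 1.0)   # charge inside the sphere
outside = flux_through_sphere((2.0, 0.0, 0.0), 1.0)   # charge outside the sphere
print(inside, Q / EPS0, outside)
```

The charge inside the sphere gives a flux of about Q/ε₀, wherever exactly it sits; the charge outside gives (almost) zero net flux, because what flows in on one side flows out on the other.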

Now you’ll say: why bother with all these difficult mathematical constructs if we’re going to consider curl-free and divergence-free fields only? Well… B is not curl-free, and E is not divergence-free. To be precise:

E is a field with zero curl and a given divergence, and
B is a field with zero divergence and a given curl.

Yeah, but why can’t we analyze fields that have both curl and divergence? The answer is: we can, and we will, but we have to start somewhere, and so we start with an easier analysis first.

Electrostatics and magnetostatics

The first thing you should note is that, in the static case (i.e. when charges and currents are static), there is no interdependence between E and B. The two fields are not interconnected, so to say. Therefore, we can neatly separate them into two pairs:

Electrostatics: (1) ∇•E = ρ/ε₀ and (2) ∇×E = 0.
Magnetostatics: (1) ∇×B = j/(c²ε₀) and (2) ∇•B = 0.

Now, I won’t go through all of the particularities involved. In fact, I’ll refer you to a real physics textbook on that (like Feynman’s Lectures indeed). My aim here is to use these equations to introduce some more math and to gain a better understanding of vector calculus – an understanding that goes, in fact, beyond the math (i.e. a ‘physical’ understanding, as Feynman terms it).

At this point, I have to introduce two additional theorems. They are nice and easy to understand (although not so easy to prove, and so I won’t):

Theorem 1: If we have a vector field – let’s denote it by C – and we find that its curl is zero everywhere, then C must be the gradient of something. In other words, there must be some scalar field ψ (psi) such that C is equal to the gradient of ψ. It’s easier to write this down as follows:

If ∇×C = 0, there is a ψ such that C = ∇ψ.

Theorem 2: If we have a vector field – let’s denote it by D, just to introduce yet another letter – and we find that its divergence is zero everywhere, then D must be the curl of some vector field A. So we can write:

If ∇•D = 0, there is an A such that D = ∇×A.
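I won’t prove these theorems, but for fields that are well-behaved everywhere (no holes in the region we’re looking at) there’s a standard way to actually construct ψ and A: integrate along straight lines from the origin, ψ(r) = ∫₀¹ C(tr)•r dt (valid when ∇×C = 0) and A(r) = ∫₀¹ t·D(tr)×r dt (valid when ∇•D = 0). The Python sketch below applies those formulas to two hypothetical example fields, C = (2xy + z, x², x), which is curl-free, and D = (y, z, x), which is divergence-free, and then checks numerically that ∇ψ gives back C and ∇×A gives back D. The integrals use Simpson’s rule and the derivatives use central differences; both happen to be exact (up to rounding) for these low-degree polynomial fields, which is why the check is so tight.

```python
def simpson(f, a=0.0, b=1.0, n=20):
    """Composite Simpson's rule on [a, b] with an even number n of panels."""
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * step)
    return total * step / 3


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def C(x, y, z):
    """Hypothetical curl-free field: C = (2xy + z, x², x)."""
    return (2 * x * y + z, x * x, x)


def D(x, y, z):
    """Hypothetical divergence-free field: D = (y, z, x)."""
    return (y, z, x)


def psi(x, y, z):
    """ψ(r) = ∫₀¹ C(t r)•r dt, i.e. the line integral of C from the origin to r."""
    r = (x, y, z)
    return simpson(lambda t: sum(c * ri for c, ri in zip(C(t * x, t * y, t * z), r)))


def A(x, y, z):
    """A(r) = ∫₀¹ t D(t r)×r dt, computed one component at a time."""
    r = (x, y, z)
    return tuple(
        simpson(lambda t, k=k: t * cross(D(t * x, t * y, t * z), r)[k])
        for k in range(3)
    )


def shifted(p, k, delta):
    q = list(p)
    q[k] += delta
    return q


def grad(f, p, step=1e-4):
    return tuple((f(*shifted(p, k, step)) - f(*shifted(p, k, -step))) / (2 * step)
                 for k in range(3))


def curl(F, p, step=1e-4):
    def d(i, k):  # ∂F_i/∂x_k by central differences
        return (F(*shifted(p, k, step))[i] - F(*shifted(p, k, -step))[i]) / (2 * step)
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))


p = (0.7, -1.2, 0.4)
print("grad psi:", grad(psi, p), " C:", C(*p))   # Theorem 1
print("curl A:  ", curl(A, p), " D:", D(*p))     # Theorem 2
```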

We can apply this to the situation at hand:

For E, there is some scalar potential Φ such that E = –∇Φ. [Note that we could integrate the minus sign in Φ, but we leave it there as a reminder that the situation is similar to that of heat flow. It’s a matter of convention really: E ‘flows’ from higher to lower potential.]
For B, there is a so-called vector potential A such that B = ∇×A.
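For the simplest case, a single point charge q at the origin, the scalar potential is Φ = q/(4πε₀r). The Python sketch below (with an arbitrary charge of 1 nC and an arbitrary observation point) takes –∇Φ by central differences and compares it with Coulomb’s field, q·r/(4πε₀r³). It also checks that a small step along E lowers the potential, in line with the minus sign in E = –∇Φ.

```python
import math

EPS0 = 8.8541878128e-12          # vacuum permittivity, F/m
K = 1 / (4 * math.pi * EPS0)     # Coulomb constant
Q = 1e-9                         # assumed charge at the origin, C


def potential(x, y, z):
    """Φ = q / (4πε₀ r) for a point charge at the origin."""
    return K * Q / math.sqrt(x * x + y * y + z * z)


def field_from_potential(x, y, z, step=1e-5):
    """E = –∇Φ, with the gradient approximated by central differences."""
    point = (x, y, z)
    E = []
    for k in range(3):
        up, down = list(point), list(point)
        up[k] += step
        down[k] -= step
        E.append(-(potential(*up) - potential(*down)) / (2 * step))
    return tuple(E)


def coulomb_field(x, y, z):
    """Coulomb's law: E = q r / (4πε₀ r³)."""
    r = math.sqrt(x * x + y * y + z * z)
    return tuple(K * Q * c / r**3 for c in (x, y, z))


p = (0.3, -0.4, 1.2)            # observation point, m (r = 1.3 m)
E_num = field_from_potential(*p)
E_ref = coulomb_field(*p)

# Moving a little along E should lower the potential
small = 1e-3
p_next = tuple(c + small * e / math.hypot(*E_ref) for c, e in zip(p, E_ref))
print(E_num, E_ref, potential(*p), potential(*p_next))
```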

The whole game is then to compute Φ and A everywhere. We can then take (minus) the gradient of Φ, and the curl of A, to find the electric and magnetic field respectively, at every single point in space. In fact, most of Feynman’s second Volume of his Lectures is devoted to that, so I’ll refer you to that if you’re interested. As said, my goal here is just to introduce the basics of vector calculus, so you gain a better understanding of physics, i.e. an understanding which goes beyond the math.

Electrodynamics

We’re almost done. Electrodynamics is, of course, much more complicated than the static case, but I don’t intend to go into too much detail here. The important thing is to see the linkages in Maxwell’s equations. I’ve highlighted them below:

I know this looks messy, but it’s actually not so complicated. The interactions between the electric and magnetic field are governed by equations (2) and (4), so equations (1) and (3) are just ‘statics’. Something needs to trigger it all, of course. I assume it’s an electric current (that’s the arrow marked by [0]).

Indeed, equation (4), i.e. c²∇×B = ∂E/∂t + j/ε₀, implies that a changing electric current – an accelerating electric charge, for instance – will cause the circulation of B to change. More specifically, taking the time derivative of both sides, we can write: ∂[c²∇×B]/∂t = ∂²E/∂t² + ∂[j/ε₀]/∂t. However, as the circulation of B changes, the magnetic field B itself must be changing. Hence, we have a non-zero time derivative of B (∂B/∂t ≠ 0). But, then, according to equation (2), i.e. ∇×E = –∂B/∂t, we’ll have some circulation of E. That’s the dynamics marked by the red arrows [1].

Now, assuming that ∂B/∂t is not constant (because that electric charge accelerates and decelerates, for example), the time derivative ∂E/∂t will be non-zero too (∂E/∂t ≠ 0). That, in turn, feeds back into equation (4), according to which a changing electric field will cause the circulation of B to change. That’s the dynamics marked by the yellow arrows [2].
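This feedback loop between equations (2) and (4) is exactly what a simple numerical scheme exploits. The Python sketch below is a one-dimensional leapfrog update (of the kind usually called a Yee or FDTD scheme) for a wave traveling along z, with E along x and B along y, no charges or currents, and units chosen so that c = 1. Each step first updates B from the spatial change in E (equation (2)) and then E from the spatial change in B (equation (4) with j = 0). The grid size, pulse shape and number of steps are arbitrary choices. With the time step set to exactly one grid cell per unit of c, this 1D scheme moves the pulse by exactly one cell per step, so after 100 steps the peak should sit 100 cells further along.

```python
import math


def simulate_pulse(n_cells=400, steps=100, center=100, width=5.0):
    """1D leapfrog (Yee-type) update of Ex(z, t) and By(z, t) in vacuum, with c = 1 and dt = dz.

    From ∇×E = –∂B/∂t:     ∂By/∂t = –∂Ex/∂z
    From c²∇×B = ∂E/∂t:    ∂Ex/∂t = –c² ∂By/∂z   (no current)
    Ex sits on integer grid points i, By on half-integer points i + 1/2.
    """
    def pulse(s):
        return math.exp(-((s - center) / width) ** 2)

    ex = [pulse(i) for i in range(n_cells)]
    # By = Ex/c for a wave moving towards +z, sampled at time -dt/2 and position i + 1/2
    by = [pulse(i + 1) for i in range(n_cells - 1)]

    for _ in range(steps):
        for i in range(n_cells - 1):       # E changing in space -> B changes in time
            by[i] -= ex[i + 1] - ex[i]
        for i in range(1, n_cells - 1):    # B changing in space -> E changes in time
            ex[i] -= by[i] - by[i - 1]     # end points stay fixed; the pulse never reaches them
    return ex, by


ex, by = simulate_pulse()
peak = max(range(len(ex)), key=ex.__getitem__)
print("peak of Ex now at cell", peak)
```

The choice dt = dz/c is what makes this particular 1D scheme exact; in two or three dimensions a stable time step has to be smaller, and real simulations also need absorbing boundaries, which this sketch leaves out.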

The ‘feedback loop’ is closed now: I’ve just explained how an electromagnetic field (or radiation) actually propagates through space. Below you can see one of the fancier animations you can find on the Web. The blue oscillation is supposed to represent the oscillating magnetic vector, while the red oscillation is supposed to represent the electric field vector. Note how the effect travels through space.

This is, of course, an extremely simplified view. To be precise, it assumes that the light wave (that’s what an electromagnetic wave actually is) is linearly (aka plane) polarized, as the electric (and magnetic) field oscillates along a straight line. If we choose the direction of propagation as the z-axis of our reference frame, the electric field vector will oscillate in the xy-plane. In other words, the electric field will have an x- and a y-component, which we’ll denote as E_x and E_y respectively, as shown in the diagrams below, which give various examples of linear polarization.

Light is, of course, not necessarily plane-polarized. The animation below shows circular polarization, which is a special case of the more general elliptical polarization condition.
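The difference between linear, circular and elliptical polarization comes down to the amplitudes of, and the phase difference between, the E_x and E_y components. Here’s a quick Python sketch, with made-up amplitudes and ω = 1, looking at the field at one fixed point on the z-axis over one period.

```python
import math


def transverse_field(t, ax, ay, phase, omega=1.0):
    """E at a fixed point on the z-axis: E_x = ax·cos(ωt), E_y = ay·cos(ωt + phase)."""
    return ax * math.cos(omega * t), ay * math.cos(omega * t + phase)


times = [k * 2 * math.pi / 360 for k in range(360)]   # one period for ω = 1

linear = [transverse_field(t, 1.0, 0.5, 0.0) for t in times]              # in phase
circular = [transverse_field(t, 1.0, 1.0, math.pi / 2) for t in times]    # equal amplitudes, 90° apart
elliptical = [transverse_field(t, 1.0, 0.5, math.pi / 2) for t in times]  # unequal amplitudes, 90° apart

# Linear: the tip of E stays on the line E_y = 0.5·E_x
# Circular: |E| is constant; elliptical: |E| swings between 0.5 and 1
mags_c = [math.hypot(ex, ey) for ex, ey in circular]
mags_e = [math.hypot(ex, ey) for ex, ey in elliptical]
print(min(mags_c), max(mags_c), min(mags_e), max(mags_e))
```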

The relativity of magnetic and electric fields

Allow me to make a small digression here, which has more to do with physics than with vector analysis. You’ll have noticed that we didn’t talk about the magnetic field vector anymore when discussing the polarization of light. Indeed, when discussing electromagnetic radiation, most – if not all – textbooks start by noting we have E and B vectors, but then proceed to discuss the E vector only. Where’s the magnetic field? We need to note two things here.

1. First, I need to remind you that the force on any electrically charged particle (and note we only have electric charge: there’s no such thing as a magnetic charge according to Maxwell’s third equation) consists of two components. Indeed, the total electromagnetic force (aka Lorentz force) on a charge q is:

F = q(E + v×B) = qE + q(v×B) = F_E + F_M

The velocity vector v is the velocity of the charge: if the charge is not moving, then there’s no magnetic force. The illustration below shows you the components of the vector cross product that, by now, you’re fully familiar with. Indeed, in my previous post, I gave you the expressions for the x, y and z coordinate of a cross product, but there’s a geometrical definition as well:

v×B = |v||B|sin(θ)n

The magnetic force F_M is q(v×B) = qv×B = q|v||B|sin(θ)n. The unit vector n determines the direction of the force, which is determined by that right-hand rule that, by now, you also are fully familiar with: it’s perpendicular to both v and B (cf. the two 90° angles in the illustration). Just to make sure, I’ve also added the right-hand rule illustration above: check it out, as it does involve a bit of arm-twisting in this case. 🙂
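To make the geometry of the magnetic force concrete, here’s a Python sketch with assumed numbers: an electron moving at 10⁵ m/s along x through a field B = (0.5, 0.5, 0) T, so θ = 45°, and no electric field. It computes F = q(E + v×B), checks the geometric definition |v×B| = |v||B|sin(θ), confirms that F_M is perpendicular to both v and B, and shows that a charge at rest feels no magnetic force.

```python
import math


def cross(a, b):
    """Cross product a×b of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(v):
    return math.sqrt(dot(v, v))


def lorentz_force(q, E, v, B):
    """F = q(E + v×B), returned with its electric part F_E and magnetic part F_M."""
    F_E = tuple(q * e for e in E)
    F_M = tuple(q * c for c in cross(v, B))
    return tuple(a + b for a, b in zip(F_E, F_M)), F_E, F_M


q = -1.602176634e-19        # charge of an electron, C
E = (0.0, 0.0, 0.0)         # assumed: no electric field, to isolate F_M
v = (1.0e5, 0.0, 0.0)       # assumed velocity, m/s
B = (0.5, 0.5, 0.0)         # assumed field, T (at 45° to v)

F, F_E, F_M = lorentz_force(q, E, v, B)
theta = math.acos(dot(v, B) / (norm(v) * norm(B)))

# Geometric definition: |v×B| = |v||B|sin(θ), and F_M is perpendicular to both v and B
print(norm(cross(v, B)), norm(v) * norm(B) * math.sin(theta))
print(dot(F_M, v), dot(F_M, B))

# A charge at rest feels no magnetic force at all
_, _, F_M_rest = lorentz_force(q, E, (0.0, 0.0, 0.0), B)
```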

In any case, the point to note here is that there’s only one electromagnetic force on the particle. While we distinguish between an E and a B vector, the E and B vectors depend on our reference frame. Huh? Yes. The velocity v is relative: we specify the magnetic field in a so-called inertial frame of reference here. If we were moving with the charge, the magnetic force would, quite simply, disappear, because we’d have a v equal to zero, so we’d have v×B = 0×B = 0. Of course, all other charges (i.e. all ‘stationary’ and ‘moving’ charges that were causing the field in the first place) would have different velocities as well and, hence, our E and B vectors would look very different too: they would come in a ‘different mixture’, as Feynman puts it. [If you’d want to know in what mixture exactly, I’ll refer you to Feynman: it’s a rather lengthy analysis (five rather dense pages, in fact), but I can warmly recommend it: in fact, you should go through it if only to test your knowledge at this point, I think.]

You’ll say: So what? That doesn’t answer the question above. Why do physicists leave out the magnetic field vector in all those illustrations?

You’re right. I haven’t answered the question. This first remark is more like a warning. Let me quote Feynman on it:

“Since electric and magnetic fields appear in different mixtures if we change our frame of reference, we must be careful about how we look at the fields E and B. […] The fields are our way of describing what goes on at a point in space. In particular, E and B tell us about the forces that will act on a moving particle. The question “What is the force on a charge from a moving magnetic field?” doesn’t mean anything precise. The force is given by the values of E and B at the charge, and the F = q(E + v×B) formula is not to be altered if the source of E or B is moving: it is the values of E and B that will be altered by the motion. Our mathematical description deals only with the fields as a function of x, y, z, and t with respect to some inertial frame.”

If you allow me, I’ll take this opportunity to insert another warning, one that’s quite specific to how we should interpret this concept of an electromagnetic wave. When we say that an electromagnetic wave ‘travels’ through space, we often tend to think of a wave traveling on a string: we’re smart enough to understand that what is traveling is not the string itself (or some part of the string) but the amplitude of the oscillation: it’s the vertical displacement (i.e. the movement that’s perpendicular to the direction of ‘travel’) that appears first at one place and then at the next and so on and so on. It’s in that sense, and in that sense only, that the wave ‘travels’. However, the problem with this comparison to a wave traveling on a string is that we tend to think that an electromagnetic wave also occupies some space in the directions that are perpendicular to the direction of travel (i.e. the x and y directions in those illustrations on polarization). Now that’s a huge misconception! The electromagnetic field is something physical, for sure, but the E and B vectors do not occupy any physical space in the x and y direction as they ‘travel’ along the z direction!

Let me conclude this digression with Feynman’s conclusion on all of this:

“If we choose another coordinate system, we find another mixture of E and B fields. However, electric and magnetic forces are part of one physical phenomenon—the electromagnetic interactions of particles. While the separation of this interaction into electric and magnetic parts depends very much on the reference frame chosen for the description, the complete electromagnetic description is invariant: electricity and magnetism taken together are consistent with Einstein’s relativity.”

2. You’ll say: I don’t give a damn about other reference frames. Answer the question. Why are magnetic fields left out of the analysis when discussing electromagnetic radiation?

The answer to that question is very mundane. When we know E (in one or the other reference frame), we also know B, and, while B is as ‘essential’ as E when analyzing how an electromagnetic wave propagates through space, the truth is that the magnitude of B is only a very tiny fraction of that of E.

Huh? Yes. That animation with these oscillating blue and red vectors is very misleading in this regard. Let me be precise here and give you the formulas:

I’ve analyzed these formulas in one of my other posts (see, for example, my first post on light and radiation), and so I won’t repeat myself too much here. However, let me recall the basics of it all. The e_R′ vector is a unit vector pointing in the apparent direction of the charge. When I say ‘apparent’, I mean that this unit vector is not pointing towards the present position of the charge, but at where it was a little while ago, because this ‘signal’ can only travel from the charge to where we are now at the speed of the wave, i.e. at the speed of light c. That’s why we prime the (radial) vector R also (so we write R′ instead of R). So that unit vector wiggles up and down and, as the formula makes clear, it’s the second-order derivative of that movement which determines the electric field. That second-order derivative is the acceleration vector, and it can be substituted for the vertical component of the acceleration of the charge that caused the radiation in the first place but, again, I’ll refer you to my post on that, as it’s not the topic we want to cover here.

What we do want to look at here is that formula for B: it’s the cross product of that e_R′ vector (the minus sign just reverses the direction of the whole thing) and E, divided by c. We also know that the E and e_R′ vectors are at right angles to each other, so the sine factor (sinθ) is 1 (or –1) too. In other words, the magnitude of B is |E|/c = E/c, which is a very tiny fraction of E indeed (remember: c ≈ 3×10⁸ m/s, in SI units).

So… Yes, for all practical purposes, B doesn’t matter all that much when analyzing electromagnetic radiation, and so that’s why physicists will note it but then proceed and look at E only when discussing radiation. Poor B! That being said, the magnetic force may be tiny, but it’s quite interesting. Just look at its direction! Huh? Why? What’s so interesting about it? I am not talking about the direction of B here: I am talking about the direction of the force. Oh… OK… Hmm… Well…

Let me spell it out. Take the force formula: F = q(E + v×B) = qE + q(v×B). When our electromagnetic wave hits something real (I mean anything real, like a wall, or some molecule of gas), it is likely to hit some electron, i.e. an actual electric charge. Hence, the electric and magnetic fields should have some impact on it. Now, as we pointed out here, the magnitude of the electric force will be the most important one – by far – and, hence, it’s the electric field that will ‘drive’ that charge and, in the process, give it some velocity v, as shown below. In what direction? Don’t ask stupid questions: look at the equation. F_E = qE, so the electric force will have the same direction as E.

But we’ve got a moving charge now and, therefore, the magnetic force comes into play as well! That force is F_M = q(v×B) and its direction is given by the right-hand rule: it’s the F above in the direction of the light beam itself. Admittedly, it’s a tiny force, as its magnitude is F = qvE/c only, but it’s there, and it’s what causes the so-called radiation pressure (or light pressure tout court). So, yes, you can start dreaming of fancy solar sailing ships (the illustration below shows one out of Star Trek) but… Well… Good luck with it! The force is very tiny indeed and, of course, don’t forget there’s light coming from all directions in space!
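The direction argument can be checked with a few cross products. The Python sketch below assumes a plane wave traveling along +z (the direction of –e_R′) with E along x and B = ẑ×E/c along y, so that |B| = |E|/c, and a charge that the electric force has set moving along qE with some assumed speed. Whatever the sign of the charge, the magnetic force q(v×B) comes out along +z, the direction of the light beam, and its size relative to the electric force is v/c. The field amplitude and speed are illustrative numbers only.

```python
import math

C_LIGHT = 299_792_458.0       # speed of light, m/s
E_CHARGE = 1.602176634e-19    # elementary charge, C


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


E0 = 100.0                      # assumed field amplitude, V/m
E = (E0, 0.0, 0.0)              # electric field along x
k_hat = (0.0, 0.0, 1.0)         # direction of propagation (+z)
B = tuple(c / C_LIGHT for c in cross(k_hat, E))   # B = k̂×E/c, along +y

speed = 1.0e3                   # assumed speed given to the charge by E, m/s
results = []
for q in (E_CHARGE, -E_CHARGE):                    # positive and negative charge
    v = (math.copysign(speed, q), 0.0, 0.0)        # the charge moves along qE
    F_E = tuple(q * e for e in E)
    F_M = tuple(q * c for c in cross(v, B))
    ratio = math.hypot(*F_M) / math.hypot(*F_E)    # should equal v/c
    results.append((F_M, ratio))
    print(q, F_M, ratio, speed / C_LIGHT)
```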

Jokes aside, it’s a real and interesting effect indeed, but I won’t say much more about it. Just note that we are really talking about the momentum of light here, and it’s as ‘real’ as any momentum. In an interesting analysis, Feynman calculates this momentum and, rather unsurprisingly (but please do check out how he calculates these things, as it’s quite interesting), the same 1/c factor comes into play once more: the momentum (p) that’s being delivered when light hits something real is equal to 1/c of the energy that’s being absorbed. So, if we denote the energy by W (in order to not create confusion with the E symbol we’ve used already), we can write: p = W/c.
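It’s easy to put numbers on p = W/c. The sketch below assumes sunlight near Earth delivers roughly 1361 W/m² (an approximate figure that is not taken from this post) onto a hypothetical 100 m² sail that absorbs all of it. Since p = W/c, the momentum delivered per second, i.e. the force, is just the absorbed power divided by c.

```python
C_LIGHT = 299_792_458.0   # speed of light, m/s


def momentum_from_energy(W):
    """p = W/c: momentum delivered when light carrying energy W (J) is fully absorbed."""
    return W / C_LIGHT


I_SUN = 1361.0            # W/m², approximate solar irradiance near Earth (assumed figure)
AREA = 100.0              # m², hypothetical fully absorbing sail

energy_per_second = I_SUN * AREA                  # J absorbed each second
force = momentum_from_energy(energy_per_second)   # momentum per second = force, N
pressure = force / AREA                           # Pa
print(f"force on the sail: {force:.3e} N, pressure: {pressure:.3e} Pa")
```

That’s roughly 0.45 mN for 100 m² of sail: tiny indeed.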

Now I can’t resist one more digression. We’re, obviously, fully in classical physics here and, hence, we shouldn’t mention anything quantum-mechanical here. That being said, you already know that, in quantum physics, we’ll look at light as a stream of photons, i.e. ‘light particles’ that also have energy and momentum. The formula for the energy of a photon is given by the Planck relation: E = hf. The h factor is Planck’s constant here – also quite tiny, as you know – and f is the light frequency of course. Oh – and I am switching back to the symbol E to denote energy, as it’s clear from the context I am no longer talking about the electric field here.

Now, you may or may not remember that relativity theory yields the following relations between momentum and energy:

E² – p²c² = m₀²c⁴ and/or pc = Ev/c

In these equations, m₀ stands, obviously, for the rest mass of the particle, i.e. its mass at v = 0. Now, photons have zero rest mass, but their speed is c. Hence, both equations reduce to p = E/c, so that’s the same as what Feynman found out above: p = W/c.
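Here’s a quick numerical check of these relations, as a Python sketch. The frequency (5×10¹⁴ Hz, roughly visible light) and the electron speed of 0.6c are arbitrary illustrative choices; the constants are the standard SI values. For the photon, both routes give p = E/c; the electron shows that the same two formulas also hold for a particle with rest mass, where p comes out smaller than E/c.

```python
import math

H_PLANCK = 6.62607015e-34        # Planck's constant, J·s
C_LIGHT = 299_792_458.0          # speed of light, m/s
M_ELECTRON = 9.1093837015e-31    # electron rest mass, kg


def momentum_from_invariant(E, m0):
    """Solve E² – p²c² = m₀²c⁴ for p."""
    return math.sqrt(E**2 - (m0 * C_LIGHT**2) ** 2) / C_LIGHT


# Photon: zero rest mass, E = hf, speed v = c
f = 5.0e14                                           # Hz, assumed frequency
E_photon = H_PLANCK * f
p_from_invariant = momentum_from_invariant(E_photon, 0.0)
p_from_velocity = E_photon * C_LIGHT / C_LIGHT**2    # pc = Ev/c  =>  p = Ev/c²

# Massive particle for comparison: an electron at v = 0.6c
v = 0.6 * C_LIGHT
gamma = 1 / math.sqrt(1 - (v / C_LIGHT) ** 2)
E_el = gamma * M_ELECTRON * C_LIGHT**2
p_el = gamma * M_ELECTRON * v
print(p_from_invariant, p_from_velocity, E_photon / C_LIGHT)
print(momentum_from_invariant(E_el, M_ELECTRON), p_el, E_el / C_LIGHT)
```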

Of course, you’ll say: that’s obvious. Well… No, it’s not obvious at all. We do find the same formula for the momentum of light (p) – which is great, of course – but we find the same thing coming from very different necks of the woods. The formula for the (relativistic) momentum and energy of particles comes from a very classical analysis of particles – ‘real-life’ objects with mass, a very definite position in space and whatever other properties you’d associate with billiard balls – while that other p = W/c formula comes out of a very long and tedious analysis of light as an electromagnetic wave. The two analytical frameworks couldn’t differ much more, could they? Yet, we come to the same conclusion indeed.

Physics is wonderful. 🙂

So what’s left?

Lots, of course! For starters, it would be nice to show how these formulas for E and B with e_R′ in them can be derived from Maxwell’s equations. There’s no obvious relation, is there? You’re right. Yet, they do come out of the very same equations. However, for the details, I have to refer you to Feynman’s Lectures once again – to the second Volume to be precise. Indeed, besides calculating scalar and vector potentials in various situations, a lot of what he writes there is about how to calculate these wave equations from Maxwell’s equations. But so that’s not the topic of this post really. It’s, quite simply, impossible to ‘summarize’ all those arguments and derivations in a single post. The objective here was to give you some idea of what vector analysis really is in physics, and I hope you got the gist of it, because that’s what’s needed to proceed. 🙂

The other thing I left out is much more relevant to vector calculus. It’s about that del operator (∇) again: you should note that it can be used in many more combinations. More in particular, it can be used in combinations involving second-order derivatives. Indeed, till now, we’ve limited ourselves to first-order derivatives only. I’ll spare you the details and just copy a table with some key results:

(1) ∇•(∇T) = div(grad T) = ∇•∇T = (∇•∇)T = ∇²T = ∂²T/∂x² + ∂²T/∂y² + ∂²T/∂z² = a scalar field
(2) (∇•∇)h = ∇²h = a vector field
(3) ∇(∇•h) = grad(div h) = a vector field
(4) ∇×(∇×h) = curl(curl h) = ∇(∇•h) – ∇²h
(5) ∇•(∇×h) = div(curl h) = 0 (always)
(6) ∇×(∇T) = curl(grad T) = 0 (always)
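These identities can also be checked numerically. The Python sketch below builds grad, div, curl and the Laplacian out of central differences (the step size and the test fields T and h are arbitrary, hypothetical choices) and evaluates identities (1), (4), (5) and (6) at one point. Identities (4), (5) and (6) come out at rounding-error level, because the difference operators commute just like the partial derivatives they approximate, which is really why those identities hold in the first place. Identity (1) is compared with the exact Laplacian of this T, which is –4.91·T, and agrees only to roughly the size of the step squared.

```python
import math

STEP = 1e-3  # finite-difference step (assumed)


def partial(f, k):
    """Return a function approximating ∂f/∂x_k (k = 0, 1, 2 for x, y, z) by central differences."""
    def df(x, y, z):
        up, down = [x, y, z], [x, y, z]
        up[k] += STEP
        down[k] -= STEP
        return (f(*up) - f(*down)) / (2 * STEP)
    return df


def grad(f):
    return [partial(f, k) for k in range(3)]


def div(F):
    d = [partial(F[k], k) for k in range(3)]
    return lambda x, y, z: d[0](x, y, z) + d[1](x, y, z) + d[2](x, y, z)


def curl(F):
    Fx, Fy, Fz = F
    dFz_dy, dFy_dz = partial(Fz, 1), partial(Fy, 2)
    dFx_dz, dFz_dx = partial(Fx, 2), partial(Fz, 0)
    dFy_dx, dFx_dy = partial(Fy, 0), partial(Fx, 1)
    return [
        lambda x, y, z: dFz_dy(x, y, z) - dFy_dz(x, y, z),
        lambda x, y, z: dFx_dz(x, y, z) - dFz_dx(x, y, z),
        lambda x, y, z: dFy_dx(x, y, z) - dFx_dy(x, y, z),
    ]


def laplacian(f):
    return div(grad(f))   # ∇²f = ∇•(∇f), identity (1)


def T(x, y, z):
    """Hypothetical smooth scalar field."""
    return math.sin(x) * math.cos(2 * y) * math.exp(0.3 * z)


h = [  # hypothetical smooth vector field
    lambda x, y, z: x * y * z,
    lambda x, y, z: math.sin(y) * z * z,
    lambda x, y, z: math.exp(x) * math.cos(z),
]

p = (0.4, -0.7, 1.1)
div_curl = div(curl(h))(*p)                       # (5): should be ~0
curl_grad = [c(*p) for c in curl(grad(T))]        # (6): should be ~0
lhs = [c(*p) for c in curl(curl(h))]              # (4): curl(curl h) ...
grad_div = [g(*p) for g in grad(div(h))]
rhs = [gd - laplacian(hi)(*p) for gd, hi in zip(grad_div, h)]   # ... = grad(div h) – ∇²h
lap_T = laplacian(T)(*p)                          # (1): compare with –4.91·T
print(div_curl, curl_grad, lhs, rhs, lap_T, -4.91 * T(*p))
```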

So we have yet another set of operators here: no fewer than six, to be precise. You may think that we can have some more, like (∇×∇), for example. But… No. A (∇×∇) operator doesn’t make sense. Just write it out and think about it. Perhaps you’ll see why. You can try to invent some more but, if you manage, you’ll see they won’t make sense either. The combinations that do make sense are listed above, all of them.

Now, while all of these combinations make (some) sense, it’s obvious that some of them are more useful than others. More in particular, the first operator, ∇², appears very often in physics and, hence, has a special name: it’s the Laplacian. As you can see, it’s the divergence of the gradient of a function.

Note that the Laplace operator (∇²) can be applied to both scalar and vector functions. If we operate with it on a vector, we’ll apply it to each component of the vector function. The Wikipedia article on the Laplace operator shows how and where it’s used in physics, and so I’ll refer to that if you want to know more. Below, I’ll just write out the operator itself, as well as how we apply it to a vector:

∇² = ∂²/∂x² + ∂²/∂y² + ∂²/∂z²
∇²h = (∇²h_x, ∇²h_y, ∇²h_z)
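And here’s a minimal Python sketch of the Laplacian itself, using the standard seven-point finite-difference stencil (a numerical approximation, not the exact operator). It applies ∇² to two scalar functions, x² + y² + z² (whose Laplacian is 6 everywhere) and 1/r (whose Laplacian vanishes away from the origin), and to a hypothetical vector field, component by component.

```python
import math


def laplacian(f, x, y, z, step=1e-3):
    """Seven-point finite-difference approximation of ∇²f at (x, y, z)."""
    centre = f(x, y, z)
    return (f(x + step, y, z) + f(x - step, y, z)
            + f(x, y + step, z) + f(x, y - step, z)
            + f(x, y, z + step) + f(x, y, z - step)
            - 6 * centre) / step**2


def vector_laplacian(F, x, y, z, step=1e-3):
    """∇² on a vector field: apply the scalar Laplacian to each component."""
    return tuple(laplacian(Fi, x, y, z, step) for Fi in F)


def r_squared(x, y, z):
    return x * x + y * y + z * z               # ∇² of this is 6 everywhere


def inverse_r(x, y, z):
    return 1 / math.sqrt(r_squared(x, y, z))   # ∇² of this is 0 away from the origin


F = (lambda x, y, z: x * x * y,   # ∇² = 2y
     lambda x, y, z: z**3,        # ∇² = 6z
     lambda x, y, z: x * y * z)   # ∇² = 0

p = (1.0, 2.0, -0.5)
print(laplacian(r_squared, *p), laplacian(inverse_r, *p), vector_laplacian(F, *p))
```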

So that covers (1) and (2) above. What about the other ‘operators’?

Let me start at the bottom. Equations (5) and (6) are just what they are: two results that you can use in some mathematical argument or derivation. Equation (4) is… Well… Similar: it’s an identity that may or may not help one when doing some derivation.

What about (3), i.e. the gradient of the divergence of some vector function? Nothing special. As Feynman puts it: “It is a possible vector field, but there is nothing special to say about it. It’s just some vector field which may occasionally come up.”

So… That should conclude my little introduction to vector analysis, and so I’ll call it a day now. 🙂 I hope you enjoyed it.

Applied vector analysis (I)

Pre-script (dated 26 June 2020): This post has become less relevant (even irrelevant, perhaps) because my views on all things quantum-mechanical have evolved significantly as a result of my progression towards a more complete realist (classical) interpretation of quantum physics. In addition, some of the material was removed by a dark force (that also created problems with the layout, I see now). In any case, we recommend you read our recent papers. I keep blog posts like these mainly because I want to keep track of where I came from. I might review them one day, but I currently don’t have the time or energy for it. 🙂

Original post:

The relationship between math and physics is deep. When studying physics, one sometimes feels physics and math become one and the same. But they are not. In fact, eminent physicists such as Richard Feynman warn against emphasizing the math side of physics too much: “It is not because you understand the Maxwell equations mathematically inside out, that you understand physics inside out.”

We should never lose sight of the fact that all these equations and mathematical constructs represent physical realities. So the math is nothing but the ‘language’ in which we express physical reality and, as Feynman puts it, one (also) needs to develop a ‘physical’ – as opposed to a ‘mathematical’ – understanding of the equations. Now you’ll ask: what’s a ‘physical’ understanding? Well… Let me quote Feynman once again on that: “A physical understanding is a completely unmathematical, imprecise, and inexact thing, but absolutely necessary for a physicist.”

It’s rather surprising to hear that from him: this is a rather philosophical statement, indeed, and Feynman doesn’t like philosophy (see, for example, what he writes on the philosophical implications of the Uncertainty Principle). Indeed, while most physicists – or scientists in general, I’d say – will admit there is some value in a philosophy of science (that’s the branch of philosophy concerned with the foundations and methods of science), they will usually smile derisively when hearing someone talk about metaphysics. However, if metaphysics is the branch of philosophy that deals with ‘first principles’, then it’s obvious that the Standard Model (SM) in physics is, in fact, also some kind of ‘metaphysical’ model! Indeed, when everything is said and done, physicists assume those complex-valued wave functions are, somehow, ‘real’, but all they can ‘see’ (i.e. measure or verify by experiment) are (real-valued) probabilities: we can’t ‘see’ the probability amplitudes.

The only reason why we accept the SM theory is because its predictions agree so well with experiment. Very well indeed. The agreement between theory and experiment is most perfect in the so-called electromagnetic sector of the SM, but the results for the weak force (which I referred to as the ‘weird force’ in some of my posts) are very good too. For example, using CERN data, researchers recently managed to observe an extremely rare decay mode which, once again, confirms that the Standard Model, as complicated as it is, is the best we’ve got: just click on the link if you want to hear more about it. [And please do: stuff like this is quite readable and, hence, interesting.]

As this blog makes abundantly clear, it’s not easy to ‘summarize’ the Standard Model in a couple of sentences or in one simple diagram. In fact, I’d say that’s impossible. If there are one or two diagrams sort of ‘covering’ it all, then they are the two diagrams that you’ve seen ad nauseam already: (a) the overview of the three generations of matter, with the gauge bosons for the electromagnetic, strong and weak force respectively, as well as the Higgs boson, next to it, and (b) the overview of the various interactions between them. [And, yes, these two diagrams come from Wikipedia.]

I’ve said it before: the complexity of the Standard Model (it has no fewer than 61 ‘elementary’ particles, taking into account that quarks and gluons come in various ‘colors’, and also including all antiparticles, which we have to include in our count because they are just as ‘real’ as the particles), the ‘weirdness’ of the weak force, and an astonishing range of other ‘particularities’ (these ‘quantum numbers’ or ‘charges’ are really not easy to ‘understand’) do not make for an aesthetically pleasing theory but, let me repeat it, it’s the best we’ve got. Hence, we may not ‘like’ it but, as Feynman puts it: “Whether we like or don’t like a theory is not the essential question. It is whether or not the theory gives predictions that agree with experiment.” (Feynman, QED – The Strange Theory of Light and Matter, p. 10)

It would be foolish to try to reduce the complexity of the Standard Model to a couple of sentences. That being said, when digging into the subject-matter of quantum mechanics over the past year, I actually got the feeling that, when everything is said and done, modern physics has quite a lot in common with Pythagoras’ ‘simple’ belief that mathematical concepts – and numbers in particular – might have greater ‘actuality’ than the reality they are supposed to describe. To put it crudely, the only ‘update’ to the Pythagorean model that’s needed is to replace Pythagoras’ numerological ideas by the equipartition theorem and quantum-mechanical wave functions, describing probability amplitudes that are represented by complex numbers. Indeed, complex numbers are numbers too, and Pythagoras would have reveled in their beauty. In fact, I can’t help thinking that, if he could have imagined them, he would surely have created a ‘religion’ around Euler’s formula, rather than around the tetrad. 🙂

In any case… Let’s leave the jokes and the silly comparisons aside, as that’s not what I want to write about in this post (if you want to read more about this, I’ll refer you to another blog of mine). In this post, I want to present the basics of vector calculus, an understanding of which is absolutely essential in order to gain both a mathematical and a ‘physical’ understanding of what fields really are. So that’s classical physics once again. However, as I found out, one can’t study quantum mechanics without going through the required prerequisites. So let’s go for it.

Vectors in math and physics

What’s a vector? It may surprise you, but the term ‘vector’, in physics and in math, refers to more than a dozen different concepts, and that’s a major source of confusion for people like us: autodidacts. The most common definitions are:

The term ‘vector’ often refers to a (one-dimensional) array of numbers. In that case, a vector is, quite simply, an element of Rⁿ, while the array will be referred to as an n-tuple. This definition can be generalized to also include arrays of alphanumerical values, or blob files, or any type of object really, but that’s a definition that’s more relevant for other sciences – most notably computer science. In math and physics, we usually limit ourselves to arrays of numbers. However, you should note that a ‘number’ may also be a complex number, and so we have real as well as complex vector spaces. The most straightforward example of a complex vector space is the set of complex numbers itself: C. In that case, the n-tuple is a ‘1-tuple’, aka a singleton, but the element in it (i.e. a complex number) will have ‘two dimensions’, so to speak. [Just like the term ‘vector’, the term ‘dimension’ has various definitions in math and physics too, and so it may be quite confusing.] However, we can also have 2-tuples, 3-tuples or, more generally, n-tuples of complex numbers. In that case, the vector space is denoted by Cⁿ. I’ve written about vector spaces before and so I won’t say too much about this.
A vector can also be a point vector. In that case, it represents the position of a point in physical space – in one, two or three dimensions – in relation to some arbitrarily chosen origin (i.e. the zero point). As such, we’ll usually write it as x (in one dimension) or, in three dimensions, as (x, y, z). More generally, a point vector is often denoted by the bold-face symbol R. This definition is obviously ‘related’ to the definition above, but it’s not the same: we’re talking physical space here indeed, not some ‘mathematical’ space. Physical space can be curved, as you surely know if you’re reading this blog, and I also wrote about that in the above-mentioned post, so you can re-visit that topic too if you want. Here, I should just mention one point which may or may not confuse you: while (two-dimensional) point vectors and complex numbers have a lot in common, they are not the same, and it’s important to understand both the similarities and the differences between the two. For example, multiplying two vectors and multiplying two complex numbers is definitely not the same. I’ll come back to this.
A vector can also be a displacement vector: in that case, it will specify the change in position of a point relative to its previous position. Again, such displacement vectors may be one-, two-, three- or even four-dimensional, depending on the space we’re envisaging, which may be one-dimensional (a simple line), two-dimensional (i.e. the plane), three-dimensional (i.e. three-dimensional space), or four-dimensional (i.e. space-time). A displacement vector is often denoted by s or ΔR, with the delta indicating we’re talking about a distance or a difference indeed: s = ΔR = R₂ – R₁ = (x₂ – x₁, y₂ – y₁, z₂ – z₁). That’s kids’ stuff, isn’t it?
A vector may also refer to a so-called four-vector: a four-vector obeys very specific transformation rules, referred to as the Lorentz transformation. In this regard, you’ve surely heard of space-time vectors, referred to as events, and noted as X = (ct, r), with r the spatial vector r = (x, y, z) and c the speed of light (which, in this case, is nothing but a proportionality constant ensuring that space and time are measured in compatible units). So we can also write X as X = (ct, x, y, z). However, there is another four-vector which you’ve probably also seen already (see, for example, my post on (special) Relativity Theory): P = (E/c, p), which relates energy and momentum in spacetime. Of course, spacetime can also be curved. In fact, Einstein’s (general) Relativity Theory is about the curvature of spacetime, not of ordinary space. But I should not write more about this here, as it’s about time I get back to the main story line of this post.
Finally, we also have vector operators, like the del operator ∇ (which, applied to a scalar field, yields the gradient). Now that is what I want to write about in this post. Vector operators are also considered to be ‘vectors’ – to some extent, at least: we use them in ‘vector products’, for example, as I will show below – but, because they are operators and, as such, “hungry for something to operate on”, they are obviously quite different from any of the ‘vectors’ I defined in points (1) to (4) above. [Feynman attributes this ‘hungry for something to operate on’ expression to the British mathematician Sir James Hopwood Jeans, who’s best known for the infamous Rayleigh-Jeans law, whose inconsistency with observations is known as the ultraviolet catastrophe or ‘black-body radiation problem’. But that’s a fairly useless digression so let me get on with it.]
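To make the remark about point vectors and complex numbers a bit more concrete, here’s a small Python sketch of my own (not something from Feynman). It reads the same two pairs of numbers once as 2-D vectors and once as complex numbers, and multiplies them both ways; the numbers themselves are arbitrary. The dot product gives a real number, while the complex product gives another complex number. The two are related (the dot product turns out to be the real part of the first number’s conjugate times the second), but they are clearly not the same operation.

```python
# The same two pairs of numbers, read as 2-D point vectors...
a = (3.0, 4.0)
b = (1.0, 2.0)

# ...and as the complex numbers 3 + 4i and 1 + 2i.
za = complex(*a)
zb = complex(*b)


def dot(u, v):
    """Scalar (dot) product of two 2-D vectors: the result is a real number."""
    return u[0] * v[0] + u[1] * v[1]


print(dot(a, b))                     # 11.0: a real number
print(za * zb)                       # (-5+10j): another complex number
print((za.conjugate() * zb).real)    # 11.0: the dot product hides in here
```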

In a text on physics, the term ‘vector’ may refer to any of the above but it’s often the second and third definitions (point and/or displacement vectors) that will be implicit. As mentioned above, I want to write about the fifth ‘type’ of vector: vector operators. Now, the title of this post is ‘vector calculus’, and so you’ll immediately wonder why I say these vector operators may or may not be defined as vectors. Moreover, the fact of the matter is that these operators operate on yet another type of ‘vector’ – so that’s a sixth definition I need to introduce here: field vectors.

Now, funnily enough, the term ‘field vector’, while being the most obvious description of what it is, is actually not widely used: what I call a ‘field vector’ is usually just referred to as (the value of) a vector field, and the vectors E and B are usually referred to as the electric or magnetic field, tout court. Indeed, if you google the terms ‘electromagnetic vector’ (or electric or magnetic vector), you will usually be redirected. However, when everything is said and done, E and B are vectors: they have a magnitude, and they have a direction. To be even more precise, while they depend on both space and time – so we can write E as E = E(x, y, z, t) and we have four independent variables here – they have three components: one for each direction in space, so we can write E as:

E = E(x, y, z, t) = [E_x, E_y, E_z] = [E_x(x, y, z, t), E_y(x, y, z, t), E_z(x, y, z, t)]

So, truth be told, vector calculus (aka vector analysis) in physics is about (vector) fields and (vector) operators. While the ‘scene’ for these fields and operators is, obviously, physical space (or spacetime) and, hence, a vector space, it’s good to be clear on terminology and remind oneself that, in physics, vector calculus is not about mathematical vectors: it’s about real stuff. That’s why Feynman prefers a much longer term than vector calculus or vector analysis: he calls it differential calculus of vector fields which, indeed, is what it is – but I am sure you would not have bothered to start reading this post if I had used that term too. 🙂

Now, this has probably become the longest introduction ever to a blog post, and so I had better get on with it. 🙂

Vector fields and scalar fields

Let’s dive straight into it. Vector fields like E and B behave like h, which is the symbol used in a number of textbooks for the heat flow in some body or block of material: E and h are both vector fields derived from a scalar field (for B, the story is a bit more complicated, as I’ll explain in a moment).

Huh? Scalar field? Aren’t we talking vectors? We are. If I say we can derive the vector field h (i.e. the heat flow) from a scalar field, I am basically saying that the relationship between h and the temperature T (i.e. the scalar field) is direct and very straightforward. Likewise, the relationship between E and the scalar field Φ is also direct and very straightforward.

[To be fully precise and complete, I should qualify the latter statement: it’s only true in electrostatics, i.e. when we’re considering charges that don’t move. When we have moving charges, magnetic effects come into play, and then we have a more complicated relationship between (i) two potentials, namely A (the magnetic vector potential, which is a vector field itself rather than a scalar field) and Φ (the electrostatic potential, or ‘electric scalar field’), and (ii) two vector fields, namely B and E. The relationships between the two are then a bit more complicated than the relationship between T and h. However, the math involved is of the same kind. In fact, the complication arises from the fact that magnetism is actually a relativistic effect. However, at this stage, this statement will only confuse you, and so I will write more about that in my next post.]

Let’s look at h and T. As you know, the temperature is a measure of the (average kinetic) energy of the molecules. In a block of material, the temperature T will be a scalar: some real number that we can measure in Kelvin, Fahrenheit or Celsius but – whatever unit we use – any observer using the same unit will measure the same value at any given point. That’s what distinguishes a ‘scalar’ quantity from ‘real numbers’ in general: a scalar field is something real. It represents something physical. A real number is just… Well… A real number, i.e. a mathematical concept only.

The same is true for a vector field: it is something real. As Feynman puts it: “It is not true that any three numbers form a vector [in physics]. It is true only if, when we rotate the coordinate system, the components of the vector transform among themselves in the correct way.” What’s the ‘correct way’? It’s a way that ensures that any observer using the same unit will measure the same at any given point.
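Feynman’s criterion can be checked numerically. The Python sketch below (two dimensions, to keep it short) rotates the coordinate axes by some angle and recomputes the components of two vectors. The vectors h and n and the 30° angle are just made-up values for illustration. The individual components change, but the length of each vector and their dot product do not, which is what ‘transforming among themselves in the correct way’ buys us.

```python
import math


def rotate_axes(v, angle):
    """Components of the 2-D vector v in a coordinate system whose axes are
    rotated by `angle` radians (counterclockwise) relative to the original ones."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] + s * v[1], -s * v[0] + c * v[1])


def length(v):
    return math.hypot(v[0], v[1])


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


h = (3.0, 4.0)   # hypothetical heat-flow vector
n = (0.6, 0.8)   # hypothetical unit vector
alpha = math.radians(30)

h_rot = rotate_axes(h, alpha)
n_rot = rotate_axes(n, alpha)

print(h, h_rot)                       # the components differ...
print(length(h), length(h_rot))       # ...but the lengths agree (up to rounding)
print(dot(h, n), dot(h_rot, n_rot))   # ...and so does the dot product
```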

How does it work?

In physics, we associate a point in space with physical realities, such as:

Temperature, the ‘height’ of a body in a gravitational field, or the pressure distribution in a gas or a fluid, are all examples of scalar fields: they are just (real) numbers from a math point of view but, because they do represent a physical reality, these ‘numbers’ respect certain mathematical conditions: in practice, they will be a continuous or continuously differentiable function of position.
Heat flow (h), the velocity (v) of the molecules/atoms in a rotating object, or the electric field (E), are examples of vector fields. As mentioned above, the same condition applies: any observer using the same unit should measure the same at any given point.
Tensors, which represent, for example, stress or strain at some point in space (in various directions), or the curvature of space (or spacetime, to be fully correct) in the general theory of relativity.
Finally, there are also spinors, which are often defined as a “generalization of tensors using complex numbers instead of real numbers.” They are very relevant in quantum mechanics, it is said, but I don’t know enough about them to write about them, and so I won’t.

How do we derive a vector field, like h, from a scalar field (i.e. T in this case)? The two illustrations below (taken from Feynman’s Lectures) illustrate the ‘mechanics’ behind it: heat flows, obviously, from the hotter to the colder places. At this point, we need some definitions. Let’s start with the definition of the heat flow: the (magnitude of the) heat flow (h) is the amount of thermal energy (ΔJ) that passes, per unit time and per unit area, through an infinitesimal surface element at right angles to the direction of flow.

A vector has both a magnitude and a direction, as defined above, and, hence, if we define e_f as the unit vector in the direction of flow, we can write:

h = h·e_f = (ΔJ/Δa)·e_f

ΔJ stands for the thermal energy flowing, per unit time, through the area marked Δa in the diagram above. So, if we take it as understood that the time aspect is already taken care of, we can simplify the definition above and just say that the heat flow is the flow of thermal energy per unit area. Simple trigonometry will then yield an equally simple formula for the heat flow through any surface Δa₂ (i.e. any surface that is not at right angles to the heat flow h):

ΔJ/Δa₂ = (ΔJ/Δa₁)cosθ = h·n

When I say ‘simple’, I must add that all is relative, of course. Frankly, I myself did not immediately understand why the heat flow through the Δa₁ and Δa₂ areas must be the same. That’s why I added the blue square in the illustration above (which I took from Feynman’s Lectures): it’s the same area as Δa₁, but it shows more clearly – I hope! – why the heat flow through the two areas is the same indeed, especially in light of the fact that we are looking at infinitesimally small areas here (so we’re taking a limit here).

As for the cosine factor in the formula above, you should note that, in that ΔJ/Δa₂ = (ΔJ/Δa₁)cosθ = h·n equation, we have a dot product (aka a scalar product) of two vectors: (1) h, the heat flow and (2) n, the unit vector that is normal (orthogonal) to the surface Δa₂. So let me remind you of the definition of the scalar (or dot) product of two vectors. It yields a (real) number:

A·B = |A||B|cosθ, with θ the angle between A and B

In this case, h·n = |h||n|cosθ = |h|·1·cosθ = |h|cosθ = h·cosθ. What we are saying here is that we get the component of the heat flow that’s perpendicular (or normal, as physicists and mathematicians seem to prefer to say) to the surface Δa₂ by taking the dot product of the heat flow h and the unit normal n. We’ll use this formula later, and so it’s good to take note of it here.

OK. Let’s get back to the lesson. The only thing that we need to do to prove the ΔJ/Δa₂ = (ΔJ/Δa₁)cosθ formula is to show that Δa₂ = Δa₁/cosθ or, what amounts to the same, that Δa₁ = Δa₂cosθ. Now that is something you should be able to figure out yourself: it’s quite easy to show that the angle between h and n is equal to the angle between the surfaces Δa₁ and Δa₂. The rest is just plain triangle trigonometry.

For example, when the surfaces coincide, the angle θ will be zero and then h·n is just equal to |h|cosθ = |h| = h·1 = h = ΔJ/Δa₁. The other extreme is that of orthogonal surfaces: in that case, the angle θ will be 90° and, hence, h·n = |h||n|cos(π/2) = |h|·1·0 = 0: no heat flows through a surface that is parallel to the direction of heat flow.
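Here’s a quick numerical check of the claim that tilting the surface doesn’t change the amount of heat going through it, as long as the tilted patch Δa₂ = Δa₁/cosθ intercepts the same ‘tube’ of flow. It’s a Python sketch with made-up numbers: a uniform heat flow of magnitude 2 along the z-axis and a patch Δa₁ of 10⁻⁴ (in whatever unit of area you like). I stop at 89°, because at exactly 90° the formula Δa₁/cosθ blows up: that’s just the h·n = 0 case above.

```python
import math


def heat_through(h, n, area):
    """Heat per unit time through a small flat patch with unit normal n:
    (h·n) times the area. Assumes h is uniform over the (tiny) patch."""
    return (h[0] * n[0] + h[1] * n[1] + h[2] * n[2]) * area


h = (0.0, 0.0, 2.0)   # uniform heat flow of magnitude 2 along +z (made-up)
da1 = 1.0e-4          # patch at right angles to h

for deg in (0, 30, 60, 89):
    theta = math.radians(deg)
    n = (math.sin(theta), 0.0, math.cos(theta))   # unit normal, tilted by theta from h
    da2 = da1 / math.cos(theta)                   # tilted patch intercepting the same flow
    print(deg, heat_through(h, n, da2), 2.0 * da1)   # the two numbers agree
```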

OK. That’s clear enough. The point to note is that the vectors h and n represent physical entities and, therefore, they do not depend on our reference frame (except for the units we use to measure things). That allows us to define vector equations.

The ∇ (del) operator and the gradient

Let’s continue our example of temperature and heat flow. In a block of material, the temperature (T) will vary in the x-, y- and z-directions and, hence, the partial derivatives ∂T/∂x, ∂T/∂y and ∂T/∂z make sense: they measure how the temperature varies with respect to position. Now, the remarkable thing is that the 3-tuple (∂T/∂x, ∂T/∂y, ∂T/∂z) is a physical vector itself: it is independent, indeed, of the reference frame (provided we measure stuff in the same unit). So we can translate and/or rotate the coordinate axes and, while the three numbers themselves may change under a rotation, they transform among themselves in exactly the way the components of a vector should, which is Feynman’s criterion quoted above. This means this set of three numbers is a vector indeed:

(∂T/∂x, ∂T/∂y, ∂T/∂z) = a vector

If you’d like to see a formal proof of this, I’ll refer you to Feynman once again – but I think the intuitive argument will do: if temperature and space are real, then the derivatives of temperature with respect to the x-, y- and z-directions should be equally real, shouldn’t they? Let’s go for the more intricate stuff now.

If we go from one point to a nearby point, in whatever direction, then we can define some (infinitesimally small) displacement vector ΔR = (Δx, Δy, Δz), and the difference in temperature between the two nearby points (ΔT) will tend to the (total) differential of T, which we also denote by ΔT here, as the two points get closer and closer. Hence, we write:

ΔT = (∂T/∂x)Δx + (∂T/∂y)Δy + (∂T/∂z)Δz

The two equations above combine to yield:

ΔT = (∂T/∂x, ∂T/∂y, ∂T/∂z)·(Δx, Δy, Δz) = ∇T·ΔR

In this equation, we used the ∇ (del) operator, i.e. the vector differential operator. It’s an operator like the differential operator ∂/∂x (i.e. the derivative) but, unlike the derivative, it returns not one but three values, i.e. a vector, which is usually referred to as the gradient, i.e. ∇T in this case. More generally, we can write ∇f(x, y, z), ∇ψ or ∇ followed by whatever symbol for the function we’re looking at.

In other words, the ∇ operator acts on a scalar-valued function (T), aka a scalar field, and yields a vector:

∇T = (∂T/∂x, ∂T/∂y, ∂T/∂z)

That’s why we write ∇ in bold type too, just like the vector R. Indeed, using bold type (instead of an arrow or so) is a convenient way to mark a vector, and the difference (in print) between a bold-face ∇ and an ordinary ∇ is subtle, but it’s there – and for a good reason, as you can see!

[To be precise, I should add that we do not write all of the operators that return three components in bold type. The simplest example is the common derivative ∂E/∂t = [∂E_x/∂t, ∂E_y/∂t, ∂E_z/∂t]. We have a lot of other possible combinations. Some make sense, and some don’t, like ∂h/∂y = [∂h_x/∂y, ∂h_y/∂y, ∂h_z/∂y], for example.]
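To see the ΔT = ∇T·ΔR relation at work, the Python sketch below estimates the three partial derivatives of a made-up temperature field with central differences, and then compares the actual change in T over a small displacement with the dot product ∇T·ΔR. The field T(x, y, z) = 300 + 2x – 3y + 0.5z², the starting point and the displacement are all hypothetical. The two numbers agree up to a second-order term (here 0.5·Δz²) that vanishes faster than ΔR itself, which is what ‘tends to the differential’ means.

```python
def T(x, y, z):
    """Hypothetical temperature field (in kelvin), chosen for illustration only."""
    return 300.0 + 2.0 * x - 3.0 * y + 0.5 * z * z


def grad(f, x, y, z, eps=1e-5):
    """Central-difference estimate of (∂f/∂x, ∂f/∂y, ∂f/∂z) at the point (x, y, z)."""
    return (
        (f(x + eps, y, z) - f(x - eps, y, z)) / (2 * eps),
        (f(x, y + eps, z) - f(x, y - eps, z)) / (2 * eps),
        (f(x, y, z + eps) - f(x, y, z - eps)) / (2 * eps),
    )


p = (1.0, 2.0, 3.0)          # the point we start from
dR = (1e-3, -2e-3, 5e-4)     # a small displacement ΔR = (Δx, Δy, Δz)

g = grad(T, *p)              # close to the exact gradient (2, -3, 3)
dT_actual = T(p[0] + dR[0], p[1] + dR[1], p[2] + dR[2]) - T(*p)
dT_linear = sum(gi * di for gi, di in zip(g, dR))   # ∇T·ΔR

print(g)
print(dT_actual, dT_linear)  # differ by about 0.5·Δz² ≈ 1.25e-7
```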

If ∇T is a vector, what’s its direction? Think about it. […] The rates of change of T in the x-, y- and z-directions are the x-, y- and z-components of our ∇T vector respectively. In fact, the rate of change of T in any direction will be the component of the ∇T vector in that direction. Now, the magnitude of a vector component will always be smaller than the magnitude of the vector itself, except if it’s the component in the same direction as the vector, in which case the component equals the magnitude of the vector. [If you have difficulty understanding this, read what I write once again, but very slowly and attentively.] Therefore, the direction of ∇T will be the direction in which the (instantaneous) rate of change of T is largest. In Feynman’s words: “The gradient of T has the direction of the steepest uphill slope in T.” Now, it should be quite obvious what direction that really is: it is the opposite direction of the heat flow h.
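The ‘steepest uphill slope’ claim is also easy to test numerically. The Python sketch below uses a made-up, two-dimensional linear temperature field (to keep things simple), measures the rate of change of T per unit distance in 360 directions one degree apart, and picks the direction where it is largest. That direction matches the direction of ∇T to within the one-degree resolution of the scan, and the largest rate is (very nearly) |∇T|. The heat flow h would point the opposite way.

```python
import math


def T(x, y):
    """Hypothetical 2-D temperature field with constant gradient (2, -3)."""
    return 300.0 + 2.0 * x - 3.0 * y


def rate_of_change(f, x, y, angle, step=1e-4):
    """Change of f per unit distance when moving a small step from (x, y)
    in the direction `angle` (radians, measured from the x-axis)."""
    dx, dy = step * math.cos(angle), step * math.sin(angle)
    return (f(x + dx, y + dy) - f(x, y)) / step


angles = [math.radians(k) for k in range(360)]
steepest = max(angles, key=lambda a: rate_of_change(T, 0.0, 0.0, a))

grad_T = (2.0, -3.0)                                   # exact gradient of this field
grad_angle = math.atan2(grad_T[1], grad_T[0]) % (2 * math.pi)

print(math.degrees(steepest), math.degrees(grad_angle))             # both close to 304 degrees
print(rate_of_change(T, 0.0, 0.0, steepest), math.hypot(*grad_T))   # both close to 3.606
```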

That’s all you need to know to understand our first real vector equation:

h = –κ∇T

Indeed, you don’t need too much math to understand this equation in the way we want to understand it, and that’s in some kind of ‘physical’ way (as opposed to just the math side of it). Let me spell it out:

The direction of heat flow is opposite to the direction of the gradient vector ∇T. Hence, heat flows from higher to lower temperature (i.e. ‘downhill’), as we would expect, of course! So that’s the minus sign.
The magnitude of h is proportional to the magnitude of the gradient ∇T, with the constant of proportionality equal to κ (kappa), which is called the thermal conductivity. Now, in case you wonder what this means (again: do go beyond the math, please!), remember that the heat flow is the flow of thermal energy per unit area (and per unit time, of course): |h| = h = ΔJ/Δa.
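In code, the vector equation h = –κ∇T (usually called Fourier’s law of heat conduction) is a one-liner. The Python sketch below uses made-up numbers: a temperature that rises by 50 K per metre in the +z direction, and a conductivity of 400 W/(m·K), which is roughly the value for copper. The resulting heat flow points in the –z direction, i.e. ‘downhill’, with a magnitude of 20,000 W/m².

```python
def heat_flow(grad_T, kappa):
    """h = -kappa * grad(T), applied component by component."""
    return tuple(-kappa * g for g in grad_T)


kappa = 400.0                 # W/(m·K): illustrative, roughly copper
grad_T = (0.0, 0.0, 50.0)     # K/m: T increases along +z (hypothetical)

h = heat_flow(grad_T, kappa)
print(h)                      # z-component is -20000.0 W/m²: heat flows towards -z
```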

But… Yes? Why would it be proportional? Why don’t we have some exponential relation or something? Good question, but the answer is simple, and it’s rooted in physical reality – of course! The heat flow between two places – let’s call them 1 and 2 – is proportional to the temperature difference between those two places, so we have: ΔJ ∼ T₂ – T₁. In fact, that’s where the factor of proportionality comes in. If we imagine a very small slab of material (infinitesimally small, in fact) with two faces, parallel to the isothermals, with a surface area ΔA and a tiny distance Δs between them, we can write:

ΔJ = κ(T₂ – T₁)ΔA/Δs = κ·ΔT·ΔA/Δs ⇔ ΔJ/ΔA = κΔT/Δs

Now, we defined ΔJ/ΔA as the magnitude of h. As for its direction, it’s obviously perpendicular (not parallel) to the isothermals. Now, as Δs tends to zero, ΔT/Δs is nothing but the rate of change of T with position. We know it’s the maximum rate of change, because the position change is also perpendicular to the isotherms (if the faces are parallel, that tiny distance Δs is perpendicular). Hence, ΔT/Δs must be the magnitude of the gradient vector (∇T). As its direction is opposite to that of h, we can simply pop in a minus sign and switch from magnitudes to vectors to write what we wrote already: h = –κ∇T.
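A small worked example of the slab argument, with numbers I made up for illustration (a 2 mm slab with faces of 1 cm² at 310 K and 300 K, and κ = 400 W/(m·K) again), shows that κΔT/Δs and κ|∇T| are just two ways of writing the same thing: across such a thin slab, the magnitude of the gradient is simply ΔT/Δs. I put the hotter face first so that ΔJ comes out positive; the sign only tells us which way the heat goes.

```python
import math

kappa = 400.0                 # W/(m·K), illustrative
T_hot, T_cold = 310.0, 300.0  # K, temperatures of the two faces (hypothetical)
ds = 0.002                    # m, distance Δs between the faces
dA = 1.0e-4                   # m², face area ΔA

dJ = kappa * (T_hot - T_cold) * dA / ds   # heat per unit time through the slab, in W
flux = dJ / dA                            # ΔJ/ΔA, in W/m²
grad_magnitude = (T_hot - T_cold) / ds    # |∇T| across the slab, in K/m

print(dJ, flux)                                    # about 200 W and 2,000,000 W/m²
print(math.isclose(flux, kappa * grad_magnitude))  # True
```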

But let’s get back to the lesson. I think you ‘get’ all of the above. In fact, I should probably not have introduced that extra equation above (the ΔJ expression) and all the extra stuff (i.e. the ‘infinitesimally small slab’ explanation), as it probably only confuses you. So… What’s the point really? Well… Let’s look, once again, at that equation h = –κ∇T and let us generalize the result:

We have a scalar field here, the temperature T – but it could be any scalar field really!
When we have the ‘formula’ for the scalar field – it’s obviously some function T(x, y, z) – we can derive the heat flow h from it, i.e. a vector quantity, which has a property which we can vaguely refer to as ‘flow’.
We do so using this brand-new operator ∇. That’s a so-called vector differential operator aka the del operator. We just apply it to the scalar field and we’re done! The only thing left is to add some proportionality factor, but so that’s just because of our units. [In case you wonder about the symbol itself, ∇ is the so-called nabla symbol: the name comes from the (Hellenistic) Greek word for a Phoenician harp, which has a similar shape indeed.]

This truly is a most remarkable result, and we’ll encounter the same equation elsewhere. For example, if the electric potential is Φ, then we can immediately calculate the electric field using the following formula:

E = –∇Φ

Indeed, the situation is entirely analogous from a mathematical point of view. For example, we have the same minus sign, so E also ‘flows’ from higher to lower potential. Where’s the factor of proportionality? Well… We don’t have one, as we assume that the units in which we measure E and Φ are ‘fully compatible’ (so don’t worry about them now). Of course, as mentioned above, this formula for E is only valid in electrostatics, i.e. when there are no moving charges. When moving charges are involved, we also have the magnetic force coming into play, and then equations become a bit more complicated. However, this extra complication does not fundamentally alter the logic involved, and I’ll come back to this so you see how it all nicely fits together.
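The same numerical recipe works for E = –∇Φ. In the Python sketch below I take, purely as an example, the potential of a single point charge at the origin, Φ = kq/r, in units where kq = 1. Minus the numerically estimated gradient reproduces the familiar Coulomb field, i.e. kq/r² times the unit vector pointing away from the charge: from higher to lower potential, just like heat flows from higher to lower temperature.

```python
import math

KQ = 1.0   # k·q in arbitrary units (assumption: one point charge at the origin)


def phi(x, y, z):
    """Electrostatic potential of a point charge at the origin: Φ = kq/r."""
    return KQ / math.sqrt(x * x + y * y + z * z)


def minus_grad(f, x, y, z, eps=1e-6):
    """-∇f at (x, y, z), estimated with central differences."""
    gx = (f(x + eps, y, z) - f(x - eps, y, z)) / (2 * eps)
    gy = (f(x, y + eps, z) - f(x, y - eps, z)) / (2 * eps)
    gz = (f(x, y, z + eps) - f(x, y, z - eps)) / (2 * eps)
    return (-gx, -gy, -gz)


p = (1.0, 2.0, 2.0)                           # a point at distance r = 3
r = math.sqrt(sum(c * c for c in p))

E_numeric = minus_grad(phi, *p)
E_coulomb = tuple(KQ * c / r**3 for c in p)   # (kq/r²) times the unit vector p/r

print(E_numeric)
print(E_coulomb)   # the two agree to many decimal places
```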

Note: In case you feel I’ve skipped some of the ‘explanation’ of that vector equation h = –κ∇T… Well… You may be right. I feel that it’s enough to simply point out that ∇T is a vector with opposite direction to h, so that explains the minus sign in front of the ∇T factor. The only thing left to ‘explain’ then is the magnitude of h, but so that’s why we pop in that kappa factor (κ), and so we’re done, I think, in terms of ‘understanding’ this equation. But so that’s what I think indeed. Feynman offers a much more elaborate ‘explanation’, and so you can check that out if you think my approach to it is a bit too much of a shortcut.

Interim summary

So far, we have only shown two things really:

[I] The first thing to always remember is that h·n product: it gives us the component of ‘flow’ (per unit time and per unit area) of h perpendicular through any surface element Δa. Of course, this result is valid for any other vector field, or any vector for that matter: the scalar product of a vector and a unit vector will always yield the component of that vector in the direction of that unit vector. [But note the second vector needs to be a unit vector: it is not generally true that the dot product of one vector with another yields the component of the first vector in the direction of the second: there’s a scale factor that comes into play.]

Now, you should note that the term ‘component’ (of a vector) usually refers to a number (not to a vector) – and certainly in this case, because we calculate it using a scalar product! I am just highlighting this because it did confuse me for quite a while. Why? Well… The concept of a ‘component’ of a vector is, obviously, intimately associated with the idea of ‘direction’: we always talk about the component in some direction, e.g. in the x-, y- or z-direction, or in the direction of any combination of x, y and z. Hence, I think it’s only natural to think of a ‘component’ as a vector in its own right. However, as I note here, we shouldn’t do that: a ‘component’ is just a (signed) number. If we want to include the idea of direction, it’s simple: we can just multiply the component by the unit normal vector n, and then we have a vector quantity again, instead of just a scalar. So then we just write (h·n)·n = (h·n)n. Simple, isn’t it? 🙂

[As I am smiling here, I should quickly say something about this dot (·) symbol: we use the same symbol here for (i) a product between scalars (i.e. real or complex numbers), like 3·4; (ii) a product between a scalar and a vector, like 3·v – but then I often omit the dot to simply write 3v; and, finally, (iii) a scalar product of two vectors, like h·n indeed. We should, perhaps, introduce some new symbol for multiplying numbers, like ∗ for example, but then I hope you’re smart enough to see from the context what’s going on really.]
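A few lines of Python make the distinction between a component (a number), the corresponding vector (h·n)n, and a dot product with a non-unit vector explicit. The vectors are arbitrary examples of mine: h = (3, 4, 0), the unit vector n along x, and m = 2n, which points the same way but has length 2.

```python
import math


def dot(u, v):
    """Scalar (dot) product of two 3-D vectors."""
    return sum(a * b for a, b in zip(u, v))


def scale(c, v):
    """Multiply the vector v by the number c."""
    return tuple(c * a for a in v)


h = (3.0, 4.0, 0.0)   # some vector, e.g. the heat flow at a point (made-up)
n = (1.0, 0.0, 0.0)   # unit vector along x
m = (2.0, 0.0, 0.0)   # same direction as n, but NOT a unit vector

component = dot(h, n)               # 3.0: just a number
as_vector = scale(dot(h, n), n)     # (3.0, 0.0, 0.0): the vector (h·n)n
print(component, as_vector)

print(dot(h, m))                         # 6.0: scaled up by |m| = 2
print(dot(h, m) / math.sqrt(dot(m, m)))  # 3.0: dividing by |m| gives the component again
```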

Back to the lesson. Let me jot down the formula once again: h·n = |h||n|cosθ = h·cosθ. Hence, the number we get here is h (i.e. the amount of heat flow in the direction of flow) multiplied by cosθ, with θ the angle between (i) the surface we’re looking at (which, as mentioned above, is any surface really) and (ii) the surface that’s perpendicular to the direction of flow.

Hmm… […] The direction of flow? Let’s take a moment to think about what we’re saying here. Is there any particular or unique direction really? Heat flows in all directions from warmer to colder areas, and not just in one direction, doesn’t it? You’re right. Once again, the terminology may confuse you – which is yet another reason why math is so much better as a language to express physical ideas 🙂 – and so we should be precise: the direction of h is the direction in which the amount of heat flow (i.e. h·cosθ) is largest (hence, the angle θ is zero). As we pointed out above, that’s the direction in which the temperature T changes the fastest. In fact, as Feynman notes: “We can, if we wish, consider that this statement defines h.”

That brings me to the second thing you should – always and immediately – remember from all that I’ve written above.

[II] If we write the infinitesimal (i.e. the differential) change in temperature (in whatever direction) as ΔT, then we know that

ΔT = (∂T/∂x, ∂T/∂y, ∂T/∂z)·(Δx, Δy, Δz) = ∇T·ΔR

Now, what does this say really? ΔR is an (infinitesimal) displacement vector: ΔR = (Δx, Δy, Δz). Hence, it has some direction. To be clear: that can be any direction in space really. So that’s simple. What about the second factor in this dot product, i.e. that gradient vector ∇T?

The direction of the gradient (i.e. ∇T) is not just ‘any direction’: it’s the direction in which the rate of change of T is largest, and we know what direction that is: it’s the opposite direction of the heat flow h, as evidenced by the minus sign in our vector equations h = –κ∇T or E = –∇Φ. So, once again, we have a (scalar) product of two vectors here, ∇T·ΔR, which yields… Hmm… Good question. That ∇T·ΔR expression is very similar to that h·n expression above, but it’s not quite the same. It’s also a vector dot product – or a scalar product, in other words, but, unlike that n vector, the ΔR vector is not a unit vector: it’s an infinitesimally small displacement vector. So we do not get some ‘component’ of ∇T. More in general, you should note that the dot product of two vectors A and B does not, in general, yield the component of A in the direction of B, unless B is a unit vector – which, again, is not the case here. So if we don’t have that here, what do we have?

Let’s look at the (physical) geometry of the situation once again. Heat obviously flows in one direction only: from warmer to colder places – not in the opposite direction. Therefore, the θ in the h·n = h·cosθ expression varies from –90° to +90° only. Hence, the cosine factor (cosθ) is never negative. Never. Indeed, we do not have any right- or left-hand rule here to distinguish between the ‘front’ side and the ‘back’ side of our surface area. So when we’re looking at that h·n product, we should remember that that normal unit vector n is a unit vector that’s normal to the surface but which is oriented, generally, towards the direction of flow. Therefore, that h·n product will never be negative (it is zero only when the surface is parallel to the flow), because θ varies from –90° to +90° only indeed.

When looking at that ΔT = ∇T·ΔR product, the situation is quite different: while ∇T has a very specific direction (I really mean unique) – which, as mentioned above, is opposite to that of h – that ΔR vector can point in any direction – and then I mean literally any direction, including directions ‘uphill’. Likewise, it’s obvious that the temperature difference ΔT can be positive or negative (or zero, when we’re moving along an isotherm). In fact, it’s rather obvious that, if we go in the direction of flow, we go from higher to lower temperatures and, hence, ΔT will, effectively, be negative: ΔT = T₂ – T₁ < 0, as shown below.

Now, because |∇T| and |ΔR| are absolute values (or magnitudes) of vectors, they are always positive (always!). Therefore, if ΔT has a minus sign, it will have to come from the cosine factor in the ΔT = ∇T·ΔR = |∇T|·|ΔR|·cosθ expression. [Again, if you wonder where this expression comes from: it’s just the definition of a vector dot product.] Therefore, ΔT and cosθ will have the same sign, always, and θ can have any value between –180° and +180°. In other words, we’re effectively looking at the full circle here. To make a long story short, we can write the following:

ΔT = |∇T|·|ΔR|·cosθ = |∇T|·ΔR·cosθ ⇔ ΔT/ΔR = |∇T|cosθ

As you can see, θ is the angle between ∇T and ΔR here and, as mentioned above, it can take on any value – well… Any value between –180° and +180°, that is. ΔT/ΔR is, obviously, the rate of change of T in the direction of ΔR and, from the expression above, we can see it is equal to the component of ∇T in the direction of ΔR:

ΔT/ΔR = |∇T|cosθ

So we have a negative component here? Yes. The rate of change is negative and, therefore, we have a negative component. Indeed, any vector has components in all directions, including directions that point away from it. However, in the directions that point away from it, the component will be negative. More generally, we have the following interesting result: the rate of change of a scalar field ψ in the direction of a (small) displacement ΔR is the component of the gradient ∇ψ along that displacement. We write that result as:

Δψ/ΔR = |∇ψ|cosθ

[Note the (not so) subtle difference between ΔR in bold-face (i.e. the displacement vector) and ΔR in normal type (its magnitude |ΔR|, i.e. some real number). It’s quite important.]

We’ve said a lot of (not so) interesting things here, but we still haven’t answered the original question: what’s ∇T·ΔR? Well… We can’t say more than what we said already: it’s equal to ΔT, which is a differential: ΔT = (∂T/∂x)Δx + (∂T/∂y)Δy + (∂T/∂z)Δz. A differential and a derivative (i.e. a rate of change) are not the same, but they are obviously closely related, as evidenced by the equations above: the rate of change is the change per unit distance. [Likewise, note that |∇ψ|cosθ is just a product of two real numbers, while ∇T·ΔR is a vector dot product, i.e. a (scalar) product of two vectors!]
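As a quick numerical sanity check of ΔT = ∇T·ΔR = |∇T|·|ΔR|·cosθ, here is a minimal Python sketch. The temperature field T = x² + 2y + 3z, the starting point and the displacement are made-up assumptions chosen only to keep the numbers simple. The displacement happens to point in a direction in which T decreases, so the component of ∇T along it comes out negative, as discussed above.

```python
import math

# Hypothetical temperature field, for illustration only: T(x, y, z) = x² + 2y + 3z
def T(x, y, z):
    return x**2 + 2 * y + 3 * z

def grad_T(x, y, z):
    # Analytic gradient (∂T/∂x, ∂T/∂y, ∂T/∂z) of the field above
    return (2 * x, 2.0, 3.0)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

p = (1.0, 0.0, 0.0)           # starting point (assumed)
dR = (1e-6, -2e-6, 0.5e-6)    # small displacement vector ΔR (arbitrary direction)

# Actual change ΔT versus the linear (differential) estimate ∇T·ΔR
end = tuple(pi + di for pi, di in zip(p, dR))
dT_exact = T(*end) - T(*p)
dT_linear = dot(grad_T(*p), dR)

# Rate of change along ΔR (ΔT divided by the magnitude of ΔR) versus |∇T|·cosθ
g = grad_T(*p)
cos_theta = dot(g, dR) / (norm(g) * norm(dR))
rate = dT_exact / norm(dR)
component = norm(g) * cos_theta

print(dT_exact, dT_linear)  # nearly equal: they differ only by a second-order term
print(rate, component)      # nearly equal, and negative: T decreases along this ΔR
```

Shrinking dR further makes the two numbers in each pair agree more closely, which is what the ‘infinitesimally small displacement’ in the text is about.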

In any case, this is enough of a recapitulation. In fact, this ‘interim summary’ has become longer than the preceding text! We’re now ready to discuss what I’ll call the First Theorem of vector calculus in physics. Of course, never mind the term: what’s first or second or third doesn’t matter really: you’ll need all of the theorems below to understand vector calculus.

The First Theorem

Let’s assume we have some scalar field ψ in space: ψ might be the temperature, but it could be any scalar field really. Now, if we go from one point (1) to another (2) in space, as shown below, we’ll follow some arbitrary path, which is denoted by the curve Γ in the illustrations below. Each point along the curve can then be associated with a gradient ∇ψ (think of the h = –κ∇T and E = –∇Φ expressions above if you want examples). Its tangential component (∇ψ)_t satisfies (∇ψ)_t·Δs = ∇ψ·Δs, where the Δs on the left-hand side is a length and the Δs on the right-hand side is a small displacement vector along the curve. [Please note, once again, the subtle difference between Δs (with the s in bold-face) and Δs: the bold-face Δs is a vector, and the plain Δs is its magnitude.]

As shown in the illustrations above, we can mark off the curve at a number of points (a, b, c, etcetera) and join these points by straight-line segments Δs_i. Now let’s consider the first line segment, i.e. Δs₁. The change in ψ from point 1 to point a is equal to Δψ₁ = ψ(a) – ψ(1). Now, for a small displacement, we have that general equation Δψ = (∂ψ/∂x, ∂ψ/∂y, ∂ψ/∂z)·(Δx, Δy, Δz) = ∇ψ·Δs. [If you find it difficult to interpret what I am writing here, just read T instead of ψ and ΔR instead of Δs: it’s the ΔT = ∇T·ΔR equation from above.] So we can write:

Δψ₁ = ψ(a) – ψ(1) = (∇ψ)₁·Δs₁

Likewise, we can write:

ψ(b) – ψ(a) = (∇ψ)₂·Δs₂

In these expressions, (∇ψ)₁ and (∇ψ)₂ mean the gradient evaluated along segments Δs₁ and Δs₂ respectively, not at points 1 and 2 – obviously. Now, if we add the two equations above, we get:

ψ(b) – ψ(1) = (∇ψ)₁·Δs₁ + (∇ψ)₂·Δs₂

To make a long story short, we can keep adding such terms to get:

ψ(2) – ψ(1) = ∑(∇ψ)_i·Δs_i

We can add more and more segments and, hence, take a limit: as Δs_i tends to zero, ∑ becomes a sum of an infinite number of terms, which we denote using the integral sign ∫, and Δs_i becomes ds, which is, quite simply, the infinitesimally small displacement vector. In other words, we get the following line integral along that curve Γ:

ψ(2) – ψ(1) = ∫_Γ ∇ψ·ds (from point 1 to point 2 along Γ)

This is a gem, and our First Theorem indeed. It’s a remarkable result, especially taking into account the fact that the path doesn’t matter: we could have chosen any curve Γ indeed, and the result would be the same. So we have:

ψ(2) – ψ(1) = ∫ ∇ψ·ds, taken from (1) to (2) along any curve

You’ll say: so what? What do we do with this? Well… Nothing much for the moment, but we’ll need this result later. So I’d say: just hang in there, and note this is the first significant use of our del operator in a mathematical expression that you’ll encounter very often in physics. So just let it sink in, and allow me to proceed with the rest of the story.
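The First Theorem can also be checked numerically: sum (∇ψ)_i·Δs_i over many small chords along two different paths with the same end points and compare both sums with ψ(2) – ψ(1). In the minimal sketch below, the field ψ = xy + z², the end points (0, 0, 0) and (1, 2, 3), and the ‘wiggly’ detour are all made-up assumptions. Each path is written as a parametrized curve r(t) with t running from 0 to 1, and the gradient is evaluated at the midpoint of each segment.

```python
import math

def psi(x, y, z):
    # Hypothetical scalar field, for illustration only
    return x * y + z**2

def grad_psi(x, y, z):
    return (y, x, 2 * z)

def line_integral_of_gradient(path, n=10_000):
    """Approximate ∫ ∇ψ·ds along path(t), t in [0, 1], as a sum of (∇ψ)_i·Δs_i over n chords."""
    total = 0.0
    prev = path(0.0)
    for i in range(1, n + 1):
        cur = path(i / n)
        mid = tuple((a + b) / 2 for a, b in zip(prev, cur))  # evaluate ∇ψ at the segment midpoint
        ds = tuple(b - a for a, b in zip(prev, cur))          # segment vector Δs_i
        total += sum(g * d for g, d in zip(grad_psi(*mid), ds))
        prev = cur
    return total

start, end = (0.0, 0.0, 0.0), (1.0, 2.0, 3.0)

def straight(t):
    return tuple(a + t * (b - a) for a, b in zip(start, end))

def wiggly(t):
    # Same end points, different route: the sine terms vanish at t = 0 and t = 1
    return (t + math.sin(math.pi * t), 2 * t**2, 3 * t + 0.5 * math.sin(2 * math.pi * t))

exact = psi(*end) - psi(*start)
print(exact, line_integral_of_gradient(straight), line_integral_of_gradient(wiggly))
```

All three numbers should agree (11, up to rounding). With this quadratic ψ the midpoint sums happen to telescope exactly; for other smooth fields the agreement improves as n grows.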

Before doing so, however, I should note that even Feynman sins when trying to explain this theorem in a more ‘intuitive’ way. Indeed, in his Lecture on the topic, he writes the following: “Since the gradient represents the rate of change of a field quantity, if we integrate that rate of change, we should get the total change.” Now, from that Δψ/ΔR = |∇ψ|cosθ formula, it’s obvious that the gradient is the rate of change in a specific direction only. To be precise, in this particular case – with the field quantity ψ equal to the temperature T – it’s the direction in which T changes the fastest.

You should also note that the integral above is not the type of integral you know from high school. Indeed, it’s not of the rather straightforward ∫f(x)dx type, with f(x) the integrand and dx the variable of integration. That type of integral, we knew how to solve. A line integral is quite different. Look at it carefully: we have a vector dot product after the ∫ sign. So, unlike what Feynman suggests, it’s not just a matter of “integrating the rate of change.”

Now, I’ll refer you to Wikipedia for a good discussion of what a line integral really is, but I can’t resist the temptation to copy the animation in that article, because it’s very well made indeed. While it shows that we can think of a line integral as the two- or three-dimensional equivalent of the standard type of integral we learned to solve in high school (you’ll remember the solution was also the area under the graph of the function that had to be integrated), the way to go about it is quite different. Solving them will, in general, involve some so-called parametrization of the curve (C in the Wikipedia article, Γ in our notation).

However, this post is becoming way too long and, hence, I really need to move on now.

Operations with ∇: divergence and curl

You may think we’ve covered a lot of ground already, and we did. At the same time, everything I wrote above is actually just the start of it. I emphasized the physics of the situation so far. Let me now turn to the math involved. Let’s start by dissociating the del operator from the scalar field, so we just write:

∇ = (∂/∂x, ∂/∂y, ∂/∂z)

This doesn’t mean anything, you’ll say, because the operator has nothing to operate on. And, yes, you’re right. However, in math, it doesn’t matter: we can combine this ‘meaningless’ operator (which looks like a vector, because it has three components) with something else. For example, we can do a vector dot product:

∇·(a vector)

As mentioned above, we can ‘do’ this product because ∇ has three components, so it’s a ‘vector’ too (although I find such name-giving quite confusing), and so we just need to make sure that the vector we’re operating on has three components too. To continue with our heat flow example, we can write, for example:

∇·h = (∂/∂x, ∂/∂y, ∂/∂z)·(h_x, h_y, h_z) = ∂h_x/∂x + ∂h_y/∂y + ∂h_z/∂z
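The ∇·h formula can be evaluated numerically by replacing each partial derivative with a central difference. The minimal sketch below does that for a hypothetical velocity field v = k·(x, y, z) of air expanding uniformly outward from the origin, and for its reverse (contracting air); k = 0.1 is an arbitrary assumed value. For the expanding field ∂v_x/∂x = ∂v_y/∂y = ∂v_z/∂z = k, so the divergence should come out close to 3k everywhere.

```python
def divergence(F, x, y, z, h=1e-5):
    """Central-difference estimate of ∇·F = ∂F_x/∂x + ∂F_y/∂y + ∂F_z/∂z at (x, y, z)."""
    dFx_dx = (F(x + h, y, z)[0] - F(x - h, y, z)[0]) / (2 * h)
    dFy_dy = (F(x, y + h, z)[1] - F(x, y - h, z)[1]) / (2 * h)
    dFz_dz = (F(x, y, z + h)[2] - F(x, y, z - h)[2]) / (2 * h)
    return dFx_dx + dFy_dy + dFz_dz

k = 0.1  # assumed expansion rate

def expanding_air(x, y, z):
    # Hypothetical velocity field: air moving outward from the origin, v = k·(x, y, z)
    return (k * x, k * y, k * z)

def contracting_air(x, y, z):
    # The same field reversed: air moving inward
    return (-k * x, -k * y, -k * z)

print(divergence(expanding_air, 1.0, 2.0, 3.0))    # ≈ 3k = 0.3 > 0: a source
print(divergence(contracting_air, 1.0, 2.0, 3.0))  # ≈ -3k = -0.3 < 0: a sink
```

The sign is what matters for the physical interpretation of the divergence: positive where the field ‘spreads out’, negative where it ‘converges’.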

This del operator followed by a dot, and acting on a vector – i.e. ∇·(vector) – is, in fact, a new operator. Note that we use two existing symbols, the del (∇) and the dot (·), but it’s one operator really. [Inventing a new symbol for it would not be wise, because we’d forget where it comes from and, hence, probably scratch our head when we saw it.] It’s referred to as a vector operator, just like the del operator, but don’t worry too much about that term, because it might confuse you. Indeed, our del operator acted on a scalar to yield a vector, and now it’s the other way around: we have an operator acting on a vector to return a scalar. In a few minutes, we’ll define yet another operator acting on a vector to return a vector. Now, all of these operators are so-called vector operators, not because there’s some vector involved, but because they all involve the del operator. It’s that simple. So there’s no such thing as a scalar operator. 🙂 But let me get back to the main line of the story. This ∇· operator is quite important in physics, and so it has a name (and an abbreviated notation) of its own:

∇·h = div h = the divergence of h

The physical significance of the divergence is related to the so-called flux of a vector field: it measures the magnitude of a field’s source or sink at a given point. Continuing our example with temperature, consider air as it is heated or cooled. The relevant vector field is now the velocity of the moving air at a point. If air is heated in a particular region, it will expand in all directions such that the velocity field points outward from that region. Therefore the divergence of the velocity field in that region would have a positive value, as the region is a source. If the air cools and contracts, the divergence has a negative value, as the region is a sink.

A less intuitive but more accurate definition is the following: the divergence represents the volume density of the outward flux of a vector field from an infinitesimal volume around a given point.

Phew! That sounds more serious, doesn’t it? We’ll come back to this definition when we’re ready to define the concept of flux somewhat more accurately. For now, just note that two of Maxwell’s famous equations involve the divergence operator:

∇·E = ρ/ε₀ and ∇·B = 0

In my previous post, I gave a verbal description of those two equations:

The flux of E through a closed surface = (the net charge inside)/ε₀
The flux of B through a closed surface = zero

The first equation basically says that electric charges cause an electric field. The second equation basically says there is no such thing as a magnetic charge: the magnetic force only appears when charges are moving and/or when electric fields are changing. Note that we’re talking closed surface here, so they define a volume indeed. We can also look at the flux through a non-closed surface (and we’ll do that shortly) but, in the context of Maxwell’s equations, we’re talking volumes and, hence, closed surfaces.

Let me quickly throw in some remarks on the units in which we measure stuff. Electric field strength (so the unit we use to measure the magnitude of E) is measured in newton per coulomb, so force divided by charge. That makes sense, because E is defined as the force on the unit charge: E = F/q, and so the unit is N/C. Please do think about why we have q in the denominator: if we had the same force on an electric charge that is twice as big, then we’d have a field strength that’s only half, so we have an inverse proportionality here. Conversely, if we had twice the force on the same electric charge, the field strength would also double.

Now, flux and field strength are obviously related, but not the same. The flux is obviously proportional to the field strength (expressed in N/C), and it is often thought of as some number expressed per unit area. Hence, you might think that the unit of flux is field strength per square meter, i.e. N/C/m². It’s not. It’s a stupid mistake, but one that is commonly made. Flux is expressed in N/C times m², so that’s the product (N/C)·m² = (N·m/C)·m = (J/C)·m = V·m. Why is that? Think about the common graphical representation of a field: we just draw lines, all tangent to the direction of the field vector at every point, and the density of the lines (i.e. the number of lines per unit area) represents the magnitude of our electric field vector. Now, the flux through some area is the number of lines we count in that area. Hence, if you double the area, you should get twice the flux. Halve the area, and you should get half the flux. So we have a direct proportionality here. In fact, assuming the electric field is uniform, we can write the (electric) flux as the product of the field strength E and the (vector) area S, so we write Φ_E = E·S = |E|·|S|·cosθ, with θ the angle between the field and the normal to the surface.

Huh? Yes. The origin of the mistake is that we, somehow, think the ‘per unit area’ qualification comes with the flux. It doesn’t: it’s in the idea of field strength itself. Indeed, an alternative to the presentation above is just to draw arrows representing the same field strength, as illustrated below. However, instead of drawing more arrows (of some standard length) to represent increasing field strength, we’d just draw longer arrows—not more of them. So then the idea of the number of lines per unit area is no longer valid.

[…] OK. I realize I am probably just confusing you here. Just one more thing, perhaps. We also have magnetic flux, denoted as Φ_B, and it’s defined in the same way: Φ_B = B·S = B·S·cosθ. However, because the unit of magnetic field strength is different, the unit of magnetic flux is different too. It’s the weber, and I’ll let you look up its definition yourself. 🙂 Note that it’s a bit of a different beast, because the magnetic force is a bit of a different beast. 🙂
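To put some numbers on the electric flux formula Φ_E = E·S·cosθ: the sketch below takes a uniform field of 100 N/C crossing a flat 2 m² surface at 60° to the surface normal. The function name and all values are assumptions for illustration only; the point is the unit, (N/C)·m² = V·m, and the direct proportionality between flux and area.

```python
import math

def electric_flux(E_magnitude, area, theta_deg):
    """Φ_E = E·S·cosθ for a uniform field through a flat area; θ is the angle between E and the normal."""
    return E_magnitude * area * math.cos(math.radians(theta_deg))

E = 100.0   # field strength in N/C (assumed value)
S = 2.0     # area in m² (assumed value)

phi = electric_flux(E, S, 60.0)             # in (N/C)·m², i.e. N·m²/C = V·m
phi_double = electric_flux(E, 2 * S, 60.0)  # same field, twice the area
phi_edge_on = electric_flux(E, S, 90.0)     # field parallel to the surface

print(phi)          # ≈ 100 (cos 60° = 0.5)
print(phi_double)   # ≈ 200: twice the area, twice the flux
print(phi_edge_on)  # ≈ 0: no field lines cross the surface
```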

So let’s get back to our operators. You’ll anticipate the second new operator now, because that’s the one that appears in the other two equations in Maxwell’s set of equations. It’s the cross product:

∇×E = (∂/∂x, ∂/∂y, ∂/∂z)×(E_x, E_y, E_z) = … What?

Well… The cross product is not as straightforward to write down as the dot product. We get a vector indeed, not a scalar, and its three components are:

(∇×E)_x = ∇_yE_z – ∇_zE_y = ∂E_z/∂y – ∂E_y/∂z

(∇×E)_y = ∇_zE_x – ∇_xE_z = ∂E_x/∂z – ∂E_z/∂x

(∇×E)_z = ∇_xE_y – ∇_yE_x = ∂E_y/∂x – ∂E_x/∂y

I know this looks pretty monstrous, but so that’s how cross products work. Please do check it out: you have to play with the order of the x, y and z subscripts. I gave the geometric formula for a dot product above, so I should also give you the same for a cross product:

A×B = |A||B|sin(θ)n

In this formula, we once again have θ, the angle between A and B, but note that, this time around, it’s the sine, not the cosine, that pops up when calculating the magnitude of this vector. In addition, we have n at the end: n is a unit vector at right angles to both A and B. It’s what makes the cross product a vector. Indeed, as you can see, multiplying by n will not alter the magnitude (|A||B|sinθ) of this product, but it gives the whole thing a direction, so we get a new vector indeed. Of course, we have two unit vectors at right angles to A and B, and so we need a rule to choose one of these: the direction of the n vector we want is given by that right-hand rule which we encountered a couple of times already.
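The cross-product properties above (magnitude |A||B|sinθ, a direction at right angles to both A and B, the right-hand rule, and A×B = –B×A) are easy to check numerically. A minimal Python sketch, with two arbitrarily chosen vectors A and B as assumptions:

```python
import math

def cross(a, b):
    # Component formula: (a_y·b_z – a_z·b_y, a_z·b_x – a_x·b_z, a_x·b_y – a_y·b_x)
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

A = (1.0, 2.0, 0.0)    # arbitrary example vectors
B = (3.0, -1.0, 2.0)
AxB, BxA = cross(A, B), cross(B, A)
theta = math.acos(dot(A, B) / (norm(A) * norm(B)))

print(AxB, BxA)                                        # (4.0, -2.0, -7.0) (-4.0, 2.0, 7.0)
print(norm(AxB), norm(A) * norm(B) * math.sin(theta))  # equal up to rounding: |A||B|sinθ
print(dot(AxB, A), dot(AxB, B))                        # 0.0 0.0: at right angles to A and B
print(cross((1, 0, 0), (0, 1, 0)))                     # (0, 0, 1): x × y = z (right-hand rule)
```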

Again, it’s two symbols but one operator really, and we also have a special name (and notation) for it:

∇×h = curl h = the curl of h

The curl is, just like the divergence, a so-called vector operator but, as mentioned above, that’s just because it involves the del operator. Just note that it acts on a vector and that its result is a vector too. What’s the geometric interpretation of the curl? Well… It’s a bit hard to describe that but let’s try. The curl describes the ‘rotation’ or ‘circulation’ of a vector field:

The direction of the curl is the axis of rotation, as determined by the right-hand rule.
The magnitude of the curl is the magnitude of rotation.
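Applying the component formulas for the curl to a concrete field may make the ‘rotation’ picture more tangible. The sketch below assumes a hypothetical liquid spinning rigidly about the z-axis with an assumed angular speed omega = 1.5, i.e. a velocity field v = ω·(–y, x, 0), and estimates the curl with central differences. The curl comes out along +z, the rotation axis given by the right-hand rule, and its magnitude is 2ω, twice the angular speed, so ‘magnitude of rotation’ should be read as proportional to the angular speed rather than equal to it.

```python
def curl(F, x, y, z, h=1e-5):
    """Central-difference estimate of ∇×F at (x, y, z)."""
    point = (x, y, z)

    def d(i, axis):
        # ∂F_i/∂(axis), with 0, 1, 2 standing for x, y, z
        plus, minus = list(point), list(point)
        plus[axis] += h
        minus[axis] -= h
        return (F(*plus)[i] - F(*minus)[i]) / (2 * h)

    return (d(2, 1) - d(1, 2),   # (∇×F)_x = ∂F_z/∂y – ∂F_y/∂z
            d(0, 2) - d(2, 0),   # (∇×F)_y = ∂F_x/∂z – ∂F_z/∂x
            d(1, 0) - d(0, 1))   # (∇×F)_z = ∂F_y/∂x – ∂F_x/∂y

omega = 1.5  # assumed angular speed

def rigid_rotation(x, y, z):
    # Hypothetical velocity field of a liquid spinning about the z-axis: v = ω·(–y, x, 0)
    return (-omega * y, omega * x, 0.0)

print(curl(rigid_rotation, 0.3, -0.7, 2.0))  # ≈ (0, 0, 3.0), i.e. 2·omega along +z
```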

I know. This is pretty abstract, and I’ll probably have to come back to it in another post. Let’s first ask a basic question: should we associate some unit with the curl? In fact, when you google, you’ll find lots of units used in electromagnetic theory (like the weber, for example), but nothing for circulation. I am not sure why, because if flux is related to some density, the idea of curl (or circulation) is pretty much the same. It’s just that it isn’t used much in actual engineering problems, and surely not those you may have encountered in your high school physics course!

In any case, just note we defined three new operators in this ‘introduction’ to vector calculus:

∇T = grad T = a vector
∇·h = div h = a scalar
∇×h = curl h = a vector

That’s all. It’s all we need to understand Maxwell’s famous equations:

∇·E = ρ/ε₀
∇×E = –∂B/∂t
∇·B = 0
c²∇×B = j/ε₀ + ∂E/∂t

Huh? Hmm… You’re right: understanding the symbols, to some extent, doesn’t mean we ‘understand’ these equations. What does it mean to ‘understand’ an equation? Let me quote Feynman on that: “What it means really to understand an equation—that is, in more than a strictly mathematical sense—was described by Dirac. He said: “I understand what an equation means if I have a way of figuring out the characteristics of its solution without actually solving it.” So if we have a way of knowing what should happen in given circumstances without actually solving the equations, then we “understand” the equations, as applied to these circumstances.”

We’re surely not there yet. In fact, I doubt we’ll ever reach Dirac’s understanding of Maxwell’s equations. But let’s do what we can.

In order to ‘understand’ the equations above in a more ‘physical’ way, let’s explore the concepts of divergence and curl somewhat more. We said the divergence was related to the ‘flux’ of a vector field, and the curl was related to its ‘circulation’. In my previous post, I had already illustrated those two concepts copying the following diagrams from Feynman’s Lectures:

flux = (average normal component)·(surface area)

So that’s the flux (through a non-closed surface).

To illustrate the concept of circulation, we have not one but three diagrams, shown below. Diagram (a) gives us the vector field, such as the velocity field in a liquid. In diagram (b), we imagine a tube (of uniform cross section) that follows some arbitrary closed curve. Finally, in diagram (c), we imagine we’d suddenly freeze the liquid everywhere except inside the tube. Then the liquid in the tube would circulate as shown in (c), and so that’s the concept of circulation.

We have a similar formula as for the flux:

circulation = (the average tangential component)·(the distance around)

In both formulas (flux and circulation), we have a product of two scalars: (i) the average normal component and the average tangential component (for the flux and circulation respectively) and (ii) the surface area and the distance around (again, for the flux and circulation respectively). So we get a scalar as a result. Does that make sense? When we related the concept of flux to the divergence of a vector field, we said that the flux would have a positive value if the region is a source, and a negative value if the region is a sink. So we have a number here (otherwise we wouldn’t be talking ‘positive’ or ‘negative’ values). So that’s OK. But are we talking about the same number? Yes. I’ll show they are the same in a few minutes.

But what about circulation? When we related the concept of circulation to the curl of a vector field, we introduced a vector cross product, which yields a vector, not a scalar. So what’s the relation between that vector and the number we get when multiplying the ‘average tangential component’ and the ‘distance around’? The answer requires some more mathematical analysis, and I’ll give you what you need in a minute. Let me first make a remark about conventions here.

From what I write above, you see that we use a plus or minus sign for the flux to indicate the direction of flow: the flux has a positive value if the region is a source, and a negative value if the region is a sink. Now, why don’t we do the same for circulation? We said the curl is a vector, and its direction is the axis of rotation as determined by the right-hand rule. Why do we need a vector here? Why can’t we have a scalar taking on positive or negative values, just like we do for the flux?

The intuitive answer to this question (i.e. the ‘non-mathematical’ or ‘physical’ explanation, I’d say) is the following. Although we can calculate the flux through a non-closed surface, from a mathematical point of view, flux is effectively being defined by referring to the infinitesimal volume around some point and, therefore, we can easily, and unambiguously, determine whether we’re inside or outside of that volume. Therefore, the concepts of positive and negative values make sense, as we can define them referring to some unique reference point, which is either inside or outside of the region.

When talking circulation, however, we’re talking about some curve in space. Now it’s not so easy to find some unique reference point. We may say that we are looking at some curve from some point ‘in front of’ that curve, but some other person whose position, from our point of view, would be ‘behind’ the curve, would not agree with our definition of ‘in front of’: in fact, his definition would be exactly the opposite of ours. In short, because of the geometry of the situation involved, our convention in regard to the ‘sign’ of circulation (positive or negative) becomes somewhat more complicated. It’s no longer a simple matter of ‘inward’ or ‘outward’ flow: we need something like a ‘right-hand rule’ indeed. [We could, of course, also adopt a left-hand rule but, by now, you know that, in physics, there’s not much use for a left hand. :-)]

That also ‘explains’ why the vector cross product is non-commutative: A×B ≠ B×A. To be fully precise, A×B and B×A have the same magnitude but opposite direction: A×B = |A||B|sin(θ)n = –|A||B|sin(θ)(–n) = –(B×A) = –B×A. The dot product, on the other hand, is fully commutative: A·B = B·A.

In fact, the concept of circulation is very much related to the concept of angular momentum which, as you’ll remember from a previous post, also involves a vector cross product.

[…]

I’ve confused you too much already. The only way out is the full mathematical treatment. So let’s go for that.

Flux

Some of the confusion as to what flux actually means in electromagnetism is probably caused by the fact that the illustration above is not a closed surface and, from my previous post, you should remember that Maxwell’s first and third equations define the flux of E and B through closed surfaces. It’s not that the formula above for the flux through a non-closed surface is wrong: it’s just that, in electromagnetism, we usually talk about the flux through a closed surface.

A closed surface has no boundary. In contrast, the surface area above does have a clear boundary and, hence, it’s not a closed surface. A sphere is an example of a closed surface. A cube is an example as well. In fact, an infinitesimally small cube is what’s used to prove a very convenient theorem, referred to as Gauss’ Theorem. We will not prove it here, but just try to make sure you ‘understand’ what it says.

Suppose we have some vector field C and that we have some closed surface S – a sphere, for example, but it may also be some very irregular volume. Its shape doesn’t matter: the only requirement is that it’s defined by a closed surface. Let’s then denote the volume that’s enclosed by this surface by V. Now, the flux through some (infinitesimal) surface element da will, effectively, be given by that formula above:

flux = (average normal component)·(surface area)

What’s the average normal component in this case? It’s given by that ΔJ/Δa₂ = (ΔJ/Δa₁)cosθ = h·n formula, except that we now need to substitute C for h, so we have C·n instead of h·n. To get the flux through the closed surface S, we just need to add all the contributions. Adding those contributions amounts to taking the following surface integral:

Flux of C through S = ∮_S C·n da

Now, I talked about Gauss’ Theorem above, and I said I would not prove it, but this is what Gauss’ Theorem says:

∮_S C·n da = ∫_V ∇·C dV

Huh? Don’t panic. Just try to ‘read’ what’s written here. From all that I’ve said so far, you should ‘understand’ the surface integral on the left-hand side. So that should be OK. Let’s now look at the right-hand side. The right-hand side uses the divergence operator which I introduced above: ∇·(vector). In this case, ∇·C. That’s a scalar, as we know, and it represents the outward flux per unit volume from an infinitesimally small cube inside the surface. The volume integral on the right-hand side adds all of the fluxes out of each part (think of it as zillions of infinitesimally small cubes) of the volume V that is enclosed by the (closed) surface S. So that’s what Gauss’ Theorem is all about. In words, we can state Gauss’ Theorem as follows:

Gauss’ Theorem: The (surface) integral of the normal component of a vector (field) over a closed surface is the (volume) integral of the divergence of the vector over the volume enclosed by the surface.
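Gauss’ Theorem can be checked numerically on the simplest closed surface: the unit cube 0 ≤ x, y, z ≤ 1. In the minimal sketch below, the field C = (x², yz, z) is a made-up assumption; its divergence is 2x + z + 1. The code computes the outward flux C·n through the six faces and the volume integral of ∇·C as a midpoint sum over n³ small cubes (the ‘zillions of infinitesimally small cubes’ picture, with n = 40). Both should come out at 2.5.

```python
def C(x, y, z):
    # Hypothetical vector field, for illustration only
    return (x**2, y * z, z)

def div_C(x, y, z):
    # ∇·C = ∂(x²)/∂x + ∂(yz)/∂y + ∂z/∂z = 2x + z + 1
    return 2 * x + z + 1

n = 40
mids = [(i + 0.5) / n for i in range(n)]  # midpoints of n equal intervals on [0, 1]

# Right-hand side: volume integral of ∇·C over the cube, summed over n³ small cubes
volume_integral = sum(div_C(x, y, z) for x in mids for y in mids for z in mids) / n**3

# Left-hand side: outward flux through the six faces (the outward normal is –axis on the 0-faces)
da = 1 / n**2
flux = 0.0
for u in mids:
    for v in mids:
        flux += (C(1, u, v)[0] - C(0, u, v)[0]) * da   # faces x = 1 and x = 0
        flux += (C(u, 1, v)[1] - C(u, 0, v)[1]) * da   # faces y = 1 and y = 0
        flux += (C(u, v, 1)[2] - C(u, v, 0)[2]) * da   # faces z = 1 and z = 0

print(flux, volume_integral)  # both ≈ 2.5
```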

Again, I said I would not prove Gauss’ Theorem, but its proof is actually quite intuitive: to calculate the flux out of a large volume, we can ‘cut it up’ in smaller volumes, and then calculate the flux out of these volumes. If we add it up, we’ll get the total flux. In any case, I’ll refer you to Feynman in case you’d want to see how it goes exactly. So far, I did what I promised to do, and that’s to relate the formula for flux (i.e. that (average normal component)·(surface area) formula) to the divergence operator. Let’s now do the same for the curl.

Curl

For non-native English speakers (like me), it’s always good to have a look at the common-sense definition of ‘curl’: as a verb (to curl), it means ‘to form or cause to form into a curved or spiral shape’. As a noun (e.g. a curl of hair), it means ‘something having a spiral or inwardly curved form’. It’s clear that, while not the same, we can indeed relate this common-sense definition to the concept of circulation that we introduced above:

circulation = (the average tangential component)·(the distance around)

So that’s the (scalar) product we already mentioned above. How do we relate it to that curl operator?

Patience, please! The illustration below shows what we actually have to do to calculate the circulation around some loop Γ: we take an infinite number of vector dot products C·ds. Take a careful look at the notation here: I use bold-face for s and, hence, ds is some little vector indeed. Going to the limit, ds becomes a differential indeed. The fact that we’re talking a vector dot product here ensures that only the tangential component of C ‘enters the equation’, so to speak. I’ll come back to that in a moment. Just have a good look at the illustration first.

Such an infinite sum of vector dot products C·ds is, once again, an integral. It’s another ‘special’ integral, in fact. To be precise, it’s a line integral. Moreover, it’s not just any line integral: we have to go all around the (closed) loop to take it. We cannot stop somewhere halfway. That’s why Feynman writes it with a little loop (ο) through the integral sign (∫):

Circulation around Γ = ∮_Γ C_t ds = ∮_Γ C·ds

Note the subtle difference between the two products in the integrands of the integrals above: C_t ds versus C·ds. The first product is just a product of two scalars, while the second is a dot product of two vectors. Just check it out using the definition of a dot product (A·B = |A||B|cosθ) and substitute A and B by C and ds respectively, noting that the tangential component C_t equals C times cosθ indeed.

Now, once again, we want to relate this integral with that dot product inside to one of those vector operators we introduced above. In this case, we’ll relate the circulation with the curl operator. The analysis involves infinitesimal squares (as opposed to those infinitesimal cubes we introduced above), and the result is what is referred to as Stokes’ Theorem. I’ll just write it down:

∮_Γ C·ds = ∫_S (∇×C)_n da

Again, the integral on the left was explained above: it’s a line integral taken around the full loop Γ. As for the integral on the right-hand side, that’s a surface integral once again but, instead of a div operator, we have the curl operator inside and, moreover, the integrand is the normal component of the curl only. Now, remembering that we can always find the normal component of a vector (i.e. the component that’s normal to the surface) by taking the dot product of that vector and the unit normal vector (n), we can write Stokes’ Theorem also as:

∮_Γ C·ds = ∫_S (∇×C)·n da

That doesn’t look any ‘nicer’, but it’s the form in which you’ll usually see it. Once again, I will not give you any formal proof of this. Indeed, if you’d want to see how it goes, I’ll just refer you to Feynman’s Lectures. However, the philosophy behind it is the same. The first step is to prove that we can break up the surface bounded by the loop Γ into a number of smaller areas, and that the circulation around Γ will be equal to the sum of the circulations around the little loops. The idea is illustrated below:

Of course, we then go to the limit and cut up the surface into an infinite number of infinitesimally small squares. The next step in the proof then shows that the circulation of C around an infinitesimal square is, indeed, (i) the component of the curl of C normal to the surface enclosed by that square multiplied by (ii) the area of that (infinitesimal) square. The diagram and formula below do not give you the proof but just illustrate the idea:

OK, you’ll say, so what? Well… Nothing much. I think you have enough to digest as for now. It probably looks very daunting, but so that’s all we need to know – for the moment that is – to arrive at a better ‘physical’ understanding of Maxwell’s famous equations. I’ll come back to them in my next post. Before proceeding to the summary of this whole post, let me just write down Stokes’ Theorem in words:

Stokes’ Theorem: The line integral of the tangential component of a vector (field) around a closed loop is equal to the surface integral of the normal component of the curl of that vector over any surface which is bounded by the loop.
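A similar numerical check of Stokes’ Theorem, under made-up assumptions: the loop Γ is the unit circle in the xy-plane, traversed counterclockwise as seen from +z; the surface is the flat disk it bounds (so n = +z by the right-hand rule); and C = (–y³, x³, 0), whose curl has z-component 3x² + 3y². The circulation uses the parametrization r(t) = (cos t, sin t, 0), and the surface integral uses a midpoint sum in polar coordinates. Both should be close to 3π/2.

```python
import math

def C(x, y, z):
    # Hypothetical vector field, for illustration only
    return (-y**3, x**3, 0.0)

def curl_C_z(x, y, z):
    # z-component of ∇×C: ∂C_y/∂x – ∂C_x/∂y = 3x² + 3y²
    return 3 * x**2 + 3 * y**2

# Left-hand side: ∮ C·ds around the unit circle, with r(t) = (cos t, sin t, 0) and
# ds = r'(t) dt = (–sin t, cos t, 0) dt (rectangle rule over one full period)
n_loop = 2000
dt = 2 * math.pi / n_loop
circulation = 0.0
for i in range(n_loop):
    t = i * dt
    cx, cy, _ = C(math.cos(t), math.sin(t), 0.0)
    circulation += (cx * -math.sin(t) + cy * math.cos(t)) * dt

# Right-hand side: ∫ (∇×C)·n da over the unit disk with n = +z, using da = r dr dφ
n_r, n_phi = 400, 400
dr, dphi = 1.0 / n_r, 2 * math.pi / n_phi
surface_integral = 0.0
for i in range(n_r):
    r = (i + 0.5) * dr
    for j in range(n_phi):
        phi = (j + 0.5) * dphi
        surface_integral += curl_C_z(r * math.cos(phi), r * math.sin(phi), 0.0) * r * dr * dphi

print(circulation, surface_integral, 3 * math.pi / 2)  # all three ≈ 4.712
```

Traversing the loop clockwise instead would flip the sign of the circulation, which is why the right-hand rule has to fix the direction of n.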

Summary

We’ve defined three so-called vector operators, which we’ll use very often in physics:

∇T = grad T = a vector
∇·h = div h = a scalar
∇×h = curl h = a vector

Moreover, we also explained three important theorems, which we’ll use at least as much:

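[1] The First Theorem:

ψ(2) – ψ(1) = ∫ ∇ψ·ds, taken from (1) to (2) along any curve Γ

[2] Gauss’ Theorem:

∮_S C·n da = ∫_V ∇·C dV

[3] Stokes’ Theorem:

∮_Γ C·ds = ∫_S (∇×C)·n da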

As said, we’ll come back to them in my next post. As for now, just try to familiarize yourself with these div and curl operators. Try to ‘understand’ them as well as you can. Don’t look at them as just some weird mathematical definition: try to understand them in a ‘physical’ way, i.e. in a ‘completely unmathematical, imprecise, and inexact way’, remembering that’s what it takes to truly understand physics. 🙂

Newtonian, Lagrangian and Hamiltonian mechanics

Post scriptum (dated 16 November 2015): You’ll smile because… Yes, I am starting this post with a post scriptum, indeed. 🙂 I’ve added it, a year later or so, because, before you continue to read, you should note I am not going to explain the Hamiltonian matrix here, as it’s used in quantum physics. That’s the topic of another post, which involves far more advanced mathematical concepts. If you’re here for that, don’t read this post. Just go to my post on the matrix indeed. 🙂 But so here’s my original post. I wrote it to tie up some loose ends. 🙂

As an economist, I thought I knew a thing or two about optimization. Indeed, when everything is said and done, optimization is supposed to be an economist’s forte, isn’t it? 🙂 Hence, I thought I sort of understood what a Lagrangian would represent in physics, and I also thought I sort of intuitively understood why and how it could be used to model the behavior of a dynamic system. In short, I thought that Lagrangian mechanics would be all about optimizing something subject to some constraints. Just like in economics, right?

[…] Well… When checking it out, I found that the answer is: yes, and no. And, frankly, the honest answer is more no than yes. 🙂 Economists (like me), and all social scientists (I’d think), learn only about one particular type of Lagrangian equations: the so-called Lagrange equations of the first kind. This approach models constraints as equations that are to be incorporated in an objective function (which is also referred to as a Lagrangian–and that’s where the confusion starts because it’s different from the Lagrangian that’s used in physics, which I’ll introduce below) using so-called Lagrange multipliers. If you’re an economist, you’ll surely remember it: it’s a problem written as “maximize f(x, y) subject to g(x, y) = c”, and we solve it by finding the so-called stationary points (i.e. the points for which the derivative is zero) of the (Lagrangian) objective function f(x, y) + λ[g(x, y) – c].
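For readers who, like me, come from economics, here is a minimal sketch of that ‘first kind’ recipe on a made-up textbook example: maximize f(x, y) = xy subject to x + y = 10. Solving the three first-order conditions by hand (y + λ = 0, x + λ = 0, x + y – 10 = 0) gives x = y = 5 and λ = –5. The code only checks that this point is stationary for f(x, y) + λ[g(x, y) – c], using numerical derivatives, and compares it with a brute-force search along the constraint.

```python
# Hypothetical example: maximize f(x, y) = x*y subject to g(x, y) = x + y = C.
C = 10.0

def f(x, y):
    return x * y

def g(x, y):
    return x + y

def objective(x, y, lam):
    """The economist's 'Lagrangian': f(x, y) + λ[g(x, y) - C]."""
    return f(x, y) + lam * (g(x, y) - C)

def gradient(F, p, eps=1e-6):
    """Central-difference gradient of F with respect to all of its arguments."""
    out = []
    for i in range(len(p)):
        up, down = list(p), list(p)
        up[i] += eps
        down[i] -= eps
        out.append((F(*up) - F(*down)) / (2 * eps))
    return out

# Solution of the first-order conditions, worked out by hand
x_star, y_star, lam_star = C / 2, C / 2, -C / 2
print(gradient(objective, (x_star, y_star, lam_star)))  # all close to 0: a stationary point

# Brute-force search along the constraint y = C - x, on a grid with step 0.001
best_x = max((i / 1000 for i in range(10001)), key=lambda x: f(x, C - x))
print(best_x, C - best_x)  # 5.0 5.0
```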

Now, it turns out that, in physics, they use so-called Lagrange equations of the second kind, which incorporate the constraints directly by what Wikipedia refers to as a “judicious choice of generalized coordinates.”

Generalized coordinates? Don’t worry about it: while generalized coordinates are defined formally as “parameters that describe the configuration of the system relative to some reference configuration”, they are, in practice, those coordinates that make the problem easy to solve. For example, for a particle (or point) that moves on a circle, we’d not use the Cartesian coordinates x and y but just the angle that locates the particle (or point). That simplifies matters because then we only need to find one variable. In practice, the number of parameters (i.e. the number of generalized coordinates) will be defined by the number of degrees of freedom of the system, and we know what that means: it’s the number of independent directions in which the particle (or point) can move. Now, those independent directions may or may not include the x, y and z directions (they may actually exclude one of those), and they also may or may not include rotational and/or vibratory movements. We went over that when discussing kinetic gas theory, so I won’t say more about that here.

So… OK… That was my first surprise: the physicist’s Lagrangian is different from the social scientist’s Lagrangian.

The second surprise was that all physics textbooks seem to dislike the Lagrangian approach. Indeed, they opt for a related but different function when developing a model of a dynamic system: it’s a function referred to as the Hamiltonian. The modeling approach which uses the Hamiltonian instead of the Lagrangian is, of course, referred to as Hamiltonian mechanics. We may think the preference for the Hamiltonian approach has to do with William Rowan Hamilton being Anglo-Irish, while Joseph-Louis Lagrange (born as Giuseppe Lodovico Lagrangia) was Italian-French but… No. 🙂

And then we have good old Newtonian mechanics as well, obviously. In case you wonder what that is: it’s the modeling approach that we’ve been using all along. 🙂 But I’ll remind you of what it is in a moment: it amounts to making sense of some situation by using Newton’s laws of motion only, rather than a more sophisticated mathematical argument using more abstract concepts, such as energy, or action.

Introducing Lagrangian and Hamiltonian mechanics is quite confusing because the functions that are involved (i.e. the so-called Lagrangian and Hamiltonian functions) look very similar: we write the Lagrangian as the difference between the kinetic and potential energy of a system (L = T – V), while the Hamiltonian is the sum of both (H = T + V). Now, I could make this post very simple and just ask you to note that both approaches are basically ‘equivalent’ (in the sense that they lead to the same solutions, i.e. the same equations of motion expressed as a function of time) and that a choice between them is just a matter of preference, like choosing between an English and a continental breakfast. 🙂 Of course, an English breakfast usually has some extra bacon, or a sausage, so you get more but… Well… Not necessarily something better. 🙂 So that would be the end of this digression then, and I should be done. However, I must assume you’re a curious person, just like me, and, hence, you’ll say that, while being ‘equivalent’, they’re obviously not the same. So how do the two approaches differ exactly?

Let’s try to get a somewhat intuitive understanding of it all by taking, once again, the example of a simple harmonic oscillator, as depicted below. It could be a mass on a spring. In fact, our example will be exactly that: an oscillating mass on a spring. Let’s also assume there’s no damping, because that makes the analysis soooooooo much easier.

Of course, we already know all of the relevant equations for this system just from applying Newton’s laws (so that’s Newtonian mechanics). We did that in a previous post. [I can’t remember which one, but I am sure I’ve done this already.] Hence, we don’t really need the Lagrangian or Hamiltonian. But, of course, that’s the point of this post: I want to illustrate how these other approaches to modeling a dynamic system actually work, and so it’s good we have the correct answer already so we can make sure we’re not going off track here. So… Let’s go… 🙂

I. Newtonian mechanics

Let me recapitulate the basics of a mass on a spring which, in jargon, is called a harmonic oscillator. Hooke’s law is there: the force on the mass is proportional to its distance from the zero point (i.e. the displacement), and the direction of the force is towards the zero point–not away from it, and so we have a minus sign. In short, we can write:

F = –kx (i.e. Hooke’s law)

Now, Newton‘s Law (Newton’s second law to be precise) says that F is equal to the mass times the acceleration: F = ma. So we write:

F = ma = m(d²x/dt²) = –kx

So that’s just Newton’s law combined with Hooke’s law. We know this is a differential equation for which there’s a general solution with the following form:

x(t) = A·cos(ωt + α)

If you wonder why… Well… I can’t digress on that here again: just note, from that differential equation, that we apparently need a function x(t) that yields itself, multiplied by a negative constant (–k/m), when differentiated twice. So that must be some sinusoidal function, like sine or cosine, because these do that. […] OK… Sorry, but I must move on.

As for the new ‘variables’ (A, ω and α), A depends on the initial conditions and is the (maximum) amplitude of the motion. We also already know from previous posts (or, more likely, because you already know a lot about physics) that A is related to the energy of the system. To be precise: the energy of the system is proportional to the square of the amplitude: E ∝ A². As for ω, the angular frequency, that’s determined by the spring itself and the oscillating mass on it: ω = (k/m)^(1/2) = 2π/T = 2πf (with T the period, and f the frequency expressed in oscillations per second, as opposed to the angular frequency, which is the frequency expressed in radians per second). Finally, I should note that α is just a phase shift which depends on how we define our t = 0 point: if x(t) is zero at t = 0, then that cosine function should be zero and then α will be equal to ±π/2.
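Here is a quick numerical check of these statements: a minimal sketch assuming (hypothetically) k = 4 and m = 1, so ω = 2, and a mass that starts at the zero point with velocity 2. It integrates m(d²x/dt²) = –kx step by step with the velocity-Verlet scheme (my choice of integrator, nothing more) and compares the result with A·cos(ωt + α). Because x(0) = 0, the phase comes out as α = –π/2, in line with the ±π/2 remark.

```python
import math

k, m = 4.0, 1.0     # hypothetical spring constant and mass
x0, v0 = 0.0, 2.0   # start at the zero point, moving at 2 units per second

omega = math.sqrt(k / m)             # omega = (k/m)^(1/2)
A = math.hypot(x0, v0 / omega)       # amplitude, fixed by the initial conditions
alpha = math.atan2(-v0 / omega, x0)  # phase: x0 = A cos(alpha), v0 = -A omega sin(alpha)

def analytic(t):
    """The general solution x(t) = A cos(omega t + alpha)."""
    return A * math.cos(omega * t + alpha)

def simulate(t_end, dt=1e-4):
    """Velocity-Verlet integration of m d2x/dt2 = -k x (Newton plus Hooke)."""
    x, v = x0, v0
    a = -k * x / m
    for _ in range(round(t_end / dt)):
        x += v * dt + 0.5 * a * dt * dt
        a_new = -k * x / m
        v += 0.5 * (a + a_new) * dt
        a = a_new
    return x

print(omega, A, alpha)                # 2.0, 1.0 and about -1.5708 (= -pi/2)
print(simulate(3.0), analytic(3.0))   # agree to many decimal places
```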

OK. That’s clear enough. What about the ‘operational currency of the universe’, i.e. the energy of the oscillator? Well… I told you already. We don’t need the energy concept here to find the equation of motion. In fact, that’s what distinguishes this ‘Newtonian’ approach from the Lagrangian and Hamiltonian approach. But… Now that we’re at it, and we have to move to a discussion of these two animals (I mean the Lagrangian and Hamiltonian), let’s go for it.

We have kinetic versus potential energy. Kinetic energy (T) is what it always is. It depends on the velocity and the mass: K.E. = T = mv²/2 = m(dx/dt)²/2 = p²/2m. Huh? What’s this expression with p in it? […] It’s momentum: p = mv. Just check it: it’s an alternative formula for T really. Nothing more, nothing less. I am just noting it here because it will pop up again in our discussion of the Hamiltonian modeling approach. But that’s for later. Onwards!

What about potential energy (V)? We know that’s equal to V = kx²/2. And because energy is conserved, potential energy (V) and kinetic energy (T) should add up to some constant. Let’s check it: dx/dt = d[Acos(ωt + α)]/dt = –Aωsin(ωt + α). [Please do the derivation: don’t accept things at face value. :-)] Hence, T = mA²ω²sin²(ωt + α)/2 = mA²(k/m)sin²(ωt + α)/2 = kA²sin²(ωt + α)/2. Now, V is equal to V = kx²/2 = k[Acos(ωt + α)]²/2 = kA²cos²(ωt + α)/2. Adding both yields:

T + V = kA²sin²(ωt + α)/2 + kA²cos²(ωt + α)/2

= (1/2)kA²[sin²(ωt + α) + cos²(ωt + α)] = kA²/2.

Ouff! Glad that worked out: the total energy is, indeed, proportional to the square of the amplitude and the constant of proportionality is equal to k/2. [You should now wonder why we do not have m in this formula but, if you think about it, you can answer your own question: the mass is already hidden in k. Because ω² = k/m, we have k = mω², so the total energy can also be written as kA²/2 = mω²A²/2.]
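The same bookkeeping can be done numerically. This minimal sketch assumes arbitrary values for k, m, A and α, evaluates T and V along x(t) = A·cos(ωt + α), and shows that T + V stays at kA²/2, which equals mω²A²/2 because k = mω².

```python
import math

k, m = 4.0, 1.0        # hypothetical spring constant and mass
A, alpha = 0.5, 0.3    # hypothetical amplitude and phase
omega = math.sqrt(k / m)

def kinetic(t):
    """T = m v^2 / 2, with v = dx/dt = -A omega sin(omega t + alpha)."""
    v = -A * omega * math.sin(omega * t + alpha)
    return m * v**2 / 2

def potential(t):
    """V = k x^2 / 2, with x = A cos(omega t + alpha)."""
    x = A * math.cos(omega * t + alpha)
    return k * x**2 / 2

for t in (0.0, 0.4, 1.1, 2.5):
    T, V = kinetic(t), potential(t)
    print(f"t={t:.1f}  T={T:.4f}  V={V:.4f}  T+V={T + V:.4f}")

# Two ways of writing the (constant) total energy
print(k * A**2 / 2, m * omega**2 * A**2 / 2)
```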

The point to note is that this Hamiltonian function H = T + V is just a constant, not only for this particular case (an oscillation without damping), but in all cases where H represents the total energy of a (closed) system.

OK. That’s clear enough. What does our Lagrangian look like? That’s not a constant obviously. Just so you can visualize things, I’ve drawn the graph below:

The red curve represents kinetic energy (T) as a function of the displacement x: T is zero at the turning points, and reaches a maximum at the x = 0 point.
The blue curve is potential energy (V): unlike T, V reaches a maximum at the turning points, and is zero at the x = 0 point. In short, it’s the mirror image of the red curve.
The Lagrangian is the green graph: L = T – V. Hence, L reaches a minimum at the turning points, and a maximum at the x = 0 point.
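The three curves are easy to tabulate. Using energy conservation, T = kA²/2 – kx²/2 as a function of the displacement x, so L = T – V = kA²/2 – kx². The values of k and A below are arbitrary assumptions; the table runs from one turning point (x = –A) to the other (x = +A).

```python
k, A = 4.0, 0.5      # hypothetical spring constant and amplitude
E = k * A**2 / 2     # total energy, constant

def kinetic(x):
    """T as a function of displacement, from T + V = E."""
    return E - k * x**2 / 2

def potential(x):
    """V = k x^2 / 2."""
    return k * x**2 / 2

def lagrangian(x):
    """L = T - V = k A^2 / 2 - k x^2."""
    return kinetic(x) - potential(x)

xs = [A * i / 4 for i in range(-4, 5)]   # from -A to +A
for x in xs:
    print(f"x={x:+.3f}  T={kinetic(x):.3f}  V={potential(x):.3f}  L={lagrangian(x):+.3f}")
```

The printout shows L peaking at +E at x = 0 and bottoming out at –E at the turning points, just like the green curve.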

While that green function would make an economist think of some Lagrangian optimization problem, it’s worth noting we’re not doing any such thing here: we’re not interested in stationary points. We just want the equation(s) of motion. [I just thought that would be worth stating, in light of my own background and confusion in regard to it all. :-)]

OK. Now that we have an idea of what the Lagrangian and Hamiltonian functions are (it’s probably worth noting also that we do not have a ‘Newtonian function’ of some sort), let us now show how these ‘functions’ are used to solve the problem. What problem? Well… We need to find some equation for the motion, remember? [I find that, in physics, I often have to remind myself of what the problem actually is. Do you feel the same? 🙂 ] So let’s go for it.

II. Lagrangian mechanics

As this post should not turn into a chapter of some math book, I’ll just describe the how, i.e. I’ll just list the steps one should take to model and then solve the problem, and illustrate how it goes for the oscillator above. Hence, I will not try to explain why this approach gives the correct answer (i.e. the equation(s) of motion). So if you want to know why rather than how, then just check it out on the Web: there’s plenty of nice stuff on math out there.

The steps that are involved in the Lagrangian approach are the following:

Compute (i.e. write down) the Lagrangian function L = T – V. Hmm? How do we do that? There’s more than one way to express T and V, isn’t there? Right you are! So let me clarify: in the Lagrangian approach, we should express T as a function of velocity (v) and V as a function of position (x), so your Lagrangian should be L = L(x, v). Indeed, if you don’t pick the right variables, you’ll get nowhere. So, in our example, we have L = mv²/2 – kx²/2.
Compute the partial derivatives ∂L/∂x and ∂L/∂v. So… Well… OK. Got it. Now that we’ve written L using the right variables, that’s a piece of cake. In our example, we have: ∂L/∂x = – kx and ∂L/∂v = mv. Please note how we treat x and v as independent variables here. It’s obvious from the use of the symbol for partial derivatives: ∂. So we’re not taking any total differential here or so. [This is an important point, so I’d rather mention it.]
Write down (‘compute’ sounds awkward, doesn’t it?) Lagrange’s equation: d(∂L/∂v)/dt = ∂L/∂x. […] Yep. That’s it. Why? Well… I told you I wouldn’t tell you why. I am just showing the how here. This is Lagrange’s equation and so you should take it for granted and get on with it. 🙂 In our example: d(∂L/∂v)/dt = d(mv)/dt = m(dv/dt), and ∂L/∂x = –kx, so Lagrange’s equation becomes m(dv/dt) = –kx. We can also write this as m(d²x/dt²) = –kx.
Finally, solve the resulting differential equation. […] ?! Well… Yes. […] Of course, we’ve done that already. It’s the same differential equation as the one we found in our ‘Newtonian approach’, i.e. the equation we found by combining Hooke’s and Newton’s laws. So the general solution is x(t) = Acos(ωt + α), as we already noted above.
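These steps can be mimicked numerically. In this minimal sketch (with arbitrary values for m, k, A and α), L is written as a function of x and v, the partial derivatives are taken by central differences while treating x and v as independent variables, and Lagrange’s equation d(∂L/∂v)/dt = ∂L/∂x is then checked along x(t) = A·cos(ωt + α). For contrast, a trial motion with the wrong angular frequency (3 instead of ω = 2) leaves a clear residual.

```python
import math

m, k = 1.0, 4.0        # hypothetical mass and spring constant
A, alpha = 0.5, 0.3    # hypothetical amplitude and phase
omega = math.sqrt(k / m)

def lagrangian(x, v):
    """Step 1: L written in terms of position and velocity, L = L(x, v)."""
    return m * v**2 / 2 - k * x**2 / 2

def dL_dx(x, v, eps=1e-4):
    """Step 2: partial derivative with respect to x, holding v fixed."""
    return (lagrangian(x + eps, v) - lagrangian(x - eps, v)) / (2 * eps)

def dL_dv(x, v, eps=1e-4):
    """Step 2: partial derivative with respect to v, holding x fixed."""
    return (lagrangian(x, v + eps) - lagrangian(x, v - eps)) / (2 * eps)

def residual(x_of_t, v_of_t, t, dt=1e-4):
    """Step 3: d(∂L/∂v)/dt - ∂L/∂x along a trial motion; ~0 if Lagrange's equation holds."""
    d_dt = (dL_dv(x_of_t(t + dt), v_of_t(t + dt))
            - dL_dv(x_of_t(t - dt), v_of_t(t - dt))) / (2 * dt)
    return d_dt - dL_dx(x_of_t(t), v_of_t(t))

def good_x(t):   # step 4's solution
    return A * math.cos(omega * t + alpha)

def good_v(t):
    return -A * omega * math.sin(omega * t + alpha)

def bad_x(t):    # a wrong trial motion, with angular frequency 3
    return A * math.cos(3 * t + alpha)

def bad_v(t):
    return -3 * A * math.sin(3 * t + alpha)

for t in (0.2, 0.9, 1.6):
    print(t, residual(good_x, good_v, t), residual(bad_x, bad_v, t))
```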

So, yes, we’re solving the same second-order differential equation here. So you’ll wonder what the difference is between Newtonian and Lagrangian mechanics. Fortunately, I’d say, we end up with the same equation, because we don’t want any other equation(s) of motion: we’re talking about the same system. The point is: we got that differential equation using an entirely different procedure, which I actually didn’t explain at all: I just said to compute this and then that and… Surprise, surprise! We got the same differential equation in the end. 🙂 So, yes, the Newtonian and Lagrangian approach to modeling a dynamic system yield the same equations, but the Lagrangian method is much more (very much more, I should say) convenient when we’re dealing with lots of moving bits and if there are more directions (i.e. degrees of freedom) in which they can move.

In short, Lagrange could solve a problem more rapidly than Newton with his modeling approach and so that’s why his approach won out. 🙂 In fact, you’ll usually see the spatial variables denoted as q_j. In this notation, j = 1, 2,… n, and n is the number of degrees of freedom, i.e. the directions in which the various particles can move. And then, of course, you’ll usually see a second subscript i = 1, 2,… m to keep track of every q_j for each and every particle in the system, so we’ll have n×m q_ij’s in our model and so, yes, it’s good to stick to Lagrange in that case.

OK. You get that, I assume. Let’s move on to Hamiltonian mechanics now.

III. Hamiltonian mechanics

The steps here are the following. [Again, I am just explaining the how, not the why. You can find mathematical proofs of why this works in handbooks or, better still, on the Web.]

The first step is very similar to the one above. In fact, it’s exactly the same: write T and V as a function of velocity (v) and position (x) respectively and construct the Lagrangian. So, once again, we have L = L(x, v). In our example: L(x, v) = mv²/2 – kx²/2.
The second step, however, is different. Here, the theory becomes more abstract, as the Hamiltonian approach does not only keep track of the position but also of the momentum of the particles in a system. Position (x) and momentum (p) are so-called canonical variables in Hamiltonian mechanics, and the relation with Lagrangian mechanics is the following: p = ∂L/∂v. Huh? Yeah. Again, don’t worry about the why. Just check it for our example: ∂(mv²/2 – kx²/2)/∂v = 2mv/2 = mv. So, yes, it seems to work. Please note, once again, how we treat x and v as independent variables here, as is evident from the use of the symbol for partial derivatives. Let me get back to the lesson, however. The second step is: calculate the conjugate variables. In more familiar wording: compute the momenta.
The third step is: write down (or ‘build’ as you’ll see it, but I find that wording strange too) the Hamiltonian function H = T + V. We’ve got the same problem here as the one I mentioned with the Lagrangian: there’s more than one way to express T and V. Hence, we need some more guidance. Right you are! When writing your Hamiltonian, you need to make sure you express the kinetic energy as a function of the conjugate variable, i.e. as a function of momentum, rather than velocity. So we have H = H(x, p), not H = H(x, v)! In our example, we have H = T + V = p²/2m + kx²/2.
Finally, write and solve the following set of equations: (I) ∂H/∂p = dx/dt and (II) –∂H/∂x = dp/dt. [Note the minus sign in the second equation.] In our example: (I) p/m = dx/dt and (II) –kx = dp/dt. The first equation is actually nothing but the definition of p: p = mv, and the second equation is just Hooke’s law: F = –kx. However, from a formal-mathematical point of view, we have two first-order differential equations here (as opposed to one second-order equation when using the Lagrangian approach), which should be solved simultaneously in order to find position and momentum as a function of time, i.e. x(t) and p(t). The end result should be the same: x(t) = Acos(ωt + α) and p(t) = … Well… I’ll let you solve this: time to brush up your knowledge about differential equations. 🙂
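Hamilton’s two first-order equations can also be integrated directly, as a minimal sketch. It assumes m = 1, k = 4 and a mass released from rest at x = 0.5, takes the partial derivatives of H(x, p) numerically, and steps x and p forward together with the leapfrog scheme (my choice of integrator; it fits here because H splits into a part that depends on p only and a part that depends on x only). The last comparison uses p(t) = m(dx/dt) = –mAω·sin(ωt + α), so skip it if you want to keep the exercise for yourself.

```python
import math

m, k = 1.0, 4.0     # hypothetical mass and spring constant
x0, p0 = 0.5, 0.0   # hypothetical start: released from rest at x = 0.5

def hamiltonian(x, p):
    """H = H(x, p): kinetic energy written with momentum, plus potential energy."""
    return p**2 / (2 * m) + k * x**2 / 2

def dH_dx(x, p, eps=1e-6):
    return (hamiltonian(x + eps, p) - hamiltonian(x - eps, p)) / (2 * eps)

def dH_dp(x, p, eps=1e-6):
    return (hamiltonian(x, p + eps) - hamiltonian(x, p - eps)) / (2 * eps)

def evolve(t_end, dt=1e-4):
    """Leapfrog integration of (I) dx/dt = ∂H/∂p and (II) dp/dt = -∂H/∂x."""
    x, p = x0, p0
    for _ in range(round(t_end / dt)):
        p -= 0.5 * dt * dH_dx(x, p)   # half step for (II)
        x += dt * dH_dp(x, p)         # full step for (I)
        p -= 0.5 * dt * dH_dx(x, p)   # second half step for (II)
    return x, p

omega = math.sqrt(k / m)
A, alpha = x0, 0.0   # from x0 = A cos(alpha) and p0 = 0
t = 2.0
x_num, p_num = evolve(t)
print(x_num, A * math.cos(omega * t + alpha))
print(p_num, -m * A * omega * math.sin(omega * t + alpha))  # p = m dx/dt
```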

You’ll say: what the heck? Why are you making things so complicated? Indeed, what am I doing here? Am I making things needlessly complicated?

The answer is the usual one: yes, and no. Yes. If we’d want to do stuff in the classical world only, the answer seems to be: yes! In that case, the Lagrangian approach will do and may actually seem much easier, because we don’t have a set of equations to solve. And why would we need to keep track of p(t)? We’re only interested in the equation(s) of motion, aren’t we? Well… That’s why the answer to your question is also: no! In classical mechanics, we’re usually only interested in position, but in quantum mechanics that concept of conjugate variables (like x and p indeed) becomes much more important, and we will want to find the equations for both. So… Yes. That means a set of differential equations (one for each variable (x and p) in the example above) rather than just one. In short, the real answer to your question in regard to the complexity of the Hamiltonian modeling approach is the following: because the more abstract Hamiltonian approach to mechanics is very similar to the mathematics used in quantum mechanics, we will want to study it, because a good understanding of Hamiltonian mechanics will help us to understand the math involved in quantum mechanics. And so that’s the reason why physicists prefer it to the Lagrangian approach.

[…] Really? […] Well… At least that’s what I know about it from googling stuff here and there. Of course, another reason for physicists to prefer the Hamiltonian approach may well be that they think social science (like economics) isn’t real science. Hence, we – social scientists – would surely expect them to develop approaches that are much more intricate and abstract than the ones that are being used by us, wouldn’t we?

[…] And then I am sure some of it is also related to the Anglo-French thing. 🙂

Post scriptum 1 (dated 21 March 2016): I hate to write about stuff and just explain the how—rather than the why. However, in this case, the why is really rather complicated. The math behind is referred to as calculus of variations – which is a rather complicated branch of mathematics – but the physical principle behind is the Principle of Least Action. Just click the link, and you’ll see how the Master used to explain stuff like this. It’s an easy and difficult piece at the same time. Near the end, however, it becomes pretty complicated, as he applies the theory to quantum mechanics, indeed. In any case, I’ll let you judge for yourself. 🙂

Post scriptum 2 (dated 13 September 2017): I started a blog on the Exercises on Feynman’s Lectures, and the posts on the exercises on Chapter 4 have a lot more detail, and basically give you all the math you’ll ever want on this. Just click the link. However, let me warn you: the math is not easy. Not at all, really.

Logarithms: a bit of history (and the main rules)

Pre-scriptum (dated 26 June 2020): This post did not suffer much – if at all – from the attack by the dark force—which is good because I still like it. Enjoy !

Original post:

This post will probably be of little or no interest to you. I wrote it to get somewhat more acquainted with logarithms myself. Indeed, I struggle with them. I think they come across as difficult because we don’t learn about the logarithmic function when we learn about the exponential function: we only learn logarithms later – much later. And we don’t use them a lot: exponential functions pop up everywhere, but logarithms not so much. Therefore, we are not as familiar with them as we should be.

The second issue is notation: x = log_a(y) looks more terrifying than y = a^x because… Well… Too many letters. It would be more logical to apply the same economy of symbols. We could just write x = ₐy instead of log_a(y), for example, using a subscript in front of the variable–as opposed to a superscript behind the variable, as we do for the exponential function. Or, else, we could be equally verbose for the exponential function and write y = exp_a(x) instead of y = a^x. In fact, you’ll find such more explicit expressions in spreadsheets and other software, because these don’t take subscripts or superscripts. And then, of course, we also have the use of the Euler number e in e^x and ln(x). While it’s just a real number, e is not as familiar to us as π, and that’s again because we learned trigonometry before we learned advanced calculus.

Historically, however, the exponential and logarithmic functions were ‘invented’, so to say, around the same time and by the same people: they are associated with John Napier, a Scot (1550–1617), and Henry Briggs, an Englishman (1561–1630). Briggs is best known for the so-called common (i.e. base 10) logarithm tables, which he published in 1624 as the Arithmetica Logarithmica. It is logical that the mathematical formalism needed to deal with both was invented around the same time, because they are each other’s inverse: if y = a^x, then x = log_a(y).

These Briggs tables were used, in their original format more or less, until computers took over. Indeed, it’s funny to read what Feynman writes about these tables in 1965: “We are all familiar with the way to multiply numbers if we have a table of logarithms.” (Feynman’s Lectures, p. 22-4). Well… Not any more. And those slide rules, or slipsticks as they were called in the US, have disappeared as well, although you can still find circular slide rules on some expensive watches, like the one below.

It’s a watch for aviators, and it allows them to rapidly multiply numbers indeed: the time multiplied by the speed will give a pilot the distance covered. Of course, there’s all kinds of intricacies here because we’ll measure time in minutes (or even seconds), and speed in knots or miles per hour, and so that explains all the other fancy markings on it. 🙂 In case you have one, now you know what you’re paying for! A real aviator watch! 🙂

How does it work? Well… These slide rules can be used for a number of things but their most basic function is to multiply numbers indeed, and that function is based on the rule log_b(ac) = log_b(a) + log_b(c). In fact, this works for any base, so we can just write log(ac) = log(a) + log(c). So the numbers on the slide rule below are a, c and ac. Note that the scales start with 1 because we’re working with positive numbers only and log(1) = 0, so that corresponds with the zero point indeed. The example below is simple (2 times 3 is six, obviously): it would have been better to demonstrate 1.55×2.35 or something. But you see how it goes: we add log(2) and log(3) to get log(6) = log(2×3). For 1.55×2.35, the slider would show a position between 3.6 and 3.7. The calculator on my $30 Nokia phone gives me 3.6425. So, yes, it’s not far off. However, it’s hard to imagine that engineers and scientists actually used these slide rules over the past 300 years or so, if not longer.
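A slide rule is, in effect, an analog adder of logarithms. This minimal sketch imitates it with Python’s math.log10: put one length after the other, then read off the number at the combined length. A real slide rule only gives two or three significant digits, of course; the code works at full floating-point precision.

```python
import math

def slide_rule_multiply(a, c):
    """Multiply two positive numbers the slide-rule way: add their logarithms."""
    length = math.log10(a) + math.log10(c)   # slide one scale along the other
    return 10 ** length                       # read off the number at that length

print(slide_rule_multiply(2, 3))        # about 6
print(slide_rule_multiply(1.55, 2.35))  # about 3.6425
print(1.55 * 2.35)                      # direct product, for comparison
```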

Of course, Briggs’ tables are more accurate. It’s quite amazing really: he calculated the logarithms of 30,000 (natural) numbers to fourteen decimal places. It’s quite instructive to check how he did that: all he did, basically, was to calculate successive square roots of 10.

Huh?

Yes. The secret behind it is the basic rule of exponentiation: exponentiation is repeated multiplication, and so we can write a^(m+n) = a^m·aⁿ and, more importantly, a^(m–n) = a^m·a^(–n) = a^m/aⁿ. Because Briggs used the common base 10, we should write 10^(m–n) = 10^m/10ⁿ. Now, Briggs had a table with the successive square roots of 10, like the one below (it only shows six digits after the decimal point, not fourteen, but I just want to demonstrate the principle here), and that’s basically what he used to calculate the logarithm (to base 10) of 30,000 numbers! Talk about patience! Can you imagine him doing that, day after day, week after week, month after month, year after year? Wow!
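The table of successive square roots of 10 is easy to regenerate. This minimal sketch simply takes square roots over and over, starting from 10, and prints each entry to six decimal places; the entries should match the values used in the calculation that follows (3.162278, 1.778279, 1.154782, 1.074608, 1.036633 and 1.009035).

```python
import math

def successive_roots_of_ten(levels=10):
    """List of (n, 10**(1/2**n)) for n = 1..levels, built by repeated square roots."""
    table, value = [], 10.0
    for n in range(1, levels + 1):
        value = math.sqrt(value)   # each entry is the square root of the previous one
        table.append((n, value))
    return table

for n, value in successive_roots_of_ten():
    print(f"10^(1/{2**n}) = {value:.6f}")
```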

So how did he do it? Well… Let’s do it for x = log₁₀(2) = log(2). So we need to find some x for which 10^x = 2. From the table above, it’s obvious that log(2) cannot be 1/2 (= 0.5), because 10^(1/2) = 3.162278, so that’s too big (bigger than 2). Hence, x = log(2) must be smaller than 0.5 = 1/2. On the other hand, we can see that x will be bigger than 1/4 = 0.25 because 10^(1/4) = 1.778279, and so that’s less than 2.

In short, x = log(2) will be between 0.25 (= 1/4) and 0.5. What Briggs did then, is to take that 10^(1/4) factor out using the 10^(m–n) = 10^m/10ⁿ formula indeed:

10^(x–0.25) = 10^x/10^0.25 = 2/1.778279 = 1.124683

If you’re panicking already, relax. Just sit back. What we’re doing here, in this first step, is to write 2 as

2 = 10^x = 10^[0.25 + (x – 0.25)] = 10^(1/4)·10^(x – 0.25) = (1.778279)(1.124683)

[If you’re in doubt, just check using your calculator.] We now need log(10^(x – 0.25)) = log(1.124683). Now, 1.124683 is between 1.154782 (= 10^(1/16)) and 1.074608 (= 10^(1/32)) in the table. So we’ll use the lower value (10^(1/32)) to take another factor out. Hence, we do another division: 1.124683/1.074608 = 1.046598. So now we have 2 = 10^x = 10^[1/4 + 1/32 + (x – 1/4 – 1/32)] = (1.778279)(1.074608)(1.046598).

We now need log(10^(x – 1/4 – 1/32)) = log(1.046598). We check the table once again, and see that 1.046598 is bigger than the value for 10^(1/64), so now we can take that 10^(1/64) factor out by doing another division: 10^(x – 1/4 – 1/32)/10^(1/64) = 1.046598/1.036633 = 1.009613. Wow, this is getting small! It’s smaller than the value for 10^(1/128) (1.018152), so we skip that one, but we can still take an additional factor out because it’s larger than the 1.009035 value for 10^(1/256) in the table. So we can do another division: 1.009613/1.009035 = 1.000573. So now we have 2 = 10^x = 10^[1/4 + 1/32 + 1/64 + 1/256 + (x – 1/4 – 1/32 – 1/64 – 1/256)] = 10^(1/4)·10^(1/32)·10^(1/64)·10^(1/256)·10^(x – 1/4 – 1/32 – 1/64 – 1/256) = (1.778279)(1.074608)(1.036633)(1.009035)(1.000573).

Now, the last factor is outside of the range of our table: it’s too small to find a fraction. However, we had a linear approximation based on the gradient for very small exponents r: 10^r ≈ 1 + 2.302585·r, with 2.302585 = ln(10), the natural logarithm of 10. So, in this case, we have 1.000573 = 1 + 2.302585·r and, hence, we can calculate r as 0.000573/2.302585 ≈ 0.000249. [I can show where this approximation comes from: just check my previous posts if you want to know. It’s not difficult.] So, now, we can finally write the result of our iterations:

2 = 10^x ≈ 10^(1/4 + 1/32 + 1/64 + 1/256 + 0.000249)

So log(2) is approximated by 0.25 + 0.03125 + 0.015625 + 0.00390625 + 0.000249 ≈ 0.301030. Now, you can check this easily: it’s essentially correct, to an accuracy of six decimal places that is!
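The whole procedure fits in a short greedy loop. This is my own sketch of the idea, not a reconstruction of how Briggs actually organized his work: divide out every root 10^(1/2ⁿ) that still fits, add up the corresponding exponents, and finish with the linear approximation 10^r ≈ 1 + ln(10)·r for whatever is left. It assumes 1 ≤ y < 10. With eight levels (roots down to 10^(1/256), as in the example) it takes out the same factors as above for log(2); with more levels the leftover factor gets closer to 1 and the result gets more accurate.

```python
import math

LN10 = math.log(10)  # 2.302585..., the slope in 10**r ≈ 1 + 2.302585·r for small r

def briggs_log10(y, levels=8):
    """Approximate log10(y) for 1 <= y < 10 by dividing out successive roots of 10."""
    roots, value = [], 10.0
    for n in range(1, levels + 1):
        value = math.sqrt(value)
        roots.append((1 / 2**n, value))       # (exponent, 10**exponent)
    exponent, rest = 0.0, y
    for e, root in roots:
        if rest >= root:                      # this factor still fits: take it out
            rest /= root
            exponent += e
    return exponent + (rest - 1) / LN10       # linear approximation for the leftover

print(briggs_log10(2))             # about 0.301030, using the 8-level table
print(briggs_log10(2, levels=20))  # closer still
print(math.log10(2))               # library value, for comparison
```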

Hmm… But how did Briggs calculate these square roots of 10? Well… That was done ‘by cut and try’ apparently! Pf-ff ! Talk of patience indeed ! I think it’s amazing ! And I am sure he must have kept this table with the square roots of 10 in a very safe place ! 🙂

So, why did I show this? Well… I don’t know. Just to pay homage to those 17th century mathematicians, I guess. 🙂 But there’s another point as well. While the argument above basically demonstrated the a^(m+n) = a^m·aⁿ formula or, to be more precise, the a^(m–n) = a^m/aⁿ formula, it also shows the so-called product rule for logarithms:

log_b(ac) = log_b(a) + log_b(c)

Indeed, we wrote 2 as a product of individual factors 10^r, and then we could see that the exponents r of all of these individual factors add up to log(2). However, the more formal proof is interesting, and much shorter too: 🙂

Let m = log_a(x) and n = log_a(y)
Write in exponent form: x = a^m and y = aⁿ
Multiply x and y: xy = a^m·aⁿ = a^(m+n)
Now take log_a of both sides: log_a(xy) = log_a(a^(m+n)) = (m+n)·log_a(a) = m + n = log_a(x) + log_a(y)

You’ll notice that we used another rule in this proof, and that’s the so-called power rule for logarithms:

log_a(xⁿ) = n·log_a(x)

This power rule is proved as follows:

Let m = log_a(x)
Write in exponent form: x = a^m
Raise both sides to the power of n: xⁿ = (a^m)ⁿ = a^(mn)
Convert back to a logarithmic equation: log_a(xⁿ) = mn
Substitute for m = log_a(x): log_a(xⁿ) = n·log_a(x)

Are there any other rules?

Yes. Of course, we have the quotient rule:

log_a(x/y) = log_a(x) – log_a(y)

The proof of this follows the proof of the product rule, and so I’ll let you work on that.
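The three rules are easy to spot-check numerically with Python’s math.log(x, base). The test values and bases below are arbitrary; each difference should be zero up to floating-point rounding (and the power rule holds for non-integer n as well).

```python
import math

def check_log_rules(x, y, n, a):
    """Differences for the product, power and quotient rules in base a; each should be ~0."""
    product = math.log(x * y, a) - (math.log(x, a) + math.log(y, a))
    power = math.log(x ** n, a) - n * math.log(x, a)
    quotient = math.log(x / y, a) - (math.log(x, a) - math.log(y, a))
    return product, power, quotient

print(check_log_rules(8.0, 32.0, 3, 2.0))      # three numbers at (or extremely close to) 0
print(check_log_rules(1.55, 2.35, 2.5, 10.0))  # same, with a non-integer power
```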

Finally, we have the ‘change-of-base’ rule, which shows us how we can easily switch from one base to another indeed:

log_a(b) = log_c(b)/log_c(a)

The proof is as follows:

Let x = log_a b
Write in exponent form: a^x= b
Take log_c of both sides and evaluate:

log_c(a^x) = log_c(b) ⇒ x·log_c(a) = log_c(b) ⇒ log_a(b) = x = log_c(b)/log_c(a)
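A quick numerical check of the change-of-base rule: computing log₁₀(1000) through several different intermediate bases c should always give 3, up to rounding. The helper name log_base is my own.

```python
import math

def log_base(b, a, c=math.e):
    """log_a(b) via the change-of-base rule: log_c(b) / log_c(a), for any valid base c."""
    return math.log(b, c) / math.log(a, c)

# The result must not depend on the intermediate base c
for c in (math.e, 10.0, 2.0, 7.5):
    print(c, log_base(1000.0, 10.0, c))   # about 3 every time
```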

[I copied these rules and proofs from onlinemathlearning.com, so let me acknowledge that here. :-)]

Is that it? Well… Yes. Or no. Let me add a few more lines on the logarithmic scales that you often encounter in various graphs. It’s the same kind of scale as the one on the slide rule that we showed above, but it covers several orders of magnitude, all equally spaced: 1, 10, 100, 1000, etcetera, instead of 0, 1, 2, 3, etcetera. So each unit step on the scale corresponds to a unit increase of the exponent for a given base (base 10 in this case): 10¹, 10², 10³, etcetera. The illustration below (which I took from Wikipedia) compares logarithmic scales to linear ones, for one or both axes.

So, on a logarithmic scale, the distance from 1 to 100 is the same as the distance from 10 to 1000, or the distance from 0.1 to 10, or the distance between any two points one of which is 100 (= 10²) times the other. This is easily explained by the product rule, or the quotient rule rather:

log(10) – log(0.1) = log(10¹/10⁻¹) = log(10²) = 2

= log(1000) – log(10) = log(10³/10¹) = log(10²) = 2

= log(100) – log(1) = log(10²/10⁰) = log(10²) = 2

And, of course, we could say the same for the distance between 1 and 1000, and between 0.1 and 100. The distance on the scale is 3 units here, because one point is 1000 (= 10³) times the other.
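The equal spacing is easy to check: on a base-10 logarithmic scale, the distance between two positive values a and b is just log(b) – log(a). A minimal sketch with the pairs used above:

```python
import math

def scale_distance(a, b):
    """Distance between positive values a and b on a base-10 logarithmic scale."""
    return math.log10(b) - math.log10(a)

for a, b in [(0.1, 10), (10, 1000), (1, 100), (1, 1000), (0.1, 100)]:
    print(a, b, scale_distance(a, b))   # 2, 2, 2, then 3, 3 (up to rounding)
```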

Why would we use logarithmic scales? Well… Large quantities are often better expressed like that. For example, the Richter scale used to measure the magnitude of an earthquake is just a base-10 logarithmic scale. By magnitude, we mean the amplitude of the seismic waves here. So an earthquake that registers 5.0 units on the Richter scale has a ‘shaking amplitude’ that is 10 times greater than that of an earthquake that registers 4.0. Both are fairly light earthquakes, however: magnitude 7, 8 or 9 are the big killers. Note that, theoretically, we could have earthquakes of a magnitude higher than 10 on the Richter scale: scientists think that the asteroid that created the Chicxulub crater created a cataclysm that would have measured 13 on Richter’s scale, and they associate it with the extinction of the dinosaurs.

The decibel, measuring the level of sound, is another logarithmic unit, so the power associated with 40 decibels is not two times but one hundred times that of 20 decibels!
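Both scales can be turned back into plain ratios with a one-line formula each. This minimal sketch assumes, as in the text, that each Richter unit is a factor of 10 in amplitude, and uses the standard decibel definition for power, 10·log₁₀(P₁/P₂), so that every 10 dB is a factor of 10 in power.

```python
def richter_amplitude_ratio(m1, m2):
    """Amplitude ratio between magnitudes m1 and m2: one unit is a factor of 10."""
    return 10 ** (m1 - m2)

def decibel_power_ratio(db1, db2):
    """Power ratio between two levels in decibels: 10 dB is a factor of 10."""
    return 10 ** ((db1 - db2) / 10)

print(richter_amplitude_ratio(5.0, 4.0))  # 10.0
print(decibel_power_ratio(40, 20))        # 100.0
```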

Now that we’re talking sound, it seems that logarithmic scales are more ‘natural’ when it comes to human perception in general, but I’ll let you have fun googling some more stuff on that! 🙂

Real exponentials and double roots: a post for my kids

Pre-scriptum (dated 26 June 2020): This post did not suffer much – if at all – from the attack by the dark force—which is good because I still like it. Enjoy !

Original post:

There is one loose end related to exponentials that I want to tie up here. It’s the issue of multiple roots (or multiple-valuedness as it’s called in the context of inverse functions).

Introduction

You’ll remember that, for integer exponents n, we had two inverse operations for aⁿ:

The logarithm: the instruction here is to find n (i.e. the exponent) given the value aⁿ and given a (i.e. the base).
The ‘nth root’ function: the instruction here is to find a (i.e. the base) given the value aⁿ and given n (i.e. the exponent).

We have two inverse operations because the exponentiation operation is not commutative: while a + b = b + a (and, therefore, a×b = b×a, so multiplication is commutative as well), aⁿ is generally not the same as nᵃ (except in special cases, such as a = n, or 2⁴ = 4² = 16).

Having two inverse operations is somewhat confusing, of course. However, when we expand the domain of the exponential function to also include rational exponents, the ‘nth root’ function becomes an exponential function itself: a^(1/n). That’s nice, because it tidies things up. We only have one inverse operation now: the logarithm.

Now, my kids understand exponentials, but they find logarithms weird. There are two reasons for that. The most important one is that we don’t learn about the logarithm function when we learn about the exponential function. We only learn logarithms later – much later. Therefore, we are not as familiar with them as we should be. There is no good reason for that but that’s what it is. [I guess I am like Euler here: I’d suggest logarithms and complex numbers should be taught earlier in life. Then we would have less trouble understanding them.]

The second one is notation, I think. Indeed, x = log_a(y) looks much more frightening than y = a^x because… Well… Too many letters. It would be more logical to apply the same economy of symbols. We could just write x = ₐy instead of log_a(y), for example, using a subscript in front of the variable – as opposed to a superscript behind the variable, as we do for the exponential function. Or, else, we could be equally verbose for the exponential function and write y = exp_a(x) instead of y = a^x. In fact, you’ll find such more explicit expressions in spreadsheets and other software, because these don’t take subscripts or superscripts.
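Python is one example of such software. As a small illustration using only the standard `math` module (nothing specific to this post), here are the two inverse operations for y = aⁿ: recovering n with a logarithm, and a with a fractional power. The last lines check that exponentiation is not commutative.

```python
import math

a, n = 2, 10
y = a ** n                      # 1024

n_back = math.log(y, a)         # logarithm: find the exponent, given y and the base
a_back = y ** (1 / n)           # n-th root, written as the exponential y^(1/n)
print(n_back, a_back)

print(2 ** 3, 3 ** 2)           # a^n and n^a generally differ
print(2 ** 4 == 4 ** 2)         # a rare exception: both are 16
```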

In any case, that’s not the point here. I will come back to the logarithmic function later. The point that I want to discuss here is that, while we sort of merged our ‘nth root’ function with our exponential function as we allowed for rational exponents as well (as opposed to integers only), we’re actually still taking roots, so to say, and then we note another problem: the square root function yields not one but two numbers when the base (a) is real and positive: ±a^(1/2).

In fact, that’s a more general problem.

Odd and even rational exponents

You’ll remember the following rules for exponentiation:

1. For a positive real number a, we always have two real nth roots when n is even: ±a^(1/n). That’s obviously a consequence of having two real square roots ±a^(1/2), because the definition of even parity is that n can be written as n = 2k with k any (non-zero) integer, i.e. k ∈ ℤ (so k can be negative). Hence, a^(1/n) can then be written as a^(1/(2k)) = (a^(1/k))^(1/2). Hence, whatever the value of a^(1/k) (if k is even, then we have two kth roots once again, but that doesn’t matter), we will have two real roots: plus (a^(1/k))^(1/2) and minus (a^(1/k))^(1/2).

2. If n is uneven (or odd, I should say), so n ∈ {2k+1: k ∈ ℤ}, we have only one real root a^(1/(2k+1)): that root is positive when a is positive and negative when a is negative.

3. For the sake of completeness, let me add the third case: a is negative and n is even. We know there’s no real nth root of a in that case. That’s why mathematicians invented i: for n = 2, we associate the square root of a negative real number with two imaginary roots: ±i·|a|^(1/2). (For larger even n, there are n complex roots, none of which is real.)

The first and second case are illustrated below for n = 2 and n = 3 respectively. The complex roots of the third case cannot be visualized there because y is a real axis. Of course, we could imagine the complex roots ±i·|a|^(1/2) if we would flip or mirror the blue and red graph (i.e. the graphs for n = 2) along the vertical axis and re-label that axis as the iy-axis, i.e. the imaginary axis. But I’ll leave that to your imagination indeed.
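The three cases can be summarized in a few lines of Python. This is only a sketch: `real_nth_roots` is a hypothetical helper name, and it returns the real roots only, so case 3 simply yields an empty list.

```python
def real_nth_roots(a, n):
    """Real n-th roots of a real number a, for an integer n >= 1 (hypothetical helper)."""
    if a == 0:
        return [0.0]
    r = abs(a) ** (1.0 / n)               # the positive n-th root of |a|
    if n % 2 == 0:
        # Case 1: even n and a > 0 -> two real roots; case 3: even n and a < 0 -> none
        return [r, -r] if a > 0 else []
    # Case 2: odd n -> exactly one real root, with the same sign as a
    return [r] if a > 0 else [-r]

print(real_nth_roots(9, 2))    # two roots
print(real_nth_roots(-8, 3))   # one negative root
print(real_nth_roots(-4, 2))   # no real roots
```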

How does this parity business turn out for rational exponents?

If r is a rational number r = m/n, we’ll have to express it as an irreducible fraction first, so the numerator m and denominator n have no common divisors other than 1 (or –1, when considering negative numbers). But let’s look at positive numbers first. If we write r as an irreducible fraction m/n, then m and n cannot both be even. Why not? Because m and n could then both be divided by 2, and m/n would not be an irreducible fraction. Let’s assume m is even, so m = 2k. Hence, n must be odd in that case. We can then write a^(2k/n) as (a^(k/n))². This number will always be positive, because we are squaring something. So it doesn’t matter if a^(k/n) has one or two roots: we’ll square them and so the result will always be positive.

Now let’s assume the second possibility: m is odd. We can then write a^(m/n) as (a^(1/n))^m. So now it will depend on whether or not n is even. If n is even, we have two real roots; if n is odd, then we have only one. Let’s work a few examples:

8^(2/3) = (8^(1/3))² = 2² = 4
4^(3/2) = (4^(1/2))³ = (±2)³ = ±2³ = ±8
16^(1/4) = (16^(1/2))^(1/2) = 4^(1/2) = ±2 (the other square root of 16, i.e. –4, only yields the imaginary roots ±2i)
(–8)^(5/3) = ((–8)^(1/3))⁵ = (–2)⁵ = –32
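These examples can be checked with a short Python sketch. `real_rational_powers` is a hypothetical helper; it relies on `fractions.Fraction`, which always stores m/n in lowest terms with a positive denominator, so the parity of n is well defined. It assumes a ≠ 0.

```python
from fractions import Fraction

def real_rational_powers(a, r):
    """All real values of a^r for a rational exponent r = m/n (hypothetical helper; assumes a != 0)."""
    r = Fraction(r)                          # lowest terms, denominator n > 0
    m, n = r.numerator, r.denominator
    base = abs(a) ** (1.0 / n)
    if n % 2 == 0:
        if a < 0:
            return []                        # even root of a negative number: no real value
        roots = [base, -base]                # even n: two real n-th roots
    else:
        roots = [base if a > 0 else -base]   # odd n: one real n-th root
    # Raise each n-th root to the power m; rounding merges duplicates such as 2² = (-2)² = 4
    return sorted({round(x ** m, 9) for x in roots}, reverse=True)

for a, r in [(8, "2/3"), (4, "3/2"), (16, "1/4"), (-8, "5/3")]:
    print(f"{a}^({r}) ->", real_rational_powers(a, r))
```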

So we have two roots if m is odd and n is even, and only one root in all other cases. However, we said that m and n cannot both be even, hence, if n is even, m must be odd. In short, we can say that a rational exponent m/n is even (i.e. there will be two roots), if n is even. Does that work for complex roots as well? Let’s work that out with an example:

(–4)^(3/2) = ((–4)^(1/2))³ = (±2i)³ = (±2)³·i³ = ±8·(–i) = ∓8i

So, yes! It works for complex roots as well. 🙂
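Python’s standard `cmath` module can confirm this example. Note that `cmath.sqrt` returns only the principal square root (2i here), so the sketch adds the second root by hand.

```python
import cmath

principal = cmath.sqrt(-4)              # 2i: the principal square root of -4
square_roots = [principal, -principal]  # both square roots: 2i and -2i
cubes = [z ** 3 for z in square_roots]  # (2i)³ = -8i and (-2i)³ = 8i

print(square_roots)
print(cubes)
```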

OK. But let’s ask the obvious question now: where are these even numbers on the real line?

Well… They are everywhere: we can start from 1/2 and then change the numerator: 3/2, 5/2, etcetera. It’s all fine, as long as the numerator is odd. However, we can also go down and change the denominator: 1/4, 1/6, 1/8 etcetera. And then we can, of course, take odd multiples of these fractions once again, such as 1025/1024 = 1.0009765625, for example, or on the other side, 1023/1024 = 0.9990234375. So we have two even numbers here right next to the odd number 1. We may increase the precision: we could take 1 ± 1/3588, for example. 🙂

Of course, you may have noticed something here. The first thing is that we’ve defined these two even numbers 1.0009765625 and 0.9990234375 with a precision of 10 digits behind the decimal point, i.e. 1/1024 = 1/2¹⁰ = 0.0009765625. The second point to note is that the last digit of these two rational numbers, when expressed as a decimal, is 5. Now, you may think that should always be the case because of that 1/2 factor. But it’s not true: 1/6, for example, is a rational number that, written in decimal form, will yield 0.166666… This is an expression with a recurring decimal. And 1/10, of course, just yields 0.1. So there’s no easy rule here. You need to look at the fraction itself: a rational number is either a finite decimal or an infinite repeating decimal. Of course, there are rules for that, but this is not a post on number theory, so I won’t write anything more on this: you can Google some more stuff yourself if you’re interested in this.
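For the curious, one such rule fits in a few lines: a fraction in lowest terms has a finite decimal expansion exactly when its denominator has no prime factors other than 2 and 5. The helper name below is hypothetical; `Fraction` and `Decimal` are Python standard-library types.

```python
from decimal import Decimal
from fractions import Fraction

def has_finite_decimal(q):
    """True if the rational q has a finite decimal expansion (hypothetical helper)."""
    d = Fraction(q).denominator          # denominator in lowest terms
    for p in (2, 5):                     # strip the prime factors of 10
        while d % p == 0:
            d //= p
    return d == 1

print(Decimal(1025) / Decimal(1024))         # 1/2^10 = 5^10/10^10: exactly 10 decimals
print(has_finite_decimal(Fraction(1, 6)))    # 1/6 = 0.1666... repeats
print(has_finite_decimal(Fraction(1, 10)))   # 1/10 = 0.1
```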

Irrational exponents

How does the business of parity work for irrational exponents? The gist of the rather long story above can be summarized easily. We can write a^(m/n) as a^(m/n) = a^(m·(1/n)) = (a^(1/n))^m = a^(1/n)·a^(1/n)·…·a^(1/n) (m times), and so whether or not we have multiple roots (two instead of one) depends on whether or not n is even. Indeed, remember – once again – that exponentiation is repeated multiplication, and so for the sign of the result, what matters is whether the number of times that we do that multiplication is even or odd, not only for integer but for rational exponents as well.

For irrational exponents, we also have repeated multiplication, but now we have an infinite expression, not a finite one:

a^r = a^(r·(1/Δ + 1/Δ + 1/Δ + 1/Δ + …)) = a^(r/Δ)·a^(r/Δ)·a^(r/Δ)·a^(r/Δ)·…

I explained this expression in my previous post: 1/Δ is an infinitesimally small fraction. In fact, I calculated rational powers of e using the fraction 1/Δ = 1/1024 = 1/2¹⁰. I used that fraction because I had started backwards, taking successive square roots of e, so e^(1/2), and then e^(1/4), e^(1/8), e^(1/16), etcetera.

However, as I mentioned when I started doing that, there was no compelling reason to cut things up by dividing them in 2. We could use 1/3 as the fraction to start with and, then, of course, our fraction 1/Δ would have been equal to 1/3¹⁰ = 1/59049, so we have an odd number in the denominator here. So that’s one problem: we cannot say if Δ is even or odd. And the second problem, of course, is that it’s an infinite expression and, hence, we cannot say if we multiplied by a^(r/Δ) an even or an odd number of times.

That leads to the third problem: we cannot say if r itself is even or uneven, which is basically what we were looking at: can we define irrational exponents as even or odd?

In short, the answer is no. In practice, that means that we will associate a^r with one ‘rth root’ only.

Hmm… That obviously makes a lot of sense but how do we ‘justify’ it from a more formal point of view? Where do these negative roots (for even powers) go? I am not sure. I guess there must be some more formal argument but I’ll leave that to you to look it up. I am fairly happy with what Wikipedia writes on that:

“[Real] Powers of a positive real number are always positive real numbers. […] If the definition of exponentiation of real numbers is extended to allow negative results then the result is no longer well behaved.”

In fact, the article actually does give a somewhat more formal argument, as it writes:

Neither the logarithm method nor the rational exponent method can be used to define b^r as a real number for a negative real number b and an arbitrary real number r. Indeed, e^r is positive for every real number r, so ln(b) is not defined as a real number for b ≤ 0.
As for the rational exponent method, that cannot be used for negative values of b because it relies on continuity. The function f(r) = b^r has a unique continuous extension from the rational numbers to the real numbers for each b > 0. But when b < 0, the function f is not even continuous on the set of rational numbers r for which it is defined.

I am not quite sure I fully understand the last line, but I guess this refers to what I pointed out above: the even and odd exponents lie arbitrarily close to each other, so b^r keeps flipping sign as r changes ever so slightly. When we go from rational to irrational exponents, we can no longer define odd or even.
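To make that a bit more tangible, here is a minimal Python sketch (my own illustration, not taken from the Wikipedia article) of b^r for b = –8, restricted to rational exponents with an odd denominator, where a real value exists. The helper name is hypothetical. Exponents arbitrarily close to 1 give values near +8 or near –8, depending on whether the numerator is even or odd, so there is no continuous way to fill in the irrational exponents in between.

```python
from fractions import Fraction

def real_power_negative_base(b, r):
    """Real value of b^r for b < 0 and rational r = m/n (lowest terms) with n odd.

    Hypothetical helper: it takes the unique real n-th root of b (negative,
    since n is odd) and raises it to the power m.
    """
    r = Fraction(r)
    m, n = r.numerator, r.denominator
    if n % 2 == 0:
        raise ValueError("even denominator: no real value for a negative base")
    magnitude = abs(b) ** float(r)
    return magnitude if m % 2 == 0 else -magnitude

# Exponents closing in on 1 from above, alternately with even and odd numerators
for n in (21, 201, 2001):                # odd denominators
    for m in (n + 1, n + 2):             # n+1 is even, n+2 is odd; both fractions are irreducible
        r = Fraction(m, n)
        print(f"r = {m}/{n} ≈ {float(r):.6f}   (-8)^r = {real_power_negative_base(-8, r):+.4f}")
```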

The bottom line

The bottom line is that, in practice, we will only work with positive real bases. Hence, if b is negative, then we will define b^r as –(–b)^r. Huh?

Yes. Think about it. If b is negative, we’ll just multiply it with –1 to ensure that the base is a positive real number. And then we just put a minus in front to get a graph such as, for example, that x^(1/3) function for the negative side of the x-axis as well.
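Here is that convention as a minimal Python sketch (`signed_power` is a hypothetical name). It reproduces the real cube root of a negative number, whereas Python’s built-in `**` operator returns the principal complex root for a negative base with a non-integer exponent, and `math.pow` raises a `ValueError`. Note that the convention is meant for fractional exponents like 1/3: it would not give (–3)² = 9 for an even integer power.

```python
import math

def signed_power(b, r):
    """The convention above: for b < 0, define b^r as -((-b)^r)."""
    return b ** r if b >= 0 else -((-b) ** r)

print(signed_power(-8, 1 / 3))    # the real cube root of -8, i.e. about -2
print((-8) ** (1 / 3))            # built-in power: the principal *complex* root

try:
    math.pow(-8, 1 / 3)
except ValueError as err:         # math.pow refuses non-integer powers of negative numbers
    print("math.pow:", err)
```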

You should also note that most applications, like the one I use to draw simple graphs like the ones above (rechneronline.de/function-graphs), are not capable of showing you both roots. This one does check whether the exponent is even or odd, though, because it plots the function x^(1/3) on both sides of the zero point, and the x^(1/2) graph on the positive side only: it’s just not capable of associating more than one y value with one x value. [In case you’re curious to see what it does with an irrational exponent, go and check it yourself: you can put in x^pi or x^e. Will it give function values for negative values of x as well? What’s your guess? :-)]

You’ll wonder why I am emphasizing this point. Well… I just wanted to note that we should be aware of the fact that, as we go from rational to irrational exponents, we sort of deliberately ‘forget’ about the second (negative) root. The point to note is that the issue of multiple-valued functions – such as discussed in the context of, for example, Riemann surfaces – is not necessarily related to complex-valued functions. We have it here (double roots), and we also have it, in general, for periodic functions.

But that’s for a next post. And there we’ll use our ‘natural’ exponential e^x, and its inverse function, ln(x), an awful lot. So I’ll just conclude here with their graphs, noting, as Wikipedia does, that, nowadays, the term ‘exponential function’ is almost exclusively used as a shortcut for describing the natural exponential function e^x. But, to my kids, I say: it’s good that you know where it comes from. 🙂

Post scriptum:

When thinking about such minor things, it’s always good to think about why we are manipulating all these symbols. Exponentiation is repeated multiplication. What does it mean to multiply something with a negative number? A minus sign is an instruction to reverse direction, to turn around, 180 degrees. So we multiply the magnitudes of both numbers a and b, but we change the direction: if we’re walking down the positive real axis, then now we’re walking down the negative axis.

So repeated multiplication with a negative real number means we’re switching back and forth, wildly jumping from the positive to the negative side of the zero point and then back again. You’ll admit you would appreciate being told in advance how many times we need to do the multiplication if the multiplier is negative: if n is even, then we’ll end up going in the same direction: (–1)ⁿ = 1. No sign reversal. If n is odd, then we know that, besides the ‘booster’ effect (i.e. the exponentiation operation), we’re expected to speed in the opposite direction: (–1)ⁿ = –1.

Hence, if b would happen to be a negative real number, then defining b^r as –(–b)^r, or assuming that, in general, our base will be a positive real number, makes sense. Of course, the math has to keep track of the theoretical possibility that, if the exponent would happen to be even, b might be a negative number, but you can see it’s more of a theoretical possibility indeed: not something we’d associate with something happening in real life.

In that sense, I should note that multiplication with a complex multiplier is much more ‘real-life’, so to say. Multiplying something with a complex number does the same to the magnitude of both numbers as real multiplication: it multiplies the magnitudes, thereby changing the scale. So the product of a vector that’s 2 units long and a vector that’s 3 units long will still be 6 units long. However, complex numbers also allow for a more gradual change of direction. Instead of just a gear to move forward and backward, we also get a steering wheel so to say: multiplying two complex numbers also adds their angles (as measured from some kind of zero direction obviously), besides multiplying their magnitudes. For example, suppose that the zero direction is east, and we have a vector pointing east indeed (that means its imaginary part is zero) that we need to multiply with a vector pointing north (so that’s a vector with a zero real part, along the imaginary axis), then the final vector will be pointing north.
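The ‘steering wheel’ is easy to check with Python’s standard `cmath` module: `cmath.polar` returns the length and the angle (in radians) of a complex number, and `cmath.rect` builds one from them. The sketch shows that lengths multiply and angles add.

```python
import cmath
import math

east = 2 + 0j          # 2 units long, pointing east (angle 0)
north = 3j             # 3 units long, pointing north (angle π/2)
length, angle = cmath.polar(east * north)
print(length, math.degrees(angle))     # 2 × 3 = 6 units long, 0 + 90 degrees: pointing north

# The same holds for arbitrary lengths and angles (given in radians)
z1 = cmath.rect(1.5, 0.4)
z2 = cmath.rect(2.0, 1.1)
print(cmath.polar(z1 * z2))            # length 1.5 × 2.0 = 3.0, angle 0.4 + 1.1 = 1.5
```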

However, with that subtlety comes complexity as well. With real numbers, you can go in the same direction by reversing direction two times, and so that’s why we have two second roots (i.e. two square roots) of 1: (a) +1, so then we just stay where we are, and (b) –1, so then we rotate by 180 degrees two times around the zero point: indeed, (–1)(–1) corresponds to two successive rotations by 180 degrees (or π in radians), clockwise or counterclockwise, it doesn’t matter: one full loop around the zero point will get us back to square one, or point 1, I should say. 🙂

With complex numbers, it all depends. The third root (i.e. the cube root) of 1 was only 1 in the real space but, in the complex space, we have three cube roots of unity. The first one (W³ = W⁰ = 1) is the root we’re used to: unity itself, so the angle here is zero, i.e. straight ahead. In fact, with 1, we just stay where we are: 1×1×1 = 1³ = 1 indeed. But that’s not the only way. The illustration below shows two other ways to end up where we are (i.e. at point 1):

The second cube root is W = e^(i2π/3): 120 degrees. You can see we get back to 1 by making three successive turns of 120 degrees indeed, so that’s one full loop around the origin. Using complex numbers (in polar notation), we write e^(i2π/3)×e^(i2π/3)×e^(i2π/3) = e^(i6π/3) = e^(i2π) = e⁰ = 1.
The third cube root is W² = e^(i4π/3): that’s 240 degrees! Indeed, here we get back to square one by making three successive turns of 4π/3 radians, i.e. by making two loops, in total, around the origin: e^(i4π/3)×e^(i4π/3)×e^(i4π/3) = e^(i12π/3) = e^(i4π) = e⁰ = 1.
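A short Python sketch of the general pattern: the n roots of unity are e^(i·2πk/n) for k = 0, 1, …, n–1, and multiplying the kth one by itself n times turns us through k full loops before landing back at 1. The helper name `roots_of_unity` is just an illustrative choice.

```python
import cmath
import math

def roots_of_unity(n):
    """The n complex n-th roots of 1: e^(i·2πk/n) for k = 0, 1, ..., n-1."""
    return [cmath.exp(2j * math.pi * k / n) for k in range(n)]

for k, w in enumerate(roots_of_unity(3)):
    angle = math.degrees(2 * math.pi * k / 3)
    # Multiplying w by itself three times turns through 3 × angle, i.e. k full loops
    print(f"k={k}: angle {angle:.0f} degrees, w^3 = {w ** 3:.6f}, loops around the origin: {k}")
```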

In short, we gain flexibility (of course, we also have four fourth roots, with which we make 0, 1, 2 and 3 loops around the origin respectively, five fifth roots, and so on), and the great Leonhard Euler was obviously right: complex numbers are more ‘natural’ numbers as they allow us to model real-life situations much better.

However, if you think that double roots are a problem… Well… Think again ! With complex numbers, the problem of multiple-valuedness is much more ‘real’, I’d say. 🙂

P.S: As mentioned in my previous post, I talk about that problem of multiple-valuedness when talking about Riemann surfaces in my October-November 2013 posts, so I won’t repeat what I wrote there. It’s about time I get back to both Feynman as well as Penrose. 🙂

Just one last (philosophical) question to test your understanding. Negative real numbers have no real square root. That includes –1 obviously. Why is that? Why do we have two real square roots for +1 and no (none!) (real) square roots for –1?

[…] No? Come on!

[…] OK. Let me tell you: it’s all a question of definition. What’s implicit here is that we have only one real direction: from zero to infinity along the positive axis, and then –1 is nothing but a reversal of direction. So it’s an operation really, not a ‘real’ number. In a philosophical sense, of course: negative numbers don’t exist, so to say! Indeed, ask yourself: what is a negative number? It’s an operation: we subtract things when we use the minus sign, and we reverse direction when multiplying numbers with –1. So, if we multiply something with –1 two times in succession, we are back where we started.

Of course, we could say that the negative direction is the ‘real’ direction and, hence, that it’s the positive numbers that don’t ‘really’ exist. Indeed, math doesn’t care about what we say, so let’s say that the negative axis is the ‘real’ one, in a physical sense. What happens then? Well… Let’s see… Let’s do what we did before. We still define –1 as a reversal of direction, or a rotation by 180 degrees and, hence, doing that two times should bring us back where we want to be, so that’s –1 now. OK. So we have (–1)(–1)(–1) = (–1). But that means that (–1)(–1) = 1, and… How can we write something like that for –1? What number a gives us the result that a×a = –1? Hmm… Only this imaginary number: i×i = i² = –1. So, no matter how hard you try: the way we use symbols is pretty consistent, and so you will find that (–1)(–1) = 1×1 = 1 (so we have two square roots of 1), but we will not find that 1×1 = –1. If you would want to do that, you’d have to define +1 as a reversal of direction, so that basically means that the + sign would take the function of the – sign. Huh?

🙂 You must think I’ve gone crazy. I don’t think so. The idea I want to convey here is that, no matter how abstract math may seem to be – when everything is said and done – it’s intimately connected to our most basic notions of space, and our motion in that space. We go from here to there, or backwards, we change direction, we count things, we measure lengths or distances,… All that math does is to capture that in a non-ambiguous and consistent way. That also results in terse ‘truths’ such as: 1 has two real square roots, +1 and –1, but the square roots of –1 are only imaginary: ± i.

However, that terse statement hides another fun ‘truth’: +i and −i are as real as –1. Indeed, they are a rotation by 90 degrees, counterclockwise (+i) or clockwise (−i), as opposed to, for example, a rotation by 180 degrees (–1), or a full loop (1). 🙂

Euler’s formula revisited

Pre-scriptum (dated 26 June 2020): This post – part of a series of rather simple posts on elementary math and physics – did not suffer much from the attack by the dark force—which is good because I still like it. Enjoy !

Original post:

This post intends to take some of the magic out of Euler’s formula. In fact, I started doing that in my previous post but I think that, in this post, I’ve done a better job at organizing the chain of thought. [Just to make sure: with ‘Euler’s formula’, I mean e^(ix) = cos(x) + i·sin(x). Euler produced a lot of formulas, indeed, but this one is, for math, what E = mc² is for physics. :-)]

The grand idea is to start with an initial linear approximation for the value of the complex exponential e^(is) near s = 0 (to be precise, we’ll use the e^(iε) ≈ 1 + iε formula) and then show how the ‘magic’ of i – through the i² = –1 factor – gives us the sine and cosine functions. What we are going to do, basically, is to construct the sine and cosine functions algebraically.

Let us, as a starting point – just to get us focused – graph (i) the real exponential function e^x, i.e. the blue graph, and (ii) the real and imaginary part of the complex exponential function e^(ix) = cos(x) + i·sin(x), i.e. the red and green graph: the cosine and sine function. From these graphs, it’s clear that e^x and e^(ix) are two very different beasts.

1. e^x is just a real-valued function of x, so it ‘maps’ the real number x to some other real number y = e^x. That y value ‘rockets’ away, thereby demonstrating the power of exponential growth. There’s nothing really ‘special’ about e^x. Indeed, writing e^x instead of 10^x obviously looks better when you’re doing a blog on math or physics but, frankly, there’s no real reason to use that strange number e ≈ 2.718 when all you need is just a standard real exponential. In fact, if you’re a high school student and you want to attract attention with some paper involving something that grows or shrinks, I’d recommend the use of π^x. 🙂

2. e^(ix) is something that’s very different. It’s a complex-valued function of x and it’s not about exponential growth (though it obviously is about exponentiation, i.e. repeated multiplication): y = e^(ix) does not ‘explode’. On the contrary: y is just a periodic ‘thing’ with two components: a sine and a cosine. [Note that we could also change the base, to 10, for example: then we write 10^(ix). We’d also get something periodic, but let’s not get lost before we even start.]

Two different beasts, indeed. How can the addition of one tiny symbol – the little i in e^(ix) – make such a big difference?

The two beasts have one thing in common: the value of the function near x = 0 can be approximated by the same linear formula:

e^ε ≈ 1 + ε for small ε

In case you wonder where this comes from, it’s basically the definition of the derivative of a function, as illustrated below. This is nothing special. It’s a so-called first-order approximation of a function. The point to note is that we have a similar-looking formula for the complex-valued e^(ix) function. Indeed, its derivative is d(e^(ix))/dx = i·e^(ix), and when we evaluate that derivative at x = 0, then we get i·e⁰ = i. So… Yes, the grand result is that we can effectively write:

e^(iε) ≈ 1 + iε for small ε

Of course, 1 + iε is also a different ‘beast’ than 1 + ε. Indeed, 1 + ε is just a continuation of our usual walk along the real axis, but 1 + iε points in a different direction (see below). This post will show you where it’s headed.
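A quick numerical sanity check of both first-order approximations, as a sketch using Python’s `math` and `cmath` modules: for small ε, both errors shrink roughly like ε²/2, so making ε ten times smaller makes the error about a hundred times smaller.

```python
import cmath
import math

for eps in (0.5, 0.1, 0.01, 0.001):
    real_error = abs(math.exp(eps) - (1 + eps))                  # e^ε versus 1 + ε
    complex_error = abs(cmath.exp(1j * eps) - (1 + 1j * eps))    # e^(iε) versus 1 + iε
    print(f"eps={eps}: real error {real_error:.2e}, complex error {complex_error:.2e}")
```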

Let’s first work with e^xagain, and think about a value for ε. We could take any value, of course, like 0.1 or some fraction 1/n. We’ll use a fraction—for reasons that will become clear in a moment. So the question now is: what value should we use for n in that 1/n fraction? Well… Because we are going to use this approximation as the initial value in a series of calculations—be patient: I’ll explain in a moment—we’d like to have a sufficiently small fraction, so our subsequent calculations based on that initial value are not too far off. But what’s sufficiently small? Is it 1/10, or 1/100,000, or 1/10¹⁰⁰? What gives us ‘good enough’ results? In fact, how do we define ‘good enough’?

Good question! In order to try to define what’s ‘good enough’, I’ll turn the whole thing on its head. In the table below, I calculate backwards from e¹ = e by taking successive square roots of e. Huh? What? Patience, please! Just go along with me for a while. First, I calculate e^(1/2), so our fraction ε, which I’ll just write as x, is equal to 1/2 here, so the approximation for e^(1/2) is 1 + 1/2 = 1.5. That’s off. How much? Well… The actual value of e^(1/2) is about 1.648721 (see the table below, or use a calculator or spreadsheet yourself; note that, because I copied the table from Excel, e^x is shown as e^x). Now, 1.648721 is 1.5 + 0.148721, so our approximation (1.5) is about 9% off (as compared to the actual value). Not all that much, but let’s see how we can improve. Let’s take the square root once again: (e^(1/2))^(1/2) = e^(1/4), so x = 1/4. And then I do that again, so I get e^(1/8), and so on and so on. All the way down to x = 1/1024 = 1/2¹⁰, so that’s ten iterations. Our approximation 1 + x (see the fifth/last column in the table below) is then equal to 1 + 1/1024 = 1 + 0.0009765625, which we rounded to 1.000977 in the table.
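If you want to rebuild such a table yourself, here is a minimal Python sketch of the procedure: start from e, take ten successive square roots, and compare each e^x with the approximation 1 + x. The column layout is my own choice, and the chord slope (e^x – 1)/x is my reading of the ‘AB line’ column mentioned in the text.

```python
import math

x, value = 1.0, math.e
print(f"{'x':>12} {'e^x':>10} {'(e^x-1)/x':>10} {'1+x':>10}")
for _ in range(10):
    x /= 2
    value = math.sqrt(value)      # e^(x/2) = sqrt(e^x): going down by square roots
    slope = (value - 1) / x       # slope of the chord from (0, 1) to (x, e^x)
    print(f"{x:12.10f} {value:10.6f} {slope:10.6f} {1 + x:10.6f}")
```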

The actual value of e^(1/1024) is also about 1.000977, as you can see in the third column of the table. Not exactly, of course, but… Well… The accuracy of our approximation here is six digits behind the decimal point, so that’s equivalent to about one part in a million. That’s not bad, but is it ‘good enough’? Hmm… Let’s think about it, but let’s first calculate some other things. The fourth column in the table above calculates the slope of that AB line in the illustration above: its value converges to one, as we would expect, because that’s the slope of the tangent line at x = 0. [So that’s the value of the derivative of e^x at x = 0. Just check it: d(e^x)/dx = e^x, obviously, and e⁰ = 1.] Note that our 1 + x approximation also converges to 1, as it should!

So… Well… Let’s now just assume we’re happy with that approximation that’s accurate to one part in a million, so let’s just continue to work with this fraction 1/1024 for x. Hence, we will write that e^(1/1024) ≈ 1 + 1/1024 and now we will use that value also for the complex exponential. Huh? What? Why? Just hang in there for a while. Be patient. 🙂 So we’ll just add the i again and, using that e^(iε) ≈ 1 + iε expression, we write:

e^(i/1024) ≈ 1 + i/1024

It’s quite obvious that 1 + i/1024 is a complex number: its real part is 1, and its imaginary part is 1/1024 = 0.0009765625.

Let’s now work our way up again by using that complex number 1 + i/1024 = 1 + i·0.0009765625 to calculate e^(i/512), e^(i/256), e^(i/128), etcetera. All the way back up to x = 1, i.e. eⁱ. I’ll just use a different symbol for x: in the table below, I’ll use s instead of x, because I’ll refer to the real part of our complex numbers as ‘x’ from time to time (even if I write a and b in the table below), and so I can’t use the symbol x to denote the fraction. [I could have started with s, but then… Well… Real numbers are usually denoted by x, and so it was easier to start that way.] In any case…

The thing to note is how I calculate those values e^(i/512), e^(i/256), e^(i/128), etcetera. I am doing it by squaring, i.e. I just multiply the (complex) number by itself. To be very explicit, note that e^(i/512) = (e^(i/1024))² = e^(i·2/1024) = (e^(i/1024))(e^(i/1024)). So all that I am doing in the table below is multiply the complex number that I have with itself, and then I have a new result, and then I square that once again, and then again, and again, and again, etcetera. In other words, when going back up, I am just taking the square of a (complex) number. Of course, you know how to multiply a number with itself but, because we’re talking complex numbers here, we should actually write it out:

(a + i·b)² = a² – b² + i·2ab = a² – b² + 2abi

[It would be good to always separate the imaginary unit i from real numbers like a, b, or ab, but I am lazy and so I hope you’ll always recognize that i is the imaginary unit.] In any case… When we’re going back up (by squaring), the real part of the next number (i.e. the ‘x’ in x + iy) is a² – b² and the imaginary part (the ‘y’) is 2ab. So that’s what’s shown below, in the fourth and fifth column, that is.
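The whole ‘going back up’ procedure fits in a few lines of Python. This sketch starts from a + ib = 1 + i/1024, applies the squaring rule (a + ib)² = a² – b² + 2abi ten times, and prints the result next to the actual cosine and sine of s.

```python
import math

s = 1 / 1024
a, b = 1.0, s                         # starting value: e^(i/1024) ≈ 1 + i/1024
while s < 1:
    a, b = a * a - b * b, 2 * a * b   # (a + ib)² = a² - b² + 2abi
    s *= 2                            # squaring doubles the exponent: e^(is) -> e^(i·2s)
    print(f"s={s:<12} a={a:.6f} b={b:.6f}  cos(s)={math.cos(s):.6f} sin(s)={math.sin(s):.6f}")
```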

Look at what happens. The x goes to zero and then becomes negative, and the y increases to one. Now, we went down from e^(1/n) = e¹ = e^(1/1) to e^(1/n) = e^(1/1024), but we could have started with e², or e^(4/n), or whatever. Hence, I should actually continue the calculations above so you can see what happens when s goes to 2, and then to 3, and then to 4, and so on and so on. What you’d see is that the value of the real and imaginary part of this complex exponential goes up and down between –1 and +1. You’d see both are periodic functions, like the sine and cosine functions, which I added in the last two columns of the table above. Now compare those a and b values (i.e. the second and third column) with the cosine and sine values (i.e. the last two columns). […] Do you see it? Do you see how close they are? Only a few parts in ten thousand, even for s = 1, and much less than that for smaller s.

You need to let this sink in for a while. And I’d recommend you make a spreadsheet yourself, so you really ‘get’ what’s going on here. It’s all there is to the so-called ‘magic’ of Euler’s formula. That simple (a + ib)² = a² – b² + 2abi formula shows us why (and how) the real and imaginary part oscillate between –1 and +1, just like the cosine and sine function. In fact, the values are so close that it’s easy to understand what follows: they are the same, in the limit, of course.

Indeed, these values a² – b² and 2ab, i.e. the real and imaginary part of the next complex number in our series, are what Feynman refers to as the algebraic cosine and sine functions, because we calculate them as (a + ib)² = a² – b² + 2abi. These algebraic cosine and sine values are close to the real cosine and sine values, especially for small fractions s. Of course, there is a discrepancy because – when everything is said and done – we do carry a little error with us from the start, since we stopped at 1/n = 1/1024 before going back up.

There’s actually a much more obvious way to appreciate the error: we know that e^(i/1024) should be some point on the unit circle itself. Therefore, we should not equate a with 1 if we have some value b > 0: the modulus of 1 + iε is √(1 + ε²), which is slightly larger than 1. Or – what amounts to saying the same – if b is slightly bigger than 0, then a should be slightly smaller than 1. So e^(iε) ≈ 1 + iε is an approximation only. It cannot be exact for positive values of ε. It’s only exact when ε = 0.
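A minimal Python check of this point: 1 + iε lies just outside the unit circle, and ten squarings raise its modulus to the power 1024, so the tiny initial excess grows by a factor of roughly 1024.

```python
import math

eps = 1 / 1024
z = complex(1, eps)                    # 1 + iε, the starting approximation
print(abs(z) - 1)                      # small but positive: just outside the unit circle

w = z
for _ in range(10):                    # ten squarings: (1 + iε)^1024, our value for s = 1
    w = w * w
print(abs(w))                          # the excess over 1 is now roughly 1024 times larger
print((1 + eps ** 2) ** 512)           # |1 + iε|^1024 = (1 + ε²)^512, the same number
```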

So we’re off, but not far off, as you can see. In addition, you should note that the error becomes bigger and bigger for larger s. For example, in the line for s = 1, we calculated the values of the algebraic cosine and sine for s = 2 (see the a² – b² and 2ab columns) as –0.416553 and 0.910186, but the actual values are cos(2) = –0.416147 and sin(2) = 0.909297, which shows our algebraic cosine and sine function is gradually losing accuracy indeed (we’re off by about one part in a thousand here, instead of one part in a million). That’s what we’d expect, of course, as we’re multiplying the errors as we move ‘back up’.

The graph below plots the values of the table.

This graph also shows that, as we’re doubling our ratio r all the time, the data points are being spaced out more and more. This ‘spacing out’ gets a lot worse when further increasing s: from s = 1 (that’s the ‘highest’ point in the graph above), we’d go to s = 2, and then to s = 4, s = 8, etcetera. Now, these values are not shown above but you can imagine where they are: for s = 2, we’re somewhere in the second quadrant, for s = 4, we’re in the third, etcetera. So that does not make for a smooth graph. We need points in-between. So let’s ‘fix’ this problem by taking just one value for s out of the table (s = 1/4, for example) and we’ll continue to use that value as a multiplier.

That’s what’s done in the table below. It looks somewhat daunting at first but it’s simple really. First, we multiply the value we got for e^{i/4} with itself once again, so that gives us a real and an imaginary part for e^{i/2} (we had that already in the table above and you can check: we get the same here). We then take that value (i.e. e^{i/2}) not to multiply it with itself but with e^{i/4} once again. Of course, because the complex numbers are not the same, we cannot use the (a + ib)² = a² – b² + 2abi rule any more. We must now use the more general rule for multiplying different complex numbers: (a + ib)(c + id) = (ac – bd) + i(ad + bc). So that’s why I have an a, b, c and d column in this table: a and b are the components of the first number, and c and d of the second (i.e. e^{i/4} = 0.969031 + 0.247434i).

In the table above, I let s range from zero (0) to seven (7) in steps of 0.25 (= 1/4). Once again, I’ve added the real cosine and sine values for these angles (they are, of course, expressed in radians), because that’s what s is here: an angle, a.k.a. the phase of the complex number. So you can compare.

The table confirms, once again, that we’re slowly losing accuracy (we’re now 3 to 4 parts in a thousand off), but it is very slowly only indeed: we’d need to do many ‘loops’ around the center before we could actually see the difference on a graph. Hey! Let’s do a graph. [Excel is such a great tool, isn’t it?] Here we are: the thick black line describing a circle on the graph below connects the actual cosine and sine values associated with an angle of 1/4, 1/2, 3/4 etcetera, all the way up to 7 (7 is about 2.23π, so we’re some 40 degrees past our original point after the ‘loop’), while the little ‘+’ marks are the data points for the algebraic cosine and sine. They seem to match perfectly because our eye cannot see the little discrepancy.
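The same table can be sketched in a few lines of Python. This is only an illustration under one assumption: that the fixed multiplier is the table value 0.969031 + 0.247434i quoted for e^{i/4}. The helper `multiply` (a hypothetical name) spells out the (a + ib)(c + id) = (ac – bd) + i(ad + bc) rule rather than relying on Python’s built-in complex type.

```python
import math

def multiply(a, b, c, d):
    """(a + ib)(c + id) = (ac - bd) + i(ad + bc), using i² = -1."""
    return a * c - b * d, a * d + b * c

# Table value for e^{i/4} quoted in the text, used as the fixed multiplier.
C, D = 0.969031, 0.247434

points = [(0.0, 1.0, 0.0)]       # (s, real part, imaginary part), starting at 1
a, b = 1.0, 0.0
for k in range(1, 29):           # 28 steps of 0.25 take s from 0 to 7
    a, b = multiply(a, b, C, D)
    points.append((k * 0.25, a, b))

worst = max(math.hypot(x - math.cos(s), y - math.sin(s)) for s, x, y in points)
print(f"largest distance from (cos s, sin s) up to s = 7: {worst:.4f}")
```

Because the quoted multiplier has a magnitude of about 1.00012, its 28th power drifts to roughly 1.0034, which matches the ‘3 to 4 parts in a thousand’ mentioned above.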

So… That’s it. End of story.

What?

Yes. That’s it. End of story. I’ve done what I promised to do. I constructed the sine and cosine functions algebraically. No compass. 🙂 Just plain arithmetic, including one extra rule only: i²= –1. That’s it.

So I hope I succeeded. The goal was to take some of the magic out of Euler’s formula by showing how the e^{iε} ≈ 1 + iε approximation and the definition i² = –1 give us the cosine and sine functions themselves as we move around the unit circle, starting from the unity point on the real axis, as shown in that little graph:

Of course, the ε we were working with was much smaller than the size of the arrow suggests (it was equal to 1/1024 ≈ 0.000977 to be precise) but that’s just to show how differentials work. 🙂 Pretty good, isn’t it? 🙂

Post scriptum:

I. If anything, all this post did was to demonstrate multiplication of complex numbers. Indeed, when everything is said and done, exponentiation is repeated multiplication–both for real as well as for complex exponents. The only difference is–well… Complex exponents give us these oscillating things, because a complex exponent effectively throws a sine and cosine function in.

Now, we can do all kinds of things with that. In this post, we constructed a circle without a compass. Now, that’s not as good as squaring the circle 🙂 but, still, it would have awed Pythagoras. Below, I construct a spiral doing the same kind of math: I start off with a complex number again but now it’s somewhat more off the unit circle (1 + 0.247434i). In fact, I took the same sine value as the one we had for e^{i/4} but I replaced the cosine value (0.969031) with 1 exactly. In other words, the error in my starting value is a lot bigger here.

Then I multiply that complex number 1 + 0.247434i with itself to get the next number (0.938776 + 0.494868i), and then I multiply that result once again with my first number (1 + 0.247434i), just like we did when we were constructing the circle. And then it goes on and on and on. So the only difference is the initial value: that’s a bit more off the unit circle. [When we constructed the circle, our initial value was also a bit off but much less. Here we go for a much larger difference.]

So you can see what happens: multiplying complex numbers amounts to adding angles and multiplying magnitudes. With α and γ real, αe^{iβ}·γe^{iδ} = αγe^{i(β+δ)}, and so |αe^{iβ}·γe^{iδ}| = |α||γ|. So, because we started off with a complex number with a magnitude slightly bigger than 1 (you calculate it using Pythagoras’ theorem: it’s 1.03, more or less, which is 3% off, as opposed to less than one part in a million for the 1 + 0.000977i number), the next point is, of course, slightly off the unit circle too, and by some more than 3% actually. And so that goes on and on and on, and the ‘some more’ becomes bigger and bigger in the process.

Constructing a graph like this one is like doing the kind of silly stuff I did when programming little games with our Commodore 64 in the 1980s, so I shouldn’t dwell too much on this. In fact, now that I think of it: I should have started near –i, then my spiral would have resembled an e. 🙂 And, yes – for family reading this – this is also like the favorite hobby of our dad: calculating a better value for π. 🙂

However… The only thing I should note, perhaps, is that this kind of iterative process resembles, to some extent, the kind of process that iterated function systems (IFSs) use to create fractals. So… Well… It’s just nice, I guess. [OK. That’s just an excuse. Sorry.]
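For those who, like me, miss the Commodore 64 days, here is a rough Python sketch of the spiral. It only assumes the starting value 1 + 0.247434i from the text; `spiral` is a hypothetical helper name, and Python’s built-in complex multiplication applies the same i² = –1 rule we used by hand.

```python
def spiral(start, steps):
    """Multiply `start` by itself over and over: the n-th point is start**n."""
    points = [start]
    z = start
    for _ in range(steps - 1):
        z = z * start            # complex product: magnitudes multiply, angles add
        points.append(z)
    return points

w = complex(1.0, 0.247434)       # the starting value used in the text
pts = spiral(w, 40)
print(pts[1])                    # the second point, about 0.938776 + 0.494868i
for n in (1, 10, 20, 40):
    z = pts[n - 1]
    # the magnitude of the n-th point grows like |w|**n, hence the spiral
    print(n, round(abs(z), 4), round(abs(w) ** n, 4))
```

Plotting the real parts against the imaginary parts of `pts` should give the outward spiral: each step adds the same angle (about 0.243 rad) while the distance from the origin grows by about 3%.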

II. The other thing that I demonstrated in this post may seem to be trivial but I’ll emphasize it here because it helped me (not sure about you though) to understand the essence of real exponentials much better than I did before. So, what is it?

Well… It’s that rather remarkable fact that calculating (real) irrational powers amounts to doing some infinite iteration. What do I mean by that?

Well… Remember that we kept on taking the square root of e, so we calculated e^{1/2}, and then (e^{1/2})^{1/2} = e^{1/4}, and then (e^{1/4})^{1/2} = e^{1/8}, and then we went on: e^{1/16}, e^{1/32}, e^{1/64}, all the way down to e^{1/1024}, where we stopped. That was 10 iterations only. However, it was clear we could go on and on and on, to find that limit we know so well: e^{1/Δ} tends to 1 (not to zero (0), and not to e either!) for Δ → ∞.

Now, e = e¹ is an exponential itself and so we can switch to another base, base-10 for example, using the general a^s = (b^k)^s = b^{ks} = b^t formula, with k = log_b(a). Let’s do base-10: we get e¹ = [10^{log₁₀(e)}]¹ = 10^{0.434294…}. Now, just like e itself, log₁₀(e) is an irrational number, so we indeed have an infinite number of decimals behind the decimal point in 0.434294… In fact, e is not only irrational but transcendental: we can’t calculate it algebraically, i.e. as the root of some polynomial with rational coefficients. Most irrational numbers are like that, by the way, so don’t think that being ‘transcendental’ is very special. In any case… That’s a finer point that doesn’t matter much here. You get the idea, I hope. It’s the following:

When we have a rational power a^{m/n}, it helps to think of it as a product of m factors a^{1/n} (especially if we would want to calculate a^{m/n} without using a calculator, which, I admit, is not very fashionable anymore and so nobody ever does that: too bad, because the manual work involved does help to better understand things). Let’s write it down: a^{m/n} = a^{m·(1/n)} = (a^{1/n})^m = a^{1/n}·a^{1/n}·a^{1/n}·… (m times). That’s simple indeed: exponentiation is repeated multiplication. [Of course, if m is negative, then we just write a^{m/n} as 1/a^{–m/n} = 1/a^{|m|/n}, but that doesn’t change the general idea of exponentiation.]
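The ‘repeated multiplication’ reading of a rational power can be sketched as follows. This is only an illustration: `rational_power` is a hypothetical name, and we simply let Python’s `**` operator compute the single n-th root a^{1/n} before multiplying it m times.

```python
def rational_power(a, m, n):
    """a^(m/n) as m factors a^(1/n); for negative m use a^(m/n) = 1 / a^(-m/n)."""
    if m < 0:
        return 1.0 / rational_power(a, -m, n)
    root = a ** (1.0 / n)        # the n-th root, computed by the float power here
    result = 1.0
    for _ in range(m):           # exponentiation as repeated multiplication
        result *= root
    return result

print(rational_power(8, 2, 3))   # 8^(2/3), approximately 4
print(rational_power(8, -2, 3))  # 8^(-2/3), approximately 0.25
```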
However, it is much more difficult to see why, and how, exponentiation with irrational powers amounts to repeated multiplication too. The rather lengthy exposé above shows… Well, perhaps not why, but surely how. [And in math, if we can show how, that usually amounts to showing why also, doesn’t it? :-)] Indeed, when we think of a^r (i.e. an irrational power of some (real) number a), we can think of it as a product of an infinite number of factors a^{r/Δ}. Indeed, we can write a^r as:

a^r = a^{r(1/Δ + 1/Δ + 1/Δ + 1/Δ + …)} = a^{r/Δ}·a^{r/Δ}·a^{r/Δ}·a^{r/Δ}·…

Not convinced? Let’s work an example: 10^π = [e^{ln10}]^π = e^{ln10·π} = e^{7.233784…}. Of course, if you take your calculator, you’ll find something like 1385.455731, both for 10^π and e^{7.233784} (hopefully!), but that’s not the point here. We’ve shown that e is an infinite product e^{1/Δ}·e^{1/Δ}·e^{1/Δ}·e^{1/Δ}·… = e^{1/Δ + 1/Δ + 1/Δ + 1/Δ + …} = e^{Δ/Δ}, with Δ some infinitely large (but integer) number. In our example, we stopped the calculation at Δ = 1024, but you see the logic: we could have gone on forever. Therefore, we can write e^{7.233784…} as

e^{7.233784…} = e^{7.233784…(1/Δ + 1/Δ + 1/Δ + …)} = e^{7.233784…/Δ}·e^{7.233784…/Δ}·e^{7.233784…/Δ}·…

Still not convinced? Let’s go back to base 10. We can write the factors e^{7.233784…/Δ} as e^{(ln10·π)/Δ} = [e^{ln10}]^{π/Δ} = 10^{π/Δ}. So our original power 10^π is equal to: 10^π = 10^{π/Δ}·10^{π/Δ}·10^{π/Δ}·10^{π/Δ}·10^{π/Δ}·10^{π/Δ}·… = 10^{π(Δ/Δ)}, and of course, 10^{1/Δ} also tends to 1 as Δ goes to infinity (not to zero, and not to 10 either). 🙂 So, yes, we can do this for any positive real number a and for any r really.
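Here is a small numerical sketch of the 10^π example, assuming Δ = 1024 (where the text stopped). Squaring a number ten times multiplies 1024 equal copies of it together, so that’s a shortcut for the long product of factors 10^{π/Δ}; the variable names are just for illustration.

```python
import math

DELTA_EXP = 10                   # Δ = 2**10 = 1024, where the text stopped
delta = 2 ** DELTA_EXP

# One factor 10^(π/Δ): already very close to 1.
factor = 10 ** (math.pi / delta)

# Multiply Δ copies of that factor together. Squaring 10 times gives
# factor**1024, i.e. the same product with far fewer operations.
product = factor
for _ in range(DELTA_EXP):
    product *= product

print(factor)                    # slightly above 1
print(product, 10 ** math.pi)    # both close to 1385.455731
print(math.log(10) * math.pi)    # the base-e exponent 7.233784...
```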

Again, this may look very trivial to the trained mathematical eye but, as a novice in Mathematical Wonderland, I felt I had to go through this to truly understand irrational powers. So it may or may not help you, depending on where you are in MW.

[Proving that the limit for Δ/Δ goes to 1 as Δ goes to ∞ should not be necessary, I hope? 🙂 But, just in case you wonder how the formulas for rational and irrational powers could possibly be related, we can just write a^{m/n} = a^{(m/n)(1/Δ + 1/Δ + 1/Δ + 1/Δ + …)} = a^{m/(nΔ)}·a^{m/(nΔ)}·a^{m/(nΔ)}·a^{m/(nΔ)}·… = (a^{1/Δ + 1/Δ + 1/Δ + 1/Δ + …})^{m/n} = a^{m/n}, as we would expect. :-)]

III. So how does that a^r = a^{r/Δ}·a^{r/Δ}·a^{r/Δ}·a^{r/Δ}·… formula work for complex exponentials? We just add the i, so we write a^{ir}, but we know what effect that has: we have a different beast now. A complex-valued function of r, or… Well… If we keep the exponent fixed, then it’s a complex-valued function of a! Indeed, do remember we have a choice here (and two inverse functions as well!).

However, note that we can write a^ir in two slightly different ways. We have two interpretations here really:

A. The first interpretation is the easiest one: we write a^{ir} as a^{ir} = (a^r)^i = (a^{r/Δ + r/Δ + r/Δ + r/Δ + …})^i.

So we have a real power here, a^r, and so that’s some real number, and then we raise it to the power i to create that new beast: a complex-valued function with two components, one imaginary and one real. And then we know how to relate these to the sine and cosine function: we just change the base to e and then we’re done.

In fact, now that we’re here, let’s go all the way and do it. As mentioned in my previous post, it follows from the a^s = (e^k)^s = e^{ks} = e^t formula, with k = ln(a), that the only effect of a change of base is a change of scale of the horizontal axis: the graph of a^s is fully identical to the graph of e^t indeed: we just need to substitute s by t = ks = ln(a)·s. That’s all. So we actually have our ‘Euler formula’ for a^{is} here: a^{is} = cos[ln(a)·s] + i·sin[ln(a)·s]. For example, for base 10, we have 10^{is} = cos[ln(10)·s] + i·sin[ln(10)·s].
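A quick way to convince yourself of the change-of-base version of Euler’s formula is to let Python raise 10 to an imaginary power directly and compare it with cos[ln(10)·s] + i·sin[ln(10)·s]. The helper name `euler_base` is a hypothetical label for this sketch.

```python
import math

def euler_base(a, s):
    """Euler's formula after a change of base: a^(is) = cos(ln(a)·s) + i·sin(ln(a)·s)."""
    t = math.log(a) * s          # the rescaled horizontal axis: t = ln(a)·s
    return complex(math.cos(t), math.sin(t))

for s in (0.5, 1.0, math.pi):
    direct = 10 ** complex(0, s) # Python evaluates 10^(is) itself
    print(s, direct, euler_base(10, s))
```

Both columns should agree to about fifteen digits, and every value has magnitude 1: base 10 walks around the same unit circle as base e, only ln(10) ≈ 2.30 times faster.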

But let’s not get lost in the nitty-gritty here. The idea here is that we let i ‘act’ on a^r, so to say. And then, of course, we can write a^r as we want, but that doesn’t change the essence of what we’re dealing with.

B. The second interpretation is somewhat more tricky: we write a^{ir} as a^{ir} = a^{ir/Δ}·a^{ir/Δ}·a^{ir/Δ}·a^{ir/Δ}·…

So that’s a product of an (infinite) number of complex factors a^{ir/Δ}. Now, that is a very different interpretation than the one above, even if the mathematical result when putting real numbers in for a and r will, obviously, have to be the same. If the result is the same, then what am I saying really? Well… Nothing much, I guess. Just that the interpretation of an exponentiation as repeated multiplication makes sense for complex exponentials as well:

For rational r = m/n, we’ll have a finite number of complex factors: a^{im/n} = a^{i/n}·a^{i/n}·a^{i/n}·a^{i/n}·… (m times).
For irrational r, we’ll have an infinite number of complex factors: a^{ir} = a^{ir/Δ}·a^{ir/Δ}·a^{ir/Δ}·a^{ir/Δ}·… etcetera.

So the difference with the first interpretation is that, instead of looking at a^{ir} as a real number a^r that’s being raised to the complex power i, we’re looking at a^{ir} as a complex number a^i that’s being raised to the real power r. As said, the mathematical result when putting real numbers in for a and r will, obviously, have to be the same. [Otherwise we’d be in serious trouble, of course: math is math. We can’t have the same thing being associated with two different results.] But, as said, we can effectively interpret a^{ir} in two ways.
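The two interpretations can be compared numerically. The sketch below assumes a = 10 and r = π (the example used earlier) and Δ = 1024. One caveat: Python’s complex powers use principal values, so the third expression, (a^i)^r, only matches because the angle ln(10) ≈ 2.30 rad of a^i lies below π; for a base with ln(a) > π that particular expression would land on a different branch.

```python
import math

a, r = 10.0, math.pi
DELTA_EXP = 10
delta = 2 ** DELTA_EXP           # Δ = 1024

# A. a real number a^r, raised to the power i
interpretation_a = (a ** r) ** 1j

# B. a product of Δ complex factors a^(ir/Δ), built by squaring 10 times
z = a ** (1j * r / delta)
for _ in range(DELTA_EXP):
    z *= z
interpretation_b = z

# For comparison: the complex number a^i raised to the real power r
via_a_to_the_i = (a ** 1j) ** r

print(interpretation_a)
print(interpretation_b)
print(via_a_to_the_i)
```

All three should print the same point on the unit circle, at an angle of ln(10)·π ≈ 7.2338 radians.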

[…]

What I am doing here, of course, is contemplating all kinds of mathematical operations, including exponentiation, on the complex space, rather than on the real space. So the first step is to raise a complex number to a real power (as opposed to raising a real number to a complex power). The next step will be to raise a complex number to a complex power. So then we’re talking complex-valued functions of complex variables.

Now, that’s what complex analysis is all about, and I’ve written very extensively about that in my October-November 2013 posts. So I would encourage you to re-read those, now that you’ve got, hopefully, a bit more of an ‘intuitive’ understanding of complex numbers with the background given in this and my previous post.

Complex analysis involves mapping (i.e. mapping from one complex space to another) and that, in turn, involves the concept of so-called analytic and/or holomorphic functions. Understanding those advanced concepts is, in turn, essential to understanding the kind of things that Penrose is writing about in Chapters 9 to 12 of his Road to Reality. […] I’ll probably re-visit these chapters myself in the coming weeks, as I realize I might understand them somewhat better now. If I could get through these, I’d be at page 250 or so, so that’s only one quarter of the total volume. Just an indication of how long that Road to Reality really is. 🙂

And then I am still not sure if it really leads to ‘reality’ because, when everything is said and done, those new theories (supersymmetry, M-theory, or string theory in general) are quite speculative, aren’t they? 🙂

Reflecting on complex numbers (again)

Original post:

This will surely be not my most readable post – if only because it’s soooooo long and – at times – quite ‘philosophical’. Indeed, it’s not very rigorous or formal, unlike those posts on complex analysis I wrote last year. At the same time, I think this post digs ‘deeper’, in a sense. Indeed, I really wanted to get to the heart of the ‘magic’ behind complex numbers. I’ll let you judge if I achieved that goal.

Complex numbers: why are they useful?

The previous post demonstrated the power of complex numbers (i.e. what they are used for), but it didn’t say much about what they really are. Indeed, we had a simple differential equation, an expression modeling an oscillator (read: a spring with a mass on it), with two terms only: d²x/dt² = –ω²x. But we could not solve it the usual way because of the minus sign in front of the term with the x.

Indeed, the so-called characteristic equation for this differential equation is r² = –ω² and so we’re in trouble here because there is no real-valued r that solves this. However, allowing complex-valued roots (r = ±iω) to solve the characteristic equation does the trick. Let’s analyze what we did (and don’t worry if you don’t ‘get’ this: it’s not essential to understand what follows):

Using those complex roots, we wrote the general solution for the differential equation as Ae^{iωt} + Be^{–iωt}. Now, note that everything is complex in this general solution, not only the e^{iωt} and e^{–iωt} ‘components’ but also the (arbitrary) coefficients A and B.
However, because we wanted to find a real-valued function in the end (remember: x is a vertical displacement from an equilibrium position x = 0, so that’s ‘real’ indeed), we imposed the condition that Ae^{iωt} and Be^{–iωt} had to be each other’s complex conjugate. Hence, B must be equal to A* and our ‘general’ (real-valued) solution was Ae^{iωt} + A*e^{–iωt}. So we only have one complex (but equally arbitrary) coefficient now, A, and we get the other one (A*) for free, so to say.
Writing A in polar notation, i.e. A = A₀e^{iΔ}, which implies that A* = A₀e^{–iΔ}, yields A₀e^{iΔ}e^{iωt} + A₀e^{–iΔ}e^{–iωt} = A₀[e^{i(ωt + Δ)} + e^{–i(ωt + Δ)}].
Expanding this, using Euler’s formula (and the fact that cos(-α) = cosα but sin(-α) = –sinα) then gives us, finally, the following (real-valued) functional form for x:

A₀[cos(ωt + Δ) + isin(ωt + Δ) + cos(ωt + Δ) – isin(ωt + Δ)]

= 2A₀cos(ωt + Δ) = x₀cos(ωt + Δ), with x₀ = 2A₀
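To check the derivation numerically, here is a minimal sketch with arbitrary sample values (the ω = 2, A₀ = 0.75 and Δ = 0.3 below are assumptions, and `phase` stands in for Δ). It confirms that Ae^{iωt} + A*e^{–iωt} has no imaginary part, equals 2A₀cos(ωt + Δ), and satisfies d²x/dt² = –ω²x; the finite-difference helper is our own addition, not part of the original argument.

```python
import cmath
import math

omega, A0, phase = 2.0, 0.75, 0.3       # sample values; `phase` plays the role of Δ
A = A0 * cmath.exp(1j * phase)          # A = A₀e^{iΔ}
A_conj = A.conjugate()                  # A* = A₀e^{-iΔ}

def x(t):
    """General solution Ae^{iωt} + A*e^{-iωt}: the two terms are complex conjugates."""
    return A * cmath.exp(1j * omega * t) + A_conj * cmath.exp(-1j * omega * t)

def second_derivative(f, t, h=1e-4):
    """Central finite-difference estimate of f''(t)."""
    return (f(t + h) - 2 * f(t) + f(t - h)) / h ** 2

for t in (0.0, 0.4, 1.7):
    value = x(t)
    print(t, value.imag, value.real, 2 * A0 * math.cos(omega * t + phase))
```

The imaginary column should come out as zero, and the last two columns should agree.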

That’s easy enough to follow, I guess (everything is relative of course), but do we really understand what we’re doing here? Let me rephrase what’s going on here:

In the initial problem, our dependent variable x(t) was the vertical displacement, so that was a real-valued function of a real-valued (independent) variable (time).
Now, we kept the independent variable t real (time is always real, never imaginary 🙂) but we made x = x(t) a complex (dependent) variable by equating x(t) with the complex-valued exponential e^{rt}. So we’re doing a substitution here really.
Now, if e^{rt} is complex-valued, it means, of course, that r is complex, and so that allows us to equate r with the square root of a negative number (r = ±iω).
We then plug these imaginary roots back in and get a general complex-valued solution (as expected).
However, we then impose the condition that the imaginary part of our solution should be zero.

In other words, we had a family of complex-valued functions as a general solution for the differential equation, but we limited the solution set to a somewhat less general solution including real-valued functions only.

OK. We all get this. But it doesn’t mean we ‘understand’ complex numbers. Let’s try to take the magic out of those complex numbers.

Complex numbers: what are they?

I’ve devoted two or three posts to this already (October-November 2013) but let’s go back to basics. Let’s start with that imaginary unit i. The essence of i – and, yes, I am using the term ‘essence’ in a very ‘philosophical’ sense here I guess: i‘s intrinsic nature, so to speak – is that its square is equal to minus one: i²= –1.

That’s it really. We don’t need more. Of course, we can associate i with lots of other things if we would want to (and we will, of course!), such as Euler’s formula for example, but these associations are not essential, or not as essential as this definition I should say. Indeed, while that ‘rule’ or ‘definition’ is totally weird and, at first sight, totally random, it’s the only one we need: all other arithmetic rules do not change and, in fact, it’s just that one extra rule that allows us to deal with any algebraic equation, i.e. any polynomial equation built from addition, multiplication and whole-number powers: with complex numbers, every such equation (of degree one or more) has a solution, which is what the fundamental theorem of algebra says. However, stating that i² = –1 still doesn’t answer the question: what is a complex number really?

In order to not get too confused, I’ve started to think we should just take complex numbers at face value: it’s the sum of (i) some real number and (ii) a so-called imaginary part, which consists of another real number multiplied with i. [So the only ‘imaginary’ bit is, once again, i: all the rest is real! ] Now, when I say the ‘sum’, then that’s not some kind of ‘new’ sum. Well… Let me qualify that. It’s not some kind of ‘new’ sum because we’re just adding two things the way we’re used to: two and two apples are four apples, and one orange plus two more is three. However, it is true that we’re adding two separate beasts now, so to say, and so we do keep the things with an i in them separate from the real bits. In short, we do keep the apples and the oranges separate.

Now, I would like to be able to say that multiplication of complex numbers is just as straightforward as adding them, but that’s not true. When we multiply complex numbers, that i²= –1 rule kicks in and produces some ‘effects’ that are logical but not all that ‘straightforward’ I’d say.

Let’s take a simple example–but a significant one (if only because we’ll use the result later): let’s multiply a complex number with itself, i.e. let’s take the square of a complex number. We get (a + bi)²= (a + bi)(a + bi) = a·a + a·(bi) + (bi)·a + (bi)·(bi) = a²+ 2abi + b²i²= a² + 2abi – b². That’s very different as compared to the square of a real sum a + b: (a + b)²= a²+ 2ab + b². How? Just look at it: we’ve got a real bit (a² – b²) and then an imaginary bit (2abi). So what?
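The squaring rule is easy to check against Python’s built-in complex numbers. The helper `square_by_rule` (a hypothetical name) writes out (a + bi)² = (a² – b²) + 2abi by hand; the a = b case shows the real part vanishing, which is the case discussed next.

```python
def square_by_rule(a, b):
    """(a + bi)^2 = (a^2 - b^2) + 2ab·i, written out by hand."""
    return complex(a * a - b * b, 2 * a * b)

for a, b in ((3.0, 4.0), (0.5, -1.5), (2.0, 2.0)):
    z = complex(a, b)
    print(z, z * z, square_by_rule(a, b))   # (2+2j) squared gives 8j: no real part
```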

Well… The thumbnail graph below illustrates the difference for a = b: it maps x to (a) 4x² [i.e. (x + x)²] and to (b) 2x² [i.e. (x + ix)²] respectively. Indeed, when we’re squaring real numbers, we get (a + b)² = 4a², i.e. a ‘real bit’ only, of course! But when we’re squaring complex numbers, we need to keep track of two components: the real part and the imaginary part. However, the real part (a² – b²) is zero in this case (a = b), and so it’s only the imaginary part 2abi = 2a²i that counts!

That’s kids’ stuff, you’ll say… In fact, when you’re a mathematician, you’ll say it’s a nonsensical graph. Why? Because it compares an apple and an orange really: we want to show 2ix² really, not 2x².

That’s true. However, that’s why the graph is actually useful. The red graph introduces a new idea, and with a ‘new’ idea I mean something that’s not inherent in the i²= –1 identity: it associates i with the vertical axis in the two-dimensional plane.

Hmm… This is an idea that is ‘nice’, very nice actually, but, once again, I should note that it’s not part of i‘s essence. Indeed, the Italian mathematicians who first ‘invented’ complex numbers in the 16th century (Tartaglia (‘the Stammerer’) and Cardano, whose father was a friend of Leonardo da Vinci) introduced roots of –1 because they needed them to solve algebraic equations. That’s it. Full stop. It was only much later (more than two hundred years later, that is!) that mathematicians such as Wessel, Argand and Gauss associated imaginary numbers (like 2ix²) with the vertical coordinate axis. To my readers who have managed not to fall asleep while reading this: please continue till the end, and you will understand why I am saying the idea of a geometrical interpretation is ‘not essential’.

To the same readers, I’ll also say the following, however: if we do associate complex numbers with a second dimension, then we can associate the algebraic operations with things we can visualize in space. Most of you–all of you I should say–know that already, obviously, but let’s just have a look at that to make sure we’re on the same page.

A very basic thing in physical mathematics is reversing the direction of something. Things go in one direction, but we should be able to visualize them going in the opposite direction. We may associate this with a variable going from 0 to infinity (+∞): it may be time (t), or a time-dependent variable x, y or z. Of course, we know what we have here: we think of the positive real axis. So, what we do when we multiply with –1 is reversing its direction, and so then we’re talking the negative real axis: a variable going from 0 to minus infinity (–∞). Therefore, we can associate multiplication by –1 with a half-turn around the center (i.e. around the zero point): a rotation by 180 degrees (i.e. by π, in radians).

You may think that’s a weird way of looking at multiplication by minus one. Well… Yes and no. But think of it: the concept of negative numbers is actually as ‘weird’ as the concept of the imaginary unit in a way. I mean… Think about it: we’re used to using negative numbers because we learned about them when we were very small kids, but what are they really? What does it mean to have minus three apples? You know the answer of course: it probably means that you owe someone three apples but that you don’t have any right now. 🙂 […] But that’s not the point here. I hope you see what I mean: negative numbers are weird too, in a sense. Indeed, we should be aware of the fact that we often look at concepts as being ‘weird’ because we weren’t exposed to them early enough: the great mathematician Leonhard Euler thought complex numbers were so ‘essential’ to math and, hence, so ‘natural’ that he thought kids should learn complex numbers as soon as they started learning ‘real’ numbers. In fact, he probably thought we should only be using complex numbers because… Well… They make the arithmetic space complete, so to say. […] But then I guess that’s because Euler understood complex numbers in a way we don’t, which is why I am writing about them here. 🙂

OK. Back to the main story line. In order to understand complex numbers somewhat better, it is actually useful – but, again, not necessarily essential – to think of i as a halfway rotation, i.e. a rotation by 90 degrees only, clockwise or counterclockwise, as illustrated above: multiplication with i means a counterclockwise rotation by 90 degrees (or π/2 radians) and multiplication with –i means a clockwise rotation by the same amount. Again, the minus sign gives the direction here: clockwise or counterclockwise. It works indeed: i·i =(-i)·(-i) = –1.
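The rotation picture is easy to play with in Python. This little sketch (the starting point 2 + i is an arbitrary choice) multiplies by i four times and prints the angle in degrees at each step: each multiplication adds 90 degrees, two of them amount to multiplying by –1, and four bring us back to where we started.

```python
import cmath
import math

z = complex(2.0, 1.0)            # an arbitrary starting point
for step in range(5):
    print(step, z, round(math.degrees(cmath.phase(z)), 6))
    z *= 1j                      # multiply by i: a 90° counterclockwise rotation
```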

OK. Let’s wrap this up: we might say that

a positive real number is associated with some (absolute) quantity (i.e. a magnitude);
a minus sign says: “Go the opposite way! Go back! Subtract!”– so it’s associated with the opposite direction or the opposite of something in general; and, finally,
the imaginary unit adds a second dimension: instead of moving on a line only, we can now walk around on a plane.

Once we understand that, it’s easy to understand why, in most applications of complex numbers, you’ll see the polar notation for complex numbers. Indeed, instead of writing a complex number z as z = a + ib, we’ll usually see it written as:

z = re^{iθ} with e^{iθ} = cosθ + i·sinθ

Huh? Well… Yes. Let me throw it in here straight away. You know this formula: it’s Euler’s formula. The so-called ‘magical’ formula! Indeed, Feynman calls it ‘our jewel’: the ‘most remarkable formula in mathematics’ as he puts it. Wow! If he says so, it must be right. 🙂 So let’s try to understand it.

Is it magical really? Well… I guess the answer is ‘Yes’ and ‘No’ at the same time:

No. There is no ‘magic’ here. Associating the real part a and the imaginary part b with a magnitude r and an angle θ (a = r·cosθ and b = r·sinθ) is actually just an application of the Pythagorean theorem (r² = a² + b²) and basic trigonometry, so that’s ‘magic’ you learnt when you were very little and, hence, it does not look like magic anymore. [Although you should try to appreciate its ‘magic’ once again, I feel. Remember that you heard about the Pythagorean theorem because your teacher wanted to tell you what the square root of 2 actually is: a so-called irrational number that we get by taking the ‘one-half power’ of 2, i.e. 2^{1/2} = 2^{0.5}, or, what amounts to the same, the square root of 2. Of course, you and I are both used to irrational numbers now, like 2^{1/2}, but they are also ‘weird’. As weird as i. In fact, it is said that the Greek mathematician who claimed their existence was exiled, because these irrational numbers did not fit into the (early) Pythagorean school of thought. Indeed, that school of thought wanted to reduce geometry to whole numbers and their ratios only. So there was no place for irrational numbers there!]
Yes. It is ‘magical’. Associating e^{iθ} (so that’s a complex exponential function really!) with the unit circle is something you learnt much later in life only, if ever. It’s a strange thing indeed: we have a real (but, I admit, irrational) number here (e is 2.718 followed by an infinite number of decimals, as you know, just like π) and then we raise it to the power iθ, so that’s i once again multiplied by a real number θ (i.e. the so-called phase or, to put it simply, the angle). By now, we know what it means to multiply something with i, and, of course, we also know what exponentiation is (it’s just a shorthand for repeated multiplication), but we haven’t defined complex exponentials yet.
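Python’s standard `cmath` module converts between the two notations, which makes the ‘no magic’ part easy to check. In this sketch the number 3 + 4i is just an example: `cmath.polar` returns the magnitude r (Pythagoras gives 5) and the angle θ, and r·cosθ, r·sinθ and r·e^{iθ} take us back to the original number.

```python
import cmath
import math

z = complex(3.0, 4.0)                  # a + ib with a = 3 and b = 4
r, theta = cmath.polar(z)              # r = sqrt(a² + b²), θ = the angle in radians
print(r, theta)                        # r is 5 (Pythagoras); θ ≈ 0.9273 rad
print(r * math.cos(theta), r * math.sin(theta))   # back to a and b
print(r * cmath.exp(1j * theta))       # r·e^{iθ} gives z again
```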

In fact… That’s what we’re going to do here. But in a rather ‘weird’ way as you will see: we won’t define them really but we’ll calculate them. For the moment, however, we’ll leave it at this and just note that, through Euler’s relation, we can see how a fractional or multiple power of i, e.g. i^{0.1} or i^{2.3}, corresponds to a fraction or a multiple of the angle associated with i: because i = e^{iπ/2}, we get i^{0.1} = e^{i·0.1·π/2} and i^{2.3} = e^{i·2.3·π/2}, i.e. 0.1 times π/2 or 2.3 times π/2. In other words, Euler’s formula shows how the second (spatial) dimension is associated with the concept of the angle.

[…] And then the third (spatial) dimension is, of course, easy to add: it’s just an angle in another direction. What direction? Well… An angle away from the plane that we just formed by introducing that first angle. 🙂 […] So, from our zero point (here and now), we use a ruler to draw lines, and then a compass to measure angles away from that line, and then we create a plane, and then we can just add dimensions as we please by adding more ‘angles’ away from what we already have (a line, or a plane, and any higher-dimensional thing really).

Dimensions

I feel I need to digress briefly here, just to make sure we’re on the same page. Dimensions. What is a dimension in physics or in math? What do we mean if we say that spacetime is a four-dimensional continuum? From what we wrote above, the concept of a spatial dimension should be obvious: we have three dimensions in space (the x, y and z direction), and so we need three numbers indeed to describe the position of an object, from our point of view that is (i.e. in our reference frame).

But so we also have a fourth number: time. By now, you also know that, just like position and/or motion in space, time is relative too: that is, relative to some frame of reference. So, yes, we need four numbers, i.e. four dimensions, to describe an event in spacetime. That being said, time is obviously still something different from space, despite the fact that Einstein’s relativity theory relates it to space: indeed, we showed in our post on (special) relativity that there’s no such thing as absolute time. However, that actually reinforces the point: a point in time is something fundamentally different from a point in space. Despite the fact that

Time is just like a space dimension in the physical-mathematical meaning of the term ‘dimension’ (a dimension of a space or an object is one of the coordinates that is needed to specify a point within that space, or to ‘locate’ the object – both in time and space that is); and that,
We can express distance and time in the same units because the speed of light is absolute (so that allows us to express time in meter, despite the fact that time is relative or “local”, as Hendrik Lorentz called it); and that, finally,
If we do that (i.e. if we express time and distance in equivalent units), the equations for space and time in the Lorentz transformation equations mirror each other nicely – ‘mixing’ the space and time variables in the same way, so to say – and, therefore, space and time do form a ‘kind of union’, as Minkowski famously said;

Despite all that, time and space are fundamentally different things. Perhaps not for God – because He (or She, or It?) is said to be Everywhere Always – but surely for us, humans. For us, humans, always busy constructing that mental space with our ruler and our compass, time is and remains the one and only truly independent variable. Indeed, for us, mortal beings, the clocks just tick (locally indeed – that’s why I am using a plural: clocks – but that doesn’t change the fact they’re ticking, and in one direction only).

And so things happen and equations such as the one we started with – i.e. the differential equation modeling the behavior of an oscillator – show us how they happen. In one of my previous posts, I also showed why the laws of physics do not allow us to reverse time, but I won’t talk about that here. Let’s get back to complex numbers. Indeed, I am only talking about dimensions here because, despite all I wrote above about the imaginary axis in the complex plane, the thing to note here is that we did not use complex numbers in the physical-mathematical problem above to bring in an extra spatial dimension.

We just did it because we could not solve the equation with one-dimensional numbers only: we needed to take the square root of a negative number and we couldn’t. That was it basically. So there was no intention of bringing in a y- or z-dimension, and we didn’t. If we had wanted to do that, we would have had to insert other dependent variables into the equation, and so we would have had a system of differential equations in three dependent variables (x, y and z), with time – once again – as the independent variable (t). [A differential equation with only one independent variable, like the ones we’re used to now, is referred to as an ordinary differential equation, as opposed to… no, not an extraordinary, but a partial differential equation, which involves derivatives with respect to two or more independent variables.]

In fact, if we had generalized to two- or three-dimensional space, we would have run into the same type of problem (roots of negative numbers) when trying to solve these equations, and so we would have needed complex-valued variables to solve them analytically in this case too. So we would have three ‘dimensions’ but each ‘dimension’ would be associated with complex (i.e. ‘two-dimensional’) numbers. Is this getting complicated? I guess so.

The point is that, when studying physics or math, we will have to get used to the fact that these ‘two-dimensional numbers’ which we introduced, i.e. complex numbers, are actually more ‘natural’ ‘numbers’ to work with from a purely analytic point of view (as for the meaning of ‘analytic’, just read it as ‘logical problem-solving’), especially when we write them in their polar form, i.e. as complex exponentials. We can then take advantage of that wonderful property that they already are a functional form (z = re^(iθ)), so to speak, and that their first, second etcetera derivative is easy to calculate because that ‘functional form’ is an exponential, and exponentials come back to themselves when taking the derivative (with the coefficient in the exponent in front). That makes the differential equation a simple algebraic equation (i.e. without derivatives involved), which is easy to solve.
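To make the ‘derivative becomes algebra’ point a bit more concrete, here is a minimal Python sketch. It assumes the simplest undamped oscillator, m·d²x/dt² = –k·x, with made-up values for m and k (they are not from this post). Trying x(t) = e^(αt) turns the differential equation into the algebraic equation mα² + k = 0, which has no real solution, so α = ±i·√(k/m). The script then checks numerically, with a finite-difference estimate of the second derivative, that the trial solution does satisfy the equation.

```python
import cmath
import math

# Hypothetical oscillator m*x'' = -k*x; the values of m and k are made up
m = 2.0   # mass
k = 8.0   # spring constant

# Trying x(t) = e^(alpha*t) turns the differential equation into the
# algebraic equation m*alpha**2 + k = 0, whose roots are +/- i*sqrt(k/m).
alpha = cmath.sqrt(-k / m)        # square root of a negative number: 2j here
omega0 = math.sqrt(k / m)         # natural angular frequency: equals alpha.imag

def x(t):
    """Trial solution x(t) = e^(alpha*t)."""
    return cmath.exp(alpha * t)

def second_derivative(f, t, h=1e-4):
    """Central finite-difference estimate of f''(t)."""
    return (f(t + h) - 2 * f(t) + f(t - h)) / h ** 2

# m*x'' + k*x should be (numerically) zero at any time t
residuals = [abs(m * second_derivative(x, t) + k * x(t)) for t in (0.0, 0.7, 1.9)]
print(alpha, residuals)
```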

In short, we should just look at complex numbers here (i.e. in the context of my three previous posts, or in the context of differential equations in general) as a computational device, not as an attempt to add an extra spatial dimension to the analysis.

Now, that’s probably the reason why Feynman inserts a chapter on ‘algebra’ that, at first, does not seem to make much sense. As usual, however, I worked through it and then found it to be both instructive and intriguing because it makes the point that complex exponentials are, first and foremost, an algebraic thing, not a geometrical thing.

I’ll try to present his argument here but don’t worry if you can’t or don’t want to follow it all the way through because… Well… It’s a bit ‘weird’ indeed, and I must admit I haven’t quite come to terms with it myself. On the other hand, if you’re ready for some thinking ‘outside of the box’, I assure you that I haven’t found anything like this in a math textbook or on the Web. This shows that Feynman was a bit of a maverick… Well… In any case, I’ll let you judge. Now that you’re here, I would really encourage you to read the whole thing, as loooooooong as it is.

Complex exponentials from an algebraic point of view: introduction

Exponentiation is nothing but repeated multiplication. That’s easy to understand when the exponents are integers: a to the power n (aⁿ) is a×a×a×a×… etcetera – repeated n times, so we have n factors (all equal to a) in the product. That’s very straightforward.

Now, to understand rational exponents (so that’s an m/n exponent, with m and n integers), we just need to understand one thing more, and that is the inverse operation of exponentiation, i.e. the nth root. We then get a^(m/n) = (a^m)^(1/n). So, that’s easy too. […] Well… No. Not that easy. In fact, our problems start right here:

If n is even, and a is a positive real number, we have two real nth roots: ±a^(1/n).
However, if a is negative (and n is still even obviously), then we have a problem. There’s no real nth root of a in that case. That’s why Cardano started working with square roots of negative numbers, which eventually gave us i: we’ll associate an even root of a negative real number with two complex-valued roots.
What if n is odd? Then we have only one real root: it’s positive when a is positive, and negative when a is negative. Done.
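The three cases above can be sketched in a few lines of Python. The function name real_nth_roots is just a hypothetical helper for illustration, and floating-point arithmetic means the roots come out approximately rather than exactly.

```python
def real_nth_roots(a, n):
    """Return the list of real n-th roots of the real number a (n a positive integer)."""
    if a == 0:
        return [0.0]
    if n % 2 == 0:
        if a < 0:
            return []                  # even root of a negative number: no real root
        r = a ** (1.0 / n)
        return [r, -r]                 # two real roots: +r and -r
    # odd n: exactly one real root, with the same sign as a
    r = abs(a) ** (1.0 / n)
    return [r if a > 0 else -r]

print(real_nth_roots(16, 4))    # two roots, +2 and -2
print(real_nth_roots(-16, 4))   # no real root
print(real_nth_roots(-27, 3))   # one root, approximately -3
```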

But let’s not complicate matters from the start. The point here is to do some algebra that should help us to understand complex exponentials. However, I will make one small digression, and that’s on logarithmic functions. It’s not essential but, again, useful. […] Well… Maybe. 🙂 I hope so. 🙂

We know that exponentials are actually associated with two inverse operations:

Given some value y and some number n, we can take the nth root of y (y^(1/n)) to find the original base x for which y = xⁿ.
Given some value y and some number a, we can take the logarithm (to base a) of y to find the original exponent x for which y = a^x.

In the first case, the problem is: given n, find x for which y = xⁿ. In the second case, the problem is: given a, find x for which y = a^x. Is that complicated? Probably. In order to further confuse you, I’ve inserted a thumbnail graph with y = 2^x (so that’s the exponential function with base 2) and y = log₂x (so that’s the logarithmic function with base 2). You can see these two functions mirror each other, with the x = y line as the mirror axis.
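A small Python sketch of those two inverse operations, using 2³ = 8 as the example (the numbers are just illustrative): the cube root gets the base back, the base-2 logarithm gets the exponent back, and the loop checks the mirror symmetry between y = 2^x and y = log₂x.

```python
import math

a, n = 2, 3
y = a ** n                   # exponentiation: 2**3 = 8

base = y ** (1 / n)          # inverse 1: given y and n, the n-th root recovers the base
exponent = math.log2(y)      # inverse 2: given y and the base 2, log2 recovers the exponent

# The mirror symmetry around y = x: log2 undoes 2**x, and 2**x undoes log2
for x in (-1.5, 0.0, 0.5, 3.0):
    assert math.isclose(math.log2(2 ** x), x, abs_tol=1e-12)
    assert math.isclose(2 ** math.log2(x + 2), x + 2, rel_tol=1e-12)

print(y, base, exponent)
```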

We usually find logarithms more ‘difficult’ than roots (I do, for sure), but that’s just because we usually learn about them much later in life – like in a senior high school class, for example, as opposed to a junior high school class (I am just guessing, but you know what I mean).

In addition, we have these extra symbols ‘log’ – L-O-G :-) – to express the function. Indeed, we use just two symbols to write the y = 2^x function: 2 and x – and then the meaning is clear from where we write these: we write 2 in normal script and x as a superscript and so we know that’s exponentiation. But so we’re not so economical for the logarithmic function. Not at all. In fact, we use three symbols for the logarithmic function: (1) ‘log’ (which is quite verbose as a symbol in itself, because it consists of three letters), (2) 2 and (3) x. That’s not economical at all! Indeed, why don’t we just write y = ₂x or something? So that’s a subscript in front, instead of a superscript behind. It would work. It’s just a matter of getting used to it, i.e. it’s just a convention in other words.

Of course, I am joking a bit here but you get my point: in essence, the logarithmic function should not come across as being more ‘difficult’ or less ‘natural’ than the exponential function: exponentiation involves two numbers – a base and an exponent – and, hence, it’s logical that we have two inverse operations, rather than one. [You’ll say that a sum or a product involves (at least) two terms or two factors as well, so why don’t they have two inverse operations? Well… Addition and multiplication are commutative operations: a+b = b+a, and a·b = b·a. Exponentiation isn’t: aⁿ ≠ n^a. That’s why. Check it: 2×3 = 3×2, but 2³ = 8 ≠ 3² = 9.]

Now, apart from us ‘liking’ exponential functions more than logarithmic functions because of the non-relevant fact that we learned about log functions only much later in our life, we will usually also have a strong preference for one or the other base for an exponential. The most preferred base is, obviously, ten (10). We use that base in so-called scientific notation for numbers. For example: the charge of an electron (minus the elementary charge) is approximately –1.6×10⁻¹⁹ coulomb. […] Oh… We have a minus sign in the exponent here (–19). So what’s that? Sorry. I forgot to mention that. But it’s easy: a^(–n) = (aⁿ)^(–1) = 1/aⁿ.

Our most preferred base is 10 because we have a decimal system, and we have a decimal system because we have ten fingers. Indeed, the Maya used a base-20 system because they used their toes to count as well (so they counted in twenties instead of tens), and it also seems that some tribes had octal (base-8) systems because they used the spaces between their fingers, rather than the fingers themselves. And, of course, we all know that computers use a base-2 system because… Well… Because they’re computers. In any case, 10 is called the common base, because… Well… Because it’s common.

However, by now you know that, in physics and mathematics, we prefer that strange number e as a base. However, remember it’s not that strange: it’s just a number like π. Why do we call it ‘natural’? Because of that nice property: the derivative of the exponential function e^x comes back to itself: d(e^x)/dx = e^x. That’s not the case for 10^x. Still, taking the derivative of 10^x is pretty easy too: we just need to put a coefficient in front. To be specific, we need to put the logarithm (to base e) of the base of our exponential function (i.e. 10) in front: d(10^x)/dx = 10^x·ln(10). [Ln(10) is yet another notation that has been introduced, it seems, to confuse young kids and ensure they hate logarithms: ln(10) is just log_e(10) or, if I would have had my way in terms of conventions (which would ensure an ‘economic’ use of symbols), we could also write ln(10) = ₑ10. :-)]
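If you want to check the d(10^x)/dx = 10^x·ln(10) claim numerically, a central-difference estimate of the derivative will do. This is a rough sketch (the step size h = 1e-6 and the sample points are arbitrary choices), and it puts the e^x case next to it for comparison.

```python
import math

def numerical_derivative(f, x, h=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

for x in (-1.0, 0.0, 0.5, 2.0):
    estimate = numerical_derivative(lambda u: 10 ** u, x)
    formula = 10 ** x * math.log(10)              # d(10^x)/dx = 10^x * ln(10)
    natural = numerical_derivative(math.exp, x)   # d(e^x)/dx should equal e^x itself
    print(f"x={x:5.2f}  numeric={estimate:.6f}  10^x*ln10={formula:.6f}  "
          f"(e^x)'={natural:.6f}  e^x={math.exp(x):.6f}")
```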

Stop! I am going way too fast here. We first need to define what irrational powers are! Indeed, from all that I’ve written so far, you can imagine what a^(m/n) is: a^(m/n) = (a^m)^(1/n). But what if the exponent cannot be written as a ratio m/n of two integers at all? What if it equals the square root of 2, for example? In other words, what is 10^x or e^x or 2^x or whatever for irrational exponents?

We all sort of ‘know’ what irrationals are: it involves limits, infinitesimals, fractions of fractions, Dedekind cuts. Whatever: even if you don’t understand a word of what I am writing here, you do – intuitively: irrationals can be approximated by fractions of fractions. The grand idea is that we divide some number by 2, and then we divide by 2 once again (so we divide by 4), and then once again (so we take 1/8), and again (1/16), and so on and so on, each time narrowing down the interval in which our irrational number must lie. That’s not quite what Dedekind cuts are (a Dedekind cut defines a number by splitting all rationals into those below and those above it), but it’s the same idea: we pin a number down with ever finer rational bounds. Of course, dividing by two is a pretty random way of cutting things up. Why don’t we divide by three, or by four, for example? Well… It’s the same as with those other ‘natural’ numbers: we have to start somewhere and so this ‘binary’ way of cutting things up is probably the most ‘natural’. 🙂 [Have you noticed how many ‘natural’ numbers we’ve mentioned already: 10, e, π, 2… And one (1) itself of course. :-)]

So we’ll use something like Dedekind cuts for irrational powers as well. We’ll define them as a sort of limit (in fact, that’s exactly what they are) and so we have to find some approximation (or convergence) process that allows us to do so.

We’ll start with base 10 here because, as mentioned above, base 10 comes across as more ‘natural’ (or ‘common’) to us non-mathematicians than the so-called ‘natural’ base e. However, I should note that the base doesn’t matter much because it’s quite easy to switch from one base to another. Indeed, we can always write a^s = (b^k)^s = b^(ks) = b^t, with a = b^k and t = k·s (k is, obviously, equal to log_b(a)). From this simple formula, you can see that changing base amounts to changing the horizontal scale: we replace s by t = k·s. That’s it. So don’t worry about our choice of base. 🙂
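Here’s a quick sketch of that base switch, taking a = 10 and b = e as an example, so k = log_e(10) = ln(10) ≈ 2.302585 (that number will show up again). The exponents in the loop are arbitrary test values.

```python
import math

a, b = 10.0, math.e          # switch from base 10 to base e
k = math.log(a, b)           # k = log_b(a); here ln(10) ≈ 2.302585

for s in (0.25, 1.0, math.sqrt(2)):
    t = k * s                # rescale the horizontal axis: t = k*s
    assert math.isclose(a ** s, b ** t, rel_tol=1e-12)
    print(f"s={s:.4f}  10^s={a ** s:.6f}  e^(k*s)={b ** t:.6f}")
```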

Complex exponentials from an algebraic point of view: well… Not the introduction 🙂

Ouf! So much stuff! But so here we go. We take base 10 and see what such an approximation of an irrational power of 10 (10^x) looks like. Of course, we can write any irrational number x as some (positive or negative) integer plus an endless series of decimals after the decimal point (e.g. e = 2 + 0.7182818284590452… etc). So let’s just focus on numbers between 0 and 1 for now (so we’ll take the integer out of the total, so to speak). In fact, before we start, I’ll cheat and show you the result, just to make sure you can follow the argument a bit.

Yes. That’s what 10^x looks like, but so we don’t know that yet because we don’t know what irrational powers are, and so we can’t make a graph like that – yet. We only know very general things right now, such as:

10⁰ = 1 and 10¹ = 10 etcetera.
Most importantly, we know that 10^(m/n) = (10^m)^(1/n) = (10^(1/n))^m for integer m and n.

In fact, we’ll use the second fact to calculate 10^x for x = 1/2, 1/4, 1/8, 1/16, and so on and so on. We’ll go all the way down to where x becomes a fraction very close to zero: that’s the table below. Note that the x values in the table are rational fractions 1/2, 1/4, 1/8 etcetera indeed, so x is not an irrational exponent: x is a real number but rational, so x can be expressed either as a fraction of two integers m and n (m = 1 and n = 2, 4, 8, 16, 32 and so on here), or as a decimal number with a finite number of decimals behind the decimal point (0.5, 0.25, 0.125, 0.0625 etcetera).

The third column gives the value 10^x for these fractions x = 1/2, 1/4, 1/8 etcetera. How do we get these? Hmm… It’s true. I am jumping over another hurdle here. The key assumption behind the table is that we know how to take the square root of a number, so that we can calculate 10^(1/2), to quite some precision indeed, as 10^(1/2) = 3.162278 (and there’s more decimals but we’re not too interested in them right now), and then that we can take the square root of that value (3.162278). That’s quite an assumption indeed.

However, if we don’t want this post to become a book in itself, then I must assume we can do that. In fact, I’ve done it with a calculator here but, before there were calculators, this kind of calculation could and had to be done with a table of logarithms. That’s because of a very convenient property of logarithms: log_c(ab) = log_c(a) + log_c(b), so that log_c(√a) = ½·log_c(a): taking a square root amounts to halving a logarithm. However, as said, I should be writing a post here only, not a book. [Already now, this post beats the record in terms of length and verbosity…] So I’ll just ask you to accept that – at this stage – we know how to calculate the square root of something and, therefore, to accept that we can take the square root not only of 10 but of any number really, including 3.162278, and then the root of that number, and then the root of that result, and so on and so on. So that gives us the values in the third column of the table above: they’re successive square roots. [Please do double-check! It will help you to understand what I am writing about here.]

So… Back to the main story. What we are doing in the table above is to take the square root in succession, so that’s (10^(1/2))^(1/2) = 10^(1/4), and then again: (10^(1/4))^(1/2) = 10^(1/8), and then again: (10^(1/8))^(1/2) = 10^(1/16), so we get 10^(1/2), 10^(1/4), 10^(1/8), 10^(1/16), 10^(1/32) and so on and so on. All the way down. Well… Not all the way down. In fact, in the table above, we stop after ten iterations already, so that’s when x = 1/1024. [Note that 1/1024 is 2 to the power minus 10: 2^(–10) = 1/2¹⁰ = 1/1024. I am just throwing that in here because that little ‘fact’ will come in handy later.]
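The successive square roots are easy to reproduce. The sketch below is in Python and, as assumed above, it leans on a square-root routine (math.sqrt). It prints x, 10^x, (10^x – 1)/x and the linear approximation 1 + 2.302585x for the ten iterations; the column layout is my own and may differ from the table in the post.

```python
import math

GRADIENT = 2.302585   # the value the text reads off the fourth column

x, value = 1.0, 10.0  # start from 10^1 = 10
rows = []
for _ in range(10):   # ten iterations: x = 1/2, 1/4, ..., 1/1024
    x /= 2
    value = math.sqrt(value)        # (10^(2x))^(1/2) = 10^x
    slope = (value - 1) / x         # (10^x - 1)/x
    approx = 1 + GRADIENT * x       # linear approximation near x = 0
    rows.append((x, value, slope, approx))

print(f"{'x':<14}{'10^x':<12}{'(10^x-1)/x':<14}{'1+2.302585x':<12}")
for x, value, slope, approx in rows:
    print(f"{x:<14.10f}{value:<12.6f}{slope:<14.6f}{approx:<12.6f}")
```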

Why do we stop after ten iterations? Well… Actually, there’s no good reason to stop at exactly ten iterations. We could have 15 iterations: then x would be 1/2¹⁵ = 1/32768. Or 20 (x = 1/1048576). Or 39 (x = 1/too many digits to write down). Whatever. However, we start to notice something interesting that actually allows us to stop. We note that 10 to the power x (10^x) tends to one as x becomes very small.

Now you’re laughing. Well… Surely! That’s what we’d expect, isn’t it? 10⁰ = 1. Is that the grand conclusion?

No.

The question is how small should x be? That’s where the fourth column of the table above comes in. We’re calculating a number there that converges to some value quite near to 2.3 as x goes to zero and – importantly – it converges rather quickly. In fact, if you’d do the calculations yourself, you’d see that it converges to 2.302585 after a while. [With Excel or some similar application, you can do 20 or more iterations in no time, and so that’s what you’ll find.]
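To see where 2.302585 comes from, you can run the same loop for more iterations. A minimal sketch, with the number of iterations (30) chosen arbitrarily: the gap between (10^x – 1)/x and ln(10) roughly halves with every extra iteration, because (10^x – 1)/x ≈ ln(10) + ln(10)²·x/2 for small x, and floating-point rounding limits how far down it makes sense to go.

```python
import math

slopes = {}
x, value = 1.0, 10.0
for n in range(1, 31):
    x /= 2                          # x = 1/2**n
    value = math.sqrt(value)        # 10**x, by successive square roots
    slopes[n] = (value - 1) / x     # the fourth-column quantity

for n in (10, 15, 20, 25, 30):
    print(f"{n:2d} iterations: (10^x - 1)/x = {slopes[n]:.7f}")
print(f"ln(10)          = {math.log(10):.7f}")
```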

Of course, we can keep going and continue adding zillions of decimals to this number but we don’t want to do that: 2.302585 is fine. We don’t need any more decimals. Why? Well… We’re going to use this number to approximate 10^x near x = 0: it turns out that we can get a real good approximation of 10^x near x = 0 using that 2.302585 factor, so we can write that

10^x ≈ 1 + 2.302585·x

That approximation is the last column in the table above. In order to show you how good it is as an ‘approximation’, I’ve plotted the actual values for 10^x (blue markers) and the approximated values for 10^x (black markers) using that 1 + 2.302585x formula. You can see it’s a pretty good match indeed if x is small. And ‘small’ here is not that small: a ratio like x = 1/8 (i.e. x = 0.125) is good enough already! In fact, the graph below shows that 1/16 = 0.0625 is almost perfect! So we don’t need to ‘go down’ too far: ten iterations is plenty!
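A quick way to quantify ‘pretty good match’ is to compute the relative error of 1 + 2.302585x for the first few fractions. This is only a sketch; the fractions are the ones from the table.

```python
GRADIENT = 2.302585

errors = {}
for n in (2, 4, 8, 16, 32):
    x = 1 / n
    exact = 10 ** x
    approx = 1 + GRADIENT * x
    errors[n] = (exact - approx) / exact     # relative error of the linear approximation
    print(f"x = 1/{n:<3d} 10^x = {exact:.6f}  1 + 2.302585x = {approx:.6f}  "
          f"rel. error = {errors[n]:.2%}")
```

Because 10^x is convex, the straight line stays below the curve, so the error is positive and shrinks as x gets smaller.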

I’ve probably ‘lost’ you by now. What are we doing here really? How did we get that linear approximation formula, and why do we need it? Well… See the fourth column: we calculate (10^x – 1)/x, so that’s the difference between 10^x and 1 divided by the (fractional) exponent x, and we see, indeed, that that number converges to a value very near to 2.302585. Why? Well… (10^x – 1)/x is the slope of the straight line through the points (0, 1) and (x, 10^x) and, as x goes to zero, that slope approaches the gradient of 10^x at x = 0, i.e. the slope of the tangent line to the (non-linear) 10^x curve at that point. That’s what’s shown in the graph below.

Working backwards, we can then re-write (10^x – 1)/x ≈ 2.302585 as 10^x ≈ 1 + 2.302585x indeed.

So what we’ve got here is quite standard: we know we can approximate a non-linear curve with a straight line, using the gradient near the point that we’re observing (and so that’s near the point x = 0 in this case) and so that’s what we’re doing here.

Of course, you should remember that we cannot actually plot a smooth curve like that, for the moment that is, because we can only calculate 10^x for rational exponents x. However, it’s easy to generalize and just ‘fill the gaps’ so to speak, and so that’s how irrational powers are defined really.

Hmm… So what’s the next step? Well… The next step is not to continue and continue and continue and continue etcetera to show that the smooth curve above is, indeed, the graph of 10^x. No. The next step is to use that linear approximation to algebraically calculate the value of 10^is, so that’s a power of 10 with a complex exponent.

HUH!?

Yes. That’s the gem I found in Feynman’s 1965 Lectures. [Well… One of the gems, I should say. There are many. :-)]

It’s quite interesting. In his little chapter on ‘algebra’ (Lectures, I-22), Feynman just assumes that this ‘law’ that 10^x ≈ 1 + 2.302585·x is not only ‘correct’ for small real fractions x but also for very small imaginary exponents, and then he just reverses the procedure above to calculate 10^(ix) for larger values of x. Let’s see how that goes.

However, let’s first switch the variable from x to s, because we’re talking complex numbers now. Indeed, I can’t use the symbol x as I used it above anymore because x is now the real part of some complex number 10^is. In addition, I should note that Feynman introduces this delta (Δ). The idea behind is to make things somewhat easier to read by relating s to an integer: Δ = 1024s, so Δ = 1, 2, 4, 8,… 1024 for s = 1/1024, 1/512, 1/256 etcetera (see the second column in the table below). I am not entirely sure why he does that: Feynman must think fractions are harder to ‘read’. [Frankly, the introduction of this Δ makes Feynman’s exposé somewhat harder to ‘read’ IMHO – but that’s just a matter of taste, I guess.] Of course, the approximation then becomes

10^s ≈ 1 + 2.302585·Δ/1024 = 1 + 0.0022486·Δ.

The table below is the one that Feynman uses. The important thing is that you understand the first line in this table: 10^(i/1024) ≈ 1 + 0.00225i·Δ = 1 + 0.00225i·1 = 1 + 0.00225i. And then we go to the second line: 10^(i/512) = 10^(i/1024)·10^(i/1024) = 10^(2i/1024), so we’re doing the reverse thing here: we don’t take square roots but we square what we’ve found already. So we multiply 1 + 0.00225i with itself and get (1 + 0.00225i)(1 + 0.00225i) = 1 + 2·0.00225i + 0.00225²i² = 1 – 0.000005 + 0.0045i ≈ 0.999995 + 0.0045i ≈ 1 + 0.0045i.

Let’s go to the third line now. In fact, what we’re doing here is working our way back up, i.e. all the way from s = 1/1024 to s = 1. And that’s where the ‘magic’ of i (i.e. the fact that i² = –1) is starting to show: (0.999995 + 0.0045i)² = 0.999995² + 2·0.999995·0.0045i + 0.0045²i² ≈ 0.99999 + 0.009i – 0.00002 ≈ 0.99997 + 0.009i. So the real part of 10^(is) is changing as well – it is decreasing in fact! Why is that? Because of the term with the i² factor! [I write 0.99997 instead of 0.99996 because I round up here, while Feynman consistently rounds down.]

So now the game is clear: we take larger and larger fractions s (1/512, 1/256, 1/128, etcetera), and calculate 10^(is) by squaring the previous result. After ten iterations, we get the grand result for s = 1, i.e. for 10^i itself:

10^i = –0.66928 + 0.74332i (more or less, that is)

Note the minus sign in front of the real part, and look at the intermediate values for x and y too. Isn’t that remarkable?
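Feynman’s recipe can be replayed in a few lines of Python: start from the linear approximation 10^(i/1024) ≈ 1 + 0.0022486i and square ten times. This sketch keeps full floating-point precision instead of rounding each line as the table does, so its digits will not match the table exactly; for comparison, it also prints 10^i as computed by Python’s own complex power (about –0.66820 + 0.74398i).

```python
step = 2.302585 / 1024          # 0.0022486...: the gradient times s = 1/1024
z = complex(1, step)            # 10^(i/1024) ≈ 1 + 0.0022486i (linear approximation)

s = 1 / 1024
for _ in range(10):             # square ten times: s = 1/512, 1/256, ..., 1
    z = z * z                   # 10^(2is) = (10^(is))^2
    s *= 2
    print(f"s = {s:<10.6f} 10^(is) ≈ {z.real:+.5f} {z.imag:+.5f}i")

exact = 10 ** 1j                # Python's complex power, for comparison
print(f"10^i via complex power = {exact.real:+.5f} {exact.imag:+.5f}i")
```

The small gap between the two results comes from starting with a linear approximation, not from the squaring steps, which are exact arithmetic.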

OK. Wow! But… So what? What’s next?

Well… To graph 10^(is), we should not just keep squaring things because that amounts to doubling the exponent again and again, and so s just makes larger and larger jumps along the horizontal axis (see that graph that I made above: the distance between the successive values of x gets larger and larger, and so that’s a bad recipe for a smooth graph).

So what can we do? Well… We should just take a sufficiently small power, 10^(i/8) for example, and multiply it with itself 1, 2, 3 etcetera times, so the exponent goes up in equal steps of i/8: 10^(ip/8) = (10^(i/8))^p. That gives us something more ‘regular’. That’s what’s done in the table below and what’s represented in the graph underneath (to get the scale of the horizontal axis, note that s = p/8).

Hey! Look at that! There we are! That’s the graph we were looking for: it shows a (complex) exponential (10^(is)) as a periodic (complex-valued) function with the real part behaving like a cosine function and the imaginary part behaving like a sine function.

Note the upper and lower bounds: +1 and –1. Indeed, it doesn’t seem to matter whether we use 10 or e as a base: the x and y part oscillate between −1 and +1. So, whatever the base, we’ll see the same pattern: the base only changes the scale of the horizontal axis (i.e. s). However, that being said, because of this scale factor, I do need to say like a cosine/sine function when discussing that graph above. So I cannot say they are a cosine and a sine function. Feynman calls these functions algebraic sine and cosine functions.
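The ‘regular’ version of the table can be sketched the same way: get 10^(i/8) from the squaring recipe, then keep multiplying by it, so the exponent climbs in equal steps of i/8. The range p = 0 to 24 (so s from 0 to 3) is an arbitrary choice, and the cos(s·ln10) and sin(s·ln10) columns are only there for comparison, anticipating the switch to base e. Because of the linear approximation at the start, the modulus creeps slightly above 1 as s grows.

```python
import math

GRADIENT = 2.302585
unit = complex(1, GRADIENT / 1024)   # 10^(i/1024) ≈ 1 + 0.0022486i
for _ in range(7):                   # square seven times: s = 1/1024 -> 1/8
    unit *= unit                     # unit is now ≈ 10^(i/8)

points = []
z = complex(1, 0)                    # p = 0: 10^0 = 1
for p in range(25):                  # s = p/8 runs from 0 to 3
    points.append((p / 8, z))
    z *= unit                        # 10^(i(p+1)/8) = 10^(ip/8) * 10^(i/8)

for s, w in points:
    print(f"s = {s:5.3f}  real = {w.real:+.4f}  imag = {w.imag:+.4f}  "
          f"cos(s*ln10) = {math.cos(s * math.log(10)):+.4f}  "
          f"sin(s*ln10) = {math.sin(s * math.log(10)):+.4f}")
```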

But – remember! – we can always switch base through a clever substitution, so 10^(is) = e^(it) with t = s·ln(10), and recalculate stuff to whatever number of decimals behind the decimal point we’d want. So let’s do that: let’s switch to base e. WOW! What happens?

We then [Finally! you’ll say!] get values that – Surprise! Surprise! – correspond to the real cosine and sine function. That then, in turn, allows us to replace the ‘algebraic’ cosine and sine functions by the ‘real’ cosine and sine functions in an equation that – Yes! – is Euler’s formula itself:

e^(it) = cos(t) + i·sin(t)
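Euler’s formula, and the base switch 10^(is) = e^(it) with t = s·ln(10), are easy to check numerically with Python’s cmath module. This is a check at a handful of arbitrary points, not a proof.

```python
import cmath
import math

for t in (0.0, 1.0, math.pi / 3, 2.302585, 5.0):
    lhs = cmath.exp(1j * t)                        # e^(it)
    rhs = complex(math.cos(t), math.sin(t))        # cos(t) + i*sin(t)
    assert abs(lhs - rhs) < 1e-12

# Switching base: 10^(is) = e^(it) with t = s*ln(10)
for s in (0.125, 0.5, 1.0):
    t = s * math.log(10)
    assert abs(10 ** (1j * s) - cmath.exp(1j * t)) < 1e-12
    print(f"s={s}: 10^(is) = {10 ** (1j * s):.5f}, "
          f"cos(t) + i sin(t) = {complex(math.cos(t), math.sin(t)):.5f}")
```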

So that’s it. End of story.

[…]

You’ll say: So what? Well… Not sure what to say. I think this is rather remarkable. This is not the formal mathematical proof of Euler’s formula (at least not of the kind that you’ll find in a textbook or on Wikipedia). No, we are just calculating the values x and y of e^(it) = x + iy using an approximation process used to calculate real powers and then, well… Just some bold assumption involving infinitesimals really.

I think this is amazing stuff (even if I’ll downplay that statement a bit in my post scriptum). I really don’t understand these things the way I would like to understand them. I guess I just haven’t got the right kind of brain for these things. 😦 Indeed, just think about it: when we have the real exponential e^x, then we’ve got that typical ‘rocket’ graph (i.e. the blue one in the graph below): just something blasting away indeed. But when we put i in the exponent (e^(ix)), then we get two components oscillating up and down like the cosine and sine function. Well… Not only like the cosine and sine function: the green and red lines – i.e. the real and imaginary parts of e^(ix)! – actually are the cosine and sine function!

Do you understand this in an intuitive way? Yes? You do? Wow! Please write me and tell me how. I don’t. 😦

Oh well… The good thing about it is… Well… At least complex numbers will always stay ‘magical’ to me. 🙂

Post scriptum: When I write, above, that I don’t understand this in an intuitive way, I don’t mean to say it’s not logical. In fact, it is. It has to be, of course, because we’re talking math here! 🙂

The logic is pretty clear indeed. We have an exponential function here (y = 10^x) and we’re evaluating that function in the neighborhood of x = 0 (we do it on the positive side only but we could, of course, do the same analysis on the other side as well). So then we use that very general mathematical procedure of calculating approximate values for the (non-linear) 10^x curve using the gradient. So we plug in some differential value for x (in differential terms, we’d write Δx – but so the delta symbol here has nothing to do with Feynman’s Δ above) and, of course, we find Δy = 2.302585·Δx. So we add that to 1 (the value of 10^x at point x = 0) and, then, we go through these iterations, not using that linear equation any more, but the very fundamental property of an exponential function that 10^(2x) = (10^x)². So we start with an approximate value, but then the value we plug into these iterative calculations is the square of the previous value. So, to calculate the next points, we do not use an approximation method any more, but we just square the first result, and then the second and so on and so on, and that’s just calculation, not approximation.

[In fact, you may still wonder and think that it’s quite remarkable that the points we calculate using this process are so accurate, but that’s due to the rapid convergence of that value we found for the gradient. Well… Yes and no. Here I must admit that Feynman (and I) cheated a bit because we used a rather precise value for the gradient: 2.302585, so that’s six digits after the decimal point. Now, that value is actually calculated based on twenty (rather than 10) iterations when ‘going down’. But that little factoid is not embarrassing because it doesn’t change much: the argument itself is sound. Very sound.]

OK… That’s easy enough to understand. The thing that is not easy to understand – intuitively that is – is that we can just insert some complex differential Δs into that Δy = 2.302585·Δx equation. Isn’t it ‘weird’, indeed, that we can just use an imaginary fraction i·s = i/1024 to calculate our first point, instead of a real fraction x = 1/1024? It is. That’s the only thing really. Indeed, once we’ve done that, it’s plain sailing again: we just square the result to get the next result, and then we square that again, and so on and so on. However, that being said, the difference is that the ‘magic’ of i comes into play indeed. When squaring, we do not get a real result a² but (a + bi)² = a² – b² + 2abi. So it’s that minus sign and the i that give an entirely different ‘dynamic’ to how the function evolves from there (i.e. different as compared to working with real exponents only). It’s all quite remarkable really because we start off with a really tiny value b here: 0.00225 to be precise, so that’s about 1/445! [Of course, the real part a, at the point from where we start doing these iterations, is one.]

But so that first step is ‘weird’ indeed. Why is it no problem whatsoever to insert the imaginary value i/1024 for s into 1 + 2.302585·s, instead of the real fraction 1/1024, and then afterwards, to square these complex numbers that we’re getting, instead of real numbers?

It just doesn’t feel right, does it? I must admit that, at first, I felt that Feynman was doing something ‘illegal’ too. But, obviously, he’s not. It’s plain mathematical logic. We have two functions here: one is linear (y = 1 + 2.302585·x), and the other is quadratic (y = x²) and so what’s happening really is that, at the point x = 0, we change the function. We don’t just put ix in the place of x: we replace the function y = 10^x by the function y = 10^(ix). So we still have an independent real variable x but, instead of a real-valued y = 10^x function, we now have a complex-valued y = 10^(ix) function.

However, the ‘output’ of that function, of course, is a complex y, not a real y. In our case, because we’re plotting a function really – to be precise, we’re calculating the exponential function y = 10^(ix) through all these iterations – we get a complex-valued function of the shape that, by now, we know so well.

So it is ‘discontinuous’ in a way, and so I can’t say all that much about it. Look at the graph below where, once again, we have the real exponential function e^x and then the two components of the complex exponential e^(ix). This time, I’ve plotted them on both sides of the zero point because they’re continuous on both sides indeed. Imagine we’re walking along this blue e^x curve from some negative x to zero. We’re familiar with the path. It has, for instance, that property we exploited above: as we doubled the ‘input’ (so from x we went to 2x), the ‘output’ went up not as the double but as the square of the original value: e^(2x) = (e^x)². And then we also know that, around the point x = 0, we can approximate it with a linear function. In fact, in this case, the linear approximation is super-simple: y = 1 + x. Indeed, the gradient for e^x at point x = 0 is equal to 1! So, yes, we know and understand that blue curve. But then we arrive at point x = 0 and we decide something radical: we change the function!

Yes. That’s what we’re really doing in that very lengthy story above: e^(ix) is a complex-valued function of the real variable x. That’s something different. However, we continue to say that the approximation y = 1 + x must also be valid for complex x and y. So we say that e^(ix) ≈ 1 + ix. Is that wrong? No. Not at all. Functional forms are functional forms and gradients are gradients: d(e^(ix))/dx = i·e^(ix), and i·e^(ix) at x = 0 is equal to i·e⁰ = i! Hence, e^(ix) ≈ 1 + ix is a perfectly legitimate linear approximation. And then it’s just the same thing again: we use that iteration mechanism to calculate successive squares of complex numbers because, for complex exponentials as well, we have e^(2ix) = (e^(ix))².
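The same two ingredients work for base e, and this sketch checks both numerically: a central-difference estimate of the gradient of e^(ix) at x = 0 (which should come out as i), and the recipe of starting from e^(i/1024) ≈ 1 + i/1024 and squaring ten times, which lands close to e^i = cos(1) + i·sin(1). The step h and the 1/1024 starting point are arbitrary choices.

```python
import cmath
import math

# Gradient of e^(ix) at x = 0, estimated with a central difference: expect i
h = 1e-6
gradient = (cmath.exp(1j * h) - cmath.exp(-1j * h)) / (2 * h)

# Linear approximation at x = 1/1024, then ten squarings up to x = 1
z = 1 + 1j / 1024                 # e^(i/1024) ≈ 1 + i/1024
for _ in range(10):
    z = z * z                     # e^(2ix) = (e^(ix))^2

print(gradient)
print(z, complex(math.cos(1), math.sin(1)))
```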

So. The ‘magic’ is a lot of ‘confusion’ really. The point to note is that we do have a different function here: e^(ix) and e^x ‘look’ similar – it’s just that i, right? – but, in fact, when we replace x by ix in the exponent of e, that’s quite a radical change. We can use the same linear approximation at x = ix = 0 but then it’s over. Our blue graph stops: we’re no longer walking along it. I can’t even say it bifurcates, so to say, into the red and the green one, because it doesn’t. We’re talking apples and oranges indeed, and so the comparison is quickly done: they’re different. Full stop.

Is there any geometrical relationship between all these curves? Well… Yes and no. I can see one, at the very start: the gradient of our e^x function at x = 0 is equal to unity (i.e. 1), and so that’s the same gradient as the gradient of the imaginary part of our new e^ix function (the gradient of the real part is zero, before it becomes negative). But that’s just… I mean… That just comes out of differentiating Euler’s formula at x = 0: d(e^ix)/dx = i·e^ix, which equals i = 0 + i·1 at x = 0. Honestly, it’s no use to try to be smart here and think about stuff like that. We’re no longer walking on the blue curve. We’re looking at a new function: a complex-valued function e^ix (instead of a real-valued function e^x) of a real variable (x). That’s it. Just don’t try to relate the two too much: you switched functions. Full stop. It’s like changing trains! 🙂

So… What’s the conclusion? Well… I’d say: “Complex numbers can be analyzed as extensions of real numbers, so to say, but – frankly – they are different.”

[…]

I’ll probably never understand complex numbers in the way I would like to understand them, that is, like I understand that one plus one is two. However, this rather lengthy foray into the complex forest has helped me somewhat. I hope it helped you too.

Differential equations revisited: the math behind oscillators

Pre-scriptum (dated 26 June 2020): This post, part of a series of rather simple posts on elementary math and physics, does not seem to have been targeted in the attack by the dark force, which is good because I still like it. While my views on the true nature of light, matter and the force or forces that act on them have evolved significantly as part of my explorations of a more realist (classical) explanation of quantum mechanics, I think most (if not all) of the analysis in this post remains valid and fun to read. In fact, I would dare to say the whole Universe consists of oscillators!

Original post:

When wrapping up my previous post, I said that I might be tempted to write something about how to solve these differential equations. The math behind them is pretty essential indeed. So let’s revisit the oscillator from a formal-mathematical point of view.

Modeling the problem

The simplest equation we used was the one for a hypothetical ‘ideal’ oscillator without friction and without any external driving force. The equation for a mechanical oscillator (i.e. a mass on a spring) is m·d²x/dt² = –kx. The k in this equation is a factor of proportionality: the force pulling back is assumed to be proportional to the amount of stretch, and the minus sign is there because the force is pulling back indeed. As for the equation itself, it’s just Newton’s Law: the mass times the acceleration equals the force: ma = F.

You’ll remember we preferred to write this as d²x/dt² = –(k/m)x = –ω₀²x with ω₀² = k/m. You’ll also remember that ω₀ is an angular frequency, which we referred to as the natural frequency of the oscillator (because it determines the natural motion of the spring indeed). We also gave the general solution to the differential equation: x(t) = x₀cos(ω₀t + Δ). That solution basically states that, if we just let go of that spring, it will oscillate with frequency ω₀ and some (maximum) amplitude x₀, the value of which depends on the initial conditions. As for the Δ term, that’s just a phase shift depending on where x is when we start counting time: if x would happen to pass through the equilibrium point at time t = 0, then Δ would be π/2. So Δ allows us to shift the beginning of time, so to speak.

In my previous posts, I just presented that general equation as a fait accompli, noting that a cosine (or sine) function does indeed have that ‘nice’ property of coming back to itself with a minus sign in front after taking the derivative two times: d²[cos(ω₀t)]/dt² = –ω₀²cos(ω₀t). We could also write x(t) as a sine function because the sine and cosine function are basically the same except for a phase shift: x₀cos(ω₀t + Δ) = x₀sin(ω₀t + Δ + π/2).

Now, the point to note is that the sine or cosine function actually has two properties that are ‘nice’ (read ‘essential’ in the context of this discussion):

Sinusoidal functions are periodic functions and so that’s why they represent an oscillation–because that’s something periodic too!
Sinusoidal functions come back to themselves (with a minus sign in front) when we differentiate them twice, and so that’s why they effectively solve our second-order differential equation.

However, in my previous post, I also mentioned in passing that sinusoidal functions share that second property with exponential functions: d²e^t/dt² = d[de^t/dt]/dt = de^t/dt = e^t. So, if we had not had that minus sign in our differential equation, our solution would have been some exponential function, instead of a sine or a cosine function. So what’s going on here?

Solving differential equations using exponentials

Let’s scrap that minus sign and assume our problem would indeed be to solve the d²x/dt² = ω₀²x equation. So we know we should use some exponential function, but we have that coefficient ω₀². Well… That’s actually easy to deal with: we know that, when differentiating an exponential function, we should bring the coefficient in the exponent down as a factor: d[e^ω₀t]/dt = ω₀e^ω₀t. If we do it two times, we get d²[e^ω₀t]/dt² = ω₀²e^ω₀t, so we can immediately see that e^ω₀t is a solution indeed.

But it’s not the only one: e^–ω₀t is a solution too: d²[e^–ω₀t]/dt² = (–ω₀)(–ω₀)e^–ω₀t = ω₀²e^–ω₀t. So e^–ω₀t solves the equation too. It is easy to see why: ω₀² has two square roots, one positive and one negative.

But we have more: in fact, every linear combination c₁e^ω₀t + c₂e^–ω₀t is also a solution to that second-order differential equation. Just check it by writing it all out: you’ll find that d²[c₁e^ω₀t + c₂e^–ω₀t]/dt² = ω₀²[c₁e^ω₀t + c₂e^–ω₀t] and so, yes, we have a whole family of functions here, which are all solutions to our differential equation.

Now, you may or may not remember that we had the same thing with first-order differential equations: we would find a whole family of functions, but only one would be the actual solution or the ‘real‘ solution I should say. So what’s the real solution here?

Well… That depends on the initial conditions: we need to know the value of x at time t₀ = 0 (or some other point t = t₁). And that’s not enough: we have two coefficients (c₁ and c₂) and, therefore, we need one more initial condition (it takes two equations to solve for two variables). That could be another value for x at some other point in time (e.g. t₂) but, when solving problems like this, you’ll usually get the other ‘initial condition’ expressed in terms of the first derivative, so that’s in terms of dx/dt = v. For example, it is not illogical to assume that the initial velocity v₀ would be zero. Indeed, we can imagine we pull or push the spring and then let it go. In fact, that’s what we’ve been assuming here all along in our example! Assuming that v₀ = 0 is equivalent to writing that

d[c₁e^ω₀t + c₂e^–ω₀t]/dt = 0 for t = 0

⇒ ω₀c₁ – ω₀c₂ = 0 (because e⁰ = 1) ⇔ c₁ = c₂

Now we need the other initial condition. Let’s assume the initial value of x is equal to x₀= 2 (it’s just an example: we could take any value, including negative values). Then we get:

c₁e^ω₀t + c₂e^–ω₀t = 2 for t = 0 ⇔ c₁ + c₂ = 2 (again, note that e⁰ = 1)

Combining the two gives us the grand result that c₁ = c₂ = 1 and, hence, the ‘real’ or actual solution is x = e^ω₀t + e^–ω₀t. The graph below plots that function for ω₀ = 1 and ω₀ = 0.5 respectively. We could take other values for ω₀ but, whatever the value, we’ll always get an exponential function like the ones below. It basically graphs what we expect to happen: the mass just accelerates away from its equilibrium point. Indeed, the differential equation is just a description of an accelerating object: the e^–ω₀t term quickly goes to zero, and then it’s the e^ω₀t term that rockets that object sky-high, literally. [Note that the acceleration is actually not constant: the force is equal to kx and, hence, the force (and, therefore, the acceleration) actually increases as the mass goes further and further away from its equilibrium point. Also note that if the initial position had been minus 2, i.e. x₀ = –2, then the object would accelerate away in the other direction, i.e. downwards. Just check it to make sure you understand the equations.]
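As a numerical check of this example, here is a small Python sketch. The helper names (coefficients, x, second_derivative, f) are hypothetical and the finite-difference step h is an arbitrary choice. It solves the two initial conditions c₁ + c₂ = x₀ and ω₀c₁ – ω₀c₂ = v₀ for c₁ and c₂, and then uses a central finite difference to confirm that x(t) = c₁e^ω₀t + c₂e^–ω₀t satisfies d²x/dt² = ω₀²x.

```python
import math


def coefficients(x0: float, v0: float, w0: float) -> tuple[float, float]:
    """Solve c1 + c2 = x0 and w0*c1 - w0*c2 = v0 for c1 and c2."""
    c1 = (x0 + v0 / w0) / 2
    c2 = (x0 - v0 / w0) / 2
    return c1, c2


def x(t: float, c1: float, c2: float, w0: float) -> float:
    """x(t) = c1*e^(w0*t) + c2*e^(-w0*t)."""
    return c1 * math.exp(w0 * t) + c2 * math.exp(-w0 * t)


def second_derivative(f, t: float, h: float = 1e-4) -> float:
    """Central finite-difference estimate of d²f/dt² at t."""
    return (f(t + h) - 2 * f(t) + f(t - h)) / h**2


w0 = 1.0
c1, c2 = coefficients(x0=2.0, v0=0.0, w0=w0)   # the example in the text
print(c1, c2)                                   # 1.0 1.0


def f(t: float) -> float:
    return x(t, c1, c2, w0)


for t in (0.0, 1.0, 2.0):
    # x'' - w0^2 * x should be (numerically) close to zero
    print(t, f(t), second_derivative(f, t) - w0**2 * f(t))
```

Calling coefficients(-2.0, 0.0, w0) instead gives c₁ = c₂ = –1, i.e. the mirror-image solution that accelerates downwards.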

The point to note is our general solution. More formally, and more generally, we get it as follows:

If we have a linear second-order differential equation ax” + bx’ + cx = 0 (because of the zero on the right-hand side, we call such an equation homogeneous, so it’s quite a mouthful: a linear and homogeneous DE of the second order), then we can find an exponential function e^rt that will be a solution for it.
If such a function is a solution, then plugging it in yields ar²e^rt + bre^rt + ce^rt = 0 or (ar² + br + c)e^rt = 0.
Now, we can read that as a condition and, because e^rt is never zero, the condition amounts to ar² + br + c = 0. So that’s a quadratic equation we need to solve for r to find two specific solutions r₁ and r₂, which, in turn, will then yield our general solution:

x(t) = c₁e^r₁t + c₂e^r₂t

Note that the general solution is based on the principle of superposition: any linear combination of two specific solutions will be a solution as well. I am mentioning this here because we’ll use that principle more than once.
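The recipe above (solve ar² + br + c = 0, then combine e^r₁t and e^r₂t) can be sketched in Python as follows. This is an illustration only: the names characteristic_roots and residual are made up for this example, and the derivatives are estimated with finite differences rather than computed exactly. Using cmath.sqrt means the same formula also returns complex roots when the discriminant is negative.

```python
import cmath


def characteristic_roots(a: float, b: float, c: float) -> tuple[complex, complex]:
    """Roots r1, r2 of the characteristic equation a*r**2 + b*r + c = 0.

    cmath.sqrt also accepts a negative discriminant, so complex roots
    come out of the same formula.
    """
    disc = cmath.sqrt(b * b - 4 * a * c)
    return (-b + disc) / (2 * a), (-b - disc) / (2 * a)


def residual(a, b, c, r1, r2, c1, c2, t, h=1e-4):
    """a*x'' + b*x' + c*x for x = c1*e^(r1 t) + c2*e^(r2 t), with the
    derivatives estimated by central finite differences."""
    def x(s):
        return c1 * cmath.exp(r1 * s) + c2 * cmath.exp(r2 * s)
    d1 = (x(t + h) - x(t - h)) / (2 * h)
    d2 = (x(t + h) - 2 * x(t) + x(t - h)) / h**2
    return a * d2 + b * d1 + c * x(t)


# x'' = 4x, i.e. x'' + 0*x' - 4x = 0: roots +2 and -2
r1, r2 = characteristic_roots(1.0, 0.0, -4.0)
print(r1, r2)
# Any linear combination (here c1 = 0.3, c2 = -1.7) should give a residual ~ 0
print(abs(residual(1.0, 0.0, -4.0, r1, r2, 0.3, -1.7, t=0.5)))
```

The coefficients 0.3 and –1.7 are arbitrary: that is the superposition principle at work, since any pair gives a (near-)zero residual.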

Complex roots

The steps as described above implicitly assume that the quadratic equation above (i.e. ar² + br + c = 0), which is better known as the characteristic equation, does yield two real and distinct roots r₁ and r₂. In fact, it amounts to assuming that the exponential e^rt is a real-valued exponential function. We know how to find these real roots from our high school math classes: r = (–b ± [b² – 4ac]^1/2)/2a. However, what happens if the discriminant b² – 4ac is negative?

If the discriminant is negative, we will still have two roots, but they will be complex roots. In fact, we can write these two complex roots as r = α ± βi, with i the imaginary unit. Hence, the two complex roots are each other’s complex conjugate and our e^r₁t and e^r₂t can be written as:

e^r₁t = e^(α+βi)t and e^r₂t = e^(α–βi)t

Also, the general solution based on these two particular solutions will be c₁e^(α+βi)t + c₂e^(α–βi)t.

[You may wonder why complex roots have to be complex conjugates of each other. Indeed, that’s not so obvious from the raw r = (–b ± [b² – 4ac]^1/2)/2a formula. But you can re-write it as r = –b/2a ± [b² – 4ac]^1/2/2a and, if b² – 4ac is negative, as r = –b/2a ± i·[(4ac – b²)^1/2/2a]. So that gives you α = –b/2a and β = (4ac – b²)^1/2/2a, and shows that the two roots are, in effect, each other’s complex conjugate.]
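A quick numerical illustration of the conjugate pair, using the arbitrary example a = 1, b = 2, c = 5 (so that b² – 4ac = –16): the two roots come out as α ± βi, with α = –b/2a and β = (4ac – b²)^1/2/2a.

```python
import cmath
import math

a, b, c = 1.0, 2.0, 5.0                    # illustrative: b*b - 4*a*c = -16 < 0
disc = cmath.sqrt(b * b - 4 * a * c)       # square root of a negative number
r1 = (-b + disc) / (2 * a)
r2 = (-b - disc) / (2 * a)

alpha = -b / (2 * a)                            # common real part
beta = math.sqrt(4 * a * c - b * b) / (2 * a)   # size of the imaginary part
print(r1, r2)
print(alpha, beta)                              # -1.0 2.0
print(cmath.isclose(r2, r1.conjugate()))        # True: the roots are conjugates
```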

We should briefly pause here to think about what we are really doing: if we allow r to be complex, then what we’re really doing is allowing a complex-valued function (to be precise: we’re talking the complex exponential functions e^(α±βi)t, or any linear combination of the two) of a real variable (the time variable t) to be part of our ‘solution set’ as well.

Now, we’ve analyzed complex exponential functions before, a long time ago: you can check out some of my posts of last year (November 2013). In fact, we analyzed even more complex functions there (or, I should say, more complicated rather than more complex: complex numbers don’t need to be complicated! 🙂) because we were talking complex-valued functions of complex variables there! That’s not the case here: the argument t (i.e. the input into our function) is real, not complex, but the output (or the function itself) is complex-valued. Now, any complex exponential e^(α+βi)t can be written as e^αt·e^iβt, and so that’s easy enough to understand:

1. The first factor (i.e. e^αt) is just a real-valued exponential function and so we should be familiar with that. Depending on the value of α (negative or positive: see the graph below), it’s a factor that will create an envelope for our function. Indeed, when α is negative, the damping will cause the oscillation to stop after a while. When α is positive, we’ll have a solution resembling the second graph below: we have an amplitude that’s getting bigger and bigger, despite the friction factor (that’s obviously possible only because we keep reinforcing the movement, so we’re not switching off the force in that case). When α is equal to zero, then e^αt is equal to unity and so the amplitude will not change as the spring goes up and down over time: we have no friction in that case.

2. The second factor (i.e. e^iβt) is our periodic function. Indeed, e^iβt is the same as e^iθ and so just remember Euler’s formula to see what it is really:

e^iθ = cos(θ) + i·sin(θ)

The two graphs below represent the idea: as the phase θ = ωt + Δ (the angular frequency or velocity times the time is equal to the phase, plus or minus some phase shift) goes round and round and round (i.e. increases with time), the two components of e^iθ, i.e. the real and imaginary parts of e^iθ, oscillate between –1 and 1 because they are both sinusoidal functions (cosine and sine respectively). Now, we could scale the amplitude by putting another (real) factor in front (a magnitude different from 1) and write re^iθ = r·cos(θ) + i·r·sin(θ), but that wouldn’t change the nature of this thing.
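The factorization e^(α+βi)t = e^αt·e^iβt and Euler’s formula can be checked numerically. In this sketch, the function name check_factorization and the sample values of α, β and t are illustrative assumptions. The modulus of e^(α+βi)t equals the envelope e^αt, which shrinks, stays at 1, or grows depending on whether α is negative, zero or positive.

```python
import cmath
import math


def check_factorization(alpha: float, beta: float, t: float) -> tuple[complex, complex]:
    """Return e^((alpha + i*beta)t) computed directly, and the same number
    built as e^(alpha t) * [cos(beta t) + i*sin(beta t)] via Euler's formula."""
    direct = cmath.exp(complex(alpha, beta) * t)
    envelope = math.exp(alpha * t)             # real-valued envelope
    theta = beta * t                           # phase
    euler = complex(math.cos(theta), math.sin(theta))
    return direct, envelope * euler


for alpha in (-0.3, 0.0, 0.3):                 # decaying, steady, growing
    direct, rebuilt = check_factorization(alpha, beta=2.0, t=5.0)
    print(f"alpha={alpha:+.1f}: |z|={abs(direct):.4f}, "
          f"agree={cmath.isclose(direct, rebuilt)}")
```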

But how does all of this relate to that other ‘general’ solution which we’ve found for our oscillator, i.e. the one we got without considering these complex-valued exponential functions as solutions? Indeed, what’s the relation between that x = x₀cos(ω₀t + Δ) equation and that rather frightening c₁e^(α+βi)t + c₂e^(α–βi)t equation? Perhaps we should look at x = x₀cos(ω₀t + Δ) as the real part of that monster? Yes and no. More no than yes actually. Actually… No. We are not going to have some complex exponential and then forget about the imaginary part. What we will do, though, is to find that general solution (i.e. a family of complex-valued functions) but then we’ll only consider those functions for which the imaginary part is zero, so that’s the subset of real-valued functions only.

I guess this must sound like Chinese. Let’s go step by step.

Using complex roots to find real-valued functions

If we re-write d²x/dt² = –ω₀²x in the more general ax” + bx’ + cx = 0 form, then we get x” + ω₀²x = 0 and so the discriminant b² – 4ac is equal to –4ω₀², and so that’s a negative number. So we need to go for these complex roots. However, before solving this, let’s first restate what we’re actually doing. We have a differential equation that, ultimately, depends on a real variable (the time variable t), but now we allow complex-valued functions e^r₁t = e^(α+βi)t and e^r₂t = e^(α–βi)t as solutions. To be precise: these are complex-valued functions x of the real variable t.

That being said, it’s fine to note that real numbers are a subset of the complex numbers and so we can just shrug our shoulders and say that all we’re doing is switching to complex-valued functions because we got stuck with that negative discriminant and so we had to allow for complex roots. However, in the end, we do want a real-valued solution x(t). So our x(t) = c₁e^(α+βi)t + c₂e^(α–βi)t has to be a real-valued function, not a complex-valued function.

That means that we have to take a subset of the family of functions that we’ve found. In other words, the imaginary part of c₁e^(α+βi)t + c₂e^(α–βi)t has to be zero. How can it be zero? Well… It basically means that c₁e^(α+βi)t and c₂e^(α–βi)t have to be complex conjugates.

OK… But how do we do that? We need to find a way to write that c₁e^(α+βi)t + c₂e^(α–βi)t sum in a more manageable ζ + i·η form. We can do that by using Euler’s formula once again to re-write those two complex exponentials as follows:

e^(α+βi)t = e^αt·e^iβt = e^αt[cos(βt) + i·sin(βt)]
e^(α–βi)t = e^αt·e^–iβt = e^αt[cos(–βt) + i·sin(–βt)] = e^αt[cos(βt) – i·sin(βt)]

Note that, for the e^(α–βi)t expression, we’ve used the fact that cos(–θ) = cos(θ) and that sin(–θ) = –sin(θ). Also note that α and β are real numbers, so they do not have an imaginary part, unlike c₁ and c₂, which may or may not have an imaginary part (i.e. they could be pure real numbers, but they could be complex as well).

We can then re-write that c₁e^(α+βi)t+ c₂e^(α–βi)t sum as:

c₁e^(α+βi)t + c₂e^(α–βi)t = c₁e^αt[cos(βt) + i·sin(βt)] + c₂e^αt[cos(βt) – i·sin(βt)]

= (c₁ + c₂)e^αt·cos(βt) + i(c₁ – c₂)e^αt·sin(βt)

So what? Well, we want that imaginary part in our solution to disappear, and it’s easy to see that the imaginary part will indeed disappear if c₁ – c₂ = 0, i.e. if c₁ = c₂ = c. So we have a fairly general real-valued solution x(t) = 2c·e^αt·cos(βt) here, with c some real number. [Note that c has to be some real number because, if we would assume that c₁ and c₂ (and, therefore, c) would be equal complex numbers, then the c₁ – c₂ factor would also disappear, but then we would have a complex c₁ + c₂ sum in front of the e^αt·cos(βt) factor, so that would defeat the purpose of finding a real-valued function as a solution because (c₁ + c₂)e^αt·cos(βt) would still be complex! […] Are you still with me? :-)]

So, OK, we’ve got the solution and so that should be it, shouldn’t it? Well… No. Wait. Not yet. Because these coefficients c₁ and c₂ may be complex, there’s another solution as well. Look at that formula above. Let us suppose that c₁ would be equal to some (real) number c divided by i (so c₁ = c/i), and that c₂ would be its opposite, so c₂ = –c₁ (i.e. minus c₁). Then we would have two complex numbers consisting of an imaginary part only: c₁ = c/i and c₂ = –c₁ = –c/i, and they would be each other’s complex conjugate. Indeed, note that 1/i = i^–1 = –i and so we can write c₁ = –c·i and c₂ = c·i. Then we’d get the following for that c₁e^(α+βi)t + c₂e^(α–βi)t sum:

(c₁ + c₂)e^αt·cos(βt) + i(c₁ – c₂)e^αt·sin(βt)

= (c/i – c/i)e^αt·cos(βt) + i(c/i + c/i)e^αt·sin(βt) = 2c·e^αt·sin(βt)

So, while c₁ and c₂ are complex, our grand result is a real-valued function once again or, to be precise, another family of real-valued functions (that’s because c can take on any value).

Are we done? Yes. There are no other possibilities. So now we just need to remember to apply the principle of superposition: any (real) linear combination of 2c·e^αt·cos(βt) and 2c·e^αt·sin(βt) will also be a (real-valued) solution, so the general (real-valued) solution for our problem is:

x(t) = a·2c·e^αtcos(βt) + b·2c·e^αtsin(βt) = Ae^αtcos(βt) + Be^αtsin(βt)

= e^αt[Acos(βt) + Bsin(βt)]

So what do we have here? Well, the first factor is, once again, an ‘envelope’ function: depending on the value of α, (i) negative, (ii) positive or (iii) zero, we have an oscillation that (i) damps out, (ii) goes out of control, or (iii) keeps oscillating in the same steady way forever.
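To make this concrete, the following sketch takes an illustrative equation x” + 2x’ + 5x = 0 (so α = –1 and β = 2) and checks with finite differences that e^αt[Acos(βt) + Bsin(βt)] satisfies it. The values A = 1 and B = 0.5 and the helper names are arbitrary choices for this example; any real A and B should work, and the negative α gives the damped case (i).

```python
import math


def real_solution(alpha: float, beta: float, A: float, B: float):
    """Return x(t) = e^(alpha t) * [A cos(beta t) + B sin(beta t)]."""
    def x(t: float) -> float:
        return math.exp(alpha * t) * (A * math.cos(beta * t) + B * math.sin(beta * t))
    return x


def ode_residual(f, a: float, b: float, c: float, t: float, h: float = 1e-4) -> float:
    """a*f'' + b*f' + c*f at t, with central finite-difference derivatives."""
    d1 = (f(t + h) - f(t - h)) / (2 * h)
    d2 = (f(t + h) - 2 * f(t) + f(t - h)) / h**2
    return a * d2 + b * d1 + c * f(t)


a, b, c = 1.0, 2.0, 5.0                        # illustrative: x'' + 2x' + 5x = 0
alpha = -b / (2 * a)                           # -1.0: damped envelope
beta = math.sqrt(4 * a * c - b * b) / (2 * a)  # 2.0
x = real_solution(alpha, beta, A=1.0, B=0.5)
for t in (0.0, 1.0, 3.0):
    print(t, x(t), ode_residual(x, a, b, c, t))  # residual ~ 0
```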

The second part is equivalent to our ‘general’ x(t) = x₀cos(ω₀t + Δ) solution. Indeed, that x(t) = x₀cos(ω₀t + Δ) solution is somewhat less ‘general’ than the one above because it does not have the e^αt factor. However, the x(t) = x₀cos(ω₀t + Δ) solution is equivalent to the Acos(βt) + Bsin(βt) factor. How’s that? We can show how they are related by using the trigonometric formula for adding angles: cos(u + v) = cos(u)cos(v) – sin(u)sin(v). Indeed, we can write:

x₀cos(ω₀t + Δ) = x₀cos(Δ)cos(ω₀t) – x₀sin(Δ)sin(ω₀t) = Acos(βt) + Bsin(βt)

with A = x₀cos(Δ), B = –x₀sin(Δ) and, finally, β = ω₀
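The correspondence between (x₀, Δ) and (A, B) is easy to check numerically. The sketch below is illustrative: the function names and sample values are assumptions, and the inverse map x₀ = (A² + B²)^1/2, Δ = atan2(–B, A) returns a positive x₀ and a Δ between –π and π.

```python
import math


def to_ab(x0: float, delta: float) -> tuple[float, float]:
    """A = x0*cos(delta), B = -x0*sin(delta)."""
    return x0 * math.cos(delta), -x0 * math.sin(delta)


def to_amplitude_phase(A: float, B: float) -> tuple[float, float]:
    """Inverse map: x0 = sqrt(A^2 + B^2) and delta = atan2(-B, A)."""
    return math.hypot(A, B), math.atan2(-B, A)


x0, delta, w0 = 1.5, 0.7, 2.0      # illustrative values
A, B = to_ab(x0, delta)
for t in (0.0, 0.4, 1.3):
    lhs = x0 * math.cos(w0 * t + delta)
    rhs = A * math.cos(w0 * t) + B * math.sin(w0 * t)
    print(f"t={t}: {lhs:.12f} {rhs:.12f}")
print(to_amplitude_phase(A, B))     # recovers (1.5, 0.7) up to rounding
```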

Are you convinced now? If not… Well… Nothing much I can do, I feel. In that case, I can only encourage you to do a full ‘work-out’ by reading the excellent overview of all possible situations in Paul’s Online Math Notes (tutorial.math.lamar.edu/Classes/DE/Vibrations.aspx).

Feynman’s treatment of second-order differential equations

Feynman takes a somewhat different approach in his Lectures. He solves them in a much more general way. At first, I thought his treatment was too confusing and, hence, I was not going to mention it. However, I like the logic behind it, even if his approach is somewhat messier in terms of notation and all that. Let’s first look at the differential equation once again. Let’s take a system with a friction factor that’s proportional to the speed: F_f = –c·dx/dt. [See my previous post for some comments on that assumption: the assumption is, generally speaking, too much of a simplification but it makes for a ‘nice’ linear equation and so that’s why physicists present it that way.] To ease the math, c is usually written as c = mγ. Hence, γ = c/m is the friction per unit of mass. That makes sense, I’d think. In addition, we need to remember that ω₀² = k/m, so k = mω₀². Our differential equation then becomes m·d²x/dt² = –γm·dx/dt – kx (mass times acceleration is the sum of the forces) or m·d²x/dt² + γm·dx/dt + mω₀²·x = 0. Dividing the mass factor away gives us an even simpler form:

d²x/dt² + γdx/dt + ω₀²x = 0

You’ll remember this differential equation from the previous post: we used it to calculate the (stored) energy and the Q of a mechanical oscillator. However, we didn’t show you how. You now understand why: the stuff above is not easy–the length of the arguments involved is why I am devoting an entire post to it!

Now, instead of assuming some exponential e^rt as a solution, real- or complex-valued, Feynman assumes a much more general complex-valued function as a solution: he substitutes x = Ae^iαt into the equation, with A a complex number as well, so we can write A as A = A₀e^iΔ. That more general assumption allows for the inclusion of a phase shift right from the start. Indeed, we can write x as x = A₀e^iΔe^iαt = A₀e^i(αt+Δ). Does that look complicated? It probably does, because we also have to remember that α is a complex number! So we’ve got a very general complex-valued exponential function here indeed!

However, let’s not get ahead of ourselves and follow Feynman. So he plugs in that complex-valued x = Ae^iαt and we get:

(–α²+ iγα + ω₀²)Ae^iαt = 0

So far, so good. The logic now is more or less the same as the logic we developed above. We’ve got two factors here: (1) a quadratic expression in α, –α² + iγα + ω₀² (with one complex coefficient iγ), and (2) a complex exponential function Ae^iαt. The second factor (Ae^iαt) cannot be zero, because that’s x and we assume our oscillator is not standing still. So it’s the first factor (i.e. the quadratic expression in α with the complex coefficient iγ) which has to be zero. So we solve for the roots α and find

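α = [–iγ ± ((iγ)² + 4ω₀²)^1/2]/(–2) = iγ/2 ∓ (4ω₀² – γ²)^1/2/2

= iγ/2 ± (ω₀² – γ²/4)^1/2 = iγ/2 ± ω_γ

[We get this by applying the usual quadratic formula with a = –1, b = iγ and c = ω₀²: the discriminant is b² – 4ac = (iγ)² + 4ω₀² = 4ω₀² – γ², and dividing –iγ plus or minus its square root by –2 gives iγ/2 ∓ (ω₀² – γ²/4)^1/2. Because the ∓ covers both signs anyway, we can simply write ±.]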

So that’s an interesting expression: the imaginary part of α is γ/2 (so α contains the imaginary component iγ/2) and its real part is (ω₀² – γ²/4)^1/2, which we denoted as ω_γ in the expression above. [Note that we assume there’s no problem with the square root expression: γ²/4 should be smaller than ω₀², so ω_γ is supposed to be some real positive number.] And so we’ve got the two solutions x₁ and x₂:
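The two roots can also be checked numerically. This sketch uses made-up values γ = 0.4 and ω₀ = 2 (light damping, so γ²/4 < ω₀²), applies the quadratic formula to –α² + iγα + ω₀² = 0 with complex arithmetic, and confirms that the roots are iγ/2 ± ω_γ.

```python
import cmath
import math

gamma, w0 = 0.4, 2.0            # illustrative values with gamma**2/4 < w0**2

# -alpha**2 + i*gamma*alpha + w0**2 = 0 as a*alpha**2 + b*alpha + c = 0
a, b, c = -1.0, 1j * gamma, w0**2
disc = cmath.sqrt(b * b - 4 * a * c)          # (i*gamma)**2 + 4*w0**2
roots = ((-b + disc) / (2 * a), (-b - disc) / (2 * a))

w_gamma = math.sqrt(w0**2 - gamma**2 / 4)
expected = (1j * gamma / 2 + w_gamma, 1j * gamma / 2 - w_gamma)
for r in roots:
    # each root should satisfy the quadratic and match one of i*gamma/2 ± w_gamma
    print(r, abs(a * r * r + b * r + c), min(abs(r - e) for e in expected))
```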

x₁ = Ae^{i(iγ/2 + ω_γ)t} = Ae^{–γt/2 + iω_γt} = Ae^{–γt/2}e^{iω_γt}

x₂ = Be^{i(iγ/2 – ω_γ)t} = Be^{–γt/2 – iω_γt} = Be^{–γt/2}e^{–iω_γt}

Note, once again, that A and B can be any (complex) number and that, because of the principle of superposition, any linear combination of these two solutions will also be a solution. So the general solution is

x = Ae^{–γt/2}e^{iω_γt} + Be^{–γt/2}e^{–iω_γt} = e^{–γt/2}(Ae^{iω_γt} + Be^{–iω_γt})

Now, we recognize the shape of this: a (real-valued) envelope function e^–γt/2 and then a linear combination of two exponentials. But we want something real-valued in the end so, once again, we need to impose the condition that Ae^iω_γt and Be^–iω_γt are complex conjugates of each other. Now, we can see that e^iω_γt and e^–iω_γt are complex conjugates, but what does this say about A and B? Well… The complex conjugate of a product is the product of the complex conjugates of the factors involved: (z₁z₂)* = (z₁*)(z₂*). That implies that B has to be the complex conjugate of A: B = A*. So the final (real-valued) solution becomes:

x = e^{–γt/2}(Ae^{iω_γt} + A*e^{–iω_γt})

Now, I’ll leave it to you to prove that the second factor in the product above (Ae^iω_γt + A*e^–iω_γt) is a real-valued function of the real variable t. Writing A = A₀e^iΔ, it should come out as 2A₀cos(ω_γt + Δ), which has the same form as x₀cos(Δ)cos(ω₀t) – x₀sin(Δ)sin(ω₀t), with x₀ = 2A₀ and ω_γ in place of ω₀, and that gives you a graph like the one below. However, I can readily imagine that, by now, you’re just thinking: Oh well… Whatever! 🙂
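Here is a small numerical check of that claim, under illustrative assumptions: γ = 0.4, ω₀ = 2 and A = 0.75·e^0.6i are arbitrary values, and the ODE residual is estimated with finite differences. The imaginary part of x(t) vanishes (up to rounding), the real part matches 2A₀e^–γt/2·cos(ω_γt + Δ), and the damped equation d²x/dt² + γdx/dt + ω₀²x = 0 is satisfied.

```python
import cmath
import math

gamma, w0 = 0.4, 2.0                         # illustrative damping and natural frequency
w_gamma = math.sqrt(w0**2 - gamma**2 / 4)
A = 0.75 * cmath.exp(0.6j)                   # illustrative A = A0*e^(i*Delta)
A0, delta = abs(A), cmath.phase(A)           # 0.75 and 0.6


def x(t: float) -> complex:
    """x(t) = e^(-gamma t/2) * (A e^(i w_gamma t) + A* e^(-i w_gamma t))."""
    return math.exp(-gamma * t / 2) * (A * cmath.exp(1j * w_gamma * t)
                                       + A.conjugate() * cmath.exp(-1j * w_gamma * t))


def residual(t: float, h: float = 1e-4) -> float:
    """x'' + gamma*x' + w0**2*x for the real part of x, via finite differences."""
    def f(s: float) -> float:
        return x(s).real
    d1 = (f(t + h) - f(t - h)) / (2 * h)
    d2 = (f(t + h) - 2 * f(t) + f(t - h)) / h**2
    return d2 + gamma * d1 + w0**2 * f(t)


for t in (0.0, 1.0, 2.5):
    closed_form = 2 * A0 * math.exp(-gamma * t / 2) * math.cos(w_gamma * t + delta)
    print(t, x(t), closed_form, residual(t))   # Im(x) ~ 0; Re(x) matches
```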

So the difference between Feynman’s approach and the one I presented above (which is the one you’ll find in most textbooks) is the assumption in terms of the specific solution: instead of substituting e^rt for x and allowing r to take on complex values, Feynman substitutes Ae^iαt for x and allows both A and α to take on complex values. It makes the calculations more complicated but, when everything is said and done, I think Feynman’s approach is more consistent because it is more encompassing. However, that’s a matter of taste, and I gather, from comments on the Web, that many people think that this chapter in Feynman’s Lectures is not his best. So… Well… I’ll leave it to you to make the final judgment.

Note: The one critique that is relevant, in regard to Feynman’s treatment of the matter, is that he devotes quite a bit of time and space to explain how these oscillatory or periodic displacements can be viewed as being the real part of a complex exponential. Indeed, cos(ωt) is the real part of e^iωt. But that’s something different from (1) expanding the realm of possible solutions to a second-order differential equation from real-valued functions to complex-valued functions in order to (2) then, once we’ve found the general solution, consider only real-valued functions once again as ‘allowable’ solutions to that equation. I think that’s the gist of the matter really. It took me a while to fully ‘get’ this. I hope this post helps you to understand it somewhat quicker than I did. 🙂

Conclusion

I guess the only thing that I should do now is to work some examples. However, I’ll refer you to Paul’s Online Math Notes for that once again (see the reference above). Indeed, it is about time I end my rather lengthy exposé (three posts on the same topic!) on oscillators and resonance. I hope you enjoyed it, although I can readily imagine that it’s hard to appreciate the math involved.

It is not easy indeed: I actually struggled with it, despite the fact that I think I understand complex analysis somewhat. However, the good thing is that, once we’re through it, we can really solve a lot of problems. As Feynman notes: “Linear (differential) equations are so important that perhaps fifty percent of the time we are solving linear equations in physics and engineering.” So, bearing that in mind, we should move on to the next.

Resonance phenomena

Pre-scriptum (dated 26 June 2020): This post – part of a series of rather simple posts on elementary math and physics – has suffered only a little bit from the attack by the dark force—which is good because I still like it. A few illustrations were removed because of perceived ‘unfair use’, but you will be able to google equivalent stuff. While my views on the true nature of light, matter and the force or forces that act on them have evolved significantly as part of my explorations of a more realist (classical) explanation of quantum mechanics, I think most (if not all) of the analysis in this post remains valid and fun to read. In fact, I would dare to say the whole Universe is all about resonance!

Original post:

One of the most common behaviors of physical systems is the phenomenon of resonance: a body (not only a tuning fork but any body really, such as a body of water, such as the ocean for example) or a system (e.g. an electric circuit) will have a so-called natural frequency, and an external driving force will cause it to oscillate. How it will behave, then, can be modeled using a simple differential equation, and the so-called resonance curve will usually look the same, regardless of what we are looking at. Besides the standard example of an electric circuit consisting of (i) a capacitor, (ii) a resistor and (iii) an inductor, Feynman also gives the following non-standard examples:

1. When the Earth’s atmosphere was disturbed as a result of the Krakatoa volcano explosion in 1883, it resonated at its own natural frequency, and its period was measured to be 10 hours and 20 minutes.

[In case you wonder how one can measure that: an explosion such as that one creates all kinds of waves, but the so-called infrasonic waves are the ones we are talking about here. They circled the globe at least seven times. They not only shattered windows hundreds of miles away but were also recorded worldwide, and that’s how they could be measured a second, third, etc. time. How? There was no wind or anything like that, but the infrasonic waves (i.e. ‘sounds’ beneath the lowest limit of human hearing (about 16 or 17 Hz), down to 0.001 Hz) of such an oscillation cause minute changes in the atmospheric pressure, which can be measured by microbarometers. So the ‘ringing’ of the atmosphere was measurable indeed. A nice article on infrasound waves is journal.borderlands.com/1997/infrasound. Of course, the surface of the Earth was ‘ringing’ as well, and such seismic shocks then produce tsunami waves, which can also be analyzed in terms of natural frequencies.]

2. Crystals can be made to oscillate in response to a changing external electric field, and this crystal resonance phenomenon is used in quartz clocks: the quartz crystal resonator in a basic quartz wristwatch is usually in the shape of a very small tuning fork. Literally: there’s a tiny tuning fork in your wristwatch, made of quartz, that has been laser-trimmed to vibrate at exactly 32,768 Hz, i.e. 2¹⁵ cycles per second.

3. Some quantum-mechanical phenomena can be analyzed in terms of resonance as well, but then it’s the energy of the interfering particles that assumes the role of the frequency of the external driving force when analyzing the response of the system. Feynman gives the example of gamma radiation from lithium as a function of the energy of protons bombarding the lithium nuclei to provoke the reaction. Indeed, when graphing the intensity of the gamma radiation emitted as a function of the energy, one also gets a resonance curve, as shown below. [Don’t you just love the fact it’s so old? A Physical Review article of 1948! There’s older stuff as well, because this journal actually started in 1893.]
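On the quartz example above: the choice of 2¹⁵ Hz is convenient because a watch can get a one-second tick by halving the frequency fifteen times with a chain of binary divider stages. The sketch below only illustrates that arithmetic; the function name divider_stages is hypothetical and it does not model any actual watch circuit.

```python
def divider_stages(freq_hz: int) -> tuple[int, int]:
    """Halve an integer frequency until it reaches 1 Hz; return (stages, final frequency)."""
    stages = 0
    while freq_hz > 1:
        freq_hz //= 2          # one divide-by-two stage
        stages += 1
    return stages, freq_hz


print(32_768 == 2**15)          # True
print(divider_stages(32_768))   # (15, 1)
```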

However, let us analyze the phenomenon first in its most classical appearance: an oscillating spring.

Basics

We’ve seen the equation for an oscillating spring before. From a math point of view, it’s a differential equation (because one of the terms is a derivative of the dependent variable x) of the second order (because the derivative involved is of the second order):

m(d²x/dt²) = –kx

What’s written here is simply Newton’s Law: the force is –kx (the minus sign is there because the force is directed opposite to the displacement from the equilibrium position), and the force has to equal the oscillating mass on the spring times its acceleration: F = ma.

Now, this can be written as d²x/dt² = –(k/m)x = –ω₀²x with ω₀² = k/m. This ω₀ symbol uses the Greek omega once again, which we used for the angular velocity of a rotating body. While we do not have anything that’s rotating here, ω₀ is still an angular velocity or, to be more precise, it’s an angular frequency. Indeed, the solution to the differential equation above is

x = x₀cos(ω₀t + Δ)

The x₀ factor is the maximum amplitude and that’s, quite simply, determined by how far we pulled or pushed the spring when we started the motion. Now, ω₀t + Δ = θ is referred to as the phase of the motion, and it’s easy to see that ω₀ is an angular frequency indeed, because ω₀ equals the time derivative dθ/dt. Hence, ω₀ is the phase change, measured in radians, per second, and that’s the definition of angular frequency or angular velocity. Finally, we have Δ. That’s just a phase shift, and it basically depends on our t = 0 point.
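A quick worked example, with illustrative values k = 200 N/m and m = 0.5 kg that are not taken from the text: ω₀ = (k/m)^1/2 = 20 rad/s, the frequency in cycles per second is ω₀/2π ≈ 3.18 Hz, and the period is T = 2π/ω₀ = π/10 ≈ 0.314 s. The helper name natural_frequency is hypothetical.

```python
import math


def natural_frequency(k: float, m: float) -> tuple[float, float, float]:
    """Return (omega0 in rad/s, f in Hz, period T in s) for a mass m on a spring k."""
    omega0 = math.sqrt(k / m)        # omega0**2 = k/m
    f = omega0 / (2 * math.pi)       # one cycle is 2*pi radians of phase
    return omega0, f, 1 / f          # T = 2*pi/omega0


omega0, f, T = natural_frequency(k=200.0, m=0.5)   # illustrative values
print(f"omega0 = {omega0:.3f} rad/s, f = {f:.3f} Hz, T = {T:.3f} s")
# omega0 = 20.000 rad/s, f = 3.183 Hz, T = 0.314 s
```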

Something on the math

I’ll do a separate post on the math that’s associated with this (second-order differential equations) but, in this case, we can solve the equation in a simple and intuitive way. Look at it: d²x/dt² = –ω₀²x. It’s obvious that x has to be a function that comes back to itself after being differentiated twice, but with a minus sign in front, and with that coefficient ω₀² as well. Hmm… What can we think of? An exponential function comes back to itself, and if there’s a coefficient in the exponent, then it will end up as a coefficient in front too: d(e^at)/dt = ae^at and, hence, d²(e^at)/dt² = a²e^at. Wow! That’s close. In fact, that’s the same equation as the one above, except for the minus sign.

In fact, if you’d quickly look at Paul’s Online Math Notes, you’ll see that we can indeed get the general solution for such a second-order differential equation (to be precise: it’s a so-called linear and homogeneous second-order DE with constant coefficients) using that remarkable property of exponentials. However, because of the minus sign, our solution for the equation above will involve complex exponentials, and so we’ll get a general function in a complex variable. We’ll then impose that our solution has to be real and, hence, we’ll take a subset of our more general solution. But don’t worry about that here. There’s an easier way.
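For those who want to see the exponential route anyway, here’s a minimal sketch of it, following the standard textbook steps rather than anything spelled out in this post. The constants A and B are just names introduced here for the two integration constants:

```latex
\begin{aligned}
&x = e^{at} \;\Rightarrow\; \frac{d^2x}{dt^2} = a^2 e^{at} = -\omega_0^2 e^{at}
  \;\Rightarrow\; a^2 = -\omega_0^2 \;\Rightarrow\; a = \pm i\omega_0 \\
&x(t) = A e^{i\omega_0 t} + B e^{-i\omega_0 t} \qquad \text{(general complex solution)} \\
&x \text{ real} \;\Rightarrow\; B = A^{*}; \quad \text{writing } A = \tfrac{x_0}{2} e^{i\Delta}: \quad
  x(t) = \tfrac{x_0}{2}\left(e^{i(\omega_0 t + \Delta)} + e^{-i(\omega_0 t + \Delta)}\right) = x_0 \cos(\omega_0 t + \Delta)
\end{aligned}
```

So the ‘subset’ of real solutions is exactly the family x₀cos(ω₀t + Δ), with x₀ and Δ fixed by the initial conditions.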

Apart from the exponential function, there are two other functions that come back to themselves after being differentiated twice: the sine and cosine functions. Indeed, d²cos(t)/dt² = –cos(t) and d²sin(t)/dt² = –sin(t). In fact, the sine and cosine functions are obviously the same except for a phase shift equal to π/2: cos(t) = sin(t + π/2), so we can choose either. Let’s work with the cosine for now (we can always convert it to a sine function using that cos(t) = sin(t + π/2) identity). The nice thing about the cosine (and sine) function is that we do get that minus sign when differentiating it twice, and we also get that coefficient in front. Indeed: d²cos(ω₀t)/dt² = –ω₀²cos(ω₀t). In short, cos(ω₀t) is the right function. The only things we need to add are x₀ and Δ, i.e. the amplitude and some phase shift but, as mentioned above, it is easy to understand these will depend on the initial conditions (i.e. the value of x at t = 0 and the initial pull or push on the spring). In short, x = x₀cos(ω₀t + Δ) is the complete general solution of the simple (differential) equation we started with (i.e. m(d²x/dt²) = –kx).
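If you want a quick numerical sanity check of that claim, here’s a minimal sketch: the Python below integrates m(d²x/dt²) = –kx with a standard fourth-order Runge-Kutta (RK4) scheme and compares the result with x₀cos(ω₀t + Δ). The values of m, k, x₀, Δ and the step size are arbitrary illustrative choices, not anything from the post.

```python
import math


def integrate_spring(m, k, x0, delta, dt, n_steps):
    """Integrate m*x'' = -k*x with RK4 and return (t, x) after n_steps.

    Initial conditions are taken from x = x0*cos(omega0*t + delta) at t = 0.
    """
    omega0 = math.sqrt(k / m)
    x = x0 * math.cos(delta)             # x(0)
    v = -x0 * omega0 * math.sin(delta)   # dx/dt at t = 0

    def acceleration(pos):
        return -(k / m) * pos            # Newton's Law: a = F/m = -k*x/m

    for _ in range(n_steps):
        k1x, k1v = v, acceleration(x)
        k2x, k2v = v + 0.5 * dt * k1v, acceleration(x + 0.5 * dt * k1x)
        k3x, k3v = v + 0.5 * dt * k2v, acceleration(x + 0.5 * dt * k2x)
        k4x, k4v = v + dt * k3v, acceleration(x + dt * k3x)
        x += dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        v += dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
    return n_steps * dt, x


m, k, x0, delta = 2.0, 8.0, 0.1, 0.3     # illustrative values, so omega0 = 2 rad/s
t, x_numeric = integrate_spring(m, k, x0, delta, dt=0.001, n_steps=5000)
x_exact = x0 * math.cos(math.sqrt(k / m) * t + delta)
print(f"t = {t:.1f} s: numeric {x_numeric:.8f}, analytic {x_exact:.8f}")
```

The two numbers agree to many decimal places, which is just what you’d expect if x₀cos(ω₀t + Δ) really solves the equation.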

Introducing a driving force

Now, most real-life oscillating systems will be driven by an external force, permanently or just for a short while, and they will also lose some of their energy in a so-called dissipative process: friction or, in an electric circuit, electrical resistance will cause the oscillation to slowly lose amplitude, thereby damping it.

Let’s look at the friction coefficient first. The friction is often assumed to be proportional to the speed with which the object moves. In reality, in the case of a mass on a spring, the drag (i.e. the force that acts on a body as it travels through air or a fluid) depends on a lot of things: first and foremost, there’s the fluid itself (e.g. a thick liquid will create more drag than water), and then there’s also the size, shape and velocity of the object. I am following the treatment you’ll find in most textbooks here, and so that includes the assumption that the resistance force is proportional to the velocity: F_f = –cv = –c(dx/dt). Furthermore, the constant of proportionality c will usually be written as a product of the mass and some other coefficient γ, so we have F_f = –cv = –mγ(dx/dt). That makes sense because we can look at γ = c/m as the friction per unit of mass.

That being said, the simplification as a whole (i.e. the assumption of proportionality with speed) is rather strange in light of the fact that, for the sizes and speeds we usually deal with, drag forces are actually proportional to the square of the velocity (a drag force that is linear in the velocity only applies to slow motion through a viscous fluid, i.e. at low Reynolds numbers). If you look it up, you’ll find a formula resembling F_D = ρC_DAv²/2, with ρ the fluid density, C_D the drag coefficient (which depends on the shape of the object and on the so-called Reynolds number, and which is usually determined from experiments), and A the cross-sectional area. It’s also rather strange to relate drag to mass by writing c as c = mγ, because drag has nothing to do with mass. What about dry friction, i.e. kinetic friction between two surfaces, like when the mass is sliding on a surface? Well… In that case, mass would play a role but velocity wouldn’t, because kinetic friction is (to a good approximation) independent of the sliding velocity.

So why do physicists use this simplification? One reason is that it works for electric circuits: the equivalent of the velocity in electrical resonance is the current I = dq/dt, i.e. the time derivative of the charge on the capacitor. Now, the voltage difference V across a resistor is proportional to I, and the proportionality coefficient is the resistance R, so we have V = RI = R(dq/dt). So, in short, the resonance curve we’re actually going to derive below is exact for electric circuits. The other reason is that this assumption makes it easier to solve the differential equation that’s involved: it makes for a linear differential equation. In fact, that’s the main reason. After all, professors are professors and so they have to give their students stuff that’s not too difficult to solve. In any case, let’s not be bothered too much, and so we’ll just go along with it.

Modeling the driving force is easy: we’ll just assume it’s a sinusoidal force with angular frequency ω (and ω is, more likely than not, somewhat different from the natural frequency ω₀). If F is a sinusoidal force, we can write it as F = F₀cos(ωt + Δ). [So we also assume there is some phase shift Δ.] So now we can write the full equation for our oscillating spring as:

m(d²x/dt²) + γm(dx/dt) + kx = F ⇔ (d²x/dt²) + γ(dx/dt) + ω₀²x = F/m

How do we solve something like that for x? Well, it’s a differential equation once again. In fact, it’s, once again, a linear differential equation with constant coefficients, and so there’s a general solution method for that. As I mentioned above, that general solution method will involve exponentials and, in general, complex exponentials. I won’t walk you through that. Indeed, I’ll just write the solution because this is not an exercise in solving differential equations. I just want you to understand the solution:

x = ρF₀cos(ωt + Δ + θ)

ρ in this equation has nothing to do with a density. It’s a factor which depends on m, γ, ω and ω₀, in a fairly complicated way in fact:

ρ² = 1/[m²((ω² – ω₀²)² + γ²ω²)]

As we can see from the equation above, the (maximum) amplitude of the oscillation is equal to ρF₀. So we have the magnitude of the force F here multiplied by ρ. Hence, ρ is a magnification factor which, multiplied with F₀, gives us the ‘amount’ of oscillation.
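One way to convince yourself that ρF₀ really is the amplitude once the motion has settled is to simulate the equation directly. The sketch below assumes the standard textbook expression ρ = 1/(m√((ω₀² – ω²)² + γ²ω²)) for the magnification factor, sets the phase Δ of the force to zero, starts the mass at rest, and integrates (d²x/dt²) + γ(dx/dt) + ω₀²x = (F₀/m)cos(ωt) with RK4. All parameter values are arbitrary illustrations. The transient (the ‘free’ oscillation) dies out like e^(–γt/2), so after a while only the steady, driven oscillation is left.

```python
import math


def rho(m, gamma, omega0, omega):
    """Magnification factor (assumed standard result for this equation)."""
    return 1.0 / (m * math.sqrt((omega0**2 - omega**2) ** 2 + (gamma * omega) ** 2))


def late_amplitude(m, gamma, omega0, F0, omega, t_end=80.0, dt=0.002):
    """RK4 for x'' + gamma*x' + omega0**2*x = (F0/m)*cos(omega*t), starting from rest.

    Returns the largest |x| seen during the last quarter of the run,
    i.e. after the transient has (practically) died out.
    """
    def acc(t, x, v):
        return (F0 / m) * math.cos(omega * t) - gamma * v - omega0**2 * x

    t, x, v = 0.0, 0.0, 0.0
    n_steps = round(t_end / dt)
    peak = 0.0
    for i in range(n_steps):
        k1x, k1v = v, acc(t, x, v)
        k2x, k2v = v + 0.5 * dt * k1v, acc(t + 0.5 * dt, x + 0.5 * dt * k1x, v + 0.5 * dt * k1v)
        k3x, k3v = v + 0.5 * dt * k2v, acc(t + 0.5 * dt, x + 0.5 * dt * k2x, v + 0.5 * dt * k2v)
        k4x, k4v = v + dt * k3v, acc(t + dt, x + dt * k3x, v + dt * k3v)
        x += dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        v += dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        t = (i + 1) * dt
        if t > 0.75 * t_end:                 # only look at the settled motion
            peak = max(peak, abs(x))
    return peak


m, gamma, omega0, F0, omega = 1.0, 0.5, 2.0, 1.0, 1.5   # illustrative values
simulated = late_amplitude(m, gamma, omega0, F0, omega)
predicted = rho(m, gamma, omega0, omega) * F0
print(f"simulated amplitude {simulated:.5f}, rho*F0 = {predicted:.5f}")
```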

As for θ in the equation above: we’re not using this Greek letter (theta) to refer to the phase, as we usually do, because the phase here is the whole ωt + Δ + θ expression, not just theta! The θ here is a phase shift as compared to the phase ωt + Δ of the driving force, and it also depends on γ, ω and ω₀. Again, I won’t show how we derived this solution, so just accept it for now:

tan θ = –γω/(ω₀² – ω²)

These three equations, taken together, should allow you to understand what’s going on really. We’ve got an oscillation x = ρF₀cos(ωt + Δ + θ), so that’s an equation with this amplification or magnification factor ρ and some phase shift θ. Both depend on the difference between ω₀ and ω, and the two graphs below show how exactly.
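Where do ρ and θ come from? The usual trick, sketched here under the assumption that we replace the force F₀cos(ωt + Δ) by the complex exponential F₀e^(i(ωt + Δ)) and take the real part at the end, is to try x = x̂e^(i(ωt + Δ)). Substituting that into (d²x/dt²) + γ(dx/dt) + ω₀²x = (F₀/m)e^(i(ωt + Δ)) gives x̂/F₀ = 1/(m(ω₀² – ω² + iγω)), so ρ is the modulus of that complex number and θ is its argument. The helper name `response` below is hypothetical, written only to illustrate this:

```python
import cmath
import math


def response(m, gamma, omega0, omega):
    """Return (rho, theta) for x = rho*F0*cos(omega*t + Delta + theta).

    Uses x_hat / F0 = 1 / (m * (omega0**2 - omega**2 + i*gamma*omega)).
    """
    z = 1.0 / (m * complex(omega0**2 - omega**2, gamma * omega))
    return abs(z), cmath.phase(z)   # modulus -> rho, argument -> theta


m, gamma, omega0 = 1.0, 0.1, 2.0    # illustrative values
for omega in (0.5, 1.9, 2.0, 2.1, 4.0):
    r, th = response(m, gamma, omega0, omega)
    print(f"omega = {omega:3.1f}: rho = {r:7.3f}, theta = {th / math.pi:+.3f} pi")
```

Well below resonance θ is close to 0 (the mass simply follows the force), at ω = ω₀ it is exactly –π/2, and well above resonance it approaches –π (the mass moves opposite to the force).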

The first graph shows the resonance phenomenon and, hence, it’s what’s referred to as the resonance curve: if the difference between ω₀ and ω is small, we get an enormous amplification effect. It would actually go to infinity if it weren’t for the frictional force (but, of course, if the frictional force were not there, the spring would just break as the oscillation builds up and the swings get bigger and bigger).
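To see the resonance curve in numbers rather than in a graph, the sketch below evaluates ρ² on a fine grid of driving frequencies, using the standard expression ρ² = 1/(m²((ω₀² – ω²)² + γ²ω²)) (assumed here), and looks for the peak. With friction present the peak is finite, and it sits slightly below ω₀, at ω = √(ω₀² – γ²/2), which you can check by minimizing the denominator with respect to ω². The parameter values are arbitrary.

```python
import math


def rho_squared(m, gamma, omega0, omega):
    """rho**2, i.e. the quantity plotted on the vertical axis of the resonance curve."""
    return 1.0 / (m**2 * ((omega0**2 - omega**2) ** 2 + (gamma * omega) ** 2))


m, gamma, omega0 = 1.0, 0.5, 2.0                         # illustrative values
grid = [0.01 + i * 1e-4 for i in range(40_000)]           # omega from 0.01 to about 4.01
peak_omega = max(grid, key=lambda w: rho_squared(m, gamma, omega0, w))
expected = math.sqrt(omega0**2 - gamma**2 / 2)

print(f"peak at omega = {peak_omega:.4f} (expected {expected:.4f})")
print(f"rho^2 at peak / rho^2 at omega = 0.01: "
      f"{rho_squared(m, gamma, omega0, peak_omega) / rho_squared(m, gamma, omega0, 0.01):.1f}")
```

Making γ smaller makes the peak both taller and closer to ω₀.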

The second graph shows the phase shift θ. It is interesting to note that the lag θ is equal to –π/2 when ω₀ is equal to ω, but I’ll let you figure out why this makes sense. [It’s got something to do with that cos(t) = sin(t + π/2) identity, so it’s nothing ‘deep’ really.]

I guess I should, perhaps, also write something about the energy that gets stored in an oscillator like this because, in that resonance curve above, we actually have ρ squared on the vertical axis, and that’s because energy is proportional to the square of the amplitude: E ∝ A². I should also explain a concept that’s closely related to energy: the so-called Q of an oscillator. It’s an interesting topic, if only because it helps us to understand why, for instance, the waves of the sea are such tremendous stores of energy! Furthermore, I should also write something about transients, i.e. oscillations that dampen because the driving force was turned off, so to say. However, I’ll leave it to you to look that up if you’re interested in this topic. Here, I just wanted to present the essentials.
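A common definition of Q is Q = ω₀/γ. For light damping, that’s also the natural frequency divided by the full width of the resonance curve at half its maximum height (where ρ², and hence the stored energy, has dropped to half its peak value), so a high-Q oscillator has a sharp, narrow peak. The sketch below, with arbitrary illustrative values and the same assumed ρ² formula as above, measures that width on a grid and compares ω₀/width with ω₀/γ.

```python
def rho_squared(m, gamma, omega0, omega):
    """rho**2 = 1 / (m**2 * ((omega0**2 - omega**2)**2 + gamma**2 * omega**2))."""
    return 1.0 / (m**2 * ((omega0**2 - omega**2) ** 2 + (gamma * omega) ** 2))


m, gamma, omega0 = 1.0, 0.05, 2.0                  # light damping, illustrative values
step = 1e-5
grid = [1.9 + i * step for i in range(20_001)]      # omega from 1.9 to 2.1
curve = [rho_squared(m, gamma, omega0, w) for w in grid]
half_max = max(curve) / 2                           # half the peak, i.e. half the stored energy
above = [w for w, val in zip(grid, curve) if val >= half_max]
width = above[-1] - above[0]                        # full width at half maximum

print(f"width = {width:.5f} (gamma = {gamma}), "
      f"omega0/width = {omega0 / width:.1f} (omega0/gamma = {omega0 / gamma:.1f})")
```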

[…] Hey ! I managed to keep this post quite short for a change. Isn’t that good? 🙂

Understanding gyroscopes

Pre-scriptum (dated 26 June 2020): This post – part of a series of rather simple posts on elementary math and physics – has suffered only a little bit from the attack by the dark force—which is good because I still like it. Only one or two illustrations were removed because of perceived ‘unfair use’, but you will be able to google equivalent stuff. While my views on the true nature of light, matter and the force or forces that act on them have evolved significantly as part of my explorations of a more realist (classical) explanation of quantum mechanics, I think most (if not all) of the analysis in this post remains valid and fun to read. Understanding the dynamics of rotations is extremely important in any realist interpretation of quantum physics. In fact, I would dare to say it is all about rotation!

Original post:

You know a gyroscope: it’s a spinning wheel or disk mounted in a frame that itself is free to alter in direction, so the axis of rotation is not affected as the mounting tilts or moves about. Therefore, gyroscopes are used to provide stability or maintain a reference direction in navigation systems. Understanding a gyroscope itself is simple enough: it only involves a good understanding of the so-called moment of inertia. Indeed, in the previous post, we introduced a lot of concepts related to rotational motion, notably the concepts of torque and angular momentum but, because that post was getting too long, I did not talk about the moment of inertia and gyroscopes. Let me do that now. However, I should warn you: you will not be able to understand this post if you haven’t read or didn’t understand the previous post. So, if you can’t follow, please go back: it’s probably because you didn’t get the other post.

The moment of inertia and angular momentum are related but not quite the same. Let’s first recapitulate angular momentum. Angular momentum is the equivalent of linear momentum for rotational motion:

If we want to change the linear motion of an object, as measured by its momentum p = mv, we’ll need to apply a force. Changing the linear motion means changing either (a) the speed (v), i.e. the magnitude of the velocity vector v, (b) the direction, or (c) both. This is expressed in Newton’s Law, F = m(dv/dt), and so we note that the mass is just a factor of proportionality measuring the inertia to change.
The same goes for angular momentum (denoted by L): if we want to change it, we’ll need to apply a torque (which is what the turning effect of a force is called when talking rotational motion), and such a torque can change either (a) L’s magnitude (L), (b) L’s direction or (c) both.

Just like linear momentum, angular momentum is also a product of two factors: the first factor is the angular velocity ω, and the second factor is the moment of inertia. The moment of inertia is denoted by I so we write L = Iω. But what is I? If we’re analyzing a rigid body (which is what we usually do), then it will be calculated as follows:

I = Σ mᵢrᵢ²

This is easy enough to understand: the inertia for turning will depend not just on the masses of all of the particles that make up the object, but also on their distance from the axis of rotation, and note that we need to square these distances. The L = Iω formula, combined with the formula for I above, explains why a spinning skater doing a ‘scratch spin’ speeds up tremendously when drawing in his or her arms and legs. Indeed, the total angular momentum has to remain the same, but I becomes much smaller as a result of that r² factor in the formula. Hence, if I becomes smaller, then ω has to go up significantly in order to conserve angular momentum.
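Here’s a toy version of the skater, as a sketch under heavy simplifying assumptions: the body is modeled as a few point masses at fixed distances from the spin axis, and all masses and distances below are made-up illustrative numbers, not measurements. Conserving L = Iω then fixes the new angular velocity.

```python
def moment_of_inertia(point_masses):
    """I = sum of m * r**2, with r the distance of each point mass from the axis."""
    return sum(m * r**2 for m, r in point_masses)


# Hypothetical point-mass model: (mass in kg, distance from axis in m)
arms_out = [(60.0, 0.10), (4.0, 0.80), (4.0, 0.80)]   # torso + two outstretched arms
arms_in = [(60.0, 0.10), (4.0, 0.20), (4.0, 0.20)]    # same, with the arms pulled in

I_out = moment_of_inertia(arms_out)
I_in = moment_of_inertia(arms_in)
omega_out = 2.0                     # rad/s, illustrative starting spin
L = I_out * omega_out               # angular momentum, conserved (no external torque)
omega_in = L / I_in                 # so omega must go up as I goes down

print(f"I: {I_out:.2f} -> {I_in:.2f} kg m^2, omega: {omega_out:.2f} -> {omega_in:.2f} rad/s")
```

Because the arms’ contribution scales with r², pulling them in from 0.8 m to 0.2 m cuts their share of I by a factor of 16, and that’s what produces the big jump in ω.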

Finally, we note that angular momentum and linear momentum can be easily related through the following equation:

That’s all kids’ stuff. To understand gyroscopes, we’ll have to go beyond that and do some vector analysis. In the previous post, we explained that rotational motion is usually analyzed in terms of torques rather than forces, and we detailed the relations between force and torque. More specifically, we introduced a torque vector τ with the following components:

τ = (τ_yz, τ_zx, τ_xy) = (τ_x, τ_y, τ_z) with

τ_x = τ_yz = yF_z – zF_y

τ_y = τ_zx = zF_x – xF_z

τ_z = τ_xy = xF_y – yF_x.
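These three component formulas translate directly into code. The minimal sketch below (the function name `torque` is just an illustrative choice) implements them as written and shows that a force along y applied at a point on the x-axis only produces a z-component, i.e. a turning motion in the xy-plane:

```python
def torque(r, F):
    """Torque components as defined above:
    tau_x = y*Fz - z*Fy, tau_y = z*Fx - x*Fz, tau_z = x*Fy - y*Fx.
    """
    x, y, z = r
    Fx, Fy, Fz = F
    return (y * Fz - z * Fy, z * Fx - x * Fz, x * Fy - y * Fx)


# A push along y at a point on the x-axis: only the z-component survives.
print(torque((1.0, 0.0, 0.0), (0.0, 2.0, 0.0)))   # prints (0.0, 0.0, 2.0)
```

Swapping the two arguments flips the sign of every component, which is the a×b = –b×a property of a cross product.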

We also noted that this torque vector could be written as a cross product of a radius vector and the force: τ = r×F. Finally, we also pointed out the relation between the x-, y- and z-components of the torque vector and the plane of rotation:

(1) τ_x = τ_yz is rotational motion about the x-axis (i.e. motion in the yz-plane)

(2) τ_y = τ_zx is rotational motion about the y-axis (i.e. motion in the zx-plane)

(3) τ_z = τ_xy is rotational motion about the z-axis (i.e. motion in the xy-plane)

The angular momentum vector L is constructed in the same way as the torque vector (it’s also an axial vector, and its components follow the same pattern), but it’s the cross product of the radius vector and the momentum vector: L = r×p. For clarity, I reproduce the animation I used in my previous post once again.

How do we get that cross vector product for L? We noted that τ (i.e. the Greek tau) = dL/dt. So we need to take the time derivative of all three components of L. What are the components of L? They look very similar to those of τ:

L = (L_yz, L_zx, L_xy) = (L_x, L_y, L_z) with

L_x = L_yz = yp_z – zp_y

L_y = L_zx = zp_x – xp_z

L_z = L_xy = xp_y – yp_x.

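Now, just check the time derivatives of L_x, L_y, and L_z and you’ll find the components of the torque vector τ. For example, dL_x/dt = (dy/dt)p_z – (dz/dt)p_y + y(dp_z/dt) – z(dp_y/dt). The first two terms cancel, because p = mv points in the same direction as the velocity (so (dy/dt)p_z – (dz/dt)p_y = m(v_yv_z – v_zv_y) = 0), and dp/dt = F, so what’s left is yF_z – zF_y = τ_x. Together with the formulas above, that should be sufficient to convince you that L is, indeed, a vector cross product of r and p: L = r×p.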
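If you’d rather let a computer do that check, here’s a minimal numerical sketch: pick an arbitrary (hypothetical) trajectory r(t) for a particle of mass m, compute p = m(dr/dt) and F = m(d²r/dt²) from it, and compare a finite-difference estimate of dL/dt with r×F.

```python
import math


def cross(a, b):
    """Cross product a x b (same component pattern as the torque formulas)."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


m = 3.0  # illustrative mass


def position(t):    # hypothetical trajectory r(t)
    return (math.cos(t), 2.0 * math.sin(t), 0.5 * t**2)


def velocity(t):    # dr/dt, worked out by hand from position()
    return (-math.sin(t), 2.0 * math.cos(t), t)


def accel(t):       # d2r/dt2
    return (-math.cos(t), -2.0 * math.sin(t), 1.0)


def angular_momentum(t):
    p = tuple(m * v for v in velocity(t))
    return cross(position(t), p)          # L = r x p


t, h = 0.7, 1e-5
dL_dt = tuple((a - b) / (2 * h)           # central-difference estimate of dL/dt
              for a, b in zip(angular_momentum(t + h), angular_momentum(t - h)))
torque = cross(position(t), tuple(m * a for a in accel(t)))   # r x F with F = m*a
print("dL/dt ~", [round(c, 6) for c in dL_dt])
print("r x F =", [round(c, 6) for c in torque])
```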

Again, if you feel this is too difficult, please read or re-read my previous post. But if you do understand everything, then you are ready for a much more difficult analysis, and that’s an explanation of why a spinning top does not fall as it rotates about.

In order to understand that explanation, we’ll first analyze the situation below. It resembles the experiment with the swivel chair that’s often described on ‘easy physics’ websites: the man below holds a spinning wheel with its axis horizontal, and then turns this axis into the vertical. As a result, the man starts to turn himself in the opposite direction.

Let’s now look at the forces and torques involved. These are shown below.

This looks very complicated, you’ll say! You’re right: it is quite complicated, but not impossible to understand. First note the vectors involved in the starting position: we have an angular momentum vector L₀ and an angular velocity vector ω₀. These are both axial vectors, as I explained in my previous post: their direction is perpendicular to the plane of motion, i.e. they are arrows along the axis of rotation. This is in line with what we wrote above: if an object is rotating in the zx-plane (which is the case here), then the angular momentum vector will have a y-component only, and so it will be directed along the y-axis. Which side? That’s determined by the right-hand screw rule. [Again, please do read my previous post for more details if you need them.]

So now we have explained L₀ and ω₀. What about all the other vectors? First note that there would be no torque if the man did not try to turn the axis. In that case, the angular momentum would just remain what it is, i.e. dL/dt = 0, and there would be no torque. Indeed, remember that τ = dL/dt, just like F = dp/dt, so if dL/dt = 0, then τ = 0. But so the man is turning the axis of rotation and, hence, τ = dL/dt ≠ 0. What’s changing here is not the magnitude of the angular momentum but its direction. As usual, the analysis is in terms of differentials.

As the man turns the spinning wheel, the directional change of the angular momentum is defined by the angle Δθ, and we get a new angular momentum vector L₁. The difference between L₁ and L₀ is given by the vector ΔL. This ΔL vector is a tiny vector in the L₀L₁ plane and, because we’re looking at a differential displacement only, we can say that, for all practical purposes, this ΔL is orthogonal to L₀ (as we move from L₀ to L₁, we’re actually moving along an arc and, hence, ΔL is a tangential vector). Therefore, simple trigonometry allows us to say that its magnitude ΔL will be equal to L₀Δθ. [Strictly speaking, the length of ΔL is 2L₀sin(Δθ/2) but, because we’re talking differentials and measuring angles in radians (so the value reflects arc lengths), we can equate it with L₀Δθ.]
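A quick numerical look at how good that small-angle approximation is: the chord between two vectors of length L₀ that are Δθ apart has length exactly 2L₀sin(Δθ/2), and the sketch below compares that with L₀Δθ for a few arbitrary angles.

```python
import math

L0 = 1.0
for dtheta in (0.5, 0.1, 0.01, 0.001):           # angles in radians
    exact = 2 * L0 * math.sin(dtheta / 2)         # |L1 - L0| when |L1| = |L0| = L0
    approx = L0 * dtheta                          # the arc-length approximation
    print(f"dtheta = {dtheta:<6} exact/approx = {exact / approx:.9f}")
```

The ratio tends to 1 as Δθ shrinks (the relative error goes like Δθ²/24), which is why the differential treatment is exact in the limit.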

Now, the torque vector τ has the same direction as the ΔL vector (that’s obvious from their definitions), but what is its magnitude? That’s an easy question to answer: τ = ΔL/Δt = L₀Δθ/Δt = L₀ (Δθ/Δt). Now, this result induces us to define another axial vector which we’ll denote using the same Greek letter omega, but written as a capital letter instead of in lowercase: Ω. The direction of Ω is determined by using that right-hand screw rule which we’ve always been using, and Ω‘s magnitude is equal to Ω = Δθ/Δt. So, in short, Ω is an angular velocity vector just like ω: its magnitude is the speed with which the man is turning the axis of rotation of the spinning wheel, and its direction is determined using the same rules. If we do that, we get the rather remarkable result that we can write the torque vector τ as the cross product of Ω and L₀:

τ = Ω×L₀

Now, this is not an obvious result, so you should check it yourself. When doing that, you’ll note that the two vectors are orthogonal, and so we have τ = Ω×L₀ = |Ω||L₀|sin(π/2)n = ΩL₀n, with n the normal unit vector given, once again, by the right-hand screw rule. [Note how the order of the two factors in a cross product matters: a×b = –b×a.]
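Here’s a minimal numerical check of τ = Ω×L₀ for the man with the wheel, under the assumption (ours, for illustration) that L₀ points along the y-axis and that he tilts the axle about the x-axis at a steady rate Ω, so the axle swings from the horizontal toward the vertical. The sketch rotates L₀ through the small angle ΩΔt (using Rodrigues’ rotation formula) and compares ΔL/Δt with Ω×L₀. Note that the result points along the z-axis.

```python
import math


def cross(a, b):
    """Cross product a x b."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rotate(v, axis, angle):
    """Rodrigues' formula: rotate v about the unit vector `axis` by `angle`."""
    c, s = math.cos(angle), math.sin(angle)
    k_dot_v = sum(k * x for k, x in zip(axis, v))
    k_cross_v = cross(axis, v)
    return tuple(x * c + kc * s + k * k_dot_v * (1 - c)
                 for x, kc, k in zip(v, k_cross_v, axis))


L0 = (0.0, 3.0, 0.0)         # spin angular momentum along y (illustrative magnitude)
Omega_mag = 0.5              # rad/s: how fast the axle is being tilted (illustrative)
axis = (1.0, 0.0, 0.0)       # tilting about the x-axis
Omega = tuple(Omega_mag * a for a in axis)

dt = 1e-6
L1 = rotate(L0, axis, Omega_mag * dt)
dL_dt = tuple((b - a) / dt for a, b in zip(L0, L1))
print("dL/dt      ~", [round(c, 4) for c in dL_dt])
print("Omega x L0 =", cross(Omega, L0))
```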

You’re probably tired of this already, and so you’ll say: so what?

Well… We have a torque. A torque is produced by forces, and a torque vector along the z-axis is associated with rotation about the z-axis, i.e. rotation in the xy-plane. Such rotation is caused by the forces F and –F that produce the torque, as shown in the illustration. [Again, their direction is determined by the right-hand screw rule – but I’ll stop repeating that from now on.] But… Wait a minute. First, the direction is wrong, isn’t it? The man turns the other way in reality. And, second, where do these forces come from? Well… The man produces them, and the direction of the forces is not wrong: as the man applies these forces, with his hands, as he holds the spinning wheel and turns it into the vertical direction, equal and opposite forces act on him (cf. the action-reaction principle), and so he starts to turn in the opposite direction.

So there we are: we have explained this complex situation fully in terms of torques and forces now. So that’s good. [If you don’t believe the thing about those forces, just get one of the wheels out of your mountain bike, let it spin, and try to change the plane in which it is spinning: you’ll see you need a bit of force. Not much, but enough, and it’s exactly the kind of force that the man in the illustration is experiencing.]

Now, what if we would not be holding the spinning wheel? What if we would let it pivot, for example? Well… It would just pivot, as shown below.

But… Why doesn’t it fall? Hah! There we are! Now we are finally ready for the analysis we really want to do, i.e. explaining why these spinning tops (or gyros as they’re referred to in physics) don’t fall.

Such a spinning top is shown in the illustration below. It’s similar to the spinning wheel: there’s a rotational axis, and we have the force of gravity trying to change the direction of that axis, so it’s like the man turning that spinning wheel indeed, but so now it’s gravity exerting the force that’s needed to change the angular momentum. Let’s associate the vertical direction with the z-axis, and the horizontal plane with the xy-plane, and let’s go step-by-step:

The gravitational force wants to pull that spinning top down. So the ΔL vector points downward this time, not upward. Hence, the torque vector will point downward too. But so it’s a torque pointing along the z-axis.
Such torque along the z-axis is associated with a rotation in the xy-plane, so that’s why the spinning top will slowly revolve about the z-axis, parallel to the xy-plane. This process is referred to as precession, and so there’s a precession torque and a precession angular velocity.

So that explains precession, and that’s all there is to it. Now you’ll complain, and rightly so: what I wrote above does not explain why the spinning top does not actually fall. I only explained that precession movement. So what’s going on? That spinning top should fall as it precesses, shouldn’t it?

It actually does fall. The point to note, however, is that the precession movement itself changes the direction of the angular momentum vector as well. So we have a new ΔL vector pointing sideways, i.e. a vector in the horizontal plane, so not along the z-axis. Hence, we should have a torque in the horizontal plane, and so that implies that we should have two equal and opposite forces acting along the z-axis.

In fact, the right-hand screw rule gives us the direction of those forces: if these forces were effectively applied to the spinning top, it would fall even faster! However, the point to note is that there are no such forces. Indeed, it is not like the man with the spinning wheel: no one (or nothing) is pushing or applying the forces that should produce the torque associated with this change in angular momentum. Hence, because these forces are absent, the spinning top begins to ‘fall’ in the direction opposite to that of the missing force, thereby counteracting the gravitational force in such a way that the spinning top just spins about the z-axis without actually falling.

Now, this is, most probably, very difficult to understand in the way you would like to understand it, so just let it sink in and think about it for a while. In this regard, and to help the understanding, it’s probably worth noting that the actual process of reaching equilibrium is somewhat messy. It is illustrated below: if we hold a spinning gyro for a while and then, suddenly, we let it fall (yes, just let it go), it will actually fall. However, as it’s falling, it also starts turning and then, because it starts turning, it also starts ‘falling’ upwards, as explained in that story of the ‘missing force’ above. Initially, the upward movement will overshoot the equilibrium position, thereby slowing the gyro’s speed in the horizontal plane. And so then, because its horizontal speed becomes smaller, it stops ‘falling upward’, and so that means it’s falling down again. But then it starts turning again, and so on and so on. I hope you grasp this, more or less at least. Note that frictional effects will cause the up-and-down movement to dampen out, and so we get a so-called cycloidal motion dampening down to the steady motion we associate with spinning tops and gyros.

That, then, is the ‘miracle’ of a spinning top explained. Is it less of a ‘miracle’ now that we have explained it in terms of torques and missing forces? That’s an appreciation which each of us has to make for him- or herself. I actually find it all even more wonderful now that I can explain it more or less using the kind of math I used above–but then you may have a different opinion.

In any case, let us – to wrap it all up – ask some simple questions about some other spinning objects. What about the Earth for example? It has an axis of rotation too, and it revolves around the Sun. Is there anything like precession going on?

The first answer is: no, not really. The axis of rotation of the Earth changes little with respect to the stars. Indeed, why would it change? Changing it would require a torque, and where would the required force for such torque come from? The Earth is not like a gyro on a pivot being pulled down by some force we cannot see. The Sun attracts the Earth as a whole, indeed: it does not change the Earth’s axis of rotation. That’s why we have a fairly regular day and night cycle.

The more precise answer is: yes, there actually is a very slow axial precession. The whole precessional cycle takes approximately 26,000 years, and it causes the position of stars – as perceived by us, earthlings, that is – to slowly change. Over this cycle, the Earth’s north axial pole moves from where it is now, in a circle with an angular radius of about 23.5 degrees, as illustrated below.
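The 26,000-year figure translates into a tiny yearly shift, which is why it takes careful observation over long periods to notice it at all. A back-of-the-envelope sketch, using only the approximate period quoted above:

```python
PERIOD_YEARS = 26_000                        # approximate precessional cycle (from the text)

degrees_per_year = 360 / PERIOD_YEARS
arcsec_per_year = degrees_per_year * 3600    # 1 degree = 3600 arcseconds
degrees_per_century = degrees_per_year * 100

print(f"{arcsec_per_year:.1f} arcseconds per year, "
      f"{degrees_per_century:.2f} degrees per century")
```

That’s roughly 50 arcseconds a year, or about 1.4 degrees a century.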

What is this precession caused by? There must be some torque. There is. The Earth is not perfectly spherical: it bulges outward at the equator, and the gravitational tidal forces of the Moon and Sun apply some torque here, attempting to pull the equatorial bulge into the plane of the ecliptic, but instead causing it to precess. So it’s a quite subtle motion, but it’s there, and it also has something to do with the gravitational force. However, it’s got nothing to do with the way gravitation makes a spinning top do what it does. [The most amazing thing about this, in my opinion, is that, despite the fact that the precessional movement is so tiny, the Greeks had already discovered it: indeed, the Greek astronomer and mathematician Hipparchus of Nicaea gave a pretty precise figure for this so-called ‘precession of the equinoxes’ in 127 BC.]

What about electrons? Are they like gyros rotating around some pivot? Here the answer is very simple and very straightforward: No, not at all! First, there are no pivots in an atom. Second, the current understanding of an electron – i.e. the quantum-mechanical understanding of an electron – is not compatible with the classical notion of spin. Let me just copy an explanation from Georgia State University’s HyperPhysics website. It basically says it all:

“Experimental evidence like the hydrogen fine structure and the Stern-Gerlach experiment suggest that an electron has an intrinsic angular momentum, independent of its orbital angular momentum. These experiments suggest just two possible states for this angular momentum, and following the pattern of quantized angular momentum, this requires an angular momentum quantum number of 1/2. With this evidence, we say that the electron has spin 1/2. An angular momentum and a magnetic moment could indeed arise from a spinning sphere of charge, but this classical picture cannot fit the size or quantized nature of the electron spin. The property called electron spin must be considered to be a quantum concept without detailed classical analogy.”

So… I guess this should conclude my exposé on rotational motion. I am not sure what I am going to write about next, but I’ll see. 🙂

Post scriptum:

The above treatment is largely based on Feynman’s Lectures (Vol. I, Chapters 18, 19 and 20). The subject could also be discussed using the concept of a force couple, aka pure moment. A force couple is a system of forces with a resultant moment but no resultant force. Hence, it causes rotation without translation or, more generally, without any acceleration of the centre of mass. In such an analysis, we can say that gravity produces a force couple on the spinning top. The two forces of this couple are equal and opposite, and they pull at opposite ends. However, because one end of the top is fixed (friction forces keep the tip fixed to the ground), the force at the other end makes the top go about the vertical axis.

The situation we have is that gravity causes such force couple to appear, just like the man tilting the spinning wheel causes such force couple to appear. Now, the analysis above shows that the direction of the new force is perpendicular to the plane in which the axis of rotation changes, or wants to change in the case of our spinning top. So gravity wants to pull the top down and causes it to move sideways. This horizontal movement will, in turn, create another force couple. The direction of the resultant force, at the free end of the axis of rotation of the top, will, once again, be vertical, but it will oppose the gravity force. So, in a very simplified explanation of things, we could say:

Gravity pulls the top downwards, and causes a force that will make the top move sideways. So the new force, which causes the precession movement, is orthogonal to the gravitation force, i.e. it’s a horizontal force.
That horizontal force will, in turn, cause another force to appear. That force will also be orthogonal to the horizontal force. As we made two 90-degree turns, so to say, i.e. 180 degrees in total, it means that this third force will be opposite to the gravitational force.
In equilibrium, we have three forces: gravity, the force causing the precession and, finally, a force neutralizing gravity as the spinning top precesses about the vertical axis.

This approach allows for a treatment that is somewhat more intuitive than Feynman’s concept of the ‘missing force.’