Maxwell-Boltzmann, Bose-Einstein and Fermi-Dirac statistics

I’ve discussed statistics, in the context of quantum mechanics, a couple of times already (see, for example, my post on amplitudes and statistics). However, I never took the time to properly explain those distribution functions which are referred to as the Maxwell-Boltzmann, Bose-Einstein and Fermi-Dirac distribution functions respectively. Let me try to do that now—without, hopefully, getting lost in too much math! It should be a nice piece, as it connects quantum mechanics with statistical mechanics, i.e. two topics I had nicely separated so far. :-)

You know the Boltzmann Law now, which says that the probabilities of different conditions of energy are given by e−energy/kT = 1/eenergy/kT. Different ‘conditions of energy’ can be anything: density, molecular speeds, momenta, whatever. The point is: we have some probability density function f, and it’s a function of the energy E, so we write:

f(E) = C·e−energy/kT = C/eenergy/kT

C is just a normalization constant (all probabilities have to add up to one, so the integral of this function over its domain must be one), and k and T are also usual suspects: T is the (absolute) temperature, and k is the Boltzmann constant, which relates the temperate to the kinetic energy of the particles involved. We also know the shape of this function. For example, when we applied it to the density of the atmosphere at various heights (which are related to the potential energy, as P.E. = m·g·h), assuming constant temperature, we got the following graph. The shape of this graph is that of an exponential decay function (we’ll encounter it again, so just take a mental note of it).


A more interesting application is the quantum-mechanical approach to the theory of gases, which I introduced in my previous post. To explain the behavior of gases under various conditions, we assumed that gas molecules are like oscillators but that they can only take on discrete levels of energy. [That’s what quantum theory is about!] We denoted the various energy levels, i.e. the energies of the various molecular states, by E0, E1, E2,…, Ei,…, and if Boltzmann’s Law applies, then the probability of finding a molecule in the particular state Ei is proportional to e−Ei /kT. We can then calculate the relative probabilities, i.e. the probability of being in state Ei, relative to the probability of being in state E0, is:

Pi/P0 = e−Ei /kT/e−E0 /kT = e−(Ei–E0)/kT = 1/e(Ei–E0)/kT

Now, Pi obviously equals ni/N, so it is the ratio of the number of molecules in state Ei (ni) and the total number of molecules (N). Likewise, P0 = n0/N and, therefore, we can write:

ni/ne−(Ei−E0)/kT = 1/e(Ei–E0)/kT

This formulation is just another Boltzmann Law, but it’s nice in that it introduces the idea of a ground state, i.e. the state with the lowest energy level. We may or may not want to equate E0 with zero. It doesn’t matter really: we can always shift all energies by some arbitrary constant because we get to choose the reference point for the potential energy.

So that’s the so-called Maxwell-Boltzmann distribution. Now, in my post on amplitudes and statistics, I had jotted down the formulas for the other distributions, i.e. the distributions when we’re not talking classical particles but fermions and/or bosons. As you know, fermions are particles governed by the Fermi exclusion principle: indistinguishable particles cannot be together in the same state. For bosons, it’s the other way around: having one in some quantum state actually increases the chance of finding another one there, and we can actually have an infinite number of them in the same state.

We also know that fermions and bosons are the real world: fermions are the matter-particles, bosons are the force-carriers, and our ‘Boltzmann particles’ are nothing but a classical approximation of the real world. Hence, even if we can’t see them in the actual world, the Fermi-Dirac and Bose-Einstein distributions are the real-world distributions. :-) Let me jot down the equations once again:

Fermi-Dirac (for fermions): f(E) = 1/[Ae−(E − EF)/kT + 1]

Bose-Einstein (for bosons):  f(E) = 1/[Ae−E/kT − 1]

We’ve got some other normalization constant here (A), which we shouldn’t be too worried about—for the time being, that is. Now, to see how these distributions are different from the Maxwell-Boltzmann distribution (which we should re-write as f(E) = C·e−E/kT = 1/[A·eE/kT] so as to make all formulas directly comparable), we should just make a graph. Please go online to find a graph tool (I found a new one recently—really easy to use), and just do it. You’ll see they are all like that exponential decay function. However, in order to make a proper comparison, we would actually need to calculate the normalization coefficients and, for the Fermi energy, we would also need the Fermi energy E(note that, for simplicity, we did equate E0 with zero). Now, we could give it a try, but it’s much easier to google and find an example online.

The HyperPhysics website of Georgia State University gives us one: the example assumes 6 particles and 9 energy levels, and the table and graph below compare the Maxwell-Boltzmann and Bose-Einstein distributions for the model.

Graph Table

Now that is an interesting example, isn’t it? In this example (but all depends on its assumptions, of course), the Maxwell-Boltzmann and Bose-Einstein distributions are almost identical. Having said that, we can clearly see that the lower energy states are, indeed, more probable with Bose-Einstein statistics than with the Maxwell-Boltzmann statistics. While the difference is not dramatic at all in this example, the difference does become very dramatic, in reality, with large numbers (i.e. high matter density) and, more importantly, at very low temperatures, at which bosons can condense into the lowest energy state. This phenomenon is referred to as Bose-Einstein condensation: it causes superfluidity and superconductivity, and it’s real indeed: it has been observed with supercooled He-4, which is not an everyday substance, but real nevertheless!

What about the Fermi-Dirac distribution for this example? The Fermi-Dirac distribution is given below: the lowest energy state is now less probable, the mid-range energies much more, and none of the six particles occupy any of the four highest energy levels. Again, while the difference is not dramatic in this example, it can become very dramatic, in reality, with large numbers (read: high matter density) and very low temperatures: at absolute zero, all of the possible energy states up to the Fermi energy level will be occupied, and all the levels above the Fermi energy will be vacant.

graph 2 Table 2

What can we make out of all of this? First, you may wonder why we actually have more than one particle in one state above: doesn’t that contradict the Fermi exclusion principle? No. We need to distinguish micro- and macro-states. In fact, the example assumes we’re talking electrons here, and so we can have two particles in each energy state—with opposite spin, however. At the same time, it’s true we cannot have three, or more, in any state. That results, in the example we’re looking at here, in five possible distributions only, as shown below.

Table 3

The diagram is an interesting one: if the particles were to be classical particles, or bosons, then 26 combinations are possible, including the five Fermi-Dirac combinations, as shown above. Note the little numbers above the 26 possible combinations (e.g. 6, 20, 30,… 180): they are proportional to the likelihood of occurring under the Maxwell-Boltzmann assumption (so if we assume the particles are ‘classical’ particles). Let me introduce you to the math behind the example by using the diagram below, which shows three possible distributions/combinations (I know the terminology is quite confusing—sorry for that!).

table 4

If we could distinguish the particles, then we’d have 2002 micro-states, which is the total of all those little numbers on top of the combinations that are shown (6+60+180+…). However, the assumption is that we cannot distinguish the particles. Therefore, the first combination in the diagram above, with five particles in the zero energy state and one particle in state 9, occurs 6 times into 2002 and, hence, it has a probability of 6/2002 ≈ 0.003 only. In contrast, the second combination is 10 times more likely, and the third one is 30 times more likely! In any case, the point is, in the classical situation (and in the Bose-Einstein hypothesis as well), we have 26 possible macro-states, as opposed to 5 only for fermions, and so that leads to a very different density function. Capito?

No? Well, this blog is not a textbook on physics and, therefore, I should refer you to the mentioned site once again, which references a 1992 textbook on physics (Frank Blatt, Modern Physics, 1992) as the source of this example. However, I won’t do that: you’ll find the details in the Post Scriptum to this post. :-)

Let’s first focus on the fundamental stuff, however. The most burning question is: if the real world consists of fermions and bosons, why is that that we only see the Maxwell-Boltzmann distribution in our actual (non-real?) world? :-) The answer is that both the Fermi-Dirac and Bose-Einstein distribution approach the Maxwell–Boltzmann distribution if higher temperatures and lower particle densities are involved. In other words, we cannot see the Fermi-Dirac distributions (all matter is fermionic, except for weird stuff like superfluid helium-4 at 1 or 2 degrees Kelvin), but they are there!

Let’s approach it mathematically: the most general formula, encompassing both Fermi-Dirac and Bose-Einstein statistics, is:

Ni(Ei) ∝ 1/[e(Ei − μ)/kT ± 1]

If you’d google, you’d find a formula involving an additional coefficient, gi, which is the so-called degeneracy of the energy level Ei. I included it in the formula I used in the above-mentioned post of mine. However, I don’t want to make it any more complicated than it already is and, therefore, I omitted it this time. What you need to look at are the two terms in the denominator: e(Ei − μ)/kT and ± 1.

From a math point of view, it is obvious that the values of e(Ei − μ)/kT + 1 (Fermi-Dirac) and e(Ei − μ)/kT − 1 (Bose-Einstein) will approach each other if e(Ei − μ)/kT is much larger than ±1, so if e(Ei − μ)/kT >> 1. That’s the case, obviously, if the (Ei − μ)/kT ratio is large, so if (Ei − μ) >> kT. In fact, (Ei − μ) should, obviously, be much larger than kT for the lowest energy levels too! Now, the conditions under which that is the case are associated with the classical situation (such as a cylinder filled with gas, for example). Why?

Well… […] Again, I have to say that this blog can’t substitute for a proper textbook. Hence, I am afraid I have to leave it to you to do the necessary research to see why. :-) The non-mathematical approach is to simple note that quantum effects, i.e. the ±1 term, only apply if the concentration of particles is high enough. Indeed, quantum effects appear if the concentration of particles is higher than the so-called quantum concentration. Only when the quantum concentration is reached, particles will start interacting according to what they are, i.e. as bosons or as fermions. At higher temperature, that concentration will not be reached, except in massive objects such as a white dwarf (white dwarfs are stellar remnants with the mass like that of the Sun but a volume like that of the Earth). So, in general, we can say that at higher temperatures and at low concentration we will not have any quantum effects. That should settle the matter—as for now, at least.

You’ll have one last question: we derived Boltzmann’s Law from the kinetic theory of gases, but how do we derive that Ni(Ei) = 1/[Ae(Ei − μ)/kT ± 1] expression? Good question but, again, we’d need more than a few pages to explain that! The answer is: quantum mechanics, of course! Go check it out in Feynman’s third Volume of Lectures! :-)

Post scriptum: combinations, permutations and multiplicity

The mentioned example from HyperPhysics is really interesting, if only because it shows you also need to master a bit of combinatorics to get into quantum mechanics. Let’s go through the basics. If we have n distinct objects, we can order hem in n! ways, with n! (read: n factorial) equal to n·(n–1)·(n–2)·…·3·2·1. Note that 0! is equal to 1, per definition. We’ll need that definition.

For example, a red, blue and green ball can be ordered in 3·2·1 = 6 ways. Each way is referred to as a permutation.

Besides permutations, we also have the concept of a k-permutation, which we can denote in a number of ways but let’s choose P(n, k). [The P stands for permutation here, not for probability.] P(n, k) is the number of ways to pick k objects out of a set of n objects. Again, the objects are supposed to be distinguishable. The formula is P(n, k) = n·(n–1)·(n–2)·…·(n–k+1) = n!/(n–k)!. That’s easy to understand intuitively: on your first pick you have n choices; on your second, n–1; on your third, n–2, etcetera. When n = k, we obviously get n! again.

There is a third concept: the k-combination (as opposed to the k-permutation), which we’ll denote by C(n, k). That’s when the order within our subset doesn’t matter: an ace, a queen and a jack taken out of some card deck are a queen, a jack, and an ace: we don’t care about the order. If we have k objects, there are k! ways of ordering them and, hence, we just have to divide P(n, k) by k! to get C(n, k). So we write: C(n, k) = P(n, k)/k! = n!/[(n–k)!k!]. You recognize C(n, k): it’s the binomial coeficient.

Now, the HyperPhysics example illustrating the three mentioned distributions (Maxwell-Boltzmann, Bose-Einstein and Fermi-Dirac) is a bit more complicated: we need to associate q energy levels with N particles. Every possible configuration is referred to as a micro-state, and the total number of possible micro-states is referred to as the multiplicity of the system, denoted by Ω(N, q). The formula for Ω(N, q) is another binomial coefficient: Ω(N, q) = (q+N–1)!/[q!(N–1)!]. Ω(N, q) = Ω(6, 9) = (9+6–1)!/[9!(6–1)!] = 2002.

In our example, however, we do not have distinct particles and, therefore, we only have 26 macro-states (as opposed to 2002 micro-states), which are also referred to, confusingly, as distributions or combinations.

Now, the number of micro-states associated with the same macro-state is given by yet another formula: it is equal to N!/[n1!·n2!·n3!·…·nq!], with ni! the number of particles in level i. [See why we need the 0! = 1 definition? It ensures unoccupied states do not affect the calculation.] So that’s how we get those numbers 6, 60 and 180 for those three macro-states.

But how do we calculate those average numbers of particles for each energy level? In other words, how do we calculate the probability densities under the Maxwell-Boltzmann, Fermi-Dirac and Bose-Einstein hypothesis respectively?

For the Maxwell-Boltzmann distribution, we proceed as follows: for each energy level j (or Ej, I should say), we calculate n= ∑nij·Pi over all macro-states i. In this summation, we have nij, which is the number of particles in energy level j in micro-state i, while Pi is the probability of macro-state i as calculated by the ratio of (i) the number of micro-states associated with macro-state i and (ii) the total number of micro-states. For Pi, we gave the example of 3/2002 ≈ 0.3%. For 60 and 180, we get 60/2002 ≈ 3% and 180/2002 ≈ 9%. Calculating all the nj‘s for j ranging from 1 to 9 should yield the numbers and the graph below indeed.

M-B graphOK. That’s how it works for Maxwell-Boltzmann. Now, it is obvious that the Fermi-Dirac and the Bose-Einstein distribution should not be calculated in the same way because, if they were, they would not be different from the Maxwell-Boltzmann distribution! The trick is as follows.

For the Bose-Einstein distribution, we give all macro-states equal weight—so that’s a weight of one, as shown below. Hence, the probability Pi  is, quite simply, 1/26 ≈ 3.85% for all 26 macro-states. So we use the same n= ∑nij·Pformula but with Pi = 1/26.


Finally, I already explained how we get the Fermi-Dirac distribution: we can only have (i) one, (ii) two, or (iii) zero fermions for each energy level—not more than two! Hence, out of the 26 macro-states, only five are actually possible under the Fermi-Dirac hypothesis, as illustrated below once more. So it’s a very different distribution indeed!

Table 3

Now, you’ll probably still have questions. For example, why does the assumption, for the Bose-Einstein analysis, that macro-states have equal probability favor the lower energy states? The answer is that the model also integrates other constraints: first, when associating a particle with an energy level, we do not favor one energy level over another, so all energy levels have equal probability. However, at the same time, the whole system has some fixed energy level, and so we cannot put the particles in the higher energy levels only! At the same time, we know that, if we have q particles, and the probability of a particle having some energy level j is the same for all j, then they are likely not to be all at the same energy level: they’ll be distributed, effectively, as evidenced by the very low chance (0.3% only) of having 5 particles in the ground state and 1 particle at a higher level, as opposed to the 3% and 9% chance of the other two combinations shown in that diagram with three possible Maxwell-Boltzmann (MB) combinations.

So what happens when assigning an equal probability to all 26 possible combinations (with value 1/26) is that the combinations that were previously rather unlikely – because they did have a rather heavy concentration of particles in the ground state only – are now much more likely. So that’s why the Bose-Einstein distribution, in this example at least, is skewed towards the lowest energy level—as compared to the Maxwell-Boltzmann distribution, that is.

So that’s what’s behind, and that should also answer the other question you surely have when looking at those five acceptable Fermi-Dirac configurations: why don’t we have the same five configurations starting from the top down, rather than from the bottom up? Now you know: such configuration would have much higher energy overall, and so that’s not allowed under this particular model.

There’s also this other question: we said the particles were indistinguishable, but so then we suddenly say there can be two at any energy level, because their spin is opposite. It’s obvious this is rather ad hoc as well. However, if we’d allow only one particle at any energy level, we’d have no allowable combinations and, hence, we’d have no Fermi-Dirac distribution at all in this example.

In short, the example is rather intuitive, which is actually why I like it so much: it shows how bosonic and fermionic behavior appear rather gradually, as a consequence of variables that are defined at the system level, such as density, or temperature. So, yes, you’re right if you think the HyperPhysics example lacks rigor. That’s why I think it’s such wonderful pedagogic device. :-)

Maxwell-Boltzmann, Bose-Einstein and Fermi-Dirac statistics

The Quantum-Mechanical Gas Law

In my previous posts, it was mentioned repeatedly that the kinetic theory of gases is not quite correct: the experimentally measured values of the so-called specific heat ratio (γ) vary with temperature and, more importantly, their values differ, in general, from what classical theory would predict. It works, more or less, for noble gases, which do behave as ideal gases and for which γ is what the kinetic theory of gases would want it to be: γ = 5/3—but we get in trouble immediately, even for simple diatomic gases like oxygen or hydrogen, as illustrated below: the theoretical value is 9/7 (so that’s 1.286, more or less), but the measured value is very different.

Heat ratioLet me quickly remind you how we get the theoretical number. According to classical theory, a diatomic molecule like oxygen can be represented as two atoms connected by a spring. Each of the atoms absorbs kinetic energy, and for each direction of motion (x, y and z), that energy is equal to kT/2, so the kinetic energy of both atoms – added together – is 2·3·kT/2 = 3kT. However, I should immediately add that not all of that energy is to be associated with the center-of-mass motion of the whole molecule, which determines the temperature of the gas: that energy is and remains equal to the 3kT/2, always. We also have rotational and vibratory motion. The molecule can rotate in two independent directions (and any combination of these directions, of course) and, hence, rotational motion is to absorb an amount of energy equal to 2·kT/2 = kT. Finally, the vibratory motion is to be analyzed as any other oscillation, so like a spring really. There is only one dimension involved and, hence, the kinetic energy here is just kT/2. However, we know that the total energy in an oscillator is the sum of the kinetic and potential energy, which adds another kT/2 term. Putting it all together, we find that the average energy for each diatomic particle is (or should be) equal to 7·kT/2 = (7/2)kT. Now, as mentioned above, the temperature of the gas (T) is proportional to the mean molecular energy of the center-of-mass motion only (in fact, that’s how temperature is defined), with the constant of proportionality equal to 3k/2. Hence, for monatomic ideal gases, we can write: U = N·(3k/2)T and, therefore, PV = NkT = (2/3)·U. Now, γ appears as follows in the ideal gas law: PV = (γ–1)U. Therefore, γ = 2/3 + 1 = 5/3, but so that’s for monatomic ideal gases only! The total kinetic energy of our diatomic molecule is U = N·(7k/2)T and, therefore, PV = (2/7)·U. So γ must be γ = 2/7 + 1 = 9/7 ≈ 1.286 for diatomic gases, like oxygen and hydrogen.

Phew! So that’s the theory. However, as we can see from the diagram, γ approaches that value only when we heat the gas to a few thousand degrees! So what’s wrong? One assumption is that certain kinds of motions “freeze out” as the temperature falls—although it’s kinda weird to think of something ‘freezing out’ at a thousand degrees Kelvin! In any case, at the end of the 19th century, that was the assumption that was advanced, very reluctantly, by scientists such as James Jeans. However, the mystery was about to be solved then, as Max Planck, even more reluctantly, presented his quantum theory of energy at the turn of the century itself.

But the quantum theory was confirmed and so we should now see how we can apply it to the behavior of gas. In my humble view, it’s a really interesting analysis, because we’re applying quantum theory here to a phenomenon that’s usually being analyzed as a classical problem only.

Boltzmann’s Law

We derived Boltzmann’s Law in our post on the First Principles of Statistical Mechanics. To be precise, we gave Boltzmann’s Law for the density of a gas (which we denoted by n = N/V)  in a force field, like a gravitational field, or in an electromagnetic field (assuming our gas particles are electrically charged, of course). We noted, however, Boltzmann’s Law was also applicable to much more complicated situations, like the one below, which shows a potential energy function for two molecules that is quite characteristic of the way molecules actually behave: when they come very close together, they repel each other but, at larger distances, there’s a force of attraction. We don’t really know the forces behind but we don’t need to: as long as these forces are conservative, they can combine in whatever way they want to combine, and Boltzmann’s Law will be applicable. [It should be obvious why. If you hesitate, just think of the definition of work and how it affects potential energy and all that. Work is force times distance, but when doing work, we’re also changing potential energy indeed! So if we’ve got a potential energy function, we can get all the rest.]

randomBoltzmann’s Law itself is illustrated by the graph below, which also gives the formula for it: n = n0·e−P.E/kT.


It’s a graph starting at n = n0 for P.E. = 0, and it then decreases exponentially. [Funny expression, isn’t it? So as to respect mathematical terminology, I should say that it decays exponentially.] In any case, if anything, Boltzmann’s Law shows the natural exponential function is quite ‘natural’ indeed, because Boltzmann’s Law pops up in Nature everywhere! Indeed, Boltzmann’s Law is not limited to functions of potential energy only. For example, Feynman derives another Boltzmann Law for the distribution of molecular speeds or, so as to ensure the formula is also valid in relativity, the distribution of molecular momenta. In case you forgot, momentum (p) is the product of mass (m) and velocity (u), and the relevant Boltzmann Law is:

f(p)·dp = C·e−K.E/kT·dp

The argument is not terribly complicated but somewhat lengthy, and so I’ll refer you to the link for more details. As for the f(p) function (and the dp factor on both sides of the equation), that’s because we’re not talking exact values of p but some range equal to dp and some probability of finding particles that have a momentum within that range. The principle is illustrated below for molecular speeds (denoted by u = p/m), so we have a velocity distribution below. The illustration for p would look the same: just substitute u for p.


Boltzmann’s Law can be stated, much more generally, as follows:

The probability of different conditions of energy (E), potential or kinetic, is proportional to e−E/kT

As Feynman notes, “This is a rather beautiful proposition, and a very easy thing to remember too!” It is, and we’ll need it for the next bit.

The quantum-mechanical theory of gases

According to quantum theory, energy comes in discrete packets, quanta, and any system, like an oscillator, will only have a discrete set of energy levels, i.e. states of different energy. An energy state is, obviously, a condition of energy and, hence, Boltzmann’s Law applies. More specifically, if we denote the various energy levels, i.e. the energies of the various molecular states, by E0, E1, E2,…, Ei,…, and if Boltzmann’s Law applies, then the probability of finding a molecule in the particular state Ei will be proportional to e−Ei /kT.

Now, we know we’ve got some constant there, but we can get rid of that by calculating relative probabilities. For example, the probability of being in state E1, relative to the probability of being in state E0, is:

P1/P0 = e−E1 /kT/e−E0 /kT = e−(E1–E0)/kT

But the relative probability Pshould, obviously, also be equal to the ratio n1/N, i.e. the ratio of the number of molecules in state E1 and the total number of molecules. Likewise, P= n0/N. Hence, P1/P0 = n1/nand, therefore, we can write:

n = n0e−(E1–E0)/kT

What can we do with that? Remember we want to explain the behavior of non-monatomic gas—like diatomic gas, for example. Now we need some other assumption, obviously. As it turns out, the assumption that we can represent a system as some kind of oscillation still makes sense! In fact, the assumption that our diatomic molecule is like a spring is equally crucial to our quantum-theoretical analysis of gases as it is to our classical kinetic theory of gases. To be precise, in both theories, we look at it as a harmonic oscillator.

Don’t panic. A harmonic oscillator is, quite simply, a system that, when displaced from its equilibrium position, experiences some kind of restoring force. Now, for it to be harmonic, the force needs to be linear. For example, when talking springs, the restoring force F will be proportional to the displacement x). It basically means we can use a linear differential equation to analyze the system, like m·(d2x/dt2) = –kx. […] I hope you recognize this equation, because you should! It’s Newton’s Law: F = m·a with F = –k·x. If you remember the equation, you’ll also remember that harmonic oscillations were sinusoidal oscillations with a constant amplitude and a constant frequency. That frequency did not depend on the amplitude: because of the sinusoidal function involved, it was easier to write that frequency as an angular frequency, which we denoted by ω0 and which, in the case of our spring, was equal to ω0 = (k/m)1/2. So it’s a property of the system. Indeed, ωis the square root of the ratio of (1) k, which characterizes the spring (it’s its stiffness), and (2) m, i.e. the mass on the spring. Solving the differential equation yielded x = A·cos(ω0t + Δ) as a general solution, with A the (maximum) amplitude, and Δ some phase shift determined by our t = 0 point. Let me quickly jot down too more formulas: the potential energy in the spring is kx2/2, while its kinetic energy is mv2/2, as usual (so the kinetic energy depends on the mass and its velocity, while the potential energy only depends on the displacement and the spring’s stiffness). Of course, kinetic and potential energy add up to the total energy of the system, which is constant and proportional to the square of the (maximum) amplitude: K.E. + P.E. = E ∝ A2. To be precise, E = kA2/2.

That’s simple enough. Let’s get back to our molecular oscillator. While the total energy of an oscillator in classical theory can take on any value, Planck challenged that assumption: according to quantum theory, it can only take up energies equal to ħω at a time. [Note that we use the so-called reduced Planck constant here (i.e. h-bar), because we’re dealing with angular frequencies.] Hence, according to quantum theory, we have an oscillator with equally spaced energy levels, and the difference between them is ħω. Now, ħω is terribly tiny—but it’s there. Let me visualize what I just wrote:


So our expression for P1/P0 becomes P1/P0 = e−ħω/kT/e−0/kT = e−ħω/kT. More generally, we have Pi/P0 = e−i·ħω/kT. So what? Well… We’ve got a function here which gives the chance of finding a molecule in state Pi relative to that of finding it in state E0, and it’s a function of temperature. Now, the graph below illustrates the general shape of that function. It’s a bit peculiar, but you can see that the relative probability goes up and down with temperature. The graph makes it clear that, at extremely low temperatures, most particles will be in state E0 and, of course, the internal energy of our body of gas will be close to nil.


Now, we can look at the oscillators in the bottom state (i.e. particles in the molecular energy state E0) as being effectively ‘frozen': they don’t contribute to the specific heat. However, as we increase the temperature, our molecules gradually begin to have an appreciable probability to be in the second state, and then in the next state, and so on, and so the internal energy of the gas increases effectively. Now, when the probability is appreciable for many states, the quantized states become nearly indistinguishable and, hence, the situation is like classical physics: it is nearly indistinguishable from a continuum of energies.

Now, while you can imagine such analysis should explain why the specific heat ratio for oxygen and hydrogen varies as it does in the very first graph of this post, you can also imagine the details of that analysis fill quite a few pages! In fact, even Feynman doesn’t include it in his Lectures. What he does include is the analysis of the blackbody radiation problem, which is remarkably similar. So… Well… For more details on that, I’ll refer you to Feynman indeed. :-)

I hope you appreciated this little ‘lecture’, as it sort of wraps up my ‘series’ of posts on statistical mechanics, thermodynamics and, central to both, the classical theory of gases. Have fun with it all!

The Quantum-Mechanical Gas Law

Entropy, energy and enthalpy

Phew! I am quite happy I got through Feynman’s chapters on thermodynamics. Now is a good time to review the math behind it. We thoroughly understand the gas equation now:

PV = NkT = (γ–1)U

The gamma (γ) in this equation is the specific heat ratio: it’s 5/3 for ideal gases (so that’s about 1.667) and, theoretically, 4/3 ≈ 1.333 or 9/7 ≈ 1.286 for diatomic gases, depending on the degrees of freedom we associate with diatomic molecules. More complicated molecules have even more degrees of freedom and, hence, can absorb even more energy, so γ gets closer to one—according to the kinetic gas theory, that is. While we know that the kinetic gas theory is not quite accurate – an approach involving molecular energy states is a better match for reality – that doesn’t matter here. As for the term (specific heat ratio), I’ll explain that later. [I promise. :-) You’ll see it’s quite logical.]

The point to note is that this body of gas (or whatever substance) stores an amount of energy U that is directly proportional to the temperature (T), and Nk/(γ–1) is the constant of proportionality. We can also phrase it the other way around: the temperature is directly proportional to the energy, with (γ–1)/Nk the constant of proportionality. It means temperature and energy are in a linear relationship. [Yes, direct proportionality implies linearity.] The graph below shows the T = [(γ–1)/Nk]·U relationship for three different values of γ, ranging from 5/3 (i.e. the maximum value, which characterizes monatomic noble gases such as helium, neon or krypton) to a value close to 1, which is characteristic of more complicated molecular arrangements indeed, such as heptane (γ = 1.06) or methyl butane ((γ = 1.08). The illustration shows that, unlike monatomic gas, more complicated molecular arrangements allow the gas to absorb a lot of (heat) energy with a relatively moderate rise in temperature only.

CaptureWe’ll soon encounter another variable, enthalpy (H), which is also linearly related to energy: H = γU. From a math point of view, these linear relationships don’t mean all that much: they just show these variables – temperature, energy and enthalphy – are all directly related and, hence, can be defined in terms of each other.

We can invent other variables, like the Gibbs energy, or the Helmholtz energy. In contrast, entropy, while often being mentioned as just some other state function, is something different altogether. In fact, the term ‘state function’ causes a lot of confusion: pressure and volume are state variables too. The term is used to distinguish these variables from so-called process functions, notably heat and work. Process functions describe how we go from one equilibrium state to another, as opposed to the state variables, which describe the equilibrium situation itself. Don’t worry too much about the distinction—for now, that is.

Let’s look at non-linear stuff. The PV = NkT = (γ–1)U says that pressure (P) and volume (V) are inversely proportional one to another, and so that’s a non-linear relationship. [Yes, inverse proportionality is non-linear.] To help you visualize things, I inserted a simple volume-pressure diagram below, which shows how pressure and volume are related for three different values of U (or, what amounts to the same, three different values of T).

graph 2

The curves are simple hyperbolas which have the x- and y-axis as horizontal and vertical asymptote respectively. If you’ve studied social sciences (like me!) – so if you know a tiny little bit of the ‘dismal science’, i.e. economics (like me!) – you’ll note they look like indifference curves. The x- and y-axis then represent the quantity of some good X and some good Y respectively, and the curves closer to the origin are associated with lower utility. How much X and Y we will buy then, depends on (a) their price and (b) our budget, which we represented by a linear budget line tangent to the curve we can reach with our budget, and then we are a little bit happy, very happy or extremely happy, depending on our budget. Hence, our budget determines our happiness. From a math point of view, however, we can also look at it the other way around: our happiness determines our budget. [Now that‘s a nice one, isn’t it? Think about it! :-) And, in the process, think about hyperbolas too: the y = 1/x function holds the key to understanding both infinity and nothingness. :-)]

U is a state function but, as mentioned above, we’ve got quite a few state variables in physics. Entropy, of course, denoted by S—and enthalpy too, denoted by H. Let me remind you of the basics of the entropy concept:

  1. The internal energy U changes because (a) we add or remove some heat from the system (ΔQ), (b) because some work is being done (by the gas on its surroundings or the other way around), or (c) because of both. Using the differential notation, we write: dU = dQ – dW, always. The (differential) work that’s being done is PdV. Hence, we have dU = dQ – PdV.
  2. When transferring heat to a system at a certain temperature, there’s a quantity we refer to as the entropy. Remember that illustration of Feynman’s in my post on entropy: we go from one point to another on the temperature-volume diagram, taking infinitesimally small steps along the curve, and, at each step, an infinitesimal amount of work dW is done, and an infinitesimal amount of entropy dS = dQ/T is being delivered.
  3. The total change in entropy, ΔS, is a line integral: ΔS = ∫dQ/T = ∫dS.

That’s somewhat tougher to understand than economics, and so that’s why it took me more time to come with terms with it. :-) Just go through Feynman’s Lecture on it, or through that post I referenced above. If you don’t want to do that, then just note that, while entropy is a very mysterious concept, it’s deceptively simple from a math point of view: ΔS = ΔQ/T, so the (infinitesimal) change in entropy is, quite simply, the ratio of (1) the (infinitesimal or incremental) amount of heat that is being added or removed as the system goes from one state to another through a reversible process and (2) the temperature at which the heat is being transferred. However, I am not writing this post to discuss entropy once again. I am writing it to give you an idea of the math behind the system.

So dS = dQ/T. Hence, we can re-write dU = dQ – dW as:

dU = TdS – PdV ⇔ dU + d(PV) = TdS – PdV + d(PV)

⇔ d(U + PV) = dH = TdS – PdV + PdV + VdP = TdS + VdP

The U + PV quantity on the left-hand side of the equation is the so-called enthalpy of the system, which I mentioned above. It’s denoted by H indeed, and it’s just another state variable, like energy: same-same but different, as they say in Asia. We encountered it in our previous post also, where we said that chemists prefer to analyze the behavior of substances using temperature and pressure as ‘independent variables’, rather than temperature and volume. Independent variables? What does that mean, exactly?

According to the PV = NkT equation, we only have two independent variables: if we assign some value to two variables, we’ve got a value for the third one. Indeed, remember that other equation we got when we took the total differential of U. We wrote U as U(V, T) and, taking the total differential, we got:

dU = (∂U/∂T)dT + (∂U/∂V)dV

We did not need to add a (∂U/∂P)dP term, because the pressure is determined by the volume and the temperature. We could also have written U = U(P, T) and, therefore, that dU = (∂U/∂T)dT + (∂U/∂P)dP. However, when working with temperature and pressure as the ‘independent’ variables, it’s easier to work with H rather than U. The point to note is that it’s all quite flexible really: we have two independent variables in the system only. The third one (and all of the other variables really, like energy or enthalpy or whatever) depend on the other two. In other words, from a math point of view, we only have two degrees of freedom in the system here: only two variables are actually free to vary. :-)

Let’s look at that dH = TdS + VdP equation. That’s a differential equation in which not temperature and pressure, but entropy (S) and pressure (P) are ‘independent’ variables, so we write:

dH(S, P) = TdS + VdP

Now, it is not very likely that we will have some problem to solve with data on entropy and pressure. At our level of understanding, any problem that’s likely to come our way will probably come with data on more common variables, such as the heat, the pressure, the temperature, and/or the volume. So we could continue with the expression above but we don’t do that. It makes more sense to re-write the expression substituting TdS for dQ once again, so we get:

dH = dQ + VdP

That resembles our dU = dQ – PdV expression: it just substitutes V for –P. And, yes, you guessed it: it’s because the two expressions resemble each other that we like to work with H now. :-) Indeed, we’re talking the same system and the same infinitesimal changes and, therefore, we can use all the formulas we derived already by just substituting H for U, V for –P, and dP for dV. Huh? Yes. It’s a rather tricky substitution. If we switch V for –P (or vice versa) in a partial derivative involving T, we also need to include the minus sign. However, we do not need to include the minus sign when substituting dV and dP, and we also don’t need to change the sign of the partial derivatives of U and H when going from one expression to another! It’s a subtle and somewhat weird point, but a very important one! I’ll explain it in a moment. Just continue to read as for now. Let’s do the substitution using our rules:

dU = (∂Q/∂T)VdT + [T(∂P/∂T)V − P]dV becomes:

dH = (∂Q/∂T)PdT + (∂H/∂P)TdP = CPdT + [–T·(∂V/∂T)P + V]dP

Note that, just as we referred to (∂Q/∂T)as the specific heat capacity of a substance at constant volume, which we denoted by CV, we now refer to (∂Q/∂T)P as the specific heat capacity at constant pressure, which we’ll denote, logically, as CP. Dropping the subscripts of the partial derivatives, we re-write the expression above as:

dH = CPdT + [–T·(∂V/∂T) + V]dP

So we’ve got what we wanted: we switched from an expression involving derivatives assuming constant volume to an expression involving derivatives assuming constant pressure. [In case you wondered what we wanted, this is it: we wanted an equation that helps us to solve another type of problem—another formula for a problem involving a different set of data.]

As mentioned above, it’s good to use subscripts with the partial derivatives to emphasize what changes and what is constant when calculating those partial derivatives but, strictly speaking, it’s not necessary, and you will usually not find the subscripts when googling other texts. For example, in the Wikipedia article on enthalpy, you’ll find the expression written as:

dH = CPdT + V(1–αT)dP with α = (1/V)(∂V/∂T)

Just write it all out and you’ll find it’s the same thing, exactly. It just introduces another coefficient, α, i.e. the coefficient of (cubic) thermal expansion. If you find this formula is easier to remember, then please use this one. It doesn’t matter.

Now, let’s explain that funny business with the minus signs in the substitution. I’ll do so by going back to that infinitesimal analysis of the reversible cycle in my previous post, in which we had that formula involving ΔQ for the work done by the gas during an infinitesimally small reversible cycle: ΔW = ΔVΔP = ΔQ·(ΔT/T). Now, we can either write that as:

  1. ΔQ = T·(ΔP/ΔT)·ΔV = dQ = T·(∂P/∂T)V·dV – which is what we did for our analysis of (∂U/∂V)or, alternatively, as
  2. ΔQ = T·(ΔV/ΔT)·ΔP = dQ = T·(∂V/∂T)P·dP, which is what we’ve got to do here, for our analysis of (∂H/∂P)T.

Hence, dH = dQ + VdP becomes dH = T·(∂V/∂T)P·dP + V·dP, and dividing all by dP gives us what we want to get: dH/dP = (∂H/∂P)= T·(∂V/∂T)+ V.

[…] Well… NO! We don’t have the minus sign in front of T·(∂V/∂T)P, so we must have done something wrong or, else, that formula above is wrong.

The formula is right (it’s in Wikipedia, so it must be right :-)), so we are wrong. Indeed! The thing is: substituting dT, dV and dP for ΔT, ΔV and ΔP is somewhat tricky. The geometric analysis (illustrated below) makes sense but we need to watch the signs.

Carnot 2

We’ve got a volume increase, a temperature drop and, hence, also a pressure drop over the cycle: the volume goes from V to V+ΔV (and then back to V, of course), while the pressure and the temperature go from P to P–ΔP and T to T–ΔT respectively (and then back to P and T, of course). Hence, we should write: ΔV = dV, –ΔT = dT, and –ΔP = dP. Therefore, as we replace the ratio of the infinitesimal change of pressure and temperature, ΔP/ΔT, by a proper derivative (i.e. ∂P/∂T), we should add a minus sign: ΔP/ΔT = –∂P/∂T. Now that gives us what we want: dH/dP = (∂H/∂P)= –T·(∂V/∂T)+ V, and, therefore, we can, indeed, write what we wrote above:

dU = (∂Q/∂T)VdT + [T(∂P/∂T)V − P]dV becomes:

dH = (∂Q/∂T)PdT + [–T·(∂V/∂T)P + V]dP = CPdT + [–T·(∂V/∂T)P + V]dP

Now, in case you still wonder: what’s the use of all these different expressions stating the same? The answer is simple: it depends on the problem and what information we have. Indeed, note that all derivatives we use in our expression for dH expression assume constant pressure, so if we’ve got that kind of data, we’ll use the chemists’ representation of the system. If we’ve got data describing performance at constant volume, we’ll need the physicists’ formulas, which are given in terms of derivatives assuming constant volume. It all looks complicated but, in the end, it’s the same thing: the PV = NkT equation gives us two ‘independent’ variables and one ‘dependent’ variable. Which one is which will determine our approach.

Now, we left one thing unexplained. Why do we refer to γ as the specific heat ratio? The answer is: it is the ratio of the specific heat capacities indeed, so we can write:

γ = CP/CV

However, it is important to note that that’s valid for ideal gases only. In that case, we know that the (∂U/∂V)derivative in our dU = (∂U/∂T)VdT + (∂U/∂V)TdV expression is zero: we can change the volume, but if the temperature remains the same, the internal energy remains the same. Hence, dU = (∂U/∂T)VdT = CVdT, and dU/dT = CV. Likewise, the (∂H/∂P)T derivative in our dH = (∂H/∂T)PdT + (∂H/∂P)TdP expression is zero—for ideal gases, that is. Hence, dH = (∂H/∂T)PdT = CPdT, and dH/dT = CP. Hence,

CP/CV = (dH/dT)/(dU/dT) = dH/dU

Does that make sense? If dH/dU = γ, then H must be some linear function of U. More specifically, H must be some function H = γU + c, with c some constant (it’s the so-called constant of integration). Now, γ is supposed to be constant too, of course. That’s all perfectly fine: indeed, combining the definition of H (H = U + PV), and using the PV = (γ–1)U relation, we have H = U + (γ–1)U = γU (hence, c = 0). So, yes, dH/dU = γ, and γ = CP/CV.

Note the qualifier, however: we’re assuming γ is constant (which does not imply the gas has to be ideal, so the interpretation is less restrictive than you might think it is). If γ is not a constant, it’s a different ballgame. […] So… Is γ actually constant? The illustration below shows γ is not constant for common diatomic gases like hydrogen or (somewhat less common) oxygen. It’s the same for other gases: when mentioning γ, we need to state the temperate at which we measured it too. :-(  However, the illustration also shows the assumption of γ being constant holds fairly well if temperature varies only slightly (like plus or minus 100° C), so that’s OK. :-)

Heat ratio

I told you so: the kinetic gas theory is not quite accurate. An approach involving molecular energy states works much better (and is actually correct, as it’s consistent with quantum theory). But so we are where we are and I’ll save the quantum-theoretical approach for later. :-)

So… What’s left? Well… If you’d google the Wikipedia article on enthalphy in order to check if I am not writing nonsense, you’ll find it gives γ as the ratio of H and U itself: γ = H/U. That’s not wrong, obviously (γ = H/U = γU/U = γ), but that formula doesn’t really explain why γ is referred to as the specific heat ratio, which is what I wanted to do here.

OK. We’ve covered a lot of ground, but let’s reflect some more. We did not say a lot about entropy, and/or the relation between energy and entropy. Too bad… The relationship between entropy and energy is obviously not so simple as between enthalpy and energy. Indeed, because of that easy H = γU relationship, enthalpy emerges as just some auxiliary variable: some temporary variable we need to calculate something. Entropy is, obviously, something different. Unlike enthalpy, entropy involves very complicated thinking, involving (ir)reversibility and all that. So it’s quite deep, I’d say – but I’ll write more about that later. I think this post has gone as far as it should. :-)

Entropy, energy and enthalpy

Is gas a reversible engine?

We’ve worked on very complicated matters in the previous posts. In this post, I am going to tie up a few loose ends, not only about the question in the title but also other things. Let me first review a few concepts and constructs.


We’ve talked a lot about temperature, but what it is really? You have an answer ready of course: it is the mean kinetic energy of the molecules of a gas or a substance. You’re right. To be precise, it is the mean kinetic energy of the center-of-mass (CM) motions of the gas molecules.

The added precision in the definition above already points out temperature is not just mean kinetic energy or, to put it differently, that the concept of mean kinetic energy itself is not so simple when we are not talking ideal gases. So let’s be precise indeed. First, let me jot down the formula for the mean kinetic energy of the CM motions of the gas particles:

(K.E.)CM = <(1/2)·mv2>

Now let’s recall the most fundamental law in the kinetic theory of gases, which states that the mean value of the kinetic energy for each independent direction of motion will be equal to kT/2. [I know you know the kinetic theory of gases itself is not accurate – we should be talking about molecular energy states – but let’s go along with it.] Now, because we have only three independent directions of motions (the x, y and z directions) for ideal gas molecules (or atoms, I should say), the mean kinetic energy of the gas particles is kT/2 + kT/2 + kT/2 = 3kT/2.

What’s going on here is that we are actually defining temperature here: we basically say that the kinetic energy is linearly proportional to something that we define as the temperature. For practical reasons, that constant of proportionality is written as 3k/2, with k the Boltzmann constant. So we write our definition of temperature as:

(K.E.)CM = 3kT/2 ⇔ T = (3k/2)–1<(1/2)·mv2> = [2/(3k)]·(K.E.)CM

What happens with temperature when considering more complex gases, such as diatomic gases? Nothing. The temperature will still be proportional to the kinetic energy of the center-of-mass motions, but we should just note it’s the (K.E.)CM of the whole diatomic molecule, not of the individual atoms. The thing with more complicated arrangements is that, when adding or removing heat, we’ve got something else going on too: part of the energy will go into the rotational and vibratory motions inside the molecule, which is why we’ll need to add a lot more heat in order to achieve the same change in temperature or, vice versa, we’ll be able to extract a lot more heat out of the gas – as compared to an ideal gas, that is – for the same drop in temperature. [When talking molecular energy states, rather than independent directions of motions, we’re saying the same thing: energy does not only go in center-of-mass motion but somewhere else too.]

You know the ideal gas law is based on the reasoning above and the PV = NkT equation, which is always valid. For ideal gases, we write:

PV = NkT = Nk(3k/2)–1<(1/2)mv2> = (2/3)N<(1/2)·mv2> = (2/3)·U

For diatomic gases, we have to use another coefficient. According to our theory above, which distinguishes 6 independent directions of motions, the mean kinetic energy is twice 3kT/2 now, so that’s 3kT, and, hence, we write: T = (3k)–1<K.E.> =

PV = NkT = Nk(3k)–1<K.E.> = (1/3)·U

The two equations above will usually be written as PV = (γ–1)U, so γ, which is referred to as the specific heat ratio, would be equal 5/3 ≈ 1.67 for ideal gases and 4/3 ≈ 1.33 for diatomic gases. [If you read my previous posts, you’ll note I used 9/7 ≈ 1.286, but that’s because Feynman suddenly decides to add the potential energy of the oscillator as another ‘independent direction of motion’.]

Now, if we’re not adding or removing heat to/from the gas, we can do a differential analysis yielding a differential equation (what did you expect?), which we can then integrate to find that P = C/Vγ relationship. You’ve surely seen it before. The C is some constant related to the energy and/or the state of the gas. It is actually interesting to plot the pressure-volume relationship using that P = C/Vγ relationship for various values of γ. The blue graph below assumes γ = 5/3 ≈ 1.667, which is the theoretical value for ideal gases (γ for helium or krypton comes pretty close to that), while the red graph gives the same relationship for γ = 4/3 ≈ 1.33, which is the theoretical value for diatomic gases (gases like bromine and iodine have a γ that’s close to that).

graph 1

Let me repeat that this P = C/Vγ relationship is only valid for adiabatic expansion or compression: we do not add or remove heat and, hence, this P = C/Vγ function gives us the adiabatic segments only in a Carnot cycle (i.e. the adiabatic lines in a pressure-volume diagram). Now, it is interesting to observe that the slope of the adiabatic line for the ideal gas is more negative than the slope of the adiabatic line for the diatomic gas: the blue curve is the steeper one. That’s logical: for the same volume change, we should get a bigger drop in pressure for the ideal gas, as compared to the diatomic gas, because… Well… You see the logic, don’t you?

Let’s freewheel a bit and see what it implies for our Carnot cycle.

Carnot engines with ideal and non-ideal gas

We know that, if we could build an ideal frictionless gas engine (using a cylinder with a piston or whatever other device we can think of), its efficiency will be determined by the amount of work it can do over a so-called Carnot cycle, which consists of four steps: (1) isothermal expansion (gas absorbs heat and the volume expands at constant temperature), (2) adiabatic expansion (the volume expands while the temperature drops), (3) isothermal compression (the volume decreases at constant temperature, so heat is taken out), and (4) isothermal compression (the volume decreases as we bring the gas back to the same temperature).

Capture Carnot cycle graph

It is important to note that work is being done, by the gas on its surroundings, or by the surroundings on the gas, during each step of the cycle: work is being done by the gas as it expands, always, and work is done on the gas as it is being compressed, always.

You also know that there is only one Carnot efficiency, which is defined as the ratio of (a) the net amount of work we get out of our machine in one such cycle, which we’ll denote by W, and (b) the amount of heat we have to put in to get it (Q1). We’ve also shown that W is equal to the difference between the heat we put during the first step (isothermal expansion) and the heat that’s taken out in the third step (isothermal compression): W = Q1 − Q2, which basically means that all heat is converted into useful work—which is why it’s an efficient engine! We also know that the formula for the efficiency is given by:

W/Q1 = (T1 − T2)/T1.

Where’s Q2 in this formula? It’s there, implicitly, as the efficiency of the engine depends on T2. In fact, that’s the crux of the matter: for efficient engines, we also have the same Q1/T= Q2/Tratio, which we define as the entropy S = Q1/T= Q2/T2. We’ll come back to this.

Now how does it work for non-ideal gases? Can we build an equally efficient engine with actual gases? This was, in fact, Carnot’s original question, and we haven’t really answered it in our previous posts, because we weren’t quite ready for it. Let’s consider the various elements to the answer:

  1. Because we defined temperature the way we defined it, it is obvious that the gas law PV = NkT still holds for diatomic gases, or whatever gas (such as steam vapor, for example, the stuff which was used in Carnot’s time). Hence, the isothermal lines in our pressure-volume diagrams don’t change. For a given temperature T, we’ll have the same green and red isothermal line in the diagram above.
  2. However, the adiabatic lines (i.e .the blue and purple lines in the diagram above) for the non-ideal gas are much flatter than the one for an ideal gas. Now, just take that diagram and draw two flatter curves through point a and c indeed—but not as flat as the isothermal segments, of course! What you’ll notice is that the area of useful work becomes much smaller.

What does that imply in terms of efficiency? Well… Also consider the areas under the graph which, as you know, represent the amount of work done during each step (and you really need to draw the graph here, otherwise you won’t be able to follow my argument):

  1. The phase of isothermal expansion will be associated with a smaller volume change, because our adiabatic line for the diatomic gas intersects the T = T1 isothermal line at a smaller value for V. Hence, less work is being done during that stage.
  2. However, more work will be done during adiabatic expansion, and the associated volume change is also larger.
  3. The isothermal compression phase is also associated with a smaller volume change, because our adiabatic line for the diatomic gas intersects the T = T2 isothermal line at a larger value for V.
  4. Finally, adiabatic compression requires more work to be done to get from T2 to Tagain, and the associated volume change is also larger.

The net result is clear from the graph: the net amount of work that’s being done over the complete cycle is less for our non-ideal gas than as compared to our engine working with ideal gas. But, again, the question here is what it implies in terms of efficiency? What about the W/Q1 ratio?

The problem is that we cannot see how much heat is being put in (Q1) and how much heat is being taken out (Q2) from the graph. The only thing we know is that we have an engine working here between the same temperature T1 to T2. Hence, if we use subscript A for the ideal gas engine and subscript B for the one working with ordinary (i.e. non-ideal) gas, and if both engines are to have the same efficiency W/Q= WB/Q1= WA/Q1A, then it’s obvious that,

if W> WB, then Q1A > Q1B.

Is that consistent with what we wrote above for each of the four steps? It is. Heat energy is taken in during the first step only, as the gas expands isothermally. Now, because the temperature stays the same, there is no change in internal energy, and that includes no change in the internal vibrational and rotational energy. All of the heat energy is converted into work. Now, because the volume change is less, the work will be less and, hence, the heat that’s taken in must also be less. The same goes for the heat that’s being taken out during the third step, i.e. the isothermal compression stage: we’ve got a smaller volume change here and, hence, the surroundings of the gas do less work, and a lesser amount of heat energy is taken out.

So what’s the grand conclusion? It’s that we can build an ideal gas engine working between the same temperature T1 and T1, and with exactly the same efficiency and W/Q1 = (T1 − T2)/Tusing non-ideal gas. Of course, there must be some difference! You’re right: there is. While the ordinary gas machine will be as efficient as the ideal gas machine, it will not do the same amount of work. The key to understanding this is to remember that efficiency is a ratio, not some absolute number.  Let’s go through it. Because their efficiency is the same, we know that the W/Q1 ratios for both engines (A and B) is the same and, hence, we can write:


What about the entropy? The entropy S = Q1A/T1 = Q2A/T2 is not the same for both machines. For example, if the engine with ideal gas (A) does twice the work of the engine with ordinary gas (B), then Q1A will also be twice the amount Q1B. Indeed, SA = Q1A /T1 and SB = Q1B/T1. Hence, SA/SB = Q1A/Q1B. For example, if Q1A = 2·Q1B, then engine A’s entropy will also be twice that of engine B. [Now that we’re here, I should also note you’ll have the same ratio for Q2A. Indeed, we know that, for an efficient machine, we have: Q1/T= Q2/T2. Hence, Q1A/Q2A = T1/T2 and Q1B/Q2B = T1/T2. So Q1A/Q2= Q1B/Q2and, therefore, So Q1A/Q1= Q2A/Q2B.]

Why would the entropy be any different? We’ve got the same number of particles, the same volume and the same working temperatures, and so the only difference is that the particles in engine B are diatomic: the molecules consist of two atoms, rather than one only. An intuitive answer to the question as to why the entropy is different can be given by comparing it to another example, which I mentioned in a previous post, for which the entropy is also different fro some non-obvious reason. Indeed, we can think of the two atoms as the equivalent of the white and black particles in the box (see my previous post on entropy): if we allow the white and black particles to mix in the same volume, rather than separate them in two compartments, then the entropy goes up (we calculated the increase as equal to k·ln2). Likewise, the entropy is much lower if all particles have to come in pairs, which is the case for a diatomic gas. Indeed, if they have to come in pairs, we significantly reduce the number of ways all particles can be arranged, or the ‘disorder’, so to say. As the entropy is a measure of that number (one can loosely define entropy as the logarithm of the number of ways), the entropy must go down as well. Can we illustrate that using the ΔS = Nkln(V2/V1) formula we introduced in our previous post, or our more general S(V, T) = Nk[lnV + (1/γ-1)lnT] + a formula? Maybe. Let’s give it a try.

We know that our diatomic molecules have an average kinetic energy equal to 3kT/2. Well… Sorry. I should be precise: that’s the kinetic energy of their center-of-mass motion only! Now, let us suppose all our diatomic molecules spit up. We know the average kinetic energy of the constituent parts will also equal 3kT/2. Indeed, if a gas molecule consists of two atoms (let’s just call them atom A and B respectively), and if their combined mass is M = mA + mB, we know that:

<mAvA2/2> = <mBvB2/2> = <MvCM2/2> = 3kT/2

Hence, if they split, we’ll have twice the number of particles (2N) in the same volume with the same average kinetic energy: 3kT/2. Hence, we double the energy, but the average kinetic energy of the particles is the same, so the temperature should be the same. Hmm… You already feel something is wrong here… What about the energy that we associated with the internal motions within the molecule, i.e. the internal rotational and vibratory motions of the atoms, when they were still part of the same molecule? That was also equal to 3kT/2, wasn’t it? It was. Yes. In case you forgot why, let me remind you: the total energy is the sum of the (average) kinetic energy of the two atoms, so that’s <mAvA2/2> + <mBvB2/2> = 3kT/2 + 3kT/2 = 3kT. Now, that sum is also equal to the sum of the center-of-mass motion (which is 3 kT/2) and the average kinetic energy of the rotational and vibratory motions. Hence, the average kinetic energy of the rotational and vibratory motions is 3kT – 3 kT/2 = 3 kT/2. It’s all part of the same theorem: the average kinetic energy for each independent direction of motion is kT/2, and the number of degrees of freedom for a molecule consisting of r atoms is 3, because each atom can move in three directions. Rotation involves another two independent motions (in three dimensions, we’ve got two axes of rotation only), and vibration another one. So the kinetic energy going into rotation is kT/2 + kT/2 = kT and for vibration it’s kT/2. Adding all yields 3kT/2 + kT + kT/2 = 3kT.

The arithmetic is quite tricky. Indeed, you may think that, if we split the molecule, that the rotational and vibratory energy has to go somewhere, and that it is only natural to assume that, when we spit the diatomic molecule, the individual atoms have to absorb it. Hence, you may think that the temperature of the gas will be higher. How much higher? We had an average energy of 3kT per molecule in the diatomic situation, but so now we have twice as many particles, and hence, the average energy per particle now is… Re-read what I wrote above: it’s just 3kT/2 again. The energy that’s associated with the center-of-mass motions and the rotational and vibratory motions is not something extra: it’s part of the average kinetic energy of the atoms themselves. So no rise in temperature!

Having said that, our PV = NkT = (2/3)U equation obviously doesn’t make any sense anymore, as we’ve got twice as many particles now. While the temperature has not gone up, both the internal energy and the pressure have doubled, as we’ve got twice as many particles hitting the walls of our cylinder now. To restore the pressure to its ex ante value, we need to increase the volume. Remember, however, that pressure is force per unit surface area, not per volume unit: P = F/A. So we don’t have to double the volume: we only have to double the surface area. Now, it all depends on the shape of the volume: are we thinking of a box or of some sphere? One thing we know though: if we calculate the volume using some radius r, which may also be the length of the edge of a cube, then we know the volume is going to be proportional to r3, while the surface area is going to be proportional to r2. Hence, the ratio between the surface area and the volume is going to be proportional to r2/r3 = r2/3. So that’s another 2/3 ratio which pops us here, as an exponent this time. It’s not a coincidence, obviously.

Hmm… Interesting exercise. I’ll let you work it out. I am sure you’ll find some sensible value for the new volume, so you should able to use that ΔS = Nkln(V2/V1) formula. However, you also need to think about the comparability of the two situations. We wanted to compare two equal volumes with an equal number of particles (diatomic molecules versus atoms), and so you’ll need to move back in that direction to get a final answer to your question. Please do mail me the answer: I hope it makes sense. :-)

Inefficient engines

When trying to understand efficient engines, it’s interesting to also imagine how inefficient engines work, so as to see what they imply for our Carnot diagram. Suppose we’ve tried to build a Carnot engine in our kitchen, and we end up with one that is fairly frictionless, and fairly well isolated, so there is little heat loss during the heat transfer steps. We also have good contact surfaces so we think the the heat transfer processes will also be fairly frictionless, so to speak. So we did our calculations and built the engine using the best kitchen design and engineering practices. Now it’s the time for the test. Will it work?

What might happen is the following: while we’ve designed the engine to get some net amount of work out of it (in each and every cycle) that is given by the isothermic and adiabatic lines below, we may find that we’re not able to keep the temperature constant. So we try to follow the green isothermic line alright, but we can’t. We may also find that, when our heat counter tells us we’ve put Q1 in already, that our piston hasn’t moved out quite as far we thought it would. So… Damn, we’re never going to get to c. What’s the reason? Some heat loss, because our isolation wasn’t perfect, and friction.

Inefficient engine

So we’re likely to have followed an actual path that’s closer to the red arrow, which brings us near point d. So we’ve missed point c. We have no choice, however: the temperature has dropped to T2 and, hence, we need to start with the next step. Which one? The second? The third? It’s not quite clear, because our actual path on the pressure-volume diagram doesn’t follow any of our ideal isothermal or adiabatic lines. What to do? Let’s just take some heat out and start compressing to see what happens. If we’ve followed a path like the red arrow, we’re likely to be on something like the black arrow now. Indeed, if we’ve got a problem with friction or heat loss, we’ll continue to have that problem, and so the temperature will drop much faster than we think it should, and so we will not have the expected volume decrease. In fact, we’re not able to maintain the temperature even at T2. What horror! We can’t repeat our process and, hence, it is surely not reversible! All our work for nothing! We have to start all over and re-examine our design.

So our kitchen machine goes nowhere. But then how do actual engines work? The answer is: they put much more heat in, and they also take much more heat out. More importantly, they’re also working much below the theoretical efficiency of an ideal engine, just like our kitchen machine. So that’s why we’ve got the valves and all that in a steam engine. Also note that a car engine works entirely different: it converts chemical energy into heat energy by burning fuel inside of the cylinder. Do we get any useful work out? Of course! My Lamborghini is fantastic. :-) Is it efficient? Nope. We’re converting huge amounts of heat energy into a very limited amount of useful work, i.e. the type of energy we need to drive the wheels of my car, or a dynamo. Actual engines are a shadow only of ideal engines. So what’s the Carnot cycle really? What does it mean in practice? Does the mathematical model have any relevance at all?

The Carnot cycle revisited

Let’s look at those differential equations once again. [Don’t be scared by the concept of a differential equation. I’ll come back to it. Just keep reading.] Let’s start with the ΔU = (∂U/∂T)ΔT + (∂U/∂V)ΔV equation, which mathematical purists would probably prefer to write as:

dU = (∂U/∂T)dT + (∂U/∂V)dV

I find Feynman’s use of the Δ symbol more appropriate, because, when dividing by dV or dT, we get dU/dV and dU/dt, which makes us think we’re dealing with ordinary derivatives here, and we are not: it’s partial derivatives that matter here. [I’ll illustrate the usefulness of distinguishing the Δ and d symbol in a moment.] Feynman is even more explicit about that as he uses subscripts for the partial derivatives, so he writes the equation above as:

ΔU = (∂U/∂T)VΔT+ (∂U/∂V)TΔV

However, partial derivatives always assume the other variables are kept constant and, hence, the subscript is not needed. It makes the notation rather cumbersome and, hence, I think it makes the analysis even more unreadable than it already is. In any case, it is obvious that we’re looking at a situation in which all changes: the volume, the temperature and the pressure. However, in the PV = NkT equation (which, I repeat, is valid for all gases, ideal or not, and in all situations, be it adiabatic or isothermal expansion or compression), we have only two independent variables for a given number of particles. We can choose: volume and temperature, or pressure and temperature, or volume and pressure. The third variable depends on the two other variables and, hence, is referred to as dependent. Now, one should not attach too much importance to the terms (dependent or independent does not mean more or less fundamental) but, when everything is said and done, we need to make a choice when approaching the problem. In physics, we usually look at the volume and the temperature as the ‘independent’ variables but the partial derivative notation makes it clear it doesn’t matter. With three variables, we’ll have three partial derivatives: ∂P/∂T, ∂V/∂T and ∂P/∂V, and their reciprocals ∂T/∂P, ∂T/∂V and ∂V/∂P too, of course!

Having said that, when calculating the value of derived variables like energy, or entropy, or enthalpy (which is a state variable used in chemistry), we’ll use two out of the three mentioned variables only, because the third one is redundant, so to speak. So we’ll have some formula for the internal energy of a gas that depends on temperature and volume only, so we write:

U = U(V, T)

Now, in physics, one will often only have a so-called differential equation for a variable, i.e. something that is written in terms of differentials and derivatives, so we’ll do that here too. But let me give some other example first. You may or may not remember that we had this differential equation telling us how the density (n = N/V) of the atmosphere changes with the height (h), as a function of the molecular mass (m), the temperature (T) and the density (n) itself: dn/dh = –(mg/kT)·n, with g the gravitational constant and k the Boltzmann constant. Now, it  is not always easy to go from a differential equation to a proper formula, but this one can be solved rather easily. Indeed, a function which has a derivative that is proportional to itself (that’s what this differential equation says really) is an exponential, and the solution was n = n0e–mgh/kT, with n0 some other constant (the density at h = 0, which can be chosen anywhere). This explicit formula for n says that the density goes down exponentially with height, which is what we would expect.

Let’s get back to our gas though. We also have differentials here, which are infinitesimally small changes in variables. As mentioned above, we prefer to write them with a Δ in front (rather than using the symbol)—i.e. we write ΔT, ΔU, ΔU, or ΔQ. When we have two variables only, say x and y, we can use the d symbol itself and, hence, write Δx and Δy as dx and dy. However, it’s still useful to distinguish, in order to write something like this:

Δy = (dy/dx)Δx

This says we can approximate the change in y at some point x when we know the derivative there. For a function in two variables, we can write the same, which is what we did at the very start of this analysis:

ΔU = (∂U/∂T)ΔT + (∂U/∂V)ΔV

Note that the first term assumes constant volume (because of the ∂U/∂T derivative), while the second assumes constant temperature (because of the ∂U/∂V derivative).

Now, we also have a second equation for ΔU, expressed in differentials only (so no partial derivatives here):


This equation basically states that the internal energy of a gas can change because (a) some heat is added or removed or (b) some work is being done by or on the gas as its volume gets bigger or smaller. Note the minus sign in front of PΔV: it’s there to ensure the signs come out alright. For example, when compressing the gas (so ΔV is negative), ΔU = – PΔV will be positive. Conversely, when letting the gas expand (so ΔV is positive), ΔU = – PΔV will be negative, as it should be.

What’s the relation between these two equations? Both are valid, but you should surely not think that, just because we have a ΔV in the second term of each equation, we can write –P = ∂U/∂V. No.

Having said that, let’s look at the first term of the ΔU = (∂U/∂T)ΔT + (∂U/∂V)ΔV equation and analyze it using the ΔU = ΔQ – PΔV equation. We know (∂U/∂T)ΔT assumes we keep the volume constant, so ΔV = 0 and, hence, ΔU = ΔQ: all the heat goes into changing the internal energy; none goes into doing some work. Therefore, we can write:

(∂U/∂T)ΔT = (∂Q/∂T)ΔT = CVΔT

You already know that we’ve got a name for that CV function (remember: a derivative is a function too!): it’s the (specific) heat capacity of the gas (or whatever substance) at constant volume. For ideal gases, CV is some constant but, remember, we’re not limiting ourselves to analyzing ideal gases only here!

So we’re done with the first term in that ΔU = (∂U/∂T)ΔT + (∂U/∂V)ΔV. Now it’s time for the second one: (∂U/∂V)ΔV. Now both ΔQ and –PΔV are relevant: the internal energy changes because (a) some heat is being added and (b) because the volume changes and, hence, some work is being done. You know what we need to find. It’s that weird formula:

∂U/∂V = T(∂P/∂T) – P

But how do we get there? We can visualize what’s going on as a tiny Carnot cycle. So we think of gas as an ideal engine itself: we put some heat in (ΔQ) which gets an isothermal expansion at temperature T going, during a tiny little instant, doing a little bit of work. But then we stop adding heat and, hence, we’ll have some tiny little adiabatic expansion, during which the gas keeps going and also does a tiny amount of work as it pushes against the surrounding gas molecules. However, this step involves an infinitesimally small temperature drop—just a little bit, to T–ΔT. And then the surrounding gas will start pushing back and, hence, we’ve got some isothermal compression going, at temperature T–ΔT, which is then followed, once again, by adiabatic compression as the temperature goes back to T. The last two steps involve the surroundings of the tiny little volume of gas we’re looking at, doing work on the gas, instead of the other way around.

Carnot 2 equivalence

You’ll say this sounds very fishy. It does, but it is Feynman’s analysis, so who am I to doubt it? You’ll ask: where does the heat go, and where does the work go? Indeed, if ΔQ is Q1, what about Q2? Also, we can sort of imagine that the gas can sort of store the energy of the work that’s being done during step 1 and 2, to then give (most of it) back during step 3 and 4, but what about the net work that’s being done in this cycle, which is (see the diagram) equal to W = Q1 – Q2 = ΔPΔV? Where does that go? In some kind of flywheel or something? Obviously not! Hmm… Not sure. In any case, Q1 is infinitesimally small and, hence, nearing zero. Q2 is even smaller, so perhaps we should equate it to zero and just forget about it. As for the net work done by the cycle, perhaps this may just go into moving the gas molecules in the equally tiny volume of gas we’re looking at. Hence, perhaps there’s nothing left to be transferred to the surrounding gas. In short, perhaps we should look at ΔQ as the energy that’s needed to do just one cycle.

Well… No. If gas is an ideal engine, we’re talking elastic collisions and, hence, it’s not like a transient, like something that peters out. The energy has to go somewhere—and it will. The tiny little volume we’re looking at will come back to its original state, as it should, because we’re looking at (∂U/∂V)ΔV, which implies we’re doing an analysis at constant temperature, but the energy we put in has got to go somewhere: even if Q2 is zero, and all of ΔQ goes into work, it’s still energy that has to go somewhere!

It does go somewhere, of course! It goes into the internal energy of the gas we’re looking at. It adds to the kinetic energy of the surrounding gas molecules. The thing is: when doing such infinitesimal analysis, it becomes difficult to imagine the physics behind. All is blurred. Indeed, if we’re talking a very small volume of gas, we’re talking a limited number of particles also and, hence, these particles doing work on other gas particles, or these particles getting warmer or colder as they collide with the surrounding body of gas, it all becomes more or less the same. To put it simply: they’re more likely to follow the direction of the red and black arrows in our diagram above. So, yes, the theoretical analysis is what it is: a mathematical idealization, and so we shouldn’t think that’s what actually going on in a gas—even if Feynman tries to think of it in that way. So, yes, I agree with some critics, but to a very limited extent only, who say that Feynman’s Lectures on thermodynamics aren’t the best in the Volume: it may be simpler to just derive the equation we need from some Hamiltonian or whatever other mathematical relationship involving state variables like entropy or what have you. However, I do appreciate Feynman’s attempt to connect the math with the physics, which is what he’s doing here. If anything, it’s sure got me thinking!

In any case, we need to get on with the analysis, so let’s wrap it up. We know the net amount of work that’s being done is equal to W = Q1(T1 – T2)/ T1 = ΔQ(ΔT/T). So that’s equal to ΔPΔV and, hence, we can write:

net work done by the gas = ΔPΔV = ΔQ(ΔT/T)

This implies ΔQ = T(ΔP/ΔT)ΔV. Now, looking at the diagram, we can appreciate ΔP/ΔT is equal to ∂P/∂T (ΔP is the change in pressure at constant volume). Hence, ΔQ = T(∂P/∂T)ΔV. Now we have to add the work, so that’s −PΔV. We get:

ΔU = ΔQ − PΔV = T(∂P/∂T)ΔV − PΔV ⇔ ΔU/ΔV = ∂U/∂V = T(∂P/∂T) − P

So… We are where we wanted to be. :-) It’s a rather surprising analysis, though. Is the Q2 = 0 assumption essential? It is, as part of the analysis of the analysis of the second term in the ΔU = (∂U/∂T)ΔT + (∂U/∂V)ΔV expression, that is. Make no mistake: the W = Q1(T1−T2)/ T1 = ΔQ(ΔT/T) formula is valid, always, and the Q2 is taken into account in it implicitly, because of the ΔT (which is defined using T2). However, if Q2 would not be zero, it would add to the internal energy without doing any work and, as such, it would be part of the first term in the ΔU = (∂U/∂T)ΔT + (∂U/∂V)ΔV expression: we’d have heat that is not changing the volume (and, hence, that is not doing any work) but that’s just… Well… Heat that’s just adding heat to the gas. :-)

To wrap everything up, let me jot down the whole thing now:

ΔU = (∂Q/∂T)·ΔT + [T(∂P/∂T) − P]·ΔV

Now, strangely enough, while we started off saying the second term in our ΔU expression assumed constant temperature (because of the ∂U/∂V derivative), we now re-write that second term using the ∂P/∂T derivative, which assumes constant volume! Now, our first term assumes constant volume too, and so we end up with an expression which assumes constant volume throughout! At the same time, we do have that ΔV factor of course, which implies we do not really assume volume is constant. On the contrary: the question we started off with was about how the internal energy changes with temperature and volume. Hence, the assumptions of constant temperature and volume only concern the partial derivatives that we are using to calculate that change!

Now, as for the model itself, let me repeat: when doing such analysis, it is very difficult to imagine the physics behind. All is blurred. When talking infinitesimally small volumes of gas, one cannot really distinguish between particles doing work on other gas particles, or these particles getting warmer or colder as they collide with them. It’s all the same. So, in reality, the actual paths are more like the red and black arrows in our diagram above. Even for larger volumes of gas, we’ve got a problem: one volume of gas is not thermally isolated from another and, hence, ideal gas is not some Carnot engine. A Carnot engine is this theoretical construct, which assumes we can nicely separate isothermal from adiabatic expansion/compression. In reality, we can’t. Even to get the isothermal expansion started, we need a temperature difference in order to get the energy flow going, which is why the assumption of frictionless heat transfer is so important. But what’s frictionless, and what’s an infinitesimal temperature difference? In the end, it’s a difference, right? So we already have some entropy increase: some heat (let’s say ΔQ) leaves the reservoir, which has temperature T, and enters the cylinder, which has to have a temperature that’s just-a-wee bit lower, let’s say T – ΔT. Hence, the entropy of the reservoir is reduced by ΔQ/T, and the entropy of the cylinder is increased by ΔQ/(T – ΔT). Hence, ΔS = ΔQ/(T–ΔT) –  ΔQ/T = ΔQΔT/[T(T–ΔT)].

You’ll say: sure, but then the temperature in the cylinder must go up to T and… No. Why? We don’t have any information on the volume of the cylinder here. We should also involve the time derivatives, so we should start asking questions like: how much power goes into the cylinder, so what’s the energy exchange per unit time here? The analysis will become endlessly more complicated of course – it may have played a role in Sadi Carnot suffering from “mania” and “general delirium” when he got older :-) – but you should arrive at the same conclusion: when everything is said and done, the model is what it is, and that’s a mathematical model of some ideal engine – i.e. an idea of a device we don’t find in Nature, and which we’ll never be able to actually build – that shows how we could, potentially, get some energy out of a gas when using some device build to do just that. As mentioned above, thinking in terms of actual engines – like steam engines or, worse, combustion engines – does not help. Not at all really: just try to understand the Carnot cycle as it’s being presented, and that’s usual a mathematical presentation, which is why textbooks always remind the reader to not take the cylinder and piston thing too literally.

Let me note one more thing. Apart from the heat or energy loss question, there’s another unanswered question: from what source do we take the energy to move our cylinder from one heat reservoir to the other? We may imagine it all happens in space so there’s no gravity and all that (so we do not really have to spend some force just holding it) but even then: we have to move it from one place to another, and so that involves some acceleration and deceleration and, hence, some force times a distance. In short, the conclusion is all the same: the reversible Carnot cycle does not really exist and entropy increases, always.

With this, you should be able to solve some practical problems, which should help you to get the logic of it all. Let’s start with one.

Feynman’s rubber band engine

Feynman’s rubber band engine shows the model is quite general indeed, so it’s not limited to some Carnot engine only. A rubber band engine? Yes. When we heat a rubber band, it does not expand: it contracts, as shown below.

rubber band engine

Why? It’s not very intuitive: heating a metal bar makes it longer, not shorter. It’s got to do with the fact that rubber consists of an enormous tangle of long chains of molecules: think of molecular spaghetti. But don’t worry about the details: just accept we could build an engine using the fact, as shown above. It’s not a very efficient machine (Feynman thinks he’d need heating lamps delivering 400 watts of power to lift a fly with it), but let’s apply our thermodynamic relations:

  1. When we heat the rubber band, it will pull itself in, thereby doing some work. We can write that amount of work as FD So that’s like -PΔV in our ΔU = ΔQ – PΔV equation, but not that F has a direction that’s opposite to the direction of the pressure, so we don’t have the minus sign.
  2. So here we can write: ΔU = ΔQ + FΔL.

So what? Well… We can re-write all of our gas equations by substituting –F for P and L for V, and they’ll apply! For example, when analyzing that infinitesimal Carnot cycle above, we found that ΔQ = T(∂P/∂T)ΔV, with ΔQ the heat that’s needed to change the volume by ΔV at constant temperature. So now we can use the above-mentioned substitution (P becomes –F and V becomes L) to calculate the heat that’s needed to change the length of the rubber band by ΔL at constant temperature: it is equal to ΔQ = –T(∂F/∂T)ΔL. The result may not be what we like (if we want the length to change significantly, we’re likely to need a lot of heat and, hence, we’re likely to end up melting the rubber), but it is what it is. :-)

As Feynman notes: the power of these thermodynamic equations is that we can apply them to very different situations than gas. Another example is a reversible electric cell, like a rechargeable storage battery. Having said that, the assumption that these devices are all efficient is a rather theoretical one and, hence, that constrains the usefulness of our equations significantly. Having said that, engineers have to start somewhere, and the efficient Carnot cycle is the obvious point of departure. It is also a theoretical reference point to calculate actual efficiencies of actual engines, of course.

Post scriptum: Thermodynamic temperature

Let me quickly say something about an alternative definition of temperature: it’s what Feynman refers to as the thermodynamic definition. It’s an equivalent to the kinetic definition really, but let me quickly show why. As we think about efficient engines, it would be good to have some reference temperature T2, so we can drop the subscripts and have our engines run between T and that reference temperature, which we’ll simply call ‘one degree’ (1°). The amount of heat that an ideal engine will deliver at that reference temperature is denoted by QS, so we can drop the subscript for Q1 and denote it, quite simply, as Q.

We’ve defined entropy as S = Q/T, so Q = ST and QS = S·1°. So what? Nothing much. Just note we can use the S = Q/T and QS = S×1° equations to define temperature in terms of entropy. This definition is referred to as the thermodynamic definition, and it is fully equivalent with our kinetic energy. It’s just a different approach. Feynman makes kind of a big deal out of this but, frankly, there’s nothing more to it.

Just note that the definition also works for our ideal engine with non-ideal gas: the amounts of heat involved for the engine with non-ideal gas, i.e. Q and QS, will be proportionally less than the Q and QS amounts for the reversible engine with ideal gas. Remember that Q1A/Q1= Q2A/Q2B equation, in case you’d have doubts.] Hence, we do not get some other thermodynamical temperature! All makes sense again, as it should! :-)

Is gas a reversible engine?

The Ideal versus the Actual Gas Law

In previous posts, we referred, repeatedly, to the so-called ideal gas law, for which we have various expressions. The expression we derived from analyzing the kinetics involved when individual gas particles (atoms or molecules) move and collide was P·V = N·k·T, in which the variables are P (pressure), V (volume), N (the number of particles in the given volume), T (temperature) and k (the Boltzmann constant). We also wrote it as P·V = (2/3)·U, in which U represents the total energy, i.e. the sum of the energies of all gas particles. We also said the P·V = (2/3)·U formula was only valid for monatomic gases, in which case U is the kinetic energy of the center-of-mass motion of the atoms.

In order to provide some more generality, the equation is often written as P·V = (γ–1)·U. Hence, for monatomic gases, we have γ = 5/3. For a diatomic gas, we’ll also have vibrational and rotational kinetic energy. As we pointed out in a previous post, each independent direction of motion, i.e. each degree of freedom in the system, will absorb an amount of energy equal to k·T/2. For monatomic gases, we have three independent directions of motion (x, y, z) and, hence, the total energy U = 3·k·T/2 = (2/3)·U.

Finally, when we’re considering adiabatic expansion/compression only – so when we do not add or remove any heat to/from to the gas – we can also write the ideal gas law as PVγ = C, with C some constant. [It is importqnt to note that this PVγ = C relation can be derived from the more general P·V = (γ–1)·U expression, but that the two expressions are not equivalent. Please have a look at the P.S. to this post on this, which shows how we get that PVγ = constant expression, and talks a bit about its meaning.]

So what’s the gas law for diatomic gas, like O2, i.e. oxygen? The key to the analysis of diatomic gases is, basically, a model which represents the oxygen molecule as two atoms connected by a spring, but with a force law that’s not as simplistic as Hooke’s law: we’re not looking at some linear force, but a force that’s referred to as a van der Waals force. The image below gives a vague idea of what that might imply. Remember: when moving an object in a force field, we change its potential energy, and the work done, as we move with or against the force, is equal to the change in potential energy. The graph below shows the force is anything but linear.

randomThe illustration above is a graph of potential energy for two molecules, but we can also apply it for the ‘spring’ model for two atoms within a single molecule. For the detail, I’ll refer you to Feynman’s Lecture on this. It’s not that the full story is too complicated: it’s just too lengthy to reproduce it in this post. Just note the key point of the whole story: one arrives at a theoretical value for γ that is equal to γ = 9/7 ≈ 1.286Wonderful! Yes. Except for the fact that value does not correspond to what is measured in reality: the experimentally confirmed value for γ for oxygen (O2) is about 1.40.

What about other gases? When measuring the value for other diatomic gases, like iodine (I2) or bromine (Br2), we get a value closer to the theoretical value (1.30 and 1.32 respectively) but, still, there’s a variation to be explained here. The value for hydrogen H2 is about 1.4, so that’s like oxygen again. For other gases, we again get different values. Why? What’s the problem?

It cannot be explained using classical theory. In addition, doing the measurements for oxygen and hydrogen at various temperatures also reveals that γ is a function of temperature, as shown below. Now that’s another experimental fact that does not line up with our kinetic theory of gases!

Heat ratioReality is right, always. Hence, our theory must be wrong. Our analysis of the independent direction of motions inside of a molecule doesn’t work—even for the simple case of a diatomic molecule. Great minds such as James Clerk Maxwell couldn’t solve the puzzle in the 19th century and, hence, had to admit classical theory was in trouble. Indeed, popular belief has it that the black-body radiation problem was the only thing classical theory couldn’t explain in the late 19th century but that’s not true: there were many more problems keeping physicists awake. But so we’ve got a problem here. As Feynman writes: “We might try some force law other than a spring but it turns out that anything else will only make γ higher. If we include more forms of energy, γ approaches unity more closely, contradicting the facts. All the classical theoretical things that one can think of will only make it worse. The fact is that there are electrons in each atom, and we know from their spectra that there are internal motions; each of the electrons should have at least kT/2 of kinetic energy, and something for the potential energy, so when these are added in, γ gets still smaller. It is ridiculous. It is wrong.

So what’s the answer? The answer is to be found in quantum mechanics. Indeed, one can develop a model distinguishing various molecular states with various energy levels E0, E1, E2,…, Ei,…, and then associate a probability distribution which gives us the probability of finding a molecule in a particular state. Some more assumptions, all quite similar to the assumptions used by Planck when he solved the black-body radiation problem, then give us what we want: to put it simply, it is like some of the motions ‘freeze out’ at lower temperatures. As a result, γ goes up as we go down in temperature.

Hence, quantum mechanics saves the day, again. However, that’s not what I want to write about here. What I want to do here is to give you an equation for the internal energy of a gas which is based on what we can actually measure, so that’s pressure, volume and temperature. I’ll refer to it as the Actual Gas Law, because it takes into account that γ is not some fixed value (so it’s not some natural constant, like Planck’s or Boltzmann’s constant), and it also takes into account that we’re not always gas—ideal or actual gas—but also liquids and solids.

Now, we have many inter-connected variables here, and so the analysis is quite complicated. In fact, it’s a great opportunity to learn more about partial derivatives and how we can use them. So the lesson is as much about math as it about physics. In fact, it’s probably more about math. :-) Let’s see what we can make out of it.

Energy, work, force, pressure and volume

First, I should remind you that work is something that is done by a force on some object in the direction of the displacement of that object. Hence, work is force times distance. Now, because the force may actually vary as our object is being displaced and while the work is being done, we represent work as a line integral:

W = ∫F·ds

We write F and s in bold-face and, hence, we’ve got a vector dot product here, which ensures we only consider the component of the force in the direction of the displacement: F·Δ= |F|·|Δs|·cosθ, with θ the angle between the force and the displacement.

As for the relationship between energy and work, you know that one: as we do work on an object, we change its energy, and that’s what we are looking at here: the (internal) energy of our substance. Indeed, when we have a volume of gas exerting pressure, it’s the same thing: some force is involved (pressure is the force per unit area, so we write: P = F/A) and, using the model of the box with the frictionless piston (illustrated below), we write:

dW = F(–dx) = – PAdx = – PdV


The dW = – PdV formula is the one we use when looking at infinitesimal changes. When going through the full thing, we should integrate, as the volume (and the pressure) changes over the trajectory, so we write:

W = ∫PdV

Now, it is very important to note that the formulas above (dW = – PdV and W = ∫PdV) are always valid. Always? Yes. We don’t care whether or not the compression (or expansion) is adiabatic or isothermal. [To put it differently, we don’t care whether or not heat is added to (or removed from) the gas as it expands (or decreases in volume).] We also don’t keep track of the temperature here. It doesn’t matter. Work is work.

Now, as you know, an integral is some area under a graph so I can rephrase our result as follows: the work that is being done by a gas, as it expands (or the work that we need to put in in order to compress it), is the area under the pressure-volume graph, always.

Of course, as we go through a so-called reversible cycle, getting work out of it, and then putting some work back in, we’ll have some overlapping areas cancelling each other. That’s how we derived the amount of useful (i.e. net) work that can be done by an ideal gas engine (illustrated below) as it goes through a Carnot cycle, taking in some amount of heat Q1 from one reservoir (which is usually referred to as the boiler) and delivering some other amount of heat (Q1) to another reservoir (usually referred to as the condenser). As I don’t want to repeat myself too much, I’ll refer you to one of my previous posts for more details. Hereunder, I just present the diagram once again. If you want to understand anything of what follows, you need to understand it—thoroughly.

Carnot cycle graphIt’s important to note that work is being done in each of the four steps of the cycle, and that the work done by the gas is positive when it expands, and negative when its volume is being reduced. So, let me repeat: the W = ∫PdV formula is valid for both adiabatic as well as isothermal expansion/compression. We just need to be careful about the sign and see in which direction it goes. Having said that, it’s obvious adiabatic and isothermal expansion/compression are two very different things and, hence, their impact on the (internal) energy of the gas is quite different:

  1. Adiabatic compression/expansion assumes that no (external) heat energy (Q) is added or removed and, hence, all the work done goes into changing the internal energy (U). Hence, we can write: W = PΔV = –ΔU and, therefore, ΔU = –PΔV. Of course, adiabatic compression/expansion must involve a change in temperature, as the kinetic energy of the gas molecules is being transferred from/to the piston. Hence, the temperature (which is nothing but the average kinetic energy of the molecules) changes.
  2. In contrast, isothermal compression/expansion (i.e. a volume change without any change in temperature) must involve an exchange of heat energy with the surroundings so to allow the temperature to remain constant. So ΔQ ≠ 0 in this case.

The grand but simple formula capturing all is, obviously:


It says what we’ve said already: the internal energy of a substance (a gas) changes because some work is being done as its volume changes and/or because some heat is added or removed.

Now we have to get serious about partial derivatives, which relate one variable (the so-called ‘dependent’ variable) to another (the ‘independent’ variable). Of course, in reality, all depends on all and, hence, the distinction is quite artificial. Physicists tend to treat temperature and volume as the ‘independent’ variables, while chemists seem to prefer to think in terms of pressure and temperature. In math, it doesn’t matter all that much: we simply take the reciprocal and there you go: dy/dx = 1/(dx/dy). We go from one to another. Well… OK… We’ve got a lot of variables here, so… Yes. You’re right. It’s not going to be that simple, obviously! :-)

Differential analysis

If we have some function f in two variables, x and y, then we can write: Δf = f(x + Δx, y + Δy) –  f(x, y). We can then write the following clever thing:

partial derivativeWhat’s being said here is that we can approximate Δf using the partial derivatives ∂f/∂x and ∂f/∂y. Note that the formula above actually implies that we’re evaluating the (partial) ∂f/∂x derivative at point (x, y+Δy), rather than the point (x, y) itself. It’s a minor detail, but I think it’s good to signal it: this ‘clever thing’ is just pedagogical. [Feynman is the greatest teacher of all times! :-)] The mathematically correct approach is to simply give the formal definition of partial derivatives, and then just get on with it:

Partial derivative definitionNow, let us apply that Δf formula to what we’re interested in, and that’s the change in the (internal) energy U. So we write:

formula 1Now, we can’t do anything with this, in practice, because we cannot directly measure the two partial derivatives. So, while this is an actual gas law (which is what we want), it’s not a practical one, because we can’t use it. :-) Let’s see what we can do about that. We need to find some formula for those partial derivatives. Let’s have a look at the (∂U/∂T)factor first. That factor is defined and referred to as the specific heat capacity at constant volume, and it’s usually denoted by CV. Hence, we write:

CV = specific heat capacity at constant volume = (∂U/∂T)V

Heat capacity? But we’re talking internal energy here? It’s the same. Remember that ΔU = ΔQ – PΔV formula: if we keep the volume constant, then ΔV = 0 and, hence, ΔU = ΔQ. Hence, all of the change in internal energy (and I really mean all of the change) is the heat energy we’re adding or removing from the gas. Hence, we can also write CV in its more usual definitional form:

C= (∂Q/∂T)V

As for its interpretation, you should look at it as a ratio: Cis the amount of heat one must put into (or remove from) a substance in order to change its temperature by one degree with the volume held constant. Note that the term ‘specific heat capacity’ is usually referred to as the ‘specific heat’, as that’s shorter and simpler. However, you can see it’s some kind of ‘capacity’ indeed. More specifically, it’s a capacity of a substance to absorb heat. Now that’s stuff we can actually measure and, hence, we’re done with the first term in that ΔU = ΔT·(∂U/∂T)+ ΔV·(∂U/∂V)expression, which we can now write as:

ΔT·(∂U/∂T)= ΔT·(∂Q/∂T)= ΔT·CV

OK. So we’re done with the first term. Just to make sure we’re on the right track here, let’s have a quick look at the units here: the unit in which we should measure Cis, obviously, joule per degree (Kelvin), i.e. J/K. And then we multiply with ΔT, which is measured in degrees Kelvin, and we get some amount in Joule. Fine. We’re done, indeed. :-)

Let’s look at the second term now, i.e. the ΔV·(∂U/∂V)T term. Now, you may think that we could define CT = (∂U/∂V)as the specific heat capacity at constant temperature because… Well… Hmm… It is the amount of heat one must put into (or remove from) a substance in order to change its volume by one unit with the temperature held constant, isn’t it? So we write CT = (∂U/∂V)T = (∂Q/∂V)T and we’re done here too, aren’t we?


It’s not that simple. Two very different things are happening here. Indeed, the change in (internal) energy ΔU, as the volume changes by ΔV while keeping the temperature constant (we’re looking at that (∂U/∂V)T factor here, and I’ll remind you of that subscript T a couple of times), consists of two parts:

  1. First, the volume is not being kept constant and, hence, the internal energy (U) changes because work is being done.
  2. Second, the internal energy (U) also changes because heat is being put in, so the temperature can be kept constant indeed.

So we cannot simplify. We’re stuck with the full thing: ΔU = ΔQ – PΔV, in which – PΔV is the (infinitesimal amount of) work that’s being done on the substance, and ΔQ is the (infinitesimal amount of) heat that’s being put in. What can we do? How can we relate this to actual measurables?

Now, the logic is quite abstruse, so please be patient and bear with me. The key to the analysis is that diagram of the reversible Carnot cycle, with the shaded area representing the net work that’s being done, except that we’re now talking infinitesimally small changes in volume, temperature and pressure. So we redraw the diagram and get something like this:

Carnot 2Now, you can easily see the equivalence between the shaded area and the ΔPΔV rectangle below:

equivalenceSo the work done by the gas is the shaded area, whose surface is equal to ΔPΔV. […] But… Hey, wait a minute! You should object: we are not talking ideal engines here and, hence, we are not going through a full Carnot cycle, are we? We’re calculating the change in internal energy when the temperature changes with ΔT, the volume changes with ΔV, and the pressure changes with ΔP. Full stop. So we’re not going back to where we came from and, hence, we should not be analyzing this thing using the Carnot cycle, should we? Well… Yes and no. More yes than no. Remember we’re looking at the second term only here: ΔV·(∂U/∂V)T. So we are changing the volume (and, hence, the internal energy) but the subscript in the (∂U/∂V)term makes it clear we’re doing so at constant temperature. In practice, that means we’re looking at a theoretical situation here that assumes a complete and fully reversible cycle indeed. Hence, the conceptual idea is, indeed, that we put some heat in, that the gas does some work as it expands, and that we then are actually putting some work back in to bring the gas back to its original temperature T. So, in short, yes, the reversible cycle idea applies.

[…] I know, it’s very confusing. I am actually struggling with the analysis myself, so don’t be too hard on yourself. Think about it, but don’t lose sleep over it. :-) I added a note on it in the P.S. to this post on it so you can check that out too. However, I need to get back to the analysis itself here. From our discussion of the Carnot cycle and ideal engines, we know that the work done is equal to the difference between the heat that’s being put in and the heat that’s being delivered: W = Q1 – Q2. Now, because we’re talking reversible processes here, we also know that Q1/T1 = Q2/T2. Hence, Q2 = (T 2/T1)Q1 and, therefore, the work done is also equal to W = Q– (T 2/T1)Q1 = Q1(1 – T 2/T1) = Q1[(T– T2)/T1]= Q1(ΔT/T1). Let’s now drop the subscripts by equating Q1 with ΔQ, so we have:

W = ΔQ(ΔT/T)

You should note that ΔQ is not the difference between Q1 and Q2. It is not. ΔQ is the heat we put in as it expands isothermally from volume V to volume V + ΔV. I am explicit about it because the Δ symbol usually denotes some difference between two values. In case you wonder how we can do away with Q2, think about it. […] The answer is that we did not really get away with it: the information is captured in the ΔT factor, as T–ΔT is the final temperature reached by the gas as it expands adiabatically on the second leg of the cycle, and the change in temperature obviously depends on Q2! Again, it’s all quite confusing because we’re looking at infinitesimal changes only, but the analysis is valid. [Again, go through the P.S. of this post if you want more remarks on this, although I am not sure they’re going to help you much. The logic is really very deep.]

[…] OK… I know you’re getting tired, but we’re almost done. Hang in there. So what do we have now? The work done by the gas as it goes through this infinitesimally small cycle is the shaded area in the diagram above, and it is equal to:


From this, it follows that ΔQ = T·ΔV·ΔP/ΔT. Now, you should look at the diagram once again to check what ΔP actually stands for: it’s the change in pressure when the temperature changes at constant volume. Hence, using our partial derivative notation, we write:

ΔP/ΔT = (∂P/∂T)V

We can now write ΔQ = T·ΔV·(∂P/∂T)and, therefore, we can re-write ΔU = ΔQ – PΔV as:

ΔU = T·ΔV·(∂P/∂T)– PΔV

Now, dividing both sides by ΔV, and writing all using the partial derivative notation, we get:

ΔU/ΔV = (∂U/∂V)T = T·(∂P/∂T)– P

So now we know how to calculate the (∂U/∂V)factor, from measurable stuff, in that ΔU = ΔT·(∂U/∂T)+ ΔV·(∂U/∂V)expression, and so we’re done. Let’s write it all out:

ΔU = ΔT·(∂U/∂T)+ ΔV·(∂U/∂V)= ΔT·C+ ΔV·[T·(∂P/∂T)– P]

Phew! That was tough, wasn’t it? It was. Very tough. As far as I am concerned, this is probably the toughest of all I’ve written so far.

Dependent and independent variables 

Let’s pause to take stock of what we’ve done here. The expressions above should make it clear we’re actually treating temperature and volume as the independent variables, and pressure and energy as the dependent variables, or as functions of (other) variables, I should say. Let’s jot down the key equations once more:

  1. ΔU = ΔQ – PΔV
  2. ΔU = ΔT·(∂U/∂T)+ ΔV·(∂U/∂V)
  3. (∂U/∂T)= (∂Q/∂T)V = CV
  4. (∂U/∂V)T = T·(∂P/∂T)– P

It looks like Chinese, doesn’t it? :-) What can we do with this? Plenty. Especially the first equation is really handy for analyzing and solving various practical problems. The second equation is much more difficult and, hence, less practical. But let’s try to apply this equation for actual gases to an ideal gas—just to see if we’re getting our ideal gas law once again. :-) We know that, for an ideal gas, the internal energy depends on temperature, not on V. Indeed, if we change the volume but we keep the temperature constant, the internal energy should be the same, as it only depends on the motion of the molecules and their number. Hence, (∂U/∂V)must equal zero and, hence, T·(∂P/∂T)– P = 0. Replacing the partial derivative with an ordinary one (not forgetting that the volume is kept constant), we get:

T·(dP/dT) – P = 0 (constant volume)

⇔ (1/P)·(dP/dT) = 1/T (constant volume)

Integrating both sides yields: lnP = lnT + constant. This, in turn, implies that P = T × constant. [Just re-write the first constant as the (natural) logarithm of some other constant, i.e. the second constant, obviously).] Now that’s consistent with our ideal gas P = NkT/V, because N, k and V are all constant. So, yes, the ideal gas law is a special case of our more general thermodynamical expression. Fortunately! :-)

That’s not very exciting, you’ll say—and you’re right. You may be interested – although I doubt it :-) – in the chemists’ world view: they usually have performance data (read: values for derivatives) measured under constant pressure. The equations above then transform into:

  1. ΔH = Δ(U + P·V) = ΔQ + VΔP
  2. ΔH = ΔT·(∂H/∂T)+ ΔP·(∂H/∂P)
  3. (∂H/∂P)T = –T·(∂V/∂T)+ V

H? Yes. H is another so-called state variable, so it’s like entropy or internal energy but different. As they say in Asia: “Same-same but different.” :-) It’s defined as H = U + PV and its name is enthalpy. Why do we need it? Because some clever man noted that, if you take the total differential of P·V, i.e. Δ(P·V) = P·ΔV + V·ΔP, and our ΔU = ΔQ – P·ΔV expression, and you add both sides of both expressions, you get Δ(U + P·V) = ΔQ + VΔP. So we’ve substituted –P for V – so as to please the chemists – and all our equations hold provided we substitute U for H and, importantly, –P for V. [Note the sign switch is to be applied to derivatives as well: if we substitute P for –V, then ∂P/∂T becomes ∂(–V)/∂T = –(∂V/∂T)!

So that’s the chemists’ model of the world, and they’ll usually measure the specific heat capacity at constant pressure, rather than at constant volume. Indeed, one can show the following:

(∂H/∂T)= (∂Q/∂T)= CP = the specific heat capacity at constant pressure

In short, while we referred to γ as the specific heat ratio in our previous posts, assuming we’re talking ideal gases only, we can now appreciate the fact there is actually no such thing as the specific heat: there are various variables and, hence, various definitions. Indeed, it’s not only pressure or volume: the specific heat capacity of some substance will usually also be expressed as a function of its mass (i.e. per kg), the number of particles involved (i.e. per mole), or its volume (i.e. per m3). In that case, we talk about the molar or volumetric heat capacity respectively. The name for the same thing expressed in joule per degree Kelvin and per kg (J/kg·K) is the same: specific heat capacity. So we’ve got three different concepts here, and two ways of measuring them: at constant pressure or at constant volume. No wonder one gets quite confused when googling tables listing the actual values! :-)

Now, there’s one question left: why is γ being referred to as the specific heat ratio? The answer is simple: it actually is the ratio of the specific heat capacities CP and CV. Hence, γ is equal to:

γ = CP/CV

I could show you how that works. However, I would just be copying the Wikipedia article on it, so I won’t do that: you’re sufficiently knowledgeable now to check it out yourself, and verify it’s actually true. Good luck with it ! In the process, please also do check why Cis always larger than Cso you can explain why γ is always larger than one. :-)

Post scriptum: As usual, Feynman’s Lectures, were the inspiration here—once more. Now, Feynman has a habit of ‘integrating’ expressions and, frankly, I never found a satisfactory answer to a pretty elementary question: integration in regard to what variable? His exposé on both the ideal as well as the actual gas law has enlightened me. The answer is simple: it doesn’t matter. :-) Let me show that by analyzing the following argument of Feynman:


So… What is that ‘integration’ that ‘yields’ that γlnV + lnP = lnC expression? Are we solving some differential equation here? Well… Yes. But let’s be practical and take the derivative of the expression in regard to V, P and T respectively. Let’s first see where we come from. The fundamental equation is PV = (γ–1)U. That means we’ve got two ‘independent’ variables, and one that ‘depends’ on the others: if we fix P and V, we have U, or if we fix U, then P and V are in an inversely proportional relationship. That’s easy enough. We’ve got three ‘variables’ here: U, P and V—or, in differential form, dU, dP and dV. However, Feynman eliminates one by noting that dU = –PdV. He rightly notes we can only do that because we’re talking adiabatic expansion/compression here: all the work done while expanding/compressing the gas goes into changing the internal energy: no heat is added or removed. Hence, there is no dQ term here.

So we are left with two ‘variables’ only now: P and V, or dP and dV when talking differentials. So we can choose: P depends on V, or V depends on P. If we think of V as the independent variable, we can write:

d[γ·lnV + lnP]/dV = γ·(1/V)·(dV/dV) + (1/P)·(dP/dV), while d[lnC]/dV = 0

So we have γ·(1/V)·(dV/dV) + (1/P)·(dP/dV) = 0, and we can then multiply sides by dV to get:

(γ·dV/V) + (dP/P) = 0,

which is the core equation in this argument, so that’s the one we started off with. Picking P as the ‘independent’ variable and, hence, integrating with respect to P yields the same:

d[γ·lnV + lnP]/dP = γ·(1/V)·(dV/dP) + (1/P)·(dP/dP), while d[lnC]/dP = 0

Multiplying both sides by dP yields the same thing: (γ·dV/V) + (dP/P) = 0. So it doesn’t matter, indeed. But let’s be smart and assume both P and V, or dP and dV, depend on some implicit variable—a parameter really. The obvious candidate is temperature (T). So we’ll now integrate and differentiate in regard to T. We get:

d[γ·lnV + lnP]/dT = γ·(1/V)·(dV/dT) + (1/P)·(dP/dT), while d[lnC]/dT = 0

We can, once again, multiply both sides with dT and – surprise, surprise! – we get the same result: 

(γ·dV/V) + (dP/P) = 0

The point is that the γlnV + lnP = lnC expression is damn valid, and C or lnC or whatever is ‘the constant of integration’ indeed, in regard to whatever variable: it doesn’t matter. So then we can, indeed, take the exponential of both sides (which is much more straightforward than ‘integrating both sides’), so we get:

eγlnV + lnP = eln= C

It then doesn’t take too much intelligence to see that eγlnV + lnP = e(lnV)γ+ln= e(lnV)γ·elnP Vγ·P = P·Vγ. So we’ve got the grand result that what we wanted: PVγ = C, with C some constant determined by the situation we’re in (think of the size of the box, or the density of the gas).

So, yes, we’ve got a ‘law’ here. We should just remind ourselves, always, that it’s only valid when we’re talking adiabatic compression or expansion: so we we do not add or remove heat energy or, as Feynman puts it, much more succinctly, “no heat is being lost“. And, of course, we’re also talking ideal gases only—which excludes a number of real substances. :-) In addition, we’re talking adiabatic processes only: we’re not adding nor removing heat.

It’s a weird formula: the pressure times the volume to the 5/3 power is a constant for monatomic gas. But it works: as long as individual atoms are not bound to each other, the law holds. As mentioned above, when various molecular states, with associated energy levels are at play, it becomes an entirely different ballgame. :-)

I should add one final note as to the functional form of PVγ = C. We can re-write it as P = C/Vγ. Because The shape of that graph is similar to the P = NkT/V relationship we started off with. Putting the two equations side by side, makes it clear our constant and temperature are obviously related one to another, but they are not directly proportional to each other. In fact, as the graphs below clearly show, the P = NkT/V gives us these isothermal lines on the pressure-volume graph (i.e. they show P and V are related at constant temperature), while the P = C/Vγ equation gives us the adiabatic lines. Just google an online function graph tool, and you can now draw your own diagrams of the Carnot cycle! Just change the denominator (i.e. the constants C and T in both equations). :-)

graphNow, I promised I would say something more about that infinitesimal Carnot cycle: why is it there? Why don’t we limit the analysis to just the first two steps? In fact, the shortest and best explanation I can give is something like this: think of the whole cycle as the first step in a reversible process really. We put some heat in (ΔQ) and the gas does some work, but so that heat has to go through the whole body of gas, and the energy has to go somewhere too. In short, the heat and the work is not being absorbed by the surroundings but it all stays in the ‘system’ that we’re analyzing, so to speak, and that’s why we’re going through the full cycle, not the first two steps only. Now, this ‘answer’ may or may not satisfy you, but I can’t do better. You may want to check Feynman’s explanation itself, but he’s very short on this and, hence, I think it won’t help you much either. :-(

The Ideal versus the Actual Gas Law


The two previous posts were quite substantial. Still, they were only the groundwork for what we really want to talk about: entropy, and the second law of thermodynamics, which you probably know as follows: all of the energy in the universe is constant, but its entropy is always increasing. But what is entropy really? And what’s the nature of this so-called law?

Let’s first answer the second question: Wikipedia notes that this law is more like an empirical finding that has been accepted as an axiom. That probably sums it up best. That description does not downplay its significance. In fact, Newton’s laws of motion, or Einstein’s relatively principle, have the same status: axioms in physics – as opposed to those in math – are grounded in reality. At the same time, and just like in math, one can often choose alternative sets of axioms. In other words, we can derive the law of ever-increasing entropy from other principles, notably the Carnot postulate, which basically says that, if the whole world were at the same temperature, it would impossible to reversibly extract and convert heat energy into work. I talked about that in my previous post, and so I won’t go into more detail here. The bottom line is that we need two separate heat reservoirs at different temperatures, denoted by Tand T2, to convert heat into useful work.

Let’s go to the first question: what is entropy, really?

Defining entropy

Feynman, the Great Teacher, defines entropy as part of his discussion on Carnot’s ideal reversible heat engine, so let’s have a look at it once more. Carnot’s ideal engine can do some work by taking an amount of heat equal to Qout of one heat reservoir and putting an amount of heat equal to Q2 into the other one (or, because it’s reversible, it can also go the other way around, i.e. it can absorb Q2 and put Q1 back in, provided we do the same amount of work W on the engine).

The work done by such machine, or the work that has to be done on the machine when reversing the cycle, is equal W = Q1 – Q2 (the equation shows the machine is as efficient as it can be, indeed: all of the difference in heat energy is converted into useful work, and vice versa—nothing gets ‘lost’ in frictional energy or whatever else!). Now, because it’s a reversible thermodynamic process, one can show that the following relationship must hold:

Q1/T= Q2/T2

This law is valid, always, for any reversible engine and/or for any reversible thermodynamic process, for any Q1, Q2, T1 and T2. [Ergo, it is not valid for non-reversible processes and/or non-reversible engines, i.e. real machines.] Hence, we can look at Q/T as some quantity that remains unchanged: an equal ‘amount’ of Q/T is absorbed and given back, and so there is no gain or loss of Q/T (again, if we’re talking reversible processes, of course). [I need to be precise here: there is no net gain or loss in the Q/T of the substance of the gas. The first reservoir obviously looses Q1/T1, and the second reservoir gains Q2/T2. The whole environment only remains unchanged if we’d reverse the cycle.]

In fact, this Q/T ratio is the entropy, which we’ll denote by S, so we write:

S = Q1/T= Q2/T2

What the above says, is basically the following: whenever the engine is reversible, this relationship between the heats must follow: if the engine absorbs Qat Tand delivers Qat T2, then Qis to Tas Qis to T2 and, therefore, we can define the entropy S as S = Q/T. That implies, obviously:

Q = S·T

From these relations (S = Q/T and Q = S·T), it is obvious that the unit of entropy has to be joule per degree (Kelvin), i.e. J/K. As such, it has the same dimension as the Boltzmann constant, k≈ 1.38×10−23 J/K, which we encountered in the ideal gas formula PV = NkT, and which relates the mean kinetic energy of atoms or molecules in an ideal gas to the temperature. However, while kis, quite simply, a constant of proportionality, S is obviously not a constant: its value depends on the system or, to continue with the mathematical model we’re using, the heat engine we’re looking at.

Still, this definition and relationships do not really answer the question: what is entropy, really? Let’s further explore the relationships so as to try to arrive at a better understanding.

I’ll continue to follow Feynman’s exposé here, so let me use his illustrations and arguments. The first argument revolves around the following set-up, involving three reversible engines (1, 2 and 3), and three temperatures (T1 > T> T3): Three engines

Engine 1 runs between T1 and  Tand delivers W13 by taking in Q1 at T1 and delivering Q3 at T3. Similarly, engine 2 and 3 deliver or absorb W32  and W12 respectively by running between T3 and  T2 and between T2 and  Trespectively. Now, if we let engine 1 and 2 work in tandem, so engine 1 produces W13 and delivers Q3, which is then taken in by engine 2, using an amount of work W32, the net result is the same as what engine 3 is doing: it runs between T1 and  Tand delivers W12, so we can write:

W12 = W13 – W32

This result illustrates that there is only one Carnot efficiency, which Carnot’s Theorem expresses as follows:

  1. All reversible engines operating between the same heat reservoirs are equally efficient.
  2. No actual engine operating between two heat reservoirs can be more efficient than a Carnot engine operating between the same reservoirs.

Now, it’s obvious that it would be nice to have some kind of gauge – or a standard, let’s say – to describe the properties of ideal reversible engines in order to compare them. We can define a very simple gauge by assuming Tin the diagram above is one degree. One degree what? Whatever: we’re working in Kelvin for the moment, but any absolute temperature scale will do. [An absolute temperature scale uses an absolute zero. The Kelvin scale does that, but the Rankine scale does so too: it just uses different units than the Kelvin scale (the Rankine units correspond to Fahrenheit units, while the Kelvin units correspond to Celsius degrees).] So what we do is to let our ideal engines run between some temperature T – at which it absorbs or delivers a certain heat Q – and 1° (one degree), at which it delivers or absorbs an amount of heat which we’ll denote by QS. [Of course, I note this assumes that ideal engines are able to run between one degree Kelvin (i.e. minus 272.15 degrees Celsius) and whatever other temperature. Real (man-made) engines are obviously likely to not have such tolerance. :-)] Then we can apply the Q = S·T equation and write:

Q= S·1°

Like that we solve the gauge problem when measuring the efficiency of ideal engines, for which the formula is W/Q= (T1 –  T)/T1. In my previous post, I illustrated that equation with some graphs for various values of T(e.g. T= 4, 1, or 0.3). [In case you wonder why these values are so small, it doesn’t matter: we can scale the units, or assume 1 unit corresponds to 100 degrees, for example.] These graphs all look the same but cross the x-axis (i.e. the T1-axis) at different points (at T= 4, 1, and 0.3 respectively, obviously). But let us now use our gauge and, hence, standardize the measurement by setting T2 to 1. Hence, the blue graph below is now the efficiency graph for our engine: it shows how the efficiency (W/Q1) depends on its working temperature Tonly. In fact, if we drop the subscripts, and define Q as the heat that’s taken in (or delivered when we reverse the machine), we can simply write:

 W/Q = (T – 1)/T = 1 – 1/T


Note the formula allows for negative values of the efficiency W/Q: if Twould be lower than one degree, we’d have to put work in and, hence, our ideal engine would have negative efficiency indeed. Hence, the formula is consistent over the whole temperature domain T > 0. Also note that, coincidentally, the three-engine set-up and the W/Q formula also illustrate the scalability of our theoretical reversible heat engines: we can think of one machine substituting for two or three others, or any combination really: we can have several machines of equal efficiency working in parallel, thereby doubling, tripling, quadruping, etcetera, the output as well as the heat that’s being taken in. Indeed, W/Q = 2W/2Q = 3W/3Q1 = 4W/4Q and so on.

Also, looking at that three-engine model once again, we can set T3 to one degree and re-state the result in terms of our standard temperature:

If one engine, absorbing heat Qat T1, delivers the heat QS at one degree, and if another engine absorbing heat Qat T2, will also deliver the same heat QS at one degree, then it follows that an engine which absorbs heat Qat the temperature T1 will deliver heat Qif it runs between T1 and T2.

That’s just stating what we showed, but it’s an important result. All these machines are equivalent, so to say, and, as Feynman notes, all we really have to do is to find how much heat (Q) we need to put in at the temperature T in order to deliver a certain amount of heat Qat the unit temperature (i.e. one degree). If we can do that, then we have everything. So let’s go for it.

Measuring entropy

We already mentioned that we can look at the entropy S = Q/T as some quantity that remains unchanged as long as we’re talking reversible thermodynamic processes. Indeed, as much Q/T is absorbed as is given back in a reversible cycle or, in other words: there is no net change in entropy in a reversible cycle. But what does it mean really?

Well… Feynman defines the entropy of a system, or a substance really (think of that body of gas in the cylinder of our ideal gas engine), as a function of its condition, so it is a quantity which is similar to pressure (which is a function of density, volume and temperature: P = NkT/V), or internal energy (which is a function of pressure and volume (U = (3/2)·PV) or, substituting the pressure function, of density and temperature: U = (3/2)·NkT). That doesn’t bring much clarification, however. What does it mean? We need to go through the full argument and the illustrations here.

Suppose we have a body of gas, i.e. our substance, at some volume Va and some temperature Ta (i.e. condition a), and we bring it into some other condition (b), so it now has volume Vb and temperature Tb, as shown below. [Don’t worry about the ΔS = Sb – Sa and ΔS = Sa – Sb formulas as for now. I’ll explain them in a minute.]  

Entropy change

You may think that a and b are, once again, steps in the reversible cycle of a Carnot engine, but no! What we’re doing here is something different altogether: we’ve got the same body of gas at point b but in a completely different condition: indeed, both the volume and temperature (and, hence, its pressure) of the gas is different in b as compared to a. What we do assume, however, is that the gas went from condition a to condition b through a completely reversible process. Cycle, process? What’s the difference? What do we mean with that?

As Feynman notes, we can think of going from a to b through a series of steps, during which tiny reversible heat engines take out an infinitesimal amount of heat dQ in tiny little reservoirs at the temperature corresponding to that point on the path. [Of course, depending on the path, we may have to add heat (and, hence, do work rather than getting work out). However, in this case, we see a temperature rise but also an expansion of volume, the net result of which is that the substance actually does some (net) work from a to b, rather than us having to put (net) work in.] So the process consists, in principle, of a (potentially infinite) number of tiny little cycles. The thinking is illustrated below. 

Entropy change 2

Don’t panic. It’s one of the most beautiful illustrations in all of Feynman’s Lectures, IMHO. Just analyze it. We’ve got the same horizontal and vertical axis here, showing volume and temperature respectively, and the same points a and b showing the condition of the gas before and after and, importantly, also the same path from condition a to condition b, as in the previous illustration. It takes a pedagogic genius like Feynman to think of this: he just draws all those tiny little reservoirs and tiny engines on a mathematical graph to illustrate what’s going on: at each step, an infinitesimal amount of work dW is done, and an infinitesimal amount of entropy dS = dQ/T is being delivered at the unit temperature.

As mentioned, depending on the path, some steps may involve doing some work on those tiny engines, rather than getting work out of them, but that doesn’t change the analysis. Now, we can write the total entropy that is taken out of the substance (or the little reservoirs, as Feynman puts it), as we go from condition a to b, as:

ΔS = Sb – Sa

Now, in light of all the above, it’s easy to see that this ΔS can be calculated using the following integral:

integral entropy

So we have a function S here which depends on the ‘condition’ indeed—i.e. the volume and the temperature (and, hence, the pressure) of the substance. Now, you may or may not notice that it’s a function that is similar to our internal energy formula (i.e. the formula for U). At the same time, it’s not internal energy. It’s something different. We write:

S = S(V, T)

So now we can rewrite our integral formula for change in S as we go from a to b as:

integral entropy 2

Now, a similar argument as the one we used when discussing Carnot’s postulate (all ideal reversible engines operating between two temperatures are essentially equivalent) can be used to demonstrate that the change in entropy does not depend on the path: only the start and end point (i.e. point a and b) matter. In fact, the whole discussion is very similar to the discussion of potential energy when conservative force fields are involved (e.g. gravity or electromagnetism): the difference between the values for our potential energy function at different points was absolute. The paths we used to go from one point to another didn’t matter. The only thing we had to agree on was some reference point, i.e. a zero point. For potential energy, that zero point is usually infinity. In other words, we defined zero potential energy as the potential energy of a charge or a mass at an infinite distance away from the charge or mass that’s causing the field.

Here we need to do the same: we need to agree on a zero point for S, because the formula above only gives the difference of entropy between two conditions. Now, that’s where the third law of thermodynamics comes in, which simply states that the entropy of any substance at the absolute zero temperature (T = 0) is zero, so we write:

S = 0 at T = 0

That’s easy enough, isn’t it?

Now, you’ll wonder whether we can actually calculate something with that. We can. Let me simply reproduce Feynman’s calculation of the entropy function for an ideal gas. You’ll need to pull all that I wrote in this and my previous posts together, but you should be able to follow his line of reasoning:

Entropy for ideal gas

Huh? I know. At this point, you’re probably suffering from formula overkill. However, please try again. Just go over the text and the formulas above, and try to understand what they really mean. [In case you wonder about the formula with the ln[Vb/Va] factor (i.e. the reference to section 44.4), you can check it in my previous post.] So just try to read the S(V, T) formula: it says that a substance (a gas, liquid or solid) consisting of N atoms or molecules, at some temperature T and with some volume V, is associated with some exact value for its entropy S(V, T). The constant, a, should, of course, ensure that S(V, T) = 0 at T = 0.

The first thing you can note is that S is an increasing function of V at constant temperature T. Conversely, decreasing the volume results in a decrease of entropy. To be precise, using the formula for S, we can derive the following formula for the difference in entropy when keeping the temperature constant at some value T:

Sb – Sa = S(Vb, T) – S(Va, T)

= ΔS = N·k·ln[Vb/Va]

What this formula says, for example, is that we’d do nothing but double the volume (while keeping the temperature constant) of a gas when going from  to a to b (hence, Vb/V= 2), the entropy will change by N·k·ln(2) ≈ 0.7·N·k. Conversely, if we would halve the volume (again, assuming the temperature remains constant), then the change in entropy will be N·k·ln(0.5) ≈ –0.7·N·k.

The graph below shows how it works. It’s quite simple really: it’s just the ln(x) function, and I just inserted it here so you have an idea of how the entropy changes with volume. [In case you would think it looks the same like that efficiency graph, i.e. the graph of the W/Q = (T – 1)/T = 1 – 1/T function, think again: the efficiency graph has a horizontal asymptote (y = 1), while the logarithmic function does not have any horizontal asymptote.]

Capture 2

Now, you may think entropy changes only marginally as we keep increasing the volume, but you should also think twice here. It’s just the nature of the logarithmic scale. Indeed, when we double the volume, going from V = 1 to V = 2, for example, the change in entropy will be equal to N·k·ln(2) ≈ 0.7·N·k. Now, that’s the same change as going from V = 2 to V = 4, and the same as going from V = 4 to V = 8. So, if we double the volume three times in a row, the total change in entropy will be that of going from V = 1 to V = 8, which is equal to N·k·ln(8) = N·k·ln(23) = 3·ln(2). So, yes, looking at the intervals here that are associated with the same ln(2) increase  in entropy, i.e. [1, 2], [2, 4] and [4, 8] respectively, you may think that the increase in entropy is marginal only, as it’s the same increase but the length of each interval is double that of the previous one. However, when reducing the volume, the logic works the other way around, and so the logarithmic function ensures the change is anything but marginal. Indeed, if we halve the volume, going from V = 1 to V = 1/2, and then halve it again, to V = 1/4, and the again, to V = 1/8, we get the same change in entropy once more—but with a minus sign in front, of course: N·k·ln(2–3) = –3·ln(2)—but the same ln(2) change is now associated with intervals on the x-axis (between 1 and 0.5, 0.5 and 0.25, and 0.25 and 0.125 respectively) that are getting smaller and smaller as we further reduce the volume. In fact, the length of each interval is now half of that of the previous interval. Hence, the change in entropy is anything but marginal now!

[In light of the fact that the (negative) change in entropy becomes larger and larger as we further reduce the volume, and in a way that’s anything but marginal, you may now wonder, for a very brief moment, whether or not the entropy might actually take on a negative value. The answer is obviously no. The change in entropy can take on a large negative volume but the S(V, T) = N·k·[ln(V) + ln(T)/(γ–1)] + a formula, with ensuring that the entropy is zero at T = 0, ensures things come out alright—as it should, of course!]

Now, as we’re continue to try to understand what entropy really means, it’s quite interesting to think of what this formula implies at the level of the atoms or molecules that make up the gas: the entropy change per molecule is k·ln2 – or k·ln(1/2) when compressing the gas at the same temperature. Now, its kinetic energy remains the same – because – don’t forget! – we’re changing the volume at constant temperature here. So what causes the entropy change here really? Think about it: the only thing that changed, physically, is how much room the molecule has to run around in—as Feynman puts it aptly. Hence, while everything stays the same (atoms or molecules with the same temperature and energy), we still have an entropy increase (or decrease) when the distribution of the molecules changes.

This remark brings us to the connection between order and entropy, which you vaguely know, for sure, but probably never quite understood because, if you did, you wouldn’t be reading this post. :-) So I’ll talk about in a moment. I first need to wrap up this section, however, by showing why all of the above is, somehow, related to that ever-increasing entropy law. :-)

However, before doing that, I want to quickly note something about that assumption of constant temperature here. How can it remain constant? When a body of gas expands, its temperature should drop, right? Well… Yes. But only if it is pushing against something, like in cylinder with a piston indeed, or as air escapes from a tyre and pushes against the (lower-pressure) air outside of the tyre. What happens here is that the kinetic energy of the gas molecules is being transferred (to the piston, or to the gas molecules outside of the tyre) and, hence, temperature decreases indeed. In such case, the assumption is that we add (or remove) heat from our body of gas as we expand (or decrease) its volume. Having said that, in a more abstract analysis, we could envisage a body of gas that has nothing to push against, except for the walls of its container, which have the same temperature. In such more abstract analysis, we need not worry about how we keep temperature constant: the point here is just to compare the ex post and ex ante entropy of the volume. That’s all.

The Law of Ever-Increasing Entropy 

With all of the above, we’re finally armed to ‘prove’ the second law of thermodynamics which we can also state as follows indeed: while the energy of the universe is constant, its entropy is always increasing. Why is this so? Out of respect, I’ll just quote Feynman once more, as I can’t see how I could possibly summarize it better:

Universe of entropy

So… That should sum it all up. You should re-read the above a couple of times, so you’re sure you grasp it. I’ll also let Feynman summarize all of those ‘laws’ of thermodynamics that we have just learned as, once more, I can’t see how I could possibly write more clearly or succinctly. His statement is much more precise that the statement we started out with: the energy of the universe is always constant but its entropy is always increasing. As Feynman notes, this version of the two laws of thermodynamics don’t say that entropy stays the same in a reversible cycle, and also doesn’t say what entropy actually is. So Feynman’s summary is much more precise and, hence, much better indeed:

Laws of thermodynamics

Entropy and order

What I wrote or reproduced above may not have satisfied you. So we’ve got this funny number, S, describing some condition or state of a substance, but you may still feel you don’t really know what it means. Unfortunately, I cannot do all that much about that. Indeed, technically speaking, a quantity like entropy (S) is a state function, just like internal energy (U), or like enthalpy (usually denoted by H), a related concept which you may remember from chemistry and which is defined H = U + PV. As such, you may just think of S as some number that pops up in a thermodynamical equations. It’s perfectly fine to think of it like that. However, if you’re reading this post, then it’s likely you do so because some popular science book mentioned entropy and related it to order and/or disorder indeed. However, I need to disappoint you here: that relationship is not as straightforward as you may think it is. To get some idea, let’s go through another example, which I’ll also borrow from Feynman.

Let’s go back to that relationship between volume and entropy, keeping temperature constant:

ΔS = N·k·ln[Vb/Va]

We discussed, rather at length, how entropy increases as we allow a body of gas to expand. As the formula shows, it increases logarithmically with the ratio of the ex ante and ex post volume. Now, let us think about two gases, which we can think of as ‘white’ and ‘black’ respectively. Or neon or argon. Whatever. Two different gases. Let’s suppose we’ve kept them into two separate compartments of a box, with some barrier in-between them.

Now, you know that, if we’d take out the barrier, they’ll mix it. That’s just a fact of life. As Feynman puts it: somehow, the whites will worm their way across in the space of blacks, and the blacks will worm their way, by accident, into the space of whites. [There’s a bit of a racist undertone in this, isn’t there? But then I am sure Feynman did not intend it that way.] Also, as he notes correctly: we’ve got a very simple example here of an irreversible process which is completely composed of reversible events. We know this mixing will not affect the kinetic (or internal) energy of the gas. Having said that, both the white and the black molecules now have ‘much more room to run around in’. So is there a change in entropy? You bet.

If we take away that barrier, it’s just similar to moving that piston out when we were discussing one volume of gas only. Indeed, we effectively double the volume for the whites, and we double the volume for the blacks, while keeping all at the same temperature. Hence, both the entropy of the white and black gas increases. By how much? Look at the formula: the amount is given by the product of the number of molecules (N), the Boltzman constant (k), and ln(2), i.e. the natural logarithm of the ratio of the ex post and ex ante volumes: ΔS = N·k·ln[Vb/Va].

So, yes, entropy increases as the molecules are now distributed over a much larger space. Now, if we stretch our mind a bit, we could define as a measure of order, or disorder, especially when considering the process going the other way: suppose the gases were mixed up to begin with and, somehow, we manage to neatly separate them in two separate volumes, each half of the original. You’d agree that amounts to an increase in order and, hence, you’d also agree that, if entropy is, somehow, some measure for disorder, entropy should decrease–which it obviously does using that ΔS = N·k·ln[Vb/Va] formula. Indeed, we calculated ΔS as –0.7·N·k.

However, the interpretation is quite peculiar and, hence, not as straightforward as popular science books suggest. Indeed, from that S(V, T) = Nk[lnV + (1/γ−1)lnT] + a formula, it’s obvious we can also decrease entropy by decreasing the number of molecules, or by decreasing the temperature. You’ll have to admit that in both cases (decrease in N, or decrease in T), you’ll have to be somewhat creative in interpreting such decrease as a decrease in disorder.

So… What more can we say? Nothing much. However, in order to be complete, I should add a final note on this discussion of entropy measuring order (or, to be more precise, measuring disorder). It’s about another concept of entropy, the so-called Shannon entropy. It’s a concept from information theory, and our entropy and the Shannon entropy do have something in common: in both, we see that logarithm pop up. It’s quite interesting but, as you might expect, complicated. Hence, I should just refer you to the Wikipedia article on it, from which I took the illustration and text below.

coin flip

We’ve got two coins with two faces here. They can, obviously, be arranged in 22 = 4 ways. Now, back in 1948, the so-called father of information theory, Claude Shannon, thought it was nonsensical to just use that number (4) to represent the complexity of the situation. Indeed, if we’d take three coins, or four, or five, respectively, then we’d have 2= 8, 2= 16, and 2= 32 ways, respectively, of combining them. Now, you’ll agree that, as a measure of the complexity of the situation, the exponents 1, 2, 3, 4 etcetera describe the situation much better than 2, 4, 8, 16 etcetera.

Hence, Shannon defined the so-called information entropy as, in this case,  the base 2 logarithm of the number of possibilities. To be precise, the information entropy of the situation which we’re describing here (i.e. the ways a set of coins can be arranged) is equal to S = N = log2(2N) = 1, 2, 3, 4 etcetera for N = 1, 2, 3, 4 etcetera. In honor of Shannon, the unit is shannons. [I am not joking.] However, information theorists usually talk about bits, rather than shannons. [We’re not talking a computer bit here, although the two are obviously related, as computer bits are binary too.]

Now, one of the many nice things of logarithmic functions is that it’s easy to switch bases. Hence, instead of expressing information entropy in bits, we can also express it in trits (for base 3 logarithms), nats (for base e logarithms, so that’s the natural logarithmic function ln), or dits (for base 10 logarithms). So… Well… Feynman is right in noting that “the logarithm of the number of ways we can arrange the molecules is (the) entropy”, but that statement needs to be qualified: the concepts of information entropy and entropy tout court, as used in the context of thermodynamical analysis, are related but, as usual, they’re also different. :-) Bridging the two concepts involves probability distributions and other stuff. One extremely simple popular account illustrates the principle behind as follows:

Suppose that you put a marble in a large box, and shook the box around, and you didn’t look inside afterwards. Then the marble could be anywhere in the box. Because the box is large, there are many possible places inside the box that the marble could be, so the marble in the box has a high entropy. Now suppose you put the marble in a tiny box and shook up the box. Now, even though you shook the box, you pretty much know where the marble is, because the box is small. In this case we say that the marble in the box has low entropy.

Frankly, examples like this make only very limited sense. They may, perhaps, help us imagine, to some extent, how probability distributions of atoms or molecules might change as the atoms or molecules get more space to move around in. Having said that, I should add that examples like this are, at the same time, also so simplistic they may confuse us more than they enlighten us. In any case, while all of this discussion is highly relevant to statistical mechanics and thermodynamics, I am afraid I have to leave it at this one or two remarks. Otherwise this post risks becoming a course! :-)

Now, there is one more thing we should talk about here. As you’ve read a lot of popular science books, you probably know that the temperature of the Universe is decreasing because it is expanding. However, from what you’ve learnt so far, it is hard to see why that should be the case. Indeed, it is easy to see why the temperature should drop/increase when there’s adiabatic expansion/compression: momentum and, hence, kinetic energy, is being transferred from/to the piston indeed, as it moves out or into the cylinder while the gas expands or is being compressed. But the expanding universe has nothing to push against, does it? So why should its temperature drop? It’s only the volume that changes here, right? And so its entropy (S) should increase, in line with the ΔS = Sb – Sa = S(Vb, T) – S(Va, T) = ΔS = N·k·ln[Vb/Va] formula, but not its temperature (T), which is nothing but the (average) kinetic energy of all of the particles it contains. Right? Maybe.

[By the way, in case you wonder why we believe the Universe is expanding, that’s because we see it expanding: an analysis of the redshifts and blueshifts of the light we get from other galaxies reveals the distance between galaxies is increasing. The expansion model is often referred to as the raisin bread model: one doesn’t need to be at the center of the Universe to see all others move away: each raisin in a rising loaf of raisin bread will see all other raisins moving away from it as the loaf expands.]

Why is the Universe cooling down?

This is a complicated question and, hence, the answer is also somewhat tricky. Let’s look at the entropy formula for an increasing volume of gas at constant temperature once more. Its entropy must change as follows:

ΔS = Sb – Sa = S(Vb, T) – S(Va, T) = ΔS = N·k·ln[Vb/Va]

Now, the analysis usually assumes we have to add some heat to the gas as it expands in order to keep the temperature (T) and, hence, its internal energy (U) constant. Indeed, you may or may not remember that the internal energy is nothing but the product of the number of gas particles and their average kinetic energy, so we can write:

U = N<mv2/2>

In my previous post, I also showed that, for an ideal gas (i.e. no internal motion inside of the gas molecules), the following equality holds: PV = (2/3)U. For a non-ideal gas, we’ve got a similar formula, but with a different coefficient: PV = (γ−1)U. However, all these formulas were based on the assumption that ‘something’ is containing the gas, and that ‘something’ involves the external environment exerting a force on the gas, as illustrated below.


As Feynman writes: “Suppose there is nothing, a vacuum, on the outside of the piston. What of it? If the piston were left alone, and nobody held onto it, each time it got banged it would pick up a little momentum and it would gradually get pushed out of the box. So in order to keep it from being pushed out of the box, we have to hold it with a force F.” We know that the pressure is the force per unit area: P = F/A. So can we analyze the Universe using these formulas?

Maybe. The problem is that we’re analyzing limiting situations here, and that we need to re-examine our concepts when applying them to the Universe. :-)

The first question, obviously, is about the density of the Universe. You know it’s close to a vacuum out there. Close. Yes. But how close? If you google a bit, you’ll find lots of hard-to-read articles on the density of the Universe. If there’s one thing you need to pick up from them, is that, in order for the Universe to expand forever, it should have some critical density (denoted by ρc), which is like a watershed point between an expanding and a contracting Universe.

So what about it? According to Wikipedia, the critical density is estimated to be approximately five atoms (of monatomic hydrogen) per cubic metre, whereas the average density of (ordinary) matter in the Universe is believed to be 0.2 atoms per cubic metre. So that’s OK, isn’t it?

Well… Yes and no. We also have non-ordinary matter in the Universe, which is usually referred to as dark matter in the Universe. The existence of dark matter, and its properties, are inferred from its gravitational effects on visible matter and radiation. In addition, we’ve got dark energy as well. I don’t know much about it, but it seems the dark energy and the dark matter bring the actual density (ρ) of the Universe much closer to the critical density. In fact, cosmologists seem to agree thatρ ≈ ρc and, according to a very recent scientific research mission involving an ESA space observatory doing very precise measurements of the Universe’s cosmic background radiation, the Universe should consist of 4.82 ± 0.05% ordinary matter,25.8 ± 0.4% dark matter and 69 ± 1% dark energy. I’ll leave it to you to challenge that. :-)

OK. Very low density. So that means very low pressure obviously. But what’s the temperature? I checked on the Physics Stack Exchange site, and the best answer is pretty nuanced: it depends on what you want to average. To be precise, the quoted answer is:

  1. If one averages by volume, then one is basically talking about the ‘temperature’ of the photons that reach us as cosmic background radiation—which is the temperature of the Universe that those popular science books refer to. In that case, we get an average temperature of 2.72 degrees Kelvin. So that’s pretty damn cold!
  2. If we average by observable mass, then our measurement is focused mainly on the temperature of all of the hydrogen gas (most matter in the Universe is hydrogen), which has a temperature of a few 10s of Kelvin. Only one tenth of that mass is in stars, but their temperatures are far higher: in the range of 104to 105 degrees. Averaging gives a range of 10to 104 degrees Kelvin. So that’s pretty damn hot!
  3. Finally, including dark matter and dark energy, which is supposed to have even higher temperature, we’d get an average by total mass in the range of 107 Kelvin. That’s incredibly hot!

This is enlightening, especially the first point: we’re not measuring the average kinetic energy of matter particles here but some average energy of (heat) radiation per unit volume. This ‘cosmological’ definition of temperature is quite different from the ‘physical’ definition that we have been using and the observation that this ‘temperature’ must decrease is quite logical: if the energy of the Universe is a constant, but its volume becomes larger and larger as the Universe expands, then the energy per unit volume must obviously decrease.

So let’s go along with this definition of ‘temperature’ and look at an interesting study of how the Universe is supposed to have cooled down in the past. It basically measures the temperature of that cosmic background radiation, i.e. a remnant of the Big Bang, a few billion years ago, which was a few degrees warmer then than it is now. To be precise, it was measured as 5.08 ± 0.1 degrees Kelvin, and this decrease has nothing to do with our simple ideal gas laws but with the Big Bang theory, according to which the temperature of the cosmic background radiation should, indeed, drop smoothly as the universe expands.

Going through the same logic but the other way around, if the Universe had the same energy at the time of the Big Bang, it was all focused in a very small volume. Now, very small volumes are associated with very small entropy according to that S(V, T) = N·k·[ln(V) + ln(T)/(γ–1)] + a formula, but then temperature was not the same obviously: all that energy has to go somewhere, and a lot of it was obviously concentrated in the kinetic energy of its constituent particles (whatever they were) and, hence, a lot of it was in their temperature. 

So it all makes sense now. It was good to check out it out, as it reminds us that we should not try to analyze the Universe as a simple of body of gas that’s not contained in anything in order to then apply our equally simple ideal gas formulas. Our approach needs to be much more sophisticated. Cosmologists need to understand physics (and thoroughly so), but there’s a reason why it’s a separate discipline altogether. :-)


First Principles of Thermodynamics

Thermodynamics is not an easy topic, but one can’t avoid it in physics. The main obstacle, probably, is that we very much like to think in terms of dependent and independent variables. While that approach is still valid in thermodynamics, it is more complicated, because it is often not quite clear what the dependent and independent variables are. We’ve got a lot of quantities in thermodynamics indeed: volume, pressure, internal energy, temperature and – soon to be defined – entropy, which are all some function of each other. Hence, the math involves partial derivatives and other subtleties. Let’s try to get through the basics.

Volume, pressure, temperature and the ideal gas law

We all know what a volume is. That’s an unambiguous quantity. Pressure and temperature are not so unambiguous. In fact, as far as I am concerned, the key to understanding thermodynamics is to be able to not only distinguish but also relate pressure and temperature.

The pressure of a gas or a liquid (P) is the force, per unit area, exerted by the atoms or molecules in that gas or liquid as they hit a surface, such as a piston, or the wall of the body that contains it. Hence, pressure is expressed in newton per square meter: 1 pascal (Pa) = 1 N/m2. It’s a small unit for daily use: the standard atmospheric pressure is 1 atm = 101,325 Pa = 1.01325×105 Pa = 1.01325 bar. We derived the formula for pressure in the previous post:

P = F/A = (2/3)·n·〈m·v2/2〉

This formula shows that the pressure depends on two variables:

  1. The density of the gas or the liquid (i.e. the number of particles per unit volume, so it’s two variables really: a number and a volume), and
  2. Their average kinetic energy.

Now, this average kinetic energy of the particles is nothing but the temperature (T), except that, because of historical reasons, we define temperature (expressed in degrees Kelvin) using a constant of proportionality—the Boltzmann constant k = kB. In addition, in order to get rid of that ugly 2/3 factor in our next formula, we’ll also throw in a 3/2 factor. Hence, we re-write the average kinetic energy 〈m·v2/2〉 as:

〈m·v2/2〉 = (3/2)·k·T

Now we substitute that definition into the first equation (while also noting that, if n is the number of particles in a unit volume, we will have N = n·V atoms in a volume V) to get what we want: the so-called ideal gas law, which you should remember from your high-school days:

PV = NkT

The equation implies that, for a given number of particles (for some given substance, that is), and for some given temperature, pressure and volume are inversely proportional one to another: P = NkT/V. The curve representing that relationship between P and V has the same shape as the reciprocal function y = 1/x. To be precise, it has the same shape as a rectangular hyperbola with the center at the origin, i.e. the shape of an y = m/x curve, assuming non-negative values for x and y only. The illustration below shows that graph for m = 1, 3 and 0.3 respectively. We’ll need that graph later when looking at more complicated graphs depicting processes during which we will not keep temperature constant—so that’s why I quickly throw it in here.


Of course, n·〈m·v2/2〉 is the number of atoms times the average kinetic energy of each and, therefore, it is also the internal energy of the gas. Hence, we can also write the PV = NkT equation as:

PV = (2/3)·U

We should immediately note that we’re considering an ideal gas here, so we disregard any possibility of excitation or motion inside the atoms or molecules. It matters because, if we’re decreasing the volume and, hence, increasing the pressure, we’ll be doing work, and the energy needs to go somewhere. The equation above assumes it all goes into that 〈m·v2/2〉 factor and, hence, into the temperature. Hence, it is obvious that, if were to allow for all kinds of rotational and vibratory motions inside of the atoms or molecules motions also, then the analysis would become more complicated. Having said, in my previous post I showed that the complications are limited: we can account for all kinds of internal motion by inserting another coefficient—i.e. other than 2/3. For example, Feynman calculates it as 2/7, rather than 2/3, for the diatomic oxygen molecule. That is why we usually see a much more general expression of the equation above. We will write:

PV = (γ – 1)·U

The gamma (γ) in the equation above is the rather infamous specific heat ratio, and so it’s equal to 5/3 for the ideal gas (5/3 – 1 = 2/3). I call γ infamous because its theoretical value does not match the experimental value for most gases. For example, while I just noted γ’s theoretical value for O(i.e. he diatomic oxygen molecule) – it’s 9/7 ≈ 1.286, because 9/7 – 1 = 2/7), the experimentally measured value for Ois 1.399. The difference can only be explained using quantum mechanics, which is obviously not the topic of this post, and so we won’t write much about γ. However, I need to say one or two things about it—which I’ll do by showing how we could possibly measure it. Let me reproduce the illustration in my previous post here.

Gas pressureThe pressure is the force per unit area (P = F/A and, hence, F = P·A), and compressing the gas amounts to applying a force over some (infinitesimal) distance dx. Hence, the (differential) work done is equal to dW = F·(−dx) = – P·A·dx = – P·dV, as A·dx = dV, obviously (the area A times the distance dx is the volume change). Now, all the work done goes into changing the internal energy U: there is no heat energy that’s being added or removed here, and no other losses of energy. That’s why it’s referred to as a so-called adiabatic compression, from the Greek a (not), dia (through) and bainein (to go): no heat is going through. The cylinder is thermally insulated. Hence, we write:

dU = – P·dV

This is a very simple differential equation. Note the minus sign: the volume is going to decrease while we do work by compressing the piston, thereby increasing the internal energy. [If you are clever (which, of course, you are), you’ll immediately say that, with increasing internal energy, we should also have an increase in pressure and, hence, we shouldn’t treat P as some constant. You’re right, but so we’re doing a marginal analysis only here: we’ll deal with the full thing later. As mentioned above, the complete picture involves partial derivatives and other mathematical tricks.]

Taking the total differential of U = PV/(γ – 1), we also have another equation:

dU = (P·dV + V·dP)/(γ – 1)

Hence, we have – P·dV = (P·dV + V·dP)/(γ – 1) or, rearranging the terms:

γdV/V + dP/P = 0

Assuming that γ is constant (which is true in theory but not in practice—another reason why this γ is rather infamous), we can integrate this. It gives γlnV + lnP = lnC, with lnC the constant of integration. Now we take the exponential of both sides to get that other formulation of the gas law, which you also may or may not remember from your high-school days:

PVγ = C (a constant)

So here you have the answer to the question as to how we can measure γ: the pressure times the volume to the γth power must be some constant. To be precise, for monatomic gases the pressure times the volume to the 5/3 ≈ 1.67 power must be a constant. The formula works for gases like helium, krypton and argon. However, the issue is more complicated when looking at more complex molecules. You should also note the serious limitation in this analysis: we should not think of P as a constant in the dU = – P·dV equation! But I’ll come back to this. As for now, just take note of it and move on to the next topic.

The Carnot heat engine

The definitions above should help us to understand and distinguish isothermal expansion and compression versus adiabatic expansion and compression which, in turn, should help us to understand what the Carnot cycle is all about. We’re looking at a so-called reversible engine here: there is no friction, and we also assume heat flows ‘frictionless’. The cycle is illustrated below: this so-called heat engine takes an amount of heat (Q1) from a high temperature (T1) heat pad (often referred to as the furnace or the boiler or, more generally, the heat source) and uses it to make some body (i.e. a piston in a cylinder in Carnot’s example) do some work, with some other amount of heat (Q2) goes back into some cold sink (usually referred to as the condenser), which is nothing but a second pad at much lower temperature (T2).

Carnot cycle

The four steps involved are the following:

(1) Isothermal expansion: The gas absorbs heat and expands while keeping the same temperature (T1). As the number of gas atoms or molecules, and their temperature, stays the same, the heat does work, as the gas expands and pushes the piston upwards. So that’s isothermal expansion. The next is different.

(2) Adiabatic expansion: The cylinder and piston are now removed from the heat pad, and the gas continues to expand, thereby doing even more work by pushing the piston further upwards. However, as the piston and cylinder are assumed to be thermally insulated, they neither gain nor lose heat. So it is the gas that loses internal energy: its temperature drops. So the gas cools. How much? It depends on the temperature of the condenser, i.e. T2, or – if there’s no condenser – the temperature of the surroundings. Whatever, the temperature cannot fall below T2.

(3) Isothermal compression: Now we (or the surroundings) will be doing work on the gas (as opposed to the gas doing work on its surroundings). The piston is being pushed back, and so the gas is slowly being compressed while, importantly, keeping it at the same temperature T2. Therefore, it delivers, through the head pad, a heat amount Q2 to the second heat reservoir (i.e. the condenser).

(4) Adiabatic compression: We take the cylinder off the heat pad and continue to compress it, without letting any heat flow out this time around. Hence, the temperature must rise, back to T1. At that point, we can put it back on the first heat pad, and start the Carnot cycle all over again.

The graph below shows the relationship between P and V, and temperature (T), as we move through this cycle. For each cycle, we put in Q1 at temperature T1, and take out Q2 at temperature T2, and then the gas does some work, some net work, or useful work as it’s labeled below.

Carnot cycle graph

Let’s go step by step once again:

  1. Isothermal expansion: Our engine takes in Q1 at temperature T1 from the heat source (isothermal expansion), as we move along line segment (1) from point a to point b on the graph above: the pressure drops, the volume increases, but the temperature stays the same.
  2. Adiabatic expansion: We take the cylinder off the heat path and continue to let the gas expand. Hence, it continues to push the piston, and we move along line segment (2) from point b to c: the pressure further drops, and the volume further increases, but the temperature drops too—from T1 to T2 to be precise.
  3. Isothermal compression: Now we bring the cylinder in touch with the T2 reservoir (the condenser or cold sink) and we now compress the gas (so we do work on the gas, instead of letting the gas do work on its surroundings). As we compress the gas, we reduce the volume and increase the pressure, moving along line segment (3) from c to d, while the temperature of the gas stays at T2.
  4. Adiabatic compression: Finally, we take the cylinder of the cold sink, but we further compress the gas. As its volume further decreases, its pressure and, importantly, its temperature too rises, from T2 to T1 – so we move along line segment 4 from d to – and then we put it back on the heat source to start another cycle.

We could also reverse the cycle. In that case, the steps would be the following:

  1. Our engine would first take in Q2 at temperature T2 (isothermal expansion). We move along line segment (3) here but in the opposite direction: from d to c.
  2. Then we would push the piston to compress the gas (so we’d be doing some work on the gas, rather than have the gas do work on its surroundings) so as to increase the temperature from T2 to T1 (adiabatic compression). On the graph, we go from c to b along line segment (2).
  3. Then we would bring the cylinder in touch with the T1 reservoir and further compress the gas so an amount of heat equal to Q1 is being delivered to the boiler at (the higher) temperature T1 (isothermal compression). So we move along line segment (1) from b to a.
  4. Finally, we would let the gas expand, adiabatically, so the temperature drops, back to T(line segment (4), from a to d), so we can put it back on the T2 reservoir, on which we will let it further expand to take in Q2 again.

It’s interesting to note that the only reason why we can get the machine to do some net work (or why, in the reverse case, we are able to transfer heat by putting some work into some machine) is that there is some mechanism here that allows the machine to take in and transfer heat through isothermal expansion and compression. If we would only have adiabatic expansion and compression, then we’d just be going back and forth between temperature T1 and T2 without getting any net work out of the engine. The shaded area in the graph above then collapses into a line. That is why actual steam engines are very complicated and involve valves and other engineering tricks, such as multiple expansion. Also note that we need two heat reservoirs: we can imagine isothermal expansion and compression using one heat reservoir only but then the engine would also not be doing any net work that is useful to us.

Let’s analyze the work that’s being doing during such Carnot cycle somewhat more in detail.

The work done when compressing a gas, or the work done by a gas as it expands, is an integral. I won’t explain in too much detail here but just remind you of that dW = F·(−dx) = – P·A·dx = – P·dV formula. From this, it’s easy to see that the integral is ∫ PdV.

An integral is an area under a curve: just substitute P for y = f(x) and V for x, and think of ∫ f(x)dx = ∫ y dx. So the area under each of the numbered curves is the work done by or on the gas in the corresponding step. Hence, the net work done (i.e. the so-called useful workis the shaded area of the picture. 

So what is it exactly?

Well… Assuming there are no other losses, the work done should, of course, be equal to the difference in the heat that was put in, and the heat that was taken out, so we write:

W = Q– Q2

So that’s key to understanding it all: an efficient (Carnot) heat engine is one that converts all of the heat energy (i.e. Q– Q2) into useful work or, conversely, which converts all of the work done on the gas into heat energy.

Schematically, Carnot’s reversible heat engine is represented as follows:

Heat engine

So what? You may we’ve got it all now, and that there’s nothing to add to the topic. But that’s not the case. No. We will want to know more about the exact relationship between Q1, Q2, Tand T2. Why? Because we want to be able to answer the very same questions Sadi Carnot wanted to answer, like whether or not the engine could be made more efficient by using another liquid or gas. Indeed, as a young military engineer, fascinated by the steam engines that had – by then – become quite common, Carnot wanted to find an unambiguous answer to two questions:

  1. How much work can we get out of a heat source? Can all heat be used to do useful work?
  2. Could we improve heat engines by replacing the steam with some other working fluid or gas?

These questions obviously make sense, especially in regard to the relatively limited efficiency of steam engines. Indeed, the actual efficiency of the best steam engines at the time was only 10 to 20 percent, and that’s under favorable conditions!

Sadi Carnot attempted to answer these in a memoir, published as a popular work in 1824 when he was only 28 years old. It was entitled Réflexions sur la Puissance Motrice du Feu (Reflections on the Motive Power of Fire). Let’s see if we can make sense of it using more modern and common language. [As for Carnot’s young age, like so many, he was not destined to live long: he was interned in a private asylum in 1832 suffering from ‘mania’ and ‘general delirium’, and died of cholera shortly after, aged 36.]

Carnot’s Theorem

You may think that both questions have easy answers. The first question is, obviously, related to the principle of conservation of energy. So… Well… If we’d be able to build a frictionless Carnot engine, including a ‘frictionless’ heat transfer mechanism, then, yes, we’d be able to convert all heat energy into useful work. But that’s an ideal only.

The second question is more difficult. The formal answer is the following: if an engine is reversible, then it makes no difference how it is designed. In other words, the amount of work that we’ll get out of a reversible Carnot heat engine as it absorbs a given amount of heat (Q1) at temperature Tand delivers some other amount of heat (Q2) at some other temperature T does not depend on the design of the machine. More formally, Carnot’s Theorem can be expressed as follows:

  1. All reversible engines operating between the same heat reservoirs are equally efficient.
  2. No actual engine operating between two heat reservoirs can be more efficient than a Carnot engine operating between the same reservoirs.

Feynman sort of ‘proves’ this Theorem from what he refers to as Carnot’s postulate. However, I feel his ‘proof’ is not a real proof, because Carnot’s postulate is too closely related to the Theorem, and so I feel he’s basically proving something using the result of the proof! However, in order to be complete, I did reproduce Feynman’s ‘proof’ of Carnot’s Theorem in the post scriptum to this post.

So… That’s it. What’s left to do is to actually calculate the efficiency of an ideal reversible Carnot heat engine, so let’s do that now. In fact, the calculation below is much more of a real proof of Carnot’s Theorem and, hence, I’d recommend you go through it.

The efficiency of an ideal engine

Above, I said I would need the result that PVγ is equal to some constant. We do, in the following proof that, for an ideal engine, the following relationship holds, always, for any Q1, Q2, T1 and T2:

Q1/T= Q2/T2


Now, we still don’t have the efficiency with this. The efficiency of an ideal engine is the ratio of the amount of work done and the amount of heat it takes in:

Efficiency = W/Q1

But W is equal to Q– Q2. Hence, re-writing the equation with the two heat/temperature ratios above as Q= (T/T1)·Q1, we get: W = Q1(1 –  T/T1) = Q1(T1 –  T)/T1. The grand result is:

Efficiency = W/Q= (T1 –  T)/T1

Let me help you to interpret this result by inserting a graph for T1 going from zero to 20 degrees, and for T2 set at 0.3, 1 and 4 degrees respectively.

graph efficiency

The graph makes it clear we need some kind of gauge so as to be able to actually compare the efficiency of ideal engines. I’ll come back to that in my next post. However, in the meanwhile, please note that the result makes sense: Tneeds to be higher than Tfor the efficiency to be positive (of course, we can interpret negative values for the efficiency just as well, as they imply we need to do work on the engine, rather than the engine doing work for us), and the efficiency is always less than unity, getting closer to one as the working temperature of the engine goes up.

Where does the power go?

So we have an engine that does useful work – so it works, literally – and we know where it gets its energy for that: it takes in more heat than it returns. But where is the work going? It is used to do something else, of course—like moving a car. Now how does that work, exactly? The gas exerts a force on the piston, thereby giving it an acceleration a = F/m, in accordance with Newton’s Law: F = m·a.

That’s all great. But then we need to re-compress the gas and, therefore, we need to (a) decelerate the piston, (b) reverse its direction and (c) push it back in. So that should cancel all of the work, shouldn’t it?

Well… No.

Let’s look at the Carnot cycle once more to show why. The illustrations below reproduce the basic steps in the cycle and the diagram relating pressure, volume and temperature for each of the four steps once more.

Carnot cycle Carnot cycle graph

Above, I wrote that the only reason why we can get the machine to do some net work (or why, in the reverse case, we are able to transfer heat from lower to higher temperature by doing some (net) work on it) is that there is some mechanism here that allows the machine to take in and transfer heat through isothermal expansion and compression and that, if we would only have adiabatic expansion and compression, then we’d just be going back and forth between temperature T1 and T2 without getting any net work out of the engine.

Now, that’s correct and incorrect at the same time. Just imagine a cylinder and a piston in equilibrium, i.e. the pressure on the inside and the outside of the piston are the same. Then we could push it in a bit but, as soon as we release, it would come back to its equilibrium situation. In fact, as we assume the piston can move in and out without any friction whatsoever, we’d probably have a transient response before the piston settles back into the steady state position (see below). Hence, we’d be moving back and forth on segment (2), or segment (4), in that P-V-T diagram above.


The point is: segment (2) and segment (4) are not the same: points a and b, and points c and d, are marked by the same temperature (T1 and Trespectively) butpressure and volume is very different. Why? Because we had a step in-between step (2) and (4): isothermal compression, which reduced the volume, i.e. step (3). Hence, the area underneath these two segments is different too. Indeed, you’ll remember we can write dW = F·(−dx) = – P·A·dx = – P·dV and, hence, the work done (or put in) during each step of the cycle is equal to the integral ∫ PdV, so that’s the area under each of the line segments. So it’s not like these two steps do not contribute to the net work that’s being done through the cycle. They do. Likewise, step (1) and (3) are not each other’s mirror image: they too take place at different volume and pressure, but that’s easier to see because they take place at different temperature and involve different amounts of heat (Q1 and Qrespectively).

But, again, what happens to the work? When everything is said and done, the piston does move up and down over the same distance in each cycle, and we know that work is force times distance. Hence, if the distance is the same… Yes. You’re right: the piston must exert some net force on something or, to put it differently, the energy W = Q1 − Qmust go somewhere. Now that’s where the time variable comes in, which we’ve neglected so far.

Let’s assume we connect the piston to a flywheel, as illustrated below, there had better be some friction on it because, if not, the flywheel would spin faster and faster and, eventually, spin out of control and all would break down. Indeed, each cycle would transfer additional kinetic energy to the flywheel. When talking work and kinetic energy, one usually applies the following formula: W = Q1 and Q= Δ[mv2/2] = [mv2/2]after − [mv2/2]before. However, we’re talking rotational kinetic energy so we should use the rotational equivalent for mv2/2, which is Iω2/2, in which I is the moment of inertia of the mass about the center of rotation and ω is the angular velocity.


You get the point. As we’re talking time now, we should also remind you of the concept of power. Power is the amount of work or energy being delivered, or consumed, per unit of time (i.e. per second). So we can write it as P(t) = dW/dt. For linear motion, P(t) can be written as the vector product (I mean the scalar, inner or dot product here) of the force and velocity vectors, so P(t) = F·v. Again, when rotation is involved, we’ve got an equivalent formula: P(t) = τ·ω, in which τ represents the torque and ω is, once again, the angular velocity of the flywheel. Again, we’d better ensure some load is placed on the engine, otherwise it will spin out of control as vand/or ω get higher and higher and, hence, the power involved gets higher and higher too, until all breaks down.

So… Now you know it all. :-)

Post scriptum: The analysis of the Carnot cycle involves some subtleties which I left out. For example, you may wonder why the gas would actually expand isothermically in the first step of the Carnot cycle. Indeed, if it’s at the same temperature Tas the heat source, there should be no heat flow between the heat pad and the gas and, hence, no gas expansion, no? Well… No. :-) The gas particles pound on every wall, but only the piston can move. As the piston moves out, frictionless, inside of the cylinder, kinetic energy is being transferred from the gas particles to the piston and, hence, the gas temperature will want to drop—but then that temperature drop will immediately cause a heat transfer. That’s why the description of a Carnot engine also postulates ‘frictionless’ heat transfer.

In fact, I note that Feynman himself struggles a bit to correctly describe what’s going on here, as his description of the Carnot cycle suggests some active involvement is needed to make the piston move and ensure the temperature does not drop too fast. Indeed, he actually writes following: “If we pull the piston out too fast, the temperature of the gas will fall too much below T and then the process will not quite be reversible.” This sounds, and actually is, a bit nonsensical: no pulling is needed, as the gas does all of the work while pushing the piston and, while it does, its temperature tends to drop, so it will suck it heat in order to equalize its temperature with its surroundings (i.e. the heat source). The situation is, effectively, similar to that of a can with compressed air: we can let the air expand, and thereby we let it do some work. However, the air will not re-compress itself by itself. To re-compress the air, you’ll need to apply the same force (or pressure I should say) but in the reverse direction.

Finally, I promised I would reproduce Feynman’s ‘proof’ of Carnot’s Theorem. This ‘proof’ involves the following imaginary set-up (see below): we’ve got engines, A and B. We assume A is an ideal reversible engine, while B may or may not be reversible. We don’t care about its design. We just assume that both can do work by taking a certain amount of heat out of one reservoir and putting another amount of heat back into another reservoir. In fact, in this set-up, we assume both engines share a large enough reservoir so as to be able to transfer heat through that reservoir.

Ideal engine

Engine A can take an amount of heat equal to Qat temperature T1 from the first reservoir, do an amount of work equal to W, and then deliver an amount of heat equal to Q= Q– W at temperature T2 to the second reservoir. However, because it’s a reversible machine, it can also go the other way around, i.e. it can take Q= Q– W from the second reservoir, have the surroundings do an amount of work W on it, and then deliver Q= Q+ W at temperature T1. We know that engine B can do the same, except that, because it’s different, the work might be different as well, so we’ll denote it by W’.

Now, let us suppose that the design of engine B is, somehow, more efficient, so we can get more work out of B for the same Qand the same temperatures Tand T2. What we’re saying, then, is that W’ – W is some positive non-zero amount. If that would be true, we could combine both machines. Indeed, we could have engine B take Qfrom the reservoir at temperature T1, do an amount of work equal to W on engine A so it delivers the same amount Qback to the reservoir at the same temperature T1, and we’d still be left with some positive amount of useful work W’ – W. In fact, because the amount of heat in the first reservoir is restored (in each cycle, we take Qout but we also put the same amount of heat back in), we could include it as part of the machine. It would no longer need to be some huge external thing with unlimited heat capacity.

So it’s great! Each cycle gives us an amount of useful work equal to W’ – W. What about the energy conservation law? Well… engine A takes Q– W from the reservoir at temperature T2, and engine B gives Q– W’ back to it, so we’re taking a net amount of heat equal to (Q– W) – (Q– W’) = W’ – W out of the T2 reservoir. So that works out too! So we’ve got a combined machine converting thermal energy into useful work. It looks like a nice set-up, doesn’t it?

Yes. The problem is that, according to Feynman, it cannot work. Why not? Because it violates Carnot’s postulate. The reasoning here is not easy. Let’s me try to do my best to present the argument correctly. What’s the problem? The problem is that we’ve got an engine here that operates at one temperature only. Now, according to Carnot’s postulate, it is not possible to extract the energy of heat at a single temperature with no other change in the system or the surroundings. Why not? Feynman gives the example of the can with compressed air. Imagine a can of compressed air indeed, and imagine we let the air expand, to drive a piston, for example. Now, we can imagine that our can with compressed air was in touch with a large heat reservoir at the same temperature, so its temperature doesn’t drop. So we’ve done work with that can at a single temperature. However, this doesn’t violate Carnot’s postulate because we’ve also changed the system: the air has expanded. It would only violate Carnot’s postulate if we’d find a way to put the air back in using exactly the same amount of work, so the process would be fully reversible. Now, Carnot’s postulate says that’s not possible at the same temperature. If the whole world is at the same temperature, then it is not possible to reversibly extract and convert heat energy into work.

I am not sure the example of the can with compressed air helps, but Feynman obviously thinks it should. He then phrases Carnot’s postulate as follows: “It is not possible to obtain useful work from a reservoir at a single temperature with no other changes.” He therefore claims that the combined machine as described above cannot exist. Ergo, W’ cannot be greater than W. Switching the role of A and B (so B becomes reversible too now), he concludes that W can also not be greater than W’. Hence, W and W’ have to be equal.

Hmm… I know that both philosophers and engineers have worked tirelessly to try to disprove Carnot’s postulate, and that they all failed. Hence, I don’t want to try to disprove Carnot’s postulate. In fact, I don’t doubt its truth at all. All that I am saying here is that I do have my doubts on the logical rigor of Feynman’s ‘proof’. It’s like… Well… It’s just a bit too tautological I’d say.

First Principles of Thermodynamics

First Principles of Statistical Mechanics

Feynman seems to mix statistical mechanics and thermodynamics in his chapters on it. At first, I thought all was rather messy but, as usual, after re-reading it a couple of times, it all makes sense. Let’s have a look at the basics. We’ll start by talking about gas first.

The ideal gas law

The pressure P is the force we have to apply to the piston containing the gas (see below)—per unit area, that is. So we write: P = F/A. Compressing the gas amounts to applying a force over some (infinitesimal) distance dx. Hence, we can write:

dU = F·(−dx) = – P·A·dx = – P·dV

Gas pressure

However, before looking at the dynamics, let’s first look at the stationary situation: let’s assume the volume of the gas does not change, and so we just have the gas atoms bouncing of the piston and, hence, exerting pressure on it. Every gas atom or particle delivers a momentum 2mvto the piston (the factor 2 is there because the piston does not bounce back, so there is no transfer of momentum). If there are N atoms in the volume N, then there are n = N/V in each unit volume. Of course, only the atoms within a distance vx·t are going to hit the piston within the time t and, hence, the number of atoms hitting the piston within that time is n·A·vx·t. Per unit time (i.e. per second), it’s n·A·vx·t/t = n·A·vx. Hence, the total momentum that’s being transferred per second is n·A·vx·2mvx.

So far, so good. Indeed, we know that the force is equal to the amount of momentum that’s being transferred per second. If you forget, just check the definitions and units: a force of 1 newton gives an mass of 1 kg an acceleration of 1 m/s per second, so 1 N = 1 kg·m/s= 1 kg·(m/s)/s. [The kg·(m/s) unit is the unit of momentum (mass times velocity), obviously. So there we are.] Hence,

P = F/A = n·A·vx·2mvx/A = 2nmvx2

Of course, we need to take an average 〈vx2〉 here, and we should drop the factor 2 because half of the atoms/particles move away from the piston, rather than towards it. In short, we get:

P = F/A = nm〈vx2

Now, the average velocity in the x-, y- and z-direction are all the same and uncorrelated, so 〈vx2〉 = 〈vy2〉 = 〈vz2〉 = [〈vx2〉 + 〈vy2〉 + 〈vz2〉]/3 = 〈v2〉/3. So we don’t worry about any direction and simply write:

P = F/A = (2/3)·n·〈m·v2/2〉

[As Feynman notes, the math behind this are not difficult but, at the same time, also less straightforward than you may think.] The last factor is, obviously, the kinetic energy of the (center-of-mass) motion of the atom or particle. Multiplying by V gives:

P·V = (2/3)·N·〈m·v2/2〉 = (2/3)U

Now, that’s not a law you’ll remember from your high school days because… Well… The internal energy of a gas – how do you measure that? We should link it to a measure we do know, and that’s temperature. The atoms or molecules in a gas will have an average kinetic energy which we could define as… Well… That average should have been defined as the temperature but, for historical reasons, the scale of what we know as the ‘temperature’ variable (T) is different. We need to apply a conversion factor, which is usually written as k. To be precise, we’ll write the mean atomic or molecular energy as (3/2)·kT = 3kT/2. The 3/2 factor has been thrown in here to get rid of it later (in a few seconds, that is), and you should also remember that we have three independent directions of motion, and that the magnitude of the component of motion in any of the three directions x, y or z is 1/2 kT = (3kT/2)/3 = kT/2.

I said we’d get rid of that 3/2 factor. Indeed, applying the above-mentioned definition of temperature, we get:

P·V = N·k·T

That k factor is a constant of proportionality, which makes the units come out alright. U is energy, indeed, and, hence, measured in joule (J). N is a pure number, so k is expressed in joule per degree (Kelvin). To be precise, k is (about) 1.38×10−23 joule for every degree Kelvin, so it’s a very tiny constant: it’s referred to as the Boltzmann constant and it’s usually denoted with a little B as subscript (kB). As for how the product of pressure and volume can (also) yield something in joule, you can work that out for yourself, remembering the definition of a joule.

One immediate implication of the formula above is that gases at the same temperature and pressure, in the same volume, must consist of an equal number of atoms/molecules. You’ll say: of course – because you remember that from your high school classes. However, thinking about it some more – and also in light of what we’ll be learning a bit later on gases composed of more complex molecules (diatomic molecules, for example) – you’ll have to admit it’s not all that obvious as a result.

Now, the number of atoms/molecules is usually measured in moles: one mole (or mol) is 6.02×1023 units (more or less, that is). To be somewhat more precise, its CODATA value is 6.02214129(27)×1023. It is defined as the amount of any substance that contains as many elementary entities (e.g. atoms, molecules, ions or electrons) as there are atoms in 12 grams of pure carbon-12 (12C), the isotope of carbon with relative atomic mass of exactly 12 (also by definition). The number corresponds to the Avogadro constant, and it’s one of the base units in the International Systems of Units, usually denoted by n or N0.

Now, if we reinterpret N as the number of moles, rather than the number of atoms, ions or molecules in a gas, we can re-write the same equation using the so-called universal or ideal gas constant, which is equal to R = (1.38×10−23 joule)×(6.02×1023/mol) per degree Kelvin = 8.314 J·K−1·mol−1. In short, the ideal gas constant is the product of two other constants: the Boltzmann constant (kB) and the Avogadro number (N0). So we get:

P·V = N·R·T with N = no. of moles and R = kB·N0

The ideal gas law and internal motion

There’s an interesting and essential remark to be made in regard to complex molecules in a gas. The simpler example of a complex molecule is a diatomic molecule, consisting of two parts, which we’ll denote by A and B, with mass mand mrespectively. A and B are together but are able to oscillate or move relative to one another. In short, we also have some internal motion here, in addition to the motion of the whole thing, which will also has some kinetic energy. Hence, the kinetic energy of the gas consists of two parts:

  1. The kinetic energy of the so-called center-of-mass motion of the whole thing (i.e. the molecule), which we’ll denote by M = m+ mB, and
  2. The kinetic energy of the rotational and vibratory motions of the two atoms (A and B) inside the molecule.

We noted that for single atoms the mean value of the kinetic energy in one direction is kT/2 and that the total kinetic energy is 3kT/2, i.e. three times as much. So what do we have here? Well… The reasoning we followed for the single atoms is also valid for the diatomic molecule considered as a single body of total mass M and with some center-of-mass velocity vCM. Hence, we can write that

M·vCM2/2 = (3/2)·kT

So that’s the same, regardless of whether or not we’re considering the separate pieces or the whole thing. But let’s look at the separate pieces now. We need some vector analysis here, because A and B can move in separate directions, so we have vand v(note the boldface used for vectors). So what’s the relation between vand von the one hand, and vCM on the other? The analysis is somewhat tricky here but – assuming that the vand vB representations themselves are some idealization of the actual rotational and vibratory movements of the A and B atoms – we can write:

   vCM = (mAv+ mBvB)/M

Now we need to calculate 〈vCM2〉, of course, i.e. the average velocity squared. I’ll refer you to Feynman for the details which, in the end, do lead to that M·vCM2/2 = (3/2)·kT equation. The whole calculation depends on the assumption that the relative velocity wvvis not any more likely to point in one direction than another, so its average component in any direction is zero. Indeed, the interim result is that

M·vCM2/2 = (3/2)·kT + 2mAmBvA·vB〉/M

Hence, one needs to prove, somehow, that 〈vA·vB〉 is zero in order to get the result we want, which is what that assumption about the relative velocity w ensures. Now, we still don’t have the kinetic energy of the A and B parts of the molecule. Because A and B can move in all three directions in space, their average kinetic energy 〈mA·vA2/2〉 and  〈mB·vB2/2〉 is also 3·k·T/2. Now, adding 3·k·T/2 and 3·k·T/2 yields 3kT. So now we have what we wanted:

  1. The kinetic energy of the center-of-mass motion of the diatomic molecule is (3/2)·k·T.
  2. The total energy of the diatomic molecule is the sum of the energies of A and B, and so that’s 3·k·T/2 + 3·k·T/2 = 3 k·T.
  3. The kinetic energy of the internal rotational and vibratory motions of the two atoms (A and B) inside the molecule is the difference, so that’s 3·k·T – (3/2)·k·T = (3/2)·k·T.

The more general result can be stated as follows:

  1. A r-atom molecule in a gas will have a kinetic energy of (3/2)·r·k·T, on average, of which
  2. 3/2·k·T is kinetic energy of the center-of-mass motion of the entire molecule,
  3. The rest, (3/2)·(r−1)·k·T, is internal vibrational and rotational kinetic energy.

Another way to state is that, for an r-atom molecule, we find that the average energy for each ‘independent direction of motion’, i.e. for each degree of freedom in the system, is kT/2, with the number of degrees of freedom being equal to 3r.

So in this particular case (example of a diatomic molecule), we have 6 degrees of freedom (two times three), because we have three directions in space for each of the two atoms. A common error is to consider the center-of-mass energy as something separate, rather than including it as a part of the total energy. Remember: the total kinetic energy is, quite simply, the sum of the kinetic energies of the separate atoms, which can be separated into (1) the kinetic energy associated with the center-of-mass motion and (2) the kinetic energy of the internal motions.

You see? It’s not that difficult. Let’s move on to the next topic.

The exponential atmosphere

Feynman uses this rather intriguing title to introduce Boltzmann’s Law, which is a law about densities. Let’s jot it down first:

n = n0·e−P.E/kT

In this equation, P.E. is the potential energy, with k and T our Boltzmann constant and the temperature expressed in Kelvin. As for n0, that’s just a constant which depends on the reference point (P.E. = 0). What are we calculating here? Densities, so that’s the relative or absolute number of molecules per unit volume, so we look for a formula for a variable like n = N/V.

Let’s do an example: the ‘exponential’ atmosphere. :-) Feynman models our ‘atmosphere’ as a huge column of gas (see below). To simplify the analysis, we make silly assumptions. For example, we assume the temperature is the same at all heights. That’s assured by the mechanism for equalizing temperature: if the molecules on top would have less energy than those at the bottom, the molecules at the bottom would shake the molecules at the top, via the rod and the balls. That’s a very theoretical set-up, of course, but let’s just go along with it. The idea is that the average kinetic energy of all molecules is the same. It makes for a much easier mathematical analysis. 


So what’s different? The pressure, of course, which is determined by the number of molecules per unit volume. The pressure must increase with lower altitude because it has to hold, so to speak, the weight of all the gas above it. Conversely, as we go higher, the atmosphere becomes more tenuous. So what’s the ‘law’ or formula here?

We’ll use our gas law: PV = NkT, which we can re-write as P = nkT with n = N/V, so n is the number of molecules per unit volume indeed. What’s stated here is that the pressure (P) and the number of molecules per unit volume (n) are directly proportional, with kT the proportionality factor. So we have gravity (the g force) and we can do a differential analysis: what happens when we go from h to h + dh? If m is the mass of each molecule, and if we assume we’re looking at unit areas (both at h as well as h + dh), then the gravitational force on each molecule will be mg, and ndh will be the total number of molecules in that ‘unit section’.

Now, we can write dP as dP = Ph+dh − Ph and, of course, we know that the difference in pressure must be sufficient to hold, so to speak, the molecules in that small unit section dh. So we can write the following:

dP = Ph+dh − Ph = − m·g·n·dh

Now, P is P = nkT and, hence, because we assume T to be constant, we can write the whole equation as dP = k·T·d= − m·g·n·dh. From that, we get a differential equation:

dn/d= − (m·g)/(k·T)·n

We all hate differential equations, of course, but this one has an easy solution: the equation basically states we should find a function for n which has a derivative which is proportional to itself. The exponential function is such function, so the solution of the differential equation is:

n = n0·e−mgh/kT

n0 is the constant of integration and is, as mentioned above, the density at h = 0. Also note that mgh is, indeed, the potential energy of the molecules, increasing with height. So we have a Boltzmann Law indeed here, which we can write as n = n0·e−P.E/kT. Done ! The illustration below was also taken from Feynman, and illustrates the ‘exponential atmosphere’ for two gases: oxygen and hydrogen. Because their mass is very different, the curve is different too: it shows how, in theory and in practice, lighter gases will dominate at great heights, because the exponentials for the heavier stuff have all died out.



It is easy to show that we’ll have a Boltzmann Law in any situation where the force comes from a potential. In other words, we’ll have a Boltzmann Law in any situation for which the work done when taking a molecule from x to x + dx can be represented as potential energy. An example would be molecules that are electrically charged and attracted by some electric field or another charge that attracts them. In that case, we have an electric force of attraction which varies with position and acts on all molecules. So we could take two parallel planes in the gas, separated by a distance dx indeed, and we’d have a similar situation: the force on each atom, times the number of atoms in the unit section that’s delineated by dx, would have to be balanced by the pressure change, and we’d find a similar ‘law': n = n0·e−P.E/kT.

Let’s quickly show it. The key variable is the density n, of course: n = N/V. If we assume volume and temperature remain constant, then we can use our gas law to write the pressure as P = NkT/V = kT·n, which implies that any change in pressure must involve a density change. To be precise, dP = d(kT·n) = kT·dn. Now, we’ve got a force, and moving a molecule from x to x + dx involves work, which is the force times the distance, so the work is F·dx. The force can be anything, but we assume it’s conservative, like the electromagnetic force or gravity. Hence, the force field can be represented by a potential and the work done is equal to the change in potential energy. Hence, we can write: Fdx = –d(P.E.). Why the minus sign? If the force is doing work, we’re moving with the force and, hence, we’ll have a decrease in potential energy. Conversely, if the surroundings are doing work against the force, we’ll increase potential energy.

Now, we said the force must be balanced by the pressure. What does that mean, exactly? It’s the same analysis as the one we did for our ‘exponential’ atmosphere: we’ve got a small slice, given by dx, and the difference in pressure when going from x to x + dx must be sufficient to hold, so to speak, the molecules in that small unit section dx. [Note we assume we’re talking unit areas once again.] So, instead of writing dP = Ph+dh − Ph = − m·g·n·dh, we now write dP = F·n·dx. So, when it’s a gravitational field, the magnitude of the force involved is, obviously, F = m·g.

The minus sign business is confusing, as usual: it’s obvious that dP must be negative for positive dh, and vice versa, but here we are moving with the force, so no minus sign is needed. If you find that confusing, let me give you another way of getting that dP = F·n·dx expression. The pressure is, quite simply, the force times the number of particles, so P = F·N. Dividing both sides by V yields P/V = F·N/V = F·n. Therefore, P = F·n·V and, hence, dP must be equal to dP = d(F·n·V) = F·n·dV = F·n·dx. [Again, the assumption is that our unit of analysis is the unit area.] OK. I need to move on. Combining (1) dP = d(kT·n) = kT·dn, (2) dP = F·n·dx and (3) Fdx = –d(P.E.), we get:

kT·dn = –d(P.E.)·n ⇔ dn/d(P.E.) = [1/(kT)]·n

That’s a differential equation that’s easy to solve. We’ve repeated it ad nauseam: a function which has a derivative proportional to itself is an exponential. Hence, we have our grand equation:

n = n0·e−P.E/kT

If the whole thing troubles you, just remember that the key to solving problems like this is to clearly identify and separate the so-called ‘dependent’ and ‘independent’ variables. In this case, we want a formula for n and, hence, it’s potential energy that’s the ‘independent’ variable. That’s all. The graph looks the same, of course: the density is greatest at P.E. = 0. To be precise, the density there will be equal to n = n0·e= n0 (don’t think it’s infinity there!). And for higher (potential) energy values, we get lower density values. It’s a simple but powerful graph, and so you should always remember it.


Boltzmann’s Law is a very simple law but it can be applied to very complicated situations. Indeed, while the law is simple, the potential energy curve can be very complicated. So our Law can be applied to other situations than gravity or the electric force. The potential can combine a number of forces (as long as they’re all conservative), as shown in the graph below, which shows a situation in which molecules will attract each other at a distance r > r(and, hence, their potential energy decreases as they come closer together), but repel each other strongly as r becomes smaller than r(so potential energy increases, and very much so as we try to force them on top of each other).

Potential energy

Again, despite the complicated shape of the curve, the density function will follow Boltzmann’s Law: in a given volume, the density will be highest at the distance of minimum energy, and the density will be much less at other distances, thereby respecting the e−P.E/kT distribution, in which the potential energy and the temperature are the only variables. So, yes, Boltzmann’s Law is pretty powerful !

First Principles of Statistical Mechanics

The Uncertainty Principle revisited

I’ve written a few posts on the Uncertainty Principle already. See, for example, my post on the energy-time expression for it (ΔE·Δt ≥ h). So why am I coming back to it once more? Not sure. I felt I left some stuff out. So I am writing this post to just complement what I wrote before. I’ll do so by explaining, and commenting on, the ‘semi-formal’ derivation of the so-called Kennard formulation of the Principle in the Wikipedia article on it.

The Kennard inequalities, σxσp ≥ ħ/2 and σEσt ≥ ħ/2, are more accurate than the more general Δx·Δp ≥ h and ΔE·Δt ≥ h expressions one often sees, which are an early formulation of the Principle by Niels Bohr, and which Heisenberg himself used when explaining the Principle in a thought experiment picturing a gamma-ray microscope. I presented Heisenberg’s thought experiment in another post, and so I won’t repeat myself here. I just want to mention that it ‘proves’ the Uncertainty Principle using the Planck-Einstein relations for the energy and momentum of a photon:

E = hf and p = h/λ

Heisenberg’s thought experiment is not a real proof, of course. But then what’s a real proof? The mentioned ‘semi-formal’ derivation looks more impressive, because more mathematical, but it’s not a ‘proof’ either (I hope you’ll understand why I am saying that after reading my post). The main difference between Heisenberg’s thought experiment and the mathematical derivation in the mentioned Wikipedia article is that the ‘mathematical’ approach is based on the de Broglie relation. That de Broglie relation looks the same as the Planck-Einstein relation (p = h/λ) but it’s fundamentally different.

Indeed, the momentum of a photon (i.e. the p we use in the Planck-Einstein relation) is not the momentum one associates with a proper particle, such as an electron or a proton, for example (so that’s the p we use in the de Broglie relation). The momentum of a particle is defined as the product of its mass (m) and velocity (v). Photons don’t have a (rest) mass, and their velocity is absolute (c), so how do we define momentum for a photon? There are a couple of ways to go about it, but the two most obvious ones are probably the following:

  1. We can use the classical theory of electromagnetic radiation and show that the momentum of a photon is related to the magnetic field (we usually only analyze the electric field), and the so-called radiation pressure that results from it. It yields the p = E/c formula which we need to go from E = hf to p = h/λ, using the ubiquitous relation between the frequency, the wavelength and the wave velocity (c = λf). In case you’re interested in the detail, just click on the radiation pressure link).
  2. We can also use the mass-energy equivalence E = mc2. Hence, the equivalent mass of the photon is E/c2, which is relativistic mass only. However, we can multiply that mass with the photon’s velocity, which is c, thereby getting the very same value for its momentum p = E/c= E/c.

So Heisenberg’s ‘proof’ uses the Planck-Einstein relations, as it analyzes the Uncertainty Principle more as an observer effect: probing matter with light, so to say. In contrast, the mentioned derivation takes the de Broglie relation itself as the point of departure. As mentioned, the de Broglie relations look exactly the same as the Planck-Einstein relationship (E = hf and p = h/λ) but the model behind is very different. In fact, that’s what the Uncertainty Principle is all about: it says that the de Broglie frequency and/or wavelength cannot be determined exactly: if we want to localize a particle, somewhat at least, we’ll be dealing with a frequency range Δf. As such, the de Broglie relation is actually somewhat misleading at first. Let’s talk about the model behind.

A particle, like an electron or a proton, traveling through space, is described by a complex-valued wavefunction, usually denoted by the Greek letter psi (Ψ) or phi (Φ). This wavefunction has a phase, usually denoted as θ (theta) which – because we assume the wavefunction is a nice periodic function – varies as a function of time and space. To be precise, we write θ as θ = ωt – kx or, if the wave is traveling in the other direction, as θ = kx – ωt.

I’ve explained this in a couple of posts already, including my previous post, so I won’t repeat myself here. Let me just note that ω is the angular frequency, which we express in radians per second, rather than cycles per second, so ω = 2π(one cycle covers 2π rad). As for k, that’s the wavenumber, which is often described as the spatial frequency, because it’s expressed in cycles per meter or, more often (and surely in this case), in radians per meter. Hence, if we freeze time, this number is the rate of change of the phase in space. Because one cycle is, again, 2π rad, and one cycle corresponds to the wave traveling one wavelength (i.e. λ meter), it’s easy to see that k = 2π/λ. We can use these definitions to re-write the de Broglie relations E = hf and p = h/λ as:

E = ħω and p = ħk with h = h/2π

What about the wave velocity? For a photon, we have c = λf and, hence, c = (2π/k)(ω/2π) = ω/k. For ‘particle waves’ (or matter waves, if you prefer that term), it’s much more complicated, because we need to distinguish between the so-called phase velocity (vp) and the group velocity (vg). The phase velocity is what we’re used to: it’s the product of the frequency (the number of cycles per second) and the wavelength (the distance traveled by the wave over one cycle), or the ratio of the angular frequency and the wavenumber, so we have, once again, λf = ω/k = vp. However, this phase velocity is not the classical velocity of the particle that we are looking at. That’s the so-called group velocity, which corresponds to the velocity of the wave packet representing the particle (or ‘wavicle’, if your prefer that term), as illustrated below.


The animation below illustrates the difference between the phase and the group velocity even more clearly: the green dot travels with the ‘wavicles’, while the red dot travels with the phase. As mentioned above, the group velocity corresponds to the classical velocity of the particle (v). However, the phase velocity is a mathematical point that actually travels faster than light. It is a mathematical point only, which does not carry a signal (unlike the modulation of the wave itself, i.e. the traveling ‘groups’) and, hence, it does not contradict the fundamental principle of relativity theory: the speed of light is absolute, and nothing travels faster than light (except mathematical points, as you can, hopefully, appreciate now).

Wave_group (1)

The two animations above do not represent the quantum-mechanical wavefunction, because the functions that are shown are real-valued, not complex-valued. To imagine a complex-valued wave, you should think of something like the ‘wavicle’ below or, if you prefer animations, the standing waves underneath (i.e. C to H: A and B just present the mathematical model behind, which is that of a mechanical oscillator, like a mass on a spring indeed). These representations clearly show the real as well as the imaginary part of complex-valued wave-functions.

Photon wave


With this general introduction, we are now ready for the more formal treatment that follows. So our wavefunction Ψ is a complex-valued function in space and time. A very general shape for it is one we used in a couple of posts already:

Ψ(x, t) ∝ ei(kx – ωt) = cos(kx – ωt) + isin(kx – ωt)

If you don’t know anything about complex numbers, I’d suggest you read my short crash course on it in the essentials page of this blog, because I don’t have the space nor the time to repeat all of that. Now, we can use the de Broglie relationship relating the momentum of a particle with a wavenumber (p = ħk) to re-write our psi function as:

Ψ(x, t) ∝ ei(kx – ωt) = ei(px/ħ – ωt) 

Note that I am using the ‘proportional to’ symbol (∝) because I don’t worry about normalization right now. Indeed, from all of my other posts on this topic, you know that we have to take the absolute square of all these probability amplitudes to arrive at a probability density function, describing the probability of the particle effectively being at point x in space at point t in time, and that all those probabilities, over the function’s domain, have to add up to 1. So we should insert some normalization factor.

Having said that, the problem with the wavefunction above is not normalization really, but the fact that it yields a uniform probability density function. In other words, the particle position is extremely uncertain in the sense that it could be anywhere. Let’s calculate it using a little trick: the absolute square of a complex number equals the product of itself with its (complex) conjugate. Hence, if z = reiθ, then │z│2 = zz* = reiθ·reiθ = r2eiθiθ = r2e0 = r2. Now, in this case, assuming unique values for k, ω, p, which we’ll note as k0, ω0, p0 (and, because we’re freezing time, we can also write t = t0), we should write:

│Ψ(x)│2 = │a0ei(p0x/ħ – ω0t02 = │a0eip0x/ħ eiω0t0 2 = │a0eip0x/ħ 2 │eiω0t0 2 = a02 

Note that, this time around, I did insert some normalization constant a0 as well, so that’s OK. But so the problem is that this very general shape of the wavefunction gives us a constant as the probability for the particle being somewhere between some point a and another point b in space. More formally, we get the surface for a rectangle when we calculate the probability P[a ≤ X ≤ b] as we should calculate it, which is as follows:


More specifically, because we’re talking one-dimensional space here, we get P[a ≤ X ≤ b] = (b–a)·a02. Now, you may think that such uniform probability makes sense. For example, an electron may be in some orbital around a nucleus, and so you may think that all ‘points’ on the orbital (or within the ‘sphere’, or whatever volume it is) may be equally likely. Or, in another example, we may know an electron is going through some slit and, hence, we may think that all points in that slit should be equally likely positions. However, we know that it is not the case. Measurements show that not all points are equally likely. For an orbital, we get complicated patterns, such as the one shown below, and please note that the different colors represent different complex numbers and, hence, different probabilities.


Also, we know that electrons going through a slit will produce an interference pattern—even if they go through it one by one! Hence, we cannot associate some flat line with them: it has to be a proper wavefunction which implies, once again, that we can’t accept a uniform distribution.

In short, uniform probability density functions are not what we see in Nature. They’re non-uniform, like the (very simple) non-uniform distributions shown below. [The left-hand side shows the wavefunction, while the right-hand side shows the associated probability density function: the first two are static (i.e. they do not vary in time), while the third one shows a probability distribution that does vary with time.]


I should also note that, even if you would dare to think that a uniform distribution might be acceptable in some cases (which, let me emphasize this, it is not), an electron can surely not be ‘anywhere’. Indeed, the normalization condition implies that, if we’d have a uniform distribution and if we’d consider all of space, i.e. if we let a go to –∞ and b to +∞, then a0would tend to zero, which means we’d have a particle that is, literally, everywhere and nowhere at the same time.

In short, a uniform probability distribution does not make sense: we’ll generally have some idea of where the particle is most likely to be, within some range at least. I hope I made myself clear here.

Now, before I continue, I should make some other point as well. You know that the Planck constant (h or ħ) is unimaginably small: about 1×10−34 J·s (joule-second). In fact, I’ve repeatedly made that point in various posts. However, having said that, I should add that, while it’s unimaginably small, the uncertainties involved are quite significant. Let us indeed look at the value of ħ by relating it to that σxσp ≥ ħ/2 relation.

Let’s first look at the units. The uncertainty in the position should obviously be expressed in distance units, while momentum is expressed in kg·m/s units. So that works out, because 1 joule is the energy transferred (or work done) when applying a force of 1 newton (N) over a distance of 1 meter (m). In turn, one newton is the force needed to accelerate a mass of one kg at the rate of 1 meter per second per second (this is not a typing mistake: it’s an acceleration of 1 m/s per second, so the unit is m/s2: meter per second squared). Hence, 1 J·s = 1 N·m·s = 1 kg·m/s2·m·s = kg·m2/s. Now, that’s the same dimension as the ‘dimensional product’ for momentum and distance: m·kg·m/s = kg·m2/s.

Now, these units (kg, m and s) are all rather astronomical at the atomic scale and, hence, h and ħ are usually expressed in other dimensions, notably eV·s (electronvolt-second). However, using the standard SI units gives us a better idea of what we’re talking about. If we split the ħ = 1×10−34 J·s value (let’s forget about the 1/2 factor for now) ‘evenly’ over σx and σp – whatever that means: all depends on the units, of course!  – then both factors will have magnitudes of the order of 1×10−17: 1×10−17 m times 1×10−17 kg·m/s gives us 1×10−34 J·s.

You may wonder how this 1×10−17 m compares to, let’s say, the classical electron radius, for example. The classical electron radius is, roughly speaking, the ‘space’ an electron seems to occupy as it scatters incoming light. The idea is illustrated below (credit for the image goes to Wikipedia, as usual). The classical electron radius – or Thompson scattering length – is about 2.818×10−15 m, so that’s almost 300 times our ‘uncertainty’ (1×10−17 m). Not bad: it means that we can effectively relate our ‘uncertainty’ in regard to the position to some actual dimension in space. In this case, we’re talking the femtometer scale (1 fm = 10−15 m), and so you’ve surely heard of this before.


What about the other ‘uncertainty’, the one for the momentum (1×10−17 kg·m/s)? What’s the typical (linear) momentum of an electron? Its mass, expressed in kg, is about 9.1×10−31 kg. We also know its relative velocity in an electron: it’s that magical number α = v/c, about which I wrote in some other posts already, so v = αc ≈ 0.0073·3×10m/s ≈ 2.2×10m/s. Now, 9.1×10−31 kg times 2.2×10m/s is about 2×10–26 kg·m/s, so our proposed ‘uncertainty’ in regard to the momentum (1×10−17 kg·m/s) is half a billion times larger than the typical value for it. Now that is, obviously, not so good. [Note that calculations like this are extremely rough. In fact, when one talks electron momentum, it’s usual angular momentum, which is ‘analogous’ to linear momentum, but angular momentum involves very different formulas. If you want to know more about this, check my post on it.]

Of course, now you may feel that we didn’t ‘split’ the uncertainty in a way that makes sense: those –17 exponents don’t work, obviously. So let’s take 1×10–26 kg·m/s for σp, which is half of that ‘typical’ value we calculated. Then we’d have 1×10−8 m for σx (1×10−8 m times 1×10–26 kg·m/s is, once again, 1×10–34 J·s). But then that uncertainty suddenly becomes a huge number: 1×10−8 m is 100 angstrom. That’s not the atomic scale but the molecular scale! So it’s huge as compared to the pico- or femto-meter scale (1 pm = 1×10−12 m, 1 fm = 1×10−15 m) which we’d sort of expect to see when we’re talking electrons.

OK. Let me get back to the lesson. Why this digression? Not sure. I think I just wanted to show that the Uncertainty Principle involves ‘uncertainties’ that are extremely relevant: despite the unimaginable smallness of the Planck constant, these uncertainties are quite significant at the atomic scale. But back to the ‘proof’ of Kennard’s formulation. Here we need to discuss the ‘model’ we’re using. The rather simple animation below (again, credit for it has to go to Wikipedia) illustrates it wonderfully.


Look at it carefully: we start with a ‘wave packet’ that looks a bit like a normal distribution, but it isn’t, of course. We have negative and positive values, and normal distributions don’t have that. So it’s a wave alright. Of course, you should, once more, remember that we’re only seeing one part of the complex-valued wave here (the real or imaginary part—it could be either). But so then we’re superimposing waves on it. Note the increasing frequency of these waves, and also note how the wave packet becomes increasingly localized with the addition of these waves. In fact, the so-called Fourier analysis, of which you’ve surely heard before, is a mathematical operation that does the reverse: it separates a wave packet into its individual component waves.

So now we know the ‘trick’ for reducing the uncertainty in regard to the position: we just add waves with different frequencies. Of course, different frequencies imply different wavenumbers and, through the de Broglie relationship, we’ll also have different values for the ‘momentum’ associated with these component waves. Let’s write these various values as kn, ωn, and pn respectively, with n going from 0 to N. Of course, our point in time remains frozen at t0. So we get a wavefunction that’s, quite simply, the sum of N component waves and so we write:

Ψ(x) = ∑ anei(pnx/ħ – ωnt0= ∑ an  eipnx/ħeiωnt= ∑ Aneipnx/ħ

Note that, because of the eiωnt0, we now have complex-valued coefficients An = aneiωnt0 in front. More formally, we say that An represents the relative contribution of the mode pn to the overall Ψ(x) wave. Hence, we can write these coefficients A as a function of p. Because Greek letters always make more of an impression, we’ll use the Greek letter Φ (phi) for it. :-) Now, we can go to the continuum limit and, hence, transform that sum above into an infinite sum, i.e. an integral. So our wave function then becomes an integral over all possible modes, which we write as:

wave function integral

Don’t worry about that new 1/√2πħ factor in front. That’s, once again, something that has to do with normalization and scales. It’s the integral itself you need to understand. We’ve got that Φ(p) function there, which is nothing but our An coefficient, but for the continuum case. In fact, these relative contributions Φ(p) are now referred to as the amplitude of all modes p, and so Φ(p) is actually another wave function: it’s the wave function in the so-called momentum space.

You’ll probably be very confused now, and wonder where I want to go with an integral like this. The point to note is simple: if we have that Φ(p) function, we can calculate (or derive, if you prefer that word) the Ψ(x) from it using that integral above. Indeed, the integral above is referred to as the Fourier transform, and it’s obviously closely related to that Fourier analysis we introduced above.

Of course, there is also an inverse transform, which looks exactly the same: it just switches the wave functions (Ψ and Φ) and variables (x and p), and then (it’s an important detail!), it has a minus sign in the exponent. Together, the two functions –  as defined by each other through these two integrals – form a so-called Fourier integral pair, also known as a Fourier transform pair, and the variables involved are referred to as conjugate variables. So momentum (p) and position (x) are conjugate variables and, likewise, energy and time are also conjugate variables (but so I won’t expand on the time-energy relation here: please have a look at one of my others posts on that).

Now, I thought of copying and explaining the proof of Kennard’s inequality from Wikipedia’s article on the Uncertainty Principle (you need to click on the show button in the relevant section to see it), but then that’s pretty boring math, and simply copying stuff is not my objective with this blog. More importantly, the proof has nothing to do with physics. Nothing at all. Indeed, it just proves a general mathematical property of Fourier pairs. More specifically, it proves that, the more concentrated one function is, the more spread out its Fourier transform must be. In other words, it is not possible to arbitrarily concentrate both a function and its Fourier transform.

So, in this case, if we’d ‘squeeze’ Ψ(x), then its Fourier transform Φ(p) will ‘stretch out’, and so that’s what the proof in that Wikipedia article basically shows. In other words, there is some ‘trade-off’ between the ‘compaction’ of Ψ(x), on the one hand, and Φ(p), on the other, and so that is what the Uncertainty Principle is all about. Nothing more, nothing less.

But… Yes? What’s all this talk about ‘squeezing’ and ‘compaction’? We can’t change reality, can we? Well… Here we’re entering the philosophical field, of course. How do we interpret the Uncertainty Principle? It surely does look like us trying to measure something has some impact on the wavefunction. In fact, usually, our measurement – of either position or momentum – usually makes the wavefunctions collapse: we suddenly know where the particle is and, hence, ψ(x) seems to collapse into one point. Alternatively, we measure its momentum and, hence, Φ(p) collapses.

That’s intriguing. In fact, even more intriguing is the possibility we may only partially affect those wavefunctions with measurements that are somewhat less ‘drastic’. It seems a lot of research is focused on that (just Google for partial collapse of the wavefunction, and you’ll finds tons of references, including presentations like this one).

Hmm… I need to further study the topic. The decomposition of a wave into its component waves is obviously something that works well in physics—and not only in quantum mechanics but also in much more mundane examples. Its most general application is signal processing, in which we decompose a signal (which is a function of time) into the frequencies that make it up. Hence, our wavefunction model makes a lot of sense, as it mirrors the physics involved in oscillators and harmonics obviously.

Still… I feel it doesn’t answer the fundamental question: what is our electron really? What do those wave packets represent? Physicists will say questions like this don’t matter: as long as our mathematical models ‘work’, it’s fine. In fact, if even Feynman said that nobody – including himself – truly understands quantum mechanics, then I should just be happy and move on. However, for some reason, I can’t quite accept that. I should probably focus some more on that de Broglie relationship, p = h/λ, as it’s obviously as fundamental to my understanding of the ‘model’ of reality in physics as that Fourier analysis of the wave packet. So I need to do some more thinking on that.

The de Broglie relationship is not intuitive. In fact, I am not ashamed to admit that it actually took me quite some time to understand why we can’t just re-write the de Broglie relationship (λ = h/p) as an uncertainty relation itself: Δλ = h/Δp. Hence, let me be very clear on this:

Δx = h/Δp (that’s the Uncertainty Principle) but Δλ ≠ h/Δp !

Let me quickly explain why.

If the Δ symbol expresses a standard deviation (or some other measurement of uncertainty), we can write the following:

p = h/λ ⇒ Δp = Δ(h/λ) = hΔ(1/λ) ≠ h/Δp

So I can take h out of the brackets after the Δ symbol, because that’s one of the things that’s allowed when working with standard deviations. More in particular, one can prove the following:

  1. The standard deviation of some constant function is 0: Δ(k) = 0
  2. The standard deviation is invariant under changes of location: Δ(x + k) = Δ(x + k)
  3. Finally, the standard deviation scales directly with the scale of the variable: Δ(kx) = |k |Δ(x).

However, it is not the case that Δ(1/x) = 1/Δx. However, let’s not focus on what we cannot do with Δx: let’s see what we can do with it. Δx equals h/Δp according to the Uncertainty Principle—if we take it as an equality, rather than as an inequality, that is. And then we have the de Broglie relationship: p = h/λ. Hence, Δx must equal:

Δx = h/Δp = h/[Δ(h/λ)] =h/[hΔ(1/λ)] = 1/Δ(1/λ)

That’s obvious, but so what? As mentioned, we cannot write Δx = Δλ, because there’s no rule that says that Δ(1/λ) = 1/Δλ and, therefore, h/Δp ≠ Δλ. However, what we can do is define Δλ as an interval, or a length, defined by the difference between its lower and upper bound (let’s denote those two values by λa and λb respectively. Hence, we write Δλ = λb – λa. Note that this does not assume we have a continuous range of values for λ: we can have any number of frequencies λbetween λa and λb, but so you see the point: we’ve got a range of values λ, discrete or continuous, defined by some lower and upper bound.

Now, the de Broglie relation associates two values pa and pb with λa and λb respectively:  pa = h/λa and pb = h/λb. Hence, we can similarly define the corresponding Δp interval as pa – pb, with pa = h/λa and p= h/λb. Note that, because we’re taking the reciprocal, we have to reverse the order of the values here: if λb > λa, then pa = h/λa > p= h/λb. Hence, we can write Δp = Δ(h/λ) = pa – pb = h/λ1 – h/λ= h(1/λ1 – 1/λ2) = h[λ2 – λ1]/λ1λ2. In case you have a bit of difficulty, just draw some reciprocal functions (like the ones below), and have fun connecting intervals on the horizontal axis with intervals on the vertical axis using these functions.


Now, h[λ2 – λ1]/λ1λ2) is obviously something very different than h/Δλ = h/(λ2 – λ1). So we can surely not equate the two and, hence, we cannot write that Δp = h/Δλ.

Having said that, the Δx = 1/Δ(1/λ) = λ1λ2/(λ2 – λ1) that emerges here is quite interesting. We’ve got a ratio here, λ1λ2/(λ2 – λ1, which shows that Δx depends only on the upper and lower bounds of the Δλ range. It does not depend on whether or not the interval is discrete or continuous.

The second thing that is interesting to note is Δx depends not only on the difference between those two values (i.e. the length of the interval) but also on their value: if the length of the interval, i.e. the difference between the two frequencies is the same, but their values as such are higher, then we get a higher value for Δx, i.e. a greater uncertainty in the position. Again, this shows that the relation between Δλ and Δx is not straightforward. But so we knew that already, and so I’ll end this post right here and right now. :-)    

The Uncertainty Principle revisited

The Strange Theory of Light and Matter (III)

This is my third and final comments on Feynman’s popular little booklet: The Strange Theory of Light and Matter, also known as Feynman’s Lectures on Quantum Electrodynamics (QED).

The origin of this short lecture series is quite moving: the death of Alix G. Mautner, a good friend of Feynman’s. She was always curious about physics but her career was in English literature and so she did not manage the math. Hence, Feynman introduces this 1985 publication by writing: “Here are the lectures I really prepared for Alix, but unfortunately I can’t tell them to her directly, now.”

Alix Mautner died from a brain tumor, and it is her husband, Leonard Mautner, who sponsored the QED lectures series at the UCLA, which Ralph Leigton transcribed and published as the booklet that we’re talking about here. Feynman himself died a few years later, at the relatively young age of 69. Tragic coincidence: he died of cancer too. Despite all this weirdness, Feynman’s QED never quite got the same iconic status of, let’s say, Stephen Hawking’s Brief History of Time. I wonder why, but the answer to that question is probably in the realm of chaos theory. :-) I actually just saw the movie on Stephen Hawking’s life (The Theory of Everything), and I noted another strange coincidence: Jane Wilde, Hawking’s first wife, also has a PhD in literature. It strikes me that, while the movie documents that Jane Wilde gave Hawking three children, after which he divorced her to marry his nurse, Elaine, the movie does not mention that he separated from Elaine too, and that he has some kind of ‘working relationship’ with Jane again.

Hmm… What to say? I should get back to quantum mechanics here or, to be precise, to quantum electrodynamics.

One reason why Feynman’s Strange Theory of Light and Matter did not sell like Hawking’s Brief History of Time, might well be that, in some places, the text is not entirely accurate. Why? Who knows? It would make for an interesting PhD thesis in History of Science. Unfortunately, I have no time for such PhD thesis. Hence, I must assume that Richard Feynman simply didn’t have much time or energy left to correct some of the writing of Ralph Leighton, who transcribed and edited these four short lectures a few years before Feynman’s death. Indeed, when everything is said and done, Ralph Leighton is not a physicist and, hence, I think he did compromise – just a little bit – on accuracy for the sake of readability. Ralph Leighton’s father, Robert Leighton, an eminent physicist who worked with Feynman, would probably have done a much better job.

I feel that one should not compromise on accuracy, even when trying to write something reader-friendly. That’s why I am writing this blog, and why I am writing three posts specifically on this little booklet. Indeed, while I’d warmly recommend that little book on QED as an excellent non-mathematical introduction to the weird world of quantum mechanics, I’d also say that, while Ralph Leighton’s story is great, it’s also, in some places, not entirely accurate indeed.

So… Well… I want to do better than Ralph Leighton here. Nothing more. Nothing less. :-) Let’s go for it.

I. Probability amplitudes: what are they?

The greatest achievement of that little QED publication is that it manages to avoid any reference to wave functions and other complicated mathematical constructs: all of the complexity of quantum mechanics is reduced to three basic events or actions and, hence, three basic amplitudes which are represented as ‘arrows’—literally.

Now… Well… You may or may not know that a (probability) amplitude is actually a complex number, but it’s not so easy to intuitively understand the concept of a complex number. In contrast, everyone easily ‘gets’ the concept of an ‘arrow’. Hence, from a pedagogical point of view, representing complex numbers by some ‘arrow’ is truly a stroke of genius.

Whatever we call it, a complex number or an ‘arrow’, a probability amplitude is something with (a) a magnitude and (b) a phase. As such, it resembles a vector, but it’s not quite the same, if only because we’ll impose some restrictions on the magnitude. But I shouldn’t get ahead of myself. Let’s start with the basics.

A magnitude is some real positive number, like a length, but you should not associate it with some spatial dimension in physical space: it’s just a number. As for the phase, we could associate that concept with some direction but, again, you should just think of it as a direction in a mathematical space, not in the real (physical) space.

Let me insert a parenthesis here. If I say the ‘real’ or ‘physical’ space, I mean the space in which the electrons and photons and all other real-life objects that we’re looking at exist and move. That’s a non-mathematical definition. In fact, in math, the real space is defined as a coordinate space, with sets of real numbers (vectors) as coordinates, so… Well… That’s a mathematical space only, not the ‘real’ (physical) space. So the real (vector) space is not real. :-) The mathematical real space may, or may not, accurately describe the real (physical) space. Indeed, you may have heard that physical space is curved because of the presence of massive objects, which means that the real coordinate space will actually not describe it very accurately. I know that’s a bit confusing but I hope you understand what I mean: if mathematicians talk about the real space, they do not mean the real space. They refer to a vector space, i.e. a mathematical construct. To avoid confusion, I’ll use the term ‘physical space’ rather than ‘real’ space in the future. So I’ll let the mathematicians get away with using the term ‘real space’ for something that isn’t real actually. :-)

End of digression. Let’s discuss these two mathematical concepts – magnitude and phase – somewhat more in detail.

A. The magnitude

Let’s start with the magnitude or ‘length’ of our arrow. We know that we have to square these lengths to find some probability, i.e. some real number between 0 and 1. Hence, the length of our arrows cannot be larger than one. That’s the restriction I mentioned already, and this ‘normalization’ condition reinforces the point that these ‘arrows’ do not have any spatial dimension (not in any real space anyway): they represent a function. To be specific, they represent a wavefunction.

If we’d be talking complex numbers instead of ‘arrows’, we’d say the absolute value of the complex number cannot be larger than one. We’d also say that, to find the probability, we should take the absolute square of the complex number, so that’s the square of the magnitude or absolute value of the complex number indeed. We cannot just square the complex number: it has to be the square of the absolute value.

Why? Well… Just write it out. [You can skip this section if you’re not interested in complex numbers, but I would recommend you try to understand. It’s not that difficult. Indeed, if you’re reading this, you’re most likely to understand something of complex numbers and, hence, you should be able to work your way through it. Just remember that a complex number is like a two-dimensional number, which is why it’s sometimes written using bold-face (z), rather than regular font (z). However, I should immediately add this convention is usually not followed. I like the boldface though, and so I’ll try to use it in this post.] The square of a complex number z = a + bi is equal to z= a+ 2abi – b2, while the square of its absolute value (i.e. the absolute square) is |z|= [√(a+ b2)]2 = a+ b2. So you can immediately see that the square and the absolute square of a complex numbers are two very different things indeed: it’s not only the 2abi term, but there’s also the minus sign in the first expression, because of the i= –1 factor. In case of doubt, always remember that the square of a complex number may actually yield a negative number, as evidenced by the definition of the imaginary unit itself: i= –1.

End of digression. Feynman and Leighton manage to avoid any reference to complex numbers in that short series of four lectures and, hence, all they need to do is explain how one squares a length. Kids learn how to do that when making a square out of rectangular paper: they’ll fold one corner of the paper until it meets the opposite edge, forming a triangle first. They’ll then cut or tear off the extra paper, and then unfold. Done. [I could note that the folding is a 90 degree rotation of the original length (or width, I should say) which, in mathematical terms, is equivalent to multiplying that length with the imaginary unit (i). But I am sure the kids involved would think I am crazy if I’d say this. :-) So let me get back to Feynman’s arrows.

B. The phase

Feynman and Leighton’s second pedagogical stroke of genius is the metaphor of the ‘stopwatch’ and the ‘stopwatch hand’ for the variable phase. Indeed, although I think it’s worth explaining why z = a + bi = rcosφ + irsinφ in the illustration below can be written as z = reiφ = |z|eiφ, understanding Euler’s representation of complex number as a complex exponential requires swallowing a very substantial piece of math and, if you’d want to do that, I’ll refer you to one of my posts on complex numbers).


The metaphor of the stopwatch represents a periodic function. To be precise, it represents a sinusoid, i.e. a smooth repetitive oscillation. Now, the stopwatch hand represents the phase of that function, i.e. the φ angle in the illustration above. That angle is a function of time: the speed with which the stopwatch turns is related to some frequency, i.e. the number of oscillations per unit of time (i.e. per second).

You should now wonder: what frequency? What oscillations are we talking about here? Well… As we’re talking photons and electrons here, we should distinguish the two:

  1. For photons, the frequency is given by Planck’s energy-frequency relation, which relates the energy (E) of a photon (1.5 to 3.5 eV for visible light) to its frequency (ν). It’s a simple proportional relation, with Planck’s constant (h) as the proportionality constant: E = hν, or ν = E/h.
  2. For electrons, we have the de Broglie relation, which looks similar to the Planck relation (E = hf, or f = E/h) but, as you know, it’s something different. Indeed, these so-called matter waves are not so easy to interpret because there actually is no precise frequency f. In fact, the matter wave representing some particle in space will consist of a potentially infinite number of waves, all superimposed one over another, as illustrated below.


For the sake of accuracy, I should mention that the animation above has its limitations: the wavetrain is complex-valued and, hence, has a real as well as an imaginary part, so it’s something like the blob underneath. Two functions in one, so to speak: the imaginary part follows the real part with a phase difference of 90 degrees (or π/2 radians). Indeed, if the wavefunction is a regular complex exponential reiθ, then rsin(φ–π/2) = rcos(φ), which proves the point: we have two functions in one here. :-) I am actually just repeating what I said before already: the probability amplitude, or the wavefunction, is a complex number. You’ll usually see it written as Ψ (psi) or Φ (phi). Here also, using boldface (Ψ or Φ instead of Ψ or Φ) would usefully remind the reader that we’re talking something ‘two-dimensional’ (in mathematical space, that is), but this convention is usually not followed.

Photon wave

In any case… Back to frequencies. The point to note is that, when it comes to analyzing electrons (or any other matter-particle), we’re dealing with a range of frequencies f really (or, what amounts to the same, a range of wavelengths λ) and, hence, we should write Δf = ΔE/h, which is just one of the many expressions of the Uncertainty Principle in quantum mechanics.

Now, that’s just one of the complications. Another difficulty is that matter-particles, such as electrons, have some rest mass, and so that enters the energy equation as well (literally). Last but not least, one should distinguish between the group velocity and the phase velocity of matter waves. As you can imagine, that makes for a very complicated relationship between ‘the’ wavelength and ‘the’ frequency. In fact, what I write above should make it abundantly clear that there’s no such thing as the wavelength, or the frequency: it’s a range really, related to the fundamental uncertainty in quantum physics. I’ll come back to that, and so you shouldn’t worry about it here. Just note that the stopwatch metaphor doesn’t work very well for an electron!

In his postmortem lectures for Alix Mautner, Feynman avoids all these complications. Frankly, I think that’s a missed opportunity because I do not think it’s all that incomprehensible. In fact, I write all that follows because I do want you to understand the basics of waves. It’s not difficult. High-school math is enough here. Let’s go for it.

One turn of the stopwatch corresponds to one cycle. One cycle, or 1 Hz (i.e. one oscillation per second) covers 360 degrees or, to use a more natural unit, 2π radians. [Why is radian a more natural unit? Because it measures an angle in terms of the distance unit itself, rather than in arbitrary 1/360 cuts of a full circle. Indeed, remember that the circumference of the unit circle is 2π.] So our frequency ν (expressed in cycles per second) corresponds to a so-called angular frequency ω = 2πν. From this formula, it should be obvious that ω is measured in radians per second.

We can also link this formula to the period of the oscillation, T, i.e. the duration of one cycle. T = 1/ν and, hence, ω = 2π/T. It’s all nicely illustrated below. [And, yes, it’s an animation from Wikipedia: nice and simple.]


The easy math above now allows us to formally write the phase of a wavefunction – let’s denote the wavefunction as φ (phi), and the phase as θ (theta) – as a function of time (t) using the angular frequency ω. So we can write: θ = ωt = 2π·ν·t. Now, the wave travels through space, and the two illustrations above (i.e. the one with the super-imposed waves, and the one with the complex wave train) would usually represent a wave shape at some fixed point in time. Hence, the horizontal axis is not t but x. Hence, we can and should write the phase not only as a function of time but also of space. So how do we do that? Well… If the hypothesis is that the wave travels through space at some fixed speed c, then its frequency ν will also determine its wavelength λ. It’s a simple relationship: c = λν (the number of oscillations per second times the length of one wavelength should give you the distance traveled per second, so that’s, effectively, the wave’s speed).

Now that we’ve expressed the frequency in radians per second, we can also express the wavelength in radians per unit distance too. That’s what the wavenumber does: think of it as the spatial frequency of the wave. We denote the wavenumber by k, and write: k = 2π/λ. [Just do a numerical example when you have difficulty following. For example, if you’d assume the wavelength is 5 units distance (i.e. 5 meter) – that’s a typical VHF radio frequency: ν = (3×10m/s)/(5 m) = 0.6×108 Hz = 60 MHz – then that would correspond to (2π radians)/(5 m) ≈ 1.2566 radians per meter. Of course, we can also express the wave number in oscillations per unit distance. In that case, we’d have to divide k by 2π, because one cycle corresponds to 2π radians. So we get the reciprocal of the wavelength: 1/λ. In our example, 1/λ is, of course, 1/5 = 0.2, so that’s a fifth of a full cycle. You can also think of it as the number of waves (or wavelengths) per meter: if the wavelength is λ, then one can fit 1/λ waves in a meter.


Now, from the ω = 2πν, c = λν and k = 2π/λ relations, it’s obvious that k = 2π/λ = 2π/(c/ν) = (2πν)/c = ω/c. To sum it all up, frequencies and wavelengths, in time and in space, are all related through the speed of propagation of the wave c. More specifically, they’re related as follows:

c = λν = ω/k

From that, it’s easy to see that k = ω/c, which we’ll use in a moment. Now, it’s obvious that the periodicity of the wave implies that we can find the same phase by going one oscillation (or a multiple number of oscillations back or forward in time, or in space. In fact, we can also find the same phase by letting both time and space vary. However, if we want to do that, it should be obvious that we should either (a) go forward in space and back in time or, alternatively, (b) go back in space and forward in time. In other words, if we want to get the same phase, then time and space sort of substitute for each other. Let me quote Feynman on this: “This is easily seen by considering the mathematical behavior of a(tr/c). Evidently, if we add a little time Δt, we get the same value for a(tr/c) as we would have if we had subtracted a little distance: ΔcΔt.” The variable a stands for the acceleration of an electric charge here, causing an electromagnetic wave, but the same logic is valid for the phase, with a minor twist though: we’re talking a nice periodic function here, and so we need to put the angular frequency in front. Hence, the rate of change of the phase in respect to time is measured by the angular frequency ω. In short, we write:

θ = ω(t–x/c) = ωt–kx

Hence, we can re-write the wavefunction, in terms of its phase, as follows:

φ(θ) = φ[θ(x, t)] = φ[ωt–kx]

Note that, if the wave would be traveling in the ‘other’ direction (so in the –x direction), we’d write φ(θ) = φ[kx–ωt]. Time travels in one direction only, of course, but so one minus sign has to be there because of the logic involved in adding time and subtracting distance. You can work out an example (with a sine or cosine wave, for example) for yourself.

So what, you’ll say? Well… Nothing. I just hope you agree that all of this isn’t rocket science: it’s just high-school math. But so it shows you what that stopwatch really is and, hence, – but who am I? – would have put at least one or two footnotes on this in a text like Feynman’s QED.

Now, let me make a much longer and more serious digression:

Digression 1: on relativity and spacetime

As you can see from the argument (or phase) of that wave function φ(θ) = φ[θ(x, t)] = φ[ωt–kx] = φ[–k(x–ct)], any wave equation establishes a deep relation between the wave itself (i.e. the ‘thing’ we’re describing) and space and time. In fact, that’s what the whole wave equation is all about! So let me say a few things more about that.

Because you know a thing or two about physics, you may ask: when we’re talking time, whose time are we talking about? Indeed, if we’re talking photons going from A to B, these photons will be traveling at or near the speed of light and, hence, their clock, as seen from our (inertial) frame of reference, doesn’t move. Likewise, according to the photon, our clock seems to be standing still.

Let me put the issue to bed immediately: we’re looking at things from our point of view. Hence, we’re obviously using our clock, not theirs. Having said that, the analysis is actually fully consistent with relativity theory. Why? Well… What do you expect? If it wasn’t, the analysis would obviously not be valid. :-) To illustrate that it’s consistent with relativity theory, I can mention, for example, that the (probability) amplitude for a photon to travel from point A to B depends on the spacetime interval, which is invariant. Hence, A and B are four-dimensional points in spacetime, involving both spatial as well as time coordinates: A = (xA, yA, zA, tA) and B = (xB, yB, zB, tB). And so the ‘distance’ – as measured through the spacetime interval – is invariant.

Now, having said that, we should draw some attention to the intimate relationship between space and time which, let me remind you, results from the absoluteness of the speed of light. Indeed, one will always measure the speed of light c as being equal to 299,792,458 m/s, always and everywhere. It does not depend on your reference frame (inertial or moving). That’s why the constant c anchors all laws in physics, and why we can write what we write above, i.e. include both distance (x) as well as time (t) in the wave function φ = φ(x, t) = φ[ωt–kx] = φ[–k(x–ct)]. The k and ω are related through the ω/k = c relationship: the speed of light links the frequency in time (ν = ω/2π = 1/T) with the frequency in space (i.e. the wavenumber or spatial frequency k). There is only degree of freedom here: the frequency—in space or in time, it doesn’t matter: ν and ω are not independent.  [As noted above, the relationship between the frequency in time and in space is not so obvious for electrons, or for matter waves in general: for those matter-waves, we need to distinguish group and phase velocity, and so we don’t have a unique frequency.]

Let me make another small digression within the digression here. Thinking about travel at the speed of light invariably leads to paradoxes. In previous posts, I explained the mechanism of light emission: a photon is emitted – one photon only – when an electron jumps back to its ground state after being excited. Hence, we may imagine a photon as a transient electromagnetic wave–something like what’s pictured below. Now, the decay time of this transient oscillation (τ) is measured in nanoseconds, i.e. billionths of a second (1 ns = 1×10–9 s): the decay time for sodium light, for example, is some 30 ns only.

decay time

However, because of the tremendous speed of light, that still makes for a wavetrain that’s like ten meter long, at least (30×10–9 s times 3×10m/s is nine meter, but you should note that the decay time measures the time for the oscillation to die out by a factor 1/e, so the oscillation itself lasts longer than that). Those nine or ten meters cover like 16 to 17 million oscillations (the wavelength of sodium light is about 600 nm and, hence, 10 meter fits almost 17 million oscillations indeed). Now, how can we reconcile the image of a photon as a ten-meter long wavetrain with the image of a photon as a point particle?

The answer to that question is paradoxical: from our perspective, anything traveling at the speed of light – including this nine or ten meter ‘long’ photon – will have zero length because of the relativistic length contraction effect. Length contraction? Yes. I’ll let you look it up, because… Well… It’s not easy to grasp. Indeed, from the three measurable effects on objects moving at relativistic speeds – i.e. (1) an increase of the mass (the energy needed to further accelerate particles in particle accelerators increases dramatically at speeds nearer to c), (2) time dilation, i.e. a slowing down of the (internal) clock (because of their relativistic speeds when entering the Earth’s atmosphere, the measured half-life of muons is five times that when at rest), and (3) length contraction – length contraction is probably the most paradoxical of all.

Let me end this digression with yet another short note. I said that one will always measure the speed of light c as being equal to 299,792,458 m/s, always and everywhere and, hence, that it does not depend on your reference frame (inertial or moving). Well… That’s true and not true at the same time. I actually need to nuance that statement a bit in light of what follows: an individual photon does have an amplitude to travel faster or slower than c, and when discussing matter waves (such as the wavefunction that’s associated with an electron), we can have phase velocities that are faster than light! However, when calculating those amplitudes, is a constant.

That doesn’t make sense, you’ll say. Well… What can I say? That’s how it is unfortunately. I need to move on and, hence, I’ll end this digression and get back to the main story line. Part I explained what probability amplitudes are—or at least tried to do so. Now it’s time for part II: the building blocks of all of quantum electrodynamics (QED).

II. The building blocks: P(A to B), E(A to B) and j

The three basic ‘events’ (and, hence, amplitudes) in QED are the following:

1. P(A to B)

P(A to B) is the (probability) amplitude for a photon to travel from point A to B. However, I should immediately note that A and B are points in spacetime. Therefore, we associate them not only with some specific (x, y, z) position in space, but also with a some specific time t. Now, quantum-mechanical theory gives us an easy formula for P(A to B): it depends on the so-called (spacetime) interval between the two points A and B, i.e. I = Δr– Δt= (x2–x1)2+(y2–y1)2+(z2–z1)– (t2–t1)2. The point to note is that the spacetime interval takes both the distance in space as well as the ‘distance’ in time into account. As I mentioned already, this spacetime interval does not depend on our reference frame and, hence, it’s invariant (as long as we’re talking reference frames that move with constant speed relative to each other). Also note that we should measure time and distance in equivalent units when using that Δr– Δtformula for I. So we either measure distance in light-seconds or, else, we measure time in units that correspond to the time that’s needed for light to travel one meter. If no equivalent units are adopted, the formula is I = Δrc·Δt2.

Now, in quantum theory, anything is possible and, hence, not only do we allow for crooked paths, but we also allow for the difference in time to differ from  the time you’d expect a photon to need to travel along some curve (whose length we’ll denote by l), i.e. l/c. Hence, our photon may actually travel slower or faster than the speed of light c! There is one lucky break, however, that makes all come out alright: it’s easy to show that the amplitudes associated with the odd paths and strange timings generally cancel each other out. [That’s what the QED booklet shows.] Hence, what remains, are the paths that are equal or, importantly, those that very near to the so-called ‘light-like’ intervals in spacetime only. The net result is that light – even one single photon – effectively uses a (very) small core of space as it travels, as evidenced by the fact that even one single photon interferes with itself when traveling through a slit or a small hole!

[If you now wonder what it means for a photon to interfere for itself, let me just give you the easy explanation: it may change its path. We assume it was traveling in a straight line – if only because it left the source at some point in time and then arrived at the slit obviously – but so it no longer travels in a straight line after going through the slit. So that’s what we mean here.]

2. E(A to B)

E(A to B) is the (probability) amplitude for an electron to travel from point A to B. The formula for E(A to B) is much more complicated, and it’s the one I want to discuss somewhat more in detail in this post. It depends on some complex number j (see the next remark) and some real number n.

3. j

Finally, an electron could emit or absorb a photon, and the amplitude associated with this event is denoted by j, for junction number. It’s the same number j as the one mentioned when discussing E(A to B) above.

Now, this junction number is often referred to as the coupling constant or the fine-structure constant. However, the truth is, as I pointed out in my previous post, that these numbers are related, but they are not quite the same: α is the square of j, so we have α = j2. There is also one more, related, number: the gauge parameter, which is denoted by g (despite the g notation, it has nothing to do with gravitation). The value of g is the square root of 4πε0α, so g= 4πε0α. I’ll come back to this. Let me first make an awfully long digression on the fine-structure constant. It will be awfully long. So long that it’s actually part of the ‘core’ of this post actually.

Digression 2: on the fine-structure constant, Planck units and the Bohr radius

The value for j is approximately –0.08542454.

How do we know that?

The easy answer to that question is: physicists measured it. In fact, they usually publish the measured value as the square root of the (absolute value) of j, which is that fine-structure constant α. Its value is published (and updated) by the US National Institute on Standards and Technology. To be precise, the currently accepted value of α is 7.29735257×10−3. In case you doubt, just check that square root:

j = –0.08542454 ≈ –√0.00729735257 = –√α

As noted in Feynman’s (or Leighton’s) QED, older and/or more popular books will usually mention 1/α as the ‘magical’ number, so the ‘special’ number you may have seen is the inverse fine-structure constant, which is about 137, but not quite:

1/α = 137.035999074 ± 0.000000044

I am adding the standard uncertainty just to give you an idea of how precise these measurements are. :-) About 0.32 parts per billion (just divide the 137.035999074 number by the uncertainty). So that‘s the number that excites popular writers, including Leighton. Indeed, as Leighton puts it:

“Where does this number come from? Nobody knows. It’s one of the greatest damn mysteries of physics: a magic number that comes to us with no understanding by man. You might say the “hand of God” wrote that number, and “we don’t know how He pushed his pencil.” We know what kind of a dance to do experimentally to measure this number very accurately, but we don’t know what kind of dance to do on the computer to make this number come out, without putting it in secretly!”

Is it Leighton, or did Feynman really say this? Not sure. While the fine-structure constant is a very special number, it’s not the only ‘special’ number. In fact, we derive it from other ‘magical’ numbers. To be specific, I’ll show you how we derive it from the fundamental properties – as measured, of course – of the electron. So, in fact, I should say that we do know how to make this number come out, which makes me doubt whether Feynman really said what Leighton said he said. :-)

So we can derive α from some other numbers. That brings me to the more complicated answer to the question as to what the value of j really is: j‘s value is the electron charge expressed in Planck units, which I’ll denote by –eP:

j = –eP

[You may want to reflect on this, and quickly verify on the Web. The Planck unit of electric charge, expressed in Coulomb, is about 1.87555×10–18 C. If you multiply that j = –eP, so with –0.08542454, you get the right answer: the electron charge is about –0.160217×10–18 C.]

Now that is strange.

Why? Well… For starters, when doing all those quantum-mechanical calculations, we like to think of j as a dimensionless number: a coupling constant. But so here we do have a dimension: electric charge.

Let’s look at the basics. If is –√α, and it’s also equal to –eP, then the fine-structure constant must also be equal to the square of the electron charge eP, so we can write:

α = eP2

You’ll say: yes, so what? Well… I am pretty sure that, if you’ve ever seen a formula for α, it’s surely not this simple j = –eP or α = eP2 formula. What you’ve seen, most likely, is one or more of the following expressions below :

Fine-structure constant formula

That’s a pretty impressive collection of physical constants, isn’t it? :-) They’re all different but, somehow, when we combine them in one or the other ratio (we have not less than five different expressions here (each identity is a separate expression), and I could give you a few more!), we get the very same number: α. Now that is what I call strange. Truly strange. Incomprehensibly weird!

You’ll say… Well… Those constants must all be related… Of course! That’s exactly the point I am making here. They are, but look how different they are: mmeasures mass, rmeasures distance, e is a charge, and so these are all very different numbers with very different dimensions. Yet, somehow, they are all related through this α number. Frankly, I do not know of any other expression that better illustrates some kind of underlying unity in Nature than the one with those five identities above.

Let’s have a closer look at those constants. You know most of them already. The only constants you may not have seen before are μ0Rand, perhaps, ras well as m. However, these can easily be defined as some easy function of the constants that you did see before, so let me quickly do that:

  1. The μ0 constant is the so-called magnetic constant. It’s something similar as ε0 and it’s referred to as the magnetic permeability of the vacuum. So it’s just like the (electric) permittivity of the vacuum (i.e. the electric constant ε0) and the only reason why this blog hasn’t mentioned this constant before is because I haven’t really discussed magnetic fields so far. I only talked about the electric field vector. In any case, you know that the electric and magnetic force are part and parcel of the same phenomenon (i.e. the electromagnetic interaction between charged particles) and, hence, they are closely related. To be precise, μ0ε0 = 1/c= c–2. So that shows the first and second expression for α are, effectively, fully equivalent. [Just in case you’d doubt that μ0ε0 = 1/c2, let me give you the values: μ0 = 4π·10–7 N/A2, and ε0 = (1/4π·c2)·10C2/N·m2. Just plug them in, and you’ll see it’s bang on. Moreover, note that the ampere (A) unit is equal to the coulomb per second unit (C/s), so even the units come out alright. :-) Of course.]
  2. The ke constant is the Coulomb constant and, from its definition ke = 1/4πε0, it’s easy to see how those two expressions are, in turn, equivalent with the third expression for α.
  3. The Rconstant is the so-called von Klitzing constant. Huh? Yes. I know. I am pretty sure you’ve never ever heard of that one before. Don’t worry about it. It’s, quite simply, equal to Rh/e2. Hence, substituting (and don’t forget that h = 2πħ) will demonstrate the equivalence of the fourth expression for α.
  4. Finally, the re factor is the classical electron radius, which is usually written as a function of me, i.e. the electron mass: re = e2/4πε0mec2. Also note that this also implies that reme = e2/4πε0c2. In words: the product of the electron mass and the electron radius is equal to some constant involving the electron (e), the electric constant (ε0), and c (the speed of light).

I am sure you’re under some kind of ‘formula shock’ now. But you should just take a deep breath and read on. The point to note is that all these very different things are all related through α.

So, again, what is that α really? Well… A strange number indeed. It’s dimensionless (so we don’t measure in kg, m/s, eV·s or whatever) and it pops up everywhere. [Of course, you’ll say: “What’s everywhere? This is the first time I‘ve heard of it!” :-)]

Well… Let me start by explaining the term itself. The fine structure in the name refers to the splitting of the spectral lines of atoms. That’s a very fine structure indeed. :-) We also have a so-called hyperfine structure. Both are illustrated below for the hydrogen atom. The numbers n, JI, and are quantum numbers used in the quantum-mechanical explanation of the emission spectrum, which is  also depicted below, but note that the illustration gives you the so-called Balmer series only, i.e. the colors in the visible light spectrum (there are many more ‘colors’ in the high-energy ultraviolet and the low-energy infrared range).



To be precise: (1) n is the principal quantum number: here it takes the values 1 or 2, and we could say these are the principal shells; (2) the S, P, D,… orbitals (which are usually written in lower case: s, p, d, f, g, h and i) correspond to the (orbital) angular momentum quantum number l = 0, 1, 2,…, so we could say it’s the subshell; (3) the J values correspond to the so-called magnetic quantum number m, which goes from –l to +l; (4) the fourth quantum number is the spin angular momentum s. I’ve copied another diagram below so you see how it works, more or less, that is.

hydrogen spectrum

Now, our fine-structure constant is related to these quantum numbers. How exactly is a bit of a long story, and so I’ll just copy Wikipedia’s summary on this: ” The gross structure of line spectra is the line spectra predicted by the quantum mechanics of non-relativistic electrons with no spin. For a hydrogenic atom, the gross structure energy levels only depend on the principal quantum number n. However, a more accurate model takes into account relativistic and spin effects, which break the degeneracy of the the energy levels and split the spectral lines. The scale of the fine structure splitting relative to the gross structure energies is on the order of ()2, where Z is the atomic number and α is the fine-structure constant.” There you go. You’ll say: so what? Well… Nothing. If you aren’t amazed by that, you should stop reading this.

It is an ‘amazing’ number, indeed, and, hence, it does quality for being “one of the greatest damn mysteries of physics”, as Feynman and/or Leighton put it. Having said that, I would not go as far as to write that it’s “a magic number that comes to us with no understanding by man.” In fact, I think Feynman/Leighton could have done a much better job when explaining what it’s all about. So, yes, I hope to do better than Leighton here and, as he’s still alive, I actually hope he reads this. :-)

The point is: α is not the only weird number. What’s particular about it, as a physical constant, is that it’s dimensionless, because it relates a number of other physical constants in such a way that the units fall away. Having said that, the Planck or Boltzmann constant are at least as weird.

So… What is this all about? Well… You’ve probably heard about the so-called fine-tuning problem in physics and, if you’re like me, your first reaction will be to associate fine-tuning with fine-structure. However, the two terms have nothing in common, except for four letters. :-) OK. Well… I am exaggerating here. The two terms are actually related, to some extent at least, but let me explain how.

The term fine-tuning refers to the fact that all the parameters or constants in the so-called Standard Model of physics are, indeed, all related to each other in the way they are. We can’t sort of just turn the knob of one and change it, because everything falls apart then. So, in essence, the fine-tuning problem in physics is more like a philosophical question: why is the value of all these physical constants and parameters exactly what it is? So it’s like asking: could we change some of the ‘constants’ and still end up with the world we’re living in? Or, if it would be some different world, how would it look like? What if was some other number? What if ke or ε0 was some other number? In short, and in light of those expressions for α, we may rephrase the question as: why is α what is is?

Of course, that’s a question one shouldn’t try to answer before answering some other, more fundamental, question: how many degrees of freedom are there really? Indeed, we just saw that ke and εare intimately related through some equation, and other constants and parameters are related too. So the question is like: what are the ‘dependent’ and the ‘independent’ variables in this so-called Standard Model?

There is no easy answer to that question. In fact, one of the reasons why I find physics so fascinating is that one cannot easily answer such questions. There are the obvious relationships, of course. For example, the ke = 1/4πεrelationship, and the context in which they are used (Coulomb’s Law) does, indeed, strongly suggest that both constants are actually part and parcel of the same thing. Identical, I’d say. Likewise, the μ0ε0 = 1/crelation also suggests there’s only one degree of freedom here, just like there’s only one degree of freedom in that ω/k = relationship (if we set a value for ω, we have k, and vice versa). But… Well… I am not quite sure how to phrase this, but… What physical constants could be ‘variables’ indeed?

It’s pretty obvious that the various formulas for α cannot answer that question: you could stare at them for days and weeks and months and years really, but I’d suggest you use your time to read more of Feynman’s real Lectures instead. :-) One point that may help to come to terms with this question – to some extent, at least – is what I casually mentioned above already: the fine-structure constant is equal to the square of the electron charge expressed in Planck units: α = eP2.

Now, that’s very remarkable because Planck units are some kind of ‘natural units’ indeed (for the detail, see my previous post: among other things, it explains what these Planck units really are) and, therefore, it is quite tempting to think that we’ve actually got only one degree of freedom here: α itself. All the rest should follow from it.


It should… But… Does it?

The answer is: yes and no. To be frank, it’s more no than yes because, as I noted a couple of times already, the fine-structure constant relates a lot of stuff but it’s surely not the only significant number in the Universe. For starters, I said that our E(A to B) formula has two ‘variables':

  1. We have that complex number j, which, as mentioned, is equal to the electron charge expressed in Planck units. [In case you wonder why –eP ≈ –0.08542455 is said to be an amplitude, i.e. a complex number or an ‘arrow’… Well… Complex numbers include the real numbers and, hence, –0.08542455 is both real and complex. When combining ‘arrows’ or, to be precise, when multiplying some complex number with –0.08542455, we will (a) shrink the original arrow to about 8.5% of its original value (8.542455% to be precise) and (b) rotate it over an angle of plus or minus 180 degrees. In other words, we’ll reverse its direction. Hence, using Euler’s notation for complex numbers, we can write: –1 = eiπ eiπ and, hence, –0.085 = 0.085·eiπ = 0.085·eiπ. So, in short, yes, j is a complex number, or an ‘arrow’, if you prefer that term.]
  2. We also have some some real number n in the E(A to B) formula. So what’s the n? Well… Believe it or not, it’s the electron mass! Isn’t that amazing?

You’ll say: “Well… Hmm… I suppose so.” But then you may – and actually should – also wonder: the electron mass? In what units? Planck units again? And are we talking relativistic mass (i.e. its total mass, including the equivalent mass of its kinetic energy) or its rest mass only? And we were talking α here, so can we relate it to α too, just like the electron charge?

These are all very good questions. Let’s start with the second one. We’re talking rather slow-moving electrons here, so the relativistic mass (m) and its rest mass (m0) is more or less the same. Indeed, the Lorentz factor γ in the m = γm0 equation is very close to 1 for electrons moving at their typical speed. So… Well… That question doesn’t matter very much. Really? Yes. OK. Because you’re doubting, I’ll quickly show it to you. What is their ‘typical’ speed?

We know we shouldn’t attach too much importance to the concept of an electron in orbit around some nucleus (we know it’s not like some planet orbiting around some star) and, hence, to the concept of speed or velocity (velocity is speed with direction) when discussing an electron in an atom. The concept of momentum (i.e. velocity combined with mass or energy) is much more relevant. There’s a very easy mathematical relationship that gives us some clue here: the Uncertainty Principle. In fact, we’ll use the Uncertainty Principle to relate the momentum of an electron (p) to the so-called Bohr radius r (think of it as the size of a hydrogen atom) as follows: p ≈ ħ/r. [I’ll come back on this in a moment, and show you why this makes sense.]

Now we also know its kinetic energy (K.E.) is mv2/2, which we can write as p2/2m. Substituting our p ≈ ħ/r conjecture, we get K.E. = mv2/2 = ħ2/2mr2. This is equivalent to m2v2 = ħ2/r(just multiply both sides with m). From that, we get v = ħ/mr. Now, one of the many relations we can derive from the formulas for the fine-structure constant is re = α2r. [I haven’t showed you that yet, but I will shortly. It’s a really amazing expression. However, as for now, just accept it as a simple formula for interim use in this digression.] Hence, r = re2. The rfactor in this expression is the so-called classical electron radius. So we can now write v = ħα2/mre. Let’s now throw c in: v/c = α2ħ/mcre. However, from that fifth expression for α, we know that ħ/mcre = α, so we get v/c = α. We have another amazing result here: the v/c ratio for an electron (i.e. its speed expressed as a fraction of the speed of light) is equal to that fine-structure constant α. So that’s about 1/137, so that’s less than 1% of the speed of light. Now… I’ll leave it to you to calculate the Lorentz factor γ but… Well… It’s obvious that it will be very close to 1. :-) Hence, the electron’s speed – however we want to visualize that – doesn’t matter much indeed, so we should not worry about relativistic corrections in the formulas.

Let’s now look at the question in regard to the Planck units. If you know nothing at all about them, I would advise you to read what I wrote about them in my previous post. Let me just note we get those Planck units by equating not less than five fundamental physical constants to 1, notably (1) the speed of light, (2) Planck’s (reduced) constant, (3) Boltzmann’s constant, (4) Coulomb’s constant and (5) Newton’s constant (i.e. the gravitational constant). Hence, we have a set of five equations here (ħ = kB = ke = G = 1), and so we can solve that to get the five Planck units, i.e. the Planck length unit, the Planck time unit, the Planck mass unit, the Planck energy unit, the Planck charge unit and, finally (oft forgotten), the Planck temperature unit. Of course, you should note that all mass and energy units are directly related because of the mass-energy equivalence relation E = mc2, which simplifies to E = m if c is equated to 1. [I could also say something about the relation between temperature and (kinetic) energy, but I won’t, as it would only further confuse you.]

Now, you may or may not remember that the Planck time and length units are unimaginably small, but that the Planck mass unit is actually quite sizable—at the atomic scale, that is. Indeed, the Planck mass is something huge, like the mass of an eyebrow hair, or a flea egg. Is that huge? Yes. Because if you’d want to pack it in a Planck-sized particle, it would make for a tiny black hole. :-) No kidding. That’s the physical significance of the Planck mass and the Planck length and, yes, it’s weird. :-)

Let me give you some values. First, the Planck mass itself: it’s about 2.1765×10−8 kg. Again, if you think that’s tiny, think again. From the E = mc2 equivalence relationship, we get that this is equivalent to 2 giga-joule, approximately. Just to give an idea, that’s like the monthly electricity consumption of an average American family. So that’s huge indeed! :-) [Many people think that nuclear energy involves the conversion of mass into energy, but the story is actually more complicated than that. In any case… I need to move on.]

Let me now give you the electron mass expressed in the Planck mass unit:

  1. Measured in our old-fashioned super-sized SI kilogram unit, the electron mass is me = 9.1×10–31 kg.
  2. The Planck mass is mP = 2.1765×10−8 kg.
  3. Hence, the electron mass expressed in Planck units is meP = me/mP = (9.1×10–31 kg)/(2.1765×10−8 kg) = 4.181×10−23.

We can, once again, write that as some function of the fine-structure constant. More specifically, we can write:

meP = α/reP = α/α2rP  = 1/αrP

So… Well… Yes: yet another amazing formula involving α.

In this formula, we have reP and rP, which are the (classical) electron radius and the Bohr radius expressed in Planck (length) units respectively. So you can see what’s going on here: we have all kinds of numbers here expressed in Planck units: a charge, a radius, a mass,… And we can relate all of them to the fine-structure constant

Why? Who knows? I don’t. As Leighton puts it: that’s just the way “God pushed His pencil.” :-)

Note that the beauty of natural units ensures that we get the same number for the (equivalent) energy of an electron. Indeed, from the E = mc2 relation, we know the mass of an electron can also be written as 0.511 MeV/c2. Hence, the equivalent energy is 0.511 MeV (so that’s, quite simply, the same number but without the 1/cfactor). Now, the Planck energy EP (in eV) is 1.22×1028 eV, so we get EeP = Ee/EP = (0.511×10eV)/(1.22×1028 eV) = 4.181×10−23. So it’s exactly the same as the electron mass expressed in Planck units. Isn’t that nice? :-)

Now, are all these numbers dimensionless, just like α? The answer to that question is complicated. Yes, and… Well… No:

  1. Yes. They’re dimensionless because they measure something in natural units, i.e. Planck units, and, hence, that’s some kind of relative measure indeed so… Well… Yes, dimensionless.
  2. No. They’re not dimensionless because they do measure something, like a charge, a length, or a mass, and when you chose some kind of relative measure, you still need to define some gauge, i.e. some kind of standard measure. So there’s some ‘dimension’ involved there.

So what’s the final answer? Well… The Planck units are not dimensionless. All we can say is that they are closely related, physically. I should also add that we’ll use the electron charge and mass (expressed in Planck units) in our amplitude calculations as a simple (dimensionless) number between zero and one. So the correct answer to the question as to whether these numbers have any dimension is: expressing some quantities in Planck units sort of normalizes them, so we can use them directly in dimensionless calculations, like when we multiply and add amplitudes.

Hmm… Well… I can imagine you’re not very happy with this answer but it’s the best I can do. Sorry. I’ll let you further ponder that question. I need to move on.  

Note that that 4.181×10−23 is still a very small number (23 zeroes after the decimal point!), even if it’s like 46 million times larger than the electron mass measured in our conventional SI unit (i.e. 9.1×10–31 kg). Does such small number make any sense? The answer is: yes, it does. When we’ll finally start discussing that E(A to B) formula (I’ll give it to you in a moment), you’ll see that a very small number for n makes a lot of sense.

Before diving into it all, let’s first see if that formula for that alpha, that fine-structure constant, still makes sense with me expressed in Planck units. Just to make sure. :-) To do that, we need to use the fifth (last) expression for a, i.e. the one with re in it. Now, in my previous post, I also gave some formula for re: re = e2/4πε0mec2, which we can re-write as reme = e2/4πε0c2. If we substitute that expression for reme  in the formula for α, we can calculate α from the electron charge, which indicates both the electron radius and its mass are not some random God-given variable, or “some magic number that comes to us with no understanding by man“, as Feynman – well… Leighton, I guess – puts it. No. They are magic numbers alright, one related to another through the equally ‘magic’ number α, but so I do feel we actually can create some understanding here.

At this point, I’ll digress once again, and insert some quick back-of-the-envelope argument from Feynman’s very serious Caltech Lectures on Physics, in which, as part of the introduction to quantum mechanics, he calculates the so-called Bohr radius from Planck’s constant h. Let me quickly explain: the Bohr radius is, roughly speaking, the size of the simplest atom, i.e. an atom with one electron (so that’s hydrogen really). So it’s not the classical electron radius re. However, both are also related to that ‘magical number’ α. To be precise, if we write the Bohr radius as r, then re = α2r ≈ 0.000053… times r, which we can re-write as:

α = √(re /r) = (re /r)1/2

So that’s yet another amazing formula involving the fine-structure constant. In fact, it’s the formula I used as an ‘interim’ expression to calculate the relative speed of electrons. I just used it without any explanation there, but I am coming back to it here. Alpha again…

Just think about it for a while. In case you’d still doubt the magic of that number, let me write what we’ve discovered so far:

(1) α is the square of the electron charge expressed in Planck units: α = eP2.

(2) α is the square root of the ratio of (a) the classical electron radius and (b) the Bohr radius: α = √(re /r). You’ll see this more often written as re = α2r. Also note that this is an equation that does not depend on the units, in contrast to equation 1 (above), and 4 and 5 (below), which require you to switch to Planck units. It’s the square of a ratio and, hence, the units don’t matter. They fall away.

(3) α is the (relative) speed of an electron: α = v/c. [The relative speed is the speed as measured against the speed of light. Note that the ‘natural’ unit of speed in the Planck system of units is equal to c. Indeed, if you divide one Planck length by one Planck time unit, you get (1.616×10−35 m)/(5.391×10−44 s) = m/s. However, this is another equation, just like (2), that does not depend on the units: we can express v and c in whatever unit we want, as long we’re consistent and express both in the same units.]

(4) Finally – I’ll show you in a moment – α is also equal to the product of (a) the electron mass (which I’ll simply write as me here) and (b) the classical electron radius re (if both are expressed in Planck units): α = me·re. Now think that’s, perhaps, the most amazing of all of the expressions for α. If you don’t think that’s amazing, I’d really suggest you stop trying to study physics. :-)

Note that, from (2) and (4), we find that:

(5) The electron mass (in Planck units) is equal me = α/r= α/α2r = 1/αr. So that gives us an expression, using α once again, for the electron mass as a function of the Bohr radius r expressed in Planck units.

Finally, we can also substitute (1) in (5) to get:

(6) The electron mass (in Planck units) is equal to me = α/r = eP2/re. Using the Bohr radius, we get me = 1/αr = 1/eP2r.

So… As you can see, this fine-structure constant really links ALL of the fundamental properties of the electron: its charge, its radius, its distance to the nucleus (i.e. the Bohr radius), its velocity, its mass (and, hence, its energy),… In short,


Now that should answer the question in regard to the degrees of freedom we have here, doesn’t it? It looks like we’ve got only one degree of freedom here. Indeed, if we’ve got some value for α, then we’ve have the electron charge, and from the electron charge, we can calculate the Bohr radius r (as I will show below), and if we have r, we have mand re. And then we can also calculate v, which gives us its momentum (mv) and its kinetic energy (mv2/2). In short,


Isn’t that amazing? Hmm… You should reserve your judgment as for now, and carefully go over all of the formulas above and verify my statement. If you do that, you’ll probably struggle to find the Bohr radius from the charge (i.e. from α). So let me show you how you do that, because it will also show you why you should, indeed, reserve your judgment. In other words, I’ll show you why alpha does NOT give us everything! The argument below will, finally, prove some of the formulas that I didn’t prove above. Let’s go for it:

1. If we assume that (a) an electron takes some space – which I’ll denote by r :-) – and (b) that it has some momentum p because of its mass m and its velocity v, then the ΔxΔp = ħ relation (i.e. the Uncertainty Principle in its roughest form) suggests that the order of magnitude of r and p should be related in the very same way. Hence, let’s just boldly write r ≈ ħ/p and see what we can do with that. So we equate Δx with r and Δp with p. As Feynman notes, this is really more like a ‘dimensional analysis’ (he obviously means something very ‘rough’ with that) and so we don’t care about factors like 2 or 1/2. [Indeed, note that the more precise formulation of the Uncertainty Principle is σxσ≥ ħ/2.] In fact, we didn’t even bother to define r very rigorously. We just don’t care about precise statements at this point. We’re only concerned about orders of magnitude. [If you’re appalled by the rather rude approach, I am sorry for that, but just try to go along with it.]

2. From our discussions on energy, we know that the kinetic energy is mv2/2, which we can write as p2/2m so we get rid of the velocity factor. [Why? Because we can’t really imagine what it is anyway. As I said a couple of times already, we shouldn’t think of electrons as planets orbiting around some star. That model doesn’t work.] So… What’s next? Well… Substituting our p ≈ ħ/r conjecture, we get K.E. = ħ2/2mr2. So that’s a formula for the kinetic energy. Next is potential.

3. Unfortunately, the discussion on potential energy is a bit more complicated. You’ll probably remember that we had an easy and very comprehensible formula for the energy that’s needed (i.e. the work that needs to be done) to bring two charges together from a large distance (i.e. infinity). Indeed, we derived that formula directly from Coulomb’s Law (and Newton’s law of force) and it’s U = q1q2/4πε0r12. [If you think I am going too fast, sorry, please check for yourself by reading my other posts.] Now, we’re actually talking about the size of an atom here in my previous post, so one charge is the proton (+e) and the other is the electron (–e), so the potential energy is U = P.E. = –e2/4πε0r, with r the ‘distance’ between the proton and the electron—so that’s the Bohr radius we’re looking for!

[In case you’re struggling a bit with those minus signs when talking potential energy  – I am not ashamed to admit I did! – let me quickly help you here. It has to do with our reference point: the reference point for measuring potential energy is at infinity, and it’s zero there (that’s just our convention). Now, to separate the proton and the electron, we’d have to do quite a lot of work. To use an analogy: imagine we’re somewhere deep down in a cave, and we have to climb back to the zero level. You’ll agree that’s likely to involve some sweat, don’t you? Hence, the potential energy associated with us being down in the cave is negative. Likewise, if we write the potential energy between the proton and the electron as U(r), and the potential energy at the reference point as U(∞) = 0, then the work to be done to separate the charges, i.e. the potential difference U(∞) – U(r), will be positive. So U(∞) – U(r) = 0 – U(r) > 0 and, hence, U(r) < 0. If you still don’t ‘get’ this, think of the electron being in some (potential) well, i.e. below the zero level, and so it’s potential energy is less than zero. Huh? Sorry. I have to move on. :-)]

4. We can now write the total energy (which I’ll denote by E, but don’t confuse it with the electric field vector!) as

E = K.E. + P.E. =  ħ2/2mr– e2/4πε0r

Now, the electron (whatever it is) is, obviously, in some kind of equilibrium state. Why is that obvious? Well… Otherwise our hydrogen atom wouldn’t or couldn’t exist. :-) Hence, it’s in some kind of energy ‘well’ indeed, at the bottom. Such equilibrium point ‘at the bottom’ is characterized by its derivative (in respect to whatever variable) being equal to zero. Now, the only ‘variable’ here is r (all the other symbols are physical constants), so we have to solve for dE/dr = 0. Writing it all out yields:

dE/dr = –ħ2/mr+ e2/4πε0r= 0 ⇔ r = 4πε0ħ2/me2

You’ll say: so what? Well… We’ve got a nice formula for the Bohr radius here, and we got it in no time! :-) But the analysis was rough, so let’s check if it’s any good by putting the values in:

r = 4πε0h2/me2

= [(1/(9×109) C2/N·m2)·(1.055×10–34 J·s)2]/[(9.1×10–31 kg)·(1.6×10–19 C)2]

= 53×10–12 m = 53 pico-meter (pm)

So what? Well… Double-check it on the Internet: the Bohr radius is, effectively, about 53 trillionths of a meter indeed! So we’re right on the spot! 

[In case you wonder about the units, note that mass is a measure of inertia: one kg is the mass of an object which, subject to a force of 1 newton, will accelerate at the rate of 1 m/s per second. Hence, we write F = m·a, which is equivalent to m = F/a. Hence, the kg, as a unit, is equivalent to 1 N/(m/s2). If you make this substitution, we get r in the unit we want to see: [(C2/N·m2)·(N2·m2·s2)/[(N·s2/m)·C2] = m.]

Moreover, if we take that value for r and put it in the (total) energy formula above, we’d find that the energy of the electron is –13.6 eV. [Don’t forget to convert from joule to electronvolt when doing the calculation!] Now you can check that on the Internet too: 13.6 eV is exactly the amount of energy that’s needed to ionize a hydrogen atom (i.e. the energy that’s needed to kick the electron out of that energy well)!

Waw ! Isn’t it great that such simple calculations yield such great results? :-) [Of course, you’ll note that the omission of the 1/2 factor in the Uncertainty Principle was quite strategic. :-)] Using the r = 4πε0ħ2/meformula for the Bohr radius, you can now easily check the re = α2r formula. You should find what we jotted down already: the classical electron radius is equal to re = e2/4πε0mec2. To be precise, re = (53×10–6)·(53×10–12m) = 2.8×10–15 m. Now that’s again something you should check on the Internet. Guess what? […] It’s right on the spot again. :-)

We can now also check that α = m·re formula: α = m·r= 4.181×10−23 times… Hey! Wait! We have to express re in Planck units as well, of course! Now, (2.81794×10–15 m)/(1.616×10–35 m) ≈ 1.7438 ×1020. So now we get 4.181×10−23 times 1.7438×1020 = 7.29×10–3 = 0.00729 ≈ 1/137. Bingo! We got the magic number once again. :-)

So… Well… Doesn’t that confirm we actually do have it all with α?

Well… Yes and no… First, you should note that I had to use h in that calculation of the Bohr radius. Moreover, the other physical constants (most notably c and the Coulomb constant) were actually there as well, ‘in the background’ so to speak, because one needs them to derive the formulas we used above. And then we have the equations themselves, of course, most notably that Uncertainty Principle… So… Well…

It’s not like God gave us one number only (α) and that all the rest flows out of it. We have a whole bunch of ‘fundamental’ relations and ‘fundamental’ constants here.

Having said that, it’s true that statement still does not diminish the magic of alpha.

Hmm… Now you’ll wonder: how many? How many constants do we need in all of physics?

Well… I’d say, you should not only ask about the constants: you should also ask about the equations: how many equations do we need in all of physics? [Just for the record, I had to smile when the Hawking of the movie says that he’s actually looking for one formula that sums up all of physics. Frankly, that’s a nonsensical statement. Hence, I think the real Hawking never said anything like that. Or, if he did, that it was one of those statements one needs to interpret very carefully.]

But let’s look at a few constants indeed. For example, if we have c, h and α, then we can calculate the electric charge e and, hence, the electric constant ε= e2/2αhc. From that, we get Coulomb’s constant ke, because ke is defined as 1/4πε0… But…

Hey! Wait a minute! How do we know that ke = 1/4πε0? Well… From experiment. But… Yes? That means 1/4π is some fundamental proportionality coefficient too, isn’t it?

Wow! You’re smart. That’s a good and valid remark. In fact, we use the so-called reduced Planck constant ħ in a number of calculations, and so that involves a 2π factor too (ħ = h/2π). Hence… Well… Yes, perhaps we should consider 2π as some fundamental constant too! And, then, well… Now that I think of it, there’s a few other mathematical constants out there, like Euler’s number e, for example, which we use in complex exponentials.


I am joking, right? I am not saying that 2π and Euler’s number are fundamental ‘physical’ constants, am I? [Note that it’s a bit of a nuisance we’re also using the symbol for Euler’s number, but so we’re not talking the electron charge here: we’re talking that 2.71828…etc number that’s used in so-called ‘natural’ exponentials and logarithms.]

Well… Yes and no. They’re mathematical constants indeed, rather than physical, but… Well… I hope you get my point. What I want to show here, is that it’s quite hard to say what’s fundamental and what isn’t. We can actually pick and choose a bit among all those constants and all those equations. As one physicist puts its: it depends on how we slice it. The one thing we know for sure is that a great many things are related, in a physical way (α connects all of the fundamental properties of the electron, for example) and/or in a mathematical way (2π connects not only the circumference of the unit circle with the radius but quite a few other constants as well!), but… Well… What to say? It’s a tough discussion and I am not smart enough to give you an unambiguous answer. From what I gather on the Internet, when looking at the whole Standard Model (including the strong force, the weak force and the Higgs field), we’ve got a few dozen physical ‘fundamental’ constants, and then a few mathematical ones as well.

That’s a lot, you’ll say. Yes. At the same time, it’s not an awful lot. Whatever number it is, it does raise a very fundamental question: why are they what they are? That brings us back to that ‘fine-tuning’ problem. Now, I can’t make this post too long (it’s way too long already), so let me just conclude this discussion by copying Wikipedia on that question, because what it has on this topic is not so bad:

“Some physicists have explored the notion that if the physical constants had sufficiently different values, our Universe would be so radically different that intelligent life would probably not have emerged, and that our Universe therefore seems to be fine-tuned for intelligent life. The anthropic principle states a logical truism: the fact of our existence as intelligent beings who can measure physical constants requires those constants to be such that beings like us can exist.

I like this. But the article then adds the following, which I do not like so much, because I think it’s a bit too ‘frivolous':

“There are a variety of interpretations of the constants’ values, including that of a divine creator (the apparent fine-tuning is actual and intentional), or that ours is one universe of many in a multiverse (e.g. the many-worlds interpretation of quantum mechanics), or even that, if information is an innate property of the universe and logically inseparable from consciousness, a universe without the capacity for conscious beings cannot exist.”

Hmm… As said, I am quite happy with the logical truism: we are there because alpha (and a whole range of other stuff) is what it is, and we can measure alpha (and a whole range of other stuff) as what it is, because… Well… Because we’re here. Full stop. As for the ‘interpretations’, I’ll let you think about that for yourself. :-)

I need to get back to the lesson. Indeed, this was just a ‘digression’. My post was about the three fundamental events or actions in quantum electrodynamics, and so I was talking about that E(A to B) formula. However, I had to do that digression on alpha to ensure you understand what I want to write about that. So let me now get back to it. End of digression. :-)

The E(A to B) formula

Indeed, I must assume that, with all these digressions, you are truly despairing now. Don’t. We’re there! We’re finally ready for the E(A to B) formula! Let’s go for it.

We’ve now got those two numbers measuring the electron charge and the electron mass in Planck units respectively. They’re fundamental indeed and so let’s loosen up on notation and just write them as e and m respectively. Let me recap:

1. The value of e is approximately –0.08542455, and it corresponds to the so-called junction number j, which is the amplitude for an electron-photon coupling. When multiplying it with another amplitude (to find the amplitude for an event consisting of two sub-events, for example), it corresponds to a ‘shrink’ to less than one-tenth (something like 8.5% indeed, corresponding to the magnitude of e) and a ‘rotation’ (or a ‘turn’) over 180 degrees, as mentioned above.

Please note what’s going on here: we have a physical quantity, the electron charge (expressed in Planck units), and we use it in a quantum-mechanical calculation as a dimensionless (complex) number, i.e. as an amplitude. So… Well… That’s what physicists mean when they say that the charge of some particle (usually the electric charge but, in quantum chromodynamics, it will be the ‘color’ charge of a quark) is a ‘coupling constant’.

2. We also have m, the electron mass, and we’ll use in the same way, i.e. as some dimensionless amplitude. As compared to j, it’s is a very tiny number: approximately 4.181×10−23. So if you look at it as an amplitude, indeed, then it corresponds to an enormous ‘shrink’ (but no turn) of the amplitude(s) that we’ll be combining it with.

So… Well… How do we do it?

Well… At this point, Leighton goes a bit off-track. Just a little bit. :-) From what he writes, it’s obvious that he assumes the frequency (or, what amounts to the same, the de Broglie wavelength) of an electron is just like the frequency of a photon. Frankly, I just can’t imagine why and how Feynman let this happen. It’s wrong. Plain wrong. As I mentioned in my introduction already, an electron traveling through space is not like a photon traveling through space.

For starters, an electron is much slower (because it’s a matter-particle: hence, it’s got mass). Secondly, the de Broglie wavelength and/or frequency of an electron is not like that of a photon. For example, if we take an electron and a photon having the same energy, let’s say 1 eV (that corresponds to infrared light), then the de Broglie wavelength of the electron will be 1.23 nano-meter (i.e. 1.23 billionths of a meter). Now that’s about one thousand times smaller than the wavelength of our 1 eV photon, which is about 1240 nm. You’ll say: how is that possible? If they have the same energy, then the f = E/h and ν = E/h should give the same frequency and, hence, the same wavelength, no?

Well… No! Not at all! Because an electron, unlike the photon, has a rest mass indeed – measured as not less than 0.511 MeV/c2, to be precise (note the rather particular MeV/c2 unit: it’s from the E = mc2 formula) – one should use a different energy value! Indeed, we should include the rest mass energy, which is 0.511 MeV. So, almost all of the energy here is rest mass energy! There’s also another complication. For the photon, there is an easy relationship between the wavelength and the frequency: it has no mass and, hence, all its energy is kinetic, or movement so to say, and so we can use that ν = E/h relationship to calculate its frequency ν: it’s equal to ν = E/h = (1 eV)/(4.13567×10–15 eV·s) ≈ 0.242×1015 Hz = 242 tera-hertz (1 THz = 1012 oscillations per second). Now, knowing that light travels at the speed of light, we can check the result by calculating the wavelength using the λ = c/ν relation. Let’s do it: (2.998×10m/s)/(242×1012 Hz) ≈ 1240 nm. So… Yes, done!

But so we’re talking photons here. For the electron, the story is much more complicated. That wavelength I mentioned was calculated using the other of the two de Broglie relations: λ = h/p. So that uses the momentum of the electron which, as you know, is the product of its mass (m) and its velocity (v): p = mv. You can amuse yourself and check if you find the same wavelength (1.23 nm): you should! From the other de Broglie relation, f = E/h, you can also calculate its frequency: for an electron moving at non-relativistic speeds, it’s about 0.123×1021 Hz, so that’s like 500,000 times the frequency of the photon we we’re looking at! When multiplying the frequency and the wavelength, we should get its speed. However, that’s where we get in trouble. Here’s the problem with matter waves: they have a so-called group velocity and a so-called phase velocity. The idea is illustrated below: the green dot travels with the wave packet – and, hence, its velocity corresponds to the group velocity – while the red dot travels with the oscillation itself, and so that’s the phase velocity. [You should also remember, of course, that the matter wave is some complex-valued wavefunction, so we have both a real as well as an imaginary part oscillating and traveling through space.]

Wave_group (1)

To be precise, the phase velocity will be superluminal. Indeed, using the usual relativistic formula, we can write that p = γm0v and E = γm0c2, with v the (classical) velocity of the electron and what it always is, i.e. the speed of light. Hence, λ = h/γm0v and = γm0c2/h, and so λf = c2/v. Because v is (much) smaller than c, we get a superluminal velocity. However, that’s the phase velocity indeed, not the group velocity, which corresponds to v. OK… I need to end this digression.

So what? Well, to make a long story short, the ‘amplitude framework’ for electrons is differerent. Hence, the story that I’ll be telling here is different from what you’ll read in Feynman’s QED. I will use his drawings, though, and his concepts. Indeed, despite my misgivings above, the conceptual framework is sound, and so the corrections to be made are relatively minor.

So… We’re looking at E(A to B), i.e. the amplitude for an electron to go from point A to B in spacetime, and I said the conceptual framework is exactly the same as that for a photon. Hence, the electron can follow any path really. It may go in a straight line and travel at a speed that’s consistent with what we know of its momentum (p), but it may also follow other paths. So, just like the photon, we’ll have some so-called propagator function, which gives you amplitudes based on the distance in space as well as in the distance in ‘time’ between two points. Now, Ralph Leighton identifies that propagator function with the propagator function for the photon, i.e. P(A to B), but that’s wrong: it’s not the same.

The propagator function for an electron depends on its mass and its velocity, and/or on the combination of both (like it momentum p = mv and/or its kinetic energy: K.E. = mv2 = p2/2m). So we have a different propagator function here. However, I’ll use the same symbol for it: P(A to B).

So, the bottom line is that, because of the electron’s mass (which, remember, is a measure for inertia), momentum and/or kinetic energy (which, remember, are conserved in physics), the straight line is definitely the most likely path, but (big but!), just like the photon, the electron may follow some other path as well.

So how do we formalize that? Let’s first associate an amplitude P(A to B) with an electron traveling from point A to B in a straight line and in a time that’s consistent with its velocity. Now, as mentioned above, the P here stands for propagator function, not for photon, so we’re talking a different P(A to B) here than that P(A to B) function we used for the photon. Sorry for the confusion. :-) The left-hand diagram below then shows what we’re talking about: it’s the so-called ‘one-hop flight’, and so that’s what the P(A to B) amplitude is associated with.

Diagram 1Now, the electron can follow other paths. For photons, we said the amplitude depended on the spacetime interval I: when negative or positive (i.e. paths that are not associated with the photon traveling in a straight line and/or at the speed of light), the contribution of those paths to the final amplitudes (or ‘final arrow’, as it was called) was smaller.

For an electron, we have something similar, but it’s modeled differently. We say the electron could take a ‘two-hop flight’ (via point C or C’), or a ‘three-hop flight’ (via D and E) from point A to B. Now, it makes sense that these paths should be associated with amplitudes that are much smaller. Now that’s where that n-factor comes in. We just put some real number n in the formula for the amplitude for an electron to go from A to B via C, which we write as:

P(A to C)∗n2∗P(C to B)

Note what’s going on here. We multiply two amplitudes, P(A to C) and P(C to B), which is OK, because that’s what the rules of quantum mechanics tell us: if an ‘event’ consists of two sub-events, we need to multiply the amplitudes (not the probabilities) in order to get the amplitude that’s associated with both sub-events happening. However, we add an extra factor: n2. Note that it must be some very small number because we have lots of alternative paths and, hence, they should not be very likely! So what’s the n? And why n2 instead of just n?

Well… Frankly, I don’t know. Ralph Leighton boldly equates n to the mass of the electron. Now, because he obviously means the mass expressed in Planck units, that’s the same as saying n is the electron’s energy (again, expressed in Planck’s ‘natural’ units), so n should be that number m = meP = EeP = 4.181×10−23. However, I couldn’t find any confirmation on the Internet, or elsewhere, of the suggested n = m identity, so I’ll assume n = m indeed, but… Well… Please check for yourself. It seems the answer is to be found in a mathematical theory that helps physicists to actually calculate j and n from experiment. It’s referred to as perturbation theory, and it’s the next thing on my study list. As for now, however, I can’t help you much. I can only note that the equation makes sense.

Of course, it does: inserting a tiny little number n, close to zero, ensures that those other amplitudes don’t contribute too much to the final ‘arrow’. And it also makes a lot of sense to associate it with the electron’s mass: if mass is a measure of inertia, then it should be some factor reducing the amplitude that’s associated with the electron following such crooked path. So let’s go along with it, and see what comes out of it.

A three-hop flight is even weirder and uses that n2 factor two times:

P(A to E)∗n2∗P(E to D)∗n2∗P(D to B)

So we have an (n2)= nfactor here, which is good, because two hops should be much less likely than one hop. So what do we get? Well… (4.181×10−23)≈ 305×10−92. Pretty tiny, huh? :-) Of course, any point in space is a potential hop for the electron’s flight from point A to B and, hence, there’s a lot of paths and a lot of amplitudes (or ‘arrows’ if you want), which, again, is consistent with a very tiny value for n indeed.

So, to make a long story short, E(A to B) will be a giant sum (i.e. some kind of integral indeed) of a lot of different ways an electron can go from point A to B. It will be a series of terms P(A to E) + P(A to C)∗n2∗P(C to B) + P(A to E)∗n2∗P(E to D)∗n2∗P(D to B) + … for all possible intermediate points C, D, E, and so on.

What about the j? The junction number of coupling constant. How does that show up in the E(A to B) formula? Well… Those alternative paths with hops here and there are actually the easiest bit of the whole calculation. Apart from taking some strange path, electrons can also emit and/or absorb photons during the trip. In fact, they’re doing that constantly actually. Indeed, the image of an electron ‘in orbit’ around the nucleus is that of an electron exchanging so-called ‘virtual’ photons constantly, as illustrated below. So our image of an electron absorbing and then emitting a photon (see the diagram on the right-hand side) is really like the tiny tip of a giant iceberg: most of what’s going on is underneath! So that’s where our junction number j comes in, i.e. the charge (e) of the electron.

So, when you hear that a coupling constant is actually equal to the charge, then this is what it means: you should just note it’s the charge expressed in Planck units. But it’s a deep connection, isn’t? When everything is said and done, a charge is something physical, but so here, in these amplitude calculations, it just shows up as some dimensionless negative number, used in multiplications and additions of amplitudes. Isn’t that remarkable?

d2 d3

The situation becomes even more complicated when more than one electron is involved. For example, two electrons can go in a straight line from point 1 and 2 to point 3 and 4 respectively, but there’s two ways in which this can happen, and they might exchange photons along the way, as shown below. If there’s two alternative ways in which one event can happen, you know we have to add amplitudes, rather than multiply them. Hence, the formula for E(A to B) becomes even more complicated.


Moreover, a single electron may first emit and then absorb a photon itself, so there’s no need for other particles to be there to have lots of j factors in our calculation. In addition, that photon may briefly disintegrate into an electron and a positron, which then annihilate each other to again produce a photon: in case you wondered, that’s what those little loops in those diagrams depicting the exchange of virtual photons is supposed to represent. So, every single junction (i.e. every emission and/or absorption of a photon) involves a multiplication with that junction number j, so if there are two couplings involved, we have a j2 factor, and so that’s 0.085424552 = α ≈ 0.0073. Four couplings implies a factor of 0.085424554 ≈ 0.000053.

Just as an example, I copy two diagrams involving four, five or six couplings indeed. They all have some ‘incoming’ photon, because Feynman uses them to explain something else (the so-called magnetic moment of a photon), but it doesn’t matter: the same illustrations can serve multiple purposes.

d6 d7

Now, it’s obvious that the contributions of the alternatives with many couplings add almost nothing to the final amplitude – just like the ‘many-hop’ flights add almost nothing – but… Well… As tiny as these contributions are, they are all there, and so they all have to be accounted for. So… Yes. You can easily appreciate how messy it all gets, especially in light of the fact that there are so many points that can serve as a ‘hop’ or a ‘coupling’ point!

So… Well… Nothing. That’s it! I am done! I realize this has been another long and difficult story, but I hope you appreciated and that it shed some light on what’s really behind those simplified stories of what quantum mechanics is all about. It’s all weird and, admittedly, not so easy to understand, but I wouldn’t say an understanding is really beyond the reach of us, common mortals. :-)

Post scriptum: When you’ve reached here, you may wonder: so where’s the final formula then for E(A to B)? Well… I have no easy formula for you. From what I wrote above, it should be obvious that we’re talking some really awful-looking integral and, because it’s so awful, I’ll let you find it yourself. :-)

I should also note another reason why I am reluctant to identify n with m. The formulas in Feynman’s QED are definitely not the standard ones. The more standard formulations will use the gauge coupling parameter about which I talked already. I sort of discussed it, indirectly, in my first comments on Feynman’s QED, when I criticized some other part of the book, notably its explanation of the phenomenon of diffraction of light, which basically boiled down to: “When you try to squeeze light too much [by forcing it to go through a small hole], it refuses to cooperate and begins to spread out”, because “there are not enough arrows representing alternative paths.”

Now that raises a lot of questions, and very sensible ones, because that simplification is nonsensical. Not enough arrows? That statement doesn’t make sense. We can subdivide space in as many paths as we want, and probability amplitudes don’t take up any physical space. We can cut up space in smaller and smaller pieces (so we analyze more paths within the same space). The consequence – in terms of arrows – is that directions of our arrows won’t change but their length will be much and much smaller as we’re analyzing many more paths. That’s because of the normalization constraint. However, when adding them all up – a lot of very tiny ones, or a smaller bunch of bigger ones – we’ll still get the same ‘final’ arrow. That’s because the direction of those arrows depends on the length of the path, and the length of the path doesn’t change simply because we suddenly decide to use some other ‘gauge’.

Indeed, the real question is: what’s a ‘small’ hole? What’s ‘small’ and what’s ‘large’ in quantum electrodynamics? Now, I gave an intuitive answer to that question in that post of mine, but it’s much more accurate than Feynman’s, or Leighton’s. The answer to that question is: there’s some kind of natural ‘gauge’, and it’s related to the wavelength. So the wavelength of a photon, or an electron, in this case, comes with some kind of scale indeed. That’s why the fine-structure constant is often written in yet another form:

α = 2πree = rek

λe and kare the Compton wavelength and wavenumber of the electron (so kis not the Coulomb constant here). The Compton wavelength is the de Broglie wavelength of the electron. [You’ll find that Wikipedia defines it as “the wavelength that’s equivalent to the wavelength of a photon whose energy is the same as the rest-mass energy of the electron”, but that’s a very confusing definition, I think.]

The point to note is that the spatial dimension in both the analysis of photons as well as of matter waves, especially in regard to studying diffraction and/or interference phenomena, is related to the frequencies, wavelengths and/or wavenumbers of the wavefunctions involved. There’s a certain ‘gauge’ involved indeed, i.e. some measure that is relative, like the gauge pressure illustrated below. So that’s where that gauge parameter g comes in. And the fact that it’s yet another number that’s closely related to that fine-structure constant is… Well… Again… That alpha number is a very magic number indeed… :-)


The Strange Theory of Light and Matter (III)

Fields and charges (II)

My previous posts was, perhaps, too full of formulas, without offering much reflection. Let me try to correct that here by tying up a few loose ends. The first loose end is about units. Indeed, I haven’t been very clear about that and so let me somewhat more precise on that now.

Note: In case you’re not interested in units, you can skip the first part of this post. However, please do look at the section on the electric constant εand, most importantly, the section on natural units—especially Planck units, as I will touch upon the topic of gauge coupling parameters there and, hence, on quantum mechanics. Also, the third and last part, on the theoretical contradictions inherent in the idea of point charges, may be of interest to you.]

The field energy integrals

When we wrote that down that u = ε0E2/2 formula for the energy density of an electric field (see my previous post on fields and charges for more details), we noted that the 1/2 factor was there to avoid double-counting. Indeed, those volume integrals we use to calculate the energy over all space (i.e. U = ∫(u)dV) count the energy that’s associated with a pair of charges (or, to be precise, charge elements) twice and, hence, they have a 1/2 factor in front. Indeed, as Feynman notes, there is no convenient way, unfortunately, of writing an integral that keeps track of the pairs so that each pair is counted just once. In fact, I’ll have to come back to that assumption of there being ‘pairs’ of charges later, as that’s another loose end in the theory.

U 6

U 7

Now, we also said that that εfactor in the second integral (i.e. the one with the vector dot product EE =|E||E|cos(0) = E2) is there to make the units come out alright. Now, when I say that, what does it mean really? I’ll explain. Let me first make a few obvious remarks:

  1. Densities are always measures in terms per unit volume, so that’s the cubic meter (m3). That’s, obviously, an astronomical unit at the atomic or molecular scale.
  2. Because of historical reasons, the conventional unit of charge is not the so-called elementary charge +e (i.e. the charge of a proton), but the coulomb. Hence, the charge density ρ is expressed in Coulomb per cubic meter (C/m3). The coulomb is a rather astronomical unit too—at the atomic or molecular scale at least: 1 e ≈ 1.6022×10−19 C. [I am rounding here to four digits after the decimal point.]
  3. Energy is in joule (J) and that’s, once again, a rather astronomical unit at the lower end of the scales. Indeed, theoretical physicists prefer to use the electronvolt (eV), which is the energy gained (or lost) when an electron (so that’s a charge of –e, i.e. minus e) moves across a potential difference of one volt. But so we’ll stick to the joule as for now, not the eV, because the joule is the SI unit that’s used when defining most electrical units, such as the ampere, the watt and… Yes. The volt. Let’s start with that one.

The volt

The volt unit (V) measures both potential (energy) as well as potential difference (in both cases, we mean electric potential only, of course). Now, from all that you’ve read so far, it should be obvious that potential (energy) can only be measured with respect to some reference point. In physics, the reference point is infinity, which is so far away from all charges that there is no influence there. Hence, any charge we’d bring there (i.e. at infinity) will just stay where it is and not be attracted or repelled by anything. We say the potential there is zero: Φ(∞) = 0. The choice of that reference point allows us, then, to define positive or negative potential: the potential near positive charges will be positive and, vice versa, the potential near negative charges will be negative. Likewise, the potential difference between the positive and negative terminal of a battery will be positive.

So you should just note that we measure both potential as well as potential difference in volt and, hence, let’s now answer the question of what a volt really is. The answer is quite straightforward: the potential at some point r = (x, y, z) measures the work done when bringing one unit charge (i.e. +e) from infinity to that point. Hence, it’s only natural that we define one volt as one joule per unit charge:

1 volt = 1 joule/coulomb (1 V = 1 J/C).

Also note the following:

  1. One joule is the energy energy transferred (or work done) when applying a force of one newton over a distance of one meter, so one volt can also be measured in newton·meter per coulomb: 1 V = 1 J/C = N·m/C.
  2. One joule can also be written as 1 J = 1 V·C.

It’s quite easy to see why that energy = volt-coulomb product makes sense: higher voltage will be associated with higher energy, and the same goes for higher charge. Indeed, the so-called ‘static’ on our body is usually associated with potential differences of thousands of volts (I am not kidding), but the charges involved are extremely small, because the ability of our body to store electric charge is minimal (i.e. the capacitance (aka capacity) of our body). Hence, the shock involved in the discharge is usually quite small: it is measured in milli-joules (mJ), indeed.

The farad

The remark on ‘static’ brings me to another unit which I should mention in passing: the farad. It measures the capacitance (formerly known as the capacity) of a capacitor (formerly known as a condenser). A condenser consists, quite simply, of two separated conductors: it’s usually illustrated as consisting of two plates or of thin foils (e.g. aluminum foil) separated by an insulating film (e.g. waxed paper), but one can also discuss the capacity of a single body, like our human body, or a charged sphere. In both cases, however, the idea is the same: we have a ‘positive’ charge on one side (+q), and a ‘negative’ charge on the other (–q). In case of a single object, we imagine the ‘other’ charge to be some other large object (the Earth, for instance, but it can also be a car or whatever object that could potentially absorb the charge on our body) or, in case of the charged sphere, we could imagine some other sphere of much larger radius. The farad will then measure the capacity of one or both conductors to store charge.

Now, you may think we don’t need another unit here if that’s the definition: we could just express the capacity of a condensor in terms of its maximum ‘load’, couldn’t we? So that’s so many coulomb before the thing breaks down, when the waxed paper fails to separate the two opposite charges on the aluminium foil, for example. No. It’s not like that. It’s true we can not continue to increase the charge without consequences. However, what we want to measure with the farad is another relationship. Because of the opposite charges on both sides, there will be a potential difference, i.e. a voltage difference. Indeed, a capacitor is like a little battery in many ways: it will have two terminals. Now, it is fairly easy to show that the potential difference (i.e. the voltage) between the two plates will be proportional to the charge. Think of it as follows: if we double the charges, we’re doubling the fields, right? So then we need to do twice the amount of work to carry the unit charge (against the field) from one plate to the other. Now, because the distance is the same, that means the potential difference must be twice what it was.

Now, while we have a simple proportionality here between the voltage and the charge, the coefficient of proportionality will depend on the type of conductors, their shape, the distance and the type of insulator (aka dielectric) between them, and so on and so on. Now, what’s being measured in farad is that coefficient of proportionality, which we’ll denote by C(the proportionality coefficient for the charge), CV ((the proportionality coefficient for the voltage) or, because we should make a choice between the two, quite simply, as C. Indeed, we can either write (1) Q = CQV or, alternatively, V = CVQ, with C= 1/CV. As Feynman notes, “someone originally wrote the equation of proportionality as Q = CV”, so that’s what it will be: the capacitance (aka capacity) of a capacitor (aka condenser) is the ratio of the electric charge Q (on each conductor) to the potential difference V between the two conductors. So we know that’s a constant typical of the type of condenser we’re talking about. Indeed, the capacitance is the constant of proportionality defining the linear relationship between the charge and the voltage means doubling the voltage, and so we can write:

C = Q/V

Now, the charge is measured in coulomb, and the voltage is measured in volt, so the unit in which we will measure C is coulomb per volt (C/V), which is also known as the farad (F):

1 farad = 1 coulomb/volt (1 F = 1 C/V)

[Note the confusing use of the same symbol C for both the unit of charge (coulomb) as well as for the proportionality coefficient! I am sorrry about that, but so that’s convention!].

To be precise, I should add that the proportionality is generally there, but there are exceptions. More specifically, the way the charge builds up (and the way the field builds up, at the edges of the capacitor, for instance) may cause the capacitance to vary a little bit as it is being charged (or discharged). In that case, capacitance will be defined in terms of incremental changes: C = dQ/dV.

Let me conclude this section by giving you two formulas, which are also easily proved but so I will just give you the result:

  1. The capacity of a parallel-plate condenser is C = ε0A/d. In this formula, we have, once again, that ubiquitous electric constant ε(think of it as just another coefficient of proportionality), and then A, i.e. the area of the plates, and d, i.e. the separation between the two plates.
  2. The capacity of a charged sphere of radius r (so we’re talking the capacity of a single conductor here) is C = 4πε0r. This may remind you of the formula for the surface of a sphere (A = 4πr2), but note we’re not squaring the radius. It’s just a linear relationship with r.

I am not giving you these two formulas to show off or fill the pages, but because they’re so ubiquitous and you’ll need them. In fact, I’ll need the second formula in this post when talking about the other ‘loose end’ that I want to discuss.

Other electrical units

From your high school physics classes, you know the ampere and the watt, of course:

  1. The ampere is the unit of current, so it measures the quantity of charge moving or circulating per second. Hence, one ampere is one coulomb per second: 1 A = 1 C/s.
  2. The watt measures power. Power is the rate of energy conversion or transfer with respect to time. One watt is one joule per second: 1 W = 1 J/s = 1 N·m/s. Also note that we can write power as the product of current and voltage: 1 W = (1 A)·(1 V) = (1 C/s)·(1 J/C) = 1 J/s.

Now, because electromagnetism is such well-developed theory and, more importantly, because it has so many engineering and household applications, there are many other units out there, such as:

  • The ohm (Ω): that’s the unit of electrical resistance. Let me quickly define it: the ohm is defined as the resistance between two points of a conductor when a (constant) potential difference (V) of one volt, applied to these points, produces a current (I) of one ampere. So resistance (R) is another proportionality coefficient: R = V/I, and 1 ohm (Ω) = 1 volt/ampere (V/A). [Again, note the (potential) confusion caused by the use of the same symbol (V) for voltage (i.e. the difference in potential) as well as its unit (volt).] Now, note that it’s often useful to write the relationship as V = R·I, so that gives the potential difference as the product of the resistance and the current.
  • The weber (Wb) and the tesla (T): that’s the unit of magnetic flux (i.e. the strength of the magnetic field) and magnetic flux density (i.e. one tesla = one weber per square meter) respectively. So these have to do with the field vector B, rather than E. So we won’t talk about it here.
  • The henry (H): that’s the unit of electromagnetic inductance. It’s also linked to the magnetic effect. Indeed, from Maxwell’s equations, we know that a changing electric current will cause the magnetic field to change. Now, a changing magnetic field causes circulation of E. Hence, we can make the unit charge go around in some loop (we’re talking circulation of E indeed, not flux). The related energy, or the work that’s done by a unit of charge as it travels (once) around that loop, is – quite confusingly! – referred to as electromotive force (emf). [The term is quite confusing because we’re not talking force but energy, i.e. work, and, as you know by now, energy is force times distance, so energy and force are related but not the same.] To ensure you know what we’re talking about, let me note that emf is measured in volts, so that’s in joule per coulomb: 1 V = 1 J/C. Back to the henry now. If the rate of change of current in a circuit (e.g. the armature winding of an electric motor) is one ampere per second, and the resulting electromotive force (remember: emf is energy per coulomb) is one volt, then the inductance of the circuit is one henry. Hence, 1 H = 1 V/(1 A/s) = 1 V·s/A.     

The concept of impedance

You’ve probably heard about the so-called impedance of a circuit. That’s a complex concept, literally, because it’s a complex-valued ratio. I should refer you to the Web for more details, but let me try to summarize it because, while it’s complex, that doesn’t mean it’s complicated. :-) In fact, I think it’s rather easy to grasp after all you’ve gone through already. :-) So let’s give it a try.

When we have a simple direct current (DC), then we have a very straightforward definition of resistance (R), as mentioned above: it’s a simple ratio between the voltage (as measured in volt) and the current (as measured in ampere). Now, with alternating current (AC) circuits, it becomes more complicated, and so then it’s the concept of impedance that kicks in. Just like resistance, impedance also sort of measures the ‘opposition’ that a circuit presents to a current when a voltage is applied, but we have a complex ratio—literally: it’s a ratio with a magnitude and a direction, or a phase as it’s usually referred to. Hence, one will often write the impedance (denoted by Z) using Euler’s formula:

Z = |Z|eiθ

Now, if you don’t know anything about complex numbers, you should just skip all of what follows and go straight to the next section. However, if you do know what a complex number is (it’s an ‘arrow’, basically, and if θ is a variable, then it’s a rotating arrow, or a ‘stopwatch hand’, as Feynman calls it in his more popular Lectures on QED), then you may want to carry on reading.

The illustration below (credit goes to Wikipedia, once again) is, probably, the most generic view of an AC circuit that one can jot down. If we apply an alternating current, both the current as well as the voltage will go up and down. However, the current signal will lag the voltage signal, and the phase factor θ tells us by how much. Hence, using complex-number notation, we write:

V = IZ = I∗|Z|eiθ


Now, while that resembles the V = R·I formula I mentioned when discussing resistance, you should note the bold-face type for V and I, and the ∗ symbol I am using here for multiplication. First the ∗ symbol: that’s a convention Feynman adopts in the above-mentioned popular account of quantum mechanics. I like it, because it makes it very clear we’re not talking a vector cross product A×B here, but a product of two complex numbers. Now, that’s also why I write V and I in bold-face: they have a phase too and, hence, we can write them as:

  • = |V|ei(ωt + θV)
  • = |I|ei(ωt + θI)

This works out as follows:

IZ = |I|ei(ωt + θI)∗|Z|eiθ = |I||Z|ei(ωt + θ+ θ) = |V|ei(ωt + θV)

Indeed, because the equation must hold for all t, we can equate the magnitudes and phases and, hence, we get: |V| = |I||Z| and θ= θI + θ. But voltage and current is something real, isn’t it? Not some complex number? You’re right. The complex notation is used mainly to simplify the calculus, but it’s only the real part of those complex-valued functions that count. [In any case, because we limit ourselves to complex exponentials here, the imaginary part (which is the sine, as opposed to the real part, which is the cosine) is the same as the real part, but with a lag of its own (π/2 or 90 degrees, to be precise). Indeed: when writing Euler’s formula out (eiθ = cos(θ) + isin(θ), you should always remember that the sine and cosine function are basically the same function: they differ only in the phase, as is evident from the trigonometric identity sin(θ+π/) = cos(θ).]

Now, that should be more than enough in terms of an introduction to the units used in electromagnetic theory. Hence, let’s move on.

The electric constant ε0

Let’s now look at  that energy density formula once again. When looking at that u = ε0E2/2 formula, you may think that its unit should be the square of the unit in which we measure field strength. How do we measure field strength? It’s defined as the force on a unit charge (E = F/q), so it should be newton per coulomb (N/C). Because the coulomb can also be expressed in newton·meter/volt (1 V = 1 J/C = N·m/C and, hence, 1 C = 1 N·m/V), we can express field strength not only in newton/coulomb but also in volt per meter: 1 N/C = 1 N·V/N·m = 1 V/m. How do we get from N2/C2 and/or V2/mto J/m3?

Well… Let me first note there’s no issue in terms of units with that ρΦ formula in the first integral for U: [ρ]·[Φ] = (C/m3)·V = [(N·m/V)/m3)·V = (N·m)/m3 = J/m3. No problem whatsoever. It’s only that second expression for U, with the u = ε0E2/2 in the integrand, that triggers the question. Here, we just need to accept that we need that εfactor to make the units come out alright. Indeed, just like other physical constants (such as c, G, or h, for example), it has a dimension: its unit is either C2/N·m2 or, what amounts to the same, C/V·m. So the units come out alright indeed if, and only if, we multiply the N2/C2 and/or V2/m2 units with the dimension of ε0:

  1. (N2/C2)·(C2/N·m2) = (N2·m)·(1/m3) = J/m3
  2. (V2/m2)·(C/V·m) = V·C/m3 = (V·N·m/V)/m= N·m/m3 = J/m3


But so that’s the units only. The electric constant also has a numerical value:

ε0 = 8.854187817…×10−12 C/V·m ≈ 8.8542×10−12 C/V·m

This numerical value of εis as important as its unit to ensure both expressions for U yield the same result. Indeed, as you may or may not remember from the second of my two posts on vector calculus, if we have a curl-free field C (that means ×= 0 everywhere, which is the case when talking electrostatics only, as we are doing here), then we can always find some scalar field ψ such that C = ψ. But so here we have E = –ε0Φ, and so it’s not the minus sign that distinguishes the expression from the C = ψ expression, but the εfactor in front.

It’s just like the vector equation for heat flow: h = –κT. Indeed, we also have a constant of proportionality here, which is referred to as the thermal conductivity. Likewise, the electric constant εis also referred to as the permittivity of the vacuum (or of free space), for similar reasons obviously!

Natural units

You may wonder whether we can’t find some better units, so we don’t need the rather horrendous 8.8542×10−12 C/V·m factor (I am rounding to four digits after the decimal point). The answer is: yes, it’s possible. In fact, there are several systems in which the electric constant (and the magnetic constant, which we’ll introduce later) reduce to 1. The best-known are the so-called Gaussian and Lorentz-Heaviside units respectively.

Gauss defined the unit of charge in what is now referred to as the statcoulomb (statC), which is also referred to as the franklin (Fr) and/or the electrostatic unit of charge (esu), but I’ll refer you to the Wikipedia article on it in case you’d want to find out more about it. You should just note the definition of this unit is problematic in other ways. Indeed, it’s not so easy to try to define ‘natural units’ in physics, because there are quite a few ‘fundamental’ relations and/or laws in physics and, hence, equating this or that constant to one usually has implications on other constants. In addition, one should note that many choices that made sense as ‘natural’ units in the 19th century seem to be arbitrary now. For example:

  1. Why would we select the charge of the electron or the proton as the unit charge (+1 or –1) if we now assume that protons (and neutrons) consists of quarks, which have +2/3 or –1/3?
  2. What unit would we choose as the unit for mass, knowing that, despite all of the simplification that took place as a result of the generalized acceptance of the quark model, we’re still stuck with quite a few elementary particles whose mass would be a ‘candidate’ for the unit mass? Do we chose the electron, the u quark, or the d quark?

Therefore, the approach to ‘natural units’ has not been to redefine mass or charge or temperature, but the physical constants themselves. Obvious candidates are, of course, c and ħ, i.e. the speed of light and Planck’s constant. [You may wonder why physicists would select ħ, rather than h, as a ‘natural’ unit, but I’ll let you think about that. The answer is not so difficult.] That can be done without too much difficulty indeed, and so one can equate some more physical constants with one. The next candidate is the so-called Boltzmann constant (kB). While this constant is not so well known, it does pop up in a great many equations, including those that led Planck to propose his quantum of action, i.e.(see my post on Planck’s constant). When we do that – so when we equate c, ħ and kB with one (ħ = kB = 1), we still have a great many choices, so we need to impose further constraints. The next is to equate the gravitational constant with one, so then we have ħ = kB = G = 1.

Now, it turns out that the ‘solution’ of this ‘set’ of four equations (ħ = kB = G = 1) does, effectively, lead to ‘new’ values for most of our SI base units, most notably length, time, mass and temperature. These ‘new’ units are referred to as Planck units. You can look up their values yourself, and I’ll let you appreciate the ‘naturalness’ of the new units yourself. They are rather weird. The Planck length and time are usually referred to as the smallest possible measurable units of length and time and, hence, they are related to the so-called limits of quantum theory. Likewise, the Planck temperature is a related limit in quantum theory: it’s the largest possible measurable unit of temperature. To be frank, it’s hard to imagine what the scale of the Planck length, time and temperature really means. In contrast, the scale of the Planck mass is something we actually can imagine – it is said to correspond to the mass of an eyebrow hair, or a flea egg – but, again, its physical significance is not so obvious: Nature’s maximum allowed mass for point-like particles, or the mass capable of holding a single elementary charge. That triggers the question: do point-like charges really exist? I’ll come back to that question. But first I’ll conclude this little digression on units by introducing the so-called fine-structure constant, of which you’ve surely heard before.

The fine-structure constant

I wrote that the ‘set’ of equations ħ = kB = G = 1 gave us Planck units for most of our SI base units. It turns out that these four equations do not lead to a ‘natural’ unit for electric charge. We need to equate a fifth constant with one to get that. That fifth constant is Coulomb’s constant (often denoted as ke) and, yes, it’s the constant that appears in Coulomb’s Law indeed, as well as in some other pretty fundamental equations in electromagnetics, such as the field caused by a point charge q: E = q/4πε0r2. Hence, ke = 1/4πε0. So if we equate kwith one, then ε0 will, obviously, be equal to ε= 1/4π.

To make a long story short, adding this fifth equation to our set of five also gives us a Planck charge, and I’ll give you its value: it’s about 1.8755×10−18 C. As I mentioned that the elementary charge is 1 e ≈ 1.6022×10−19 C, it’s easy to that the Planck charge corresponds to some 11.7 times the charge of the proton. In fact, let’s be somewhat more precise and round, once again, to four digits after the decimal point: the qP/e ratio is about 11.7062. Conversely, we can also say that the elementary charge as expressed in Planck units, is about 1/11.7062 ≈ 0.08542455. In fact, we’ll use that ratio in a moment in some other calculation, so please jot it down.

0.08542455? That’s a bit of a weird number, isn’t it? You’re right. And trying to write it in terms of the charge of a u or d quark doesn’t make it any better. Also, note that the first four significant digits (8542) correspond to the first four significant digits after the decimal point of our εconstant. So what’s the physical significance here? Some other limit of quantum theory?

Frankly, I did not find anything on that, but the obvious thing to do is to relate is to what is referred to as the fine-structure constant, which is denoted by α. This physical constant is dimensionless, and can be defined in various ways, but all of them are some kind of ratio of a bunch of these physical constants we’ve been talking about:

Fine-structure constant formula

The only constants you have not seen before are μ0Rand, perhaps, ras well as m. However, these can be defined as a function of the constants that you did see before:

  1. The μ0 constant is the so-called magnetic constant. It’s something similar as ε0 and it’s referred to as the magnetic permeability of the vacuum. So it’s just like the (electric) permittivity of the vacuum (i.e. the electric constant ε0) and the only reason why you haven’t heard of this before is because we haven’t discussed magnetic fields so far. In any case, you know that the electric and magnetic force are part and parcel of the same phenomenon (i.e. the electromagnetic interaction between charged particles) and, hence, they are closely related. To be precise, μ= 1/ε0c2. That shows the first and second expression for α are, effectively, fully equivalent.
  2. Now, from the definition of ke = 1/4πε0, it’s easy to see how those two expressions are, in turn, equivalent with the third expression for α.
  3. The Rconstant is the so-called von Klitzing constant, but don’t worry about it: it’s, quite simply, equal to Rh/e2. Hene, substituting (and don’t forget that h = 2πħ) will demonstrate the equivalence of the fourth expression for α.
  4. Finally, the re factor is the classical electron radius, which is usually written as a function of me, i.e. the electron mass: re = e2/4πε0mec2. This very same equation implies that reme = e2/4πε0c2. So… Yes. It’s all the same really.

Let’s calculate its (rounded) value in the old units first, using the third expression:

  • The econstant is (roughly) equal to (1.6022×10–19 C)= 2.5670×10–38 C2. Coulomb’s constant k= 1/4πεis about 8.9876×10N·m2/C2. Hence, the numerator e2k≈ 23.0715×10–29 N·m2.
  • The (rounded) denominator is ħc = (1.05457×10–34 N·m·s)(2.998×108 m/s) = 3.162×10–26 N·m2.
  • Hence, we get α = kee2/ħc ≈ 7.297×10–3 = 0.007297.

Note that this number is, effectively, dimensionless. Now, the interesting thing is that if we calculate α using Planck units, we get an econstant that is (roughly) equal to 0.08542455= … 0.007297! Now, because all of the other constants are equal to 1 in Planck’s system of units, that’s equal to α itself. So… Yes ! The two values for α are one and the same in the two systems of units and, of course, as you might have guessed, the fine-structure constant is effectively dimensionless because it does not depend on our units of measurement. So what does it correspond to?

Now that would take me a very long time to explain, but let me try to summarize what it’s all about. In my post on quantum electrodynamics (QED) – so that’s the theory of light and matter basically and, most importantly, how they interact – I wrote about the three basic events in that theory, and how they are associated with a probability amplitude, so that’s a complex number, or an ‘arrow’, as Feynman puts it: something with (a) a magnitude and (b) a direction. We had to take the absolute square of these amplitudes in order to calculate the probability (i.e. some real number between 0 and 1) of the event actually happening. These three basic events or actions were:

  1. A photon travels from point A to B. To keep things simple and stupid, Feynman denoted this amplitude by P(A to B), and please note that the P stands for photon, not for probability. I should also note that we have an easy formula for P(A to B): it depends on the so-called space-time interval between the two points A and B, i.e. I = Δr– Δt= (x2–x1)2+(y2–y1)2+(z2–z1)– (t2–t1)2. Hence, the space-time interval takes both the distance in space as well as the ‘distance’ in time into account.
  2. An electron travels from point A to B: this was denoted by E(A to B) because… Well… You guessed it: the of electron. The formula for E(A to B) was much more complicated, but the two key elements in the formula was some complex number j (see below), and some other (real) number n.
  3. Finally, an electron could emit or absorb a photon, and the amplitude associated with this event was denoted by j, for junction.

Now, that junction number j is about –0.1. To be somewhat more precise, I should say it’s about –0.08542455.

–0.08542455? That’s a bit of a weird number, isn’t it? Hey ! Didn’t we see this number somewhere else? We did, but before you scroll up, let’s first interpret this number. It looks like an ordinary (real) number, but it’s an amplitude alright, so you should interpret it as an arrow. Hence, it can be ‘combined’ (i.e. ‘added’ or ‘multiplied’) with other arrows. More in particular, when you multiply it with another arrow, it amounts to a shrink to a bit less than one-tenth (because its magnitude is about 0.085 = 8.5%), and half a turn (the minus sign amounts to a rotation of 180°). Now, in that post of mine, I wrote that I wouldn’t entertain you on the difficulties of calculating this number but… Well… We did see this number before indeed. Just scroll up to check it. We’ve got a very remarkable result here:

j ≈ –0.08542455 = –√0.007297 = –√α = –e expressed in Planck units

So we find that our junction number j or – as it’s better known – our coupling constant in quantum electrodynamics (aka as the gauge coupling parameter g) is equal to the (negative) square root of that fine-structure constant which, in turn, is equal to the charge of the electron expressed in the Planck unit for electric charge. Now that is a very deep and fundamental result which no one seems to be able to ‘explain’—in an ‘intuitive’ way, that is.

I should immediately add that, while we can’t explain it, intuitively, it does make sense. A lot of sense actually. Photons carry the electromagnetic force, and the electromagnetic field is caused by stationary and moving electric charges, so one would expect to find some relation between that junction number j, describing the amplitude to emit or absorb a photon, and the electric charge itself, but… An equality? Really?

Well… Yes. That’s what it is, and I look forward to trying to understand all of this better. For now, however, I should proceed with what I set out to do, and that is to tie up a few loose ends. This was one, and so let’s move to the next, which is about the assumption of point charges.

Note: More popular accounts of quantum theory say α itself is ‘the’ coupling constant, rather than its (negative) square –√α = j = –e (expressed in Planck units). That’s correct: g or j are, technically speaking, the (gauge) coupling parameter, not the coupling constant. But that’s a little technical detail which shouldn’t bother you. The result is still what it is: very remarkable! I should also note that it’s often the value of the reciprocal (1/α) that is specified, i.e. 1/0.007297 ≈ 137.036. But so now you know what this number actually stands for. :-)

Do point charges exist?

Feynman’s Lectures on electrostatics are interesting, among other things, because, besides highlighting the precision and successes of the theory, he also doesn’t hesitate to point out the contradictions. He notes, for example, that “the idea of locating energy in the field is inconsistent with the assumption of the existence of point charges.”


Yes. Let’s explore the point. We do assume point charges in classical physics indeed. The electric field caused by a point charge is, quite simply:

E = q/4πε0r2

Hence, the energy density u is ε0E2/2 = q2/32πε0r4. Now, we have that volume integral U = (ε0/2)∫EEdV = ∫(ε0E2/2)dV integral. As Feynman notes, nothing prevents us from taking a spherical shell for the volume element dV, instead of an infinitesimal cube. This spherical shell would have the charge q at its center, an inner radius equal to r, an infinitesimal thickness dr, and, finally, a surface area 4πr(that’s just the general formula for the surface area of a spherical shell, which I also noted above). Hence, its (infinitesimally small) volume is 4πr2dr, and our integral becomes:


To calculate this integral, we need to take the limit of –q2/8πε0r for (a) r tending to zero (r→0) and for (b) r tending to infinity (r→∞). The limit for r = ∞ is zero. That’s OK and consistent with the choice of our reference point for calculating the potential of a field. However, the limit for r = 0 is zero is infinity! Hence, that U = (ε0/2)∫EEdV basically says there’s an infinite amount of energy in the field of a point charge! How is that possible? It cannot be true, obviously.

So… Where did we do wrong?

Your first reaction may well be that this very particular approach (i.e. replacing our infinitesimal cubes by infinitesimal shells) to calculating our integral is fishy and, hence, not allowed. Maybe you’re right. Maybe not. It’s interesting to note that we run into similar problems when calculating the energy of a charged sphere. Indeed, we mentioned the formula for the capacity of a charged sphere: C = 4πε0r. Now, there’s a similarly easy formula for the energy of a charged sphere. Let’s look at how we charge a condenser:

  • We know that the potential difference between two plates of a condenser represents the work we have to do, per unit charge, to transfer a charge (Q) from one plate to the other. Hence, we can write V = ΔU/ΔQ.
  • We will, of course, want to do a differential analysis. Hence, we’ll transfer charges incrementally, one infinitesimal little charge dQ at the time, and re-write V as V = dU/dQ or, what amounts to the same: dU = V·dQ.
  • Now, we’ve defined the capacitance of a condenser as C = Q/V. [Again, don’t be confused: C stands for capacity here, measured in coulomb per volt, not for the coulomb unit.] Hence, we can re-write dU as dU = Q·dQ/C.
  • Now we have to integrate dU going from zero charge to the final charge Q. Just do a little bit of effort here and try it. You should get the following result: U = Q2/2C. [We could re-write this as U = (C2/V2)/2C =  C·V2/2, which is a form that may be more useful in some other context but not here.]
  • Using that C = 4πε0r formula, we get our grand result. The energy of a charged sphere is:

U = Q2/8πε0r

From that formula, it’s obvious that, if the radius of our sphere goes to zero, its energy should also go to infinity! So it seems we can’t really pack a finite charge Q in one single point. Indeed, to do that, our formula says we need an infinite amount of energy. So what’s going on here?

Nothing much. You should, first of all, remember how we got that integral: see my previous post for the full derivation indeed. It’s not that difficult. We first assumed we had pairs of charges qi and qfor which we calculated the total electrostatic energy U as the sum of the energies of all possible pairs of charges:

U 3

And, then, we looked at a continuous distribution of charge. However, in essence, we still did the same: we counted the energy of interaction between infinitesimal charges situated at two different points (referred to as point 1 and 2 respectively), with a 1/2 factor in front so as to ensure we didn’t double-count (there’s no way to write an integral that keeps track of the pairs so that each pair is counted only once):

U 4

Now, we reduced this double integral by a clever substitution to something that looked a bit better:

U 6

Finally, some more mathematical tricks gave us that U = (ε0/2)∫EEdV integral.

In essence, what’s wrong in that integral above is that it actually includes the energy that’s needed to assemble the finite point charge q itself from an infinite number of infinitesimal parts. Now that energy is infinitely large. We just can’t do it: the energy required to construct a point charge is ∞.

Now that explains the physical significance of that Planck mass ! We said Nature has some kind of maximum allowable mass for point-like particles, or the mass capable of holding a single elementary charge. What’s going on is, as we try to pile more charge on top of the charge that’s already there, we add energy. Now, energy has an equivalent mass. Indeed, the Planck charge (q≈ 1.8755×10−18 C), the Planck length (l= 1.616×10−35 m), the Planck energy (1.956×109 J), and the Planck mass (2.1765×10−8 kg) are all related. Now things start making sense. Indeed, we said that the Planck mass is tiny but, still, it’s something we can imagine, like a flea’s egg or the mass of a hair of a eyebrow. The associated energy (E = mc2, so that’s (2.1765×10−8 kg)·(2.998×108 m/s)2 ≈ 19.56×108 kg·m2/s= 1.956×109 joule indeed.

Now, how much energy is that? Well… That’s about 2 giga-joule, obviously, but so what’s that in daily life? It’s about the energy you would get when burning 40 liter of fuel. It’s also likely to amount, more or less, to your home electricity consumption over a month. So it’s sizable, and so we’re packing all that energy into a Planck volume (lP≈ 4×10−105 m3). If we’d manage that, we’d be able to create tiny black holes, because that’s what that little Planck volume would become if we’d pack so much energy in it. So… Well… Here I just have to refer you to more learned writers than I am. As Wikipedia notes dryly: “The physical significance of the Planck length is a topic of theoretical research. Since the Planck length is so many orders of magnitude smaller than any current instrument could possibly measure, there is no way of examining it directly.”

So… Well… That’s it for now. The point to note is that we would not have any theoretical problems if we’d assume our ‘point charge’ is actually not a point charge but some small distribution of charge itself. You’ll say: Great! Problem solved! 

Well… For now, yes. But Feynman rightly notes that assuming that our elementary charges do take up some space results in other difficulties of explanation. As we know, these difficulties are solved in quantum mechanics, but so we’re not supposed to know that when doing these classical analyses. :-)

Fields and charges (II)

Fields and charges (I)

My previous posts focused mainly on photons, so this one should be focused more on matter-particles, things that have a mass and a charge. However, I will use it more as an opportunity to talk about fields and present some results from electrostatics using our new vector differential operators (see my posts on vector analysis).

Before I do so, let me note something that is obvious but… Well… Think about it: photons carry the electromagnetic force, but have no electric charge themselves. Likewise, electromagnetic fields have energy and are caused by charges, but so they also carry no charge. So… Fields act on a charge, and photons interact with electrons, but it’s only matter-particles (notably the electron and the proton, which is made of quarks) that actually carry electric charge. Does that make sense? It should. :-)

Another thing I want to remind you of, before jumping into it all head first, are the basic units and relations that are valid always, regardless of what we are talking about. They are represented below:


Let me recapitulate the main points:

  • The speed of light is always the same, regardless of the reference frame (inertial or moving), and nothing can travel faster than light (except mathematical points, such as the phase velocity of a wavefunction).
  • This universal rule is the basis of relativity theory and the mass-energy equivalence relation E = mc2.
  • The constant speed of light also allows us to redefine the units of time and/or distance such that c = 1. For example, if we re-define the unit of distance as the distance traveled by light in one second, or the unit of time as the time light needs to travel one meter, then c = 1.
  • Newton’s laws of motion define a force as the product of a mass and its acceleration: F = m·a. Hence, mass is a measure of inertia, and the unit of force is 1 newton (N) = 1 kg·m/s2.
  • The momentum of an object is the product of its mass and its velocity: p = m·v. Hence, its unit is 1 kg·m/s = 1 N·s. Therefore, the concept of momentum combines force (N) as well as time (s).
  • Energy is defined in terms of work: 1 Joule (J) is the work done when applying a force of one newton over a distance of one meter: 1 J = 1 N·m. Hence, the concept of energy combines force (N) and distance (m).
  • Relativity theory establishes the relativistic energy-momentum relation pc = Ev/c, which can also be written as E2 = p2c+ m02c4, with mthe rest mass of an object (i.e. its mass when the object would be at rest, relative to the observer, of course). These equations reduce to m = E and E2 = p2 + m0when choosing time and/or distance units such that c = 1. The mass is the total mass of the object, including its inertial mass as well as the equivalent mass of its kinetic energy.
  • The relationships above establish (a) energy and time and (b) momentum and position as complementary variables and, hence, the Uncertainty Principle can be expressed in terms of both. The Uncertainty Principle, as well as the Planck-Einstein relation and the de Broglie relation (not shown on the diagram), establish a quantum of action, h, whose dimension combines force, distance and time (h ≈ 6.626×10−34 N·m·s). This quantum of action (Wirkung) can be defined in various ways, as it pops up in more than one fundamental relation, but one of the more obvious approaches is to define h as the proportionality constant between the energy of a photon (i.e. the ‘light particle’) and its frequency: h = E/ν.

Note that we talked about forces and energy above, but we didn’t say anything about the origin of these forces. That’s what we are going to do now, even if we’ll limit ourselves to the electromagnetic force only.


According to Wikipedia, electrostatics deals with the phenomena and properties of stationary or slow-moving electric charges with no acceleration. Feynman usually uses the term when talking about stationary charges only. If a current is involved (i.e. slow-moving charges with no acceleration), the term magnetostatics is preferred. However, the distinction does not matter all that much because  – remarkably! – with stationary charges and steady currents, the electric and magnetic fields (E and B) can be analyzed as separate fields: there is no interconnection whatsoever! That shows, mathematically, as a neat separation between (1) Maxwell’s first and second equation and (2) Maxwell’s third and fourth equation:

  1. Electrostatics: (i) ∇•E = ρ/ε0 and (ii) ×E = 0.
  2. Magnetostatics: (iii) c2∇×B = j0 and (iv) B = 0.

Electrostatics: The ρ in equation (i) is the so-called charge density, which describes the distribution of electric charges in space: ρ = ρ(x, y, z). To put it simply: ρ is the ‘amount of charge’ (which we’ll denote by Δq) per unit volume at a given point. As for ε0, that’s a constant which ensures all units are ‘compatible’. Equation (i) basically says we have some flux of E, the exact amount of which is determined by the charge density ρ or, more in general, by the charge distribution in space. As for equation (ii), i.e. ×E = 0, we can sort of forget about that. It means the curl of E is zero: everywhere, and always. So there’s no circulation of E. Hence, E is a so-called curl-free field, in this case at least, i.e. when only stationary charges and steady currents are involved.

Magnetostatics: The j in (iii) represents a steady current indeed, causing some circulation of B. The cfactor is related to the fact that magnetism is actually only a relativistic effect of electricity, but I can’t dwell on that here. I’ll just refer you to what Feynman writes about this in his Lectures, and warmly recommend to read it. Oh… Equation (iv), B = 0, means that the divergence of B is zero: everywhere, and always. So there’s no flux of B. None. So B is a divergence-free field.

Because of the neat separation, we’ll just forget about B and talk about E only.

The electric potential

OK. Let’s try to go through the motions as quickly as we can. As mentioned in my introduction, energy is defined in terms of work done. So we should just multiply the force and the distance, right? 1 Joule = 1 newton × 1 meter, right? Well… Yes and no. In discussions like this, we talk potential energy, i.e. energy stored in the system, so to say. That means that we’re looking at work done against the force, like when we carry a bucket of water up to the third floor or, to use a somewhat more scientific description of what’s going on, when we are separating two masses. Because we’re doing work against the force, we put a minus sign in front of our integral:

formula 1

Now, the electromagnetic force works pretty much like gravity, except that, when discussing gravity, we only have positive ‘charges’ (the mass of some object is always positive). In electromagnetics, we have positive as well as negative charge, and please note that two like charges repel (that’s not the case with gravity). Hence, doing work against the electromagnetic force may involve bringing like charges together or, alternatively, separating opposite charges. We can’t say. Fortunately, when it comes to the math of it, it doesn’t matter: we will have the same minus sign in front of our integral. The point is: we’re doing work against the force, and so that’s what the minus sign stands for. So it has nothing to do with the specifics of the law of attraction and repulsion in this case (electromagnetism as opposed to gravity) and/or the fact that electrons carry negative charge. No.

Let’s get back to the integral. Just in case you forgot, the integral sign ∫ stands for an S: the S of summa, i.e. sum in Latin, and we’re using these integrals because we’re adding an infinite number of infinitesimally small contributions to the total effort here indeed. You should recognize it, because it’s a general formula for energy or work. It is, once again, a so-called line integral, so it’s a bit different than the ∫f(x)dx stuff you learned from high school. Not very different, but different nevertheless. What’s different is that we have a vector dot product F•ds after the integral sign here, so that’s not like f(x)dx. In case you forgot, that f(x)dx product represents the surface of an infinitesimally rectangle, as shown below: we make the base of the rectangle smaller and smaller, so dx becomes an infinitesimal indeed. And then we add them all up and get the area under the curve. If f(x) is negative, then the contributions will be negative.


But so we don’t have little rectangles here. We have two vectors, F and ds, and their vector dot product, F•ds, which will give you… Well… I am tempted to write: the tangential component of the force along the path, but that’s not quite correct: if ds was a unit vector, it would be true—because then it’s just like that h•n product I introduced in our first vector calculus class. However, ds is not a unit vector: it’s an infinitesimal vector, and, hence, if we write the tangential component of the force along the path as Ft, then F•d= |F||ds|cosθ = F·cosθ·ds = Ft·ds. So this F•ds is a tangential component over an infinitesimally small segment of the curve. In short, it’s an infinitesimally small contribution to the total amount of work done indeed. You can make sense of this by looking at the geometrical representation of the situation below.

illustration 1

I am just saying this so you know what that integral stands for. Note that we’re not adding arrows once again, like we did when calculating amplitudes or so. It’s all much more straightforward really: a vector dot product is a scalar, so it’s just some real number—just like any component of a vector (tangential, normal, in the direction of one of the coordinates axes, or in whatever direction) is not a vector but a real number. Hence, W is also just some real number. It can be positive or negative because… Well… When we’d be going down the stairs with our bucket of water, our minus sign doesn’t disappear. Indeed, our convention to put that minus sign there should obviously not depend on what point a and b we’re talking about, so we may actually be going along the direction of the force when going from a to b.

As a matter of fact, you should note that’s actually the situation which is depicted above. So then we get a negative number for W. Does that make sense? Of course it does: we’re obviously not doing any work here as we’re moving along the direction, so we’re surely not adding any (potential) energy to the system. On the contrary, we’re taking energy out of the system. Hence, we are reducing its (potential) energy and, hence, we should have a negative value for W indeed. So, just think of the minus sign being there to ensure we add potential energy to the system when going against the force, and reducing it when going with the force.

OK. You get this. You probably also know we’ll re-define W as a difference in potential between two points, which we’ll write as Φ(b) – Φ(a). Now that should remind you of your high school integral ∫f(x)dx once again. For a definite integral over a line segment [a, b], you’d have to find the antiderivative of f(x), which you’d write as F(x), and then you’d take the difference F(b) – F(a) too. Now, you may or may not remember that this antiderivative was actually a family of functions F(x) + k, and k could be any constant – 5/9, 6π, 3.6×10124, 0.86, whatever! – because such constant vanishes when taking the derivative.

Here we have the same, we can define an infinite number of functions Φ(r) + k, of which the gradient will yield… Stop! I am going too fast here. First, we need to re-write that W function above in order to ensure we’re calculating stuff in terms of the unit charge, so we write:

unit chage

Huh? Well… Yes. I am using the definition of the field E here really: E is the force (F) when putting a unit charge in the field. Hence, if we want the work done per unit charge, i.e. W(unit), then we have to integrate the vector dot product E·ds over the path from a to b. But so now you see what I want to do. It makes the comparison with our high school integral complete. Instead of taking a derivative in regard to one variable only, i.e. dF(x)/dx) = f(x), we have a function Φ here not in one but in three variables: Φ = Φ(x, y, z) = Φ(r) and, therefore, we have to take the vector derivative (or gradient as it’s called) of Φ to get E:

Φ(x, y, z) = (∂Φ/∂x, ∂Φ/∂y, ∂Φ/∂z) = –E(x, y, z)

But so it’s the same principle as what you learned how to use to solve your high school integral. Now, you’ll usually see the expression above written as:

E = –Φ

Why so short? Well… We all just love these mysterious abbreviations, don’t we? :-) Jokes aside, it’s true some of those vector equations pack an awful lot of information. Just take Feynman’s advice here: “If it helps to write out the components to be sure you understand what’s going on, just do it. There is nothing inelegant about that. In fact, there is often a certain cleverness in doing just that.” So… Let’s move on.

I should mention that we can only apply this more sophisticated version of the ‘high school trick’ because Φ and E are like temperature (T) and heat flow (h): they are fields. T is a scalar field and h is a vector field, and so that’s why we can and should apply our new trick: if we have the scalar field, we can derive the vector field. In case you want more details, I’ll just refer you to our first vector calculus class. Indeed, our so-called First Theorem in vector calculus was just about the more sophisticated version of the ‘high school trick': if we have some scalar field ψ (like temperature or potential, for example: just substitute the ψ in the equation below for T or Φ), then we’ll always find that:

First theorem

The Γ here is the curve between point 1 and 2, so that’s the path along which we’re going, and ψ must represent some vector field.

Let’s go back to our W integral. I should mention that it doesn’t matter what path we take: we’ll always get the same value for W, regardless of what path we take. That’s why the illustration above showed two possible paths: it doesn’t matter which one we take. Again, that’s only because E is a vector field. To be precise, the electrostatic field is a so-called conservative vector field, which means that we can’t get energy out of the field by first carrying some charge along one path, and then carrying it back along another. You’ll probably find that’s obvious,  and it is. Just note it somewhere in the back of your mind.

So we’re done. We should just substitute E for Φ, shouldn’t we? Well… Yes. For minus Φ, that is. Another minus sign. Why? Well… It makes that W(unit) integral come out alright. Indeed, we want a formula like W = Φ(b) – Φ(a), not like Φ(a) – Φ(b). Look at it. We could, indeed, define E as the (positive) gradient of some scalar field ψ = –Φ, and so we could write E = ψ, but then we’d find that W = –[ψ(b) – ψ(a)] = ψ(a) – ψ(b).

You’ll say: so what? Well… Nothing much. It’s just that our field vectors would point from lower to higher values of ψ, so they would be flowing uphill, so to say. Now, we don’t want that in physics. Why? It just doesn’t look good. We want our field vectors to be directed from higher potential to lower potential, always. Just think of it: heat (h) flows from higher temperature (T) to lower, and Newton’s apple falls from greater to lower height. Likewise, when putting a unit charge in the field, we want to see it move from higher to lower electric potential. Now, we can’t change the direction of E, because that’s the direction of the force and Nature doesn’t care about our conventions and so we can’t choose the direction of the force. But we can choose our convention. So that’s why we put a minus sign in front of Φ when writing E = –Φ. It makes everything come out alright. :-) That’s why we also have a minus sign in the differential heat flow equation: h = –κT.

So now we have the easy W(unit) = Φ(b) – Φ(a) formula that we wanted all along. Now, note that, when we say a unit charge, we mean a plus one charge. Yes: +1. So that’s the charge of the proton (it’s denoted by e) so you should stop thinking about moving electrons around! [I am saying this because I used to confuse myself by doing that. You end up with the same formulas for W and Φ but it just takes you longer to get there, so let me save you some time here. :-)]

But… Yes? In reality, it’s electrons going through a wire, isn’t? Not protons. Yes. But it doesn’t matter. Units are units in physics, and they’re always +1, for whatever (time, distance, charge, mass, spin, etcetera). AlwaysFor whatever. Also note that in laboratory experiments, or particle accelerators, we often use protons instead of electrons, so there’s nothing weird about it. Finally, and most fundamentally, if we have a –e charge moving through a neutral wire in one direction, then that’s exactly the same as a +e charge moving in the other way.

Just to make sure you get the point, let’s look at that illustration once again. We already said that we have F and, hence, E pointing from a to b and we’ll be reducing the potential energy of the system when moving our unit charge from a to b, so W was some negative value. Now, taking into account we want field lines to point from higher to lower potential, Φ(a) should be larger than Φ(b), and so… Well.. Yes. It all makes sense: we have a negative difference Φ(b) – Φ(a) = W(unit), which amounts, of course, to the reduction in potential energy.

The last thing we need to take care of now, is the reference point. Indeed, any Φ(r) + k function will do, so which one do we take? The approach here is to take a reference point Pat infinity. What’s infinity? Well… Hard to say. It’s a place that’s very far away from all of the charges we’ve got lying around here. Very far away indeed. So far away we can say there is nothing there really. No charges whatsoever. :-) Something like that. :-) In any case. I need to move on. So Φ(P0) is zero and so we can finally jot down the grand result for the electric potential Φ(P) (aka as the electrostatic or electric field potential):


So now we can calculate all potentials, i.e. when we know where the charges are at least. I’ve shown an example below. As you can see, besides having zero potential at infinity, we will usually also have one or more equipotential surfaces with zero potential. One could say these zero potential lines sort of ‘separate’ the positive and negative space. That’s not a very scientifically accurate description but you know what I mean.


Let me make a few final notes about the units. First, let me, once again, note that our unit charge is plus one, and it will flow from positive to negative potential indeed, as shown below, even if we know that, in an actual electric circuit, and so now I am talking about a copper wire or something similar, that means the (free) electrons will move in the other direction.

1280px-Current_notationIf you’re smart (and you are), you’ll say: what about the right-hand rule for the magnetic force? Well… We’re not discussing the magnetic force here but, because you insist, rest assured it comes out alright. Look at the illustration below of the magnetic force on a wire with a current, which is a pretty standard one.

terminalSo we have a given B, because of the bar magnet, and then v, the velocity vector for the… Electrons? No. You need to be consistent. It’s the velocity vector for the unit charges, which are positive (+e). Now just calculate the force F = qv×B = ev×B using the right-hand rule for the vector cross product, as illustrated below. So v is the thumb and B is the index finger in this case. All you need to do is tilt your hand, and it comes out alright.


But… We know it’s electrons going the other way. Well… If you insist. But then you have to put a minus sign in front of the q, because we’re talking minus e (–e). So now v is in the other direction and so v×B is in the other direction indeed, but our force F = qv×B = –ev×is not. Fortunately not, because physical reality should not depend on our conventions. :-) So… What’s the conclusion. Nothing. You may or may not want to remember that, when we say that our current j current flows in this or that direction, we actually might be talking electrons (with charge minus one) flowing in the opposite direction, but then it doesn’t matter. In addition, as mentioned above, in laboratory experiments or accelerators, we may actually be talking protons instead of electrons, so don’t assume electromagnetism is the business of electrons only.

To conclude this disproportionately long introduction (we’re finally ready to talk more difficult stuff), I should just make a note on the units. Electric potential is measured in volts, as you know. However, it’s obvious from all that I wrote above that it’s the difference in potential that matters really. From the definition above, it should be measured in the same unit as our unit for energy, or for work, so that’s the joule. To be precise, it should be measured in joule per unit charge. But here we have one of the very few inconsistencies in physics when it comes to units. The proton is said to be the unit charge (e), but its actual value is measured in coulomb (C). To be precise: +1 e = 1.602176565(35)×10−19 C. So we do not measure voltage – sorry, potential difference :-) – in joule but in joule per coulomb (J/C).

Now, we usually use another term for the joule/coulomb unit. You guessed it (because I said it): it’s the volt (V). One volt is one joule/coulomb: 1 V = 1 J/C. That’s not fair, you’ll say. You’re right, but so the proton charge e is not a so-called SI unit. Is the Coulomb an SI unit? Yes. It’s derived from the ampere (A) which, believe it or not, is actually an SI base unit. One ampere is 6.241×1018 electrons (i.e. one coulomb) per second. You may wonder how the ampere (or the coulomb) can be a base unit. Can they be expressed in terms of kilogram, meter and second, like all other base units. The answer is yes but, as you can imagine, it’s a bit of a complex description and so I’ll refer you to the Web for that.

The Poisson equation

I started this post by saying that I’d talk about fields and present some results from electrostatics using our ‘new’ vector differential operators, so it’s about time I do that. The first equation is a simple one. Using our E = –Φ formula, we can re-write the ∇•E = ρ/ε0 equation as:

∇•E = ∇•∇Φ = ∇2Φ = –ρ/ε0

This is a so-called Poisson equation. The ∇2 operator is referred to as the Laplacian and is sometimes also written as Δ, but I don’t like that because it’s also the symbol for the total differential, and that’s definitely not the same thing. The formula for the Laplacian is given below. Note that it acts on a scalar field (i.e. the potential function Φ in this case).

LaplacianAs Feynman notes: “The entire subject of electrostatics is merely the study of the solutions of this one equation.” However, I should note that this doesn’t prevent Feynman from devoting at least a dozen of his Lectures on it, and they’re not the easiest ones to read. [In case you’d doubt this statement, just have a look at his lecture on electric dipoles, for example.] In short: don’t think the ‘study of this one equation’ is easy. All I’ll do is just note some of the most fundamental results of this ‘study’.

Also note that ∇•E is one of our ‘new’ vector differential operators indeed: it’s the vector dot product of our del operator () with E. That’s something very different than, let’s say, Φ. A little dot and some bold-face type make an enormous difference here. :-) You may or may remember that we referred to the ∇• operator as the divergence (div) operator (see my post on that).

Gauss’ Law

Gauss’ Law is not to be confused with Gauss’ Theorem, about which I wrote elsewhere. It gives the flux of E through a closed surface S, any closed surface S really, as the sum of all charges inside the surface divided by the electric constant ε(but then you know that constant is just there to make the units come out alright).

Gauss' Law

The derivation of Gauss’ Law is a bit lengthy, which is why I won’t reproduce it here, but you should note its derivation is based, mainly, on the fact that (a) surface areas are proportional to r2 (so if we double the distance from the source, the surface area will quadruple), and (b) the magnitude of E is given by an inverse-square law, so it decreases as 1/r2. That explains why, if the surface S describes a sphere, the number we get from Gauss’ Law is independent of the radius of the sphere. The diagram below (credit goes to Wikipedia) illustrates the idea.


The diagram can be used to show how a field and its flux can be represented. Indeed, the lines represent the flux of E emanating from a charge. Now, the total number of flux lines depends on the charge but is constant with increasing distance because the force is radial and spherically symmetric. A greater density of flux lines (lines per unit area) means a stronger field, with the density of flux lines (i.e. the magnitude of E) following an inverse-square law indeed, because the surface area of a sphere increases with the square of the radius. Hence, in Gauss’ Law, the two effect cancel out: the two factors vary with distance, but their product is a constant.

Now, if we describe the location of charges in terms of charge densities (ρ), then we can write Qint as:

Q int

Now, Gauss’ Law also applies to an infinitesimal cubical surface and, in one of my posts on vector calculus, I showed that the flux of E out of such cube is given by E·dV. At this point, it’s probably a good idea to remind you of what this ‘new’ vector differential operator •, i.e. our ‘divergence’ operator, stands for: the divergence of E (i.e. • applied to E, so that’s E) represents the volume density of the flux of E out of an infinitesimal volume around a given point. Hence, it’s the flux per unit volume, as opposed to the flux out of the infinitesimal cube itself, which is the product of and dV, i.e. E·dV.

So what? Well… Gauss’ Law applied to our infinitesimal volume gives us the following equality:

ES 1

That, in turn, simplifies to:

ES 2

So that’s Maxwell’s first equation once again, which is equivalent to our Poisson equation: E = ∇2Φ = –ρ/ε0. So what are we doing here? Just listing equivalent formulas? Yes. I should also note they can be derived from Coulomb’s law of force, which is probably the one you learned in high school. So… Yes. It’s all consistent. But then that’s what we should expect, of course. :-)

The energy in a field

All these formulas look very abstract. It’s about time we use them for something. A lot of what’s written in Feynman’s Lectures on electrostatics is applied stuff indeed: it focuses, among other things, on calculating the potential in various circumstances and for various distributions of charge. Now, funnily enough, while that E = –ρ/ε0 equation is equivalent to Coulomb’s law and, obviously, much more compact to write down, Coulomb’s law is easier to start with for basic calculations. Let me first write Coulomb’s law. You’ll probably recognize it from your high school days:

Coulomb's law

Fis the force on charge q1, and Fis the force on charge q2. Now, qand q2. may attract or repel each other but, in both cases, the forces will be equal and opposite. [In case you wonder, yes, that’s basically the law of action and reaction.] The e12 vector is the unit vector from qto q1, not from qto q2, as one might expect. That’s because we’re not talking gravity here: like charges do not attract but repel and, hence, we have to switch the order here. Having said that, that’s basically the only peculiar thing about the equation. All the rest is standard:

  1. The force is inversely proportional to the square of the distance and so we have an inverse-square law here indeed.
  2. The force is proportional to the charge(s).
  3. Finally, we have a proportionality constant, 1/4πε0, which makes the units come out alright. You may wonder why it’s written the way it’s written, i.e. with that 4π factor, but that factor (4π or 2π) actually disappears in a number of calculations, so then we will be left with just a 1/ε0 or a 1/2ε0 factor. So don’t worry about it.

We want to calculate potentials and all that, so the first thing we’ll do is calculate the force on a unit charge. So we’ll divide that equation by q1, to calculate E(1) = F1/q1:

E 1

Piece of cake. But… What’s E(1) really? Well… It’s the force on the unit charge (+e), but so it doesn’t matter whether or not that unit charge is actually there, so it’s the field E caused by a charge q2. [If that doesn’t make sense to you, think again.] So we can drop the subscripts and just write:

E 3

What a relief, isn’t it? The simplest formula ever: the (magnitude) of the field as a simple function of the charge q and its distance (r) from the point that we’re looking at, which we’ll write as P = (x, y, z). But what origin are we using to measure x, y and z. Don’t be surprised: the origin is q.

Now that’s a formula we can use in the Φ(P) integral. Indeed, the antiderivative is ∫(q/4πε0r2)dr. Now, we can bring q/4πε0 out and so we’re left with ∫(1/r2)dr. Now ∫(1/r2)dr is equal to –1/r + k, and so the whole antiderivative is –q/4πε0r + k. However, the minus sign cancels out with the minus sign in front of the Φ(P) = Φ(x, y, z)  integral, and so we get:

E 4

You should just do the integral to check this result. It’s the same integral but with P0 (infinity) as point a and P as point b in the integral, so we have ∞ as start value and r as end value. The integral then yields Φ(P) – Φ(P0) = –q/4πε0[1/r – 1/∞). [The k constant falls away when subtracting Φ(P0) from Φ(P).] But 1/∞ = 0, and we had a minus sign in front of the integral, which cancels the sign of –q/4πε0. So, yes, we get the wonderfully simple result above. Also please do quickly check if it makes sense in terms of sign: the unit charge is +e, so that’s a positive charge. Hence, Φ(x, y, z) will be positive if the sign of q is also positive, but negative if q would happen to be negative. So that’s OK.

Also note that the potential – which, remember, represents the amount of work to be done when bringing a unit charge (e) from infinity to some distance r from a charge q – is proportional to the charge of q. We also know that the force and, hence, the work is proportional to the charge that we are bringing in (that’s how we calculated the work per unit in the first place: by dividing the total amount of work by the charge). Hence, if we’d not bring some unit charge but some other charge q2, the work done would also be proportional to q2. Now, we need to make sure we understand what we’re writing and so let’s tidy up and re-label our first charge once again as q1, and the distance r as r12, because that’s what r is: the distance between the two charges. We then have another obvious but nice result: the work done in bringing two charges together from a large distance (infinity) is

U 1Now, one of the many nice properties of fields (scalar or vector fields) and the associated energies (because that’s what we are talking about here) is that we can simply add up contributions. For example, if we’d have many charges and we’d want to calculate the potential Φ at a point which we call 1, we can use the same Φ(r) = q/4πε0r formula which we had derived for one charge only, for all charges, and then we simply add the contributions of each to get the total potential:

P 1

Now that we’re here, I should, of course, also give the continuum version of this formula, i.e. the formula used when we’re talking charge densities rather than individual charges. The sum then becomes an infinite sum (i.e. an integral), and qj (note that j goes from 2 to n) becomes a variable which we write as ρ(2). We get:

U 2

Going back to the discrete situation, we get the same type of sum when bringing multiple pairs of charges qi and qj together. Hence, the total electrostatic energy U is the sum of the energies of all possible pairs of charges:

U 3It’s been a while since you’ve seen any diagram or so, so let me insert one just to reassure you it’s as simple as that indeed:

U system

Now, we have to be aware of the risk of double-counting, of course. We should not be adding qiqj/4πε0rij twice. That’s why we write ‘all pairs’ under the ∑ summation sign, instead of the usual i, j subscripts. The continuum version of this equation below makes that 1/2 factor explicit:

U 4

Hmm… What kind of integral is that? It’s a so-called double integral because we have two variables here. Not easy. However, there’s a lucky break. We can use the continuum version of our formula for Φ(1) to get rid of the ρ(2) and dV2 variables and reduce the whole thing to a more standard ‘single’ integral. Indeed, we can write:

U 5Now, because our point (2) no longer appears, we can actually write that more elegantly as:

U 6That looks nice, doesn’t it? But do we understand it? Just to make sure. Let me explain it. The potential energy of the charge ρdV is the product of this charge and the potential at the same point. The total energy is therefore the integral over ϕρdV, but then we are counting energies twice, so that’s why we need the 1/2 factor. Now, we can write this even more beautifully as:

U 7

Isn’t this wonderful? We have an expression for the energy of a field, not in terms of the charges or the charge distribution, but in terms of the field they produce.

I am pretty sure that, by now, you must be suffering from ‘formula overload’, so you probably are just gazing at this without even bothering to try to understand. Too bad, and you should take a break then or just go do something else, like biking or so. :-)

First, you should note that you know this EE expression already: EE is just the square of the magnitude of the field vector E, so EE = E2. That makes sense because we know, from what we know about waves, that the energy is always proportional to the square of an amplitude, and so we’re just writing the same here but with a little proportionality constant (ε0).

OK, you’ll say. But you probably still wonder what use this formula could possibly have. What is that number we get from some integration over all space? So we associate the Universe with some number and then what? Well… Isn’t that just nice? :-) Jokes aside, we’re actually looking at that EE = Eproduct inside of the integral as representing an energy density (i.e. the energy per unit volume). We’ll denote that with a lower-case symbol and so we write:

D 6

Just to make sure you ‘get’ what we’re talking about here: u is the energy density in the little cube dV in the rather simplistic (and, therefore, extremely useful) illustration below (which, just like most of what I write above, I got from Feynman).


Now that should make sense to you—I hope. :-) In any case, if you’re still with me, and if you’re not all formula-ed out you may wonder how we get that ε0EE = ε0E2 expression from that ρΦ expression. Of course, you know that E = –∇Φ, and we also have the Poisson equation ∇2Φ = –ρ/ε0, but that doesn’t get you very far. It’s one of those examples where an easy-looking formula requires a lot of gymnastics. However, as the objective of this post is to do some of that, let me take you through the derivation.

Let’s do something with that Poisson equation first, so we’ll re-write it as ρ = –ε02Φ, and then we can substitute ρ in the integral with the ρΦ product. So we get:

U 8

Now, you should check out those fancy formulas with our new vector differential operators which we listed in our second class on vector calculus, but, unfortunately, none of them apply. So we have to write it all out and see what we get:

D 1

Now that looks horrendous and so you’ll surely think we won’t get anywhere with that. Well… Physicists don’t despair as easily as we do, it seems, and so they do substitute it in the integral which, of course, becomes an even more monstrous expression, because we now have two volume integrals instead of one! Indeed, we get:

D 2But if Φ is a vector field (it’s minus E, remember!), then ΦΦ is a vector field too, and we can then apply Gauss’ Theorem, which we mentioned in our first class on vector calculus, and which – mind you! – has nothing to do with Gauss’ Law. Indeed, Gauss produced so much it’s difficult to keep track of it all. :-) So let me remind you of this theorem. [I should also show why ΦΦ still yields a field, but I’ll assume you believe me.] Gauss’ Theorem basically shows how we can go from a volume integral to a surface integral:

Gauss Theorem-2If we apply this to the second integral in our U expression, we get:

D 4

So what? Where are we going with this? Relax. Be patient. What volume and surface are we talking about here? To make sure we have all charges and influences, we should integrate over all space and, hence, the surface goes to infinity. So we’re talking a (spherical) surface of enormous radius R whose center is the origin of our coordinate system. I know that sounds ridiculous but, from a math point of view, it is just the same like bringing a charge in from infinity, which is what we did to calculate the potential. So if we don’t difficulty with infinite line integrals, we should not have difficulty with infinite surface and infinite volumes. That’s all I can, so… Well… Let’s do it.

Let’s look at that product ΦΦ•n in the surface integral. Φ is a scalar and Φ is a vector, and so… Well… Φ•is a scalar too: it’s the normal component of Φ = –E. [Just to make sure, you should note that the way we define the normal unit vector n is such that ∇Φ•n is some positive number indeed! So n will point in the same direction, more or less, as ∇Φ = –E. So the θ angle  between ∇Φ = –E and n is surely less than ± 90° and, hence, the cosine factor in the ∇Φ•= |∇Φ||n|cosθ = |∇Φ|cosθ is positive, and so the whole vector dot product is positive.]

So, we have a product of two scalars here.  What happens with them if R goes to infinity? Well… The potential varies as 1/r as we’re going to infinity. That’s obvious from that Φ = (q/4πε0)(1/r) formula: just think of q as some kind of average now, which works because we assume all charges are located within some finite distance, while we’re going to infinity. What about Φ•n? Well… Again assuming that we’re reasonably far away from the charges, we’re talking the density of flux lines here (i.e. the magnitude of E) which, as shown above, follows an inverse-square law, because the surface area of a sphere increases with the square of the radius. So Φ•n varies not as 1/r but as 1/r2. To make a long story short, the whole product ΦΦ•n falls of as 1/r goes to infinity. Now, we shouldn’t forget we’re integrating a surface integral here, with r = R, and so it’s R going to infinity. So that surface integral has to go to zero when we include all space. The volume integral still stands however, so our formula for U now consists of one term only, i.e. the volume integral, and so we now have:

D 5

Done !

What’s left?

In electrostatics? Lots. Electric dipoles (like polar molecules), electrolytes, plasma oscillations, ionic crystals, electricity in the atmosphere (like lightning!), dielectrics and polarization (including condensers), ferroelectricity,… As soon as we try to apply our theory to matter, things become hugely complicated. But the theory works. Fortunately! :-) I have to refer you to textbooks, though, in case you’d want to know more about it. [I am sure you don’t, but then one never knows.]

What I wanted to do is to give you some feel for those vector and field equations in the electrostatic case. We now need to bring magnetic field back into the picture and, most importantly, move to electrodynamics, in which the electric and magnetic field do not appear as completely separate things. No! In electrodynamics, they are fully interconnected through the time derivatives ∂E/∂t and ∂B/∂t. That shows they’re part and parcel of the same thing really: electromagnetism. 

But we’ll try to tackle that in future posts. Goodbye for now!

Fields and charges (I)

The wave-particle duality revisited

As an economist, having some knowledge of what’s around in my field (social science), I think I am well-placed to say that physics is not an easy science. Its ‘first principles’ are complicated, and I am not ashamed to say that, after more than a year of study now, I haven’t reached what I would call a ‘true understanding’ of it.

Sometimes, the teachers are to be blamed. For example, I just found out that, in regard to the question of the wave function of a photon, the answer of two nuclear scientists was plain wrong. Photons do have a de Broglie wave, and there is a fair amount of research and actual experimenting going on trying to measure it. One scientific article which I liked in particular, and I hope to fully understand a year from now or so, is on such ‘direct measurement of the (quantum) wavefunction‘. For me, it drove home the message that these idealized ‘thought experiments’ that are supposed to make autodidacts like me understand things better, are surely instructive in regard to the key point, but confusing in other respects.

A typical example of such idealized thought experiment is the double-slit experiment with ‘special detectors’ near the slits, which may or may not detect a photon, depending on whether or not they’re switched on as well as on their accuracy. Depending on whether or not the detectors are switched on, and their accuracy, we get full interference (a), no interference (b), or a mixture of (a) and (b), as shown in (c) and (d).

set-up photons double-slit photons - results

I took the illustrations from Feynman’s lovely little book, QED – The Strange Theory of Light and Matter, and he surely knows what he’s talking about. Having said that, the set-up raises a key question in regard to these detectors: how do they work, exactly? More importantly, how do they disturb the photons?

I googled for actual double-slit experiments with such ‘special detectors’ near the slits, but only found such experiments for electrons. One of these, a 2010 experiment of an Italian team, suggests that it’s the interaction between the detector and the electron wave that may cause the interference pattern to disappear. The idea is shown below. The electron is depicted as an incoming plane wave, which breaks up as it goes through the slits. The slit on the left has no ‘filter’ (which you may think of as a detector) and, hence, the plane wave goes through as a cylindrical wave. The slit on the right-hand side is covered by a ‘filter’ made of several layers of ‘low atomic number material’, so the electron goes through but, at the same time, the barrier creates a spherical wave as it goes through. The researchers note that “the spherical and cylindrical wave do not have any phase correlation, and so even if an electron passed through both slits, the two different waves that come out cannot create an interference pattern on the wall behind them.” [Needless to say, while being represented as ‘real’ waves here, the ‘waves’ are, in fact, complex-valued psi functions.]

double-slit experiment

In fact, to be precise, there actually still was an interference effect if the filter was thin enough. Let me quote the reason for that: “The thicker the filter, the greater the probability for inelastic scattering. When the electron suffers inelastic scattering, it is localized. This means that its wavefunction collapses and, after the measurement act, it propagates roughly as a spherical wave from the region of interaction, with no phase relation at all with other elastically or inelastically scattered electrons. If the filter is made thick enough, the interference effects cancels out almost completely.”

This, of course, doesn’t solve the mystery. The mystery, in such experiments, is that, when we put detectors, it is either the detector at A or the detector at B that goes off. They should never go off together—”at half strength, perhaps?”, as Feynman puts it. That’s why I used italics when writing “even if an electron passed through both slits.” The electron, or the photon in a similar set-up, is not supposed to do that. As mentioned above, the wavefunction collapses or reduces. Now that’s where these so-called ‘weak measurement’ experiments come in: they indicate the interaction doesn’t have to be that way. It’s not all or nothing: our observations should not necessarily destroy the wavefunction. So, who knows, perhaps we will be able, one day, to show that the wavefunction does go through both slits, as it should (otherwise the interference pattern cannot be explained), and then we will have resolved the paradox.

I am pretty sure that, when that’s done, physicists will also be able to relate the image of a photon as a transient electromagnetic wave (first diagram below), being emitted by an atomic oscillator for a few nanoseconds only (we gave the example for sodium light, for which the decay time was 3.2×10–8 seconds) with the image of a photon as a de Broglie wave (second diagram below). I look forward to that day. I think it will come soon.

Photon wavePhoton wave

The wave-particle duality revisited


In the previous posts, I showed how the ‘real-world’ properties of photons and electrons emerge out of very simple mathematical notions and shapes. The basic notions are time and space. The shape is the wavefunction.

Let’s recall the story once again. Space is an infinite number of three-dimensional points (x, y, z), and time is a stopwatch hand going round and round—a cyclical thing. All points in space are connected by an infinite number of paths – straight or crooked, whatever  – of which we measure the length. And then we have ‘photons’ that move from A to B, but so we don’t know what is actually moving in space here. We just associate each and every possible path (in spacetime) between A and B with an amplitude: an ‘arrow‘ whose length and direction depends on (1) the length of the path l (i.e. the ‘distance’ in space measured along the path, be it straight or crooked), and (2) the difference in time between the departure (at point A) and the arrival (at point B) of our photon (i.e. the ‘distance in time’ as measured by that stopwatch hand).

Now, in quantum theory, anything is possible and, hence, not only do we allow for crooked paths, but we also allow for the difference in time to differ from l/c. Hence, our photon may actually travel slower or faster than the speed of light c! There is one lucky break, however, that makes all come out alright: the arrows associated with the odd paths and strange timings cancel each other out. Hence, what remains, are the nearby paths in spacetime only—the ‘light-like’ intervals only: a small core of space which our photon effectively uses as it travels through empty space. And when it encounters an obstacle, like a sheet of glass, it may or may not interact with the other elementary particle–the electron. And then we multiply and add the arrows – or amplitudes as we call them – to arrive at a final arrow, whose square is what physicists want to find, i.e. the likelihood of the event that we are analyzing (such a photon going from point A to B, in empty space, through two slits, or through as sheet of glass, for example) effectively happening.

The combining of arrows leads to diffraction, refraction or – to use the more general description of what’s going on – interference patterns:

  1. Adding two identical arrows that are ‘lined up’ yields a final arrow with twice the length of either arrow alone and, hence, a square (i.e. a probability) that is four times as large. This is referred to as ‘positive’ or ‘constructive’ interference.
  2. Two arrows of the same length but with opposite direction cancel each other out and, hence, yield zero: that’s ‘negative’ or ‘destructive’ interference.

Both photons and electrons are represented by wavefunctions, whose argument is the position in space (x, y, z) and time (t), and whose value is an amplitude or ‘arrow’ indeed, with a specific direction and length. But here we get a bifurcation. When photons interact with other, their wavefunctions interact just like amplitudes: we simply add them. However, when electrons interact with each other, we have to apply a different rule: we’ll take a difference. Indeed, anything is possible in quantum mechanics and so we combine arrows (or amplitudes, or wavefunctions) in two different ways: we can either add them or, as shown below, subtract one from the other.

vector addition

There are actually four distinct logical possibilities, because we may also change the order of A and B in the operation, but when calculating probabilities, all we need is the square of the final arrow, so we’re interested in its final length only, not in its direction (unless we want to use that arrow in yet another calculation). And so… Well… The fundamental duality in Nature between light and matter is based on this dichotomy only: identical (elementary) particles behave in one of two ways: their wavefunctions interfere either constructively or destructively, and that’s what distinguishes bosons (i.e. force-carrying particles, such as photons) from fermions (i.e. matter-particles, such as electrons). The mathematical description is complete and respects Occam’s Razor. There is no redundancy. One cannot further simplify: every logical possibility in the mathematical description reflects a physical possibility in the real world.

Having said that, there is more to an electron than just Fermi-Dirac statistics, of course. What about its charge, and this weird number, its spin?,

Well… That’s what’s this post is about. As Feynman puts it: “So far we have been considering only spin-zero electrons and photons, fake electrons and fake photons.”

I wouldn’t call them ‘fake’, because they do behave like real photons and electrons already but… Yes. We can make them more ‘real’ by including charge and spin in the discussion. Let’s go for it.

Charge and spin

From what I wrote above, it’s clear that the dichotomy between bosons and fermions (i.e. between ‘matter-particles’ and ‘force-carriers’ or, to put it simply, between light and matter) is not based on the (electric) charge. It’s true we cannot pile atoms or molecules on top of each other because of the repulsive forces between the electron clouds—but it’s not impossible, as nuclear fusion proves: nuclear fusion is possible because the electrostatic repulsive force can be overcome, and then the nuclear force is much stronger (and, remember, no quarks are being destroyed or created: all nuclear energy that’s being released or used is nuclear binding energy).

It’s also true that the force-carriers we know best, notably photons and gluons, do not carry any (electric) charge, as shown in the table below. So that’s another reason why we might, mistakenly, think that charge somehow defines matter-particles. However, we can see that matter-particles, first carry very different charges (positive or negative, and with very different values: 1/3, 2/3 or 1), and even be neutral, like the neutrinos. So, if there’s a relation, it’s very complex. In addition, one of the two force-carrier for the weak force, the W boson, can have positive or negative charge too, so that doesn’t make sense, does it? [I admit the weak force is a bit of a ‘special’ case, and so I should leave it out of the analysis.] The point is: the electric charge is what it is, but it’s not what defines matter. It’s just one of the possible charges that matter-particles can carry. [The other charge, as you know, is the color charge but, to confuse the picture once again, that’s a charge that can also be carried by gluons, i.e. the carriers of the strong force.]

Standard_Model_of_Elementary_ParticlesSo what is it, then? Well… From the table above, you can see that the property of ‘spin’ (i.e. the third number in the top left-hand corner) matches the above-mentioned dichotomy in behavior, i.e. the two different types of interference (bosons versus fermions or, to use a heavier term, Bose-Einstein statistics versus Fermi-Dirac statistics): all matter-particles are so-called spin-1/2 particles, while all force-carriers (gauge bosons) all have spin one. [Never mind the Higgs particle: that’s ‘just’ a mechanism to give (most) elementary particles some mass.]

So why is that? Why are matter-particles spin-1/2 particles and force-carries spin-1 particles? To answer that question, we need to answer the question: what’s this spin number? And to answer that question, we first need to answer the question: what’s spin?

Spin in the classical world

In the classical world, it’s, quite simply, the momentum associated with a spinning or rotating object, which is referred to as the angular momentum. We’ve analyzed the math involved in another post, and so I won’t dwell on that here, but you should note that, in classical mechanics, we distinguish two types of angular momentum:

  1. Orbital angular momentum: that’s the angular momentum an object gets from circling in an orbit, like the Earth around the Sun.
  2. Spin angular momentum: that’s the angular momentum an object gets from spinning around its own axis., just like the Earth, in addition to rotating around the Sun, is rotating around its own axis (which is what causes day and night, as you know).

The math involved in both is pretty similar, but it’s still useful to distinguish the two, if only because we’ll distinguish them in quantum mechanics too! Indeed, when I analyzed the math in the above-mentioned post, I showed how we represent angular momentum by a vector that’s perpendicular to the direction of rotation, with its direction given by the ubiquitous right-hand rule—as in the illustration below, which shows both the angular momentum (L) as well as the torque (τ) that’s produced by a rotating mass. The formulas are given too: the angular momentum L is the vector cross product of the position vector r and the linear momentum p, while the magnitude of the torque τ is given by the product of the length of the lever arm and the applied force. An alternative approach is to define the angular velocity ω and the moment of inertia I, and we get the same result: L = Iω. 


Of course, the illustration above shows orbital angular momentum only and, as you know, we no longer have a ‘planetary model’ (aka the Rutherford model) of an atom. So should we be looking at spin angular momentum only?

Well… Yes and no. More yes than no, actually. But it’s ambiguous. In addition, the analogy between the concept of spin in quantum mechanics, and the concept of spin in classical mechanics, is somewhat less than straightforward. Well… It’s not straightforward at all actually. But let’s get on with it and use more precise language. Let’s first explore it for light, not because it’s easier (it isn’t) but… Well… Just because. :-)

The spin of a photon

I talked about the polarization of light in previous posts (see, for example, my post on vector analysis): when we analyze light as a traveling electromagnetic wave (so we’re still in the classical analysis here, not talking about photons as ‘light particles’), we know that the electric field vector oscillates up and down and is, in fact, likely to rotate in the xy-plane (with z being the direction of propagation). The illustration below shows the idealized (aka limiting) case of perfectly circular polarization: if there’s polarization, it is more likely to be elliptical. The other limiting case is plane polarization: in that case, the electric field vector just goes up and down in one direction only. [In case you wonder whether ‘real’ light is polarized, it often is: there’s an easy article on that on the Physics Classroom site.]

spin angular momentumThe illustration above uses Dirac’s bra-ket notation |L〉 and |R〉 to distinguish the two possible ‘states’, which are left- or right-handed polarization respectively. In case you forgot about bra-ket notations, let me quickly remind you: an amplitude is usually denoted by 〈x|s〉, in which 〈x| is the so-called ‘bra’, i.e. the final condition, and |s〉 is the so-called ‘ket’, i.e. the starting condition, so 〈x|s〉 could mean: a photon leaves at s (from source) and arrives at x. It doesn’t matter much here. We could have used any notation, as we’re just describing some state, which is either |L〉 (left-handed polarization) or |R〉 (right-handed polarization). The more intriguing extras in the illustration above, besides the formulas, are the values: ± ħ = ±h/2π. So that’s plus or minus the (reduced) Planck constant which, as you know, is a very tiny constant. I’ll come back to that. So what exactly is being represented here?

At first, you’ll agree it looks very much like the momentum of light (p) which, in a previous post, we calculated from the (average) energy (E) as p = E/c. Now, we know that E is related to the (angular) frequency of the light through the Planck-Einstein relation E = hν = ħω. Now, ω is the speed of light (c) times the wave number (k), so we can write: p = ħω = ħck/c = ħk. The wave number is the ‘spatial frequency’, expressed either in cycles per unit distance (1/λ) or, more usually, in radians per unit distance (k = 2π/λ), so we can also write p = ħk = h/λ. Whatever way we write it, we find that this momentum (p) depends on the energy and/or, what amounts to saying the same, the frequency and/or the wavelength of the light.

So… Well… The momentum of light is not just h or ħ, i.e. what’s written in that illustration above. So it must be something different. In addition, I should remind you this momentum was calculated from the magnetic field vector, as shown below (for more details, see my post on vector calculus once again), so it had nothing to do with polarization really.

radiation pressure

Finally, last but not least, the dimensions of ħ and p = h/λ are also different (when one is confused, it’s always good to do a dimensional analysis in physics):

  1. The dimension of Planck’s constant (both h as well as ħ = h/2π) is energy multiplied by time (J·s or eV·s) or, equivalently, momentum multiplied by distance. It’s referred to as the dimension of action in physics, and h is effectively, the so-called quantum of action.
  2. The dimension of (linear) momentum is… Well… Let me think… Mass times velocity (mv)… But what’s the mass in this case? Light doesn’t have any mass. However, we can use the mass-energy equivalence: 1 eV = 1.7826×10−36 kg. [10−36? Well… Yes. An electronvolt is a very tiny measure of energy.] So we can express p in eV·m/s units.

Hmm… We can check: momentum times distance gives us the dimension of Planck’s constant again – (eV·m/s)·m = eV·s. OK. That’s good… […] But… Well… All of this nonsense doesn’t make us much smarter, does it? :-) Well… It may or may not be more useful to note that the dimension of action is, effectively, the same as the dimension of angular momentum. Huh? Why? Well… From our classical L = r×p formula, we find L should be expressed in m·(eV·m/s) = eV·m2/s  units, so that’s… What? Well… Here we need to use a little trick and re-express energy in mass units. We can then write L in kg·m2/s units and, because 1 Newton (N) is 1 kg⋅m/s2, the kg·m2/s unit is equivalent to the N·m·s = J·s unit. Done!

Having said that, all of this still doesn’t answer the question: are the linear momentum of light, i.e. our p, and those two angular momentum ‘states’, |L〉 and |R〉, related? Can we relate |L〉 and |R〉 to that L = r×p formula?

The answer is simple: no. The |L〉 and |R〉 states represent spin angular momentum indeed, while the angular momentum we would derive from the linear momentum of light using that L = r×p is orbital angular momentum. Let’s introduce the proper symbols: orbital angular momentum is denoted by L, while spin angular momentum is denoted by S. And then the total angular momentum is, quite simply, J = L + S.

L and S can both be calculated using either a vector cross product r × p (but using different values for r and p, of course) or, alternatively, using the moment of inertia tensor I and the angular velocity ω. The illustrations below (which I took from Wikipedia) show how, and also shows how L and S are added to yield J = L + S.



So what? Well… Nothing much. The illustration above show that the analysis – which is entirely classical, so far – is pretty complicated. [You should note, for example, that in the S = Iω and L Iω formulas, we don’t use the simple (scalar) moment of inertia but the moment of inertia tensor (so that’s a matrix denoted by I, instead of the scalar I), because S (or L) and ω are not necessarily pointing in the same direction.

By now, you’re probably very confused and wondering what’s wiggling really. The answer for the orbital angular momentum is: it’s the linear momentum vector p. Now…

Hey! Stop! Why would that vector wiggle?

You’re right. Perhaps it doesn’t. The linear momentum p is supposed to be directed in the direction of travel of the wave, isn’t it? It is. In vector notation, we have p = ħk, and that k vector (i.e. the wavevector) points in the direction of travel of the wave indeed and so… Well… No. It’s not that simple. The wave vector is perpendicular to the surfaces of constant phase, i.e. the so-called wave fronts, as show in the illustration below (see the direction of ek, which is a unit vector in the direction of k).

wave vector

So, yes, if we’re analyzing light moving in a straight one-dimensional line only, or we’re talking a plane wave, as illustrated below, then the orbital angular momentum vanishes.

plane wave

But the orbital angular momentum L does not vanish when we’re looking at a real light beam, like the ones below. Real waves? Well… OK… The ones below are idealized wave shapes as well, but let’s say they are somewhat more real than a plane wave. :-)


So what do we have here? We have wavefronts that are shaped as helices, except for the one in the middle (marked by m = 0) which is, once again, an example of plane wave—so for that one (m = 0), we have zero orbital angular momentum indeed. But look, very carefully, at the m = ± 1 and m = ± 2 situations. For m = ± 1, we have one helical surface with a step length equal to the wavelength λ. For m = ± 2, we have two intertwined helical surfaces with the step length of each helix surface equal to 2λ. [Don’t worry too much about the second and third column: they show a beam cross-section (so that’s not a wave front but a so-called phase front) and the (averaged) light intensity, again of a beam cross-section.] Now, we can further generalize and analyze waves composed of m helices with the step length of each helix surface equal to |m|λ. The Wikipedia article on OAM (orbital angular momentum of light), from which I got this illustration, gives the following formula to calculate the OAM:

Formula OAMThe same article also notes that the quantum-mechanical equivalent of this formula, i.e. the orbital angular momentum of the photons one would associate with the not-cylindrically-symmetric waves above (i.e. all those for which m ≠ 0), is equal to:

Lz = mħ

So what? Well… I guess we should just accept that as a very interesting result. For example, I duly note that Lis along the direction of propagation of the wave (as indicated by the z subscript), and I also note the very interesting fact that, apparently, Lz  can be either positive or negative. Now, I am not quite sure how such result is consistent with the idea of radiation pressure, but I am sure there must be some logical explanation to that. The other point you should note is that, once again, any reference to the energy (or to the frequency or wavelength) of our photon has disappeared. Hmm… I’ll come back to this, as I promised above already.

The thing is that this rather long digression on orbital angular momentum doesn’t help us much in trying to understand what that spin angular momentum (SAM) is all about. So, let me just copy the final conclusion of the Wikipedia article on the orbital angular momentum of light: the OAM is the component of angular momentum of light that is dependent on the field spatial distribution, not on the polarization of light.

So, again, what’s the spin angular momentum? Well… The only guidance we have is that same little drawing again and, perhaps, another illustration that’s supposed to compare SAM with OAM (underneath).

spin angular momentum

800px-Sam-oam-interactionNow, the Wikipedia article on SAM (spin angular momentum), from which I took the illustrations above, gives a similar-looking formula for it:

Formula SAM

When I say ‘similar-looking’, I don’t mean it’s the same. [Of course not! Spin and orbital angular momentum are two different things!]. So what’s different in the two formulas? Well… We don’t have any del operator () in the SAM formula, and we also don’t have any position vector (r) in the integral kernel (or integrand, if you prefer that term). However, we do find both the electric field vector (E) as well as the (magnetic) vector potential (A) in the equation again. Hence, the SAM (also) takes both the electric as well as the magnetic field into account, just like the OAM. [According to the author of the article, the expression also shows that the SAM is nonzero when the light polarization is elliptical or circular, and that it vanishes if the light polarization is linear, but I think that’s much more obvious from the illustration than from the formula… However, I realize I really need to move on here, because this post is, once again, becoming way too long. So…]

OK. What’s the equivalent of that formula in quantum mechanics?

Well… In quantum mechanics, the SAM becomes a ‘quantum observable’, described by a corresponding operator which has only two eigenvalues:

Sz = ± ħ

So that corresponds to the two possible values for Jz, as mentioned in the illustration, and we can understand, intuitively, that these two values correspond to two ‘idealized’ photons which describe a left- and right-handed circularly polarized wave respectively.

So… Well… There we are. That’s basically all there is to say about it. So… OK. So far, so good.

But… Yes? Why do we call a photon a spin-one particle?

That has to do with convention. A so-called spin-zero particle has no degrees of freedom in regard to polarization. The implied ‘geometry’ is that a spin-zero particle is completely symmetric: no matter in what direction you turn it, it will always look the same. In short, it really behaves like a (zero-dimensional) mathematical point. As you can see from the overview of all elementary particles, it is only the Higgs boson which has spin zero. That’s why the Higgs field is referred to as a scalar field: it has no direction. In contrast, spin-one particles, like photons, are also ‘point particles’, but they do come with one or the other kind of polarization, as evident from all that I wrote above. To be specific, they are polarized in the xy-plane, and can have one of two directions. So, when rotating them, you need a full rotation of 360° if you want them to look the same again.

Now that I am here, let me exhaust the topic (to a limited extent only, of course, as I don’t want to write a book here) and mention that, in theory, we could also imagine spin-2 particles, which would look the same after half a rotation (180°). However, as you can see from the overview, none of the elementary particles has spin-2. A spin-2 particle could be like some straight stick, as that looks the same even after it is rotated 180 degrees. I am mentioning the theoretical possibility because the graviton, if it would effectively exist, is expected to be a massless spin-2 boson. [Now why do I mention this? Not sure. I guess I am just noting this to remind you of the fact that the Higgs boson is definitely not the (theoretical) graviton, and/or that we have no quantum theory for gravity.]

Oh… That’s great, you’ll say. But what about all those spin-1/2 particles in the table? You said that all matter-particles are spin 1/2 particles, and that it’s this particular property that actually makes them matter-particles. So what’s the geometry here? What kind of ‘symmetries’ do they respect?

Well… As strange as it sounds, a spin-1/2 particle needs two full rotations (2×360°=720°) until it is again in the same state. Now, in regard to that particularity, you’ll often read something like: “There is nothing in our macroscopic world which has a symmetry like that.” Or, worse, “Common sense tells us that something like that cannot exist, that it simply is impossible.” [I won’t quote the site from which I took this quotes, because it is, in fact, the site of a very respectable  research center!] Bollocks! The Wikipedia article on spin has this wonderful animation: look at how the spirals flip between clockwise and counterclockwise orientations, and note that it’s only after spinning a full 720 degrees that this ‘point’ returns to its original configuration after spinning a full 720 degrees.


So, yes, we can actually imagine spin-1/2 particles, and with not all that much imagination, I’d say. But… OK… This is all great fun, but we have to move on. So what’s the ‘spin’ of these spin-1/2 particles and, more in particular, what’s the concept of ‘spin’ of an electron?

The spin of an electron

When starting to read about it, I thought that the angular momentum of an electron would be easier to analyze than that of a photon. Indeed, while a photon has no mass and no electric charge, that analysis with those E and B vectors is damn complicated, even when sticking to a strictly classical analysis. For an electron, the classical picture seems to be much more straightforward—but only at first indeed. It quickly becomes equally weird, if not more.

We can look at an electron in orbit as a rotating electrically charged ‘cloud’ indeed. Now, from Maxwell’s equations (or from your high school classes even), you know that a rotating electric charged body creates a magnetic dipole. So an electron should behave just like a tiny bar magnet. Of course, we have to make certain assumptions about the distribution of the charge in space but, in general, we can write that the magnetic dipole moment μ is equal to:

formule magnetic dipole moment

In case you want to know, in detail, where this formula comes from, let me refer you to Feynman once again, but trust me – for once :-) – it’s quite straightforward indeed: the L in this formula is the angular momentum, which may be the spin angular momentum, the orbital angular momentum, or the total angular momentum. The e and m are, of course, the charge and mass of the electron respectively.

So that’s a good and nice-looking formula, and it’s actually even correct except for the spin angular momentum as measured in experiments. [You’ll wonder how we can measure orbital and spin angular momentum respectively, but I’ll talk about an 1921 experiment in a few minutes, and so that will give you some clue as to that mystery. :-)] To be precise, it turns out that one has to multiply the above formula for μ with a factor which is referred to as the g-factor. [And, no, it’s got nothing to do with the gravitational constant or… Well… Nothing. :-)] So, for the spin angular momentum, the formula should be:

formula spin angular momentum

Experimental physicists are constantly checking that value and they know measure it to be something like g = is 2.00231930419922 ± 1.5×10−12. So what’s the explanation for that g? Where does it come from? There is, in fact, a classical explanation for it, which I’ll copy hereunder (yes, from Wikipedia). This classical explanation is based on assuming that the distribution of the electric charge of the electron and its mass does not coincide:

classical theory

Why do I mention this classical explanation? Well… Because, in most popular books on quantum mechanics (including Feynman’s delightful QED), you’ll read that (a) the value for g can be derived from a quantum-theoretical equation known as Dirac’s equation (or ‘Dirac theory’, as it’s referred to above) and, more importantly, that (b) physicists call the “accurate prediction of the electron g-factor” from quantum theory (i.e. ‘Dirac’s theory’ in this case) “one of the greatest triumphs” of the theory.

So what about it? Well… Whatever the merits of both explanations (classical or quantum-mechanical), they are surely not the reason why physicists abandoned the classical theory. So what was the reason then? What a stupid question! You know that already! The Rutherford model was, quite simply, not consistent: according to classical theory, electrons should just radiate their energy away and spiral into the nucleus. More in particular, there was yet another experiment that wasn’t consistent with classical theory, and it’s one that’s very relevant for the discussion at hand: it’s the so-called Stern-Gerlach experiment.

It was just as ‘revolutionary’ as the Michelson-Morley experiment (which couldn’t measure the speed of light), or the discovery of the positron in 1932. The Stern-Gerlach experiment was done in 1921, so that’s many years before quantum theory replaced classical theory and, hence, it’s not one of those experiments confirming quantum theory. No. Quite the contrary. It was, in fact, one of the experiments that triggered the so-called quantum revolution. Let me insert the experimental set-up and summarize it (below).


  • The German scientists Otto Stern and Walther Gerlach produced a beam of electrically-neutral silver atoms and let it pass through a (non-uniform) magnetic field. Why silver atoms? Well… Silver atoms are easy to handle (in a lab, that is) and easy to detect with a photoplate.
  • These atoms came out of an oven (literally), in which the silver was being evaporated (yes, one can evaporate silver), so they had no special orientation in space and, so Stern and Gerlach thought, the magnetic moment (or spin) of the outer electrons in these atoms would point into all possible directions in space.
  • As expected, the magnetic field did deflect the silver atoms, just like it would deflect little dipole magnets if you would shoot them through the field. However, the pattern of deflection was not the one which they expected. Instead of hitting the plate all over the place, within some contour, of course, only the contour itself was hit by the atoms. There was nothing in the middle!
  • And… Well… It’s a long story, but I’ll make it short. There was only one possible explanation for that behavior, and that’s that the magnetic moments – and, therefore the spins – had only two orientations in space, and two possible values only which – Surprise, surprise! – are ±ħ/2 (so that’s half the value of the spin angular momentum of photons, which explains the spin-1/2 terminology).

The spin angular momentum of an electron is more popularly known as ‘up’ or ‘down’.

So… What about it? Well… It explains why a atomic orbital can have two electrons, rather than one only and, as such, the behavior of the electron here is the basis of the so-called periodic table, which explains all properties of the chemical elements. So… Yes. Quantum theory is relevant, I’d say. :-)


This has been a terribly long post, and you may no longer remember what I promised to do. What I promised to do, is to write some more about the difference between a photon and an electron and, more in particular, I said I’d write more about their charge, and that “weird number”: their spin. I think I lived up to that promise. The summary is simple:

  1. Photons have no (electric) charge, but they do have spin. Their spin is linked to their polarization in the xy-plane (if z is the direction of propagation) and, because of the strangeness of quantum mechanics (i.e. the quantization of (quantum) observables), the value for this spin is either +ħ orħ, which explains why they are referred to as spin-one particles (because either value is one unit of the Planck constant).
  2. Electrons have both electric charge as well as spin. Their spin is different and is, in fact, related to their electric charge. It can be interpreted as the magnetic dipole moment, which results from the fact we have a spinning charge here. However, again, because of the strangeness of quantum mechanics, their dipole moment is quantized and can take only one of two values: ±ħ/2, which is why they are referred to as spin-1/2 particles.

So now you know everything you need to know about photons and electrons, and then I mean real photons and electrons, including their properties of charge and spin. So they’re no longer ‘fake’ spin-zero photons and electrons now. Isn’t that great? You’ve just discovered the real world! :-)

So… I am done—for the moment, that is… :-) If anything, I hope this post shows that even those ‘weird’ quantum numbers are rooted in ‘physical reality’ (or in physical ‘geometry’ at least), and that quantum theory may be ‘crazy’ indeed, but that it ‘explains’ experimental results. Again, as Feynman says:

“We understand how Nature works, but not why Nature works that way. Nobody understands that. I can’t explain why Nature behave in this particular way. You may not like quantum theory and, hence, you may not accept it. But physicists have learned to realize that whether they like a theory or not is not the essential question. Rather, it is whether or not the theory gives predictions that agree with experiment. The theory of quantum electrodynamics describes Nature as absurd from the point of view of common sense. But it agrees fully with experiment. So I hope you can accept Nature as She is—absurd.”

Frankly speaking, I am not quite prepared to accept Nature as absurd: I hope that some more familiarization with the underlying mathematical forms and shapes will make it look somewhat less absurd. More, I hope that such familiarization will, in the end, make everything look just as ‘logical’, or ‘natural’ as the two ways in which amplitudes can ‘interfere’.

Post scriptum: I said I would come back to the fact that, in the analysis of orbital and spin angular momentum of a photon (OAM and SAM), the frequency or energy variable sort of ‘disappears’. So why’s that? Let’s look at those expressions for |L〉 and |R〉 once again:

Formula L spin

Formula R spin

What’s written here really? If |L〉 and |R〉 are supposed to be equal to either +ħ orħ, then that product of ei(kz–ωt) with the 3×1 matrix (which is a ‘column vector’ in this case) does not seem to make much sense, does it? Indeed, you’ll remember that ei(kz–ωt) just a regular wave function. To be precise, its phase is φ = kz–ωt (with z the direction of propagation of the wave), and its real and imaginary part can be written as eiφ = cos(φ) + isin(φ) = a + bi. Multiplying it with that 3×1 column vector (1, i, 0) or (1, –i, 0) just yields another 3×1 column vector. To be specific, we get:

  1. The 3×1 ‘vector’ (a + bi, –b+ai, 0) for |L〉, and
  2. The 3×1 ‘vector’ (a + bi, b–ai, 0) for |R〉.

So we have two new ‘vectors’ whose components are complex numbers. Furthermore, we can note that their ‘x’-component is the same, their ‘y’-component is each other’s opposite –b+ai = –(b–ai), and their ‘z’-component is 0.

So… Well… In regard to their ‘y’-component, I should note that’s just the result of the multiplication with i and/or –i: multiplying a complex number with i amounts to a 90° degree counterclockwise rotation, while multiplication with –i amounts to the same but clockwise. Hence, we must arrive at two complex numbers that are each other’s opposite. [Indeed, in complex analysis, the value –1 = eiπ = eiπ is a 180° rotation, both clockwise (φ = –π) or counterclockwise (φ = +π), of course!.]

Hmm… Still… What does it all mean really? The truth is that it takes some more advanced math to interpret the result. To be precise, pure quantum states, such |L〉 and |R〉 here, are represented by so-called ‘state vectors’ in a Hilbert space over complex numbers. So that’s what we’ve got here. So… Well… I can’t say much more about this right now: we’ll just need to study some more before we’ll ‘understand’ those expressions for |L〉 and |R〉. So let’s not worry about it right now. We’ll get there.

Just for the record, I should note that, initially, I thought 1/√2 factor in front gave some clue as to what’s going on here: 1/√2 ≈ 0.707 is a factor that’s used to calculate the root mean square (RMS) value for a sine wave. It’s illustrated below. The RMS value is a ‘special average’ one can use to calculate the energy or power (i.e. energy per time unit) of a wave. [Using the term ‘average’ is misleading, because the average of a sine wave is 1/2 over half a cycle, and 0 over a fully cycle, as you can easily see from the shape of the function. But I guess you know what I mean.]

V-rmsIndeed, you’ll remember that the energy (E) of a wave is proportional to the square of its amplitude (A): E ∼ A2. For example, when we have a constant current I, the power P will be proportional to its square: P ∼ I2. With a varying current (I) and voltage (V), the formula is more complicated but we can simply it using the rms values: Pavg = VRMS·IRMS.

So… Looking at that formula, should we think of h and/or ħ as some kind of ‘average’ energy, like the energy of a photon per cycle or per radian? That’s an interesting idea so let’s explore it. If the energy of a photon is equal to E = ν = ω/2π = ħω, then we can also write:

h = E/ν and/or ħ = E/ω

So, yes: is the energy of a photon per cycle obviously and, because the phase covers 2π radians during each cycle, and ħ must be the energy of the photon per radian! That’s a great result, isn’t it? It also gives a wonderfully simple interpretation to Planck’s quantum of action!

Well… No. We made at least two mistakes here. The first mistake is that if we think of a photon as wave train being radiated by an atom – which, as we calculated in another post, lasts about 3.2×10–8 seconds – the graph for its energy is going to resemble the graph of its amplitude, so it’s going to die out and each oscillation will carry less and less energy. Indeed, the decay time given here (τ = 3.2×10–8 seconds) was the time it takes for the radiation (we assumed sodium light with a wavelength of 600 nanometer) to die out by a factor 1/e. To be precise, the shape of the energy curve is E = E0e−t/τ, and so it’s an envelope resembling the A(t) curve below.

decay time

Indeed, remember, the energy of a wave is determined not only by its frequency (or wavelength) but also by its amplitude, and so we cannot assume the amplitude of a ‘photon wave’ is going to be the same everywhere. Just for the record: note that the energy of a wave is proportional to the frequency (doubling the frequency doubles the energy) but, when linking it to the amplitude, we should remember that the energy is proportional to the square of the amplitude, so we write E ∼ A2.

The second mistake is that both ν and ω are the light frequency (expressed in cycles or radians respectively) of the light per second, i.e per time unit. So that’s not the number of cycles or radians that we should associate with the wavetrain! We should use the number of cycles (or radians) packed into one photon. We can calculate that easily from the value for the decay time τ. Indeed, for sodium light, which which has a frequency of 500 THz (500×1012 oscillations per second) and a wavelength of 600 nm (600×10–9 meter), we said the radiation lasts about 3.2×10–8 seconds (that’s actually the time it takes for the radiation’s energy to die out by a factor 1/e, so the wavetrain will actually last (much) longer, but so the amplitude becomes quite small after that time), and so that makes for some 16 million oscillations, and a ‘length’ of the wavetrain of about 9.6 meter! Now, the energy of a sodium light photon is about 2eV (h·ν ≈ 4×10−15 electronvolt·second times 0.5×1015 cycles/sec = 2eV) and so we could say the average energy of each of those 16 million oscillations would be 2/(16×106) eV = 0.125×10–6 eV. But, from all that I wrote above, it’s obvious that this number doesn’t mean all that much, because the wavetrain is not likely to be shaped very regularly.

So, in short, we cannot say that h is the photon energy per cycle or that ħ is the photon energy per radian!  That’s not only simplistic but, worse, false. Planck’s constant is what is is: a factor of proportionality for which there is no simple ‘arithmetic’ and/or ‘geometric’ explanation. It’s just there, and we’ll need to study some more math to truly understand the meaning of those two expressions for |L〉 and |R〉.

Having said that, and having thought about it all some more, I find there’s, perhaps, a more interesting way to re-write E = ν:

h = E/ν = (λ/c)E = T·E

T? Yes. T is the period, so that’s the time needed for one oscillation: T is just the reciprocal of the frequency (T = 1/ν = λ/c). It’s a very tiny number, because we divide (1) a very small number (the wavelength of light measured in meter) by (2) a very large number (the distance (in meter) traveled by light). For sodium light, T is equal to 2×10–15 seconds, so that’s two femtoseconds, i.e. two quadrillionths of a second.

Now, we can think of the period as a fraction of a second, and smaller fractions are, obviously, associated with higher frequencies and, what amounts to the same, shorter wavelengths (and, hence, higher energies). However, when writing T = λ/c, we can also think of T being another kind of fraction: λ/can also be written as the ratio of the wavelength and the distance traveled by light in one second, i.e. a light-second (remember that light-seconds are measures of length, not of distance). The two fractions are the same when we express time and distance in equivalent units indeed (i.e. distance in light-second, or time in sec/units).

So that links h to both time as well as distance and we may look at h as some kind of fraction or energy ‘density’ even (although the term ‘density’ in this context is not quite accurate). In the same vein, I should note that, if there’s anything that should make you think about h, is the fact that its value depends on how we measure time and distance. For example, if w’d measure time in other units (for example, the more ‘natural’ unit defined by the time light needs to travel one meter), then we get a different unit for h. And, of course, you also know we can relate energy to distance (1 J = 1 N·m). But that’s something that’s obvious from h‘s dimension (J·s), and so I shouldn’t dwell on that.

Hmm… Interesting thoughts. I think I’ll develop these things a bit further in one of my next posts. As for now, however, I’ll leave you with your own thoughts on it.

Note 1: As you’re trying to follow what I am writing above, you may have wondered whether or not the duration of the wavetrain that’s emitted by an atom is a constant, or whether or not it packs some constant number of oscillations. I’ve thought about that myself as I wrote down the following formula at some point of time:

h = (the duration of the wave)·(the energy of the photon)/(the number of oscillations in the wave)

As mentioned above, interpreting h as some kind of average energy per oscillation is not a great idea but, having said that, it would be a good exercise for you to try to answer that question in regard to the duration of these wavetrains, and/or the number of oscillations packed into them, yourself. There are various formulas for the Q of an atomic oscillator, but the simplest one is the one expressed in terms of the so-called classical electron radius r0:

Q = 3λ/4πr0

As you can see, the Q depends on λ: higher wavelengths (so lower energy) are associated with higher Q. In fact, the relationship is directly proportional: twice the wavelength will give you twice the Q. Now, the formula for the decay time τ is also dependent on the wavelength. Indeed, τ = 2Q/ω = Qλ/πc. Combining the two formulas yields (if I am not mistaken):

τ = 3λ2/4π2r0c.

Hence, the decay time is proportional to the square of the wavelength. Hmm… That’s an interesting result. But I really need to learn how to be a bit shorter, and so I’ll really let you think now about what all this means or could mean.

Note 2: If that 1/√2 factor has nothing to do with some kind of rms calculation, where does it come from? I am not sure. It’s related to state vector math, it seems, and I haven’t started that as yet. I just copy a formula from Wikipedia here, which shows the same factor in front:

state vector

The formula above is said to represent the “superposition of joint spin states for two particles”. My gut instinct tells me 1/√2 factor has to do with the normalization condition and/or with the fact that we have to take the (absolute) square of the (complex-valued) amplitudes to get the probability.


The Strange Theory of Light and Matter (II)

If we limit our attention to the interaction between light and matter (i.e. the behavior of photons and electrons only—so we we’re not talking quarks and gluons here), then the ‘crazy ideas’ of quantum mechanics can be summarized as follows:

  1. At the atomic or sub-atomic scale, we can no longer look at light as an electromagnetic wave. It consists of photons, and photons come in blobs. Hence, to some extent, photons are ‘particle-like’.
  2. At the atomic or sub-atomic scale, electrons don’t behave like particles. For example, if we send them through a slit that’s small enough, we’ll observe a diffraction pattern. Hence, to some extent, electrons are ‘wave-like’.

In short, photons aren’t waves, but they aren’t particles either. Likewise, electrons aren’t particles, but they aren’t waves either. They are neither. The weirdest thing of all, perhaps, is that, while light and matter are two very different things in our daily experience – light and matter are opposite concepts, I’d say, just like particles and waves are opposite concepts) – they look pretty much the same in quantum physics: they are both represented by a wavefunction.

Let me immediately make a little note on terminology here. The term ‘wavefunction’ is a bit ambiguous, in my view, because it makes one think of a real wave, like a water wave, or an electromagnetic wave. Real waves are described by real-valued wave functions describing, for example, the motion of a ball on a spring, or the displacement of a gas (e.g. air) as a sound wave propagates through it, or – in the case of an electromagnetic wave – the strength of the electric and magnetic field.

You may have questions about the ‘reality’ of fields, but electromagnetic waves – i.e. the classical description of light – are quite ‘real’ too, even if:

  1. Light doesn’t travel in a medium (like water or air: there is no aether), and
  2. The magnitude of the electric and magnetic field (they are usually denoted by E and B) depend on your reference frame: if you calculate the fields using a moving coordinate system, you will get a different mixture of E and B. Therefore, E and B may not feel very ‘real’ when you look at them separately, but they are very real when we think of them as representing one physical phenomenon: the electromagnetic interaction between particles. So the E and B mix is, indeed, a dual representation of one reality. I won’t dwell on that, as I’ve done that in another post of mine.

How ‘real’ is the quantum-mechanical wavefunction?

The quantum-mechanical wavefunction is not like any of these real waves. In fact, I’d rather use the term ‘probability wave’ but, apparently, that’s used only by bloggers like me :-) and so it’s not very scientific. That’s for a good reason, because it’s not quite accurate either: the wavefunction in quantum mechanics represents probability amplitudes, not probabilities. So we should, perhaps, be consistent and term it a ‘probability amplitude wave’ – but then that’s too cumbersome obviously, so the term ‘probability wave’ may be confusing, but it’s not so bad, I think.

Amplitudes and probabilities are related as follows:

  1. Probabilities are real numbers between 0 and 1: they represent the probability of something happening, e.g. a photon moves from point A to B, or a photon is absorbed (and emitted) by an electron (i.e. a ‘junction’ or ‘coupling’, as you know).
  2. Amplitudes are complex numbers, or ‘arrows’ as Feynman calls them: they have a length (or magnitude) and a direction.
  3. We get the probabilities by taking the (absolute) square of the amplitudes.

So photons aren’t waves, but they aren’t particles either. Likewise, electrons aren’t particles, but they aren’t waves either. They are neither. So what are they? We don’t have words to describe what they are. Some use the term ‘wavicle’ but that doesn’t answer the question, because who knows what a ‘wavicle’ is? So we don’t know what they are. But we do know how they behave. As Feynman puts it, when comparing the behavior of light and then of electrons in the double-slit experiment—struggling to find language to describe what’s going on: “There is one lucky break: electrons behave just like light.”

He says so because of that wave function: the mathematical formalism is the same, for photons and for electrons. Exactly the same? […] But that’s such a weird thing to say, isn’t it? We can’t help thinking of light as waves, and of electrons as particles. They can’t be the same. They’re different, aren’t they? They are.

Scales and senses

To some extent, the weirdness can be explained because the scale of our world is not atomic or sub-atomic. Therefore, we ‘see’ things differently. Let me say a few words about the instrument we use to look at the world: our eye.

Our eye is particular. The retina has two types of receptors: the so-called cones are used in bright light, and distinguish color, but when we are in a dark room, the so-called rods become sensitive, and it is believed that they actually can detect a single photon of light. However, neural filters only allow a signal to pass to the brain when at least five photons arrive within less than a tenth of a second. A tenth of a second is, roughly, the averaging time of our eye. So, as Feynman puts it: “If we were evolved a little further so we could see ten times more sensitively, we wouldn’t have this discussion—we would all have seen very dim light of one color as a series of intermittent little flashes of equal intensity.” In other words, the ‘particle-like’ character of light would have been obvious to us.

Let me make a few more remarks here, which you may or may not find useful. The sense of ‘color’ is not something ‘out there':  colors, like red or brown, are experiences in our eye and our brain. There are ‘pigments’ in the cones (cones are the receptors that work only if the intensity of the light is high enough) and these pigments absorb the light spectrum somewhat differently, as a result of which we ‘see’ color. Different animals see different things. For example, a bee can distinguish between white paper using zinc white versus lead white, because they reflect light differently in the ultraviolet spectrum, which the bee can see but we don’t. Bees can also tell the direction of the sun without seeing the sun itself, because they are sensitive to polarized light, and the scattered light of the sky (i.e. the blue sky as we see it) is polarized. The bee can also notice flicker up to 200 oscillations per second, while we see it only up to 20, because our averaging time is like a tenth of a second, which is short for us, but so the averaging time of the bee is much shorter. So we cannot see the quick leg movements and/or wing vibrations of bees, but the bee can!

Sometimes we can’t see any color. For example, we see the night sky in ‘black and white’ because the light intensity is very low, and so it’s our rods, not the cones, that process the signal, and so these rods can’t ‘see’ color. So those beautiful color pictures of nebulae are not artificial (although the pictures are often enhanced). It’s just that the camera that is used to take those pictures (film or, nowadays, digital) is much more sensitive than our eye. 

Regardless, color is a quality which we add to our experience of the outside world ourselves. What’s out there are electromagnetic waves with this or that wavelength (or, what amounts to the same, this or that frequency). So when critics of the exact sciences say so much is lost when looking at (visible) light as an electromagnetic wave in the range of 430 to 790 teraherz, they’re wrong. Those critics will say that physics reduces reality. That is not the case.

What’s going on is that our senses process the signal that they are receiving, especially when it comes to vision. As Feynman puts it: “None of the other senses involves such a large amount of calculation, so to speak, before the signal gets into a nerve that one can make measurements on. The calculations for all the rest of the senses usually happen in the brain itself, where it is very difficult to get at specific places to make measurements, because there are so many interconnections. Here, with the visual sense, we have the light, three layers of cells making calculations, and the results of the calculations being transmitted through the optic nerve.”

Hence, things like color and all of the other sensations that we have are the object of study of other sciences, including biochemistry and neurobiology, or physiology. For all we know, what’s ‘out there’ is, effectively, just ‘boring’ stuff, like electromagnetic radiation, energy and ‘elementary particles’—whatever they are. No colors. Just frequencies. :-)

Light versus matter

If we accept the crazy ideas of quantum mechanics, then the what and the how become one and the same. Hence we can say that photons and electrons are a wavefunction somewhere in space. Photons, of course, are always traveling, because they have energy but no rest mass. Hence, all their energy is in the movement: it’s kinetic, not potential. Electrons, on the other hand, usually stick around some nucleus. And, let’s not forget, they have an electric charge, so their energy is not only kinetic but also potential.

But, otherwise, it’s the same type of ‘thing’ in quantum mechanics: a wavefunction, like those below.


Why diagram A and B? It’s just to emphasize the difference between a real-valued wave function and those ‘probability waves’ we’re looking at here (diagram C to H). A and B represent a mass on a spring, oscillating at more or less the same frequency but a different amplitude. The amplitude here means the displacement of the mass. The function describing the displacement of a mass on a spring (so that’s diagram A and B) is an example of a real-valued wave function: it’s a simple sine or cosine function, as depicted below. [Note that a sine and a cosine are the same function really, except for a phase difference of 90°.]

cos and sine

Let’s now go back to our ‘probability waves’. Photons and electrons, light and matter… The same wavefunction? Really? How can the sunlight that warms us up in the morning and makes trees grow be the same as our body, or the tree? The light-matter duality that we experience must be rooted in very different realities, isn’t it?

Well… Yes and no. If we’re looking at one photon or one electron only, it’s the same type of wavefunction indeed. The same type… OK, you’ll say. So they are the same family or genus perhaps, as they say in biology. Indeed, both of them are, obviously, being referred to as ‘elementary particles’ in the so-called Standard Model of physics. But so what makes an electron and a photon specific as a species? What are the differences?

There’re  quite a few, obviously:

1. First, as mentioned above, a photon is a traveling wave function and, because it has no rest mass, it travels at the ultimate speed, i.e. the speed of light (c). An electron usually sticks around or, if it travels through a wire, it travels at very low speeds. Indeed, you may find it hard to believe, but the drift velocity of the free electrons in a standard copper wire is measured in cm per hour, so that’s very slow indeed—and while the electrons in an electron microscope beam may be accelerated up to 70% of the speed of light, and close to in those huge accelerators, you’re not likely to find an electron microscope or accelerator in Nature. In fact, you may want to remember that a simple thing like electricity going through copper wires in our houses is a relatively modern invention. :-)

So, yes, those oscillating wave functions in those diagrams above are likely to represent some electron, rather than a photon. To be precise, the wave functions above are examples of standing (or stationary) waves, while a photon is a traveling wave: just extend that sine and cosine function in both directions if you’d want to visualize it or, even better, think of a sine and cosine function in an envelope traveling through space, such as the one depicted below.

Photon wave

Indeed, while the wave function of our photon is traveling through space, it is likely to be limited in space because, when everything is said and done, our photon is not everywhere: it must be somewhere. 

At this point, it’s good to pause and think about what is traveling through space. It’s the oscillation. But what’s the oscillation? There is no medium here, and even if there would be some medium (like water or air or something like aether—which, let me remind you, isn’t there!), the medium itself would not be moving, or – I should be precise here – it would only move up and down as the wave propagates through space, as illustrated below. To be fully complete, I should add we also have longitudinal waves, like sound waves (pressure waves): in that case, the particles oscillate back and forth along the direction of wave propagation. But you get the point: the medium does not travel with the wave.


When talking electromagnetic waves, we have no medium. These E and B vectors oscillate but is very wrong to assume they use ‘some core of nearby space’, as Feynman puts it. They don’t. Those field vectors represent a condition at one specific point (admittedly, a point along the direction of travel) in space but, for all we know, an electromagnetic wave travels in a straight line and, hence, we can’t talk about its diameter or so.

Still, as mentioned above, we can imagine, more or less, what E and B stand for (we can use field line to visualize them, for instance), even if we have to take into account their relativity (calculating their values from a moving reference frame results in different mixtures of E and B). But what are those amplitudes? How should we visualize them?

The honest answer is: we can’t. They are what they are: two mathematical quantities which, taken together, form a two-dimensional vector, which we square to find a value for a real-life probability, which is something that – unlike the amplitude concept – does make sense to us. Still, that representation of a photon above (i.e. the traveling envelope with a sine and cosine inside) may help us to ‘understand’ it somehow. Again, you absolute have to get rid of the idea that these ‘oscillations’ would somehow occupy some physical space. They don’t. The wave itself has some definite length, for sure, but that’s a measurement in the direction of travel, which is often denoted as x when discussing uncertainty in its position, for example—as in the famous Uncertainty Principle (ΔxΔp > h).

You’ll say: Oh!—but then, at the very least, we can talk about the ‘length’ of a photon, can’t we? So then a photon is one-dimensional at least, not zero-dimensional! The answer is yes and no. I’ve talked about this before and so I’ll be short(er) on it now. A photon is emitted by an atom when an electron jumps from one energy level to another. It thereby emits a wave train that lasts about 10–8 seconds. That’s not very long but, taking into account the rather spectacular speed of light (3×10m/s), that still makes for a wave train with a length of not less than 3 meter. […] That’s quite a length, you’ll say. You’re right. But you forget that light travels at the speed of light and, hence, we will see this length as zero because of the relativistic length contraction effect. So… Well… Let me get back to the question: if photons and electrons are both represented by a wavefunction, what makes them different?

2. A more fundamental difference between photons and electrons is how they interact with each other.

From what I’ve written above, you understand that probability amplitudes are complex numbers, or ‘arrows’, or ‘two-dimensional vectors’. [Note that all of these terms have precise mathematical definitions and so they’re actually not the same, but the difference is too subtle to matter here.] Now, there are two ways of combining amplitudes, which are referred to as ‘positive’ and ‘negative’ interference respectively. I should immediately note that there’s actually nothing ‘positive’ or ‘negative’ about the interaction: we’re just putting two arrows together, and there are two ways to do that. That’s all.

The diagrams below show you these two ways. You’ll say: there are four! However, remember that we square an arrow to get a probability. Hence, the direction of the final arrow doesn’t matter when we’re taking the square: we get the same probability. It’s the direction of the individual amplitudes that matters when combining them. So the square of A+B is the same as the square of –(A+B) = –A+(–B) = –AB. Likewise, the square of AB is the same as the square of –(AB) = –A+B.

vector addition

These are the only two logical possibilities for combining arrows. I’ve written ad nauseam about this elsewhere: see my post on amplitudes and statistics, and so I won’t go into too much detail here. Or, in case you’d want something less than a full mathematical treatment, I can refer you to my previous post also, where I talked about the ‘stopwatch’ and the ‘phase': the convention for the stopwatch is to have its hand turn clockwise (obviously!) while, in quantum physics, the phase of a wave function will turn counterclockwise. But so that’s just convention and it doesn’t matter, because it’s the phase difference between two amplitudes that counts. To use plain language: it’s the difference in the angles of the arrows, and so that difference is just the same if we reverse the direction of both arrows (which is equivalent to putting a minus sign in front of the final arrow).

OK. Let me get back to the lesson. The point is: this logical or mathematical dichotomy distinguishes bosons (i.e. force-carrying ‘particles’, like photons, which carry the electromagnetic force) from fermions (i.e. ‘matter-particles’, such as electrons and quarks, which make up protons and neutrons). Indeed, the so-called ‘positive’ and ‘negative’ interference leads to two very different behaviors:

  1. The probability of getting a boson where there are already present, is n+1 times stronger than it would be if there were none before.
  2. In contrast, the probability of getting two electrons into exactly the same state is zero. 

The behavior of photons makes lasers possible: we can pile zillions of photon on top of each other, and then release all of them in one powerful burst. [The ‘flickering’ of a laser beam is due to the quick succession of such light bursts. If you want to know how it works in detail, check my post on lasers.]

The behavior of electrons is referred to as Fermi’s exclusion principle: it is only because real-life electrons can have one of two spin polarizations (i.e. two opposite directions of angular momentum, which are referred to as ‘up’ or ‘down’, but they might as well have been referred to as ‘left’ or ‘right’) that we find two electrons (instead of just one) in any atomic or molecular orbital.

So, yes, while both photons and electrons can be described by a similar-looking wave function, their behavior is fundamentally different indeed. How is that possible? Adding and subtracting ‘arrows’ is a very similar operation, isn’it?

It is and it isn’t. From a mathematical point of view, I’d say: yes. From a physics point of view, it’s obviously not very ‘similar’, as it does lead to these two very different behaviors: the behavior of photons allows for laser shows, while the behavior of electrons explain (almost) all the peculiarities of the material world, including us walking into doors. :-) If you want to check it out for yourself, just check Feynman’s Lectures for more details on this or, else, re-read my posts on it indeed.

3. Of course, there are even more differences between photons and electrons than the two key differences I mentioned above. Indeed, I’ve simplified a lot when I wrote what I wrote above. The wavefunctions of electrons in orbit around a nucleus can take very weird shapes, as shown in the illustration below—and please do google a few others if you’re not convinced. As mentioned above, they’re so-called standing waves, because they occupy a well-defined position in space only, but standing waves can look very weird. In contrast, traveling plane waves, or envelope curves like the one above, are much simpler.


In short: yes, the mathematical representation of photons and electrons (i.e. the wavefunction) is very similar, but photons and electrons are very different animals indeed.

Potentiality and interconnectedness

I guess that, by now, you agree that quantum theory is weird but, as you know, quantum theory does explain all of the stuff that couldn’t be explained before: “It works like a charm”, as Feynman puts it. In fact, he’s often quoted as having said the following:

“It is often stated that of all the theories proposed in this century, the silliest is quantum theory. Some say the the only thing that quantum theory has going for it, in fact, is that it is unquestionably correct.”

Silly? Crazy? Uncommon-sensy? Truth be told, you do get used to thinking in terms of amplitudes after a while. And, when you get used to them, those ‘complex’ numbers are no longer complicated. :-) Most importantly, when one thinks long and hard enough about it (as I am trying to do), it somehow all starts making sense.

For example, we’ve done away with dualism by adopting a unified mathematical framework, but the distinction between bosons and fermions still stands: an ‘elementary particle’ is either this or that. There are no ‘split personalities’ here. So the dualism just pops up at a different level of description, I’d say. In fact, I’d go one step further and say it pops up at a deeper level of understanding.

But what about the other assumptions in quantum mechanics. Some of them don’t make sense, do they? Well… I struggle for quite a while with the assumption that, in quantum mechanics, anything is possible really. For example, a photon (or an electron) can take any path in space, and it can travel at any speed (including speeds that are lower or higher than light). The probability may be extremely low, but it’s possible.

Now that is a very weird assumption. Why? Well… Think about it. If you enjoy watching soccer, you’ll agree that flying objects (I am talking about the soccer ball here) can have amazing trajectories. Spin, lift, drag, whatever—the result is a weird trajectory, like the one below:


But, frankly, a photon taking the ‘southern’ route in the illustration below? What are the ‘wheels and gears’ there? There’s nothing sensible about that route, is there?


In fact, there’s at least three issues here:

  1. First, you should note that strange curved paths in the real world (such as the trajectories of billiard or soccer balls) are possible only because there’s friction involved—between the felt of the pool table cloth and the ball, or between the balls, or, in the case of soccer, between the ball and the air. There’s no friction in the vacuum. Hence, in empty space, all things should go in a straight line only.
  2. While it’s quite amazing what’s possible, in the real world that is, in terms of ‘weird trajectories’, even the weirdest trajectories of a billiard or soccer ball can be described by a ‘nice’ mathematical function. We obviously can’t say the same of that ‘southern route’ which a photon could follow, in theory that is. Indeed, you’ll agree the function describing that trajectory cannot be ‘nice’. So even we’d allow all kinds of ‘weird’ trajectories, shouldn’t we limit ourselves to ‘nice’ trajectories only? I mean: it doesn’t make sense to allow the photons traveling from your computer screen to your retina take some trajectory to the Sun and back, does it?
  3. Finally, and most fundamentally perhaps, even when we would assume that there’s some mechanism combining (a) internal ‘wheels and gears’ (such as spin or angular momentum) with (b) felt or air or whatever medium to push against, what would be the mechanism determining the choice of the photon in regard to these various paths? In Feynman’s words: How does the photon ‘make up its mind’?

Feynman answers these questions, fully or partially (I’ll let you judge), when discussing the double-slit experiment with photons:

“Saying that a photon goes this or that way is false. I still catch myself saying, “Well, it goes either this way or that way,” but when I say that, I have to keep in mind that I mean in the sense of adding amplitudes: the photon has an amplitude to go one way, and an amplitude to go the other way. If the amplitudes oppose each other, the light won’t get there—even though both holes are open.”

It’s probably worth re-calling the results of that experiment here—if only to help you judge whether or not Feynman fully answer those questions above!

The set-up is shown below. We have a source S, two slits (A and B), and a detector D. The source sends photons out, one by one. In addition, we have two special detectors near the slits, which may or may not detect a photon, depending on whether or not they’re switched on as well as on their accuracy.

set-up photons

First, we close one of the slits, and we find that 1% of the photons goes through the other (so that’s one photon for every 100 photons that leave S). Now, we open both slits to study interference. You know the results already:

  1. If we switch the detectors off (so we have no way of knowing where the photon went), we get interference. The interference pattern depends on the distance between A and B and varies from 0% to 4%, as shown in diagram (a) below. That’s pretty standard. As you know, classical theory can explain that too assuming light is an electromagnetic wave. But so we have blobs of energy – photons – traveling one by one. So it’s really that double-slit experiment with electrons, or whatever other microscopic particles (as you know, they’ve done these interference electrons with large molecules as well—and they get the same result!). We get the interference pattern by using those quantum-mechanical rules to calculate probabilities: we first add the amplitudes, and it’s only when we’re finished adding those amplitudes, that we square the resulting arrow to the final probability.
  2. If we switch those special detectors on, and if they are 100% reliable (i.e. all photons going through are being detected), then our photon suddenly behaves like a particle, instead of as a wave: they will go through one of the slits only, i.e. either through A, or, alternatively, through B. So the two special detectors never go off together. Hence, as Feynman puts it: we shouldn’t think there is “sneaky way that the photon divides in two and then comes back together again.” It’s one or the other way and, and there’s no interference: the detector at D goes off 2% of the time, which is the simple sum of the probabilities for A and B (i.e. 1% + 1%).
  3. When the special detectors near A and B are not 100% reliable (and, hence, do not detect all photons going through), we have three possible final conditions: (i) A and D go off, (ii) B and D go off, and (iii) D goes off alone (none of the special detectors went off). In that case, we have a final curve that’s a mixture, as shown in diagram (c) and (d) below. We get it using the same quantum-mechanical rules: we add amplitudes first, and then we square to get the probabilities.

double-slit photons - results

Now, I think you’ll agree with me that Feynman doesn’t answer my (our) question in regard to the ‘weird paths’. In fact, all of the diagrams he uses assume straight or nearby paths. Let me re-insert two of those diagrams below, to show you what I mean.

 Many arrowsFew arrows

So where are all the strange non-linear paths here? Let me, in order to make sure you get what I am saying here, insert that illustration with the three crazy routes once again. What we’ve got above (Figure 33 and 34) is not like that. Not at all: we’ve got only straight lines there! Why? The answer to that question is easy: the crazy paths don’t matter because their amplitudes cancel each other out, and so that allows Feynman to simplify the whole situation and show all the relevant paths as straight lines only.


Now, I struggled with that for quite a while. Not because I can’t see the math or the geometry involved. No. Feynman does a great job showing why those amplitudes cancel each other out indeed (if you want a summary, see my previous post once again).  My ‘problem’ is something else. It’s hard to phrase it, but let me try: why would we even allow for the logical or mathematical possibility of ‘weird paths’ (and let me again insert that stupid diagram below) if our ‘set of rules’ ensures that the truly ‘weird’ paths (like that photon traveling from your computer screen to your eye doing a detour taking it to the Sun and back) cancel each other out anyway? Does that respect Occam’s Razor? Can’t we devise some theory including ‘sensible’ paths only?

Of course, I am just an autodidact with limited time, and I know hundreds (if not thousands) of the best scientists have thought long and hard about this question and, hence, I readily accept the answer is quite simply: no. There is no better theory. I accept that answer, ungrudgingly, not only because I think I am not so smart as those scientists but also because, as I pointed out above, one can’t explain any path that deviates from a straight line really, as there is no medium, so there are no ‘wheels and gears’. The only path that makes sense is the straight line, and that’s only because…

Well… Thinking about it… We think the straight path makes sense because we have no good theory for any of the other paths. Hmm… So, from a logical point of view, assuming that the straight line is the only reasonable path is actually pretty random too. When push comes to shove, we have no good theory for the straight line either!

You’ll say I’ve just gone crazy. […] Well… Perhaps you’re right. :-) But… Somehow, it starts to make sense to me. We allow for everything to, then, indeed weed out the crazy paths using our interference theory, and so we do end up with what we’re ending up with: some kind of vague idea of “light not really traveling in a straight line but ‘smelling’ all of the neighboring paths around it and, hence, using a small core of nearby space“—as Feynman puts it.

Hmm… It brings me back to Richard Feynman’s introduction to his wonderful little book, in which he says we should just be happy to know how Nature works and not aspire to know why it works that way. In fact, he’s basically saying that, when it comes to quantum mechanics, the ‘how’ and the ‘why’ are one and the same, so asking ‘why’ doesn’t make sense, because we know ‘how’. He compares quantum theory with the system of calculation used by the Maya priests, which was based on a system of bars and dots, which helped them to do complex multiplications and divisions, for example. He writes the following about it: “The rules were tricky, but they were a much more efficient way of getting an answer to complicated questions (such as when Venus would rise again) than by counting beans.”

When I first read this, I thought the comparison was flawed: if a common Maya Indian did not want to use the ‘tricky’ rules of multiplication and what have you (or, more likely, if he didn’t understand them), he or she could still resort to counting beans. But how do we count beans in quantum mechanics? We have no ‘simpler’ rules than those weird rules about adding amplitudes and taking the (absolute) square of complex numbers so… Well… We actually are counting beans here then:

  1. We allow for any possibility—any path: straight, curved or crooked. Anything is possible.
  2. But all those possibilities are inter-connected. Also note that every path has a mirror image: for every route ‘south’, there is a similar route ‘north’, so to say, except for the straight line, which is a mirror image of itself.
  3. And then we have some clock ticking. Time goes by. It ensures that the paths that are too far removed from the straight line cancel each other. [Of course, you’ll ask: what is too far? But I answered that question –  convincingly, I hope – in my previous post: it’s not about the ‘number of arrows’ (as suggested in the caption under that Figure 34 above), but about the frequency and, hence, the ‘wavelength’ of our photon.]
  4. And so… Finally, what’s left is a limited number of possibilities that interfere with each other, which results in what we ‘see': light seems to use a small core of space indeed–a limited number of nearby paths.

You’ll say… Well… That still doesn’t ‘explain’ why the interference pattern disappears with those special detectors or – what amounts to the same – why the special detectors at the slits never click simultaneously.

You’re right. How do we make sense of that? I don’t know. You should try to imagine what happens for yourself. Everyone has his or her own way of ‘conceptualizing’ stuff, I’d say, and you may well be content and just accept all of the above without trying to ‘imagine’ what’s happening really when a ‘photon’ goes through one or both of those slits. In fact, that’s the most sensible thing to do. You should not try to imagine what happens and just follow the crazy calculus rules.

However, when I think about it, I do have some image in my head. The image is of one of those ‘touch-me-not’ weeds. I quickly googled one of these images, but I couldn’t quite find what I am looking for: it would be more like something that, when you touch it, curls up in a little ball. Any case… You know what I mean, I hope.


You’ll shake your head now and solemnly confirm that I’ve gone mad. Touch-me-not weeds? What’s that got to do with photons? 

Well… It’s obvious you and I cannot really imagine how a photon looks like. But I think of it as a blob of energy indeed, which is inseparable, and which effectively occupies some space (in three dimensions that is). I also think that, whatever it is, it actually does travel through both slits, because, as it interferes with itself, the interference pattern does depend on the space between the two slits as well as the width of those slits. In short, the whole ‘geometry’ of the situation matters, and so the ‘interaction’ is some kind of ‘spatial’ thing. [Sorry for my awfully imprecise language here.]

Having said that, I think it’s being detected by one detector only because only one of them can sort of ‘hook’ it, somehow. Indeed, because it’s interconnected and inseparable, it’s the whole blob that gets hooked, not just one part of it. [You may or may not imagine that the detectors that’s got the best hold of it gets it, but I think that’s pushing the description too much.] In any case, the point is that a photon is surely not like a lizard dropping its tail while trying to escape. Perhaps it’s some kind of unbreakable ‘string’ indeed – and sorry for summarizing string theory so unscientifically here – but then a string oscillating in dimensions we can’t imagine (or in some dimension we can’t observe, like the Kaluza-Klein theory suggests). It’s something, for sure, and something that stores energy in some kind of oscillation, I think.

What it is, exactly, we can’t imagine, and we’ll probably never find out—unless we accept that the how of quantum mechanics is not only the why, but also the what. :-)

Does this make sense? Probably not but, if anything, I hope it fired your imagination at least. :-)

The Strange Theory of Light and Matter (II)

The Strange Theory of Light and Matter (I)

I am of the opinion that Richard Feynman’s wonderful little common-sense introduction to the ‘uncommon-sensy‘ theory of quantum electrodynamics (The Strange Theory of Light and Matter), which were published a few years before his death only, should be mandatory reading for high school students.

I actually mean that: it should just be part of the general education of the first 21st century generation. Either that or, else, the Education Board should include a full-fledged introduction to complex analysis and quantum physics in the curriculum. :-)

Having praised it (just now, as well as in previous posts), I re-read it recently during a trek in Nepal with my kids – I just grabbed the smallest book I could find the morning we left :-) – and, frankly, I now think Ralph Leighton, who transcribed and edited these four short lectures, could have cross-referenced it better. Moreover, there are two or three points where Feynman (or Leighton?) may have sacrificed accuracy for readability. Let me recapitulate the key points and try to improve here and there.

Amplitudes and arrows

The booklet avoids scary mathematical terms and formulas but doesn’t avoid the fundamental concepts behind, and it doesn’t avoid the kind of ‘deep’ analysis one needs to get some kind of ‘feel’ for quantum mechanics either. So what are the simplifications?

A probability amplitude (i.e. a complex number) is, quite simply, an arrow, with a direction and a length. Thus Feynman writes: “Arrows representing probabilities from 0% to 16% [as measured by the surface of the square which has the arrow as its side] have lengths from 0 to 0.4.” That makes sense: such geometrical approach does away, for example, with the need to talk about the absolute square (i.e. the square of the absolute value, or the squared norm) of a complex number – which is what we need to calculate probabilities from probability amplitudes. So, yes, it’s a wonderful metaphor. We have arrows and surfaces now, instead of wave functions and absolute squares of complex numbers.

The way he combines these arrows make sense too. He even notes the difference between photons (bosons) and electrons (fermions): for bosons, we just add arrows; for fermions, we need to subtract them (see my post on amplitudes and statistics in this regard).

There is also the metaphor for the phase of a wave function, which is a stroke of genius really (I mean it): the direction of the ‘arrow’ is determined by a stopwatch hand, which starts turning when a photon leaves the light source, and stops when it arrives, as shown below.

front and back reflection amplitude

OK. Enough praise. What are the drawbacks?

The illustration above accompanies an analysis of how light is either reflected from the front surface of a sheet of a glass or, else, from the back surface. Because it takes more time to bounce off the back surface (the path is associated with a greater distance), the front and back reflection arrows point in different directions indeed (the stopwatch is stopped somewhat later when the photon reflects from the back surface). Hence, the difference in phase (but that’s a term that Feynman also avoids) is determined by the thickness of the glass. Just look at it. In the upper part of the illustration above, the thickness is such that the chance of a photon reflecting off the front or back surface is 5%: we add two arrows, each with a length of 0.2, and then we square the resulting (aka final) arrow. Bingo! We get a surface measuring 0.05, or 5%.

Huh? Yes. Just look at it: if the angle between the two arrows would be 90° exactly, it would be 0.08 or 8%, but the angle is a bit less. In the lower part of the illustration, the thickness of the glass is such that the two arrows ‘line up’ and, hence, they form an arrow that’s twice the length of either arrow alone (0.2 + 0.2 = 0.4), with a square four times as large (0.16 = 16%). So… It all works like a charm, as Feynman puts it.


But… Hey! Look at the stopwatch for the front reflection arrows in the upper and lower diagram: they point in the opposite direction of the stopwatch hand! Well… Hmm… You’re right. At this point, Feynman just notes that we need an extra rule: “When we are considering the path of a photon bouncing off the front surface of the glass, we reverse the direction of the arrow.

He doesn’t say why. He just adds this random rule to the other rules – which most readers who read this book already know. But why this new rule? Frankly, this inconsistency – or lack of clarity – would wake me up at night. This is Feynman: there must be a reason. Why?

Initially, I suspected it had something to do with the two types of ‘statistics’ in quantum mechanics (i.e. those different rules for combining amplitudes of bosons and fermions respectively, which I mentioned above). But… No. Photons are bosons anyway, so we surely need to add, not subtract. So what is it?

[…] Feynman explains it later, much later – in the third of the four chapters of this little book, to be precise. It’s, quite simply, the result of the simplified model he uses in that first chapter. The photon can do anything really, and so there are many more arrows than just two. We actually should look at an infinite number of arrows, representing all possible paths in spacetime, and, hence, the two arrows (i.e. the one for the reflection from the front and back surface respectively) are combinations of many other arrows themselves. So how does that work?

An analysis of partial reflection (I)

The analysis in Chapter 3 of the same phenomenon (i.e. partial reflection by glass) is a simplified analysis too, but it’s much better – because there are no ‘random’ rules here. It is what Leighton promises to the reader in his introduction: “A complete description, accurate in every detail, of a framework onto which more advanced concepts can be attached without modification. Nothing has to be ‘unlearned’ later.

Well… Accurate in every detail? Perhaps not. But it’s good, and I still warmly recommend a reading of this delightful little book to anyone who’d ask me what to read as a non-mathematical introduction to quantum mechanics. I’ll limit myself here to just some annotations.

The first drawing (a) depicts the situation:

  1. A photon from a light source is being reflected by the glass. Note that it may also go straight through, but that’s a possibility we’ll analyze separately. We first assume that the photon is effectively being reflected by the glass, and so we want to calculate the probability of that event using all these ‘arrows’, i.e. the underlying probability amplitudes.
  2. As for the geometry of the situation: while the light source and the detector seem to be positioned at some angle from the normal, that is not the case: the photon travels straight down (and up again when reflected). It’s just a limitation of the drawing. It doesn’t really matter much for the analysis: we could look at a light beam coming in at some angle, but so we’re not doing that. It’s the simplest situation possible, in terms of experimental set-up that is. I just want to be clear on that.

partial reflection

Now, rather than looking at the front and back surface only (as Feynman does in Chapter 1), the glass sheet is now divided into a number of very thin sections: five, in this case, so we have six points from which the photon can be scattered into the detector at A: X1 to X6. So that makes six possible paths. That’s quite a simplification but it’s easy to see it doesn’t matter: adding more sections would result in many more arrows, but these arrows would also be much smaller, and so the final arrow would be the same.

The more significant simplification is that the paths are all straight paths, and that the photon is assumed to travel at the speed of light, always. If you haven’t read the booklet, you’ll say that’s obvious, but it’s not: a photon has an amplitude to go faster or slower than c but, as Feynman points out, these amplitudes cancel out over longer distances. Likewise, a photon can follow any path in space really, including terribly crooked paths, but these paths also cancel out. As Feynman puts it: “Only the paths near the straight-line path have arrows pointing in nearly the same direction, because their timings are nearly the same, and only these arrows are important, because it is from them that we accumulate a large final arrow.” That makes perfect sense, so there’s no problem with the analysis here either.

So let’s have a look at those six arrows in illustration (b). They point in a slightly different direction because the paths are slightly different and, hence, the distances (and, therefore, the timings) are different too. Now, Feynman (but I think it’s Leighton really) loses himself here in a digression on monochromatic light sources. A photon is a photon: it will have some wave function with a phase that varies in time and in space and, hence, illustration (b) makes perfect sense. [I won’t quote what he writes on a ‘monochromatic light source’ because it’s quite confusing and, IMHO, not correct.]

The stopwatch metaphor has only one minor shortcoming: the hand of a stopwatch rotates clockwise (obviously!), while the phase of an actual wave function goes counterclockwise with time. That’s just convention, and I’ll come back to it when I discuss the mathematical representation of the so-called wave function, which gives you these amplitudes. However, it doesn’t change the analysis, because it’s the difference in the phase that matters when combining amplitudes, so the clock can turn in either way indeed, as long as we’re agreed on it.

At this point, I can’t resist: I’ll just throw the math in. If you don’t like it, you can just skip the section that follows.

Feynman’s arrows and the wave function

The mathematical representation of Feynman’s ‘arrows’ is the wave function:

f = f(x–ct)

Is that the wave function? Yes. It is: it’s a function whose argument is x – ct, with x the position in space, and t the time variable. As for c, that’s the speed of light. We throw it in to make the units in which we measure time and position compatible. 

Really? Yes: f is just a regular wave function. To make it look somewhat more impressive, I could use the Greek symbol Φ (phi) or Ψ (psi) for it, but it’s just what it is: a function whose value depends on position and time indeed, so we write f = f(x–ct). Let me explain the minus sign and the c in the argument.

Time and space are interchangeable in the argument, provided we measure time in the ‘right’ units, and so that’s why we multiply the time in seconds with c, so the new unit of time becomes the time that light needs to travel a distance of one meter. That also explains the minus sign in front of ct: if we add one distance unit (i.e. one meter) to the argument, we have to subtract one time unit from it – the new time unit of course, so that’s the time that light needs to travel one meter – in order to get the same value for f. [If you don’t get that x–ct thing, just think a while about this, or make some drawing of a wave function. Also note that the spacetime diagram in illustration (b) above assumes the same: time is measured in an equivalent unit as distance, so the 45% line from the south-west to the north-east, that bounces back to the north-west, represents a photon traveling at speed c in space indeed: one unit of time corresponds to one meter of travel.]

Now I want to be a bit more aggressive. I said is a simple function. That’s true and not true at the same time. It’s a simple function, but it gives you probability amplitudes, which are complex numbers – and you may think that complex numbers are, perhaps, not so simple. However, you shouldn’t be put off. Complex numbers are really like Feynman’s ‘arrows’ and, hence, fairly simple things indeed. They have two dimensions, so to say: an a– and a b-coordinate. [I’d say an x– and y-coordinate, because that’s what you usually see, but then I used the x symbol already for the position variable in the argument of the function, so you have to switch to a and b for a while now.]

This a– and b-coordinate are referred to as the real and imaginary part of a complex number respectively. The terms ‘real’ and ‘imaginary’ are confusing because both parts are ‘real’ – well… As real as numbers can be, I’d say. :-) They’re just two different directions in space: the real axis is the a-axis in coordinate space, and the imaginary axis is the b-axis. So we could write it as an ordered pair of numbers (a, b). However, we usually write it as a number itself, and we distinguish the b-coordinate from the a-coordinate by writing an i in front: (a, b) = a + ib. So our function f = f(x–ct) is a complex-valued function: it will give you two numbers (an a and a b) instead of just one when you ‘feed’ it with specific values for x and t. So we write:

f = f(x–ct) = (a, b) = a + ib

So what’s the shape of this function? Is it linear or irregular or what? We’re talking a very regular wave function here, so it’s shape is ‘regular’ indeed. It’s a periodic function, so it repeats itself again and again. The animations below give you some idea of such ‘regular’ wave functions. Animation A and B shows a real-valued ‘wave': a ball on a string that goes up and down, for ever and ever. Animations C to H are – believe it or not – basically the same thing, but so we have two numbers going up and down. That’s all.


The wave functions above are, obviously, confined in space, and so the horizontal axis represents the position in space. What we see, then, is how the real and imaginary part of these wave functions varies as time goes by. [Think of the blue graph as the real part, and the imaginary part as the pinkish thing – or the other way around. It doesn’t matter.] Now, our wave function – i.e. the one that Feynman uses to calculate all those probabilities – is even more regular than those shown above: its real part is an ordinary cosine function, and it’s imaginary part is a sine. Let me write this in math:

f = f(x–ct) = a + ib = r(cosφ + isinφ)

It’s really the most regular wave function in the world: the very simple illustration below shows how the two components of f vary as a function in space (i.e. the horizontal axis) while we keep the time fixed, or vice versa: it could also show how the function varies in time at one particular point in space, in which case the horizontal axis would represent the time variable. It is what it is: a sine and a cosine function, with the angle φ as its argument.

cos and sine

Note that a sine function is the same as a cosine function, but it just lags a bit. To be precise, the phase difference is 90°, or π/2 in radians (the radian (i.e. the length of the arc on the unit circle) is a much more natural unit to express angles, as it’s fully compatible with our distance unit and, hence, most – if not all – of our other units). Indeed, you may or may not remember the following trigonometric identities: sinφ = cos(π/2–φ) = cos(φ–π/2).

In any case, now we have some r and φ here, instead of a and b. You probably wonder where I am going with all of this. Where are the x and t variables? Be patient! You’re right. We’ll get there. I have to explain that r and φ first. Together, they are the so-called polar coordinates of Feynman’s ‘arrow’ (i.e. the amplitude). Polar coordinates are just as good as coordinates as these Cartesian coordinates we’re used to (i.e. a and b). It’s just a different coordinate system. The illustration below shows how they are related to each other. If you remember anything from your high school trigonometry course, you’ll immediately agree that a is, obviously, equal to rcosφ, and b is rsinφ, which is what I wrote above. Just as good? Well… The polar coordinate system has some disadvantages (all of those expressions and rules we learned in vector analysis assume rectangular coordinates, and so we should watch out!) but, for our purpose here, polar coordinates are actually easier to work with, so they’re better.


Feynman’s wave function is extremely simple because his ‘arrows’ have a fixed length, just like the stopwatch hand. They’re just turning around and around and around as time goes by. In other words, is constant and does not depend on position and time. It’s the angle φ that’s turning and turning and turning as the stopwatch ticks while our photon is covering larger and larger distances. Hence, we need to find a formula for φ that makes it explicit how φ changes as a function in spacetime. That φ variable is referred to as the phase of the wave function. That’s a term you’ll encounter frequently and so I had better mention it. In fact, it’s generally used as a synonym for any angle, as you can see from my remark on the phase difference between a sine and cosine function.

So how do we express φ as a function of x and t? That’s where Euler’s formula comes in. Feynman calls it the most remarkable formula in mathematics – our jewel! And he’s probably right: of all the theorems and formulas, I guess this is the one we can’t do without when studying physics. I’ve written about this in another post, and repeating what I wrote there would eat up too much space, so I won’t do it and just give you that formula. A regular complex-valued wave function can be represented as a complex (natural) exponential function, i.e. an exponential function with Euler’s number e (i.e. 2.728…) as the base, and the complex number iφ as the (variable) exponent. Indeed, according to Euler’s formula, we can write:

f = f(x–ct) = a + ib = r(cosφ + isinφ) = r·eiφ

As I haven’t explained Euler’s formula (you should really have a look at my posts on it), you should just believe me when I say that r·eiφ is an ‘arrow’ indeed, with length r and angle φ (phi), as illustrated above, with a and b coordinates arcosφ and b = rsinφ. What you should be able to do now, is to imagine how that φ angle goes round and round as time goes by, just like Feynman’s ‘arrow’ goes round and round – just like a stopwatch hand indeed, but note our φ angle turns counterclockwise indeed.

Fine, you’ll say – but so we need a mathematical expression, don’t we? Yes,we do. We need to know how that φ angle (i.e. the variable in our r·eiφ function) changes as a function of x and t indeed. It turns out that the φ in r·eiφ can be substituted as follows:

eiφ = r·ei(ωt–kx) = r·eik(x–ct)

Huh? Yes. The phase (φ) of the probability amplitude (i.e. the ‘arrow’) is a simple linear function of x and t indeed: φ = ωt–kx = –k(x–ct). What about all these new symbols, k and ω? The ω and k in this equation are the so-called angular frequency and the wave number of the wave. The angular frequency is just the frequency expressed in radians, and you should think of the wave number as the frequency in space. [I could write some more here, but I can’t make it too long, and you can easily look up stuff like this on the Web.] Now, the propagation speed c of the wave is, quite simply, the ratio of these two numbers: c = ω/k. [Again, it’s easy to show how that works, but I won’t do it here.]

Now you know it all, and so it’s time to get back to the lesson.

An analysis of partial reflection (II)

Why did I digress? Well… I think that what I write above makes much more sense than Leighton’s rather convoluted description of a monochromatic light source as he tries to explain those arrows in diagram (b) above. Whatever it is, a monochromatic light source is surely not “a device that has been carefully arranged so that the amplitude for a photon to be emitted at a certain time can be easily calculated.” That’s plain nonsense. Monochromatic light is light of a specific color, so all photons have the same frequency (or, to be precise, their wave functions have all the same well-defined frequency), but these photons are not in phase. Photons are emitted by atoms, as an electron moves from one energy level to the other. Now, when a photon is emitted, what actually happens is that the atom radiates a train of waves only for about 10–8 sec, so that’s about 10 billionths of a second. After 10–8 sec, some other atom takes over, and then another atom, and so on. Each atom emits one photon, whose energy is the difference between the two energy levels that the electron is jumping to or from. So the phase of the light that is being emitted can really only stay the same for about 10–8 sec. Full stop.

Now, what I write above on how atoms actually emit photons is a paraphrase of Feynman’s own words in his much more serious series of Lectures on Mechanics, Radiation and Heat. Therefore, I am pretty sure it’s Leighton who gets somewhat lost when trying to explain what’s happening. It’s not photons that interfere. It’s the probability amplitudes associated with the various paths that a photon can take. To be fully precise, we’re talking the photon here, i.e. the one that ends up in the detector, and so what’s going on is that the photon is interfering with itself. Indeed, that’s exactly what the ‘craziness’ of quantum mechanics is all about: we sent electrons, one by one, through two slits, and we observe an interference pattern. Likewise, we got one photon here, that can go various ways, and it’s those amplitudes that interfere, so… Yes: the photon interferes with itself.

OK. Let’s get back to the lesson and look at diagram (c) now, in which the six arrows are added. As mentioned above, it would not make any difference if we’d divide the glass in 10 or 20 or 1000 or a zillion ‘very thin’ sections: there would be many more arrows, but they would be much smaller ones, and they would cover the same circular segment: its two endpoints would define the same arc, and the same chord on the circle that we can draw when extending that circular segment. Indeed, the six little arrows define a circle, and that’s the key to understanding what happens in the first chapter of Feynman’s QED, where he adds two arrows only, but with a reversal of the direction of the ‘front reflection’ arrow. Here there’s no confusion – Feynman (or Leighton) eloquently describe what they do:

“There is a mathematical trick we can use to get the same answer [i.e. the same final arrow]: Connecting the arrows in order from 1 to 6, we get something like an arc, or part of a circle. The final arrow forms the chord of this arc. If we draw arrows from the center of the ‘circle’ to the tail of arrow 1 and to the head of arrow 6, we get two radii. If the radius arrow from the center to arrow 1 is turned 180° (“subtracted”), then it can be combined with the other radius arrow to give us the same final arrow! That’s what I was doing in the first lecture: these two radii are the two arrows I said represented the ‘front surface’ and ‘back surface’ reflections. They each have the famous length of 0.2.”

That’s what’s shown in part (d) of the illustration above and, in case you’re still wondering what’s going on, the illustration below should help you to make your own drawings now.

CircularsegmentSo… That explains the phenomenon Feynman wanted to explain, which is a phenomenon that cannot be explained in classical physics. Let me copy the original here:


Partial reflection by glass—a phenomenon that cannot be explained in classical physics? Really?

You’re right to raise an objection: partial reflection by glass can, in fact, be explained by the classical theory of light as an electromagnetic wave. The assumption then is that light is effectively being reflected by both the front and back surface and the reflected waves combine or cancel out (depending on the thickness of the glass and the angle of reflection indeed) to match the observed pattern. In fact, that’s how the phenomenon was explained for hundreds of years! The point to note is that the wave theory of light collapsed as technology advanced, and experiments could be made with very weak light hitting photomultipliers. As Feynman writes: “As the light got dimmer and dimmer, the photomultipliers kept making full-sized clicks—there were just fewer of them. Light behaved as particles!”

The point is that a photon behaves like an electron when going through two slits: it interferes with itself! As Feynman notes, we do not have any ‘common-sense’ theory to explain what’s going on here. We only have quantum mechanics, and quantum mechanics is an “uncommon-sensy” theory: a “strange” or even “absurd” theory, that looks “cockeyed” and incorporates “crazy ideas”. But… It works.

Now that we’re here, I might just as well add a few more paragraphs to fully summarize this lovely publication – if only because summarizing stuff like this helps me to come to terms with understanding things better myself!

Calculating amplitudes: the basic actions

So it all boils down to calculating amplitudes: an event is divided into alternative ways of how the event can happen, and the arrows for each way are ‘added’. Now, every way an event can happen can be further subdivided into successive steps. The amplitudes for these steps are then ‘multiplied’. For example, the amplitude for a photon to go from A to C via B is the ‘product’ of the amplitude to go from A to B and the amplitude to go from B to C.

I marked the terms ‘multiplied’ and ‘product’ with apostrophes, as if to say it’s not a ‘real’ product. But it is an actual multiplication: it’s the product of two complex numbers. Feynman does not explicitly compare this product to other products, such as the dot (•) or cross (×) product of two vectors, but he uses the ∗ symbol for multiplication here, which clearly distinguishes VW from VW or V×W indeed or, more simply, from the product of two ordinary numbers. [Ordinary numbers? Well… With ‘ordinary’ numbers, I mean real numbers, of course, but once you get used to complex numbers, you won’t like that term anymore, because complex numbers start feeling just as ‘real’ as other numbers – especially when you get used to the idea of those complex-valued wave functions underneath reality.]

Now, multiplying complex numbers, or ‘arrows’ using QED’s simpler language, consists of adding their angles and multiplying their lengths. That being said, the arrows here all have a length smaller than one (because their square cannot be larger than one, because that square is a probability, i.e. a (real) number between 0 and 1), Feynman defines successive multiplication as successive ‘shrinks and turns’ of the unit arrow. That all makes sense – very much sense.

But what’s the basic action? As Feynman puts the question: “How far can we push this process of splitting events into simpler and simpler subevents? What are the smallest possible bits and pieces? Is there a limit?” He immediately answers his own question. There are three ‘basic actions':

  1. A photon goes from one point (in spacetime) to another: this amplitude is denoted by P(A to B).
  2. An electron goes from one point to another: E(A to B).
  3. An electron emits and/or absorbs a photon: this is referred to as a ‘junction’ or a ‘coupling’, and the amplitude for this is denoted by the symbol j, i.e. the so-called junction number.

How do we find the amplitudes for these?

The amplitudes for (1) and (2) are given by a so-called propagator functions, which give you the probability amplitude for a particle to travel from one place to another in a given time indeed, or to travel with a certain energy and momentum. Judging from the Wikipedia article on these functions, the subject-matter is horrendously complicated, and the formulas are too, even if Feynman says it’s ‘very simple’ – for a photon, that is. The key point to note is that any path is possible. Moreover, there are also amplitudes for photons to go faster or slower than the speed of light (c)! However, these amplitudes make smaller contributions, and cancel out over longer distances. The same goes for the crooked paths: the amplitudes cancel each other out as well.

What remains are the ‘nearby paths’. In my previous post (check the section on electromagnetic radiation), I noted that, according to classical wave theory, a light wave does not occupy any physical space: we have electric and magnetic field vectors that oscillate in a direction that’s perpendicular to the direction of propagation, but these do not take up any space. In quantum mechanics, the situation is quite different. As Feynman puts it: “When you try to squeeze light too much [by forcing it to go through a small hole, for example, as illustrated below], it refuses to cooperate and begins to spread out.” He explains this in the text below the second drawing: “There are not enough arrows representing the paths to Q to cancel each other out.”

Many arrowsFew arrows

Not enough arrows? We can subdivide space in as many paths as we want, can’t we? Do probability amplitudes take up space? And now that we’re asking the tougher questions, what’s a ‘small’ hole? What’s ‘small’ and what’s ‘large’ in this funny business?

Unfortunately, there’s not much of an attempt in the booklet to try to answer these questions. One can begin to formulate some kind of answer when doing some more thinking about these wave functions. To be precise, we need to start looking at their wavelength. The frequency of a typical photon (and, hence, of the wave function representing that photon) is astronomically high. For visible light, it’s in the range of 430 to 790 teraherz, i.e. 430–790×1012 Hz. We can’t imagine such incredible numbers. Because the frequency is so high, the wavelength is unimaginably small. There’s a very simple and straightforward relation between wavelength (λ) and frequency (ν) indeed: c = λν. In words: the speed of a wave is the wavelength (i.e. the distance (in space) of one cycle) times the frequency (i.e. the number of cycles per second). So visible light has a wavelength in the range of 390 to 700 nanometer, i.e. 390–700 billionths of a meter. A meter is a rather large unit, you’ll say, so let me express it differently: it’s less than one thousandth of a micrometer, and a micrometer itself is one thousandth of a millimeter. So, no, we can’t imagine that distance either.

That being said, that wavelength is there, and it does imply that some kind of scale is involved. A wavelength covers one full cycle of the oscillation: it means that, if we travel one wavelength in space, our ‘arrow’ will point in the same direction again. Both drawings above (Figure 33 and 34) suggest the space between the two blocks is less than one wavelength. It’s a bit hard to make sense of the direction of the arrows but note the following:

  1. The phase difference between (a) the ‘arrow’ associated with the straight route (i.e. the ‘middle’ path) and (b) the ‘arrow’ associated with the ‘northern’ or ‘southern’ route (i.e. the ‘highest’ and ‘lowest’ path) in Figure 33 is like quarter of a full turn, i.e. 90°. [Note that the arrows for the northern and southern route to P point in the same direction, because they are associated with the same timing. The same is true for the two arrows in-between the northern/southern route and the middle path.]
  2. In Figure 34, the phase difference between the longer routes and the straight route is much less, like 10° only.

Now, the calculations involved in these analyses are quite complicated but you can see the explanation makes sense: the gap between the two blocks is much narrower in Figure 34 and, hence, the geometry of the situation does imply that the phase difference between the amplitudes associated with the ‘northern’ and ‘southern’ routes to Q is much smaller than the phase difference between those amplitudes in Figure 33. To be precise,

  1. The phase difference between (a) the ‘arrow’ associated with the ‘northern route’ to Q and (b) the ‘arrow’ associated with the ‘southern’ route to Q (i.e. the ‘highest’ and ‘lowest’ path) in Figure 33 is like three quarters of a full turn, i.e. 270°. Hence, the final arrow is very short indeed, which means that the probability of the photon going to Q is very low indeed. [Note that the arrows for the northern and southern route no longer point in the same direction, because they are associated with very different timings: the ‘southern route’ is shorter and, hence, faster.]
  2. In Figure 34, we have a phase difference between the shortest and longest route that is like 60° only and, hence, the final arrow is very sizable and, hence, the probability of the photon going to Q is, accordingly, quite substantial.

OK… What did I say here about P(A to B)? Nothing much. I basically complained about the way Feynman (or Leighton, more probably) explained the interference or diffraction phenomenon and tried to do a better job before tacking the subject indeed: how do we get that P(A to B)?

A photon can follow any path from A to B, including the craziest ones (as shown below). Any path? Good players give a billiard ball extra spin that may make the ball move in a curved trajectory, and will also affect its its collision with any other ball – but a trajectory like the one below? Why would a photon suddenly take a sharp turn left, or right, or up, or down? What’s the mechanism here? What are the ‘wheels and gears inside’ of the photon that (a) make a photon choose this path in the first place and (b) allow it to whirl, swirl and twirl like that?


We don’t know. In fact, the question may make no sense, because we don’t know what actually happens when a photon travels through space. We know it leaves as a lump of energy, and we know it arrives as a similar lump of energy. When we actually put a detector to check which path is followed – by putting special detectors at the slits in the famous double-slit experiment, for example – the interference pattern disappears. So… Well… We don’t know how to describe what’s going on: a photon is not a billiard ball, and it’s not a classical electromagnetic wave either. It is neither. The only thing that we know is that we get probabilities that match with the results of experiment if we accept this nonsensical assumptions and do all of the crazy arithmetic involved. Let me get back to the lesson.  

Photons can also travel faster or slower than the speed of light (c is some 3×108 meter per second but, in our special time unit, it’s equal to one). Does that violate relativity? It doesn’t, apparently, but for the reasoning behind I must, once again, refer you to more sophisticated writing.

In any case, if the mathematicians and physicists have to take into account both of these assumptions (any path is possible, and speeds higher or lower than c are possible too!), they must be looking at some kind of horrendous integral, don’t they?

They are. When everything is said and done, that propagator function is some monstrous integral indeed, and I can’t explain it to you in a couple of words – if only because I am struggling with it myself. :-) So I will just believe Feynman when he says that, when the mathematicians and physicists are finished with that integral, we do get some simple formula which depends on the value of the so-called spacetime interval between two ‘points’ – let’s just call them 1 and 2 – in space and time. You’ve surely heard about it before: it’s denoted by sor I (or whatever) and it’s zero if an object moves at the speed of light, which is what light is supposed to do – but so we’re dealing with a different situation here. :-) To be precise, I consists of two parts:

  1. The distance d between the two points (1 and 2), i.e. Δr, which is just the square root of d= Δr= (x2–x2)2+(y2–y1)2+(z2–z1)2. [This formula is just a three-dimensional version of the Pythagorean Theorem.]
  2. The ‘distance’ (or difference) in time, which is usually expressed in those ‘equivalent’ time units that we introduced above already, i.e. the time that light – traveling at the speed of light :-) – needs to travel one meter. We will usually see that component of I in a squared version too: Δt= (t2–t1)2, or, if time is expressed in the ‘old’ unit (i.e. seconds), then we write c2Δt2 = c2(t2–t1)2.

Now, the spacetime interval itself is defined as the excess of the squared distance (in space) over the squared time difference:

s= I = Δr– Δt= (x2–x2)2+(y2–y1)2+(z2–z1)– (t2–t1)2

You know we can then define time-like, space-like and light-like intervals, and these, in turn, define the so-called light cone. The spacetime interval can be negative, for example. In that case, Δt2 will be greater than Δr2, so there is no ‘excess’ of distance over time: it means that the time difference is large enough to allow for a cause–effect relation between the two events, and the interval is said to be time-like. In any case, that’s not the topic of this post, and I am sorry I keep digressing.

The point to note is that the formula for the propagator favors light-like intervals: they are associated with large arrows. Space- and time-like intervals, on the other hand, will contribute much smaller arrows. In addition, the arrows for space- and time-like intervals point in opposite directions, so they will cancel each other out. So, when everything is said and done, over longer distances, light does tend to travel in a straight line and at the speed of light. At least, that’s what Feynman tells us, and I tend to believe him. :-)

But so where’s the formula? Feynman doesn’t give it, probably because it would indeed confuse us. Just google ‘propagator for a photon’ and you’ll see what I mean. He does integrate the above conclusions in that illustration (b) though. What illustration? 

Oh… Sorry. You probably forgot what I am trying to do here, but so we’re looking at that analysis of partial reflection of light by glass. Let me insert it once again so you don’t have to scroll all the way up.

partial reflection

You’ll remember that Feynman divided the glass sheet into five sections and, hence, there are six points from which the photon can be scattered into the detector at A: X1 to X6. So that makes six possible paths: these paths are all straight (so Feynman makes abstraction of all of the crooked paths indeed), and the other assumption is that the photon effectively traveled at the speed of light, whatever path it took (so Feynman also assumes the amplitudes for speeds higher or lower than c cancel each other out). So that explains the difference in time at emission from the light source. The longest path is the path to point X6 and then back up to the detector. If the photon would have taken that path, it would have to be emitted earlier in time – earlier as compared to the other possibilities, which take less time. So it would have to be emitted at T = T6. The direction of the ‘arrow’ is like one o’clock. The shorter paths are associated with shorter times (the difference between the time of arrival and departure is shorter) and so T5 is associated with an arrow in the 12 o’clock direction, T5 is 11 o’clock, and so on, till T5, which points at the 9 o’clock direction.

But… What? These arrows also include the reflection, i.e. the interaction between the photon and some electron in the glass, don’t they? […] Right you are. Sorry. So… Yes. The action above involves four ‘basic actions':

  1. A photon is emitted by the source at a time T = T1, T2, T3, T4, T5 or T6: we don’t know. Quantum-mechanical uncertainty. :-)
  2. It goes from the source to one of the points X = X1, X2, X3, X4, X5 or Xin the glass: we don’t know which one, because we don’t have a detector there.
  3. The photon interacts with an electron at that point.
  4. It makes it way back up to the detector at A.

Step 1 does not have any amplitude. It’s just the start of the event. Well… We start with the unit arrow pointing north actually, so its length is one and its direction is 12 o’clock. And so we’ll shrink and turn it, i.e. multiply it with other arrows, in the next steps.

Steps 2 and 4 are straightforward and are associated with arrows of the same length. Their direction depends on the distance traveled and/or the time of emission: it amounts to the same because we assume the speed is constant and exactly the same for the six possibilities (that speed is c = 1 obviously). But what length? Well… Some length according to that formula which Feynman didn’t give us. :-)

So now we need to analyze the third of those three basic actions: a ‘junction’ or ‘coupling’ between an electron and a photon. At this point, Feynman embarks on a delightful story highlighting the difficulties involved in calculating that amplitude. A photon can travel following crooked paths and at devious speeds, but an electron is even worse: it can take what Feynman refers to as ‘one-hop flights’, ‘two-hop flights’, ‘three-hop flights’,… any ‘n-hop flight’ really. Each stop involves an additional amplitude, which is represented by n2, with n some number that has been determined from experiment. The formula for E(A to B) then becomes a series of terms: P(A to B) + (P(A to C)∗n2∗(P(C to B) + (P(A to D)∗n2∗P(D to E)∗n2∗P(E to C)+…

P(A to B) is the ‘one-hop flight’ here, while C, D and E are intermediate points, and (P(A to C)∗n2∗(P(C to B) and (P(A to D)∗n2∗P(D to E)∗n2∗P(E to C) are the ‘two-hop’ and ‘three-hop’ flight respectively. Note that this calculation has to be made for all possible intermediate points C, D, E and so on. To make matters worse, the theory assumes that electrons can emit and absorb photons along the way, and then there’s a host of other problems, which Feynman tries to explain in the last and final chapter of his little book. […]

Hey! Stop it!


You’re talking about E(A to B) here. You’re supposed to be talking about that junction number j.

Oh… Sorry. You’re right. Well… That junction number j is about –0.1. I know that looks like an ordinary number, but it’s an amplitude, so you should interpret it as an arrow. When you multiply it with another arrow, it amounts to a shrink to one-tenth, and half a turn. Feynman entertains us also on the difficulties of calculating this number but, you’re right, I shouldn’t be trying to copy him here – if only because it’s about time I finish this post. :-)

So let me conclude it indeed. We can apply the same transformation (i.e. we multiply with j) to each of the six arrows we’ve got so far, and the result is those six arrows next to the time axis in illustration (b). And then we combine them to get that arc, and then we apply that mathematical trick to show we get the same result as in a classical wave-theoretical analysis of partial reflection.

Done. […] Are you happy now?

[…] You shouldn’t be. There are so many questions that have been left unanswered. For starters, Feynman never gives that formula for the length of P(A to B), so we have no clue about the length of these arrows and, hence, about that arc. If physicists know their length, it seems to have been calculated backwards – from those 0.2 arrows used in the classical wave theory of light. Feynman is actually quite honest about that, and simply writes:

“The radius of the arc [i.e. the arc that determines the final arrow] evidently depends on the length of the arrow for each section, which is ultimately determined by the amplitude S that an electron in an atom of glass scatters a photon. This radius can be calculated using the formulas for the three basic actions. […] It must be said, however, that no direct calculation from first principles for a substance as complex as glass has actually been done. In such cases, the radius is determined by experiment. For glass, it has been determined from experiment that the radius is approximately 0.2 (when the light shines directly onto the glass at right angles).”

Well… OK. I think that says enough. So we have a theory – or first principles at last – but we don’t them to calculate. That actually sounds a bit like metaphysics to me. :-) In any case… Well… Bye for now!

But… Hey! You said you’d analyze how light goes straight through the glass as well?

Yes. I did. But I don’t feel like doing that right now. I think we’ve got enough stuff to think about right now, don’t we? :-)

The Strange Theory of Light and Matter (I)

Applied vector analysis (II)

We’ve covered a lot of ground in the previous post, but we’re not quite there yet. We need to look at a few more things in order to gain some kind of ‘physical’ understanding’ of Maxwell’s equations, as opposed to a merely ‘mathematical’ understanding only. That will probably disappoint you. In fact, you probably wonder why one needs to know about Gauss’ and Stokes’ Theorems if the only objective is to ‘understand’ Maxwell’s equations.

To some extent, your skepticism is justified. It’s already quite something to get some feel for those two new operators we’ve introduced in the previous post, i.e. the divergence (div) and curl operators, denoted by ∇• and × respectively. By now, you understand that these two operators act on a vector field, such as the electric field vector E, or the magnetic field vector B, or, in the example we used, the heat flow h, so we should write •(a vector) and ×(a vector. And, as for that del operator – i.e.  without the dot (•) or the cross (×) – if there’s one diagram you should be able to draw off the top of your head, it’s the one below, which shows:

  1. The heat flow vector h, whose magnitude is the thermal energy that passes, per unit time and per unit area, through an infinitesimally small isothermal surface, so we write: h = |h| = ΔJ/ΔA.
  2. The gradient vector T, whose direction is opposite to that of h, and whose magnitude is proportional to h, so we can write the so-called differential equation of heat flow: h = –κT.
  3. The components of the vector dot product ΔT = T•ΔR = |T|·ΔR·cosθ.

Temperature drop

You should also remember that we can re-write that ΔT = T•ΔR = |T|·ΔR·cosθ equation – which we can also write as ΔT/ΔR = |T|·cosθ – in a more general form:

Δψ/ΔR = |ψ|·cosθ

That equation says that the component of the gradient vector ψ along a small displacement ΔR is equal to the rate of change of ψ in the direction of ΔRAnd then we had three important theorems, but I can imagine you don’t want to hear about them anymore. So what can we do without them? Let’s have a look at Maxwell’s equations again and explore some linkages.

Curl-free and divergence-free fields

From what I wrote in my previous post, you should remember that:

  1. The curl of a vector field (i.e. ×C) represents its circulation, i.e. its (infinitesimal) rotation.
  2. Its divergence (i.e. ∇•C) represents the outward flux out of an (infinitesimal) volume around the point we’re considering.

Back to Maxwell’s equations:

Maxwell's equations-2

Let’s start at the bottom, i.e. with equation (4). It says that a changing electric field (i.e. ∂E/∂t ≠ 0) and/or a (steady) electric current (j0) will cause some circulation of B, i.e. the magnetic field. It’s important to note that (a) the electric field has to change and/or (b) that electric charges (positive or negative) have to move  in order to cause some circulation of B: a steady electric field will not result in any magnetic effects.

This brings us to the first and easiest of all the circumstances we can analyze: the static case. In that case, the time derivatives ∂E/∂t and ∂B/∂t are zero, and Maxwell’s equations reduce to:

  1. ∇•E = ρ/ε0. In this equation, we have ρ, which represents the so-called charge density, which describes the distribution of electric charges in space: ρ = ρ(x, y, z). To put it simply: ρ is the ‘amount of charge’ (which we’ll denote by Δq) per unit volume at a given point. Hence, if we  consider a small volume (ΔV) located at point (x, y, z) in space – an infinitesimally small volume, in fact (as usual) –then we can write: Δq =  ρ(x, y, z)ΔV. [As for ε0, you already know this is a constant which ensures all units are ‘compatible’.] This equation basically says we have some flux of E, the exact amount of which is determined by the charge density ρ or, more in general, by the charge distribution in space.  
  2. ×E = 0. That means that the curl of E is zero: everywhere, and always. So there’s no circulation of E. We call this a curl-free field.
  3. B = 0. That means that the divergence of B is zero: everywhere, and always. So there’s no flux of B. None. We call this a divergence-free field.
  4. c2∇×B = j0. So here we have steady current(s) causing some circulation of B, the exact amount of which is determined by the (total) current j. [What about that cfactor? Well… We talked about that before: magnetism is, basically, a relativistic effect, and so that’s where that factor comes from. I’ll just refer you to what Feynman writes about this in his Lectures, and warmly recommend to read it, because it’s really quite interesting: it gave me at least a much deeper understanding of what it’s all about, and so I hope it will help you as much.]

Now you’ll say: why bother with all these difficult mathematical constructs if we’re going to consider curl-free and divergence-free fields only. Well… B is not curl-free, and E is not divergence-free. To be precise:

  1. E is a field with zero curl and a given divergence, and
  2. B is a field with zero divergence and a given curl.

Yeah, but why can’t we analyze fields that have both curl and divergence? The answer is: we can, and we will, but we have to start somewhere, and so we start with an easier analysis first.

Electrostatics and magnetostatics

The first thing you should note is that, in the static case (i.e. when charges and currents are static), there is no interdependence between E and B. The two fields are not interconnected, so to say. Therefore, we can neatly separate them into two pairs:

  1. Electrostatics: (1) ∇•E = ρ/ε0 and (2) ×E = 0.
  2. Magnetostatics: (1) ∇×B = j/c2ε0 and (2) B = 0.

Now, I won’t go through all of the particularities involved. In fact, I’ll refer you to a real physics textbook on that (like Feynman’s Lectures indeed). My aim here is to use these equations to introduce some more math and to gain a better understanding of vector calculus – an understanding that goes, in fact, beyond the math (i.e. a ‘physical’ understanding, as Feynman terms it).

At this point, I have to introduce two additional theorems. They are nice and easy to understand (although not so easy to prove, and so I won’t):

Theorem 1: If we have a vector field – let’s denote it by C – and we find that its curl is zero everywhere, then C must be the gradient of something. In other words, there must be some scalar field ψ (psi) such that C is equal to the gradient of ψ. It’s easier to write this down as follows:

If ×= 0, there is a ψ such that C = ψ.

Theorem 2: If we have a vector field – let’s denote it by D, just to introduce yet another letter – and we find that its divergence is zero everywhere, then D must be the curl of some vector field A. So we can write:

If D = 0, there is an A such that D = ×A.

We can apply this to the situation at hand:

  1. For E, there is some scalar potential Φ such that E = –Φ. [Note that we could integrate the minus sign in Φ, but we leave it there as a reminder that the situation is similar to that of heat flow. It’s a matter of convention really: E ‘flows’ from higher to lower potential.]
  2. For B, there is a so-called vector potential A such that B = ×A.

The whole game is then to compute Φ and A everywhere. We can then take the gradient of Φ, and the curl of A, to find the electric and magnetic field respectively, at every single point in space. In fact, most of Feynman’s second Volume of his Lectures is devoted to that, so I’ll refer you that if you’d be interested. As said, my goal here is just to introduce the basics of vector calculus, so you gain a better understanding of physics, i.e. an understanding which goes beyond the math.


We’re almost done. Electrodynamics is, of course, much more complicated than the static case, but I don’t have the intention to go too much in detail here. The important thing is to see the linkages in Maxwell’s equations. I’ve highlighted them below:

Maxwell interaction

I know this looks messy, but it’s actually not so complicated. The interactions between the electric and magnetic field are governed by equation (2) and (4), so equation (1) and (3) is just ‘statics’. Something needs to trigger it all, of course. I assume it’s an electric current (that’s the arrow marked by [0]).

Indeed, equation (4), i.e. c2∇×B = ∂E/∂t + j0, implies that a changing electric current – an accelerating electric charge, for instance – will cause the circulation of B to change. More specifically, we can write: ∂[c2∇×B]/∂t = ∂[j0]∂t. However, as the circulation of B changes, the magnetic field B itself must be changing. Hence, we have a non-zero time derivative of B (∂B/∂t ≠ 0). But, then, according to equation (2), i.e. ∇×E = –∂B/∂t, we’ll have some circulation of E. That’s the dynamics marked by the red arrows [1].

Now, assuming that ∂B/∂t is not constant (because that electric charge accelerates and decelerates, for example), the time derivative ∂E/∂t will be non-zero too (∂E/∂t ≠ 0). But so that feeds back into equation (4), according to which a changing electric field will cause the circulation of B to change. That’s the dynamics marked by the yellow arrows [2].

The ‘feedback loop’ is closed now: I’ve just explained how an electromagnetic field (or radiation) actually propagates through space. Below you can see one of the fancier animations you can find on the Web. The blue oscillation is supposed to represent the oscillating magnetic vector, while the red oscillation is supposed to represent the electric field vector. Note how the effect travels through space.


This is, of course, an extremely simplified view. To be precise, it assumes that the light wave (that’s what an electromagnetic wave actually is) is linearly (aka as plane) polarized, as the electric (and magnetic field) oscillate on a straight line. If we choose the direction of propagation as the z-axis of our reference frame, the electric field vector will oscillate in the xy-plane. In other words, the electric field will have an x- and a y-component, which we’ll denote as Ex and Erespectively, as shown in the diagrams below, which give various examples of linear polarization.

linear polarizationLight is, of course, not necessarily plane-polarized. The animation below shows circular polarization, which is a special case of the more general elliptical polarization condition.


The relativity of magnetic and electric fields

Allow me to make a small digression here, which has more to do with physics than with vector analysis. You’ll have noticed that we didn’t talk about the magnetic field vector anymore when discussing the polarization of light. Indeed, when discussing electromagnetic radiation, most – if not all – textbooks start by noting we have E and B vectors, but then proceed to discuss the E vector only. Where’s the magnetic field? We need to note two things here.

1. First, I need to remind you of the force on any electrically charged particle (and note we only have electric charge: there’s no such thing as a magnetic charge according to Maxwell’s third equation) consists of two components. Indeed, the total electromagnetic force (aka Lorentz force) on a charge q is:

F = q(E + v×B) = qE + q(v×B) = FE + FM

The velocity vector v is the velocity of the charge: if the charge is not moving, then there’s no magnetic force. The illustration below shows you the components of the vector cross product that, by now, you’re fully familiar with. Indeed, in my previous post, I gave you the expressions for the x, y and z coordinate of a cross product, but there’s a geometrical definition as well:

v×B = |v||B|sin(θ)n

magnetic force507px-Right_hand_rule_cross_product

The magnetic force FM is q(v×B) = qv×B q|v||B|sin(θ)n. The unit vector n determines the direction of the force, which is determined by that right-hand rule that, by now, you also are fully familiar with: it’s perpendicular to both v and B (cf. the two 90° angles in the illustration). Just to make sure, I’ve also added the right-hand rule illustration above: check it out, as it does involve a bit of arm-twisting in this case. :-)

In any case, the point to note here is that there’s only one electromagnetic force on the particle. While we distinguish between an E and a B vector, the E and B vector depend on our reference frame. Huh? Yes. The velocity v is relative: we specify the magnetic field in a so-called inertial frame of reference here. If we’d be moving with the charge, the magnetic force would, quite simply, disappear, because we’d have a v equal to zero, so we’d have v×B = 0×B= 0. Of course, all other charges (i.e. all ‘stationary’ and ‘moving’ charges that were causing the field in the first place) would have different velocities as well and, hence, our E and B vector would look very different too: they would come in a ‘different mixture’, as Feynman puts it. [If you’d want to know in what mixture exactly, I’ll refer you Feynman: it’s a rather lengthy analysis (five rather dense pages, in fact), but I can warmly recommend it: in fact, you should go through it if only to test your knowledge at this point, I think.]

You’ll say: So what? That doesn’t answer the question above. Why do physicists leave out the magnetic field vector in all those illustrations?

You’re right. I haven’t answered the question. This first remark is more like a warning. Let me quote Feynman on it:

“Since electric and magnetic fields appear in different mixtures if we change our frame of reference, we must be careful about how we look at the fields E and B. […] The fields are our way of describing what goes on at a point in space. In particular, E and B tell us about the forces that will act on a moving particle. The question “What is the force on a charge from a moving magnetic field?” doesn’t mean anything precise. The force is given by the values of E and B at the charge, and the F = q(E + v×B) formula is not to be altered if the source of E or B is moving: it is the values of E and B that will be altered by the motion. Our mathematical description deals only with the fields as a function of xy, z, and t with respect to some inertial frame.”

If you allow me, I’ll take this opportunity to insert another warning, one that’s quite specific to how we should interpret this concept of an electromagnetic wave. When we say that an electromagnetic wave ‘travels’ through space, we often tend to think of a wave traveling on a string: we’re smart enough to understand that what is traveling is not the string itself (or some part of the string) but the amplitude of the oscillation: it’s the vertical displacement (i.e. the movement that’s perpendicular to the direction of ‘travel’) that appears first at one place and then at the next and so on and so on. It’s in that sense, and in that sense only, that the wave ‘travels’. However, the problem with this comparison to a wave traveling on a string is that we tend to think that an electromagnetic wave also occupies some space in the directions that are perpendicular to the direction of travel (i.e. the x and y directions in those illustrations on polarization). Now that’s a huge misconception! The electromagnetic field is something physical, for sure, but the E and B vectors do not occupy any physical space in the x and y direction as they ‘travel’ along the z direction!

Let me conclude this digression with Feynman’s conclusion on all of this:

“If we choose another coordinate system, we find another mixture of E and B fields. However, electric and magnetic forces are part of one physical phenomenon—the electromagnetic interactions of particles. While the separation of this interaction into electric and magnetic parts depends very much on the reference frame chosen for the description, the complete electromagnetic description is invariant: electricity and magnetism taken together are consistent with Einstein’s relativity.”

2. You’ll say: I don’t give a damn about other reference frames. Answer the question. Why are magnetic fields left out of the analysis when discussing electromagnetic radiation?

The answer to that question is very mundane. When we know E (in one or the other reference frame), we also know B, and, while B is as ‘essential’ as E when analyzing how an electromagnetic wave propagates through space, the truth is that the magnitude of B is only a very tiny fraction of that of E.

Huh? Yes. That animation with these oscillating blue and red vectors is very misleading in this regard. Let me be precise here and give you the formulas:

E vector of wave

B vector of a wave

I’ve analyzed these formulas in one of my other posts (see, for example, my first post on light and radiation), and so I won’t repeat myself too much here. However, let me recall the basics of it all. The eR′ vector is a unit vector pointing in the apparent direction of the charge. When I say ‘apparent’, I mean that this unit vector is not pointing towards the present position of the charge, but at where is was a little while ago, because this ‘signal’ can only travel from the charge to where we are now at the same speed of the wave, i.e. at the speed of light c. That’s why we prime the (radial) vector R also (so we write R′ instead of R). So that unit vector wiggles up and down and, as the formula makes clear, it’s the second-order derivative of that movement which determines the electric field. That second-order derivative is the acceleration vector, and it can be substituted for the vertical component of the acceleration of the charge that caused the radiation in the first place but, again, I’ll refer you my post on that, as it’s not the topic we want to cover here.

What we do want to look at here, is that formula for B: it’s the cross product of that eR′ vector (the minus sign just reverses the direction of the whole thing) and E divided by c. We also know that the E and eR′ vectors are at right angles to each, so the sine factor (sinθ) is 1 (or –1) too. In other words, the magnitude of B is |E|/c =  E/c, which is a very tiny fraction of E indeed (remember: c ≈ 3×108).

So… Yes, for all practical purposes, B doesn’t matter all that much when analyzing electromagnetic radiation, and so that’s why physicists will note it but then proceed and look at E only when discussing radiation. Poor BThat being said, the magnetic force may be tiny, but it’s quite interesting. Just look at its direction! Huh? Why? What’s so interesting about it?  I am not talking the direction of B here: I am talking the direction of the force. Oh… OK… Hmm… Well…

Let me spell it out. Take the force formula: F = q(E + v×B) = qE + q(v×B). When our electromagnetic wave hits something real (I mean anything real, like a wall, or some molecule of gas), it is likely to hit some electron, i.e. an actual electric charge. Hence, the electric and magnetic field should have some impact on it. Now, as we pointed here, the magnitude of the electric force will be the most important one – by far – and, hence, it’s the electric field that will ‘drive’ that charge and, in the process, give it some velocity v, as shown below. In what direction? Don’t ask stupid questions: look at the equation. FE = qE, so the electric force will have the same direction as E.

radiation pressure

But we’ve got a moving charge now and, therefore, the magnetic force comes into play as well! That force is FM  = q(v×B) and its direction is given by the right-hand rule: it’s the F above in the direction of the light beam itself. Admittedly, it’s a tiny force, as its magnitude is F = qvE/c only, but it’s there, and it’s what causes the so-called radiation pressure (or light pressure tout court). So, yes, you can start dreaming of fancy solar sailing ships (the illustration below shows one out of of Star Trek) but… Well… Good luck with it! The force is very tiny indeed and, of course, don’t forget there’s light coming from all directions in space!

solar sail

Jokes aside, it’s a real and interesting effect indeed, but I won’t say much more about it. Just note that we are really talking the momentum of light here, and it’s a ‘real’ as any momentum. In an interesting analysis, Feynman calculates this momentum and, rather unsurprisingly (but please do check out how he calculates these things, as it’s quite interesting), the same 1/c factor comes into play once: the momentum (p) that’s being delivered when light hits something real is equal to 1/c of the energy that’s being absorbed. So, if we denote the energy by W (in order to not create confusion with the E symbol we’ve used already), we can write: p = W/c.

Now I can’t resist one more digression. We’re, obviously, fully in classical physics here and, hence, we shouldn’t mention anything quantum-mechanical here. That being said, you already know that, in quantum physics, we’ll look at light as a stream of photons, i.e. ‘light particles’ that also have energy and momentum. The formula for the energy of a photon is given by the Planck relation: E = hf. The h factor is Planck’s constant here – also quite tiny, as you know – and f is the light frequency of course. Oh – and I am switching back to the symbol E to denote energy, as it’s clear from the context I am no longer talking about the electric field here.

Now, you may or may not remember that relativity theory yields the following relations between momentum and energy:  

E2 – p2c2 = m0cand/or pc = Ev/c

In this equations, mstands, obviously, for the rest mass of the particle, i.e. its mass at v = 0. Now, photons have zero rest mass, but their speed is c. Hence, both equations reduce to p = E/c, so that’s the same as what Feynman found out above: p = W/c.

Of course, you’ll say: that’s obvious. Well… No, it’s not obvious at all. We do find the same formula for the momentum of light (p) – which is great, of course –  but so we find the same thing coming from very different necks parts of the woods. The formula for the (relativistic) momentum and energy of particles comes from a very classical analysis of particles – ‘real-life’ objects with mass, a very definite position in space and whatever other properties you’d associate with billiard balls – while that other p = W/c formula comes out of a very long and tedious analysis of light as an electromagnetic wave. The two analytical frameworks couldn’t differ much more, could they? Yet, we come to the same conclusion indeed.

Physics is wonderful. :-)

So what’s left?

Lots, of course! For starters, it would be nice to show how these formulas for E and B with eR′ in them can be derived from Maxwell’s equations. There’s no obvious relation, is there? You’re right. Yet, they do come out of the very same equations. However, for the details, I have to refer you to Feynman’s Lectures once again – to the second Volume to be precise. Indeed, besides calculating scalar and vector potentials in various situations, a lot of what he writes there is about how to calculate these wave equations from Maxwell’s equations. But so that’s not the topic of this post really. It’s, quite simply, impossible to ‘summarize’ all those arguments and derivations in a single post. The objective here was to give you some idea of what vector analysis really is in physics, and I hope you got the gist of it, because that’s what needed to proceed. :-)

The other thing I left out is much more relevant to vector calculus. It’s about that del operator () again: you should note that it can be used in many more combinations. More in particular, it can be used in combinations involving second-order derivatives. Indeed, till now, we’ve limited ourselves to first-order derivatives only. I’ll spare you the details and just copy a table with some key results:

  1. •(T) = div(grad T) = T = ()T = ∇2T = ∂2T/∂x+ ∂2T/∂y+ ∂2T/∂z= a scalar field
  2. ()h = ∇2= a vector field
  3. (h) = grad(div h) = a vector field
  4. ×(×h) = curl(curl h) =(h) – ∇2h
  5. ∇•(×h) = div(curl h) = 0 (always)
  6. ×(T) = curl(grad T) = 0 (always)

So we have yet another set of operators here: not less than six, to be precise. You may think that we can have some more, like (×), for example. But… No. A (×) operator doesn’t make sense. Just write it out and think about it. Perhaps you’ll see why. You can try to invent some more but, if you manage, you’ll see they won’t make sense either. The combinations that do make sense are listed above, all of them.

Now, while of these combinations make (some) sense, it’s obvious that some of these combinations are more useful than others. More in particular, the first operator, ∇2, appears very often in physics and, hence, has a special name: it’s the Laplacian. As you can see, it’s the divergence of the gradient of a function.

Note that the Laplace operator (∇2) can be applied to both scalar as well as vector functions. If we operate with it on a vector, we’ll apply it to each component of the vector function. The Wikipedia article on the Laplace operator shows how and where it’s used in physics, and so I’ll refer to that if you’d want to know more. Below, I’ll just write out the operator itself, as well as how we apply it to a vector:



So that covers (1) and (2) above. What about the other ‘operators’?

Let me start at the bottom. Equations (5) and (6) are just what they are: two results that you can use in some mathematical argument or derivation. Equation (4) is… Well… Similar: it’s an identity that may or may not help one when doing some derivation.


What about (3), i.e. the gradient of the divergence of some vector function? Nothing special. As Feynman puts it: “It is a possible vector field, but there is nothing special to say about it. It’s just some vector field which may occasionally come up.”

So… That should conclude my little introduction to vector analysis, and so I’ll call it a day now. :-) I hope you enjoyed it.

Applied vector analysis (II)

Applied vector analysis (I)

The relationship between math and physics is deep. When studying physics, one sometimes feels physics and math become one and the same. But they are not. In fact, eminent physicists such as Richard Feynman warn against emphasizing the math side of physics too much: “It is not because you understand the Maxwell equations mathematically inside out, that you understand physics inside out.”

We should never lose sight of the fact that all these equations and mathematical constructs represent physical realities. So the math is nothing but the ‘language’ in which we express physical reality and, as Feynman puts it, one (also) needs to develop a ‘physical’ – as opposed to a ‘mathematical’ – understanding of the equations. Now you’ll ask: what’s a ‘physical’ understanding? Well… Let me quote Feynman once again on that: “A physical understanding is a completely unmathematical, imprecise, and inexact thing, but absolutely necessary for a physicist.

It’s rather surprising to hear that from him: Feynman doesn’t like philosophy (see, for example, what he writes on the philosophical implications of the Uncertainty Principle), and he’s surely not the only physicist thinking philosophy is pretty useless. Indeed, while most physicists – or scientists in general, I’d say – will admit there is some value in a philosophy of science (that’s the branch of philosophy concerned with the foundations and methods of science), they will usually smile derisively when hearing someone talk about metaphysics. However, if metaphysics is the branch of philosophy that deals with ‘first principles’, then it’s obvious that the Standard Model (SM) in physics is, in fact, also some kind of ‘metaphysical’ model! Indeed, what everything is said and done, physicists assume those complex-valued wave functions are, somehow, ‘real’, but all they can ‘see’ (i.e. measure or verify by experiment) are (real-valued) probabilities: we can’t ‘see’ the probability amplitudes.

The only reason why we accept the SM theory is because its predictions agree so well with experiment. Very well indeed. The agreement between theory and experiment is most perfect in the so-called electromagnetic sector of the SM, but the results for the weak force (which I referred to as the ‘weird force’ in some of my posts) are very good too. For example, using CERN data, researchers could finally, recently, observe an extremely rare decay mode which, once again, confirms that the Standard Model, as complicated as it is, is the best we’ve got: just click on the link if you want to hear more about it. [And please do: stuff like this is quite readable and, hence, interesting.]

As this blog makes abundantly clear, it’s not easy to ‘summarize’ the Standard Model in a couple of sentences or in one simple diagram. In fact, I’d say that’s impossible. If there’s one or two diagrams sort of ‘covering’ it all, then it’s the two diagrams that you’ve seen ad nauseam already: (a) the overview of the three generations of matter, with the gauge bosons for the electromagnetic, strong and weak force respectively, as well as the Higgs boson, next to it, and (b) the overview of the various interactions between them. [And, yes, these two diagrams come from Wikipedia.]


I’ve said it before: the complexity of the Standard Model (it has not less than 61 ‘elementary’ particles taking into account that quarks and gluons come in various ‘colors’, and also including all antiparticles – which we have to include them in out count because they are just as ‘real’ as the particles), and the ‘weirdness’ of the weak force, plus a astonishing range of other ‘particularities’ (these ‘quantum numbers’ or ‘charges’ are really not easy to ‘understand’), do not make for a aesthetically pleasing theory but, let me repeat it again, it’s the best we’ve got. Hence, we may not ‘like’ it but, as Feynman puts it: “Whether we like or don’t like a theory is not the essential question. It is whether or not the theory gives predictions that agree with experiment.” (Feynman, QED – The Strange Theory of Light and Matter, p. 10)

It would be foolish to try to reduce the complexity of the Standard Model to a couple of sentences. That being said, when digging into the subject-matter of quantum mechanics over the past year, I actually got the feeling that, when everything is said and done, modern physics has quite a lot in common with Pythagoras’ ‘simple’ belief that mathematical concepts – and numbers in particular – might have greater ‘actuality’ than the reality they are supposed to describe. To put it crudely, the only ‘update’ to the Pythagorean model that’s needed is to replace Pythagoras’ numerological ideas by quantum-mechanical wave functions, describing probability amplitudes that are represented by complex numbers. Indeed, complex numbers are numbers too, and Pythagoras would have reveled in their beauty. In fact, I can’t help thinking that, if he could have imagined them, he would surely have created a ‘religion’ around Euler’s formula, rather than around the tetrad. :-)

In any case… Let’s leave the jokes and silly comparisons aside, as that’s not what I want to write about in this post (if you want to read more about this, I’ll refer you another blog of mine). In this post, I want to present the basics of vector calculus, an understanding of which is absolutely essential in order to gain both a mathematical as well as a ‘physical’ understanding of what fields really are. So that’s classical mechanics once again. However, as I found out, one can’t study quantum mechanics without going through the required prerequisites. So let’s go for it.

Vectors in math and physics

What’s a vector? It may surprise you, but the term ‘vector’, in physics and in math, refers to more than a dozen different concepts, and that’s a major source of confusion for people like us–autodidacts. The term ‘vector’ refers to many different things indeed. The most common definitions are:

  1. The term ‘vector’ often refers to a (one-dimensional) array of numbers. In that case, a vector is, quite simply, an element of Rn, while the array will be referred to as an n-tuple. This definition can be generalized to also include arrays of alphanumerical values, or blob files, or any type of object really, but that’s a definition that’s more relevant for other sciences – most notably computer science. In math and physics, we usually limit ourselves to arrays of numbers. However, you should note that a ‘number’ may also be a complex number, and so we have real as well as complex vector spaces. The most straightforward example of a complex vector space is the set of complex numbers itself: C. In that case, the n-tuple is a ‘1-tuple’, aka as a singleton, but the element in it (i.e. a complex number) will have ‘two dimensions’, so to speak. [Just like the term ‘vector’, the term ‘dimension’ has various definitions in math and physics too, and so it may be quite confusing.] However, we can also have 2-tuples, 3-tuples or, more in general, n-tuples of complex numbers. In that case, the vector space is denoted by Cn. I’ve written about vector spaces before and so I won’t say too much about this.
  2. A vector can also be a point vector. In that case, it represents the position of a point in physical space – in one, two or three dimensions – in relation to some arbitrarily chosen origin (i.e. the zero point). As such, we’ll usually write it as x (in one dimension) or, in three dimensions, as (x, y, z). More generally, a point vector is often denoted by the bold-face symbol R. This definition is obviously ‘related’ to the definition above, but it’s not the same: we’re talking physical space here indeed, not some ‘mathematical’ space. Physical space can be curved, as you obviously know when you’re reading this blog, and I also wrote about that in the above-mentioned post, so you can re-visit that topic too if you want. Here, I should just mention one point which may or may not confuse you: while (two-dimensional) point vectors and complex numbers have a lot in common, they are not the same, and it’s important to understand both the similarities as well as the differences between both. For example, multiplying two vectors and multiplying two complex numbers is definitely not the same. I’ll come back to this.
  3. A vector can also be a displacement vector: in that case, it will specify the change in position of a point relative to its previous position. Again, such displacement vectors may be one-, two-, or three-dimensional, depending on the space we’re envisaging, which may be one-dimensional (a simple line), two-dimensional (i.e. the plane), three-dimensional (i.e. three-dimensional space), or four-dimensional (i.e. space-time). A displacement vector is often denoted by s or ΔR, with the delta indicating we’re talking a a distance or a difference indeed: s = ΔR = R2 – R1 = (x2 – x1, y2 – y1, z2 – z1). That’s kids’ stuff, isn’t it?
  4. A vector may also refer to a so-called four-vector: a four-vector obeys very specific transformation rules, referred to as the Lorentz transformation. In this regard, you’ve surely heard of space-time vectors, referred to as events, and noted as X = (ct, r), with r the spatial vector r = (x, y, z) and c the speed of light (which, in this case, is nothing but a proportionality constant ensuring that space and time are measured in compatible units). So we can also write X as X = (ct, x, y, z). However, there is another four-vector which you’ve probably also seen already (see, for example, my post on (special) Relativity Theory): P = (E/c, p), which relates energy and momentum in spacetime. Of course, spacetime can also be curved. In fact, Einstein’s (general) Relativity Theory is about the curvature of spacetime, not of ordinary space. But I should not write more about this here, as it’s about time I get back to the main story line of this post.
  5. Finally, we also have vector operators, like the gradient vector . Now that is what I want to write about in this post. Vector operators are also considered to be ‘vectors’ – to some extent, at least: we use them in a ‘vector products’, for example, as I will show below – but, because they are operators and, as such, “hungry for something to operate on”, they are obviously quite different from any of the ‘vectors’ I defined in point (1) to (4) above. [Feynman attributes this ‘hungry for something to operate on’ expression to the British mathematician Sir James Hopwood Jeans, who’s best known from the infamous Rayleigh-Jeans law, whose inconsistency with observations is known as the ultraviolet catastrophe or ‘black-body radiation problem’. But that’s a fairly useless digression so let me got in with it.]

In a text on physics, the term ‘vector’ may refer to any of the above but it’s often the second and third definition (point and/or displacement vectors) that will be implicit. As mentioned above, I want to write about the fifth ‘type’ of vector: vector operators. Now, the title of this post is ‘vector calculus’, and so you’ll immediately wonder why I say these vector operators may or may not be defined as vectors. Moreover, the fact of the matter is that these operators operate on yet another type of ‘vector’ – so that’s a sixth definition I need to introduce here: field vectors.

Now, funnily enough, the term ‘field vector’, while being the most obvious description of what it is, is actually not widely used: what I call a ‘field vector’ is usually referred to as a gradient, and the vectors and B are usually referred to as the electric or magnetic field. Indeed, if you google the terms ‘electromagnetic vector’ (or electric or magnetic vector tout court), you will usually be redirected. However, when everything is said and done, E and B are vectors: they have a magnitude, and they have a direction.

So, truth be told, vector calculus (aka vector analysis) in physics is about (vector) fields and (vector) operators,. While the ‘scene’ for these fields and operators is, obviously, physical space (or spacetime) and, hence, a vector space, it’s good to be clear on terminology and remind oneself that, in physics, vector calculus is not about mathematical vectors: it’s about real stuff. That’s why Feynman prefers a much longer term than vector calculus or vector analysis: he calls it differential calculus of vector fields which, indeed, is what it is – but I am sure you would not have bothered starting reading this post if I would have used that term too. :-)

Now, this has probably become the longest introduction ever to a blog post, and so I had better get on with it. :-)

Vector fields and scalar fields

Let’s dive straight into it. Vector fields like E and B behave like h, which is the symbol used in a number of textbooks for the heat flow in some body or block of material: E, B and h are all vector fields derived from a scalar field.

Huh? Scalar field? Aren’t we talking vectors? We are. If I say we can derive the vector field h (i.e. the heat flow) from a scalar field, I am basically saying that the relationship between h and the temperature T (i.e. the scalar field) is direct and very straightforward. Likewise, the relationship between E and the scalar field Φ is also direct and very straightforward.

[To be fully precise and complete, I should qualify the latter statement: it’s only true in electrostatics, i.e. when we’re considering charges that don’t move. When we have moving charges, magnetic effects come into play, and then we have a more complicated relationship between (i) two scalar fields, namely A (the magnetic potential – i.e. the ‘magnetic scalar field’) and Φ (the electrostatic potential, or ‘electric scalar field’), and (ii) two vector fields, namely B and E. The relationships between the two are then a bit more complicated than the relationship between T and h. However, the math involved is the same. In fact, the complication arises from the fact that magnetism is actually a relativistic effect. However, at this stage, this statement will only confuse you, and so I will write more about that in my next post.]

Let’s look at h and T. As you know, the temperature is a measure for energy. In a block of material, the temperature T will be a scalar: some real number that we can measure in Kelvin, Fahrenheit or Celsius but – whatever unit we use – any observer using the same unit will measure the same at any given point. That’s what distinguishes a ‘scalar’ quantity from ‘real numbers’ in general: a scalar field is something real. It represents something physical. A real number is just… Well… A real number, i.e. a mathematical concept only.  

The same is true for a vector field: it is something real. As Feynman puts it: “It is not true that any three numbers form a vector [in physics]. It is true only if, when we rotate the coordinate system, the components of the vector transform among themselves in the correct way.” What’s the ‘correct way’? It’s a way that ensures that any observer using the same unit will measure the same at any given point.

How does it work?

In physics, we associate a point in space with physical realities, such as:

  1. Temperature, the ‘height‘ of a body in a gravitational field, or the pressure distribution in a gas or a fluid, are all examples of scalar fields: they are just (real) numbers from a math point of view but, because they do represent a physical reality, these ‘numbers’ respect certain mathematical conditions: in practice, they will be a continuous or continuously differentiable function of position.
  2. Heat flow (h), the velocity (v) of the molecules/atoms in a rotating object, or the electric field (E), are examples of vector fields. As mentioned above, the same condition applies: any observer using the same unit should measure the same at any given point.
  3. Tensors, which represent, for example, stress or strain at some point in space (in various directions), or the curvature of space (or spacetime, to be fully correct) in the general theory of relativity.
  4. Finally, there are also spinors, which are often defined as a “generalization of tensors using complex numbers instead of real numbers.” They are very relevant in quantum mechanics, it is said, but I don’t know enough about them to write about them, and so I won’t.

How do we derive a vector field, like h, from a scalar field (i.e. T in this case)? The two illustrations below (taken from Feynman’s Lectures) illustrate the ‘mechanics’ behind it: heat flows, obviously, from the hotter to the colder places. At this point, we need some definitions. Let’s start with the definition of the heat flow: the (magnitude of the) heat flow (h) is the amount of thermal energy (ΔJ) that passes, per unit time and per unit area, through an infinitesimal surface element at right angles to the direction of flow.

Fig 1 Fig 2

A vector has both a magnitude and a direction, as defined above, and, hence, if we define ef as the unit vector in the direction of flow, we can write:

h = h·ef = (ΔJ/Δa)·ef

ΔJ stands for the thermal energy flowing through an area marked as Δa in the diagram above per unit time. So, if we incorporate the idea that the aspect of time is already taken care of, we can simplify the definition above, and just say that the heat flow is the flow of thermal energy per unit area. Simple trigonometry will then yield an equally simple formula for the heat flow through any surface Δa2 (i.e. any surface that is not at right angles to the heat flow h):

ΔJ/Δa2 = (ΔJ/Δa1)cosθ = h·n


When I say ‘simple’, I must add that all is relative, of course, Frankly, I myself did not immediately understand why the heat flow through the Δa1 and Δa2 areas below must be the same. That’s why I added the blue square in the illustration above (which I took from Feynman’s Lectures): it’s the same area as Δa1, but it shows more clearly – I hope! – why the heat flow through the two areas is the same indeed, especially in light of the fact that we are looking at infinitesimally small areas here (so we’re taking a limit here).

As for the cosine factor in the formula above, you should note that, in that ΔJ/Δa2 = (ΔJ/Δa1)cosθ = h·equation, we have a dot product (aka as a scalar product) of two vectors: (1) h, the heat flow and (2) n, the unit vector that is normal (orthogonal) to the surface Δa2. So let me remind you of the definition of the scalar (or dot) product of two vectors. It yields a (real) number:

A·B = |A||B|cosθ, with θ the angle between A and B

In this case, h·n = |h||n|cosθ = |h|·1·cosθ = |h|cosθ = h·cosθ. What we are saying here is that we get the component of the heat flow that’s perpendicular (or normal, as physicists and mathematicians seem to prefer to say) to the surface Δa2 by taking the dot product of the heat flow h and the unit normal n. We’ll use this formula later, and so it’s good to take note of it here.

OK. Let’s get back to the lesson. The only thing that we need to do to prove that ΔJ/Δa2 = (ΔJ/Δa1)cosθ formula is show that Δa2 = Δa1/cosθ or, what amounts to the same, that Δa1 = Δa2cosθ. Now that is something you should be able to figure out yourself: it’s quite easy to show that the angle between h and n is equal to the angle between the surfaces Δa1 and Δa2. The rest is just plain triangle trigonometry.

For example, when the surfaces coincide, the angle θ will be zero and then h·n is just equal to |h|cosθ = |h| = h·1 = h = ΔJ/Δa1. The other extreme is that orthogonal surfaces: in that case, the angle θ will be 90° and, hence, h·n = |h||n|cos(π/2) = |h|·1·0 = 0: there is no heat flow normal to the direction of heat flow.

OK. That’s clear enough (or should be clear enough). The point to note is that the vectors h and n represent physical entities and, therefore, they do not depend on our reference frame (except for the units we use to measure things). That allows us to define  vector equations.

The ∇ (del) operator and the gradient

Let’s continue our example of temperature and heat flow. In a block of material, the temperature (T) will vary in the x, y and z direction and, hence, the partial derivatives ∂T/∂x, ∂T/∂y and ∂T/∂z make sense: they measure how the temperature varies with respect to position. Now, the remarkable thing is that the 3-tuple (∂T/∂x, ∂T/∂y, ∂T/∂z) is a physical vector itself: it is independent, indeed, of the reference frame (provided we measure stuff in the same unit) – so we can do a translation and/or a rotation of the coordinate axes and we get the same value. This means this set of three numbers is a vector indeed:

(∂T/∂x, ∂T/∂y, ∂T/∂z) = a vector

If you like to see a formal proof of this, I’ll refer you to Feynman once again – but I think the intuitive argument will do: if temperature and space are real, then the derivatives of temperature in regard to the x-, y- and z-directions should be equally real, isn’t it? Let’s go for the more intricate stuff now.

If we go from one point to another, in the x-, y- or z-direction, then we can define some (infinitesimally small) displacement vector ΔR = (Δx, Δy, Δz), and the difference in temperature between two nearby points (ΔT) will tend to the (total) differential of T – which we denote by ΔT – as the two point get closer and closer. Hence, we write:

ΔT = (∂T/∂x)Δx + (∂T/∂y)Δy + (∂T/∂z)Δz

The two equations above combine to yield:

ΔT = (∂T/∂x, ∂T/∂y, ∂T/∂z)(Δx, Δy, Δz) = T·ΔR

In this equation, we used the (del) operator, i.e. the vector differential operator. It’s an operator like the differential operator ∂/∂x (i.e. the derivative) but, unlike the derivative, it returns not one but three values, i.e. a vector, which is usually referred to as the gradient, i.e. T in this case. More in general, we can write f(x, y, z), ψ or followed by whatever symbol for the function we’re looking at.

In other words, the operator acts on a scalar-valued function (T), aka a scalar field, and yields a vector:

T = (∂T/∂x, ∂T/∂y, ∂T/∂z)

That’s why we write  in bold-type too, just like the vector R. Indeed, using bold-type (instead of an arrow or so) is a convenient way to mark a vector, and the difference (in print) between  and ∇ is subtle, but it’s there – and for a good reason as you can see!

If T is a vector, what’s its direction? Think about it. […] The rate of change of T in the x-, y- and z-direction are the x-, y- and z-component of our T vector respectively. In fact, the rate of change of T in any direction will be the component of the T vector in that direction. Now, the magnitude of a vector component will always be smaller than the magnitude of the vector itself, except if it’s the component in the same direction as the vector, in which case the component is the vector. [If you have difficulty understanding this, read what I write once again, but very slowly and attentively.] Therefore, the direction of T will be the direction in which the (instantaneous) rate of change of T is largest. In Feynman’s words: “The gradient of T has the direction of the steepest uphill slope in T.” Now, it should be quite obvious what direction that really is: it is the opposite direction of the heat flow h.

That’s all you need to know to understand our first real vector equation:

h = –κT

Indeed, you don’t need too much math to understand this equation in the way we want to understand it, and that’s in some kind of ‘physical‘ way (as opposed to just the math side of it). Let me spell it out:

  1. The direction of heat flow is opposite to the direction of the gradient vector T. Hence, heat flows from higher to lower temperature (i.e. ‘downhill’), as we would expect, of course!). So that’s the minus sign.
  2. The magnitude of h is proportional to the magnitude of the gradient T, with the constant of proportionality equal to κ (kappa), which is called the thermal conductivity. Now, in case you wonder what this means (again: do go beyond the math, please!), remember that the heat flow is the flow of thermal energy per unit area (and per unit time, of course): |h| = h = ΔJ/Δa.

But… Yes? Why would it be proportional? Why don’t we have some exponential relation or something? Good question, but the answer is simple, and it’s rooted in physical reality – of course! The heat flow between two places – let’s call them 1 and 2 – is proportional to the temperature difference between those two places, so we have: ΔJ ∼  T2 – T1. In fact, that’s where the factor of proportionality comes in. If we imagine a very small slab of material (infinitesimally small, in fact) with two faces, parallel to the isothermals, with a surface area ΔA and a tiny distance Δs between them, we can write:

ΔJ = κ(T2 – T1)ΔA/Δs = ΔJ = κ·ΔT·ΔA/Δs ⇔ ΔJ/ΔA = κΔT/Δs

Now, we defined ΔJ/ΔA as the magnitude of h. As for its direction, it’s obviously perpendicular (not parallel) to the isothermals. Now, as Δs tends to zero, ΔT/Δs is nothing but the rate of change of T with position. We know it’s the maximum rate of change, because the position change is also perpendicular to the isotherms (if the faces are parallel, that tiny distance Δs is perpendicular). Hence, ΔT/Δs must be the magnitude of the gradient vector (T). As its direction is opposite to that of h, we can simply pop in a minus sign and switch from magnitudes to vectors to write what we wrote already: h = –κT.

But let’s get back to the lesson. I think you ‘get’ all of the above. In fact, I should probably not have introduced that extra equation above (the ΔJ expression) and all the extra stuff (i.e. the ‘infinitesimally small slab’ explanation), as it probably only confuses you. So… What’s the point really? Well… Let’s look, once again, at that equation h = –κT and  let us generalize the result:

  1. We have a scalar field here, the temperature T – but it could be any scalar field really!
  2. When we have the ‘formula’ for the scalar field – it’s obviously some function T(x, y, z) – we can derive the heat flow h from it, i.e. a vector quantity, which has a property which we can vaguely refer to as ‘flow’.
  3. We do so using this brand-new operator . That’s a so-called vector differential operator aka the del operator. We just apply it to the scalar field and we’re done! The only thing left is to add some proportionality factor, but so that’s just because of our units. [In case you wonder about the symbol it self, ∇ is the so-called nabla symbol: the name comes from the Hebrew word for a harp, which has a similar shape indeed.] 

This truly is a most remarkable result, and we’ll encounter the same equation elsewhere. For example, if the electric potential is Φ, then we can immediately calculate the electric field using the following formula:

E = –Φ

Indeed, the situation is entirely analogous from a mathematical point of view. For example, we have the same minus sign, so E also ‘flows’ from higher to lower potential. Where’s the factor of proportionality? Well… We don’t have one, as we’ll make sure that the units in which we’ll measure E and Φ are ‘fully compatible’ (but so don’t worry about them now). Of course, as mentioned above, this formula for E is only valid in electrostatics, i.e. when there are no moving charges. When moving charges are involved, we also have the magnetic force coming into play, and then equations become a bit more complicated. However, this extra complication does not fundamentally alter the logic involved, and I’ll come back to this so you see how it all nicely fits together.

Note: In case you feel I’ve skipped some of the ‘explanation’ of that vector equation h = –κT… Well… You may be right. I feel that it’s enough to simply point out that T is a vector with opposite direction to h, so that explains the minus sign in front of the T factor. The only thing left to ‘explain’ then is the magnitude of h, but so that’s why we pop in that kappa factor (κ), and so we’re done, I think, in terms of ‘understanding’ this equation. But so that’s what I think indeed. Feynman offers a much more elaborate ‘explanation‘, and so you can check that out if you think my approach to it is a bit too much of a shortcut.

Interim summary

So far, we have only have shown two things really:

[I] The first thing to always remember is that h·n product: it gives us the component of ‘flow’ (per unit time and per unit area) of perpendicular through any surface element Δa. Of course, this result is valid for any other vector field, or any vector for that matter: the scalar product of a vector and a unit vector will always yield the component of that vector in the direction of that unit vector.

Now, you should note that the term ‘component’ (of a vector) usually refers to a number (not to a vector) – and surely in this case, because we calculate it using a scalar product! I am just highlighting this because it did confuse me for quite a while. Why? Well… The concept of a ‘component’ of a vector is, obviously, intimately associated with the idea of ‘direction': we always talk about the component in some direction, e.g. in the x-, y- or z-direction, or in the direction of any combination of x, y and z. Hence, I think it’s only natural to think of a ‘component’ as a vector in its own right. However, as I note here, we shouldn’t do that: a ‘component’ is just a magnitude, i.e. a number only. If we’d want to include the idea of direction, it’s simple: we can just multiply the component with the normal vector n once again, and then we have a vector quantity once again, instead of just a scalar. So then we just write (h·nn = (h·n)nSimple, isn’t it? :-)

[As I am smiling here, I should quickly say something about this dot (·) symbol: we use the same symbol here for (i) a product between scalars (i.e. real or complex numbers), like 3·4; (ii) a product between a scalar and a vector, like 3·– but then I often omit the dot to simply write 3v; and, finally, (iii) a scalar product of two vectors, like h·indeed. We should, perhaps, introduce some new symbol for multiplying numbers, like ∗ for example, but then I hope you’re smart enough to see from the context what’s going on really.]

Back to the lesson. Let me jot down the formula once again: h·n = |h||n|cosθ = h·cosθ. Hence, the number we get here is (i.e. the amount of heat flow in the direction of flow) multiplied by cosθ, with θ the angle between (i) the surface we’re looking at (which, as mentioned above, is any surface really) and (ii) the surface that’s perpendicular to the direction of flow.

Hmm… […] The direction of flow? Let’s take a moment to think about what we’re saying here. Is there any particular or unique direction really? Heat flows in all directions from warmer to colder areas, and not just in one direction – doesn’t it? It does. Once again, the terminology may confuse you – which is yet another reason why math is so much better as a language to express physical ideas :-) – and so we should be precise: the direction of h is the direction in which the amount of heat flow (i.e. h·cosθ) is largest (hence, the angle θ is zero). As we pointed out above, that’s the direction in which the temperature T changes the fastest. In fact, as Feynman notes: “We can, if we wish, consider that this statement defines h.”

That brings me to the second thing you should – always and immediately – remember from all of that I’ve written above.

[II] If we write the infinitesimal (i.e. the differential) change in temperature (in whatever direction) as ΔT, then we know that

ΔT = (∂T/∂x, ∂T/∂y, ∂T/∂z)(Δx, Δy, Δz) = T·ΔR

Now, what does this say really? Δis an (infinitesimal) displacement vector: ΔR = (Δx, Δy, Δz). Hence, it has some direction. To be clear: that can be any direction in space really. So that’s simple. What about the second factor in this dot product, i.e. that gradient vector T? 

The direction of the gradient (i.e. T) is not just ‘any direction': it’s the direction in which the rate of change of T is largest, and we know what direction that is: it’s the opposite direction of the heat flow h, as evidenced by the minus sign in our vector equations h = –κT or E = –Φ. So, once again, we have a (scalar) product of two vectors here, T·ΔR, which yields…


Good question. That T·Δexpression is very similar to that h·n expression above, but it’s not quite the same. It’s also a vector dot product – or a scalar product, in other words, but, unlike that n vector, the ΔR vector is not a unit vector: it’s an infinitesimally small displacement vector. So we do not get some ‘component’ of T. More in general, you should note that the dot product of two vectors A and B does not, in general, yield the component of A in the direction of B, unless B is a unit vector – which, again, is not the case here. So if we don’t have that here, what do we have?

Let’s look at the (physical) geometry of the situation once again. Heat obviously flows in one direction only: from warmer to colder places – not in the opposite direction. Therefore, the θ in the h·n = h·cosθ expression varies from –90° to +90° only. Hence, the cosine factor (cosθ) is always positive. Always. Indeed, we do not have any right- or left-hand rule here to distinguish between the ‘front’ side and the ‘back’ side of our surface area. So when we’re looking at that h·n product, we should remember that that normal unit vector n is a unit vector that’s normal to the surface but which is oriented, generally, towards the direction of flow. Therefore, that h·n product will always yield some positive value, because θ varies from –90° to +90° only indeed.

When looking at that ΔT = T·ΔR product, the situation is quite different: while T has a very specific direction (I really mean unique)  – which, as mentioned above is opposite to that of h – that ΔR vector can point in any direction – and then I mean literally any direction, including directions ‘uphill’. Likewise, it’s obvious that the temperature difference ΔT can be both positive or negative (or zero, when we’re moving on a isotherm itself). In fact, it’s rather obvious that, if we go in the direction of flow, we go from higher to lower temperatures and, hence, ΔT will, effectively, be negative: ΔT = T2 – T1 < 0, as shown below.

Temperature drop

Now, because |T| and |ΔR| are absolute values (or magnitudes) of vectors, they are always positive (always!). Therefore, if ΔT has a minus sign, it will have to come from the cosine factor in the ΔT = T·ΔR = |T|·|ΔRcosθ expression. [Again, if you wonder where this expression comes from: it’s just the definition of a vector dot product.] Therefore, ΔT and cosθ will have the same sign, always, and θ can have any value between –180° to +180°. In other words, we’re effectively looking at the full circle here. To make a long story short, we can write the following:

ΔT = |T|·|ΔRcosθ = |T|·ΔR·cosθ ⇔ ΔT/ΔR = |T|cosθ

As you can see, θ is the angle between T and ΔR here and, as mentioned above, it can take on any value – well… Any value between –180° to +180°, that is. ΔT/ΔR is, obviously, the rate of change of T in the direction of ΔR and, from the expression above, we can see it is equal to the component of T in the direction of ΔR:

ΔT/ΔR = |T|cosθ

So we have a negative component here? Yes. The rate of change is negative and, therefore, we have a negative component. Indeed, any vector has components in all directions, including directions that point away from it. However, in the directions that point away from it, the component will be negative. More in general, we have the following interesting result: the rate of change of a scalar field ψ in the direction of a (small) displacement ΔR is the component of the gradient ∇ψ along that displacement. We write that result as:

Δψ/ΔR = |T|cosθ

We’ve said a lot of (not so) interesting things here, but we still haven’t answered the original question: what’s T·ΔR? Well… We can’t say more than what we said already: it’s equal to ΔT, which is a differential: ΔT = (∂T/∂x)Δx + (∂T/∂y)Δy + (∂T/∂z)Δz. A differential and a derivative (i.e. a rate of change) are not the same, but they are obviously closely related, as evidenced from the equations above: the rate of change is the change per unit distance.

In any case, this is enough of a recapitulation. In fact, this ‘interim summary’ has become longer than the preceding text! We’re now ready to discuss what I’ll call the First Theorem of vector calculus in physics. Of course, never mind the term: what’s first or second or third doesn’t matter really: you’ll need all of the theorems below to understand vector calculus.

The First Theorem

Let’s assume we have some scalar field ψ in space: ψ might be the temperature, but it could be any scalar field really. Now, if we go from one point (1) to another (2) in space, as shown below, we’ll follow some arbitrary path, which is denoted by the curve Γ in the illustrations below. Each point along the curve can then be associated with a gradient ψ (think of the h = –κT and E = –Φ expressions above if you’d want examples). Its tangential component is obviously equal to (ψ)t·Δs = ψ·Δs. [Please note, once again, the subtle difference between Δs (with the s in bold-face) and Δs: Δs is a vector, and Δs is its magnitude.] 

Line integral-1 Line integral-2

As shown in the illustrations above, we can mark off the curve at a number of points (a, b, c, etcetera) and join these points by straight-line segments Δsi. Now let’s consider the first line segment, i.e. Δs1. It’s obvious that the change in ψ from point 1 to point a is equal to Δψ= ψ(a) – ψ(1). Now, we have that general Δψ = (∂ψ/∂x, ∂ψ/∂y, ∂ψ/∂z)(Δx, Δy, Δz) = ψ·Δs equation. [If you find it difficult to interpret what I am writing here, just substitute ψ for T and Δs for ΔR.] So we can write:

Δψ= ψ(a) – ψ(1) = (ψ)1·Δs1

Likewise, we can write:

ψ(b) – ψ(a) = (ψ)2·Δs1

In these expressions, (ψ)and (ψ)mean the gradient evaluated at segment Δs1 and point Δs2 respectively, not at point 1 and 2 – obviously. Now, if we add the two equations above, we get:

ψ(b) – ψ(1) = (ψ)1·Δs+ (ψ)2·Δs1

To make a long story short, we can keep adding such terms to get:

ψ(2) – ψ(1) = ∑(ψ)i·Δsi

We can add more and more segments and, hence, take a limit: as Δsi tends to zero, ∑ becomes a sum of an infinite number of terms – which we denote using the integral sign ∫ – in which ds is – quite simply – just the infinitesimally small displacement vector. In other words, we get the following line integral along that curve Γ: 

Line integral - expression

This is a gem, and our First Theorem indeed. It’s a remarkable result, especially taking into account the fact that the path doesn’t matter: we could have chosen any curve Γ indeed, and the result would be the same. So we have:

Line integral - expression -2

You’ll say: so what? What do we do with this? Well… Nothing much for the moment, but we’ll need this result later. So I’d say: just hang in there, and note this is the first significant use of our del operator in a mathematical expression that you’ll encounter very often in physics. So just let it sink in, and allow me to proceed with the rest of the story.

Before doing so, however, I should note that even Feynman sins when trying to explain this theorem in a more ‘intuitive’ way. Indeed, in his Lecture on the topic, he writes the following: “Since the gradient represents the rate of change of a field quantity, if we integrate that rate of change, we should get the total change.” Now, from that Δψ/ΔR = |ψ|cosθ formula, it’s obvious that the gradient is the rate of change in a specific direction only. To be precise, in this particular case – with the field quantity ψ equal to the temperature T – it’s the direction in which T changes the fastest.

You should also note that the integral above is not the type of integral you known from high school. Indeed, it’s not of the rather straightforward ∫f(x)dx type, with f(x) the integrand and dx the variable of integration. That type of integral, we knew how to solve. A line integral is quite different. Look at it carefully: we have a vector dot product after the ∫ sign. So, unlike what Feynman suggests, it’s not just a matter of “integrating the rate of change.”

Now, I’ll refer you to Wikipedia for a good discussion of what a line integral really is, but I can’t resist the temptation to copy the animation in that article, because it’s very well made indeed. While it shows that we can think of a line integral as the two- or three-dimensional equivalent of the standard type of integral we learned to solve in high school (you’ll remember the solution was also the area under the graph of the function that had to be integrated), the way to go about it is quite different. Solving them will, in general, involve some so-called parametrization of the curve C.


However, this post is becoming way too long and, hence, I really need to move on now.

Operations with ∇:  divergence and curl

You may think we’ve covered a lot of ground already, and we did. At the same time, everything I wrote above is actually just the start of it. I emphasized the physics of the situation so far. Let me now turn to the math involved. Let’s start by dissociating the del operator from the scalar field, so we just write:

 = (∂/∂x, ∂/∂y, ∂/∂z)

This doesn’t mean anything, you’ll say, because the operator has nothing to operate on. And, yes, you’re right. However, in math, it doesn’t matter: we can combine this ‘meaningless’ operator (which looks like a vector, because it has three components) with something else. For example, we can do a vector dot product:

·(a vector)

As mentioned above, we can ‘do’ this product because has three components, so it’s a ‘vector’ too (although I find such name-giving quite confusing), and so we just need to make sure that the vector we’re operating on has three components too. To continue with our heat flow example, we can write, for example:

·h = (∂/∂x, ∂/∂y, ∂/∂z)·(hxhyhz) = ∂hx/∂x + ∂hy/∂y, ∂hz/∂z

This del operator followed by a dot, and acting on a vector – i.e. ·(vector) – is, in fact, a new operator. Note that we use two existing symbols, the del () and the dot (·), but it’s one operator really. [Inventing a new symbol for it would not be wise, because we’d forget where it comes from and, hence, probably scratch our head when we’d see it.] It’s referred to as a vector operator, just like the del operator, but don’t worry about the terminology here because, once again, the terminology here might confuse you. Indeed, our del operator acted on a scalar to yield a vector, and now it’s the other way around: we have an operator acting on a vector to return a scalar. In a few minutes, we’ll define yet another operator acting on a vector to return a vector. Now, all of these operators are so-called vector operators, not because there’s some vector involved, but because they all involve the del operator. It’s that simple. So there’s no such thing as a scalar operator. :-) But let me get back to the main line of the story. This ·  operator is quite important in physics, and so it has a name (and an abbreviated notation) of its own:

·h = div h = the divergence of h

The physical significance of the divergence is related to the so-called flux of a vector field: it measures the magnitude of a field’s source or sink at a given point. Continuing our example with temperature, consider air as it is heated or cooled. The relevant vector field is now the velocity of the moving air at a point. If air is heated in a particular region, it will expand in all directions such that the velocity field points outward from that region. Therefore the divergence of the velocity field in that region would have a positive value, as the region is a source. If the air cools and contracts, the divergence has a negative value, as the region is a sink.

A less intuitive but more accurate definition is the following: the divergence represents the volume density of the outward flux of a vector field from an infinitesimal volume around a given point.

Phew! That sounds more serious, doesn’t it? We’ll come back to this definition when we’re ready to define the concept of flux somewhat accurately. For now, just note two of Maxwell’s famous equations involve the divergence operator:

·E = ρ/ε0 and ·B = 0

In my previous post, I gave a verbal description of those two equations:

  1. The flux of E through a closed surface = (the net charge inside)/ε0
  2. The flux of B through a closed surface = zero

The first equation basically says that electric charges cause an electric field. The second equation basically says there is no such thing as a magnetic charge: the magnetic force only appears when charges are moving and/or when electric fields are changing. Note that we’re talking closed surface here, so they define a volume indeed. We can also look at the flux through a non-closed surface (and we’ll do that shortly) but, in the context of Maxwell’s equations, we’re talking volumes and, hence, closed surfaces.

Of course, you’ll anticipate the second new operator now, because that’s the one that appears in the other two equations in Maxwell’s set of equations. It’s the cross product:

∇×E = (∂/∂x, ∂/∂y, ∂/∂z)×(Ex, Ey, Ez) = … What?

Well… The cross product is not as straightforward to write down as the dot product. We get a vector indeed, not a scalar, and its three components are:

(∇×E)z = ∇xEyE= ∂Ey/∂x – ∂Ex/∂y

(∇×E)x = ∇yEzE= ∂Ez/∂y – ∂Ey/∂z

(∇×E)y = ∇zExE= ∂Ex/∂z – ∂Ez/∂x

I know this looks pretty monstrous, but so that’s how cross products work. Please do check it out: you have to play with the order of the x, y and z subscripts. I gave the geometric formula for a dot product above, so I should also give you the same for a cross product:

A×B = |A||B|sin(θ)n

In this formula, we once again have θ, the angle between A and B, but note that, this time around, it’s the sine, not the cosine, that pops up when calculating the magnitude of this vector. In addition, we have n at the end: n is a unit vector at right angles to both A and B. It’s what makes the cross product a vector. Indeed, as you can see, multiplying by n will not alter the magnitude (|A||B|sinθ) of this product, but it gives the whole thing a direction, so we get a new vector indeed. Of course, we have two unit vectors at right angles to A and B, and so we need a rule to choose one of these: the direction of the n vector we want is given by that right-hand rule which we encountered a couple of times already.

Again, it’s two symbols but one operator really, and we also have a special name (and notation) for it:

∇×h = curl h = the curl of h

The curl is, just like the divergence, a so-called vector operator but, as mentioned above, that’s just because it involves the del operator. Just note that it acts on a vector and that its result is a vector too. What’s the geometric interpretation of the curl? Well… It’s a bit hard to describe that but let’s try. The curl describes the ‘rotation’ or ‘circulation’ of a vector field:

  1. The direction of the curl is the axis of rotation, as determined by the right-hand rule.
  2. The magnitude of the curl is the magnitude of rotation.

I know. This is pretty abstract, and I’ll probably have to come back to it in another post. As for now, just note we defined three new operators in this ‘introduction’ to vector calculus:

  1. T = grad T = a vector
  2. ∇·h = div h = a scalar
  3. ×h = curl h = a vector

That’s all. It’s all we need to understand Maxwell’s famous equations:

Maxwell's equations-2

Huh? Hmm… You’re right: understanding the symbols, to some extent, doesn’t mean we ‘understand’ these equations. What does it mean to ‘understand’ an equation? Let me quote Feynman on that: “What it means really to understand an equation—that is, in more than a strictly mathematical sense—was described by Dirac. He said: “I understand what an equation means if I have a way of figuring out the characteristics of its solution without actually solving it.” So if we have a way of knowing what should happen in given circumstances without actually solving the equations, then we “understand” the equations, as applied to these circumstances.”

We’re surely not there yet. In fact, I doubt we’ll ever reach Dirac’s understanding of Maxwell’s equations. But let’s do what we can.

In order to ‘understand’ the equations above in a more ‘physical’ way, let’s explore the concepts of divergence and curl somewhat more. We said the divergence was related to the ‘flux’ of a vector field, and the curl was related to its ‘circulation’. In my previous post, I had already illustrated those two concepts copying the following diagrams from Feynman’s Lectures:


flux = (average normal component)·(surface area)

So that’s the flux (through a non-closed surface).

To illustrate the concept of circulation, we have not one but three diagrams, shown below. Diagram (a) gives us the vector field, such as the velocity field in a liquid. In diagram (b), we imagine a tube (of uniform cross section) that follows some arbitrary closed curve. Finally, in diagram (c), we imagine we’d suddenly freeze the liquid everywhere except inside the tube. Then the liquid in the tube would circulate as shown in (c), and so that’s the concept of circulation.


We have a similar formula as for the flux:

circulation = (the average tangential component)·(the distance around)

In both formulas (flux and circulation), we have a product of two scalars: (i) the average normal component and the average tangential component (for the flux and circulation respectively) and (ii) the surface area and the distance around (again, for the flux and circulation respectively). So we get a scalar as a result. Does that make sense? When we related the concept of flux to the divergence of a vector field, we said that the flux would have a positive value if the region is a source, and a negative value if the region is a sink. So we have a number here (otherwise we wouldn’t be talking ‘positive’ or ‘negative’ values). So that’s OK. But are we talking about the same number? Yes. I’ll show they are the same in a few minutes.

But what about circulation? When we related the concept of circulation of the curl of a vector field, we introduced a vector cross product, so that yields a vector, not a scalar. So what’s the relation between that vector and the number we get when multiplying the ‘average tangential component’ and the ‘distance around’. The answer requires some more mathematical analysis, and I’ll give you what you need in a minute. Let me first make a remark about conventions here.

From what I write above, you see that we use a plus or minus sign for the flux to indicate the direction of flow: the flux has a positive value if the region is a source, and a negative value if the region is a sink. Now, why don’t we do the same for circulation? We said the curl is a vector, and its direction is the axis of rotation as determined by the right-hand rule. Why do we need a vector here? Why can’t we have a scalar taking on positive or negative values, just like we do for the flux?

The intuitive answer to this question (i.e. the ‘non-mathematical’ or ‘physical’ explanation, I’d say) is the following. Although we can calculate the flux through a non-closed surface, from a mathematical point of view, flux is effectively being defined by referring to the infinitesimal volume around some point and, therefore, we can easily, and unambiguously, determine whether we’re inside or outside of that volume. Therefore, the concepts of positive and negative values make sense, as we can define them referring to some unique reference point, which is either inside or outside of the region.

When talking circulation, however, we’re talking about some curve in space. Now it’s not so easy to find some unique reference point. We may say that we are looking at some curve from some point ‘in front of’ that curve, but some other person whose position, from our point of view, would be ‘behind’ the curve, would not agree with our definition of ‘in front of': in fact, his definition would be exactly the opposite of ours. In short, because of the geometry of the situation involved, our convention in regard to the ‘sign’ of circulation (positive or negative) becomes somewhat more complicated. It’s no longer a simple matter of ‘inward’ or ‘outward’ flow: we need something like a ‘right-hand rule’ indeed. [We could, of course, also adopt a left-hand rule but, by now, you know that, in physics, there’s not much use for a left hand. :-)]

That also ‘explains’ why the vector cross product is non-commutative: A×BB×A. To be fully precise, A×B and B×have the same magnitude but opposite direction: A×B = |A||B|sin(θ)n = –|A||B|sin(θ)(–n) = –(B×A) = B×A. The dot product, on the other hand, is fully commutative: A·B = B·A.

In fact, the concept of circulation is very much related to the concept of angular momentum which, as you’ll remember from a previous post, also involves a vector cross product.


I’ve confused you too much already. The only way out is the full mathematical treatment. So let’s go for that.


Some of the confusion as to what flux actually means in electromagnetism is probably caused by the fact that the illustration above is not a closed surface and, from my previous post, you should remember that Maxwell’s first and third equation define the flux of E and B through closed surfaces. It’s not that the formula above for the flux through a non-closed surface is wrong: it’s just that, in electromagnetism, we usually talk about the flux through a closed surface.

A closed surface has no boundary. In contrast, the surface area above does have a clear boundary and, hence, it’s not a closed surface. A sphere is an example of a closed surface. A cube is an example as well. In fact, an infinitesimally small cube is what’s used to prove a very convenient theorem, referred to as Gauss’ Theorem. We will not prove it here, but just try to make sure you ‘understand’ what it says.

Suppose we have some vector field C and that we have some closed surface S – a sphere, for example, but it may also be some very irregular volume. Its shape doesn’t matter: the only requirement is that it’s defined by a closed surface. Let’s then denote the volume that’s enclosed by this surface by V. Now, the flux through some (infinitesimal) surface element da will, effectively, be given by that formula above:

flux = (average normal component)·(surface area)

What’s the average normal component in this case? It’s given by that ΔJ/Δa2 = (ΔJ/Δa1)cosθ = h·formula, except that we just need to substitute h for C here, so we have C·n instead of h·n. To get the flux through the closed surface S, we just need to add all the contributions. Adding those contributions amounts to taking the following surface integral:

Surface integral

Now, I talked about Gauss’ Theorem above, and I said I would not prove it, but this is what Gauss’ Theorem says:

Gauss Theorem

Huh? Don’t panic. Just try to ‘read’ what’s written here. From all that I’ve said so far, you should ‘understand’ the surface integral on the left-hand side. So that should be OK. Let’s now look at the right-hand side. The right-hand side uses the divergence operator which I introduced above: ·(vector). In this case, ·C. That’s a scalar, as we know, and it represents the outward flux from an infinitesimally small cube inside the surface indeed. The volume integral on the right-hand side adds all of the fluxes out of each part (think of it as zillions of infinitesimally small cubes) of the volume V that is enclosed by the (closed) surface S. So that’s what Gauss’ Theorem is all about. In words, we can state Gauss’ Theorem as follows:

Gauss’ Theorem: The (surface) integral of the normal component of a vector (field) over a closed surface is the (volume) integral of the divergence of the vector over the volume enclosed by the surface.

Again, I said I would not prove Gauss’ Theorem, but its proof is actually quite intuitive: to calculate the flux out of a large volume, we can ‘cut it up’ in smaller volumes, and then calculate the flux out of these volumes. If we add it up, we’ll get the total flux. In any case, I’ll refer you to Feynman in case you’d want to see how it goes exactly. So far, I did what I promised to do, and that’s to relate the formula for flux (i.e. that (average normal component)·(surface area) formula) to the divergence operator. Let’s now do the same for the curl.


For non-native English speakers (like me), it’s always good to have a look at the common-sense definition of ‘curl': as a verb (to curl), it means ‘to form or cause to form into a curved or spiral shape’. As a noun (e.g. a curl of hair), it means ‘something having a spiral or inwardly curved form’. It’s clear that, while not the same, we can indeed relate this common-sense definition to the concept of circulation that we introduced above:

circulation = (the average tangential component)·(the distance around)

So that’s the (scalar) product we already mentioned above. How do we relate it to that curl operator?

Patience, please ! The illustration below shows what we actually have to do to calculate the circulation around some loop Γ: we take an infinite number of vector dot products C·ds. Take a careful look at the notation here: I use bold-face for s and, hence, ds is some little vector indeed. Going to the limit, ds becomes a differential indeed. The fact that we’re talking a vector dot product here ensures that only the tangential component of C enters the equation’, so to speak. I’ll come back to that in a moment. Just have a good look at the illustration first.


Such infinite sum of vector dot products C·dis, once again, an integral. It’s another ‘special’ integral, in fact. To be precise, it’s a line integral. Moreover, it’s not just any line integral: we have to go all around the (closed) loop to take it. We cannot stop somewhere halfway. That’s why Feynman writes it with a little loop (ο) through the integral sign (∫):

Line integral

Note the subtle difference between the two products in the integrands of the integrals above: Ctds versus C·ds. The first product is just a product of two scalars, while the second is a dot product of two vectors. Just check it out using the definition of a dot product (A·B = |A||B|cosθ) and substitute A and B by C and ds respectively, noting that the tangential component Ct equals C times cosθ indeed.

Now, once again, we want to relate this integral with that dot product inside to one of those vector operators we introduced above. In this case, we’ll relate the circulation with the curl operator. The analysis involves infinitesimal squares (as opposed to those infinitesimal cubes we introduced above), and the result is what is referred to as Stokes’ Theorem. I’ll just write it down:

Stokes Theorem

Again, the integral on the left was explained above: it’s a line integral taking around the full loop Γ. As for the integral on the right-hand side, that’s a surface integral once again but, instead of a div operator, we have the curl operator inside and, moreover, the integrand is the normal component of the curl only. Now, remembering that we can always find the normal component of a vector (i.e. the component that’s normal to the surface) by taking the dot product of that vector and the unit normal vector (n), we can write Stokes’s Theorem also as:

Stokes Theorem-2

That doesn’t look any ‘nicer’, but it’s the form in which you’ll usually see it. Once again, I will not give you any formal proof of this. Indeed, if you’d want to see how it goes, I’ll just refer you to Feynman’s Lectures. However, the philosophy behind is the same. The first step is to prove that we can break up the surface bounded by the loop Γ into a number of smaller areas, and that the circulation around Γ will be equal to the sum of the circulations around the little loops. The idea is illustrated below:

Proof Stokes

Of course, we then go to the limit and cut up the surface into an infinite number of infinitesimally small squares. The next step in the proof then shows that the circulation of around an infinitesimal square is, indeed, (i) the component of the curl of C normal to the surface enclosed by that square multiplied by (ii) the area of that (infinitesimal) square. The diagram and formula below do not give you the proof but just illustrate the idea:

Stokes proof

Stokes proof - 2

OK, you’ll say, so what? Well… Nothing much. I think you have enough to digest as for now. It probably looks very daunting, but so that’s all we need to know – for the moment that is – to arrive at a better ‘physical’ understanding of Maxwell’s famous equations. I’ll come back to them in my next post. Before proceeding to the summary of this whole post, let me just write down Stokes’ Theorem in words:

Stokes’ TheoremThe line integral of the tangential component of a vector (field) around a closed loop is equal to the surface integral of the normal component of the curl of that vector over any surface which is bounded by the loop.


We’ve defined three so-called vector operators, which we’ll use very often in physics:

  1. T = grad T = a vector
  2. ∇·h = div h = a scalar
  3. ×h = curl h = a vector

Moreover, we also explained three important theorems, which we’ll use as least as much:

[1] The First Theorem:

Line integral - expression -2

[2] Gauss Theorem:

Gauss Theorem-2

[3] Stokes Theorem:

Stokes Theorem-2

As said, we’ll come back to them in my next post. As for now, just try to familiarize yourself with these div and curl operators. Try to ‘understand’ them as good as you can. Don’t look at them as just some weird mathematical definition: try to understand them in a ‘physical’ way, i.e. in a ‘completely unmathematical, imprecise, and inexact way’, remembering that’s what it takes to understand to truly understand physics. :-)

Applied vector analysis (I)

Back to tedious stuff: an introduction to electromagnetism

It seems I skipped too many chapters in Feynman’s second volume of Lectures (on electromagnetism) and so I have to return to that before getting back to quantum physics. So let me just do that in the next couple of posts. I’ll have to start with the basics: Maxwell’s equations.

Indeed, electromagnetic phenomena are described by a set of four equations known as Maxwell’s equations. They relate two fields: the electric field (E) and the magnetic field (B). The electric field appears when we have electric charges: positive (e.g. protons or positively charged ions) or negative (e.g. electrons or negatively charged ions). That’s obvious.

In contrast, there is no such thing as ‘magnetic charges’. The magnetic field appears only when the electric field changes, or when charges move. In turn, the change in the magnetic field causes an electric field, and that’s how electromagnetic radiation basically works: a changing electric field causes a magnetic field, and the build-up of that magnetic field (so that’s a changing magnetic field) causes a build-up of an electric field, and so on and so on.

OK. That’s obvious too. But how does it work exactly? Before explaining this, I need to point out some more ‘obvious’ things:

1. From Maxwell’s equations, we can calculate the magnitude of E and B. Indeed, a specific functional form for E and is what we get when we solve Maxwell’s set of equations, and we’ll jot down that solution in a moment–even if I am afraid you will shake your head when you see it. The point to note is that what we get as a solution for E and B is a solution in a particular frame of reference only: if we switch to another reference frame, E and B will look different.

Huh? Yes. According to the principle of relativity, we cannot say which charges are ‘stationary’ and which charges are ‘moving’ in any absolute sense: it all depends on our frame our reference.

But… Yes? Then if we put an electric charge in these fields, the force on it will also be different?

Yes. Forces also look different when moving from one reference to another.

But… Yes? The physical effect surely has to be the same, regardless of the reference frame?

Yes. The point is that, if we look at an electric charge q moving along a current-carrying wire in a coordinate system at rest with respect to the wire, with the same velocity (v0) as the conduction electrons (v), then the whole force on the electric charge will be ‘magnetic': F = qv0×B and E = 0. Now, if we’re looking at the same situation from a frame of reference that is moving with q, then our charge is at rest, and so there can be no magnetic force on it. Hence, the force on it must come from an electric field! But what produces the electric field? Our current-carrying wire is supposed to be neutral!

Well… It turns out that our ‘neutral’ wire appears to be charged when moving. We’ll explain – in very much detail – why this is so later. Now, you should just note that “we should not attach too much reality to E and B, because they appear in different ‘mixtures’ in different coordinate systems”, as Feynman puts it. In fact, you may or may not heard that magnetism is actually nothing but a “relativistic effect” of electricity. Well… That’s true, but we’ll also explain how that works later only. Let’s not jump the gun.

2. The remark above is related to the other ‘obvious’ thing I wanted to say before presenting Maxwell’s equations: fields are very useful to describe what’s going on but, when everything is said and done, what we really want to know is what force will be acting on a charge, because that’s what’s going to tell us how that charge is going to move. In other words, we want to find the equations of motion, and the force determines how the charge’s momentum will change: F = dp/dt = d(mv)/dt (i.e. Newton’s equation of motion).

So how does that work? We’ve given the formula before:

F = q(E + v×B) = qE + q(v×B)

This is a sum of two vectors:

  1. qE is the ‘electric force: that force is in the same direction as the electric field, but with a magnitude equal to q times E. [Note I use a bold letter (E) for a vector (which we may define as some quantity with a direction) and a non-bold letter (E) for its magnitude.]
  2. q(v×B) is the ‘magnetic’ force: that force depends on both v as well as on B. Its direction is given by the so-called right-hand rule for a vector cross-product (as opposed to a dot product, which is denoted by a dot (·) and which yields a scalar instead of a new vector).

That right-hand rule is illustrated below. Note that, if we switch a and b, the b×a vector will point downwards. The magnitude of q(v×B) is given by |v×B| = |v||B|sinθ (with θ the angle between v and B).    


We know the direction of (because we’re talking about some charge that is moving here) but what direction is B? It’s time to be a bit more systematic now.

Flux and circulation

In order to understand Maxwell’s equations, one needs to understand two concepts related to a vector field: flux and circulation. The two concepts are best illustrated referring to a vector field describing the flow of a liquid:

1. If we have a surface, the flux will give us the net amount of fluid going out through the surface per unit time. The illustration below (which I took from Feynman’s Lectures) gives us not only the general idea but a formal definition as well:


2. The concept of circulation is linked to the idea of some net rotational motion around some loop. In fact, that’s exactly what it describes. I’ll again use Feynman’s illustration (and description) because I couldn’t find anything better.

circulation-1 circulation-2 circulation-3

Diagram (a) gives us the velocity field in the liquid. Now, imagine a tube (of uniform cross section) that follows some arbitrary closed curve, like in (b), and then imagine we’d suddenly freeze the liquid everywhere except inside the tube: the liquid in the tube would circulate as shown in (c). Formally, the circulation is defined as:

circulation = (the average tangential component)·(the distance around)

OK. So far, so good. Back to electromagnetism.

E and B

We’re familiar with the electric field E from our high school physics course. Indeed, you’ll probably recognize the two examples below: (a) a (positive) charge near a (neutral) conducting sheet, and (b) two opposite charges next to each other. Note the convention: the field lines emanate from the positive charge. Does that mean that the force is in that direction too? Yes. But remember: if a particle is attracted to another, the latter particle is attracted to the former too! So there’s a force in both directions !


What more can we say about this? Well… It is clear that the field E is directed radially. In terms of our flux and circulation concepts, we say that there’s an outgoing flux from the (positive) point charge. Furthermore, it would seem to be pretty obvious (we’d need to show why, but we won’t do that here: just look at Coulomb’s Law once again) that the flux should be proportional to the charge, and it is: if we double the charge, the flux doubles too. That gives us Maxwell’s first equation:

flux of E through a closed surface = (the net charge inside)/ε0

Note we’re talking a closed surface here, like a sphere for example–but it does not have to be a nice symmetric shape: Maxwell’s first equation is valid for any closed surface. The expression above is Coulomb’s Law, which you’ll also surely remember from your high school physics course: while it looks very different, it’s the same. It’s just because we’re using that flux concept here that we seem to be getting an entirely different expression. But so we’re not: it’s the same as Coulomb’s Law.

As for the ε0 factor, that’s just a constant that depends on the units we’re using to measure what we write above, so don’t worry about it. [I am noting it here because you’ll see it pop up later too.]

For B, we’ve got a similar-looking law:

flux of B through a closed surface = 0 (= zero = nil)

That’s not the same, you’ll say. Well… Yes and no. It’s the same really, but the zero on the right-hand side of the expression above says there’s no such thing as a ‘magnetic’ charge.

Hmm… But… If we can’t create any flux of B, because ‘magnetic charges’ don’t exist, so how do we get magnetic fields then? 

Well… We wrote that above already, and you should remember it from your high school physics course as well: a magnetic field is created by (1) a moving charge (i.e. a flow or flux of electric current) or (2) a changing electric field.

Situation (1) is illustrated below: the current in the wire creates some circulation of B around the wire. How much? Not much: the magnetic effect is very small as compared to the electric effect (that has to do with magnetism being a relativistic effect of electricity but, as mentioned above, I’ll explain that later only). To be precise, the equation is the following:

c2(circulation of B)= (flux of electric current)/ε0

magnetic field wire

That c2 factor on the left-hand side becomes 1/c2 if we move it to the other side and, yes, is the speed of light here – so you can see we’re talking a very small amount of circulation only indeed! [As for the ε0 factor, that’s just the same constant: it’s got to do with the units we’re using to measure stuff.]

One last point perhaps: what’s the direction of the circulation? Well… There’s a so-called right-hand grip rule for that, which is illustrated below.


OK. Enough about this. Let’s go to situation (2): a changing electric field. That effect is usually illustrated with Faraday’s original 1831 experiment, which is shown below with a more modern voltmeter :-) : when the wire on one side of the iron ring is connected to the battery, we’ll see a transient current on the other side. It’s transient only, so the current quickly disappears. That’s why transformers don’t work with DC. In fact, it is said that Faraday was quite disappointed to see that the current didn’t last! Likewise, when the wire is disconnected, we’ll briefly see another transient current.


So this effect is due to the changing electric field, which causes a changing magnetic field. But so where is that magnetic field? We’re talking currents here, aren’t we? Yes, you’re right. To understand why we have a transient current in the voltmeter, you need to understand yet another effect: a changing magnetic field causes an electric field, and so that’s what actually generates the transient current. However, what’s going on in the iron ring is the magnetic effect, and so that’s caused by the changing electric field as we connect/disconnect the battery to the wire. Capito?

I guess so… So what’s the equation that captures this situation, i.e. situation (2)? That equation involves both flux and circulation, so we’ll have a surface (S) as well as a curve (C). The equation is the following one: for any surface S (not closed this time because, if the surface was closed, it wouldn’t have an edge!), we have:

c2(circulation of B around C)= d(flux of E through S)/dt

I mentioned above that the reverse is also true. A changing magnetic field causes an electric field, and the equation for that looks very similar, except that we don’t have the c2 factor:

circulation of around = d(flux of through S)/dt

Let me quickly mention the presence of absence of that c2 or 1/c2 factor in the previous equations once again. It is interesting. It’s got nothing to do with the units. It’s really a proportionality factor: any change in E will only cause a little change in (because of the 1/c2 factor in the first equation), but the reverse is not true: there’s no c2  in the second equation. Again, it’s got to do with magnetism being a relativistic effect of electricity, so the magnetic effect is, in most cases, tiny as compared to the electric effect, except when we’re talking charges that are moving at relativistic speeds (i.e. speeds close to c). As said, we’ll come back to that–later, much later. Let’s get back to Maxwell’s equations first.

Maxwell’s equations

We can now combine all of the equations above in one set, and so these are Maxwell’s four famous equations:

  1. The flux of E through a closed surface = (the net charge inside)/ε0
  2. The circulation of E around = d(flux of through S)/dt (with the curve or edge around S)
  3. The flux of B through a closed surface = 0
  4. c2(circulation of B around C)= d(flux of E through S)/dt + (flux of electric current)/ε0

From a mathematical point of view, this is a set of differential equations, and they are not easy to grasp intuitively. As Feynman puts it: “The laws of Newton were very simple to write down, but they had a lot of complicated consequences and it took us a long time to learn about them all. These laws are not nearly as simple to write down, which means that the consequences are going to be more elaborate and it will take us quite a lot of time to figure them all out.”

Indeed, Feynman needs about twenty (!) Lectures in that second Volume to show what it all implies, as he walks us through electrostatics, magnetostatics and various other ‘special’ cases before giving us the ‘complete’ or ‘general’ solution to the equations. This ‘general’ solution, in mathematical notation, is the following:

Maxwell's equations

Huh? What’s that? Well… The four equations are the equations we explained already, but this time in mathematical notation: flux and circulation can be expressed much more elegantly using the differential operator  indeed. As for the solutions to Maxwell’s set of equations, you can see they are expressed using two other concepts: the scalar potential Φ and the vector potential A.

Now, it is not my intention to summarize two dozen of Feynman’s Lectures in just a few lines, so I’ll have to leave you here for the moment.


Huh? What? What about my promise to show that magnetism is a relativistic effect of electricity indeed?

Well… I wanted to do that just now, but when I look at it, I realize that I’d end up copying most of Feynman’s little exposé on it and, hence, I’ll just refer you to that particular section. It’s really quite exciting but – as you might expect – it does take a bit of time to wrestle through it.

That being said, it really does give you a kind of an Aha-Erlebnis and, therefore, I really warmly recommend it ! Just click on the link ! :-)

Back to tedious stuff: an introduction to electromagnetism

Amplitudes and statistics

When re-reading Feynman’s ‘explanation’ of Bose-Einstein versus Fermi-Dirac statistics (Lectures, Vol. III, Chapter 4), and my own March 2014 post summarizing his argument, I suddenly felt his approach raises as many questions as it answers. So I thought it would be good to re-visit it, which is what I’ll do here. Before you continue reading, however, I should warn you: I am not sure I’ll manage to do a better job now, as compared to a few months ago. But let me give it a try.

Setting up the experiment

The (thought) experiment is simple enough: what’s being analyzed is the (theoretical) behavior of two particles, referred to as particle a and particle b respectively that are being scattered into  two detectors, referred to as 1 and 2. That can happen in two ways, as depicted below: situation (a) and situation (b). [And, yes, it’s a bit confusing to use the same letters a and b here, but just note the brackets and you’ll be fine.] It’s an elastic scattering and it’s seen in the center-of-mass reference frame in order to ensure we can analyze it using just one variable, θ, for the angle of incidence. So there is no interaction between those two particles in a quantum-mechanical sense: there is no exchange of spin (spin flipping) nor is there any exchange of energy–like in Compton scattering, in which a photon gives some of its energy to an electron, resulting in a Compton shift (i.e. the wavelength of the scattered photon is different than that of the incoming photon). No, it’s just what it is: two particles deflecting each other. […] Well… Maybe. Let’s fully develop the argument to see what’s going on.

Identical particles-aIdentical particles-b

First, the analysis is done for two non-identical particles, say an alpha particle (i.e. a helium nucleus) and then some other nucleus (e.g. oxygen, carbon, beryllium,…). Because of the elasticity of the ‘collision’, the possible outcomes of the experiment are binary: if particle a gets into detector 1, it means particle b will be picked up by detector 2, and vice versa. The first situation (particle a gets into detector 1 and particle b goes into detector 2) is depicted in (a), i.e. the illustration on the left above, while the opposite situation, exchanging the role of the particles, is depicted in (b), i.e. the illustration on the right-hand side. So these two ‘ways’ are two different possibilities which are distinguishable not only in principle but also in practice, for non-identical particles that is (just imagine a detector which can distinguish helium from oxygen, or whatever other substance the other particle is). Therefore, strictly following the rules of quantum mechanics, we should add the probabilities of both events to arrive at the total probability of some particle (and with ‘some’, I mean particle a or particle b) ending up in some detector (again, with ‘some’ detector, I mean detector 1 or detector 2).

Now, this is where Feynman’s explanation becomes somewhat tricky. The whole event (i.e. some particle ending up in some detector) is being reduced to two mutually exclusive possibilities that are both being described by the same (complex-valued) wave function f, which has that angle of incidence as its argument. To be precise: the angle of incidence is θ for the first possibility and it’s π–θ for the second possibility. That being said, it is obvious, even if Feynman doesn’t mention it, that both possibilities actually represent a combination of two separate things themselves:

  1. For situation (a), we have particle a going to detector 1 and particle b going to detector 2. Using Dirac’s so-called bra-ket notation, we should write 〈1|a〉〈2|b〉 = f(θ), with f(θ) a probability amplitude, which should yield a probability when taking its absolute square: P(θ) = |f(θ)|2.
  2. For situation (b), we have particle b going to detector 1 and particle a going to 2, so we have 〈1|b〉〈2|a〉, which Feynman equates with f(π–θ), so we write 〈1|b〉〈2|a〉 = 〈2|a〉〈1|b〉 = f(π–θ).

Now, Feynman doesn’t dwell on this–not at all, really–but this casual assumption–i.e. the assumption that situation (b) can be represented by using the same wave function f–merits some more reflection. As said, Feynman is very brief on it: he just says situation (b) is the same situation as (a), but then detector 1 and detector 2 being switched (so we exchange the role of the detectors, I’d say). Hence, the relevant angle is π–θ and, of course, it’s a center-of-mass view again so if a goes to 2, then b has to go to 1. There’s no Third Way here. In short, a priori it would seem to be very obvious indeed to associate only one wave function (i.e. that (complex-valued) f(θ) function) with the two possibilities: that wave function f yields a probability amplitude for θ and, hence, it should also yield some (other) probability amplitude for π–θ, i.e. for the ‘other’ angle. So we have two probability amplitudes but one wave function only.

You’ll say: Of course! What’s the problem? Why are you being fussy? Well… I think these assumptions about f(θ) and f(π–θ) representing the underlying probability amplitudes are all nice and fine (and, yes, they are very reasonable indeed), but I also think we should take them for what they are at this moment: assumptions.

Huh? Yes. At this point, I would like to draw your attention to the fact that the only thing we can measure are real-valued possibilities. Indeed, when we do this experiment like a zillion times, it will give us some real number P for the probability that a goes to 1 and b goes to 2 (let me denote this number as P(θ) = Pa→1 and b→2), and then, when we change the angle of incidence by switching detector 1 and 2, it will also give us some (other) real number for the probability that a goes to 2 and b goes to 1 (i.e. a number which we can denote as P(π–θ) = Pa→2 and b→1). Now, while it would seem to be very reasonable that the underlying probability amplitudes are the same, we should be honest with ourselves and admit that the probability amplitudes are something we cannot directly measure.

At this point, let me quickly say something about Dirac’s bra-ket notation, just in case you haven’t heard about it yet. As Feynman notes, we have to get away from thinking too much in terms of wave functions traveling through space because, in quantum mechanics, all sort of stuff can happen (e.g. spin flipping) and not all of it can be analyzed in terms of interfering probability amplitudes. Hence, it’s often more useful to think in terms of a system being in some state and then transitioning to some other state, and that’s why that bra-ket notation is so helpful. We have to read these bra-kets from right to left: the part on the right, e.g. |a〉, is the ket and, in this case, that ket just says that we’re looking at some particle referred to as particle a, while the part on the left, i.e. 〈1|, is the bra, i.e. a shorthand for particle a having arrived at detector 1. If we’d want to be complete, we should write:

〈1|a〉 = 〈particle a arrives at detector 1|particle a leaves its source〉

Note that 〈1|a〉 is some complex-valued number (i.e. a probability amplitude) and so we multiply it here with some other complex number, 〈2|b〉, because it’s two things happening together. As said, don’t worry too much about it. Strictly speaking, we don’t need wave functions and/or probability amplitudes to analyze this situation because there is no interaction in the quantum-mechanical sense: we’ve got a scattering process indeed (implying some randomness in where those particles end up, as opposed to what we’d have in a classical analysis of two billiard balls colliding), but we do not have any interference between wave functions (probability amplitudes) here. We’re just introducing the wave function f because we want to illustrate the difference between this situation (i.e. the scattering of non-identical particles) and what we’d have if we’d be looking at identical particles being scattered.

At this point, I should also note that this bra-ket notation is more in line with Feynman’s own so-called path integral formulation of quantum mechanics, which is actually implicit in his line of argument: rather than thinking about the wave function as representing the (complex) amplitude of some particle to be at point x in space at point t in time, we think about the amplitude as something that’s associated with a path, i.e. one of the possible itineraries from the source (its origin) to the detector (its destination). That explains why this f(θ) function doesn’t mention the position (x) and space (t) variables. What x and t variables would we use anyway? Well… I don’t know. It’s true the position of the detectors is fully determined by θ, so we don’t need to associate any x or t with them. Hence, if we’d be thinking about the space-time variables, then we should be talking the position in space and time of both particle a and particle b. Indeed, it’s easy to see that only a slight change in the horizontal (x) or vertical position (y) of either particle would ensure that both particles do not end up in the detectors. However, as mentioned above, Feynman doesn’t even mention this. Hence, we must assume that any randomness in any x or t variable is captured by that wave function f, which explains why this is actually not a classical analysis: so, in short, we do not have two billiard balls colliding here.

Hmm… You’ll say I am a nitpicker. You’ll say that, of course, any uncertainty is indeed being incorporated in the fact that we represent what’s going on by a wave function f which we cannot observe directly but whose absolute square represents a probability (or, to use precise statistical terminology, a probability density), which we can measure: P = |f(θ)|2 = f(θ)·f*(θ), with f* the complex conjugate of the complex number f. So… […] What? Well… Nothing. You’re right. This thought experiment describes a classical situation (like two billiard balls colliding) and then it doesn’t, because we cannot predict the outcome (i.e. we can’t say where the two billiard balls are going to end up: we can only describe the likely outcome in terms of probabilities Pa→1 and b→2 = |f(θ)|and Pa→2 and b→1 = |f(π–θ)|2. Of course, needless to say, the normalization condition should apply: if we add all probabilities over all angles, then we should get 1, we can write: ∫|f(θ)|2dθ = ∫f(θ)·f*(θ)dθ = 1. So that’s it, then?

No. Let this sink in for a while. I’ll come back to it. Let me first make a bit of a detour to illustrate what this thought experiment is supposed to yield, and that’s a more intuitive explanation of Bose-Einstein statistics and Fermi-Dirac statistics, which we’ll get out of the experiment above if we repeat it using identical particles. So we’ll introduce the terms Bose-Einstein statistics and Fermi-Dirac statistics. Hence, there should also be some term for the reference situation described above, i.e a situation in which we non-identical particles are ‘interacting’, so to say, but then with no interference between their wave functions. So, when everything is said and done, it’s a term we should associate with classical mechanics. It’s called Maxwell-Boltzmann statistics.

Huh? Why would we need ‘statistics’ here? Well… We can imagine many particles engaging like this–just colliding elastically and, thereby, interacting in a classical sense, even if we don’t know where exactly they’re going to end up, because of uncertainties in initial positions and what have you. In fact, you already know what this is about: it’s the behavior of particles as described by the kinetic theory of gases (often referred to as statististical mechanics) which, among other things, yields a very elegant function for the distribution of the velocities of gas molecules, as shown below for various gases (helium, neon, argon and xenon) at one specific temperature (25º C), i.e. the graph on the left-hand side, or for the same gas (oxygen) at different temperatures (–100º C, 20º C and 600º C), i.e. the graph on the right-hand side.

Now, all these density functions and what have you are, indeed, referred to as Maxwell-Boltzmann statistics, by physicists and mathematicians that is (you know they always need some special term in order to make sure other people (i.e. people like you and me, I guess) have trouble understanding them).

700px-MaxwellBoltzmann-en 800px-Maxwell-Boltzmann_distribution_1

In fact, we get the same density function for other properties of the molecules, such as their momentum and their total energy. It’s worth elaborating on this, I think, because I’ll later compare with Bose-Einstein and Fermi-Dirac statistics.

Maxwell-Boltzmann statistics

Kinetic gas theory yields a very simple and beautiful theorem. It’s the following: in a gas that’s in thermal equilibrium (or just in equilibrium, if you want), the probability (P) of finding a molecule with energy E is proportional to e–E/kT, so we have:

P ∝ e–E/kT

Now that’s a simple function, you may think. If we treat E as just a continuous variable, and T as some constant indeed – hence, if we just treat (the probability) P as a function of (the energy) E – then we get a function like the one below (with the blue, red and green using three different values for T).


So how do we relate that to the nice bell-shaped curves above? The very simple graphs above seem to indicate the probability is greatest for E = 0, and then just goes down, instead of going up initially to reach some maximum around some average value and then drop down again. Well… The fallacy here, of course, is that the constant of proportionality is itself dependent on the temperature. To be precise, the probability density function for velocities is given by:

Boltzmann distribution

The function for energy is similar. To be precise, we have the following function:

Boltzmann distribution-energy

This (and the velocity function too) is a so-called chi-squared distribution, and ϵ is the energy per degree of freedom in the system. Now these functions will give you such nice bell-shaped curves, and so all is alright. In any case, don’t worry too much about it. I have to get back to that story of the two particles and the two detectors.

However, before I do so, let me jot down two (or three) more formulas. The first one is the formula for the expected number 〈Ni〉 of particles occupying energy level ε(and the brackets here, 〈Ni〉, have nothing to do with the bra-ket notation mentioned above: it’s just a general notation for some expected value):

Boltzmann distribution-no of particlesThis formula has the same shape as the ones above but we brought the exponential function down, into the denominator, so the minus sign disappears. And then we also simplified it by introducing that gi factor, which I won’t explain here, because the only reason why I wanted to jot this down is to allow you to compare this formula with the equivalent formula when (a) Fermi-Dirac and (b) Bose-Einstein statistics apply:

B-E and F-D distribution-no of particles

Do you see the difference? The only change in the formula is the ±1 term in the denominator: we have a minus one (–1) for Fermi-Dirac statistics and a plus one (+1) for Bose-Einstein statistics indeed. That’s all. That’s the difference with Maxwell-Boltzmann statistics.

Huh? Yes. Think about it, but don’t worry too much. Just make a mental note of it, as it will be handy when you’d be exploring related articles. [And, of course, please don’t think I am bagatellizing the difference between Maxwell-Boltzmann, Bose-Einstein and Fermi-Dirac statistics here: that ±1 term in the denominator is, obviously, a very important difference, as evidenced by the consequences of formulas like the one above: just think about the crowding-in effect in lasers as opposed to the Pauli exclusion principle, for example. :-)]

Setting up the experiment (continued)

Let’s get back to our experiment. As mentioned above, we don’t really need probability amplitudes in the classical world: ordinary probabilities, taking into account uncertainties about initial conditions only, will do. Indeed, there’s a limit to the precision with which we can measure the position in space and time of any particle in the classical world as well and, hence, we’d expect some randomness (as captured in the scattering phenomenon) but, as mentioned above, ordinary probabilities would do to capture that. Nevertheless, we did associate probability amplitudes with the events described above in order to illustrate the difference with the quantum-mechanical world. More specifically, we distinguished:

  1. Situation (a): particle a goes to detector 1 and b goes to 2, versus
  2. Situation (b): particle a goes to 2 and b goes to 1.

In our bra-ket notation:

  1. 〈1|a〉〈2|b〉 = f(θ), and
  2. 〈1|b〉〈2|a〉 = f(π–θ).

The f(θ) function is a quantum-mechanical wave function. As mentioned above, while we’d expect to see some space (x) and time (t) variables in it, these are, apparently, already captured by the θ variable. What about f(π–θ)? Well… As mentioned above also, that’s just the same function as f(θ) but using the angle π–θ as the argument. So, the following remark is probably too trivial to note but let me do it anyway (to make sure you understand what we’re modeling here really): while it’s the same function f, the values f(θ) and f(π–θ) are, of course, not necessarily equal and, hence, the corresponding probabilities are also not necessarily the same. Indeed, some angles of scattering may be more likely than others. However, note that we assume that the function f itself is  exactly the same for the two situations (a) and (b), as evidenced by that normalization condition we assume to be respected: if we add all probabilities over all angles, then we should get 1, so ∫|f(θ)|2dθ = ∫f(θ)·f*(θ)dθ = 1.

So far so good, you’ll say. However, let me ask the same critical question once again: why would we use the same wave function f for the second situation? 

Huh? You’ll say: why wouldn’t we? Well… Think about it. Again, how do we find that f(θ) function? The assumption here is that we just do the experiment a zillion times while varying the angle θ and, hence, that we’ll find some average corresponding to P(θ), i.e. the probability. Now, the next step then is to equate that average value to |f(θ)|obviously, because we have this quantum-mechanical theory saying probabilities are the absolute square of probability amplitudes. And,  so… Well… Yes. We then just take the square root of the P function to find the f(θ) function, isn’t it?

Well… No. That’s where Feynman is not very accurate when it comes to spelling out all of the assumptions underpinning this thought experiment. We should obviously watch out here, as there’s all kinds of complications when you do something like that. To a large extent (perhaps all of it), the complications are mathematical only.

First, note that any number (real or complex, but note that |f(θ)|2 is a real number) has two distinct real square roots: a positive and a negative one: x = ± √x2. Secondly, we should also note that, if f(θ) is a regular complex-valued wave function of x and t and θ (and with ‘regular’, we mean, of course, that’s it’s some solution to a Schrödinger (or Schrödinger-like) equation), then we can multiply it with some random factor shifting its phase Θ (usually written as Θ = kx–ωt+α) and the square of its absolute value (i.e. its squared norm) will still yield the same value. In mathematical terms, such factor is just a complex number with a modulus (or length or norm–whatever terminology you prefer) equal to one, which we can write as a complex exponential: eiα, for example. So we should note that, from a mathematical point of view, any function eiαf(θ) will yield the same probabilities as f(θ). Indeed,

|f(θ)|= |eiαf(θ)|= (|eiα||f(θ)|)= |eiα|2|f(θ)|= 12|f(θ)|2

Likewise, while we assume that this function f(π–θ) is the same function f as that f(θ) function, from a mathematical point of view, the function eiβf(π–θ) would do just as well, because its absolute square yields the very same (real) probability |f(π–θ)|2. So the question as to what wave function we should take for the probability amplitude is not as easy to answer as you may think. Huh? So what function should we take then? Well… We don’t know. Fortunately, it doesn’t matter, for non-identical particles that is. Indeed, when analyzing the scattering of non-identical particles, we’re interested in the probabilities only and we can calculate the total probability of particle a ending up in detector 1 or 2 (and, hence, particle b ending up in detector 2 or 1) as the following sum:

|eiαf(θ)|2 +|eiβf(π–θ)|= |f(θ)|2 +|f(π–θ)|2.

In other words, for non-identical particles, these phase factors (eiα or eiβ) don’t matter and we can just forget about them.

However, and that’s the crux of the matter really, we should mention them, of course, in case we’d have to add the probability amplitudeswhich is exactly what we’ll have to do when we’re looking at identical particles, of course. In fact, in that case (i.e. when these phase factors eiα and eiβ will actually matter), you should note that what matters really is the phase difference, so we could replace α and β with some δ (which is what we’ll do below).

However, let’s not put the cart before the horse and conclude our analysis of what’s going on when we’re considering non-identical parties: in that case, this phase difference doesn’t matter. And the remark about the positive and negative square root doesn’t matter either. In fact, if you want, you can subsume it under the phase difference story by writing eiα as eiα = ± 1. To be more explicit: we could say that –f(θ) is the probability amplitude, as |–f(θ)|is also equal to that very same real number |f(θ)|2. OK. Done.

Bose-Einstein and Fermi-Dirac statistics

As I mentioned above, the story becomes an entirely different one when we’re doing the same experiment with identical particles. At this point, Feynman’s argument becomes rather fuzzy and, in my humble opinion, that’s because he refused to be very explicit about all of those implicit assumptions I mentioned above. What I can make of it, is the following:

1. We know that we’ll have to add probability amplitudes, instead of probabilities, because we’re talking one event that can happen in two indistinguishable ways. Indeed, for non-identical particles, we can, in principle (and in practice) distinguish situation (a) and (b) – and so that’s why we only have to add some real-valued numbers representing probabilities – but so we cannot do do that for identical particles.

2. Situation (a) is still being described by some probability amplitude f(θ). We don’t know what function exactly, but we assume there is some unique wave function f(θ) out there that accurately describes the probability amplitude of particle a going to 1 (and, hence, particle b going to 2), even if we can’t tell which is a and which is b. What about the phase factor? Well… We just assume we’ve chosen our t such that α = 0. In short, the assumption is that situation (a) is represented by some probability amplitude (or wave function, if you prefer that term) f(θ).

3. However, a (or some) particle (i.e. particle a or particle b) ending up in a (some) detector (i.e. detector 1 or detector 2) may come about in two ways that cannot be distinguished one from the other. One is the way described above, by that wave function f(θ). The other way is by exchanging the role of the two particles. Now, it would seem logical to associate the amplitude f(π–θ) with the second way. But we’re in the quantum-mechanical world now. There’s uncertainty, in position, in momentum, in energy, in time, whatever. So we can’t be sure about the phase. That being said, the wave function will still have the same functional form, we must assume, as it should yield the same probability when squaring. To account for that, we will allow for a phase factor, and we know it will be important when adding the amplitudes. So, while the probability for the second way (i.e. the square of its absolute value) should be the same, its probability amplitude does not necessarily have to be the same: we have to allow for positive and negative roots or, more generally, a possible phase shift. Hence, we’ll write the probability amplitude as eiδf(π–θ) for the second way. [Why do I use δ instead of β? Well… Again: note that it’s the phase difference that matters. From a mathematical point of view, it’s the same as inserting an eiβ factor: δ can take on any value.]

4. Now it’s time for the Big Trick. Nature doesn’t matter about our labeling of particles. If we have to multiply the wave function (i.e. f(π–θ), or f(θ)–it’s the same: we’re talking a complex-valued function of some variable (i.e. the angle θ) here) with a phase factor eiδ when exchanging the roles of the particles (or, what amounts to the same, exchanging the role of the detectors), we should get back to our point of departure (i.e. no exchange of particles, or detectors) when doing that two times in a row, isn’t it? So we exchange the role of particle a and b in this analysis (or the role of the detectors), and then we’d exchange their roles once again, then there’s no exchange of roles really and we’re back at the original situation. So we must have eiδeiδf(θ) = f(θ) (and eiδeiδf(π–θ) = f(π–θ) of course, which is exactly the same statement from a mathematical point of view).

5. However, that means (eiδ)= +1, which, in turn, implies that eiδ is plus or minus one: eiδ = ± 1. So that means the phase difference δ must be equal to 0 or π (or –π, which is the same as +π).

In practical terms, that means we have two ways of combining probability amplitudes for identical particles: we either add them or, else, we subtract them. Both cases exist in reality, and lead to the dichotomy between Bose and Fermi particles:

  1. For Bose particles, we find the total probability amplitude for this scattering event by adding the two individual amplitudes: f(θ) + f(π–θ).
  2. For Fermi particles, we find the total probability amplitude for this scattering event by subtracting the two individual amplitudes: f(θ) – f(π–θ).

As compared to the probability for non-identical particles which, you’ll remember, was equal to |f(θ)|2 +|f(π–θ)|2, we have the following Bose-Einstein and Fermi-Dirac statistics:

  1. For Bose particles: the combined probability is equal to |f(θ) + f(π–θ)|2. For example, if θ is 90°, then we have a scattering probability that is exactly twice the probability for non-identical particles. Indeed, if θ is 90°, then f(θ) = f(π–θ), and then we have |f(π/2) + f(π/2)|2 = |2f(π/2)|2 = 4|f(π/2)|2. Now, that’s two times |f(π/2)|2 +|f(π/2)|2 = 2|f(π/2)|2 indeed.
  2. For Fermi particles (e.g. electrons), we have a combined probability equal to |f(θ) – f(π–θ)|2. Again, if θ is 90°, f(θ) = f(π–θ), and so it would mean that we have a combined probability which is equal to zero ! Now, that‘s a strange result, isn’t it? It is. Fortunately, the strange result has to be modified because electrons will also have spin and, hence, in half of the cases, the two electrons will actually not be identical but have opposite spin. That changes the analysis substantially (see Feynman’s Lectures, III-3-12). To be precise, if we take the spin factor into, we’ll find a total probability (for θ = 90°) equal to |f(π/2)|2, so that’s half of the probability for non-identical particles.

Hmm… You’ll say: Now that was a complicated story! I fully agree. Frankly, I must admit I feel like I still don’t quite ‘get‘ the story with that phase shift eiδ, in an intuitive way that is (and so that’s the reason for going through the trouble of writing out this post). While I think it makes somewhat more sense now (I mean, more than when I wrote a post on this in March), I still feel I’ve only brought some of the implicit assumptions to the fore. In essence, what we’ve got here is a mathematical dichotomy (or a mathematical possibility if you want) corresponding to what turns out to be an actual dichotomy in Nature: in quantum-mechanics, particles are either bosons or fermions. There is no Third Way, in quantum-mechanics that is (there is a Third Way in reality, of course: that’s the classical world!).

I guess it will become more obvious as I’ll get somewhat more acquainted with the real arithmetic involved in quantum-mechanical calculations over the coming weeks. In short, I’ve analyzed this thing over and over again, but it’s still not quite clear me. I guess I should just move on and accept that:

  1. This explanation ‘explains’ the experimental evidence, and that’s different probabilities for identical particles as compared to non-identical particles.
  2. This explanation ‘complements’ analyses such as that 1916 analysis of blackbody radiation by Einstein (see my post on that), which approaches interference from an angle that’s somewhat more intuitive.

A numerical example

I’ve learned that, when some theoretical piece feels hard to read, an old-fashioned numerical example often helps. So let’s try one here. We can experiment with many functional forms but let’s keep things simple. From the illustration (which I copy below for your convenience), that angle θ can take any value between −π and +π, so you shouldn’t think detector 1 can only be ‘north’ of the collision spot: it can be anywhere.

Identical particles-a

Now, it may or may not make sense (and please work out other examples than this one here), but let’s assume particle a and b are more likely to go in a line that’s more or less straight. In other words, the assumption is that both particles deflect each other only slightly, or even not at all. After all, we’re talking ‘point-like’ particles here and so, even when we try hard, it’s hard to make them collide really.

That would amount to a typical bell-shaped curve for that probability density curve P(θ): one like the blue curve below. That one shows that the probability of particle a and b just bouncing back (i.e. θ ≈ ±π) is (close to) zero, while it’s highest for θ ≈ 0, and some intermediate value for anything angle in-between. The red curve shows P(π–θ), which can be found by mirroring the P(θ) around the vertical axis, which yields the same function because the function is symmetrical: P(θ) = P(–θ), and then shifting it by adding the vertical distance π. It should: it’s the second possibility, remember? Particle a ending up in detector 2. But detector 2 is positioned at the angle π–θ and, hence, if π–θ is close to ±π (so if θ ≈ 0), that means particle 1 is basically bouncing back also, which we said is unlikely. On the other hand, if detector 2 is positioned at an angle π–θ ≈ 0, then we have the highest probability of particle a going right to it. In short, the red curve makes sense too, I would think. [But do think about yourself: you’re the ultimate judge!]

Example - graph

The harder question, of course, concerns the choice of some wave function f(θ) to match those P curves above. Remember that these probability densities P are real numbers and any real number is the absolute square (aka the squared norm) of an infinite number of complex numbers! So we’ve got l’embarras du choix, as they say in French. So… What do to? Well… Let’s keep things simple and stupid and choose a real-valued wave function f(θ), such as the blue function below. Huh? You’ll wonder if that’s legitimate. Frankly, I am not 100% sure, but why not? The blue f(θ) function will give you the blue P(θ) above, so why not go along with it? It’s based on a cosine function but it’s only half of a full cycle. Why? Not sure. I am just trying to match some sinusoidal function with the probability density function here, so… Well… Let’s take the next step.

Example 2

The red graph above is the associated f(π–θ) function. Could we choose another one? No. There’s no freedom of choice here, I am afraid: if we choose a functional form for f(θ), then our f(π–θ) function is fixed too. So it is what it is: negative between –π and 0, and positive between 0 and +π and 0. Now that is definitely not good, because f(π–θ) for θ = –π is not equal to f(π–θ) for θ = +π: they’re opposite values. That’s nonsensical, isn’t it? Both the f(θ) and the f(π–θ) should be something cyclical… But, again, let’s go along with it as for now: note that the green horizontal line is the sum of the squared (absolute) values of f(θ) and f(π–θ), and note that it’s some constant.

Now, that’s a funny result, because I assumed both particles were more likely to go in some straight line, rather than recoil with some sharp angle θ. It again indicates I must be doing something wrong here. However, the important thing for me here is to compare with the Bose-Einstein and Fermi-Dirac statistics. What’s the total probability there if we take that blue f(θ) function? Well… That’s what’s shown below. The horizontal blue line is the same as the green line in the graph above: a constant probability for some particle (a or b) ending up in some detector (1 or 2). Note that the surface, when added, of the two rectangles above the x-axis (i.e. the θ-axis) should add up to 1. The red graph gives the probability when the experiment is carried out for (identical) bosons (or Bose particles as I like to call them). It’s weird: it makes sense from a mathematical point of view (the surface under the curve adds up to the same surface under the blue line, so it adds up to 1) but, from a physics point of view, what does this mean? A maximum at θ = π/2 and a minimum at θ = –π/2? Likewise, how to interpret the result for fermions?


Is this OK? Well… To some extent, I guess. It surely matches the theoretical results I mentioned above: we have twice the probability for bosons for θ = 90° (red curve), and a probability equal to zero for the same angle when we’re talking fermions (green curve). Still, this numerical example triggers more questions than it answers. Indeed, my starting hypothesis was very symmetrical: both particle a and b are likely to go in a straight line, rather than being deflected in some sharp(er) angle. Now, while that hypothesis gave a somewhat unusual but still understandable probability density function in the classical world (for non-identical particles, we got a constant for P(θ) + P(π–θ)), we get this weird asymmetry in the quantum-mechanical world: we’re much more likely to catch boson in a detector ‘north’ of the line of firing than ‘south’ of it, and vice versa for fermions.

That’s weird, to say the least. So let’s go back to the drawing board and take another function for f(θ) and, hence, for f(π–θ). This time, the two graphs below assume that (i) f(θ) and f(π–θ) have a real as well as an imaginary part and (ii) that they go through a full cycle, instead of a half-cycle only. This is done by equating the real part of the two functions with cos(θ) and cos(π–θ) respectively, and their imaginary part with sin(θ) and sin(π–θ) respectively. [Note that we conveniently forget about the normalization condition here.]


What do we see? Well… The imaginary part of f(θ) and f(π–θ) is the same, because sin(π–θ) = sin(θ). We also see that the real part of f(θ) and f(π–θ) are the same except for a phase difference equal to π: cos(π–θ) = cos[–(θ–π)] = cos(θ–π). More importantly, we see that the absolute square of both f(θ) and f(π–θ) yields the same constant, and so their sum P = |f(θ)|2 +|f(π–θ)|= 2|f(θ)|2 = 2|f(π–θ)|= 2P(θ) = 2P(π–θ). So that’s another constant. That’s actually OK because, this time, I did not favor one angle over the other (so I did not assume both particles were more likely to go in some straight line rather than recoil).

Now, how does this compare to Bose-Einstein and Fermi-Dirac statistics? That’s shown below. For Bose-Einstein (left-hand side), the sum of the real parts of f(θ) and f(π–θ) yields zero (blue line), while the sum of their imaginary parts (i.e. the red graph) yields a sine-like function but it has double the amplitude of sin(θ). That’s logical: sin(θ) + sin(π–θ) = 2sin(θ). The green curve is the more interesting one, because that’s the total probability we’re looking for. It has two maxima now, at +π/2 and at –π/2. That’s good, as it does away with that ‘weird asymmetry’ we got when we used a ‘half-cycle’ f(θ) function.

B-E and F-D

Likewise, the Fermi-Dirac probability density function looks good as well (right-hand side). We have the imaginary parts of f(θ) and f(π–θ) that ‘add’ to zero: sin(θ) – sin(π–θ) = 0 (I put ‘add’ between brackets because, with Fermi-Dirac, we’re subtracting of course), while the real parts ‘add’ up to a double cosine function: cos(θ) – cos(π–θ) = cos(θ) – [–cos(θ)] = 2cos(θ). We now get a minimum at +π/2 and at –π/2, which is also in line with the general result we’d expect. The (final) graph below summarizes our findings. It gives the three ‘types’ of probabilities, i.e. the probability of finding some particle in some detector as a function of the angle –π < θ < +π using:

  1. Maxwell-Boltzmann statistics: that’s the green constant (non-identical particles, and probability does not vary with the angle θ).
  2. Bose-Einstein: that’s the blue graph below. It has two maxima, at +π/2 and at –π/2, and two minima, at 0 and at ±π (+π and –π are the same angle obviously), with the maxima equal to twice the value we get under Maxwell-Boltzmann statistics.
  3. Finally, the red graph gives the Fermi-Dirac probabilities. Also two maxima and minima, but at different places: the maxima are at θ = 0 and  θ = ±π, while the minima are at at +π/2 and at –π/2.


Funny, isn’t it? These probability density functions are all well-behaved, in the sense that they add up to the same total (which should be 1 when applying the normalization condition). Indeed, the surfaces under the green, blue and red lines are obviously the same. But so we get these weird fluctuations for Bose-Einstein and Fermi-Dirac statistics, favoring two specific angles over all others, while there’s no such favoritism when the experiment involves non-identical particles. This, of course, just follows from our assumption concerning f(θ). What if we double the frequency of f(θ), i.e. from one cycle to two cycles between –π and +π? Well… Just try it: take f(θ) = cos(2·θ) + isin(2·θ) and do the calculations. You should get the following probability graphs: we have the same green line for non-identical particles, but interference with four maxima (and four minima) for the Bose-Einstein and Fermi-Dirac probabilities.

summary 2

Again… Funny, isn’t it? So… What to make of this? Frankly, I don’t know. But one last graph makes for an interesting observation: if the angular frequency of f(θ) takes on larger and larger values, the Bose-Einstein and Fermi-Dirac probability density functions also start oscillating wildly. For example, the graphs below are based on a f(θ) function equal to f(θ) = cos(25·θ) + isin(25·θ). The explosion of color hurts the eye, doesn’t it? :-) But, apart from that, do you now see why physicists say that, at high frequencies, the interference pattern gets smeared out? Indeed, if we move the detector just a little bit (i.e. we change the angle θ just a little bit) in the example below, we hit a maximum instead of a minimum, and vice versa. In short, the granularity may be such that we can only measure that green line, in which case we’d think we’re dealing with Maxwell-Boltzmann statistics, while the underlying reality may be different.

summary 4

That explains another quote in Feynman’s famous introduction to quantum mechanics (Lectures, Vol. III, Chapter 1): “If the motion of all matter—as well as electrons—must be described in terms of waves, what about the bullets in our first experiment? Why didn’t we see an interference pattern there? It turns out that for the bullets the wavelengths were so tiny that the interference patterns became very fine. So fine, in fact, that with any detector of finite size one could not distinguish the separate maxima and minima. What we saw was only a kind of average, which is the classical curve. In the Figure below, we have tried to indicate schematically what happens with large-scale objects. Part (a) of the figure shows the probability distribution one might predict for bullets, using quantum mechanics. The rapid wiggles are supposed to represent the interference pattern one gets for waves of very short wavelength. Any physical detector, however, straddles several wiggles of the probability curve, so that the measurements show the smooth curve drawn in part (b) of the figure.”

Interference with bullets

But that should really conclude this post. It has become way too long already. One final remark, though: the ‘smearing out’ effect also explains why those three equations for 〈Ni〉 sometimes do amount to more or less the same thing: the Bose-Einstein and Fermi-Dirac formulas may approximate the Maxwell-Boltzmann equation. In that case, the ±1 term in the denominator does not make much of a difference. As we said a couple of times already, it all depends on scale. :-)

Concluding remarks

1. The best I can do in terms of interpreting the above, is to tell myself that we cannot fully ‘fix’ the functional form of the wave function for the second or ‘other’ way the event can happen if we’re ‘fixing’ the functional form for the first of the two possibilities. We have to allow for a phase shift eiδ indeed, which incorporates all kinds of considerations of uncertainty in regard to both time and position and, hence, in regard to energy and momentum also (using both the ΔEΔt = ħ/2 and ΔxΔp = ħ/2 expressions)–I assume (but that’s just a gut instinct). And then the symmetry of the situation then implies eiδ can only take on one of two possible values: –1 or +1 which, in turn, implies that δ is equal to 0 or π.

2. For those who’d think I am basically doing nothing but re-write a chapter out of Feynman’s Lectures, I’d refute that. One point to note is that Feynman doesn’t seem to accept that we should introduce a phase factor in the analysis for non-identical particles as well. To be specific: just switching the detectors (instead of the particles) also implies that one should allow for the mathematical possibility of the phase of that f function being shifted by some random factor δ. The only difference with the quantum-mechanical analysis (i.e. the analysis for identical particles) is that the phase factor doesn’t make a difference as to the final result, because we’re not adding amplitudes but their absolute squares and, hence, a phase shift doesn’t matter.

3. I think all of the reasoning above makes not only for a very fine but also a very beautiful theoretical argument, even I feel like I don’t fully ‘understand’ it, in an intuitive way that is. I hope this post has made you think. Isn’t it wonderful to see that the theoretical or mathematical possibilities of the model actually correspond to realities, both in the classical as well as in the quantum-mechanical world? In fact, I can imagine that most physicists and mathematicians would shrug this whole reflection off like… Well… Like: “Of course! It’s obvious, isn’t it?” I don’t think it’s obvious. I think it’s deep. I would even qualify it as mysterious, and surely as beautiful. :-)

Amplitudes and statistics